Short Introduction to WRDS

Retrived from course forum for ECE695: "Financial Engineering", Spring 2012


  1. Getting account from WRDS Visit https://wrds-web.wharton.upenn.edu/wrds/ and hit the "Register for a WRDS account". Once you fill in all information needed by WRDS, they will verify that you are a Purdue student. This verification step may take 2 days or longer.
  2. Introduction documents you must read before doing anything
    1. Go to WRDS Support page and Read articles such as
    2. Getting started with WRDS
    3. Top 10 Things to Know about WRDS
    4. 3 Ways to use WRDS
  3. How to access WRDS account
    1. Web Interface :Most simple queries can be done by Web interface. You need to specify search conditions such as company, from when, to when, items you want to search. The result of query will be returned to you as soon as WRDS Supercomputer finishes queries to the database, with the format you selected (ex, .csv, .txt, and so on). For example, daily returns of IBM stock from Jan 1980 to Feb 2002 can be easily obtained by giving conditions such as stock PERMNO of IBM, and “starting date” to Jan 1980, and so on. If this type of query satisfies your need, you need not write a program.
    2. Terminal access :Sometimes Web-Interface is not enough for you to find needed dataset. This could happen when your query spans multiple databases or you need to do more fine-grained search. If so, you need to learn how to program SAS. SAS is a language and also software similar to Structured Query Language (SQL). It allows you to search database by group, and merge multiple tables for complicated queries. The steps to access WRDS through terminal is as follows
      1. Open up any terminal client such as SecureCRT
      2. Choose configuration as
        • Protocol : SSH2
        • Hostname : wrds.wharton.upenn.edu
        • Port  : 22 (default)
        • Username : The user account created from "(1) Getting account from WRDS"
    3. Copying data/code to and from WRDS systems can be done by WinSCP in a WYSIWYG fashion. WinSCP is freely downloadable from WinSCP free download
    4. Please refer to this article in WRDS Secure Copy (SCP), An Alternative to FTP
  4. Basic information of your account
    • Disk size: You will be given 750MB disk space to store your code, output files. In case you want additional disk space, you can purchase more from WRDS. Check your remaining disk space by "quota -v"
    • If your SAS code output is bigger than your quota (750MB), it will cause errors and makes program stop. You may want to copy your SAS code to one of the temp directories of WRDS /sastemp1 to /sastemp8 Temporary Disk Space
    • About 2 terabytes of disk space is available on the WRDS UNIX system for immediate processing of programs and temporary storage of large files. The shared workspace is divided into a series of directories, and a specific directory is assigned to each SAS process based on available disk space and disk activity. Users can access any of these directories for other processes and temporary storage. Files are automatically removed from these directories, and the space is constantly monitored to ensure that workspace is available for users to run programs and store temporary files. To ensure availability for the largest possible number of WRDS users, long-term storage in these directories is prohibited.
    • SAStemp Directories: /sastemp1 - /sastemp8, Files located here are not backed up! Files and directories older than 48 hours are automatically removed. Temporary work libraries created by SAS (i.e., SAS_work* subdirectories) are automatically removed if the associated SAS process has ended. If any one of the SAStemp directories is more than 95% full, the user accounting for the most space on that particular file system will have all of his/her files deleted from it without notice, regardless of file age. Beyond the specific criteria listed above, WRDS may delete SAStemp files without notice if they are being used for long-term storage or a single user's files are preventing others from access to adequate SAStemp space. Fortunately the latter has been a rare event, and there is often an opportunity to contact the owner before file deletion. For more information, please go to Unix Disk space in WRDS
  5. Databases you can use
    • Stock database in WRDS are subscription-based. Purdue students can only view/query Purdue University subscribing database. The complete list of subscribed database can be checked from "myWRDS/Products". Since there are many vendors providing stock related information, their data formats are different company by company. WRDS converts all databases into one common data format so as not to mandate users to learn multiple programming languages to do research in WRDS. One of the most popular stock databases is CRSP. CRSP database does not get updated real-time based. It is either updated by monthly, quarterly or yearly according to data vendors. Therefore very recent stock price data might not be available until WRDS updates them.
    • Popular databases
    1. CRSP - if you need daily or monthly stock price, use this.
      • daily stock price dataset : /wrds/crsp/sasdata/a_stock/dsf.sas7bdat
      • monthly stock price dataset : /wrds/crsp/sasdata/a_stock/msf.sas7bdat
      • index : /wrds/crsp/sasdata/a_indexes/
      • dsp500 index : /wrds/crsp/sasdata/a_indexes/dsp500.sas7bdat
    2. COMPUSTAT - Company accounting database. Useful when you look for a company's financial statements, earning report, revenue, location, owner and so on
    3. TAQ - Huge volumes of trade and quotes data. Purdue only subscribes three years' TAQ data.
    4. I/B/E/S - Analysts' stock price forecast data, buy/sell recommend history and many others.
  6. Getting SAS/MATLAB for free
    • You don't have to install SAS software to your laptop because all ECE machines are installed with SAS. However, having your own SAS copy in your machine makes you extremely convenient since it allows you to see raw data of SAS dataset, and verify correctness. Any Purdue student can get free copy of SAS software/MATLAB from "Purdue Contracts and Licensing" department. You can install SAS/MatLAB software to as many private machines you own. You may stop by ITap Shopping Offline located in Stewart Center, Room G65, to check-out a 24-hr loaner copy of the above software. Presenting PUID is necessary.
  7. Programming SAS
    • First read CRSP data manual to get to know what's in it. If your dataset is easily downloadable and you know MATLAB well, it might be good enough not to use SAS. It will be fine you work with MATLAB once you got all the data you need. One benefit of using SAS is its fast sorting speed compared to MATLAB. Please keep in mind that merging two tables will be extremely slow in MATLAB.
    1. Location of sample SAS code
      Web interface: Home -> Support-> Data -> Sample Programs ->CRSP -> Sample Programs or here
    2. The same example code is available at WRDS unix server in command line /wrds/crsp/samples. You can see sample programs written in SAS, C, and Fortran by clicking Sample programs. For example, this program computes the CAPM beta using CRSP daily data. You can copy any sample code to your home directory and are free to modify it for your own code. One caveat is that these sample code may not be up-to-date. Because some code was written quite a time ago, it may contain deprecated table name or column name. In that case, please verify that you are referring to existing tables by looking up
      Home -> Support-> Data -> Dataset list -> CRSP dataset -> click SAS table name
      Then you can see the database table schema, i.e., the list of columns and their data type
    3. How to execute SAS code : "prompt> sas your_code.sas"
      If you want your SAS code runs even if you logged out, " prompt> nohup sas your_code.sas &"
      A job running in the background will terminate when you logout (i.e., exit the terminal session) unless you instruct it to continue on logout. To run a job in the background and also be able to logout without terminating it, use the "nohup" command (short for "no hangup on logout").
  8. Terminology
    Every new terminology is well explained in CRSP or COMPUSTAT data manual. Let me introduce some frequently used terms
    1. TICKER : Ticker symbol. Shortened company name used in exchange. Ex, MSFT (Microsoft), MS( Morgan Stanley),.. Remember that TICKER is not unique at all through time. It is reused. For this reason, do not use TICKER for search spanning long time period. If you do that, different companies that happen to be assigned same TICKER at different time-point may be recognized as one company, which possibly leads to incorrect dataset or errors.
    2. PERMNO: This is permanent number (PERMNO) assigned to a stock . From the database’s perspective, to uniquely identify a record in a table, every record in the same table must have at least one unique identifier. This is similar to Social security number in searching of a person. Likewise, a listed company should have a unique number to be uniquely identified. In CRSP, this is called permanent number (PERMNO) whereas COMPUSTAT calls gvkey. Unfortunately their data type is different so Finding a stock in CRSP with gvkey is impossible. Since two tables use different identifiers, you need to be very careful when merging multiple tables from different databases. For example, if you want to know the earnings report dates of IBM, you need to merge two CRSP tables with COMPUSTAT tables. WRDS provides us with a mapping table between CRSP PERMNO and COMPUSTAT gvkeys.
  9. Resource
    Please take advantage of WRDS forum to get some ideas from others.Go to Community
  10. FAQ
    1. What does it mean by negative stock price in CRSP?
      Short solution to this issue is to negate negative price so as to make it positive. Long answer is that this can happen if the open price/close price is unknown to that day. To signal this stock price is drawn from incomplete data, WRDS decided to negate the positive stock price.
    2. What is adjusted price and why do we need it?
      The price/volume of the SAS dataset logs the absolute stock price/number of shares at that time point. Comparing two prices at different time points will be meaningless if stock had been split or reverse-split between two time points. We use adjusted price for that reason. Here is the way you can compute adjusted price
      adjusted stock price = price/cfacpr(cumulative factor to adjust prices)
      adjusted stock volume = number of share * cfacshr ( cumulative factor to adjust share)
      With two adjusted stock prices, it is easy to calculate the holding period return, i.e., ret = (adj_price2 - adj_price1)/adj_price1
    3. I want to export the dataset to .txt or .csv files.
      The following code snippet will teach you how to export SAS dataset(toprint_data) to a text file.
      proc export data= toprint_data
      outfile = 'ref_price_1980_2002.txt'
      DBMS = DLM REPLACE;
      run;
      Whenever you export data, you must execute your code background. In other words, type in "sas your_code.sas &". Otherwise, the system will hang up, and you will wait forever to get response. If you accidentally forget to run it background, the only way to stop that program is to use Unix kill command.
    4. How to export data to .sas7bdat format which is readable from SAS viewer/software?
      The following code will produce finalret_1980_2002.sas7bdat.
      data own.finalret_1980_2002;
      set finalret;
      format begdt date.;
      run;
    5. Is there any way to save data in a Matrix like format in SAS?
      Yes there is. SAS provides SAS/IML component which looks similar to MATLAB. Once you get all needed data, it can be saved into a matrix.

If you have any questions, please let me know. Thanks


  • Write question/comment here.
    • answer here.

Back to ECE695: "Financial Engineering", Spring 2012

Alumni Liaison

Correspondence Chess Grandmaster and Purdue Alumni

Prof. Dan Fleetwood