wrds-download

TUI/CLI tool for browsing and downloading WRDS data
Log | Files | Refs | README

SKILL.md (5198B)


      1 ---
      2 name: wrds-download
      3 description: Download data from the WRDS (Wharton Research Data Services) PostgreSQL database to local Parquet or CSV files. Use when the user asks to get data from WRDS, download financial data, or mentions WRDS schemas like crsp, comp, optionm, ibes, etc.
      4 allowed-tools: Bash(wrds-dl *), Read, Grep
      5 argument-hint: [description of data needed]
      6 ---
      7 
      8 # WRDS Data Download
      9 
     10 You help users download data from the Wharton Research Data Services (WRDS) PostgreSQL database using the `wrds-dl` CLI tool.
     11 
     12 ## Prerequisites
     13 
     14 The `wrds-dl` CLI must be installed and on the PATH. Either the Go binary or the Python version works — they have the same commands and flags. The user must have WRDS credentials configured via one of:
     15 - Environment variables: `PGUSER` and `PGPASSWORD`
     16 - Saved credentials at `~/.config/wrds-dl/credentials`
     17 - Standard `~/.pgpass` file
     18 
     19 If `wrds-dl` is not found, tell the user to install it:
     20 - **Go binary**: download from https://github.com/LouLouLibs/wrds-download/releases
     21 - **Python (via uv)**: `uv tool install wrds-dl --from /path/to/wrds-download/python`
     22 
     23 ## Workflow
     24 
     25 Follow these steps for every download request:
     26 
     27 ### Step 1: Identify the table
     28 
     29 Parse the user's request to determine the WRDS schema and table. Common mappings:
     30 
     31 | Dataset | Schema | Key Tables |
     32 |---------|--------|------------|
     33 | CRSP daily stock | `crsp` | `dsf` (daily), `msf` (monthly), `dsi` (index) |
     34 | CRSP events | `crsp` | `dsedelist`, `stocknames` |
     35 | Compustat annual | `comp` | `funda` |
     36 | Compustat quarterly | `comp` | `fundq` |
     37 | Compustat global | `comp_global_daily` | `g_funda`, `g_fundq` |
     38 | IBES | `ibes` | `statsum_epsus`, `actu_epsus` |
     39 | OptionMetrics | `optionm` | `opprcd` (prices), `secprd` (security) |
     40 | TAQ | `taqmsec` | `ctm_YYYYMMDD` |
     41 | CRSP/Compustat merged | `crsp` | `ccmxpf_linktable` |
     42 | BoardEx | `boardex` | `na_wrds_company_profile` |
     43 | Institutional (13F) | `tfn` | `s34` |
     44 | Audit Analytics | `audit` | `auditnonreli` |
     45 | Ravenpack | `ravenpack` | `rpa_djnw` |
     46 | Bank Regulatory | `bank` | `call_schedule_rc`, `bhck` |
     47 
     48 If unsure which table, ask the user or use `wrds-dl info` to explore.
     49 
     50 ### Step 2: Inspect the table
     51 
     52 Always run `wrds-dl info` first to understand the table structure:
     53 
     54 ```bash
     55 wrds-dl info --schema <schema> --table <table>
     56 ```
     57 
     58 Use the output to:
     59 - Confirm the table exists and has the expected columns
     60 - Note column names for the user's requested variables
     61 - Check the estimated row count to warn about large downloads
     62 
     63 For JSON output (useful for parsing): `wrds-dl info --schema <schema> --table <table> --json`
     64 
     65 ### Step 3: Dry run
     66 
     67 For tables with more than 1 million estimated rows, or when a WHERE clause is involved, always do a dry run first:
     68 
     69 ```bash
     70 wrds-dl download --schema <schema> --table <table> \
     71   --columns "<cols>" --where "<filter>" --dry-run
     72 ```
     73 
     74 Show the user the row count and sample rows. Ask for confirmation before proceeding if the row count is very large (>10M rows).
     75 
     76 ### Step 4: Download
     77 
     78 Build and run the download command:
     79 
     80 ```bash
     81 wrds-dl download \
     82   --schema <schema> \
     83   --table <table> \
     84   --columns "<comma-separated columns>" \
     85   --where "<SQL filter>" \
     86   --out <output_file> \
     87   --format <parquet|csv>
     88 ```
     89 
     90 #### Defaults and conventions
     91 - **Format**: Use Parquet unless the user asks for CSV. Parquet is smaller and faster.
     92 - **Output path**: Name the file descriptively, e.g., `crsp_dsf_2020.parquet` or `comp_funda_2010_2023.parquet`.
     93 - **Columns**: Select only the columns the user needs. Don't use `*` on wide tables — ask what variables they need.
     94 - **Limit**: Use `--limit` for testing. Suggest `--limit 1000` if the user is exploring.
     95 
     96 #### Common filters
     97 - Date ranges: `--where "date >= '2020-01-01' AND date < '2021-01-01'"`
     98 - Specific firms by permno: `--where "permno IN (10107, 93436)"`
     99 - Specific firms by gvkey: `--where "gvkey IN ('001690', '012141')"`
    100 - Fiscal year: `--where "fyear >= 2010 AND fyear <= 2023"`
    101 
    102 ### Step 5: Verify
    103 
    104 After download completes, confirm the file was created and report its size:
    105 
    106 ```bash
    107 ls -lh <output_file>
    108 ```
    109 
    110 ## Error handling
    111 
    112 - **Authentication errors**: Remind the user to set `PGUSER`/`PGPASSWORD` or run `wrds-dl tui` to save credentials.
    113 - **Table not found**: Use `wrds-dl info` to check schema/table names. WRDS schemas and table names are lowercase.
    114 - **Timeout on large tables**: Suggest adding a `--where` filter or `--limit` to reduce the result set.
    115 - **Duo 2FA prompt**: The connection triggers a Duo push. Tell the user to approve it on their phone.
    116 
    117 ## Example interactions
    118 
    119 **User**: "Download CRSP daily stock data for 2020"
    120 → `wrds-dl info --schema crsp --table dsf`
    121 → `wrds-dl download --schema crsp --table dsf --where "date >= '2020-01-01' AND date < '2021-01-01'" --out crsp_dsf_2020.parquet`
    122 
    123 **User**: "Get Compustat annual fundamentals, just gvkey, datadate, and sales"
    124 → `wrds-dl info --schema comp --table funda`
    125 → `wrds-dl download --schema comp --table funda --columns "gvkey,datadate,sale" --out comp_funda.parquet`
    126 
    127 **User**: "I need IBES analyst estimates"
    128 → `wrds-dl info --schema ibes --table statsum_epsus`
    129 → Ask what date range and variables they need, then download.