wrds-download

TUI/CLI tool for browsing and downloading WRDS data
Log | Files | Refs | README

SKILL.md (5210B)


      1 ---
      2 name: wrds-download
      3 description: Download data from the WRDS (Wharton Research Data Services) PostgreSQL database to local Parquet or CSV files. Use when the user asks to get data from WRDS, download financial data, or mentions WRDS schemas like crsp, comp, optionm, ibes, etc.
      4 allowed-tools: Bash(wrds-dl *), Bash(uv run *wrds*), Read, Grep
      5 argument-hint: [description of data needed]
      6 ---
      7 
      8 # WRDS Data Download (Python)
      9 
     10 You help users download data from the Wharton Research Data Services (WRDS) PostgreSQL database using the `wrds-dl` Python CLI tool (via `uv`).
     11 
     12 ## Prerequisites
     13 
     14 The user must have:
     15 - **`uv`** installed (https://docs.astral.sh/uv/)
     16 - **WRDS credentials** configured via one of:
     17   - Environment variables: `PGUSER` and `PGPASSWORD`
     18   - Saved credentials at `~/.config/wrds-dl/credentials`
     19   - Standard `~/.pgpass` file
     20 
     21 Run the CLI with `uv run wrds-dl` from the `python/` directory of the wrds-download repo, or `wrds-dl` if installed via `uv tool install`.
     22 
     23 If `wrds-dl` is not found and `uv` is available, install it:
     24 ```bash
     25 uv tool install wrds-dl --from /path/to/wrds-download/python
     26 ```
     27 
     28 ## Workflow
     29 
     30 Follow these steps for every download request:
     31 
     32 ### Step 1: Identify the table
     33 
     34 Parse the user's request to determine the WRDS schema and table. Common mappings:
     35 
     36 | Dataset | Schema | Key Tables |
     37 |---------|--------|------------|
     38 | CRSP daily stock | `crsp` | `dsf` (daily), `msf` (monthly), `dsi` (index) |
     39 | CRSP events | `crsp` | `dsedelist`, `stocknames` |
     40 | Compustat annual | `comp` | `funda` |
     41 | Compustat quarterly | `comp` | `fundq` |
     42 | Compustat global | `comp_global_daily` | `g_funda`, `g_fundq` |
     43 | IBES | `ibes` | `statsum_epsus`, `actu_epsus` |
     44 | OptionMetrics | `optionm` | `opprcd` (prices), `secprd` (security) |
     45 | TAQ | `taqmsec` | `ctm_YYYYMMDD` |
     46 | CRSP/Compustat merged | `crsp` | `ccmxpf_linktable` |
     47 | BoardEx | `boardex` | `na_wrds_company_profile` |
     48 | Institutional (13F) | `tfn` | `s34` |
     49 | Audit Analytics | `audit` | `auditnonreli` |
     50 | Ravenpack | `ravenpack` | `rpa_djnw` |
     51 | Bank Regulatory | `bank` | `call_schedule_rc`, `bhck` |
     52 
     53 If unsure which table, ask the user or use `wrds-dl info` to explore.
     54 
     55 ### Step 2: Inspect the table
     56 
     57 Always run `wrds-dl info` first to understand the table structure:
     58 
     59 ```bash
     60 wrds-dl info --schema <schema> --table <table>
     61 ```
     62 
     63 Use the output to:
     64 - Confirm the table exists and has the expected columns
     65 - Note column names for the user's requested variables
     66 - Check the estimated row count to warn about large downloads
     67 
     68 For JSON output (useful for parsing): `wrds-dl info --schema <schema> --table <table> --json`
     69 
     70 ### Step 3: Dry run
     71 
     72 For tables with more than 1 million estimated rows, or when a WHERE clause is involved, always do a dry run first:
     73 
     74 ```bash
     75 wrds-dl download --schema <schema> --table <table> \
     76   --columns "<cols>" --where "<filter>" --dry-run
     77 ```
     78 
     79 Show the user the row count and sample rows. Ask for confirmation before proceeding if the row count is very large (>10M rows).
     80 
     81 ### Step 4: Download
     82 
     83 Build and run the download command:
     84 
     85 ```bash
     86 wrds-dl download \
     87   --schema <schema> \
     88   --table <table> \
     89   --columns "<comma-separated columns>" \
     90   --where "<SQL filter>" \
     91   --out <output_file> \
     92   --format <parquet|csv>
     93 ```
     94 
     95 #### Defaults and conventions
     96 - **Format**: Use Parquet unless the user asks for CSV. Parquet is smaller and faster.
     97 - **Output path**: Name the file descriptively, e.g., `crsp_dsf_2020.parquet` or `comp_funda_2010_2023.parquet`.
     98 - **Columns**: Select only the columns the user needs. Don't use `*` on wide tables — ask what variables they need.
     99 - **Limit**: Use `--limit` for testing. Suggest `--limit 1000` if the user is exploring.
    100 
    101 #### Common filters
    102 - Date ranges: `--where "date >= '2020-01-01' AND date < '2021-01-01'"`
    103 - Specific firms by permno: `--where "permno IN (10107, 93436)"`
    104 - Specific firms by gvkey: `--where "gvkey IN ('001690', '012141')"`
    105 - Fiscal year: `--where "fyear >= 2010 AND fyear <= 2023"`
    106 
    107 ### Step 5: Verify
    108 
    109 After download completes, confirm the file was created and report its size:
    110 
    111 ```bash
    112 ls -lh <output_file>
    113 ```
    114 
    115 ## Error handling
    116 
    117 - **Authentication errors**: Remind the user to set `PGUSER`/`PGPASSWORD` or configure `~/.config/wrds-dl/credentials`.
    118 - **Table not found**: Use `wrds-dl info` to check schema/table names. WRDS schemas and table names are lowercase.
    119 - **Timeout on large tables**: Suggest adding a `--where` filter or `--limit` to reduce the result set.
    120 - **Duo 2FA prompt**: The connection triggers a Duo push. Tell the user to approve it on their phone.
    121 
    122 ## Example interactions
    123 
    124 **User**: "Download CRSP daily stock data for 2020"
    125 -> `wrds-dl info --schema crsp --table dsf`
    126 -> `wrds-dl download --schema crsp --table dsf --where "date >= '2020-01-01' AND date < '2021-01-01'" --out crsp_dsf_2020.parquet`
    127 
    128 **User**: "Get Compustat annual fundamentals, just gvkey, datadate, and sales"
    129 -> `wrds-dl info --schema comp --table funda`
    130 -> `wrds-dl download --schema comp --table funda --columns "gvkey,datadate,sale" --out comp_funda.parquet`
    131 
    132 **User**: "I need IBES analyst estimates"
    133 -> `wrds-dl info --schema ibes --table statsum_epsus`
    134 -> Ask what date range and variables they need, then download.