wrds-download

TUI/CLI tool for browsing and downloading WRDS data
Log | Files | Refs | README

README.md (3942B)


      1 # wrds-dl (Python)
      2 
      3 Lightweight CLI for downloading data from the [WRDS](https://wrds-www.wharton.upenn.edu/) PostgreSQL database. No binary to install — runs anywhere Python and [`uv`](https://docs.astral.sh/uv/) are available.
      4 
      5 Same CLI interface as the [Go version](../go/) (`download` and `info` commands with identical flags), but without the TUI.
      6 
      7 ## Installation
      8 
      9 ### With uv (recommended)
     10 
     11 ```sh
     12 # Install as a tool (available system-wide)
     13 uv tool install wrds-dl --from ./python
     14 
     15 # Or run directly without installing
     16 cd python
     17 uv run wrds-dl --help
     18 ```
     19 
     20 ### With pip
     21 
     22 ```sh
     23 pip install ./python
     24 wrds-dl --help
     25 ```
     26 
     27 ## CLI
     28 
     29 ### CRSP monthly stock file
     30 
     31 ```sh
     32 # Inspect the table first
     33 wrds-dl info --schema crsp --table msf
     34 
     35 # Download prices and returns for 2020
     36 wrds-dl download \
     37   --schema crsp \
     38   --table msf \
     39   --columns "permno,date,prc,ret,shrout" \
     40   --where "date >= '2020-01-01' AND date < '2021-01-01'" \
     41   --out crsp_msf_2020.parquet
     42 
     43 # Dry run — check row count before committing
     44 wrds-dl download \
     45   --schema crsp \
     46   --table msf \
     47   --where "date = '2020-01-31'" \
     48   --dry-run
     49 ```
     50 
     51 ### CRSP daily stock file
     52 
     53 ```sh
     54 wrds-dl download \
     55   --schema crsp \
     56   --table dsf \
     57   --where "date >= '2020-01-01' AND date < '2021-01-01'" \
     58   --out crsp_dsf_2020.parquet
     59 ```
     60 
     61 ### Select specific columns
     62 
     63 ```sh
     64 wrds-dl download \
     65   --schema comp \
     66   --table funda \
     67   --columns "gvkey,datadate,sale,at" \
     68   --out funda_subset.parquet
     69 ```
     70 
     71 ### Raw SQL
     72 
     73 ```sh
     74 wrds-dl download \
     75   --query "SELECT permno, date, prc FROM crsp.msf WHERE date >= '2015-01-01'" \
     76   --out crsp_msf_2015_onwards.parquet
     77 ```
     78 
     79 ### CSV output
     80 
     81 ```sh
     82 wrds-dl download \
     83   --schema crsp \
     84   --table msf \
     85   --columns "permno,date,ret" --limit 1000 \
     86   --out crsp_msf_sample.csv
     87 ```
     88 
     89 Format is inferred from the output file extension (`.parquet` or `.csv`). Override with `--format`.
     90 
     91 ### All download flags
     92 
     93 | Flag | Description |
     94 |---|---|
     95 | `--schema` | Schema name (e.g. `crsp`, `comp`) |
     96 | `--table` | Table name (e.g. `dsf`, `funda`) |
     97 | `-c`, `--columns` | Columns to select (comma-separated, default `*`) |
     98 | `--where` | SQL `WHERE` clause, without the keyword |
     99 | `--query` | Full SQL query — overrides `--schema`, `--table`, `--where`, `--columns` |
    100 | `--out` | Output file path (required unless `--dry-run`) |
    101 | `--format` | `parquet` or `csv` (inferred from extension if omitted) |
    102 | `--limit` | Row limit, useful for testing (default: no limit) |
    103 | `--dry-run` | Preview query, row count, and first 5 rows without downloading |
    104 
    105 ### Table info
    106 
    107 Inspect table metadata without downloading data:
    108 
    109 ```sh
    110 wrds-dl info --schema crsp --table dsf
    111 ```
    112 
    113 Output:
    114 
    115 ```
    116 crsp.dsf
    117   Daily Stock File
    118   ~245302893 rows, 47 GB
    119 
    120 NAME        TYPE                          NULLABLE  DESCRIPTION
    121 cusip       character varying(8)          YES       CUSIP - HISTORICAL
    122 permno      double precision              YES       PERMNO
    123 ...
    124 ```
    125 
    126 For machine-readable output:
    127 
    128 ```sh
    129 wrds-dl info --schema crsp --table dsf --json
    130 ```
    131 
    132 ## How it works
    133 
    134 Connects directly to the WRDS PostgreSQL server using [psycopg](https://www.psycopg.org/) (v3) with bundled `libpq` for cross-platform portability (ARM macOS, x86 Linux, etc.).
    135 
    136 **Downloads** use server-side cursors to stream rows without loading the entire result into memory:
    137 - **Parquet**: rows are batched (10,000 per row group) and written with ZSTD compression via [PyArrow](https://arrow.apache.org/docs/python/).
    138 - **CSV**: rows are streamed to disk via Python's `csv` module.
    139 
    140 Progress is reported to stderr every 10,000 rows.
    141 
    142 ## Dependencies
    143 
    144 | Package | Purpose |
    145 |---|---|
    146 | [`psycopg[binary]`](https://www.psycopg.org/) | PostgreSQL driver with bundled libpq |
    147 | [`pyarrow`](https://arrow.apache.org/docs/python/) | Parquet writing with ZSTD compression |
    148 | [`click`](https://click.palletsprojects.com/) | CLI framework |
    149 
    150 ## Development
    151 
    152 ```sh
    153 cd python
    154 uv sync
    155 uv run pytest
    156 uv run ruff check src tests
    157 ```