wrds-download

TUI/CLI tool for browsing and downloading WRDS data
Log | Files | Refs | README

README.md (6217B)


      1 # wrds-dl (Go)
      2 
      3 Full-featured terminal tool for browsing and downloading data from the [WRDS](https://wrds-www.wharton.upenn.edu/) PostgreSQL database. Includes an interactive TUI for exploration and a CLI for scripted downloads. Output is Parquet or CSV — pure Go, no CGo, cross-platform.
      4 
      5 ## Features
      6 
      7 - **TUI** — browse schemas and tables, inspect column metadata, trigger downloads without leaving the terminal
      8 - **CLI** — scriptable `download` command with structured flags or raw SQL
      9 - **`info` command** — inspect table metadata (columns, types, row count) from the command line or scripts
     10 - **Parquet output** — streams rows via pgx and writes Parquet with ZSTD compression using parquet-go (pure Go)
     11 - **CSV output** — streams rows to CSV via encoding/csv
     12 - **Progress feedback** — live row count during large exports (CLI and TUI)
     13 - **Dry-run mode** — preview the query, row count, and first 5 rows before committing to a download
     14 - **Login flow** — interactive login screen with Duo 2FA support; saved credentials for one-press reconnect
     15 - **Database switching** — browse and switch between WRDS databases from within the TUI
     16 
     17 ## Installation
     18 
     19 ### Pre-built binaries (recommended)
     20 
     21 Download the latest release from the [Releases page](https://github.com/louloulibs/wrds-download/releases):
     22 
     23 | Platform | Binary |
     24 |---|---|
     25 | macOS (Apple Silicon) | `wrds-dl-darwin-arm64` |
     26 | macOS (Intel) | `wrds-dl-darwin-amd64` |
     27 | Linux x86-64 | `wrds-dl-linux-amd64` |
     28 | Windows x86-64 | `wrds-dl-windows-amd64.exe` |
     29 
     30 ```sh
     31 # macOS example
     32 curl -L https://github.com/louloulibs/wrds-download/releases/latest/download/wrds-dl-darwin-arm64 \
     33   -o /usr/local/bin/wrds-dl
     34 chmod +x /usr/local/bin/wrds-dl
     35 ```
     36 
     37 ### Build from source
     38 
     39 Requires Go 1.25+. No CGo or C compiler needed.
     40 
     41 ```sh
     42 cd go
     43 CGO_ENABLED=0 go build -ldflags="-s -w" -o wrds-dl .
     44 mv wrds-dl /usr/local/bin/
     45 ```
     46 
     47 ## TUI
     48 
     49 Launch the interactive browser:
     50 
     51 ```sh
     52 wrds-dl tui
     53 ```
     54 
     55 The TUI has three panes: **Schemas**, **Tables**, and **Preview** (column catalog with types, descriptions, and table stats).
     56 
     57 ### Keybindings
     58 
     59 | Key | Action |
     60 |---|---|
     61 | `tab` / `shift+tab` | Cycle focus between panes |
     62 | `right` / `l` | Drill into selected schema or table |
     63 | `left` / `h` | Go back one pane |
     64 | `d` | Open download dialog for the selected table |
     65 | `b` | Switch database |
     66 | `/` | Filter current list (schemas, tables, or columns) |
     67 | `esc` | Cancel / dismiss overlay |
     68 | `q` / `ctrl+c` | Quit |
     69 
     70 ### Download dialog
     71 
     72 Press `d` on a selected table to open the download form:
     73 
     74 | Field | Description |
     75 |---|---|
     76 | SELECT columns | Comma-separated column names, or `*` for all |
     77 | WHERE clause | SQL filter without the `WHERE` keyword |
     78 | LIMIT rows | Maximum number of rows to download (leave empty for no limit) |
     79 | Output path | File path; defaults to `./schema_table.parquet` |
     80 | Format | `parquet` or `csv` |
     81 
     82 ## CLI
     83 
     84 ### CRSP monthly stock file
     85 
     86 ```sh
     87 # Inspect the table first
     88 wrds-dl info --schema crsp --table msf
     89 
     90 # Download prices and returns for 2020
     91 wrds-dl download \
     92   --schema crsp \
     93   --table msf \
     94   --columns "permno,date,prc,ret,shrout" \
     95   --where "date >= '2020-01-01' AND date < '2021-01-01'" \
     96   --out crsp_msf_2020.parquet
     97 
     98 # Dry run — check row count before committing
     99 wrds-dl download \
    100   --schema crsp \
    101   --table msf \
    102   --where "date = '2020-01-31'" \
    103   --dry-run
    104 ```
    105 
    106 ### CRSP daily stock file
    107 
    108 ```sh
    109 wrds-dl download \
    110   --schema crsp \
    111   --table dsf \
    112   --where "date >= '2020-01-01' AND date < '2021-01-01'" \
    113   --out crsp_dsf_2020.parquet
    114 ```
    115 
    116 ### Select specific columns
    117 
    118 ```sh
    119 wrds-dl download \
    120   --schema comp \
    121   --table funda \
    122   --columns "gvkey,datadate,sale,at" \
    123   --out funda_subset.parquet
    124 ```
    125 
    126 ### Raw SQL
    127 
    128 ```sh
    129 wrds-dl download \
    130   --query "SELECT permno, date, prc FROM crsp.msf WHERE date >= '2015-01-01'" \
    131   --out crsp_msf_2015_onwards.parquet
    132 ```
    133 
    134 ### CSV output
    135 
    136 ```sh
    137 wrds-dl download \
    138   --schema crsp \
    139   --table msf \
    140   --columns "permno,date,ret" --limit 1000 \
    141   --out crsp_msf_sample.csv
    142 ```
    143 
    144 Format is inferred from the output file extension (`.parquet` or `.csv`). Override with `--format`.
    145 
    146 ### All download flags
    147 
    148 | Flag | Description |
    149 |---|---|
    150 | `--schema` | Schema name (e.g. `crsp`, `comp`) |
    151 | `--table` | Table name (e.g. `dsf`, `funda`) |
    152 | `-c`, `--columns` | Columns to select (comma-separated, default `*`) |
    153 | `--where` | SQL `WHERE` clause, without the keyword |
    154 | `--query` | Full SQL query — overrides `--schema`, `--table`, `--where`, `--columns` |
    155 | `--out` | Output file path (required unless `--dry-run`) |
    156 | `--format` | `parquet` or `csv` (inferred from extension if omitted) |
    157 | `--limit` | Row limit, useful for testing (default: no limit) |
    158 | `--dry-run` | Preview query, row count, and first 5 rows without downloading |
    159 
    160 ### Table info
    161 
    162 ```sh
    163 wrds-dl info --schema crsp --table dsf
    164 ```
    165 
    166 For machine-readable output:
    167 
    168 ```sh
    169 wrds-dl info --schema crsp --table dsf --json
    170 ```
    171 
    172 ## How it works
    173 
    174 `wrds-dl` connects directly to the WRDS PostgreSQL server using [pgx](https://github.com/jackc/pgx). All operations go through a single pooled connection (limited to 1 to avoid triggering multiple Duo 2FA prompts).
    175 
    176 **Downloads** stream rows from Postgres and write them incrementally:
    177 - **Parquet**: rows are batched (10,000 per row group) and written with ZSTD compression via [parquet-go](https://github.com/parquet-go/parquet-go). String columns use PLAIN encoding for broad compatibility (R, Python, Julia).
    178 - **CSV**: rows are streamed directly to disk via Go's `encoding/csv`.
    179 
    180 ## Dependencies
    181 
    182 | Package | Purpose |
    183 |---|---|
    184 | [`jackc/pgx/v5`](https://github.com/jackc/pgx) | PostgreSQL driver — all queries and data streaming |
    185 | [`parquet-go/parquet-go`](https://github.com/parquet-go/parquet-go) | Parquet file writing with ZSTD compression (pure Go) |
    186 | [`charmbracelet/bubbletea`](https://github.com/charmbracelet/bubbletea) | TUI framework |
    187 | [`charmbracelet/bubbles`](https://github.com/charmbracelet/bubbles) | List, text-input, spinner components |
    188 | [`charmbracelet/lipgloss`](https://github.com/charmbracelet/lipgloss) | Terminal layout and styling |
    189 | [`spf13/cobra`](https://github.com/spf13/cobra) | CLI commands and flags |