README.md (6217B)
1 # wrds-dl (Go) 2 3 Full-featured terminal tool for browsing and downloading data from the [WRDS](https://wrds-www.wharton.upenn.edu/) PostgreSQL database. Includes an interactive TUI for exploration and a CLI for scripted downloads. Output is Parquet or CSV — pure Go, no CGo, cross-platform. 4 5 ## Features 6 7 - **TUI** — browse schemas and tables, inspect column metadata, trigger downloads without leaving the terminal 8 - **CLI** — scriptable `download` command with structured flags or raw SQL 9 - **`info` command** — inspect table metadata (columns, types, row count) from the command line or scripts 10 - **Parquet output** — streams rows via pgx and writes Parquet with ZSTD compression using parquet-go (pure Go) 11 - **CSV output** — streams rows to CSV via encoding/csv 12 - **Progress feedback** — live row count during large exports (CLI and TUI) 13 - **Dry-run mode** — preview the query, row count, and first 5 rows before committing to a download 14 - **Login flow** — interactive login screen with Duo 2FA support; saved credentials for one-press reconnect 15 - **Database switching** — browse and switch between WRDS databases from within the TUI 16 17 ## Installation 18 19 ### Pre-built binaries (recommended) 20 21 Download the latest release from the [Releases page](https://github.com/louloulibs/wrds-download/releases): 22 23 | Platform | Binary | 24 |---|---| 25 | macOS (Apple Silicon) | `wrds-dl-darwin-arm64` | 26 | macOS (Intel) | `wrds-dl-darwin-amd64` | 27 | Linux x86-64 | `wrds-dl-linux-amd64` | 28 | Windows x86-64 | `wrds-dl-windows-amd64.exe` | 29 30 ```sh 31 # macOS example 32 curl -L https://github.com/louloulibs/wrds-download/releases/latest/download/wrds-dl-darwin-arm64 \ 33 -o /usr/local/bin/wrds-dl 34 chmod +x /usr/local/bin/wrds-dl 35 ``` 36 37 ### Build from source 38 39 Requires Go 1.25+. No CGo or C compiler needed. 40 41 ```sh 42 cd go 43 CGO_ENABLED=0 go build -ldflags="-s -w" -o wrds-dl . 44 mv wrds-dl /usr/local/bin/ 45 ``` 46 47 ## TUI 48 49 Launch the interactive browser: 50 51 ```sh 52 wrds-dl tui 53 ``` 54 55 The TUI has three panes: **Schemas**, **Tables**, and **Preview** (column catalog with types, descriptions, and table stats). 56 57 ### Keybindings 58 59 | Key | Action | 60 |---|---| 61 | `tab` / `shift+tab` | Cycle focus between panes | 62 | `right` / `l` | Drill into selected schema or table | 63 | `left` / `h` | Go back one pane | 64 | `d` | Open download dialog for the selected table | 65 | `b` | Switch database | 66 | `/` | Filter current list (schemas, tables, or columns) | 67 | `esc` | Cancel / dismiss overlay | 68 | `q` / `ctrl+c` | Quit | 69 70 ### Download dialog 71 72 Press `d` on a selected table to open the download form: 73 74 | Field | Description | 75 |---|---| 76 | SELECT columns | Comma-separated column names, or `*` for all | 77 | WHERE clause | SQL filter without the `WHERE` keyword | 78 | LIMIT rows | Maximum number of rows to download (leave empty for no limit) | 79 | Output path | File path; defaults to `./schema_table.parquet` | 80 | Format | `parquet` or `csv` | 81 82 ## CLI 83 84 ### CRSP monthly stock file 85 86 ```sh 87 # Inspect the table first 88 wrds-dl info --schema crsp --table msf 89 90 # Download prices and returns for 2020 91 wrds-dl download \ 92 --schema crsp \ 93 --table msf \ 94 --columns "permno,date,prc,ret,shrout" \ 95 --where "date >= '2020-01-01' AND date < '2021-01-01'" \ 96 --out crsp_msf_2020.parquet 97 98 # Dry run — check row count before committing 99 wrds-dl download \ 100 --schema crsp \ 101 --table msf \ 102 --where "date = '2020-01-31'" \ 103 --dry-run 104 ``` 105 106 ### CRSP daily stock file 107 108 ```sh 109 wrds-dl download \ 110 --schema crsp \ 111 --table dsf \ 112 --where "date >= '2020-01-01' AND date < '2021-01-01'" \ 113 --out crsp_dsf_2020.parquet 114 ``` 115 116 ### Select specific columns 117 118 ```sh 119 wrds-dl download \ 120 --schema comp \ 121 --table funda \ 122 --columns "gvkey,datadate,sale,at" \ 123 --out funda_subset.parquet 124 ``` 125 126 ### Raw SQL 127 128 ```sh 129 wrds-dl download \ 130 --query "SELECT permno, date, prc FROM crsp.msf WHERE date >= '2015-01-01'" \ 131 --out crsp_msf_2015_onwards.parquet 132 ``` 133 134 ### CSV output 135 136 ```sh 137 wrds-dl download \ 138 --schema crsp \ 139 --table msf \ 140 --columns "permno,date,ret" --limit 1000 \ 141 --out crsp_msf_sample.csv 142 ``` 143 144 Format is inferred from the output file extension (`.parquet` or `.csv`). Override with `--format`. 145 146 ### All download flags 147 148 | Flag | Description | 149 |---|---| 150 | `--schema` | Schema name (e.g. `crsp`, `comp`) | 151 | `--table` | Table name (e.g. `dsf`, `funda`) | 152 | `-c`, `--columns` | Columns to select (comma-separated, default `*`) | 153 | `--where` | SQL `WHERE` clause, without the keyword | 154 | `--query` | Full SQL query — overrides `--schema`, `--table`, `--where`, `--columns` | 155 | `--out` | Output file path (required unless `--dry-run`) | 156 | `--format` | `parquet` or `csv` (inferred from extension if omitted) | 157 | `--limit` | Row limit, useful for testing (default: no limit) | 158 | `--dry-run` | Preview query, row count, and first 5 rows without downloading | 159 160 ### Table info 161 162 ```sh 163 wrds-dl info --schema crsp --table dsf 164 ``` 165 166 For machine-readable output: 167 168 ```sh 169 wrds-dl info --schema crsp --table dsf --json 170 ``` 171 172 ## How it works 173 174 `wrds-dl` connects directly to the WRDS PostgreSQL server using [pgx](https://github.com/jackc/pgx). All operations go through a single pooled connection (limited to 1 to avoid triggering multiple Duo 2FA prompts). 175 176 **Downloads** stream rows from Postgres and write them incrementally: 177 - **Parquet**: rows are batched (10,000 per row group) and written with ZSTD compression via [parquet-go](https://github.com/parquet-go/parquet-go). String columns use PLAIN encoding for broad compatibility (R, Python, Julia). 178 - **CSV**: rows are streamed directly to disk via Go's `encoding/csv`. 179 180 ## Dependencies 181 182 | Package | Purpose | 183 |---|---| 184 | [`jackc/pgx/v5`](https://github.com/jackc/pgx) | PostgreSQL driver — all queries and data streaming | 185 | [`parquet-go/parquet-go`](https://github.com/parquet-go/parquet-go) | Parquet file writing with ZSTD compression (pure Go) | 186 | [`charmbracelet/bubbletea`](https://github.com/charmbracelet/bubbletea) | TUI framework | 187 | [`charmbracelet/bubbles`](https://github.com/charmbracelet/bubbles) | List, text-input, spinner components | 188 | [`charmbracelet/lipgloss`](https://github.com/charmbracelet/lipgloss) | Terminal layout and styling | 189 | [`spf13/cobra`](https://github.com/spf13/cobra) | CLI commands and flags |