commit 945e9cb45239d63a0a7e3c8241aedc6c2e9ce344
parent 1a1d85d4018c7ae27a2ddd6ebea1afc2e35dba70
Author: Erik Loualiche <eloualic@umn.edu>
Date: Tue, 31 Mar 2026 15:58:12 -0500
docs: add README with usage reference for all three tools
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat:
| A | README.md | | | 118 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 file changed, 118 insertions(+), 0 deletions(-)
diff --git a/README.md b/README.md
@@ -0,0 +1,118 @@
+# dt-cli-tools
+
+CLI tools for viewing, filtering, and comparing tabular data files. Supports CSV, TSV, Parquet, Arrow/Feather, JSON, NDJSON, and Excel.
+
+Three read-only tools: **dtcat**, **dtfilter**, **dtdiff**.
+
+## Install
+
+```bash
+cargo install --path .
+```
+
+## Formats
+
+| Format | Extensions | Detection |
+|--------|-----------|-----------|
+| CSV | `.csv` | delimiter heuristic |
+| TSV | `.tsv`, `.tab` | delimiter heuristic |
+| Parquet | `.parquet`, `.pq` | `PAR1` magic |
+| Arrow/Feather | `.arrow`, `.feather`, `.ipc` | `ARROW1` magic |
+| JSON | `.json` | `[` prefix |
+| NDJSON | `.ndjson`, `.jsonl` | `{` prefix |
+| Excel | `.xlsx`, `.xls`, `.xlsb`, `.ods` | ZIP/OLE magic |
+
+Format detection precedence, highest first: `--format` flag, then magic bytes, then file extension.
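+
+The precedence above can be read as a chain of fallbacks. The following is a minimal sketch under stated assumptions, not the `dtcore` implementation: `Format`, `from_magic`, `from_extension`, and `detect` are hypothetical names, the ZIP and OLE signatures are the standard ones for those containers, and the CSV/TSV delimiter heuristic is left out, so delimited text falls through to the extension.
+
+```rust
+use std::path::Path;
+
+// Hypothetical enum for illustration; the real crate may model formats differently.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+enum Format { Csv, Tsv, Parquet, Arrow, Json, Ndjson, Excel }
+
+/// Guess a format from the first bytes of a file (signatures from the table above).
+fn from_magic(head: &[u8]) -> Option<Format> {
+    if head.starts_with(b"PAR1") { return Some(Format::Parquet); }
+    if head.starts_with(b"ARROW1") { return Some(Format::Arrow); }
+    // ZIP container (xlsx, xlsb, ods) or OLE2 compound file (xls).
+    if head.starts_with(b"PK\x03\x04") || head.starts_with(b"\xD0\xCF\x11\xE0") {
+        return Some(Format::Excel);
+    }
+    // First non-whitespace byte: `[` suggests JSON, `{` suggests NDJSON.
+    match head.iter().copied().find(|b| !b.is_ascii_whitespace()) {
+        Some(b'[') => Some(Format::Json),
+        Some(b'{') => Some(Format::Ndjson),
+        _ => None,
+    }
+}
+
+/// Guess a format from the file extension (case-insensitive).
+fn from_extension(path: &Path) -> Option<Format> {
+    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
+    match ext.as_str() {
+        "csv" => Some(Format::Csv),
+        "tsv" | "tab" => Some(Format::Tsv),
+        "parquet" | "pq" => Some(Format::Parquet),
+        "arrow" | "feather" | "ipc" => Some(Format::Arrow),
+        "json" => Some(Format::Json),
+        "ndjson" | "jsonl" => Some(Format::Ndjson),
+        "xlsx" | "xls" | "xlsb" | "ods" => Some(Format::Excel),
+        _ => None,
+    }
+}
+
+/// Explicit flag wins, then magic bytes, then extension.
+fn detect(flag: Option<Format>, head: &[u8], path: &Path) -> Option<Format> {
+    flag.or_else(|| from_magic(head)).or_else(|| from_extension(path))
+}
+```
+
+With this ordering, a Parquet file misnamed `data.csv` is still read as Parquet, while an explicit flag overrides both checks.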
+
+The CSV delimiter is auto-detected (comma, tab, or semicolon).
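+
+One simple way to pick among the three candidate delimiters is to count each in the header line and keep the most frequent. The sketch below is an assumption about how such a heuristic could look, not a description of what the tools do: `sniff_delimiter` is a hypothetical name, and quoted fields are ignored for brevity.
+
+```rust
+/// Return the candidate delimiter that occurs most often in the first line.
+/// Ties and empty input fall back to a comma.
+fn sniff_delimiter(sample: &str) -> u8 {
+    let first_line = sample.lines().next().unwrap_or("");
+    let mut best = (b',', 0usize);
+    // Candidates are checked in tie-break order: comma, tab, semicolon.
+    for cand in [b',', b'\t', b';'] {
+        let n = first_line.bytes().filter(|b| *b == cand).count();
+        if n > best.1 {
+            best = (cand, n);
+        }
+    }
+    best.0
+}
+```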
+
+---
+
+## dtcat
+
+View and inspect files. Outputs markdown tables by default.
+
+```bash
+dtcat data.parquet # schema + data (all rows if ≤50; otherwise first and last 25)
+dtcat data.csv --schema # column names and types
+dtcat data.csv --describe # summary statistics
+dtcat report.xlsx --info # file metadata (size, format, sheets)
+dtcat report.xlsx --sheet Revenue # specific Excel sheet
+dtcat data.csv --head 10 # first 10 rows
+dtcat data.csv --tail 5 # last 5 rows
+dtcat data.parquet --csv # output as CSV for piping
+dtcat data.txt --format csv # override format detection
+dtcat data.csv --skip 2 # skip metadata rows above header
+```
+
+Modes `--schema`, `--describe`, and data (default) are mutually exclusive.
+
+## dtfilter
+
+Filter, sort, and select.
+
+```bash
+dtfilter data.csv --filter State=CA # equality
+dtfilter data.csv --filter 'Amount>1000' # numeric comparison
+dtfilter data.csv --filter State=CA --filter 'Amount>1000' # AND logic
+dtfilter data.csv --filter Name~john # contains (case-insensitive)
+dtfilter data.csv --filter Status!=Draft # not equals
+dtfilter data.csv --columns State,City,Amount # select columns
+dtfilter data.csv --sort Amount:desc # sort descending
+dtfilter data.csv --sort Name # sort ascending (default)
+dtfilter data.csv --filter Active=true --limit 10 # cap output rows
+dtfilter data.csv --head 100 --filter State=CA # window before filter
+dtfilter data.parquet --filter 'value>0' --csv # CSV output
+```
+
+Filter operators: `=` `!=` `>` `<` `>=` `<=` `~` (contains) `!~` (not contains). Quote expressions containing `>` or `<` so the shell does not treat them as redirections.
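+
+Because some operators are prefixes of others (`>` and `>=`, or `!=` and `!~` sharing `!`), a parser has to try the two-character forms first. The sketch below shows one way to split a `Column<op>value` expression; `Op` and `parse_filter` are hypothetical names, and the real `filter.rs` may parse differently.
+
+```rust
+// Hypothetical operator enum for illustration.
+#[derive(Debug, Clone, Copy, PartialEq)]
+enum Op { Eq, Ne, Gt, Lt, Ge, Le, Contains, NotContains }
+
+/// Split `Column<op>value` at the first operator found, scanning left to right.
+/// Returns None when no operator is present or the column name is empty.
+fn parse_filter(expr: &str) -> Option<(&str, Op, &str)> {
+    // Two-character operators are listed first so `>=` is not read as `>`.
+    const OPS: [(&str, Op); 8] = [
+        ("!=", Op::Ne), (">=", Op::Ge), ("<=", Op::Le), ("!~", Op::NotContains),
+        ("=", Op::Eq), (">", Op::Gt), ("<", Op::Lt), ("~", Op::Contains),
+    ];
+    for (i, _) in expr.char_indices() {
+        for (token, op) in OPS {
+            if expr[i..].starts_with(token) {
+                let column = &expr[..i];
+                let value = &expr[i + token.len()..];
+                if column.is_empty() {
+                    return None;
+                }
+                return Some((column, op, value));
+            }
+        }
+    }
+    None
+}
+```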
+
+`--head`/`--tail` apply before filtering. `--limit` applies after. `--head` and `--tail` are mutually exclusive.
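+
+The ordering matters: a window taken before filtering can yield fewer matches than the same filter over the whole file. The sketch below illustrates the stated order (window, then filter, then limit) on a plain slice; `window_filter_limit` is a hypothetical helper, only the `--head` case is shown, and the real pipeline operates on DataFrames.
+
+```rust
+/// Apply an optional head window, then a predicate, then an optional row cap.
+fn window_filter_limit<T: Clone>(
+    rows: &[T],
+    head: Option<usize>,
+    keep: impl Fn(&T) -> bool,
+    limit: Option<usize>,
+) -> Vec<T> {
+    // Step 1: the window is taken first, so later rows are never examined.
+    let window = match head {
+        Some(n) => &rows[..n.min(rows.len())],
+        None => rows,
+    };
+    // Step 2: filter the window. Step 3: cap the number of surviving rows.
+    window
+        .iter()
+        .filter(|r| keep(*r))
+        .take(limit.unwrap_or(usize::MAX))
+        .cloned()
+        .collect()
+}
+```
+
+For the rows `[5, 20, 30, 1, 40]` with a head of 3 and the predicate "greater than 10", the value 40 is never considered because it lies outside the window.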
+
+## dtdiff
+
+Compare two files of the same format. Exit code 0 = identical, 1 = differences, 2 = error.
+
+```bash
+dtdiff old.csv new.csv # positional comparison
+dtdiff old.csv new.csv --key ID # key-based (added/removed/modified)
+dtdiff old.csv new.csv --key Date,Ticker # composite key
+dtdiff old.csv new.csv --key ID --tolerance 0.01 # float tolerance
+dtdiff old.csv new.csv --key ID --json # JSON output
+dtdiff old.csv new.csv --key ID --csv # CSV output
+dtdiff old.csv new.csv --no-color # plain text
+dtdiff report.xlsx other.xlsx --sheet Revenue # Excel sheets
+```
+
+Both files must be the same format (CSV/TSV are treated as compatible).
+
+**Positional mode** (no `--key`): reports added/removed rows based on full-row equality.
+
+**Key-based mode** (`--key`): matches by key columns, reports added/removed/modified with cell-level changes.
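+
+Key-based comparison can be pictured as two map lookups: keys only in the old file are removed, keys only in the new file are added, and shared keys are modified when their values differ. The sketch below uses a single numeric column and assumes `--tolerance` means an absolute difference, which the text above does not specify; `Change`, `diff_by_key`, and `exit_code` are hypothetical names, not the `diff.rs` API.
+
+```rust
+use std::collections::BTreeMap;
+
+// Hypothetical change record for a table with one key and one numeric column.
+#[derive(Debug, PartialEq)]
+enum Change {
+    Added(String),
+    Removed(String),
+    Modified { key: String, old: f64, new: f64 },
+}
+
+/// Compare two keyed tables; values within `tolerance` (absolute) count as equal.
+fn diff_by_key(
+    old: &BTreeMap<String, f64>,
+    new: &BTreeMap<String, f64>,
+    tolerance: f64,
+) -> Vec<Change> {
+    let mut changes = Vec::new();
+    // Pass 1: keys present in the old table are either removed, modified, or unchanged.
+    for (key, old_val) in old {
+        match new.get(key) {
+            None => changes.push(Change::Removed(key.clone())),
+            Some(new_val) if (new_val - old_val).abs() > tolerance => {
+                changes.push(Change::Modified { key: key.clone(), old: *old_val, new: *new_val })
+            }
+            Some(_) => {}
+        }
+    }
+    // Pass 2: keys present only in the new table are additions.
+    for key in new.keys() {
+        if !old.contains_key(key) {
+            changes.push(Change::Added(key.clone()));
+        }
+    }
+    changes
+}
+
+/// Map a diff result to the documented exit codes: 0 = identical, 1 = differences.
+fn exit_code(changes: &[Change]) -> i32 {
+    if changes.is_empty() { 0 } else { 1 }
+}
+```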
+
+---
+
+## Exit Codes
+
+| Tool | 0 | 1 | 2 |
+|------|---|---|---|
+| dtcat | success | runtime error | invalid arguments |
+| dtfilter | success | runtime error | invalid arguments |
+| dtdiff | no differences | differences found | error |
+
+## Architecture
+
+Library crate `dtcore` with three thin binaries. ~60% ported from [xl-cli-tools](https://github.com/LouLouLibs/xl-cli-tools).
+
+```
+src/
+ format.rs # format detection (magic bytes + extension)
+ reader.rs # dispatch to format-specific readers
+ readers/ # CSV, Parquet, Arrow, JSON, Excel
+ formatter.rs # DataFrame → markdown/CSV output
+ filter.rs # filter expressions, sort, pipeline
+ diff.rs # positional and key-based comparison
+ metadata.rs # FileInfo, SheetInfo, display helpers
+```
+
+Built on [Polars](https://pola.rs/) for DataFrames, [calamine](https://github.com/tafia/calamine) for Excel, [clap](https://clap.rs/) for CLI.