dt-cli-tools

CLI tools for viewing, filtering, and comparing tabular data files

2026-03-30-dt-cli-tools.md (62470B)


      1 # dt-cli-tools Implementation Plan
      2 
      3 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
      4 
      5 **Goal:** Build a Rust CLI tool suite (`dtcat`, `dtfilter`, `dtdiff`) for inspecting, querying, and comparing tabular data files across formats (CSV, Parquet, Arrow, JSON, Excel).
      6 
      7 **Architecture:** Multi-format reader layer with automatic format detection feeds DataFrames into format-agnostic modules (formatter, filter, diff) ported from xl-cli-tools. Three binaries share the `dtcore` library crate.
      8 
      9 **Tech Stack:** Rust 2024 edition, Polars 0.46 (DataFrame engine + CSV/Parquet/Arrow/JSON readers), calamine (Excel), clap (CLI), anyhow (errors), serde_json (JSON output).
     10 
     11 **Source reference:** xl-cli-tools at `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/`
     12 
     13 ---
     14 
     15 ## File Structure
     16 
     17 ```
     18 dt-cli-tools/
     19   Cargo.toml
     20   src/
     21     lib.rs                    # pub mod declarations
     22     format.rs                 # Format enum, magic-byte + extension detection
     23     reader.rs                 # ReadOptions, read_file dispatch
     24     metadata.rs               # FileInfo, format_file_size (generalized)
     25     formatter.rs              # ported from xl-cli-tools (pure DataFrame formatting)
     26     filter.rs                 # ported from xl-cli-tools (letter-based column resolution removed)
     27     diff.rs                   # ported from xl-cli-tools (pure DataFrame comparison)
     28     readers/
     29       mod.rs                  # sub-module declarations
     30       csv.rs                  # CSV/TSV reader via Polars CsvReader
     31       parquet.rs              # Parquet reader via Polars ParquetReader
     32       arrow.rs                # Arrow IPC reader via Polars IpcReader
     33       json.rs                 # JSON/NDJSON reader via Polars JsonReader/JsonLineReader
     34       excel.rs                # Excel reader via calamine (ported from xl-cli-tools reader.rs)
     35   src/bin/
     36     dtcat.rs                  # view/inspect any tabular file
     37     dtfilter.rs               # filter/query any tabular file
     38     dtdiff.rs                 # compare two tabular files
     39   tests/
     40     integration/
     41       dtcat.rs
     42       dtfilter.rs
     43       dtdiff.rs
     44   demo/                       # fixture files for tests
     45 ```
     46 
     47 ---
     48 
     49 ### Task 1: Project Scaffolding
     50 
     51 **Files:**
     52 - Create: `Cargo.toml`
     53 - Create: `src/lib.rs`
     54 - Create: `src/readers/mod.rs`
     55 
     56 - [ ] **Step 1: Create Cargo.toml**
     57 
     58 ```toml
     59 [package]
     60 name = "dt-cli-tools"
     61 version = "0.1.0"
     62 edition = "2024"
     63 description = "CLI tools for viewing, filtering, and comparing tabular data files"
     64 license = "MIT"
     65 
     66 [lib]
     67 name = "dtcore"
     68 path = "src/lib.rs"
     69 
     70 [[bin]]
     71 name = "dtcat"
     72 path = "src/bin/dtcat.rs"
     73 
     74 [[bin]]
     75 name = "dtfilter"
     76 path = "src/bin/dtfilter.rs"
     77 
     78 [[bin]]
     79 name = "dtdiff"
     80 path = "src/bin/dtdiff.rs"
     81 
     82 [dependencies]
     83 polars = { version = "0.46", default-features = false, features = [
     84     "dtype-datetime",
     85     "csv",
     86     "parquet",
     87     "ipc",
     88     "json",
     89 ] }
     90 calamine = "0.26"
     91 clap = { version = "4", features = ["derive"] }
     92 anyhow = "1"
     93 serde_json = { version = "1", features = ["preserve_order"] }
     94 
     95 [profile.release]
     96 strip = true
     97 lto = true
     98 codegen-units = 1
     99 panic = "abort"
    100 opt-level = "z"
    101 
    102 [dev-dependencies]
    103 assert_cmd = "2"
    104 predicates = "3"
    105 tempfile = "3"
    106 ```
    107 
    108 - [ ] **Step 2: Create src/lib.rs with module declarations**
    109 
    110 ```rust
    111 pub mod diff;
    112 pub mod filter;
    113 pub mod format;
    114 pub mod formatter;
    115 pub mod metadata;
    116 pub mod reader;
    117 pub mod readers;
    118 ```
    119 
    120 - [ ] **Step 3: Create src/readers/mod.rs**
    121 
    122 ```rust
    123 pub mod arrow;
    124 pub mod csv;
    125 pub mod excel;
    126 pub mod json;
    127 pub mod parquet;
    128 ```
    129 
    130 - [ ] **Step 4: Create placeholder files so the project compiles**
    131 
    132 Create a minimal stub for every file declared in lib.rs and readers/mod.rs; an empty file is enough at this stage. Also create `src/bin/dtcat.rs`, `src/bin/dtfilter.rs`, and `src/bin/dtdiff.rs`, each containing only `fn main() {}`.
    133 
    134 - [ ] **Step 5: Verify the project compiles**
    135 
    136 Run: `cargo check 2>&1`
    137 Expected: compiles with no errors (warnings OK at this stage)
    138 
    139 - [ ] **Step 6: Commit**
    140 
    141 ```bash
    142 git add Cargo.toml src/
    143 git commit -m "feat: scaffold dt-cli-tools project structure"
    144 ```
    145 
    146 ---
    147 
    148 ### Task 2: Format Detection (`format.rs`)
    149 
    150 **Files:**
    151 - Create: `src/format.rs`
    152 
    153 - [ ] **Step 1: Write tests for format detection**
    154 
    155 ```rust
    156 // src/format.rs
    157 
    158 use anyhow::{Result, bail};
    159 use std::path::Path;
    160 use std::io::Read;
    161 
    162 #[derive(Debug, Clone, Copy, PartialEq)]
    163 pub enum Format {
    164     Csv,
    165     Tsv,
    166     Parquet,
    167     Arrow,
    168     Json,
    169     Ndjson,
    170     Excel,
    171 }
    172 
    173 impl Format {
    174     /// Returns true if this format and `other` belong to the same family
    175     /// (e.g. Csv and Tsv are both delimited text).
    176     pub fn same_family(&self, other: &Format) -> bool {
    177         matches!(
    178             (self, other),
    179             (Format::Csv, Format::Tsv)
    180                 | (Format::Tsv, Format::Csv)
    181                 | (Format::Json, Format::Ndjson)
    182                 | (Format::Ndjson, Format::Json)
    183         ) || self == other
    184     }
    185 }
    186 
    187 // Placeholder public functions — will implement in Step 3
    188 pub fn detect_format(path: &Path, override_fmt: Option<&str>) -> Result<Format> {
    189     todo!()
    190 }
    191 
    192 pub fn parse_format_str(s: &str) -> Result<Format> {
    193     todo!()
    194 }
    195 
    196 fn detect_by_magic(path: &Path) -> Result<Option<Format>> {
    197     todo!()
    198 }
    199 
    200 fn detect_by_extension(path: &Path) -> Result<Format> {
    201     todo!()
    202 }
    203 
    204 /// Auto-detect CSV delimiter by sampling the first few lines.
    205 /// Returns b',' (comma), b'\t' (tab), or b';' (semicolon).
    206 pub fn detect_csv_delimiter(path: &Path) -> Result<u8> {
    207     todo!()
    208 }
    209 
    210 #[cfg(test)]
    211 mod tests {
    212     use super::*;
    213     use std::io::Write;
    214     use tempfile::NamedTempFile;
    215 
    216     // -- parse_format_str --
    217 
    218     #[test]
    219     fn parse_csv() {
    220         assert_eq!(parse_format_str("csv").unwrap(), Format::Csv);
    221     }
    222 
    223     #[test]
    224     fn parse_tsv() {
    225         assert_eq!(parse_format_str("tsv").unwrap(), Format::Tsv);
    226     }
    227 
    228     #[test]
    229     fn parse_parquet() {
    230         assert_eq!(parse_format_str("parquet").unwrap(), Format::Parquet);
    231     }
    232 
    233     #[test]
    234     fn parse_arrow() {
    235         assert_eq!(parse_format_str("arrow").unwrap(), Format::Arrow);
    236     }
    237 
    238     #[test]
    239     fn parse_json() {
    240         assert_eq!(parse_format_str("json").unwrap(), Format::Json);
    241     }
    242 
    243     #[test]
    244     fn parse_ndjson() {
    245         assert_eq!(parse_format_str("ndjson").unwrap(), Format::Ndjson);
    246     }
    247 
    248     #[test]
    249     fn parse_excel() {
    250         assert_eq!(parse_format_str("excel").unwrap(), Format::Excel);
    251         assert_eq!(parse_format_str("xlsx").unwrap(), Format::Excel);
    252     }
    253 
    254     #[test]
    255     fn parse_unknown_is_err() {
    256         assert!(parse_format_str("banana").is_err());
    257     }
    258 
    259     #[test]
    260     fn parse_case_insensitive() {
    261         assert_eq!(parse_format_str("CSV").unwrap(), Format::Csv);
    262         assert_eq!(parse_format_str("Parquet").unwrap(), Format::Parquet);
    263     }
    264 
    265     // -- detect_by_extension --
    266 
    267     #[test]
    268     fn ext_csv() {
    269         assert_eq!(detect_by_extension(Path::new("data.csv")).unwrap(), Format::Csv);
    270     }
    271 
    272     #[test]
    273     fn ext_tsv() {
    274         assert_eq!(detect_by_extension(Path::new("data.tsv")).unwrap(), Format::Tsv);
    275         assert_eq!(detect_by_extension(Path::new("data.tab")).unwrap(), Format::Tsv);
    276     }
    277 
    278     #[test]
    279     fn ext_parquet() {
    280         assert_eq!(detect_by_extension(Path::new("data.parquet")).unwrap(), Format::Parquet);
    281         assert_eq!(detect_by_extension(Path::new("data.pq")).unwrap(), Format::Parquet);
    282     }
    283 
    284     #[test]
    285     fn ext_arrow() {
    286         assert_eq!(detect_by_extension(Path::new("data.arrow")).unwrap(), Format::Arrow);
    287         assert_eq!(detect_by_extension(Path::new("data.feather")).unwrap(), Format::Arrow);
    288         assert_eq!(detect_by_extension(Path::new("data.ipc")).unwrap(), Format::Arrow);
    289     }
    290 
    291     #[test]
    292     fn ext_json() {
    293         assert_eq!(detect_by_extension(Path::new("data.json")).unwrap(), Format::Json);
    294     }
    295 
    296     #[test]
    297     fn ext_ndjson() {
    298         assert_eq!(detect_by_extension(Path::new("data.ndjson")).unwrap(), Format::Ndjson);
    299         assert_eq!(detect_by_extension(Path::new("data.jsonl")).unwrap(), Format::Ndjson);
    300     }
    301 
    302     #[test]
    303     fn ext_excel() {
    304         assert_eq!(detect_by_extension(Path::new("data.xlsx")).unwrap(), Format::Excel);
    305         assert_eq!(detect_by_extension(Path::new("data.xls")).unwrap(), Format::Excel);
    306         assert_eq!(detect_by_extension(Path::new("data.xlsb")).unwrap(), Format::Excel);
    307         assert_eq!(detect_by_extension(Path::new("data.ods")).unwrap(), Format::Excel);
    308     }
    309 
    310     #[test]
    311     fn ext_unknown_is_err() {
    312         assert!(detect_by_extension(Path::new("data.txt")).is_err());
    313         assert!(detect_by_extension(Path::new("data")).is_err());
    314     }
    315 
    316     // -- detect_by_magic --
    317 
    318     #[test]
    319     fn magic_parquet() {
    320         let mut f = NamedTempFile::with_suffix(".bin").unwrap();
    321         f.write_all(b"PAR1some_data").unwrap();
    322         f.flush().unwrap();
    323         assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Parquet));
    324     }
    325 
    326     #[test]
    327     fn magic_arrow() {
    328         let mut f = NamedTempFile::with_suffix(".bin").unwrap();
    329         f.write_all(b"ARROW1some_data").unwrap();
    330         f.flush().unwrap();
    331         assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Arrow));
    332     }
    333 
    334     #[test]
    335     fn magic_xlsx_zip() {
    336         let mut f = NamedTempFile::with_suffix(".bin").unwrap();
    337         f.write_all(&[0x50, 0x4B, 0x03, 0x04, 0x00]).unwrap();
    338         f.flush().unwrap();
    339         assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Excel));
    340     }
    341 
    342     #[test]
    343     fn magic_xls_ole() {
    344         let mut f = NamedTempFile::with_suffix(".bin").unwrap();
    345         f.write_all(&[0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]).unwrap();
    346         f.flush().unwrap();
    347         assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Excel));
    348     }
    349 
    350     #[test]
    351     fn magic_json_array() {
    352         let mut f = NamedTempFile::with_suffix(".bin").unwrap();
    353         f.write_all(b"[{\"a\":1}]").unwrap();
    354         f.flush().unwrap();
    355         assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Json));
    356     }
    357 
    358     #[test]
    359     fn magic_json_object() {
    360         let mut f = NamedTempFile::with_suffix(".bin").unwrap();
    361         f.write_all(b"{\"a\":1}\n{\"a\":2}").unwrap();
    362         f.flush().unwrap();
    363         // Leading { suggests NDJSON
    364         assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Ndjson));
    365     }
    366 
    367     #[test]
    368     fn magic_csv_fallback_none() {
    369         // Plain text with commas — magic returns None, falls back to extension
    370         let mut f = NamedTempFile::with_suffix(".bin").unwrap();
    371         f.write_all(b"a,b,c\n1,2,3\n").unwrap();
    372         f.flush().unwrap();
    373         assert_eq!(detect_by_magic(f.path()).unwrap(), None);
    374     }
    375 
    376     // -- detect_format (integration) --
    377 
    378     #[test]
    379     fn override_wins() {
    380         // Even with .csv extension, override to parquet
    381         assert_eq!(
    382             detect_format(Path::new("data.csv"), Some("parquet")).unwrap(),
    383             Format::Parquet
    384         );
    385     }
    386 
    387     // -- same_family --
    388 
    389     #[test]
    390     fn csv_tsv_same_family() {
    391         assert!(Format::Csv.same_family(&Format::Tsv));
    392         assert!(Format::Tsv.same_family(&Format::Csv));
    393     }
    394 
    395     #[test]
    396     fn json_ndjson_same_family() {
    397         assert!(Format::Json.same_family(&Format::Ndjson));
    398     }
    399 
    400     #[test]
    401     fn csv_parquet_different_family() {
    402         assert!(!Format::Csv.same_family(&Format::Parquet));
    403     }
    404 
    405     // -- detect_csv_delimiter --
    406 
    407     #[test]
    408     fn delimiter_comma() {
    409         let mut f = NamedTempFile::with_suffix(".csv").unwrap();
    410         f.write_all(b"a,b,c\n1,2,3\n4,5,6\n").unwrap();
    411         f.flush().unwrap();
    412         assert_eq!(detect_csv_delimiter(f.path()).unwrap(), b',');
    413     }
    414 
    415     #[test]
    416     fn delimiter_tab() {
    417         let mut f = NamedTempFile::with_suffix(".tsv").unwrap();
    418         f.write_all(b"a\tb\tc\n1\t2\t3\n").unwrap();
    419         f.flush().unwrap();
    420         assert_eq!(detect_csv_delimiter(f.path()).unwrap(), b'\t');
    421     }
    422 
    423     #[test]
    424     fn delimiter_semicolon() {
    425         let mut f = NamedTempFile::with_suffix(".csv").unwrap();
    426         f.write_all(b"a;b;c\n1;2;3\n").unwrap();
    427         f.flush().unwrap();
    428         assert_eq!(detect_csv_delimiter(f.path()).unwrap(), b';');
    429     }
    430 }
    431 ```
    432 
    433 - [ ] **Step 2: Run tests to verify they fail**
    434 
    435 Run: `cargo test --lib format:: 2>&1 | tail -5`
    436 Expected: all tests FAIL (todo! panics)
    437 
    438 - [ ] **Step 3: Implement format detection**
    439 
    440 Replace the `todo!()` bodies with real implementations:
    441 
    442 ```rust
    443 pub fn parse_format_str(s: &str) -> Result<Format> {
    444     match s.to_lowercase().as_str() {
    445         "csv" => Ok(Format::Csv),
    446         "tsv" | "tab" => Ok(Format::Tsv),
    447         "parquet" | "pq" => Ok(Format::Parquet),
    448         "arrow" | "feather" | "ipc" => Ok(Format::Arrow),
    449         "json" => Ok(Format::Json),
    450         "ndjson" | "jsonl" => Ok(Format::Ndjson),
    451         "excel" | "xlsx" | "xls" | "xlsb" | "ods" => Ok(Format::Excel),
    452         _ => bail!("unknown format '{}'. Supported: csv, tsv, parquet, arrow, json, ndjson, excel", s),
    453     }
    454 }
    455 
    456 fn detect_by_extension(path: &Path) -> Result<Format> {
    457     let ext = path
    458         .extension()
    459         .and_then(|e| e.to_str())
    460         .map(|e| e.to_lowercase());
    461 
    462     match ext.as_deref() {
    463         Some("csv") => Ok(Format::Csv),
    464         Some("tsv") | Some("tab") => Ok(Format::Tsv),
    465         Some("parquet") | Some("pq") => Ok(Format::Parquet),
    466         Some("arrow") | Some("feather") | Some("ipc") => Ok(Format::Arrow),
    467         Some("json") => Ok(Format::Json),
    468         Some("ndjson") | Some("jsonl") => Ok(Format::Ndjson),
    469         Some("xlsx") | Some("xls") | Some("xlsb") | Some("ods") => Ok(Format::Excel),
    470         Some(other) => bail!("unrecognized extension '.{}'. Use --format to specify.", other),
    471         None => bail!("no file extension. Use --format to specify the format."),
    472     }
    473 }
    474 
    475 fn detect_by_magic(path: &Path) -> Result<Option<Format>> {
    476     let mut file = std::fs::File::open(path)?;
    477     let mut buf = [0u8; 8];
    478     let n = file.read(&mut buf)?;
    479     if n < 2 {
    480         return Ok(None);
    481     }
    482 
    483     // Parquet: "PAR1"
    484     if n >= 4 && &buf[..4] == b"PAR1" {
    485         return Ok(Some(Format::Parquet));
    486     }
    487     // Arrow IPC: "ARROW1"
    488     if n >= 6 && &buf[..6] == b"ARROW1" {
    489         return Ok(Some(Format::Arrow));
    490     }
    491     // ZIP (xlsx, ods): PK\x03\x04
    492     if buf[0] == 0x50 && buf[1] == 0x4B {
    493         return Ok(Some(Format::Excel));
    494     }
    495     // OLE2 (xls): D0 CF 11 E0
    496     if n >= 4 && buf[0] == 0xD0 && buf[1] == 0xCF && buf[2] == 0x11 && buf[3] == 0xE0 {
    497         return Ok(Some(Format::Excel));
    498     }
    499     // JSON array: starts with [
    500     // Need to skip leading whitespace
    501     let first_non_ws = buf[..n].iter().copied().find(|b| !b.is_ascii_whitespace());
    502     if first_non_ws == Some(b'[') {
    503         return Ok(Some(Format::Json));
    504     }
    505     if first_non_ws == Some(b'{') {
    506         return Ok(Some(Format::Ndjson));
    507     }
    508 
    509     // CSV/TSV: no distinctive magic bytes — return None to fall through to extension
    510     Ok(None)
    511 }
    512 
    513 pub fn detect_format(path: &Path, override_fmt: Option<&str>) -> Result<Format> {
    514     if let Some(fmt) = override_fmt {
    515         return parse_format_str(fmt);
    516     }
    517     if let Some(fmt) = detect_by_magic(path)? {
    518         return Ok(fmt);
    519     }
    520     detect_by_extension(path)
    521 }
    522 
    523 pub fn detect_csv_delimiter(path: &Path) -> Result<u8> {
    524     let mut bytes = Vec::new();
    525     // Sample up to 8KB; lossy UTF-8 conversion tolerates a cut mid-character
    526     std::fs::File::open(path)?.take(8192).read_to_end(&mut bytes)?;
    527     let buf = String::from_utf8_lossy(&bytes);
    528 
    529     let lines: Vec<&str> = buf.lines().take(10).collect();
    530     if lines.is_empty() {
    531         return Ok(b',');
    532     }
    533 
    534     let delimiters = [b',', b'\t', b';'];
    535     let mut best = b',';
    536     let mut best_score = 0usize;
    537 
    538     for &d in &delimiters {
    539         let counts: Vec<usize> = lines
    540             .iter()
    541             .map(|line| line.as_bytes().iter().filter(|&&b| b == d).count())
    542             .collect();
    543         // Score: minimum count across lines (consistency matters)
    544         let min_count = *counts.iter().min().unwrap_or(&0);
    545         if min_count > best_score {
    546             best_score = min_count;
    547             best = d;
    548         }
    549     }
    550 
    551     Ok(best)
    552 }
    553 ```
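The scoring rule in `detect_csv_delimiter` (score each candidate by its minimum per-line count, keep the best) can be sanity-checked in isolation. This standalone sketch reproduces just the scoring loop over in-memory lines; `score_delimiter` is a hypothetical helper name for illustration, not part of the plan's API:

```rust
// Standalone re-implementation of the delimiter-scoring loop; the real
// detect_csv_delimiter reads its sample from the file instead.
fn score_delimiter(lines: &[&str]) -> u8 {
    let delimiters = [b',', b'\t', b';'];
    let mut best = b',';
    let mut best_score = 0usize;
    for &d in &delimiters {
        // Minimum count across sampled lines rewards consistency, so a stray
        // comma inside one field of a semicolon file cannot win.
        let min_count = lines
            .iter()
            .map(|line| line.as_bytes().iter().filter(|&&b| b == d).count())
            .min()
            .unwrap_or(0);
        if min_count > best_score {
            best_score = min_count;
            best = d;
        }
    }
    best
}

fn main() {
    assert_eq!(score_delimiter(&["a,b,c", "1,2,3"]), b',');
    assert_eq!(score_delimiter(&["a\tb\tc", "1\t2\t3"]), b'\t');
    // One stray comma in a semicolon file: min comma count is 0, semicolon wins.
    assert_eq!(score_delimiter(&["a;b,x;c", "1;2;3"]), b';');
    println!("ok");
}
```

Note the comma default when no candidate ever appears: every score stays 0, so `best` is never reassigned.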
    554 
    555 - [ ] **Step 4: Run tests to verify they pass**
    556 
    557 Run: `cargo test --lib format:: 2>&1`
    558 Expected: all tests PASS
    559 
    560 - [ ] **Step 5: Commit**
    561 
    562 ```bash
    563 git add src/format.rs
    564 git commit -m "feat: add format detection with magic bytes and extension matching"
    565 ```
    566 
    567 ---
    568 
    569 ### Task 3: Metadata Module (`metadata.rs`)
    570 
    571 **Files:**
    572 - Create: `src/metadata.rs`
    573 
    574 - [ ] **Step 1: Write metadata module with tests**
    575 
    576 Port `format_file_size` from xl-cli-tools (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/metadata.rs`). Generalize `FileInfo` to include the detected format and work for non-Excel files.
    577 
    578 ```rust
    579 // src/metadata.rs
    580 
    581 use crate::format::Format;
    582 
    583 /// Info about a single sheet (Excel) or the entire file (other formats).
    584 #[derive(Debug, Clone)]
    585 pub struct SheetInfo {
    586     pub name: String,
    587     pub rows: usize, // total rows including header
    588     pub cols: usize,
    589 }
    590 
    591 /// Info about the file.
    592 #[derive(Debug)]
    593 pub struct FileInfo {
    594     pub file_size: u64,
    595     pub format: Format,
    596     pub sheets: Vec<SheetInfo>,
    597 }
    598 
    599 /// Format file size for display: "245 KB", "1.2 MB", etc.
    600 pub fn format_file_size(bytes: u64) -> String {
    601     if bytes < 1_024 {
    602         format!("{bytes} B")
    603     } else if bytes < 1_048_576 {
    604         format!("{:.0} KB", bytes as f64 / 1_024.0)
    605     } else if bytes < 1_073_741_824 {
    606         format!("{:.1} MB", bytes as f64 / 1_048_576.0)
    607     } else {
    608         format!("{:.1} GB", bytes as f64 / 1_073_741_824.0)
    609     }
    610 }
    611 
    612 /// Format name for a Format variant.
    613 pub fn format_name(fmt: Format) -> &'static str {
    614     match fmt {
    615         Format::Csv => "CSV",
    616         Format::Tsv => "TSV",
    617         Format::Parquet => "Parquet",
    618         Format::Arrow => "Arrow IPC",
    619         Format::Json => "JSON",
    620         Format::Ndjson => "NDJSON",
    621         Format::Excel => "Excel",
    622     }
    623 }
    624 
    625 #[cfg(test)]
    626 mod tests {
    627     use super::*;
    628 
    629     #[test]
    630     fn test_format_file_size() {
    631         assert_eq!(format_file_size(500), "500 B");
    632         assert_eq!(format_file_size(2_048), "2 KB");
    633         assert_eq!(format_file_size(1_500_000), "1.4 MB");
    634     }
    635 
    636     #[test]
    637     fn test_format_name() {
    638         assert_eq!(format_name(Format::Csv), "CSV");
    639         assert_eq!(format_name(Format::Parquet), "Parquet");
    640         assert_eq!(format_name(Format::Excel), "Excel");
    641     }
    642 }
    643 ```
    644 
    645 - [ ] **Step 2: Run tests**
    646 
    647 Run: `cargo test --lib metadata:: 2>&1`
    648 Expected: PASS
    649 
    650 - [ ] **Step 3: Commit**
    651 
    652 ```bash
    653 git add src/metadata.rs
    654 git commit -m "feat: add metadata module with FileInfo and format_file_size"
    655 ```
    656 
    657 ---
    658 
    659 ### Task 4: Formatter Module (`formatter.rs`)
    660 
    661 **Files:**
    662 - Create: `src/formatter.rs`
    663 
    664 - [ ] **Step 1: Port formatter.rs from xl-cli-tools**
    665 
    666 Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/formatter.rs` and update imports:
    667 - Change `use crate::metadata::{format_file_size, FileInfo, SheetInfo};` to `use crate::metadata::{format_file_size, FileInfo, SheetInfo, format_name};`
    668 - Update `format_header` to include the format name: `# File: report.csv (245 KB) [CSV]`
    669 - The rest of the module (format_schema, format_data_table, format_head_tail, format_csv, format_describe, all helper functions, and all tests) transfers verbatim.
    670 
    671 Key change to `format_header`:
    672 ```rust
    673 pub fn format_header(file_name: &str, info: &FileInfo) -> String {
    674     let size_str = format_file_size(info.file_size);
    675     let fmt_name = format_name(info.format);
    676     let sheet_count = info.sheets.len();
    677     if sheet_count > 1 {
    678         format!("# File: {file_name} ({size_str}) [{fmt_name}]\n# Sheets: {sheet_count}\n")
    679     } else {
    680         format!("# File: {file_name} ({size_str}) [{fmt_name}]\n")
    681     }
    682 }
    683 ```
    684 
    685 Update the `format_header` test to match the new output:
    686 ```rust
    687 #[test]
    688 fn test_format_header() {
    689     let info = FileInfo {
    690         file_size: 250_000,
    691         format: Format::Excel,
    692         sheets: vec![
    693             SheetInfo { name: "Sheet1".into(), rows: 100, cols: 5 },
    694             SheetInfo { name: "Sheet2".into(), rows: 50, cols: 3 },
    695         ],
    696     };
    697     let out = format_header("test.xlsx", &info);
    698     assert!(out.contains("# File: test.xlsx (244 KB) [Excel]"));
    699     assert!(out.contains("# Sheets: 2"));
    700 }
    701 
    702 #[test]
    703 fn test_format_header_single_sheet() {
    704     let info = FileInfo {
    705         file_size: 1_000,
    706         format: Format::Csv,
    707         sheets: vec![SheetInfo { name: "data".into(), rows: 10, cols: 3 }],
    708     };
    709     let out = format_header("data.csv", &info);
    710     assert!(out.contains("[CSV]"));
    711     assert!(!out.contains("Sheets"));
    712 }
    713 ```
    714 
    715 All other tests (format_data_table, format_head_tail, format_schema, format_csv, format_describe, etc.) transfer verbatim from xl-cli-tools. They test pure DataFrame formatting and don't reference Excel-specific types.
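Since `format_header` only depends on `format_file_size`, `format_name`, `FileInfo`, and `SheetInfo` (all from Task 3), the new header line can be exercised in a dependency-free sketch with those pieces inlined (trimmed to the fields the header actually reads):

```rust
// Types and helpers inlined from metadata.rs (Task 3) so the sketch compiles
// standalone; only the sheet count is read, so SheetInfo is trimmed.
#[derive(Debug, Clone, Copy)]
enum Format { Csv, Excel }

#[allow(dead_code)]
struct SheetInfo { name: String }
struct FileInfo { file_size: u64, format: Format, sheets: Vec<SheetInfo> }

fn format_file_size(bytes: u64) -> String {
    if bytes < 1_024 {
        format!("{bytes} B")
    } else if bytes < 1_048_576 {
        format!("{:.0} KB", bytes as f64 / 1_024.0)
    } else {
        format!("{:.1} MB", bytes as f64 / 1_048_576.0)
    }
}

fn format_name(fmt: Format) -> &'static str {
    match fmt { Format::Csv => "CSV", Format::Excel => "Excel" }
}

// The updated format_header from this task: size and format name on one line,
// sheet count only when there is more than one sheet.
fn format_header(file_name: &str, info: &FileInfo) -> String {
    let size_str = format_file_size(info.file_size);
    let fmt_name = format_name(info.format);
    let sheet_count = info.sheets.len();
    if sheet_count > 1 {
        format!("# File: {file_name} ({size_str}) [{fmt_name}]\n# Sheets: {sheet_count}\n")
    } else {
        format!("# File: {file_name} ({size_str}) [{fmt_name}]\n")
    }
}

fn main() {
    let sheet = |name: &str| SheetInfo { name: name.into() };
    let multi = FileInfo {
        file_size: 250_000,
        format: Format::Excel,
        sheets: vec![sheet("Sheet1"), sheet("Sheet2")],
    };
    assert_eq!(
        format_header("test.xlsx", &multi),
        "# File: test.xlsx (244 KB) [Excel]\n# Sheets: 2\n"
    );
    let single = FileInfo { file_size: 1_000, format: Format::Csv, sheets: vec![sheet("data")] };
    assert_eq!(format_header("data.csv", &single), "# File: data.csv (1000 B) [CSV]\n");
    println!("ok");
}
```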
    716 
    717 - [ ] **Step 2: Run tests**
    718 
    719 Run: `cargo test --lib formatter:: 2>&1`
    720 Expected: all tests PASS
    721 
    722 - [ ] **Step 3: Commit**
    723 
    724 ```bash
    725 git add src/formatter.rs
    726 git commit -m "feat: port formatter module from xl-cli-tools with format-name support"
    727 ```
    728 
    729 ---
    730 
    731 ### Task 5: Filter Module (`filter.rs`)
    732 
    733 **Files:**
    734 - Create: `src/filter.rs`
    735 
    736 - [ ] **Step 1: Port filter.rs from xl-cli-tools, removing letter-based column resolution**
    737 
    738 Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/filter.rs` and make these changes:
    739 
    740 1. **Remove** `col_letter_to_index` function entirely.
    741 2. **Simplify** `resolve_column` to only do name matching (exact, then case-insensitive). Remove the letter-based fallback step:
    742 
    743 ```rust
    744 /// Resolve a column specifier to a DataFrame column name.
    745 /// Accepts a header name (exact match first, then case-insensitive).
    746 pub fn resolve_column(spec: &str, df_columns: &[String]) -> Result<String, String> {
    747     // 1. Exact header name match
    748     if df_columns.contains(&spec.to_string()) {
    749         return Ok(spec.to_string());
    750     }
    751     // 2. Case-insensitive header name match
    752     let spec_lower = spec.to_lowercase();
    753     for col in df_columns {
    754         if col.to_lowercase() == spec_lower {
    755             return Ok(col.clone());
    756         }
    757     }
    758     let available = df_columns.join(", ");
    759     Err(format!("column '{}' not found. Available columns: {}", spec, available))
    760 }
    761 ```
    762 
    763 3. **Remove** the letter-based tests: `resolve_by_letter`, `resolve_by_letter_lowercase`, `resolve_header_takes_priority_over_letter`, `resolve_letter_out_of_range_is_err`, `pipeline_cols_by_letter`.
    764 4. Keep everything else: `parse_filter_expr`, `parse_sort_spec`, `build_filter_mask`, `apply_filters`, `filter_pipeline`, `FilterOptions`, `SortSpec`, `FilterExpr`, `FilterOp`, `apply_sort`, and all their tests.
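The simplified `resolve_column` is pure string matching, so its two-step behavior (exact match first, then case-insensitive fallback) can be demonstrated without Polars; the function is reproduced here so the example is self-contained:

```rust
// resolve_column exactly as defined in this task.
fn resolve_column(spec: &str, df_columns: &[String]) -> Result<String, String> {
    if df_columns.contains(&spec.to_string()) {
        return Ok(spec.to_string());
    }
    let spec_lower = spec.to_lowercase();
    for col in df_columns {
        if col.to_lowercase() == spec_lower {
            return Ok(col.clone());
        }
    }
    Err(format!(
        "column '{}' not found. Available columns: {}",
        spec,
        df_columns.join(", ")
    ))
}

fn main() {
    let cols: Vec<String> = ["Name", "name", "Value"].iter().map(|s| s.to_string()).collect();
    // Exact match wins even when a case-insensitive candidate also exists.
    assert_eq!(resolve_column("name", &cols).unwrap(), "name");
    // Case-insensitive fallback returns the stored header spelling.
    assert_eq!(resolve_column("VALUE", &cols).unwrap(), "Value");
    // Misses list every available column in the error message.
    assert!(resolve_column("amount", &cols).unwrap_err().contains("Name, name, Value"));
    println!("ok");
}
```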
    765 
    766 - [ ] **Step 2: Run tests**
    767 
    768 Run: `cargo test --lib filter:: 2>&1`
    769 Expected: all tests PASS
    770 
    771 - [ ] **Step 3: Commit**
    772 
    773 ```bash
    774 git add src/filter.rs
    775 git commit -m "feat: port filter module from xl-cli-tools without letter-based column resolution"
    776 ```
    777 
    778 ---
    779 
    780 ### Task 6: Diff Module (`diff.rs`)
    781 
    782 **Files:**
    783 - Create: `src/diff.rs`
    784 
    785 - [ ] **Step 1: Port diff.rs verbatim from xl-cli-tools**
    786 
    787 Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/diff.rs`. No import
    788 changes are needed: `use crate::formatter;` already resolves in the new crate layout.
    789 
    790 The entire module (SheetSource, DiffRow, CellChange, ModifiedRow, DiffResult, DiffOptions, diff_positional, diff_keyed, diff_sheets, and all tests) transfers verbatim. No changes to logic.
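For orientation only (this is not the ported code, which operates on DataFrames and the richer `DiffRow`/`CellChange` result types), the two comparison modes reduce to: positional diff pairs rows by index, keyed diff pairs rows by a key column. A minimal sketch over string rows, with the key hard-wired to column 0:

```rust
use std::collections::BTreeMap;

type Row = Vec<String>;

// Positional: row i of `a` is compared with row i of `b`; trailing rows on
// either side are pure removals/additions.
fn diff_positional(a: &[Row], b: &[Row]) -> Vec<String> {
    let mut out = Vec::new();
    for i in 0..a.len().max(b.len()) {
        match (a.get(i), b.get(i)) {
            (Some(x), Some(y)) if x != y => out.push(format!("~ row {i}")),
            (Some(_), None) => out.push(format!("- row {i}")),
            (None, Some(_)) => out.push(format!("+ row {i}")),
            _ => {}
        }
    }
    out
}

// Keyed: rows are matched by key value, so reordering is not reported.
fn diff_keyed(a: &[Row], b: &[Row]) -> Vec<String> {
    let index = |rows: &[Row]| -> BTreeMap<String, Row> {
        rows.iter().map(|r| (r[0].clone(), r.clone())).collect()
    };
    let (ia, ib) = (index(a), index(b));
    let mut out = Vec::new();
    for (k, ra) in &ia {
        match ib.get(k) {
            Some(rb) if ra != rb => out.push(format!("~ key {k}")),
            None => out.push(format!("- key {k}")),
            _ => {}
        }
    }
    for k in ib.keys() {
        if !ia.contains_key(k) {
            out.push(format!("+ key {k}"));
        }
    }
    out
}

fn main() {
    let row = |k: &str, v: &str| vec![k.to_string(), v.to_string()];
    let a = vec![row("1", "x"), row("2", "y")];
    let b = vec![row("2", "y"), row("1", "x")]; // same rows, reordered
    assert_eq!(diff_positional(&a, &b).len(), 2); // positional sees two changed rows
    assert!(diff_keyed(&a, &b).is_empty());       // keyed sees no difference
    println!("ok");
}
```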
    791 
    792 - [ ] **Step 2: Run tests**
    793 
    794 Run: `cargo test --lib diff:: 2>&1`
    795 Expected: all tests PASS
    796 
    797 - [ ] **Step 3: Commit**
    798 
    799 ```bash
    800 git add src/diff.rs
    801 git commit -m "feat: port diff module from xl-cli-tools"
    802 ```
    803 
    804 ---
    805 
    806 ### Task 7: CSV Reader (`readers/csv.rs`)
    807 
    808 **Files:**
    809 - Create: `src/readers/csv.rs`
    810 
    811 - [ ] **Step 1: Write CSV reader with tests**
    812 
    813 ```rust
    814 // src/readers/csv.rs
    815 
    816 use anyhow::Result;
    817 use polars::prelude::*;
    818 use std::path::Path;
    819 
    820 use crate::reader::ReadOptions;
    821 
    822 pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame> {
    823     let separator = opts.separator.unwrap_or_else(|| {
    824         crate::format::detect_csv_delimiter(path).unwrap_or(b',')
    825     });
    826 
    827     let reader = CsvReadOptions::default()
    828         .with_has_header(true)
    829         .with_skip_rows(opts.skip_rows.unwrap_or(0))
    830         .with_parse_options(
    831             CsvParseOptions::default().with_separator(separator),
    832         )
    833         .try_into_reader_with_file_path(Some(path.into()))?;
    834 
    835     let df = reader.finish()?;
    836     Ok(df)
    837 }
    838 
    839 #[cfg(test)]
    840 mod tests {
    841     use super::*;
    842     use std::io::Write;
    843     use tempfile::NamedTempFile;
    844 
    845     fn default_opts() -> ReadOptions {
    846         ReadOptions::default()
    847     }
    848 
    849     #[test]
    850     fn read_basic_csv() {
    851         let mut f = NamedTempFile::with_suffix(".csv").unwrap();
    852         write!(f, "name,value\nAlice,100\nBob,200\n").unwrap();
    853         f.flush().unwrap();
    854 
    855         let df = read(f.path(), &default_opts()).unwrap();
    856         assert_eq!(df.height(), 2);
    857         assert_eq!(df.width(), 2);
    858         let names: Vec<String> = df.get_column_names().iter().map(|s| s.to_string()).collect();
    859         assert_eq!(names, vec!["name", "value"]);
    860     }
    861 
    862     #[test]
    863     fn read_tsv() {
    864         let mut f = NamedTempFile::with_suffix(".tsv").unwrap();
    865         write!(f, "a\tb\n1\t2\n3\t4\n").unwrap();
    866         f.flush().unwrap();
    867 
    868         let opts = ReadOptions { separator: Some(b'\t'), ..Default::default() };
    869         let df = read(f.path(), &opts).unwrap();
    870         assert_eq!(df.height(), 2);
    871         assert_eq!(df.width(), 2);
    872     }
    873 
    874     #[test]
    875     fn read_with_skip() {
    876         let mut f = NamedTempFile::with_suffix(".csv").unwrap();
    877         write!(f, "metadata line\nname,value\nAlice,100\n").unwrap();
    878         f.flush().unwrap();
    879 
    880         let opts = ReadOptions { skip_rows: Some(1), ..Default::default() };
    881         let df = read(f.path(), &opts).unwrap();
    882         assert_eq!(df.height(), 1);
    883         let names: Vec<String> = df.get_column_names().iter().map(|s| s.to_string()).collect();
    884         assert_eq!(names, vec!["name", "value"]);
    885     }
    886 }
    887 ```
    888 
Note: this module depends on `ReadOptions` from `reader.rs`, which Step 2 defines. Add at least that minimal struct before running these tests.
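`detect_csv_delimiter` lives in `format.rs` (an earlier task). As a reminder of the kind of heuristic it is expected to implement, here is a std-only sketch that picks whichever candidate delimiter occurs most often in the first line; the function name and candidate set are illustrative, not the actual `format.rs` API:

```rust
// Illustrative only: the real detect_csv_delimiter in format.rs reads a
// sample from the file path; this sketch operates on an in-memory sample.
fn sniff_delimiter(sample: &str) -> u8 {
    let first_line = sample.lines().next().unwrap_or("");
    // Candidate set is an assumption: comma, tab, semicolon, pipe.
    [b',', b'\t', b';', b'|']
        .into_iter()
        .map(|d| (d, first_line.bytes().filter(|&b| b == d).count()))
        .max_by_key(|&(_, count)| count)
        .filter(|&(_, count)| count > 0)
        .map(|(d, _)| d)
        .unwrap_or(b',')
}
```

Counting only the first line keeps the sniff cheap; falling back to a comma matches the `unwrap_or(b',')` in the reader above.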
    890 
    891 - [ ] **Step 2: Define ReadOptions in reader.rs**
    892 
    893 ```rust
    894 // src/reader.rs
    895 
    896 /// Options that control how a file is read.
    897 #[derive(Debug, Clone, Default)]
    898 pub struct ReadOptions {
    899     pub sheet: Option<String>,     // Excel only
    900     pub skip_rows: Option<usize>,
    901     pub separator: Option<u8>,     // CSV override
    902 }
    903 ```
    904 
    905 - [ ] **Step 3: Run tests**
    906 
    907 Run: `cargo test --lib readers::csv:: 2>&1`
    908 Expected: all tests PASS
    909 
    910 - [ ] **Step 4: Commit**
    911 
    912 ```bash
    913 git add src/readers/csv.rs src/reader.rs
    914 git commit -m "feat: add CSV/TSV reader with delimiter auto-detection"
    915 ```
    916 
    917 ---
    918 
    919 ### Task 8: Parquet Reader (`readers/parquet.rs`)
    920 
    921 **Files:**
    922 - Create: `src/readers/parquet.rs`
    923 
    924 - [ ] **Step 1: Write Parquet reader with tests**
    925 
    926 ```rust
    927 // src/readers/parquet.rs
    928 
    929 use anyhow::Result;
    930 use polars::prelude::*;
    931 use std::path::Path;
    932 
    933 use crate::reader::ReadOptions;
    934 
    935 pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame> {
    936     let file = std::fs::File::open(path)?;
    937     let mut df = ParquetReader::new(file).finish()?;
    938 
    if let Some(skip) = opts.skip_rows {
        if skip > 0 {
            // the length saturates to zero when skip >= height, so skipping
            // every row yields an empty frame instead of silently returning all
            df = df.slice(skip as i64, df.height().saturating_sub(skip));
        }
    }
    944 
    945     Ok(df)
    946 }
    947 
    948 #[cfg(test)]
    949 mod tests {
    950     use super::*;
    951     use tempfile::NamedTempFile;
    952 
    953     fn default_opts() -> ReadOptions {
    954         ReadOptions::default()
    955     }
    956 
    957     #[test]
    958     fn read_parquet_roundtrip() {
    959         // Create a parquet file using Polars writer
    960         let s1 = Series::new("name".into(), &["Alice", "Bob"]);
    961         let s2 = Series::new("value".into(), &[100i64, 200]);
    962         let mut df = DataFrame::new(vec![s1.into_column(), s2.into_column()]).unwrap();
    963 
    964         let f = NamedTempFile::with_suffix(".parquet").unwrap();
    965         let file = std::fs::File::create(f.path()).unwrap();
    966         ParquetWriter::new(file).finish(&mut df).unwrap();
    967 
    968         let result = read(f.path(), &default_opts()).unwrap();
    969         assert_eq!(result.height(), 2);
    970         assert_eq!(result.width(), 2);
    971     }
    972 }
    973 ```
    974 
    975 - [ ] **Step 2: Run tests**
    976 
    977 Run: `cargo test --lib readers::parquet:: 2>&1`
    978 Expected: PASS
    979 
    980 - [ ] **Step 3: Commit**
    981 
    982 ```bash
    983 git add src/readers/parquet.rs
    984 git commit -m "feat: add Parquet reader"
    985 ```
    986 
    987 ---
    988 
    989 ### Task 9: Arrow IPC Reader (`readers/arrow.rs`)
    990 
    991 **Files:**
    992 - Create: `src/readers/arrow.rs`
    993 
    994 - [ ] **Step 1: Write Arrow IPC reader with tests**
    995 
    996 ```rust
    997 // src/readers/arrow.rs
    998 
    999 use anyhow::Result;
   1000 use polars::prelude::*;
   1001 use std::path::Path;
   1002 
   1003 use crate::reader::ReadOptions;
   1004 
   1005 pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame> {
   1006     let file = std::fs::File::open(path)?;
   1007     let mut df = IpcReader::new(file).finish()?;
   1008 
    if let Some(skip) = opts.skip_rows {
        if skip > 0 {
            // the length saturates to zero when skip >= height, so skipping
            // every row yields an empty frame instead of silently returning all
            df = df.slice(skip as i64, df.height().saturating_sub(skip));
        }
    }
   1014 
   1015     Ok(df)
   1016 }
   1017 
   1018 #[cfg(test)]
   1019 mod tests {
   1020     use super::*;
   1021     use tempfile::NamedTempFile;
   1022 
   1023     fn default_opts() -> ReadOptions {
   1024         ReadOptions::default()
   1025     }
   1026 
   1027     #[test]
   1028     fn read_arrow_roundtrip() {
   1029         let s1 = Series::new("x".into(), &[1i64, 2, 3]);
   1030         let mut df = DataFrame::new(vec![s1.into_column()]).unwrap();
   1031 
   1032         let f = NamedTempFile::with_suffix(".arrow").unwrap();
   1033         let file = std::fs::File::create(f.path()).unwrap();
   1034         IpcWriter::new(file).finish(&mut df).unwrap();
   1035 
   1036         let result = read(f.path(), &default_opts()).unwrap();
   1037         assert_eq!(result.height(), 3);
   1038         assert_eq!(result.width(), 1);
   1039     }
   1040 }
   1041 ```
   1042 
   1043 - [ ] **Step 2: Run tests**
   1044 
   1045 Run: `cargo test --lib readers::arrow:: 2>&1`
   1046 Expected: PASS
   1047 
   1048 - [ ] **Step 3: Commit**
   1049 
   1050 ```bash
   1051 git add src/readers/arrow.rs
   1052 git commit -m "feat: add Arrow IPC reader"
   1053 ```
   1054 
   1055 ---
   1056 
   1057 ### Task 10: JSON/NDJSON Reader (`readers/json.rs`)
   1058 
   1059 **Files:**
   1060 - Create: `src/readers/json.rs`
   1061 
   1062 - [ ] **Step 1: Write JSON reader with tests**
   1063 
   1064 ```rust
   1065 // src/readers/json.rs
   1066 
   1067 use anyhow::Result;
   1068 use polars::prelude::*;
   1069 use std::path::Path;
   1070 
   1071 use crate::format::Format;
   1072 use crate::reader::ReadOptions;
   1073 
   1074 pub fn read(path: &Path, format: Format, opts: &ReadOptions) -> Result<DataFrame> {
   1075     let file = std::fs::File::open(path)?;
   1076 
   1077     let mut df = match format {
   1078         Format::Ndjson => {
   1079             JsonLineReader::new(file).finish()?
   1080         }
   1081         _ => {
   1082             // JSON array format
   1083             JsonReader::new(file).finish()?
   1084         }
   1085     };
   1086 
    if let Some(skip) = opts.skip_rows {
        if skip > 0 {
            // the length saturates to zero when skip >= height, so skipping
            // every row yields an empty frame instead of silently returning all
            df = df.slice(skip as i64, df.height().saturating_sub(skip));
        }
    }
   1092 
   1093     Ok(df)
   1094 }
   1095 
   1096 #[cfg(test)]
   1097 mod tests {
   1098     use super::*;
   1099     use std::io::Write;
   1100     use tempfile::NamedTempFile;
   1101 
   1102     fn default_opts() -> ReadOptions {
   1103         ReadOptions::default()
   1104     }
   1105 
   1106     #[test]
   1107     fn read_json_array() {
   1108         let mut f = NamedTempFile::with_suffix(".json").unwrap();
   1109         write!(f, r#"[{{"name":"Alice","value":1}},{{"name":"Bob","value":2}}]"#).unwrap();
   1110         f.flush().unwrap();
   1111 
   1112         let df = read(f.path(), Format::Json, &default_opts()).unwrap();
   1113         assert_eq!(df.height(), 2);
   1114     }
   1115 
   1116     #[test]
   1117     fn read_ndjson() {
   1118         let mut f = NamedTempFile::with_suffix(".ndjson").unwrap();
        // arguments to write! are not format strings, so braces stay single
        write!(f, "{}\n{}\n",
            r#"{"name":"Alice","value":1}"#,
            r#"{"name":"Bob","value":2}"#,
        ).unwrap();
   1123         f.flush().unwrap();
   1124 
   1125         let df = read(f.path(), Format::Ndjson, &default_opts()).unwrap();
   1126         assert_eq!(df.height(), 2);
   1127     }
   1128 }
   1129 ```
   1130 
Note: the Polars JSON reader API varies between versions. If `JsonReader` is not directly available, use `JsonFormat::Json` with the appropriate reader; check the exact Polars 0.46 API and adapt. An alternative if `JsonReader` doesn't exist:
   1132 
   1133 ```rust
   1134 // Alternative using LazyFrame
   1135 let lf = LazyJsonLineReader::new(path).finish()?;
   1136 let df = lf.collect()?;
   1137 ```
   1138 
   1139 - [ ] **Step 2: Run tests**
   1140 
   1141 Run: `cargo test --lib readers::json:: 2>&1`
   1142 Expected: PASS (adapt if Polars API differs)
   1143 
   1144 - [ ] **Step 3: Commit**
   1145 
   1146 ```bash
   1147 git add src/readers/json.rs
   1148 git commit -m "feat: add JSON/NDJSON reader"
   1149 ```
   1150 
   1151 ---
   1152 
   1153 ### Task 11: Excel Reader (`readers/excel.rs`)
   1154 
   1155 **Files:**
   1156 - Create: `src/readers/excel.rs`
   1157 
   1158 - [ ] **Step 1: Port Excel reader from xl-cli-tools**
   1159 
   1160 Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/reader.rs` to `src/readers/excel.rs` and adapt:
   1161 
   1162 1. Change the public API from `read_sheet(path, sheet_name)` / `read_sheet_with_skip(path, sheet_name, skip)` to a single function matching the reader pattern:
   1163 
   1164 ```rust
   1165 pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame>
   1166 ```
   1167 
   1168 This function:
   1169 - Resolves the sheet name from `opts.sheet` (defaults to the first sheet).
   1170 - Applies `opts.skip_rows`.
   1171 - Reuses `range_to_dataframe_skip`, `infer_column_type`, `build_series` verbatim from xl-cli-tools.
   1172 
   1173 2. Also provide a helper for reading Excel metadata (sheet names, dimensions):
   1174 
   1175 ```rust
   1176 pub fn read_excel_info(path: &Path) -> Result<Vec<SheetInfo>>
   1177 ```
   1178 
   1179 This reuses the calamine-based metadata reading from xl-cli-tools `metadata.rs:read_file_info`, but returns just the sheet list.
   1180 
   1181 3. Port all internal functions (`infer_column_type`, `build_series`, `range_to_dataframe_skip`) and unit tests verbatim.
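Since dtcat's `--sheet` flag accepts either a sheet name or a 0-based index (Task 13), `read` must resolve `opts.sheet` against the workbook's sheet names before handing off to calamine. A std-only sketch of that resolution step; the helper name `resolve_sheet` and its signature are illustrative, not part of the ported module:

```rust
/// Resolve a sheet selector (a name, or a 0-based index given as a string)
/// against the workbook's sheet names. `None` selects the first sheet.
/// Illustrative sketch; the ported reader may structure this differently.
fn resolve_sheet(names: &[&str], selector: Option<&str>) -> Result<String, String> {
    let Some(sel) = selector else {
        return names
            .first()
            .map(|s| s.to_string())
            .ok_or_else(|| "workbook has no sheets".to_string());
    };
    // An exact name match wins before trying the selector as an index,
    // so a sheet literally named "1" stays reachable by name.
    if names.contains(&sel) {
        return Ok(sel.to_string());
    }
    if let Ok(idx) = sel.parse::<usize>() {
        if let Some(name) = names.get(idx) {
            return Ok(name.to_string());
        }
    }
    Err(format!("sheet not found: {sel}"))
}
```

The name-before-index precedence is the one behavioral decision worth pinning down here, whatever shape the ported code takes.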
   1182 
   1183 - [ ] **Step 2: Run tests**
   1184 
   1185 Run: `cargo test --lib readers::excel:: 2>&1`
   1186 Expected: PASS
   1187 
   1188 - [ ] **Step 3: Commit**
   1189 
   1190 ```bash
   1191 git add src/readers/excel.rs
   1192 git commit -m "feat: port Excel reader from xl-cli-tools"
   1193 ```
   1194 
   1195 ---
   1196 
   1197 ### Task 12: Reader Dispatch (`reader.rs`)
   1198 
   1199 **Files:**
   1200 - Modify: `src/reader.rs` (already has ReadOptions from Task 7)
   1201 
   1202 - [ ] **Step 1: Add read_file dispatch function**
   1203 
   1204 ```rust
   1205 // Add to src/reader.rs
   1206 
use anyhow::Result;
   1208 use polars::prelude::*;
   1209 use std::path::Path;
   1210 
   1211 use crate::format::Format;
use crate::metadata::FileInfo;
   1213 use crate::readers;
   1214 
   1215 /// Options that control how a file is read.
   1216 #[derive(Debug, Clone, Default)]
   1217 pub struct ReadOptions {
   1218     pub sheet: Option<String>,     // Excel only
   1219     pub skip_rows: Option<usize>,
   1220     pub separator: Option<u8>,     // CSV override
   1221 }
   1222 
   1223 /// Read a file into a DataFrame, dispatching to the appropriate reader.
   1224 pub fn read_file(path: &Path, format: Format, opts: &ReadOptions) -> Result<DataFrame> {
   1225     match format {
   1226         Format::Csv | Format::Tsv => readers::csv::read(path, opts),
   1227         Format::Parquet => readers::parquet::read(path, opts),
   1228         Format::Arrow => readers::arrow::read(path, opts),
   1229         Format::Json | Format::Ndjson => readers::json::read(path, format, opts),
   1230         Format::Excel => readers::excel::read(path, opts),
   1231     }
   1232 }
   1233 
   1234 /// Read file metadata: size, format, and sheet info (for Excel).
   1235 pub fn read_file_info(path: &Path, format: Format) -> Result<FileInfo> {
   1236     let file_size = std::fs::metadata(path)?.len();
   1237 
   1238     let sheets = match format {
   1239         Format::Excel => readers::excel::read_excel_info(path)?,
   1240         _ => vec![], // Non-Excel formats have no sheet concept
   1241     };
   1242 
   1243     Ok(FileInfo {
   1244         file_size,
   1245         format,
   1246         sheets,
   1247     })
   1248 }
   1249 ```
   1250 
   1251 - [ ] **Step 2: Write integration test for dispatch**
   1252 
   1253 ```rust
   1254 #[cfg(test)]
   1255 mod tests {
   1256     use super::*;
   1257     use std::io::Write;
   1258     use tempfile::NamedTempFile;
   1259 
   1260     #[test]
   1261     fn dispatch_csv() {
   1262         let mut f = NamedTempFile::with_suffix(".csv").unwrap();
   1263         write!(f, "a,b\n1,2\n").unwrap();
   1264         f.flush().unwrap();
   1265 
   1266         let df = read_file(f.path(), Format::Csv, &ReadOptions::default()).unwrap();
   1267         assert_eq!(df.height(), 1);
   1268     }
   1269 
   1270     #[test]
   1271     fn dispatch_parquet() {
   1272         use polars::prelude::*;
   1273         let s = Series::new("x".into(), &[1i64, 2]);
   1274         let mut df = DataFrame::new(vec![s.into_column()]).unwrap();
   1275 
   1276         let f = NamedTempFile::with_suffix(".parquet").unwrap();
   1277         let file = std::fs::File::create(f.path()).unwrap();
   1278         ParquetWriter::new(file).finish(&mut df).unwrap();
   1279 
   1280         let result = read_file(f.path(), Format::Parquet, &ReadOptions::default()).unwrap();
   1281         assert_eq!(result.height(), 2);
   1282     }
   1283 }
   1284 ```
   1285 
   1286 - [ ] **Step 3: Run tests**
   1287 
   1288 Run: `cargo test --lib reader:: 2>&1`
   1289 Expected: PASS
   1290 
   1291 - [ ] **Step 4: Commit**
   1292 
   1293 ```bash
   1294 git add src/reader.rs
   1295 git commit -m "feat: add reader dispatch with read_file and read_file_info"
   1296 ```
   1297 
   1298 ---
   1299 
   1300 ### Task 13: dtcat Binary (`src/bin/dtcat.rs`)
   1301 
   1302 **Files:**
   1303 - Create: `src/bin/dtcat.rs`
   1304 
   1305 - [ ] **Step 1: Implement dtcat**
   1306 
   1307 Adapt from xl-cli-tools `xlcat.rs` (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xlcat.rs`). Key changes:
   1308 
   1309 1. Replace `xlcat::` imports with `dtcore::`.
   1310 2. Add `--format` flag for format override.
   1311 3. Replace Excel-specific file validation with format detection.
   1312 4. Add `--info` flag (show file metadata).
   1313 5. For non-Excel files, skip sheet resolution (no sheets concept). For Excel files with multiple sheets, keep the same listing behavior.
   1314 6. Use `reader::read_file` and `reader::read_file_info` instead of `metadata::read_file_info` + `reader::read_sheet`.
   1315 
   1316 ```rust
   1317 // src/bin/dtcat.rs
   1318 
   1319 use dtcore::format;
   1320 use dtcore::formatter;
   1321 use dtcore::metadata::{self, SheetInfo};
   1322 use dtcore::reader::{self, ReadOptions};
   1323 
   1324 use anyhow::Result;
   1325 use clap::Parser;
   1326 use polars::prelude::*;
   1327 use std::path::PathBuf;
   1328 use std::process;
   1329 
   1330 #[derive(Parser, Debug)]
   1331 #[command(name = "dtcat", about = "View tabular data files in the terminal")]
   1332 struct Cli {
   1333     /// Path to data file
   1334     file: PathBuf,
   1335 
   1336     /// Override format detection (csv, tsv, parquet, arrow, json, ndjson, excel)
   1337     #[arg(long)]
   1338     format: Option<String>,
   1339 
   1340     /// Select sheet by name or 0-based index (Excel only)
   1341     #[arg(long)]
   1342     sheet: Option<String>,
   1343 
   1344     /// Skip first N rows
   1345     #[arg(long)]
   1346     skip: Option<usize>,
   1347 
   1348     /// Show column names and types only
   1349     #[arg(long)]
   1350     schema: bool,
   1351 
   1352     /// Show summary statistics
   1353     #[arg(long)]
   1354     describe: bool,
   1355 
   1356     /// Show first N rows (default: 50)
   1357     #[arg(long)]
   1358     head: Option<usize>,
   1359 
   1360     /// Show last N rows
   1361     #[arg(long)]
   1362     tail: Option<usize>,
   1363 
   1364     /// Output as CSV instead of markdown table
   1365     #[arg(long)]
   1366     csv: bool,
   1367 
   1368     /// Show file metadata (size, format, shape, sheets)
   1369     #[arg(long)]
   1370     info: bool,
   1371 }
   1372 
   1373 #[derive(Debug)]
   1374 struct ArgError(String);
   1375 
   1376 impl std::fmt::Display for ArgError {
   1377     fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
   1378         write!(f, "{}", self.0)
   1379     }
   1380 }
   1381 
   1382 impl std::error::Error for ArgError {}
   1383 
   1384 fn run(cli: &Cli) -> Result<()> {
   1385     // Validate flag combinations
   1386     if cli.schema && cli.describe {
   1387         return Err(ArgError("--schema and --describe are mutually exclusive".into()).into());
   1388     }
   1389 
   1390     // Detect format
   1391     let fmt = format::detect_format(&cli.file, cli.format.as_deref())?;
   1392 
   1393     // Read file info
   1394     let file_info = reader::read_file_info(&cli.file, fmt)?;
   1395     let file_name = cli.file
   1396         .file_name()
   1397         .map(|s| s.to_string_lossy().to_string())
   1398         .unwrap_or_else(|| cli.file.display().to_string());
   1399 
   1400     // --info mode
   1401     if cli.info {
   1402         let mut out = formatter::format_header(&file_name, &file_info);
   1403         out.push_str(&format!("Format: {}\n", metadata::format_name(fmt)));
   1404         if !file_info.sheets.is_empty() {
   1405             for sheet in &file_info.sheets {
   1406                 out.push_str(&format!("  {}: {} rows x {} cols\n", sheet.name, sheet.rows, sheet.cols));
   1407             }
   1408         }
   1409         print!("{out}");
   1410         return Ok(());
   1411     }
   1412 
   1413     // Build read options
   1414     let read_opts = ReadOptions {
   1415         sheet: cli.sheet.clone(),
   1416         skip_rows: cli.skip,
   1417         separator: None,
   1418     };
   1419 
   1420     // For Excel with multiple sheets and no --sheet flag: list sheets
   1421     if fmt == format::Format::Excel && file_info.sheets.len() > 1 && cli.sheet.is_none() {
   1422         let has_row_flags = cli.head.is_some() || cli.tail.is_some() || cli.csv;
   1423         if has_row_flags {
   1424             return Err(ArgError(
   1425                 "Multiple sheets found. Use --sheet <name> to select one.".into(),
   1426             ).into());
   1427         }
   1428 
   1429         // List all sheets with schemas
   1430         let mut out = formatter::format_header(&file_name, &file_info);
   1431         out.push('\n');
   1432         for sheet in &file_info.sheets {
   1433             let opts = ReadOptions { sheet: Some(sheet.name.clone()), ..read_opts.clone() };
   1434             let df = reader::read_file(&cli.file, fmt, &opts)?;
   1435             if sheet.rows == 0 && sheet.cols == 0 {
   1436                 out.push_str(&formatter::format_empty_sheet(sheet));
   1437             } else {
   1438                 out.push_str(&formatter::format_schema(sheet, &df));
   1439             }
   1440             out.push('\n');
   1441         }
   1442         out.push_str("Use --sheet <name> to view a specific sheet.\n");
   1443         print!("{out}");
   1444         return Ok(());
   1445     }
   1446 
   1447     // Read the data
   1448     let df = reader::read_file(&cli.file, fmt, &read_opts)?;
   1449 
   1450     // Build a SheetInfo for display
   1451     let sheet_info = if let Some(si) = file_info.sheets.first() {
   1452         si.clone()
   1453     } else {
   1454         SheetInfo {
   1455             name: file_name.clone(),
   1456             rows: df.height() + 1, // +1 for header
   1457             cols: df.width(),
   1458         }
   1459     };
   1460 
   1461     // Render output
   1462     render_output(cli, &file_name, &file_info, &sheet_info, &df)
   1463 }
   1464 
   1465 fn render_output(
   1466     cli: &Cli,
   1467     file_name: &str,
   1468     file_info: &metadata::FileInfo,
   1469     sheet_info: &SheetInfo,
   1470     df: &DataFrame,
   1471 ) -> Result<()> {
   1472     if cli.csv {
   1473         let selected = select_rows(cli, df);
   1474         print!("{}", formatter::format_csv(&selected));
   1475         return Ok(());
   1476     }
   1477 
   1478     let mut out = formatter::format_header(file_name, file_info);
   1479     out.push('\n');
   1480 
   1481     if df.height() == 0 {
   1482         out.push_str(&formatter::format_schema(sheet_info, df));
   1483         out.push_str("\n(no data rows)\n");
   1484         print!("{out}");
   1485         return Ok(());
   1486     }
   1487 
   1488     if cli.schema {
   1489         out.push_str(&formatter::format_schema(sheet_info, df));
   1490     } else if cli.describe {
   1491         out.push_str(&formatter::format_schema(sheet_info, df));
   1492         out.push_str(&formatter::format_describe(df));
   1493     } else {
   1494         out.push_str(&formatter::format_schema(sheet_info, df));
   1495         out.push('\n');
   1496         out.push_str(&format_data_selection(cli, df));
   1497     }
   1498 
   1499     print!("{out}");
   1500     Ok(())
   1501 }
   1502 
   1503 fn format_data_selection(cli: &Cli, df: &DataFrame) -> String {
   1504     let total = df.height();
   1505 
   1506     if cli.head.is_some() || cli.tail.is_some() {
   1507         let head_n = cli.head.unwrap_or(0);
   1508         let tail_n = cli.tail.unwrap_or(0);
   1509         if head_n + tail_n >= total || (head_n == 0 && tail_n == 0) {
   1510             return formatter::format_data_table(df);
   1511         }
   1512         if cli.tail.is_none() {
   1513             return formatter::format_data_table(&df.head(Some(head_n)));
   1514         }
   1515         if cli.head.is_none() {
   1516             return formatter::format_data_table(&df.tail(Some(tail_n)));
   1517         }
   1518         return formatter::format_head_tail(df, head_n, tail_n);
   1519     }
   1520 
   1521     // Default: <=50 rows show all, >50 show head 25 + tail 25
   1522     if total <= 50 {
   1523         formatter::format_data_table(df)
   1524     } else {
   1525         formatter::format_head_tail(df, 25, 25)
   1526     }
   1527 }
   1528 
   1529 fn select_rows(cli: &Cli, df: &DataFrame) -> DataFrame {
   1530     let total = df.height();
   1531 
   1532     if cli.head.is_some() || cli.tail.is_some() {
   1533         let head_n = cli.head.unwrap_or(0);
   1534         let tail_n = cli.tail.unwrap_or(0);
   1535         if head_n + tail_n >= total || (head_n == 0 && tail_n == 0) {
   1536             return df.clone();
   1537         }
   1538         if cli.tail.is_none() {
   1539             return df.head(Some(head_n));
   1540         }
   1541         if cli.head.is_none() {
   1542             return df.tail(Some(tail_n));
   1543         }
   1544         let head_df = df.head(Some(head_n));
   1545         let tail_df = df.tail(Some(tail_n));
   1546         return head_df.vstack(&tail_df).unwrap_or_else(|_| df.clone());
   1547     }
   1548 
   1549     if total <= 50 { df.clone() } else {
   1550         let h = df.head(Some(25));
   1551         let t = df.tail(Some(25));
   1552         h.vstack(&t).unwrap_or_else(|_| df.clone())
   1553     }
   1554 }
   1555 
   1556 fn main() {
   1557     let cli = Cli::parse();
   1558     if let Err(err) = run(&cli) {
   1559         if err.downcast_ref::<ArgError>().is_some() {
   1560             eprintln!("dtcat: {err}");
   1561             process::exit(2);
   1562         }
   1563         eprintln!("dtcat: {err}");
   1564         process::exit(1);
   1565     }
   1566 }
   1567 ```
   1568 
   1569 - [ ] **Step 2: Verify it compiles**
   1570 
   1571 Run: `cargo build --bin dtcat 2>&1`
   1572 Expected: compiles successfully
   1573 
   1574 - [ ] **Step 3: Manual smoke test**
   1575 
   1576 Create a quick test CSV and run dtcat on it:
   1577 ```bash
printf 'name,value\nAlice,100\nBob,200\n' > /tmp/test.csv
   1579 cargo run --bin dtcat -- /tmp/test.csv
   1580 cargo run --bin dtcat -- /tmp/test.csv --schema
   1581 cargo run --bin dtcat -- /tmp/test.csv --csv
   1582 ```
   1583 
   1584 - [ ] **Step 4: Commit**
   1585 
   1586 ```bash
   1587 git add src/bin/dtcat.rs
   1588 git commit -m "feat: add dtcat binary for viewing tabular data files"
   1589 ```
   1590 
   1591 ---
   1592 
   1593 ### Task 14: dtfilter Binary (`src/bin/dtfilter.rs`)
   1594 
   1595 **Files:**
   1596 - Create: `src/bin/dtfilter.rs`
   1597 
   1598 - [ ] **Step 1: Implement dtfilter**
   1599 
   1600 Adapt from xl-cli-tools `xlfilter.rs` (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xlfilter.rs`). Key changes:
   1601 
   1602 1. Replace `xlcat::` imports with `dtcore::`.
   1603 2. Add `--format` flag.
   1604 3. Replace Excel-specific file reading with format detection + `reader::read_file`.
   1605 4. Remove Excel-specific sheet resolution for non-Excel formats.
   1606 5. Change `--cols` description to "Select columns by name" (no letter-based).
   1607 
   1608 ```rust
   1609 // src/bin/dtfilter.rs
   1610 
   1611 use std::path::PathBuf;
   1612 use std::process;
   1613 
   1614 use anyhow::Result;
   1615 use clap::Parser;
   1616 
   1617 use dtcore::filter::{parse_filter_expr, parse_sort_spec, filter_pipeline, FilterOptions};
   1618 use dtcore::format;
   1619 use dtcore::formatter;
   1620 use dtcore::reader::{self, ReadOptions};
   1621 
   1622 #[derive(Parser)]
   1623 #[command(
   1624     name = "dtfilter",
   1625     about = "Filter and query tabular data files",
   1626     version
   1627 )]
   1628 struct Args {
   1629     /// Path to data file
   1630     file: PathBuf,
   1631 
   1632     /// Override format detection
   1633     #[arg(long)]
   1634     format: Option<String>,
   1635 
   1636     /// Select sheet (Excel only)
   1637     #[arg(long)]
   1638     sheet: Option<String>,
   1639 
   1640     /// Skip first N rows
   1641     #[arg(long)]
   1642     skip: Option<usize>,
   1643 
   1644     /// Select columns by name (comma-separated)
   1645     #[arg(long)]
   1646     columns: Option<String>,
   1647 
   1648     /// Filter expressions (e.g., Amount>1000, Name~john)
   1649     #[arg(long = "filter")]
   1650     filters: Vec<String>,
   1651 
   1652     /// Sort specification (e.g., Amount:desc)
   1653     #[arg(long)]
   1654     sort: Option<String>,
   1655 
   1656     /// Max rows in output (applied after filter)
   1657     #[arg(long)]
   1658     limit: Option<usize>,
   1659 
   1660     /// First N rows (applied before filter)
   1661     #[arg(long)]
   1662     head: Option<usize>,
   1663 
   1664     /// Last N rows (applied before filter)
   1665     #[arg(long)]
   1666     tail: Option<usize>,
   1667 
   1668     /// Output as CSV
   1669     #[arg(long)]
   1670     csv: bool,
   1671 }
   1672 
   1673 #[derive(Debug)]
   1674 struct ArgError(String);
   1675 impl std::fmt::Display for ArgError {
   1676     fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
   1677         write!(f, "{}", self.0)
   1678     }
   1679 }
   1680 impl std::error::Error for ArgError {}
   1681 
   1682 fn run(args: Args) -> Result<()> {
   1683     if !args.file.exists() {
   1684         return Err(ArgError(format!("file not found: {}", args.file.display())).into());
   1685     }
   1686     if args.head.is_some() && args.tail.is_some() {
   1687         return Err(ArgError("--head and --tail are mutually exclusive".into()).into());
   1688     }
   1689 
   1690     let fmt = format::detect_format(&args.file, args.format.as_deref())?;
   1691 
   1692     let read_opts = ReadOptions {
   1693         sheet: args.sheet,
   1694         skip_rows: args.skip,
   1695         separator: None,
   1696     };
   1697 
   1698     let df = reader::read_file(&args.file, fmt, &read_opts)?;
   1699 
   1700     if df.height() == 0 {
   1701         eprintln!("0 rows");
   1702         println!("(no data rows)");
   1703         return Ok(());
   1704     }
   1705 
   1706     // Parse filter expressions
   1707     let filters: Vec<_> = args.filters
   1708         .iter()
   1709         .map(|s| parse_filter_expr(s))
   1710         .collect::<Result<Vec<_>, _>>()
   1711         .map_err(|e| anyhow::anyhow!(ArgError(e)))?;
   1712 
   1713     let sort = args.sort
   1714         .as_deref()
   1715         .map(parse_sort_spec)
   1716         .transpose()
   1717         .map_err(|e| anyhow::anyhow!(ArgError(e)))?;
   1718 
   1719     let cols = args.columns.map(|s| {
   1720         s.split(',').map(|c| c.trim().to_string()).collect::<Vec<_>>()
   1721     });
   1722 
   1723     let opts = FilterOptions {
   1724         filters,
   1725         cols,
   1726         sort,
   1727         limit: args.limit,
   1728         head: args.head,
   1729         tail: args.tail,
   1730     };
   1731 
   1732     let result = filter_pipeline(df, &opts)?;
   1733 
   1734     eprintln!("{} rows", result.height());
   1735 
   1736     if result.height() == 0 {
   1737         println!("{}", formatter::format_data_table(&result));
   1738     } else if args.csv {
   1739         print!("{}", formatter::format_csv(&result));
   1740     } else {
   1741         println!("{}", formatter::format_data_table(&result));
   1742     }
   1743 
   1744     Ok(())
   1745 }
   1746 
   1747 fn main() {
   1748     let args = Args::parse();
   1749     if let Err(err) = run(args) {
   1750         if err.downcast_ref::<ArgError>().is_some() {
   1751             eprintln!("dtfilter: {err}");
   1752             process::exit(2);
   1753         }
   1754         eprintln!("dtfilter: {err}");
   1755         process::exit(1);
   1756     }
   1757 }
   1758 ```
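`parse_filter_expr` and `parse_sort_spec` are ported from xl-cli-tools earlier in this plan, so their exact grammar is fixed there. As a reference for the `--filter` examples above (`Amount>1000`, `Name~john`), here is a std-only sketch of how such an expression splits into column, operator, and value; the operator set and return type are illustrative, not the ported module's API:

```rust
/// Split a filter expression such as "Amount>1000" or "Name~john" into
/// (column, operator, value). Operator set and return type are illustrative.
fn split_filter_expr(expr: &str) -> Result<(String, String, String), String> {
    // Two-character operators first, so "Qty>=5" is not misread as ">".
    const OPS: [&str; 7] = [">=", "<=", "!=", ">", "<", "=", "~"];
    for op in OPS {
        if let Some(pos) = expr.find(op) {
            let col = expr[..pos].trim();
            let value = expr[pos + op.len()..].trim();
            if col.is_empty() || value.is_empty() {
                return Err(format!("invalid filter expression: {expr}"));
            }
            return Ok((col.to_string(), op.to_string(), value.to_string()));
        }
    }
    Err(format!("no operator found in: {expr}"))
}
```

Checking multi-character operators before their single-character prefixes is the part that trips up naive parsers; the ported module presumably handles the same ordering concern.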
   1759 
   1760 - [ ] **Step 2: Verify it compiles**
   1761 
   1762 Run: `cargo build --bin dtfilter 2>&1`
   1763 Expected: compiles
   1764 
   1765 - [ ] **Step 3: Commit**
   1766 
   1767 ```bash
   1768 git add src/bin/dtfilter.rs
   1769 git commit -m "feat: add dtfilter binary for filtering tabular data files"
   1770 ```
   1771 
   1772 ---
   1773 
   1774 ### Task 15: dtdiff Binary (`src/bin/dtdiff.rs`)
   1775 
   1776 **Files:**
   1777 - Create: `src/bin/dtdiff.rs`
   1778 
   1779 - [ ] **Step 1: Implement dtdiff**
   1780 
   1781 Adapt from xl-cli-tools `xldiff.rs` (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xldiff.rs`). Key changes:
   1782 
   1783 1. Replace `xlcat::` imports with `dtcore::`.
   1784 2. Add `--format` flag.
   1785 3. **Same-format enforcement**: detect format of both files and error if they differ (Csv/Tsv are same family and allowed).
   1786 4. Replace Excel-specific reading with format detection + `reader::read_file`.
   1787 5. Remove letter-based column resolution in key/cols parsing (use name-only `resolve_column`).
   1788 6. Port all output formatters (format_text, format_markdown, format_json, format_csv) and tests verbatim.
   1789 
   1790 Exit codes: 0 = no differences, 1 = differences found, 2 = error.
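
The family check in change 3 can be sketched as follows; the `Format` variants and the `same_family` name are assumptions based on this plan's `format.rs` description, not confirmed API:

```rust
// Sketch of the Csv/Tsv family rule behind dtdiff's same-format enforcement.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Csv,
    Tsv,
    Parquet,
    Arrow,
    Json,
    Excel,
}

impl Format {
    /// Csv and Tsv are the same delimited-text family; any other pair must match exactly.
    fn same_family(&self, other: &Format) -> bool {
        self == other
            || matches!(
                (self, other),
                (Format::Csv, Format::Tsv) | (Format::Tsv, Format::Csv)
            )
    }
}

fn main() {
    assert!(Format::Csv.same_family(&Format::Tsv));
    assert!(Format::Parquet.same_family(&Format::Parquet));
    assert!(!Format::Csv.same_family(&Format::Arrow));
    println!("ok");
}
```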
   1791 
   1792 ```rust
   1793 // src/bin/dtdiff.rs
   1794 // Adapted from xl-cli-tools xldiff.rs
   1795 
   1796 use std::io::IsTerminal;
   1797 use std::path::PathBuf;
   1798 use std::process;
   1799 
   1800 use anyhow::{Result, bail};
   1801 use clap::Parser;
   1802 use serde_json::{Map, Value, json};
   1803 
   1804 use dtcore::diff::{DiffOptions, DiffResult, SheetSource};
   1805 use dtcore::format;
   1806 use dtcore::formatter;
   1807 use dtcore::reader::{self, ReadOptions};
   1808 
   1809 #[derive(Parser)]
   1810 #[command(
   1811     name = "dtdiff",
   1812     about = "Compare two tabular data files and show differences",
   1813     version
   1814 )]
   1815 struct Args {
   1816     /// First file
   1817     file_a: PathBuf,
   1818 
   1819     /// Second file
   1820     file_b: PathBuf,
   1821 
   1822     /// Override format detection (both files must be this format)
   1823     #[arg(long)]
   1824     format: Option<String>,
   1825 
   1826     /// Select sheet (Excel only)
   1827     #[arg(long)]
   1828     sheet: Option<String>,
   1829 
   1830     /// Key column(s) for matched comparison (comma-separated names)
   1831     #[arg(long)]
   1832     key: Option<String>,
   1833 
   1834     /// Numeric tolerance for float comparisons (default: 1e-10)
   1835     #[arg(long, default_value = "1e-10")]
   1836     tolerance: f64,
   1837 
   1838     /// Output as JSON
   1839     #[arg(long)]
   1840     json: bool,
   1841 
   1842     /// Output as CSV
   1843     #[arg(long)]
   1844     csv: bool,
   1845 
   1846     /// Disable colored output
   1847     #[arg(long)]
   1848     no_color: bool,
   1849 }
   1850 
   1851 fn run(args: Args) -> Result<()> {
   1852     if !args.file_a.exists() {
   1853         bail!("file not found: {}", args.file_a.display());
   1854     }
   1855     if !args.file_b.exists() {
   1856         bail!("file not found: {}", args.file_b.display());
   1857     }
   1858 
   1859     // Detect formats
   1860     let fmt_a = format::detect_format(&args.file_a, args.format.as_deref())?;
   1861     let fmt_b = format::detect_format(&args.file_b, args.format.as_deref())?;
   1862 
   1863     // Same-format enforcement (Csv/Tsv are same family)
   1864     if !fmt_a.same_family(&fmt_b) {
   1865         bail!(
   1866             "format mismatch: {} is {:?} but {} is {:?}. Both files must be the same format.",
   1867             args.file_a.display(), fmt_a,
   1868             args.file_b.display(), fmt_b,
   1869         );
   1870     }
   1871 
   1872     let read_opts = ReadOptions {
   1873         sheet: args.sheet.clone(),
   1874         skip_rows: None,
   1875         separator: None,
   1876     };
   1877 
   1878     let df_a = reader::read_file(&args.file_a, fmt_a, &read_opts)?;
   1879     let df_b = reader::read_file(&args.file_b, fmt_b, &read_opts)?;
   1880 
   1881     // Resolve key columns
   1882     let key_columns: Vec<String> = if let Some(ref key_str) = args.key {
   1883         key_str.split(',').map(|s| s.trim().to_string()).collect()
   1884     } else {
   1885         vec![]
   1886     };
   1887 
   1888     let file_name_a = args.file_a.file_name()
   1889         .map(|s| s.to_string_lossy().to_string())
   1890         .unwrap_or_else(|| args.file_a.display().to_string());
   1891     let file_name_b = args.file_b.file_name()
   1892         .map(|s| s.to_string_lossy().to_string())
   1893         .unwrap_or_else(|| args.file_b.display().to_string());
   1894 
   1895     let source_a = SheetSource {
   1896         file_name: file_name_a,
   1897         sheet_name: args.sheet.clone().unwrap_or_else(|| "data".into()),
   1898     };
   1899     let source_b = SheetSource {
   1900         file_name: file_name_b,
   1901         sheet_name: args.sheet.unwrap_or_else(|| "data".into()),
   1902     };
   1903 
   1904     let opts = DiffOptions {
   1905         key_columns,
   1906         tolerance: Some(args.tolerance),
   1907     };
   1908 
   1909     let result = dtcore::diff::diff_sheets(&df_a, &df_b, &opts, source_a, source_b)?;
   1910 
   1911     let use_color = !args.no_color && std::io::stdout().is_terminal();
   1912 
   1913     // Format output
   1914     let output = if args.json {
   1915         format_json(&result)
   1916     } else if args.csv {
   1917         format_csv_output(&result)
   1918     } else {
   1919         format_text(&result, use_color)
   1920     };
   1921 
   1922     print!("{}", output);
   1923 
   1924     if result.has_differences() {
   1925         process::exit(1);
   1926     }
   1927 
   1928     Ok(())
   1929 }
   1930 
    1931 // Port format_text, format_json, and format_csv_output (renamed from format_csv to
    1932 // avoid a collision with the `csv` flag) verbatim from xl-cli-tools xldiff.rs,
    1933 // along with the format_row_inline, csv_quote, and csv_row helpers.
    1934 
    1935 // [Full implementations copied from xl-cli-tools xldiff.rs - see source at
    1936 //  /Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xldiff.rs lines 141-455]
   1938 
   1939 fn main() {
   1940     let args = Args::parse();
   1941     if let Err(err) = run(args) {
   1942         eprintln!("dtdiff: {err}");
   1943         process::exit(2);
   1944     }
   1945 }
   1946 ```
   1947 
   1948 The output formatter functions (`format_text`, `format_json`, `format_csv_output`, `format_row_inline`, `csv_quote`, `csv_row`) and their tests transfer verbatim from xldiff.rs lines 141-827. Copy them into dtdiff.rs.
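
Those functions are copied, not rewritten, so no new code is required here. Purely for orientation, a conventional RFC 4180-style quoting helper like `csv_quote` typically looks like this (a sketch, not the ported implementation):

```rust
// Sketch of what a csv_quote helper conventionally does; the real one is
// copied verbatim from xldiff.rs and may differ in detail.
fn csv_quote(field: &str) -> String {
    if field.contains(',') || field.contains('"') || field.contains('\n') {
        // Double embedded quotes, then wrap the whole field in quotes
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

fn main() {
    assert_eq!(csv_quote("plain"), "plain");
    assert_eq!(csv_quote("a,b"), "\"a,b\"");
    assert_eq!(csv_quote("say \"hi\""), "\"say \"\"hi\"\"\"");
    println!("ok");
}
```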
   1949 
   1950 - [ ] **Step 2: Verify it compiles**
   1951 
   1952 Run: `cargo build --bin dtdiff 2>&1`
   1953 Expected: compiles
   1954 
   1955 - [ ] **Step 3: Commit**
   1956 
   1957 ```bash
   1958 git add src/bin/dtdiff.rs
   1959 git commit -m "feat: add dtdiff binary for comparing tabular data files"
   1960 ```
   1961 
   1962 ---
   1963 
   1964 ### Task 16: Demo Fixtures and Integration Tests
   1965 
    1966 **Files:**
    1967 - Create: `demo/` fixture files
    1968 - Create: `tests/integration/main.rs` declaring `mod dtcat; mod dtfilter; mod dtdiff;` (Cargo compiles a `tests/` subdirectory as one test target only through its `main.rs`)
    1969 - Create: `tests/integration/dtcat.rs`
    1970 - Create: `tests/integration/dtfilter.rs`
    1971 - Create: `tests/integration/dtdiff.rs`
   1971 
   1972 - [ ] **Step 1: Create demo fixture files**
   1973 
   1974 Create small test files in `demo/`:
   1975 
   1976 ```bash
   1977 # demo/sample.csv
   1978 echo 'name,value,category
   1979 Alice,100,A
   1980 Bob,200,B
   1981 Charlie,300,A
   1982 Diana,400,B
   1983 Eve,500,A' > demo/sample.csv
   1984 
   1985 # demo/sample.tsv
    1986 printf 'name\tvalue\tcategory\nAlice\t100\tA\nBob\t200\tB\n' > demo/sample.tsv
   1987 ```
   1988 
   1989 Also create Parquet and Arrow fixtures programmatically in a test helper, or via a small Rust script.
   1990 
   1991 - [ ] **Step 2: Write dtcat integration tests**
   1992 
   1993 ```rust
   1994 // tests/integration/dtcat.rs
   1995 
   1996 use assert_cmd::Command;
   1997 use predicates::prelude::*;
   1998 use std::io::Write;
   1999 use tempfile::NamedTempFile;
   2000 
   2001 fn dtcat() -> Command {
   2002     Command::cargo_bin("dtcat").unwrap()
   2003 }
   2004 
   2005 fn csv_file(content: &str) -> NamedTempFile {
   2006     let mut f = NamedTempFile::with_suffix(".csv").unwrap();
   2007     write!(f, "{}", content).unwrap();
   2008     f.flush().unwrap();
   2009     f
   2010 }
   2011 
   2012 #[test]
   2013 fn shows_csv_data() {
   2014     let f = csv_file("name,value\nAlice,100\nBob,200\n");
   2015     dtcat()
   2016         .arg(f.path())
   2017         .assert()
   2018         .success()
   2019         .stdout(predicate::str::contains("Alice"))
   2020         .stdout(predicate::str::contains("Bob"));
   2021 }
   2022 
   2023 #[test]
   2024 fn schema_flag() {
   2025     let f = csv_file("name,value\nAlice,100\n");
   2026     dtcat()
   2027         .arg(f.path())
   2028         .arg("--schema")
   2029         .assert()
   2030         .success()
   2031         .stdout(predicate::str::contains("Column"))
   2032         .stdout(predicate::str::contains("Type"));
   2033 }
   2034 
   2035 #[test]
   2036 fn csv_output_flag() {
   2037     let f = csv_file("name,value\nAlice,100\n");
   2038     dtcat()
   2039         .arg(f.path())
   2040         .arg("--csv")
   2041         .assert()
   2042         .success()
   2043         .stdout(predicate::str::contains("name,value"));
   2044 }
   2045 
   2046 #[test]
   2047 fn head_flag() {
   2048     let f = csv_file("x\n1\n2\n3\n4\n5\n");
   2049     dtcat()
   2050         .arg(f.path())
   2051         .arg("--head")
   2052         .arg("2")
   2053         .assert()
   2054         .success()
   2055         .stdout(predicate::str::contains("1"))
   2056         .stdout(predicate::str::contains("2"));
   2057 }
   2058 
   2059 #[test]
   2060 fn nonexistent_file_exits_1() {
   2061     dtcat()
   2062         .arg("/tmp/does_not_exist.csv")
   2063         .assert()
   2064         .failure();
   2065 }
   2066 
   2067 #[test]
   2068 fn format_override() {
   2069     // A .txt file read as CSV
   2070     let mut f = NamedTempFile::with_suffix(".txt").unwrap();
   2071     write!(f, "a,b\n1,2\n").unwrap();
   2072     f.flush().unwrap();
   2073 
   2074     dtcat()
   2075         .arg(f.path())
   2076         .arg("--format")
   2077         .arg("csv")
   2078         .assert()
   2079         .success()
   2080         .stdout(predicate::str::contains("1"));
   2081 }
   2082 ```
   2083 
   2084 - [ ] **Step 3: Write dtfilter integration tests**
   2085 
   2086 ```rust
   2087 // tests/integration/dtfilter.rs
   2088 
   2089 use assert_cmd::Command;
   2090 use predicates::prelude::*;
   2091 use std::io::Write;
   2092 use tempfile::NamedTempFile;
   2093 
   2094 fn dtfilter() -> Command {
   2095     Command::cargo_bin("dtfilter").unwrap()
   2096 }
   2097 
   2098 fn csv_file(content: &str) -> NamedTempFile {
   2099     let mut f = NamedTempFile::with_suffix(".csv").unwrap();
   2100     write!(f, "{}", content).unwrap();
   2101     f.flush().unwrap();
   2102     f
   2103 }
   2104 
   2105 #[test]
   2106 fn filter_eq() {
   2107     let f = csv_file("name,value\nAlice,100\nBob,200\n");
   2108     dtfilter()
   2109         .arg(f.path())
   2110         .arg("--filter")
   2111         .arg("name=Alice")
   2112         .assert()
   2113         .success()
   2114         .stdout(predicate::str::contains("Alice"))
   2115         .stdout(predicate::str::contains("Bob").not());
   2116 }
   2117 
   2118 #[test]
   2119 fn filter_gt() {
   2120     let f = csv_file("name,value\nAlice,100\nBob,200\nCharlie,300\n");
   2121     dtfilter()
   2122         .arg(f.path())
   2123         .arg("--filter")
   2124         .arg("value>150")
   2125         .assert()
    2126         .success()
    2127         .stdout(predicate::str::contains("Alice").not())
    2128         .stdout(predicate::str::contains("Bob"))
    2129         .stdout(predicate::str::contains("Charlie"));
   2129 }
   2130 
   2131 #[test]
   2132 fn sort_desc() {
   2133     let f = csv_file("name,value\nAlice,100\nBob,200\n");
   2134     dtfilter()
   2135         .arg(f.path())
   2136         .arg("--sort")
   2137         .arg("value:desc")
   2138         .assert()
   2139         .success();
   2140 }
   2141 
   2142 #[test]
   2143 fn columns_select() {
   2144     let f = csv_file("name,value,extra\nAlice,100,x\n");
   2145     dtfilter()
   2146         .arg(f.path())
   2147         .arg("--columns")
   2148         .arg("name,value")
   2149         .assert()
   2150         .success()
   2151         .stdout(predicate::str::contains("name"))
   2152         .stdout(predicate::str::contains("extra").not());
   2153 }
   2154 
   2155 #[test]
   2156 fn csv_output() {
   2157     let f = csv_file("name,value\nAlice,100\n");
   2158     dtfilter()
   2159         .arg(f.path())
   2160         .arg("--csv")
   2161         .assert()
   2162         .success()
   2163         .stdout(predicate::str::contains("name,value"));
   2164 }
   2165 ```
   2166 
   2167 - [ ] **Step 4: Write dtdiff integration tests**
   2168 
   2169 ```rust
   2170 // tests/integration/dtdiff.rs
   2171 
   2172 use assert_cmd::Command;
   2173 use predicates::prelude::*;
   2174 use std::io::Write;
   2175 use tempfile::NamedTempFile;
   2176 
   2177 fn dtdiff() -> Command {
   2178     Command::cargo_bin("dtdiff").unwrap()
   2179 }
   2180 
   2181 fn csv_file(content: &str) -> NamedTempFile {
   2182     let mut f = NamedTempFile::with_suffix(".csv").unwrap();
   2183     write!(f, "{}", content).unwrap();
   2184     f.flush().unwrap();
   2185     f
   2186 }
   2187 
   2188 #[test]
   2189 fn no_diff_exits_0() {
   2190     let a = csv_file("name,value\nAlice,100\n");
   2191     let b = csv_file("name,value\nAlice,100\n");
   2192     dtdiff()
   2193         .arg(a.path())
   2194         .arg(b.path())
   2195         .assert()
   2196         .success()
   2197         .stdout(predicate::str::contains("No differences"));
   2198 }
   2199 
   2200 #[test]
   2201 fn diff_exits_1() {
   2202     let a = csv_file("name,value\nAlice,100\n");
   2203     let b = csv_file("name,value\nBob,200\n");
   2204     dtdiff()
   2205         .arg(a.path())
   2206         .arg(b.path())
   2207         .assert()
   2208         .code(1);
   2209 }
   2210 
   2211 #[test]
   2212 fn keyed_diff() {
   2213     let a = csv_file("id,name\n1,Alice\n2,Bob\n");
   2214     let b = csv_file("id,name\n1,Alice\n2,Robert\n");
   2215     dtdiff()
   2216         .arg(a.path())
   2217         .arg(b.path())
   2218         .arg("--key")
   2219         .arg("id")
   2220         .assert()
   2221         .code(1)
   2222         .stdout(predicate::str::contains("Bob").or(predicate::str::contains("Robert")));
   2223 }
   2224 
   2225 #[test]
   2226 fn json_output() {
   2227     let a = csv_file("id,val\n1,a\n");
   2228     let b = csv_file("id,val\n1,b\n");
   2229     dtdiff()
   2230         .arg(a.path())
   2231         .arg(b.path())
   2232         .arg("--key")
   2233         .arg("id")
   2234         .arg("--json")
   2235         .assert()
   2236         .code(1)
   2237         .stdout(predicate::str::contains("\"modified\""));
   2238 }
   2239 
   2240 #[test]
   2241 fn format_mismatch_exits_2() {
   2242     let csv = csv_file("a,b\n1,2\n");
   2243     // Create a file with .parquet extension but CSV content - format detection
   2244     // will see it as parquet by extension, creating a mismatch
   2245     let mut pq = NamedTempFile::with_suffix(".parquet").unwrap();
   2246     write!(pq, "a,b\n1,2\n").unwrap();
   2247     pq.flush().unwrap();
    2248     // Either path is an error: the formats differ, or the Parquet reader
    2249     // fails on CSV content; dtdiff exits 2 for errors either way
    2250     dtdiff()
    2251         .arg(csv.path())
    2252         .arg(pq.path())
    2253         .assert()
    2254         .code(2);
   2254 }
   2255 ```
   2256 
   2257 - [ ] **Step 5: Run all integration tests**
   2258 
   2259 Run: `cargo test --test '*' 2>&1`
   2260 Expected: all integration tests PASS
   2261 
   2262 - [ ] **Step 6: Commit**
   2263 
   2264 ```bash
   2265 git add demo/ tests/
   2266 git commit -m "feat: add demo fixtures and integration tests for all binaries"
   2267 ```
   2268 
   2269 ---
   2270 
   2271 ### Task 17: Final Verification
   2272 
   2273 - [ ] **Step 1: Run full test suite**
   2274 
   2275 Run: `cargo test 2>&1`
   2276 Expected: all unit tests and integration tests PASS
   2277 
   2278 - [ ] **Step 2: Run clippy**
   2279 
   2280 Run: `cargo clippy 2>&1`
   2281 Expected: no errors (warnings acceptable)
   2282 
   2283 - [ ] **Step 3: Build release binaries**
   2284 
   2285 Run: `cargo build --release 2>&1`
   2286 Expected: builds successfully, produces `dtcat`, `dtfilter`, `dtdiff` in `target/release/`
   2287 
   2288 - [ ] **Step 4: Smoke test all binaries**
   2289 
   2290 ```bash
    2291 printf 'name,value\nAlice,100\nBob,200\n' > /tmp/dt_test.csv
   2292 ./target/release/dtcat /tmp/dt_test.csv
   2293 ./target/release/dtcat /tmp/dt_test.csv --schema
   2294 ./target/release/dtcat /tmp/dt_test.csv --describe
   2295 ./target/release/dtfilter /tmp/dt_test.csv --filter "value>100"
    2296 printf 'name,value\nAlice,100\nCharlie,300\n' > /tmp/dt_test2.csv
   2297 ./target/release/dtdiff /tmp/dt_test.csv /tmp/dt_test2.csv
   2298 ```
   2299 
   2300 - [ ] **Step 5: Final commit**
   2301 
   2302 ```bash
   2303 git add -A
   2304 git commit -m "chore: final cleanup and verification for v0.1"
   2305 ```