2026-03-30-dt-cli-tools.md (62470B)
# dt-cli-tools Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a Rust CLI tool suite (`dtcat`, `dtfilter`, `dtdiff`) for inspecting, querying, and comparing tabular data files across formats (CSV, Parquet, Arrow, JSON, Excel).

**Architecture:** Multi-format reader layer with automatic format detection feeds DataFrames into format-agnostic modules (formatter, filter, diff) ported from xl-cli-tools. Three binaries share the `dtcore` library crate.

**Tech Stack:** Rust 2024 edition, Polars 0.46 (DataFrame engine + CSV/Parquet/Arrow/JSON readers), calamine (Excel), clap (CLI), anyhow (errors), serde_json (JSON output).

**Source reference:** xl-cli-tools at `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/`

---

## File Structure

```
dt-cli-tools/
  Cargo.toml
  src/
    lib.rs          # pub mod declarations
    format.rs       # Format enum, magic-byte + extension detection
    reader.rs       # ReadOptions, read_file dispatch
    metadata.rs     # FileInfo, format_file_size (generalized)
    formatter.rs    # ported from xl-cli-tools (pure DataFrame formatting)
    filter.rs       # ported from xl-cli-tools (letter-based column resolution removed)
    diff.rs         # ported from xl-cli-tools (pure DataFrame comparison)
    readers/
      mod.rs        # sub-module declarations
      csv.rs        # CSV/TSV reader via Polars CsvReader
      parquet.rs    # Parquet reader via Polars ParquetReader
      arrow.rs      # Arrow IPC reader via Polars IpcReader
      json.rs       # JSON/NDJSON reader via Polars JsonReader/JsonLineReader
      excel.rs      # Excel reader via calamine (ported from xl-cli-tools reader.rs)
  src/bin/
    dtcat.rs        # view/inspect any tabular file
    dtfilter.rs     # filter/query any tabular file
  tests/
    integration/
      dtcat.rs
      dtfilter.rs
      dtdiff.rs
  demo/             # fixture files for tests
```

---

### Task 1: Project Scaffolding

**Files:**
- Create: `Cargo.toml`
- Create: `src/lib.rs`
- Create: `src/readers/mod.rs`

- [ ] **Step 1: Create Cargo.toml**

```toml
[package]
name = "dt-cli-tools"
version = "0.1.0"
edition = "2024"
description = "CLI tools for viewing, filtering, and comparing tabular data files"
license = "MIT"

[lib]
name = "dtcore"
path = "src/lib.rs"

[[bin]]
name = "dtcat"
path = "src/bin/dtcat.rs"

[[bin]]
name = "dtfilter"
path = "src/bin/dtfilter.rs"

[[bin]]
name = "dtdiff"
path = "src/bin/dtdiff.rs"

[dependencies]
polars = { version = "0.46", default-features = false, features = [
    "dtype-datetime",
    "csv",
    "parquet",
    "ipc",
    "json",
] }
calamine = "0.26"
clap = { version = "4", features = ["derive"] }
anyhow = "1"
serde_json = { version = "1", features = ["preserve_order"] }

[profile.release]
strip = true
lto = true
codegen-units = 1
panic = "abort"
opt-level = "z"

[dev-dependencies]
assert_cmd = "2"
predicates = "3"
tempfile = "3"
```

- [ ] **Step 2: Create src/lib.rs with module declarations**

```rust
pub mod diff;
pub mod filter;
pub mod format;
pub mod formatter;
pub mod metadata;
pub mod reader;
pub mod readers;
```

- [ ] **Step 3: Create src/readers/mod.rs**

```rust
pub mod arrow;
pub mod csv;
pub mod excel;
pub mod json;
pub mod parquet;
```

- [ ] **Step 4: Create placeholder files so the project compiles**

Create minimal empty-module stubs for every file declared in lib.rs and readers/mod.rs. Each stub is just an empty file or contains only `use anyhow::Result;` as needed. Also create empty `src/bin/dtcat.rs`, `src/bin/dtfilter.rs`, `src/bin/dtdiff.rs` with `fn main() {}`.
- [ ] **Step 5: Verify the project compiles**

  Run: `cargo check 2>&1`
  Expected: compiles with no errors (warnings OK at this stage)

- [ ] **Step 6: Commit**

```bash
git add Cargo.toml src/
git commit -m "feat: scaffold dt-cli-tools project structure"
```

---

### Task 2: Format Detection (`format.rs`)

**Files:**
- Create: `src/format.rs`

- [ ] **Step 1: Write tests for format detection**

```rust
// src/format.rs

use anyhow::{Result, bail};
use std::path::Path;
use std::io::Read;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Format {
    Csv,
    Tsv,
    Parquet,
    Arrow,
    Json,
    Ndjson,
    Excel,
}

impl Format {
    /// Returns true if this format and `other` belong to the same family
    /// (e.g. Csv and Tsv are both delimited text).
    pub fn same_family(&self, other: &Format) -> bool {
        matches!(
            (self, other),
            (Format::Csv, Format::Tsv)
                | (Format::Tsv, Format::Csv)
                | (Format::Json, Format::Ndjson)
                | (Format::Ndjson, Format::Json)
        ) || self == other
    }
}

// Placeholder public functions — will implement in Step 3
pub fn detect_format(path: &Path, override_fmt: Option<&str>) -> Result<Format> {
    todo!()
}

pub fn parse_format_str(s: &str) -> Result<Format> {
    todo!()
}

fn detect_by_magic(path: &Path) -> Result<Option<Format>> {
    todo!()
}

fn detect_by_extension(path: &Path) -> Result<Format> {
    todo!()
}

/// Auto-detect CSV delimiter by sampling the first few lines.
/// Returns b',' (comma), b'\t' (tab), or b';' (semicolon).
pub fn detect_csv_delimiter(path: &Path) -> Result<u8> {
    todo!()
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::io::Write;
    use tempfile::NamedTempFile;

    // -- parse_format_str --

    #[test]
    fn parse_csv() {
        assert_eq!(parse_format_str("csv").unwrap(), Format::Csv);
    }

    #[test]
    fn parse_tsv() {
        assert_eq!(parse_format_str("tsv").unwrap(), Format::Tsv);
    }

    #[test]
    fn parse_parquet() {
        assert_eq!(parse_format_str("parquet").unwrap(), Format::Parquet);
    }

    #[test]
    fn parse_arrow() {
        assert_eq!(parse_format_str("arrow").unwrap(), Format::Arrow);
    }

    #[test]
    fn parse_json() {
        assert_eq!(parse_format_str("json").unwrap(), Format::Json);
    }

    #[test]
    fn parse_ndjson() {
        assert_eq!(parse_format_str("ndjson").unwrap(), Format::Ndjson);
    }

    #[test]
    fn parse_excel() {
        assert_eq!(parse_format_str("excel").unwrap(), Format::Excel);
        assert_eq!(parse_format_str("xlsx").unwrap(), Format::Excel);
    }

    #[test]
    fn parse_unknown_is_err() {
        assert!(parse_format_str("banana").is_err());
    }

    #[test]
    fn parse_case_insensitive() {
        assert_eq!(parse_format_str("CSV").unwrap(), Format::Csv);
        assert_eq!(parse_format_str("Parquet").unwrap(), Format::Parquet);
    }

    // -- detect_by_extension --

    #[test]
    fn ext_csv() {
        assert_eq!(detect_by_extension(Path::new("data.csv")).unwrap(), Format::Csv);
    }

    #[test]
    fn ext_tsv() {
        assert_eq!(detect_by_extension(Path::new("data.tsv")).unwrap(), Format::Tsv);
        assert_eq!(detect_by_extension(Path::new("data.tab")).unwrap(), Format::Tsv);
    }

    #[test]
    fn ext_parquet() {
        assert_eq!(detect_by_extension(Path::new("data.parquet")).unwrap(), Format::Parquet);
        assert_eq!(detect_by_extension(Path::new("data.pq")).unwrap(), Format::Parquet);
    }

    #[test]
    fn ext_arrow() {
        assert_eq!(detect_by_extension(Path::new("data.arrow")).unwrap(), Format::Arrow);
        assert_eq!(detect_by_extension(Path::new("data.feather")).unwrap(), Format::Arrow);
        assert_eq!(detect_by_extension(Path::new("data.ipc")).unwrap(), Format::Arrow);
    }

    #[test]
    fn ext_json() {
        assert_eq!(detect_by_extension(Path::new("data.json")).unwrap(), Format::Json);
    }

    #[test]
    fn ext_ndjson() {
        assert_eq!(detect_by_extension(Path::new("data.ndjson")).unwrap(), Format::Ndjson);
        assert_eq!(detect_by_extension(Path::new("data.jsonl")).unwrap(), Format::Ndjson);
    }

    #[test]
    fn ext_excel() {
        assert_eq!(detect_by_extension(Path::new("data.xlsx")).unwrap(), Format::Excel);
        assert_eq!(detect_by_extension(Path::new("data.xls")).unwrap(), Format::Excel);
        assert_eq!(detect_by_extension(Path::new("data.xlsb")).unwrap(), Format::Excel);
        assert_eq!(detect_by_extension(Path::new("data.ods")).unwrap(), Format::Excel);
    }

    #[test]
    fn ext_unknown_is_err() {
        assert!(detect_by_extension(Path::new("data.txt")).is_err());
        assert!(detect_by_extension(Path::new("data")).is_err());
    }

    // -- detect_by_magic --

    #[test]
    fn magic_parquet() {
        let mut f = NamedTempFile::with_suffix(".bin").unwrap();
        f.write_all(b"PAR1some_data").unwrap();
        f.flush().unwrap();
        assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Parquet));
    }

    #[test]
    fn magic_arrow() {
        let mut f = NamedTempFile::with_suffix(".bin").unwrap();
        f.write_all(b"ARROW1some_data").unwrap();
        f.flush().unwrap();
        assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Arrow));
    }

    #[test]
    fn magic_xlsx_zip() {
        let mut f = NamedTempFile::with_suffix(".bin").unwrap();
        f.write_all(&[0x50, 0x4B, 0x03, 0x04, 0x00]).unwrap();
        f.flush().unwrap();
        assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Excel));
    }

    #[test]
    fn magic_xls_ole() {
        let mut f = NamedTempFile::with_suffix(".bin").unwrap();
        f.write_all(&[0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]).unwrap();
        f.flush().unwrap();
        assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Excel));
    }

    #[test]
    fn magic_json_array() {
        let mut f = NamedTempFile::with_suffix(".bin").unwrap();
        f.write_all(b"[{\"a\":1}]").unwrap();
        f.flush().unwrap();
        assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Json));
    }

    #[test]
    fn magic_json_object() {
        let mut f = NamedTempFile::with_suffix(".bin").unwrap();
        f.write_all(b"{\"a\":1}\n{\"a\":2}").unwrap();
        f.flush().unwrap();
        // Leading { suggests NDJSON
        assert_eq!(detect_by_magic(f.path()).unwrap(), Some(Format::Ndjson));
    }

    #[test]
    fn magic_csv_fallback_none() {
        // Plain text with commas — magic returns None, falls back to extension
        let mut f = NamedTempFile::with_suffix(".bin").unwrap();
        f.write_all(b"a,b,c\n1,2,3\n").unwrap();
        f.flush().unwrap();
        assert_eq!(detect_by_magic(f.path()).unwrap(), None);
    }

    // -- detect_format (integration) --

    #[test]
    fn override_wins() {
        // Even with .csv extension, override to parquet
        assert_eq!(
            detect_format(Path::new("data.csv"), Some("parquet")).unwrap(),
            Format::Parquet
        );
    }

    // -- same_family --

    #[test]
    fn csv_tsv_same_family() {
        assert!(Format::Csv.same_family(&Format::Tsv));
        assert!(Format::Tsv.same_family(&Format::Csv));
    }

    #[test]
    fn json_ndjson_same_family() {
        assert!(Format::Json.same_family(&Format::Ndjson));
    }

    #[test]
    fn csv_parquet_different_family() {
        assert!(!Format::Csv.same_family(&Format::Parquet));
    }

    // -- detect_csv_delimiter --

    #[test]
    fn delimiter_comma() {
        let mut f = NamedTempFile::with_suffix(".csv").unwrap();
        f.write_all(b"a,b,c\n1,2,3\n4,5,6\n").unwrap();
        f.flush().unwrap();
        assert_eq!(detect_csv_delimiter(f.path()).unwrap(), b',');
    }

    #[test]
    fn delimiter_tab() {
        let mut f = NamedTempFile::with_suffix(".tsv").unwrap();
        f.write_all(b"a\tb\tc\n1\t2\t3\n").unwrap();
        f.flush().unwrap();
        assert_eq!(detect_csv_delimiter(f.path()).unwrap(), b'\t');
    }

    #[test]
    fn delimiter_semicolon() {
        let mut f = NamedTempFile::with_suffix(".csv").unwrap();
        f.write_all(b"a;b;c\n1;2;3\n").unwrap();
        f.flush().unwrap();
        assert_eq!(detect_csv_delimiter(f.path()).unwrap(), b';');
    }
}
```

- [ ] **Step 2: Run tests to verify they fail**

  Run: `cargo test --lib format:: 2>&1 | tail -5`
  Expected: all tests FAIL (todo! panics)

- [ ] **Step 3: Implement format detection**

Replace the `todo!()` bodies with real implementations:

```rust
pub fn parse_format_str(s: &str) -> Result<Format> {
    match s.to_lowercase().as_str() {
        "csv" => Ok(Format::Csv),
        "tsv" | "tab" => Ok(Format::Tsv),
        "parquet" | "pq" => Ok(Format::Parquet),
        "arrow" | "feather" | "ipc" => Ok(Format::Arrow),
        "json" => Ok(Format::Json),
        "ndjson" | "jsonl" => Ok(Format::Ndjson),
        "excel" | "xlsx" | "xls" | "xlsb" | "ods" => Ok(Format::Excel),
        _ => bail!("unknown format '{}'. Supported: csv, tsv, parquet, arrow, json, ndjson, excel", s),
    }
}

fn detect_by_extension(path: &Path) -> Result<Format> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_lowercase());

    match ext.as_deref() {
        Some("csv") => Ok(Format::Csv),
        Some("tsv") | Some("tab") => Ok(Format::Tsv),
        Some("parquet") | Some("pq") => Ok(Format::Parquet),
        Some("arrow") | Some("feather") | Some("ipc") => Ok(Format::Arrow),
        Some("json") => Ok(Format::Json),
        Some("ndjson") | Some("jsonl") => Ok(Format::Ndjson),
        Some("xlsx") | Some("xls") | Some("xlsb") | Some("ods") => Ok(Format::Excel),
        Some(other) => bail!("unrecognized extension '.{}'. Use --format to specify.", other),
        None => bail!("no file extension. Use --format to specify the format."),
    }
}

fn detect_by_magic(path: &Path) -> Result<Option<Format>> {
    let mut file = std::fs::File::open(path)?;
    let mut buf = [0u8; 8];
    let n = file.read(&mut buf)?;
    if n < 2 {
        return Ok(None);
    }

    // Parquet: "PAR1"
    if n >= 4 && &buf[..4] == b"PAR1" {
        return Ok(Some(Format::Parquet));
    }
    // Arrow IPC: "ARROW1"
    if n >= 6 && &buf[..6] == b"ARROW1" {
        return Ok(Some(Format::Arrow));
    }
    // ZIP (xlsx, ods): PK\x03\x04
    if buf[0] == 0x50 && buf[1] == 0x4B {
        return Ok(Some(Format::Excel));
    }
    // OLE2 (xls): D0 CF 11 E0
    if n >= 4 && buf[0] == 0xD0 && buf[1] == 0xCF && buf[2] == 0x11 && buf[3] == 0xE0 {
        return Ok(Some(Format::Excel));
    }
    // JSON array starts with '['; a leading '{' suggests NDJSON.
    // Skip leading whitespace; `.copied()` yields Option<u8> so the
    // byte-literal patterns below type-check.
    let first_non_ws = buf[..n].iter().copied().find(|b| !b.is_ascii_whitespace());
    if let Some(b'[') = first_non_ws {
        return Ok(Some(Format::Json));
    }
    if let Some(b'{') = first_non_ws {
        return Ok(Some(Format::Ndjson));
    }

    // CSV/TSV: no distinctive magic bytes — return None to fall through to extension
    Ok(None)
}

pub fn detect_format(path: &Path, override_fmt: Option<&str>) -> Result<Format> {
    if let Some(fmt) = override_fmt {
        return parse_format_str(fmt);
    }
    if let Some(fmt) = detect_by_magic(path)? {
        return Ok(fmt);
    }
    detect_by_extension(path)
}

pub fn detect_csv_delimiter(path: &Path) -> Result<u8> {
    let file = std::fs::File::open(path)?;
    // Sample up to 8KB. Read raw bytes and convert lossily so a multi-byte
    // UTF-8 character split at the 8KB boundary cannot cause a read error.
    let mut bytes = Vec::new();
    file.take(8192).read_to_end(&mut bytes)?;
    let buf = String::from_utf8_lossy(&bytes);

    let lines: Vec<&str> = buf.lines().take(10).collect();
    if lines.is_empty() {
        return Ok(b',');
    }

    let delimiters = [b',', b'\t', b';'];
    let mut best = b',';
    let mut best_score = 0usize;

    for &d in &delimiters {
        let counts: Vec<usize> = lines
            .iter()
            .map(|line| line.as_bytes().iter().filter(|&&b| b == d).count())
            .collect();
        // Score: minimum count across lines (consistency matters)
        let min_count = *counts.iter().min().unwrap_or(&0);
        if min_count > best_score {
            best_score = min_count;
            best = d;
        }
    }

    Ok(best)
}
```

- [ ] **Step 4: Run tests to verify they pass**

  Run: `cargo test --lib format:: 2>&1`
  Expected: all tests PASS

- [ ] **Step 5: Commit**

```bash
git add src/format.rs
git commit -m "feat: add format detection with magic bytes and extension matching"
```

---

### Task 3: Metadata Module (`metadata.rs`)

**Files:**
- Create: `src/metadata.rs`

- [ ] **Step 1: Write metadata module with tests**

Port `format_file_size` from xl-cli-tools (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/metadata.rs`). Generalize `FileInfo` to include the detected format and work for non-Excel files.

```rust
// src/metadata.rs

use crate::format::Format;

/// Info about a single sheet (Excel) or the entire file (other formats).
#[derive(Debug, Clone)]
pub struct SheetInfo {
    pub name: String,
    pub rows: usize, // total rows including header
    pub cols: usize,
}

/// Info about the file.
#[derive(Debug)]
pub struct FileInfo {
    pub file_size: u64,
    pub format: Format,
    pub sheets: Vec<SheetInfo>,
}

/// Format file size for display: "245 KB", "1.2 MB", etc.
pub fn format_file_size(bytes: u64) -> String {
    if bytes < 1_024 {
        format!("{bytes} B")
    } else if bytes < 1_048_576 {
        format!("{:.0} KB", bytes as f64 / 1_024.0)
    } else if bytes < 1_073_741_824 {
        format!("{:.1} MB", bytes as f64 / 1_048_576.0)
    } else {
        format!("{:.1} GB", bytes as f64 / 1_073_741_824.0)
    }
}

/// Format name for a Format variant.
pub fn format_name(fmt: Format) -> &'static str {
    match fmt {
        Format::Csv => "CSV",
        Format::Tsv => "TSV",
        Format::Parquet => "Parquet",
        Format::Arrow => "Arrow IPC",
        Format::Json => "JSON",
        Format::Ndjson => "NDJSON",
        Format::Excel => "Excel",
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_format_file_size() {
        assert_eq!(format_file_size(500), "500 B");
        assert_eq!(format_file_size(2_048), "2 KB");
        assert_eq!(format_file_size(1_500_000), "1.4 MB");
    }

    #[test]
    fn test_format_name() {
        assert_eq!(format_name(Format::Csv), "CSV");
        assert_eq!(format_name(Format::Parquet), "Parquet");
        assert_eq!(format_name(Format::Excel), "Excel");
    }
}
```

- [ ] **Step 2: Run tests**

  Run: `cargo test --lib metadata:: 2>&1`
  Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add src/metadata.rs
git commit -m "feat: add metadata module with FileInfo and format_file_size"
```

---

### Task 4: Formatter Module (`formatter.rs`)

**Files:**
- Create: `src/formatter.rs`

- [ ] **Step 1: Port formatter.rs from xl-cli-tools**

Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/formatter.rs` and update imports:
- Change `use crate::metadata::{format_file_size, FileInfo, SheetInfo};` to `use crate::metadata::{format_file_size, FileInfo, SheetInfo, format_name};`
- Update `format_header` to include the format name: `# File: report.csv (245 KB) [CSV]`
- The rest of the module (format_schema, format_data_table, format_head_tail, format_csv, format_describe, all helper functions, and all tests) transfers verbatim.

Key change to `format_header`:

```rust
pub fn format_header(file_name: &str, info: &FileInfo) -> String {
    let size_str = format_file_size(info.file_size);
    let fmt_name = format_name(info.format);
    let sheet_count = info.sheets.len();
    if sheet_count > 1 {
        format!("# File: {file_name} ({size_str}) [{fmt_name}]\n# Sheets: {sheet_count}\n")
    } else {
        format!("# File: {file_name} ({size_str}) [{fmt_name}]\n")
    }
}
```

Update the `format_header` test to match the new output:

```rust
#[test]
fn test_format_header() {
    let info = FileInfo {
        file_size: 250_000,
        format: Format::Excel,
        sheets: vec![
            SheetInfo { name: "Sheet1".into(), rows: 100, cols: 5 },
            SheetInfo { name: "Sheet2".into(), rows: 50, cols: 3 },
        ],
    };
    let out = format_header("test.xlsx", &info);
    assert!(out.contains("# File: test.xlsx (244 KB) [Excel]"));
    assert!(out.contains("# Sheets: 2"));
}

#[test]
fn test_format_header_single_sheet() {
    let info = FileInfo {
        file_size: 1_000,
        format: Format::Csv,
        sheets: vec![SheetInfo { name: "data".into(), rows: 10, cols: 3 }],
    };
    let out = format_header("data.csv", &info);
    assert!(out.contains("[CSV]"));
    assert!(!out.contains("Sheets"));
}
```

All other tests (format_data_table, format_head_tail, format_schema, format_csv, format_describe, etc.) transfer verbatim from xl-cli-tools. They test pure DataFrame formatting and don't reference Excel-specific types.

- [ ] **Step 2: Run tests**

  Run: `cargo test --lib formatter:: 2>&1`
  Expected: all tests PASS

- [ ] **Step 3: Commit**

```bash
git add src/formatter.rs
git commit -m "feat: port formatter module from xl-cli-tools with format-name support"
```

---

### Task 5: Filter Module (`filter.rs`)

**Files:**
- Create: `src/filter.rs`

- [ ] **Step 1: Port filter.rs from xl-cli-tools, removing letter-based column resolution**

Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/filter.rs` and make these changes:

1. **Remove** `col_letter_to_index` function entirely.
2. **Simplify** `resolve_column` to only do name matching (exact, then case-insensitive). Remove the letter-based fallback step:

```rust
/// Resolve a column specifier to a DataFrame column name.
/// Accepts a header name (exact match first, then case-insensitive).
pub fn resolve_column(spec: &str, df_columns: &[String]) -> Result<String, String> {
    // 1. Exact header name match
    if df_columns.contains(&spec.to_string()) {
        return Ok(spec.to_string());
    }
    // 2. Case-insensitive header name match
    let spec_lower = spec.to_lowercase();
    for col in df_columns {
        if col.to_lowercase() == spec_lower {
            return Ok(col.clone());
        }
    }
    let available = df_columns.join(", ");
    Err(format!("column '{}' not found. Available columns: {}", spec, available))
}
```

3. **Remove** the letter-based tests: `resolve_by_letter`, `resolve_by_letter_lowercase`, `resolve_header_takes_priority_over_letter`, `resolve_letter_out_of_range_is_err`, `pipeline_cols_by_letter`.
4. **Keep** everything else: `parse_filter_expr`, `parse_sort_spec`, `build_filter_mask`, `apply_filters`, `filter_pipeline`, `FilterOptions`, `SortSpec`, `FilterExpr`, `FilterOp`, `apply_sort`, and all their tests.

- [ ] **Step 2: Run tests**

  Run: `cargo test --lib filter:: 2>&1`
  Expected: all tests PASS

- [ ] **Step 3: Commit**

```bash
git add src/filter.rs
git commit -m "feat: port filter module from xl-cli-tools without letter-based column resolution"
```

---

### Task 6: Diff Module (`diff.rs`)

**Files:**
- Create: `src/diff.rs`

- [ ] **Step 1: Port diff.rs verbatim from xl-cli-tools**

Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/diff.rs` as-is. Its `use crate::formatter;` import resolves identically in this crate, so no import-path changes are needed.

The entire module (SheetSource, DiffRow, CellChange, ModifiedRow, DiffResult, DiffOptions, diff_positional, diff_keyed, diff_sheets, and all tests) transfers verbatim. No changes to logic.
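For orientation, the positional strategy the ported `diff_positional` implements can be illustrated in isolation. This is a hypothetical sketch over plain string rows, not the ported code (which operates on Polars DataFrames and richer result types such as `CellChange`): rows at the same index are compared cell by cell, and trailing rows on either side become removals or additions.

```rust
// Hypothetical illustration of positional diffing. The real diff_positional
// in diff.rs works on Polars DataFrames; the row-pairing idea is the same.
#[derive(Debug, PartialEq)]
enum RowDiff {
    Added(usize),                // row index present only on the right
    Removed(usize),              // row index present only on the left
    Modified(usize, Vec<usize>), // row index + indices of changed cells
}

fn diff_positional(left: &[Vec<String>], right: &[Vec<String>]) -> Vec<RowDiff> {
    let mut out = Vec::new();
    let shared = left.len().min(right.len());
    // Compare rows paired by index, recording which cells differ.
    for i in 0..shared {
        let changed: Vec<usize> = left[i]
            .iter()
            .zip(&right[i])
            .enumerate()
            .filter(|(_, (a, b))| a != b)
            .map(|(c, _)| c)
            .collect();
        if !changed.is_empty() {
            out.push(RowDiff::Modified(i, changed));
        }
    }
    // Any surplus rows are pure removals (left) or additions (right).
    for i in shared..left.len() {
        out.push(RowDiff::Removed(i));
    }
    for i in shared..right.len() {
        out.push(RowDiff::Added(i));
    }
    out
}

fn main() {
    let left = vec![
        vec!["Alice".into(), "100".into()],
        vec!["Bob".into(), "200".into()],
    ];
    let right = vec![
        vec!["Alice".into(), "150".into()],
        vec!["Bob".into(), "200".into()],
        vec!["Carol".into(), "300".into()],
    ];
    // Row 0 is modified in its second cell; row 2 exists only on the right.
    println!("{:?}", diff_positional(&left, &right));
}
```

The keyed strategy (`diff_keyed`) differs only in how rows are paired: by key-column value instead of by index, which keeps insertions from cascading into spurious modifications.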
- [ ] **Step 2: Run tests**

  Run: `cargo test --lib diff:: 2>&1`
  Expected: all tests PASS

- [ ] **Step 3: Commit**

```bash
git add src/diff.rs
git commit -m "feat: port diff module from xl-cli-tools"
```

---

### Task 7: CSV Reader (`readers/csv.rs`)

**Files:**
- Create: `src/readers/csv.rs`

- [ ] **Step 1: Write CSV reader with tests**

```rust
// src/readers/csv.rs

use anyhow::Result;
use polars::prelude::*;
use std::path::Path;

use crate::reader::ReadOptions;

pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame> {
    let separator = opts
        .separator
        .unwrap_or_else(|| crate::format::detect_csv_delimiter(path).unwrap_or(b','));

    // finish() consumes the reader, so no `mut` binding is needed.
    let reader = CsvReadOptions::default()
        .with_has_header(true)
        .with_skip_rows(opts.skip_rows.unwrap_or(0))
        .with_parse_options(CsvParseOptions::default().with_separator(separator))
        .try_into_reader_with_file_path(Some(path.into()))?;

    let df = reader.finish()?;
    Ok(df)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::io::Write;
    use tempfile::NamedTempFile;

    fn default_opts() -> ReadOptions {
        ReadOptions::default()
    }

    #[test]
    fn read_basic_csv() {
        let mut f = NamedTempFile::with_suffix(".csv").unwrap();
        write!(f, "name,value\nAlice,100\nBob,200\n").unwrap();
        f.flush().unwrap();

        let df = read(f.path(), &default_opts()).unwrap();
        assert_eq!(df.height(), 2);
        assert_eq!(df.width(), 2);
        let names: Vec<String> = df.get_column_names().iter().map(|s| s.to_string()).collect();
        assert_eq!(names, vec!["name", "value"]);
    }

    #[test]
    fn read_tsv() {
        let mut f = NamedTempFile::with_suffix(".tsv").unwrap();
        write!(f, "a\tb\n1\t2\n3\t4\n").unwrap();
        f.flush().unwrap();

        let opts = ReadOptions { separator: Some(b'\t'), ..Default::default() };
        let df = read(f.path(), &opts).unwrap();
        assert_eq!(df.height(), 2);
        assert_eq!(df.width(), 2);
    }

    #[test]
    fn read_with_skip() {
        let mut f = NamedTempFile::with_suffix(".csv").unwrap();
        write!(f, "metadata line\nname,value\nAlice,100\n").unwrap();
        f.flush().unwrap();

        let opts = ReadOptions { skip_rows: Some(1), ..Default::default() };
        let df = read(f.path(), &opts).unwrap();
        assert_eq!(df.height(), 1);
        let names: Vec<String> = df.get_column_names().iter().map(|s| s.to_string()).collect();
        assert_eq!(names, vec!["name", "value"]);
    }
}
```

Note: this requires `ReadOptions` from `reader.rs`. Define it first (in the next step, or define a minimal version now).

- [ ] **Step 2: Define ReadOptions in reader.rs**

```rust
// src/reader.rs

/// Options that control how a file is read.
#[derive(Debug, Clone, Default)]
pub struct ReadOptions {
    pub sheet: Option<String>,    // Excel only
    pub skip_rows: Option<usize>,
    pub separator: Option<u8>,    // CSV override
}
```

- [ ] **Step 3: Run tests**

  Run: `cargo test --lib readers::csv:: 2>&1`
  Expected: all tests PASS

- [ ] **Step 4: Commit**

```bash
git add src/readers/csv.rs src/reader.rs
git commit -m "feat: add CSV/TSV reader with delimiter auto-detection"
```

---

### Task 8: Parquet Reader (`readers/parquet.rs`)

**Files:**
- Create: `src/readers/parquet.rs`

- [ ] **Step 1: Write Parquet reader with tests**

```rust
// src/readers/parquet.rs

use anyhow::Result;
use polars::prelude::*;
use std::path::Path;

use crate::reader::ReadOptions;

pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame> {
    let file = std::fs::File::open(path)?;
    let mut df = ParquetReader::new(file).finish()?;

    if let Some(skip) = opts.skip_rows {
        if skip > 0 && skip < df.height() {
            df = df.slice(skip as i64, df.height() - skip);
        }
    }

    Ok(df)
}

#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::NamedTempFile;

    fn default_opts() -> ReadOptions {
        ReadOptions::default()
    }

    #[test]
    fn read_parquet_roundtrip() {
        // Create a parquet file using Polars writer
        let s1 = Series::new("name".into(), &["Alice", "Bob"]);
        let s2 = Series::new("value".into(), &[100i64, 200]);
        let mut df = DataFrame::new(vec![s1.into_column(), s2.into_column()]).unwrap();

        let f = NamedTempFile::with_suffix(".parquet").unwrap();
        let file = std::fs::File::create(f.path()).unwrap();
        ParquetWriter::new(file).finish(&mut df).unwrap();

        let result = read(f.path(), &default_opts()).unwrap();
        assert_eq!(result.height(), 2);
        assert_eq!(result.width(), 2);
    }
}
```

- [ ] **Step 2: Run tests**

  Run: `cargo test --lib readers::parquet:: 2>&1`
  Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add src/readers/parquet.rs
git commit -m "feat: add Parquet reader"
```

---

### Task 9: Arrow IPC Reader (`readers/arrow.rs`)

**Files:**
- Create: `src/readers/arrow.rs`

- [ ] **Step 1: Write Arrow IPC reader with tests**

```rust
// src/readers/arrow.rs

use anyhow::Result;
use polars::prelude::*;
use std::path::Path;

use crate::reader::ReadOptions;

pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame> {
    let file = std::fs::File::open(path)?;
    let mut df = IpcReader::new(file).finish()?;

    if let Some(skip) = opts.skip_rows {
        if skip > 0 && skip < df.height() {
            df = df.slice(skip as i64, df.height() - skip);
        }
    }

    Ok(df)
}

#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::NamedTempFile;

    fn default_opts() -> ReadOptions {
        ReadOptions::default()
    }

    #[test]
    fn read_arrow_roundtrip() {
        let s1 = Series::new("x".into(), &[1i64, 2, 3]);
        let mut df = DataFrame::new(vec![s1.into_column()]).unwrap();

        let f = NamedTempFile::with_suffix(".arrow").unwrap();
        let file = std::fs::File::create(f.path()).unwrap();
        IpcWriter::new(file).finish(&mut df).unwrap();

        let result = read(f.path(), &default_opts()).unwrap();
        assert_eq!(result.height(), 3);
        assert_eq!(result.width(), 1);
    }
}
```

- [ ] **Step 2: Run tests**

  Run: `cargo test --lib readers::arrow:: 2>&1`
  Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add src/readers/arrow.rs
git commit -m "feat: add Arrow IPC reader"
```

---

### Task 10: JSON/NDJSON Reader (`readers/json.rs`)

**Files:**
- Create: `src/readers/json.rs`

- [ ] **Step 1: Write JSON reader with tests**

```rust
// src/readers/json.rs

use anyhow::Result;
use polars::prelude::*;
use std::path::Path;

use crate::format::Format;
use crate::reader::ReadOptions;

pub fn read(path: &Path, format: Format, opts: &ReadOptions) -> Result<DataFrame> {
    let file = std::fs::File::open(path)?;

    let mut df = match format {
        Format::Ndjson => JsonLineReader::new(file).finish()?,
        _ => {
            // JSON array format
            JsonReader::new(file).finish()?
1084 } 1085 }; 1086 1087 if let Some(skip) = opts.skip_rows { 1088 if skip > 0 && skip < df.height() { 1089 df = df.slice(skip as i64, df.height() - skip); 1090 } 1091 } 1092 1093 Ok(df) 1094 } 1095 1096 #[cfg(test)] 1097 mod tests { 1098 use super::*; 1099 use std::io::Write; 1100 use tempfile::NamedTempFile; 1101 1102 fn default_opts() -> ReadOptions { 1103 ReadOptions::default() 1104 } 1105 1106 #[test] 1107 fn read_json_array() { 1108 let mut f = NamedTempFile::with_suffix(".json").unwrap(); 1109 write!(f, r#"[{{"name":"Alice","value":1}},{{"name":"Bob","value":2}}]"#).unwrap(); 1110 f.flush().unwrap(); 1111 1112 let df = read(f.path(), Format::Json, &default_opts()).unwrap(); 1113 assert_eq!(df.height(), 2); 1114 } 1115 1116 #[test] 1117 fn read_ndjson() { 1118 let mut f = NamedTempFile::with_suffix(".ndjson").unwrap(); 1119 write!(f, "{}\n{}\n", 1120 r#"{{"name":"Alice","value":1}}"#, 1121 r#"{{"name":"Bob","value":2}}"#, 1122 ).unwrap(); 1123 f.flush().unwrap(); 1124 1125 let df = read(f.path(), Format::Ndjson, &default_opts()).unwrap(); 1126 assert_eq!(df.height(), 2); 1127 } 1128 } 1129 ``` 1130 1131 Note: Polars JSON reader API may vary. If `JsonReader` is not directly available, use `JsonFormat::Json` with the appropriate reader. The implementer should check the exact Polars 0.46 API and adapt. 
Alternative approach if `JsonReader` doesn't exist:

```rust
// Alternative using LazyFrame
let lf = LazyJsonLineReader::new(path).finish()?;
let df = lf.collect()?;
```

- [ ] **Step 2: Run tests**

Run: `cargo test --lib readers::json:: 2>&1`
Expected: PASS (adapt if the Polars API differs)

- [ ] **Step 3: Commit**

```bash
git add src/readers/json.rs
git commit -m "feat: add JSON/NDJSON reader"
```

---

### Task 11: Excel Reader (`readers/excel.rs`)

**Files:**
- Create: `src/readers/excel.rs`

- [ ] **Step 1: Port Excel reader from xl-cli-tools**

Copy `/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/reader.rs` to `src/readers/excel.rs` and adapt:

1. Change the public API from `read_sheet(path, sheet_name)` / `read_sheet_with_skip(path, sheet_name, skip)` to a single function matching the reader pattern:

```rust
pub fn read(path: &Path, opts: &ReadOptions) -> Result<DataFrame>
```

This function:
- Resolves the sheet name from `opts.sheet` (defaults to the first sheet).
- Applies `opts.skip_rows`.
- Reuses `range_to_dataframe_skip`, `infer_column_type`, `build_series` verbatim from xl-cli-tools.

2. Also provide a helper for reading Excel metadata (sheet names, dimensions):

```rust
pub fn read_excel_info(path: &Path) -> Result<Vec<SheetInfo>>
```

This reuses the calamine-based metadata reading from xl-cli-tools `metadata.rs:read_file_info`, but returns just the sheet list.

3. Port all internal functions (`infer_column_type`, `build_series`, `range_to_dataframe_skip`) and unit tests verbatim.
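The sheet resolution in item 1 can be sketched independently of calamine. This is a hypothetical std-only helper (`resolve_sheet` is not a name from xl-cli-tools; the real logic lives inside `readers/excel.rs::read`), combining item 1's "defaults to the first sheet" rule with dtcat's `--sheet` contract of "name or 0-based index":

```rust
/// Resolve a `--sheet` selector against the workbook's sheet names.
/// Accepts an exact name or a 0-based index; defaults to the first sheet.
/// An exact name match wins, so a sheet literally named "1" is reachable.
fn resolve_sheet(names: &[String], selector: Option<&str>) -> Result<String, String> {
    match selector {
        None => names
            .first()
            .cloned()
            .ok_or_else(|| "workbook has no sheets".to_string()),
        Some(s) => {
            if let Some(name) = names.iter().find(|n| n.as_str() == s) {
                return Ok(name.clone());
            }
            if let Ok(idx) = s.parse::<usize>() {
                if let Some(name) = names.get(idx) {
                    return Ok(name.clone());
                }
            }
            Err(format!("sheet not found: {s}"))
        }
    }
}
```

Trying the name match before the index interpretation keeps numeric sheet names addressable; the index path only fires when no name matches.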
- [ ] **Step 2: Run tests**

Run: `cargo test --lib readers::excel:: 2>&1`
Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add src/readers/excel.rs
git commit -m "feat: port Excel reader from xl-cli-tools"
```

---

### Task 12: Reader Dispatch (`reader.rs`)

**Files:**
- Modify: `src/reader.rs` (already has ReadOptions from Task 7)

- [ ] **Step 1: Add read_file dispatch function**

```rust
// Add to src/reader.rs

use anyhow::Result;
use polars::prelude::*;
use std::path::Path;

use crate::format::Format;
use crate::metadata::{FileInfo, SheetInfo};
use crate::readers;

/// Options that control how a file is read.
/// (Defined in Task 7; repeated here for context.)
#[derive(Debug, Clone, Default)]
pub struct ReadOptions {
    pub sheet: Option<String>,   // Excel only
    pub skip_rows: Option<usize>,
    pub separator: Option<u8>,   // CSV override
}

/// Read a file into a DataFrame, dispatching to the appropriate reader.
pub fn read_file(path: &Path, format: Format, opts: &ReadOptions) -> Result<DataFrame> {
    match format {
        Format::Csv | Format::Tsv => readers::csv::read(path, opts),
        Format::Parquet => readers::parquet::read(path, opts),
        Format::Arrow => readers::arrow::read(path, opts),
        Format::Json | Format::Ndjson => readers::json::read(path, format, opts),
        Format::Excel => readers::excel::read(path, opts),
    }
}

/// Read file metadata: size, format, and sheet info (for Excel).
pub fn read_file_info(path: &Path, format: Format) -> Result<FileInfo> {
    let file_size = std::fs::metadata(path)?.len();

    let sheets = match format {
        Format::Excel => readers::excel::read_excel_info(path)?,
        _ => vec![], // Non-Excel formats have no sheet concept
    };

    Ok(FileInfo {
        file_size,
        format,
        sheets,
    })
}
```

- [ ] **Step 2: Write integration test for dispatch**

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use std::io::Write;
    use tempfile::NamedTempFile;

    #[test]
    fn dispatch_csv() {
        let mut f = NamedTempFile::with_suffix(".csv").unwrap();
        write!(f, "a,b\n1,2\n").unwrap();
        f.flush().unwrap();

        let df = read_file(f.path(), Format::Csv, &ReadOptions::default()).unwrap();
        assert_eq!(df.height(), 1);
    }

    #[test]
    fn dispatch_parquet() {
        use polars::prelude::*;
        let s = Series::new("x".into(), &[1i64, 2]);
        let mut df = DataFrame::new(vec![s.into_column()]).unwrap();

        let f = NamedTempFile::with_suffix(".parquet").unwrap();
        let file = std::fs::File::create(f.path()).unwrap();
        ParquetWriter::new(file).finish(&mut df).unwrap();

        let result = read_file(f.path(), Format::Parquet, &ReadOptions::default()).unwrap();
        assert_eq!(result.height(), 2);
    }
}
```

- [ ] **Step 3: Run tests**

Run: `cargo test --lib reader:: 2>&1`
Expected: PASS

- [ ] **Step 4: Commit**

```bash
git add src/reader.rs
git commit -m "feat: add reader dispatch with read_file and read_file_info"
```

---

### Task 13: dtcat Binary (`src/bin/dtcat.rs`)

**Files:**
- Create: `src/bin/dtcat.rs`

- [ ] **Step 1: Implement dtcat**

Adapt from xl-cli-tools `xlcat.rs` (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xlcat.rs`). Key changes:
1. Replace `xlcat::` imports with `dtcore::`.
2. Add `--format` flag for format override.
3. Replace Excel-specific file validation with format detection.
4. Add `--info` flag (show file metadata).
5. For non-Excel files, skip sheet resolution (no sheets concept). For Excel files with multiple sheets, keep the same listing behavior.
6. Use `reader::read_file` and `reader::read_file_info` instead of `metadata::read_file_info` + `reader::read_sheet`.

```rust
// src/bin/dtcat.rs

use dtcore::format;
use dtcore::formatter;
use dtcore::metadata::{self, SheetInfo};
use dtcore::reader::{self, ReadOptions};

use anyhow::Result;
use clap::Parser;
use polars::prelude::*;
use std::path::PathBuf;
use std::process;

#[derive(Parser, Debug)]
#[command(name = "dtcat", about = "View tabular data files in the terminal")]
struct Cli {
    /// Path to data file
    file: PathBuf,

    /// Override format detection (csv, tsv, parquet, arrow, json, ndjson, excel)
    #[arg(long)]
    format: Option<String>,

    /// Select sheet by name or 0-based index (Excel only)
    #[arg(long)]
    sheet: Option<String>,

    /// Skip first N rows
    #[arg(long)]
    skip: Option<usize>,

    /// Show column names and types only
    #[arg(long)]
    schema: bool,

    /// Show summary statistics
    #[arg(long)]
    describe: bool,

    /// Show first N rows (default: 50)
    #[arg(long)]
    head: Option<usize>,

    /// Show last N rows
    #[arg(long)]
    tail: Option<usize>,

    /// Output as CSV instead of markdown table
    #[arg(long)]
    csv: bool,

    /// Show file metadata (size, format, shape, sheets)
    #[arg(long)]
    info: bool,
}

#[derive(Debug)]
struct ArgError(String);

impl std::fmt::Display for ArgError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for ArgError {}

fn run(cli: &Cli) -> Result<()> {
    // Validate flag combinations
    if cli.schema && cli.describe {
        return Err(ArgError("--schema and --describe are mutually exclusive".into()).into());
    }

    // Detect format
    let fmt = format::detect_format(&cli.file, cli.format.as_deref())?;

    // Read file info
    let file_info = reader::read_file_info(&cli.file, fmt)?;
    let file_name = cli.file
        .file_name()
        .map(|s| s.to_string_lossy().to_string())
        .unwrap_or_else(|| cli.file.display().to_string());

    // --info mode
    if cli.info {
        let mut out = formatter::format_header(&file_name, &file_info);
        out.push_str(&format!("Format: {}\n", metadata::format_name(fmt)));
        if !file_info.sheets.is_empty() {
            for sheet in &file_info.sheets {
                out.push_str(&format!("  {}: {} rows x {} cols\n", sheet.name, sheet.rows, sheet.cols));
            }
        }
        print!("{out}");
        return Ok(());
    }

    // Build read options
    let read_opts = ReadOptions {
        sheet: cli.sheet.clone(),
        skip_rows: cli.skip,
        separator: None,
    };

    // For Excel with multiple sheets and no --sheet flag: list sheets
    if fmt == format::Format::Excel && file_info.sheets.len() > 1 && cli.sheet.is_none() {
        let has_row_flags = cli.head.is_some() || cli.tail.is_some() || cli.csv;
        if has_row_flags {
            return Err(ArgError(
                "Multiple sheets found. Use --sheet <name> to select one.".into(),
            ).into());
        }

        // List all sheets with schemas
        let mut out = formatter::format_header(&file_name, &file_info);
        out.push('\n');
        for sheet in &file_info.sheets {
            let opts = ReadOptions { sheet: Some(sheet.name.clone()), ..read_opts.clone() };
            let df = reader::read_file(&cli.file, fmt, &opts)?;
            if sheet.rows == 0 && sheet.cols == 0 {
                out.push_str(&formatter::format_empty_sheet(sheet));
            } else {
                out.push_str(&formatter::format_schema(sheet, &df));
            }
            out.push('\n');
        }
        out.push_str("Use --sheet <name> to view a specific sheet.\n");
        print!("{out}");
        return Ok(());
    }

    // Read the data
    let df = reader::read_file(&cli.file, fmt, &read_opts)?;

    // Build a SheetInfo for display
    let sheet_info = if let Some(si) = file_info.sheets.first() {
        si.clone()
    } else {
        SheetInfo {
            name: file_name.clone(),
            rows: df.height() + 1, // +1 for header
            cols: df.width(),
        }
    };

    // Render output
    render_output(cli, &file_name, &file_info, &sheet_info, &df)
}

fn render_output(
    cli: &Cli,
    file_name: &str,
    file_info: &metadata::FileInfo,
    sheet_info: &SheetInfo,
    df: &DataFrame,
) -> Result<()> {
    if cli.csv {
        let selected = select_rows(cli, df);
        print!("{}", formatter::format_csv(&selected));
        return Ok(());
    }

    let mut out = formatter::format_header(file_name, file_info);
    out.push('\n');

    if df.height() == 0 {
        out.push_str(&formatter::format_schema(sheet_info, df));
        out.push_str("\n(no data rows)\n");
        print!("{out}");
        return Ok(());
    }

    if cli.schema {
        out.push_str(&formatter::format_schema(sheet_info, df));
    } else if cli.describe {
        out.push_str(&formatter::format_schema(sheet_info, df));
        out.push_str(&formatter::format_describe(df));
    } else {
        out.push_str(&formatter::format_schema(sheet_info, df));
        out.push('\n');
        out.push_str(&format_data_selection(cli, df));
    }

    print!("{out}");
    Ok(())
}

fn format_data_selection(cli: &Cli, df: &DataFrame) -> String {
    let total = df.height();

    if cli.head.is_some() || cli.tail.is_some() {
        let head_n = cli.head.unwrap_or(0);
        let tail_n = cli.tail.unwrap_or(0);
        if head_n + tail_n >= total || (head_n == 0 && tail_n == 0) {
            return formatter::format_data_table(df);
        }
        if cli.tail.is_none() {
            return formatter::format_data_table(&df.head(Some(head_n)));
        }
        if cli.head.is_none() {
            return formatter::format_data_table(&df.tail(Some(tail_n)));
        }
        return formatter::format_head_tail(df, head_n, tail_n);
    }

    // Default: <=50 rows show all, >50 show head 25 + tail 25
    if total <= 50 {
        formatter::format_data_table(df)
    } else {
        formatter::format_head_tail(df, 25, 25)
    }
}

fn select_rows(cli: &Cli, df: &DataFrame) -> DataFrame {
    let total = df.height();

    if cli.head.is_some() || cli.tail.is_some() {
        let head_n = cli.head.unwrap_or(0);
        let tail_n = cli.tail.unwrap_or(0);
        if head_n + tail_n >= total || (head_n == 0 && tail_n == 0) {
            return df.clone();
        }
        if cli.tail.is_none() {
            return df.head(Some(head_n));
        }
        if cli.head.is_none() {
            return df.tail(Some(tail_n));
        }
        let head_df = df.head(Some(head_n));
        let tail_df = df.tail(Some(tail_n));
        return head_df.vstack(&tail_df).unwrap_or_else(|_| df.clone());
    }

    if total <= 50 {
        df.clone()
    } else {
        let h = df.head(Some(25));
        let t = df.tail(Some(25));
        h.vstack(&t).unwrap_or_else(|_| df.clone())
    }
}

fn main() {
    let cli = Cli::parse();
    if let Err(err) = run(&cli) {
        if err.downcast_ref::<ArgError>().is_some() {
            eprintln!("dtcat: {err}");
            process::exit(2);
        }
        eprintln!("dtcat: {err}");
        process::exit(1);
    }
}
```

- [ ] **Step 2: Verify it compiles**

Run: `cargo build --bin dtcat 2>&1`
Expected: compiles successfully

- [ ] **Step 3: Manual smoke test**

Create a quick test CSV and run dtcat on it (`printf` rather than `echo`, since plain `echo` does not expand `\n`):

```bash
printf 'name,value\nAlice,100\nBob,200\n' > /tmp/test.csv
cargo run --bin dtcat -- /tmp/test.csv
cargo run --bin dtcat -- /tmp/test.csv --schema
cargo run --bin dtcat -- /tmp/test.csv --csv
```

- [ ] **Step 4: Commit**

```bash
git add src/bin/dtcat.rs
git commit -m "feat: add dtcat binary for viewing tabular data files"
```

---

### Task 14: dtfilter Binary (`src/bin/dtfilter.rs`)

**Files:**
- Create: `src/bin/dtfilter.rs`

- [ ] **Step 1: Implement dtfilter**

Adapt from xl-cli-tools `xlfilter.rs` (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xlfilter.rs`). Key changes:

1. Replace `xlcat::` imports with `dtcore::`.
2. Add `--format` flag.
3. Replace Excel-specific file reading with format detection + `reader::read_file`.
4. Remove Excel-specific sheet resolution for non-Excel formats.
5. Change `--cols` description to "Select columns by name" (no letter-based resolution).
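Since `parse_filter_expr` is ported rather than rewritten, its exact grammar comes from xl-cli-tools `filter.rs`. For orientation only, here is a std-only sketch of the shape such a parser plausibly takes; the `Op` enum, the `(column, op, value)` return type, and the full operator set are assumptions extrapolated from the two documented examples (`Amount>1000`, `Name~john`), not the ported API:

```rust
/// Hypothetical filter-expression result: which column, which
/// comparison, and the raw right-hand-side value.
#[derive(Debug, PartialEq)]
enum Op { Eq, Ne, Gt, Ge, Lt, Le, Contains }

fn parse_filter_expr(s: &str) -> Result<(String, Op, String), String> {
    // Multi-character operators must be tried before their one-character
    // prefixes, or "value>=150" would parse as ">" with value "=150".
    const OPS: [(&str, Op); 7] = [
        (">=", Op::Ge), ("<=", Op::Le), ("!=", Op::Ne),
        (">", Op::Gt), ("<", Op::Lt), ("=", Op::Eq), ("~", Op::Contains),
    ];
    for (tok, op) in OPS {
        if let Some(pos) = s.find(tok) {
            let col = s[..pos].trim();
            let val = s[pos + tok.len()..].trim();
            if col.is_empty() || val.is_empty() {
                return Err(format!("malformed filter: {s}"));
            }
            return Ok((col.to_string(), op, val.to_string()));
        }
    }
    Err(format!("no operator in filter: {s}"))
}
```

The dtfilter code below only relies on the error type being `String` (it wraps parse failures in `ArgError` for exit code 2); everything else about the expression grammar is the ported implementation's business.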
```rust
// src/bin/dtfilter.rs

use std::path::PathBuf;
use std::process;

use anyhow::Result;
use clap::Parser;

use dtcore::filter::{parse_filter_expr, parse_sort_spec, filter_pipeline, FilterOptions};
use dtcore::format;
use dtcore::formatter;
use dtcore::reader::{self, ReadOptions};

#[derive(Parser)]
#[command(
    name = "dtfilter",
    about = "Filter and query tabular data files",
    version
)]
struct Args {
    /// Path to data file
    file: PathBuf,

    /// Override format detection
    #[arg(long)]
    format: Option<String>,

    /// Select sheet (Excel only)
    #[arg(long)]
    sheet: Option<String>,

    /// Skip first N rows
    #[arg(long)]
    skip: Option<usize>,

    /// Select columns by name (comma-separated)
    #[arg(long)]
    columns: Option<String>,

    /// Filter expressions (e.g., Amount>1000, Name~john)
    #[arg(long = "filter")]
    filters: Vec<String>,

    /// Sort specification (e.g., Amount:desc)
    #[arg(long)]
    sort: Option<String>,

    /// Max rows in output (applied after filter)
    #[arg(long)]
    limit: Option<usize>,

    /// First N rows (applied before filter)
    #[arg(long)]
    head: Option<usize>,

    /// Last N rows (applied before filter)
    #[arg(long)]
    tail: Option<usize>,

    /// Output as CSV
    #[arg(long)]
    csv: bool,
}

#[derive(Debug)]
struct ArgError(String);

impl std::fmt::Display for ArgError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for ArgError {}

fn run(args: Args) -> Result<()> {
    if !args.file.exists() {
        return Err(ArgError(format!("file not found: {}", args.file.display())).into());
    }
    if args.head.is_some() && args.tail.is_some() {
        return Err(ArgError("--head and --tail are mutually exclusive".into()).into());
    }

    let fmt = format::detect_format(&args.file, args.format.as_deref())?;

    let read_opts = ReadOptions {
        sheet: args.sheet,
        skip_rows: args.skip,
        separator: None,
    };

    let df = reader::read_file(&args.file, fmt, &read_opts)?;

    if df.height() == 0 {
        eprintln!("0 rows");
        println!("(no data rows)");
        return Ok(());
    }

    // Parse filter expressions
    let filters: Vec<_> = args.filters
        .iter()
        .map(|s| parse_filter_expr(s))
        .collect::<Result<Vec<_>, _>>()
        .map_err(|e| anyhow::anyhow!(ArgError(e)))?;

    let sort = args.sort
        .as_deref()
        .map(parse_sort_spec)
        .transpose()
        .map_err(|e| anyhow::anyhow!(ArgError(e)))?;

    let cols = args.columns.map(|s| {
        s.split(',').map(|c| c.trim().to_string()).collect::<Vec<_>>()
    });

    let opts = FilterOptions {
        filters,
        cols,
        sort,
        limit: args.limit,
        head: args.head,
        tail: args.tail,
    };

    let result = filter_pipeline(df, &opts)?;

    eprintln!("{} rows", result.height());

    if result.height() == 0 {
        println!("{}", formatter::format_data_table(&result));
    } else if args.csv {
        print!("{}", formatter::format_csv(&result));
    } else {
        println!("{}", formatter::format_data_table(&result));
    }

    Ok(())
}

fn main() {
    let args = Args::parse();
    if let Err(err) = run(args) {
        if err.downcast_ref::<ArgError>().is_some() {
            eprintln!("dtfilter: {err}");
            process::exit(2);
        }
        eprintln!("dtfilter: {err}");
        process::exit(1);
    }
}
```

- [ ] **Step 2: Verify it compiles**

Run: `cargo build --bin dtfilter 2>&1`
Expected: compiles

- [ ] **Step 3: Commit**

```bash
git add src/bin/dtfilter.rs
git commit -m "feat: add dtfilter binary for filtering tabular data files"
```

---

### Task 15: dtdiff Binary (`src/bin/dtdiff.rs`)

**Files:**
- Create: `src/bin/dtdiff.rs`

- [ ] **Step 1: Implement dtdiff**

Adapt from xl-cli-tools `xldiff.rs` (`/Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xldiff.rs`). Key changes:

1. Replace `xlcat::` imports with `dtcore::`.
2. Add `--format` flag.
3. **Same-format enforcement**: detect the format of both files and error if they differ (Csv/Tsv are the same family and allowed).
4. Replace Excel-specific reading with format detection + `reader::read_file`.
5. Remove letter-based column resolution in key/cols parsing (use name-only `resolve_column`).
6. Port all output formatters (format_text, format_markdown, format_json, format_csv) and tests verbatim.

Exit codes: 0 = no differences, 1 = differences found, 2 = error.

```rust
// src/bin/dtdiff.rs
// Adapted from xl-cli-tools xldiff.rs

use std::io::IsTerminal;
use std::path::PathBuf;
use std::process;

use anyhow::{Result, bail};
use clap::Parser;
use serde_json::{Map, Value, json};

use dtcore::diff::{DiffOptions, DiffResult, SheetSource};
use dtcore::format;
use dtcore::formatter;
use dtcore::reader::{self, ReadOptions};

#[derive(Parser)]
#[command(
    name = "dtdiff",
    about = "Compare two tabular data files and show differences",
    version
)]
struct Args {
    /// First file
    file_a: PathBuf,

    /// Second file
    file_b: PathBuf,

    /// Override format detection (both files must be this format)
    #[arg(long)]
    format: Option<String>,

    /// Select sheet (Excel only)
    #[arg(long)]
    sheet: Option<String>,

    /// Key column(s) for matched comparison (comma-separated names)
    #[arg(long)]
    key: Option<String>,

    /// Numeric tolerance for float comparisons (default: 1e-10)
    #[arg(long, default_value = "1e-10")]
    tolerance: f64,

    /// Output as JSON
    #[arg(long)]
    json: bool,

    /// Output as CSV
    #[arg(long)]
    csv: bool,

    /// Disable colored output
    #[arg(long)]
    no_color: bool,
}

fn run(args: Args) -> Result<()> {
    if !args.file_a.exists() {
        bail!("file not found: {}", args.file_a.display());
    }
    if !args.file_b.exists() {
        bail!("file not found: {}", args.file_b.display());
    }

    // Detect formats
    let fmt_a = format::detect_format(&args.file_a, args.format.as_deref())?;
    let fmt_b = format::detect_format(&args.file_b, args.format.as_deref())?;

    // Same-format enforcement (Csv/Tsv are same family)
    if !fmt_a.same_family(&fmt_b) {
        bail!(
            "format mismatch: {} is {:?} but {} is {:?}. Both files must be the same format.",
            args.file_a.display(), fmt_a,
            args.file_b.display(), fmt_b,
        );
    }

    let read_opts = ReadOptions {
        sheet: args.sheet.clone(),
        skip_rows: None,
        separator: None,
    };

    let df_a = reader::read_file(&args.file_a, fmt_a, &read_opts)?;
    let df_b = reader::read_file(&args.file_b, fmt_b, &read_opts)?;

    // Resolve key columns
    let key_columns: Vec<String> = if let Some(ref key_str) = args.key {
        key_str.split(',').map(|s| s.trim().to_string()).collect()
    } else {
        vec![]
    };

    let file_name_a = args.file_a.file_name()
        .map(|s| s.to_string_lossy().to_string())
        .unwrap_or_else(|| args.file_a.display().to_string());
    let file_name_b = args.file_b.file_name()
        .map(|s| s.to_string_lossy().to_string())
        .unwrap_or_else(|| args.file_b.display().to_string());

    let source_a = SheetSource {
        file_name: file_name_a,
        sheet_name: args.sheet.clone().unwrap_or_else(|| "data".into()),
    };
    let source_b = SheetSource {
        file_name: file_name_b,
        sheet_name: args.sheet.unwrap_or_else(|| "data".into()),
    };

    let opts = DiffOptions {
        key_columns,
        tolerance: Some(args.tolerance),
    };

    let result = dtcore::diff::diff_sheets(&df_a, &df_b, &opts, source_a, source_b)?;

    let use_color = !args.no_color && std::io::stdout().is_terminal();

    // Format output
    let output = if args.json {
        format_json(&result)
    } else if args.csv {
        format_csv_output(&result)
    } else {
        format_text(&result, use_color)
    };

    print!("{}", output);

    if result.has_differences() {
        process::exit(1);
    }

    Ok(())
}

// Port format_text, format_json, format_csv_output (renamed from format_csv to avoid
// collision with the flag) verbatim from xl-cli-tools xldiff.rs.
// Include format_row_inline, csv_quote, csv_row helpers.

// [Full implementations copied from xl-cli-tools xldiff.rs - see source at
// /Users/loulou/Dropbox/projects_claude/xl-cli-tool/src/bin/xldiff.rs lines 141-455]
// The only rename: format_csv -> format_csv_output to avoid name collision.

fn main() {
    let args = Args::parse();
    if let Err(err) = run(args) {
        eprintln!("dtdiff: {err}");
        process::exit(2);
    }
}
```

The output formatter functions (`format_text`, `format_json`, `format_csv_output`, `format_row_inline`, `csv_quote`, `csv_row`) and their tests transfer verbatim from xldiff.rs lines 141-827. Copy them into dtdiff.rs.
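`Format::same_family` is new in dt-cli-tools (xl-cli-tools had no multi-format concept), so it is not covered by the ported code. A minimal sketch of the intended semantics — CSV and TSV compare as one delimited-text family, every other format only matches itself; the real method belongs on the `Format` enum in `src/format.rs`, and the reference-taking signature here simply mirrors the `fmt_a.same_family(&fmt_b)` call site above:

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format { Csv, Tsv, Parquet, Arrow, Json, Ndjson, Excel }

impl Format {
    /// CSV and TSV count as one family; all other formats must
    /// match exactly for dtdiff to proceed.
    fn same_family(&self, other: &Format) -> bool {
        self == other
            || matches!(
                (self, other),
                (Format::Csv | Format::Tsv, Format::Csv | Format::Tsv)
            )
    }
}
```

Keeping the family logic in one method means a future decision (e.g. whether Json/Ndjson should also compare as a family) changes a single `matches!` arm rather than the binary.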
- [ ] **Step 2: Verify it compiles**

Run: `cargo build --bin dtdiff 2>&1`
Expected: compiles

- [ ] **Step 3: Commit**

```bash
git add src/bin/dtdiff.rs
git commit -m "feat: add dtdiff binary for comparing tabular data files"
```

---

### Task 16: Demo Fixtures and Integration Tests

**Files:**
- Create: `demo/` fixture files
- Create: `tests/integration/dtcat.rs`
- Create: `tests/integration/dtfilter.rs`
- Create: `tests/integration/dtdiff.rs`

- [ ] **Step 1: Create demo fixture files**

Create small test files in `demo/`:

```bash
# demo/sample.csv
echo 'name,value,category
Alice,100,A
Bob,200,B
Charlie,300,A
Diana,400,B
Eve,500,A' > demo/sample.csv

# demo/sample.tsv
printf 'name\tvalue\tcategory\nAlice\t100\tA\nBob\t200\tB\n' > demo/sample.tsv
```

Also create Parquet and Arrow fixtures programmatically in a test helper, or via a small Rust script.
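If shell escaping is a concern across platforms, the CSV/TSV fixtures can equally come from a small std-only Rust helper. This is a sketch — `write_demo_fixtures` is a hypothetical name, and the Parquet/Arrow fixtures would still need the polars writers already shown in the unit tests:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write the demo CSV and TSV fixtures into `dir`, creating it if needed.
fn write_demo_fixtures(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    fs::write(
        dir.join("sample.csv"),
        "name,value,category\nAlice,100,A\nBob,200,B\nCharlie,300,A\nDiana,400,B\nEve,500,A\n",
    )?;
    fs::write(
        dir.join("sample.tsv"),
        "name\tvalue\tcategory\nAlice\t100\tA\nBob\t200\tB\n",
    )?;
    Ok(())
}
```

Embedding the fixture bytes in Rust string literals sidesteps `echo` vs `printf` escape differences entirely, and the same helper can back the integration tests.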
- [ ] **Step 2: Write dtcat integration tests**

```rust
// tests/integration/dtcat.rs

use assert_cmd::Command;
use predicates::prelude::*;
use std::io::Write;
use tempfile::NamedTempFile;

fn dtcat() -> Command {
    Command::cargo_bin("dtcat").unwrap()
}

fn csv_file(content: &str) -> NamedTempFile {
    let mut f = NamedTempFile::with_suffix(".csv").unwrap();
    write!(f, "{}", content).unwrap();
    f.flush().unwrap();
    f
}

#[test]
fn shows_csv_data() {
    let f = csv_file("name,value\nAlice,100\nBob,200\n");
    dtcat()
        .arg(f.path())
        .assert()
        .success()
        .stdout(predicate::str::contains("Alice"))
        .stdout(predicate::str::contains("Bob"));
}

#[test]
fn schema_flag() {
    let f = csv_file("name,value\nAlice,100\n");
    dtcat()
        .arg(f.path())
        .arg("--schema")
        .assert()
        .success()
        .stdout(predicate::str::contains("Column"))
        .stdout(predicate::str::contains("Type"));
}

#[test]
fn csv_output_flag() {
    let f = csv_file("name,value\nAlice,100\n");
    dtcat()
        .arg(f.path())
        .arg("--csv")
        .assert()
        .success()
        .stdout(predicate::str::contains("name,value"));
}

#[test]
fn head_flag() {
    let f = csv_file("x\n1\n2\n3\n4\n5\n");
    dtcat()
        .arg(f.path())
        .arg("--head")
        .arg("2")
        .assert()
        .success()
        .stdout(predicate::str::contains("1"))
        .stdout(predicate::str::contains("2"));
}

#[test]
fn nonexistent_file_exits_1() {
    dtcat()
        .arg("/tmp/does_not_exist.csv")
        .assert()
        .failure();
}

#[test]
fn format_override() {
    // A .txt file read as CSV
    let mut f = NamedTempFile::with_suffix(".txt").unwrap();
    write!(f, "a,b\n1,2\n").unwrap();
    f.flush().unwrap();

    dtcat()
        .arg(f.path())
        .arg("--format")
        .arg("csv")
        .assert()
        .success()
        .stdout(predicate::str::contains("1"));
}
```

- [ ] **Step 3: Write dtfilter integration tests**

```rust
// tests/integration/dtfilter.rs

use assert_cmd::Command;
use predicates::prelude::*;
use std::io::Write;
use tempfile::NamedTempFile;

fn dtfilter() -> Command {
    Command::cargo_bin("dtfilter").unwrap()
}

fn csv_file(content: &str) -> NamedTempFile {
    let mut f = NamedTempFile::with_suffix(".csv").unwrap();
    write!(f, "{}", content).unwrap();
    f.flush().unwrap();
    f
}

#[test]
fn filter_eq() {
    let f = csv_file("name,value\nAlice,100\nBob,200\n");
    dtfilter()
        .arg(f.path())
        .arg("--filter")
        .arg("name=Alice")
        .assert()
        .success()
        .stdout(predicate::str::contains("Alice"))
        .stdout(predicate::str::contains("Bob").not());
}

#[test]
fn filter_gt() {
    let f = csv_file("name,value\nAlice,100\nBob,200\nCharlie,300\n");
    dtfilter()
        .arg(f.path())
        .arg("--filter")
        .arg("value>150")
        .assert()
        .success()
        .stdout(predicate::str::contains("Bob"))
        .stdout(predicate::str::contains("Charlie"));
}

#[test]
fn sort_desc() {
    let f = csv_file("name,value\nAlice,100\nBob,200\n");
    dtfilter()
        .arg(f.path())
        .arg("--sort")
        .arg("value:desc")
        .assert()
        .success();
}

#[test]
fn columns_select() {
    let f = csv_file("name,value,extra\nAlice,100,x\n");
    dtfilter()
        .arg(f.path())
        .arg("--columns")
        .arg("name,value")
        .assert()
        .success()
        .stdout(predicate::str::contains("name"))
        .stdout(predicate::str::contains("extra").not());
}

#[test]
fn csv_output() {
    let f = csv_file("name,value\nAlice,100\n");
    dtfilter()
        .arg(f.path())
        .arg("--csv")
        .assert()
        .success()
        .stdout(predicate::str::contains("name,value"));
}
```

- [ ] **Step 4: Write dtdiff integration tests**

```rust
// tests/integration/dtdiff.rs

use assert_cmd::Command;
use predicates::prelude::*;
use std::io::Write;
use tempfile::NamedTempFile;

fn dtdiff() -> Command {
    Command::cargo_bin("dtdiff").unwrap()
}

fn csv_file(content: &str) -> NamedTempFile {
    let mut f = NamedTempFile::with_suffix(".csv").unwrap();
    write!(f, "{}", content).unwrap();
    f.flush().unwrap();
    f
}

#[test]
fn no_diff_exits_0() {
    let a = csv_file("name,value\nAlice,100\n");
    let b = csv_file("name,value\nAlice,100\n");
    dtdiff()
        .arg(a.path())
        .arg(b.path())
        .assert()
        .success()
        .stdout(predicate::str::contains("No differences"));
}

#[test]
fn diff_exits_1() {
    let a = csv_file("name,value\nAlice,100\n");
    let b = csv_file("name,value\nBob,200\n");
    dtdiff()
        .arg(a.path())
        .arg(b.path())
        .assert()
        .code(1);
}

#[test]
fn keyed_diff() {
    let a = csv_file("id,name\n1,Alice\n2,Bob\n");
    let b = csv_file("id,name\n1,Alice\n2,Robert\n");
    dtdiff()
        .arg(a.path())
        .arg(b.path())
        .arg("--key")
        .arg("id")
        .assert()
        .code(1)
        .stdout(predicate::str::contains("Bob").or(predicate::str::contains("Robert")));
}

#[test]
fn json_output() {
    let a = csv_file("id,val\n1,a\n");
    let b = csv_file("id,val\n1,b\n");
    dtdiff()
        .arg(a.path())
        .arg(b.path())
        .arg("--key")
        .arg("id")
        .arg("--json")
        .assert()
        .code(1)
        .stdout(predicate::str::contains("\"modified\""));
}

#[test]
fn format_mismatch_exits_2() {
    let csv = csv_file("a,b\n1,2\n");
    // Create a file with a .parquet extension but CSV content - format
    // detection will see it as parquet by extension, creating a mismatch
    let mut pq = NamedTempFile::with_suffix(".parquet").unwrap();
    write!(pq, "a,b\n1,2\n").unwrap();
    pq.flush().unwrap();
    // This should fail because the formats differ (or the parquet reader
    // fails on CSV content)
    dtdiff()
        .arg(csv.path())
        .arg(pq.path())
        .assert()
        .failure();
}
```

- [ ] **Step 5: Run all integration tests**

Run: `cargo test --test '*' 2>&1`
Expected: all integration tests PASS

- [ ] **Step 6: Commit**

```bash
git add demo/ tests/
git commit -m "feat: add demo fixtures and integration tests for all binaries"
```

---

### Task 17: Final Verification

- [ ] **Step 1: Run full test suite**

Run: `cargo test 2>&1`
Expected: all unit tests and integration tests PASS

- [ ] **Step 2: Run clippy**

Run: `cargo clippy 2>&1`
Expected: no errors (warnings acceptable)

- [ ] **Step 3: Build release binaries**

Run: `cargo build --release 2>&1`
Expected: builds successfully, produces `dtcat`, `dtfilter`, `dtdiff` in `target/release/`

- [ ] **Step 4: Smoke test all binaries**

```bash
printf 'name,value\nAlice,100\nBob,200\n' > /tmp/dt_test.csv
./target/release/dtcat /tmp/dt_test.csv
./target/release/dtcat /tmp/dt_test.csv --schema
./target/release/dtcat /tmp/dt_test.csv --describe
./target/release/dtfilter /tmp/dt_test.csv --filter "value>100"
printf 'name,value\nAlice,100\nCharlie,300\n' > /tmp/dt_test2.csv
./target/release/dtdiff /tmp/dt_test.csv /tmp/dt_test2.csv
```

- [ ] **Step 5: Final commit**

```bash
git add -A
git commit -m "chore: final cleanup and verification for v0.1"
```