TigerFetch.jl

Download TIGER/Line shapefiles from the US Census Bureau

commit 9ac6f4400be14cd36eee7a7e8b467275d8ef7e10
parent 30075e42953d9c96d3fdbe2371871e9842cac057
Author: Erik Loualiche <eloualiche@users.noreply.github.com>
Date:   Tue, 24 Mar 2026 22:36:42 -0500

Merge pull request #6 from LouLouLibs/claude/robust-validation

Robust validation, retry logic, docs, and v0.3.0
Diffstat:
M .gitignore | 11 ++++++++++-
M Project.toml | 2 +-
M docs/src/man/cli.md | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M docs/src/man/julia.md | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M src/TigerFetch.jl | 3 ++-
M src/artifacts.jl | 2 ++
M src/cli.jl | 3 ++-
M src/download.jl | 54 ++++++++++++++++++++++++++++++++++++++++++++++++------
M src/geotypes.jl | 2 ++
M src/main.jl | 3 +++
M src/reference.jl | 49 +++++++++++++++++++++++++++++++++----------------
A test/TestRoutines/validation.jl | 229 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/UnitTests/assets.jl | 2 ++
M test/UnitTests/downloads.jl | 93 +++++++++++++++++++++++++++++++++++--------------------------------------------
M test/runtests.jl | 3 +++
15 files changed, 572 insertions(+), 79 deletions(-)
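
Note on the retry logic in this commit: the `download_with_retry` helper added to `src/download.jl` sleeps `RETRY_BASE_DELAY * 2^(attempt - 1)` seconds after a failed attempt and rethrows on the final one. A minimal sketch of the resulting wait schedule follows. The two constants are copied from the diff; `backoff_delays` is a hypothetical helper written only for illustration and is not part of TigerFetch.

```julia
# Constants as they appear in the diff of src/download.jl
const MAX_RETRIES = 3
const RETRY_BASE_DELAY = 2  # seconds

# Hypothetical helper (not in TigerFetch): the delays slept between attempts.
# The last attempt rethrows instead of sleeping, so there are
# max_retries - 1 delays in total.
backoff_delays(max_retries::Int=MAX_RETRIES; base::Int=RETRY_BASE_DELAY) =
    [base * 2^(attempt - 1) for attempt in 1:(max_retries - 1)]

println(backoff_delays())   # default of 3 attempts: [2, 4]
println(backoff_delays(5))  # if 5 attempts were allowed: [2, 4, 8, 16]
```

With the defaults, a download that fails on every attempt sleeps 2 s and then 4 s (6 s in total) before the error propagates to the caller.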

diff --git a/.gitignore b/.gitignore @@ -2,7 +2,6 @@ **/Manifest.toml .DS_Store tmp -setup_readme.md **/sandbox.* assets @@ -13,4 +12,14 @@ assets # doc docs/build/ docs/node_modules + +setup_readme.md +# -------------------------------------------------------------------------------------------------- + + +# -------------------------------------------------------------------------------------------------- +# AI +.claude +claude.md +todo.md # -------------------------------------------------------------------------------------------------- diff --git a/Project.toml b/Project.toml @@ -1,7 +1,7 @@ name = "TigerFetch" uuid = "59408d0a-a903-460e-b817-a5586a16527e" authors = ["Erik Loualiche"] -version = "0.2.0" +version = "0.3.0" [deps] Comonicon = "863f3e99-da2a-4334-8734-de3dacbe5542" diff --git a/docs/src/man/cli.md b/docs/src/man/cli.md @@ -1 +1,81 @@ # Command line interface + +TigerFetch provides a `tigerfetch` command that downloads TIGER/Line shapefiles directly from your terminal. + +## Installation + +You need a working Julia installation. Then install the CLI: + +```bash +julia -e 'using Pkg; Pkg.add("TigerFetch"); using TigerFetch; TigerFetch.comonicon_install()' +``` + +The binary is placed at `~/.julia/bin/tigerfetch`. Make sure `~/.julia/bin` is on your `PATH`. 
+ +## Usage + +```bash +tigerfetch <type> [year] [--state STATE] [--county COUNTY] [--output DIR] [--force] +``` + +### Arguments + +| Argument | Description | Default | +|----------|-------------|---------| +| `type` | Geography type (see table below) | required | +| `year` | Data year | 2024 | + +### Options + +| Option | Description | Default | +|--------|-------------|---------| +| `--state` | State name, abbreviation, or FIPS code | all states | +| `--county` | County name or FIPS code (requires `--state`) | all counties | +| `--output` | Output directory | current directory | +| `--force` | Re-download existing files | false | + +### Geography types + +| Type | Scope | Description | +|------|-------|-------------| +| `state` | national | State boundaries | +| `county` | national | County boundaries | +| `cbsa` | national | Core Based Statistical Areas | +| `urbanarea` | national | Urban Areas | +| `zipcode` | national | ZIP Code Tabulation Areas | +| `metrodivision` | national | Metropolitan Divisions | +| `primaryroads` | national | Primary roads | +| `rails` | national | Railroads | +| `cousub` | state | County subdivisions | +| `tract` | state | Census tracts | +| `place` | state | Places | +| `consolidatedcity` | state | Consolidated cities | +| `primarysecondaryroads` | state | Primary and secondary roads | +| `areawater` | county | Area hydrography | +| `linearwater` | county | Linear hydrography | +| `road` | county | Roads | + +### Examples + +```bash +# Download state boundaries +tigerfetch state --output tmp + +# Download county subdivisions for Illinois +tigerfetch cousub --state IL --output tmp + +# Download area water for all of Minnesota +tigerfetch areawater --state "Minnesota" --output tmp + +# Download roads for a single county +tigerfetch road --state "MN" --county "Hennepin" --output tmp + +# Force re-download of 2023 data +tigerfetch county 2023 --output tmp --force +``` + +### Scope behavior + +- **National** types (`state`, `county`, 
etc.) download a single file covering the entire US. The `--state` and `--county` flags are ignored. +- **State** types (`cousub`, `tract`, etc.) download one file per state. Omitting `--state` downloads all states. +- **County** types (`areawater`, `road`, etc.) download one file per county. Omitting `--county` downloads all counties for the given state(s). diff --git a/docs/src/man/julia.md b/docs/src/man/julia.md @@ -1 +1,114 @@ -# Using the function from within julia +# Using TigerFetch from Julia + +The main entry point is the [`tigerdownload`](@ref) function. + +## Installation + +`TigerFetch.jl` is registered in the [`loulouJL`](https://github.com/LouLouLibs/loulouJL) registry: + +```julia +using Pkg, LocalRegistry +pkg"registry add https://github.com/LouLouLibs/loulouJL.git" +Pkg.add("TigerFetch") +``` + +Or install directly from GitHub: +```julia +import Pkg; Pkg.add(url="https://github.com/louloulibs/TigerFetch.jl") +``` + +## Basic usage + +```julia +using TigerFetch + +# Download state boundaries (national file) +tigerdownload("state") + +# Download county subdivisions for California +tigerdownload("cousub"; state="CA") + +# Download census tracts for a specific state using FIPS code +tigerdownload("tract"; state="27", output="/path/to/data") + +# Download area water for a specific county +tigerdownload("areawater"; state="MN", county="Hennepin", output="tmp") + +# Force re-download of existing files +tigerdownload("county"; output="tmp", force=true) +``` + +## Geography scopes + +Geographies are organized into three scopes that determine how files are downloaded: + +### National scope +A single file covers the entire US. The `state` and `county` arguments are ignored. 
+ +```julia +tigerdownload("state") # one file: tl_2024_us_state.zip +tigerdownload("county") # one file: tl_2024_us_county.zip +tigerdownload("cbsa") # Core Based Statistical Areas +tigerdownload("urbanarea") # Urban Areas +tigerdownload("zipcode") # ZIP Code Tabulation Areas +tigerdownload("metrodivision") # Metropolitan Divisions +tigerdownload("primaryroads") # Primary roads +tigerdownload("rails") # Railroads +``` + +### State scope +One file per state. Omit `state` to download all states. + +```julia +tigerdownload("cousub"; state="IL") # County subdivisions +tigerdownload("tract"; state="CA") # Census tracts +tigerdownload("place"; state="27") # Places (using FIPS) +tigerdownload("primarysecondaryroads"; state="MN") # Primary & secondary roads +tigerdownload("consolidatedcity"; state="KS") # Consolidated cities +``` + +### County scope +One file per county. Requires `state`; omit `county` to download all counties in the state. + +```julia +tigerdownload("areawater"; state="MN", county="Hennepin") # Area hydrography +tigerdownload("linearwater"; state="MI") # All MI counties +tigerdownload("road"; state="MN", county="Hennepin") # Roads +``` + +## State and county identifiers + +States can be specified by name, abbreviation, or FIPS code: +```julia +tigerdownload("tract"; state="Minnesota") # full name +tigerdownload("tract"; state="MN") # abbreviation +tigerdownload("tract"; state="27") # FIPS code +``` + +Counties can be specified by name or FIPS code (common suffixes like "County" or "Parish" are stripped automatically): +```julia +tigerdownload("road"; state="MN", county="Hennepin") # name +tigerdownload("road"; state="MN", county="Hennepin County") # also works +tigerdownload("road"; state="MN", county="053") # FIPS code +``` + +## Options + +| Keyword | Type | Default | Description | +|---------|------|---------|-------------| +| `state` | `String` | `""` | State identifier | +| `county` | `String` | `""` | County identifier (requires `state`) | +| 
`output` | `String` | `pwd()` | Output directory | +| `force` | `Bool` | `false` | Re-download existing files | +| `verbose` | `Bool` | `false` | Print detailed progress | + +## Working with downloaded files + +TigerFetch downloads ZIP archives containing shapefiles (`.shp`, `.dbf`, `.shx`, `.prj`, etc.). Use [Shapefile.jl](https://github.com/JuliaGeo/Shapefile.jl) to read them: + +```julia +using Shapefile, DataFrames +df = Shapefile.Table("tl_2024_us_county.zip") |> DataFrame +``` + +See the [demo](@ref "Drawing a simple map") for a complete mapping example with CairoMakie. diff --git a/src/TigerFetch.jl b/src/TigerFetch.jl @@ -1,5 +1,6 @@ - +# ABOUTME: Main module entry point for TigerFetch.jl +# ABOUTME: Loads submodules and exports the tigerdownload function for Census shapefile retrieval # -------------------------------------------------------------------------------------------------- module TigerFetch diff --git a/src/artifacts.jl b/src/artifacts.jl @@ -1,3 +1,5 @@ +# ABOUTME: Artifact management for bundled reference data (state/county FIPS lists) +# ABOUTME: Handles artifact installation, path resolution, and reference data file access using Pkg.Artifacts """ diff --git a/src/cli.jl b/src/cli.jl @@ -1,4 +1,5 @@ - +# ABOUTME: Command-line interface for TigerFetch via Comonicon.jl +# ABOUTME: Provides the tigerfetch CLI command that wraps the tigerdownload Julia function # ------------------------------------------------------------------------------------------------- """ diff --git a/src/download.jl b/src/download.jl @@ -1,3 +1,5 @@ +# ABOUTME: Download logic for Census TIGER/Line shapefiles +# ABOUTME: Dispatches on geography scope (national, state, county) to construct URLs and manage downloads # -------------------------------------------------------------------------------------------------- macro conditional_log(verbose, level, message, params...) 
@@ -13,6 +15,37 @@ end
 # --------------------------------------------------------------------------------------------------
+const MAX_RETRIES = 3
+const RETRY_BASE_DELAY = 2 # seconds
+
+"""
+    download_with_retry(url, output_path; max_retries=MAX_RETRIES)
+
+Download a file with exponential backoff retry on transient failures.
+Returns the output path on success, rethrows on persistent failure.
+"""
+function download_with_retry(url::String, output_path::String; max_retries::Int=MAX_RETRIES)
+    for attempt in 1:max_retries
+        try
+            Downloads.download(url, output_path)
+            return output_path
+        catch e
+            e isa InterruptException && rethrow(e)
+            if attempt == max_retries
+                rethrow(e)
+            end
+            delay = RETRY_BASE_DELAY * 2^(attempt - 1)
+            @warn "Download attempt $attempt/$max_retries failed, retrying in $(delay)s" url=url
+            sleep(delay)
+            # Clean up partial download before retry
+            isfile(output_path) && rm(output_path; force=true)
+        end
+    end
+end
+# --------------------------------------------------------------------------------------------------
+
+
+# --------------------------------------------------------------------------------------------------
 # National scope (States, Counties nationally)
 function download_shapefile(
     geo::T;
@@ -34,7 +67,7 @@ function download_shapefile(
     try
         @conditional_log verbose "Downloading $(description(geo_type))" url=url
         mkpath(output_dir)
-        Downloads.download(url, output_path)
+        download_with_retry(url, output_path)
         return output_path
     catch e
         @error "Download failed" exception=e
@@ -73,11 +106,14 @@ function download_shapefile(
     geo_type = typeof(geo)
     base_url = "https://www2.census.gov/geo/tiger/TIGER$(geo.year)/$(tiger_name(geo_type))/"
+    n_states = length(states_to_process)
+
     try
         # Process each state with total interrupt by user ...
- for state_info in states_to_process + for (i, state_info) in enumerate(states_to_process) fips = state_info[2] state_name = state_info[3] + n_states > 1 && @info "[$i/$n_states] $(state_name)" filename = "tl_$(geo.year)_$(fips)_$(lowercase(tiger_name(T))).zip" url = base_url * filename output_path = joinpath(output_dir, filename) @@ -89,7 +125,7 @@ function download_shapefile( try @conditional_log verbose "Downloading" state=state_name url=url - Downloads.download(url, output_path) + download_with_retry(url, output_path) catch e if e isa InterruptException # Re-throw interrupt to be caught by outer try block @@ -143,7 +179,9 @@ function download_shapefile( # Track failures failed_downloads = String[] - for state_info in states_to_process + n_states = length(states_to_process) + + for (si, state_info) in enumerate(states_to_process) state_fips = state_info[2] state_name = state_info[3] @@ -159,9 +197,13 @@ function download_shapefile( counties = [county_info] end - for county_info in counties + n_counties = length(counties) + state_label = n_states > 1 ? 
"[state $si/$n_states] " : "" + + for (ci, county_info) in enumerate(counties) county_fips = county_info[3] # Assuming similar structure to state_info county_name = county_info[4] + n_counties > 1 && @info "$(state_label)$(state_name): [$ci/$n_counties] $(county_name)" filename = "tl_$(geo.year)_$(state_fips)$(county_fips)_$(lowercase(tiger_name(geo))).zip" url = "https://www2.census.gov/geo/tiger/TIGER$(geo.year)/$(tiger_name(geo))/" * filename @@ -175,7 +217,7 @@ function download_shapefile( try @conditional_log verbose "Downloading" state=state_name county=county_name url=url mkpath(output_dir) - Downloads.download(url, output_path) + download_with_retry(url, output_path) catch e push!(failed_downloads, "$(state_name) - $(county_name)") @error "Download failed" state=state_name county=county_name exception=e diff --git a/src/geotypes.jl b/src/geotypes.jl @@ -1,3 +1,5 @@ +# ABOUTME: Type system for Census TIGER/Line geographies (state, county, tract, etc.) +# ABOUTME: Defines abstract hierarchy (National/State/County) and concrete types with metadata # -------------------------------------------------------------------------------------------------- # Abstract base type diff --git a/src/main.jl b/src/main.jl @@ -1,3 +1,6 @@ +# ABOUTME: Main user-facing tigerdownload function and geography type registry +# ABOUTME: Validates inputs, creates geography instances, and dispatches to download methods + # -------------------------------------------------------------------------------------------------- const GEOGRAPHY_TYPES = Dict( "state" => State, diff --git a/src/reference.jl b/src/reference.jl @@ -1,14 +1,37 @@ -function get_state_list()::Vector{Vector{String}} +# ABOUTME: State and county reference data lookup from bundled FIPS lists +# ABOUTME: Provides standardization of state/county identifiers (name, abbreviation, FIPS code) + +# Module-level cache for parsed reference data +const _STATE_LIST_CACHE = Ref{Vector{Vector{String}}}() +const _COUNTY_LIST_CACHE = 
Ref{Vector{Vector{AbstractString}}}() +const _CACHE_INITIALIZED = Ref(false) + +function _ensure_cache() + _CACHE_INITIALIZED[] && return paths = get_reference_data() + + # Parse state list state_file = paths["state"] + _STATE_LIST_CACHE[] = readlines(state_file) |> + l -> split.(l, "|") |> + l -> map(s -> String.(s[ [1,2,4] ]), l) |> + l -> l[2:end] |> + unique + + # Parse county list + county_file = paths["county"] + _COUNTY_LIST_CACHE[] = readlines(county_file) |> + ( l -> split.(l, "|") ) |> + ( l -> map(s -> String.(s[ [1,2,3,5] ]), l) ) |> + ( l -> l[2:end] ) - # we do not need to load CSV so we read the file by hand - state_list = readlines(state_file) |> - l -> split.(l, "|") |> # split by vertical bar - l -> map(s -> String.(s[ [1,2,4] ]), l) |> # select some columns - l -> l[2:end] # remove the header + _CACHE_INITIALIZED[] = true + return +end - return unique(state_list) +function get_state_list()::Vector{Vector{String}} + _ensure_cache() + return _STATE_LIST_CACHE[] end # Takes a string input (handles names and abbreviations) @@ -36,20 +59,14 @@ standardize_state_input(::Nothing) = nothing # ------------------------------------------------------------------------------------------------- function get_county_list(state=nothing)::Vector{Vector{AbstractString}} - paths = get_reference_data() # Remove TigerFetch. 
prefix since we're inside the module - county_file = paths["county"] - - # we do not need to load CSV so we read the file by hand - county_list = readlines(county_file) |> - ( l -> split.(l, "|") ) |> # split by vertical bar - ( l -> map(s -> String.(s[ [1,2,3,5] ]), l) ) |> # select some columns - ( l -> l[2:end] ) # remove the header + _ensure_cache() + county_list = _COUNTY_LIST_CACHE[] if isnothing(state) return county_list elseif !isnothing(tryparse(Int, state)) # then its the fips return unique(filter(l -> l[2] == state, county_list)) - else # then its the abbreviation state name + else # then its the abbreviation state name return unique(filter(l -> l[1] == state, county_list)) end diff --git a/test/TestRoutines/validation.jl b/test/TestRoutines/validation.jl @@ -0,0 +1,228 @@ +# ABOUTME: Census shapefile validation utilities for robust testing +# ABOUTME: Provides functions to validate downloaded Census files without hardcoded hashes + +""" + ToleranceParams + +Parameters defining acceptable ranges for Census file validation. +""" +struct ToleranceParams + min_size_mb::Float64 + max_size_mb::Float64 + min_features::Union{Int, Nothing} + max_features::Union{Int, Nothing} + geographic_bounds::Union{NamedTuple, Nothing} +end + +""" + validate_file_basics(filepath::String, tolerance::ToleranceParams) -> Bool + +Basic file validation: existence, ZIP format, and reasonable size. 
+"""
+function validate_file_basics(filepath::String, tolerance::ToleranceParams)
+    # File must exist
+    !isfile(filepath) && return false
+
+    # Must be a ZIP file
+    !endswith(filepath, ".zip") && return false
+
+    # Size check
+    size_mb = stat(filepath).size / (1024^2)
+    if size_mb < tolerance.min_size_mb || size_mb > tolerance.max_size_mb
+        @warn "File size $(round(size_mb, digits=2))MB outside expected range [$(tolerance.min_size_mb), $(tolerance.max_size_mb)]MB"
+        return false
+    end
+
+    return true
+end
+
+"""
+    validate_zip_structure(filepath::String) -> Bool
+
+Validate ZIP file can be opened and contains expected shapefile components.
+"""
+function validate_zip_structure(filepath::String)
+    try
+        # Try to read ZIP file structure using unzip -t (test archive)
+        result = run(pipeline(`unzip -t $filepath`, stdout=devnull, stderr=devnull))
+
+        # Get file listing
+        zip_contents = readchomp(`unzip -l $filepath`)
+
+        # Check for essential shapefile components
+        has_shp = occursin(r"\.shp$"m, zip_contents)
+        has_dbf = occursin(r"\.dbf$"m, zip_contents)
+        has_shx = occursin(r"\.shx$"m, zip_contents)
+
+        if !has_shp || !has_dbf || !has_shx
+            @warn "Missing essential shapefile components: .shp=$has_shp, .dbf=$has_dbf, .shx=$has_shx"
+            return false
+        end
+
+        return true
+    catch e
+        @warn "Failed to validate ZIP structure: $e"
+        return false
+    end
+end
+
+"""
+    validate_census_filename(filepath::String, expected_pattern::Regex) -> Bool
+
+Validate Census filename follows expected TIGER/Line naming convention.
+"""
+function validate_census_filename(filepath::String, expected_pattern::Regex)
+    filename = basename(filepath)
+    if !occursin(expected_pattern, filename)
+        @warn "Filename '$filename' doesn't match expected pattern $expected_pattern"
+        return false
+    end
+    return true
+end
+
+"""
+    count_dbf_records(filepath::String) -> Union{Int, Nothing}
+
+Extract the record count from the .dbf header inside a shapefile ZIP.
+The DBF header stores the record count as a UInt32 at bytes 4-7 (little-endian).
+"""
+function count_dbf_records(filepath::String)
+    try
+        # Find the .dbf filename inside the ZIP
+        zip_listing = readchomp(`unzip -l $filepath`)
+        dbf_match = match(r"(\S+\.dbf)$"m, zip_listing)
+        isnothing(dbf_match) && return nothing
+
+        dbf_name = dbf_match.captures[1]
+
+        # Extract just the .dbf to a temp dir and read its header
+        tmp = mktempdir()
+        run(pipeline(`unzip -j -o $filepath $dbf_name -d $tmp`, stdout=devnull, stderr=devnull))
+        dbf_path = joinpath(tmp, basename(dbf_name))
+
+        header = open(dbf_path) do io
+            read(io, 8)
+        end
+        rm(tmp; recursive=true, force=true)
+
+        length(header) < 8 && return nothing
+
+        # Record count is at bytes 4-7 (0-indexed), little-endian UInt32
+        n_records = reinterpret(UInt32, header[5:8])[1]
+        return Int(n_records)
+    catch e
+        @warn "Could not read DBF record count: $e"
+        return nothing
+    end
+end
+
+"""
+    validate_feature_count(filepath::String, tolerance::ToleranceParams) -> Bool
+
+Validate that the number of features (DBF records) falls within the expected range.
+Skips validation if tolerance bounds are nothing.
+""" +function validate_feature_count(filepath::String, tolerance::ToleranceParams) + isnothing(tolerance.min_features) && isnothing(tolerance.max_features) && return true + + n_features = count_dbf_records(filepath) + if isnothing(n_features) + @warn "Could not determine feature count, skipping validation" + return true + end + + if !isnothing(tolerance.min_features) && n_features < tolerance.min_features + @warn "Feature count $n_features below minimum $(tolerance.min_features)" + return false + end + if !isnothing(tolerance.max_features) && n_features > tolerance.max_features + @warn "Feature count $n_features above maximum $(tolerance.max_features)" + return false + end + + @info "Feature count: $n_features (expected [$(tolerance.min_features), $(tolerance.max_features)])" + return true +end + +""" + validate_census_file_integrity(filepath::String, file_type::String, tolerance::ToleranceParams) -> Bool + +Comprehensive validation of a Census shapefile using the hybrid approach. +""" +function validate_census_file_integrity(filepath::String, file_type::String, tolerance::ToleranceParams) + @info "Validating $file_type file: $(basename(filepath))" + + # 1. Basic file validation + if !validate_file_basics(filepath, tolerance) + @error "Basic file validation failed" + return false + end + + # 2. ZIP structure validation + if !validate_zip_structure(filepath) + @error "ZIP structure validation failed" + return false + end + + # 3. Feature count validation + if !validate_feature_count(filepath, tolerance) + @error "Feature count validation failed" + return false + end + + # 4. 
Filename pattern validation + expected_patterns = Dict( + "state" => r"tl_\d{4}_us_state\.zip$", + "county" => r"tl_\d{4}_us_county\.zip$", + "cbsa" => r"tl_\d{4}_us_cbsa\.zip$", + "urbanarea" => r"tl_\d{4}_us_uac20\.zip$", + "zipcode" => r"tl_\d{4}_us_zcta520\.zip$", + "metrodivision" => r"tl_\d{4}_us_metdiv\.zip$", + "rails" => r"tl_\d{4}_us_rails\.zip$", + "primaryroads" => r"tl_\d{4}_us_primaryroads\.zip$", + "cousub" => r"tl_\d{4}_\d{2}_cousub\.zip$", + "tract" => r"tl_\d{4}_\d{2}_tract\.zip$", + "place" => r"tl_\d{4}_\d{2}_place\.zip$", + "consolidatedcity" => r"tl_\d{4}_\d{2}_concity\.zip$", + "primarysecondaryroads" => r"tl_\d{4}_\d{2}_prisecroads\.zip$", + "areawater" => r"tl_\d{4}_\d{5}_areawater\.zip$", + "linearwater" => r"tl_\d{4}_\d{5}_linearwater\.zip$", + "road" => r"tl_\d{4}_\d{5}_roads\.zip$" + ) + + if haskey(expected_patterns, file_type) + if !validate_census_filename(filepath, expected_patterns[file_type]) + @error "Filename validation failed" + return false + end + end + + @info "✓ File validation passed for $file_type" + return true +end + +# Define tolerance parameters for different geography types +# Updated based on actual 2024 Census data observations +const TOLERANCE_PARAMS = Dict( + # National geographies + "state" => ToleranceParams(5.0, 50.0, 45, 65, nothing), # 56 in 2024 + "county" => ToleranceParams(75.0, 150.0, 2900, 3500, nothing), # 3235 in 2024 + "cbsa" => ToleranceParams(25.0, 60.0, 800, 1100, nothing), # 935 in 2024 + "urbanarea" => ToleranceParams(60.0, 100.0, 2200, 3200, nothing), # 2644 in 2024 + "zipcode" => ToleranceParams(400.0, 700.0, 28000, 38000, nothing), # ~33k ZIP codes + "metrodivision" => ToleranceParams(0.5, 5.0, 25, 50, nothing), # 37 in 2024 + "rails" => ToleranceParams(25.0, 150.0, nothing, nothing, nothing), # Rails was ~32MB + "primaryroads" => ToleranceParams(25.0, 150.0, nothing, nothing, nothing), # Primary roads was ~43MB + + # State geographies (examples for typical states) + "cousub" => 
ToleranceParams(5.0, 50.0, 50, 2000, nothing), # Varies widely by state + "tract" => ToleranceParams(5.0, 100.0, 200, 5000, nothing), # MN tract was ~7.15MB + "place" => ToleranceParams(2.0, 50.0, 100, 3000, nothing), # MN place was ~3.96MB + "consolidatedcity" => ToleranceParams(0.01, 10.0, 0, 50, nothing), # KS was ~0.02MB + "primarysecondaryroads" => ToleranceParams(2.0, 200.0, nothing, nothing, nothing), # MN was ~3.9MB + + # County geographies (examples for typical counties) + "areawater" => ToleranceParams(0.1, 50.0, 0, 5000, nothing), # Varies by geography + "linearwater" => ToleranceParams(0.01, 100.0, 0, 10000, nothing), # Varies widely by county + "road" => ToleranceParams(0.1, 200.0, 100, 50000, nothing) # Varies widely by county +)+ \ No newline at end of file diff --git a/test/UnitTests/assets.jl b/test/UnitTests/assets.jl @@ -1,3 +1,5 @@ +# ABOUTME: Tests for artifact installation and reference data integrity +# ABOUTME: Validates bundled FIPS state/county files, their structure, and data processing functions @testset "Asset Installation Tests" begin @testset "Artifact Configuration" begin diff --git a/test/UnitTests/downloads.jl b/test/UnitTests/downloads.jl @@ -1,3 +1,9 @@ +# ABOUTME: Download tests for all TIGER/Line geography types (national, state, county) +# ABOUTME: Uses hybrid validation (size/structure checks) to verify downloaded Census shapefiles + +# Include validation utilities +include("../TestRoutines/validation.jl") + @testset "Download Tests" begin @@ -9,52 +15,44 @@ # Download the states shapefiles tigerdownload("state", 2024; state="MN", county="", output=test_dir, force=true) state_file_download = joinpath(test_dir, "tl_2024_us_state.zip") - # stat(state_file_download) - @test bytes2hex(SHA.sha256(read(state_file_download))) == - "e30bad8922b177b5991bf8606d3d95de8f5f0b4bab25848648de53b25f72c17f" + + @test validate_census_file_integrity(state_file_download, "state", TOLERANCE_PARAMS["state"]) tigerdownload("county", 2024; 
state="MN", county="Hennepin", output=test_dir, force=true) county_file_download = joinpath(test_dir, "tl_2024_us_county.zip") - # stat(county_file_download) - @test bytes2hex(SHA.sha256(read(county_file_download))) == - "a344b72be48f2448df1ae1757098d94571b96556d3b9253cf9d6ee77bce8a0b4" + + @test validate_census_file_integrity(county_file_download, "county", TOLERANCE_PARAMS["county"]) tigerdownload("cbsa", 2024; output=test_dir, force=true) cbsa_file_download = joinpath(test_dir, "tl_2024_us_cbsa.zip") - round(stat(cbsa_file_download).size / 1024, digits=2) # 34mb - @test bytes2hex(SHA.sha256(read(cbsa_file_download))) == - "7bd2cef06f0cd6cccc1aeeb10105095d543515c9535b8a89c9e8e7470615c8fa" + + @test validate_census_file_integrity(cbsa_file_download, "cbsa", TOLERANCE_PARAMS["cbsa"]) tigerdownload("urbanarea", 2024; output=test_dir, force=true) urbanarea_file_download = joinpath(test_dir, "tl_2024_us_uac20.zip") - round(stat(urbanarea_file_download).size / 1024, digits=2) # 72mb - @test bytes2hex(SHA.sha256(read(urbanarea_file_download))) == - "13f2f86cd31935387fa458022b73ad0433c39333c36ffb6efa8185694eba9d18" + + @test validate_census_file_integrity(urbanarea_file_download, "urbanarea", TOLERANCE_PARAMS["urbanarea"]) tigerdownload("zipcode", 2024; output=test_dir, force=true) zipcode_file_download = joinpath(test_dir, "tl_2024_us_zcta520.zip") - round(stat(zipcode_file_download).size / 1024, digits=2) # 516mb - @test bytes2hex(SHA.sha256(read(zipcode_file_download))) == - "7331f68ada3d8eec3a87478c2a6ca68b7434762aa9d5a6cf2369d6ad90b3e03d" + + @test validate_census_file_integrity(zipcode_file_download, "zipcode", TOLERANCE_PARAMS["zipcode"]) tigerdownload("metrodivision", 2024; output=test_dir, force=true) metrodivision_file_download = joinpath(test_dir, "tl_2024_us_metdiv.zip") - round(stat(metrodivision_file_download).size / 1024, digits=2) # 516mb - @test bytes2hex(SHA.sha256(read(metrodivision_file_download))) == - 
"c7deea8ce439d3671a565e2e629bf23e1b6df5c714be3f9f72555728de3ab975" + + @test validate_census_file_integrity(metrodivision_file_download, "metrodivision", TOLERANCE_PARAMS["metrodivision"]) # -- rails tigerdownload("rails", 2024; output=test_dir, force=true) rails_file_download = joinpath(test_dir, "tl_2024_us_rails.zip") - round(stat(rails_file_download).size / 1024, digits=2) # 516mb - @test bytes2hex(SHA.sha256(read(rails_file_download))) == - "b0c19b22b1ee293062dba5dc05f57c2b6290c3df916aab8de62ff9344ebe9658" + + @test validate_census_file_integrity(rails_file_download, "rails", TOLERANCE_PARAMS["rails"]) tigerdownload("primaryroads", 2024; output=test_dir, force=true) primaryroads_file_download = joinpath(test_dir, "tl_2024_us_primaryroads.zip") - round(stat(primaryroads_file_download).size / 1024, digits=2) # 516mb - @test bytes2hex(SHA.sha256(read(primaryroads_file_download))) == - "d4f1b1cd981f440aee9980fdf991d4312a0bd03e7b2b2ae609a266bfc59ae786" + + @test validate_census_file_integrity(primaryroads_file_download, "primaryroads", TOLERANCE_PARAMS["primaryroads"]) end # -------------------------------------------------------------------------------------------------- @@ -68,9 +66,8 @@ # Download the county subdivisions shapefiles tigerdownload("cousub", 2024; state="MN", county="", output=test_dir, force=true) cousub_file_download = joinpath(test_dir, "tl_2024_27_cousub.zip") - # stat(cousub_file_download) - @test bytes2hex(SHA.sha256(read(cousub_file_download))) == - "b1cf4855fe102d9ebc34e165457986b8d906052868da0079ea650d39d973ec98" + + @test validate_census_file_integrity(cousub_file_download, "cousub", TOLERANCE_PARAMS["cousub"]) # for all the states ... 
tigerdownload("cousub", 2024; output=test_dir, force=false) @@ -81,37 +78,32 @@ @test all(.!isfile.(filter(contains("tl_2024_74_cousub.zip"), cousub_file_list))) # there should be one missing file cousub_file_download = filter(contains("tl_2024_28_cousub.zip"), cousub_file_list)[1] - round(stat(cousub_file_download).size / 1024, digits=2) - @test bytes2hex(SHA.sha256(read(cousub_file_download))) == - "f91963513bf14f64267fefc5ffda24161e879bfb76a48c19517eba0f85c638ba" + + @test validate_census_file_integrity(cousub_file_download, "cousub", TOLERANCE_PARAMS["cousub"]) # -- tracts tigerdownload("tract", 2024; state="27", county="", output=test_dir, force=true) tract_file_download = joinpath(test_dir, "tl_2024_27_tract.zip") - round(stat(tract_file_download).size / 1024, digits=2) - @test bytes2hex(SHA.sha256(read(tract_file_download))) == - "83f784b2042d0af55723baaac37b2b29840d1485ac233b3bb73d6af4ec7246eb" + + @test validate_census_file_integrity(tract_file_download, "tract", TOLERANCE_PARAMS["tract"]) # -- place tigerdownload("place", 2024; state="27", county="", output=test_dir, force=true) - tract_file_download = joinpath(test_dir, "tl_2024_27_place.zip") - round(stat(tract_file_download).size / 1024, digits=2) - @test bytes2hex(SHA.sha256(read(tract_file_download))) == - "f03383a2522009c63daae5b73164ac565fc37470539d1fc79c057ed5dc31c9c3" + place_file_download = joinpath(test_dir, "tl_2024_27_place.zip") + @test validate_census_file_integrity(place_file_download, "place", TOLERANCE_PARAMS["place"]) + # -- concity ... 
not all states are available tigerdownload("consolidatedcity", 2024; state="20", county="", output=test_dir, force=true) consolidatedcity_file_download = joinpath(test_dir, "tl_2024_20_concity.zip") - round(stat(consolidatedcity_file_download).size / 1024, digits=2) - @test bytes2hex(SHA.sha256(read(consolidatedcity_file_download))) == - "510ee4a9d1e2bcf0dc8b87fc3c97f66e7afafbd5e4f1c2996d024c14c2eb7ab4" + @test validate_census_file_integrity(consolidatedcity_file_download, "consolidatedcity", TOLERANCE_PARAMS["consolidatedcity"]) + # -- roads tigerdownload("primarysecondaryroads", 2024; state="27", county="", output=test_dir, force=true) road_file_download = joinpath(test_dir, "tl_2024_27_prisecroads.zip") - round(stat(road_file_download).size / 1024, digits=2) - @test bytes2hex(SHA.sha256(read(road_file_download))) == - "3c06a9b03ca06abf42db85b3b9ab3110d251d54ccf3d59335a2e5b98d2e6f52a" + + @test validate_census_file_integrity(road_file_download, "primarysecondaryroads", TOLERANCE_PARAMS["primarysecondaryroads"]) @@ -127,9 +119,8 @@ # Download the areawater shapefiles tigerdownload("areawater", 2024; state="MN", county="Hennepin", output=test_dir, force=true) areawater_file_download = joinpath(test_dir, "tl_2024_27053_areawater.zip") - # stat(cousub_file_download) - @test bytes2hex(SHA.sha256(read(areawater_file_download))) == - "54a2825f26405fbb83bd4c5c7a96190867437bc46dc0d4a8155198890d63db54" + + @test validate_census_file_integrity(areawater_file_download, "areawater", TOLERANCE_PARAMS["areawater"]) # Download the linear water shapefiles for all of Michigan tigerdownload("linearwater", 2024; state="MI", output=test_dir, force=true) @@ -139,16 +130,14 @@ @test all(isfile.(linearwater_file_list)) # test that all the files are there linearwater_file_download = filter(contains("tl_2024_26089_linearwater.zip"), linearwater_file_list)[1] - round(stat(linearwater_file_download).size / 1024, digits=2) - @test bytes2hex(SHA.sha256(read(linearwater_file_download))) == - 
"b05a58ddb37abdc9287c533a6f87110ef4b153dc4fbd20833d3d1cf56470cba7" + + @test validate_census_file_integrity(linearwater_file_download, "linearwater", TOLERANCE_PARAMS["linearwater"]) # roads tigerdownload("road", 2024; state="MN", county="Hennepin", output=test_dir, force=true) roads_file_download = joinpath(test_dir, "tl_2024_27053_roads.zip") - round(stat(roads_file_download).size / 1024, digits=2) - @test bytes2hex(SHA.sha256(read(roads_file_download))) == - "b828ad38a8bc3cd3299efcc7e3b333ec2954229392eb254a460e596c1db78511" + + @test validate_census_file_integrity(roads_file_download, "road", TOLERANCE_PARAMS["road"]) diff --git a/test/runtests.jl b/test/runtests.jl @@ -1,3 +1,6 @@ +# ABOUTME: Test runner for TigerFetch.jl package +# ABOUTME: Executes all test suites (assets, downloads) with verbose output + # -------------------------------------------------------------------------------------------------- using TigerFetch using Test
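
Note on the DBF check in this commit: `count_dbf_records` in `test/TestRoutines/validation.jl` relies on the DBF header layout, in which the record count is a little-endian `UInt32` at byte offsets 4 to 7. A minimal sketch of that read on an in-memory header follows, using only Julia's Base. The name `dbf_record_count` and the sample bytes are illustrative assumptions, not code from the package; the sketch uses `ltoh` rather than `reinterpret`, so the result does not depend on the host byte order.

```julia
# Hypothetical helper (not in TigerFetch): read the record count from a DBF header.
function dbf_record_count(io::IO)
    skip(io, 4)                         # byte 0: version, bytes 1-3: last-update date
    return Int(ltoh(read(io, UInt32)))  # bytes 4-7: record count, little-endian
end

# Made-up 8-byte header: version 0x03, date bytes (year - 1900, month, day),
# then 3235 (0x00000CA3) stored least-significant byte first.
header = UInt8[0x03, 0x7c, 0x03, 0x18, 0xa3, 0x0c, 0x00, 0x00]
println(dbf_record_count(IOBuffer(header)))  # prints 3235
```

The value 3235 matches the county feature count quoted in the `TOLERANCE_PARAMS` comments of the diff; the header bytes themselves are constructed for the example and are not taken from a real Census file.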