commit 9ac6f4400be14cd36eee7a7e8b467275d8ef7e10
parent 30075e42953d9c96d3fdbe2371871e9842cac057
Author: Erik Loualiche <eloualiche@users.noreply.github.com>
Date: Tue, 24 Mar 2026 22:36:42 -0500
Merge pull request #6 from LouLouLibs/claude/robust-validation
Robust validation, retry logic, docs, and v0.3.0
Diffstat:
15 files changed, 572 insertions(+), 79 deletions(-)
diff --git a/.gitignore b/.gitignore
@@ -2,7 +2,6 @@
**/Manifest.toml
.DS_Store
tmp
-setup_readme.md
**/sandbox.*
assets
@@ -13,4 +12,14 @@ assets
# doc
docs/build/
docs/node_modules
+
+setup_readme.md
+# --------------------------------------------------------------------------------------------------
+
+
+# --------------------------------------------------------------------------------------------------
+# AI
+.claude
+claude.md
+todo.md
# --------------------------------------------------------------------------------------------------
diff --git a/Project.toml b/Project.toml
@@ -1,7 +1,7 @@
name = "TigerFetch"
uuid = "59408d0a-a903-460e-b817-a5586a16527e"
authors = ["Erik Loualiche"]
-version = "0.2.0"
+version = "0.3.0"
[deps]
Comonicon = "863f3e99-da2a-4334-8734-de3dacbe5542"
diff --git a/docs/src/man/cli.md b/docs/src/man/cli.md
@@ -1 +1,81 @@
# Command line interface
+
+TigerFetch provides a `tigerfetch` command that downloads TIGER/Line shapefiles directly from your terminal.
+
+## Installation
+
+You need a working Julia installation. Then install the CLI:
+
+```bash
+julia -e 'using Pkg; Pkg.add("TigerFetch"); using TigerFetch; TigerFetch.comonicon_install()'
+```
+
+The binary is placed at `~/.julia/bin/tigerfetch`. Make sure `~/.julia/bin` is on your `PATH`.
+
+## Usage
+
+```bash
+tigerfetch <type> [year] [--state STATE] [--county COUNTY] [--output DIR] [--force]
+```
+
+### Arguments
+
+| Argument | Description | Default |
+|----------|-------------|---------|
+| `type` | Geography type (see table below) | required |
+| `year` | Data year | 2024 |
+
+### Options
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--state` | State name, abbreviation, or FIPS code | all states |
+| `--county` | County name or FIPS code (requires `--state`) | all counties |
+| `--output` | Output directory | current directory |
+| `--force` | Re-download existing files | false |
+
+### Geography types
+
+| Type | Scope | Description |
+|------|-------|-------------|
+| `state` | national | State boundaries |
+| `county` | national | County boundaries |
+| `cbsa` | national | Core Based Statistical Areas |
+| `urbanarea` | national | Urban Areas |
+| `zipcode` | national | ZIP Code Tabulation Areas |
+| `metrodivision` | national | Metropolitan Divisions |
+| `primaryroads` | national | Primary roads |
+| `rails` | national | Railroads |
+| `cousub` | state | County subdivisions |
+| `tract` | state | Census tracts |
+| `place` | state | Places |
+| `consolidatedcity` | state | Consolidated cities |
+| `primarysecondaryroads` | state | Primary and secondary roads |
+| `areawater` | county | Area hydrography |
+| `linearwater` | county | Linear hydrography |
+| `road` | county | Roads |
+
+### Examples
+
+```bash
+# Download state boundaries
+tigerfetch state --output tmp
+
+# Download county subdivisions for Illinois
+tigerfetch cousub --state IL --output tmp
+
+# Download area water for all of Minnesota
+tigerfetch areawater --state "Minnesota" --output tmp
+
+# Download roads for a single county
+tigerfetch road --state "MN" --county "Hennepin" --output tmp
+
+# Force re-download of 2023 data
+tigerfetch county 2023 --output tmp --force
+```
+
+### Scope behavior
+
+- **National** types (`state`, `county`, etc.) download a single file covering the entire US. The `--state` and `--county` flags are ignored.
+- **State** types (`cousub`, `tract`, etc.) download one file per state. Omitting `--state` downloads all states.
+- **County** types (`areawater`, `road`, etc.) download one file per county. Omitting `--county` downloads all counties for the given state(s).
diff --git a/docs/src/man/julia.md b/docs/src/man/julia.md
@@ -1 +1,114 @@
-# Using the function from within julia
+# Using TigerFetch from Julia
+
+The main entry point is the [`tigerdownload`](@ref) function.
+
+## Installation
+
+`TigerFetch.jl` is registered in the [`loulouJL`](https://github.com/LouLouLibs/loulouJL) registry:
+
+```julia
+using Pkg
+pkg"registry add https://github.com/LouLouLibs/loulouJL.git"
+Pkg.add("TigerFetch")
+```
+
+Or install directly from GitHub:
+```julia
+import Pkg; Pkg.add(url="https://github.com/louloulibs/TigerFetch.jl")
+```
+
+## Basic usage
+
+```julia
+using TigerFetch
+
+# Download state boundaries (national file)
+tigerdownload("state")
+
+# Download county subdivisions for California
+tigerdownload("cousub"; state="CA")
+
+# Download census tracts for a specific state using FIPS code
+tigerdownload("tract"; state="27", output="/path/to/data")
+
+# Download area water for a specific county
+tigerdownload("areawater"; state="MN", county="Hennepin", output="tmp")
+
+# Force re-download of existing files
+tigerdownload("county"; output="tmp", force=true)
+```
+
+## Geography scopes
+
+Geographies are organized into three scopes that determine how files are downloaded:
+
+### National scope
+A single file covers the entire US. The `state` and `county` arguments are ignored.
+
+```julia
+tigerdownload("state") # one file: tl_2024_us_state.zip
+tigerdownload("county") # one file: tl_2024_us_county.zip
+tigerdownload("cbsa") # Core Based Statistical Areas
+tigerdownload("urbanarea") # Urban Areas
+tigerdownload("zipcode") # ZIP Code Tabulation Areas
+tigerdownload("metrodivision") # Metropolitan Divisions
+tigerdownload("primaryroads") # Primary roads
+tigerdownload("rails") # Railroads
+```
+
+### State scope
+One file per state. Omit `state` to download all states.
+
+```julia
+tigerdownload("cousub"; state="IL") # County subdivisions
+tigerdownload("tract"; state="CA") # Census tracts
+tigerdownload("place"; state="27") # Places (using FIPS)
+tigerdownload("primarysecondaryroads"; state="MN") # Primary & secondary roads
+tigerdownload("consolidatedcity"; state="KS") # Consolidated cities
+```
+
+### County scope
+One file per county. Requires `state`; omit `county` to download all counties in the state.
+
+```julia
+tigerdownload("areawater"; state="MN", county="Hennepin") # Area hydrography
+tigerdownload("linearwater"; state="MI") # All MI counties
+tigerdownload("road"; state="MN", county="Hennepin") # Roads
+```
+
+## State and county identifiers
+
+States can be specified by name, abbreviation, or FIPS code:
+```julia
+tigerdownload("tract"; state="Minnesota") # full name
+tigerdownload("tract"; state="MN") # abbreviation
+tigerdownload("tract"; state="27") # FIPS code
+```
+
+Counties can be specified by name or FIPS code (common suffixes like "County" or "Parish" are stripped automatically):
+```julia
+tigerdownload("road"; state="MN", county="Hennepin") # name
+tigerdownload("road"; state="MN", county="Hennepin County") # also works
+tigerdownload("road"; state="MN", county="053") # FIPS code
+```
+
+## Options
+
+| Keyword | Type | Default | Description |
+|---------|------|---------|-------------|
+| `state` | `String` | `""` | State identifier |
+| `county` | `String` | `""` | County identifier (requires `state`) |
+| `output` | `String` | `pwd()` | Output directory |
+| `force` | `Bool` | `false` | Re-download existing files |
+| `verbose` | `Bool` | `false` | Print detailed progress |
+
+## Working with downloaded files
+
+TigerFetch downloads ZIP archives containing shapefiles (`.shp`, `.dbf`, `.shx`, `.prj`, etc.). Use [Shapefile.jl](https://github.com/JuliaGeo/Shapefile.jl) to read them:
+
+```julia
+using Shapefile, DataFrames
+df = Shapefile.Table("tl_2024_us_county.zip") |> DataFrame
+```
+
+See the [demo](@ref "Drawing a simple map") for a complete mapping example with CairoMakie.
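Note on the archive names used in the documentation above: the download code in this commit builds them as `tl_<year>_<area>_<layer>.zip`, where the area is `us` for national files, the 2-digit state FIPS for state files, or the state and county FIPS concatenated for county files. The following is a minimal sketch of that convention. `tiger_filename` is a hypothetical helper written for illustration and is not part of TigerFetch.

```julia
# Illustrative only: `tiger_filename` is a hypothetical helper, not a TigerFetch function.
# `layer` is the lowercase TIGER layer name, e.g. "state", "tract", "roads".
function tiger_filename(year::Integer, layer::AbstractString;
                        state_fips::AbstractString="",
                        county_fips::AbstractString="")
    # National files use "us"; otherwise the area code is state FIPS (+ county FIPS).
    area = isempty(state_fips) ? "us" : state_fips * county_fips
    return "tl_$(year)_$(area)_$(layer).zip"
end

national = tiger_filename(2024, "state")
by_state = tiger_filename(2024, "tract"; state_fips="27")
by_county = tiger_filename(2024, "roads"; state_fips="27", county_fips="053")
```

The three calls correspond to the national, state, and county scopes, and produce names of the same shape as those checked in the test suite.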
diff --git a/src/TigerFetch.jl b/src/TigerFetch.jl
@@ -1,5 +1,6 @@
-
+# ABOUTME: Main module entry point for TigerFetch.jl
+# ABOUTME: Loads submodules and exports the tigerdownload function for Census shapefile retrieval
# --------------------------------------------------------------------------------------------------
module TigerFetch
diff --git a/src/artifacts.jl b/src/artifacts.jl
@@ -1,3 +1,5 @@
+# ABOUTME: Artifact management for bundled reference data (state/county FIPS lists)
+# ABOUTME: Handles artifact installation, path resolution, and reference data file access
using Pkg.Artifacts
"""
diff --git a/src/cli.jl b/src/cli.jl
@@ -1,4 +1,5 @@
-
+# ABOUTME: Command-line interface for TigerFetch via Comonicon.jl
+# ABOUTME: Provides the tigerfetch CLI command that wraps the tigerdownload Julia function
# -------------------------------------------------------------------------------------------------
"""
diff --git a/src/download.jl b/src/download.jl
@@ -1,3 +1,5 @@
+# ABOUTME: Download logic for Census TIGER/Line shapefiles
+# ABOUTME: Dispatches on geography scope (national, state, county) to construct URLs and manage downloads
# --------------------------------------------------------------------------------------------------
macro conditional_log(verbose, level, message, params...)
@@ -13,6 +15,37 @@ end
# --------------------------------------------------------------------------------------------------
+const MAX_RETRIES = 3
+const RETRY_BASE_DELAY = 2 # seconds
+
+"""
+ download_with_retry(url, output_path; max_retries=MAX_RETRIES)
+
+Download a file with exponential backoff retry on transient failures.
+Returns the output path on success, rethrows on persistent failure.
+"""
+function download_with_retry(url::String, output_path::String; max_retries::Int=MAX_RETRIES)
+ for attempt in 1:max_retries
+ try
+ Downloads.download(url, output_path)
+ return output_path
+ catch e
+ e isa InterruptException && rethrow(e)
+ if attempt == max_retries
+ rethrow(e)
+ end
+ delay = RETRY_BASE_DELAY * 2^(attempt - 1)
+ @warn "Download attempt $attempt/$max_retries failed, retrying in $(delay)s" url=url
+ sleep(delay)
+ # Clean up partial download before retry
+ isfile(output_path) && rm(output_path; force=true)
+ end
+ end
+end
+# --------------------------------------------------------------------------------------------------
+
+
+# --------------------------------------------------------------------------------------------------
# National scope (States, Counties nationally)
function download_shapefile(
geo::T;
@@ -34,7 +67,7 @@ function download_shapefile(
try
@conditional_log verbose "Downloading $(description(geo_type))" url=url
mkpath(output_dir)
- Downloads.download(url, output_path)
+ download_with_retry(url, output_path)
return output_path
catch e
@error "Download failed" exception=e
@@ -73,11 +106,14 @@ function download_shapefile(
geo_type = typeof(geo)
base_url = "https://www2.census.gov/geo/tiger/TIGER$(geo.year)/$(tiger_name(geo_type))/"
+ n_states = length(states_to_process)
+
try
# Process each state with total interrupt by user ...
- for state_info in states_to_process
+ for (i, state_info) in enumerate(states_to_process)
fips = state_info[2]
state_name = state_info[3]
+ n_states > 1 && @info "[$i/$n_states] $(state_name)"
filename = "tl_$(geo.year)_$(fips)_$(lowercase(tiger_name(T))).zip"
url = base_url * filename
output_path = joinpath(output_dir, filename)
@@ -89,7 +125,7 @@ function download_shapefile(
try
@conditional_log verbose "Downloading" state=state_name url=url
- Downloads.download(url, output_path)
+ download_with_retry(url, output_path)
catch e
if e isa InterruptException
# Re-throw interrupt to be caught by outer try block
@@ -143,7 +179,9 @@ function download_shapefile(
# Track failures
failed_downloads = String[]
- for state_info in states_to_process
+ n_states = length(states_to_process)
+
+ for (si, state_info) in enumerate(states_to_process)
state_fips = state_info[2]
state_name = state_info[3]
@@ -159,9 +197,13 @@ function download_shapefile(
counties = [county_info]
end
- for county_info in counties
+ n_counties = length(counties)
+ state_label = n_states > 1 ? "[state $si/$n_states] " : ""
+
+ for (ci, county_info) in enumerate(counties)
county_fips = county_info[3] # Assuming similar structure to state_info
county_name = county_info[4]
+ n_counties > 1 && @info "$(state_label)$(state_name): [$ci/$n_counties] $(county_name)"
filename = "tl_$(geo.year)_$(state_fips)$(county_fips)_$(lowercase(tiger_name(geo))).zip"
url = "https://www2.census.gov/geo/tiger/TIGER$(geo.year)/$(tiger_name(geo))/" * filename
@@ -175,7 +217,7 @@ function download_shapefile(
try
@conditional_log verbose "Downloading" state=state_name county=county_name url=url
mkpath(output_dir)
- Downloads.download(url, output_path)
+ download_with_retry(url, output_path)
catch e
push!(failed_downloads, "$(state_name) - $(county_name)")
@error "Download failed" state=state_name county=county_name exception=e
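Note on the retry schedule: with `MAX_RETRIES = 3` and `RETRY_BASE_DELAY = 2`, `download_with_retry` waits 2 s after the first failure and 4 s after the second, then rethrows if the third attempt also fails. The sketch below reproduces that pattern in isolation. `with_retry` and the injectable `sleeper` argument are illustrative names, not part of TigerFetch; the injection only exists so the schedule can be inspected without waiting.

```julia
# Sketch of exponential backoff; `with_retry` and `sleeper` are hypothetical names.
function with_retry(action; max_retries::Int=3, base_delay::Real=2, sleeper=sleep)
    for attempt in 1:max_retries
        try
            return action()
        catch e
            # Never swallow a user interrupt, and give up after the last attempt.
            (e isa InterruptException || attempt == max_retries) && rethrow()
            sleeper(base_delay * 2^(attempt - 1))   # 2, 4, 8, ... seconds
        end
    end
end

# Usage: an action that fails twice, then succeeds on the third attempt.
delays = Float64[]
calls = Ref(0)
result = with_retry(sleeper = d -> push!(delays, d)) do
    calls[] += 1
    calls[] < 3 && error("transient failure")
    "ok"
end
```

After this runs, `delays` holds the two waits that were requested and `calls[]` counts the three attempts.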
diff --git a/src/geotypes.jl b/src/geotypes.jl
@@ -1,3 +1,5 @@
+# ABOUTME: Type system for Census TIGER/Line geographies (state, county, tract, etc.)
+# ABOUTME: Defines abstract hierarchy (National/State/County) and concrete types with metadata
# --------------------------------------------------------------------------------------------------
# Abstract base type
diff --git a/src/main.jl b/src/main.jl
@@ -1,3 +1,6 @@
+# ABOUTME: Main user-facing tigerdownload function and geography type registry
+# ABOUTME: Validates inputs, creates geography instances, and dispatches to download methods
+
# --------------------------------------------------------------------------------------------------
const GEOGRAPHY_TYPES = Dict(
"state" => State,
diff --git a/src/reference.jl b/src/reference.jl
@@ -1,14 +1,37 @@
-function get_state_list()::Vector{Vector{String}}
+# ABOUTME: State and county reference data lookup from bundled FIPS lists
+# ABOUTME: Provides standardization of state/county identifiers (name, abbreviation, FIPS code)
+
+# Module-level cache for parsed reference data
+const _STATE_LIST_CACHE = Ref{Vector{Vector{String}}}()
+const _COUNTY_LIST_CACHE = Ref{Vector{Vector{AbstractString}}}()
+const _CACHE_INITIALIZED = Ref(false)
+
+function _ensure_cache()
+ _CACHE_INITIALIZED[] && return
paths = get_reference_data()
+
+ # Parse state list
state_file = paths["state"]
+ _STATE_LIST_CACHE[] = readlines(state_file) |>
+ l -> split.(l, "|") |>
+ l -> map(s -> String.(s[ [1,2,4] ]), l) |>
+ l -> l[2:end] |>
+ unique
+
+ # Parse county list
+ county_file = paths["county"]
+ _COUNTY_LIST_CACHE[] = readlines(county_file) |>
+ ( l -> split.(l, "|") ) |>
+ ( l -> map(s -> String.(s[ [1,2,3,5] ]), l) ) |>
+ ( l -> l[2:end] )
- # we do not need to load CSV so we read the file by hand
- state_list = readlines(state_file) |>
- l -> split.(l, "|") |> # split by vertical bar
- l -> map(s -> String.(s[ [1,2,4] ]), l) |> # select some columns
- l -> l[2:end] # remove the header
+ _CACHE_INITIALIZED[] = true
+ return
+end
- return unique(state_list)
+function get_state_list()::Vector{Vector{String}}
+ _ensure_cache()
+ return _STATE_LIST_CACHE[]
end
# Takes a string input (handles names and abbreviations)
@@ -36,20 +59,14 @@ standardize_state_input(::Nothing) = nothing
# -------------------------------------------------------------------------------------------------
function get_county_list(state=nothing)::Vector{Vector{AbstractString}}
- paths = get_reference_data() # Remove TigerFetch. prefix since we're inside the module
- county_file = paths["county"]
-
- # we do not need to load CSV so we read the file by hand
- county_list = readlines(county_file) |>
- ( l -> split.(l, "|") ) |> # split by vertical bar
- ( l -> map(s -> String.(s[ [1,2,3,5] ]), l) ) |> # select some columns
- ( l -> l[2:end] ) # remove the header
+ _ensure_cache()
+ county_list = _COUNTY_LIST_CACHE[]
if isnothing(state)
return county_list
elseif !isnothing(tryparse(Int, state)) # then its the fips
return unique(filter(l -> l[2] == state, county_list))
- else # then its the abbreviation state name
+ else # then its the abbreviation state name
return unique(filter(l -> l[1] == state, county_list))
end
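Note on the caching change: `_ensure_cache` parses the pipe-delimited reference files once and keeps the result in module-level `Ref`s, so repeated calls to `get_state_list` and `get_county_list` no longer re-read the files. Below is a minimal sketch of the same parse-once pattern on in-memory data. The sample rows, the column layout, and every name prefixed `demo` or `_DEMO` are assumptions made for illustration.

```julia
# Sketch of a parse-once cache; all names and sample rows here are made up.
const _DEMO_CACHE = Ref{Vector{Vector{String}}}()
const _DEMO_READY = Ref(false)
const _DEMO_PARSES = Ref(0)      # counts how often the parse actually runs

function demo_states(lines::Vector{String})
    if !_DEMO_READY[]
        _DEMO_PARSES[] += 1
        rows = [String.(split(l, "|")) for l in lines[2:end]]   # drop the header row
        _DEMO_CACHE[] = unique([r[[1, 2, 4]] for r in rows])    # keep columns 1, 2, 4
        _DEMO_READY[] = true
    end
    return _DEMO_CACHE[]
end

sample = ["STATE|STATEFP|STATENS|STATE_NAME",
          "MN|27|x|Minnesota",
          "IL|17|x|Illinois"]
first_call = demo_states(sample)
second_call = demo_states(sample)   # served from the cache, no second parse
```

One consequence of this design is that the cache is never invalidated; that is reasonable for reference data bundled with the package, but it would need a reset hook if the underlying files could change during a session.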
diff --git a/test/TestRoutines/validation.jl b/test/TestRoutines/validation.jl
@@ -0,0 +1,228 @@
+# ABOUTME: Census shapefile validation utilities for robust testing
+# ABOUTME: Provides functions to validate downloaded Census files without hardcoded hashes
+
+"""
+ ToleranceParams
+
+Parameters defining acceptable ranges for Census file validation.
+"""
+struct ToleranceParams
+ min_size_mb::Float64
+ max_size_mb::Float64
+ min_features::Union{Int, Nothing}
+ max_features::Union{Int, Nothing}
+ geographic_bounds::Union{NamedTuple, Nothing}
+end
+
+"""
+ validate_file_basics(filepath::String, tolerance::ToleranceParams) -> Bool
+
+Basic file validation: existence, ZIP format, and reasonable size.
+"""
+function validate_file_basics(filepath::String, tolerance::ToleranceParams)
+ # File must exist
+ !isfile(filepath) && return false
+
+ # Must be a ZIP file
+ !endswith(filepath, ".zip") && return false
+
+ # Size check
+ size_mb = stat(filepath).size / (1024^2)
+ if size_mb < tolerance.min_size_mb || size_mb > tolerance.max_size_mb
+ @warn "File size $(round(size_mb, digits=2))MB outside expected range [$(tolerance.min_size_mb), $(tolerance.max_size_mb)]MB"
+ return false
+ end
+
+ return true
+end
+
+"""
+ validate_zip_structure(filepath::String) -> Bool
+
+Validate ZIP file can be opened and contains expected shapefile components.
+"""
+function validate_zip_structure(filepath::String)
+ try
+ # Try to read ZIP file structure using unzip -t (test archive)
+ result = run(pipeline(`unzip -t $filepath`, stdout=devnull, stderr=devnull))
+
+ # Get file listing
+ zip_contents = readchomp(`unzip -l $filepath`)
+
+ # Check for essential shapefile components
+ has_shp = occursin(r"\.shp$"m, zip_contents)
+ has_dbf = occursin(r"\.dbf$"m, zip_contents)
+ has_shx = occursin(r"\.shx$"m, zip_contents)
+
+ if !has_shp || !has_dbf || !has_shx
+ @warn "Missing essential shapefile components: .shp=$has_shp, .dbf=$has_dbf, .shx=$has_shx"
+ return false
+ end
+
+ return true
+ catch e
+ @warn "Failed to validate ZIP structure: $e"
+ return false
+ end
+end
+
+"""
+ validate_census_filename(filepath::String, expected_pattern::Regex) -> Bool
+
+Validate Census filename follows expected TIGER/Line naming convention.
+"""
+function validate_census_filename(filepath::String, expected_pattern::Regex)
+ filename = basename(filepath)
+ if !occursin(expected_pattern, filename)
+ @warn "Filename '$filename' doesn't match expected pattern $expected_pattern"
+ return false
+ end
+ return true
+end
+
+"""
+ count_dbf_records(filepath::String) -> Union{Int, Nothing}
+
+Extract the record count from the .dbf header inside a shapefile ZIP.
+The DBF header stores the record count as a little-endian UInt32 at byte offsets 4-7 (0-indexed).
+"""
+function count_dbf_records(filepath::String)
+ try
+ # Find the .dbf filename inside the ZIP
+ zip_listing = readchomp(`unzip -l $filepath`)
+ dbf_match = match(r"(\S+\.dbf)$"m, zip_listing)
+ isnothing(dbf_match) && return nothing
+
+ dbf_name = dbf_match.captures[1]
+
+ # Extract just the .dbf to a temp dir and read its header
+ tmp = mktempdir()
+ run(pipeline(`unzip -j -o $filepath $dbf_name -d $tmp`, stdout=devnull, stderr=devnull))
+ dbf_path = joinpath(tmp, basename(dbf_name))
+
+ header = open(dbf_path) do io
+ read(io, 8)
+ end
+ rm(tmp; recursive=true, force=true)
+
+ length(header) < 8 && return nothing
+
+ # Record count is at bytes 4-7 (0-indexed), i.e. header[5:8]; ltoh converts from little-endian
+ n_records = ltoh(reinterpret(UInt32, header[5:8])[1])
+ return Int(n_records)
+ catch e
+ @warn "Could not read DBF record count: $e"
+ return nothing
+ end
+end
+
+"""
+ validate_feature_count(filepath::String, tolerance::ToleranceParams) -> Bool
+
+Validate that the number of features (DBF records) falls within the expected range.
+Skips validation if tolerance bounds are nothing.
+"""
+function validate_feature_count(filepath::String, tolerance::ToleranceParams)
+ isnothing(tolerance.min_features) && isnothing(tolerance.max_features) && return true
+
+ n_features = count_dbf_records(filepath)
+ if isnothing(n_features)
+ @warn "Could not determine feature count, skipping validation"
+ return true
+ end
+
+ if !isnothing(tolerance.min_features) && n_features < tolerance.min_features
+ @warn "Feature count $n_features below minimum $(tolerance.min_features)"
+ return false
+ end
+ if !isnothing(tolerance.max_features) && n_features > tolerance.max_features
+ @warn "Feature count $n_features above maximum $(tolerance.max_features)"
+ return false
+ end
+
+ @info "Feature count: $n_features (expected [$(tolerance.min_features), $(tolerance.max_features)])"
+ return true
+end
+
+"""
+ validate_census_file_integrity(filepath::String, file_type::String, tolerance::ToleranceParams) -> Bool
+
+Comprehensive validation of a Census shapefile using the hybrid approach.
+"""
+function validate_census_file_integrity(filepath::String, file_type::String, tolerance::ToleranceParams)
+ @info "Validating $file_type file: $(basename(filepath))"
+
+ # 1. Basic file validation
+ if !validate_file_basics(filepath, tolerance)
+ @error "Basic file validation failed"
+ return false
+ end
+
+ # 2. ZIP structure validation
+ if !validate_zip_structure(filepath)
+ @error "ZIP structure validation failed"
+ return false
+ end
+
+ # 3. Feature count validation
+ if !validate_feature_count(filepath, tolerance)
+ @error "Feature count validation failed"
+ return false
+ end
+
+ # 4. Filename pattern validation
+ expected_patterns = Dict(
+ "state" => r"tl_\d{4}_us_state\.zip$",
+ "county" => r"tl_\d{4}_us_county\.zip$",
+ "cbsa" => r"tl_\d{4}_us_cbsa\.zip$",
+ "urbanarea" => r"tl_\d{4}_us_uac20\.zip$",
+ "zipcode" => r"tl_\d{4}_us_zcta520\.zip$",
+ "metrodivision" => r"tl_\d{4}_us_metdiv\.zip$",
+ "rails" => r"tl_\d{4}_us_rails\.zip$",
+ "primaryroads" => r"tl_\d{4}_us_primaryroads\.zip$",
+ "cousub" => r"tl_\d{4}_\d{2}_cousub\.zip$",
+ "tract" => r"tl_\d{4}_\d{2}_tract\.zip$",
+ "place" => r"tl_\d{4}_\d{2}_place\.zip$",
+ "consolidatedcity" => r"tl_\d{4}_\d{2}_concity\.zip$",
+ "primarysecondaryroads" => r"tl_\d{4}_\d{2}_prisecroads\.zip$",
+ "areawater" => r"tl_\d{4}_\d{5}_areawater\.zip$",
+ "linearwater" => r"tl_\d{4}_\d{5}_linearwater\.zip$",
+ "road" => r"tl_\d{4}_\d{5}_roads\.zip$"
+ )
+
+ if haskey(expected_patterns, file_type)
+ if !validate_census_filename(filepath, expected_patterns[file_type])
+ @error "Filename validation failed"
+ return false
+ end
+ end
+
+ @info "✓ File validation passed for $file_type"
+ return true
+end
+
+# Define tolerance parameters for different geography types
+# Updated based on actual 2024 Census data observations
+const TOLERANCE_PARAMS = Dict(
+ # National geographies
+ "state" => ToleranceParams(5.0, 50.0, 45, 65, nothing), # 56 in 2024
+ "county" => ToleranceParams(75.0, 150.0, 2900, 3500, nothing), # 3235 in 2024
+ "cbsa" => ToleranceParams(25.0, 60.0, 800, 1100, nothing), # 935 in 2024
+ "urbanarea" => ToleranceParams(60.0, 100.0, 2200, 3200, nothing), # 2644 in 2024
+ "zipcode" => ToleranceParams(400.0, 700.0, 28000, 38000, nothing), # ~33k ZIP codes
+ "metrodivision" => ToleranceParams(0.5, 5.0, 25, 50, nothing), # 37 in 2024
+ "rails" => ToleranceParams(25.0, 150.0, nothing, nothing, nothing), # Rails was ~32MB
+ "primaryroads" => ToleranceParams(25.0, 150.0, nothing, nothing, nothing), # Primary roads was ~43MB
+
+ # State geographies (examples for typical states)
+ "cousub" => ToleranceParams(5.0, 50.0, 50, 2000, nothing), # Varies widely by state
+ "tract" => ToleranceParams(5.0, 100.0, 200, 5000, nothing), # MN tract was ~7.15MB
+ "place" => ToleranceParams(2.0, 50.0, 100, 3000, nothing), # MN place was ~3.96MB
+ "consolidatedcity" => ToleranceParams(0.01, 10.0, 0, 50, nothing), # KS was ~0.02MB
+ "primarysecondaryroads" => ToleranceParams(2.0, 200.0, nothing, nothing, nothing), # MN was ~3.9MB
+
+ # County geographies (examples for typical counties)
+ "areawater" => ToleranceParams(0.1, 50.0, 0, 5000, nothing), # Varies by geography
+ "linearwater" => ToleranceParams(0.01, 100.0, 0, 10000, nothing), # Varies widely by county
+ "road" => ToleranceParams(0.1, 200.0, 100, 50000, nothing) # Varies widely by county
+)
\ No newline at end of file
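Note on `count_dbf_records`: the record count lives in the first 8 bytes of the `.dbf` header, so it can be decoded without any shapefile library. The sketch below shows the decoding step on a hand-built header. `dbf_record_count` is an illustrative name and the header bytes are made up, apart from the count field, which encodes 56.

```julia
# Sketch: decode the record count from the first 8 bytes of a .dbf header.
# `dbf_record_count` is a hypothetical helper; the sample header is made up.
function dbf_record_count(header::AbstractVector{UInt8})
    length(header) < 8 && return nothing
    # Byte offsets 4-7 (0-indexed) are indices 5:8 with Julia's 1-based indexing.
    raw = first(reinterpret(UInt32, header[5:8]))
    return Int(ltoh(raw))   # stored little-endian; ltoh is a no-op on little-endian hosts
end

# Version byte, three date bytes, then the UInt32 count 0x00000038 in little-endian order.
header = UInt8[0x03, 0x7c, 0x01, 0x01, 0x38, 0x00, 0x00, 0x00]
n = dbf_record_count(header)          # 56
short = dbf_record_count(UInt8[0x03]) # nothing: header too short
```

Using `ltoh` keeps the result correct on big-endian hosts, where a bare `reinterpret` would return the bytes in the wrong order.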
diff --git a/test/UnitTests/assets.jl b/test/UnitTests/assets.jl
@@ -1,3 +1,5 @@
+# ABOUTME: Tests for artifact installation and reference data integrity
+# ABOUTME: Validates bundled FIPS state/county files, their structure, and data processing functions
@testset "Asset Installation Tests" begin
@testset "Artifact Configuration" begin
diff --git a/test/UnitTests/downloads.jl b/test/UnitTests/downloads.jl
@@ -1,3 +1,9 @@
+# ABOUTME: Download tests for all TIGER/Line geography types (national, state, county)
+# ABOUTME: Uses hybrid validation (size/structure checks) to verify downloaded Census shapefiles
+
+# Include validation utilities
+include("../TestRoutines/validation.jl")
+
@testset "Download Tests" begin
@@ -9,52 +15,44 @@
# Download the states shapefiles
tigerdownload("state", 2024; state="MN", county="", output=test_dir, force=true)
state_file_download = joinpath(test_dir, "tl_2024_us_state.zip")
- # stat(state_file_download)
- @test bytes2hex(SHA.sha256(read(state_file_download))) ==
- "e30bad8922b177b5991bf8606d3d95de8f5f0b4bab25848648de53b25f72c17f"
+
+ @test validate_census_file_integrity(state_file_download, "state", TOLERANCE_PARAMS["state"])
tigerdownload("county", 2024; state="MN", county="Hennepin", output=test_dir, force=true)
county_file_download = joinpath(test_dir, "tl_2024_us_county.zip")
- # stat(county_file_download)
- @test bytes2hex(SHA.sha256(read(county_file_download))) ==
- "a344b72be48f2448df1ae1757098d94571b96556d3b9253cf9d6ee77bce8a0b4"
+
+ @test validate_census_file_integrity(county_file_download, "county", TOLERANCE_PARAMS["county"])
tigerdownload("cbsa", 2024; output=test_dir, force=true)
cbsa_file_download = joinpath(test_dir, "tl_2024_us_cbsa.zip")
- round(stat(cbsa_file_download).size / 1024, digits=2) # 34mb
- @test bytes2hex(SHA.sha256(read(cbsa_file_download))) ==
- "7bd2cef06f0cd6cccc1aeeb10105095d543515c9535b8a89c9e8e7470615c8fa"
+
+ @test validate_census_file_integrity(cbsa_file_download, "cbsa", TOLERANCE_PARAMS["cbsa"])
tigerdownload("urbanarea", 2024; output=test_dir, force=true)
urbanarea_file_download = joinpath(test_dir, "tl_2024_us_uac20.zip")
- round(stat(urbanarea_file_download).size / 1024, digits=2) # 72mb
- @test bytes2hex(SHA.sha256(read(urbanarea_file_download))) ==
- "13f2f86cd31935387fa458022b73ad0433c39333c36ffb6efa8185694eba9d18"
+
+ @test validate_census_file_integrity(urbanarea_file_download, "urbanarea", TOLERANCE_PARAMS["urbanarea"])
tigerdownload("zipcode", 2024; output=test_dir, force=true)
zipcode_file_download = joinpath(test_dir, "tl_2024_us_zcta520.zip")
- round(stat(zipcode_file_download).size / 1024, digits=2) # 516mb
- @test bytes2hex(SHA.sha256(read(zipcode_file_download))) ==
- "7331f68ada3d8eec3a87478c2a6ca68b7434762aa9d5a6cf2369d6ad90b3e03d"
+
+ @test validate_census_file_integrity(zipcode_file_download, "zipcode", TOLERANCE_PARAMS["zipcode"])
tigerdownload("metrodivision", 2024; output=test_dir, force=true)
metrodivision_file_download = joinpath(test_dir, "tl_2024_us_metdiv.zip")
- round(stat(metrodivision_file_download).size / 1024, digits=2) # 516mb
- @test bytes2hex(SHA.sha256(read(metrodivision_file_download))) ==
- "c7deea8ce439d3671a565e2e629bf23e1b6df5c714be3f9f72555728de3ab975"
+
+ @test validate_census_file_integrity(metrodivision_file_download, "metrodivision", TOLERANCE_PARAMS["metrodivision"])
# -- rails
tigerdownload("rails", 2024; output=test_dir, force=true)
rails_file_download = joinpath(test_dir, "tl_2024_us_rails.zip")
- round(stat(rails_file_download).size / 1024, digits=2) # 516mb
- @test bytes2hex(SHA.sha256(read(rails_file_download))) ==
- "b0c19b22b1ee293062dba5dc05f57c2b6290c3df916aab8de62ff9344ebe9658"
+
+ @test validate_census_file_integrity(rails_file_download, "rails", TOLERANCE_PARAMS["rails"])
tigerdownload("primaryroads", 2024; output=test_dir, force=true)
primaryroads_file_download = joinpath(test_dir, "tl_2024_us_primaryroads.zip")
- round(stat(primaryroads_file_download).size / 1024, digits=2) # 516mb
- @test bytes2hex(SHA.sha256(read(primaryroads_file_download))) ==
- "d4f1b1cd981f440aee9980fdf991d4312a0bd03e7b2b2ae609a266bfc59ae786"
+
+ @test validate_census_file_integrity(primaryroads_file_download, "primaryroads", TOLERANCE_PARAMS["primaryroads"])
end
# --------------------------------------------------------------------------------------------------
@@ -68,9 +66,8 @@
# Download the county subdivisions shapefiles
tigerdownload("cousub", 2024; state="MN", county="", output=test_dir, force=true)
cousub_file_download = joinpath(test_dir, "tl_2024_27_cousub.zip")
- # stat(cousub_file_download)
- @test bytes2hex(SHA.sha256(read(cousub_file_download))) ==
- "b1cf4855fe102d9ebc34e165457986b8d906052868da0079ea650d39d973ec98"
+
+ @test validate_census_file_integrity(cousub_file_download, "cousub", TOLERANCE_PARAMS["cousub"])
# for all the states ...
tigerdownload("cousub", 2024; output=test_dir, force=false)
@@ -81,37 +78,32 @@
@test all(.!isfile.(filter(contains("tl_2024_74_cousub.zip"), cousub_file_list))) # there should be one missing file
cousub_file_download = filter(contains("tl_2024_28_cousub.zip"), cousub_file_list)[1]
- round(stat(cousub_file_download).size / 1024, digits=2)
- @test bytes2hex(SHA.sha256(read(cousub_file_download))) ==
- "f91963513bf14f64267fefc5ffda24161e879bfb76a48c19517eba0f85c638ba"
+
+ @test validate_census_file_integrity(cousub_file_download, "cousub", TOLERANCE_PARAMS["cousub"])
# -- tracts
tigerdownload("tract", 2024; state="27", county="", output=test_dir, force=true)
tract_file_download = joinpath(test_dir, "tl_2024_27_tract.zip")
- round(stat(tract_file_download).size / 1024, digits=2)
- @test bytes2hex(SHA.sha256(read(tract_file_download))) ==
- "83f784b2042d0af55723baaac37b2b29840d1485ac233b3bb73d6af4ec7246eb"
+
+ @test validate_census_file_integrity(tract_file_download, "tract", TOLERANCE_PARAMS["tract"])
# -- place
tigerdownload("place", 2024; state="27", county="", output=test_dir, force=true)
- tract_file_download = joinpath(test_dir, "tl_2024_27_place.zip")
- round(stat(tract_file_download).size / 1024, digits=2)
- @test bytes2hex(SHA.sha256(read(tract_file_download))) ==
- "f03383a2522009c63daae5b73164ac565fc37470539d1fc79c057ed5dc31c9c3"
+ place_file_download = joinpath(test_dir, "tl_2024_27_place.zip")
+ @test validate_census_file_integrity(place_file_download, "place", TOLERANCE_PARAMS["place"])
+
# -- concity ... not all states are available
tigerdownload("consolidatedcity", 2024; state="20", county="", output=test_dir, force=true)
consolidatedcity_file_download = joinpath(test_dir, "tl_2024_20_concity.zip")
- round(stat(consolidatedcity_file_download).size / 1024, digits=2)
- @test bytes2hex(SHA.sha256(read(consolidatedcity_file_download))) ==
- "510ee4a9d1e2bcf0dc8b87fc3c97f66e7afafbd5e4f1c2996d024c14c2eb7ab4"
+ @test validate_census_file_integrity(consolidatedcity_file_download, "consolidatedcity", TOLERANCE_PARAMS["consolidatedcity"])
+
# -- roads
tigerdownload("primarysecondaryroads", 2024; state="27", county="", output=test_dir, force=true)
road_file_download = joinpath(test_dir, "tl_2024_27_prisecroads.zip")
- round(stat(road_file_download).size / 1024, digits=2)
- @test bytes2hex(SHA.sha256(read(road_file_download))) ==
- "3c06a9b03ca06abf42db85b3b9ab3110d251d54ccf3d59335a2e5b98d2e6f52a"
+
+ @test validate_census_file_integrity(road_file_download, "primarysecondaryroads", TOLERANCE_PARAMS["primarysecondaryroads"])
@@ -127,9 +119,8 @@
# Download the areawater shapefiles
tigerdownload("areawater", 2024; state="MN", county="Hennepin", output=test_dir, force=true)
areawater_file_download = joinpath(test_dir, "tl_2024_27053_areawater.zip")
- # stat(cousub_file_download)
- @test bytes2hex(SHA.sha256(read(areawater_file_download))) ==
- "54a2825f26405fbb83bd4c5c7a96190867437bc46dc0d4a8155198890d63db54"
+
+ @test validate_census_file_integrity(areawater_file_download, "areawater", TOLERANCE_PARAMS["areawater"])
# Download the linear water shapefiles for all of Michigan
tigerdownload("linearwater", 2024; state="MI", output=test_dir, force=true)
@@ -139,16 +130,14 @@
@test all(isfile.(linearwater_file_list)) # test that all the files are there
linearwater_file_download = filter(contains("tl_2024_26089_linearwater.zip"), linearwater_file_list)[1]
- round(stat(linearwater_file_download).size / 1024, digits=2)
- @test bytes2hex(SHA.sha256(read(linearwater_file_download))) ==
- "b05a58ddb37abdc9287c533a6f87110ef4b153dc4fbd20833d3d1cf56470cba7"
+
+ @test validate_census_file_integrity(linearwater_file_download, "linearwater", TOLERANCE_PARAMS["linearwater"])
# roads
tigerdownload("road", 2024; state="MN", county="Hennepin", output=test_dir, force=true)
roads_file_download = joinpath(test_dir, "tl_2024_27053_roads.zip")
- round(stat(roads_file_download).size / 1024, digits=2)
- @test bytes2hex(SHA.sha256(read(roads_file_download))) ==
- "b828ad38a8bc3cd3299efcc7e3b333ec2954229392eb254a460e596c1db78511"
+
+ @test validate_census_file_integrity(roads_file_download, "road", TOLERANCE_PARAMS["road"])
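Note on the test changes: the SHA-256 comparisons are replaced by range and pattern checks, which presumably keeps the tests passing when the Census Bureau republishes a file with small changes (that motivation is an assumption; the diff itself only says "without hardcoded hashes"). The sketch below shows the two kinds of check in isolation. `within` is a hypothetical helper, and the bounds and pattern are copied from `TOLERANCE_PARAMS` and `expected_patterns` in `validation.jl`.

```julia
# Sketch of tolerance-style checks; `within` is a hypothetical helper.
# A bound of `nothing` means "not checked", mirroring ToleranceParams.
within(x, lo, hi) = (isnothing(lo) || x >= lo) && (isnothing(hi) || x <= hi)

# Feature-count check for the national state file (bounds 45..65).
state_ok = within(56, 45, 65)
too_few = within(10, 45, 65)
unchecked = within(123_456, nothing, nothing)

# Filename check for a county-scope roads archive.
roads_pattern = r"tl_\d{4}_\d{5}_roads\.zip$"
name_ok = occursin(roads_pattern, "tl_2024_27053_roads.zip")
name_bad = occursin(roads_pattern, "tl_2024_27_roads.zip")   # state code only, 2 digits
```

The trade-off is that range checks cannot detect a corrupted file whose size and record count happen to stay within bounds, which is why the suite also runs `unzip -t` on each archive.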
diff --git a/test/runtests.jl b/test/runtests.jl
@@ -1,3 +1,6 @@
+# ABOUTME: Test runner for TigerFetch.jl package
+# ABOUTME: Executes all test suites (assets, downloads) with verbose output
+
# --------------------------------------------------------------------------------------------------
using TigerFetch
using Test