2026-03-22-v0.5.0-hardening-and-extensions.md - FinanceRoutines.jl

      1 # FinanceRoutines.jl v0.5.0 — Hardening & Extensions
      2 
      3 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
      4 
      5 **Goal:** Fix all identified quality/robustness issues, restructure ImportYields.jl, add CI path filtering, create NEWS.md, and implement extensions (FF5, portfolio returns, diagnostics), releasing as v0.5.0. Event-study utilities are deferred (see the final section).
      6 
      7 **Architecture:** Fixes first (tasks 1–8), then extensions (tasks 9–11), then integration + release (tasks 12–13). Each task is independently testable and committable. The ImportYields.jl split (task 5) is the highest-risk refactor — it moves code into two new files without changing any public API.
      8 
      9 **Tech Stack:** Julia 1.10+, LibPQ, DataFrames, CSV, Downloads, ZipFile, Roots, LinearAlgebra, FlexiJoins, BazerData, GitHub Actions
     10 
     11 ---
     12 
     13 ## File Map
     14 
     15 ### Files to create
     16 - `src/GSW.jl` — GSW parameter struct, yield/price/forward/return calculations, DataFrame wrappers
     17 - `src/BondPricing.jl` — `bond_yield`, `bond_yield_excel`, day-count helpers
     18 - `src/ImportFamaFrench5.jl` — FF5 + momentum import functions
     19 - `src/PortfolioUtils.jl` — portfolio return calculations
     20 - `src/Diagnostics.jl` — data quality diagnostics
     21 - `test/UnitTests/FF5.jl` — tests for FF5/momentum imports
     22 - `test/UnitTests/PortfolioUtils.jl` — tests for portfolio returns
     23 - `test/UnitTests/Diagnostics.jl` — tests for diagnostics
     24 - `NEWS.md` — changelog
     25 
     26 ### Files to modify
     27 - `src/FinanceRoutines.jl` — update includes/exports
     28 - `src/Utilities.jl` — add retry logic, remove broken logging macro
     29 - `src/ImportFamaFrench.jl` — make parsing more robust, refactor for FF5 reuse
     30 - `src/ImportYields.jl` — expand missing-value flags in `_safe_parse_float` (Task 3), then split into GSW.jl + BondPricing.jl and DELETE (Task 5)
     31 - `src/ImportCRSP.jl` — replace `@log_msg` calls with `@debug` (Task 1)
     32 - `.github/workflows/CI.yml` — add path filters, consider macOS
     33 - `Project.toml` — bump to 0.5.0
     34 - `test/runtests.jl` — add new test suites
     35 
     36 ### Files to delete
     37 - `src/ImportYields.jl` — replaced by `src/GSW.jl` + `src/BondPricing.jl`
     38 
     39 ---
     40 
     41 ## Task 1: Remove broken logging macro & clean up Utilities.jl
     42 
     43 **Files:**
     44 - Modify: `src/Utilities.jl:70-102`
     45 - Modify: `src/FinanceRoutines.jl:19` (remove Logging import if unused)
     46 - Modify: `src/ImportCRSP.jl:303,381,400` (replace `@log_msg` calls)
     47 
     48 - [ ] **Step 1: Check all usages of `@log_msg` and `log_with_level`**
     49 
     50 Run: `grep -rn "log_msg\|log_with_level" src/`
     51 
     52 - [ ] **Step 2: Replace `@log_msg` calls in ImportCRSP.jl with `@debug`**
     53 
     54 In `src/ImportCRSP.jl`, replace the three `@log_msg` calls (lines 303, 381, 400) with `@debug`:
     55 ```julia
     56 # Before:
     57 @log_msg "# -- GETTING MONTHLY STOCK FILE (CIZ) ... msf_v2"
     58 # After:
     59 @debug "Getting monthly stock file (CIZ) ... msf_v2"
     60 ```
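
One consequence of switching to `@debug` is that these messages are hidden at the default `Info` log level. The sketch below shows one way a caller could surface them in their own session. It assumes a stand-in function `fetch_with_debug` and a helper `capture_logs`, both hypothetical and not part of the package, and uses the `Logging` standard library only on the caller's side.

```julia
using Logging

# Hypothetical stand-in for a package function that emits a debug message.
function fetch_with_debug()
    @debug "Getting monthly stock file (CIZ) ... msf_v2"
    return :done
end

# Hypothetical helper: run `f` under a ConsoleLogger with the given minimum
# level and return everything it logged as a string.
function capture_logs(f, min_level)
    buf = IOBuffer()
    with_logger(ConsoleLogger(buf, min_level)) do
        f()
    end
    return String(take!(buf))
end

debug_output = capture_logs(fetch_with_debug, Logging.Debug)  # message is recorded
info_output  = capture_logs(fetch_with_debug, Logging.Info)   # message is filtered out
```

Setting the environment variable `JULIA_DEBUG=FinanceRoutines` before starting Julia is another way to enable debug messages for a single module.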
     61 
     62 - [ ] **Step 3: Remove `log_with_level` and `@log_msg` from Utilities.jl**
     63 
     64 Delete the `log_with_level` function and the `@log_msg` macro (roughly lines 69–102; the Files list above cites 70–102, so confirm the exact range before deleting).
     65 
     66 - [ ] **Step 4: Clean up Logging import and stale export in FinanceRoutines.jl**
     67 
     68 Remove the entire `import Logging: ...` line from `FinanceRoutines.jl` (line 19). `@debug` and `@warn` are available from `Base.CoreLogging` without explicit import. Also remove `Logging` from the `[deps]` section of `Project.toml`. Also remove the stale `export greet_FinanceRoutines` (line 45) — this function is not defined anywhere.
     69 
     70 - [ ] **Step 5: Run tests to verify nothing breaks**
     71 
     72 Run: `julia --project=. -e 'using Pkg; Pkg.test()'` (run locally; the `[skip ci]` tag goes on the Step 6 commit since this is internal cleanup)
     73 
     74 - [ ] **Step 6: Commit**
     75 
     76 ```bash
     77 git add src/Utilities.jl src/ImportCRSP.jl src/FinanceRoutines.jl Project.toml
     78 git commit -m "Remove broken @log_msg macro, replace with @debug [skip ci]"
     79 ```
     80 
     81 ---
     82 
     83 ## Task 2: Add WRDS connection retry logic
     84 
     85 **Files:**
     86 - Modify: `src/Utilities.jl:19-29` (the `open_wrds_pg(user, password)` method)
     87 
     88 - [ ] **Step 1: Specify the retry behavior (verified by code review)**
     89 
     90 This is hard to unit test (requires WRDS), so we verify by code review. The retry logic should:
     91 - Attempt up to 3 connections
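     92 - Exponential backoff: 1s after the first failure, 2s after the second (a 4s delay applies only if `max_retries` is raised to 4 or more)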
     93 - Log warnings on retry
     94 - Rethrow on final failure
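
Although a live WRDS connection cannot be unit tested, the retry policy itself can be checked offline if the connector and the sleep function are injected. The following is a minimal sketch under that assumption; `with_retry`, `flaky`, and the `sleep_fn` keyword are hypothetical names for illustration, not part of the package or of LibPQ.

```julia
# Hypothetical generic retry helper. `connect` and `sleep_fn` are injected so
# that a test can substitute fakes for the real connection and for `sleep`.
function with_retry(connect; max_retries::Int=3, base_delay::Real=1.0, sleep_fn=sleep)
    max_retries >= 1 || throw(ArgumentError("max_retries must be at least 1"))
    for attempt in 1:max_retries
        try
            return connect()
        catch
            # Final attempt: propagate the original exception.
            attempt == max_retries && rethrow()
            delay = base_delay * 2^(attempt - 1)
            @warn "Attempt $attempt/$max_retries failed, retrying in $(delay)s"
            sleep_fn(delay)
        end
    end
end

# Fake connector that fails twice and then succeeds.
calls = Ref(0)
delays = Float64[]
flaky() = (calls[] += 1; calls[] < 3 ? error("connection refused") : :connected)

# Record the requested delays instead of actually sleeping.
result = with_retry(flaky; sleep_fn = d -> push!(delays, d))
```

With three attempts only two delays are ever requested, which matches the bullet list above: the last failure rethrows immediately rather than sleeping.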
     95 
     96 - [ ] **Step 2: Add retry wrapper to `open_wrds_pg`**
     97 
     98 Replace the `open_wrds_pg(user, password)` function:
     99 
    100 ```julia
    101 function open_wrds_pg(user::AbstractString, password::AbstractString;
    102                       max_retries::Int=3, base_delay::Float64=1.0)
    103     conn_str = """
    104         host = wrds-pgdata.wharton.upenn.edu
    105         port = 9737
    106         user='$user'
    107         password='$password'
    108         sslmode = 'require' dbname = wrds
    109     """
    110     for attempt in 1:max_retries
    111         try
    112             return Connection(conn_str)
    113         catch e
    114             if attempt == max_retries
    115                 rethrow(e)
    116             end
    117             delay = base_delay * 2^(attempt - 1)
    118             @warn "WRDS connection attempt $attempt/$max_retries failed, retrying in $(delay)s" exception=e
    119             sleep(delay)
    120         end
    121     end
    122 end
    123 ```
    124 
    125 - [ ] **Step 3: Verify the package loads and existing tests still pass**
    126 
    127 Run: `julia --project=. -e 'using FinanceRoutines'`, then `julia --project=. -e 'using Pkg; Pkg.test()'`
    128 
    129 - [ ] **Step 4: Commit**
    130 
    131 ```bash
    132 git add src/Utilities.jl
    133 git commit -m "Add retry logic with exponential backoff for WRDS connections"
    134 ```
    135 
    136 ---
    137 
    138 ## Task 3: Expand missing-value flags in `_safe_parse_float`
    139 
    140 **Files:**
    141 - Modify: `src/ImportYields.jl:281-309` (will move to GSW.jl in task 5, but fix first)
    142 
    143 - [ ] **Step 1: Write test for expanded flags**
    144 
    145 Add to `test/UnitTests/Yields.jl` inside a new testset:
    146 
    147 ```julia
    148 @testset "Missing value flag handling" begin
    149     @test ismissing(FinanceRoutines._safe_parse_float(-999.99))
    150     @test ismissing(FinanceRoutines._safe_parse_float(-999.0))
    151     @test ismissing(FinanceRoutines._safe_parse_float(-9999.0))
    152     @test ismissing(FinanceRoutines._safe_parse_float(-99.99))
    153     @test !ismissing(FinanceRoutines._safe_parse_float(-5.0))  # legitimate negative
    154     @test FinanceRoutines._safe_parse_float(3.14) ≈ 3.14
    155     @test ismissing(FinanceRoutines._safe_parse_float(""))
    156     @test ismissing(FinanceRoutines._safe_parse_float(missing))
    157 end
    158 ```
    159 
    160 - [ ] **Step 2: Run test to verify it fails**
    161 
    162 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
    163 Expected: FAIL on `-999.0`, `-9999.0`, and `-99.99` (only `-999.99` is handled before this change)
    164 
    165 - [ ] **Step 3: Update `_safe_parse_float`**
    166 
    167 ```julia
    168 function _safe_parse_float(value)
    169     if ismissing(value) || value == ""
    170         return missing
    171     end
    172 
    173     if value isa AbstractString
    174         parsed = tryparse(Float64, strip(value))
    175         if isnothing(parsed)
    176             return missing
    177         end
    178         value = parsed
    179     end
    180 
    181     try
    182         numeric_value = Float64(value)
    183         # Common missing data flags in economic/financial datasets
    184         if numeric_value in (-999.99, -999.0, -9999.0, -99.99)
    185             return missing
    186         end
    187         return numeric_value
    188     catch
    189         return missing
    190     end
    191 end
    192 ```
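
One caveat with exact matching: a flag that arrives as a `Float32` will not compare equal to its `Float64` literal after conversion. Whether the GSW data ever arrives in single precision is not stated here, so the tolerant variant below is only a sketch under that assumption. `MISSING_FLAGS`, `is_flag_exact`, and `is_flag_approx` are hypothetical names, and the tolerance `1e-3` is an arbitrary illustrative choice.

```julia
const MISSING_FLAGS = (-999.99, -999.0, -9999.0, -99.99)

# Exact comparison, mirroring the membership test in `_safe_parse_float`.
is_flag_exact(x::Real) = Float64(x) in MISSING_FLAGS

# Tolerant comparison (an assumption, not part of the plan): treats values
# within `atol` of a flag as that flag.
is_flag_approx(x::Real; atol::Real=1e-3) =
    any(f -> isapprox(x, f; atol=atol), MISSING_FLAGS)

single = Float32(-999.99)  # rounded to single precision
```

The trade-off is that a tolerance can also swallow a legitimate value that happens to sit very close to a flag, so exact matching remains the safer default when inputs are known to be `Float64`.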
    193 
    194 - [ ] **Step 4: Run test to verify it passes**
    195 
    196 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
    197 Expected: PASS
    198 
    199 - [ ] **Step 5: Commit**
    200 
    201 ```bash
    202 git add src/ImportYields.jl test/UnitTests/Yields.jl
    203 git commit -m "Expand missing-value flags to cover -999, -9999, -99.99"
    204 ```
    205 
    206 ---
    207 
    208 ## Task 4: Make Ken French parsing more robust
    209 
    210 **Files:**
    211 - Modify: `src/ImportFamaFrench.jl:118-159` (`_parse_ff_annual`)
    212 - Modify: `src/ImportFamaFrench.jl:164-205` (`_parse_ff_monthly`)
    213 
    214 - [ ] **Step 1: Refactor `_parse_ff_annual` to use data-pattern detection**
    215 
    216 Instead of `occursin(r"Annual Factors", line)`, detect the annual section by:
    217 1. Skip past the monthly data (lines starting with 6-digit YYYYMM)
    218 2. Find the next block of lines starting with 4-digit YYYY
    219 
    220 ```julia
    221 function _parse_ff_annual(zip_file; types=nothing)
    222     file_lines = split(String(read(zip_file)), '\n')
    223 
    224     # Find annual data: lines starting with a 4-digit year that are NOT 6-digit monthly dates
    225     # Annual section comes after monthly section
    226     found_monthly = false
    227     past_monthly = false
    228     lines = String[]
    229 
    230     for line in file_lines
    231         stripped = strip(line)
    232 
    233         # Track when we're past the monthly data section
    234         if !found_monthly && occursin(r"^\s*\d{6}", stripped)
    235             found_monthly = true
    236             continue
    237         end
    238 
    239         if found_monthly && !past_monthly
    240             # Still in monthly section until we hit a non-data line
    241             if occursin(r"^\s*\d{6}", stripped)
    242                 continue
    243             elseif !occursin(r"^\s*$", stripped) && !occursin(r"^\s*\d", stripped)
    244                 past_monthly = true
    245                 continue
    246             else
    247                 continue
    248             end
    249         end
    250 
    251         if past_monthly
    252             # Look for annual data lines (4-digit year)
    253             if occursin(r"^\s*\d{4}\s*,", stripped)
    254                 push!(lines, replace(stripped, r"[\r]" => ""))
    255             elseif !isempty(lines) && occursin(r"^\s*$", stripped)
    256                 break  # End of annual section
    257             end
    258         end
    259     end
    260 
    261     if isempty(lines)
    262         error("Annual Factors section not found in file")
    263     end
    264 
    265     lines_buffer = IOBuffer(join(lines, "\n"))
    266     return CSV.File(lines_buffer, header=false, delim=",", ntasks=1, types=types) |> DataFrame |>
    267            df -> rename!(df, [:datey, :mktrf, :smb, :hml, :rf])
    268 end
    269 ```
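
The pattern-detection idea can be sanity-checked offline on a small synthetic file before touching the real download. The sketch below is a simplified, standalone version of the same state machine; `annual_lines` and `sample` are hypothetical, and the numbers in `sample` are made up for illustration, not real factor data.

```julia
# Simplified version of the detection logic: skip the monthly block
# (6-digit YYYYMM rows), then collect the 4-digit YYYY rows that follow.
function annual_lines(text::AbstractString)
    in_monthly = false
    past_monthly = false
    out = String[]
    for raw in split(text, '\n')
        line = strip(raw)
        if !past_monthly
            if occursin(r"^\d{6}", line)
                in_monthly = true
            elseif in_monthly && !isempty(line) && !occursin(r"^\d", line)
                # First non-blank, non-numeric line after the monthly rows.
                past_monthly = true
            end
            continue
        end
        if occursin(r"^\d{4}\s*,", line)
            push!(out, String(line))
        elseif !isempty(out) && isempty(line)
            break  # blank line closes the annual block
        end
    end
    return out
end

# Synthetic sample with the same layout as the FF3 CSV (values are invented).
sample = """
Header text describing the file

,Mkt-RF,SMB,HML,RF
192607,    1.00,    2.00,    3.00,    0.10
192608,    1.50,    2.50,    3.50,    0.20

 Annual Factors: January-December
,Mkt-RF,SMB,HML,RF
  1927,   10.00,   20.00,   30.00,    1.00
  1928,   11.00,   21.00,   31.00,    2.00

Footer text
"""

rows = annual_lines(sample)
```

Note that the section title is used only as "a non-numeric line after the monthly rows", so renaming "Annual Factors" upstream would not break the detection.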
    270 
    271 - [ ] **Step 2: Run Ken French tests**
    272 
    273 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
    274 Expected: PASS
    275 
    276 - [ ] **Step 3: Commit**
    277 
    278 ```bash
    279 git add src/ImportFamaFrench.jl
    280 git commit -m "Make FF3 parsing use data patterns instead of hardcoded headers"
    281 ```
    282 
    283 ---
    284 
    285 ## Task 5: Split ImportYields.jl into GSW.jl + BondPricing.jl
    286 
    287 This is the largest refactor. No API changes — just file reorganization.
    288 
    289 **Files:**
    290 - Create: `src/GSW.jl` — everything from ImportYields.jl lines 1–1368 (GSWParameters struct, all gsw_* functions, DataFrame wrappers, helpers)
    291 - Create: `src/BondPricing.jl` — everything from ImportYields.jl lines 1371–1694 (bond_yield, bond_yield_excel, day-count functions)
    292 - Delete: `src/ImportYields.jl`
    293 - Modify: `src/FinanceRoutines.jl` — update includes
    294 
    295 - [ ] **Step 1: Create `src/GSW.jl`**
    296 
    297 Copy lines 1–1368 from ImportYields.jl into `src/GSW.jl`. This includes:
    298 - `GSWParameters` struct and constructors
    299 - `is_three_factor_model`, `_extract_params`
    300 - `import_gsw_parameters`, `_clean_gsw_data`, `_safe_parse_float`, `_validate_gsw_data`
    301 - `gsw_yield`, `gsw_price`, `gsw_forward_rate`
    302 - `gsw_yield_curve`, `gsw_price_curve`
    303 - `gsw_return`, `gsw_excess_return`
    304 - `add_yields!`, `add_prices!`, `add_returns!`, `add_excess_returns!`
    305 - `gsw_curve_snapshot`
    306 - `_validate_gsw_dataframe`, `_maturity_to_column_name`
    307 
    308 - [ ] **Step 2: Create `src/BondPricing.jl`**
    309 
    310 Copy lines 1371–1694 from ImportYields.jl into `src/BondPricing.jl`. This includes:
    311 - `bond_yield_excel`
    312 - `bond_yield`
    313 - `_day_count_days`
    314 - `_date_difference`
    315 
    316 - [ ] **Step 3: Update `src/FinanceRoutines.jl`**
    317 
    318 Replace `include("ImportYields.jl")` with:
    319 ```julia
    320 include("GSW.jl")
    321 include("BondPricing.jl")
    322 ```
    323 
    324 - [ ] **Step 4: Delete `src/ImportYields.jl`**
    325 
    326 - [ ] **Step 5: Run the Yields test suite to verify nothing broke**
    327 
    328 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
    329 Expected: All 70+ assertions PASS
    330 
    331 - [ ] **Step 6: Commit**
    332 
    333 ```bash
    334 git add src/GSW.jl src/BondPricing.jl src/FinanceRoutines.jl
    335 git rm src/ImportYields.jl
    336 git commit -m "Split ImportYields.jl into GSW.jl and BondPricing.jl (no API changes)"
    337 ```
    338 
    339 ---
    340 
    341 ## Task 6: Add CI path filters and macOS runner
    342 
    343 **Files:**
    344 - Modify: `.github/workflows/CI.yml`
    345 
    346 - [ ] **Step 1: Add path filters to CI.yml**
    347 
    348 ```yaml
    349 on:
    350   push:
    351     branches:
    352       - main
    353     tags:
    354       - "*"
    355     paths:
    356       - 'src/**'
    357       - 'test/**'
    358       - 'Project.toml'
    359       - '.github/workflows/CI.yml'
    360   pull_request:
    361     paths:
    362       - 'src/**'
    363       - 'test/**'
    364       - 'Project.toml'
    365       - '.github/workflows/CI.yml'
    366 ```
    367 
    368 - [ ] **Step 2: Add macOS to the matrix**
    369 
    370 ```yaml
    371 matrix:
    372   version:
    373     - "1.11"
    374     - nightly
    375   os:
    376     - ubuntu-latest
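    377     - macos-latest  # Apple Silicon runner; `x64` here may need Rosetta or a separate `aarch64` entry, verify before merging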
    378   arch:
    379     - x64
    380 ```
    381 
    382 - [ ] **Step 3: Commit**
    383 
    384 ```bash
    385 git add .github/workflows/CI.yml
    386 git commit -m "Add CI path filters and macOS runner [skip ci]"
    387 ```
    388 
    389 ---
    390 
    391 ## Task 7: Clarify env parsing in test/runtests.jl
    392 
    393 **Files:**
    394 - Modify: `test/runtests.jl:33`
    395 
    396 Line 33 uses `!startswith(line, "#") || continue` which correctly skips comment lines (the `||` evaluates `continue` when the left side is `false`, i.e. when the line IS a comment). This is logically correct but reads awkwardly. Rewrite to the more idiomatic `&&` form for clarity.
    397 
    398 - [ ] **Step 1: Rewrite for readability**
    399 
    400 ```julia
    401 # Before (correct but hard to read):
    402 !startswith(line, "#") || continue
    403 # After (same logic, clearer):
    404 startswith(line, "#") && continue
    405 ```
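
For context, the `&&`/`||` short-circuit idiom reads naturally as a chain of guard clauses. The sketch below shows it in a small env-file parser; `parse_env_lines` and the key `WRDS_USER` are hypothetical, and the actual parsing in `test/runtests.jl` may differ.

```julia
# Hypothetical env-file parser using short-circuit guards.
function parse_env_lines(lines)
    env = Dict{String,String}()
    for raw in lines
        line = strip(raw)
        isempty(line) && continue           # skip blank lines
        startswith(line, "#") && continue   # skip comments
        occursin('=', line) || continue     # skip lines without KEY=VALUE
        key, value = split(line, '='; limit=2)
        env[strip(key)] = strip(value)
    end
    return env
end

env = parse_env_lines(["# credentials", "WRDS_USER=alice", "", "X = a=b"])
```

Reading `cond && continue` as "if `cond`, skip" and `cond || continue` as "unless `cond`, skip" keeps each guard positive, which is why the `&&` form is preferred for the comment check.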
    406 
    407 - [ ] **Step 2: Commit**
    408 
    409 ```bash
    410 git add test/runtests.jl
    411 git commit -m "Clarify env parsing idiom in test runner [skip ci]"
    412 ```
    413 
    414 ---
    415 
    416 ## Task 8: Create NEWS.md
    417 
    418 **Files:**
    419 - Create: `NEWS.md`
    420 
    421 - [ ] **Step 1: Create NEWS.md with v0.5.0 changelog**
    422 
    423 ```markdown
    424 # FinanceRoutines.jl Changelog
    425 
    426 ## v0.5.0
    427 
    428 ### Breaking changes
    429 - `ImportYields.jl` split into `GSW.jl` (yield curve model) and `BondPricing.jl` (bond math). No public API changes, but code that `include`d `ImportYields.jl` directly will need updating.
    430 - Missing-value flags expanded: `-999.0`, `-9999.0`, `-99.99` now treated as missing in GSW data (previously only `-999.99`). **Migration note:** if your downstream code used these numeric values (e.g., `-999.0` as an actual number), they will now silently become `missing`. Check any filtering or aggregation that might be affected.
    431 
    432 ### New features
    433 - `import_FF5`: Import Fama-French 5-factor model data (market, size, value, profitability, investment)
    434 - `import_FF_momentum`: Import Fama-French momentum factor
    435 - `calculate_portfolio_returns`: Value-weighted and equal-weighted portfolio return calculations
    436 - `diagnose`: Data quality diagnostics for financial DataFrames
    437 - WRDS connections now retry up to 3 times with exponential backoff
    438 
    439 ### Internal improvements
    440 - Removed broken `@log_msg` macro, replaced with `@debug`
    441 - Removed stale `export greet_FinanceRoutines` (function was never defined)
    442 - Removed `Logging` from dependencies (macros available from Base)
    443 - Ken French file parsing generalized with shared helpers for FF3/FF5 reuse
    444 - CI now filters by path (skips runs for docs-only changes)
    445 - CI matrix includes macOS
    446 ```
    447 
    448 - [ ] **Step 2: Commit**
    449 
    450 ```bash
    451 git add NEWS.md
    452 git commit -m "Add NEWS.md for v0.5.0 [skip ci]"
    453 ```
    454 
    455 ---
    456 
    457 ## Task 9: Add Fama-French 5-factor and Momentum imports
    458 
    459 **Files:**
    460 - Create: `src/ImportFamaFrench5.jl`
    461 - Modify: `src/FinanceRoutines.jl` (add include + exports)
    462 - Create: `test/UnitTests/FF5.jl`
    463 - Modify: `test/runtests.jl` (add "FF5" to testsuite)
    464 
    465 The FF5 and momentum files follow the same zip+CSV format as FF3 on Ken French's site.
    466 
    467 - [ ] **Step 1: Write failing tests**
    468 
    469 ```julia
    470 # test/UnitTests/FF5.jl
    471 @testset "Importing Fama-French 5 factors and Momentum" begin
    472     import Dates
    473 
    474     # FF5 monthly
    475     df_FF5_monthly = import_FF5(frequency=:monthly)
    476     @test names(df_FF5_monthly) == ["datem", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    477     @test nrow(df_FF5_monthly) >= (Dates.year(Dates.today()) - 1963 - 1) * 12
    478 
    479     # FF5 annual
    480     df_FF5_annual = import_FF5(frequency=:annual)
    481     @test names(df_FF5_annual) == ["datey", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    482     @test nrow(df_FF5_annual) >= Dates.year(Dates.today()) - 1963 - 2
    483 
    484     # FF5 daily
    485     df_FF5_daily = import_FF5(frequency=:daily)
    486     @test names(df_FF5_daily) == ["date", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    487     @test nrow(df_FF5_daily) >= 15_000
    488 
    489     # Momentum monthly
    490     df_mom_monthly = import_FF_momentum(frequency=:monthly)
    491     @test "mom" in names(df_mom_monthly)
    492     @test nrow(df_mom_monthly) > 1000
    493 end
    494 ```
    495 
    496 - [ ] **Step 2: Run tests to verify they fail**
    497 
    498 Expected: `import_FF5` and `import_FF_momentum` not defined
    499 
    500 - [ ] **Step 3: Generalize `_parse_ff_annual` and `_parse_ff_monthly` to accept `col_names`**
    501 
    502 Before writing FF5, first update the existing parsers in `src/ImportFamaFrench.jl` to accept a `col_names` keyword argument. Default to the FF3 column names so `import_FF3` continues to work unchanged.
    503 
    504 ```julia
    505 # In _parse_ff_annual:
    506 function _parse_ff_annual(zip_file; types=nothing, col_names=[:datey, :mktrf, :smb, :hml, :rf])
    507     # ... existing logic ...
    508     return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
    509 end
    510 
    511 # In _parse_ff_monthly:
    512 function _parse_ff_monthly(zip_file; types=nothing, col_names=[:datem, :mktrf, :smb, :hml, :rf])
    513     # ... existing logic ...
    514     return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
    515 end
    516 ```
    517 
    518 Also extract a shared `_download_and_parse_ff_zip` helper to DRY up the download+zip+parse logic shared by FF3 and FF5.
    519 
    520 - [ ] **Step 4: Run existing KenFrench tests to verify no regression**
    521 
    522 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
    523 Expected: PASS (existing FF3 behavior unchanged)
    524 
    525 - [ ] **Step 5: Implement `import_FF5` in `src/ImportFamaFrench5.jl`**
    526 
    527 The FF5 file URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip`
    528 Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip`
    529 
    530 Uses the generalized `_download_and_parse_ff_zip` helper with 7-column names:
    531 
    532 ```julia
    533 function import_FF5(; frequency::Symbol=:monthly)
    534     ff_col_classes = [String7, Float64, Float64, Float64, Float64, Float64, Float64]
    535     url_mth_yr = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip"
    536     url_daily  = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip"
    537     col_names_mth = [:datem, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    538     col_names_yr  = [:datey, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    539     col_names_day = [:date, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    540     # ... uses shared helper
    541 end
    542 ```
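
The elided body has to choose a URL and a column-name vector from `frequency` and reject anything else. A minimal sketch of that dispatch, kept separate from downloading and parsing so it can be tested without network access, might look like the following. `ff5_source`, `FF_FTP`, and `FF5_FACTORS` are hypothetical names; the URLs and column names are the ones listed above.

```julia
const FF_FTP = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
const FF5_FACTORS = [:mktrf, :smb, :hml, :rmw, :cma, :rf]

# Hypothetical helper: map a frequency to its zip URL and column names.
# Monthly and annual data live in the same zip file.
function ff5_source(frequency::Symbol)
    if frequency == :monthly
        return (url = FF_FTP * "F-F_Research_Data_5_Factors_2x3_CSV.zip",
                col_names = [:datem; FF5_FACTORS])
    elseif frequency == :annual
        return (url = FF_FTP * "F-F_Research_Data_5_Factors_2x3_CSV.zip",
                col_names = [:datey; FF5_FACTORS])
    elseif frequency == :daily
        return (url = FF_FTP * "F-F_Research_Data_5_Factors_2x3_daily_CSV.zip",
                col_names = [:date; FF5_FACTORS])
    else
        throw(ArgumentError("frequency must be :monthly, :annual, or :daily, got :$frequency"))
    end
end

src = ff5_source(:annual)
```

Failing fast on an unknown frequency avoids a confusing download or parsing error later in the pipeline.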
    543 
    544 - [ ] **Step 6: Implement `import_FF_momentum`**
    545 
    546 Momentum URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_CSV.zip`
    547 Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip`
    548 
    549 Single factor file, columns: date, mom
    550 
    551 - [ ] **Step 7: Add exports to FinanceRoutines.jl**
    552 
    553 ```julia
    554 include("ImportFamaFrench5.jl")
    555 export import_FF5, import_FF_momentum
    556 ```
    557 
    558 - [ ] **Step 8: Run tests**
    559 
    560 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/FF5.jl")'`
    561 Expected: PASS
    562 
    563 - [ ] **Step 9: Commit**
    564 
    565 ```bash
    566 git add src/ImportFamaFrench.jl src/ImportFamaFrench5.jl src/FinanceRoutines.jl test/UnitTests/FF5.jl test/runtests.jl
    567 git commit -m "Add import_FF5 and import_FF_momentum for 5-factor model and momentum"
    568 ```
    569 
    570 ---
    571 
    572 ## Task 10: Add portfolio return calculations
    573 
    574 **Files:**
    575 - Create: `src/PortfolioUtils.jl`
    576 - Modify: `src/FinanceRoutines.jl` (add include + exports)
    577 - Create: `test/UnitTests/PortfolioUtils.jl`
    578 - Modify: `test/runtests.jl`
    579 
    580 - [ ] **Step 1: Write failing tests**
    581 
    582 ```julia
    583 # test/UnitTests/PortfolioUtils.jl
    584 @testset "Portfolio Return Calculations" begin
    585     import Dates: Date, Month
    586     import DataFrames: DataFrame, groupby, combine, nrow, transform!
    587 
    588     # Create test data: 3 stocks, 12 months
    589     dates = repeat(Date(2020,1,1):Month(1):Date(2020,12,1), inner=3)
    590     df = DataFrame(
    591         datem = dates,
    592         permno = repeat([1, 2, 3], 12),
    593         ret = rand(36) .* 0.1 .- 0.05,
    594         mktcap = [100.0, 200.0, 300.0, # weights sum to 600
    595                   repeat([100.0, 200.0, 300.0], 11)...]
    596     )
    597 
    598     # Equal-weighted returns
    599     df_ew = calculate_portfolio_returns(df, :ret, :datem; weighting=:equal)
    600     @test nrow(df_ew) == 12
    601     @test "port_ret" in names(df_ew)
    602 
    603     # Value-weighted returns
    604     df_vw = calculate_portfolio_returns(df, :ret, :datem;
    605                                          weighting=:value, weight_col=:mktcap)
    606     @test nrow(df_vw) == 12
    607     @test "port_ret" in names(df_vw)
    608 
    609     # Grouped portfolios (e.g., by size quintile)
    610     df.group = repeat([1, 1, 2], 12)
    611     df_grouped = calculate_portfolio_returns(df, :ret, :datem;
    612                                               weighting=:value, weight_col=:mktcap,
    613                                               groupby=:group)
    614     @test nrow(df_grouped) == 24  # 12 months x 2 groups
    615 end
    616 ```
    617 
    618 - [ ] **Step 2: Implement `calculate_portfolio_returns`**
    619 
    620 ```julia
    621 """
    622     calculate_portfolio_returns(df, ret_col, date_col;
    623         weighting=:value, weight_col=nothing, groupby=nothing)
    624 
    625 Calculate portfolio returns from individual stock returns.
    626 
    627 # Arguments
    628 - `df::DataFrame`: Panel data with stock returns
    629 - `ret_col::Symbol`: Column name for returns
    630 - `date_col::Symbol`: Column name for dates
    631 - `weighting::Symbol`: `:equal` or `:value`
    632 - `weight_col::Union{Nothing,Symbol}`: Column for weights (required if weighting=:value)
    633 - `groupby::Union{Nothing,Symbol,Vector{Symbol}}`: Optional grouping columns
    634 
    635 # Returns
    636 - `DataFrame`: Portfolio returns by date (and group if specified)
    637 """
    638 function calculate_portfolio_returns(df::AbstractDataFrame, ret_col::Symbol, date_col::Symbol;
    639     weighting::Symbol=:value, weight_col::Union{Nothing,Symbol}=nothing,
    640     groupby::Union{Nothing,Symbol,Vector{Symbol}}=nothing)
    641 
    642     if weighting == :value && isnothing(weight_col)
    643         throw(ArgumentError("weight_col required for value-weighted portfolios"))
    644     end
    645 
    646     group_cols = isnothing(groupby) ? [date_col] : vcat([date_col], groupby isa Symbol ? [groupby] : groupby)
    647 
    648     grouped = DataFrames.groupby(df, group_cols)
    649 
    650     if weighting == :equal
    651         return combine(grouped, ret_col => (r -> mean(skipmissing(r))) => :port_ret)
    652     else
    653         return combine(grouped,
    654             [ret_col, weight_col] => ((r, w) -> begin
    655                 valid = .!ismissing.(r) .& .!ismissing.(w)
    656                 any(valid) || return missing
    657                 rv, wv = r[valid], w[valid]
    658                 sum(rv .* wv) / sum(wv)
    659             end) => :port_ret)
    660     end
    661 end
    662 ```
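
The tests in Step 1 only check shapes, so a hand-computed value is a useful complement. The sketch below isolates the value-weighting formula on plain vectors, with no DataFrames dependency; `weighted_return` is a hypothetical standalone copy of the anonymous function used above.

```julia
# Value-weighted mean of `r` with weights `w`, dropping any pair where
# either the return or the weight is missing.
function weighted_return(r, w)
    valid = .!ismissing.(r) .& .!ismissing.(w)
    any(valid) || return missing
    rv, wv = r[valid], w[valid]
    return sum(rv .* wv) / sum(wv)
end

r = [0.10, 0.20, missing]
w = [100.0, 300.0, 600.0]

# Only the first two stocks count: (0.10*100 + 0.20*300) / (100 + 300)
vw = weighted_return(r, w)
```

Because the third stock is dropped along with its weight, the remaining weights are effectively renormalized to sum to one, giving roughly 0.175 here.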
    663 
    664 - [ ] **Step 3: Add dependencies and imports**
    665 
    666 First, move `Statistics` from `[extras]` to `[deps]` in `Project.toml` (it's currently test-only but `calculate_portfolio_returns` uses `mean` at runtime). Add the UUID line to `[deps]`:
    667 ```toml
    668 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
    669 ```
    670 Keep it in `[extras]` too (valid and harmless in Julia 1.10+).
    671 
    672 Then update `src/FinanceRoutines.jl`:
    673 - Add `import Statistics: mean` to the imports
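    674 - Add `combine` to the `import DataFrames:` line (used by `calculate_portfolio_returns`), and make sure `AbstractDataFrame` and the module name itself are in scope (e.g. via a bare `import DataFrames`), since the function calls the qualified `DataFrames.groupby` because its `groupby` keyword shadows the function name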
    675 - Add include and export:
    676 ```julia
    677 include("PortfolioUtils.jl")
    678 export calculate_portfolio_returns
    679 ```
    680 
    681 - [ ] **Step 4: Run tests**
    682 
    683 Expected: PASS
    684 
    685 - [ ] **Step 5: Commit**
    686 
    687 ```bash
    688 git add Project.toml src/PortfolioUtils.jl src/FinanceRoutines.jl test/UnitTests/PortfolioUtils.jl test/runtests.jl
    689 git commit -m "Add calculate_portfolio_returns for equal/value-weighted portfolios"
    690 ```
    691 
    692 ---
    693 
    694 ## Task 11: Add data quality diagnostics
    695 
    696 **Files:**
    697 - Create: `src/Diagnostics.jl`
    698 - Modify: `src/FinanceRoutines.jl` (add include + exports)
    699 - Create: `test/UnitTests/Diagnostics.jl`
    700 - Modify: `test/runtests.jl`
    701 
    702 - [ ] **Step 1: Write failing tests**
    703 
    704 ```julia
    705 # test/UnitTests/Diagnostics.jl
    706 @testset "Data Quality Diagnostics" begin
    707     import DataFrames: DataFrame, allowmissing!
    708     import Dates: Date
    709     # Create test data with known issues
    710     df = DataFrame(
    711         permno = [1, 1, 1, 2, 2, 2],
    712         date = [Date(2020,1,1), Date(2020,2,1), Date(2020,2,1),  # duplicate for permno 1
    713                 Date(2020,1,1), Date(2020,3,1), Date(2020,4,1)],  # gap for permno 2
    714         ret = [0.05, missing, 0.03, -1.5, 0.02, 150.0],  # suspicious: -1.5, 150.0
    715         prc = [10.0, 20.0, 20.0, -5.0, 30.0, 40.0]  # negative price
    716     )
    717     allowmissing!(df, :ret)
    718 
    719     report = diagnose(df)
    720 
    721     @test haskey(report, :missing_rates)
    722     @test haskey(report, :suspicious_values)
    723     @test haskey(report, :duplicate_keys)
    724     @test report[:missing_rates][:ret] > 0
    725     @test length(report[:suspicious_values]) > 0
    726 end
    727 ```
    728 
    729 - [ ] **Step 2: Implement `diagnose`**
    730 
    731 ```julia
    732 """
    733     diagnose(df; id_col=:permno, date_col=:date, ret_col=:ret, price_col=:prc)
    734 
    735 Run data quality diagnostics on a financial DataFrame.
    736 
    737 Returns a Dict with:
    738 - `:missing_rates` — fraction missing per column
    739 - `:suspicious_values` — summary strings counting returns outside [-100%, +100%] and negative prices
    740 - `:duplicate_keys` — number of rows that repeat an earlier (id, date) pair (set only when both key columns exist)
    741 - `:nrow`, `:ncol` — dimensions
    742 """
    743 function diagnose(df::AbstractDataFrame;
    744     id_col::Symbol=:permno, date_col::Symbol=:date,
    745     ret_col::Union{Nothing,Symbol}=:ret,
    746     price_col::Union{Nothing,Symbol}=:prc)
    747 
    748     report = Dict{Symbol, Any}()
    749     report[:nrow] = nrow(df)
    750     report[:ncol] = ncol(df)
    751 
    752     # Missing rates
    753     missing_rates = Dict{Symbol, Float64}()
    754     for col in names(df)
    755         col_sym = Symbol(col)
    756         missing_rates[col_sym] = count(ismissing, df[!, col]) / nrow(df)
    757     end
    758     report[:missing_rates] = missing_rates
    759 
    760     # Duplicate keys
    761     if id_col in propertynames(df) && date_col in propertynames(df)
    762         dup_count = nrow(df) - nrow(unique(df, [id_col, date_col]))
    763         report[:duplicate_keys] = dup_count
    764     end
    765 
    766     # Suspicious values
    767     suspicious = String[]
    768     if !isnothing(ret_col) && ret_col in propertynames(df)
    769         n_extreme = count(r -> !ismissing(r) && (r > 1.0 || r < -1.0), df[!, ret_col])
    770         n_extreme > 0 && push!(suspicious, "$n_extreme returns outside [-100%, +100%]")
    771     end
    772     if !isnothing(price_col) && price_col in propertynames(df)
    773         n_neg = count(r -> !ismissing(r) && r < 0, df[!, price_col])
    774         n_neg > 0 && push!(suspicious, "$n_neg negative prices (CRSP convention for bid/ask midpoint)")
    775     end
    776     report[:suspicious_values] = suspicious
    777 
    778     return report
    779 end
    780 ```
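
To make the expected numbers concrete, the two core checks can be reproduced on plain vectors using the test data from Step 1. This is a sketch only; `duplicate_key_count` and `missing_rate` are hypothetical helpers that mirror the logic inside `diagnose` without DataFrames.

```julia
using Dates

# Number of rows whose (id, date) pair repeats an earlier row.
function duplicate_key_count(ids, dates)
    key_pairs = collect(zip(ids, dates))
    return length(key_pairs) - length(unique(key_pairs))
end

# Fraction of missing entries in a column.
missing_rate(col) = count(ismissing, col) / length(col)

permno = [1, 1, 1, 2, 2, 2]
date = [Date(2020,1,1), Date(2020,2,1), Date(2020,2,1),
        Date(2020,1,1), Date(2020,3,1), Date(2020,4,1)]
ret = [0.05, missing, 0.03, -1.5, 0.02, 150.0]

n_dup = duplicate_key_count(permno, date)   # permno 1 repeats 2020-02-01
rate = missing_rate(ret)                    # one missing out of six
n_extreme = count(r -> !ismissing(r) && (r > 1.0 || r < -1.0), ret)
```

These values could back stricter assertions in the test file, for example checking the duplicate count exactly rather than only that the key exists.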
    781 
    782 - [ ] **Step 3: Add to FinanceRoutines.jl**
    783 
    784 Update the `import DataFrames:` line to also include `ncol`. `unique` is a `Base` function that DataFrames extends with a column-subset method, so the duplicate-key check needs no separate import for it. Then add:
    785 
    786 ```julia
    787 include("Diagnostics.jl")
    788 export diagnose
    789 ```
    790 
    791 - [ ] **Step 4: Run tests**
    792 
    793 Expected: PASS
    794 
    795 - [ ] **Step 5: Commit**
    796 
    797 ```bash
    798 git add src/Diagnostics.jl src/FinanceRoutines.jl test/UnitTests/Diagnostics.jl test/runtests.jl
    799 git commit -m "Add diagnose() for data quality diagnostics on financial DataFrames"
    800 ```
    801 
    802 ---
    803 
    804 ## Task 12: Version bump and final integration
    805 
    806 **Files:**
    807 - Modify: `Project.toml` — version to "0.5.0", add Statistics to [deps] if needed for PortfolioUtils
    808 - Modify: `NEWS.md` — finalize
    809 - Modify: `test/runtests.jl` — ensure all new test suites are listed
    810 
    811 - [ ] **Step 1: Update Project.toml version**
    812 
    813 Change `version = "0.4.5"` to `version = "0.5.0"`
    814 
    815 - [ ] **Step 2: Verify all dependencies are correct in Project.toml**
    816 
    817 Statistics should already be in `[deps]` (added in Task 10). Logging should already be removed (Task 1). Verify no stale entries.
    818 
    819 - [ ] **Step 3: Update test/runtests.jl testsuite list**
    820 
    821 ```julia
    822 const testsuite = [
    823     "KenFrench",
    824     "FF5",
    825     "WRDS",
    826     "betas",
    827     "Yields",
    828     "PortfolioUtils",
    829     "Diagnostics",
    830 ]
    831 ```
    832 
    833 - [ ] **Step 4: Run full test suite**
    834 
    835 Run: `julia --project=. -e 'using Pkg; Pkg.test()'`
    836 Expected: ALL PASS
    837 
    838 - [ ] **Step 5: Commit**
    839 
    840 ```bash
    841 git add Project.toml test/runtests.jl NEWS.md
    842 git commit -m "Bump version to v0.5.0, finalize test suite and changelog"
    843 ```
    844 
    845 ---
    846 
    847 ## Task 13: Tag release and update registry
    848 
    849 Follow the release workflow in CLAUDE.md:
    850 
    851 - [ ] **Step 1: Tag**
    852 
    853 ```bash
    854 git tag v0.5.0
    855 git push origin v0.5.0
    856 ```
    857 
    858 - [ ] **Step 2: Get tree SHA**
    859 
    860 ```bash
    861 git rev-parse 'v0.5.0^{tree}'
    862 ```
    863 
    864 - [ ] **Step 3: Update LouLouLibs/loulouJL registry**
    865 
    866 Update `F/FinanceRoutines/Versions.toml`, `Deps.toml`, `Compat.toml` via `gh api`.
    867 
    868 ---
    869 
    870 ## Extensions deferred for user decision
    871 
    872 These were listed as extensions A–E. Tasks 9–11 cover B (FF5, Task 9), A (portfolio returns, Task 10), and E (diagnostics, Task 11). The remaining two are:
    873 
    874 - **C: Event study utilities** — `event_study(events_df, returns_df; ...)` computing CARs/BHARs. Can be added as Task 14 if desired.
    875 - **D: Treasury yield interpolation** — `treasury_zero_rate(date, maturity)` incorporating T-bill rates. Requires a new data source. Can be added as Task 15 if desired.
    876 
    877 Both are independent of the above tasks and can be planned separately.