2026-03-22-v0.5.0-hardening-and-extensions.md (28612B)
# FinanceRoutines.jl v0.5.0 — Hardening & Extensions

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Fix all identified quality/robustness issues, restructure ImportYields.jl, add CI path filtering, create NEWS.md, and implement extensions (FF5 and momentum imports, portfolio returns, diagnostics), releasing as v0.5.0. Event studies and Treasury yield interpolation are deferred (see the end of this plan).

**Architecture:** Fixes first (tasks 1–8), then extensions (tasks 9–11), then integration + release (tasks 12–13). Each task is independently testable and committable. The ImportYields.jl split (task 5) is the highest-risk refactor: it moves code into two new files without changing any public API.

**Tech Stack:** Julia 1.10+, LibPQ, DataFrames, CSV, Downloads, ZipFile, Roots, LinearAlgebra, FlexiJoins, BazerData, Statistics (new runtime dependency in Task 10), GitHub Actions

---

## File Map

### Files to create
- `src/GSW.jl` — GSW parameter struct, yield/price/forward/return calculations, DataFrame wrappers
- `src/BondPricing.jl` — `bond_yield`, `bond_yield_excel`, day-count helpers
- `src/ImportFamaFrench5.jl` — FF5 + momentum import functions
- `src/PortfolioUtils.jl` — portfolio return calculations
- `src/Diagnostics.jl` — data quality diagnostics
- `test/UnitTests/FF5.jl` — tests for FF5/momentum imports
- `test/UnitTests/PortfolioUtils.jl` — tests for portfolio returns
- `test/UnitTests/Diagnostics.jl` — tests for diagnostics
- `NEWS.md` — changelog

### Files to modify
- `src/FinanceRoutines.jl` — update includes/exports, remove the `Logging` import and a stale export
- `src/Utilities.jl` — add retry logic, remove broken logging macro
- `src/ImportCRSP.jl` — replace `@log_msg` calls with `@debug`
- `src/ImportFamaFrench.jl` — make parsing more robust, refactor for FF5 reuse
- `src/ImportYields.jl` — expand missing-value flags in `_safe_parse_float` (Task 3), then DELETE (Task 5; replaced by GSW.jl + BondPricing.jl)
- `test/UnitTests/Yields.jl` — add missing-value flag tests
- `.github/workflows/CI.yml` — add path filters and a macOS runner
- `Project.toml` — bump to 0.5.0, remove `Logging`, add `Statistics` to `[deps]`
- `test/runtests.jl` — clarify env parsing, add new test suites

### Files to delete
- `src/ImportYields.jl` — replaced by `src/GSW.jl` + `src/BondPricing.jl`

---

## Task 1: Remove broken logging macro & clean up Utilities.jl

**Files:**
- Modify: `src/Utilities.jl:69-102` (confirm the exact range in Step 1)
- Modify: `src/FinanceRoutines.jl:19,45` (remove the Logging import if unused, and a stale export)
- Modify: `src/ImportCRSP.jl:303,381,400` (replace `@log_msg` calls)
- Modify: `Project.toml` (remove `Logging` from `[deps]`)

- [ ] **Step 1: Check all usages of `@log_msg`, `log_with_level` and other Logging names**

Run: `grep -rn "log_msg\|log_with_level" src/`

Then: `grep -rn "Logging\|with_logger\|global_logger\|ConsoleLogger" src/` (Step 4 is only safe if nothing else from `Logging` is used)

- [ ] **Step 2: Replace `@log_msg` calls in ImportCRSP.jl with `@debug`**

In `src/ImportCRSP.jl`, replace the three `@log_msg` calls (lines 303, 381, 400) with `@debug`:
```julia
# Before:
@log_msg "# -- GETTING MONTHLY STOCK FILE (CIZ) ... msf_v2"
# After:
@debug "Getting monthly stock file (CIZ) ... msf_v2"
```

Debug messages are hidden by default; set the environment variable `JULIA_DEBUG=FinanceRoutines` to see them.

- [ ] **Step 3: Remove `log_with_level` and `@log_msg` from Utilities.jl**

Delete lines 69–102 (the `log_with_level` function and the `@log_msg` macro), adjusting the range if Step 1 shows different boundaries.

- [ ] **Step 4: Clean up Logging import and stale export in FinanceRoutines.jl**

Remove the entire `import Logging: ...` line from `FinanceRoutines.jl` (line 19). `@debug` and `@warn` are exported by `Base` (they live in `Base.CoreLogging`), so they need no explicit import. Also remove `Logging` from the `[deps]` section of `Project.toml`, and remove the stale `export greet_FinanceRoutines` (line 45): this function is not defined anywhere.

- [ ] **Step 5: Run tests to verify nothing breaks**

Run: `julia --project=. -e 'using Pkg; Pkg.test()'`

This is internal cleanup, so the commit below is tagged `[skip ci]`; this local run is the check.

- [ ] **Step 6: Commit**

```bash
git add src/Utilities.jl src/ImportCRSP.jl src/FinanceRoutines.jl Project.toml
git commit -m "Remove broken @log_msg macro, replace with @debug [skip ci]"
```

---

## Task 2: Add WRDS connection retry logic

**Files:**
- Modify: `src/Utilities.jl:19-29` (the `open_wrds_pg(user, password)` method)

- [ ] **Step 1: Specify the retry behavior**

This is hard to unit test (it requires a live WRDS connection), so we verify it by code review. The retry logic should:
- Make up to 3 connection attempts
- Back off exponentially between attempts: 1s after the first failure, 2s after the second (with the default `base_delay=1.0`; the delay keeps doubling if `max_retries` is raised)
- Log a warning on each retry
- Rethrow the error from the final attempt

- [ ] **Step 2: Add retry wrapper to `open_wrds_pg`**

Replace the `open_wrds_pg(user, password)` function:

```julia
function open_wrds_pg(user::AbstractString, password::AbstractString;
                      max_retries::Int=3, base_delay::Float64=1.0)
    max_retries >= 1 || throw(ArgumentError("max_retries must be at least 1"))
    # Note: a ' or \ inside user/password must be backslash-escaped for libpq
    conn_str = """
        host = wrds-pgdata.wharton.upenn.edu
        port = 9737
        user='$user'
        password='$password'
        sslmode = 'require' dbname = wrds
    """
    for attempt in 1:max_retries
        try
            return Connection(conn_str)   # LibPQ.Connection
        catch e
            # Out of attempts: propagate the original error and its backtrace
            attempt == max_retries && rethrow()
            delay = base_delay * 2^(attempt - 1)   # base_delay × 1, 2, 4, ...
            @warn "WRDS connection attempt $attempt/$max_retries failed, retrying in $(delay)s" exception=e
            sleep(delay)
        end
    end
end
```

- [ ] **Step 3: Verify the package loads and existing tests still pass**

Run: `julia --project=. -e 'using FinanceRoutines'`, then `julia --project=. -e 'using Pkg; Pkg.test()'`

- [ ] **Step 4: Commit**

```bash
git add src/Utilities.jl
git commit -m "Add retry logic with exponential backoff for WRDS connections"
```

---

## Task 3: Expand missing-value flags in `_safe_parse_float`

**Files:**
- Modify: `src/ImportYields.jl:281-309` (moves to GSW.jl in Task 5, but fix it first)
- Modify: `test/UnitTests/Yields.jl`

- [ ] **Step 1: Write test for expanded flags**

Add to `test/UnitTests/Yields.jl` inside a new testset:

```julia
@testset "Missing value flag handling" begin
    @test ismissing(FinanceRoutines._safe_parse_float(-999.99))
    @test ismissing(FinanceRoutines._safe_parse_float(-999.0))
    @test ismissing(FinanceRoutines._safe_parse_float(-9999.0))
    @test ismissing(FinanceRoutines._safe_parse_float(-99.99))
    @test !ismissing(FinanceRoutines._safe_parse_float(-5.0))  # legitimate negative
    @test FinanceRoutines._safe_parse_float(3.14) ≈ 3.14
    @test ismissing(FinanceRoutines._safe_parse_float(""))
    @test ismissing(FinanceRoutines._safe_parse_float(missing))
end
```

- [ ] **Step 2: Run test to verify it fails**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
Expected: FAIL on `-999.0`, `-9999.0` and `-99.99` (only `-999.99` is currently treated as missing)

- [ ] **Step 3: Update `_safe_parse_float`**

```julia
function _safe_parse_float(value)
    if ismissing(value) || value == ""
        return missing
    end

    if value isa AbstractString
        parsed = tryparse(Float64, strip(value))
        if isnothing(parsed)
            return missing
        end
        value = parsed
    end

    try
        numeric_value = Float64(value)
        # Common missing data flags in economic/financial datasets
        if numeric_value in (-999.99, -999.0, -9999.0, -99.99)
            return missing
        end
        return numeric_value
    catch
        return missing
    end
end
```

- [ ] **Step 4: Run test to verify it passes**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add src/ImportYields.jl test/UnitTests/Yields.jl
git commit -m "Expand missing-value flags to cover -999, -9999, -99.99"
```

---

## Task 4: Make Ken French parsing more robust

**Files:**
- Modify: `src/ImportFamaFrench.jl:118-159` (`_parse_ff_annual`)
- Modify: `src/ImportFamaFrench.jl:164-205` (`_parse_ff_monthly`)

- [ ] **Step 1: Refactor `_parse_ff_annual` to use data-pattern detection**

Instead of searching for the header text with `occursin(r"Annual Factors", line)`, detect the annual section from the data itself:
1. Skip past the monthly data (lines starting with a 6-digit `YYYYMM` date)
2. Collect the next block of lines starting with a 4-digit `YYYY` date followed by a comma

```julia
function _parse_ff_annual(zip_file; types=nothing)
    file_lines = split(String(read(zip_file)), '\n')

    # The annual block follows the monthly block; detect both from the data
    # itself rather than from header text such as "Annual Factors".
    found_monthly = false   # seen at least one YYYYMM row
    past_monthly  = false   # seen the first text line after the monthly rows
    lines = String[]

    for line in file_lines
        stripped = strip(line)   # also removes the '\r' of CRLF line endings

        # Phase 1: skip the header and the monthly block (6-digit YYYYMM rows)
        if !past_monthly
            if occursin(r"^\d{6}", stripped)
                found_monthly = true
            elseif found_monthly && !isempty(stripped) && !occursin(r"^\d", stripped)
                past_monthly = true   # first text line after the monthly data
            end
            continue
        end

        # Phase 2: collect annual rows (4-digit YYYY followed by a comma)
        if occursin(r"^\d{4}\s*,", stripped)
            push!(lines, stripped)
        elseif !isempty(lines) && isempty(stripped)
            break   # a blank line ends the annual section
        end
    end

    isempty(lines) && error("Annual Factors section not found in file")

    lines_buffer = IOBuffer(join(lines, "\n"))
    return CSV.File(lines_buffer, header=false, delim=",", ntasks=1, types=types) |> DataFrame |>
        df -> rename!(df, [:datey, :mktrf, :smb, :hml, :rf])
end
```

Give `_parse_ff_monthly` the same treatment: collect the first contiguous block of lines starting with a 6-digit `YYYYMM` date instead of locating it by header text.

- [ ] **Step 2: Run Ken French tests**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add src/ImportFamaFrench.jl
git commit -m "Make FF3 parsing use data patterns instead of hardcoded headers"
```

---

## Task 5: Split ImportYields.jl into GSW.jl + BondPricing.jl

This is the largest refactor. No API changes, just file reorganization.

**Files:**
- Create: `src/GSW.jl` — everything from ImportYields.jl lines 1–1368 (GSWParameters struct, all gsw_* functions, DataFrame wrappers, helpers)
- Create: `src/BondPricing.jl` — everything from ImportYields.jl lines 1371–1694 (bond_yield, bond_yield_excel, day-count functions)
- Delete: `src/ImportYields.jl`
- Modify: `src/FinanceRoutines.jl` — update includes

The line numbers above describe the file before Task 3; that edit to `_safe_parse_float` may shift them, so split at the boundary where the bond-pricing code begins rather than at a fixed line number.

- [ ] **Step 1: Create `src/GSW.jl`**

Copy lines 1–1368 from ImportYields.jl into `src/GSW.jl`. This includes:
- `GSWParameters` struct and constructors
- `is_three_factor_model`, `_extract_params`
- `import_gsw_parameters`, `_clean_gsw_data`, `_safe_parse_float`, `_validate_gsw_data`
- `gsw_yield`, `gsw_price`, `gsw_forward_rate`
- `gsw_yield_curve`, `gsw_price_curve`
- `gsw_return`, `gsw_excess_return`
- `add_yields!`, `add_prices!`, `add_returns!`, `add_excess_returns!`
- `gsw_curve_snapshot`
- `_validate_gsw_dataframe`, `_maturity_to_column_name`

- [ ] **Step 2: Create `src/BondPricing.jl`**

Copy lines 1371–1694 from ImportYields.jl into `src/BondPricing.jl`. This includes:
- `bond_yield_excel`
- `bond_yield`
- `_day_count_days`
- `_date_difference`

- [ ] **Step 3: Update `src/FinanceRoutines.jl`**

Replace `include("ImportYields.jl")` with:
```julia
include("GSW.jl")
include("BondPricing.jl")
```

- [ ] **Step 4: Delete `src/ImportYields.jl`**

- [ ] **Step 5: Run the yield tests to verify nothing broke**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
Expected: All 70+ assertions PASS. Then run the full suite with `julia --project=. -e 'using Pkg; Pkg.test()'`.

- [ ] **Step 6: Commit**

```bash
git add src/GSW.jl src/BondPricing.jl src/FinanceRoutines.jl
git rm src/ImportYields.jl
git commit -m "Split ImportYields.jl into GSW.jl and BondPricing.jl (no API changes)"
```

---

## Task 6: Add CI path filters and macOS runner

**Files:**
- Modify: `.github/workflows/CI.yml`

- [ ] **Step 1: Add path filters to CI.yml**

```yaml
on:
  push:
    branches:
      - main
    tags:
      - "*"
    paths:
      - 'src/**'
      - 'test/**'
      - 'Project.toml'
      - '.github/workflows/CI.yml'
  pull_request:
    paths:
      - 'src/**'
      - 'test/**'
      - 'Project.toml'
      - '.github/workflows/CI.yml'
```

GitHub does not evaluate path filters for tag pushes, so pushing a release tag still triggers CI.

- [ ] **Step 2: Add macOS to the matrix**

```yaml
matrix:
  version:
    - "1.11"
    - nightly
  os:
    - ubuntu-latest
    - macos-latest
  arch:
    - x64
```

- [ ] **Step 3: Commit**

```bash
git add .github/workflows/CI.yml
git commit -m "Add CI path filters and macOS runner [skip ci]"
```

---

## Task 7: Clarify env parsing in test/runtests.jl

**Files:**
- Modify: `test/runtests.jl:33`

Line 33 uses `!startswith(line, "#") || continue`, which correctly skips comment lines: `||` evaluates `continue` only when the left side is `false`, i.e. when the line IS a comment. The logic is correct but reads awkwardly, so rewrite it in the more idiomatic `&&` form.
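To see the clearer idiom in context, here is a minimal sketch of an env-file loop written entirely with `cond && continue` guards. The helper name `parse_env_lines` and the plain `KEY=VALUE` format are assumptions for illustration; this is not the actual code in `test/runtests.jl`.

```julia
# Hypothetical helper (not the code in test/runtests.jl): parse KEY=VALUE lines,
# skipping blank lines, comments, and lines without '='.
function parse_env_lines(lines)
    env = Dict{String,String}()
    for raw in lines
        line = strip(raw)
        isempty(line) && continue            # blank line
        startswith(line, "#") && continue    # comment line
        parts = split(line, '='; limit=2)    # split at the first '=' only
        length(parts) < 2 && continue        # malformed line
        env[strip(parts[1])] = strip(parts[2])
    end
    return env
end
```

Each guard reads as "if this holds, skip the line", which is why the `&&` form is easier to scan than `!cond || continue`.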
- [ ] **Step 1: Rewrite for readability**

```julia
# Before (correct but hard to read):
!startswith(line, "#") || continue
# After (same logic, clearer):
startswith(line, "#") && continue
```

- [ ] **Step 2: Commit**

```bash
git add test/runtests.jl
git commit -m "Clarify env parsing idiom in test runner [skip ci]"
```

---

## Task 8: Create NEWS.md

**Files:**
- Create: `NEWS.md`

- [ ] **Step 1: Create NEWS.md with v0.5.0 changelog**

```markdown
# FinanceRoutines.jl Changelog

## v0.5.0

### Breaking changes
- `ImportYields.jl` split into `GSW.jl` (yield curve model) and `BondPricing.jl` (bond math). No public API changes, but code that `include`d `ImportYields.jl` directly will need updating.
- Missing-value flags expanded: `-999.0`, `-9999.0`, `-99.99` are now treated as missing in GSW data (previously only `-999.99`). **Migration note:** if your downstream code relied on these numeric values (e.g., `-999.0` as an actual number), they will now silently become `missing`. Check any filtering or aggregation that might be affected.

### New features
- `import_FF5`: Import Fama-French 5-factor model data (market, size, value, profitability, investment)
- `import_FF_momentum`: Import Fama-French momentum factor
- `calculate_portfolio_returns`: Value-weighted and equal-weighted portfolio return calculations
- `diagnose`: Data quality diagnostics for financial DataFrames
- WRDS connections now make up to 3 attempts with exponential backoff

### Internal improvements
- Removed broken `@log_msg` macro, replaced with `@debug`
- Removed stale `export greet_FinanceRoutines` (function was never defined)
- Removed `Logging` from dependencies (the logging macros are available from Base)
- Ken French file parsing generalized with shared helpers for FF3/FF5 reuse
- CI now filters by path (skips runs for docs-only changes)
- CI matrix includes macOS
```

- [ ] **Step 2: Commit**

```bash
git add NEWS.md
git commit -m "Add NEWS.md for v0.5.0 [skip ci]"
```

---

## Task 9: Add Fama-French 5-factor and Momentum imports

**Files:**
- Create: `src/ImportFamaFrench5.jl`
- Modify: `src/ImportFamaFrench.jl` (generalize the parsers, Step 3)
- Modify: `src/FinanceRoutines.jl` (add include + exports)
- Create: `test/UnitTests/FF5.jl`
- Modify: `test/runtests.jl` (add "FF5" to testsuite)

The FF5 and momentum files follow the same zip+CSV format as FF3 on Ken French's site.
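Because FF3, FF5 and momentum files share this layout (free-text notes, then blocks of comma-separated rows keyed by a date), the data-pattern detection from Task 4 can be factored out once and parameterized by the date format. A minimal sketch, assuming a hypothetical helper `extract_ff_block` (not part of the package) and that a block ends at its first non-matching line:

```julia
# Hypothetical helper: return the first contiguous block of rows whose first
# field is a date matching `date_pattern`. Notes and header lines are skipped,
# so section titles such as "Annual Factors" are never needed.
function extract_ff_block(file_lines::AbstractVector{<:AbstractString}, date_pattern::Regex)
    rows = String[]
    for line in file_lines
        stripped = strip(line)          # also drops the '\r' of CRLF files
        if occursin(date_pattern, stripped)
            push!(rows, stripped)
        elseif !isempty(rows)
            break                       # first non-matching line closes the block
        end
    end
    return rows
end

# Date patterns by frequency. Requiring a comma right after the date keeps a
# 6-digit YYYYMM row from being mistaken for a 4-digit YYYY row.
const FF_MONTHLY = r"^\d{6}\s*,"
const FF_ANNUAL  = r"^\d{4}\s*,"
const FF_DAILY   = r"^\d{8}\s*,"
```

The first row's field count (`length(split(rows[1], ','))`) then distinguishes the 5-column FF3 layout, the 7-column FF5 layout and the 2-column momentum layout, so one parser can pick the matching column names.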
- [ ] **Step 1: Write failing tests**

```julia
# test/UnitTests/FF5.jl
@testset "Importing Fama-French 5 factors and Momentum" begin
    import Dates

    # FF5 monthly
    df_FF5_monthly = import_FF5(frequency=:monthly)
    @test names(df_FF5_monthly) == ["datem", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    @test nrow(df_FF5_monthly) >= (Dates.year(Dates.today()) - 1963 - 1) * 12

    # FF5 annual
    df_FF5_annual = import_FF5(frequency=:annual)
    @test names(df_FF5_annual) == ["datey", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    @test nrow(df_FF5_annual) >= Dates.year(Dates.today()) - 1963 - 2

    # FF5 daily
    df_FF5_daily = import_FF5(frequency=:daily)
    @test names(df_FF5_daily) == ["date", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    @test nrow(df_FF5_daily) >= 15_000

    # Momentum monthly
    df_mom_monthly = import_FF_momentum(frequency=:monthly)
    @test "mom" in names(df_mom_monthly)
    @test nrow(df_mom_monthly) > 1000
end
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/FF5.jl")'`
Expected: `import_FF5` and `import_FF_momentum` not defined

- [ ] **Step 3: Generalize `_parse_ff_annual` and `_parse_ff_monthly` to accept `col_names`**

Before writing FF5, update the existing parsers in `src/ImportFamaFrench.jl` to accept a `col_names` keyword argument. Default to the FF3 column names so `import_FF3` continues to work unchanged.

```julia
# In _parse_ff_annual:
function _parse_ff_annual(zip_file; types=nothing, col_names=[:datey, :mktrf, :smb, :hml, :rf])
    # ... existing logic ...
    return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
end

# In _parse_ff_monthly:
function _parse_ff_monthly(zip_file; types=nothing, col_names=[:datem, :mktrf, :smb, :hml, :rf])
    # ... existing logic ...
    return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
end
```

Also extract a shared `_download_and_parse_ff_zip` helper to DRY up the download + unzip + parse logic shared by FF3 and FF5.

- [ ] **Step 4: Run existing KenFrench tests to verify no regression**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
Expected: PASS (existing FF3 behavior unchanged)

- [ ] **Step 5: Implement `import_FF5` in `src/ImportFamaFrench5.jl`**

The FF5 file URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip`
Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip`

Use the generalized `_download_and_parse_ff_zip` helper with 7-column names (import `String15` the same way `String7` is already imported):

```julia
function import_FF5(; frequency::Symbol=:monthly)
    # Monthly (YYYYMM) and annual (YYYY) dates fit in a String7, but daily
    # dates (YYYYMMDD, 8 characters) do not: use String15 for the daily file.
    ff_col_classes       = [String7,  Float64, Float64, Float64, Float64, Float64, Float64]
    ff_col_classes_daily = [String15, Float64, Float64, Float64, Float64, Float64, Float64]
    url_mth_yr = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip"
    url_daily  = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip"
    col_names_mth = [:datem, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    col_names_yr  = [:datey, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    col_names_day = [:date,  :mktrf, :smb, :hml, :rmw, :cma, :rf]
    # ... pick URL/types/names by `frequency` (:monthly, :annual, :daily;
    # throw an ArgumentError otherwise) and call the shared helper
end
```

- [ ] **Step 6: Implement `import_FF_momentum`**

Momentum URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_CSV.zip`
Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip`

Single-factor file: a date column plus `mom`. Use the same per-frequency date column names as `import_FF5` (`datem`, `datey`, `date`); the tests only require the `mom` column.

- [ ] **Step 7: Add exports to FinanceRoutines.jl**

```julia
include("ImportFamaFrench5.jl")
export import_FF5, import_FF_momentum
```

- [ ] **Step 8: Run tests**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/FF5.jl")'`
Expected: PASS

- [ ] **Step 9: Commit**

```bash
git add src/ImportFamaFrench.jl src/ImportFamaFrench5.jl src/FinanceRoutines.jl test/UnitTests/FF5.jl test/runtests.jl
git commit -m "Add import_FF5 and import_FF_momentum for 5-factor model and momentum"
```

---

## Task 10: Add portfolio return calculations

**Files:**
- Create: `src/PortfolioUtils.jl`
- Modify: `src/FinanceRoutines.jl` (add include + exports)
- Modify: `Project.toml` (add `Statistics` to `[deps]`)
- Create: `test/UnitTests/PortfolioUtils.jl`
- Modify: `test/runtests.jl`

- [ ] **Step 1: Write failing tests**

```julia
# test/UnitTests/PortfolioUtils.jl
@testset "Portfolio Return Calculations" begin
    import Dates: Date, Month
    import DataFrames: DataFrame, nrow

    # Create test data: 3 stocks, 12 months
    dates = repeat(Date(2020,1,1):Month(1):Date(2020,12,1), inner=3)
    df = DataFrame(
        datem = dates,
        permno = repeat([1, 2, 3], 12),
        ret = rand(36) .* 0.1 .- 0.05,
        mktcap = repeat([100.0, 200.0, 300.0], 12)   # weights sum to 600 each month
    )

    # Equal-weighted returns
    df_ew = calculate_portfolio_returns(df, :ret, :datem; weighting=:equal)
    @test nrow(df_ew) == 12
    @test "port_ret" in names(df_ew)

    # Value-weighted returns
    df_vw = calculate_portfolio_returns(df, :ret, :datem;
        weighting=:value, weight_col=:mktcap)
    @test nrow(df_vw) == 12
    @test "port_ret" in names(df_vw)

    # One month checked against a hand-computed weighted average
    jan = df.datem .== Date(2020, 1, 1)
    @test only(df_vw.port_ret[df_vw.datem .== Date(2020, 1, 1)]) ≈
          sum(df.ret[jan] .* df.mktcap[jan]) / sum(df.mktcap[jan])

    # Grouped portfolios (e.g., by size quintile)
    df.group = repeat([1, 1, 2], 12)
    df_grouped = calculate_portfolio_returns(df, :ret, :datem;
        weighting=:value, weight_col=:mktcap,
        groupby=:group)
    @test nrow(df_grouped) == 24  # 12 months x 2 groups
end
```

- [ ] **Step 2: Implement `calculate_portfolio_returns`**

```julia
"""
    calculate_portfolio_returns(df, ret_col, date_col;
        weighting=:value, weight_col=nothing, groupby=nothing)

Calculate portfolio returns from individual stock returns.

# Arguments
- `df::DataFrame`: Panel data with stock returns
- `ret_col::Symbol`: Column name for returns
- `date_col::Symbol`: Column name for dates
- `weighting::Symbol`: `:equal` or `:value`
- `weight_col::Union{Nothing,Symbol}`: Column for weights (required if weighting=:value)
- `groupby::Union{Nothing,Symbol,Vector{Symbol}}`: Optional grouping columns

# Returns
- `DataFrame`: Portfolio returns by date (and group if specified)
"""
function calculate_portfolio_returns(df::AbstractDataFrame, ret_col::Symbol, date_col::Symbol;
        weighting::Symbol=:value, weight_col::Union{Nothing,Symbol}=nothing,
        groupby::Union{Nothing,Symbol,Vector{Symbol}}=nothing)

    weighting in (:equal, :value) ||
        throw(ArgumentError("weighting must be :equal or :value, got :$weighting"))
    if weighting == :value && isnothing(weight_col)
        throw(ArgumentError("weight_col required for value-weighted portfolios"))
    end

    group_cols = isnothing(groupby) ? [date_col] :
                 vcat([date_col], groupby isa Symbol ? [groupby] : groupby)

    # Fully qualified: the `groupby` keyword shadows DataFrames.groupby here
    grouped = DataFrames.groupby(df, group_cols)

    if weighting == :equal
        return combine(grouped, ret_col => (r -> begin
            rv = collect(skipmissing(r))
            isempty(rv) ? missing : mean(rv)   # missing (not NaN) when no valid returns
        end) => :port_ret)
    else
        return combine(grouped,
            [ret_col, weight_col] => ((r, w) -> begin
                valid = .!ismissing.(r) .& .!ismissing.(w)
                any(valid) || return missing
                rv, wv = r[valid], w[valid]
                sum(rv .* wv) / sum(wv)
            end) => :port_ret)
    end
end
```

- [ ] **Step 3: Add dependencies and imports**

First, add `Statistics` to `[deps]` in `Project.toml`. It is currently only in `[extras]` (test-only), but `calculate_portfolio_returns` uses `mean` at runtime:
```toml
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
```
The existing `[extras]` entry can stay (listing a package in both sections is valid and harmless in Julia 1.10+).

Then update `src/FinanceRoutines.jl`:
- Add `import Statistics: mean` to the imports
- Add `combine` to the `import DataFrames:` line (used by `calculate_portfolio_returns`)
- Make sure the module name `DataFrames` itself is bound, since the function calls `DataFrames.groupby` fully qualified. `import DataFrames: ...` on its own does not bind it; add a plain `import DataFrames` if the module lacks one.
- Add include and export:
```julia
include("PortfolioUtils.jl")
export calculate_portfolio_returns
```

- [ ] **Step 4: Run tests**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/PortfolioUtils.jl")'`
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add src/PortfolioUtils.jl src/FinanceRoutines.jl Project.toml test/UnitTests/PortfolioUtils.jl test/runtests.jl
git commit -m "Add calculate_portfolio_returns for equal/value-weighted portfolios"
```

---

## Task 11: Add data quality diagnostics

**Files:**
- Create: `src/Diagnostics.jl`
- Modify: `src/FinanceRoutines.jl` (add include + exports)
- Create: `test/UnitTests/Diagnostics.jl`
- Modify: `test/runtests.jl`

- [ ] **Step 1: Write failing tests**

```julia
# test/UnitTests/Diagnostics.jl
@testset "Data Quality Diagnostics" begin
    import Dates: Date
    import DataFrames: DataFrame, allowmissing!

    # Create test data with known issues
    df = DataFrame(
        permno = [1, 1, 1, 2, 2, 2],
        date = [Date(2020,1,1), Date(2020,2,1), Date(2020,2,1),  # duplicate for permno 1
                Date(2020,1,1), Date(2020,3,1), Date(2020,4,1)],  # gap for permno 2
        ret = [0.05, missing, 0.03, -1.5, 0.02, 150.0],  # suspicious: -1.5, 150.0
        prc = [10.0, 20.0, 20.0, -5.0, 30.0, 40.0]  # negative price
    )
    allowmissing!(df, :ret)

    report = diagnose(df)

    @test haskey(report, :missing_rates)
    @test haskey(report, :suspicious_values)
    @test haskey(report, :duplicate_keys)
    @test report[:missing_rates][:ret] > 0
    @test report[:duplicate_keys] == 1
    @test length(report[:suspicious_values]) > 0
end
```

- [ ] **Step 2: Implement `diagnose`**

```julia
"""
    diagnose(df; id_col=:permno, date_col=:date, ret_col=:ret, price_col=:prc)

Run data quality diagnostics on a financial DataFrame.

Returns a Dict with:
- `:missing_rates` — fraction missing per column
- `:suspicious_values` — `Vector{String}` of summaries: count of returns > 100% or < -100%, count of negative prices
- `:duplicate_keys` — number of duplicate (id, date) rows (only when both columns exist)
- `:nrow`, `:ncol` — dimensions
"""
function diagnose(df::AbstractDataFrame;
        id_col::Symbol=:permno, date_col::Symbol=:date,
        ret_col::Union{Nothing,Symbol}=:ret,
        price_col::Union{Nothing,Symbol}=:prc)

    report = Dict{Symbol, Any}()
    report[:nrow] = nrow(df)
    report[:ncol] = ncol(df)

    # Missing rates
    missing_rates = Dict{Symbol, Float64}()
    for col in names(df)
        col_sym = Symbol(col)
        missing_rates[col_sym] = count(ismissing, df[!, col]) / nrow(df)
    end
    report[:missing_rates] = missing_rates

    # Duplicate keys
    if id_col in propertynames(df) && date_col in propertynames(df)
        dup_count = nrow(df) - nrow(unique(df, [id_col, date_col]))
        report[:duplicate_keys] = dup_count
    end

    # Suspicious values
    suspicious = String[]
    if !isnothing(ret_col) && ret_col in propertynames(df)
        n_extreme = count(r -> !ismissing(r) && (r > 1.0 || r < -1.0), df[!, ret_col])
        n_extreme > 0 && push!(suspicious, "$n_extreme returns outside [-100%, +100%]")
    end
    if !isnothing(price_col) && price_col in propertynames(df)
        n_neg = count(r -> !ismissing(r) && r < 0, df[!, price_col])
        n_neg > 0 && push!(suspicious, "$n_neg negative prices (CRSP convention for bid/ask midpoint)")
    end
    report[:suspicious_values] = suspicious

    return report
end
```

- [ ] **Step 3: Add to FinanceRoutines.jl**

Update the `import DataFrames:` line to also include `ncol` (and `nrow`, if it is not already imported). `unique(df, cols)` is a method DataFrames adds to `Base.unique`, so it needs no import. Then add:

```julia
include("Diagnostics.jl")
export diagnose
```

- [ ] **Step 4: Run tests**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Diagnostics.jl")'`
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add src/Diagnostics.jl src/FinanceRoutines.jl test/UnitTests/Diagnostics.jl test/runtests.jl
git commit -m "Add diagnose() for data quality diagnostics on financial DataFrames"
```

---

## Task 12: Version bump and final integration

**Files:**
- Modify: `Project.toml` — set version to "0.5.0" and confirm Statistics is in `[deps]` (added in Task 10)
- Modify: `NEWS.md` — finalize
- Modify: `test/runtests.jl` — ensure all new test suites are listed

- [ ] **Step 1: Update Project.toml version**

Change `version = "0.4.5"` to `version = "0.5.0"`

- [ ] **Step 2: Verify all dependencies are correct in Project.toml**

Statistics should already be in `[deps]` (added in Task 10). Logging should already be removed (Task 1). Verify there are no stale entries.
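The check in Step 2 can be scripted with Julia's `TOML` standard library. A minimal sketch; `check_project_deps` is a hypothetical helper (not part of the package) that only looks at the three changes this plan makes to `Project.toml`:

```julia
import TOML

# Hypothetical pre-release check: list problems with Project.toml relative to
# the changes in this plan (version bump, Statistics added, Logging removed).
function check_project_deps(path::AbstractString="Project.toml")
    proj = TOML.parsefile(path)
    deps = get(proj, "deps", Dict{String,Any}())
    version = get(proj, "version", "unset")
    problems = String[]
    version == "0.5.0" || push!(problems, "version is $version, expected 0.5.0")
    haskey(deps, "Statistics") || push!(problems, "Statistics missing from [deps] (Task 10)")
    haskey(deps, "Logging") && push!(problems, "Logging still in [deps] (Task 1)")
    return problems
end
```

Running `check_project_deps()` from the repository root should return an empty vector once Tasks 1, 10 and 12 are done. It does not detect other stale entries, so still read through `[deps]` by eye.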
- [ ] **Step 3: Update test/runtests.jl testsuite list**

```julia
const testsuite = [
    "KenFrench",
    "FF5",
    "WRDS",
    "betas",
    "Yields",
    "PortfolioUtils",
    "Diagnostics",
]
```

- [ ] **Step 4: Run full test suite**

Run: `julia --project=. -e 'using Pkg; Pkg.test()'`
Expected: ALL PASS

- [ ] **Step 5: Commit**

```bash
git add Project.toml test/runtests.jl NEWS.md
git commit -m "Bump version to v0.5.0, finalize test suite and changelog"
```

---

## Task 13: Tag release and update registry

Follow the release workflow in CLAUDE.md:

- [ ] **Step 1: Tag**

```bash
git tag v0.5.0
git push origin v0.5.0
```

- [ ] **Step 2: Get tree SHA**

```bash
git rev-parse v0.5.0^{tree}
```

- [ ] **Step 3: Update LouLouLibs/loulouJL registry**

Update `F/FinanceRoutines/Versions.toml`, `Deps.toml`, `Compat.toml` via `gh api`. `Versions.toml` gets the tree SHA from Step 2 as the `git-tree-sha1` of `0.5.0`; `Deps.toml` and `Compat.toml` must reflect this release's dependency changes (Statistics added, Logging removed).

---

## Extensions deferred for user decision

These were listed as extensions A–E. Tasks 9–11 cover B (FF5, Task 9), A (portfolio returns, Task 10) and E (diagnostics, Task 11). The remaining two are:

- **C: Event study utilities** — `event_study(events_df, returns_df; ...)` computing CARs/BHARs. Can be added as Task 14 if desired.
- **D: Treasury yield interpolation** — `treasury_zero_rate(date, maturity)` incorporating T-bill rates. Requires a new data source. Can be added as Task 15 if desired.

Both are independent of the above tasks and can be planned separately.
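For reference when planning extension C, the two event-window measures it names can be sketched as follows, assuming market-adjusted abnormal returns (stock return minus a benchmark return on the same days). The function name `car_bhar` is hypothetical; a real `event_study` would also need an estimation window and a choice of expected-return model.

```julia
# Sketch only: market-adjusted CAR and BHAR over a single event window.
# `ret` and `bench` are simple (not log) returns aligned on the same dates.
function car_bhar(ret::AbstractVector{<:Real}, bench::AbstractVector{<:Real})
    length(ret) == length(bench) ||
        throw(DimensionMismatch("ret and bench must have the same length"))
    car  = sum(ret .- bench)                   # cumulative abnormal return (sum)
    bhar = prod(1 .+ ret) - prod(1 .+ bench)   # buy-and-hold abnormal return (compounded)
    return (car = car, bhar = bhar)
end
```

With `ret = [0.10, -0.05]` and `bench = [0.02, 0.01]`, CAR is 0.02 while BHAR is 1.10 × 0.95 − 1.02 × 1.01 = 0.0148, because BHAR compounds returns instead of summing them.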