FinanceRoutines.jl

Financial data routines for Julia

commit 95151f27d7ab321862e8e22627a9a5b9783655c3
parent 12f368cb2156503e11fecd914949e72000e6c452
Author: Erik Loualiche <eloualic@umn.edu>
Date:   Sun, 22 Mar 2026 10:07:49 -0500

Add v0.5.0 implementation plan: hardening + extensions

13-task plan covering:
- Fixes: dead logging, WRDS retry, missing-value flags, FF parsing robustness,
  ImportYields.jl split, CI path filters, env parsing cleanup
- Extensions: FF5+momentum, portfolio returns, data diagnostics
- Release: version bump, NEWS.md, registry update

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat:
A docs/superpowers/plans/2026-03-22-v0.5.0-hardening-and-extensions.md | 877 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 877 insertions(+), 0 deletions(-)

diff --git a/docs/superpowers/plans/2026-03-22-v0.5.0-hardening-and-extensions.md b/docs/superpowers/plans/2026-03-22-v0.5.0-hardening-and-extensions.md @@ -0,0 +1,877 @@ +# FinanceRoutines.jl v0.5.0 — Hardening & Extensions + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Fix all identified quality/robustness issues, restructure ImportYields.jl, add CI path filtering, create NEWS.md, and implement the extensions (FF5 + momentum, portfolio returns, data diagnostics; event studies and Treasury interpolation are deferred, see the final section) — releasing as v0.5.0. + +**Architecture:** Fixes first (tasks 1–8), then extensions (tasks 9–11), then integration + release (tasks 12–13). Each task is independently testable and committable. The ImportYields.jl split (task 5) is the highest-risk refactor — it moves code into two new files without changing any public API. + +**Tech Stack:** Julia 1.10+, LibPQ, DataFrames, CSV, Downloads, ZipFile, Roots, LinearAlgebra, FlexiJoins, BazerData, GitHub Actions + +--- + +## File Map + +### Files to create +- `src/GSW.jl` — GSW parameter struct, yield/price/forward/return calculations, DataFrame wrappers +- `src/BondPricing.jl` — `bond_yield`, `bond_yield_excel`, day-count helpers +- `src/ImportFamaFrench5.jl` — FF5 + momentum import functions +- `src/PortfolioUtils.jl` — portfolio return calculations +- `src/Diagnostics.jl` — data quality diagnostics +- `test/UnitTests/FF5.jl` — tests for FF5/momentum imports +- `test/UnitTests/PortfolioUtils.jl` — tests for portfolio returns +- `test/UnitTests/Diagnostics.jl` — tests for diagnostics +- `NEWS.md` — changelog + +### Files to modify +- `src/FinanceRoutines.jl` — update includes/exports +- `src/Utilities.jl` — add retry logic, remove broken logging macro +- `src/ImportFamaFrench.jl` — make parsing more robust, refactor for FF5 reuse +- `src/ImportYields.jl` — DELETE (replaced by 
GSW.jl + BondPricing.jl) +- `src/ImportCRSP.jl` — replace `@log_msg` calls with `@debug` (task 1); the missing-value flag expansion in `_safe_parse_float` happens in ImportYields.jl (task 3) and moves to GSW.jl (task 5) +- `.github/workflows/CI.yml` — add path filters, consider macOS +- `Project.toml` — bump to 0.5.0; drop `Logging`, add `Statistics` +- `test/runtests.jl` — add new test suites + +### Files to delete +- `src/ImportYields.jl` — replaced by `src/GSW.jl` + `src/BondPricing.jl` + +--- + +## Task 1: Remove broken logging macro & clean up Utilities.jl + +**Files:** +- Modify: `src/Utilities.jl:70-102` +- Modify: `src/FinanceRoutines.jl:19` (remove Logging import if unused) +- Modify: `src/ImportCRSP.jl:303,381,400` (replace `@log_msg` calls) + +- [ ] **Step 1: Check all usages of `@log_msg` and `log_with_level`** + +Run: `grep -rn "log_msg\|log_with_level" src/` + +- [ ] **Step 2: Replace `@log_msg` calls in ImportCRSP.jl with `@debug`** + +In `src/ImportCRSP.jl`, replace the three `@log_msg` calls (lines 303, 381, 400) with `@debug`: +```julia +# Before: +@log_msg "# -- GETTING MONTHLY STOCK FILE (CIZ) ... msf_v2" +# After: +@debug "Getting monthly stock file (CIZ) ... msf_v2" +``` + +- [ ] **Step 3: Remove `log_with_level` and `@log_msg` from Utilities.jl** + +Delete the `log_with_level` function and the `@log_msg` macro (the `src/Utilities.jl:70-102` range listed under **Files**). + +- [ ] **Step 4: Clean up Logging import and stale export in FinanceRoutines.jl** + +Remove the entire `import Logging: ...` line from `FinanceRoutines.jl` (line 19); `@debug` and `@warn` are available from `Base.CoreLogging` without an explicit import. Also remove `Logging` from the `[deps]` section of `Project.toml`, and drop the stale `export greet_FinanceRoutines` (line 45) — this function is not defined anywhere. + +- [ ] **Step 5: Run tests to verify nothing breaks** + +Run: `julia --project=. 
-e 'using Pkg; Pkg.test()'` + +- [ ] **Step 6: Commit** + +```bash +git add src/Utilities.jl src/ImportCRSP.jl src/FinanceRoutines.jl +git commit -m "Remove broken @log_msg macro, replace with @debug [skip ci]" +``` + +--- + +## Task 2: Add WRDS connection retry logic + +**Files:** +- Modify: `src/Utilities.jl:19-29` (the `open_wrds_pg(user, password)` method) + +- [ ] **Step 1: Specify the expected retry behavior** + +This is hard to unit test (it requires live WRDS credentials), so we verify by code review. The retry logic should: +- Attempt up to 3 connections +- Exponential backoff: 1s, 2s, 4s +- Log warnings on retry +- Rethrow on final failure + +- [ ] **Step 2: Add retry wrapper to `open_wrds_pg`** + +Replace the `open_wrds_pg(user, password)` function: + +```julia +function open_wrds_pg(user::AbstractString, password::AbstractString; + max_retries::Int=3, base_delay::Float64=1.0) + conn_str = """ + host = wrds-pgdata.wharton.upenn.edu + port = 9737 + user='$user' + password='$password' + sslmode = 'require' dbname = wrds + """ + for attempt in 1:max_retries + try + return Connection(conn_str) + catch e + if attempt == max_retries + rethrow(e) + end + delay = base_delay * 2^(attempt - 1) + @warn "WRDS connection attempt $attempt/$max_retries failed, retrying in $(delay)s" exception=e + sleep(delay) + end + end +end +``` + +- [ ] **Step 3: Verify the package loads and existing tests still pass** + +Run: `julia --project=. 
-e 'using FinanceRoutines'` + +- [ ] **Step 4: Commit** + +```bash +git add src/Utilities.jl +git commit -m "Add retry logic with exponential backoff for WRDS connections" +``` + +--- + +## Task 3: Expand missing-value flags in `_safe_parse_float` + +**Files:** +- Modify: `src/ImportYields.jl:281-309` (will move to GSW.jl in task 5, but fix first) + +- [ ] **Step 1: Write test for expanded flags** + +Add to `test/UnitTests/Yields.jl` inside a new testset: + +```julia +@testset "Missing value flag handling" begin + @test ismissing(FinanceRoutines._safe_parse_float(-999.99)) + @test ismissing(FinanceRoutines._safe_parse_float(-999.0)) + @test ismissing(FinanceRoutines._safe_parse_float(-9999.0)) + @test ismissing(FinanceRoutines._safe_parse_float(-99.99)) + @test !ismissing(FinanceRoutines._safe_parse_float(-5.0)) # legitimate negative + @test FinanceRoutines._safe_parse_float(3.14) ≈ 3.14 + @test ismissing(FinanceRoutines._safe_parse_float("")) + @test ismissing(FinanceRoutines._safe_parse_float(missing)) +end +``` + +- [ ] **Step 2: Run test to verify it fails** + +Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'` +Expected: FAIL on `-999.0` and `-9999.0` + +- [ ] **Step 3: Update `_safe_parse_float`** + +```julia +function _safe_parse_float(value) + if ismissing(value) || value == "" + return missing + end + + if value isa AbstractString + parsed = tryparse(Float64, strip(value)) + if isnothing(parsed) + return missing + end + value = parsed + end + + try + numeric_value = Float64(value) + # Common missing data flags in economic/financial datasets + if numeric_value in (-999.99, -999.0, -9999.0, -99.99) + return missing + end + return numeric_value + catch + return missing + end +end +``` + +- [ ] **Step 4: Run test to verify it passes** + +Run: `julia --project=. 
-e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'` +Expected: PASS + +- [ ] **Step 5: Commit** + +```bash +git add src/ImportYields.jl test/UnitTests/Yields.jl +git commit -m "Expand missing-value flags to cover -999, -9999, -99.99" +``` + +--- + +## Task 4: Make Ken French parsing more robust + +**Files:** +- Modify: `src/ImportFamaFrench.jl:118-159` (`_parse_ff_annual`) +- Modify: `src/ImportFamaFrench.jl:164-205` (`_parse_ff_monthly`) + +- [ ] **Step 1: Refactor `_parse_ff_annual` to use data-pattern detection** + +Instead of `occursin(r"Annual Factors", line)`, detect the annual section by: +1. Skip past the monthly data (lines starting with 6-digit YYYYMM) +2. Find the next block of lines starting with 4-digit YYYY + +```julia +function _parse_ff_annual(zip_file; types=nothing) + file_lines = split(String(read(zip_file)), '\n') + + # Find annual data: lines starting with a 4-digit year that are NOT 6-digit monthly dates + # Annual section comes after monthly section + found_monthly = false + past_monthly = false + lines = String[] + + for line in file_lines + stripped = strip(line) + + # Track when we're past the monthly data section + if !found_monthly && occursin(r"^\s*\d{6}", stripped) + found_monthly = true + continue + end + + if found_monthly && !past_monthly + # Still in monthly section until we hit a non-data line + if occursin(r"^\s*\d{6}", stripped) + continue + elseif !occursin(r"^\s*$", stripped) && !occursin(r"^\s*\d", stripped) + past_monthly = true + continue + else + continue + end + end + + if past_monthly + # Look for annual data lines (4-digit year) + if occursin(r"^\s*\d{4}\s*,", stripped) + push!(lines, replace(stripped, r"[\r]" => "")) + elseif !isempty(lines) && occursin(r"^\s*$", stripped) + break # End of annual section + end + end + end + + if isempty(lines) + error("Annual Factors section not found in file") + end + + lines_buffer = IOBuffer(join(lines, "\n")) + return CSV.File(lines_buffer, header=false, 
delim=",", ntasks=1, types=types) |> DataFrame |> + df -> rename!(df, [:datey, :mktrf, :smb, :hml, :rf]) +end +``` + +- [ ] **Step 2: Run Ken French tests** + +Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'` +Expected: PASS + +- [ ] **Step 3: Commit** + +```bash +git add src/ImportFamaFrench.jl +git commit -m "Make FF3 parsing use data patterns instead of hardcoded headers" +``` + +--- + +## Task 5: Split ImportYields.jl into GSW.jl + BondPricing.jl + +This is the largest refactor. No API changes — just file reorganization. + +**Files:** +- Create: `src/GSW.jl` — everything from ImportYields.jl lines 1–1368 (GSWParameters struct, all gsw_* functions, DataFrame wrappers, helpers) +- Create: `src/BondPricing.jl` — everything from ImportYields.jl lines 1371–1694 (bond_yield, bond_yield_excel, day-count functions) +- Delete: `src/ImportYields.jl` +- Modify: `src/FinanceRoutines.jl` — update includes + +- [ ] **Step 1: Create `src/GSW.jl`** + +Copy lines 1–1368 from ImportYields.jl into `src/GSW.jl`. This includes: +- `GSWParameters` struct and constructors +- `is_three_factor_model`, `_extract_params` +- `import_gsw_parameters`, `_clean_gsw_data`, `_safe_parse_float`, `_validate_gsw_data` +- `gsw_yield`, `gsw_price`, `gsw_forward_rate` +- `gsw_yield_curve`, `gsw_price_curve` +- `gsw_return`, `gsw_excess_return` +- `add_yields!`, `add_prices!`, `add_returns!`, `add_excess_returns!` +- `gsw_curve_snapshot` +- `_validate_gsw_dataframe`, `_maturity_to_column_name` + +- [ ] **Step 2: Create `src/BondPricing.jl`** + +Copy lines 1371–1694 from ImportYields.jl into `src/BondPricing.jl`. 
This includes: +- `bond_yield_excel` +- `bond_yield` +- `_day_count_days` +- `_date_difference` + +- [ ] **Step 3: Update `src/FinanceRoutines.jl`** + +Replace `include("ImportYields.jl")` with: +```julia +include("GSW.jl") +include("BondPricing.jl") +``` + +- [ ] **Step 4: Delete `src/ImportYields.jl`** + +- [ ] **Step 5: Run full test suite to verify nothing broke** + +Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'` +Expected: All 70+ assertions PASS + +- [ ] **Step 6: Commit** + +```bash +git add src/GSW.jl src/BondPricing.jl src/FinanceRoutines.jl +git rm src/ImportYields.jl +git commit -m "Split ImportYields.jl into GSW.jl and BondPricing.jl (no API changes)" +``` + +--- + +## Task 6: Add CI path filters and macOS runner + +**Files:** +- Modify: `.github/workflows/CI.yml` + +- [ ] **Step 1: Add path filters to CI.yml** + +```yaml +on: + push: + branches: + - main + tags: + - "*" + paths: + - 'src/**' + - 'test/**' + - 'Project.toml' + - '.github/workflows/CI.yml' + pull_request: + paths: + - 'src/**' + - 'test/**' + - 'Project.toml' + - '.github/workflows/CI.yml' +``` + +Note: GitHub evaluates `paths` together with `branches`/`tags`, so a release tag whose commit only touches docs would not trigger CI. If tag builds must always run, give tags their own unfiltered trigger. + +- [ ] **Step 2: Add macOS and the oldest supported Julia to the matrix** + +Since the plan targets Julia 1.10+, keep the oldest supported release in the matrix alongside the latest and nightly: + +```yaml +matrix: + version: + - "1.10" + - "1.11" + - nightly + os: + - ubuntu-latest + - macos-latest + arch: + - x64 +``` + +- [ ] **Step 3: Commit** + +```bash +git add .github/workflows/CI.yml +git commit -m "Add CI path filters and macOS runner [skip ci]" +``` + +--- + +## Task 7: Clarify env parsing in test/runtests.jl + +**Files:** +- Modify: `test/runtests.jl:33` + +Line 33 uses `!startswith(line, "#") || continue` which correctly skips comment lines (the `||` evaluates `continue` when the left side is `false`, i.e. when the line IS a comment). This is logically correct but reads awkwardly. Rewrite to the more idiomatic `&&` form for clarity. 
+ +- [ ] **Step 1: Rewrite for readability** + +```julia +# Before (correct but hard to read): +!startswith(line, "#") || continue +# After (same logic, clearer): +startswith(line, "#") && continue +``` + +- [ ] **Step 2: Commit** + +```bash +git add test/runtests.jl +git commit -m "Clarify env parsing idiom in test runner [skip ci]" +``` + +--- + +## Task 8: Create NEWS.md + +**Files:** +- Create: `NEWS.md` + +- [ ] **Step 1: Create NEWS.md with v0.5.0 changelog** + +```markdown +# FinanceRoutines.jl Changelog + +## v0.5.0 + +### Breaking changes +- `ImportYields.jl` split into `GSW.jl` (yield curve model) and `BondPricing.jl` (bond math). No public API changes, but code that `include`d `ImportYields.jl` directly will need updating. +- Missing-value flags expanded: `-999.0`, `-9999.0`, `-99.99` now treated as missing in GSW data (previously only `-999.99`). **Migration note:** if your downstream code used these numeric values (e.g., `-999.0` as an actual number), they will now silently become `missing`. Check any filtering or aggregation that might be affected. 
+ +### New features +- `import_FF5`: Import Fama-French 5-factor model data (market, size, value, profitability, investment) +- `import_FF_momentum`: Import Fama-French momentum factor +- `calculate_portfolio_returns`: Value-weighted and equal-weighted portfolio return calculations +- `diagnose`: Data quality diagnostics for financial DataFrames +- WRDS connections now retry up to 3 times with exponential backoff + +### Internal improvements +- Removed broken `@log_msg` macro, replaced with `@debug` +- Removed stale `export greet_FinanceRoutines` (function was never defined) +- Removed `Logging` from dependencies (macros available from Base) +- Ken French file parsing generalized with shared helpers for FF3/FF5 reuse +- CI now filters by path (skips runs for docs-only changes) +- CI matrix includes macOS +``` + +- [ ] **Step 2: Commit** + +```bash +git add NEWS.md +git commit -m "Add NEWS.md for v0.5.0 [skip ci]" +``` + +--- + +## Task 9: Add Fama-French 5-factor and Momentum imports + +**Files:** +- Create: `src/ImportFamaFrench5.jl` +- Modify: `src/FinanceRoutines.jl` (add include + exports) +- Create: `test/UnitTests/FF5.jl` +- Modify: `test/runtests.jl` (add "FF5" to testsuite) + +The FF5 and momentum files follow the same zip+CSV format as FF3 on Ken French's site. 
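The date-pattern distinction those files share can be sketched as follows (a minimal sketch; `is_monthly_row` and `is_annual_row` are hypothetical names, not functions in the package):

```julia
# Hypothetical helpers illustrating the row patterns in Ken French CSV files:
# monthly rows start with a 6-digit YYYYMM date, annual rows with a 4-digit year.
is_monthly_row(line::AbstractString) = occursin(r"^\s*\d{6}\s*,", line)
is_annual_row(line::AbstractString)  = occursin(r"^\s*\d{4}\s*,", line) && !is_monthly_row(line)
```

Because a comma must follow immediately after the 4-digit year, a YYYYMM row never matches the annual pattern; this is the same distinction the Task 4 parser exploits to separate the two sections.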
 + +- [ ] **Step 1: Write failing tests** + +```julia +# test/UnitTests/FF5.jl +import Dates + +@testset "Importing Fama-French 5 factors and Momentum" begin + # FF5 monthly + df_FF5_monthly = import_FF5(frequency=:monthly) + @test names(df_FF5_monthly) == ["datem", "mktrf", "smb", "hml", "rmw", "cma", "rf"] + @test nrow(df_FF5_monthly) >= (Dates.year(Dates.today()) - 1963 - 1) * 12 + + # FF5 annual + df_FF5_annual = import_FF5(frequency=:annual) + @test names(df_FF5_annual) == ["datey", "mktrf", "smb", "hml", "rmw", "cma", "rf"] + @test nrow(df_FF5_annual) >= Dates.year(Dates.today()) - 1963 - 2 + + # FF5 daily + df_FF5_daily = import_FF5(frequency=:daily) + @test names(df_FF5_daily) == ["date", "mktrf", "smb", "hml", "rmw", "cma", "rf"] + @test nrow(df_FF5_daily) >= 15_000 + + # Momentum monthly + df_mom_monthly = import_FF_momentum(frequency=:monthly) + @test "mom" in names(df_mom_monthly) + @test nrow(df_mom_monthly) > 1000 +end +``` + +Note that `import Dates` sits at the top level of the test file: `import`/`using` inside a `@testset` block is not top-level scope and fails to parse. + +- [ ] **Step 2: Run tests to verify they fail** + +Expected: `import_FF5` and `import_FF_momentum` not defined + +- [ ] **Step 3: Generalize `_parse_ff_annual` and `_parse_ff_monthly` to accept `col_names`** + +Before writing FF5, first update the existing parsers in `src/ImportFamaFrench.jl` to accept a `col_names` keyword argument. Default to the FF3 column names so `import_FF3` continues to work unchanged. + +```julia +# In _parse_ff_annual: +function _parse_ff_annual(zip_file; types=nothing, col_names=[:datey, :mktrf, :smb, :hml, :rf]) + # ... existing logic ... + return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names) +end + +# In _parse_ff_monthly: +function _parse_ff_monthly(zip_file; types=nothing, col_names=[:datem, :mktrf, :smb, :hml, :rf]) + # ... existing logic ... + return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names) +end +``` + +Also extract a shared `_download_and_parse_ff_zip` helper to DRY up the download+zip+parse logic shared by FF3 and FF5. 
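One way to organize the per-frequency configuration such a helper would dispatch on (a sketch under assumed names — `FF5_SPECS` and `ff5_spec` are not part of the plan — using the URLs and column names given in Step 5):

```julia
# Hypothetical sketch: map each frequency to its download URL and column names,
# so import_FF5 reduces to a lookup plus one call to the shared helper.
const FF5_BASE = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
const FF5_SPECS = Dict(
    :monthly => (url = FF5_BASE * "F-F_Research_Data_5_Factors_2x3_CSV.zip",
                 col_names = [:datem, :mktrf, :smb, :hml, :rmw, :cma, :rf]),
    :annual  => (url = FF5_BASE * "F-F_Research_Data_5_Factors_2x3_CSV.zip",  # same zip as monthly
                 col_names = [:datey, :mktrf, :smb, :hml, :rmw, :cma, :rf]),
    :daily   => (url = FF5_BASE * "F-F_Research_Data_5_Factors_2x3_daily_CSV.zip",
                 col_names = [:date, :mktrf, :smb, :hml, :rmw, :cma, :rf]),
)

# Validate the frequency early so callers get a clear error.
function ff5_spec(frequency::Symbol)
    haskey(FF5_SPECS, frequency) ||
        throw(ArgumentError("frequency must be :monthly, :annual, or :daily"))
    return FF5_SPECS[frequency]
end
```

Keeping the table separate from the download logic also makes the momentum importer a one-entry addition rather than a copy of the whole function.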
 + +- [ ] **Step 4: Run existing KenFrench tests to verify no regression** + +Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'` +Expected: PASS (existing FF3 behavior unchanged) + +- [ ] **Step 5: Implement `import_FF5` in `src/ImportFamaFrench5.jl`** + +The FF5 file URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip` +Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip` + +Uses the generalized `_download_and_parse_ff_zip` helper with 7-column names: + +```julia +function import_FF5(; frequency::Symbol=:monthly) + ff_col_classes = [String7, Float64, Float64, Float64, Float64, Float64, Float64] + url_mth_yr = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip" + url_daily = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip" + col_names_mth = [:datem, :mktrf, :smb, :hml, :rmw, :cma, :rf] + col_names_yr = [:datey, :mktrf, :smb, :hml, :rmw, :cma, :rf] + col_names_day = [:date, :mktrf, :smb, :hml, :rmw, :cma, :rf] + # ... uses shared helper +end +``` + +- [ ] **Step 6: Implement `import_FF_momentum`** + +Momentum URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_CSV.zip` +Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip` + +Single factor file, columns: date, mom + +- [ ] **Step 7: Add exports to FinanceRoutines.jl** + +```julia +include("ImportFamaFrench5.jl") +export import_FF5, import_FF_momentum +``` + +- [ ] **Step 8: Run tests** + +Run: `julia --project=. 
-e 'using FinanceRoutines, Test; include("test/UnitTests/FF5.jl")'` +Expected: PASS + +- [ ] **Step 9: Commit** + +```bash +git add src/ImportFamaFrench5.jl src/FinanceRoutines.jl test/UnitTests/FF5.jl test/runtests.jl +git commit -m "Add import_FF5 and import_FF_momentum for 5-factor model and momentum" +``` + +--- + +## Task 10: Add portfolio return calculations + +**Files:** +- Create: `src/PortfolioUtils.jl` +- Modify: `src/FinanceRoutines.jl` (add include + exports) +- Create: `test/UnitTests/PortfolioUtils.jl` +- Modify: `test/runtests.jl` + +- [ ] **Step 1: Write failing tests** + +```julia +# test/UnitTests/PortfolioUtils.jl +import Dates: Date, Month +import DataFrames: DataFrame, groupby, combine, nrow + +@testset "Portfolio Return Calculations" begin + # Create test data: 3 stocks, 12 months + dates = repeat(Date(2020,1,1):Month(1):Date(2020,12,1), inner=3) + df = DataFrame( + datem = dates, + permno = repeat([1, 2, 3], 12), + ret = rand(36) .* 0.1 .- 0.05, + mktcap = [100.0, 200.0, 300.0, # weights sum to 600 + repeat([100.0, 200.0, 300.0], 11)...] + ) + + # Equal-weighted returns + df_ew = calculate_portfolio_returns(df, :ret, :datem; weighting=:equal) + @test nrow(df_ew) == 12 + @test "port_ret" in names(df_ew) + + # Value-weighted returns + df_vw = calculate_portfolio_returns(df, :ret, :datem; + weighting=:value, weight_col=:mktcap) + @test nrow(df_vw) == 12 + @test "port_ret" in names(df_vw) + + # Grouped portfolios (e.g., by size quintile) + df.group = repeat([1, 1, 2], 12) + df_grouped = calculate_portfolio_returns(df, :ret, :datem; + weighting=:value, weight_col=:mktcap, + groupby=:group) + @test nrow(df_grouped) == 24 # 12 months x 2 groups +end +``` + +(Imports sit above the `@testset` because `import` is only legal at top level.) + +- [ ] **Step 2: Implement `calculate_portfolio_returns`** + +```julia +""" + calculate_portfolio_returns(df, ret_col, date_col; + weighting=:value, weight_col=nothing, groupby=nothing) + +Calculate portfolio returns from individual stock returns. 
+ +# Arguments +- `df::DataFrame`: Panel data with stock returns +- `ret_col::Symbol`: Column name for returns +- `date_col::Symbol`: Column name for dates +- `weighting::Symbol`: `:equal` or `:value` +- `weight_col::Union{Nothing,Symbol}`: Column for weights (required if weighting=:value) +- `groupby::Union{Nothing,Symbol,Vector{Symbol}}`: Optional grouping columns + +# Returns +- `DataFrame`: Portfolio returns by date (and group if specified) +""" +function calculate_portfolio_returns(df::AbstractDataFrame, ret_col::Symbol, date_col::Symbol; + weighting::Symbol=:value, weight_col::Union{Nothing,Symbol}=nothing, + groupby::Union{Nothing,Symbol,Vector{Symbol}}=nothing) + + if weighting == :value && isnothing(weight_col) + throw(ArgumentError("weight_col required for value-weighted portfolios")) + end + + group_cols = isnothing(groupby) ? [date_col] : vcat([date_col], groupby isa Symbol ? [groupby] : groupby) + + grouped = DataFrames.groupby(df, group_cols) + + if weighting == :equal + return combine(grouped, ret_col => (r -> mean(skipmissing(r))) => :port_ret) + else + return combine(grouped, + [ret_col, weight_col] => ((r, w) -> begin + valid = .!ismissing.(r) .& .!ismissing.(w) + any(valid) || return missing + rv, wv = r[valid], w[valid] + sum(rv .* wv) / sum(wv) + end) => :port_ret) + end +end +``` + +- [ ] **Step 3: Add dependencies and imports** + +First, move `Statistics` from `[extras]` to `[deps]` in `Project.toml` (it's currently test-only but `calculate_portfolio_returns` uses `mean` at runtime). Add the UUID line to `[deps]`: +```toml +Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" +``` +Keep it in `[extras]` too (valid and harmless in Julia 1.10+). 
 + +Then update `src/FinanceRoutines.jl`: +- Add `import Statistics: mean` to the imports +- Add `combine` to the `import DataFrames:` line (used by `calculate_portfolio_returns`) +- Add include and export: +```julia +include("PortfolioUtils.jl") +export calculate_portfolio_returns +``` + +- [ ] **Step 4: Run tests** + +Expected: PASS + +- [ ] **Step 5: Commit** + +```bash +git add src/PortfolioUtils.jl src/FinanceRoutines.jl test/UnitTests/PortfolioUtils.jl test/runtests.jl +git commit -m "Add calculate_portfolio_returns for equal/value-weighted portfolios" +``` + +--- + +## Task 11: Add data quality diagnostics + +**Files:** +- Create: `src/Diagnostics.jl` +- Modify: `src/FinanceRoutines.jl` (add include + exports) +- Create: `test/UnitTests/Diagnostics.jl` +- Modify: `test/runtests.jl` + +- [ ] **Step 1: Write failing tests** + +```julia +# test/UnitTests/Diagnostics.jl +import Dates: Date +import DataFrames: DataFrame, allowmissing! + +@testset "Data Quality Diagnostics" begin + # Create test data with known issues + df = DataFrame( + permno = [1, 1, 1, 2, 2, 2], + date = [Date(2020,1,1), Date(2020,2,1), Date(2020,2,1), # duplicate for permno 1 + Date(2020,1,1), Date(2020,3,1), Date(2020,4,1)], # gap for permno 2 + ret = [0.05, missing, 0.03, -1.5, 0.02, 150.0], # suspicious: -1.5, 150.0 + prc = [10.0, 20.0, 20.0, -5.0, 30.0, 40.0] # negative price + ) + allowmissing!(df, :ret) + + report = diagnose(df) + + @test haskey(report, :missing_rates) + @test haskey(report, :suspicious_values) + @test haskey(report, :duplicate_keys) + @test report[:missing_rates][:ret] > 0 + @test length(report[:suspicious_values]) > 0 +end +``` + +(The `Dates` import was missing from the draft test, and imports must sit above the `@testset` at top level.) + +- [ ] **Step 2: Implement `diagnose`** + +```julia +""" + diagnose(df; id_col=:permno, date_col=:date, ret_col=:ret, price_col=:prc) + +Run data quality diagnostics on a financial DataFrame. 
+ +Returns a Dict with: +- `:missing_rates` — fraction missing per column +- `:suspicious_values` — rows with returns > 100% or < -100%, negative prices +- `:duplicate_keys` — duplicate (id, date) pairs +- `:nrow`, `:ncol` — dimensions +""" +function diagnose(df::AbstractDataFrame; + id_col::Symbol=:permno, date_col::Symbol=:date, + ret_col::Union{Nothing,Symbol}=:ret, + price_col::Union{Nothing,Symbol}=:prc) + + report = Dict{Symbol, Any}() + report[:nrow] = nrow(df) + report[:ncol] = ncol(df) + + # Missing rates + missing_rates = Dict{Symbol, Float64}() + for col in names(df) + col_sym = Symbol(col) + missing_rates[col_sym] = count(ismissing, df[!, col]) / nrow(df) + end + report[:missing_rates] = missing_rates + + # Duplicate keys + if id_col in propertynames(df) && date_col in propertynames(df) + dup_count = nrow(df) - nrow(unique(df, [id_col, date_col])) + report[:duplicate_keys] = dup_count + end + + # Suspicious values + suspicious = String[] + if !isnothing(ret_col) && ret_col in propertynames(df) + n_extreme = count(r -> !ismissing(r) && (r > 1.0 || r < -1.0), df[!, ret_col]) + n_extreme > 0 && push!(suspicious, "$n_extreme returns outside [-100%, +100%]") + end + if !isnothing(price_col) && price_col in propertynames(df) + n_neg = count(r -> !ismissing(r) && r < 0, df[!, price_col]) + n_neg > 0 && push!(suspicious, "$n_neg negative prices (CRSP convention for bid/ask midpoint)") + end + report[:suspicious_values] = suspicious + + return report +end +``` + +- [ ] **Step 3: Add to FinanceRoutines.jl** + +Update the `import DataFrames:` line to also include `ncol` and `unique` (DataFrames-specific `unique` for duplicate key detection by column). 
Then add: + +```julia +include("Diagnostics.jl") +export diagnose +``` + +- [ ] **Step 4: Run tests** + +Expected: PASS + +- [ ] **Step 5: Commit** + +```bash +git add src/Diagnostics.jl src/FinanceRoutines.jl test/UnitTests/Diagnostics.jl test/runtests.jl +git commit -m "Add diagnose() for data quality diagnostics on financial DataFrames" +``` + +--- + +## Task 12: Version bump and final integration + +**Files:** +- Modify: `Project.toml` — version to "0.5.0", add Statistics to [deps] if needed for PortfolioUtils +- Modify: `NEWS.md` — finalize +- Modify: `test/runtests.jl` — ensure all new test suites are listed + +- [ ] **Step 1: Update Project.toml version** + +Change `version = "0.4.5"` to `version = "0.5.0"` + +- [ ] **Step 2: Verify all dependencies are correct in Project.toml** + +Statistics should already be in `[deps]` (added in Task 10). Logging should already be removed (Task 1). Verify no stale entries. + +- [ ] **Step 3: Update test/runtests.jl testsuite list** + +```julia +const testsuite = [ + "KenFrench", + "FF5", + "WRDS", + "betas", + "Yields", + "PortfolioUtils", + "Diagnostics", +] +``` + +- [ ] **Step 4: Run full test suite** + +Run: `julia --project=. -e 'using Pkg; Pkg.test()'` +Expected: ALL PASS + +- [ ] **Step 5: Commit** + +```bash +git add Project.toml test/runtests.jl NEWS.md +git commit -m "Bump version to v0.5.0, finalize test suite and changelog" +``` + +--- + +## Task 13: Tag release and update registry + +Follow the release workflow in CLAUDE.md: + +- [ ] **Step 1: Tag** + +```bash +git tag v0.5.0 +git push origin v0.5.0 +``` + +- [ ] **Step 2: Get tree SHA** + +```bash +git rev-parse v0.5.0^{tree} +``` + +- [ ] **Step 3: Update LouLouLibs/loulouJL registry** + +Update `F/FinanceRoutines/Versions.toml`, `Deps.toml`, `Compat.toml` via `gh api`. + +--- + +## Extensions deferred for user decision + +These were listed as extensions A–E. Tasks 9–11 cover B (FF5), E (diagnostics), and A (portfolio returns). 
The remaining two are: + +- **C: Event study utilities** — `event_study(events_df, returns_df; ...)` computing CARs/BHARs. Can be added as Task 14 if desired. +- **D: Treasury yield interpolation** — `treasury_zero_rate(date, maturity)` incorporating T-bill rates. Requires a new data source. Can be added as Task 15 if desired. + +Both are independent of the above tasks and can be planned separately.
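Should task C be picked up, the numerical core is small. A minimal sketch of the two abnormal-return aggregations it would compute (function names and signatures are assumptions, not part of this plan):

```julia
# Hypothetical sketch: cumulative abnormal return (CAR) sums per-period abnormal
# returns; buy-and-hold abnormal return (BHAR) compares compounded returns.
car(rets::AbstractVector, expected::AbstractVector)  = sum(rets .- expected)
bhar(rets::AbstractVector, expected::AbstractVector) = prod(1 .+ rets) - prod(1 .+ expected)
```

The real `event_study` would mostly be window alignment around event dates plus a choice of expected-return model (market model, market-adjusted, etc.), which is why it warrants its own task.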