Where EU Sovereign Convergence Actually Lives

A Johansen Screen of Front-End and Curve Spreads

Vikram Bahure · 18 May 2026 · v2.0 (screen reframe, 2026-05-31)


Executive Summary


1. Context & Motivation

The Italy-Germany 10Y sovereign spread (BTP-Bund) is the macroeconomic tape that the European rates community watches in real time. It is the headline measure of Italian sovereign risk, the channel through which Italian fiscal and political news transmits to euro-area financial conditions, and the trigger condition the European Central Bank (ECB) writes its Transmission Protection Instrument (TPI) policy around. Yet on the 2020-2026 sample its two legs are not cointegrated. The legs of the next-most-obvious convergence pair, the matching 2Y, are.

This paper argues that the testable Italy-Germany convergence relationship in the current window lives at the front end of the curve. The cointegration evidence is at the 2Y, not the 10Y. The argument rests on three pillars: a Johansen-based universe screen that puts the BTP-Schatz through a proper I(1)-then-cointegration pipeline; an Error Correction Model characterisation of the Johansen-estimated residual; and a regime-stability check across the sample that quantifies how cleanly the relationship holds in different windows.

The Italian-German front end is a natural object of attention for three reasons. First, the 2Y is overwhelmingly driven by ECB policy expectations, so the credit and political premium between the two sovereigns is structurally smaller and more mean-reverting than at the 10Y. Second, the post-2022 ECB hiking-and-easing cycle has produced enough variation in the spread to make the cointegration test informative; the pre-2022 zero-lower-bound regime produced almost no testable variation. Third, the TPI is targeted at the 10Y end of the curve under stress, which leaves the 2Y as the cleaner residual after the central-bank backstop has been priced in.


2. Universe Screen and the BTP-Schatz Selection

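The screen tests thirteen candidate sovereign-rate convergence pairs across the United States (US), United Kingdom (UK), Germany, and Italy. Three survive Johansen as candidates, all of them Italy-Germany relationships. The BTP-Schatz is the only 2-leg survivor; the other two are an Italy-Germany 2s10s slope (4-leg, rank 1) and an Italy-Germany 2s3s10s butterfly (6-leg, rank 2). A fourth pair, the US 10Y nominal versus 10Y breakeven, passes the trace test at full rank but is excluded as a non-candidate for the reasons given below. The remaining nine pairs fail.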

For each pair, the screen runs: (a) ADF and KPSS on every individual leg in level and first difference to confirm strict integration of order one, I(1); (b) Johansen cointegration test on the joint panel of legs with det_order=0 (constant in the cointegrating equation, no trend) and k_ar_diff=1 (one lagged difference in the Vector Error Correction Model, VECM); (c) a Bayesian Information Criterion (BIC) lag-sensitivity check as a robustness gate. The pair is treated as a survivor when both/all legs are I(1) and the trace test rejects rank zero at the 5% level.

The BTP-Schatz is the focus of the rest of this paper. The summary table follows; Appendix A reports the full per-pair test statistics.

| Pair | All legs I(1) | Trace rank at 5% | Survives |
|---|---|---|---|
| BTP-Bund 10Y | Yes | 0 | No |
| BTP-Schatz (Italy-Germany 2Y) | Yes | 1 | Yes |
| US 10Y nominal vs 10Y breakeven | Yes | 2 (full rank, ambiguous) | Yes, not a candidate (see below) |
| Italy-Germany 2s10s slope (4 legs) | Yes | 1 | Yes (new finding, see Appendix B) |
| Italy-Germany 2s3s10s butterfly (6 legs) | Yes | 2 | Yes (new finding, see Appendix B) |
| Eight other pairs | various | 0 or n/a | No |

Three observations follow. First, the headline BTP-Bund 10Y spread that drives most macroeconomic commentary fails cointegration on this sample. The 10Y carries Italian sovereign credit and political risk premia that move on their own clock and break the residual stationarity. Second, the United States 10Y nominal versus 10Y Treasury Inflation-Protected Securities (TIPS) breakeven proxy returns full rank, that is, the Johansen test treats both legs as effectively stationary in level on the long 5,840-observation sample. The verdict is mathematically valid but it sits awkwardly against the per-leg ADF result; the pair is also flagged as a non-candidate because an earlier Error Correction Model backtest on it lost money. Third, the Italy-Germany system carries through to three survivors at different points on the curve, which is the structural finding that motivates the rest of the paper plus the two follow-up pieces in Appendix B.


3. What Drives the BTP-Schatz Spread

The BTP-Schatz spread is determined by four long-run drivers and four shorter-horizon, week-to-week movers.

Long-run drivers. (a) Italian fundamentals: debt-to-Gross-Domestic-Product (GDP) trajectory, primary balance, growth differential against Germany, banking-sovereign loop. (b) German fiscal stance: the constitutional debt brake, the 2024-2026 defence-spending amendment, joint-issuance episodes, and safe-asset demand for the Schatz. (c) ECB regime: Asset Purchase Programme history, Pandemic Emergency Purchase Programme (PEPP) reinvestment wind-down, Transmission Protection Instrument trigger conditions and capacity. (d) Italian political risk: election cycles, European Union fiscal-rules compliance, and the pricing of populist-coalition formation.

Short-run movers. (a) The Italian Treasury auction calendar and the supply concessions it generates. (b) ECB Governing Council meeting cadence: rate decisions, press conferences, hawkish or dovish dissents in the accounts. (c) Risk-off correlation, in which the Bund and the Schatz both bid as safe assets while the periphery widens. (d) Position-unwind episodes in real-money and hedge-fund flow.

The most consequential structural change in the recent sample is the Transmission Protection Instrument, announced 21 July 2022. The TPI is the ECB's commitment to make secondary-market purchases of sovereign bonds when a member state's borrowing conditions diverge from policy-warranted fundamentals. The Instrument is asymmetric across the curve: it is targeted at the 10Y end, where Italian sovereign-risk dispersion lives, and barely touches the 2Y. That asymmetry is part of the reason the 10Y is harder to model as a cointegrated relationship while the 2Y is easier.


4. Methodology

4.1 I(1) Confirmation on Each Leg

A pair is eligible for a cointegration test only when each constituent series is itself integrated of order one. Both ADF (Dickey and Fuller, 1979) and KPSS (Kwiatkowski et al., 1992) are applied to each leg, in level and in first difference. Each leg is treated as I(1) when all four conditions hold at the 5% level: ADF on level fails to reject the unit root, KPSS on level rejects stationarity, ADF on first difference rejects the unit root, and KPSS on first difference fails to reject stationarity. Both BTP-Schatz legs pass.

4.2 Johansen Cointegration Test

The Johansen test (Johansen, 1988; Johansen, 1991) is applied to the joint two-leg panel of [italy2y, bund2y] with det_order=0 (constant in the cointegrating equation, no level trend) and k_ar_diff=1 (one lagged difference in the VECM). The trace test works sequentially: it tests the null that the cointegrating rank is at most r, for r = 0, 1, ..., against the alternative of higher rank, and the estimated rank is the first r at which the null is not rejected. The maximum-eigenvalue test is reported as a cross-check. The cointegrating vector is normalised so the first-leg coefficient equals one.

Why Johansen rather than Engle-Granger (Engle and Granger, 1987)? Three reasons. First, symmetry: Engle-Granger requires choosing one series as dependent and another as independent, and can give a different p-value depending on the choice. Johansen does not. Second, joint estimation: Johansen estimates the cointegrating vector(s) jointly across legs rather than recovering them from a reduced-form OLS regression. Third, generality: Johansen tests the rank of cointegration, so an n-asset system can be shown to carry more than one stationary linear combination. The Italy-Germany 2s3s10s butterfly (Appendix B) carries two; Engle-Granger applied to a derived single-series butterfly cannot recover that fact.

4.3 Ornstein-Uhlenbeck Framework on the Residual

The cointegration residual is treated as an Ornstein-Uhlenbeck (OU) mean-reverting process. The OU half-life is the implied time-to-halve: the time the residual is expected to take to close half the distance back to its equilibrium level, conditional on no new shocks. It is computed from an autoregressive fit on the change in the residual against the lagged level, which is the discrete-time approximation of the OU stochastic differential equation. The half-life sets the natural time-scale over which the residual closes and informs the time-stop parameter on the signal-form definition in §4.4.
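The half-life calculation described above can be sketched in a few lines. This is an illustration under stated assumptions: the regression includes an intercept, the half-life is taken as -ln(2) / ln(1 + b) where b is the slope on the lagged level, and a non-mean-reverting fit is reported as infinite. The function name ou_half_life is hypothetical and the project's runner may differ in these details.

```python
import math


def ou_half_life(residual):
    """Half-life in observations from an OLS fit of the change on the lagged level."""
    lagged = residual[:-1]
    change = [b - a for a, b in zip(residual[:-1], residual[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(change) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, change))
    slope = sxy / sxx  # discrete-time mean-reversion speed (negative if reverting)
    if not -1.0 < slope < 0.0:
        return math.inf  # no usable mean reversion in this fit
    return -math.log(2.0) / math.log(1.0 + slope)


# Deterministic check: a residual that closes 10% of its gap each day.
residual = [0.9 ** t for t in range(60)]
print(round(ou_half_life(residual), 2))  # -> 6.58
```

For small slopes, -ln(2) / b gives nearly the same number; the logarithmic form is exact for an AR(1) process sampled at the data frequency.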

4.4 The Error Correction Model

The Error Correction Model is the standard cointegration-implied dynamic, written in change-of-variable form:

Δy_t = c + γ Δx_t + λ ε_{t-1} + u_t

where y is the BTP 2Y, x is the Schatz, and ε is the Johansen-based cointegration residual. The mean-reversion coefficient λ is negative and significant when the cointegration is genuine: a positive residual yesterday produces a negative expected change in y today, that is, the residual closes a fixed proportion of its gap each period.
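The regression above can be estimated by ordinary least squares. The sketch below is a minimal illustration on simulated data, assuming the residual is built as y minus β times x with the constant absorbed by c; fit_ecm is a hypothetical name and the simulated series are not the BTP or Schatz data.

```python
import numpy as np


def fit_ecm(y, x, beta):
    """OLS of dy_t on [1, dx_t, eps_{t-1}]; returns (c, gamma, lam)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    eps = y - beta * x                      # cointegration residual
    dy, dx = np.diff(y), np.diff(x)
    design = np.column_stack([np.ones(len(dy)), dx, eps[:-1]])
    coef, *_ = np.linalg.lstsq(design, dy, rcond=None)
    c, gamma, lam = (float(v) for v in coef)
    return c, gamma, lam


# Simulated pair (assumption): the gap closes about 10% per day, so the
# fitted lambda should come out near -0.1 and gamma near 1.04.
rng = np.random.default_rng(7)
n = 2000
x = np.cumsum(rng.normal(scale=0.05, size=n))
gap = np.zeros(n)
for t in range(1, n):
    gap[t] = 0.9 * gap[t - 1] + rng.normal(scale=0.02)
y = 1.04 * x + gap

c, gamma, lam = fit_ecm(y, x, beta=1.04)
print(c, gamma, lam)
```

A one-step forecast of the change in y is then c + gamma * dx + lam * eps, with the last term carrying the pull back toward equilibrium.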

The signal triggers when two conditions are met. (a) The residual z-score is stretched, |z| ≥ 1.0. (b) The absolute ECM forecast for the next-day change is in the top 40% of the rolling distribution of absolute forecasts, so the model is not just stretched but also expecting reversion to start now. Signal exit occurs on any of three conditions: |z| ≤ 0.25 (reversion has happened), forecast sign flips against the position (model has lost confidence), or 120 trading days elapse (time stop, calibrated against the OU half-life of 17.8 days, giving the residual roughly seven half-lives to resolve).
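The entry and exit rules above can be written as a small state machine. This is a sketch, not the project's runner: the 252-day lookback for the rolling distribution, the convention that a positive z-score is traded short, and the extra requirement that the forecast sign agree with the trade at entry are all assumptions, and ecm_signal_positions is a hypothetical name.

```python
def ecm_signal_positions(z, forecast, entry_z=1.0, exit_z=0.25,
                         top_frac=0.40, lookback=252, max_hold=120):
    """Daily position: -1 short the residual, +1 long the residual, 0 flat."""
    positions, pos, held = [], 0, 0
    for t in range(len(z)):
        if pos != 0:
            held += 1
            reverted = abs(z[t]) <= exit_z          # reversion has happened
            flipped = forecast[t] * pos < 0         # forecast now opposes the trade
            timed_out = held >= max_hold            # time stop
            if reverted or flipped or timed_out:
                pos, held = 0, 0
        else:
            history = [abs(f) for f in forecast[max(0, t - lookback):t]]
            if history and abs(z[t]) >= entry_z:
                # Percentile rank of today's absolute forecast in the window.
                rank = sum(h <= abs(forecast[t]) for h in history) / len(history)
                direction = -1 if z[t] > 0 else 1
                if rank >= 1.0 - top_frac and forecast[t] * direction > 0:
                    pos, held = direction, 0
        positions.append(pos)
    return positions


# Toy inputs: a stretched residual on day 5, forecast sign flip on day 8.
z = [0, 0, 0, 0, 0, 1.5, 1.2, 0.8, 0.6]
forecast = [-0.1, 0.1, -0.1, 0.1, -0.1, -0.5, -0.4, -0.3, 0.2]
print(ecm_signal_positions(z, forecast, lookback=5))
# -> [0, 0, 0, 0, 0, -1, -1, -1, 0]
```

Checking exits before entries means a position closed on day t is not reopened on the same day, which keeps the trade count unambiguous.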

These parameters define the signal form whose equity-curve evidence is presented in §5. They are not a trade prescription. Conversion of a signal-form characterisation into a deployable rule requires the cost, carry, and live-regime work scoped in §7.

4.5 Cross-Checks: Kalman and Principal Component Analysis

Two independent methods cross-check the static Johansen approach. The Kalman filter lets the hedge ratio drift as a random walk and produces a dynamic counterpart to the static Johansen β. The Principal Component Analysis (PCA) residual approach decomposes the joint US, UK, Germany, and Italy rates panel and isolates the Italian 2Y residual to the dominant common factor, which is interpretable as a global rates-level factor. Both are reported in §5 as corroborating evidence on the signal form rather than as standalone strategies.
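The random-walk hedge ratio idea behind the Kalman cross-check can be sketched with a scalar filter. The version below is an illustration only: it has no intercept, and the noise variances q and r and the starting values beta0 and p0 are hypothetical placeholders, not the settings used in the Kalman runner.

```python
def kalman_hedge_ratio(y, x, q=1e-5, r=1e-3, beta0=0.0, p0=1.0):
    """Filtered beta_t for y_t = beta_t * x_t + noise, with beta_t a random walk."""
    beta, p, path = beta0, p0, []
    for y_t, x_t in zip(y, x):
        p = p + q                          # predict: state variance grows by q
        innovation = y_t - beta * x_t      # one-step-ahead forecast error
        s = x_t * x_t * p + r              # innovation variance
        gain = p * x_t / s                 # Kalman gain
        beta = beta + gain * innovation    # update the hedge ratio
        p = (1.0 - gain * x_t) * p         # update the state variance
        path.append(beta)
    return path


# Noise-free toy data with a constant true ratio of 1.04.
x = [2.0 + 0.01 * t for t in range(300)]
y = [1.04 * v for v in x]
path = kalman_hedge_ratio(y, x)
print(path[0], path[-1])
```

The ratio q / r controls how quickly the filtered hedge ratio is allowed to drift; as q approaches zero the filter tends toward a static estimate.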


5. Evidence: cointegration, mean-reversion form, and signal performance (gross of cost and carry)

This section presents the empirical evidence on three things: (a) the BTP-Schatz pair survives Johansen at conventional significance levels; (b) the Johansen residual mean-reverts in a form consistent with the Ornstein-Uhlenbeck framework, with a half-life of 17.8 days and a statistically negative Error Correction Model coefficient; (c) a static ECM signal on that residual produces a 0.76 gross Sharpe over 2020-2026, illustrating that the signal form is tradeable in principle. All numbers in this section are pre-transaction-cost and pre-carry; their conversion into a deployable Sharpe is scoped in §7.

5.1 Cointegration Evidence

Source: 20260517_232529_coint_johansen_italy_bund_2y_tests.json.

| Quantity | Value |
|---|---|
| Sample | 2020-05-04 to 2026-05-12 (1,533 obs) |
| Trace statistic, H0: r = 0 | 24.747 |
| Trace 5% critical value | 15.494 |
| Verdict at r = 0 | Reject (cointegration present) |
| Trace statistic, H0: r = 1 | 0.698 |
| Trace 5% critical value | 3.841 |
| Verdict at r = 1 | Fail to reject |
| Max-eigenvalue statistic, H0: r = 0 | 24.048 (CV 14.264) |
| Cointegrating rank at 5% | 1 |
| Leading cointegrating vector | [italy2y: 1, bund2y: -1.0387] |
| BIC-selected VECM lag | 3 |
| Rank at BIC lag | 1 (robust) |
| Residual OU half-life | 17.8 days |
| Residual ADF p-value | 0.00031 |

Both the trace and maximum-eigenvalue tests reject the null of no cointegration at the 5% level, with statistics roughly 1.6 to 1.7 times the critical value. The second null (rank at most one, against the full-rank alternative of two) is comfortably not rejected: 0.698 against a critical value of 3.841. The rank-one result also holds at the BIC-selected lag of 3.

The leading cointegrating vector implies a hedge ratio of 1.04 on the Schatz leg, normalised so the BTP leg has unit weight. The ratio is statistically distinguishable from 1.0 on 1,533 observations but only just. The interpretation is the standard one for euro-area front-end spreads: BTP and Schatz yields share near-identical exposure to ECB policy expectations, and the small premium over 1.0 reflects the modestly-greater policy sensitivity of Italian rates given the sovereign-risk overlay (Codogno, Favero and Missale, 2003; Afonso, Arghyrou and Kontonikas, 2015).

The OU half-life of 17.8 days places the natural mean-reversion horizon for the residual at three to four weeks. The 120-day time-stop parameter in the signal form (§4.4) is therefore generous against the half-life and allows slow regimes to resolve.

5.2 Error Correction Model Signal — Gross Performance

Source: 20260517_235141_coint_ecm_italy_bund_2y_metrics.json. The Error Correction Model is fit on the Johansen residual; the signal form is as specified in §4.4. All metrics are gross of cost and carry.

| Metric | Ungated | With rolling stability filter |
|---|---|---|
| Compound Annual Growth Rate (CAGR) | 8.36% | 2.25% |
| Annualised volatility | 11.05% | 6.03% |
| Sharpe ratio (gross) | 0.76 | 0.37 |
| Sortino ratio (gross) | 0.63 | 0.19 |
| Maximum drawdown | -8.66% | -10.54% |
| Calmar ratio (CAGR / max DD) | 0.97 | 0.21 |
| Number of trades | 170 | 64 |
| Win rate | 30.6% | 26.6% |
| Profit factor | 1.81 | 0.95 |
| ECM mean-reversion coefficient λ | -0.0201 (p = 0.0008) | (identical) |

The ungated 0.76 Sharpe is the headline gross number. It is consistent with the cointegration evidence in §5.1: a statistically negative λ, a residual half-life under twenty days, and a positive equity curve at a Sharpe that places the residual's mean-reversion in a tradeable-in-form range — before any cost, carry, or regime adjustment.
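For readers who want to recompute the headline metrics from a daily return series, a minimal sketch follows. It assumes 252 periods per year, a zero risk-free rate, and a Sharpe defined as annualised mean over annualised volatility; whether the project's runner uses the mean or the CAGR in the numerator is not stated in this paper, and performance_metrics is a hypothetical name.

```python
import math


def performance_metrics(daily_returns, periods_per_year=252):
    """CAGR, annualised volatility, Sharpe, maximum drawdown and Calmar."""
    n = len(daily_returns)
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1.0)   # most negative peak-to-trough
    cagr = equity ** (periods_per_year / n) - 1.0
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    vol = math.sqrt(var * periods_per_year)
    sharpe = mean * periods_per_year / vol if vol > 0 else float("nan")
    calmar = cagr / abs(max_dd) if max_dd < 0 else float("inf")
    return {"cagr": cagr, "vol": vol, "sharpe": sharpe,
            "max_drawdown": max_dd, "calmar": calmar}


# Three-day toy series: the only drawdown is the 2% loss on day two.
metrics = performance_metrics([0.01, -0.02, 0.03])
print(metrics)
```

As a consistency check on the table, the reported Sharpe ratios are close to CAGR divided by annualised volatility in both columns (8.36 / 11.05 and 2.25 / 6.03), and the Calmar ratios match CAGR divided by the absolute maximum drawdown.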

Figure 1. Static Error Correction Model signal equity curve, BTP-Schatz, 2020-2026 — illustrative of mean-reversion in the Johansen residual, gross of cost and carry. Signal (dark), residual buy-and-hold (dashed grey), cash (dotted red).

The filtered column is included for completeness. The filter is the project's legacy Engle-Granger-style rolling diagnostic and is not yet Johansen-consistent: the residual being filtered is the Johansen residual but the filter logic is the older reduced-form rolling regression. The mismatch is the most likely reason the filtered Sharpe falls below the unfiltered Sharpe, which is the opposite of the typical pattern. A methodology-consistent rolling filter is scoped in §7.

5.3 Cross-Checks

Kalman filter (research/backtests/runs/coint/20260512_195333_*). Dynamic hedge ratio drifts as a random walk. Ungated CAGR 11.45%, gross Sharpe 1.00, maximum drawdown -11.94%.

Figure 2. Time-varying Kalman hedge ratio β_t, BTP-Schatz, 2020-2026. Static Johansen estimate (1.04) overlaid as horizontal reference.

The Kalman hedge ratio is unstable from 2020 into early 2021 (the late-Coronavirus zero-lower-bound regime) and collapses into a tight 0.8-1.0 band from mid-2022, sitting just below the static Johansen estimate of 1.04. The visual implication is direct: the dynamic-hedge advantage lives mostly in the pre-2022 window. The static Johansen estimate is the natural anchor for the post-2022 window.

Principal Component Analysis residual (research/backtests/runs/coint/20260512_223356_*). PC1 explains 95.6% of variance across US, UK, Germany, and Italy rates and is interpretable as a global rates-level factor. The Italian 2Y residual to this factor produces 9.89% CAGR at gross Sharpe 0.69 over a shorter 2022-2026 window. The result corroborates the Johansen story: there is a stationary Italian 2Y signal that survives removal of the dominant common rates factor. Trade count is low (four) and the result is best read as cross-validation rather than as a standalone signal.
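The residual-to-PC1 construction can be sketched with a singular value decomposition. This is an illustration under assumptions: the decomposition is run on demeaned yield levels (the paper does not say whether levels or changes are used), the panel is simulated, and pc1_residual is a hypothetical name.

```python
import numpy as np


def pc1_residual(panel, target_col):
    """Variance share of PC1 and the target column's residual to PC1."""
    centred = panel - panel.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = float(s[0] ** 2 / np.sum(s ** 2))
    scores = centred @ vt[0]                     # PC1 factor scores per day
    loading = vt[0, target_col]                  # target's loading on PC1
    residual = centred[:, target_col] - loading * scores
    return explained, scores, residual


# Simulated four-column rates panel (assumption): one common random-walk
# factor plus small idiosyncratic noise; column 3 stands in for the Italian 2Y.
rng = np.random.default_rng(1)
n = 1000
factor = np.cumsum(rng.normal(size=n))
loadings = np.array([1.0, 0.9, 1.1, 1.2])
panel = factor[:, None] * loadings + rng.normal(scale=0.2, size=(n, 4))

explained, scores, residual = pc1_residual(panel, target_col=3)
print(explained)
```

By construction the residual is orthogonal to the PC1 scores, so any mean reversion found in it is not a restatement of the common rates-level factor.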


6. Regime Dependence Across the Sample

The cointegration relationship is statistically valid on the full 2020-2026 sample but is not uniformly active across rolling windows. Only 46% of rolling 504-day windows in the sample produce a beta, Engle-Granger p-value, ADF p-value, and half-life combination that passes the project's legacy stability gate. This section presents the regime structure of the signal as a methodological observation; live-regime gating using a Johansen-consistent rolling diagnostic is scoped in §7.

Figure 3. Rolling diagnostics across 504-day windows, BTP-Schatz, 2020-2026. Hedge ratio, Engle-Granger p-value, ADF p-value, OU half-life. (Rolling diagnostics use the legacy Engle-Granger pipeline; a Johansen-consistent rolling version is in scope for follow-up — see §7.)

Figure 4. Rolling 252-day gross Sharpe of the static Error Correction Model signal, BTP-Schatz, 2020-2026.

The rolling-Sharpe figure turns "regime-dependent" from a statistic into a picture. The signal averages a Sharpe close to the static full-sample value, with material dispersion around it. The signal performed most cleanly in the 2022-2023 hiking-to-peak phase, with rolling Sharpe peaks above 1.5 around ECB rate decisions as the spread mean-reverted aggressively. Late 2023 produces a rolling-Sharpe trough, coinciding with the ECB's hold-at-peak period in which the BTP-Schatz widened persistently without mean-reverting. Recovery through 2024 and into 2025 brings the rolling Sharpe back above the long-run mean.

The methodologically honest read is that the BTP-Schatz cointegration is real but thin: full-sample stationarity is statistically clear, but the relationship's strength varies materially across rolling windows. Any deployable version of this signal would need a live-regime gate that conditions activity on a Johansen-consistent rolling cointegration check rather than on the static full-sample result.


7. Out of Scope: What Would Make This Deployable

The headline numbers in §5 are gross, pre-cost, pre-carry, and full-sample. Five workstreams stand between this screen + signal-form note and a defensible claim of out-of-sample deployable edge. They are scoped here, not executed in this paper.

1. Transaction-cost and carry-adjusted backtest. All §5 numbers are price-only. A position long the BTP 2Y and short the Schatz earns carry approximately equal to the spread itself, and the reverse position pays it; over 2020-2026 the spread averaged in the low-to-mid double-digits of basis points, which is material against an 8% gross CAGR. A proper deployment backtest applies realistic 2Y BTP and Schatz bid-ask, repo / financing, and futures roll where applicable, and restates the net-of-cost Sharpe. This is the single largest unmeasured adjustment to the headline numbers.

2. Live-regime gating with a Johansen-consistent rolling diagnostic. The static cointegration step is Johansen-based; the rolling diagnostic and stability filter inside the Error Correction Model runner are still on Engle-Granger rolling reduced-form estimation. The mismatch is the most likely reason the filtered Sharpe in §5.2 came out below the unfiltered Sharpe, the opposite of the typical pattern. Extending convergence_common.py to support a rolling Johansen estimator would produce a methodology-consistent live-regime gate. This is the natural way to convert the §6 regime evidence into a forward-looking activity filter.

3. Deflated Sharpe and Probability of Backtest Overfitting (López de Prado, 2018). Standard Sharpe ignores selection bias across the number of strategies tested. This lab has tested at minimum thirteen pairs, three methods (Error Correction Model, Kalman, Principal Component Analysis), and several filter configurations. A deflated Sharpe will reduce every headline number above by an amount that depends on the trial count, the variance of the Sharpe ratio across trials, and the sample length, skewness, and kurtosis of the returns. Combinatorially Symmetric Cross-Validation would put a probability on the in-sample Sharpe ranking being preserved out-of-sample. White's Reality Check or the Romano-Wolf step-down would test whether the best-performing strategy in the universe is statistically distinguishable from the best-performing strategy under the null of no edge.

4. Walk-forward validation. In-sample / out-of-sample split of the 2020-2026 window with parameter robustness across folds. Pairs naturally with bullet 3.

5. Why the 10Y fails — structural investigation. The headline finding is that the BTP-Bund 10Y does not cointegrate on this sample while three Italy-Germany relationships at the front end and on the curve do. The methodological question this opens is: what structural features of the 10Y break residual stationarity that the front end retains? Candidate drivers include political and credit premium dynamics that move on a separate clock from policy-rate expectations, TPI optionality concentrated at the long end, real-money positioning effects, and structural-break candidates around the 2022 ECB regime shift. This is itself a worthwhile research question and is independent of the deployability stack in bullets 1–4.
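The deflated Sharpe adjustment named in workstream 3 can be sketched as follows, using the standard formulation in which the benchmark is the expected maximum Sharpe across N unskilled trials. All Sharpe inputs are per-period (daily), not annualised. The trial variance, trial counts, skewness, and kurtosis in the example are hypothetical placeholders, since the paper does not report them, and the function names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_NORMAL = NormalDist()


def expected_max_sharpe(trial_sharpe_var, n_trials):
    """Expected best per-period Sharpe among n_trials strategies with no edge."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    a = _NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    b = _NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(trial_sharpe_var) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe(sr, n_obs, skew, kurt, trial_sharpe_var, n_trials):
    """Probability that the true Sharpe exceeds the selection-bias benchmark."""
    sr0 = expected_max_sharpe(trial_sharpe_var, n_trials)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr * sr)
    return _NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Illustrative inputs only: the 0.76 gross Sharpe converted to daily units,
# 1,533 observations, Gaussian returns (skew 0, kurtosis 3), and an assumed
# cross-trial Sharpe standard deviation of 0.5 annualised.
daily_sr = 0.76 / math.sqrt(252)
trial_var = (0.5 / math.sqrt(252)) ** 2
dsr_13 = deflated_sharpe(daily_sr, 1533, 0.0, 3.0, trial_var, n_trials=13)
dsr_39 = deflated_sharpe(daily_sr, 1533, 0.0, 3.0, trial_var, n_trials=39)
print(dsr_13, dsr_39)
```

The comparison between 13 and 39 trials shows the direction of the effect: counting methods and filter variants as separate trials raises the benchmark and lowers the deflated figure.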

A separate methodological follow-up is volatility-regime-conditional gating: a filter that conditions on residual or rates-volatility regime, distinct from the cointegration-regime filter, that may produce a better Sharpe-versus-trade-count trade-off. Scoped, not run.


8. Markers of Signal Re-Activation

Five markers would indicate the BTP-Schatz cointegration regime is re-establishing in real time. They are presented as research observations on the conditions under which the §6 regime structure has historically aligned with the signal performing well.

  1. Rolling Engle-Granger and ADF p-values on the BTP-Schatz residual drop below the 5% threshold and stay there for at least one full month of trading days. This is the cleanest single marker; the rolling diagnostic is published as part of the lab output and would be upgraded to a Johansen-consistent version under §7 bullet 2.
  2. Rolling Ornstein-Uhlenbeck half-life shortens into the 5-to-30-day band. Long or negative half-lives indicate that even where the residual is technically stationary, the close-rate is too slow to be tradeable.
  3. ECB phase shift. The signal performed best in the 2022-2023 hiking-to-peak phase. A symmetric easing-to-trough phase, currently underway, may eventually produce a similar regime in reverse. Track the trajectory of the Council's policy-rate guidance and the dispersion of views recorded in the meeting accounts.
  4. TPI activation or credible threat of activation. The Transmission Protection Instrument's secondary-market purchase commitment is most relevant for the 10Y but compresses front-end risk premia as a side-effect. A clear TPI signal would compress the BTP-Schatz residual variance and may re-activate the mean-reversion.
  5. Italian political event risk priced into BTP-Schatz. Election cycles, EU fiscal-rule confrontations, and government formation episodes have historically widened the spread; the mean-reverting snap-back to fair value is the signal pattern. The political calendar is the natural watch list.

9. Conclusion

A proper Johansen pipeline applied to thirteen sovereign-rate convergence pairs produces a single clear screen finding: the headline BTP-Bund 10Y does not cointegrate on the 2020-2026 sample, while three Italy-Germany relationships at the front end and on the curve do. On the BTP-Schatz 2Y pair, the hedge ratio is 1.04, the residual mean-reverts with an Ornstein-Uhlenbeck half-life of 17.8 days, and a static Error Correction Model signal produces a 0.76 gross annualised Sharpe. That figure is illustrative of mean-reversion in signal form, not a deployable number; the cost, carry, regime, and deflation work is scoped in §7. The two further Italy-Germany survivors (the 2s10s slope and the 2s3s10s butterfly) form the next research agenda and are summarised in Appendix B. The note presents a screen result and signal-form evidence; the deployability work is named and scoped, not silently absent.


References

Citations to be verified against current bibliographic detail before publication.

Afonso, A., Arghyrou, M. G. and Kontonikas, A. (2015). The determinants of sovereign bond yield spreads in the EMU. Journal of International Money and Finance, 49, 198-218.

Avellaneda, M. and Lee, J.-H. (2010). Statistical arbitrage in the U.S. equities market. Quantitative Finance, 10(7), 761-782.

Codogno, L., Favero, C. and Missale, A. (2003). Yield spreads on EMU government bonds. Economic Policy, 18(37), 503-532.

De Santis, R. A. (2014). The euro area sovereign debt crisis: identifying flight-to-liquidity and the spillover mechanisms. European Central Bank Working Paper Series 1419.

Dickey, D. A. and Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366), 427-431.

Engle, R. F. and Granger, C. W. J. (1987). Co-integration and error correction: representation, estimation, and testing. Econometrica, 55(2), 251-276.

European Central Bank (2022). The Transmission Protection Instrument. Press release, 21 July 2022.

European Central Bank (2022). The Transmission Protection Instrument. Economic Bulletin, Issue 6/2022.

Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control, 12(2-3), 231-254.

Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica, 59(6), 1551-1580.

Kwiatkowski, D., Phillips, P. C. B., Schmidt, P. and Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54(1-3), 159-178.

López de Prado, M. (2018). Advances in Financial Machine Learning. Hoboken, NJ: Wiley.

Manasse, P. and Roubini, N. (2009). "Rules of thumb" for sovereign debt crises. Journal of International Economics, 78(2), 192-205.


Appendix A. Universe Screen — Full Results

| Pair | Sample (days) | All legs I(1) | Trace rank | Max-eig rank | Leading vector (selected) | Verdict |
|---|---|---|---|---|---|---|
| BTP-Bund 10Y | 1,533 | Yes | 0 | 0 | n/a | Fail |
| BTP-Schatz (Italy-Germany 2Y) | 1,533 | Yes | 1 | 1 | [italy2y: 1, bund2y: -1.039] | Pass |
| UK-US 10Y | 1,491 | Yes | 0 | 0 | n/a | Fail |
| Bund-US 10Y | 5,737 | Yes | 0 | 0 | n/a | Fail |
| US 10Y nominal vs 10Y breakeven | 5,840 | Yes | 2 (full) | 2 (full) | [nominal: 1, real: -1.182] | Pass, full rank (ambiguous; non-candidate) |
| UK-US 2Y | 1,491 | Yes | 0 | 0 | n/a | Fail |
| SONIA-SOFR | 1,984 | No (SONIA fails KPSS on diff) | n/a | n/a | n/a | Fail |
| Schatz-US 2Y | 3,025 | Yes | 0 | 0 | n/a | Fail |
| UK-US 2s10s slope (4 legs) | 1,487 | Yes | 0 | 0 | n/a | Fail |
| Bund-US 2s10s slope (4 legs) | 3,025 | No (Bund 10Y fails KPSS on diff) | 1 | 0 | n/a | Fail |
| Italy-Germany 2s10s slope (4 legs) | 1,533 | Yes | 1 | 1 | see Appendix B | Pass (new) |
| Bund-US 2s3s10s butterfly (6 legs) | 3,025 | No (Bund 10Y fails KPSS on diff) | 3 | 2 | n/a | Fail |
| Italy-Germany 2s3s10s butterfly (6 legs) | 1,533 | Yes | 2 | 2 | see Appendix B | Pass rank 2 (new) |

Three pairs fail the strict I(1) test because the long-history series (SONIA, Bund 10Y) display non-constant variance across multiple monetary regimes, which KPSS detects in first differences. This is a real finding, not a defect of the test specification.


Appendix B. Other Johansen Survivors (Scoped for Follow-Up)

Italy-Germany 2s10s slope, 4-leg system. Cointegrating rank 1. Leading cointegrating vector, normalised on the Italian 10Y leg: [italy10y: 1, italy2y: -2.84, bund10y: -1.30, bund2y: +3.13]. The vector is decisively not the naive slope-vs-slope spread (which would carry coefficients [1, -1, -1, 1]). The 2Y legs load substantially more strongly than the 10Y legs, and the sign pattern implies the relationship is closer to an Italian-10Y-anchored credit-tilted combination of the four legs. A separate working paper will fit an Error Correction Model on the four-leg system and report its signal-form evidence.

Italy-Germany 2s3s10s butterfly, 6-leg system. Cointegrating rank 2. Two independent stationary linear combinations among the Italian and German short, intermediate, and long-end yields. Leading vector, normalised on the Italian 2Y leg: [italy2y: 1, italy3y: -1.66, italy10y: +0.69, bund2y: -2.74, bund3y: +3.86, bund10y: -1.14]. The second cointegrating vector and the economic interpretation of both relations are the subject of a separate working paper.

Both follow-up pieces will share the methodology stack defined in §4: I(1) leg confirmation, Johansen with BIC lag sensitivity, OU framework on the leading residual, Error Correction Model on the residual.


Appendix C. Replication

All numbers in this paper are reproducible from the project repository. Key commands:

# Full Johansen universe screen (13 pairs)
python -m research.backtests.runners.run_coint_johansen

# Single pair, with lag sensitivity
python -m research.backtests.runners.run_coint_johansen --preset italy_bund_2y

# BTP-Schatz Error Correction Model with Johansen β
python -m research.backtests.runners.run_coint_ecm italy_bund_2y --beta-override 1.0387

# Kalman backtest
python -m research.backtests.runners.run_coint_kalman italy_bund_2y

# Principal Component Analysis residual
python -m research.backtests.runners.run_coint_pca italy2y

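Run artefacts (JSON metrics, CSV trades, PNG charts): research/backtests/runs/coint/20260517_* for the Johansen and Error Correction Model runs, and research/backtests/runs/coint/20260512_* for the Kalman and Principal Component Analysis runs cited in §5.3.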

Sample window for the Error Correction Model and Kalman runs: 2020-05-04 to 2026-05-12. Sample window for the Principal Component Analysis runs: 2022-05-24 to 2026-05-07.


Working paper, personal portfolio. Not investment advice. Not a recommendation.