BTP-Schatz Convergence

A Johansen-Based Mean-Reversion Trade in Italian and German 2Y Yields

Vikram Bahure · 18 May 2026 · v1.0 working draft

Executive Summary

Trade. Long Italy 2Y / short German 2Y (Schatz) when the cointegration residual is more than one standard deviation away from its equilibrium. Sized at a Dollar Value of one basis point (DV01) hedge ratio of 1.04 (Johansen-estimated, normalised on the Italian leg). Targeted holding period: 3-6 weeks. Time stop: 120 trading days.
Methodology. Proper cointegration pipeline. Step one: ADF and KPSS confirm I(1) on each leg in level and first difference. Step two: Johansen on the joint two-leg system confirms rank 1 (one stationary linear combination). Step three: an Error Correction Model on the residual generates the trading signal. Engle-Granger is not used; the Johansen approach is symmetric, jointly estimates the cointegrating vector, and extends naturally to the n-asset systems in the appendix.
Headline result (2020-05-04 to 2026-05-12, 1,533 trading days, gross of carry). Ungated Sharpe 0.76, Compound Annual Growth Rate (CAGR) 8.36%, maximum drawdown -8.66%, 170 trades, Ornstein-Uhlenbeck (OU) half-life 17.8 days, Error Correction Model mean-reversion coefficient -0.0201 (p = 0.0008).
Deeper finding from the screen. Two further Italy-Germany convergence relationships survive Johansen but failed the older Engle-Granger screen: the Italy-Germany 2s10s slope (4-leg system, rank 1) and the Italy-Germany 2s3s10s butterfly (6-leg system, rank 2). Both are summarised in Appendix B.
Limitations. The analysis does not yet include Deflated Sharpe, Probability of Backtest Overfitting, carry-and-repo adjustment, or a Johansen-consistent rolling stability filter. The headline BTP-Bund 10Y does not cointegrate on this sample, so the trade is at the front of the curve rather than the headline tape. The rolling stability test is currently failing, so the relationship is off-regime today.

1. Context & Motivation

The Italy-Germany 10Y sovereign spread (BTP-Bund) is the macroeconomic tape that the European rates community watches in real time. It is the headline measure of Italian sovereign risk, the channel through which Italian fiscal and political news transmits to euro-area financial conditions, and the trigger condition the European Central Bank (ECB) writes its Transmission Protection Instrument (TPI) policy around. It is also, on the 2020-2026 sample, not cointegrated. The next-most-obvious convergence pair, the matching 2Y, is.

This paper argues that the tradeable Italy-Germany convergence relationship in the current window lives at the front end of the curve. The trade is in the 2Y, not the 10Y. The argument rests on three pillars: a Johansen-based universe screen that puts the BTP-Schatz through a proper I(1)-then-cointegration pipeline; an Error Correction Model backtest on the Johansen-estimated residual; and a regime-stability check that explains both why the trade works in places and why it is currently dormant.

The Italian-German front end is a natural object of attention for three reasons. First, the 2Y is overwhelmingly driven by ECB policy expectations, so the credit and political premium between the two sovereigns is structurally smaller and more mean-reverting than at the 10Y. Second, the post-2022 ECB hiking-and-easing cycle has produced enough variation in the spread to make the cointegration test informative; the pre-2022 zero-lower-bound regime produced almost no testable variation. Third, the TPI is targeted at the 10Y end of the curve under stress, which leaves the 2Y as the cleaner residual after the central-bank backstop has been priced in.

2. Universe Screen and the BTP-Schatz Selection

The screen tests thirteen candidate sovereign-rate convergence pairs across the United States (US), United Kingdom (UK), Germany, and Italy. Three survive Johansen as trade candidates. The BTP-Schatz is the only 2-leg survivor; the other two are an Italy-Germany 2s10s slope (4-leg, rank 1) and an Italy-Germany 2s3s10s butterfly (6-leg, rank 2). Of the remaining ten, nine fail outright and one (US 10Y nominal vs 10Y breakeven) returns an ambiguous full-rank result and is excluded as a non-candidate.

For each pair, the screen runs: (a) ADF and KPSS on every individual leg in level and first difference to confirm strict integration of order one, I(1); (b) Johansen cointegration test on the joint panel of legs with det_order=0 (constant in the cointegrating equation, no trend) and k_ar_diff=1 (one lagged difference in the Vector Error Correction Model, VECM); (c) a Bayesian Information Criterion (BIC) lag-sensitivity check as a robustness gate. The pair is treated as a survivor when every leg is I(1) and the trace test rejects rank zero at the 5% level.

The BTP-Schatz is the only 2-leg survivor and is the focus of the rest of this paper. The summary table follows; Appendix A reports the full per-pair test statistics.

Pair	Both legs I(1)	Trace rank at 5%	Survives
BTP-Bund 10Y	Yes	0	No
BTP-Schatz (Italy-Germany 2Y)	Yes	1	Yes
US 10Y nominal vs 10Y breakeven	Yes	2 (full rank, ambiguous)	Yes, not a candidate (see below)
Italy-Germany 2s10s slope (4 legs)	Yes	1	Yes (new finding, see Appendix B)
Italy-Germany 2s3s10s butterfly (6 legs)	Yes	2	Yes (new finding, see Appendix B)
Eight other pairs	various	0 or n/a	No

Three observations follow. First, the headline BTP-Bund 10Y spread that drives most macroeconomic commentary fails cointegration on this sample. The 10Y carries Italian sovereign credit and political risk premia that move on their own clock and break the residual stationarity. Second, the United States 10Y nominal versus 10Y Treasury Inflation-Protected Securities (TIPS) breakeven proxy returns full rank, that is, the Johansen test treats both legs as effectively stationary in level on the long 5,840-observation sample. The verdict is mathematically valid but it sits awkwardly against the per-leg ADF result; the pair is also flagged as a non-candidate because an earlier Error Correction Model backtest on it lost money. Third, the Italy-Germany system carries through to three survivors at different points on the curve, which is the structural finding that motivates the rest of the paper plus the two follow-up pieces in Appendix B.

3. What Drives the BTP-Schatz Spread

The BTP-Schatz spread is determined by four long-run drivers and four shorter-horizon, week-to-week movers.

Long-run drivers. (a) Italian fundamentals: debt-to-Gross-Domestic-Product (GDP) trajectory, primary balance, growth differential against Germany, banking-sovereign loop. (b) German fiscal stance: the constitutional debt brake, the 2024-2026 defence-spending amendment, joint-issuance episodes, and safe-asset demand for the Schatz. (c) ECB regime: Asset Purchase Programme history, Pandemic Emergency Purchase Programme (PEPP) reinvestment wind-down, Transmission Protection Instrument trigger conditions and capacity. (d) Italian political risk: election cycles, European Union fiscal-rules compliance, and the pricing of populist-coalition formation.

Short-run movers. (a) The Italian Treasury auction calendar and the supply concessions it generates. (b) ECB Governing Council meeting cadence: rate decisions, press conferences, hawkish or dovish dissents in the accounts. (c) Risk-off correlation, in which the Bund and the Schatz both bid as safe assets while the periphery widens. (d) Position-unwind episodes in real-money and hedge-fund flow.

The most consequential structural change in the recent sample is the Transmission Protection Instrument, announced 21 July 2022. The TPI is the ECB's commitment to make secondary-market purchases of sovereign bonds when a member state's borrowing conditions diverge from policy-warranted fundamentals. The Instrument is asymmetric across the curve: it is targeted at the 10Y end, where Italian sovereign-risk dispersion lives, and barely touches the 2Y. That asymmetry is part of the reason the 10Y is harder to model as a cointegrated relationship while the 2Y is easier.

4. Methodology

4.1 I(1) Confirmation on Each Leg

A pair is eligible for a cointegration test only when each constituent series is itself integrated of order one. Both ADF (Dickey and Fuller, 1979) and KPSS (Kwiatkowski et al., 1992) are applied to each leg, in level and in first difference. Each leg is treated as I(1) when all four conditions hold at the 5% level: ADF on level fails to reject the unit root, KPSS on level rejects stationarity, ADF on first difference rejects the unit root, and KPSS on first difference fails to reject stationarity. Both BTP-Schatz legs pass.
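The four-condition rule can be written as a single predicate over test p-values. The sketch below is illustrative rather than the project's implementation: the function name and the example p-values are hypothetical. The p-values themselves could come from any ADF and KPSS implementation (for example `adfuller` and `kpss` in statsmodels, where KPSS p-values are interpolated from a table bounded at 0.01 and 0.10).

```python
def is_integrated_order_one(adf_level_p, kpss_level_p, adf_diff_p, kpss_diff_p, alpha=0.05):
    """Apply the four-way I(1) rule to p-values from ADF and KPSS tests."""
    return (
        adf_level_p >= alpha      # ADF on level: fail to reject the unit root
        and kpss_level_p < alpha  # KPSS on level: reject stationarity
        and adf_diff_p < alpha    # ADF on first difference: reject the unit root
        and kpss_diff_p >= alpha  # KPSS on first difference: fail to reject stationarity
    )


# Hypothetical p-values for illustration only (not the paper's test output).
print(is_integrated_order_one(0.41, 0.01, 0.001, 0.10))  # True
print(is_integrated_order_one(0.41, 0.01, 0.001, 0.02))  # False: KPSS rejects on the difference
```

The second call mirrors the failure mode reported in Appendix A, where a leg fails KPSS on the first difference and the pair is excluded before any cointegration test is run.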

4.2 Johansen Cointegration Test

The Johansen test (Johansen, 1988; Johansen, 1991) is applied to the joint two-leg panel of [italy2y, bund2y] with det_order=0 (constant in the cointegrating equation, no level trend) and k_ar_diff=1 (one lagged difference in the VECM). The trace test works sequentially: for r = 0, 1, 2, ... it tests the null that the cointegrating rank is at most r against the alternative of higher rank, and the selected rank is the first r at which the null is not rejected. The maximum-eigenvalue test is reported as a cross-check. The cointegrating vector is normalised so the first-leg coefficient equals one.
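The sequential logic of the trace test is easy to misread, so a minimal sketch of the rank-selection step follows. It takes the trace statistics and critical values as plain lists; the function name is hypothetical. If the test is run with statsmodels' `coint_johansen` (an assumption about tooling, suggested by the `det_order` and `k_ar_diff` parameter names), the inputs would correspond to its `lr1` attribute and the 5% column of `cvt`.

```python
def select_rank(trace_stats, critical_values):
    """Sequential trace test: return the first r whose null (rank <= r) is not rejected."""
    for r, (stat, crit) in enumerate(zip(trace_stats, critical_values)):
        if stat <= crit:        # fail to reject H0: rank <= r, so stop here
            return r
    return len(trace_stats)     # every null rejected: full rank


# Trace statistics and 5% critical values as reported in section 5.1.
print(select_rank([24.747, 0.698], [15.494, 3.841]))  # 1
```

A return value equal to the number of legs signals full rank, which is the ambiguous case discussed for the US nominal versus breakeven pair.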

Why Johansen rather than Engle-Granger (Engle and Granger, 1987)? Three reasons. Symmetry. Engle-Granger requires choosing one series as dependent and another as independent, and gives a different p-value depending on the choice. Johansen does not. Joint estimation. Johansen estimates the cointegrating vector(s) jointly across legs rather than recovering them from a reduced-form OLS regression. Generality. Johansen tests the rank of cointegration, which means an n-asset system can carry multiple stationary linear combinations. The Italy-Germany 2s3s10s butterfly (Appendix B) carries two; Engle-Granger applied to a derived single-series butterfly cannot recover that fact.

4.3 Ornstein-Uhlenbeck Framework on the Residual

The cointegration residual is treated as an Ornstein-Uhlenbeck (OU) mean-reverting process. The OU half-life is the implied time-to-halve: the time the residual is expected to take to close half the distance back to its equilibrium level, conditional on no new shocks. It is computed from an autoregressive fit on the change in the residual against the lagged level, which is the discrete-time approximation of the OU stochastic differential equation. The half-life sets the expected holding period and therefore the time stop on the trade.
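A minimal sketch of the half-life calculation, under the assumption that the autoregressive fit is an ordinary least squares regression of the residual change on the lagged level with an intercept. The function name is hypothetical, and the discrete-time formula -ln(2) / ln(1 + b) is used; the continuous-time approximation -ln(2) / b gives nearly the same answer when the slope b is small.

```python
import math


def ou_half_life(residual):
    """Half-life from an OLS fit of (eps_t - eps_{t-1}) on eps_{t-1} with an intercept."""
    lagged = residual[:-1]
    change = [nxt - cur for cur, nxt in zip(residual[:-1], residual[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(change) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, change))
    slope = sxy / sxx
    if not -1.0 < slope < 0.0:
        return math.inf         # no usable mean reversion in this sample
    return -math.log(2.0) / math.log(1.0 + slope)


# Synthetic residual that closes exactly half its gap each period.
synthetic = [0.5 ** t for t in range(20)]
print(round(ou_half_life(synthetic), 6))  # 1.0
```

Returning infinity for a non-negative slope is a design choice of this sketch; it keeps a non-reverting window from producing a negative or undefined half-life downstream.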

4.4 The Error Correction Model

The Error Correction Model is the standard cointegration-implied dynamic, written in change-of-variable form:

Δy_t = c + γ Δx_t + λ ε_{t-1} + u_t

where y is the Italy 2Y, x is the Schatz, and ε is the Johansen-based cointegration residual. The mean-reversion coefficient λ is negative and significant when the cointegration is genuine: a positive residual yesterday produces a negative expected change in y today, that is, the residual closes a fixed proportion of its gap each period.

The trading rule combines two conditions for entry. (a) The residual z-score is stretched, |z| ≥ 1.0. (b) The absolute ECM forecast for the next-day change is in the top 40% of the rolling distribution of absolute forecasts (so the model is not just stretched but also expecting reversion to start now). Exit is triggered by any of three conditions: |z| ≤ 0.25 (reversion has happened), forecast sign flips (model has lost confidence), or 120 trading days elapse (time stop, calibrated against the OU half-life).
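The entry and exit conditions can be sketched as two small functions. This is an illustration under stated assumptions, not the project's runner: the function names are hypothetical, the position is expressed in residual terms (-1 positions for the residual to fall, +1 for it to rise), "top 40%" is read as an absolute forecast at or above the 60th percentile of its rolling history, and a forecast "flips" when its sign opposes the position.

```python
def percentile_rank(value, history):
    """Fraction of historical values that are less than or equal to `value`."""
    return sum(1 for h in history if h <= value) / len(history)


def entry_signal(z, forecast, abs_forecast_history, z_entry=1.0, top_share=0.40):
    """Return -1 or +1 to position for the residual to fall or rise, or 0 for no entry."""
    stretched = abs(z) >= z_entry
    strong = percentile_rank(abs(forecast), abs_forecast_history) >= 1.0 - top_share
    if stretched and strong:
        return -1 if z > 0 else 1
    return 0


def should_exit(position, z, forecast, days_held, z_exit=0.25, time_stop=120):
    """Exit on reversion, on a forecast sign flip against the position, or on the time stop."""
    reverted = abs(z) <= z_exit
    sign_flip = forecast * position < 0
    timed_out = days_held >= time_stop
    return reverted or sign_flip or timed_out


# Hypothetical rolling history of absolute forecasts, in arbitrary units.
history = list(range(1, 11))
print(entry_signal(1.5, -8.0, history))   # -1: stretched high and forecast is strong
print(entry_signal(1.5, -3.0, history))   # 0: stretched but forecast is weak
print(should_exit(-1, 0.8, -2.0, 30))     # False: none of the three exits triggered
```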

4.5 Cross-Checks: Kalman and Principal Component Analysis

Two independent methods cross-check the static Johansen approach. The Kalman filter lets the hedge ratio drift as a random walk and produces a dynamic counterpart to the static Johansen β. The Principal Component Analysis (PCA) residual strategy decomposes the joint US, UK, Germany, and Italy rates panel and trades the Italian 2Y minus its loading on the dominant common factor, which is interpretable as a global rates-level factor. Both are reported in section 5 as corroborating evidence rather than as standalone strategies.
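For readers unfamiliar with the dynamic-hedge idea, the following is a minimal scalar Kalman filter in which the hedge ratio follows a random walk. It is a sketch under simplifying assumptions: no intercept in the observation equation, hypothetical noise variances, and a hypothetical function name. The project's Kalman runner may be specified differently.

```python
def kalman_hedge_ratio(y, x, state_var=1e-5, obs_var=1e-3, beta0=0.0, p0=1.0):
    """Scalar Kalman filter for y_t = beta_t * x_t + noise, with beta_t a random walk."""
    beta, p = beta0, p0
    path = []
    for y_t, x_t in zip(y, x):
        p = p + state_var                  # predict: random-walk state adds variance
        innovation = y_t - beta * x_t      # one-step-ahead forecast error
        s = x_t * x_t * p + obs_var        # innovation variance
        gain = p * x_t / s                 # Kalman gain
        beta = beta + gain * innovation    # update the hedge ratio
        p = (1.0 - gain * x_t) * p         # update the state variance
        path.append(beta)
    return path


# Synthetic series with a constant true hedge ratio of 1.04.
x = [1.0 + 0.1 * t for t in range(50)]
y = [1.04 * v for v in x]
betas = kalman_hedge_ratio(y, x)
print(abs(betas[-1] - 1.04) < 1e-3)  # True
```

The ratio of `state_var` to `obs_var` controls how quickly the estimate is allowed to drift; a larger ratio gives a more reactive but noisier hedge ratio.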

5. Results

5.1 Cointegration Evidence

Source: 20260517_232529_coint_johansen_italy_bund_2y_tests.json.

Quantity	Value
Sample	2020-05-04 to 2026-05-12 (1,533 obs)
Trace statistic, H0: r = 0	24.747
Trace 5% critical value	15.494
Verdict at r = 0	Reject (cointegration present)
Trace statistic, H0: r = 1	0.698
Trace 5% critical value	3.841
Verdict at r = 1	Fail to reject
Max-eigenvalue statistic, H0: r = 0	24.048 (CV 14.264)
Cointegrating rank at 5%	1
Leading cointegrating vector	[italy2y: 1, bund2y: -1.0387]
BIC-selected VECM lag	3
Rank at BIC lag	1 (robust)
Residual OU half-life	17.8 days
Residual ADF p-value	0.00031

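Both the trace and maximum-eigenvalue tests reject the null of no cointegration at the 5% level, with statistics roughly 1.6 to 1.7 times the critical value. The second null (rank at most one, against the alternative of full rank two) is comfortably not rejected: 0.698 against a critical value of 3.841. The result is robust to a more flexible lag specification: the rank is unchanged at the BIC-selected lag of 3.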

The leading cointegrating vector implies a hedge ratio of 1.04 on the Schatz leg, normalised so the Italian leg has unit weight. The ratio is statistically distinguishable from 1.0 (the confidence interval around the Johansen estimate is narrow on 1,533 observations) but only just. The interpretation is the standard one for euro-area front-end spreads: Italian and German 2Y yields share near-identical exposure to ECB policy expectations, and the small premium over 1.0 reflects the modestly-greater policy sensitivity of Italian rates given the sovereign-risk overlay (Codogno, Favero and Missale, 2003; Afonso, Arghyrou and Kontonikas, 2015).

The OU half-life of 17.8 days sets the expected holding period at three to four weeks. The 120-day time stop in the trading rule is therefore generous against the half-life and allows slow regimes to resolve.

5.2 Error Correction Model Backtest

Source: 20260517_235141_coint_ecm_italy_bund_2y_metrics.json. The Error Correction Model is fit on the Johansen residual; the trading rule is as specified in section 4.4.

Metric	Ungated	With rolling stability filter
Compound Annual Growth Rate (CAGR)	8.36%	2.25%
Annualised volatility	11.05%	6.03%
Sharpe ratio	0.76	0.37
Sortino ratio	0.63	0.19
Maximum drawdown	-8.66%	-10.54%
Calmar ratio (CAGR / max DD)	0.97	0.21
Number of trades	170	64
Win rate	30.6%	26.6%
Profit factor	1.81	0.95
ECM mean-reversion coefficient λ	-0.0201 (p = 0.0008)	(identical)

The ungated Sharpe of 0.76 is the headline. The result is consistent with the cointegration evidence: a negative and significant λ, a residual half-life under twenty days, and a positive backtest at a Sharpe in the "worth more work" range.

Figure 1. Static Error Correction Model equity curve versus benchmarks, BTP-Schatz, 2020-2026. Strategy (dark), residual buy-and-hold (dashed grey), cash (dotted red).

The filtered version is included for completeness and as a stand-in for an out-of-sample stability check. The filter is currently the project's Engle-Granger-style rolling diagnostic and is not yet Johansen-consistent: the residual being filtered is the Johansen residual but the filter logic is the older reduced-form rolling regression. The mismatch is the most likely reason the filtered Sharpe falls below the unfiltered Sharpe, which is the opposite of the typical pattern. Section 7 discusses the need for a methodology-consistent rolling filter.

5.3 Cross-Checks

Kalman filter (research/backtests/runs/coint/20260512_195333_*). Dynamic hedge ratio drifts as a random walk. Ungated CAGR 11.45%, Sharpe 1.00, maximum drawdown -11.94%.

Figure 2. Time-varying Kalman hedge ratio β_t, BTP-Schatz, 2020-2026. Static Johansen estimate (1.04) overlaid as horizontal reference.

The Kalman hedge ratio is unstable from 2020 into early 2021 (the late-Coronavirus zero-lower-bound regime) and collapses into a tight 0.8-1.0 band from mid-2022, sitting just below the static Johansen estimate of 1.04. The visual implication is direct: the dynamic-hedge advantage lives mostly in the pre-2022 window, which coincides with the window that the rolling stability check rejects. The headline Kalman Sharpe of 1.00 therefore combines two distinct things: a stable cointegration regime that genuinely traded well, and a regime that does not pass a contemporaneous stability check and may not generalise.

Principal Component Analysis residual (research/backtests/runs/coint/20260512_223356_*). PC1 explains 95.6% of variance across US, UK, Germany, and Italy rates and is interpretable as a global rates-level factor. The Italian 2Y residual to this factor produces 9.89% CAGR at Sharpe 0.69 over a shorter 2022-2026 window. The result corroborates the Johansen story: there is a stationary Italian 2Y signal that survives removal of the dominant common rates factor. Trade count is low (four) and the result is best read as cross-validation rather than as a standalone strategy.

6. Regime Dependence and Current State

The cointegration relationship is real on the full sample but is not currently active. Only 46% of rolling 504-day windows in the sample produce a beta, Engle-Granger p-value, ADF p-value, and half-life combination that passes the project's stability gate. The most recent window, ending 2026-05-12, fails: rolling Engle-Granger p = 0.52, ADF p = 0.28. A practitioner running this signal today would not be in a position to claim the relationship is currently mean-reverting.

Figure 3. Rolling diagnostics across 504-day windows, BTP-Schatz, 2020-2026. Hedge ratio, Engle-Granger p-value, ADF p-value, OU half-life. (Rolling diagnostics use the legacy Engle-Granger pipeline; a Johansen-consistent rolling version is in scope for follow-up.)

Figure 4. Rolling 252-day Sharpe of the static Error Correction Model signal, BTP-Schatz, 2020-2026.

The rolling-Sharpe figure turns "regime-dependent" from a statistic into a picture. The signal averages a Sharpe close to the static full-sample value. The dispersion around that mean is what matters. The strategy traded well from late 2022 through mid-2023 with a rolling Sharpe peak above 1.5 during the ECB's hiking-to-peak phase, when the spread mean-reverted aggressively around each rate decision. Late 2023 produces a rolling Sharpe trough, coinciding with the ECB's hold-at-peak period in which the BTP-Schatz widened persistently without mean-reverting and the model was on the wrong side of the move. Recovery through 2024 and into 2025 brings the rolling Sharpe back above the long-run mean. The most recent observations are below zero, consistent with the most recent rolling cointegration window failing.

The implication is that the BTP-Schatz is a real but thin cointegration: stable full-sample, unstable across rolling windows, currently off-regime, and still awaiting a broader robustness pass.

7. Risks and the Rigour Gap

Five robustness checks remain before treating the current numbers as evidence of out-of-sample edge.

Deflated Sharpe (López de Prado, 2018). Standard Sharpe ignores selection bias across the number of strategies tested. This lab has tested at minimum thirteen pairs, three methods (Error Correction Model, Kalman, Principal Component Analysis), and several filter configurations. A deflated Sharpe will reduce every headline number above by an amount that depends on the trial count and the variance of the Sharpe ratio across trials. The largest expected reduction is on the highest-Sharpe results (the ungated Kalman); the Error Correction Model should be less affected by the selection-bias adjustment, though that remains to be measured.
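A sketch of the deflated Sharpe calculation follows, using the form commonly given in the literature: an expected-maximum Sharpe under the null of no edge, then a probability that the observed Sharpe exceeds that benchmark after adjusting for sample length, skewness, and kurtosis. Function names are hypothetical, Sharpe ratios are per-period (not annualised), and every input in the usage lines is a placeholder rather than a measured quantity from this paper.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected maximum Sharpe across n_trials (>= 2) independent trials with zero true edge."""
    z = NormalDist().inv_cdf
    return math.sqrt(sharpe_variance) * (
        (1.0 - EULER_GAMMA) * z(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * z(1.0 - 1.0 / (n_trials * math.e))
    )


def deflated_sharpe(sharpe, benchmark, n_obs, skew=0.0, kurtosis=3.0):
    """Probability that the true per-period Sharpe exceeds the benchmark Sharpe."""
    denom = math.sqrt(1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2)
    return NormalDist().cdf((sharpe - benchmark) * math.sqrt(n_obs - 1) / denom)


# Placeholder inputs: trial count, cross-trial Sharpe variance, and moments are hypothetical.
benchmark = expected_max_sharpe(n_trials=40, sharpe_variance=0.0004)
probability = deflated_sharpe(sharpe=0.05, benchmark=benchmark, n_obs=1500, skew=-0.5, kurtosis=6.0)
```

The benchmark rises with both the number of trials and the dispersion of Sharpe ratios across them, which is why the trial count needs to be recorded honestly before the adjustment is run.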

Probability of Backtest Overfitting (López de Prado, 2018). Combinatorially Symmetric Cross-Validation would estimate the probability that the configuration ranked best in-sample falls below the median configuration out-of-sample. PBO is more a discipline than a binary verdict at the present trade count, but it is the standard test a quant interview expects.
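The mechanics of Combinatorially Symmetric Cross-Validation can be sketched in a few lines. The version below is a simplified illustration with hypothetical names: it splits the return history into equal blocks, treats every half of the blocks as in-sample in turn, picks the best in-sample strategy by Sharpe, and records where that strategy ranks out-of-sample. PBO is the share of splits in which the winner lands in the bottom half.

```python
import itertools
import math
import statistics


def sharpe(returns):
    """Per-period Sharpe; returns 0.0 when the sample has no dispersion."""
    sd = statistics.pstdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


def probability_of_backtest_overfitting(returns_by_strategy, n_blocks=4):
    """Share of CSCV splits where the in-sample winner ranks at or below the OOS median."""
    n_strategies = len(returns_by_strategy)
    block_len = len(returns_by_strategy[0]) // n_blocks
    blocks = [range(i * block_len, (i + 1) * block_len) for i in range(n_blocks)]
    logits = []
    for in_sample in itertools.combinations(range(n_blocks), n_blocks // 2):
        is_rows = [t for b in in_sample for t in blocks[b]]
        oos_rows = [t for b in range(n_blocks) if b not in in_sample for t in blocks[b]]
        is_perf = [sharpe([r[t] for t in is_rows]) for r in returns_by_strategy]
        oos_perf = [sharpe([r[t] for t in oos_rows]) for r in returns_by_strategy]
        best = max(range(n_strategies), key=is_perf.__getitem__)
        rank = sum(1 for p in oos_perf if p <= oos_perf[best])   # 1 (worst) to N (best)
        omega = rank / (n_strategies + 1)                        # relative OOS rank in (0, 1)
        logits.append(math.log(omega / (1.0 - omega)))
    return sum(1 for value in logits if value <= 0.0) / len(logits)


# Synthetic returns: one strategy dominates in every block, so nothing is overfit.
strong = [0.02, 0.01] * 4
flat = [0.01, -0.01] * 4
weak = [-0.01, -0.02] * 4
print(probability_of_backtest_overfitting([strong, flat, weak]))  # 0.0
```

In practice the columns would be the daily returns of each pair, method, and filter configuration tested, aligned on a common date index.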

Carry, roll, and repo. All numbers above are price-only. A long-Italy-2Y / short-Schatz position earns carry approximately equal to the spread itself, which over 2020-2026 averaged in the low-to-mid double-digits of basis points. That is material against an 8% CAGR, and it is the single largest unmeasured contributor to the headline Sharpe.

Johansen-consistent rolling diagnostic. The static cointegration step has been upgraded to Johansen; the rolling diagnostic and stability filter inside the Error Correction Model runner are still on Engle-Granger rolling reduced-form estimation. As noted in section 5.2, this mismatch is the most likely reason the filtered Sharpe came out below the unfiltered Sharpe. The natural next step is to extend convergence_common.py to support a rolling Johansen estimator.

Multiple-testing correction. White's Reality Check or the Romano-Wolf step-down would test whether the best-performing strategy in the universe is statistically distinguishable from the best-performing strategy under the null of no edge. This is the gold standard for the kind of universe screen in section 2.

A separate methodological follow-up is volatility-regime-conditional gating: a filter that conditions on residual or rates-volatility regime, distinct from the cointegration-regime filter, that may produce a better Sharpe-versus-trade-count trade-off.

8. Trade Construction

Direction. Long Italian 2Y benchmark, short German 2Y benchmark (Schatz).

Sizing. DV01-weighted with a Schatz-leg multiplier of 1.04 (the Johansen-estimated cointegrating coefficient, normalised on the Italian leg). The choice neutralises the systematic co-movement of the two legs and isolates the residual.
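As a worked illustration of the sizing rule, the sketch below solves for the Schatz notional that makes the short leg's DV01 equal to 1.0387 times the long leg's DV01. The function name is hypothetical and the per-unit DV01 inputs are placeholders, not market data.

```python
def schatz_notional(italy_notional, italy_dv01_per_unit, schatz_dv01_per_unit, beta=1.0387):
    """Schatz notional such that Schatz-leg DV01 equals beta times Italian-leg DV01."""
    italy_leg_dv01 = italy_notional * italy_dv01_per_unit
    return beta * italy_leg_dv01 / schatz_dv01_per_unit


# Placeholder DV01s per unit of notional (hypothetical values for illustration).
long_notional = 1_000_000
short_notional = schatz_notional(long_notional, 0.000190, 0.000195)

# Check: the ratio of leg DV01s recovers the cointegrating coefficient.
ratio = (short_notional * 0.000195) / (long_notional * 0.000190)
print(round(ratio, 4))  # 1.0387
```

With this weighting, a move in yields that leaves the cointegration residual unchanged produces approximately zero profit and loss, so the position is exposed to the residual alone.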

Entry condition. Cointegration residual z-score |z| ≥ 1.0, computed on a 252-day rolling window of the residual's mean and standard deviation, and absolute Error Correction Model forecast of the next-day Italian-2Y change in the top 40% of the rolling distribution of absolute forecasts.
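A minimal sketch of the rolling z-score follows. Whether the project's window includes the current observation is not stated, so the version below assumes a trailing window that does include it and uses the sample standard deviation; the function name is hypothetical.

```python
import statistics


def rolling_zscore(residual, window=252):
    """z-score of each observation against its trailing window (current observation included)."""
    z = [None] * len(residual)
    for t in range(window - 1, len(residual)):
        sample = residual[t - window + 1 : t + 1]
        sd = statistics.stdev(sample)
        z[t] = (residual[t] - statistics.fmean(sample)) / sd if sd > 0 else 0.0
    return z


# Short synthetic series with a 3-observation window, for illustration.
print(rolling_zscore([1.0, 2.0, 3.0, 4.0, 5.0], window=3))  # [None, None, 1.0, 1.0, 1.0]
```

The first `window - 1` entries are left as `None` so that no signal is generated before a full window of history is available.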

Exit condition. Any of: |z| ≤ 0.25 (reversion has happened); ECM forecast sign flips against the position (model loses confidence); 120 trading days elapse (time stop). The time stop is calibrated to about seven half-lives, giving the residual generous room to resolve.

Cost assumption. One basis point round-trip transaction cost on turnover, applied to position changes.

Capital and exposure. Initial capital 100,000 (currency-neutral for the equity curve). Average daily exposure 27% of capital, that is, the rule is flat the majority of trading days and waits for stretched residuals.

What is not in the rule. Carry / roll / repo is not modelled. The filter is not yet Johansen-consistent. Volatility-regime gating is not applied.

9. What We Would Watch

Five live indicators would tell us the relationship is turning back on.

(1) Rolling Engle-Granger and ADF p-values on the BTP-Schatz residual drop below the 5% threshold and stay there for at least one full month of trading days. This is the cleanest single signal; the rolling diagnostic is published as part of the lab output.
(2) Rolling Ornstein-Uhlenbeck half-life shortens into the 5-to-30-day band. Long or negative half-lives indicate that even where the residual is technically stationary, the close-rate is too slow to trade.
(3) ECB phase shift. The signal traded best in the 2022-2023 hiking-to-peak phase. A symmetric easing-to-trough phase, currently underway, may eventually produce a similar regime in reverse. Watch the trajectory of the Council's policy-rate guidance and the dispersion of dissents in the meeting accounts.
(4) TPI activation or credible threat of activation. The Transmission Protection Instrument's secondary-market purchase commitment is most relevant for the 10Y but compresses front-end risk premia as a side-effect. A clear TPI signal would compress the BTP-Schatz residual variance and may re-activate the mean-reversion.
(5) Italian political event risk priced into BTP-Schatz. Election cycles, EU fiscal-rule confrontations, and government formation episodes have historically widened the spread; the snap-back to fair value is the trade. Watch the political calendar.

When two or more of (1), (2), and (3) align, the rule is worth running live with conservative sizing.
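The first two indicators above (rolling p-values and rolling half-life) are mechanical and can be monitored with a small check. The sketch below is an assumption-laden illustration: the function name is hypothetical, and "one full month of trading days" is taken to mean 21 consecutive observations.

```python
def relationship_reactivated(eg_pvalues, adf_pvalues, half_lives,
                             alpha=0.05, min_days=21, hl_band=(5.0, 30.0)):
    """True when both p-value series stay below alpha for min_days and the latest half-life is in band."""
    if min(len(eg_pvalues), len(adf_pvalues), len(half_lives)) < min_days:
        return False
    pvalues_ok = (
        all(p < alpha for p in eg_pvalues[-min_days:])
        and all(p < alpha for p in adf_pvalues[-min_days:])
    )
    low, high = hl_band
    half_life_ok = low <= half_lives[-1] <= high
    return pvalues_ok and half_life_ok


# Latest rolling p-values from section 6 (0.52 and 0.28), repeated for illustration;
# the half-life series is a placeholder.
print(relationship_reactivated([0.52] * 21, [0.28] * 21, [17.8] * 21))  # False
```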

10. Conclusion

The Italy-Germany 2Y front-end carries a statistically valid cointegration relationship that the headline BTP-Bund 10Y does not. Under a Johansen pipeline the hedge ratio is 1.04, the residual mean-reverts with a half-life of about eighteen trading days, and an Error Correction Model trades the residual at an in-sample Sharpe of 0.76 gross of carry over 2020-2026. The relationship is not currently active on a rolling stability test, the Sharpe is preliminary, and the carry overlay is unmeasured. With those caveats, the front-end Italy-Germany spread is the cleaner convergence trade in the European sovereign-rate universe on the current sample, and the Johansen screen turns up two further Italian-German survivors (the 2s10s slope and the 2s3s10s butterfly) which together form the next research agenda.

References

Citations to be verified against current bibliographic detail before publication.

Afonso, A., Arghyrou, M. G. and Kontonikas, A. (2015). The determinants of sovereign bond yield spreads in the EMU. Journal of International Money and Finance, 49, 198-218.

Avellaneda, M. and Lee, J.-H. (2010). Statistical arbitrage in the U.S. equities market. Quantitative Finance, 10(7), 761-782.

Codogno, L., Favero, C. and Missale, A. (2003). Yield spreads on EMU government bonds. Economic Policy, 18(37), 503-532.

De Santis, R. A. (2014). The euro area sovereign debt crisis: identifying flight-to-liquidity and the spillover mechanisms. European Central Bank Working Paper Series 1419.

Dickey, D. A. and Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366), 427-431.

Engle, R. F. and Granger, C. W. J. (1987). Co-integration and error correction: representation, estimation, and testing. Econometrica, 55(2), 251-276.

European Central Bank (2022). The Transmission Protection Instrument. Press release, 21 July 2022.

European Central Bank (2022). The Transmission Protection Instrument. Economic Bulletin, Issue 6/2022.

Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control, 12(2-3), 231-254.

Johansen, S. (1991). Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica, 59(6), 1551-1580.

Kwiatkowski, D., Phillips, P. C. B., Schmidt, P. and Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54(1-3), 159-178.

López de Prado, M. (2018). Advances in Financial Machine Learning. Hoboken, NJ: Wiley.

Manasse, P. and Roubini, N. (2009). "Rules of thumb" for sovereign debt crises. Journal of International Economics, 78(2), 192-205.

Appendix A. Universe Screen — Full Results

Pair	Sample (days)	All legs I(1)	Trace rank	Max-eig rank	Leading vector (selected)	Verdict
BTP-Bund 10Y	1,533	Yes	0	0	n/a	Fail
BTP-Schatz (Italy-Germany 2Y)	1,533	Yes	1	1	[italy2y: 1, bund2y: -1.039]	Pass
UK-US 10Y	1,491	Yes	0	0	n/a	Fail
Bund-US 10Y	5,737	Yes	0	0	n/a	Fail
US 10Y nominal vs 10Y breakeven	5,840	Yes	2 (full)	2 (full)	[nominal: 1, real: -1.182]	Pass, full rank (ambiguous; non-candidate)
UK-US 2Y	1,491	Yes	0	0	n/a	Fail
SONIA-SOFR	1,984	No (SONIA fails KPSS on diff)	n/a	n/a	n/a	Fail
Schatz-US 2Y	3,025	Yes	0	0	n/a	Fail
UK-US 2s10s slope (4 legs)	1,487	Yes	0	0	n/a	Fail
Bund-US 2s10s slope (4 legs)	3,025	No (Bund 10Y fails KPSS on diff)	1	0	n/a	Fail
Italy-Germany 2s10s slope (4 legs)	1,533	Yes	1	1	see Appendix B	Pass (new)
Bund-US 2s3s10s butterfly (6 legs)	3,025	No (Bund 10Y fails KPSS on diff)	3	2	n/a	Fail
Italy-Germany 2s3s10s butterfly (6 legs)	1,533	Yes	2	2	see Appendix B	Pass rank 2 (new)

Three pairs fail the strict I(1) test because the long-history series (SONIA, Bund 10Y) display non-constant variance across multiple monetary regimes, which KPSS detects in first differences. This is a real finding, not a defect of the test specification.

Appendix B. Other Johansen Survivors

Italy-Germany 2s10s slope, 4-leg system. Cointegrating rank 1. Leading cointegrating vector, normalised on the Italian 10Y leg: [italy10y: 1, italy2y: -2.84, bund10y: -1.30, bund2y: +3.13]. The vector is decisively not the naive slope-vs-slope spread (which would carry coefficients [1, -1, -1, 1]). The 2Y legs load substantially more strongly than the 10Y legs, and the sign pattern implies the relationship is closer to an Italian-10Y-anchored credit-tilted combination of the four legs. A separate working paper will fit an Error Correction Model on the four-leg system and report a backtest.

Italy-Germany 2s3s10s butterfly, 6-leg system. Cointegrating rank 2. Two independent stationary linear combinations among the Italian and German short, intermediate, and long-end yields. Leading vector, normalised on the Italian 2Y leg: [italy2y: 1, italy3y: -1.66, italy10y: +0.69, bund2y: -2.74, bund3y: +3.86, bund10y: -1.14]. The second cointegrating vector and the economic interpretation of both relations are the subject of a separate working paper.

Both follow-up pieces will share the methodology stack defined in section 4: I(1) leg confirmation, Johansen with BIC lag sensitivity, OU framework on the leading residual, Error Correction Model on the residual.

Appendix C. Replication

All numbers in this paper are reproducible from the project repository. Key commands:

# Full Johansen universe screen (13 pairs)
python -m research.backtests.runners.run_coint_johansen

# Single pair, with lag sensitivity
python -m research.backtests.runners.run_coint_johansen --preset italy_bund_2y

# BTP-Schatz Error Correction Model with Johansen β
python -m research.backtests.runners.run_coint_ecm italy_bund_2y --beta-override 1.0387

# Kalman backtest
python -m research.backtests.runners.run_coint_kalman italy_bund_2y

# Principal Component Analysis residual
python -m research.backtests.runners.run_coint_pca italy2y

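Run artefacts (JSON metrics, CSV trades, PNG charts): research/backtests/runs/coint/20260517_* for the Johansen and Error Correction Model runs, and research/backtests/runs/coint/20260512_* for the Kalman and Principal Component Analysis runs cited in section 5.3.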

Sample window for the Error Correction Model and Kalman runs: 2020-05-04 to 2026-05-12. Sample window for the Principal Component Analysis runs: 2022-05-24 to 2026-05-07.

Working paper, personal portfolio. Not investment advice. Not a recommendation.