Based on Gatev et al. (2006), pairs trading follows a two-period process: in the first period (the training period) the objective is to find two securities with similar historical price movements; the second period (the trading period) involves monitoring the spread between the prices. Whenever the spread widens, two positions are opened: a short position in the overvalued stock and a long position in the undervalued one. The positions are then closed as soon as the spread reverts to its historical mean.

In total, we identify the following five streams of literature relevant to pairs trading research:

__Distance approach__: This is the most thoroughly explored pairs trading methodology. Various distance metrics are used during the formation period to detect comoving securities, and simple nonparametric threshold criteria are used to trigger trading signals during the trading period. This strategy's main strengths are its simplicity and transparency, which allow for large-scale empirical applications. The major findings show that trading distance pairs is lucrative across a variety of markets, asset classes, and time periods.

__Cointegration method__: Cointegration tests are used to identify comoving securities during the formation period. Simple algorithms are employed to create trading signals during the trading period, most of them based on GGR's threshold criterion. The main advantage of this approach is that the established equilibrium relationship is more econometrically trustworthy.

__Time series approach__: The formation period is often overlooked in the time series approach. All authors in this domain presume that past analyses have developed a collection of comoving securities. Instead, they concentrate on the trading period and on how alternative methods of time series analysis, such as modeling the spread as a mean-reverting process, might yield optimal trading signals.

__Stochastic control technique__: The formation period is neglected, as in the time series approach. This body of research seeks to determine the optimal portfolio holdings in the legs of a pairs trade relative to other available assets. The value and optimal policy functions for this portfolio problem are determined using stochastic control theory.

__Other techniques__: This bucket covers additional pairs trading frameworks with only a small amount of supporting material and only a tenuous relationship to the previously stated approaches.

The machine learning and combined predictions approach, the copula approach, and the Principal Components Analysis (PCA) approach are all included in this category.

**Cointegration method**

Today we will focus on a backtest of a pairs trading strategy that uses a Kalman filter, implemented in Python, on co-integrated equity pairs. The theory underlying this strategy assumes that co-integrated equity securities tend to move together. When their prices deviate beyond a certain threshold (i.e. the spread between the prices of the two stocks widens), we expect the prices to revert to the mean and eventually converge again. The strategy is to sell the higher-performing security and buy the lower-performing one. We expect the lower-performing security to catch up with the higher-performing one and therefore rise in price; conversely, the higher-performing security will have to revert to the mean and therefore face downward pressure on its relative value.

Pairs trading is a market-neutral trading strategy that allows traders to profit from virtually any market condition: uptrend, downtrend or sideways movement. We must first identify the co-integrated stocks. To this end, economic theory suggests that we are more likely to find equity pairs driven by the same factors if we look among equity securities in the same or similar sectors. So we start by screening 30 companies in the information technology sector, and this is what we get:

Cointegration is a statistical concept used in time series analysis to examine the long-term relationship between two or more non-stationary time series variables. In simpler terms, it helps identify whether there is a stable, long-run relationship between variables that may individually exhibit non-stationarity (i.e., their statistical properties, such as mean and variance, change over time).

The main idea behind cointegration is that while individual time series variables may be non-stationary (i.e., they have a unit root, which means their mean and variance can change over time), a linear combination of these variables can be stationary. In other words, cointegrated variables tend to move together in the long run, even if they exhibit short-term fluctuations. This is usually shown by regressing one variable on the others and testing whether the residuals are stationary.
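The regress-and-test procedure described above is the Engle-Granger two-step method, and it can be sketched in a few lines of numpy. This is purely illustrative: the function name is made up, and a real screen would use a library routine such as `statsmodels.tsa.stattools.coint`, which supplies the proper Engle-Granger critical values (roughly -3.34 at the 5% level for two variables with a constant).

```python
import numpy as np

def engle_granger_sketch(y, x):
    """Illustrative Engle-Granger two-step cointegration check.

    Step 1: regress y on x (with an intercept) and keep the residuals.
    Step 2: run a no-constant Dickey-Fuller regression on the residuals,
            delta_e[t] = gamma * e[t-1] + error, and return the t-statistic
            of gamma. A strongly negative value suggests the residuals are
            stationary, i.e. the series are cointegrated.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                      # step-1 residuals
    de, lag = np.diff(e), e[:-1]
    gamma = (lag @ de) / (lag @ lag)      # Dickey-Fuller slope
    resid = de - gamma * lag
    se = np.sqrt(resid @ resid / (len(de) - 1) / (lag @ lag))
    return gamma / se                     # compare to tabulated critical values

# Synthetic example: y is cointegrated with x by construction.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=2000))      # non-stationary random walk
y = 0.5 + 1.2 * x + rng.normal(size=2000) # same walk plus stationary noise
t_stat = engle_granger_sketch(y, x)       # strongly negative here
```

Because the residuals of this synthetic pair are white noise by construction, the t-statistic comes out far below any conventional critical value.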

Now we test for cointegration between all the pairs and produce a heatmap to show the results and see which pairs could be interesting for our analysis. The darker the color, the more significant the cointegration between the two stocks.
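The screening loop behind such a heatmap can be sketched as follows, again numpy-only and on synthetic prices (the function name and the three fabricated series are illustrative). The resulting matrix of Dickey-Fuller t-statistics on the pairwise regression residuals is what you would hand to a plotting routine such as `seaborn.heatmap`:

```python
import numpy as np

def pairwise_coint_tstats(prices):
    """Symmetric matrix of Dickey-Fuller t-statistics on the residuals of
    each pairwise regression; more negative means stronger evidence of
    cointegration. `prices` has one column per stock."""
    n = prices.shape[1]
    stats = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            X = np.column_stack([np.ones(len(prices)), prices[:, j]])
            beta, *_ = np.linalg.lstsq(X, prices[:, i], rcond=None)
            e = prices[:, i] - X @ beta
            de, lag = np.diff(e), e[:-1]
            g = (lag @ de) / (lag @ lag)
            r = de - g * lag
            se = np.sqrt(r @ r / (len(de) - 1) / (lag @ lag))
            stats[i, j] = stats[j, i] = g / se
    return stats

rng = np.random.default_rng(1)
common = np.cumsum(rng.normal(size=1500))       # shared stochastic trend
prices = np.column_stack([
    common + rng.normal(size=1500),             # these two columns are
    2.0 * common + rng.normal(size=1500),       # cointegrated by construction
    np.cumsum(rng.normal(size=1500)),           # unrelated random walk
])
stats = pairwise_coint_tstats(prices)
# stats is the matrix you would visualize, e.g. seaborn.heatmap(stats)
```

The cointegrated pair (columns 0 and 1) produces a far more negative statistic than the pair involving the unrelated walk, which is exactly the contrast the heatmap colors encode.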

These are the results and here we can see which pairs have a significant cointegration.

**The Kalman filter**

To go further in the application we need to introduce a new concept, the Kalman filter, which will later be applied to our strategy. The Kalman filter is an algorithm used to solve state-space models. The main assumption of a state-space model is that there is a collection of states that evolve over time (such as the hedge ratio between two cointegrated firms). Unfortunately, we are never in a position to observe the "true" states directly, since our observations of them are contaminated by noise (such as market microstructure effects). The goal of the state-space model is to infer the states from the observations as new information becomes available.

Since the system states depend on time, we index them with a subscript t. We use θt to denote the state vector. In a linear state-space model, the current state is a linear combination of the previous state at time t−1 plus system noise. To simplify the analysis we take this noise to be multivariate normal, but of course other distributions can also be used. The linear dependence of θt on the previous state θt−1 is given by the matrix Gt, which can vary over time, and the time-dependent multivariate system noise is denoted wt. The relationship is summarized below in what is often called the state equation:

θt = Gt·θt−1 + wt

We also have to describe the observations, that is, what we actually see, because the states are hidden by the system noise. We denote the (time-dependent) observations by yt. The observations are a linear combination of the current state and some additional random variation, referred to as measurement noise, also characterized by a multivariate normal distribution. If Ft is the (time-dependent) matrix giving the linear dependence of yt on θt, and vt is the measurement noise, then the observation equation is given by:

yt = Ftᵀ·θt + vt

Now that we have specified the linear state-space model, we need an algorithm to actually solve it. This is where the Kalman filter comes into play. We can use Bayes' rule and conjugate priors to help us derive the algorithm. At time t, we represent all the information known about the system by the quantity Dt, while the current observation is yt. We can therefore write Dt = (Dt−1, yt): our current knowledge is the combination of our previous knowledge and the most recent observation.

Applying Bayes' rule to this situation results in the following:

P(θt | Dt) ∝ P(yt | θt)·P(θt | Dt−1)

In words: given the current observation yt and the previous data Dt−1, the posterior (updated) probability of the state θt is equal to the likelihood of seeing the observation yt given the current state θt, multiplied by the prior belief about the current state given only the previous data Dt−1, normalized by the probability of seeing the observation yt on its own.

In summary, ft is the predicted value of the observation at time t, a prediction made at time t−1, and et = yt − ft is the error associated with that prediction.
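For reference, the full set of quantities the Kalman filter computes at each step can be written down explicitly. The following is the standard dynamic linear model formulation (as in West and Harrison); here mt and Ct denote the posterior mean and covariance of the state θt, at and Rt the predicted mean and covariance, and Wt and Vt the system- and measurement-noise covariances. These symbol names are introduced here for completeness rather than taken from the article:

```latex
\begin{align*}
a_t &= G_t m_{t-1}, & R_t &= G_t C_{t-1} G_t^{\top} + W_t && \text{(state prediction)}\\
f_t &= F_t^{\top} a_t, & Q_t &= F_t^{\top} R_t F_t + V_t && \text{(observation forecast)}\\
e_t &= y_t - f_t, & A_t &= R_t F_t Q_t^{-1}, && \\
m_t &= a_t + A_t e_t, & C_t &= R_t - A_t Q_t A_t^{\top} && \text{(update)}
\end{align*}
```

The forecast ft and the error et discussed in the text are exactly the quantities in the middle rows; At is the Kalman gain that weights how strongly each new error moves the state estimate.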

First we need to show why the Kalman filter is a better option than a static regression. Running a Kalman regression on a pair chosen at random from the first sample, we find that the slope of the regression takes values between 0.4 and 1.05, showing high volatility; the same happens for the intercept, which ranges between 1 and 1.5. A static approach would keep the same slope and intercept parameters over the entire 13 years and, as we can see, that is clearly inefficient.
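A minimal Kalman regression with a time-varying intercept and slope can be sketched in a few lines of numpy. The random-walk state assumption, the `delta` and `v` values, and the synthetic data below are all illustrative choices, not the article's exact setup:

```python
import numpy as np

def kalman_regression(x, y, delta=1e-4, v=1.0):
    """Kalman filter treating theta_t = [intercept, slope] as a random-walk
    state observed through y_t = [1, x_t].theta_t + noise.

    `delta` controls how fast the coefficients may drift and `v` is the
    assumed observation-noise variance; both are tuning assumptions."""
    n = len(x)
    theta = np.zeros(2)                      # posterior state mean m_t
    C = np.eye(2)                            # posterior state covariance
    W = delta / (1 - delta) * np.eye(2)      # system-noise covariance
    out = np.zeros((n, 2))
    for t in range(n):
        F = np.array([1.0, x[t]])
        R = C + W                            # prediction step (G_t = identity)
        f = F @ theta                        # one-step forecast of y_t
        Q = F @ R @ F + v                    # forecast variance
        A = R @ F / Q                        # Kalman gain
        theta = theta + A * (y[t] - f)       # update with the error y_t - f
        C = R - np.outer(A, F @ R)
        out[t] = theta
    return out                               # columns: intercept_t, slope_t

# Synthetic pair whose "hedge ratio" drifts from 0.4 to 1.05, echoing the
# range observed in the text.
rng = np.random.default_rng(2)
x = rng.normal(scale=5, size=1000)
true_slope = np.linspace(0.4, 1.05, 1000)
y = 1.2 + true_slope * x + rng.normal(scale=0.5, size=1000)
coeffs = kalman_regression(x, y)
```

Unlike an ordinary least-squares fit, which would return a single slope for the whole sample, `coeffs[:, 1]` tracks the drifting coefficient through time, which is precisely the advantage over the static approach.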

The logic behind the strategy is very simple: the Kalman filter is used to perform regressions to find the rolling means and rolling volatilities to compute the z-scores. The z-scores are used to see if the spread is higher or lower than a predefined parameter (usually 1 standard deviation). A positive z-score indicates that the spread is above its mean, suggesting that one stock is overvalued relative to the other. This might trigger a short position on the overvalued stock and a long position on the undervalued stock.

A negative z-score indicates that the spread is below its mean, suggesting that one stock is undervalued relative to the other. This might trigger a long position on the undervalued stock and a short position on the overvalued stock.

You set predefined entry and exit thresholds for the z-score. For example, you may open a long position on the spread when the z-score falls below a certain negative threshold (indicating the spread is unusually low) and open a short position when the z-score rises above a certain positive threshold (indicating the spread is unusually high). Positions are then closed when the z-score reverts toward zero, that is, when the spread returns to its historical mean.
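The z-score logic above can be sketched as follows. The 60-observation window, the 1.0 entry threshold, the 0.25 exit band, and the sinusoidal synthetic spread are all illustrative choices rather than the article's calibrated parameters:

```python
import numpy as np

def zscore_signals(spread, window=60, entry=1.0, exit_band=0.25):
    """Rolling z-score of a spread plus simple entry/exit flags.

    Entry: |z| beyond `entry` standard deviations of the rolling window.
    Exit:  |z| back inside `exit_band`, i.e. the spread has reverted."""
    s = np.asarray(spread, dtype=float)
    z = np.full_like(s, np.nan)              # undefined until window fills
    for t in range(window, len(s)):
        w = s[t - window:t]
        z[t] = (s[t] - w.mean()) / w.std()
    long_entry = z < -entry                  # spread unusually low: buy it
    short_entry = z > entry                  # spread unusually high: sell it
    exit_signal = np.abs(z) < exit_band      # close when z reverts toward 0
    return z, long_entry, short_entry, exit_signal

# Synthetic mean-reverting spread for demonstration.
rng = np.random.default_rng(3)
spread = np.sin(np.linspace(0, 20, 500)) + rng.normal(scale=0.1, size=500)
z, long_entry, short_entry, exit_signal = zscore_signals(spread)
```

In the full strategy the rolling mean and volatility would come from the Kalman filter's forecast and forecast variance rather than a fixed window, but the thresholding on z works the same way.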

Giving equal weight to each pair, the strategy achieves a Sharpe ratio of 2.45 and a CAGR of 2.3%, which is impressive. Most importantly, large drawdowns are really rare, so what makes this strategy fairly profitable is the possibility of applying a lot of leverage.

## Comments