Mastering Causal Inference with Python: A Guide to Synthetic Control Groups

One might be intrigued when a newspaper like the Washington Post writes an article about a statistical method. Statistical modeling isn't usually the most exciting topic. However, in 2015 (yes, that was a long time ago), the Washington Post published an article describing the synthetic control group method. The fact that such a reputable outlet covered it hints at its importance. This article examines one of the most important components of the causal inference arsenal: the synthetic control group.

https://www.washingtonpost.com/news/wonk/wp/2015/10/30/how-to-measure-things-in-a-world-of-competing-claims/

This post is based on a seminal article by Alberto Abadie and Javier Gardeazabal, The Economic Costs of Conflict: A Case Study of the Basque Country. Their analysis not only examined the effect of terrorist activity on the economic development of the Basque Country but also set a new standard for causal inference research.

Terrorism activity in the Basque Country

Let's start with a brief description of the problem. In the mid-1970s, the Basque Country, a region of Spain, was affected by terrorist activity carried out by the separatist group ETA, which sought independence from Spain for the region. The authors of the article mentioned above evaluated the effect of this violence on the Basque Country's economic activity.

It is a perfect topic to explore from a causal inference perspective. As a first attempt, I tried to answer the question using the difference-in-differences method. This approach worked well when comparing the economic development of the Basque Country with that of a single similar region, Catalonia. However, it does not work well when the comparison involves several regions.

The reason is that the parallel trends assumption does not hold in our data. The chart below shows that Spanish regions experienced different economic growth rates before 1975. Therefore, we must find another reliable method to detect a causal effect. The synthetic control group comes to the rescue.

Counterfactual

Before we explain this method, let’s consider what we want to accomplish. Given a set of different covariates X, how can we find a causal effect of activity T on outcome Y? In our case, we would like to determine the impact of terrorism activity on the Basque Country’s economic development.

Let T denote the activity whose effect we want to find and Y the outcome variable. The treatment effect, denoted by the Greek letter delta, is defined as:

δ = E[Y | T = 1] − E[Y | T = 0]

Both terms refer to the same unit at the same time: the first is the outcome in the presence of the activity, the second is the same outcome without it. This is a bold formula, because we cannot observe its second term: E[Y | T = 0] is a counterfactual. In causal inference, a counterfactual is a hypothetical scenario representing what would have happened if a particular event or condition had not occurred.

In our problem, the first term of the equation is actual data: the GDP of the Basque Country, which was affected by terrorism. The counterfactual is a hypothetical Basque Country that was never affected by terrorist activity. If we could observe that scenario, we could estimate how much higher its GDP might have been.

But at a given time, there is only one Basque Country. The counterfactual is just a hypothetical world. The equation above introduces two states of the world, two different realities. And we can observe only one of them — this is the fundamental problem of causal inference.

It is impossible to observe the same unit in two distinct versions of reality. However, we can still estimate the unobserved outcome from the available information with some clever techniques. To answer our question, we need to establish what would have happened to the Basque Country if ETA had not started its campaign of violence in the mid-1970s. We have to find the counterfactual.

Our quest for the counterfactual leads us to the concept of a ‘clone’ of the Basque Country. This is not a search for an alternative reality but a pursuit of another unit in the real world that mirrors the Basque Country yet remains unaffected by the given activity. This approach brings us a step closer to our goal.

Of course, we will never find a perfect clone of any social entity; we have to estimate it. The key to many evaluation methods is finding a valid comparison group, one that is as similar as possible to the affected unit. If we succeed, we will have the following groups:

  • treatment group: a unit that was affected by the treatment
  • comparison group: a unit that was not affected by the treatment but resembles the treatment group in all other respects

We assume that, without the treatment, the treatment group would have had the same outcome values as the comparison group (because the two are otherwise identical). Thus, any difference between the two groups after the treatment must be due solely to the treatment. This approach allows us to determine the causal effect of a given phenomenon.

This idea is simple, but we still have a lot of work to do. How can we find the ideal comparison group? The optimal approach is to run a randomized experiment, which is not feasible in many real-world settings. We must search for the next best solution.

All non-experimental approaches require us to investigate a particular issue and review existing information to find comparison units that resemble the affected group. We already examined the difference-in-differences method; now, it’s time to explore the synthetic control group.

Donor pool

What would be the best comparison group to estimate the causal effect of terrorism activity on the Basque Country’s economy? We have already answered this question – other Spanish regions. We have data about their GDP per capita over the years. The remaining task is determining a method for combining them into a single entity. As we saw previously, we cannot use difference-in-differences due to a lack of parallel trends.

But we can use a synthetic control group! This method builds the control group as a weighted combination of several comparison units, merging the data from many potential comparison units into one. These candidate units form the so-called donor pool.

All we have to do is assign a weight to each of these units so that their weighted combination resembles the treatment group as closely as possible. The weights must make the two comparable in the period before the treatment.

Synthetic control models optimally choose a set of weights which, when applied to a group of corresponding units, produce an estimated counterfactual to the unit that received the treatment.
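To make the idea concrete, here is a minimal NumPy sketch of what applying the weights means. The donor outcomes and weights are made-up numbers, not estimates from the Basque data: each year's synthetic value is simply the weighted sum of the donor units' outcomes in that year.

```python
import numpy as np

# Made-up outcomes: rows are years, columns are donor-pool units
donor_outcomes = np.array([
    [1.0, 2.0, 3.0],  # year 1
    [1.5, 2.5, 3.5],  # year 2
])

# Hypothetical weights, one per donor unit (non-negative and summing to 1 here)
weights = np.array([0.2, 0.5, 0.3])

# Synthetic control for each year = weighted sum of the donor outcomes in that year
synthetic = donor_outcomes @ weights
print(synthetic)
```

The whole method boils down to choosing `weights` well; the rest of the article is about how.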

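To build the synthetic control group, we can look only at the pre-treatment period, because this is when the two are supposed to be as similar as possible. Post-treatment data already contain the effect we want to measure, so using them to fit the weights would contaminate the comparison.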

To evaluate the treatment effect, we compare the outcome variable in the treatment group with the values predicted by the synthetic control group. The latter is our counterfactual: a proxy for what would have happened to the treatment group had the event of interest not occurred.

For the Basque Country problem, we have all the ingredients needed to apply the method. We have data about the Basque Country and about all the other regions that make up the donor pool. Now, we have to find a way to combine those regions into a group resembling the pre-1975 Basque Country as closely as possible.

How do we create a synthetic control group?

We are almost there. Let’s summarize the approach we have taken so far. To create a synthetic control group, we should:

  1. Define our donor pool
  2. Combine the donor pool units into one to resemble the treatment group before the treatment
  3. Calculate the treatment effect

We have already defined the donor pool. Now the crucial step lies ahead: how can we combine the remaining Spanish regions into a synthetic Basque Country? To do this, we must determine the weight of each region. The weights have to be chosen so that the weighted combination reproduces the pre-treatment GDP trend of the Basque Country. Fortunately, we already have a statistical method for this: the good old linear regression.

We can regress the Basque Country's GDP per capita on the GDP per capita of all other Spanish regions. In such a regression, each year is an observation, the GDP of every other region is an independent variable, and the Basque Country's GDP is the dependent variable; the fitted coefficients are the weights. I'm skipping the details here, and you can find numerous resources that explain this approach with more statistical and technical rigor. Since we want to apply the approach, let us focus only on the necessary aspects. We will use the most straightforward (but still powerful) version to see what this method can do.

We will use the same Basque dataset as in the previous difference-in-differences study. Below is the data, which describes the GDP per capita of a given Spanish region in a given year. To build the synthetic control group, we will use only data points from before 1975:

First, we must transform the data. We have to pivot it from the panel (long) structure so that each region is stored in a separate column, each row is a year, and GDP per capita is the value. This can be done quickly with pandas pivot.

The transformed dataset looks like this:

Another transformation we need to make is to separate the dependent and independent variables. We must store the Basque Country in a separate series and remove it from the table above.
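The three preparation steps (keep the pre-treatment years, pivot, separate the target) might look like this in pandas. It's a sketch on a tiny invented panel: the column names `regionname`, `year`, and `gdpcap` and the label "Basque Country" are assumptions, so adjust them to match the file you actually load.

```python
import pandas as pd

# Invented long-format panel: one row per (region, year)
panel = pd.DataFrame({
    "regionname": ["Basque Country", "Catalonia", "Madrid"] * 3,
    "year": [1973] * 3 + [1974] * 3 + [1975] * 3,
    "gdpcap": [5.0, 4.5, 5.2, 5.3, 4.7, 5.4, 5.1, 4.9, 5.6],
})
TREATMENT_YEAR = 1975

# 1. Keep only pre-treatment years for fitting the weights
pre = panel[panel["year"] < TREATMENT_YEAR]

# 2. Pivot: one row per year, one column per region, GDP per capita as values
pre_wide = pre.pivot(index="year", columns="regionname", values="gdpcap")

# 3. Separate the treated unit (target) from the donor pool (features)
y_pre = pre_wide["Basque Country"]
X_pre = pre_wide.drop(columns="Basque Country")
print(X_pre)
```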

And the data is ready. If the values were on a different scale, we would have to standardize them, but it is not needed here as the GDP is on a similar scale across all regions.

Let's build the first synthetic control group, starting with a simple linear regression. It assigns a weight to each region so as to minimize the sum of squared errors between the weighted combination and the Basque Country over the pre-treatment years. Since we want to make predictions, I will use the scikit-learn library.
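A minimal sketch of this step with scikit-learn's `LinearRegression`, fitted on randomly generated donor series rather than the real data. Setting `fit_intercept=False` is an assumption here: it keeps the prediction a pure weighted sum of the donor regions, but the original analysis may have used the default intercept.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
years = np.arange(1955, 1975)  # pre-treatment years only

# Invented donor pool: random-walk GDP per capita for four hypothetical regions
X_pre = pd.DataFrame(
    3 + rng.normal(0, 0.1, size=(len(years), 4)).cumsum(axis=0),
    index=years,
    columns=["Region A", "Region B", "Region C", "Region D"],
)
# Invented target built as a noisy mix of two donors
y_pre = 0.6 * X_pre["Region A"] + 0.4 * X_pre["Region C"] + rng.normal(0, 0.02, len(years))

# No intercept, so the prediction is a pure weighted sum of the donor regions
model = LinearRegression(fit_intercept=False)
model.fit(X_pre, y_pre)

# One weight per donor region
weights = pd.Series(model.coef_, index=X_pre.columns)
print(weights.round(2))
```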

Voilà, the model is created. Now we can apply it to our original dataset. We need another pivoted table, this time with all the years but still without the Basque Country (its values are what we are predicting, so they cannot be used as a predictor):

As the last step, we add the synthetic control group predictions to our main table, which contains the Basque Country. We will use this structure to calculate the treatment effect.
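Continuing with invented data, the prediction step could be sketched like this: fit on the pre-1975 years, predict for every year from the donor columns only, and store the prediction next to the actual series. The random-walk regions and the column name `synthetic` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

TREATMENT_YEAR = 1975
rng = np.random.default_rng(0)
years = np.arange(1955, 1998)

# Invented wide table: one column per region, one row per year
wide = pd.DataFrame(
    3 + rng.normal(0, 0.1, size=(len(years), 4)).cumsum(axis=0),
    index=years,
    columns=["Basque Country", "Region A", "Region B", "Region C"],
)

# Donor pool: every region except the Basque Country, for all years
donors = wide.drop(columns="Basque Country")
pre = wide.index < TREATMENT_YEAR

# Fit the weights on pre-treatment years only
model = LinearRegression(fit_intercept=False)
model.fit(donors[pre], wide.loc[pre, "Basque Country"])

# Predict the synthetic Basque Country for every year and attach it to the main table
results = wide[["Basque Country"]].copy()
results["synthetic"] = model.predict(donors)
print(results.tail())
```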

The last column in the table above contains the synthetic control group prediction. We have created a linear combination of all the regions, which is supposed to resemble the Basque Country as closely as possible.

Or maybe it's too close? When we compare values between 1955 and 1975, we find that the GDP per capita of the synthetic control is almost identical to that of the Basque Country, which suggests overfitting. Before jumping to conclusions, let's plot the trends of the synthetic control and the Basque Country.

It looks like I did something wrong. Since 1975, the Basque Country has followed a much higher trend than the synthetic control group. Overfitting did indeed occur: our synthetic control group was too good before the treatment. It merely mimics the Basque Country's pre-treatment path and can't generalize. We have to constrain the linear regression.

This picture shows two important things. First, better tools exist to create a synthetic control group than standard linear regression. Second, to evaluate the results, we must apply domain knowledge, compare it with other methods (like difference-in-differences), and use common sense. Any strange results, such as those above, must be verified.

Constraining the coefficients

We are back at the drawing board. There are many ways to prevent linear regression from overfitting. The most common and powerful ones are ridge and lasso regression. Both add a penalty on the size of the coefficients, controlled by a regularization parameter, which keeps the weights from growing too large.

Ridge regression shrinks the coefficients, while lasso goes further and sets some of them exactly to zero. I will use ridge regression, so that even the least similar region can still contribute slightly to the synthetic control group. We can build a ridge regression with scikit-learn:
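A sketch of the ridge version using scikit-learn's `RidgeCV`, again on invented data. The grid of candidate penalties (`alphas`) and `fit_intercept=False` are assumptions; `cv=5` asks for 5-fold cross-validation. The last lines print each region's weight, which is one way to inspect the weights.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)
years = np.arange(1955, 1975)  # pre-treatment years

# Invented stand-ins for the pivoted pre-1975 donor table and the target series
X_pre = pd.DataFrame(
    3 + rng.normal(0, 0.1, size=(len(years), 6)).cumsum(axis=0),
    index=years,
    columns=[f"Region {c}" for c in "ABCDEF"],
)
y_pre = X_pre.mean(axis=1) + rng.normal(0, 0.05, len(years))

# RidgeCV scores every candidate penalty with 5-fold cross-validation and keeps the best
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), fit_intercept=False, cv=5)
ridge.fit(X_pre, y_pre)
print("chosen alpha:", ridge.alpha_)

# Inspect the weight each region receives in the synthetic control
weights = pd.Series(ridge.coef_, index=X_pre.columns).sort_values()
print(weights.round(3))
```

With an integer `cv`, the folds are ordinary unshuffled K-fold splits, so each fold is a contiguous block of years; for time series that is a pragmatic choice rather than a strict out-of-time validation.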

I used 5-fold cross-validation to choose the regularization parameter. After repeating the transformations we applied to the linear regression above, we end up with a new solution:

It looks much better now. The pre-1975 fit is not perfect, but the synthetic control group resembles the Basque Country. After 1975, the Basque Country's GDP per capita falls well below that of the synthetic control group.

The synthetic control group shows what would have happened to the Basque Country had the ETA terrorist activity not occurred. The difference between the blue and orange lines is the treatment effect — the loss of GDP per capita in the Basque Country caused by terrorism activity.

Let’s explore the synthetic control group more before calculating the treatment effect. We can examine the weights of each of the regions.

It’s worth noting that some regions have contributed positively while others have contributed negatively. It raises an important question: Can we assume that some areas contribute negatively to creating synthetic control groups? Wouldn’t it be more appropriate to constrain each weight to be between 0 and 1?

Abadie, Diamond, and Hainmueller addressed this issue in their 2010 article. They added optimization constraints that restrict the weights in this way. I recommend reading the article for a deeper understanding of their approach.

Since then, a few libraries have implemented the synthetic control group method. I won’t implement this optimization here, as it would not significantly expand our knowledge of this technique or enhance the model in the Basque Country scenario.
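For the curious, the constrained version is not hard to sketch. The example below uses SciPy's general-purpose `minimize` with the SLSQP solver to minimize the pre-treatment squared error subject to each weight lying between 0 and 1 and the weights summing to 1. It is a simplified illustration on invented data; the estimator of Abadie, Diamond, and Hainmueller also matches on covariates and chooses how to weight them, which this sketch leaves out.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Invented pre-treatment data: 20 years x 5 donor regions, plus a target series
X_pre = 3 + rng.normal(0, 0.1, size=(20, 5)).cumsum(axis=0)
y_pre = X_pre @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) + rng.normal(0, 0.01, 20)


def pre_treatment_sse(w):
    # Sum of squared errors between the target and the weighted donors
    return np.sum((y_pre - X_pre @ w) ** 2)


n_donors = X_pre.shape[1]
start = np.full(n_donors, 1 / n_donors)  # start from equal weights

result = minimize(
    pre_treatment_sse,
    x0=start,
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_donors,  # each weight between 0 and 1
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],  # weights sum to 1
)
weights = result.x
print(weights.round(3))
```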

Treatment effect

As we established earlier, the synthetic control group is the counterfactual. We assume that, had the event not happened, the treatment group would have had the same outcome values as the synthetic control group.

To estimate the effect of terrorist activity, we can compare the post-1975 GDP in the Basque Country to the synthetic control group. Let’s plot the differences.

After 1975, the Basque Country's GDP per capita was nearly one unit lower than it would have been without terrorist activity. The gap remained wide throughout the 1980s and only began to narrow in the 1990s. This demonstrates how strongly the violence affected the region's economy. The partial rebound in the 1990s may be related to the declining intensity of the violence and the first steps toward a peace process.

We can also calculate the average effect of terrorist activity by averaging the yearly difference in GDP per capita between the two groups over the post-1975 period.
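The gap and its post-treatment average can be computed in a couple of lines of pandas. The numbers below are invented, so the result will not match the figure reported in this article, and the column names are hypothetical.

```python
import pandas as pd

# Invented yearly series standing in for the actual and synthetic Basque Country
results = pd.DataFrame(
    {
        "Basque Country": [8.0, 8.2, 7.9, 7.8, 8.1],
        "synthetic": [8.0, 8.1, 8.6, 8.8, 9.0],
    },
    index=[1973, 1974, 1975, 1976, 1977],
)
TREATMENT_YEAR = 1975

# Yearly gap: actual minus synthetic (negative means the actual unit fell behind)
results["gap"] = results["Basque Country"] - results["synthetic"]

# Average effect over the post-treatment years only (1975 onwards)
post = results.index >= TREATMENT_YEAR
average_effect = results.loc[post, "gap"].mean()
print(round(average_effect, 2))
```

Averaging the yearly gaps gives the same number as subtracting the two post-treatment averages, so either formulation works.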

This calculation shows that, on average, terrorism reduced the Basque Country's GDP per capita by 0.84 units.

The impact of lost opportunities is significant and shows that combating terrorism is crucial to maintaining good prospects for economic prosperity. This result is similar to what we obtained in the difference-in-differences estimation with Catalonia, which supports our initial hypothesis that there is a close pre-treatment resemblance between those two regions.

Placebo test

Before we conclude, let’s address a crucial question: How reliable are these results? Could they be mere chance? The synthetic control group method offers a straightforward way to calculate our confidence in the obtained results. This is done through a placebo test.

The idea is similar to a placebo in clinical trials. We build synthetic control groups for units that did not receive the treatment but are analyzed as if they had. By comparing the estimated effects in these placebo units with the effect in the affected unit, we can assess the statistical significance of the event of interest. The step-by-step algorithm for a placebo test is as follows:

  1. Specifying a placebo unit: in our case, we will iteratively treat each region as the affected one.
  2. Constructing the synthetic control group: we create a synthetic control group as we did for the Basque Country, but with the placebo unit as the target region.
  3. Calculating the treatment effect: the difference in GDP per capita between the placebo region and its synthetic control group.
  4. Repeating steps 1–3 for all regions.
  5. Calculating the statistical significance of the Basque Country result by measuring how often the GDP difference in the placebo regions was larger than in the Basque Country.

Since we will have to repeat this process many times, it is helpful to write a function that builds a synthetic control group. I will package all the steps above into a function that lets us select any region as the affected unit. I am an analyst, not a programmer, so please forgive me if it's not the most efficient way (but it works, so we're good 🙂 ).

The function’s output is the data frame with synthetic control group estimates and the difference between them and the actual values in the target region.

The placebo code also handles one caveat we haven't discussed yet: it excludes regions with a poor pre-treatment fit, which makes the results more stable. Following the procedure in Abadie, Diamond, and Hainmueller (2010), we exclude every region whose pre-treatment MSE is more than five times higher than that of the Basque Country.

This additional condition ensures that we include only placebo units with a solid pre-treatment fit. Now we can actually execute the test: following the placebo procedure, we run the function many times, each time selecting a different region as if it had been affected by ETA terrorist activity.
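The function and the placebo loop described above could be sketched as follows. The helper names `synthetic_control` and `pre_mse`, the ridge settings, and the randomly generated regions are all hypothetical; the five-times-MSE cut-off follows the rule described in the text. One design choice worth flagging: each placebo's donor pool excludes the Basque Country, so its post-1975 shock cannot leak into the placebo estimates. That is an assumption, not necessarily what the original code did.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV

TREATMENT_YEAR = 1975
rng = np.random.default_rng(3)
years = np.arange(1955, 1998)

# Invented wide table: rows are years, columns are regions
wide = pd.DataFrame(
    3 + rng.normal(0, 0.1, size=(len(years), 6)).cumsum(axis=0),
    index=years,
    columns=["Basque Country"] + [f"Region {c}" for c in "ABCDE"],
)


def synthetic_control(wide, target, exclude=(), treatment_year=TREATMENT_YEAR):
    """Hypothetical helper: ridge synthetic control for `target`."""
    donors = wide.drop(columns=[target, *exclude])
    pre = wide.index < treatment_year
    model = RidgeCV(alphas=np.logspace(-3, 3, 13), fit_intercept=False, cv=5)
    model.fit(donors[pre], wide.loc[pre, target])
    out = pd.DataFrame({"actual": wide[target], "synthetic": model.predict(donors)})
    out["diff"] = out["actual"] - out["synthetic"]
    return out


def pre_mse(result, treatment_year=TREATMENT_YEAR):
    # Mean squared gap over the pre-treatment years (quality of the fit)
    return (result.loc[result.index < treatment_year, "diff"] ** 2).mean()


basque = synthetic_control(wide, "Basque Country")
basque_mse = pre_mse(basque)

# Placebo loop: pretend each donor region was the treated one
placebos = {}
for region in wide.columns.drop("Basque Country"):
    result = synthetic_control(wide, region, exclude=["Basque Country"])
    # Keep only placebos whose pre-treatment MSE is at most 5x the Basque Country's
    if pre_mse(result) <= 5 * basque_mse:
        placebos[region] = result

print(f"kept {len(placebos)} of {len(wide.columns) - 1} placebo regions")
```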

The placebo test excluded only one region, which indicates that the pre-treatment fit was good for all the other units.

After running the placebo loop, we obtain a list of data frames containing the placebo synthetic control groups. From here, the placebo test in its simplest form is straightforward.

Let’s plot the GDP difference between the placebo regions and the one we obtained from the Basque Country’s synthetic control group.

It is immediately apparent that the effect in the Basque Country is very unusual. None of the placebo regions shows an immediate negative gap of this magnitude, whereas the Basque Country's gap quickly diverges from those of the placebo units.

We can confirm this visual impression by calculating a p-value. There is no single correct way to calculate the p-value for the synthetic control group method; for our exercise, I decided to use the following procedure.

We will iterate over each post-treatment year, from 1975 to 1997. Each year, we will calculate the difference in GDP between the Basque Country and the synthetic control group. We will then iterate over each placebo region with its synthetic control group and calculate the GDP difference for this year.

For each year and each placebo region, we store True when the placebo effect is larger than the effect obtained for the Basque Country, and False otherwise.

To calculate the p-value, we only need the average of diff_list: the proportion of (year, placebo region) pairs in which the placebo effect was larger than the effect obtained for the Basque Country's synthetic control group.
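Here is a sketch of that p-value procedure on invented gaps. Comparing absolute values is an assumption about what "larger effect" means; comparing signed gaps instead (a one-sided test for a negative effect) would be an equally defensible choice. The list is named `diff_list` to match the text.

```python
import pandas as pd

post_years = [1975, 1976, 1977]

# Invented post-treatment gaps (actual minus synthetic)
basque_gap = pd.Series([-0.5, -0.9, -1.0], index=post_years)
placebo_gaps = {
    "Region A": pd.Series([0.1, -0.2, 0.3], index=post_years),
    "Region B": pd.Series([-0.6, 0.4, -0.1], index=post_years),
}

diff_list = []
for year in post_years:
    for gap in placebo_gaps.values():
        # True when the placebo effect is larger in magnitude than the Basque effect
        diff_list.append(abs(gap[year]) > abs(basque_gap[year]))

# Share of (year, placebo) pairs where the placebo effect was larger
p_value = sum(diff_list) / len(diff_list)
print(p_value)
```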

After calculating the average, we obtained the following results:

The p-value is below the standard threshold of 0.05, which indicates that the effect found for the Basque Country is unusual. We can reject the null hypothesis that terrorist activity did not negatively affect the Basque Country's GDP per capita.

Summary

The description presented in this article shows the power and usefulness of the synthetic control group. It is one of the best tools in the causal inference arsenal because it allows us to create a counterfactual without assuming parallel trends. It’s particularly useful when traditional experimental methods like randomized controlled trials are not feasible or ethical.  

The method is, of course, not perfect. Because it constructs a control group from historical data on untreated units, it assumes that the relationships between covariates and outcomes remain stable over time (in the absence of treatment). Additionally, we can rarely be sure that the event of interest is the only thing affecting the outcome in the treated unit. Justifying this assumption requires solid domain knowledge of the analyzed problem. Hence, we can't apply the method as a black-box algorithm.

This approach is especially useful for evaluating interventions at an aggregate level; applying it to specific regions or countries is its best use case. Its relative simplicity has made it increasingly popular. I regret learning about it only recently, and I hope this article makes a small contribution to popularizing the synthetic control group.

References:

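Abadie, Alberto & Gardeazabal, Javier. (2003). The Economic Costs of Conflict: A Case Study of the Basque Country. American Economic Review, 93, 113–132. doi:10.1257/000282803321455188

Cunningham, Scott. Causal Inference: The Mixtape, Chapter 10: Synthetic Control. https://mixtape.scunning.com/10-synthetic_control

Facure, Matheus. Causal Inference for the Brave and True, Chapter 15: Synthetic Control. https://matheusfacure.github.io/python-causality-handbook/15-Synthetic-Control.html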
