Parallel Trends: The Make-or-Break Assumption for Difference-in-Differences

Difference-in-Differences (DiD) is a widely used technique for estimating causal effects when randomized experiments are not feasible. From policy evaluation to marketing campaign effectiveness, DiD offers a straightforward and efficient way to assess causal effects. The goal of this article is not to explain the details of the method itself, but rather to delve a bit deeper into its foundation: the parallel trends assumption.

To better understand this problem, we will simulate a scenario in which we are interested in estimating the causal effect of a job training program on the unemployment rate. We would like to assess whether the job training program in City A reduced unemployment in the city.

However, policymakers decided to roll this program out to the entire city, so we do not have information from a randomized trial. 

Moreover, we do not have information about individual participants in the program. This makes it impossible to use methods like propensity score matching to estimate the causal effect.

To summarize, we only have data at the aggregated level. This is a standard setup for analyzing the effect of a given action, not only when evaluating public policy but also for a wide range of business activities, such as marketing campaigns or sponsorships. Hence, understanding methods that help us evaluate such activities is of the utmost importance.

In a nutshell, and somewhat oversimplified, causal inference is all about finding benchmarks or comparison groups. We can’t estimate a causal effect just by analyzing City A, as there are too many factors that we are unable to control. Luckily, in our example, we have access to the unemployment data from City B, a neighboring city with a very similar size, demographics, and economic structure. Importantly, City B didn’t introduce any job training program.

Difference-in-Differences

What do we know so far? In 2017, City A launched a job training program designed to reduce unemployment. Its neighboring city, City B, did not. By 2021, unemployment in City A had dropped sharply. Does this allow us to conclude that the program was effective?

But before celebrating, we must ask a crucial question: Did the program cause the drop, or was City A on track to achieve it without the job training program? This is precisely the kind of question the Difference-in-Differences method is designed to answer.

DiD is a widely used causal inference method that estimates the effect of a treatment or intervention by comparing the changes in outcomes over time between a treated group and a control group. If both groups had evolved similarly in the absence of treatment, then any post-treatment divergence in outcomes can be attributed to the intervention.
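In the simplest two-group, two-period setting, the DiD estimate boils down to a difference of two differences:

DiD = (Y_A,post - Y_A,pre) - (Y_B,post - Y_B,pre)

where Y_A,pre denotes City A’s average outcome before the intervention, and so on. The first bracket captures the change in the treated city; subtracting the second bracket removes the change that would have happened anyway, provided the two cities share a common trend.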

The assumption that both groups would have followed parallel trends in the absence of treatment is essential. If this assumption holds, DiD can give reliable causal estimates. If not, the results may be misleading. Below, we will investigate how this assumption affects the DiD estimates.

Simulating the scenario

To demonstrate the importance of parallel trends, we simulated 11 years of unemployment data (2011–2021) for both cities:

  • City A implemented a job training program in 2017.
  • City B serves as the control and did not implement any intervention.

Let’s start by building code that will allow us to apply DiD in two scenarios: one in which the parallel trends assumption holds and one in which it doesn’t. We will hard-code a true treatment effect, a 2-percentage-point decrease in the unemployment rate, to see whether it can be correctly recovered when the parallel trends assumption holds and when it doesn’t.

Let’s encompass our simulation in the following function with one parameter switching on and off the parallel trends assumption:
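A minimal sketch of such a function could look as follows; the specific numbers used here (a 2-percentage-point true effect, an extra 0.2-point-per-year decline in City A when parallel trends are violated, and the noise level) are illustrative assumptions chosen to roughly reproduce the estimates discussed later:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulate_unemployment(parallel_trends=True, seed=42):
    """Simulate yearly unemployment rates for City A (treated) and City B (control)."""
    rng = np.random.default_rng(seed)
    rows = []
    for city in ['A', 'B']:
        for year in range(2011, 2022):
            treated = 1 if city == 'A' else 0
            post = 1 if year >= 2017 else 0
            t = year - 2011
            # Macro trend: unemployment declines gradually over time in both cities
            unemployment = 10 - 0.3 * t
            # Parallel-trends violation: City A follows a steeper trend, visible already before 2017
            if not parallel_trends and treated:
                unemployment -= 0.2 * t
            # True treatment effect: a 2-percentage-point drop in City A after 2017
            if treated and post:
                unemployment -= 2
            # Random noise to mimic real-world variation
            unemployment += rng.normal(0, 0.1)
            rows.append({'city': city, 'year': year, 'unemployment': unemployment,
                         'treated': treated, 'post': post, 'did': treated * post})
    return pd.DataFrame(rows)

df_parallel = simulate_unemployment(parallel_trends=True)   # Scenario 1
df_violated = simulate_unemployment(parallel_trends=False)  # Scenario 2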

The function consists of the following parts, which together create the simulated scenario:

  • Macro trend: unemployment declines gradually over time in both cities.
  • Treatment effect: unemployment in City A drops after the treatment starts.
  • Violation of parallel trends: if parallel_trends=False, City A improves faster before the treatment starts, which violates the parallel trends assumption by introducing a differential pre-treatment trend.
  • Random noise: adds year-to-year randomness to simulate real-world variation.

Scenario 1: Parallel Trends Hold

Before 2017, both cities showed similar trends in unemployment. After the program began, City A’s unemployment rate dropped more rapidly. We estimate the treatment effect with a standard DiD regression:

# The coefficient on 'did' (the treated x post interaction) is the DiD estimate
model_parallel = smf.ols('unemployment ~ treated + post + did', data=df_parallel).fit()
print(model_parallel.summary())

While inspecting the data visually, we can see that in Scenario 1, City A and City B track closely before 2017. The treatment effect becomes visible as a divergence afterward.

When we run a Difference-in-Differences regression on this data, it accurately estimates a treatment effect of approximately -2 percentage points, very close to the actual, simulated impact. This illustrates how well DiD can work when the parallel trends assumption holds.

Scenario 2: Parallel Trends Violated

In Scenario 1, the DiD approach recovers the treatment effect accurately. This works because, by construction, the parallel trends assumption holds.

However, consider an alternative scenario. Suppose City A had already begun experiencing improvements in unemployment rates well before the 2017 intervention, perhaps due to stronger economic fundamentals or demographic advantages not readily apparent initially. In this scenario, pre-existing differences would violate the parallel trends assumption.

This creates a misleading dynamic. Before 2017, City A was already outperforming City B due to factors unrelated to the policy:

When we apply the same DiD regression model, it estimates a treatment effect of approximately -3 percentage points. While this result might seem even more impressive than in Scenario 1, it’s invalid. The method incorrectly attributes pre-existing improvements to the job training program, thereby inflating the treatment effect. 
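For reference, the regression itself is identical to the one used in Scenario 1; only the data changes (here assuming the df_violated frame produced by the simulation sketch above):

model_violated = smf.ols('unemployment ~ treated + post + did', data=df_violated).fit()
print(model_violated.summary())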

This scenario illustrates how even subtle deviations from parallel trends can lead to inflated or biased estimates and result in erroneous conclusions.

In both scenarios, we assumed that the trends point in the same direction and that the treatment reduces unemployment, so the estimated effect at least had the correct sign. However, if we allowed the violation of parallel trends to be even more substantial, it would be possible to obtain an estimate suggesting an adverse impact of the treatment. Such a scenario would be far more dangerous, as it would lead to wrong conclusions and decisions. We can observe an example of such a situation below:

Diagnosing the Parallel Trends Assumption

This simple example illustrates the importance of the parallel trends assumption for difference-in-differences analysis. Without it, the estimated effect of the treatment cannot be trusted.

Unfortunately, I have some bad news: the parallel trends assumption is effectively untestable. We must remember that we assume that, without treatment, both groups would have continued to trend in the same direction. It is a very strong assumption, and we can never be sure about it, because we don’t know the future. We have no way of observing the alternative universe in which the treatment didn’t occur, and that would be the only sure way to validate it.

However, we shouldn’t be too worried about it. Ultimately, all analytics, in any context, is about making assumptions. Our job as analysts or data scientists is to make sure that those assumptions are valid and logical. We must be able to defend them based on existing data, domain knowledge, and a bit of cleverness. This is also the case with difference-in-differences: we can’t be certain that the parallel trends assumption holds, but we can probe it to the greatest extent possible.

To do this, we can apply the following diagnostic tools:

1. Visual Inspection

This approach is the simplest, the most obvious, and very often the most informative one. Plotting outcomes by group and time is the first step: we simply look for pre-treatment alignment in the trends. If the visual inspection reveals clear discrepancies before the treatment, DiD is unlikely to be the most suitable causal inference tool for our problem.
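As a minimal sketch, assuming the simulated df_parallel frame from earlier (the same code works for df_violated), the plot can be produced with matplotlib:

import matplotlib.pyplot as plt

# Plot yearly unemployment for each city and mark the start of the program
for city, group in df_parallel.groupby('city'):
    plt.plot(group['year'], group['unemployment'], marker='o', label=f'City {city}')
plt.axvline(2016.5, linestyle='--', color='grey', label='Program start (2017)')
plt.xlabel('Year')
plt.ylabel('Unemployment rate (%)')
plt.legend()
plt.show()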

In our example, it’s pretty simple. Examining the chart for Scenario 1, we can see that the parallel trends assumption holds. In Scenario 2, the trends diverge already before the treatment, which makes the parallel trends assumption highly disputable and likely invalid.

2. Pre-Trend Tests

As the next step, we could consider applying a more scientific and rigorous approach to assess the pre-trend assumption. After all, visual inspection might be misleading.

One of the most direct ways to verify this assumption is through a pre-trend test.

A pre-trend test involves checking if the trends of the outcome variable differed significantly between the treated and control groups before the intervention. We will check if the DiD estimates might falsely attribute these pre-existing differences to the policy or treatment effect.

We achieve this by running a regression with the outcome variable (here, the unemployment rate) as the target and an interaction term between the treatment indicator and year among the predictors. We use only the pre-treatment data to assess the validity of our assumption.
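A minimal sketch of such a pre-trend regression, assuming the simulated data frames from above (replace df_parallel with df_violated for Scenario 2):

# Pre-trend test: keep only pre-treatment years and interact the treatment indicator with time
pre_data = df_parallel[df_parallel['year'] < 2017]
pretrend_model = smf.ols('unemployment ~ treated * year', data=pre_data).fit()
print(pretrend_model.summary())  # inspect the treated:year coefficient and its p-value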

In Scenario 1, the interaction term (treated:year) is not statistically significant. A non-significant result indicates no substantial difference in trends before treatment, making the DiD analysis credible:

In Scenario 2, the interaction term is statistically significant, indicating that the treated group’s unemployment was already changing differently before the treatment started. This suggests that the parallel trends assumption is violated, cautioning against directly relying on the DiD estimate:

3. Placebo Tests

The pre-trend test adds a bit more rigor to our analysis, but ultimately, it is just a more formal version of the visual check. It is good for confirming what we see, but on its own it will rarely give a definitive answer beyond what the chart already suggests.

I’m not saying it’s impossible, but I recommend using this test as a confirmation method, in conjunction with other approaches. And this brings us to the most complex method for checking the validity of the parallel trends assumption: the placebo test.

A placebo test helps reveal false positives by pretending the intervention started earlier, during a period when you know there was no actual treatment. Placebo tests serve as diagnostic checks by verifying if the treatment effect is specific to the real intervention period or if it appears erroneously in earlier, untreated periods. Finding significant effects in these “fake” treatment periods indicates serious issues, such as violations of the parallel trends assumption or unobserved confounding factors.

First, we have to prepare the data for the placebo test. We do it by running the following piece of code. We introduce a new variable, placebo_post, which flags the years from 2014 onwards as the hypothetical “post-treatment” period, even though no actual treatment occurred at that time. 
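A sketch of this preparation and of the placebo regression, again assuming the simulated frames from above, could look as follows (note the additional assumption of restricting the data to the years before the real 2017 intervention, so that the actual treatment does not contaminate the placebo comparison):

# Placebo test: keep only pre-treatment years and pretend the program started in 2014
placebo_df = df_parallel[df_parallel['year'] < 2017].copy()
placebo_df['placebo_post'] = (placebo_df['year'] >= 2014).astype(int)
placebo_df['placebo_did'] = placebo_df['treated'] * placebo_df['placebo_post']

placebo_model = smf.ols('unemployment ~ treated + placebo_post + placebo_did',
                        data=placebo_df).fit()
print(placebo_model.summary())  # the placebo_did coefficient should not be significant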

If the placebo model finds a significant treatment effect during this placebo period (before the actual intervention), we have uncovered a critical red flag. This suggests that our analysis may be capturing pre-existing trends or other biases rather than an actual causal effect of the policy.

Scenario 1 (Valid Parallel Trends):

The coefficient on placebo_did is statistically insignificant. The lack of a significant effect suggests that no spurious treatment effect was present before the real intervention.

Scenario 2 (Violated Parallel Trends):

The coefficient of placebo_did is significant this time, picking up a treatment effect where none should exist. This result is a red flag, clearly showing that the parallel trends assumption is violated and that the two cities were already diverging before the real treatment.

If our DiD analysis passes the placebo test, we gain confidence in the results. If it fails, we know to reconsider our assumptions, refine the model, or explore alternative methods.

In summary, the Difference-in-Differences method remains a powerful tool for causal inference, especially when controlled experiments aren’t feasible. However, its credibility hinges critically on the parallel trends assumption, which asserts that treated and control groups would have followed similar trajectories in the absence of intervention.

Although this assumption is untestable, leveraging visual inspections, pre-trend tests, and placebo analyses can help analysts gain confidence in their DiD results or identify potential pitfalls early. Ultimately, careful diagnostics, combined with strong domain expertise, are essential to ensuring that causal inference derived from difference-in-differences remains valid.
