Marketing teams often celebrate when dashboards show a clear, positive correlation between sales and advertising activity.
“Our campaigns are driving 5% more conversions, so we can increase the budget now!”
However, anyone who works with data knows it is rarely that simple; the real world is far more complex, and correlations can be deeply deceptive. In practice, advertising budget is usually allocated to products that already perform well.
For example, a brand that already converts well may have been given a larger budget precisely because of that performance. In such a case, marketing spend does not necessarily drive conversions; it is merely correlated with them.
The problem is that what looks like a clear relationship in purely observational data usually hides a complex structure beneath the surface. Higher-performing brands naturally attract more budget, so their strong conversion performance is intertwined with their higher ad spend. Competitors’ activity, the broader economic context, and many other factors influence both spending and performance, often in the same direction.
When everything moves together, it becomes difficult to tell whether ad spend actually caused the increase in conversions.
This is exactly where controlling for other variables comes into play. Adding more control variables to our regression model can reveal the causal effect, assuming we understand the causal structure well.
Such an approach is excellent from a technical standpoint. However, the phrase “controlling for confounding variables” hides many practical details that are essential for a deeper understanding of causal inference.
Enter Frisch-Waugh-Lovell
This is exactly where the Frisch-Waugh-Lovell (FWL) theorem comes into play. It provides a clear explanation of what the model actually does when we control for other variables.
The FWL theorem shows that when we want to measure the effect of one variable while holding others fixed, we can decompose a multiple regression into a simple, step‑by‑step “partialling‑out” procedure.
In particular, it proves that the coefficient from a standard multiple regression can be obtained in two equivalent ways:
- The standard approach
- The partialling out approach
In the standard approach, we run a single regression of the outcome on all independent variables simultaneously and read off the coefficient for the variable of interest.
In the partialling‑out approach, we first remove the influence of the control variables from both the outcome and the variable whose effect we want to estimate (by regressing each on the controls and taking residuals). We then regress these two “cleaned” residuals on each other.
The FWL theorem guarantees that the coefficient from this residual‑on‑residual regression is exactly the same as the coefficient from the full multiple regression.
That is, the effect we estimate in a multivariable model is the predictor’s effect on the part of the outcome that remains after all control variables are accounted for. This may sound like a mathematical curiosity, but it is fundamental for modern causal inference and underpins many causal AI methods.
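In regression notation, the equivalence can be sketched as follows (y is the outcome, x the variable of interest, Z the set of controls):
- Full model: y = a + b·x + Z·g + e; the coefficient of interest is b.
- Partialling out: y_res = y minus the fitted values from regressing y on Z, and x_res = x minus the fitted values from regressing x on Z.
- FWL: the slope from regressing y_res on x_res equals b exactly.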
The Scenario
Let’s work with a realistic business problem. We work in the marketing department and want to understand the impact of paid search ad spend on sales conversion on our website.
However, advertising spend is strongly correlated with other factors that drive sales. We allocated more marketing budget to campaigns with higher historical conversion rates, high organic search volume, and high competitor spend in the same category. All of these factors influence sales regardless of new advertising spend.
The data generation process is depicted in the following graph (DAG = Directed Acyclic Graph), which shows which variables influence spend and conversion.
We aim to isolate the effect of advertisement spend on conversion.
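To make the setup concrete, the sketch below shows one possible data-generating process consistent with the DAG. Only the true ad_spend effect of 0.02 is taken from the article; every other coefficient, distribution, and the sample size are illustrative assumptions, so the exact numbers it produces will differ from the tables shown later.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5_000

# Confounders: factors that influence both budget allocation and conversions
organic_search_volume = rng.normal(1_000, 200, n)
competitor_spend_estimate = rng.normal(500, 100, n)
day_of_week_index = rng.integers(0, 7, n)
past_conversion_rate = rng.beta(2, 20, n)

# Budget is allocated to campaigns that already look promising (confounding)
ad_spend = (
    0.5 * organic_search_volume
    + 0.3 * competitor_spend_estimate
    + 2_000 * past_conversion_rate
    + rng.normal(0, 100, n)
)

# Conversions depend on the confounders AND on ad spend (true effect = 0.02)
conversions = (
    0.02 * ad_spend
    + 0.05 * organic_search_volume
    + 0.01 * competitor_spend_estimate
    + 1.5 * day_of_week_index
    + 300 * past_conversion_rate
    + rng.normal(0, 10, n)
)

df = pd.DataFrame({
    "ad_spend": ad_spend,
    "organic_search_volume": organic_search_volume,
    "competitor_spend_estimate": competitor_spend_estimate,
    "day_of_week_index": day_of_week_index,
    "past_conversion_rate": past_conversion_rate,
    "conversions": conversions,
})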
At first glance, the correlation between advertising spend and conversion rate is clear and indisputable. The higher the budget, the higher the conversion rate.
However, even based on the diagram above, we already know that such a correlation is misleading. For example, we can notice that there also exists a correlation between spend and the organic search volume:
How can we untangle all those messy correlations to get to the true effect of the ad spend on our sales?
The Naive Approach
The first idea would be to run a simple regression of conversions on ad spend. This is a trap that many businesses fall into.
With the following command, we ask the model to attribute every conversion solely to the advertising budget.
import statsmodels.formula.api as smf  # formula interface to OLS
mod_naive = smf.ols("conversions ~ ad_spend", data=df).fit(cov_type="HC1")
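A quick way to inspect the fitted model (the exact output depends on the simulated data) is:
print(mod_naive.params["ad_spend"])  # the naive estimate of the ad spend effect
print(mod_naive.summary())           # full coefficient table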
As shown below, the naive model inflates the effect of advertising by about 40%! The estimated ad_spend coefficient of 0.028 is well above the value we hard-coded in our simulated dataset.
This happens because high-budget campaigns were assigned to high-traffic, high-performing audiences. The naive model credits ad spend with conversions that were actually driven by other factors.
This is a very dangerous trap and tricky to discover in practice. In this tutorial, we have the luxury of knowing the real effect. In reality, we have to understand the problem and the data first to avoid jumping to such conclusions too fast.
The Standard Approach
To fix the bias, the best approach is to include all the confounding variables in the regression model. In the formula below, we explicitly tell the model to account for the additional variables.
In this way, we force it to calculate the effect of ad spend while holding the other factors constant.
formula_full = ("conversions ~ ad_spend + organic_search_volume + competitor_spend_estimate"
                " + day_of_week_index + past_conversion_rate")
mod_full = smf.ols(formula_full, data=df).fit(cov_type="HC1")
The results obtained this way confirm the theory. As the coefficient table shows, the estimate for ad_spend drops from 0.028 to 0.0196, which is practically identical to the true causal effect of 0.020, and the 95% confidence interval encloses the true value.
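For reference, a minimal way to pull the estimate and its 95% confidence interval out of the fitted model:
print(mod_full.params["ad_spend"])          # point estimate, close to 0.02
print(mod_full.conf_int().loc["ad_spend"])  # lower and upper bound of the 95% CI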
We quickly solved the bias problem by including all confounding variables in the model. This method is very effective, but it is often treated as a black box. To fully understand how the model isolates the signal, we can turn to the FWL theorem.
The FWL Solution
Instead of relying solely on the standard regression model, the Frisch-Waugh-Lovell theorem shows exactly how the influence of the confounders is removed when estimating the parameter of interest.
The FWL Theorem can be applied in three steps, each of which removes bias and moves us closer to the desired parameter.
Step 1 – Remove confounders from the outcome
To start, we need to isolate the variation in conversions that cannot be explained by the confounding variables. We can achieve this by regressing the outcome variable (conversions) on all of the confounders.
import statsmodels.api as sm  # array-based OLS interface
control_vars = ["organic_search_volume", "competitor_spend_estimate", "day_of_week_index", "past_conversion_rate"]
Z = df[control_vars]
Z_const = sm.add_constant(Z)
# Regress the outcome on the confounders and keep the unexplained part as residuals
y_vec = df["conversions"].values
fit_y_on_Z = sm.OLS(y_vec, Z_const).fit()
y_t = y_vec - fit_y_on_Z.fittedvalues
The residuals (y_t) represent the part of the outcome variable that is uncorrelated with the confounders. In other words, this is the part of the website’s sales that isn’t accounted for by the included factors. Technically speaking, y_t is orthogonal to the confounders, meaning it is uncorrelated with them.
After this step, we are left with the variation in conversions that cannot be predicted by the confounders available to us. We have effectively removed their influence on the outcome.
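We can check this orthogonality directly. A small sanity check, using np and control_vars as defined above:
for col in control_vars:
    print(col, np.corrcoef(y_t, df[col])[0, 1])  # each correlation is ~0, up to numerical noise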
Step 2 – Remove confounders from the treatment
Next, we need to perform the same operation as above on the treatment variable. We will regress ad_spend on the same set of control variables.
The question here is: how much of the variation in advertising spend can be explained by the confounders?
x_vec = df["ad_spend"].values
fit_x_on_Z = sm.OLS(x_vec, Z_const).fit()
x_t = x_vec - fit_x_on_Z.fittedvalues
The obtained residuals, x_t, capture the variation in ad spend that is independent of the control variables. We can think of it as part of the advertising budget that wasn’t determined by external factors.
Step 3 – Regress cleaned outcome on cleaned treatment
As the final step, we will run a simple regression of the two cleaned (residualised) variables. By design, both parts are now independent of the confounders.
X_t = sm.add_constant(x_t)
mod_fwl = sm.OLS(y_t, X_t).fit(cov_type="HC1")
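To make the comparison explicit, we can print both estimates side by side (assuming mod_full from the full regression above is still in scope):
print(mod_full.params["ad_spend"])  # coefficient from the full multiple regression
print(mod_fwl.params)               # intercept and slope of the residual-on-residual fit; the slope matches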
We immediately see that the resulting coefficient is identical to that obtained from the full regression model. It confirms that controlling for all the confounding variables is equivalent to filtering out their influence from both the outcome and treatment variables.
By removing the predictable part of both conversions and advertisement spend, we have isolated the variation that matters for obtaining a causal effect. The residual-on-residual regression examines whether spending an extra euro on advertising for campaigns with identical external conditions results in more conversions.
And in our case, the answer is yes. Additional advertising spending results in approximately 0.02 additional conversions.
Why It Matters
While running a three-step regression to solve a simple problem might seem like overkill, it provides a deeper understanding of how causal inference works.
We can now understand multivariable regression as a filtering or partialling-out process. We can estimate the causal effect of the treatment variable by controlling for the confounders’ effects on both the treatment and the outcome. What remains is the effect of the treatment on the outcome.
But this concept goes far beyond a better understanding of the regression model. After all, FWL only shows us how to reproduce results we could already obtain with a standard multiple regression.
However, the logic of residualisation, using one model to ‘clean’ the treatment of the confounders’ influence and another to do the same for the outcome, is the engine behind advanced machine-learning approaches to causal inference such as Double Machine Learning and, more broadly, Causal AI.
In Causal AI, we can replace simple linear regressions with more powerful machine learning models to uncover complex nonlinear effects. Hence, understanding FWL is not only an academic exercise but also the first step toward building sophisticated causal models that can navigate the complex world of real data.
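As a closing illustration, here is a minimal sketch of that idea with flexible models in place of the linear first stages. It is not a full Double Machine Learning implementation (which would add cross-fitting, as provided by libraries such as DoubleML or econml); it only shows how the FWL recipe generalises.
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

Z_np = df[control_vars].values

# Stage 1: "clean" the outcome and the treatment with non-linear models
# (note: without cross-fitting this in-sample residualisation can overfit; shown for intuition only)
rf_y = RandomForestRegressor(random_state=0).fit(Z_np, df["conversions"].values)
rf_x = RandomForestRegressor(random_state=0).fit(Z_np, df["ad_spend"].values)
y_res = df["conversions"].values - rf_y.predict(Z_np)
x_res = df["ad_spend"].values - rf_x.predict(Z_np)

# Stage 2: residual-on-residual regression, exactly as in FWL
final = LinearRegression().fit(x_res.reshape(-1, 1), y_res)
print(final.coef_[0])  # estimated effect of ad spend on conversions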