Causal Inference for Marketing Analysts: Why Prediction Models Won't Tell You If Your Campaigns Work
Marketing analytics is dominated by predictive thinking — propensity scores, churn models, LTV forecasts. Useful, but they can't tell you whether your discount caused a purchase or just correlated with one. Here's the framework you're missing.
Marketing analytics is dominated by prediction. Propensity models, churn scores, LTV forecasts — all powerful, all built on the same fundamental idea: given a set of features $X$, estimate $E[Y \mid X]$. But prediction answers a fundamentally different question than the one your business is actually asking.
Your business doesn't want to know who will churn. It wants to know whether sending a retention offer will stop them from churning. That's not a prediction problem. That's a causal one.
The Prediction Trap
When a data scientist builds a model, the training loop is familiar: define a target $y$, collect features $X$, minimize a loss function, ship a model. The mental model underneath is statistical association. The model learns that customers with certain characteristics tend to behave in certain ways, and it uses that pattern to predict.
This works remarkably well for what it is. Gradient boosting models can predict next-month churn with high precision. Neural networks can rank which customers are most likely to respond to an email. The problem is not that these models are wrong — it's that they are answering a different question than the one marketing actually needs answered.
The question marketing needs answered is: what is the effect of this intervention?
- Does this discount cause an incremental purchase, or does it just reach customers who were going to buy anyway?
- Does this email campaign cause retention, or does it just land in the inboxes of customers who were never at risk?
- Does this price change cause a drop in revenue, or does revenue decline for unrelated seasonal reasons?
A predictive model trained on observational data cannot answer these questions — not because it's inaccurate, but because it was never designed to.
Association Is Not Causation (But Sometimes It Is)
Your statistics teacher spent years warning you that association is not causation. They were right, but the warning is incomplete. The full picture is more interesting.
Sometimes association is causation. You know from experience that drinking four glasses of wine causes a headache the next morning. You didn't run a randomized controlled trial — you inferred causality from repeated observation, controlling for confounders in your head. The association between wine and headaches is causal, and acting on it is rational.
Other times, association is a trap. Chocolate consumption per capita is strongly correlated with the number of Nobel Prizes a country produces. No one concludes that eating more chocolate will win you a Nobel. The two variables share a common cause — wealth and economic development drive both — but there is no causal link between them.
Marketing data is full of the second kind of association. High-discount customers often show higher purchase rates. Does that mean discounts drive purchases? Not necessarily. It might mean that marketing operations tend to target customers who were already likely to buy. The association is real; the causal inference is wrong.
Causal inference is the discipline of distinguishing the two — of understanding when and why association diverges from causation, and what you need to do to recover the causal signal from observational data.
A Concrete Problem: Discounting and Profit
Consider an e-commerce company trying to decide whether discounts are worth it. The intuition is straightforward: discounts boost sales volume, but they also directly reduce revenue per transaction. The question is whether the volume increase is large enough to compensate for the margin hit.
Formally, the company models customer-level profitability as:

$$\text{profits}_i = 0.05 \cdot \text{sales}_i - \text{discount}_i$$

The 5% margin means that for every dollar of sales, the company keeps five cents. A discount of $d$ dollars costs $d$ dollars in direct margin. The discount is only worth it if it causes enough incremental sales to recover that cost.
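The arithmetic is worth making concrete. A minimal sketch of the profit formula above (the 5% margin is the example's assumption; the function name is mine):

```python
MARGIN = 0.05  # five cents kept per dollar of sales, per the example

def customer_profit(sales: float, discount: float) -> float:
    """Customer-level profit: margin on sales minus the direct discount cost."""
    return MARGIN * sales - discount

# A $2 discount must cause at least $2 / 0.05 = $40 of incremental
# sales just to break even on the margin it gives away.
print(customer_profit(sales=1000.0, discount=2.0))  # margin of 50.0, minus 2.0
```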
This is not a prediction problem. The company is not trying to forecast profits for a given discount level. It wants to know the causal effect of discounting on profits — what happens to $\text{profits}_i$ when you intervene and set $\text{discount}_i$ to some value, compared to not discounting at all.
The difference is subtle but critical. Predicting profits given discount level uses the statistical relationship in the data. Estimating the causal effect requires understanding what would have happened counterfactually — had the discount been different, everything else equal.
Why Your Historical Data Is Lying to You
Here is where things get uncomfortable. If you take your historical transaction data, fit a regression of profits on discounts, and read off the coefficient, you will almost certainly get a biased estimate of the causal effect. Possibly a badly biased one.
The reason is confounding bias. In historical data, discounts are not distributed randomly. Marketing operations give larger discounts to customers who are:
- Less likely to buy at full price
- At higher churn risk
- In lower-value segments
- Targeted by specific campaigns with their own selection logic
This means that the customers receiving large discounts are systematically different from customers receiving small discounts, along dimensions that also affect profitability. When you regress profits on discounts without accounting for this, you are confounding the effect of the discount with the pre-existing characteristics of the customers who received it.
Formally, let $T$ be the discount and $Y$ be profits. What you observe in historical data is the association:

$$E[Y \mid T = t]$$

What you want is the causal effect:

$$E[Y_t]$$

where $Y_{ti}$ is the potential outcome — the profit customer $i$ would have generated had they received discount $t$, regardless of what actually happened. These two quantities are only equal when $T$ is independent of $Y_t$ — that is, when discounts are assigned without regard to any characteristic that also affects the outcome. In historical marketing data, that condition is almost never satisfied.
The gap between what you observe and what you want to know has a name: confounding bias. And it is the central challenge of causal inference from observational data.
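The gap is easy to demonstrate with simulated data. In the hypothetical scenario below (all numbers invented), marketing gives larger discounts to customers with weaker purchase history, the true causal effect of each discount dollar on profits is +2, and a naive regression of profits on discounts gets the effect badly wrong — while adjusting for the confounder recovers it:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Confounder: purchase history, which drives baseline profitability.
history = rng.normal(100.0, 20.0, n)

# Hypothetical targeting rule: weaker history -> larger discount.
discount = np.clip(5.0 - 0.03 * history + rng.normal(0.0, 1.0, n), 0.0, None)

# True data-generating process: each discount dollar causes +2 profit.
profits = 0.5 * history + 2.0 * discount + rng.normal(0.0, 5.0, n)

# Naive slope of profits on discount: confounded by history.
naive = np.polyfit(discount, profits, 1)[0]

# Adjusted estimate: regress profits on discount AND the confounder.
X = np.column_stack([np.ones(n), discount, history])
adjusted = np.linalg.lstsq(X, profits, rcond=None)[0][1]

print(f"naive estimate:    {naive:+.2f}")    # biased far below the truth
print(f"adjusted estimate: {adjusted:+.2f}")  # close to the true +2
```

Because high-history customers receive small discounts and generate high profits anyway, the naive slope attributes their baseline profitability to the absence of a discount, pushing the estimate far below the true +2.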
The Gold Standard and Its Limitations
The cleanest solution to confounding bias is a randomized controlled trial (RCT) — what marketing calls an A/B test. Randomly assign some customers to receive a discount and others not to. Because assignment is random, the treatment group and control group are statistically identical on all dimensions — observed and unobserved. Any difference in outcomes is causally attributable to the discount.
Formally, randomization ensures:

$$(Y_0, Y_1) \perp\!\!\!\perp T$$

The treatment is independent of the potential outcomes. This means the observed difference in means is an unbiased estimator of the average treatment effect (ATE):

$$ATE = E[Y_1 - Y_0] = E[Y \mid T = 1] - E[Y \mid T = 0]$$
In practice, this is exactly what a two-sample t-test on your A/B test results is computing.
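On simulated A/B-test data (hypothetical numbers: a true +2 effect of the offer), the difference in means and the two-sample t-test line up exactly as described:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 20_000

# Randomized assignment: treatment is independent of everything else.
treated = rng.integers(0, 2, n).astype(bool)
baseline = rng.normal(50.0, 5.0, n)
outcome = baseline + 2.0 * treated  # true ATE = +2

# Unbiased ATE estimate: simple difference in group means.
ate_hat = outcome[treated].mean() - outcome[~treated].mean()

# The two-sample t-test on the same split.
t_stat, p_value = stats.ttest_ind(outcome[treated], outcome[~treated])

print(f"estimated ATE: {ate_hat:.2f} (t = {t_stat:.1f}, p = {p_value:.1g})")
```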
The problem is that RCTs are expensive, slow, and not always feasible. Running a proper discount experiment requires withholding discounts from a control group — a decision with real revenue implications and organizational friction. Some experiments take months to accumulate enough data to detect a meaningful effect. Others are blocked by legal, operational, or ethical constraints.
And so, most marketing teams operate in a world where they want causal answers but only have observational data. That is precisely where the rest of causal inference methodology lives.
Key Concepts Every Marketing Analyst Should Know
Before reaching for the toolkit, you need the vocabulary.
Potential outcomes. For each customer $i$, define $Y_{1i}$ as the profit they would generate if given a discount, and $Y_{0i}$ as the profit they would generate without one. Only one of these is ever observed — the one corresponding to the treatment they actually received. The other is the counterfactual. Causal inference is, at its core, a missing data problem: you are trying to impute the counterfactual for every customer.
Average treatment effect (ATE). The quantity most marketing teams actually want:

$$ATE = E[Y_{1i} - Y_{0i}]$$
The average, across all customers, of the individual causal effect. This is what you are estimating when you run an A/B test.
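The missing-data framing is worth seeing concretely. In a simulation you get to observe both potential outcomes for every customer — something the real world never allows (a toy sketch with invented numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# In a simulation, we know BOTH potential outcomes for every customer.
y0 = rng.normal(50.0, 5.0, n)        # profit without the discount
y1 = y0 + rng.normal(2.0, 1.0, n)    # profit with the discount

# The true ATE averages the individual causal effects.
true_ate = (y1 - y0).mean()

# In reality, each customer reveals only ONE of the two.
treated = rng.integers(0, 2, n).astype(bool)
observed = np.where(treated, y1, y0)
counterfactual = np.where(treated, y0, y1)  # what you must impute

print("observed outcomes:", observed.round(1))
print("true ATE (unknowable in practice):", round(true_ate, 2))
```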
Confounders. Variables that affect both the treatment assignment $T$ and the outcome $Y$. Customer purchase history, for example, affects both the likelihood of receiving a discount (marketing targets high-LTV customers) and the baseline level of profits (high-LTV customers generate more revenue). Failing to control for confounders means your estimate of the treatment effect is biased.
Causal graphs. A directed acyclic graph (DAG) where nodes are variables and edges represent causal relationships. In our discounting example, the DAG has edges from purchase history to both $T$ (marketing uses history to assign discounts) and $Y$ (history directly affects profitability). The graph makes the confounding structure explicit and tells you what you need to control for.
Conditional ignorability. The key assumption that makes observational causal inference possible:

$$(Y_0, Y_1) \perp\!\!\!\perp T \mid X$$

Once you condition on the observed confounders $X$, treatment assignment is as good as random. This is the assumption that justifies regression adjustment, propensity score methods, and most other observational techniques. It is an assumption — you cannot test it directly — but it is made plausible by having a rich set of pre-treatment covariates.
Adjusting for Bias with Linear Regression
The most direct method for recovering a causal estimate from observational data under conditional ignorability is linear regression with covariate adjustment. The idea is to control for confounders so that the coefficient on the treatment captures only the variation in $T$ that is orthogonal to the confounding structure.

The specification is:

$$Y_i = \beta_0 + \tau T_i + \beta' X_i + e_i$$

where $X_i$ is the vector of confounders (customer history, segment, prior purchase frequency, etc.) and $\hat{\tau}$ is the estimated average treatment effect.
The key intuition is that by including in the regression, you are comparing customers who received different discount levels but are otherwise similar on all measured confounders. The regression is doing the covariate adjustment that randomization would have done automatically in an A/B test.
```python
import pandas as pd
import statsmodels.formula.api as smf

# df has columns: profits, discount, purchase_history, segment, prior_orders
model = smf.ols(
    'profits ~ discount + purchase_history + C(segment) + prior_orders',
    data=df
).fit()
print(model.summary())

# The coefficient on `discount` is your ATE estimate.
# Standard errors tell you whether the effect is distinguishable from noise.
```

The coefficient on discount now estimates the causal effect, under the assumption that purchase_history, segment, and prior_orders capture all the confounding. If those three variables explain why certain customers receive larger discounts, and you've included them in the model, the remaining variation in discount assignment is effectively random.
Coming back to the e-commerce problem: if $\hat{\tau} > 0$, discounts are worth it — the incremental sales they generate more than offset the direct margin hit. If $\hat{\tau} \le 0$, the company is subsidizing purchases that would have happened anyway.
The Assumption You Cannot Ignore
Regression adjustment is not magic. It rests on conditional ignorability — the assumption that you have measured and included all the relevant confounders. If there is an unmeasured variable that affects both discount assignment and profits, your estimate remains biased.
This is why domain knowledge matters as much as methodology. Building the right causal model requires understanding why discounts get assigned in your organization — what rules, what signals, what human judgment calls drive the process. Every factor in that process that also affects profitability needs to be in your adjustment set.
When you suspect unmeasured confounding, the answer is not to abandon the analysis — it is to reach for stronger identification strategies: instrumental variables, difference-in-differences, regression discontinuity designs. Each one makes a different structural assumption to recover the causal effect when simple conditioning is not enough.
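To give a taste of what "a different structural assumption" means, here is a minimal difference-in-differences sketch (all group means are hypothetical; the method assumes both groups would have followed parallel trends absent the treatment):

```python
# Hypothetical average monthly profit per customer, before and after a
# discount program rolled out to one region but not another.
treated_pre, treated_post = 48.0, 55.0
control_pre, control_post = 50.0, 53.0

# DiD subtracts out the shared trend (here +3) that both groups
# experienced, isolating the effect attributable to the program.
did_estimate = (treated_post - treated_pre) - (control_post - control_pre)
print(did_estimate)  # 4.0
```

Instead of assuming you measured every confounder, DiD assumes the unmeasured ones affect both groups' trends equally — a weaker requirement in many rollout settings.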
But before reaching for those methods, the regression framework is where you start: state the causal question precisely, draw the DAG, identify the confounders, adjust for them, and read the coefficient. That sequence alone puts you ahead of most marketing analytics being done today.