Structural Causal Models (II): Selection Bias and Transportability

My notes for reading Causal Inference and Data Fusion in Econometrics, by Paul Hünermund and Elias Bareinboim.

Selection Bias

Nonrandom sampling compromises the representativeness of the collected data for the underlying population, which can explain away the conclusions of both statistical and causal inference.

Augmented Causal DAG

To capture the sample selection mechanism, a dichotomous variable $S$ is introduced to indicate whether a unit is part of the sample. Endogenous variables that affect the sampling probability will be connected to $S$ with directed edges. (Here we do not consider the cases where $S$ emits arrows.) The augmented DAG $\mathcal{G}$ is denoted as $\mathcal{G}_S.$

An oversimplified example is given below.

After augmentation, we need to simultaneously control for confounding and selection bias in do-calculus. Not only is it necessary to transform interventional distributions into do-free expressions, but the probabilities that make up these expressions now also need to be conditional on $S = 1$, because that is all the analyst is able to observe.

Recoverability

We first give the necessary and sufficient condition of recoverability in do-free expressions.

Theorem 1 (Bareinboim et al., 2014). The conditional distribution $p(y\vert t)$ is recoverable from $\mathcal{G}_ S,$ as $p(y\vert t, S = 1),$ if and only if $(Y\perp S \vert T).$

Combining Theorem 1 with do-calculus suggests a straightforward strategy for recovering do-expressions from selection bias:

Corollary 1 (Bareinboim and Tian, 2015). The causal effect $Q = p(y\vert\mathrm{do}(x))$ is recoverable from selection-biased data, i.e. $p(v\vert S = 1),$ if using the rules of the do-calculus, $Q$ is reducible to a do-free expression, and recoverability is determined by Theorem 1.

Remark. Corollary 1 is not a necessary condition for recovering conditional probabilities with respect to do-expressions.

Analytical Example. Consider the following two causal DAGs below.

In causal DAG (a), controlling for $Z$ suffices to eliminate confounding bias since it satisfies the backddor criterion. Note that $Z\perp S$ and $Y\perp S\vert \lbrace Z,X\rbrace,$ $p(y\vert\mathrm{do}(x))$ is recoverable via

$$\begin{align} p(y|\mathrm{do}(x)) &= \sum_{z}p(y|x,z)p(z)\\ &= \sum_z p(y|x,z,S=1)p(z,S=1). \end{align}$$

In causal DAG (b), we need to adjust for both $W$ and $Z$ to eliminate confounding between $X$ and $Y$. However, $W$ and $S$ are not d-seperated, hence we cannot apply Corollary 1 to recover $p(y\vert\mathrm{do}(x)).$ But, $p(y\vert\mathrm{do}(x))$ is still recoverable via

$$\begin{align} p(y|\mathrm{do}(x)) &= p(y|\mathrm{do}(x), w, S=1) \\ &= \sum_z p(y|\mathrm{do}(x), z, w, S=1)p(z|\mathrm{do}(x), w, S=1)\\ &= \sum_z p(y|x, z, w, S=1)p(z|w, S=1); \end{align}$$

The first equality holds by do-calculus rule 1, $Y\vert \lbrace W,S\rbrace \vert X$ in $\mathcal{G}_{\overline{X}}$;

The second equality holds by low of total probability;

The third equality holds by do-calculus rule 2 and 3, $Y\perp X \vert \lbrace Z,W,S \rbrace$ in $\mathcal{G}_ {\underline{X}},$ and $Z\perp X\vert \lbrace W,S\rbrace$ in $\mathcal{G}_ {\overline{Z(W,S)}}.$

The following lemma is a straightforward conclusion of Theorem 1.

Lemma 1. The distribution $p(v_i\vert pa_i)$ is recoverable if and only if $V_i$ is not an ancestor of $S.$ When recoverable, $p(v_i\vert pa_i) = p(v_i\vert pa_i, S=1).$

Then, the necessary and sufficient condition of recovering a causal effect from selection bias is given below.

Theorem 2 (Bareinboim and Tian, 2015). The causal effect $Q = p(y\vert\mathrm{do}(x))$ is recoverable from selection-biased data, if and only if for every $V_i\in D,$ where $D=An(Y)_ {\mathcal{G}_ {V\backslash X}},$ $V_i$ is not an ancestor of $S$. When recoverable, $p(y\vert\mathrm{do}(x))$ is given by

$$p(y\vert\mathrm{do}(x)) = \sum_{\sigma(D\backslash Y)}\prod_{i:V_i\in D}p(v_i\vert pa_i, S=1), \tag{1}$$

$D=An(Y)_ {\mathcal{G}_ {V\backslash X}}$ denotes the ancestors of $Y$ (including itself) in graph $\mathcal{G}_ {V\backslash X}$ obtained by removing all nodes in $X$ and their incoming and outgoing edges, and $\sigma(D\backslash Y)$ denotes the correpsonding (joint) domain of variables in $D\backslash Y.$

We only concentrate on the if statement. Let $C_X = V\backslash X,$ we have

$$p(y\vert\mathrm{do}(x)) = \sum_{\sigma(C_X\backslash Y)}p(c'\vert\mathrm{do}(x)) = \sum_{\sigma(C_X\backslash Y)}\prod_{i:V_i\in C_X} p(v_i\vert pa_i), \tag{2}$$

where the first equality holds by the law of total probability, and the second is from Markov factorization.

For all $V_i\in C_X\backslash D,$ i.e. $V_i$ is not an ancestor of $Y,$ it can be summed out:

$$\begin{align} \sum_{\sigma(C_X\backslash Y)}\prod_{i:V_i\in C_X} p(v_i\vert pa_i) &= \sum_{\sigma(D\backslash Y)}\sum_{\sigma(C_X\backslash D)}\prod_{i:V_i\in D} p(v_i\vert pa_i)\prod_{j:V_j\in C_X\backslash D} p(v_j\vert pa_j)\\ &= \left[\sum_{\sigma(D\backslash Y)}\prod_{i:V_i\in D}p(v_i\vert pa_i)\right] \left[\sum_{\sigma(C_X\backslash D)}\prod_{j:V_j\in C_X\backslash D} p(v_j\vert pa_j)\right]\\ &= \sum_{\sigma(D\backslash Y)}\prod_{i:V_i\in D}p(v_i\vert pa_i)\\ &= \sum_{\sigma(D\backslash Y)}\prod_{i:V_i\in D}p(v_i\vert pa_i,S=1), \end{align} \tag{3}$$

where the last equality holds by Lemma 1.

Remark. Theorem 2 is applicable for Markovian models, where there is no unobserved confounder like $U$ in the previous causal DAG (b). For the case of semi-Markovian models, a general algorithm, which is sound and complete for the task of recoverability, has been introduced in Bareinboim and Tian, 2015.

Combination of biased and unbiased data

In some cases, biased and unbiased data sources are combined, which can facilitate recoverability. For example, in a scientific research, some baseline data like age and gender are collected from census, while smoe other covariates are obtained by mediacal examination which is not feasible to conduct on whole population.

Definition 1 (Selection backdoor criterion) Let $Z$ be a set of variables which can be partitioned to $Z^+ \biguplus Z^-$ such that $Z^+$ contains all non-descendents of $X$ and $Z^-$ the descendents of $X$. Let $\mathcal{G}_S$ denote the DAG that includes sampling mechanism $S$. $Z$ is said to be s-backdoor admissible, if it satisfies the following conditions:

$Z^+$ blocks all backdoor paths from $X$ to $Y$ in $\mathcal{G}_S$;
$X$ and $Z^+$ block all paths between $Z^-$ and $Y$ in $\mathcal{G}_S,$ i.e. $Z^-\perp Y\vert \lbrace X,Z^+\rbrace$;
$X$ and $Z$ block all paths between $S$ and $Y$ in $\mathcal{G}_S$, namely, $Y\perp S\vert\lbrace X, Z\rbrace$; and
$Z$ and $Z\cup\lbrace X, Y\rbrace$ are measured in the unbiased and biased studies, respectively.

Theorem 3 (Bareinboim et al., 2014). If $Z$ is s-backdoor admissible, then causal effect $p(y\vert\mathrm{do}(x))$ can be identified by

$$p(y|\mathrm{do}(x)) = \sum_z p(y|x,z,S=1)p(z). \tag{4}$$

Now we see an illustrative example. Consider the causal DAG presented below.

From the backdoor adjustment formula, we have

$$\begin{align} p(y|\mathrm{do}(x)) &= \sum_{z}\sum_{w} p(y|x,z,w)p(z,w)\\ &= \sum_{z}\sum_{w} p(y|x,z,w,S=1)p(z,w),\end{align}$$

where the last equality follows from Theorem 1.

Transportability

In real scenario, experiments and observational studies are often conducted in different contexts, where the target population can vary structurally in important ways. Hence, to extrapolate causal knowledge across heterogeneous domains is highly challenging. In this section we discuss the conditions under which the validity of transferring causal knowledge across different domains can be guaranteed.

Practically, we often implicitly assume that an experimental result obtained in a population $\Pi$ provides at least a good approximation for the impact of the same intervention in other settings. Equipped with the definition of a selection diagram, we can state the following theorem which allows to transport experimental results obtained in a source $\Pi$ to another target domain $\Pi^*$ where only passive observation is possible. The idea leads to the following definition.

Definition 2 (Transportability; Pearl and Bareinboim, 2011). Given two domains $\Pi$ and $\Pi^{* }$ characterized by shared causal DAG $\mathcal{G}$ and probability distributions $p$ and $p^{* }$, respectively. A causal relation $R$ is said to be transportable from $\Pi$ to $\Pi^{* }$ if $R(\Pi)$ is estimable from the set $I$ of interventions on $\Pi,$ and $\Pi$ is identified from $p,p^{* },I,\mathcal{G}.$

Definition 3 (Direct transportability; Pearl and Bareinboim, 2011). A causal relation $R$ is said to be directly transportable from a domain $\Pi$ to another $\Pi^* ,$ if $R(\Pi) = R(\Pi^*).$

Selection diagram

Generally, two populations or domains that share a causal DAG $\mathcal{G}$ may differ structurally in two ways: the distribution of background factors $p(u),$ or the underlying causal mechanisms $\lbrace f_i\rbrace.$ To account for such differences, we can add a set of selection nodes $S$ in a causal diagram.

Definition 4 (Selection Diagram). Let $\langle \mathcal{M},\mathcal{M}^{* }\rangle$ be a pair of structural causal models relative to domains $\langle \Pi,\Pi^{* }\rangle$ that share a causal DAG $\mathcal{G}$. $\langle \mathcal{M},\mathcal{M}^*\rangle$ is said to induce a selection diagram $\mathcal{D}$ which is constructed as follows:

Every edge in $\mathcal{G}$ has an edge in $\mathcal{D}$.
$\mathcal{D}$ contains an extra edge $S_i\to V_i$ whenever there might exist a discrepancy $f_i\neq f_i^{* }$ or $p(U_i) \neq p^* (U_i)$ between $\mathcal{M}$ and $\mathcal{M}^*.$

In a selection diagram, switching between two populations $\Pi$ and $\Pi^{* }$ is captured by conditioning on different values of $S$. Note that $S$-nodes only have outgoing arrows, whereas selection nodes in the selection bias case only have incoming arrows.

Theorem 4 (Pearl and Bareinboim, 2011). For two population $\Pi$ and $\Pi^{* },$ suppose we have the selection diagram $\mathcal{D}$ with selection variable set $S$. The stratum-specific causal effect $p^{* }(y\vert\mathrm{do}(x),z)$ is transportable from $\Pi$ to $\Pi^{* }$ if $Y$ is d-seperated from $S$ conditional on $Z$ in the $X$-manipulated diagram $\mathcal{D}_ {\overline{X}},$ that is, $Y\perp S\vert Z$ in $\mathcal{D}_ {\overline{X}}.$

Proof. By definition,

\[p^{* }(y|\mathrm{do}(x), z) = p(y|\mathrm{do}(x), z, s).\]

Since $Y\perp S\vert Z$ in $\mathcal{D}_ {\overline{X}},$ we have

\[p(y|\mathrm{do}(x), z, s) = p(y|\mathrm{do}(x), z),\]

which follows from rule 1 of do-calculus.

Definition 5 (S-admissibility). A variable set $Z$ satisfying $Y\perp S\vert Z$ in $\mathcal{D}_ {\overline{X}}$ is said to be S-admissible (with respect to the causal effect of $X$ on $Y$).

Immediately, we can obtain the weighting equation of causal effects:

Corollary 2. The causal effect $p^{* }(y\vert\mathrm{do}(x))$ is transportable from $\Pi$ to $\Pi^{* }$ if there exists an S-admissible set $Z$ of pre-treatment covariates. Moreover, the transport formula is given by the weighting

$$p^{* }(y\vert\mathrm{do}(x)) = \sum_z p(y\vert\mathrm{do}(x), z)p^{*}(z). \tag{5}$$

We illustrate this conclusion through the following selection diagram.

In this selection diagram, $\lbrace Z\rbrace$ is an S-admissible set which d-seperates $S$ and $Y$. Hence we can apply the transportability across domains:

$$\begin{align} p^{* }(y\vert\mathrm{do}(x)) &= p(y\vert\mathrm{do}(x),s) \\ &= \sum_z p(y\vert\mathrm{do}(x), z, s)p(z|\mathrm{do}(x), s)\\ &\quad(\text{by Theorem 4 and do-calculus rule 3 that}\ Z\perp X\vert S\ \text{in}\ \mathcal{G}_{\overline{X(S)}})\\ &= \sum_z p(y\vert\mathrm{do}(x), z)p(z|s)\\ &= \sum_z p(y\vert\mathrm{do}(x), z)p^{*}(z). \end{align}$$

Transportability in more general cases can be estabilished using the following theorem.

Theorem 5 (Pearl and Bareinboim, 2011). For two population $\Pi$ and $\Pi^{* },$ suppose we have the selection diagram $\mathcal{D}$ with selection variable set $S$. The relation $R = p^{* }(y\vert\mathrm{do}(x))$ is transportable from $\Pi$ to $\Pi^{* }$ if the expression $p(y\vert\mathrm{do}(x),s)$ is reducible, using the rules of do-calculus, to an expression in which $S$ appears only as a conditioning variable in do-free terms.

Remark. Theorem 5 is proven to be a necessary and sufficient condition for transporting causal effect estimates across domains. An algorithmic solution of causal effects is developed by Bareinboim and Pearl, 2013.

Now we consider the selection diagram presented below, in which the effect of $X$ on $Y$ is partly transmitted by $Z$. With respect to the effect of $X$ on $Y,$ $\lbrace Z\rbrace$ is an S-admissible set composed of post-treatment variables.

The causal effect of $X$ on $Y$ can be transported as

$$\begin{align} p^{* }(y\vert\mathrm{do}(x)) &= p(y\vert\mathrm{do}(x),s) \\ &= \sum_z p(y\vert\mathrm{do}(x), z, s)p(z|\mathrm{do}(x), s)\\ &\quad(\text{by Theorem 4 and do-calculus rule 2 that}\ Z\perp X\vert S\ \text{in}\ \mathcal{G}_{\underline{X}})\\ &= \sum_z p(y\vert\mathrm{do}(x), z)p(z|x,s)\\ &= \sum_z p(y\vert\mathrm{do}(x), z)p^{*}(z|x). \end{align} \tag{6}$$

When post-treatment variables are S-admissible, we reweight the stratum-specific effects by the conditional distribution $p^{* }(z\vert x)$ in the target population, whereas in the pre-treatment case given by equation (5), we use unconditional distribution $p^{* }(z)$ instead.

$Z$-transportability

By combining the idea of transportability and $z$-identifibility, we can transfer causal knowledge obtained via surrogate experiments. Analogous to Theorem 5, the rules of do-calculus can be used to transfer causal knowledge from surrogate experiments in the following way.

Theorem 6 (Bareinboim and Pearl, 2013a) For two population $\Pi$ and $\Pi^{* },$ suppose we have the selection diagram $\mathcal{D}$ with selection variable set $S$. The relation $R = p^{* }(y\vert\mathrm{do}(x))$ is $z$-transportable from $\Pi$ to $\Pi^{* }$ if and only if the expression $p(y\vert\mathrm{do}(x),s)$ is reducible, using the rules of do-calculus, to an expression in which all do-operators are applied to subsets of $Z$, and the $S$-variables are separated from these do-operators.

Meta-transportability

Transportability techniques are particularly valuable in situations that allow to combine empirical knowledge from multiple source domains.

Fix the target domain $\pi^{* }$. Let $\mathscr{D} = \lbrace\mathcal{D}_1,\cdots,\mathcal{D}_n\rbrace$ be a collection of selection diagrams relative to source domains $\Pi = \lbrace\pi_1,\cdots,\pi_n\rbrace,$ from which the data are collected. While transportability is not realizable in any single diagram, meta-transportability (or $\mu$-transportability, for short) can be feasible.

An illustrative example is presented below. Panel (a) depicts the selection diagram corresponding to source domain $\pi_a,$ while panel (b) refers to $\pi_b.$ Square nodes indicate where discrepancies between the target domain $\pi^{* }$ and the source domains originate.

In diagram (a), there exists no S-admissible set with respect to the effect of $X$ on $Y$ since $Y$ is directly connected to a selection node.

In diagram (b), $Z$ is S-admissible with respect to the effect of $X$ on $Y,$ hence $p^{* }(y\vert\mathrm{do}(x))$ can be expanded as

$$\begin{align} p^{* }(y\vert\mathrm{do}(x)) &= p^{(b)}(y\vert\mathrm{do}(x),s) \\ &= \sum_z p^{(b)}(y\vert\mathrm{do}(x), z, s)p^{(b)}(z|\mathrm{do}(x), s)\\ &= \sum_z p^{(b)}(y\vert\mathrm{do}(x), z)p^{*}(z|\mathrm{do}(x)). \end{align}$$

However, the term $p^{* }(z\vert\mathrm{do}(x))$ is neither transportable from $\pi_b$ since $Z$ is directly connected to a selection node, nor equivalent to $p^{* }(z\vert x)$ as in equation (6) due to the unobserved confounding $U_0$ between $X$ and $Z.$ Hence, transportability is impossible in both diagrams.

If we combine data from $\pi_a$ and $\pi_b,$ then $\mu$-transportability is practicable. Consider diagram (a), denoting $S_a$ as the set of selection nodes, we have $Z\perp S_a$ in $\mathcal{D}^{(a)}_{\overline{X}},$ hence

\[p^*(z|\mathrm{do}(x)) = p^{(a)}(z|\mathrm{do}(x))\]

by Theorem 4. In diagram (b), analogously, that $Y\perp S_b$ in $\mathcal{D}^{(b)}_{\overline{XZ}}$ implies

\[p^*(y|\mathrm{do}(z),\mathrm{do}(x)) = p^{(b)}(y|\mathrm{do}(z),\mathrm{do}(x)).\]

Now back to the post-intervention distribution in the target domain $\pi^{* },$ we have

$$\begin{align} p^{*}(y\vert\mathrm{do}(x)) &= \sum_z p^{*}(y\vert\mathrm{do}(x),z)p^{*}(z\vert\mathrm{do}(x)) \\ &\quad(\text{by do-calculus rule 2 that}\ Y\perp Z\vert X\ \text{in}\ \mathcal{G}_{\overline{X}\underline{Z}})\\ &= \sum_z p^{*}(y\vert\mathrm{do}(x),\mathrm{do}(z))p^{*}(z\vert\mathrm{do}(x)) \\ &= \sum_z p^{(b)}(y\vert\mathrm{do}(x),\mathrm{do}(z))p^{(a)}(z\vert\mathrm{do}(x)). \end{align}$$

While refuting the necessity of multiple pairwise transportability in $\mu$-transportability, this example also provides perspective into the efficiency of combining data from multiple source domains. Detailed formulation and algorithm of $\mu$-transportability can be refered to Bareinboim and Pearl (2013c).

References

Bareinboim, E. and J. Pearl (2013a). Causal transportability with limited experiments. In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI 2013).
Bareinboim, E. and J. Pearl (2013b). A general algorithm for deciding transportability of experimental results. Journal of Causal Inference 1, 107–34.
Bareinboim, E. and J. Pearl (2013c). Meta-transportability of causal effects: A formal approach. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS 2013).
Bareinboim, E. and J. Tian (2015). Recovering causal effects from selection bias. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI 2015).
Bareinboim, E., J. Tian, and J. Pearl (2014). Recovering from selection bias in causal and statistical inference. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI 2014).
Paul, H. and E. Bareinboim (2023). Causal Inference and Data Fusion in Econometrics. arXiv preprint arXiv:1912.09104.
Pearl, J. and E. Bareinboim (2011). Transportability of causal and statistical relations: A formal approach. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI 2011).
Pearl, J. and E. Bareinboim (2014). External validity: From do-calculus to transportability across populations. Statistical Science 29, 579–95.

Structural Causal Models (I): Formulation and Confounding Bias

Causal Inference: Meta-Transportability