My notes for reading Causal Inference and Data Fusion in Econometrics, by Paul Hünermund and Elias Bareinboim.

Formulation

Definition 1 (Structural causal model; Pearl, 2009). A structural causal model (SCM) is a 4-tuple $\mathcal{M}=\langle U,V,\mathcal{F},p\rangle$ where

  • $U$ is a set of exogenous variables that are determined by factors outside the model.
  • $V = \lbrace V_1, \cdots , V_n\rbrace$ is a set of endogenous variables that are determined by factors in the model.
  • $\mathcal{F}$ is a set of functions $\lbrace f_1, \cdots , f_n\rbrace$ such that each $f_i$ is a mapping from (the respective domains of) $U_i\cup Pa_i$ to $V_i,$ where $U_i\subseteq U$ is some subset of $U$ whose elements are related to the variation of $V_i,$ $Pa_i\subseteq V\backslash V_i$ includes the parents of $V_i$ in the underlying causal DAG, and entire set of $F$ forms a mapping from $U$ to $V.$ In other words, $f_i$ assigns a value to the corresponding $V_i\in V,\ v_i\gets f(u_i,pa_i).$
  • $p$ is a probability distribution defined over the domain of $U,$ which creates variation in the endogenous variables $V$.

Every SCM $\mathcal{M}$ defines a directed acyclic graph (DAG) $\mathcal{G}(\mathcal{M}) = (V,E)$ (denoted as $\mathcal{G}$ for simplicity). Node in $\mathcal{G}$ correspond to endogenous variables in $V,$ and directed edges $E$ emit from the set of parent nodes $Pa_i$ towards $V_i.$ This causal DAG specifies the causal mechanism according to which the data are generated.

Remark. In a fully specfied SCM, any counterfactual quantity can be immediately constructed explicitly by variables from the underlying causal mechanism. These quantities are taken as axiomatic primitves in the potential outcome framework.

Definition 2 (Intervention). Interventions in structural causal models are carried out by removing individual functions in $F$ from the model and fixing their left-hand side variables at some constant value. Mathematically, such an action is denoted by the operator $\mathrm{do}(\cdot).$

Analytical Example. Consider the directed acyclic graphs representing structural causal models.

The structural causal model $\mathcal{M}$ underlying in (a) can be formulated as

$$\begin{align} Z &\gets f_Z(U_Z)\\ X &\gets f_X(Z, U_X)\\ Y &\gets f_Y(X,Z,U_Y). \end{align} \tag{1}$$

Naturally, we can explicitly write the counterfactual $Y_x$ as

\[Y_x \gets f_Y(x,Z,U_Y), \tag{2}\]

which would specify the value of $Y$ if $X$ had been $x$.

Now if we intervene on $X$ so that $X=x_0,$ then we can get another SCM $\mathcal{M}_{x_0}$ underlying in (b):

$$\begin{align} Z &\gets f_Z(U_Z)\\ X &\gets x_0\\ Y &\gets f_Y(X,Z,U_Y). \end{align} \tag{3}$$

Furthermore, the post-intervention distribution of $Y$ can also be denoted in counterfactual notation as

\[{p}(Y\in A\ |\ \mathrm{do}(X=x_0)) = {p}(Y_{x_0}\in A), \tag{4}\]

where $A$ is any measurable set in the domain of $Y.$

Given interventions $\mathrm{do}(X=x_0)$ (control) and $\mathrm{do}(X=x_1)$ (treatment), any difference between two probability distributions associated with $\mathcal{M}_ {x_0}$ and $\mathcal{M}_ {x_1}$ captures the variation of $Y$ resulted by the causal effect of $\Delta x = x_1-x_0.$

Definition 3 (Observational identifiability). Let $Q(\mathcal{M})$ be any computable quantity of a model $\mathcal{M}$. Then $Q$ is identifiable from distribution $p(v)$ compatible with a causal DAG $\mathcal{G}$, if for any two (fully specified) model $\mathcal{M}_0$ and $\mathcal{M}_1$ that satisfy the assumptions of $\mathcal{G}$, we have

\[p_0(v) = p_1(v)\ \Rightarrow\ Q(\mathcal{M}_0)=Q(\mathcal{M}_1). \tag{5}\]

This definition states that for any two unobserved SCMs $\mathcal{M}_0$ and $\mathcal{M}_1$ with their induced distributions $p_0(v)$ and $p_1(v)$ coincided, then they provide the same answers to query $Q$.

Remark. If $Q$ is identifiable, then it depends only on $p(v)$ and the assumptions in $\mathcal{G}$, and furthermore, can be uniquely expressed in terms of the observed distribution.

Confounding Bias

Notations. Given a causal DAG $\mathcal{G}$ includes node set $X$. We use $\mathcal{G}_ {\underline{X}}$ to denote the graph obtained by removing all edges emitted by $X$ in $\mathcal{G}.$ We use $\mathcal{G}_ {\overline{X}}$ to denote the graph obtained by removing all edges pointed towards $X$ in $\mathcal{G}.$

Backdoor criterion

Definition 4 (Admissible Sets). Given a causal DAG $\mathcal{G}$ with treatment variable set $X$ and outcome variable set $Y$ assigned, a set $Z$ is called backdoor-admissible if it blocks every path between $X$ and $Y$ in graph $\mathcal{G}_{\underline{X}}.$

The following theorem states that the causal effect of $X$ on $Y$ can be estimated by adjustment if a backdoor-admissible set exists.

Theorem 1 (Backdoor adjustment criterion). If a set of variables $Z$ satisfies the backdoor criterion relative to $(X,Y),$ then the causal effect of $X$ on $Y$ can be identified from observational data by adjustment formula:

\[{p}(Y\in A\ |\ \mathrm{do}(X=x)) = \int {p}(Y\in A\ |\ X=x,Z=z)\mathrm{d}\mu(z),\tag{6}\]

here $\mu(z)$ is a probability measure over the domain of $Z$ induced by $\mathcal{G}$, In discrete case, we can write (6) as

\[{p}(Y\in A\ |\ \mathrm{do}(X=x)) = \sum_z {p}(Y\in A\ |\ X=x,Z=z)\cdot {p}(Z=z).\tag{7}\]

Remark. This conclusion has a counterfactual interpretation: For all $x$ in the domain of $X,$ the counterfactual $Y_x$ is conditionally independent of $X$ given $Z,$ i.e. $Y_x\perp X\ \vert\ Z.$

Frontdoor criterion

Definition 5 (Conditional frontdoor criterion). In causal DAG $\mathcal{G}$ set of variables $M$ is said to satisfy the conditional frontdoor criterion relative to a triplet $(X, Y, W)$ if:

  • $M$ intercepts all directed paths from $X$ to $Y$;
  • There is no unblocked backdoor path from $X$ to $M$ (paths in $\mathcal{G}_{\underline{X}}$) given $W$;
  • All the backdoor paths from $M$ to $Y$ are blocked by $\lbrace X, W\rbrace$.

An example is given below.

Theorem 2 (Conditional frontdoor adjustment). If a set $M$ satisfies the conditional frontdoor criterion related to $(X,Y,W),$ then the causal effect of $X$ on $Y$ can be identified by the frontdoor formula

\[\begin{align}{p}(Y\in A\ |\ \mathrm{do}(X=x)) &= \sum_w\sum_m {p}(M=m\ |\ X=x, w){p}(W=w)\\ &\times \sum_{x'}{p}(Y\in A\ |\ X=x',m,w){p}(X=x'|w)\end{align}\tag{8}\]

Remark. The frontdoor adjustment can be seen as a sequential application of the backdoor criterion.

Symbolic engine: do-calculus

Let $X, Y, Z$ and $W$ be arbitrary disjoint sets of nodes in $\mathcal{G}$. We use $p$ to stand for probabilisty distribution. Do-calculus states the following three rules:

Rule 1 (Insertion/deletion of observations).

$$p(y\vert \mathrm{do}(x), z, w) = p(y\vert\mathrm{do}(x),w)\ \ \ \textsf{if}\ Y\perp Z\vert X,W\ \ \textsf{in}\ {\mathcal{G}_{\overline{X}}};$$

Rule 2 (Action/observation exchange).

$$p(y\vert \mathrm{do}(x),\mathrm{do}(z), w) = p(y\vert\mathrm{do}(x),z, w)\ \ \ \textsf{if}\ Y\perp Z\vert X,W\ \ \textsf{in}\ {\mathcal{G}_{\overline{X},\underline{Z}}};$$

Rule 3 (Insertion/deletion of actions).

$$p(y\vert \mathrm{do}(x),\mathrm{do}(z), w) = p(y\vert\mathrm{do}(x), w)\ \ \ \textsf{if}\ Y\perp Z\vert X,W\ \ \textsf{in}\ {\mathcal{G}_{\overline{X},\overline{Z(W)}}},$$

where $Z(W)$ denotes the set of $Z$-nodes that are not ancestors of any $W$-nodes in $\mathcal{G}_{\overline{X}}$.

To establish identifiability of a query $Q$, one needs to repeatedly apply the rules of do-calculus to $Q$ until the final expression no longer contains a do-operator.

Remark.

  • Do-calculus was proved to be sound and complete for general queries of the form $Q = P(y\vert do(x), z)$ from a combination of observational and experimental data. Soundness means that an answer returned by do-calculus is correct. Completeness means that do-calculus is guaranteed to return a solution for the identification problem whenever such a solution exists.
  • By applying rule 2, we immediately obtain backdoor criterion: if $Z$ is a backdoor-admissable set for $(X,Y),$ then $Y\perp X\vert Z$ in $\mathcal{G}_{\underline{X}},$ $p(y\vert \mathrm{do}(x),z) = p(y\vert x,z).$

Warm-up: Proof of frontdoor criterion

Let’s revisit the causal DAG proposed above. Denote $W=\lbrace W_0,W_1,W_2\rbrace,$ set $M$ satisfies the conditional frontdoor criterion related to $(X,Y,W).$

Apply the law of total probability, we have

$$p(y|\mathrm{do}(x),w) = \sum_m p(y|\mathrm{do}(x),m,w)p(m|\mathrm{do}(x), w).$$

By rule 2, because $Y\perp M\vert X, W$ in $\mathcal{G}_ {\overline{X},\underline{M}}$ and $M\perp X\vert W$ in $\mathcal{G}_ {\underline{X}}$ we have

$$\begin{align} p(y|\mathrm{do}(x),m,w) &= p(y|\mathrm{do}(x),\mathrm{do}(m),w),\\ p(m\vert \mathrm{do}(x), w) &= p(m\vert x,w).\end{align}$$

By rule 3, because $Y\perp X\vert M, W$ in $\mathcal{G}_ {\overline{M},\overline{X(W)}}$ and $X\perp M\vert W$ in $\mathcal{G}_ {\overline{M(W)}}$ we have

$$\begin{align} p(y|\mathrm{do}(x),\mathrm{do}(m),w) &= p(y|\mathrm{do}(m),w)\\ &= \sum_{x'}p(y|\mathrm{do}(m),x',w)p(x'|\mathrm{do}(m), w)\\ &= \sum_{x'}p(y|\mathrm{do}(m),x',w)p(x'| w). \end{align}$$

By rule 2, because $Y\perp M\vert X, W$ in $\mathcal{G}_ {\underline{M}}$, we have

$$p(y|\mathrm{do}(m),x',w) = p(y|m,x',w)$$

then

$$\begin{align} p(y|\mathrm{do}(x),w) &= \sum_m p(y|\mathrm{do}(x),m,w)p(m|\mathrm{do}(x), w)\\ &= \sum_m p(m|x, w)\sum_{x'} p(y|x',m,w)p(x'|w),\\ p(y|\mathrm{do}(x)) &= \sum_{w}p(y|\mathrm{do}(x),w)p(w)\\ &= \sum_{w}\sum_m p(m|x,w)p(w)\sum_{x'} p(y|x',m,w)p(x'|w). \end{align}$$

Surrogate Experiments

In practice, identification of causal queries based on observational data alone often remains an unattainable goal. Meanwhile, conducting a randomized control trial (RCT) for the treatment of interest might likewise be infeasible due to cost, ethical, or technical considerations. In such cases, a frequently applied strategy is to make use of surrogate experiments involving a third variable, which is only proximately linked to the treatment but more easily manipulable.

Definition 6 ($Z$-identifibility; Bareinboim and Pearl, 2012). Let $X, Y, Z$ be disjoint sets of variables, and let $\mathcal{G}$ be the causal diagram. The causal effect of an action $\mathrm{do}(X=x)$ on $Y$ is said to be $z$-identifiable from $p$ in $\mathcal{G},$ if $p(y\vert\mathrm{do}(x))$ is uniquely computable from $p(V)$ together with the interventional distributions $p(V\backslash Z’\vert\mathrm{do}(Z’))\ \forall Z’\subseteq Z,$ in any model that induces $\mathcal{G}.$

Then, the $z$-identification task can be solved in a similar fashion to the standard identification problem.

Theorem 3 (Necessary and sufficient condition of $z$-identification; Bareinboim and Pearl, 2012) Let $X, Y, Z$ be disjoint sets of variables and $\mathcal{G}$ be the causal diagram. $Q=p(y\vert\mathrm{do}(x))$ is $z$-identifiable from $p$ in $\mathcal{G}$ if the expression $Q$ is reducible, using the rules of do-calculus, to an expression in which only elements of $Z$ may appear as interventional variables.

A weaker condition of $z$-identifibility, which is sufficient but not necessary, is given below.

Theorem 4 (Sufficient condition of $z$-identification; Bareinboim and Pearl, 2012). $Q=p(y\vert\mathrm{do}(x))$ is $z$-identifiable from $p$ in $\mathcal{G}$ if one of the following conditions hold:

(a) $Q$ is identifiable in $\mathcal{G},$ or

(b) There exists $Z’ \subseteq Z$ such that the following conditions hold,

  • $X$ intercepts all directed paths from $Z’$ to $Y$ , and
  • $Q$ is identifiable in $\mathcal{G}_{\overline{Z’}}$.

Remark. The condition (b) states two things:

  • $Z$ has no direct effect on $Y,$ and all directed paths from $Z$ to $Y$ are blocked by $X,$ hence $p(y\vert \mathrm{do}(x),\mathrm{do}(z) = p(y\vert\mathrm{do}(x)).$
  • $X$ can be identified in the post-intervention DAG $\mathcal{G}_{\overline{Z}}$ by removing $\mathrm{do}(x)$ in the previous expression.

An illustrative example is given below.

In the left causal DAG, $p(y\vert\mathrm{do}(x))$ is unidentifiable. However, if we can manipulate on $Z$ to produce $\mathcal{G}_{\overline{Z}}$ in the right, then

$$\begin{align} p(y\vert\mathrm{do}(x)) &= p(y\vert\mathrm{do}(x),\mathrm{do}(z))\\ &= \sum_{w_1}\sum_{w_2} p(y\vert\mathrm{do}(x, z), w_1, w_2)p(w_1, w_2|\mathrm{do}(x),\mathrm{do}(z))\\ &= \sum_{w_1}\sum_{w_2} p(y\vert\mathrm{do}(z), X=x, w_1, w_2)p(w_1, w_2). \end{align}$$

References

  • Bareinboim, E. and J. Pearl (2012). Causal inference by surrogate experiments: z-identifiability. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, pp. 113–120.
  • Paul, H. and E. Bareinboim (2023). Causal Inference and Data Fusion in Econometrics. arXiv preprint arXiv:1912.09104.
  • Pearl, J. (1995). Causal diagrams for empirical research. Biometrika 82, 669–709.
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). New York, NY: Cambridge University Press.

<
Previous Post
Spatial Statistics (VII): Karhunen-Loève Expansion
>
Next Post
Structural Causal Models (II): Selection Bias and Transportability