Causal graphs represent the flow of information in the underlying data generating process of a given data set. They comprise a collection of causal relationships between variables in the data set, which dictate how a given variable causally affects other variables. It is important to note that the causal graph does not define how a specific node is functionally related to its parents; that information is encoded in a structural causal model. The causal graph simply shows how information flows from one node (i.e., a variable in a data set) to another. As a result, causal graphs are a fundamental element of Causal AI. The majority of Causal AI tasks, such as causal modeling, counterfactual reasoning, causal effect estimation, root cause analysis, algorithmic recourse, and causal fairness, rely on an accurate and comprehensive causal graph that correctly reflects the underlying data generating process.
Causal graphs can be discovered from observational data, which is the goal of causal discovery. An essential part of this discovery process is providing prior knowledge about specific causal relationships, usually supplied by experts, which substantially reduces computation time and improves the accuracy of causal discovery. Many types of prior knowledge can be provided, such as forbidden edges, directed edges, and tiers of causal relationships, among others. This type of human-guided causal discovery is a key component in decisionOS.
The `cai-causal-graph` package provides a user-friendly implementation of a causal graph class (`CausalGraph`) that allows you to easily define mixed graphs that can represent various types of causal graphs. See the Types of Causal Graphs section below for information on different types of causal graphs.
You can find a quickstart to see how to easily build a basic graph, with further details provided in the Causal Graph documentation page. For a full list of all the classes and methods, please see the provided reference docs. For example, these are the reference docs for the `CausalGraph` class.
A Directed Acyclic Graph (DAG) is the most common type of mixed graph used to represent a causal graph. It has only directed edges between nodes (`->`) and permits no cycles.
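To make the "no cycles" requirement concrete, here is a minimal sketch that checks acyclicity of a list of directed edges using only the Python standard library. It does not use the `cai-causal-graph` API; the helper name `is_acyclic` and the `(source, destination)` tuple format are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """Return True if the directed (source, destination) edges contain no cycle."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    predecessors = {}
    for source, destination in edges:
        predecessors.setdefault(source, set())
        predecessors.setdefault(destination, set()).add(source)
    try:
        # Iterating static_order() raises CycleError if no topological order exists.
        list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


chain = [("A", "B"), ("B", "C")]   # A -> B -> C
cyclic = chain + [("C", "A")]      # adds C -> A, closing a cycle

print(is_acyclic(chain))
print(is_acyclic(cyclic))
```

A graph with only `->` edges qualifies as a DAG only when a check like this succeeds, that is, when its nodes admit a topological order.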
A Completed Partially Directed Acyclic Graph (CPDAG) can contain directed (`->`) and undirected (`--`) edges. In this case, an undirected edge implies that a causal relationship exists but can point either way, i.e., `A -- B` can be resolved to either `A -> B` or `A <- B`.
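As an illustration of how undirected edges resolve, the following brute-force sketch lists the DAGs that a small CPDAG stands for. It tries both directions for every `--` edge and keeps only orientations that stay acyclic and do not add or remove an unshielded collider, since that would change the implied conditional independencies. This is plain Python, not the package's own logic; `expand_cpdag`, `has_cycle`, and `v_structures` are hypothetical helper names, and the approach is only practical for a handful of undirected edges.

```python
from itertools import product


def has_cycle(directed):
    """Detect a cycle in (parent, child) edges with a depth-first search."""
    children = {}
    for parent, child in directed:
        children.setdefault(parent, []).append(child)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        found = any(visit(child) for child in children.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(children))


def v_structures(directed, skeleton):
    """Return colliders a -> c <- b whose parents a and b are not adjacent."""
    parents = {}
    for parent, child in directed:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for a in pars:
            for b in pars:
                if a < b and frozenset((a, b)) not in skeleton:
                    found.add((a, child, b))
    return found


def expand_cpdag(directed, undirected):
    """List every DAG obtained by orienting the undirected edges of a CPDAG."""
    directed, undirected = list(directed), list(undirected)
    skeleton = {frozenset(edge) for edge in directed + undirected}
    required = v_structures(directed, skeleton)
    dags = []
    for flips in product([False, True], repeat=len(undirected)):
        oriented = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(undirected, flips)]
        candidate = directed + oriented
        if not has_cycle(candidate) and v_structures(candidate, skeleton) == required:
            dags.append(sorted(candidate))
    return dags


# CPDAG X -- Y -- Z: no directed edges, two undirected edges.
members = expand_cpdag(directed=[], undirected=[("X", "Y"), ("Y", "Z")])
for dag in members:
    print(dag)
```

For `X -- Y -- Z`, the orientation `X -> Y <- Z` is rejected because it introduces a new collider, so the chain and fork orientations are the only members.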
A Maximal Ancestral Graph (MAG) can encode all the information that a CPDAG can, but also provides information such as whether a latent confounder is likely to exist or selection bias is likely to be present. Specifically, MAGs may also contain bi-directed edges `A <> B`, which imply the existence of a latent confounder between the respective variables. Additionally, an undirected edge `A -- B` in a MAG implies the existence of a latent selection bias variable leading to the association being observed between `A` and `B`.
A Partial Ancestral Graph (PAG) describes an equivalence class of MAGs. PAGs may also contain "wild-card" or "circle" edges (`-o`), in which the circle end can be either an arrowhead or a tail, i.e., `A -o B` can be resolved to `A -- B` or `A -> B`. The `o` end is referred to as "unknown" in this package.
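The resolution of circle ends can be sketched as a small lookup. The example below assumes an endpoint-by-endpoint reading in which each `o` may become either a tail (`-`) or an arrowhead (`<` on the left, `>` on the right). The function `resolve_edge` and the two-character edge strings are assumptions for illustration and are not taken from the package's API.

```python
from itertools import product

# Possible replacements for each end mark; "o" is the unknown (circle) end.
LEFT_OPTIONS = {"o": ["-", "<"], "-": ["-"], "<": ["<"]}
RIGHT_OPTIONS = {"o": ["-", ">"], "-": ["-"], ">": [">"]}


def resolve_edge(edge):
    """Return every fully resolved edge mark a two-character edge can become."""
    left, right = edge
    return [l + r for l, r in product(LEFT_OPTIONS[left], RIGHT_OPTIONS[right])]


print(resolve_edge("-o"))  # ['--', '->']
print(resolve_edge("o>"))  # ['->', '<>']
print(resolve_edge("oo"))  # ['--', '->', '<-', '<>']
```

Under this reading, `A -o B` yields exactly the two outcomes named above, `A -- B` and `A -> B`.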
Type of Graph | DAG | CPDAG | MAG | PAG |
---|---|---|---|---|
Tester method | `graph.is_dag()` | ❌ | ❌ | ❌ |
Directed edges `->` | ✅ | ✅ | ✅ | ✅ |
Undirected edges `--` | ❌ | ✅ | ✅ | ✅ |
Latent confounder edges `<>` | ❌ | ❌ | ✅ | ✅ |
Wildcard edges `o-`, `o>` and `oo` | ❌ | ❌ | ❌ | ✅ |
See `EdgeType` for all the supported edge types in this package. Note that the `CausalGraph` class can contain all the aforementioned edge types, and can therefore represent the entire hierarchy of DAGs, CPDAGs, MAGs, and PAGs.
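The table can be read as a hierarchy: each graph type permits the edge types of the previous one plus one more kind. The sketch below encodes that reading as a plain Python function that returns the least general graph type whose permitted edge types cover a given set. The function name and the string edge marks are assumptions for illustration, not the package's API. It inspects edge types only and ignores the other conditions a graph must satisfy (acyclicity for a DAG, for instance). Note also that `--` has a different meaning in a CPDAG than in a MAG or PAG, which edge types alone cannot reveal.

```python
# Wildcard marks from the table, plus the "-o" spelling used in the PAG description.
WILDCARD_EDGES = {"o-", "o>", "oo", "-o"}
SUPPORTED_EDGES = {"->", "--", "<>"} | WILDCARD_EDGES


def least_general_graph_type(edge_types):
    """Return the first of DAG, CPDAG, MAG, PAG that permits all given edge types."""
    edge_types = set(edge_types)
    unsupported = edge_types - SUPPORTED_EDGES
    if unsupported:
        raise ValueError(f"Unsupported edge types: {sorted(unsupported)}")
    if edge_types & WILDCARD_EDGES:
        return "PAG"    # only PAGs permit wildcard (circle) edges
    if "<>" in edge_types:
        return "MAG"    # bi-directed edges need at least a MAG
    if "--" in edge_types:
        return "CPDAG"  # undirected edges need at least a CPDAG
    return "DAG"        # only directed edges remain


print(least_general_graph_type(["->"]))              # DAG
print(least_general_graph_type(["->", "--"]))        # CPDAG
print(least_general_graph_type(["->", "<>", "--"]))  # MAG
print(least_general_graph_type(["->", "o>"]))        # PAG
```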
info
Discovering a single DAG for a given data set is difficult. Certain causal relationships are indistinguishable from each other with only observational data, because they encode the same conditional independencies between variables. The set of such causal relationships is called the Markov equivalence class (MEC) for a particular set of nodes.
Multiple DAGs/CPDAGs/MAGs/PAGs can be consistent with the same MEC. For instance, if you identify the graphical structure `X -> Y -> Z`, then the corresponding data would show that `X` is independent of `Z` given `Y`. However, the graphical structures `X <- Y <- Z` and `X <- Y -> Z` would lead to the exact same conditional independence test result. Only if the graphical structure found was a collider connection `X -> Y <- Z` would you be able to identify the structure from observational data, because the data would tell you that `X` and `Z` are independent, but become dependent given `Y`.
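The contrast between a chain and a collider can be checked numerically. The sketch below simulates both structures with linear relationships and Gaussian noise (an assumption chosen so that a zero partial correlation corresponds to conditional independence) and compares the marginal correlation of `X` and `Z` with their partial correlation given `Y`. It uses only the Python standard library; the helper names, coefficients, and sample size are illustrative choices.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(xs, zs, ys):
    """Correlation of xs and zs after linearly controlling for ys."""
    r_xz, r_xy, r_zy = pearson(xs, zs), pearson(xs, ys), pearson(zs, ys)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy**2) * (1 - r_zy**2))


def gaussian_noise(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)
n = 20_000

# Chain X -> Y -> Z: X and Z are dependent, but independent given Y.
x = gaussian_noise(rng, n)
y = [xi + e for xi, e in zip(x, gaussian_noise(rng, n))]
z = [yi + e for yi, e in zip(y, gaussian_noise(rng, n))]
chain_marginal, chain_partial = pearson(x, z), partial_corr(x, z, y)

# Collider X -> Y <- Z: X and Z are independent, but dependent given Y.
x = gaussian_noise(rng, n)
z = gaussian_noise(rng, n)
y = [xi + zi + e for xi, zi, e in zip(x, z, gaussian_noise(rng, n))]
collider_marginal, collider_partial = pearson(x, z), partial_corr(x, z, y)

print(f"chain:    corr(X, Z) = {chain_marginal:+.3f}, given Y = {chain_partial:+.3f}")
print(f"collider: corr(X, Z) = {collider_marginal:+.3f}, given Y = {collider_partial:+.3f}")
```

Replacing the chain with `X <- Y <- Z` or `X <- Y -> Z` would give the same qualitative pattern as the chain, which is why those three structures cannot be told apart from observational data alone.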
danger
In a CPDAG, the `--` edge implies the existence of an edge that can point in either direction, `<-` or `->`. In a MAG or a PAG, the `--` edge implies the existence of a latent selection variable. When resolving such an edge, it can be possible to resolve to no edge at all. In a PAG, the `--` edge is a possible outcome of a wildcard edge (for example `o-`).