DAGs with NO TEARS: Continuous Optimization for Structure Learning

1. Introduction

Proposes a new approach to score-based learning of DAGs: convert the traditional combinatorial optimization problem $ \rightarrow $ a continuous program

$$ \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } G(W) \in \text{DAGs} \end{array} \quad \Longleftrightarrow \quad \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } h(W) = 0 \end{array} \tag{1} $$

  • $ G(W) $: $d$-node graph induced by $ W $
  • $ W \in R^{d \times d} $: weighted adjacency matrix
  • $ F: R^{d \times d} \to R $: score function
  • $ h: R^{d \times d} \to R $: a smooth function over real matrices, whose level set at zero exactly characterizes acyclic graphs
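For concreteness, here is what the combinatorial side of (1) looks like in code: a minimal NumPy sketch (helper names and the `tol` threshold are illustrative choices, not from the paper) that builds the edge set of $ G(W) $ from the nonzero entries of $ W $ and checks acyclicity with Kahn's topological sort.

```python
import numpy as np

def induced_edges(W, tol=1e-8):
    """Edge set of G(W): an edge i -> j whenever |W[i, j]| > tol."""
    d = W.shape[0]
    return {(i, j) for i in range(d) for j in range(d) if abs(W[i, j]) > tol}

def is_dag(W, tol=1e-8):
    """Combinatorial acyclicity check for G(W) via Kahn's algorithm."""
    A = (np.abs(W) > tol).astype(int)   # binary adjacency matrix of G(W)
    indeg = A.sum(axis=0)               # in-degree of each node
    stack = [j for j in range(A.shape[0]) if indeg[j] == 0]
    removed = 0
    while stack:                        # repeatedly remove source nodes
        i = stack.pop()
        removed += 1
        for j in np.flatnonzero(A[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                stack.append(j)
    return removed == A.shape[0]        # every node removable iff acyclic
```

The point of (1) is to replace this discrete, non-differentiable check with the smooth equality constraint $ h(W) = 0 $, so that gradient-based solvers apply.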

2. Background

The basic DAG learning problem is formulated as follows:

  • $ \mathbf{X} \in R^{n \times d} $: a data matrix
    • $ n $ IID observations of the random vector $ X = (X_1, \dots, X_d) $
  • $ G = (V, E) $: DAG
  • $ D $: discrete space of DAGs on $ d $ nodes

Goal: Given $ \mathbf{X} $, learn a DAG $ G \in D $


Idea: Model $ \mathbf{X} $ via a structural equation model (SEM) defined by a weighted adjacency matrix $ W \in R^{d \times d} \rightarrow $ operate on $ R^{d \times d} $ (the continuous space of $ d \times d $ matrices) instead of the discrete space $ D $
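As a concrete instance, a minimal sketch assuming the linear SEM that serves as the paper's running example, where each row of $ \mathbf{X} $ satisfies $ x = xW + z $ for a noise vector $ z $ (the Gaussian noise, scale, and helper name below are illustrative choices):

```python
import numpy as np

def simulate_linear_sem(W, n, noise_scale=1.0, seed=0):
    """Draw n IID rows of X from the linear SEM X = X W + Z.

    Requires G(W) to be acyclic, so that (I - W) is invertible and
    X = Z (I - W)^{-1} solves the model exactly.
    """
    rng = np.random.default_rng(seed)
    d = W.shape[0]
    Z = noise_scale * rng.standard_normal((n, d))   # noise matrix
    return Z @ np.linalg.inv(np.eye(d) - W)

# Example: a 3-node chain X1 -> X2 -> X3
W = np.array([[0.0,  2.0,  0.0],
              [0.0,  0.0, -1.5],
              [0.0,  0.0,  0.0]])
X = simulate_linear_sem(W, n=100)   # X has shape (100, 3)
```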

2.1 Score function and SEM

Given the score function $ F $ measuring how well $ W $ fits the data $ \mathbf{X} $, the score-based learning problem is

$$ \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } G(W) \in D \end{array} \tag{3} $$
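For the linear SEM, the paper's running choice of score is least squares with an $ \ell_1 $ penalty for sparsity; a minimal sketch (the penalty weight `lam` is a tuning parameter, not fixed by these notes):

```python
import numpy as np

def least_squares_score(W, X, lam=0.1):
    """F(W) = (1/2n) ||X - X W||_F^2 + lam * ||W||_1.

    lam = 0 gives the plain least-squares fit of the linear SEM
    X = X W + Z; lam > 0 encourages a sparse graph.
    """
    n = X.shape[0]
    R = X - X @ W                 # residuals of the SEM fit
    return 0.5 / n * np.sum(R ** 2) + lam * np.abs(W).sum()
```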

3. A new characterization of acyclicity

Replace the combinatorial constraint $ G(W) \in D $ in (3) with a single smooth equality constraint $ h(W) = 0 $.


A valid $ h: R^{d \times d} \to R $ must satisfy:

  1. $ h(W) = 0 $ iff $ W $ is acyclic (i.e. $ G(W) \in D $)
  2. The values of $ h $ quantify the “DAG-ness” of the graph
  3. $ h $ is smooth
  4. $ h $ and its derivatives are easy to compute
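The characterization the paper proposes, which satisfies all four properties, is

$$ h(W) = \operatorname{tr}\left(e^{W \circ W}\right) - d, $$

where $ \circ $ is the Hadamard (elementwise) product and $ e^A $ is the matrix exponential, with gradient $ \nabla h(W) = \left(e^{W \circ W}\right)^T \circ 2W $. A minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.linalg import expm

def h(W):
    """h(W) = tr(e^{W∘W}) - d.

    Since tr(e^{W∘W}) = d + sum_{k>=1} tr((W∘W)^k) / k! and
    tr((W∘W)^k) sums nonnegative weights of length-k closed walks
    in G(W), h(W) = 0 exactly when G(W) has no cycles.
    """
    E = expm(W * W)              # matrix exponential of the Hadamard square
    return np.trace(E) - W.shape[0]

def grad_h(W):
    """Gradient of h: ∇h(W) = (e^{W∘W})^T ∘ 2W."""
    return expm(W * W).T * (2 * W)
```

Because $ h $ is smooth and its gradient costs only one matrix exponential, the equality-constrained program (1) can be handed to standard continuous solvers.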