DAGs with NO TEARS: Continuous Optimization for Structure Learning

1. Introduction

Proposes a new approach to score-based learning of DAGs: convert the traditional combinatorial optimization problem $ \rightarrow $ a continuous program

$$ \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } G(W) \in \text{DAGs} \end{array} \quad \Longleftrightarrow \quad \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } h(W) = 0 \end{array} \tag{1} $$

  • $ G(W) $: $d$-node graph induced by $ W $
  • $ W \in R^{d \times d} $: weighted adjacency matrix
  • $ F: R^{d \times d} \to R $: score function
  • $ h: R^{d \times d} \to R $: a smooth function over real matrices, whose level set at zero exactly characterizes acyclic graphs
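For concreteness, here is what the combinatorial side of (1) looks like in code: a minimal NumPy sketch (helper names and the `tol` threshold are illustrative choices, not from the paper) that builds the edge set of $ G(W) $ from the nonzero entries of $ W $ and checks acyclicity with Kahn's topological sort.

```python
import numpy as np

def induced_edges(W, tol=1e-8):
    """Edge set of G(W): an edge i -> j whenever |W[i, j]| > tol."""
    d = W.shape[0]
    return {(i, j) for i in range(d) for j in range(d) if abs(W[i, j]) > tol}

def is_dag(W, tol=1e-8):
    """Combinatorial acyclicity check for G(W) via Kahn's algorithm."""
    A = (np.abs(W) > tol).astype(int)   # binary adjacency matrix of G(W)
    indeg = A.sum(axis=0)               # in-degree of each node
    stack = [j for j in range(A.shape[0]) if indeg[j] == 0]
    removed = 0
    while stack:                        # repeatedly remove source nodes
        i = stack.pop()
        removed += 1
        for j in np.flatnonzero(A[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                stack.append(j)
    return removed == A.shape[0]        # every node removable iff acyclic
```

The point of (1) is to replace this discrete, non-differentiable check with the smooth equality constraint $ h(W) = 0 $, so that gradient-based solvers apply.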

2. Background

The basic DAG learning problem is formulated as follows:

  • $ \mathbf{X} \in R^{n \times d} $: a data matrix
    • $ n $ IID observations of the random vector $ X = (X_1, \dots, X_d) $
  • $ G = (V, E) $: DAG
  • $ D $: discrete space of DAGs on $ d $ nodes

Goal: Given $ \mathbf{X} $, learn a DAG $ G \in D $


Idea: Model $ \mathbf{X} $ via a structural equation model (SEM) defined by a weighted adjacency matrix $ W \in R^{d \times d} \rightarrow $ operate on $ R^{d \times d} $ (the continuous space of $ d \times d $ matrices) instead of the discrete space $ D $
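As a concrete instance, a minimal sketch assuming the linear SEM that serves as the paper's running example, where each row of $ \mathbf{X} $ satisfies $ x = xW + z $ for a noise vector $ z $ (the Gaussian noise, scale, and helper name below are illustrative choices):

```python
import numpy as np

def simulate_linear_sem(W, n, noise_scale=1.0, seed=0):
    """Draw n IID rows of X from the linear SEM X = X W + Z.

    Requires G(W) to be acyclic, so that (I - W) is invertible and
    X = Z (I - W)^{-1} solves the model exactly.
    """
    rng = np.random.default_rng(seed)
    d = W.shape[0]
    Z = noise_scale * rng.standard_normal((n, d))   # noise matrix
    return Z @ np.linalg.inv(np.eye(d) - W)

# Example: a 3-node chain X1 -> X2 -> X3
W = np.array([[0.0,  2.0,  0.0],
              [0.0,  0.0, -1.5],
              [0.0,  0.0,  0.0]])
X = simulate_linear_sem(W, n=100)   # X has shape (100, 3)
```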

2.1 Score function and SEM

Given the score function $ F $ measuring how well $ W $ fits the data $ \mathbf{X} $, the score-based learning problem is

$$ \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } G(W) \in D \end{array} \tag{3} $$
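For the linear SEM, the paper's running choice of score is least squares with an $ \ell_1 $ penalty for sparsity; a minimal sketch (the penalty weight `lam` is a tuning parameter, not fixed by these notes):

```python
import numpy as np

def least_squares_score(W, X, lam=0.1):
    """F(W) = (1/2n) ||X - X W||_F^2 + lam * ||W||_1.

    lam = 0 gives the plain least-squares fit of the linear SEM
    X = X W + Z; lam > 0 encourages a sparse graph.
    """
    n = X.shape[0]
    R = X - X @ W                 # residuals of the SEM fit
    return 0.5 / n * np.sum(R ** 2) + lam * np.abs(W).sum()
```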

3. A new characterization of acyclicity

Replace the combinatorial constraint $ G(W) \in D $ in (3) with a single smooth equality constraint $ h(W) = 0 $.


A valid $ h: R^{d \times d} \to R $ must satisfy:

  1. $ h(W) = 0 $ iff $ W $ is acyclic (i.e. $ G(W) \in D $)
  2. The values of $ h $ quantify the “DAG-ness” of the graph
  3. $ h $ is smooth
  4. $ h $ and its derivatives are easy to compute
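The characterization the paper proposes, which satisfies all four properties, is

$$ h(W) = \operatorname{tr}\left(e^{W \circ W}\right) - d, $$

where $ \circ $ is the Hadamard (elementwise) product and $ e^A $ is the matrix exponential, with gradient $ \nabla h(W) = \left(e^{W \circ W}\right)^T \circ 2W $. A minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.linalg import expm

def h(W):
    """h(W) = tr(e^{W∘W}) - d.

    Since tr(e^{W∘W}) = d + sum_{k>=1} tr((W∘W)^k) / k! and
    tr((W∘W)^k) sums nonnegative weights of length-k closed walks
    in G(W), h(W) = 0 exactly when G(W) has no cycles.
    """
    E = expm(W * W)              # matrix exponential of the Hadamard square
    return np.trace(E) - W.shape[0]

def grad_h(W):
    """Gradient of h: ∇h(W) = (e^{W∘W})^T ∘ 2W."""
    return expm(W * W).T * (2 * W)
```

Because $ h $ is smooth and its gradient costs only one matrix exponential, the equality-constrained program (1) can be handed to standard continuous solvers.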