Propose a new approach for score-based learning of DAGs: Converting the traditional combinatorial optimization problem $ \rightarrow $ continuous program
$$ \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } G(W) \in \text{DAGs} \end{array} \quad \Longleftrightarrow \quad \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } h(W) = 0 \end{array} \tag{1} $$
The basic DAG learning problem is formulated as
Goal: Given $ \mathbf{X} $ learn a DAG $ G \in D $
Idea: Model $ \mathbf{X} $ via a SEM defined by weighted adjacency matrix $ W \in R^{d \times d} \rightarrow $ operate on $ R^{d \times d} $ (continuous space of $ d \times d $ matrices) instead of the discrete space $ D $
$$ \begin{array}{c} \displaystyle \min_{W \in R^{d \times d}} F(W) \\ \text{subject to } G(W) \in D \end{array} \tag{3} $$
Replace constraint $ G(W) \in D $ in (3) with a single smooth equality constraint $ h(W) = 0 $.
$ h: R^{d \times d} \to R $ satisfies: