Supervised learning
This chapter focuses on regression problems.
We have a training set $\mathcal{D}$ of $n$ observations.
$$\mathcal{D} = \{(x_i, y_i) \mid i = 1, \dots, n\}$$
$x_i$: column input vector (covariates) of dimension $D$.
$y_i$: a scalar output whose value depends on the covariates $\rightarrow$ the dependent variable. $$ y = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix} $$
The input vectors are arranged column-wise into a $D \times n$ design matrix $X$, and the outputs are collected into the vector $y$. Thus the dataset is represented compactly as $\mathcal{D} = (X, y)$.
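As a concrete illustration, here is a minimal NumPy sketch of this representation (the variable names and toy dimensions are illustrative choices, not part of the text):

```python
import numpy as np

D, n = 3, 5                    # input dimension and number of observations
rng = np.random.default_rng(0)

X = rng.normal(size=(D, n))    # design matrix: column i is the input vector x_i
y = rng.normal(size=n)         # output vector: entry i is the scalar y_i

x_2 = X[:, 1]                  # e.g. the second input vector is a column of X
assert X.shape == (D, n) and y.shape == (n,)
```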
The goal is to infer the conditional distribution $p(y \mid x)$ $\rightarrow$ it captures the relationship between the inputs and the outputs.
A simple and widely used choice is the linear model with additive noise:
$$ f(x) = x^\top w, \qquad y = f(x) + \varepsilon, $$
where $w$ is a $D$-dimensional vector of weights and $\varepsilon$ is a noise term.
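To make the model concrete, here is a short sketch that samples synthetic data from it, assuming Gaussian noise with standard deviation `sigma` (the Gaussian assumption and all numbers are mine, not stated above):

```python
import numpy as np

rng = np.random.default_rng(1)
D, n, sigma = 3, 100, 0.1

w_true = rng.normal(size=D)        # ground-truth weight vector w
X = rng.normal(size=(D, n))        # design matrix, columns are the inputs x_i
eps = sigma * rng.normal(size=n)   # additive noise term (assumed Gaussian here)

y = X.T @ w_true + eps             # y_i = x_i^T w + eps_i for every observation
```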
To handle an intercept, we absorb the bias into the weights: augment each input with a constant first entry, $\tilde{\mathbf{x}} = [1, x^\top]^\top$, and prepend a corresponding weight, $\tilde{\mathbf{w}} = [b, w^\top]^\top$, so that the first weight $b$ acts as the bias.
Thus,
$$
y = \tilde{\mathbf{x}}^\top \tilde{\mathbf{w}} + \varepsilon
$$
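A minimal NumPy sketch of this trick, fitted with ordinary least squares via `np.linalg.lstsq` (the estimator is my addition for illustration; the text has not introduced one yet):

```python
import numpy as np

rng = np.random.default_rng(2)
D, n = 3, 200
w_true, b_true = rng.normal(size=D), 0.5

X = rng.normal(size=(D, n))
y = X.T @ w_true + b_true + 0.1 * rng.normal(size=n)

# Augment: prepend a row of ones so the first weight plays the role of the bias b.
X_tilde = np.vstack([np.ones((1, n)), X])      # shape (D + 1, n)

# Least-squares fit of y ~ X_tilde^T w_tilde.
w_tilde, *_ = np.linalg.lstsq(X_tilde.T, y, rcond=None)
b_hat, w_hat = w_tilde[0], w_tilde[1:]         # first entry recovers the bias
```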