\item \subquestionpoints{5}

Recall that in GDA we model the joint distribution of $(x, y)$ by the following equations:
%
\begin{eqnarray*}
p(y) &=& \begin{cases} \phi & \mbox{if~} y = 1 \\ 1 - \phi & \mbox{if~} y = 0 \end{cases} \\
p(x | y=0) &=& \frac{1}{(2\pi)^{\di/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_{0})^T \Sigma^{-1} (x-\mu_{0})\right) \\
p(x | y=1) &=& \frac{1}{(2\pi)^{\di/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu_1)^T \Sigma^{-1} (x-\mu_1) \right),
\end{eqnarray*}
%
where $\phi$, $\mu_0$, $\mu_1$, and $\Sigma$ are the parameters of our model.

Suppose we have already fit $\phi$, $\mu_0$, $\mu_1$, and $\Sigma$, and now want to predict $y$ given a new point $x$. To show that GDA results in a classifier that has a linear decision boundary, show that the posterior distribution can be written as
%
\begin{equation*}
p(y = 1\mid x; \phi, \mu_0, \mu_1, \Sigma) = \frac{1}{1 + \exp(-(\theta^T x + \theta_0))},
\end{equation*}
%
where $\theta\in\Re^\di$ and $\theta_{0}\in\Re$ are appropriate functions of $\phi$, $\Sigma$, $\mu_0$, and $\mu_1$.
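
As an optional aside (not a substitute for the derivation asked for above), the claim can be checked numerically: it says the log-odds $\log\frac{p(y=1\mid x)}{p(y=0\mid x)}$ is an affine function of $x$. The sketch below is a minimal check assuming arbitrary illustrative values for $\phi$, $\mu_0$, $\mu_1$, and $\Sigma$ (they are not part of this problem); the helper names \texttt{log\_gaussian} and \texttt{posterior} are hypothetical. It computes $p(y=1\mid x)$ directly from Bayes' rule using the densities above and verifies that an affine fit to the log-odds has essentially zero residual.
%
\begin{verbatim}
# Minimal numerical sanity check of the linear-decision-boundary claim.
# Parameter values are illustrative only.
import numpy as np

d = 3
rng = np.random.default_rng(0)

phi = 0.3                                   # hypothetical fitted parameters
mu0 = rng.normal(size=d)
mu1 = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)             # symmetric positive definite
Sigma_inv = np.linalg.inv(Sigma)
log_det = np.linalg.slogdet(Sigma)[1]

def log_gaussian(x, mu):
    """log N(x; mu, Sigma) with the shared covariance Sigma."""
    diff = x - mu
    return -0.5 * (d * np.log(2 * np.pi) + log_det + diff @ Sigma_inv @ diff)

def posterior(x):
    """p(y = 1 | x) computed directly from Bayes' rule."""
    log_p1 = np.log(phi) + log_gaussian(x, mu1)
    log_p0 = np.log(1 - phi) + log_gaussian(x, mu0)
    return 1.0 / (1.0 + np.exp(log_p0 - log_p1))

# If the claim holds, the log-odds is theta^T x + theta_0, i.e. affine in x,
# so an affine least-squares fit over sample points should be exact.
X = rng.normal(size=(50, d))
logits = np.array([np.log(posterior(x) / (1 - posterior(x))) for x in X])
design = np.hstack([X, np.ones((50, 1))])   # columns: x, then an intercept
coef = np.linalg.lstsq(design, logits, rcond=None)[0]
print("max abs residual of affine fit:",
      np.max(np.abs(design @ coef - logits)))   # ~1e-12 up to round-off
\end{verbatim}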