* Machine Learning
:PROPERTIES:
:ANKI_DECK: machine-learning
:END:
** What are the advantages and disadvantages for Support Vector Machines?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_DECK: machine-learning
:ANKI_NOTE_ID: 1765173425980
:ANKI_NOTE_HASH: 0c5fb780e3409300214480e50b8bb5b6
:END:
advantages:
- effective in high dimensions
- works when the number of features $\gg$ number of training samples
- handles sparse data
- works well when classes are clearly separable
- kernel trick; can work with non-linear data

disadvantages:
- training is slow on large datasets
- performs poorly when classes overlap
- requires careful hyperparameter tuning
** Linear Regression form? Closed form solution?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425985
:ANKI_NOTE_HASH: cf8dc03c76ec952ca7b418f3c57478e1
:END:
\[
\begin{align*}
Ap &= b \\
\implies \hat{p} &= (A^T A)^{-1} A^T b
\end{align*}
\]
(the least-squares solution via the normal equations)
** What is pooling? Why do we do it?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425990
:ANKI_NOTE_HASH: 4f84ecdf97e707866385527cb6bcf9cd
:END:
Pooling is a downsampling operation over local windows of a feature map (e.g. max/avg pooling) that reduces spatial resolution while retaining important activations.

We do it to:
- reduce computation + parameters downstream
- increase receptive field
- add some local translation invariance / robustness to small shifts
** How to fix overfitting?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425993
:ANKI_NOTE_HASH: 9c37bdb720ffe8e22b8e372a3a86ca37
:END:
more data, augmentation, early stopping, weight decay, dropout, batch normalisation.
** What is the formula for _precision_? What about _recall_?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425997
:ANKI_NOTE_HASH: c314ba5c29c4448d81425ed8a4cec047
:END:
precision:
\[\frac{\text{TP}}{\text{TP}+\text{FP}}\]
recall:
\[\frac{\text{TP}}{\text{TP}+\text{FN}}\]
** What are the formulas for *specificity* and *sensitivity*?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426001
:ANKI_NOTE_HASH: 820b5488af62ebd985ef6b1fe2812466
:END:
specificity:
\[\frac{\text{TN}}{\text{TN}+\text{FP}}\]
sensitivity:
\[\frac{\text{TP}}{\text{TP}+\text{FN}}\]
** Conceptually what is precision and what is recall?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426005
:ANKI_NOTE_HASH: 517b3537f5081128b7e35c963d3c2ad1
:END:
- Precision: of all predicted positives, how many are truly positive.
- Recall: of all actual positives, how many did we correctly predict as positive.
** Conceptually what is specificity and what is sensitivity?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426010
:ANKI_NOTE_HASH: f24be9cefeeb30be2fd39167f5f27c13
:END:
- Sensitivity (recall): of all actual positives, how many test positive.
- Specificity: of all actual negatives, how many test negative.
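A minimal Python sketch tying the four counts (TP, FP, FN, TN) to precision, recall, sensitivity, and specificity. The function name and the example counts are illustrative, not from the cards:

#+begin_src python
# Metrics from confusion-matrix counts (names and counts are illustrative).
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    precision   = tp / (tp + fp)   # of predicted positives, how many are truly positive
    recall      = tp / (tp + fn)   # of actual positives, how many we caught (= sensitivity)
    specificity = tn / (tn + fp)   # of actual negatives, how many we correctly rejected
    return {"precision": precision, "recall": recall,
            "sensitivity": recall, "specificity": specificity}

print(confusion_metrics(tp=80, fp=20, fn=10, tn=90))
# {'precision': 0.8, 'recall': 0.888..., 'sensitivity': 0.888..., 'specificity': 0.818...}
#+end_src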
** Label the confusion matrix below
:PROPERTIES:
:ANKI_NOTE_TYPE: Cloze
:ANKI_NOTE_ID: 1765178995009
:ANKI_NOTE_HASH: a3163f30c825caf109b0adea7bbeab54
:END:
*** Text
\[
\begin{tikzpicture}[
  font=\small,
  cell/.style={draw, minimum width=3cm, minimum height=1.2cm, align=center},
  head/.style={cell, font=\bfseries},
  bighead/.style={draw, font=\bfseries, align=center}
]
  % Predicted positive/negative column headers
  \node[head, fill=gray] (pp) at (6,3) {Positive};
  \node[head, fill=gray] (pn) at (9,3) {Negative};
  % Actual positive row
  \node[head, fill=gray] (ap) at (3,1.5) { {{c2::Positive}} };
  \node[cell, fill=gray!15] (tp) at (6,1.5) { {{c3::True positive\\(TP)}} };
  \node[cell, fill=gray!15] (fn) at (9,1.5) { {{c3::False negative\\(FN)}} };
  % Actual negative row
  \node[head, fill=gray] (an) at (3,0) { {{c2::Negative}} };
  \node[cell, fill=gray!15] (fp) at (6,0) { {{c3::False positive\\(FP)}} };
  \node[cell, fill=gray!15] (tn) at (9,0) { {{c3::True negative\\(TN)}} };
  % Top merged header: Predicted condition
  \node[bighead, fill=gray!15, minimum width=6cm, minimum height=1.2cm] (pred) at (7.5,4.2) { {{c1::Predicted condition}} };
  % Side merged header: Actual condition (vertical)
  \node[bighead, fill=gray!15, rotate=90, transform shape, minimum width=3cm, minimum height=1.2cm] (act) at (0,0.75) { {{c1::Actual condition}} };
\end{tikzpicture}
\]
*** Back Extra
** Label the confusion matrix below
:PROPERTIES:
:ANKI_NOTE_TYPE: Cloze
:ANKI_NOTE_ID: 1765178995014
:ANKI_NOTE_HASH: 70209c79acedb544fa42b0754a326e1e
:END:
*** Text
\[
\begin{tikzpicture}[
  font=\small,
  cell/.style={draw, minimum width=3cm, minimum height=1.2cm, align=center},
  head/.style={cell, font=\bfseries},
  bighead/.style={draw, font=\bfseries, align=center}
]
  % Predicted positive/negative column headers
  \node[head, fill=gray] (pp) at (6,3) {Positive};
  \node[head, fill=gray] (pn) at (9,3) {Negative};
  % Actual positive row
  \node[head, fill=gray] (ap) at (3,1.5) {Positive};
  \node[cell, fill=gray!15] (tp) at (6,1.5) {{{c2::True positive\\(TP)}}};
  \node[cell, fill=gray!15] (fn) at (9,1.5) {{{c1::False negative\\(FN)}}};
  % Actual negative row
  \node[head, fill=gray] (an) at (3,0) {Negative};
  \node[cell, fill=gray!15] (fp) at (6,0) {{{c2::False positive\\(FP)}}};
  \node[cell, fill=gray!15] (tn) at (9,0) {{{c1::True negative\\(TN)}}};
  % Top merged header: Predicted condition
  \node[bighead, fill=gray!15, minimum width=6cm, minimum height=1.2cm] (pred) at (7.5,4.2) {Predicted condition};
  % Side merged header: Actual condition (vertical)
  \node[bighead, fill=gray!15, rotate=90, transform shape, minimum width=3cm, minimum height=1.2cm] (act) at (0,0.75) {Actual condition};
\end{tikzpicture}
\]
*** Back Extra
** Entropy may be viewed as the {{c1::average uncertainty}} of the information source.
:PROPERTIES:
:ANKI_NOTE_TYPE: Cloze
:ANKI_NOTE_ID: 1765172582606
:ANKI_NOTE_HASH: 972c1d6023ae844249c7da16d7991b42
:ANKI_PREPEND_HEADING: t
:END:
*** Text
Entropy may be viewed as the {{c1::average uncertainty}} of the information source.
- If the source information has no uncertainty, then {{c2:: \(H=0\)}}
- If the source information is uncertain, then {{c2:: \(H>0\)}}
*** Back Extra
** What is the formulation for Root Mean Squared Error? What about MAE?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426015
:ANKI_NOTE_HASH: 1e70cdc632d472af3cb58e5e99de7a8d
:END:
\[\text{RMSE} = \large\sqrt{\frac{1}{N}\sum^N_{i=1} (y_i - \hat{y}_i)^2}\]
\[\text{MAE} = \large\frac{1}{N}\sum^N_{i=1}|y_i - \hat{y}_i|\]
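A minimal Python sketch of both formulas over paired targets and predictions (function names and example values are illustrative):

#+begin_src python
# RMSE and MAE over paired targets/predictions (plain Python; values illustrative).
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(rmse(y_true, y_pred))  # ~0.935 (square root of the mean squared error)
print(mae(y_true, y_pred))   # 0.75   (mean absolute error)
#+end_src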
** SVM objective? Hard margin.
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426018
:ANKI_NOTE_HASH: 18f5c742e1d74dacb0778d56dc6ab0ce
:END:
Hard-margin SVM (linearly separable case):
\[
\begin{align*}
\min_{\mathbf{w},\, b}\quad & \frac{1}{2}\lVert \mathbf{w} \rVert_2^2 \\
\text{s.t.}\quad & y_i\big(\mathbf{w}^\top \mathbf{x}_i + b\big) \ge 1,\qquad i=1,\dots,n.
\end{align*}
\]
(Maximises the margin while correctly classifying all training points.)
** Which KNN distance should you use for continuous or binary / categorical data?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426020
:ANKI_NOTE_HASH: 57beabe8c1c1775858f8543b374633f0
:END:
Euclidean $\rightarrow$ continuous

Hamming $\rightarrow$ binary / categorical
** What is Bayes' Rule? Label everything.
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426023
:ANKI_NOTE_HASH: 3557ed1fb198b699f8bdd4e52739dc20
:END:
\[
\underbrace{P(c_i \mid x)}_{\text{posterior}}
= \frac{
\overbrace{P(x \mid c_i)}^{\text{likelihood / class-conditional}}
\;\overbrace{P(c_i)}^{\text{prior}}
}{
\underbrace{P(x)}_{\text{evidence / marginal likelihood}}
}
\]
For classification (MAP):
\[
\arg\max_{c_i} P(c_i\mid x) = \arg\max_{c_i} P(x\mid c_i)\,P(c_i)
\]
since \(P(x)\) does not depend on \(c_i\).
** What is Bayes' Rule? Label everything. :noexport:
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426023
:ANKI_NOTE_HASH: c611f850eb811f39c1d9531ee6cec030
:END:
\[P(c|x) \propto P(x|c) P(c)\]
choose argmax posterior
\[\underbrace{P(c_i | x )}_{\text{posterior}} = \overbrace{\underbrace{P(x|c_i)}_{\text{class conditional distribution}} \underbrace{P(c_i)}_{\text{prior}}}^{\text{learned from data}} \]
** How does RMSE deal with outliers in comparison with MAE?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426026
:ANKI_NOTE_HASH: 9a3f326f856993b1b2b2d5674792fc34
:END:
RMSE penalises large errors more; MAE is more robust to outliers.
** SVM soft margin: effect of C?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426031
:ANKI_NOTE_HASH: d29e75f0735e10d311bc476ff8df5528
:END:
In soft-margin SVM, \(C\) controls the trade-off between a wide margin and penalising violations:
\[
\min_{\mathbf w,b,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf w\rVert^2 + C\sum_{i=1}^n \xi_i
\quad \text{s.t. } y_i(\mathbf w^\top \mathbf x_i + b)\ge 1-\xi_i,\ \xi_i\ge 0.
\]
- Small \(C\): slack is cheap \(\Rightarrow\) allow more margin violations/misclassifications to get a wider margin (more regularisation).
- Large \(C\): slack is expensive \(\Rightarrow\) push for fewer violations/training errors, often at the cost of a narrower margin (less regularisation).
** Entropy mathematical definition?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426034
:ANKI_NOTE_HASH: eb46ce958063420c02a5a742c911b45d
:END:
\[H = -\sum_i p_i \log_2(p_i)\]
** How does higher tree correlation affect Random Forest error? What about individual tree strength?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426039
:ANKI_NOTE_HASH: f53ce8a56b1e94ae8b623a032bf6660a
:END:
correlation $\uparrow \implies$ error $\uparrow$

strength $\uparrow \implies$ error $\downarrow$
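A rough scikit-learn sketch of this trade-off: shrinking ~max_features~ decorrelates the trees, which tends to lower ensemble error, though it can also weaken each individual tree. The dataset is synthetic and the hyperparameters are illustrative only; the outcome depends on the data.

#+begin_src python
# How feature subsampling (which decorrelates trees) affects Random Forest accuracy
# on a synthetic dataset. Hyperparameters and data are illustrative, not a benchmark.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=30, n_informative=10, random_state=0)

for max_features in (None, "sqrt"):   # None = every split sees all features -> more correlated trees
    rf = RandomForestClassifier(n_estimators=200, max_features=max_features, random_state=0)
    score = cross_val_score(rf, X, y, cv=5).mean()
    print(f"max_features={max_features}: mean CV accuracy {score:.3f}")
#+end_src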