* Machine Learning
:PROPERTIES:
:ANKI_DECK: machine-learning
:END:
** What are the advantages and disadvantages for Support Vector Machines?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_DECK: machine-learning
:ANKI_NOTE_ID: 1765173425980
:ANKI_NOTE_HASH: 0c5fb780e3409300214480e50b8bb5b6
:END:
advantages:
- effective in high dimensions
- works when the number of features $\gg$ number of training samples
- handles sparse data
- works well when classes are clearly separable
- kernel trick; can work with non-linear data

disadvantages:
- training is slow on large datasets
- performs poorly when classes overlap
- requires careful hyperparameter tuning
** Linear Regression form? Closed form solution?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425985
:ANKI_NOTE_HASH: cf8dc03c76ec952ca7b418f3c57478e1
:END:
\[
\begin{align*}
Ap &= b \\
\implies \hat{p} &= (A^T A)^{-1} A^T b
\end{align*}
\]
(the least-squares solution via the normal equations)
** What is pooling? Why do we do it?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425990
:ANKI_NOTE_HASH: 4f84ecdf97e707866385527cb6bcf9cd
:END:
Pooling is a downsampling operation over local windows of a feature map (e.g. max/avg pooling) that reduces spatial resolution while retaining important activations.

We do it to:
- reduce computation + parameters downstream
- increase receptive field
- add some local translation invariance / robustness to small shifts
** How to fix overfitting?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425993
:ANKI_NOTE_HASH: 9c37bdb720ffe8e22b8e372a3a86ca37
:END:
more data, augmentation, early stopping, weight decay, dropout, batch normalisation.
** What is the formula for _precision_? What about _recall_?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173425997
:ANKI_NOTE_HASH: c314ba5c29c4448d81425ed8a4cec047
:END:
precision:
\[\frac{\text{TP}}{\text{TP}+\text{FP}}\]
recall:
\[\frac{\text{TP}}{\text{TP}+\text{FN}}\]
** What are the formulas for *specificity* and *sensitivity*?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426001
:ANKI_NOTE_HASH: 820b5488af62ebd985ef6b1fe2812466
:END:
specificity:
\[\frac{\text{TN}}{\text{TN}+\text{FP}}\]
sensitivity:
\[\frac{\text{TP}}{\text{TP}+\text{FN}}\]
** Conceptually what is precision and what is recall?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426005
:ANKI_NOTE_HASH: 517b3537f5081128b7e35c963d3c2ad1
:END:
- Precision: of all predicted positives, how many are truly positive.
- Recall: of all actual positives, how many did we correctly predict as positive.
** Conceptually what is specificity and what is sensitivity?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426010
:ANKI_NOTE_HASH: f24be9cefeeb30be2fd39167f5f27c13
:END:
- Sensitivity (recall): of all actual positives, how many test positive.
- Specificity: of all actual negatives, how many test negative.
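A minimal Python sketch tying the four counts (TP, FP, FN, TN) to precision, recall, sensitivity, and specificity. The function name and the example counts are illustrative, not from the cards:

#+begin_src python
# Metrics from confusion-matrix counts (names and counts are illustrative).
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    precision   = tp / (tp + fp)   # of predicted positives, how many are truly positive
    recall      = tp / (tp + fn)   # of actual positives, how many we caught (= sensitivity)
    specificity = tn / (tn + fp)   # of actual negatives, how many we correctly rejected
    return {"precision": precision, "recall": recall,
            "sensitivity": recall, "specificity": specificity}

print(confusion_metrics(tp=80, fp=20, fn=10, tn=90))
# {'precision': 0.8, 'recall': 0.888..., 'sensitivity': 0.888..., 'specificity': 0.818...}
#+end_src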
** Label the confusion matrix below
:PROPERTIES:
:ANKI_NOTE_TYPE: Cloze
:ANKI_NOTE_ID: 1765178995009
:ANKI_NOTE_HASH: a3163f30c825caf109b0adea7bbeab54
:END:
*** Text
\[
\begin{tikzpicture}[
  font=\small,
  cell/.style={draw, minimum width=3cm, minimum height=1.2cm, align=center},
  head/.style={cell, font=\bfseries},
  bighead/.style={draw, font=\bfseries, align=center}
]
  % Predicted positive/negative column headers
  \node[head, fill=gray] (pp) at (6,3) {Positive};
  \node[head, fill=gray] (pn) at (9,3) {Negative};
  % Actual positive row
  \node[head, fill=gray] (ap) at (3,1.5) { {{c2::Positive}} };
  \node[cell, fill=gray!15] (tp) at (6,1.5) { {{c3::True positive\\(TP)}} };
  \node[cell, fill=gray!15] (fn) at (9,1.5) { {{c3::False negative\\(FN)}} };
  % Actual negative row
  \node[head, fill=gray] (an) at (3,0) { {{c2::Negative}} };
  \node[cell, fill=gray!15] (fp) at (6,0) { {{c3::False positive\\(FP)}} };
  \node[cell, fill=gray!15] (tn) at (9,0) { {{c3::True negative\\(TN)}} };
  % Top merged header: Predicted condition
  \node[bighead, fill=gray!15, minimum width=6cm, minimum height=1.2cm] (pred) at (7.5,4.2) { {{c1::Predicted condition}} };
  % Side merged header: Actual condition (vertical)
  \node[bighead, fill=gray!15, rotate=90, transform shape, minimum width=3cm, minimum height=1.2cm] (act) at (0,0.75) { {{c1::Actual condition}} };
\end{tikzpicture}
\]
*** Back Extra
** Label the confusion matrix below
:PROPERTIES:
:ANKI_NOTE_TYPE: Cloze
:ANKI_NOTE_ID: 1765178995014
:ANKI_NOTE_HASH: 70209c79acedb544fa42b0754a326e1e
:END:
*** Text
\[
\begin{tikzpicture}[
  font=\small,
  cell/.style={draw, minimum width=3cm, minimum height=1.2cm, align=center},
  head/.style={cell, font=\bfseries},
  bighead/.style={draw, font=\bfseries, align=center}
]
  % Predicted positive/negative column headers
  \node[head, fill=gray] (pp) at (6,3) {Positive};
  \node[head, fill=gray] (pn) at (9,3) {Negative};
  % Actual positive row
  \node[head, fill=gray] (ap) at (3,1.5) {Positive};
  \node[cell, fill=gray!15] (tp) at (6,1.5) {{{c2::True positive\\(TP)}}};
  \node[cell, fill=gray!15] (fn) at (9,1.5) {{{c1::False negative\\(FN)}}};
  % Actual negative row
  \node[head, fill=gray] (an) at (3,0) {Negative};
  \node[cell, fill=gray!15] (fp) at (6,0) {{{c2::False positive\\(FP)}}};
  \node[cell, fill=gray!15] (tn) at (9,0) {{{c1::True negative\\(TN)}}};
  % Top merged header: Predicted condition
  \node[bighead, fill=gray!15, minimum width=6cm, minimum height=1.2cm] (pred) at (7.5,4.2) {Predicted condition};
  % Side merged header: Actual condition (vertical)
  \node[bighead, fill=gray!15, rotate=90, transform shape, minimum width=3cm, minimum height=1.2cm] (act) at (0,0.75) {Actual condition};
\end{tikzpicture}
\]
*** Back Extra
** Entropy may be viewed as the {{c1::average uncertainty}} of the information source.
:PROPERTIES:
:ANKI_NOTE_TYPE: Cloze
:ANKI_NOTE_ID: 1765172582606
:ANKI_NOTE_HASH: 972c1d6023ae844249c7da16d7991b42
:ANKI_PREPEND_HEADING: t
:END:
*** Text
Entropy may be viewed as the {{c1::average uncertainty}} of the information source.
- If the source information has no uncertainty, then {{c2:: \(H=0\)}}
- If the source information is uncertain, then {{c2:: \(H>0\)}}
*** Back Extra
** What is the formulation for Root Mean Squared Error? What about MAE?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426015
:ANKI_NOTE_HASH: 1e70cdc632d472af3cb58e5e99de7a8d
:END:
\[\text{RMSE} = \large\sqrt{\frac{1}{N}\sum^N_{i=1} (y_i - \hat{y}_i)^2}\]
\[\text{MAE} = \large\frac{1}{N}\sum^N_{i=1}|y_i - \hat{y}_i|\]
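A minimal Python sketch of both formulas over paired targets and predictions (function names and example values are illustrative):

#+begin_src python
# RMSE and MAE over paired targets/predictions (plain Python; values illustrative).
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(rmse(y_true, y_pred))  # ~0.935 (square root of the mean squared error)
print(mae(y_true, y_pred))   # 0.75   (mean absolute error)
#+end_src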
** SVM objective? Hard margin.
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426018
:ANKI_NOTE_HASH: 18f5c742e1d74dacb0778d56dc6ab0ce
:END:
Hard-margin SVM (linearly separable case):
\[
\begin{align*}
\min_{\mathbf{w},\, b}\quad & \frac{1}{2}\lVert \mathbf{w} \rVert_2^2 \\
\text{s.t.}\quad & y_i\big(\mathbf{w}^\top \mathbf{x}_i + b\big) \ge 1,\qquad i=1,\dots,n.
\end{align*}
\]
(Maximises the margin while correctly classifying all training points.)
** Which KNN distance should you use for continuous or binary / categorical data?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426020
:ANKI_NOTE_HASH: 57beabe8c1c1775858f8543b374633f0
:END:
Euclidean $\rightarrow$ continuous

Hamming $\rightarrow$ binary / categorical
** What is Bayes' Rule? Label everything.
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426023
:ANKI_NOTE_HASH: 3557ed1fb198b699f8bdd4e52739dc20
:END:
\[
\underbrace{P(c_i \mid x)}_{\text{posterior}}
= \frac{
\overbrace{P(x \mid c_i)}^{\text{likelihood / class-conditional}}
\;\overbrace{P(c_i)}^{\text{prior}}
}{
\underbrace{P(x)}_{\text{evidence / marginal likelihood}}
}
\]
For classification (MAP):
\[
\arg\max_{c_i} P(c_i\mid x) = \arg\max_{c_i} P(x\mid c_i)\,P(c_i)
\]
since \(P(x)\) does not depend on \(c_i\).
** What is Bayes' Rule? Label everything. :noexport:
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426023
:ANKI_NOTE_HASH: c611f850eb811f39c1d9531ee6cec030
:END:
\[P(c|x) \propto P(x|c) P(c)\]
choose argmax posterior
\[\underbrace{P(c_i | x )}_{\text{posterior}} = \overbrace{\underbrace{P(x|c_i)}_{\text{class conditional distribution}} \underbrace{P(c_i)}_{\text{prior}}}^{\text{learned from data}} \]
** How does RMSE deal with outliers in comparison with MAE?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426026
:ANKI_NOTE_HASH: 9a3f326f856993b1b2b2d5674792fc34
:END:
RMSE penalises large errors more; MAE is more robust to outliers.
** SVM soft margin: effect of C?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426031
:ANKI_NOTE_HASH: d29e75f0735e10d311bc476ff8df5528
:END:
In soft-margin SVM, \(C\) controls the trade-off between a wide margin and penalising violations:
\[
\min_{\mathbf w,b,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf w\rVert^2 + C\sum_{i=1}^n \xi_i
\quad \text{s.t. } y_i(\mathbf w^\top \mathbf x_i + b)\ge 1-\xi_i,\ \xi_i\ge 0.
\]
- Small \(C\): slack is cheap \(\Rightarrow\) allow more margin violations/misclassifications to get a wider margin (more regularisation).
- Large \(C\): slack is expensive \(\Rightarrow\) push for fewer violations/training errors, often at the cost of a narrower margin (less regularisation).
** Entropy mathematical definition?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426034
:ANKI_NOTE_HASH: eb46ce958063420c02a5a742c911b45d
:END:
\[H = -\sum_i p_i \log_2(p_i)\]
** How does higher tree correlation affect Random Forest error? What about individual tree strength?
:PROPERTIES:
:ANKI_NOTE_TYPE: Basic
:ANKI_NOTE_ID: 1765173426039
:ANKI_NOTE_HASH: f53ce8a56b1e94ae8b623a032bf6660a
:END:
correlation $\uparrow \implies$ error $\uparrow$

strength $\uparrow \implies$ error $\downarrow$
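A rough scikit-learn sketch of this trade-off: shrinking ~max_features~ decorrelates the trees, which tends to lower ensemble error, though it can also weaken each individual tree. The dataset is synthetic and the hyperparameters are illustrative only; the outcome depends on the data.

#+begin_src python
# How feature subsampling (which decorrelates trees) affects Random Forest accuracy
# on a synthetic dataset. Hyperparameters and data are illustrative, not a benchmark.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=30, n_informative=10, random_state=0)

for max_features in (None, "sqrt"):   # None = every split sees all features -> more correlated trees
    rf = RandomForestClassifier(n_estimators=200, max_features=max_features, random_state=0)
    score = cross_val_score(rf, X, y, cv=5).mean()
    print(f"max_features={max_features}: mean CV accuracy {score:.3f}")
#+end_src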