24 Birthday Problems

Q1

Problem
[4 marks]

Given that \(A\) and \(B\) are non-empty sets:

Part
[1 mark]

Give an expression for the conditional probability \(P(A|B)\)

Solution

\begin{equation}\label{eq:11}P(A|B) = \frac{P(A\cap B)}{P(B)}\end{equation}

Part
[1 mark]

Hence or otherwise derive Bayes’ (first) rule for conditional probability:

\[P(A|B) = P(B|A) \times \frac{P(A)}{P(B)}\]

Solution

\begin{align*} P(A|B) &= \frac{P(A\cap B)}{P(B)} \qquad\text{ as }\eqref{eq:11}\\ P(B|A) &= \frac{P(A\cap B)}{P(A)}\\ \implies P(B|A)\cdot P(A) &= P(A\cap B) = P(A|B)\cdot P(B)\\ \implies P(A|B) &= P(B|A)\times \frac{P(A)}{P(B)} \end{align*}

Part
[2 marks]

Prove that the symmetric difference \(A\Delta B = (A - B) \cup (B - A)\) is the same as \((A\cup B) - (A\cap B)\).

Solution

Let \(x\in A \Delta B\).

Then:

  • \(x\in A-B \implies x\in A\) and \(x\not\in B\)

OR:

  • \(x\in B-A \implies x\in B\) and \(x\not\in A\)

So:

  • \(x\in A\) or \(x \in B\), hence \(x\in A\cup B\)
  • \(x\) lies in exactly one of \(A\) and \(B\), so \(x \not\in A\cap B\), i.e. \(x\in (A\cap B)^c\)

Therefore: \[x\in(A\cup B) \cap (A\cap B)^c =(A\cup B)- (A\cap B)\]

Conversely, if \(x\in(A\cup B) - (A\cap B)\), then \(x\) lies in \(A\) or \(B\) but not in both, so \(x\in A-B\) or \(x\in B-A\), i.e. \(x\in A\Delta B\). Hence the two sets are equal.
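
The identity can also be checked concretely. Below is a minimal Python sketch using arbitrary example sets; the names `A` and `B` and their elements are chosen purely for illustration.

```python
# Sanity check of A Δ B = (A ∪ B) − (A ∩ B) with Python's built-in set operations.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

lhs = (A - B) | (B - A)        # (A − B) ∪ (B − A)
rhs = (A | B) - (A & B)        # (A ∪ B) − (A ∩ B)

assert lhs == rhs == A ^ B     # ^ is Python's symmetric-difference operator
print(lhs)                     # {1, 2, 5, 6}
```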

Q2

Problem
[7 marks]

The Gamma Function

Part
[1 mark]

Recall the integral \begin{equation}\label{eq:1}\int e^{-x}\; \mathrm{d}x\end{equation}

Solution

\begin{equation}\label{eq:2}\large\int e^{-x}\; \mathrm{d}x = -e^{-x}\end{equation}

Part
[2 marks]

Apply integration by parts on \[\int x e^{-x}\mathrm{d}x.\] Factorise your result.

Solution

\begin{align*} \int x e^{-x}\,\mathrm{d}x &= -xe^{-x} + \int e^{-x}\, \mathrm{d}x\\ &= -e^{-x}(1+x) \end{align*}

Part
[1 mark]

The integrals above are the special cases \(\alpha=1\) and \(\alpha=2\). More generally, we can manipulate the integral \begin{equation}\label{eq:33}\int x^{\alpha-1} e^{-x}\,\mathrm{d}x\end{equation} with IBP to produce:

Solution

\begin{align} \int x^{\alpha-1} e^{-x}\,\mathrm{d}x &= -x^{\alpha-1}e^{-x} + (\alpha-1)\int x^{\alpha-2} e^{-x}\, \mathrm{d}x \end{align}

Part
[3 marks]

Fixing the limits of integration leads to the Gamma function, defined by \[\Gamma(\alpha) = \int^\infty_0 x^{\alpha-1}e^{-x} \mathrm{d}x, \quad\text{for }\alpha > 0\] Show that with \(\eqref{eq:33}\), it follows that \[\Gamma(\alpha) = (\alpha-1)\times\Gamma(\alpha-1).\]

So, for any integer \(k \geq 1\):

\begin{align*}\Gamma(k) &= (k-1)\times(k-2)\times\ldots\times 2\times\Gamma(1)\\ &=(k-1)!\end{align*}

with \(\Gamma(1) = 1\) \(\eqref{eq:1}\), and the factorial function has a recursive structure intimately linked with the number \(e\).

Solution

From \eqref{eq:33}, we have:

\begin{align*} \int x^{\alpha-1} e^{-x}\mathrm{d}x &= -x^{\alpha-1}e^{-x} + (\alpha-1)\int x^{\alpha-2} e^{-x} \mathrm{d}x \end{align*}

Evaluating with limits from \(0\) to \(\infty\):

\begin{align*} \Gamma(\alpha) &= \int^\infty_0 x^{\alpha-1}e^{-x} \mathrm{d}x\\ &= \left[-x^{\alpha-1}e^{-x}\right]^\infty_0 + (\alpha-1)\int^\infty_0 x^{\alpha-2} e^{-x} \mathrm{d}x \end{align*}

For \(\alpha > 1\), as \(x\to\infty\): \(x^{\alpha-1}e^{-x}\to 0\) (exponential dominates polynomial).

At \(x=0\): \(0^{\alpha-1}e^{0} = 0\) for \(\alpha > 1\).

Therefore:

\begin{align*} \Gamma(\alpha) &= [0-0] + (\alpha-1)\int^\infty_0 x^{\alpha-2} e^{-x} \mathrm{d}x\\ &= (\alpha-1)\Gamma(\alpha-1) \end{align*}
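
The recurrence can be checked numerically by evaluating the defining integral directly. The sketch below uses `scipy.integrate.quad`; the test values of \(\alpha\) are arbitrary.

```python
# Numerical check of Γ(α) = (α − 1)·Γ(α − 1) via the defining integral.
import numpy as np
from scipy.integrate import quad

def gamma_numeric(alpha):
    """Γ(α) by integrating x^(α−1) e^(−x) over [0, ∞)."""
    value, _ = quad(lambda x: x**(alpha - 1) * np.exp(-x), 0, np.inf)
    return value

for alpha in [2.5, 4.0, 7.3]:
    lhs = gamma_numeric(alpha)
    rhs = (alpha - 1) * gamma_numeric(alpha - 1)
    print(f"alpha={alpha}: Gamma(alpha)={lhs:.6f}, (alpha-1)*Gamma(alpha-1)={rhs:.6f}")
```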

Q3

Problem
[5 marks]

Monty Hall Problem:

Part
[2 marks]

Suppose there are three curtains. Behind one curtain there is a nice prize, while behind the other two there are worthless prizes (perhaps 1 car and 2 goats). A contestant selects one curtain at random, and then Monty Hall opens one of the other two curtains to reveal a worthless prize (goat). Hall then expresses a willingness to trade the curtain that the contestant has chosen for the other curtain that has not been opened. Should the contestant switch curtains or stick with the one that she has?

Solution

She should switch.

Let \(C\) denote the event “car is behind chosen door” and \(S\) denote “car is behind switch door.”

Initially: \(P(C) = \frac{1}{3}\) and \(P(S) = \frac{2}{3}\).

Monty knows where the car is and always opens a door with a goat. This action does not change the initial probabilities:

  • If the car was behind the contestant’s door (prob \(\frac{1}{3}\)), switching loses.
  • If the car was behind one of the other two doors (prob \(\frac{2}{3}\)), Monty must reveal the goat from the remaining door, so switching wins.

Using Bayes’ theorem: Let \(M\) be the event “Monty opens door 3 (revealing goat).” Assuming the contestant chose door 1:

\begin{align*} P(\text{Car at 2}|M) &= \frac{P(M|\text{Car at 2})P(\text{Car at 2})}{P(M)}\\ &= \frac{1 \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3} \end{align*}

where \(P(M) = P(M|\text{Car at 1})P(\text{Car at 1}) + P(M|\text{Car at 2})P(\text{Car at 2}) + P(M|\text{Car at 3})P(\text{Car at 3})\) \(= \frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} = \frac{1}{2}\)

Therefore, switching gives probability \(\frac{2}{3}\) of winning vs \(\frac{1}{3}\) for staying.
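
A quick Monte Carlo sketch of the standard game (host knows the car's location) reproduces these probabilities; the trial count and the host's tie-break rule when both unchosen doors hide goats are arbitrary choices.

```python
# Monte Carlo estimate of the stay/switch win probabilities in the standard game.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Host opens a goat door that is neither the pick nor the car
        # (the tie-break when pick == car does not affect the win rates).
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print("stay   ~", play(switch=False))   # ≈ 1/3
print("switch ~", play(switch=True))    # ≈ 2/3
```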

Part
[3 marks]

Now consider the variant where the host does not know which door hides the car. What is the posterior probability of winning the prize / car if our contestant switches?

Solution

When the host does not know where the car is and randomly opens a door, we must carefully condition on observing a goat.

Let contestant choose door 1, host randomly opens door 3 revealing a goat. Let \(C_i =\) “car behind door \(i\)”, \(G_3 =\) “host opens door 3 and reveals goat.”

\begin{align*} P(C_2|G_3) &= \frac{P(G_3|C_2)P(C_2)}{P(G_3)} \end{align*}

Computing the probabilities (host picks uniformly from doors 2 and 3):

  • \(P(G_3|C_1) = \frac{1}{2}\) (picks door 3 with prob \(\frac{1}{2}\), finds goat)
  • \(P(G_3|C_2) = \frac{1}{2}\) (picks door 3 with prob \(\frac{1}{2}\), finds goat)
  • \(P(G_3|C_3) = 0\) (if picks door 3, finds car, not goat)

Therefore:

\begin{align*} P(G_3) &= P(G_3|C_1)P(C_1) + P(G_3|C_2)P(C_2) + P(G_3|C_3)P(C_3)\\ &= \frac{1}{2} \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3}\\ &= \frac{1}{3} \end{align*}

Thus:

\begin{align*} P(C_2|G_3) &= \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{3}} = \frac{1}{2} \end{align*}

Answer: The posterior probability of winning by switching is \[ \boxed{\frac{1}{2}} \]

The key insight: when the host doesn’t know, revealing a goat provides no information to distinguish between doors 1 and 2, so they become equally likely.
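
The conditioning argument can be checked by simulation: let the host open a random unchosen door and discard the runs in which the car is revealed. The sketch below is illustrative; the trial count is arbitrary.

```python
# Monte Carlo estimate for the ignorant-host variant, conditioning on a goat being revealed.
import random

def switch_win_given_goat(trials=200_000):
    wins = kept = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        opened = random.choice([d for d in range(3) if d != pick])
        if opened == car:
            continue                      # discard runs where the host exposes the car
        kept += 1
        switch_door = next(d for d in range(3) if d != pick and d != opened)
        wins += (switch_door == car)
    return wins / kept

print("P(win by switching | goat revealed) ~", switch_win_given_goat())   # ≈ 1/2
```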

Q4

Problem
[7 marks]
Part
[2 marks]

Prove that there are equally as many natural numbers as there are integers, i.e. that \(|\mathbb{Z}| = |\mathbb{N}|\).

Solution

Infinite sets are equal in cardinality if there is a bijection between both sets:

Consider \(f:\mathbb{N}\to \mathbb{Z}\) given by \(f(2n)=n\) and \(f(2n+1)=-(n+1)\). Such a function is injective and surjective. The correspondence is illustrated below.
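
A minimal sketch of the correspondence, enumerating the first few values of \(f\) (the range printed is arbitrary):

```python
# Enumerate f: N -> Z with f(2n) = n and f(2n+1) = -(n+1).
def f(m):
    n, r = divmod(m, 2)
    return n if r == 0 else -(n + 1)

for m in range(9):
    print(m, "->", f(m))
# 0->0, 1->-1, 2->1, 3->-2, 4->2, 5->-3, 6->3, 7->-4, 8->4
```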

Part
[2 marks]

Prove that there are equally as many integers as there are rational numbers, i.e. that \(|\mathbb{Z}| = |\mathbb{Q}|\).

Solution

Every rational number \(q\in\mathbb{Q}\) has a unique reduced representation \[ q=\frac{a}{b}\quad\text{with }a\in\mathbb{Z},\; b\in\mathbb{N}_{>0},\;\gcd(|a|,b)=1. \] Define an injection \(\iota:\mathbb{Q}\to \mathbb{Z}\times \mathbb{N}_{>0}\) by \[ \iota(0)=(0,1),\qquad \iota\!\left(\frac{a}{b}\right)=(a,b)\ \ \text{(in lowest terms, }b>0\text{)}. \] This is injective because the reduced representation is unique.

Next, \(\mathbb{Z}\times\mathbb{N}_{>0}\) is countable: let \(g:\mathbb{Z}\to\mathbb{N}\) be the bijection \[ g(z)=\begin{cases} 2z,& z\ge 0,\\ -2z-1,& z<0, \end{cases} \] and let \(\pi:\mathbb{N}\times\mathbb{N}\to\mathbb{N}\) be Cantor’s pairing function \[ \pi(x,y)=\frac{(x+y)(x+y+1)}{2}+y. \] Then \[ h:\mathbb{Z}\times\mathbb{N}_{>0}\to\mathbb{N},\qquad h(z,b)=\pi\bigl(g(z),\,b-1\bigr) \] is a bijection, so \(\mathbb{Z}\times\mathbb{N}_{>0}\) is countable.
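
As a sanity check, the sketch below evaluates \(h\) on a small grid and confirms there are no collisions; the grid bounds are arbitrary.

```python
# Check that h(z, b) = pi(g(z), b - 1) is injective on a small grid,
# where pi is Cantor's pairing function and g maps Z bijectively onto N.
def g(z):
    return 2 * z if z >= 0 else -2 * z - 1

def cantor_pi(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def h(z, b):
    return cantor_pi(g(z), b - 1)

values = {h(z, b) for z in range(-20, 21) for b in range(1, 21)}
assert len(values) == 41 * 20          # no collisions on this grid
print(min(values), max(values))
```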

Since \(\mathbb{Q}\) injects into a countable set, \(\mathbb{Q}\) is countable. Also \(\mathbb{Z}\subseteq\mathbb{Q}\), so \(\mathbb{Q}\) is infinite. Hence \(\mathbb{Q}\) is countably infinite, and therefore \[\lvert\mathbb{Q}\rvert=\lvert\mathbb{N}\rvert=\lvert\mathbb{Z}\rvert .\]

Part
[3 marks]

Prove that the real numbers are uncountable; i.e. that \(|\mathbb{R}| \neq |\mathbb{N}|\).

Solution

It suffices to show that \([0,1]\subseteq\mathbb{R}\) is uncountable. Suppose for contradiction that \([0,1]\) is countable. Then we can list all its elements as a sequence \[ x_1,\;x_2,\;x_3,\;\dots \] Write each \(x_n\) in a decimal expansion \[ x_n = 0.a_{n1}a_{n2}a_{n3}\cdots \] choosing the representation that does not end in repeating \(9\)s (so the expansion is unique).

Now define a new real number \(y\in[0,1]\) by specifying its digits: \[y = 0.b_1b_2b_3\cdots,\qquad b_n =\begin{cases} 1, & a_{nn}\neq 1,\\ 2, & a_{nn}=1. \end{cases}\] By construction, \(b_n\neq a_{nn}\) for every \(n\). Hence \(y\) differs from \(x_n\) in the \(n\)-th decimal place, so \(y\neq x_n\) for all \(n\). This contradicts the assumption that \((x_n)\) lists all of \([0,1]\).

Therefore \([0,1]\) is uncountable, and since \([0,1]\subseteq\mathbb{R}\), the real numbers are uncountable. In particular, \[\lvert\mathbb{R}\rvert\neq\lvert\mathbb{N}\rvert .\]

Q5

Problem
[4 marks]
Part
[2 marks]

What does the series \(1+\tfrac14+\tfrac19+\tfrac1{16}+\ldots\) converge to? (If at all)

Solution

Write the series as \(\sum_{n=1}^{\infty}\frac{1}{n^2}\).

Consider the function \(f(x)=x\) on the interval \((-\pi,\pi)\). It is odd, so its Fourier series contains only sine terms: \[x \sim \sum_{n=1}^{\infty} b_n \sin(nx),\qquad b_n=\frac{1}{\pi}\int_{-\pi}^{\pi} x\sin(nx)\,dx.\]

Since the integrand is even, we can write \[b_n=\frac{2}{\pi}\int_{0}^{\pi} x\sin(nx)\,dx.\] Integrate by parts with \(u=x\), \(dv=\sin(nx)\,dx\) (so \(du=dx\), \(v=-\cos(nx)/n\)): \[\int_{0}^{\pi} x\sin(nx)\,dx = \left[-\frac{x\cos(nx)}{n}\right]_{0}^{\pi} + \frac{1}{n}\int_{0}^{\pi}\cos(nx)\,dx.\] But \(\int_{0}^{\pi}\cos(nx)\,dx=\left[\frac{\sin(nx)}{n}\right]_{0}^{\pi}=0\). Hence \[ \int_{0}^{\pi} x\sin(nx)\,dx = -\frac{\pi\cos(n\pi)}{n} = -\frac{\pi(-1)^n}{n}.\] Therefore \[b_n=\frac{2}{\pi}\left(-\frac{\pi(-1)^n}{n}\right)=\frac{2(-1)^{n+1}}{n}.\] So \[x \sim 2\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin(nx).\]

Now apply Parseval’s identity: \[\frac{1}{\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx=\frac{a_0^2}{2}+\sum_{n=1}^{\infty}(a_n^2+b_n^2).\] For \(f(x)=x\) we have \(a_0=a_n=0\), so \[\frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx=\sum_{n=1}^{\infty} b_n^2=\sum_{n=1}^{\infty}\left(\frac{2(-1)^{n+1}}{n}\right)^2 =4\sum_{n=1}^{\infty}\frac{1}{n^2}.\] Compute the integral: \[\frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx=\frac{1}{\pi}\cdot 2\int_{0}^{\pi} x^2\,dx =\frac{1}{\pi}\cdot 2\cdot \frac{\pi^3}{3} =\frac{2\pi^2}{3}.\] Thus \[\frac{2\pi^2}{3}=4\sum_{n=1}^{\infty}\frac{1}{n^2} \quad\Longrightarrow\quad \sum_{n=1}^{\infty}\frac{1}{n^2}=\frac{\pi^2}{6}.\]

Therefore: \[\boxed{1+\tfrac14+\tfrac19+\tfrac1{16}+\cdots=\frac{\pi^2}{6}}.\]
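
Numerically, the partial sums approach \(\pi^2/6\) rather slowly; the sketch below is illustrative and the truncation points are arbitrary.

```python
# Partial sums of sum 1/n^2 approaching pi^2/6.
import math

target = math.pi ** 2 / 6
for N in (10, 1_000, 100_000):
    s = sum(1 / n**2 for n in range(1, N + 1))
    print(f"N={N:>6}: partial sum = {s:.8f}, error = {target - s:.2e}")
```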

Part
[2 marks]

What does the series \(1+\tfrac12+\tfrac13+\tfrac14+\ldots\) converge to? (If at all)

Solution

This is the harmonic series \[\sum_{n=1}^{\infty}\frac{1}{n}.\] Using the integral test: \[\int_{1}^{\infty}\frac{1}{x}\,dx=\left[\ln x\right]_{1}^{\infty}=\infty,\] so the series diverges.

Therefore: \[ \boxed{1+\tfrac12+\tfrac13+\tfrac14+\cdots\ \text{diverges (does not converge to a finite value).}} \]

Q6

Problem
[4 marks]

Which is larger asymptotically as \(n \rightarrow \infty\)? \[2^n \ll n!\] OR \[2^n \gg n!\] Give a proof by induction.

Solution

We claim that \[ \boxed{2^n \ll n!} \] for sufficiently large \(n\).

Proof by induction:

We will prove that \(n! > 2^n\) for all \(n \geq 4\).

Base case: \(n = 4\) \[ 4! = 24 > 16 = 2^4 \]

Inductive step: Assume \(k! > 2^k\) for some \(k \geq 4\). We must show \((k+1)! > 2^{k+1}\).

\begin{align*} (k+1)! &= (k+1) \cdot k!\\ &> (k+1) \cdot 2^k \qquad\text{(by inductive hypothesis)}\\ &> 2 \cdot 2^k \qquad\text{(since }k+1 > 2\text{ for }k \geq 4\text{)}\\ &= 2^{k+1} \end{align*}

Therefore, by induction, \(n! > 2^n\) for all \(n \geq 4\).

Asymptotic analysis: The ratio \(\frac{n!}{2^n}\) grows without bound: \[ \frac{n!}{2^n} = \frac{1 \cdot 2 \cdot 3 \cdot 4 \cdots n}{2^n} = \frac{1}{2} \cdot \frac{2}{2} \cdot \frac{3}{2} \cdot \frac{4}{2} \cdots \frac{n}{2} \]

For \(n \geq 5\), each factor \(\frac{k}{2}\) for \(k \geq 3\) is \(> 1\), and we have infinitely many such factors as \(n \to \infty\). Thus \(\frac{n!}{2^n} \to \infty\), confirming \(n! \gg 2^n\).

Q7

Problem
[12 marks]
Part
[4 marks]

Find the eigenspaces of the following matrices:

  1. \begin{bmatrix}1&0\\1&1\end{bmatrix}
  2. \begin{bmatrix}-2&2\\2&1\end{bmatrix}
  3. \begin{bmatrix}2&3&0\\1&4&3\\0&0&1\end{bmatrix}
  4. \begin{bmatrix}1&1&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}
Solution

(a) \(A = \begin{bmatrix}1&0\\1&1\end{bmatrix}\)

Characteristic polynomial: \(\det(A - \lambda I) = (1-\lambda)^2 = 0\)

Eigenvalue: \(\lambda = 1\) (multiplicity 2)

Eigenspace for \(\lambda = 1\): Solve \((A - I)\mathbf{v} = 0\): \[\begin{bmatrix}0&0\\1&0\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix} = 0 \implies x = 0\] Eigenspace: \(E_1 = \text{span}\left\{\begin{bmatrix}0\\ 1\end{bmatrix}\right\}\)

(b) \(B = \begin{bmatrix}-2&2\\2&1\end{bmatrix}\)

Characteristic polynomial: \(\det(B - \lambda I) = (-2-\lambda)(1-\lambda) - 4 = \lambda^2 + \lambda - 6 = (\lambda+3)(\lambda-2) = 0\)

Eigenvalues: \(\lambda_1 = -3\), \(\lambda_2 = 2\)

For \(\lambda_1 = -3\): \((B + 3I)\mathbf{v} = 0\): \[\begin{bmatrix}1&2\\2&4\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = 0 \implies x + 2y = 0\] \(E_{-3} = \text{span}\left\{\begin{bmatrix}2\\ -1\end{bmatrix}\right\}\)

For \(\lambda_2 = 2\): \((B - 2I)\mathbf{v} = 0\): \[\begin{bmatrix}-4&2\\2&-1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = 0 \implies -4x + 2y = 0\] \(E_2 = \text{span}\left\{\begin{bmatrix}1\\2\end{bmatrix}\right\}\)

(c) \(C = \begin{bmatrix}2&3&0\\ 1&4&3\\ 0&0&1\end{bmatrix}\)

Characteristic polynomial: \begin{align*}\det(C - \lambda I) &= (1-\lambda)[(2-\lambda)(4-\lambda) - 3] \\&= (1-\lambda)(\lambda^2 - 6\lambda + 5) \\&= (1-\lambda)(\lambda-1)(\lambda-5) \\&= 0\end{align*}

Eigenvalues: \(\lambda_1 = 1\) (multiplicity 2), \(\lambda_2 = 5\)

For \(\lambda = 1\): \((C - I)\mathbf{v} = 0\): \[\begin{bmatrix}1&3&0\\ 1&3&3\\ 0&0&0\end{bmatrix}\begin{bmatrix}x \\y\\ z\end{bmatrix} = 0\] From row 1: \(x + 3y = 0\), from row 2: \(x + 3y + 3z = 0 \implies z = 0\)

\(E_1 = \text{span}\left\{\begin{bmatrix}-3 \\1 \\0\end{bmatrix}\right\}\)

For \(\lambda = 5\): \((C - 5I)\mathbf{v} = 0\): \[\begin{bmatrix}-3&3&0\\1&-1&3\\0&0&-4\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = 0\] From row 3: \(z = 0\), from row 1: \(-3x + 3y = 0 \implies x = y\)

\(E_5 = \text{span}\left\{\begin{bmatrix}1\\1\\0\end{bmatrix}\right\}\)

(d) \(D = \begin{bmatrix}1&1&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}\)

Characteristic polynomial: Since \(D\) is upper triangular, its eigenvalues are the diagonal entries. Thus \[\chi_D(\lambda) = (1-\lambda)(-\lambda)^3 = 0\] Eigenvalues: \(\lambda_1 = 1\) (multiplicity 1), \(\lambda_2 = 0\) (multiplicity 3)

For \(\lambda = 0\): Solve \(D\mathbf{v} = 0\): \[\begin{bmatrix}1&1&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix}= 0\;\implies\; x+y = 0\] So \(y=-x\) and \(z,w\) are free: \[E_0 = \text{span}\left\{\begin{bmatrix}1\\ -1\\0\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\0\\1\end{bmatrix}\right\}\]

For \(\lambda = 1\): Solve \((D-I)\mathbf{v} = 0\): \[\left(\begin{bmatrix}1&1&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}-\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}\right) \begin{bmatrix}x\\y\\z\\w\end{bmatrix}=\begin{bmatrix}0&1&0&0\\0&-1&0&0\\0&0&-1&0\\0&0&0&-1\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix}= 0\] This gives \(y=0\), \(z=0\), \(w=0\), with \(x\) free: \[E_1 = \text{span}\left\{\begin{bmatrix}1\\0\\0\\0\end{bmatrix}\right\}\]
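
The eigenvalues above can be cross-checked numerically; the sketch below uses `numpy.linalg.eigvals` and simply prints the (possibly repeated) eigenvalues of each matrix.

```python
# Cross-check of the eigenvalues found above; repeated eigenvalues appear as
# (numerically) equal entries in the output.
import numpy as np

matrices = {
    "(a)": np.array([[1, 0], [1, 1]]),
    "(b)": np.array([[-2, 2], [2, 1]]),
    "(c)": np.array([[2, 3, 0], [1, 4, 3], [0, 0, 1]]),
    "(d)": np.array([[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]),
}
for name, M in matrices.items():
    print(name, np.round(np.linalg.eigvals(M), 6))
```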

Part
[8 marks]

Determine if the following matrices are diagonalisable. If so, determine their diagonal form and a basis with respect to which the transformation matrices are diagonal.

If they are not diagonal, give reasons why not.

  1. \begin{bmatrix}0&1\\ -8&4\end{bmatrix}
  2. \begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}
  3. \begin{bmatrix}5&-6&-6\\ -1&4&2\\3&-6&-4\end{bmatrix}
  4. \begin{bmatrix}5&4&2&1\\0&1&-1&-1\\ -1&-1&3&0\\1&1&-1&2\end{bmatrix}
Solution

(a) \(A = \begin{bmatrix}0&1\\ -8&4\end{bmatrix}\)

Characteristic polynomial: \(\det(A - \lambda I) = -\lambda(4-\lambda) + 8 = \lambda^2 - 4\lambda + 8 = 0\)

\(\lambda = \frac{4 \pm \sqrt{16-32}}{2} = \frac{4 \pm 4i}{2} = 2 \pm 2i\)

Not diagonalisable over \(\mathbb{R}\) (complex eigenvalues). However, it is diagonalisable over \(\mathbb{C}\).

For \(\lambda_1 = 2 + 2i\): \((A - \lambda_1 I)\mathbf{v} = 0\): \[\begin{bmatrix}-2-2i&1\\ -8&2-2i\end{bmatrix}\mathbf{v} = 0\] Eigenvector: \(\mathbf{v}_1 = \begin{bmatrix}1\\2+2i\end{bmatrix}\)

For \(\lambda_2 = 2 - 2i\): \(\mathbf{v}_2 = \begin{bmatrix}1\\2-2i\end{bmatrix}\)

Over \(\mathbb{C}\): \(D = \begin{bmatrix}2+2i&0\\0&2-2i\end{bmatrix}\), \(P = \begin{bmatrix}1&1\\2+2i&2-2i\end{bmatrix}\)

(b) \(B = \begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}\)

Characteristic polynomial: Notice \(\text{rank}(B) = 1\), so \(\lambda = 0\) has multiplicity at least 2. \(\text{tr}(B) = 3\), so eigenvalues sum to 3.

Eigenvalues: \(\lambda_1 = 3\), \(\lambda_2 = \lambda_3 = 0\)

For \(\lambda = 3\): \((B - 3I)\mathbf{v} = 0\): \[\begin{bmatrix}-2&1&1\\1&-2&1\\1&1&-2\end{bmatrix}\mathbf{v} = 0\] The system reduces to \(x = y = z\). Eigenvector: \(\mathbf{v}_1 = \begin{bmatrix}1\\1\\1\end{bmatrix}\)

For \(\lambda = 0\): \(B\mathbf{v} = 0 \implies x + y + z = 0\) Eigenspace: \(E_0 = \text{span}\left\{\begin{bmatrix}1\\ -1\\0\end{bmatrix}, \begin{bmatrix}1\\0\\ -1\end{bmatrix}\right\}\) (dimension 2)

Diagonalisable: \(D = \begin{bmatrix}3&0&0\\0&0&0\\0&0&0\end{bmatrix}\)

Basis: \(P = \begin{bmatrix}1&1&1\\1&-1&0\\1&0&-1\end{bmatrix}\)

(c) \(C = \begin{bmatrix}5&-6&-6\\ -1&4&2\\3&-6&-4\end{bmatrix}\)

Characteristic polynomial (expanding along suitable row/column): \(\det(C - \lambda I) = -\lambda^3 + 5\lambda^2 - 8\lambda + 4 = -(\lambda - 1)(\lambda - 2)^2 = 0\)

Eigenvalues: \(\lambda_1 = 1\), \(\lambda_2 = 2\) (multiplicity 2)

For \(\lambda = 1\): \((C - I)\mathbf{v} = 0\): \[\begin{bmatrix}4&-6&-6\\ -1&3&2\\3&-6&-5\end{bmatrix}\mathbf{v} = 0\] Solving (row 2 gives \(x = 3y + 2z\); substituting into row 1 gives \(z = -3y\)): \(\mathbf{v}_1 = \begin{bmatrix}3\\ -1\\3\end{bmatrix}\)

For \(\lambda = 2\): \((C - 2I)\mathbf{v} = 0\): \[\begin{bmatrix}3&-6&-6\\ -1&2&2\\3&-6&-6\end{bmatrix}\mathbf{v} = 0\] Row reduces to: \(x - 2y - 2z = 0\) Eigenspace: \(\dim(E_2) = 2\), spanned by \(\begin{bmatrix}2\\1\\0\end{bmatrix}, \begin{bmatrix}2\\0\\1\end{bmatrix}\)

Diagonalisable: \(D = \begin{bmatrix}1&0&0\\0&2&0\\0&0&2\end{bmatrix}\)

Basis: \(P = \begin{bmatrix}3&2&2\\ -1&1&0\\3&0&1\end{bmatrix}\)

(d) \(D = \begin{bmatrix}5&4&2&1\\0&1&-1&-1\\ -1&-1&3&0\\1&1&-1&2\end{bmatrix}\)

Characteristic polynomial: \[\det(D-\lambda I)=\lambda^4-11\lambda^3+42\lambda^2-64\lambda+32 =(\lambda-4)^2(\lambda-2)(\lambda-1)\] Eigenvalues: \(\lambda=4\) (multiplicity 2), \(\lambda=2\), \(\lambda=1\)

Compute the eigenspace for the repeated eigenvalue \(\lambda=4\): \[(D-4I)=\begin{bmatrix}1&4&2&1\\0&-3&-1&-1\\ -1&-1&-1&0\\1&1&-1&-2\end{bmatrix}\sim \begin{bmatrix}1&0&0&-1\\0&1&0&0\\0&0&1&1\\0&0&0&0\end{bmatrix}\] So \(x-w=0\), \(y=0\), \(z+w=0\), hence \[E_4=\text{span}\left\{\begin{bmatrix}1\\0\\ -1\\1\end{bmatrix}\right\} \quad\text{(dimension 1)}\] But the algebraic multiplicity of \(\lambda=4\) is \(2\) while \(\dim(E_4)=1\), so there are not enough linearly independent eigenvectors to form a basis.

Not diagonalisable (geometric multiplicity \(<\) algebraic multiplicity for \(\lambda=4\)).
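
The failure in case (d) can be confirmed numerically: the rank of \(D-4I\) is 3, so the eigenspace \(E_4\) is one-dimensional. A minimal sketch:

```python
# Rank check for case (d): rank(D - 4I) = 3, so dim E_4 = 4 - 3 = 1,
# strictly less than the algebraic multiplicity 2 of the eigenvalue 4.
import numpy as np

D = np.array([[5, 4, 2, 1],
              [0, 1, -1, -1],
              [-1, -1, 3, 0],
              [1, 1, -1, 2]], dtype=float)
rank = np.linalg.matrix_rank(D - 4 * np.eye(4))
print("rank(D - 4I) =", rank, " => dim E_4 =", 4 - rank)
```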

Q8

Problem
[4 marks]

Consider the following bivariate distribution \(p(x,y)\) of two discrete random variables \(X\) and \(Y\): \[\begin{array}{c|ccc} p(x,y) & y_1 & y_2 & y_3\\\hline x_1 & 0.01 & 0.05 & 0.10\\ x_2 & 0.02 & 0.10 & 0.05\\ x_3 & 0.03 & 0.05 & 0.03\\ x_4 & 0.10 & 0.07 & 0.05\\ x_5 & 0.10 & 0.20 & 0.04 \end{array}\]

Part
[2 marks]

Compute the marginal distributions \(p(x)\) and \(p(y)\).

Solution

Marginal distribution \(p(x)\): Sum over all values of \(y\):

\begin{align*} p(x_1) &= 0.01 + 0.05 + 0.10 = 0.16\\ p(x_2) &= 0.02 + 0.10 + 0.05 = 0.17\\ p(x_3) &= 0.03 + 0.05 + 0.03 = 0.11\\ p(x_4) &= 0.10 + 0.07 + 0.05 = 0.22\\ p(x_5) &= 0.10 + 0.20 + 0.04 = 0.34 \end{align*}

Therefore: \(p(x) = (0.16, 0.17, 0.11, 0.22, 0.34)\)

Marginal distribution \(p(y)\): Sum over all values of \(x\):

\begin{align*} p(y_1) &= 0.01 + 0.02 + 0.03 + 0.10 + 0.10 = 0.26\\ p(y_2) &= 0.05 + 0.10 + 0.05 + 0.07 + 0.20 = 0.47\\ p(y_3) &= 0.10 + 0.05 + 0.03 + 0.05 + 0.04 = 0.27 \end{align*}

Therefore: \(p(y) = (0.26, 0.47, 0.27)\)

Verification: \(\sum p(x) = \sum p(y) = 1.00\)

Part
[2 marks]

Compute the conditional distributions \(p(x|Y = y_1)\) and \(p(y|X = x_3)\).

Solution

Conditional distribution \(p(x|Y = y_1)\):

Using \(p(x|y) = \frac{p(x,y)}{p(y)}\), with \(p(y_1) = 0.26\):

\begin{align*} p(x_1|y_1) &= \frac{0.01}{0.26} \approx 0.0385\\ p(x_2|y_1) &= \frac{0.02}{0.26} \approx 0.0769\\ p(x_3|y_1) &= \frac{0.03}{0.26} \approx 0.1154\\ p(x_4|y_1) &= \frac{0.10}{0.26} \approx 0.3846\\ p(x_5|y_1) &= \frac{0.10}{0.26} \approx 0.3846 \end{align*}

Or exactly: \(p(x|y_1) = \left(\frac{1}{26}, \frac{2}{26}, \frac{3}{26}, \frac{10}{26}, \frac{10}{26}\right)\)

Conditional distribution \(p(y|X = x_3)\):

Using \(p(y|x) = \frac{p(x,y)}{p(x)}\), with \(p(x_3) = 0.11\):

\begin{align*} p(y_1|x_3) &= \frac{0.03}{0.11} = \frac{3}{11} \approx 0.2727\\ p(y_2|x_3) &= \frac{0.05}{0.11} = \frac{5}{11} \approx 0.4545\\ p(y_3|x_3) &= \frac{0.03}{0.11} = \frac{3}{11} \approx 0.2727 \end{align*}

Therefore: \(p(y|x_3) = \left(\frac{3}{11}, \frac{5}{11}, \frac{3}{11}\right)\)
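
Both parts can be reproduced with a few lines of numpy, using the joint table from the problem; the sketch below is illustrative only.

```python
# Marginals and the two requested conditionals, computed from the joint table.
import numpy as np

P = np.array([[0.01, 0.05, 0.10],   # rows: x_1..x_5
              [0.02, 0.10, 0.05],   # columns: y_1..y_3
              [0.03, 0.05, 0.03],
              [0.10, 0.07, 0.05],
              [0.10, 0.20, 0.04]])

p_x = P.sum(axis=1)                 # marginal p(x)
p_y = P.sum(axis=0)                 # marginal p(y)
print("p(x) =", p_x, "  p(y) =", p_y)

print("p(x | y_1) =", P[:, 0] / p_y[0])
print("p(y | x_3) =", P[2, :] / p_x[2])
```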

Q9

Problem (Law of Iterated Expectations)
[3 marks]

Consider two random variables, \(x\), \(y\) with joint distribution \(p(x,y)\). Show that \[\mathbb{E}_X[x] = \mathbb{E}_Y[\mathbb{E}_X[x|y]]\] Here, \(\mathbb{E}_X[x|y]\) denotes the expected value of \(x\) under the conditional distribution \(p(x|y)\).

Solution

Discrete case:

Starting with the right-hand side:

\begin{align*} \mathbb{E}_Y[\mathbb{E}_X[x|y]] &= \sum_y p(y) \mathbb{E}_X[x|y]\\ &= \sum_y p(y) \sum_x x \cdot p(x|y)\\ &= \sum_y \sum_x x \cdot p(x|y) \cdot p(y)\\ &= \sum_y \sum_x x \cdot p(x,y) \qquad\text{(since }p(x,y) = p(x|y)p(y)\text{)}\\ &= \sum_x x \sum_y p(x,y)\\ &= \sum_x x \cdot p(x)\\ &= \mathbb{E}_X[x] \end{align*}

Continuous case:

\begin{align*} \mathbb{E}_Y[\mathbb{E}_X[x|y]] &= \int_{-\infty}^\infty \mathbb{E}_X[x|y] \cdot p(y) \, \mathrm{d}y\\ &= \int_{-\infty}^\infty \left(\int_{-\infty}^\infty x \cdot p(x|y) \, \mathrm{d}x\right) p(y) \, \mathrm{d}y\\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty x \cdot p(x|y) \cdot p(y) \, \mathrm{d}x \, \mathrm{d}y\\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty x \cdot p(x,y) \, \mathrm{d}x \, \mathrm{d}y\\ &= \int_{-\infty}^\infty x \left(\int_{-\infty}^\infty p(x,y) \, \mathrm{d}y\right) \mathrm{d}x\\ &= \int_{-\infty}^\infty x \cdot p(x) \, \mathrm{d}x\\ &= \mathbb{E}_X[x] \end{align*}

Therefore, \(\mathbb{E}_X[x] = \mathbb{E}_Y[\mathbb{E}_X[x|y]]\). \(\quad\square\)
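
A numeric sanity check of the identity on a randomly generated discrete joint distribution (the support sizes, values and seed below are arbitrary):

```python
# Check E[X] = E_Y[ E[X | Y] ] on a random discrete joint distribution.
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((4, 3))
P /= P.sum()                              # joint p(x, y), rows index x, columns index y
x_vals = np.arange(1, 5)                  # support of X

p_y = P.sum(axis=0)
E_X = x_vals @ P.sum(axis=1)              # direct expectation of X
E_X_given_y = (x_vals @ P) / p_y          # E[X | Y = y_j] for each j
print(E_X, (E_X_given_y * p_y).sum())     # the two numbers agree
```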

Q10

Problem
[7 marks]

Last year we qualitatively described a number of Probability Distributions, this year we shall discover more results.

To begin, find the Probability Mass/Density Functions for the following distributions:

  1. Bernoulli
  2. Binomial
  3. Geometric
  4. Poisson
  5. Exponential
  6. Gamma
  7. Normal
Solution

(a) Bernoulli: \(X \sim \text{Ber}(p)\), where \(p \in [0,1]\) \[P(X = k) = p^k (1-p)^{1-k}, \quad k \in \{0, 1\}\] Equivalently: \begin{gather*}P(X=1) = p\\ P(X=0) = 1-p\end{gather*}

(b) Binomial: \(X \sim \text{Bin}(n, p)\), where \(n \in \mathbb{N}\), \(p \in [0,1]\) \[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n\]

(c) Geometric: \(X \sim \text{Geo}(p)\), where \(p \in (0,1]\) \[P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots\] (Number of trials until first success)

Alternative parameterisation (number of failures before first success): \[P(X = k) = (1-p)^k p, \quad k = 0, 1, 2, \ldots\]

(d) Poisson: \(X \sim \text{Pois}(\lambda)\), where \(\lambda > 0\) \[P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots\]

(e) Exponential: \(X \sim \text{Exp}(\lambda)\), where \(\lambda > 0\) \[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\]

(f) Gamma: \(X \sim \text{Gamma}(\alpha, \beta)\), where \(\alpha, \beta > 0\) \[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad x > 0\] where \(\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} \, \mathrm{d}t\)

(g) Normal (Gaussian): \(X \sim \mathcal{N}(\mu, \sigma^2)\), where \(\mu \in \mathbb{R}\), \(\sigma > 0\) \[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}\]

Q11

Problem
[23 marks]
Part
[2 marks]

What is the formula for the Moment Generating Function? How is the \(k\textup{th}\) moment obtained from it?

Hint

\[\large\varphi_X (s) = \mathbb{E}\left( ? \right) = \int ? \]

Solution

\begin{align}\large\varphi_X (s) &= \mathbb{E}\left( e^{sX} \right)\\ &= \int^\infty_{-\infty} e^{sx}\; \mathrm{d}F_X(x)\\ \varphi_X^{(k)}(0) &= \mathbb{E}(X^k) \end{align}

Part
[21 marks]

Hence, or otherwise, derive the Expectation, Variance and Moment Generating Function for all 7 distributions in Q10. Show minimal working.

Solution

1. Bernoulli \(X \sim \text{Ber}(p)\):

  • \(\mathbb{E}[X] = p\)
  • \(\text{Var}(X) = p(1-p)\)
  • \(M_X(t) = pe^t + (1-p)\)

Derivation: \[\mathbb{E}[X] = 0 \cdot (1-p) + 1 \cdot p = p.\] \[\mathbb{E}[X^2] = p,\] so \[\text{Var}(X) = p - p^2 = p(1-p).\] \[M_X(t) = \mathbb{E}[e^{tX}] = (1-p)e^{0} + pe^t.\]

2. Binomial \(X \sim \text{Bin}(n,p)\):

  • \(\mathbb{E}[X] = np\)
  • \(\text{Var}(X) = np(1-p)\)
  • \(M_X(t) = (pe^t + (1-p))^n\)

Derivation: \[X = \sum_{i=1}^n X_i\] where \[X_i \sim \text{Ber}(p)\] are independent. By linearity: \[\mathbb{E}[X] = \sum \mathbb{E}[X_i] = np,\] \[\text{Var}(X) = \sum \text{Var}(X_i) = np(1-p).\] For MGF: \[M_X(t) = \prod_{i=1}^n M_{X_i}(t) = (pe^t + 1-p)^n.\]

3. Geometric \(X \sim \text{Geo}(p)\) (trials until first success):

  • \(\mathbb{E}[X] = \frac{1}{p}\)
  • \(\mathrm{Var}(X) = \frac{1-p}{p^2}\)
  • \(M_X(t) = \dfrac{pe^t}{1-(1-p)e^t}\) for \(t < -\ln(1-p)\)

Derivation: For this parameterisation, \[\mathbb{P}(X=k)=(1-p)^{k-1}p,\qquad k=1,2,3,\dots\] Let \(q:=1-p\).

MGF.

\begin{align*} M_X(t) &= \mathbb{E}[e^{tX}] = \sum_{k=1}^\infty e^{tk}\,q^{k-1}p = pe^t\sum_{k=1}^\infty (qe^t)^{k-1}\\ &= pe^t\cdot \frac{1}{1-qe^t} = \frac{pe^t}{1-(1-p)e^t}, \end{align*}

which converges when \(|qe^t|<1\), i.e. \(t<-\ln(1-p)\).

Mean and variance. Using \(M_X'(0)=\mathbb{E}[X]\) and \(M_X''(0)=\mathbb{E}[X^2]\): \[M_X(t)=\frac{pe^t}{1-qe^t}.\] Differentiate: \[M_X'(t)=\frac{pe^t}{(1-qe^t)^2},\qquad M_X''(t)=\frac{pe^t(1+qe^t)}{(1-qe^t)^3}.\] Evaluate at \(t=0\) (noting \(1-q=p\)): \[\mathbb{E}[X]=M_X'(0)=\frac{p}{(1-q)^2}=\frac{p}{p^2}=\frac{1}{p},\] \[\mathbb{E}[X^2]=M_X''(0)=\frac{p(1+q)}{(1-q)^3} =\frac{p(2-p)}{p^3}=\frac{2-p}{p^2}.\] Hence \[\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2 =\frac{2-p}{p^2}-\frac{1}{p^2} =\frac{1-p}{p^2}.\]

4. Poisson \(X \sim \text{Pois}(\lambda)\):

  • \(\mathbb{E}[X] = \lambda\)
  • \(\text{Var}(X) = \lambda\)
  • \(M_X(t) = e^{\lambda(e^t - 1)}\)

Derivation:

\begin{align*} \mathbb{E}[X] &= \sum_{k=0}^\infty k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda\\ M_X(t) &= \sum_{k=0}^\infty e^{tk} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!} = e^{\lambda(e^t-1)} \end{align*}

For variance: \(\mathbb{E}[X^2] = \mathbb{E}[X(X-1)] + \mathbb{E}[X] = \lambda^2 + \lambda\), so \(\text{Var}(X) = \lambda^2 + \lambda - \lambda^2 = \lambda\).

5. Exponential \(X \sim \text{Exp}(\lambda)\):

  • \(\mathbb{E}[X] = \frac{1}{\lambda}\)
  • \(\mathrm{Var}(X) = \frac{1}{\lambda^2}\)
  • \(M_X(t) = \dfrac{\lambda}{\lambda - t}\) for \(t < \lambda\)

Derivation: For \(X\sim\mathrm{Exp}(\lambda)\) the pdf is \(f(x)=\lambda e^{-\lambda x}\) for \(x\ge 0\).

Mean.

\begin{align*} \mathbb{E}[X] &= \int_0^\infty x\,\lambda e^{-\lambda x}\,\mathrm{d}x \quad\text{(integration by parts: }u=x,\ \mathrm{d}v=\lambda e^{-\lambda x}\mathrm{d}x\text{)}\\ &= \left[-x e^{-\lambda x}\right]_{0}^{\infty} + \int_0^\infty e^{-\lambda x}\,\mathrm{d}x = 0 + \frac{1}{\lambda}. \end{align*}

MGF.

\begin{align*} M_X(t) &= \mathbb{E}[e^{tX}] = \int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,\mathrm{d}x = \lambda \int_0^\infty e^{-(\lambda-t)x}\,\mathrm{d}x\\ &= \lambda \left[\frac{1}{\lambda-t}\right] = \frac{\lambda}{\lambda-t}, \qquad t<\lambda. \end{align*}

Variance. Using \(M_X'(0)=\mathbb{E}[X]\) and \(M_X''(0)=\mathbb{E}[X^2]\): \[M_X(t)=\frac{\lambda}{\lambda-t}=(1-\tfrac{t}{\lambda})^{-1}, \quad M_X'(t)=\frac{\lambda}{(\lambda-t)^2}, \quad M_X''(t)=\frac{2\lambda}{(\lambda-t)^3}.\] Hence \[\mathbb{E}[X^2]=M_X''(0)=\frac{2\lambda}{\lambda^3}=\frac{2}{\lambda^2},\] and therefore \[\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2 =\frac{2}{\lambda^2}-\left(\frac{1}{\lambda}\right)^2 =\frac{1}{\lambda^2}.\]

6. Gamma \(X \sim \text{Gamma}(\alpha, \beta)\):

  • \(\mathbb{E}[X] = \frac{\alpha}{\beta}\)
  • \(\mathrm{Var}(X) = \frac{\alpha}{\beta^2}\)
  • \(M_X(t) = \left(\frac{\beta}{\beta-t}\right)^\alpha\) for \(t < \beta\)

Derivation: (Shape \(\alpha>0\), rate \(\beta>0\).) The pdf is \[f(x)=\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\qquad x>0.\] Use the Gamma-function identity \(\Gamma(\alpha+1)=\alpha\Gamma(\alpha)\).

MGF.

\begin{align*} M_X(t) &=\mathbb{E}[e^{tX}] =\int_0^\infty e^{tx}\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}\,\mathrm{d}x\\ &=\frac{\beta^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha-1}e^{-(\beta-t)x}\,\mathrm{d}x =\frac{\beta^\alpha}{\Gamma(\alpha)}\cdot \frac{\Gamma(\alpha)}{(\beta-t)^\alpha}\\ &=\left(\frac{\beta}{\beta-t}\right)^\alpha,\qquad t<\beta. \end{align*}

Mean and variance. Using \(M_X'(0)=\mathbb{E}[X]\) and \(M_X''(0)=\mathbb{E}[X^2]\): \[M_X(t)=\beta^\alpha(\beta-t)^{-\alpha}.\] Differentiate: \[M_X'(t)=\alpha\,\beta^\alpha(\beta-t)^{-\alpha-1},\qquad M_X''(t)=\alpha(\alpha+1)\,\beta^\alpha(\beta-t)^{-\alpha-2}.\] Evaluate at \(t=0\): \[\mathbb{E}[X]=M_X'(0)=\alpha\,\beta^\alpha\beta^{-\alpha-1}=\frac{\alpha}{\beta},\] \[\mathbb{E}[X^2]=M_X''(0)=\alpha(\alpha+1)\,\beta^\alpha\beta^{-\alpha-2} =\frac{\alpha(\alpha+1)}{\beta^2}.\] Hence \[\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2 =\frac{\alpha(\alpha+1)}{\beta^2}-\left(\frac{\alpha}{\beta}\right)^2 =\frac{\alpha}{\beta^2}.\]

7. Normal \(X \sim \mathcal{N}(\mu, \sigma^2)\):

  • \(\mathbb{E}[X] = \mu\)
  • \(\mathrm{Var}(X) = \sigma^2\)
  • \(M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}\)

Derivation: Let \(Z\sim\mathcal{N}(0,1)\) and write \(X=\mu+\sigma Z\).

MGF of \(Z\). Complete the square: \[tz-\frac{z^2}{2} =-\frac{1}{2}\left(z^2-2tz\right) =-\frac{1}{2}(z-t)^2+\frac{t^2}{2}.\] Then

\begin{align*} M_Z(t) &=\mathbb{E}[e^{tZ}] =\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tz}e^{-z^2/2}\,\mathrm{d}z\\ &=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\!\left(-\frac{(z-t)^2}{2}\right)\mathrm{d}z\; \exp\!\left(\frac{t^2}{2}\right) = e^{t^2/2}. \end{align*}

MGF of \(X\). \[M_X(t)=\mathbb{E}\!\left[e^{t(\mu+\sigma Z)}\right] =e^{\mu t}\, \mathbb{E}[e^{(\sigma t)Z}] =e^{\mu t}M_Z(\sigma t) = e^{\mu t+\frac{1}{2}\sigma^2 t^2}.\]

Mean and variance. From \(M_X'(0)=\mathbb{E}[X]\) and \(M_X''(0)=\mathbb{E}[X^2]\): \[M_X'(t)=(\mu+\sigma^2 t)M_X(t)\ \Rightarrow\ \mathbb{E}[X]=M_X'(0)=\mu,\] \[M_X''(t)=\big(\sigma^2+(\mu+\sigma^2 t)^2\big)M_X(t)\ \Rightarrow\ \mathbb{E}[X^2]=M_X''(0)=\sigma^2+\mu^2.\] Thus \[\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2=(\sigma^2+\mu^2)-\mu^2=\sigma^2.\]
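
The tabulated results can be spot-checked by Monte Carlo; the sketch below compares sample moments and an empirical MGF against the closed forms for the Poisson and Exponential cases (parameters, sample size and the evaluation point \(t\) are arbitrary).

```python
# Monte Carlo spot-check of mean, variance and MGF for two of the distributions.
import numpy as np

rng = np.random.default_rng(1)
n, t = 1_000_000, 0.3

lam = 2.5                                   # Poisson: mean lam, var lam, MGF exp(lam*(e^t - 1))
X = rng.poisson(lam, n)
print(X.mean(), X.var(), np.exp(t * X).mean(), np.exp(lam * (np.exp(t) - 1)))

rate = 1.7                                  # Exponential: mean 1/rate, var 1/rate^2, MGF rate/(rate - t)
Y = rng.exponential(1 / rate, n)            # numpy parameterises by the scale 1/rate
print(Y.mean(), Y.var(), np.exp(t * Y).mean(), rate / (rate - t))
```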

Q12

Problem
[4 marks]

Shortest distance between an arbitrary point \(\mathbf{x}_0 \in \mathbb{R}^n\) and a hyperplane given by \(\mathbf{w}^T \mathbf{x} + b = 0\)

Solution

Let \(\mathbf{x}_0 \in \mathbb{R}^n\) be an arbitrary point, and consider the hyperplane \(H = \{\mathbf{x} : \mathbf{w}^T\mathbf{x} + b = 0\}\) where \(\mathbf{w} \neq \mathbf{0}\).

The shortest distance from \(\mathbf{x}_0\) to \(H\) is the perpendicular distance.

Derivation:

  1. The normal vector to the hyperplane is \(\mathbf{w}\).

  2. Any point on the hyperplane can be written as \(\mathbf{x}_H\) where \(\mathbf{w}^T\mathbf{x}_H + b = 0\).

  3. The projection of \(\mathbf{x}_0\) onto the hyperplane is found by moving in the direction of \(-\mathbf{w}\) (toward the hyperplane): \[ \mathbf{x}_{\text{proj}} = \mathbf{x}_0 - \alpha \mathbf{w} \] where \(\alpha\) is chosen so that \(\mathbf{x}_{\text{proj}} \in H\): \[ \mathbf{w}^T(\mathbf{x}_0 - \alpha \mathbf{w}) + b = 0 \implies \mathbf{w}^T\mathbf{x}_0 - \alpha \|\mathbf{w}\|^2 + b = 0 \] \[ \alpha = \frac{\mathbf{w}^T\mathbf{x}_0 + b}{\|\mathbf{w}\|^2} \]

  4. The distance is: \[ d(\mathbf{x}_0, H) = \|\mathbf{x}_0 - \mathbf{x}_{\text{proj}}\| = \|\alpha \mathbf{w}\| = |\alpha| \|\mathbf{w}\| = \frac{|\mathbf{w}^T\mathbf{x}_0 + b|}{\|\mathbf{w}\|} \]

Answer: \[\boxed{d(\mathbf{x}_0, H) = \frac{|\mathbf{w}^T\mathbf{x}_0 + b|}{\|\mathbf{w}\|}}\]
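
A minimal implementation of this formula, with an example plane chosen purely for illustration:

```python
# Shortest distance from a point to the hyperplane {x : w.T x + b = 0}.
import numpy as np

def distance_to_hyperplane(x0, w, b):
    """|w.T x0 + b| / ||w||, assuming w is non-zero."""
    return abs(w @ x0 + b) / np.linalg.norm(w)

# Example: the plane x + y + z = 3 (w = (1,1,1), b = -3) and the origin.
w, b = np.array([1.0, 1.0, 1.0]), -3.0
print(distance_to_hyperplane(np.zeros(3), w, b))   # 3/sqrt(3) ≈ 1.732
```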

Q13

Problem
[4 marks]

Consider minimising the strictly convex quadratic function

\[q(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{G}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c\]

on \(\mathbb{R}^n\), where \(n > 1\), \(\mathbf{d}\) is a constant \(n \times 1\) vector, \(\mathbf{G}\) is a constant \(n \times n\) symmetric positive definite matrix and \(c\) is a scalar. Let \(\mathbf{x}^* \in \mathbb{R}^n\) be the global minimiser for \(q(\mathbf{x})\), \(\mathbf{x}^{(1)} \in \mathbb{R}^n\) and \(\mathbf{x}^{(1)} \neq \mathbf{x}^*\).

Consider applying Newton’s method to \(q(\mathbf{x})\) starting at \(\mathbf{x}^{(1)}\)

Part
[2 marks]

Write down the Newton direction at \(\mathbf{x}^{(1)}\) and show that it is a descent direction.

Solution

For the quadratic \(q(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{G}\mathbf{x} + \mathbf{d}^T\mathbf{x} + c\):

Gradient: \(\nabla q(\mathbf{x}) = \mathbf{G}\mathbf{x} + \mathbf{d}\)

Hessian: \(\nabla^2 q(\mathbf{x}) = \mathbf{G}\)

The Newton direction at \(\mathbf{x}^{(1)}\) is: \[\mathbf{p}^{(1)} = -[\nabla^2 q(\mathbf{x}^{(1)})]^{-1} \nabla q(\mathbf{x}^{(1)}) = -\mathbf{G}^{-1}(\mathbf{G}\mathbf{x}^{(1)} + \mathbf{d})\]

Showing it’s a descent direction:

A direction \(\mathbf{p}\) is a descent direction if \(\nabla q(\mathbf{x}^{(1)})^T \mathbf{p} < 0\).

\begin{align*} \nabla q(\mathbf{x}^{(1)})^T \mathbf{p}^{(1)} &= (\mathbf{G}\mathbf{x}^{(1)} + \mathbf{d})^T \left[-\mathbf{G}^{-1}(\mathbf{G}\mathbf{x}^{(1)} + \mathbf{d})\right]\\ &= -(\mathbf{G}\mathbf{x}^{(1)} + \mathbf{d})^T \mathbf{G}^{-1} (\mathbf{G}\mathbf{x}^{(1)} + \mathbf{d})\\ &= -\|\nabla q(\mathbf{x}^{(1)})\|_{\mathbf{G}^{-1}}^2 \end{align*}

Since \(\mathbf{G}\) is positive definite, \(\mathbf{G}^{-1}\) is also positive definite. Therefore, the quadratic form \(\mathbf{v}^T\mathbf{G}^{-1}\mathbf{v} > 0\) for all \(\mathbf{v} \neq \mathbf{0}\).

Since \(\mathbf{x}^{(1)} \neq \mathbf{x}^*\), we have \(\nabla q(\mathbf{x}^{(1)}) \neq \mathbf{0}\) (as \(\mathbf{x}^*\) is the unique stationary point).

Thus: \(\nabla q(\mathbf{x}^{(1)})^T \mathbf{p}^{(1)} < 0\), confirming \(\mathbf{p}^{(1)}\) is a descent direction.

Part
[2 marks]

How many iteration(s) will Newton’s method take to reach the minimiser \(\mathbf{x}^*\) of \(q(\mathbf{x})\). Give reasons for your answer.

Solution

Exactly 1 iteration.

Newton’s method update is: \[\mathbf{x}^{(2)} = \mathbf{x}^{(1)} + \mathbf{p}^{(1)} = \mathbf{x}^{(1)} - \mathbf{G}^{-1}(\mathbf{G}\mathbf{x}^{(1)} + \mathbf{d})\]

The minimiser \(\mathbf{x}^*\) satisfies the first-order optimality condition: \[\nabla q(\mathbf{x}^*) = \mathbf{0} \implies \mathbf{G}\mathbf{x}^* + \mathbf{d} = \mathbf{0} \implies \mathbf{x}^* = -\mathbf{G}^{-1}\mathbf{d}\]

Substituting into the Newton update:

\begin{align*} \mathbf{x}^{(2)} &= \mathbf{x}^{(1)} - \mathbf{G}^{-1}\mathbf{G}\mathbf{x}^{(1)} - \mathbf{G}^{-1}\mathbf{d}\\ &= \mathbf{x}^{(1)} - \mathbf{x}^{(1)} - \mathbf{G}^{-1}\mathbf{d}\\ &= -\mathbf{G}^{-1}\mathbf{d}\\ &= \mathbf{x}^* \end{align*}

Therefore, Newton’s method reaches the minimiser in exactly 1 iteration for any quadratic function, regardless of the starting point \(\mathbf{x}^{(1)}\).

This is because the Hessian of a quadratic is constant, and Newton’s method is exact for quadratic functions—it finds the point where the quadratic approximation (which is the function itself) is minimised.
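
The one-step convergence is easy to demonstrate numerically; in the sketch below \(\mathbf{G}\), \(\mathbf{d}\) and the starting point are arbitrary choices (with \(\mathbf{G}\) symmetric positive definite).

```python
# One Newton step on a strictly convex quadratic lands exactly on the minimiser.
import numpy as np

G = np.array([[4.0, 1.0], [1.0, 3.0]])      # symmetric positive definite
d = np.array([-1.0, 2.0])
x1 = np.array([10.0, -7.0])                 # arbitrary starting point

grad = G @ x1 + d
x2 = x1 - np.linalg.solve(G, grad)          # one Newton step
x_star = -np.linalg.solve(G, d)             # analytic minimiser
print(x2, x_star, np.allclose(x2, x_star))  # identical up to rounding
```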

Q14

Problem
[15 marks]

Consider the following inequality constrained optimisation problem

\begin{align} \text{(IP)} \quad \underset{\mathbf{x}\in\mathbb{R}^n}{\text{minimise}} \quad & \mathbf{a}^T\mathbf{x}\notag \\ \text{subject to} \quad & \mathbf{x}^TQ\mathbf{x} - 1 \leq 0, \label{eq:ip} \end{align}

where \(n\ge 1\), \(Q\) is a symmetric and positive definite \(n\times n\) matrix, \(\mathbf{a}\in\mathbb{R}^n, \mathbf{a} \neq \mathbf{0}\) and the constraint function \(\mathbf{c}(\mathbf{x}) = \mathbf{x}^TQ\mathbf{x} -1\).

Part
[2 marks]

Write down the gradient \(\nabla c(\mathbf{x})\) and the Hessian \(\nabla^2 c(\mathbf{x})\).

Solution

For \(c(\mathbf{x}) = \mathbf{x}^TQ\mathbf{x} - 1\):

Gradient: \[\nabla c(\mathbf{x}) = 2Q\mathbf{x}\]

(Since \(Q\) is symmetric: \(\nabla(\mathbf{x}^TQ\mathbf{x}) = (Q + Q^T)\mathbf{x} = 2Q\mathbf{x}\))

Hessian: \[\nabla^2 c(\mathbf{x}) = 2Q\]

Part
[2 marks]

Show that \(c(\mathbf{x})\) is a convex function.

Solution

A twice-differentiable function \(c: \mathbb{R}^n \to \mathbb{R}\) is convex if and only if its Hessian is positive semidefinite everywhere.

From part (a): \(\nabla^2 c(\mathbf{x}) = 2Q\)

Since \(Q\) is given to be positive definite, we have for all \(\mathbf{v} \neq \mathbf{0}\): \[\mathbf{v}^T Q \mathbf{v} > 0\]

Therefore: \[\mathbf{v}^T (\nabla^2 c(\mathbf{x})) \mathbf{v} = 2\mathbf{v}^T Q \mathbf{v} > 0 \quad \forall \mathbf{v} \neq \mathbf{0}\]

Thus \(\nabla^2 c(\mathbf{x})\) is positive definite (hence positive semidefinite), which implies \(c(\mathbf{x})\) is strictly convex.

Part
[2 marks]

Show that the problem (IP) is a convex optimisation problem.

Solution

A problem is a convex optimisation problem if:

  1. The objective function is convex
  2. The feasible region is convex (i.e., all inequality constraints \(g_i(\mathbf{x}) \leq 0\) have convex \(g_i\))

1. Objective function: \(f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}\) is linear, hence convex.

2. Constraint: \(c(\mathbf{x}) = \mathbf{x}^TQ\mathbf{x} - 1 \leq 0\)

From part (b), \(c(\mathbf{x})\) is (strictly) convex.

The feasible region is \(\Omega = \{\mathbf{x} : \mathbf{x}^TQ\mathbf{x} \leq 1\}\), which is a convex set (it’s a sublevel set of a convex function).

Alternatively: for \(\mathbf{x}_1, \mathbf{x}_2 \in \Omega\) and \(\lambda \in [0,1]\):

\begin{align*} c(\lambda \mathbf{x}_1 + (1-\lambda)\mathbf{x}_2) &\leq \lambda c(\mathbf{x}_1) + (1-\lambda) c(\mathbf{x}_2) \quad\text{(by convexity of }c\text{)}\\ &\leq \lambda \cdot 0 + (1-\lambda) \cdot 0 = 0 \end{align*}

So \(\lambda \mathbf{x}_1 + (1-\lambda)\mathbf{x}_2 \in \Omega\), confirming \(\Omega\) is convex.

Therefore, (IP) is a convex optimisation problem.

Part
[3 marks]

Show that \(\large \mathbf{x}^* = \frac{-Q^{-1}\mathbf{a}}{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}\) is a constrained stationary point for the problem (IP).

Solution

A point \(\mathbf{x}^*\) is a constrained stationary point if it satisfies the KKT conditions:

  1. Feasibility: \(c(\mathbf{x}^*) \leq 0\)
  2. Stationarity: \(\nabla f(\mathbf{x}^*) + \mu \nabla c(\mathbf{x}^*) = \mathbf{0}\) for some \(\mu\)
  3. Complementarity: \(\mu c(\mathbf{x}^*) = 0\)
  4. Dual feasibility: \(\mu \geq 0\)

Let’s verify:

1. Feasibility:

\begin{align*} c(\mathbf{x}^*) &= (\mathbf{x}^*)^T Q \mathbf{x}^* - 1\\ &= \frac{(-Q^{-1}\mathbf{a})^T Q (-Q^{-1}\mathbf{a})}{\mathbf{a}^TQ^{-1}\mathbf{a}} - 1\\ &= \frac{\mathbf{a}^T Q^{-1} Q Q^{-1} \mathbf{a}}{\mathbf{a}^TQ^{-1}\mathbf{a}} - 1\\ &= \frac{\mathbf{a}^T Q^{-1} \mathbf{a}}{\mathbf{a}^TQ^{-1}\mathbf{a}} - 1 = 0 \end{align*}

So \(\mathbf{x}^*\) lies on the boundary of the constraint (constraint is active).

2. Stationarity: We need \(\mathbf{a} + \mu \cdot 2Q\mathbf{x}^* = \mathbf{0}\) for some \(\mu > 0\).

\begin{align*} \mathbf{a} + 2\mu Q\mathbf{x}^* &= \mathbf{a} + 2\mu Q \cdot \frac{-Q^{-1}\mathbf{a}}{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}\\ &= \mathbf{a} - \frac{2\mu \mathbf{a}}{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}} \end{align*}

Setting this to \(\mathbf{0}\): \[\mathbf{a}\left(1 - \frac{2\mu}{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}\right) = \mathbf{0}\]

Since \(\mathbf{a} \neq \mathbf{0}\), we need: \[\mu = \frac{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}{2} > 0\]

3 & 4: With \(\mu > 0\) and \(c(\mathbf{x}^*) = 0\), complementarity and dual feasibility are satisfied.

Therefore, \(\mathbf{x}^*\) is a constrained stationary point.

Part
[2 marks]

Determine whether \(\large \mathbf{x}^* = \frac{-Q^{-1}\mathbf{a}}{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}\) is a local minimiser, global minimiser or neither for the problem (IP).

Solution

Answer: \(\mathbf{x}^*\) is a global minimiser.

Proof:

Since (IP) is a convex optimisation problem (from part c) with:

  • Convex objective function \(f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}\)
  • Convex feasible region \(\Omega\)

Any local minimiser is also a global minimiser for convex problems.

We showed in part (d) that \(\mathbf{x}^*\) satisfies the KKT conditions. For convex problems, the KKT conditions are sufficient for global optimality.

Alternative geometric argument:

The problem minimises a linear function over an ellipsoid. The minimum occurs where the level set \(\{\mathbf{x} : \mathbf{a}^T\mathbf{x} = t\}\) (a hyperplane orthogonal to \(\mathbf{a}\)) is tangent to the ellipsoid \(\{\mathbf{x} : \mathbf{x}^TQ\mathbf{x} \leq 1\}\).

This tangency occurs at: \[\mathbf{x}^* = \frac{-Q^{-1}\mathbf{a}}{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}\]

The direction \(-Q^{-1}\mathbf{a}\) points from the origin toward decreasing values of \(\mathbf{a}^T\mathbf{x}\), and we scale to hit the boundary of the ellipsoid.

Verification: The objective value at \(\mathbf{x}^*\) is: \[f(\mathbf{x}^*) = \mathbf{a}^T\mathbf{x}^* = -\frac{\mathbf{a}^TQ^{-1}\mathbf{a}}{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}} = -\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}\]

This is the minimum value over the feasible region.

Therefore, \(\mathbf{x}^*\) is a global minimiser.
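
As a numerical illustration (with a randomly generated positive definite \(Q\) and vector \(\mathbf{a}\)), the sketch below confirms that \(\mathbf{x}^*\) is feasible, attains \(-\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}\), and is not beaten by sampled feasible points:

```python
# Numerical check of the claimed minimiser of (IP) on a random instance.
import numpy as np

rng = np.random.default_rng(2)
A_ = rng.standard_normal((3, 3))
Q = A_ @ A_.T + 3 * np.eye(3)              # symmetric positive definite
a = rng.standard_normal(3)

Qinv_a = np.linalg.solve(Q, a)
x_star = -Qinv_a / np.sqrt(a @ Qinv_a)
print("c(x*) =", x_star @ Q @ x_star - 1)                      # ≈ 0: constraint active
print("f(x*) =", a @ x_star, " vs ", -np.sqrt(a @ Qinv_a))     # equal

# Random points on the boundary of the ellipsoid never do better.
Z = rng.standard_normal((10_000, 3))
Z = Z / np.sqrt(np.einsum("ij,jk,ik->i", Z, Q, Z))[:, None]    # scale onto x.T Q x = 1
print("min over samples:", (Z @ a).min(), ">=", a @ x_star)
```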

Part
[2 marks]

Write down the Wolfe dual problem for (IP).

Solution

The Wolfe dual for the problem

\begin{align*} \min_{\mathbf{x}} \quad & \mathbf{a}^T\mathbf{x}\\ \text{s.t.} \quad & \mathbf{x}^TQ\mathbf{x} - 1 \leq 0 \end{align*}

is constructed from the Lagrangian: \[L(\mathbf{x}, \mu) = \mathbf{a}^T\mathbf{x} + \mu(\mathbf{x}^TQ\mathbf{x} - 1)\]

The Wolfe dual is:

\begin{align*} \text{(WD)} \quad \max_{\mathbf{x}, \mu} \quad & L(\mathbf{x}, \mu) = \mathbf{a}^T\mathbf{x} + \mu(\mathbf{x}^TQ\mathbf{x} - 1)\\ \text{s.t.} \quad & \nabla_{\mathbf{x}} L(\mathbf{x}, \mu) = \mathbf{a} + 2\mu Q\mathbf{x} = \mathbf{0}\\ & \mu \geq 0 \end{align*}

Equivalently, eliminating \(\mathbf{x}\) using the stationarity condition \(\mathbf{x} = -\frac{1}{2\mu}Q^{-1}\mathbf{a}\):

\begin{align*} \text{(WD)} \quad \max_{\mu \geq 0} \quad & -\frac{\mathbf{a}^TQ^{-1}\mathbf{a}}{4\mu} - \mu \end{align*}

Part
[2 marks]

What is the optimal objective value of the Wolfe dual problem in part f)? Give reasons for your answer. You do not need to solve the Wolfe dual problem.

Solution

The optimal objective value of the Wolfe dual is \(-\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}\).

For convex optimisation problems with differentiable objective and constraints, strong duality holds under Slater’s condition, which is satisfied here: the origin is strictly feasible, since \(c(\mathbf{0}) = 0 - 1 = -1 < 0\).

By strong duality: \[\text{Primal optimal value} = \text{Dual optimal value}\]

From part (e), we found that the primal optimal value is: \[f(\mathbf{x}^*) = \mathbf{a}^T\mathbf{x}^* = -\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}\]

Therefore, the Wolfe dual optimal value is also: \[\boxed{-\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}\]

Verification: At the optimal \(\mu^* = \frac{\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}}}{2}\) from part (d), the dual objective is:

\begin{align*} L(\mathbf{x}^*, \mu^*) &= \mathbf{a}^T\mathbf{x}^* + \mu^* \cdot 0\\ &= -\sqrt{\mathbf{a}^TQ^{-1}\mathbf{a}} \end{align*}

confirming strong duality with zero duality gap.

Q15

Problem
[15 marks]

Consider the following equality constrained optimisation problem

\begin{align*} \text{(EP)} \quad \min_{\mathbf{x}\in\mathbb{R}^n} \quad & \mathbf{a}^T\mathbf{x} \\ \text{s.t.} \quad & \mathbf{x}^T\mathbf{x} - 1 = 0, \end{align*}

where \(n \geq 1\), \(\mathbf{a} \in \mathbb{R}^n\), \(\mathbf{a} \neq \mathbf{0}\) and \(\|\mathbf{a}\|_2 = \sqrt{\mathbf{a}^T\mathbf{a}}\).

Let \(\mathbf{z}^* = \frac{-\mathbf{a}}{\|\mathbf{a}\|_2}\).

Part
[2 marks]

Show that global minima of (EP) must exist.

Solution

We use the Extreme Value Theorem: A continuous function on a compact set attains its minimum.

1. Objective is continuous: \(f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}\) is linear, hence continuous.

2. Feasible region is compact: The constraint set is \(\Omega = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x}^T\mathbf{x} = 1\}\), which is the unit sphere \(S^{n-1}\).

The unit sphere is:

  • Closed: It’s the preimage of the closed set \(\{1\}\) under the continuous function \(\mathbf{x} \mapsto \|\mathbf{x}\|^2\)
  • Bounded: \(\|\mathbf{x}\| = 1\) for all \(\mathbf{x} \in \Omega\)

By the Heine-Borel theorem, a subset of \(\mathbb{R}^n\) is compact if and only if it is closed and bounded. Therefore \(\Omega\) is compact.

Since \(f\) is continuous and \(\Omega\) is compact and non-empty, by the Extreme Value Theorem, \(f\) attains its minimum on \(\Omega\).

Therefore, global minima of (EP) must exist.

Part
[2 marks]

Show that \(\mathbf{z}^*\) is a regular feasible point for (EP).

Solution

A point is regular (satisfies LICQ - Linear Independence Constraint Qualification) if the gradients of the active constraints are linearly independent.

For (EP), the constraint is \(h(\mathbf{x}) = \mathbf{x}^T\mathbf{x} - 1 = 0\).

Gradient: \[\nabla h(\mathbf{x}) = 2\mathbf{x}\]

At \(\mathbf{z}^* = \frac{-\mathbf{a}}{\|\mathbf{a}\|_2}\): \[\nabla h(\mathbf{z}^*) = 2\mathbf{z}^* = \frac{-2\mathbf{a}}{\|\mathbf{a}\|_2}\]

Since \(\mathbf{a} \neq \mathbf{0}\), we have \(\nabla h(\mathbf{z}^*) \neq \mathbf{0}\).

A single non-zero vector is linearly independent, so the LICQ condition is satisfied.

Verification of feasibility: \[(\mathbf{z}^*)^T\mathbf{z}^* = \frac{\mathbf{a}^T\mathbf{a}}{\|\mathbf{a}\|_2^2} = \frac{\|\mathbf{a}\|_2^2}{\|\mathbf{a}\|_2^2} = 1\]

Therefore, \(\mathbf{z}^*\) is a regular feasible point for (EP).

Part
[2 marks]

Show rigorously that the feasible region \(\Omega = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x}^T\mathbf{x} - 1 = 0\}\) of (EP) is not a convex set.

Solution

Proof by counterexample:

Consider \(n = 2\). Let \(\mathbf{x}_1 = \begin{bmatrix}1\\0\end{bmatrix}\) and \(\mathbf{x}_2 = \begin{bmatrix}-1\\0\end{bmatrix}\).

Both points are in \(\Omega\): \[\mathbf{x}_1^T\mathbf{x}_1 = 1, \quad \mathbf{x}_2^T\mathbf{x}_2 = 1\]

For a convex set, the line segment between any two points must lie entirely in the set. Consider the midpoint with \(\lambda = \frac{1}{2}\): \[\mathbf{x}_{\text{mid}} = \frac{1}{2}\mathbf{x}_1 + \frac{1}{2}\mathbf{x}_2 = \frac{1}{2}\begin{bmatrix}1\\0\end{bmatrix} + \frac{1}{2}\begin{bmatrix}-1\\0\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}\]

Check if \(\mathbf{x}_{\text{mid}} \in \Omega\): \[\mathbf{x}_{\text{mid}}^T\mathbf{x}_{\text{mid}} = 0 \neq 1\]

Therefore \(\mathbf{x}_{\text{mid}} \notin \Omega\).

Since we found \(\mathbf{x}_1, \mathbf{x}_2 \in \Omega\) but their convex combination \(\frac{1}{2}\mathbf{x}_1 + \frac{1}{2}\mathbf{x}_2 \notin \Omega\), the set \(\Omega\) is not convex.

General argument: For any \(n \geq 1\), the unit sphere \(S^{n-1}\) is not convex because it does not contain the line segments between antipodal points (or any pair of distinct points on the sphere).

Part
[3 marks]

Verify that \(\mathbf{z}^*\) is a constrained stationary point for (EP).

Solution

For an equality-constrained problem, a point \(\mathbf{z}^*\) is a constrained stationary point if there exists a Lagrange multiplier \(\lambda\) such that:

Lagrange condition: \[\nabla f(\mathbf{z}^*) + \lambda \nabla h(\mathbf{z}^*) = \mathbf{0}\]

where \(f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}\) and \(h(\mathbf{x}) = \mathbf{x}^T\mathbf{x} - 1\).

We have:

  • \(\nabla f(\mathbf{x}) = \mathbf{a}\)
  • \(\nabla h(\mathbf{x}) = 2\mathbf{x}\)

At \(\mathbf{z}^* = \frac{-\mathbf{a}}{\|\mathbf{a}\|_2}\):

\[\nabla h(\mathbf{z}^*) = 2\mathbf{z}^* = \frac{-2\mathbf{a}}{\|\mathbf{a}\|_2}\]

The Lagrange condition becomes: \[\mathbf{a} + \lambda \cdot \frac{-2\mathbf{a}}{\|\mathbf{a}\|_2} = \mathbf{0}\]

\[\mathbf{a}\left(1 - \frac{2\lambda}{\|\mathbf{a}\|_2}\right) = \mathbf{0}\]

Since \(\mathbf{a} \neq \mathbf{0}\), we need: \[1 - \frac{2\lambda}{\|\mathbf{a}\|_2} = 0 \implies \lambda = \frac{\|\mathbf{a}\|_2}{2}\]

With this value of \(\lambda\), the Lagrange condition is satisfied.

Therefore, \(\mathbf{z}^*\) is a constrained stationary point with Lagrange multiplier \(\lambda^* = \frac{\|\mathbf{a}\|_2}{2}\).

Part
[3 marks]

Using the second-order sufficient optimality conditions, show that \(\mathbf{z}^*\) is a strict local minimiser for (EP).

Solution

The second-order sufficient condition states: If \(\mathbf{z}^*\) is a regular stationary point and \[\mathbf{d}^T \nabla^2_{xx} \mathcal{L}(\mathbf{z}^*, \lambda^*) \mathbf{d} > 0\] for all \(\mathbf{d} \neq \mathbf{0}\) satisfying \(\nabla h(\mathbf{z}^*)^T \mathbf{d} = 0\) (tangent to the constraint), then \(\mathbf{z}^*\) is a strict local minimiser.

The Lagrangian is: \[\mathcal{L}(\mathbf{x}, \lambda) = \mathbf{a}^T\mathbf{x} + \lambda(\mathbf{x}^T\mathbf{x} - 1)\]

The Hessian with respect to \(\mathbf{x}\): \[\nabla^2_{xx} \mathcal{L}(\mathbf{x}, \lambda) = 2\lambda I\]

At \(\mathbf{z}^*\) with \(\lambda^* = \frac{\|\mathbf{a}\|_2}{2} > 0\) (since \(\mathbf{a} \neq \mathbf{0}\)): \[\nabla^2_{xx} \mathcal{L}(\mathbf{z}^*, \lambda^*) = 2 \cdot \frac{\|\mathbf{a}\|_2}{2} I = \|\mathbf{a}\|_2 I\]

The tangent space at \(\mathbf{z}^*\) is: \[T_{\mathbf{z}^*} = \{\mathbf{d} : \nabla h(\mathbf{z}^*)^T \mathbf{d} = 0\} = \{\mathbf{d} : 2(\mathbf{z}^*)^T \mathbf{d} = 0\} = \{\mathbf{d} : \mathbf{d} \perp \mathbf{z}^*\}\]

For any \(\mathbf{d} \in T_{\mathbf{z}^*}\) with \(\mathbf{d} \neq \mathbf{0}\): \[\mathbf{d}^T \nabla^2_{xx} \mathcal{L}(\mathbf{z}^*, \lambda^*) \mathbf{d} = \mathbf{d}^T (\|\mathbf{a}\|_2 I) \mathbf{d} = \|\mathbf{a}\|_2 \|\mathbf{d}\|^2 > 0\]

Since the second-order sufficient condition is satisfied, \(\mathbf{z}^*\) is a strict local minimiser for (EP).

Part
[3 marks]

Show that \(\mathbf{z}^*\) is a global minimiser for (EP).

Solution

We’ll show that \(f(\mathbf{z}^*) \leq f(\mathbf{x})\) for all feasible \(\mathbf{x}\).

For any \(\mathbf{x} \in \Omega\) (i.e., \(\|\mathbf{x}\| = 1\)), by the Cauchy-Schwarz inequality: \[\mathbf{a}^T\mathbf{x} \geq -|\mathbf{a}^T\mathbf{x}| \geq -\|\mathbf{a}\|_2 \|\mathbf{x}\|_2 = -\|\mathbf{a}\|_2\]

Equality holds when \(\mathbf{x}\) and \(\mathbf{a}\) point in opposite directions: \[\mathbf{x} = \frac{-\mathbf{a}}{\|\mathbf{a}\|_2} = \mathbf{z}^*\]

At \(\mathbf{z}^*\): \[f(\mathbf{z}^*) = \mathbf{a}^T\mathbf{z}^* = \mathbf{a}^T \cdot \frac{-\mathbf{a}}{\|\mathbf{a}\|_2} = -\frac{\|\mathbf{a}\|_2^2}{\|\mathbf{a}\|_2} = -\|\mathbf{a}\|_2\]

Therefore: \[f(\mathbf{z}^*) = -\|\mathbf{a}\|_2 \leq \mathbf{a}^T\mathbf{x} = f(\mathbf{x}) \quad \forall \mathbf{x} \in \Omega\]

This shows that \(\mathbf{z}^*\) achieves the minimum value of the objective function over the feasible region.

Therefore, \(\mathbf{z}^*\) is a global minimiser for (EP).

Note: The maximum is achieved at \(-\mathbf{z}^* = \frac{\mathbf{a}}{\|\mathbf{a}\|_2}\) with value \(\|\mathbf{a}\|_2\).

Q16

Problem
[6 marks]
Part
[2 marks]

Prove that

Proposition

\[|a| \leq b \iff -b \leq a \leq b\]

Proof
Solution

\((\Rightarrow)\) Assume \(|a| \leq b\).

By definition of absolute value: \[|a| = \begin{cases} a, & \text{if } a \geq 0\\ -a, & \text{if } a < 0 \end{cases}\]

Case 1: If \(a \geq 0\), then \(|a| = a\), so \(a \leq b\). Also, since \(a \geq 0\), we have \(a \geq -b\) (as \(-b \leq 0 \leq a\) when \(b \geq 0\)). Thus \(-b \leq a \leq b\).

Case 2: If \(a < 0\), then \(|a| = -a\), so \(-a \leq b\), which gives \(a \geq -b\). Also, \(a < 0 \leq b\) (assuming \(b \geq 0\), which must hold since \(|a| \geq 0\)). Thus \(-b \leq a < 0 \leq b\).

In both cases, \(-b \leq a \leq b\).

\((\Leftarrow)\) Assume \(-b \leq a \leq b\).

Case 1: If \(a \geq 0\), then \(|a| = a \leq b\) by assumption.

Case 2: If \(a < 0\), then \(|a| = -a\). From \(-b \leq a\), we get \(-a \leq b\), so \(|a| = -a \leq b\).

In both cases, \(|a| \leq b\). \(\quad\square\)

Part
[2 marks]

Prove that

Proposition

\[\forall a, b \in \mathbb{R}, \quad |a|\cdot |b| = |a\cdot b |\]

Proof
Solution

We consider four cases based on the signs of \(a\) and \(b\):

Case 1: \(a \geq 0\) and \(b \geq 0\)

Then \(ab \geq 0\), so: \[|a| \cdot |b| = a \cdot b = |ab|\]

Case 2: \(a \geq 0\) and \(b < 0\)

Then \(ab \leq 0\), so: \[|a| \cdot |b| = a \cdot (-b) = -ab = |ab|\]

Case 3: \(a < 0\) and \(b \geq 0\)

Then \(ab \leq 0\), so: \[|a| \cdot |b| = (-a) \cdot b = -ab = |ab|\]

Case 4: \(a < 0\) and \(b < 0\)

Then \(ab > 0\), so: \[|a| \cdot |b| = (-a) \cdot (-b) = ab = |ab|\]

In all cases, \(|a| \cdot |b| = |ab|\). \(\quad\square\)

Alternative proof: Note that \(|a| = \sqrt{a^2}\) and \(|b| = \sqrt{b^2}\). Then: \[|a| \cdot |b| = \sqrt{a^2} \cdot \sqrt{b^2} = \sqrt{a^2 b^2} = \sqrt{(ab)^2} = |ab|\]

Part
[2 marks]

Complete the following

Proposition

Two real numbers, \(a\) and \(b\) are equal if and only if, for every \(\epsilon > 0\), \(|a-b| <\epsilon\).

Proof
Solution

\((\Rightarrow)\) Assume \(a = b\).

Then \(|a - b| = |0| = 0 < \epsilon\) for all \(\epsilon > 0\).

\((\Leftarrow)\) Assume that for every \(\epsilon > 0\), \(|a - b| < \epsilon\).

Suppose, for contradiction, that \(a \neq b\). Then \(|a - b| = d > 0\) for some \(d \in \mathbb{R}\).

Choose \(\epsilon_0 =\frac{d}{2} > 0\). By assumption, \(|a - b| < \epsilon_0 = \frac{d}{2}\).

But this gives \(d < \frac{d}{2}\), which implies \(2d < d\), hence \(d < 0\). This contradicts \(d > 0\).

Therefore, \(a = b\). \(\quad\square\)

Interpretation:

This states that two real numbers are equal if and only if the distance between them is arbitrarily small. This is a fundamental characterisation of equality in terms of the metric structure of \(\mathbb{R}\).

Q17

Problem
[3 marks]

Complete the following

Proposition

Let \(\mathcal{A}\subseteq \mathbb{R}\) and \(m\in \mathcal{A}\) be the minimum of \(\mathcal{A}\), then \(\inf \mathcal{A} = m\).

Proof
Solution

Recall that \(m\) is the minimum of \(\mathcal{A}\) if:

  1. \(m \in \mathcal{A}\)
  2. \(m \leq a\) for all \(a \in \mathcal{A}\)

Recall that \(\inf \mathcal{A}\) (the infimum or greatest lower bound) satisfies:

  1. \(\inf \mathcal{A} \leq a\) for all \(a \in \mathcal{A}\) (it is a lower bound)
  2. If \(\ell\) is any lower bound of \(\mathcal{A}\), then \(\ell \leq \inf \mathcal{A}\) (it is the greatest lower bound)

We must show \(\inf \mathcal{A} = m\):

Part 1: Show \(m\) is a lower bound of \(\mathcal{A}\).

Since \(m\) is the minimum, \(m \leq a\) for all \(a \in \mathcal{A}\). Thus \(m\) is a lower bound.

Part 2: Show \(m\) is the greatest lower bound.

Let \(\ell\) be any lower bound of \(\mathcal{A}\). Then \(\ell \leq a\) for all \(a \in \mathcal{A}\).

In particular, since \(m \in \mathcal{A}\), we have \(\ell \leq m\).

Since \(m\) is a lower bound and every lower bound \(\ell\) satisfies \(\ell \leq m\), we conclude that \(m = \inf \mathcal{A}\). \(\quad\square\)

Note: The minimum is the infimum when it exists and belongs to the set. However, not all sets have a minimum (e.g., \((0,1)\) has no minimum), but every non-empty bounded-below set has an infimum (by the completeness of \(\mathbb{R}\)).

Q18

Problem
[7 marks]
Part
[4 marks]

State and prove the Cauchy-Schwarz inequality in \(\mathbb{R}^n\).

Proposition
Solution

Cauchy-Schwarz Inequality: For vectors \(\mathbf{x}, \mathbf{y} \in \mathbb{R}^n\): \[|\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\| \|\mathbf{y}\|\]

Equivalently, for real numbers: \[\left|\sum_{i=1}^n x_i y_i\right| \leq \sqrt{\sum_{i=1}^n x_i^2} \sqrt{\sum_{i=1}^n y_i^2}\]

Proof
Solution

Case 1: If \(\mathbf{y} = \mathbf{0}\), then both sides equal \(0\), so the inequality holds.

Case 2: Assume \(\mathbf{y} \neq \mathbf{0}\). For any \(t \in \mathbb{R}\), consider: \[0 \leq \|\mathbf{x} - t\mathbf{y}\|^2 = (\mathbf{x} - t\mathbf{y})^T(\mathbf{x} - t\mathbf{y})\]

Expanding:

\begin{align*} 0 &\leq \mathbf{x}^T\mathbf{x} - 2t\mathbf{x}^T\mathbf{y} + t^2\mathbf{y}^T\mathbf{y}\\ &= \|\mathbf{x}\|^2 - 2t(\mathbf{x}^T\mathbf{y}) + t^2\|\mathbf{y}\|^2 \end{align*}

This is a quadratic in \(t\) that is non-negative for all \(t\). Choose \(t = \frac{\mathbf{x}^T\mathbf{y}}{\|\mathbf{y}\|^2}\) (the value that minimises the quadratic):

\begin{align*} 0 &\leq \|\mathbf{x}\|^2 - 2\frac{(\mathbf{x}^T\mathbf{y})^2}{\|\mathbf{y}\|^2} + \frac{(\mathbf{x}^T\mathbf{y})^2}{\|\mathbf{y}\|^4} \cdot \|\mathbf{y}\|^2\\ &= \|\mathbf{x}\|^2 - \frac{(\mathbf{x}^T\mathbf{y})^2}{\|\mathbf{y}\|^2} \end{align*}

Rearranging: \[(\mathbf{x}^T\mathbf{y})^2 \leq \|\mathbf{x}\|^2 \|\mathbf{y}\|^2\]

Taking square roots: \[|\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\| \|\mathbf{y}\|\]

Equality condition: Equality holds if and only if \(\mathbf{x}\) and \(\mathbf{y}\) are linearly dependent (i.e., \(\mathbf{x} = c\mathbf{y}\) for some scalar \(c\)).

Alternative proof via discriminant: The quadratic \(at^2 + bt + c\) with \(a = \|\mathbf{y}\|^2 > 0\), \(b = -2\mathbf{x}^T\mathbf{y}\), \(c = \|\mathbf{x}\|^2\) is non-negative for all \(t\). Thus its discriminant must be non-positive: \[b^2 - 4ac \leq 0 \implies 4(\mathbf{x}^T\mathbf{y})^2 - 4\|\mathbf{x}\|^2\|\mathbf{y}\|^2 \leq 0\] which gives the result.
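A quick numerical spot-check of the inequality and of the minimising value of \(t\) used above (a minimal sketch, assuming NumPy; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    n = rng.integers(1, 10)
    x, y = rng.normal(size=n), rng.normal(size=n)
    lhs = abs(x @ y)
    rhs = np.linalg.norm(x) * np.linalg.norm(y)
    assert lhs <= rhs + 1e-12            # |x^T y| <= ||x|| ||y||
    # the quadratic ||x - t y||^2 is minimised at t = x^T y / ||y||^2
    t = (x @ y) / (y @ y)
    assert np.linalg.norm(x - t * y) <= np.linalg.norm(x - (t + 0.1) * y) + 1e-12
print("Cauchy-Schwarz holds on all sampled pairs")
```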

Part
[3 marks]

Hence or otherwise, state and prove the Triangle Inequality in \(\mathbb{R}^n\).

Proposition
Solution

Triangle Inequality: For all real numbers \(a, b\): \[|a + b| \leq |a| + |b|\]

More generally, for vectors \(\mathbf{x}, \mathbf{y} \in \mathbb{R}^n\): \[\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|\]

Proof
Solution

We prove the vector version (the scalar case follows by setting \(n=1\)).

\begin{align*} \|\mathbf{x} + \mathbf{y}\|^2 &= (\mathbf{x} + \mathbf{y})^T(\mathbf{x} + \mathbf{y})\\ &= \mathbf{x}^T\mathbf{x} + 2\mathbf{x}^T\mathbf{y} + \mathbf{y}^T\mathbf{y}\\ &= \|\mathbf{x}\|^2 + 2\mathbf{x}^T\mathbf{y} + \|\mathbf{y}\|^2 \end{align*}

By the Cauchy-Schwarz inequality (from part a): \[\mathbf{x}^T\mathbf{y} \leq |\mathbf{x}^T\mathbf{y}| \leq \|\mathbf{x}\| \|\mathbf{y}\|\]

Therefore:

\begin{align*} \|\mathbf{x} + \mathbf{y}\|^2 &\leq \|\mathbf{x}\|^2 + 2\|\mathbf{x}\| \|\mathbf{y}\| + \|\mathbf{y}\|^2\\ &= (\|\mathbf{x}\| + \|\mathbf{y}\|)^2 \end{align*}

Taking square roots (both sides are non-negative): \[\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|\]

Equality condition: Equality holds if and only if \(\mathbf{x}\) and \(\mathbf{y}\) are non-negative scalar multiples of each other (i.e., point in the same direction).
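And a matching spot-check for the triangle inequality, including the equality case (a minimal sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.normal(size=5), rng.normal(size=5)
# ||x + y|| <= ||x|| + ||y||
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
# equality when the vectors point in the same direction, e.g. y = 3x
assert np.isclose(np.linalg.norm(x + 3 * x), np.linalg.norm(x) + np.linalg.norm(3 * x))
```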

Q19

Problem
[14 marks]
Part
[8 marks]

Give the definitions of these elementary analysis facts:

Definition (Metric Space)
Solution

A metric space is a pair \((X,d)\), where \(X\) is a (non-empty) set and \(d: X \times X \rightarrow [0,\infty)\) is a function, such that the following conditions hold for all \(x,y,z \in X\):

  1. \(d(x,y) = 0 \iff x=y\)
  2. \(d(x,y) = d(y,x)\)
  3. \(d(x,z) \leq d(x,y) + d(y,z)\quad\) (triangle inequality)
Definition (Epsilon Ball)
Solution

\[B_\epsilon(x_0) = \set{x\in X: d(x, x_0) < \epsilon}\]

Definition (Interior Point)
Solution

\(x_0 \in Y \subseteq X\) is an interior point of \(Y\) if there exists \(\epsilon > 0\) such that \(B_\epsilon(x_0) \subseteq Y\).

Definition (Open Set)
Solution

A subset \(Y\) in \((X,d)\) is open if \[Y=\text{Int}(Y)\]

Part
[6 marks]

Complete the following

Proposition

Every $ε$-ball in a metric space is open.

Proof
Solution

Let \((X, d)\) be a metric space, \(x_0 \in X\), and \(\epsilon > 0\). We want to show that the $ε$-ball \[B_\epsilon(x_0) = \{x \in X : d(x, x_0) < \epsilon\}\] is an open set.

To show \(B_\epsilon(x_0)\) is open, we must show that every point in \(B_\epsilon(x_0)\) is an interior point, i.e., for each \(y \in B_\epsilon(x_0)\), there exists \(\delta > 0\) such that \(B_\delta(y) \subseteq B_\epsilon(x_0)\).

Construction:

Let \(y \in B_\epsilon(x_0)\) be arbitrary. Then \(d(y, x_0) < \epsilon\).

Define \(\delta = \epsilon - d(y, x_0) > 0\).

We claim that \(B_\delta(y) \subseteq B_\epsilon(x_0)\).

Verification:

Let \(z \in B_\delta(y)\). Then \(d(z, y) < \delta\).

By the triangle inequality:

\begin{align*} d(z, x_0) &\leq d(z, y) + d(y, x_0)\\ &< \delta + d(y, x_0)\\ &= (\epsilon - d(y, x_0)) + d(y, x_0)\\ &= \epsilon \end{align*}

Therefore \(d(z, x_0) < \epsilon\), which means \(z \in B_\epsilon(x_0)\).

Since \(z\) was arbitrary, we have \(B_\delta(y) \subseteq B_\epsilon(x_0)\).

This shows that \(y\) is an interior point of \(B_\epsilon(x_0)\).

Since \(y \in B_\epsilon(x_0)\) was arbitrary, every point of \(B_\epsilon(x_0)\) is an interior point.

Therefore, \(B_\epsilon(x_0) = \text{Int}(B_\epsilon(x_0))\), which means \(B_\epsilon(x_0)\) is open. \(\quad\square\)
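A small numerical illustration of the construction \(\delta = \epsilon - d(y, x_0)\) in the Euclidean plane (a minimal sketch, assuming NumPy; the points and radii are arbitrary choices):

```python
import numpy as np

def d(p, q):
    """Euclidean metric on R^2."""
    return np.linalg.norm(p - q)

rng = np.random.default_rng(3)
x0, eps = np.array([0.0, 0.0]), 1.0
y = np.array([0.6, 0.3])                 # a point inside B_eps(x0)
assert d(y, x0) < eps
delta = eps - d(y, x0)                   # the radius constructed in the proof

# every sampled z with d(z, y) < delta also satisfies d(z, x0) < eps
for _ in range(10_000):
    z = y + rng.uniform(-delta, delta, size=2)
    if d(z, y) < delta:
        assert d(z, x0) < eps            # the triangle inequality in action
print("all sampled points of B_delta(y) lie in B_eps(x0)")
```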

Q20

Problem
[12 marks]

Infinite-dimensional vector spaces

Part
[8 marks]

What are the sets \(c_{00}, c_0, \ell^p\) and \(\ell^\infty\)? Give examples of sequences in each.

Definition ($c_{00}$)
Solution

The space of sequences with only finitely many non-zero terms. \[c_{00} = \left\{(x_n)_{n=1}^\infty : \text{only finitely many } x_n \neq 0\right\}\]

Example ($c_{00}$)
Solution

\((1, 2, 3, 0, 0, 0, \ldots)\), \((1, 0, 1, 0, 0, \ldots)\)

Definition ($c_{0}$)
Solution

The space of sequences that converge to zero. \[c_0 = \left\{(x_n)_{n=1}^\infty : \lim_{n\to\infty} x_n = 0\right\}\]

Example ($c_{0}$)
Solution

\(\left(\frac{1}{n}\right)_{n=1}^\infty = \left(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\right)\), \(\left(\frac{1}{2^n}\right)_{n=1}^\infty\)

Definition ($\ell^p$)
Solution

The space of sequences whose $p$-th powers are summable. \[\ell^p = \left\{(x_n)_{n=1}^\infty : \sum_{n=1}^\infty |x_n|^p < \infty\right\}\]

Example ($\ell^{p}$)
Solution
  • \(\left(\frac{1}{n}\right)_{n=1}^\infty\): in \(\ell^p\) for every \(p > 1\) (e.g. \(\sum \frac{1}{n^2} < \infty\) for \(p=2\)), but not in \(\ell^1\), since the harmonic series diverges
  • \(\left(\frac{1}{2^n}\right)_{n=1}^\infty\) and \(\left(\frac{1}{n^2}\right)_{n=1}^\infty\): in \(\ell^p\) for every \(p \geq 1\)
Definition ($\ell^\infty$)
Solution

The space of bounded sequences. \[\ell^\infty = \left\{(x_n)_{n=1}^\infty : \sup_{n\in\mathbb{N}} |x_n| < \infty\right\}\]

Example ($\ell^{\infty}$)
Solution

\((1, -1, 1, -1, \ldots)\), \(\left(\sin(n)\right)_{n=1}^\infty\), any constant sequence \((c, c, c, \ldots)\)

Remark

Relationships: \[c_{00} \subsetneq \ell^p \subsetneq c_0 \subsetneq \ell^\infty \quad \text{(for any } 1 \leq p < \infty\text{)}\] and \(\ell^p \subsetneq \ell^q\) whenever \(1 \leq p < q < \infty\).
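A brief numerical illustration of where the harmonic sequence sits in this chain (a minimal sketch, assuming NumPy; the truncation at \(10^6\) terms is arbitrary):

```python
import numpy as np

n = np.arange(1, 1_000_001)
harmonic = 1.0 / n                      # the sequence (1/n)

print(harmonic[-1])                     # ~1e-06: terms tend to 0, so (1/n) is in c_0
print(np.sum(harmonic ** 2))            # ~1.6449 (pi^2/6): converges, so (1/n) is in l^2
print(np.sum(harmonic))                 # ~14.39, still growing like log n: not in l^1

alternating = np.where(n % 2 == 0, 1.0, -1.0)
print(np.abs(alternating).max())        # 1.0: bounded (in l^infty) but does not tend to 0
```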

Part
[4 marks]

Give the definition and at least 2 examples of each:

Definition (Hilbert Space)
Solution

A complete inner product space.

Example (Hilbert Space)
Solution
  • \(\mathbb{R}^n\) with the standard inner product \(\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^n x_i y_i\)
  • \(\ell^2 = \left\{(x_n)_{n=1}^\infty : \sum_{n=1}^\infty |x_n|^2 < \infty\right\}\) with inner product \(\langle x, y \rangle = \sum_{n=1}^\infty x_n y_n\)
Definition (Banach Space)
Solution

A complete normed vector space.

Example (Banach Space)
Solution

\((C[0,1], ||\cdot||_\infty)\) and \((\ell^p, ||\cdot||_p)\)

Q21

Problem
[8 marks]
Part
[2 marks]

State pointwise convergence for a sequence of functions.

Solution

\(\lim_{n\rightarrow \infty} f_n = f\) p.w (point-wise) \(\iff \forall x\in X, \forall \epsilon > 0, \; \exists K = K(\epsilon, x) : \forall n\geq K \; |f_n(x)-f(x)| \;< \epsilon\)

Part
[2 marks]

State uniform convergence for a sequence of functions

Solution

\(\lim_{n\rightarrow \infty} f_n = f\) uniformly \(\iff \forall\epsilon>0, \; \exists K=K(\epsilon) : \forall n \geq K \; |f_n(x)-f(x) |\, < \epsilon\quad\forall x\in X\)

Part
[2 marks]

For each integer \(k\ge1\) define \[ f_k : [0,1] \longrightarrow \mathbb{R}, \qquad f_k(x)=\max\{0,1-4k^{2}|\,x-\tfrac{1}{k^{2}}|\}.\]

Does the series \(\sum_{k=1}^{\infty}\frac{f_k(x)}{k}\) converge pointwise on \([0,1]\)? What about uniformly?

Solution

We show that the series \[ S(x):=\sum_{k=1}^{\infty}\frac{f_k(x)}{k}, \qquad x\in[0,1], \] converges pointwise on \([0,1]\) but that the convergence is not uniform.

  1. Pointwise convergence.

    Fix \(x\in(0,1]\). Note that \(f_k(x)\neq0\) precisely when \(|x-\tfrac{1}{k^{2}}|<\tfrac{1}{4k^{2}}\), i.e. when \(\frac{3}{4k^{2}}< x <\frac{5}{4k^{2}}\), which is equivalent to \[ A(x):=\sqrt{\tfrac{3}{4x}} < k < B(x):=\sqrt{\tfrac{5}{4x}}. \] This interval has finite length \[ L(x):=B(x)-A(x) =\frac{\sqrt{5}-\sqrt{3}}{2\sqrt{x}} <\frac{0.253}{\sqrt{x}}, \] so it contains at most \(\left\lceil L(x)\right\rceil\) integers. Hence only finitely many \(k\) satisfy \(f_k(x)\neq0\); the series \(S(x)\) reduces to a finite sum and converges. For \(x=0\) every term is \(0\), since \(f_k(0)=\max\{0,\,1-4\}=0\), so \(S(0)=0\). Thus \(S(x)\) converges for every \(x\in[0,1]\).

  2. Failure of uniform convergence.

    Let \(S_N(x):=\sum_{k=1}^{N}\frac{f_k(x)}{k}\). For \(N\ge40\) choose \[ x_N:=\frac{1}{N^{2}}\in[0,1]. \]

    Claim: For every \(k\in\left[N+1,N+\lfloor N/20\rfloor\right]\) we have \(f_k(x_N)\ge\frac{1}{2}\).

    Proof: For such \(k\) we have \(k-N\le\frac{N}{20}\) and \(k+N\le\frac{41N}{20}\), so

    \begin{align*} \left|x_N-\tfrac{1}{k^{2}}\right| &=\frac{ | k^{2}-N^{2} | }{k^{2}N^{2}} = \frac{(k-N)(k+N)}{k^{2}N^{2}} \\ &\le\frac{(N/20)(41N/20)}{k^{2}N^{2}} = \frac{41}{400\,k^{2}} \\ &<\frac{1}{8\,k^{2}} \\ &<\frac{1}{4k^{2}}, \end{align*}

    so \(x_N\in\operatorname{supp}(f_k)\) and \(f_k(x_N)=1-4k^{2}\lvert x_N-\tfrac{1}{k^{2}}\rvert\ge 1-\tfrac{41}{100}>\frac{1}{2}\).

    Therefore the tail \(T_N(x):=\sum_{k>N}\frac{f_k(x)}{k}\) satisfies \[ T_N(x_N) \ge \frac{1}{2} \sum_{k=N+1}^{N+\lfloor N/20\rfloor}\frac{1}{k} \ge \frac{1}{2}\ln\!\left(\frac{N+\lfloor N/20\rfloor}{N+1}\right) \ge c > 0 \] for some constant \(c\) independent of \(N\) and all sufficiently large \(N\) (the ratio inside the logarithm tends to \(\tfrac{21}{20}\)), using \(\sum_{k=m}^{n} \tfrac{1}{k} \ge \ln\dfrac{n}{m}\) for integers \(n\ge m\ge 1\). Hence \[ \|S-S_N\|_{\infty} =\sup_{x\in[0,1]}|S(x)-S_N(x)| \ge T_N(x_N) \ge c \quad\text{for all sufficiently large }N, \] so \(\|S-S_N\|_{\infty}\not\to0\). The convergence of the series is therefore not uniform on \([0,1]\).
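A quick numerical check of this failure (a minimal sketch, assuming NumPy; the helper names and the values of \(N\) are illustrative): it evaluates the tail \(T_N(x_N)\) at \(x_N = 1/N^2\) for several \(N\) and shows it stays bounded away from \(0\).

```python
import numpy as np

def f(k, x):
    # the tent function f_k(x) = max(0, 1 - 4 k^2 |x - 1/k^2|)
    return np.maximum(0.0, 1.0 - 4.0 * k**2 * np.abs(x - 1.0 / k**2))

def tail(N):
    # T_N(x_N) = sum_{k > N} f_k(x_N)/k evaluated at x_N = 1/N^2
    xN = 1.0 / N**2
    ks = np.arange(N + 1, 2 * N + 1)      # f_k(x_N) vanishes for k > 1.12 N anyway
    return np.sum(f(ks, xN) / ks)

for N in (50, 200, 1000, 5000):
    print(N, round(tail(N), 4))           # roughly 0.05 for every N: bounded away from 0
```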

Part
[2 marks]

What does uniform convergence imply about a series?

Solution

Uniform convergence implies pointwise convergence; on a bounded domain such as \([0,1]\) (more generally, on a finite measure space) it also implies \(L^p\) convergence for every \(1 \leq p \leq \infty\).

Q22

Problem
[7 marks]
Part
[2 marks]

Give the definition of a topological space \((X, \tau)\).

Solution

A topological space is a pair \(\left( X ,\tau \right)\), where \(X\) is a set and \(\tau\subseteq\mathcal{P}\!\left( X \right)\) satisfies

  1. \(\varnothing , X \in \tau\);
  2. if \(\{V_i\}_{i\in I}\subseteq\tau\) then \(\bigcup_{i\in I} V_i \in \tau\);
    • “an arbitrary union of open sets is open”
  3. if \(V_1,V_2\in\tau\) then \(V_1\cap V_2\in\tau\).
    • “a finite intersection of open sets is open”
Part
[3 marks]

Give the definitions of these Topological spaces:

Definition (Co-countable Topology)
Solution

\[\tau = \set{Y\subseteq X: Y^c\text{ is countable }} \cup \set{\varnothing}\]

Definition (Co-finite Topology)
Solution

\[\tau = \set{Y\subseteq X: Y^c\text{ is finite }} \cup \set{\varnothing}\]

Definition (Discrete Topology)
Solution

\[\tau = \mathcal{P}(X),\] i.e. every subset of \(X\) is open. (The topology \(\set{\varnothing, X}\) is the indiscrete, or trivial, topology.)

Part
[2 marks]

Are limits unique in a topological space? What about in a metric space?

Solution

In a general topological space, limits need not be unique; they are unique in Hausdorff spaces. Every metric space is Hausdorff, so limits in a metric space are unique.

Q23

Problem
[14 marks]

Let \((X_1, \ldots, X_n) \stackrel{\text{i.i.d}}{\sim} \mathcal{P}(\lambda)\).

Part
[3 marks]

Let \[\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\] be an estimator of \(\lambda\). Find the bias, standard error and Mean Squared Error of \(\bar{X}_n\).

Solution

For \((X_1, \ldots, X_n) \stackrel{\text{i.i.d}}{\sim} \mathcal{P}(\lambda)\), we have \(\mathbb{E}[X_i] = \lambda\) and \(\text{Var}(X_i) = \lambda\).

Bias:

\begin{align*} \text{Bias}(\bar{X}_n) &= \mathbb{E}[\bar{X}_n] - \lambda\\ &= \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^n X_i\right] - \lambda\\ &= \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i] - \lambda\\ &= \frac{1}{n} \cdot n\lambda - \lambda\\ &= 0 \end{align*}

So \(\bar{X}_n\) is unbiased.

Standard Error:

\begin{align*} \text{SE}(\bar{X}_n) &= \sqrt{\text{Var}(\bar{X}_n)}\\ &= \sqrt{\text{Var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right)}\\ &= \sqrt{\frac{1}{n^2}\sum_{i=1}^n \text{Var}(X_i)} \quad\text{(by independence)}\\ &= \sqrt{\frac{1}{n^2} \cdot n\lambda}\\ &= \sqrt{\frac{\lambda}{n}}\\ &= \frac{\sqrt{\lambda}}{\sqrt{n}} \end{align*}

Mean Squared Error:

\begin{align*} \text{MSE}(\bar{X}_n) &= \text{Var}(\bar{X}_n) + [\text{Bias}(\bar{X}_n)]^2\\ &= \frac{\lambda}{n} + 0^2\\ &= \frac{\lambda}{n} \end{align*}

Part
[2 marks]

Let \[\widehat{T}_n = \frac{X_1 + 2X_2 + \ldots + nX_n}{1+2+\dots+n}.\] Compute the bias and MSE of \(\widehat{T}_n\) as an estimator of \(\lambda\).

Solution

Note that \(\sum_{i=1}^n i = \frac{n(n+1)}{2}\).

Bias:

\begin{align*} \mathbb{E}[\widehat{T}_n] &= \frac{1}{\frac{n(n+1)}{2}} \sum_{i=1}^n i \cdot \mathbb{E}[X_i]\\ &= \frac{2}{n(n+1)} \sum_{i=1}^n i \cdot \lambda\\ &= \frac{2\lambda}{n(n+1)} \cdot \frac{n(n+1)}{2}\\ &= \lambda \end{align*}

Therefore: \(\text{Bias}(\widehat{T}_n) = 0\) (also unbiased).

Variance:

\begin{align*} \text{Var}(\widehat{T}_n) &= \text{Var}\left(\frac{2}{n(n+1)}\sum_{i=1}^n i X_i\right)\\ &= \frac{4}{n^2(n+1)^2} \sum_{i=1}^n i^2 \text{Var}(X_i) \quad\text{(by independence)}\\ &= \frac{4\lambda}{n^2(n+1)^2} \sum_{i=1}^n i^2\\ &= \frac{4\lambda}{n^2(n+1)^2} \cdot \frac{n(n+1)(2n+1)}{6}\\ &= \frac{4\lambda(2n+1)}{6n(n+1)}\\ &= \frac{2\lambda(2n+1)}{3n(n+1)} \end{align*}

MSE: \[\text{MSE}(\widehat{T}_n) = \text{Var}(\widehat{T}_n) + 0^2 = \frac{2\lambda(2n+1)}{3n(n+1)}\]

Part
[2 marks]

Compare \(\widehat{T}_n\) to \(\bar{X}_n\) in terms of MSE. Which estimator is preferable? Intuitively, why is one better than the other?

Solution

Comparison:

\begin{align*} \text{MSE}(\bar{X}_n) &= \frac{\lambda}{n}\\ \text{MSE}(\widehat{T}_n) &= \frac{2\lambda(2n+1)}{3n(n+1)} \end{align*}

To compare, compute the ratio:

\begin{align*} \frac{\text{MSE}(\widehat{T}_n)}{\text{MSE}(\bar{X}_n)} &= \frac{\frac{2\lambda(2n+1)}{3n(n+1)}}{\frac{\lambda}{n}}\\ &= \frac{2(2n+1)}{3(n+1)}\\ &= \frac{4n+2}{3n+3} \end{align*}

For large \(n\): \(\frac{4n+2}{3n+3} \approx \frac{4n}{3n} = \frac{4}{3} > 1\)

More precisely: \(\text{MSE}(\widehat{T}_n) > \text{MSE}(\bar{X}_n)\) for all \(n \geq 2\), with equality at \(n = 1\).

To verify, check if \(\frac{2(2n+1)}{3(n+1)} > 1\): \[2(2n+1) > 3(n+1) \iff 4n+2 > 3n+3 \iff n > 1\]

For \(n = 1\): both equal \(\lambda\) (both reduce to \(X_1\)). For \(n \geq 2\): \(\text{MSE}(\bar{X}_n) < \text{MSE}(\widehat{T}_n)\).

Conclusion: \(\bar{X}_n\) is preferable (lower MSE).

Intuition: \(\bar{X}_n\) weights all observations equally, making efficient use of all data. \(\widehat{T}_n\) gives more weight to later observations, which introduces unnecessary variability without reducing bias (both are unbiased). The equal weighting of the sample mean is optimal for i.i.d. data.
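A small Monte Carlo comparison of the two estimators (a minimal sketch, assuming NumPy; \(\lambda = 3\), \(n = 20\) and the replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 3.0, 20, 200_000

X = rng.poisson(lam, size=(reps, n))        # reps independent Poisson samples of size n
w = np.arange(1, n + 1)

xbar = X.mean(axis=1)                       # sample mean
that = (X * w).sum(axis=1) / w.sum()        # linearly weighted estimator T_hat

print("empirical MSE of Xbar:", np.mean((xbar - lam) ** 2))   # ~ lam/n = 0.15
print("empirical MSE of That:", np.mean((that - lam) ** 2))   # ~ 2*lam*(2n+1)/(3n(n+1)) ~ 0.195
```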

Part
[2 marks]

Find the moment estimator for \(\lambda\).

Solution

The method of moments equates population moments to sample moments.

For the Poisson distribution: \(\mathbb{E}[X] = \lambda\) (first moment).

Sample first moment: \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\)

Setting them equal: \[\lambda = \bar{X}_n\]

Therefore, the moment estimator is: \[\boxed{\widehat{\lambda}_{\text{MM}} = \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i}\]

This is simply the sample mean.

Part
[3 marks]

Find the maximum likelihood estimator for \(\lambda\).

Solution

The likelihood function for i.i.d. Poisson observations is:

\begin{align*} L(\lambda) &= \prod_{i=1}^n P(X_i = x_i)\\ &= \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}\\ &= \frac{\lambda^{\sum_{i=1}^n x_i} e^{-n\lambda}}{\prod_{i=1}^n x_i!} \end{align*}

The log-likelihood is:

\begin{align*} \ell(\lambda) &= \log L(\lambda)\\ &= \sum_{i=1}^n x_i \log \lambda - n\lambda - \sum_{i=1}^n \log(x_i!)\\ &= \left(\sum_{i=1}^n x_i\right) \log \lambda - n\lambda - \text{const} \end{align*}

Taking the derivative with respect to \(\lambda\): \[\frac{\mathrm{d}\ell}{\mathrm{d}\lambda} = \frac{\sum_{i=1}^n x_i}{\lambda} - n\]

Setting equal to zero: \[\frac{\sum_{i=1}^n x_i}{\lambda} - n = 0 \implies \lambda = \frac{1}{n}\sum_{i=1}^n x_i\]

Checking the second derivative: \[\frac{\mathrm{d}^2\ell}{\mathrm{d}\lambda^2} = -\frac{\sum_{i=1}^n x_i}{\lambda^2} < 0\]

This confirms a maximum.

Therefore, the maximum likelihood estimator is: \[\boxed{\widehat{\lambda}_{\text{MLE}} = \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i}\]

Note: For the Poisson distribution, the MLE equals the moment estimator.
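A short numerical check that the Poisson log-likelihood is maximised at the sample mean (a minimal sketch, assuming NumPy; the true \(\lambda\), sample size and grid are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(4.0, size=500)                      # simulated Poisson sample

lam_grid = np.linspace(0.1, 10.0, 2000)
# log-likelihood up to the additive constant -sum(log(x_i!))
loglik = x.sum() * np.log(lam_grid) - len(x) * lam_grid

print("grid maximiser:", lam_grid[np.argmax(loglik)])
print("sample mean   :", x.mean())                  # the two agree up to the grid spacing
```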

Part
[2 marks]

Find the Fisher information \(\mathcal{I}(\lambda)\)

Solution

The Fisher information for a single observation is: \[\mathcal{I}(\lambda) = \mathbb{E}\left[\left(\frac{\partial}{\partial \lambda} \log f(X; \lambda)\right)^2\right]\]

or equivalently (when regularity conditions hold): \[\mathcal{I}(\lambda) = -\mathbb{E}\left[\frac{\partial^2}{\partial \lambda^2} \log f(X; \lambda)\right]\]

For \(X \sim \mathcal{P}(\lambda)\): \[\log f(x; \lambda) = x \log \lambda - \lambda - \log(x!)\]

First derivative: \[\frac{\partial}{\partial \lambda} \log f(x; \lambda) = \frac{x}{\lambda} - 1\]

Second derivative: \[\frac{\partial^2}{\partial \lambda^2} \log f(x; \lambda) = -\frac{x}{\lambda^2}\]

Taking the negative expectation:

\begin{align*} \mathcal{I}(\lambda) &= -\mathbb{E}\left[-\frac{X}{\lambda^2}\right]\\ &= \frac{\mathbb{E}[X]}{\lambda^2}\\ &= \frac{\lambda}{\lambda^2}\\ &= \frac{1}{\lambda} \end{align*}

Verification using first formula:

\begin{align*} \mathcal{I}(\lambda) &= \mathbb{E}\left[\left(\frac{X}{\lambda} - 1\right)^2\right]\\ &= \mathbb{E}\left[\frac{X^2}{\lambda^2} - \frac{2X}{\lambda} + 1\right]\\ &= \frac{\mathbb{E}[X^2]}{\lambda^2} - \frac{2\mathbb{E}[X]}{\lambda} + 1\\ &= \frac{\lambda^2 + \lambda}{\lambda^2} - \frac{2\lambda}{\lambda} + 1\\ &= 1 + \frac{1}{\lambda} - 2 + 1\\ &= \frac{1}{\lambda} \end{align*}

Therefore: \[\boxed{\mathcal{I}(\lambda) = \frac{1}{\lambda}}\]

For \(n\) i.i.d. observations: \(\mathcal{I}_n(\lambda) = n \cdot \mathcal{I}(\lambda) = \frac{n}{\lambda}\)
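A Monte Carlo check of \(\mathcal{I}(\lambda) = \mathbb{E}\!\left[\left(\tfrac{X}{\lambda} - 1\right)^2\right] = \tfrac{1}{\lambda}\) (a minimal sketch, assuming NumPy; \(\lambda = 2.5\) is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.5
X = rng.poisson(lam, size=1_000_000)

score_sq = (X / lam - 1.0) ** 2        # squared score of a single observation
print(score_sq.mean(), 1.0 / lam)      # both ~ 0.4
```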

Q24

Problem
[3 marks]

What is the maximum straight-line distance in an N-dimensional unit hyper-cube?

Solution

Answer: \[\boxed{\sqrt{N}}\]

Derivation:

An $N$-dimensional unit hyper-cube is the set \([0,1]^N = \{(x_1, x_2, \ldots, x_N) : 0 \leq x_i \leq 1 \text{ for all } i\}\).

The maximum distance is achieved between opposite corners (vertices). Consider the vertices at the origin \(\mathbf{0} = (0, 0, \ldots, 0)\) and the opposite corner \(\mathbf{1} = (1, 1, \ldots, 1)\).

The Euclidean distance is:

\begin{align*} d(\mathbf{0}, \mathbf{1}) &= \sqrt{(1-0)^2 + (1-0)^2 + \cdots + (1-0)^2}\\ &= \sqrt{\underbrace{1 + 1 + \cdots + 1}_{N \text{ times}}}\\ &= \sqrt{N} \end{align*}

Visualisation (figures omitted): the space diagonal of the 2D unit square (\(\sqrt{2}\)) and of the 3D unit cube (\(\sqrt{3}\)).

General Pattern:

  • 1D (line segment): maximum distance = \(\sqrt{1} = 1\)
  • 2D (square): maximum distance = \(\sqrt{2} \approx 1.414\)
  • 3D (cube): maximum distance = \(\sqrt{3} \approx 1.732\)
  • $N$D (hyper-cube): maximum distance = \(\sqrt{N}\)

This demonstrates the curse of dimensionality: as dimension increases, the diagonal of the unit hyper-cube grows without bound, even though each side length remains \(1\).
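A one-line numerical confirmation (a minimal sketch, assuming NumPy): the distance between opposite corners of \([0,1]^N\) equals \(\sqrt{N}\).

```python
import numpy as np

for N in (1, 2, 3, 10, 100):
    # distance from the origin to the all-ones corner of the unit hyper-cube
    print(N, np.linalg.norm(np.ones(N)), np.sqrt(N))   # the two columns agree
```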