When you submit your work, follow the instructions on the submission workflow page *scrupulously* for full credit.

**Important: Failure to do any of the following will result in lost points:**

- Submit **one** PDF file and **one** notebook per group
- Enter **all the group members** in your PDF Gradescope submission, not just in your documents. Look at this Piazza post if you don't know how to do this
- Match each **answer** (not question!) with the appropriate page in Gradescope
- Avoid large blank spaces in your PDF file

The following code loads a small version of the MNIST handwritten-digit recognition database, which we used in a previous assignment as well. The data points are tiny 8x8 images of handwritten digits, and the labels are the digits from 0 to 9. A training set `T` and a testing set `S` are stored in two dictionaries with keys `x` and `y`, as usual.

Note that 60 percent of the available data is used for testing, and only 40 percent for training. This is so that (i) the problem is realistically difficult and (ii) the test results are relatively consistent.

The code also defines a function `evaluate` that both prints and returns the training and testing error rates of the provided classifier (assumed to be one of the classifiers in the package `sklearn`). The returned value is a triple with training error rate, test error rate, and the name of the classifier, which is passed to `evaluate` as its last argument. Error rates are expressed as percentages. A later question in this part will ask you to compare error rates across different classifiers, so it will save you time if you save the values returned by `evaluate` in a list for later use.

- "Error rate" means risk computed with the zero-one loss. All error rates should be reported as percentages with two decimal digits after the period.
- Do not change anything in this code snippet.
- Use the keyword argument `random_state=0` every time you define a classifier other than the k-NN classifier, which uses no randomness. This will make results somewhat more consistent across runs, and make our grading job easier.
- Unless told otherwise, use the default versions of the classifiers. That is, `random_state=0` should be the only argument passed to the classifier constructors.

In [1]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
NData = len(digits.images)
data = {'x': digits.images.reshape((NData, -1)), 'y': digits.target}
testFraction = 0.6
T, S = {}, {}
T['x'], S['x'], T['y'], S['y'] = train_test_split(data['x'], data['y'],
                                                  test_size=testFraction, random_state=0)
NT, NS = len(T['y']), len(S['y'])

def evaluate(h, T, S, name):
    def errorRate(h, S):
        x, y = S['x'], S['y']
        return (1 - h.score(x, y)) * 100
    f = '{:s}: training error rate is {:.2f}, test error rate is {:.2f}'
    err = (errorRate(h, T), errorRate(h, S), name)
    print(f.format(name, err[0], err[1]))
    return err
```
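The suggestion above, of saving the triples returned by `evaluate` in a list, can be sketched as follows. This is a self-contained illustration that repeats the setup from the snippet above and uses a `sklearn.dummy.DummyClassifier` purely as a stand-in; it is not one of the classifiers the questions below ask for.

```python
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Same data setup as in the snippet above
digits = datasets.load_digits()
data = {'x': digits.images.reshape((len(digits.images), -1)), 'y': digits.target}
T, S = {}, {}
T['x'], S['x'], T['y'], S['y'] = train_test_split(data['x'], data['y'],
                                                  test_size=0.6, random_state=0)

def evaluate(h, T, S, name):
    def errorRate(h, S):
        return (1 - h.score(S['x'], S['y'])) * 100
    err = (errorRate(h, T), errorRate(h, S), name)
    print('{:s}: training error rate is {:.2f}, test error rate is {:.2f}'
          .format(name, err[0], err[1]))
    return err

results = []   # accumulate the (train error, test error, name) triples here
h = DummyClassifier(strategy='uniform', random_state=0).fit(T['x'], T['y'])
results.append(evaluate(h, T, S, 'uniform guesser'))
```

Each classifier you train later can append one more triple to `results`, which makes the comparison question at the end of this part a simple loop.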

Assuming that all digits appear with equal frequency in the data, what is the expected error rate (in percent) of a classifier that ignores the input and guesses the output uniformly and at random? Justify your answer with a very brief sentence.

Train a `sklearn.tree.DecisionTreeClassifier`. Show your code and report training and test error rates.

Use `GridSearchCV` to find a good number of trees for a `sklearn.ensemble.RandomForestClassifier` by 10-fold cross-validation. Show your code and report (i) the number of trees in the forest you end up choosing, (ii) the training and test error rates, and (iii) the out-of-bag error rate estimate.

Use `scoring='accuracy'`, but remember to convert accuracy to error rate in percent.

To obtain the out-of-bag accuracy, train a forest one more time with the number of trees found by cross-validation, and set `oob_score=True` during that final training. If `h` is the classifier, the out-of-bag accuracy will be in `h.oob_score_`.

Add the key-value pair `'random_state': [0]` to the set of hyper-parameters to search over, so that all training runs are started with this random state.

Trees are in the hundreds. Try a few values that differ by at least 100, to save time.
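The hints above fit a standard `GridSearchCV` pattern. The sketch below shows the shape of that pattern, not the values to submit: the candidate list is deliberately short and illustrative, and you should pick your own values per the hint.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()
x = digits.images.reshape((len(digits.images), -1))
x_train, x_test, y_train, y_test = train_test_split(x, digits.target,
                                                    test_size=0.6, random_state=0)

# Search over the number of trees; 'random_state': [0] fixes the seed
# for every training run, as the instructions require.
grid = {'n_estimators': [100, 200], 'random_state': [0]}
search = GridSearchCV(RandomForestClassifier(), grid, cv=10, scoring='accuracy')
search.fit(x_train, y_train)
nTrees = search.best_params_['n_estimators']

# Retrain once with the chosen size to read off the out-of-bag estimate
h = RandomForestClassifier(n_estimators=nTrees, random_state=0, oob_score=True)
h.fit(x_train, y_train)
oobErrorPercent = (1 - h.oob_score_) * 100   # accuracy -> error rate in percent
```

Note the conversion at the end: `oob_score_` is an accuracy, while the question asks for an error rate in percent.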

Train a `sklearn.linear_model.LogisticRegression` classifier. Show your code and report training and test error rates.

Use parameters `C=1e5`, `solver='lbfgs'`, `multi_class='multinomial'`, `random_state=0`.

Use `GridSearchCV` to find a good number of neighbors for a `sklearn.neighbors.KNeighborsClassifier` by 10-fold cross-validation. Show your code and report (i) the number of neighbors you end up choosing, and (ii) the training and test error rates.

Neighbors are in the single digits. No point in trying dozens of neighbors.
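The search is the same `GridSearchCV` pattern as for the forest, with one difference worth noting: k-NN uses no randomness, so the grid has no `'random_state'` key. The candidate list below is illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x = digits.images.reshape((len(digits.images), -1))
x_train, x_test, y_train, y_test = train_test_split(x, digits.target,
                                                    test_size=0.6, random_state=0)

# Single-digit neighbor counts, per the hint; no 'random_state' key,
# since k-NN is deterministic
grid = {'n_neighbors': [1, 3, 5, 7, 9]}
search = GridSearchCV(KNeighborsClassifier(), grid, cv=10, scoring='accuracy')
search.fit(x_train, y_train)
k = search.best_params_['n_neighbors']
testErrorPercent = (1 - search.score(x_test, y_test)) * 100
```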

Let us define the *overfit rate* of a classifier as the difference between testing error and training error, both expressed as percentages. Write code that prints out the overfit rates for the tree, forest, linear classifier, and k-NN classifier, in this order, one per line. Your printout should say which is which.
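If you saved the triples returned by `evaluate` in a list as suggested, the printout is one subtraction per entry. The sketch below uses hypothetical hard-coded triples purely as placeholders; substitute your own saved list.

```python
# Hypothetical (train error, test error, name) triples standing in for
# the real values returned by evaluate; substitute your saved list.
results = [(0.00, 14.50, 'decision tree'),
           (0.00, 4.20, 'random forest'),
           (0.00, 6.10, 'linear classifier'),
           (1.30, 2.80, 'k-NN')]
for train, test, name in results:
    # overfit rate = test error minus training error, both in percent
    print('{:s}: overfit rate is {:.2f}'.format(name, test - train))
```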

Discuss briefly the results you obtained, by answering the following questions:

- Which classifier works best on this data set?
- Why doesn't everyone always use the best-performing classifier for a problem like this?
- Do random forests reduce overfitting when compared with decision trees?
- In what ways is a linear classifier more convenient than algorithms that may perform better, when optimal performance is not crucial?
- For all but one of the classifiers used, a zero training error rate is no surprise. Which is the odd-man-out, and why?

The questions in this part ask you to convert between the implicit and parametric representations of Closed, Convex, Polyhedral (CCP) cones in two dimensions. The general conversion is difficult, but in two dimensions things are much simpler.

Pay attention to special cases in which the input representation has one vector, or in which its two vectors point along the same line.

The outputs from your functions should be minimal, that is, they should not contain redundant vectors.

The input to the functions you will write is a list of tuples. The list represents a CCP cone in the input representation. Each tuple in the list contains two numbers, which are the $x$ and $y$ coordinates of a vector on the plane. To keep things simple, you may assume the following:

- Each input list has either one or two vectors (in particular, empty lists or lists with three vectors, although generally appropriate, need not be considered in this assignment). However, the output from your code may have a different number of vectors, as appropriate.
- The number of input vectors is minimal. That is, there are no redundant vectors.
- The coordinates of the vectors are small integers. This simplifies checking if any two vectors are parallel to each other or not.

*Hints:*

- The vectors $(x, y)$ and $(-y, x)$ are orthogonal to each other.
- It is important to output vectors with the correct orientations. In particular, the vectors $\mathbf{p}$ and $-\mathbf{p}$ are *not* the same. A simple way to obtain vectors with the appropriate signs is to try all signs or all sign combinations until you find the right one.
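Both conversion functions in the problems below need the same two geometric primitives the hints allude to: a 90-degree rotation and a parallelism test. A possible sketch of such helpers (the names are mine, not required by the assignment), using the integer cross product since the coordinates are small integers:

```python
def perp(v):
    # (x, y) -> (-y, x): rotate 90 degrees counterclockwise;
    # the result is orthogonal to v
    return (-v[1], v[0])

def cross(v, w):
    # z component of the cross product of two plane vectors;
    # zero if and only if v and w are parallel
    return v[0] * w[1] - v[1] * w[0]

def parallel(v, w):
    return cross(v, w) == 0
```

With integer inputs, `cross` involves no floating-point arithmetic, so the parallelism test is exact.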

Write a function with header `def parametric(a):` that takes a list `a` with the vectors of an implicit representation of a CCP cone on the plane and returns a list with the generators of the parametric representation.

The functions in this problem and in the next one are related to each other, as they perform conversions in opposite directions. Because of this, it will save you time and possible mistakes if you define helper functions that may be useful for both problems.

It may be convenient to use `numpy`. If you do that, convert the inputs to that format inside your functions. While it would be safer to add assertions that check that the inputs are correctly formatted, it is not mandatory to do so.

In [2]:

```
### Tests; Your code should be in a separate code cell above this one
%matplotlib inline
tests = [[(0, 1), (1, 0)],
         [(1, 2), (2, 1)],
         [(1, 2), (-1, 0)],
         [(1, 1), (-1, -1)],
         [(0, 1)]]

def vectorString(a):
    return '{' + ', '.join(['(' + ', '.join([str(b[i])
                for i in range(len(b))]) + ')' for b in a]) + '}'

import matplotlib.pyplot as plt

def draw(a, p, k):
    plt.subplot(2, 3, k)
    for b in a:
        plt.quiver(0, 0, b[0], b[1], color='r', units='width', scale=5)
    for q in p:
        plt.quiver(0, 0, q[0], q[1], color='b', units='width', scale=5)
    plt.axis('equal')
    plt.axis('off')
    plt.xlim([-1, 1])
    plt.ylim([-1, 1])

try:
    plt.figure()
    k = 1
    for a in tests:
        p = parametric(a)
        print('implicit:', vectorString(a), '; parametric: ', vectorString(p), sep='')
        draw(a, p, k)
        k += 1
    plt.show()
except NameError:
    pass
```

Write a function with header `def implicit(p):` that takes a list `p` with the generators of a parametric representation of a CCP cone on the plane and returns a list with the vectors for the implicit representation.

In [3]:

```
### Tests; Your code should be in a separate code cell above this one
try:
    plt.figure()
    k = 1
    for p in tests:
        a = implicit(p)
        print('parametric: ', vectorString(p), '; implicit:', vectorString(a), sep='')
        draw(a, p, k)
        k += 1
    plt.show()
except NameError:
    pass
```

Let

$$ P = \{\mathbf{u}\in\mathbb{R}^2\ :\ \mathbf{a}_1^T\mathbf{u} \geq 0 \;\;\text{and}\;\; \mathbf{a}_2^T\mathbf{u} \geq 0\ \}$$

where

$$ \mathbf{a}_1^T = (0, 1) \;\;\;\text{and}\;\;\; \mathbf{a}_2^T = (-2, 1)\;. $$

What is the *implicit* (not parametric!) representation of the dual $D$ of $P$?

*Hint:* The previous problems may help.

This part explores convex programming in a very simple setting, in order to gain familiarity with the key concepts.

Use the definition of function convexity to prove that the function

$$ f(u, v) = u + v $$

is (weakly) convex in the vector $\mathbf{u} = (u, v)$.
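As a reminder (this is just the definition restated in this problem's notation, not the proof itself), convexity of $f$ means that for all $\mathbf{u}, \mathbf{u}' \in \mathbb{R}^2$ and all $t \in [0, 1]$,

$$ f(t\,\mathbf{u} + (1-t)\,\mathbf{u}') \;\leq\; t\, f(\mathbf{u}) + (1-t)\, f(\mathbf{u}')\;, $$

and "weakly" refers to the fact that the inequality need not be strict.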

Let

\begin{eqnarray*} f(\mathbf{u}) &=& (u_1 - a_1)^2 + (u_2 - a_2)^2\\ c_1(\mathbf{u}) &=& u_1 \\ c_2(\mathbf{u}) &=& u_2 \end{eqnarray*}

and consider the following convex program (more specifically, quadratic program):

$$ \mathbf{u}^{\ast} = \arg\min_{\mathbf{u}} f(\mathbf{u}) \;\;\;\text{subject to constraints}\;\;\; c_1(\mathbf{u}) \geq 0 \;\;\;\text{and}\;\;\; c_2(\mathbf{u}) \geq 0\;. $$

The KKT conditions for a quadratic program in two variables and two constraints are as follows:

\begin{eqnarray*} (1) &\;\;\;\;\;\;\;\;\;\;& \frac{\partial\mathcal{L}}{\partial u_1} = 0 \\ (2) && \frac{\partial\mathcal{L}}{\partial u_2} = 0 \\ (3) && c_1(\mathbf{u}) \geq 0 \\ (4) && c_2(\mathbf{u}) \geq 0 \\ (5) && \alpha_1 \geq 0 \\ (6) && \alpha_2 \geq 0 \\ (7) && \alpha_1 u_1 + \alpha_2 u_2 = 0 \end{eqnarray*}

where $$ \mathcal{L}(\mathbf{u}, \boldsymbol{\alpha}) = f(\mathbf{u}) - \boldsymbol{\alpha}^T \mathbf{c}(\mathbf{u}) $$

and $$ \mathbf{u} = \left[\begin{array}{cc}u_1\\u_2\end{array}\right] \;\;\;,\;\;\; \mathbf{c}(\mathbf{u}) = \left[\begin{array}{cc}c_1(\mathbf{u})\\c_2(\mathbf{u})\end{array}\right] \;\;\;,\;\;\; \boldsymbol{\alpha} = \left[\begin{array}{cc}\alpha_1\\\alpha_2\end{array}\right] $$
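By direct substitution of $f$, $c_1$, and $c_2$ into the definition above, the Lagrangian for this particular program is

$$ \mathcal{L}(\mathbf{u}, \boldsymbol{\alpha}) = (u_1 - a_1)^2 + (u_2 - a_2)^2 - \alpha_1 u_1 - \alpha_2 u_2 \;, $$

so conditions (1) and (2) amount to setting the two partial derivatives of this expression with respect to $u_1$ and $u_2$ to zero.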

We will examine the KKT conditions for the three functions $f$ obtained by replacing $$ \mathbf{a} = \left[\begin{array}{cc}a_1\\a_2\end{array}\right] $$ in turn with $$ \mathbf{a}_1 = \left[\begin{array}{rr}1\\1\end{array}\right] \;\;\;,\;\;\; \mathbf{a}_2 = \left[\begin{array}{rr}-1\\1\end{array}\right] \;\;\;,\;\;\; \mathbf{a}_3 = \left[\begin{array}{rr}-1\\-2\end{array}\right] $$ and at the four test points $$ \mathbf{v}_1 = \left[\begin{array}{cc}0\\0\end{array}\right] \;\;\;,\;\;\; \mathbf{v}_2 = \left[\begin{array}{cc}0\\1\end{array}\right] \;\;\;,\;\;\; \mathbf{v}_3 = \left[\begin{array}{cc}1\\0\end{array}\right] \;\;\;,\;\;\; \mathbf{v}_4 = \left[\begin{array}{cc}1\\1\end{array}\right] \;. $$

The figure below shows isocontours of the three functions, the two constraint boundaries (red and blue lines), and the four test points (black dots).

It is immediate to verify that all four test points satisfy the KKT conditions (3) and (4), so we will ignore these from now on.

Use the KKT conditions (1) and (2) to write expressions for $\alpha_1$ and $\alpha_2$ in terms of $u_1$, $a_1$, $u_2$, $a_2$.

For $\mathbf{a} = \mathbf{a}_1$ (given in the preamble), compute the values of the Lagrange multipliers $\alpha_1$ and $\alpha_2$ for each of the four test points using the expressions you found in Problem 3.2, and check which of the test points, if any, satisfies the remaining KKT conditions (5), (6), (7). Based on this check, is any of the four test points a solution to the convex program? If so, which?

Your answer should just

- state which of the four points solve(s) the convex program,
- list the corresponding values of $\alpha_1$ and $\alpha_2$, and
- give the active set $\mathcal{A}$ for the solution.

If there is no solution, so state.

Same as Problem 3.3, but for $\mathbf{a} = \mathbf{a}_2$.

Same as Problem 3.3, but for $\mathbf{a} = \mathbf{a}_3$.