Homework 8

Homework Submission Workflow

When you submit your work, follow the instructions on the submission workflow page scrupulously for full credit.

Important: Failure to do any of the following will result in lost points:

  • Submit one PDF file and one notebook per group

  • Enter all the group members in your PDF Gradescope submission, not just in your documents. See this Piazza post if you don't know how to do this

  • Match each answer (not question!) with the appropriate page in Gradescope

  • Avoid large blank spaces in your PDF file

Part 1: Classifying Digit Images

The following code loads a small version of the MNIST handwritten-digit recognition database, which we used in a previous assignment as well. The data points are tiny 8x8 images of handwritten digits, and the labels are the digits from 0 to 9. A training set T and a testing set S are stored in two dictionaries with keys x and y, as usual.

Note that 60 percent of the available data is used for testing, and only 40 percent for training. This is so that (i) the problem is realistically difficult and (ii) the test results are relatively consistent.

The code also defines a function evaluate that both prints and returns the training and testing error rates of the provided classifier (assumed to be one of the classifiers in the package sklearn). The returned value is a triple with the training error rate, the test error rate, and the name of the classifier, which is passed to evaluate as its last argument. Error rates are expressed as percentages. A later question in this part will ask you to compare error rates across different classifiers, so it will save you time to store the values returned by evaluate in a list for later use.

Important Points

  • "Error rate" means risk computed with the zero-one loss. All error rates should be reported as percentages with two decimal digits after the period.
  • Do not change anything in this code snippet.
  • Use the keyword argument random_state=0 every time you define a classifier other than the k-NN classifier, which uses no randomness. This will make results somewhat more consistent across runs, and make our grading job easier.
  • Unless told otherwise, use the default versions of the classifiers. That is, random_state=0 should be the only argument passed to the classifier constructors (for example, DecisionTreeClassifier(random_state=0)).
In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
NData = len(digits.images)
data = {'x': digits.images.reshape((NData, -1)), 'y': digits.target}

testFraction = 0.6
T, S = {}, {}
T['x'], S['x'], T['y'], S['y'] = train_test_split(data['x'], data['y'],
                                                  test_size=testFraction,
                                                  random_state=0)

NT, NS = len(T['y']), len(S['y'])

def evaluate(h, T, S, name):
    # Zero-one error rate of classifier h on data set S, in percent.
    def errorRate(h, S):
        x, y = S['x'], S['y']
        return (1 - h.score(x, y)) * 100

    # Print both rates, then return them together with the classifier name.
    f = '{:s}: training error rate is {:.2f}, test error rate is {:.2f}'
    err = (errorRate(h, T), errorRate(h, S), name)
    print(f.format(name, err[0], err[1]))
    return err
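As an illustration of how evaluate is meant to be used, and of saving its results for later, here is a minimal sketch with a classifier that is not part of this assignment: sklearn's DummyClassifier, which with strategy='uniform' ignores the input and guesses labels uniformly at random, and therefore also provides a sanity check for Problem 1.1 below.

from sklearn.dummy import DummyClassifier

results = []    # the triples returned by evaluate, saved for Problem 1.6
h = DummyClassifier(strategy='uniform', random_state=0).fit(T['x'], T['y'])
results.append(evaluate(h, T, S, 'random guesser'))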

Problem 1.1

Assuming that all digits appear with equal frequency in the data, what is the expected error rate (in percent) of a classifier that ignores the input and guesses the output uniformly at random? Justify your answer with a very brief sentence.

Problem 1.2

Train a sklearn.tree.DecisionTreeClassifier. Show your code and report training and test error rates.

Problem 1.3

Use GridSearchCV to find a good number of trees for a sklearn.ensemble.RandomForestClassifier by 10-fold cross-validation. Show your code and report (i) the number of trees in the forest you end up choosing, (ii) the training and test error rates, and (iii) the out-of-bag error rate estimate.

Programming Notes

Use scoring='accuracy', but remember to convert accuracy to error rate in percent.

To obtain the out-of-bag accuracy, train a forest one more time with the number of trees found by cross-validation, and set oob_score=True during that final training. If h is the classifier, the out-of-bag accuracy will be in h.oob_score_.

Add the key-value pair 'random_state': [0] to the set of hyper-parameters to search over, so that all training runs are started with this random state.

The optimal number of trees is in the hundreds. To save time, try only a few values that differ by at least 100.
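Putting these notes together, the search might take roughly the following shape. This is a sketch, not the required solution: the values of n_estimators to try are placeholders, and the variable names are hypothetical.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# A few trial values at least 100 apart, plus the fixed random state.
grid = {'n_estimators': [100, 200, 300], 'random_state': [0]}
search = GridSearchCV(RandomForestClassifier(), grid,
                      scoring='accuracy', cv=10)
search.fit(T['x'], T['y'])
nTrees = search.best_params_['n_estimators']

# Train once more with the chosen number of trees and oob_score=True,
# then convert the out-of-bag accuracy to an error rate in percent.
h = RandomForestClassifier(n_estimators=nTrees, oob_score=True,
                           random_state=0).fit(T['x'], T['y'])
oobErrorRate = (1 - h.oob_score_) * 100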

Problem 1.4

Train a sklearn.linear_model.LogisticRegression classifier. Show your code and report training and test error rates.

Programming Notes

Use parameters C=1e5, solver='lbfgs', multi_class='multinomial', random_state=0.
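In code, this amounts to something like the following sketch:

from sklearn.linear_model import LogisticRegression

h = LogisticRegression(C=1e5, solver='lbfgs', multi_class='multinomial',
                       random_state=0).fit(T['x'], T['y'])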

Problem 1.5

Use GridSearchCV to find a good number of neighbors for a sklearn.neighbors.KNeighborsClassifier by 10-fold cross-validation. Show your code and report (i) the number of neighbors you end up choosing, and (ii) the training and test error rates.

Programming Note

The optimal number of neighbors is in the single digits, so there is no point in trying dozens of values.
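The search is analogous to the one in Problem 1.3, except that no random_state is needed, since k-NN uses no randomness. A sketch, in which the range of values is a placeholder:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

grid = {'n_neighbors': list(range(1, 10))}
search = GridSearchCV(KNeighborsClassifier(), grid, scoring='accuracy', cv=10)
search.fit(T['x'], T['y'])
nNeighbors = search.best_params_['n_neighbors']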

Problem 1.6

Let us define the overfit rate of a classifier as the difference between the test error rate and the training error rate, both expressed as percentages. Write code that prints out the overfit rates for the tree, forest, linear classifier, and k-NN classifier, in this order, one per line. Your printout should say which rate belongs to which classifier.
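For instance, if you saved the triples returned by evaluate in a list (called results in this hypothetical sketch), in the order tree, forest, linear classifier, k-NN, the printout reduces to a short loop:

# Each entry of results is (training error rate, test error rate, name).
for train, test, name in results:
    print('{:s}: overfit rate is {:.2f}'.format(name, test - train))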

Problem 1.7

Discuss briefly the results you obtained, by answering the following questions:

  • Which classifier works best on this data set?
  • Why doesn't everyone always use the best-performing classifier for a problem like this?
  • Do random forests reduce overfitting when compared with decision trees?
  • In what ways is a linear classifier more convenient than algorithms that may perform better, when optimal performance is not crucial?
  • For all but one of the classifiers used, a zero training error rate is no surprise. Which is the odd man out, and why?

Part 2: Closed Convex Polyhedral Cones

The questions in this part ask you to convert between the implicit and parametric representations of Closed, Convex, Polyhedral (CCP) cones in two dimensions. The general conversion is difficult, but in two dimensions things are much simpler.

Pay attention to special cases, in which the input representation has a single vector or its two vectors point along the same line.

The outputs from your functions should be minimal, that is, they should not contain redundant vectors.

The input to the functions you will write is a list of tuples. The list represents a CCP cone in the input representation. Each tuple in the list contains two numbers, which are the $x$ and $y$ coordinates of a vector on the plane. To keep things simple, you may assume the following:

  • Each input list has either one or two vectors (in particular, empty lists and lists with three vectors, although legitimate in general, need not be considered in this assignment). However, the output from your code may have a different number of vectors, as appropriate.
  • The number of input vectors is minimal. That is, there are no redundant vectors.
  • The coordinates of the vectors are small integers. This simplifies checking whether any two vectors are parallel to each other.

Hints:

  • The vectors $(x, y)$ and $(-y, x)$ are orthogonal to each other.
  • It is important to output vectors with the correct orientations. In particular, the vectors $\mathbf{p}$ and $-\mathbf{p}$ are not the same. A simple way to obtain vectors with the appropriate signs is to try all signs or all sign combinations until you find the right one.

Problem 2.1

Write a function with header def parametric(a): that takes a list a with the vectors of an implicit representation of a CCP cone on the plane and returns a list with the generators of the parametric representation.

Programming Notes

  • The function in this problem and the one in the next problem are related to each other, as they perform conversions in opposite directions. Because of this, it will save you time and possible mistakes if you define helper functions that are useful for both problems; see the sketch after these notes.

  • It may be convenient to use numpy. If you do that, convert the inputs to that format inside your functions.

  • While it would be safer to add assertions that check that the inputs are correctly formatted, it is not mandatory to do so.
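For example, helper functions along the following lines (the names are hypothetical) can serve both conversions, implementing the two hints given earlier:

import numpy as np

def perp(v):
    # A vector orthogonal to v: (x, y) -> (-y, x).
    return np.array([-v[1], v[0]])

def parallel(v, w):
    # With small integer coordinates, the cross product is exactly zero
    # if and only if v and w point along the same line.
    return v[0] * w[1] - v[1] * w[0] == 0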

In [2]:
### Tests; Your code should be in a separate code cell above this one

%matplotlib inline

tests = [[(0, 1), (1, 0)],
         [(1, 2), (2, 1)],
         [(1, 2), (-1, 0)],
         [(1, 1), (-1, -1)],
         [(0, 1)]]

def vectorString(a):
    return '{' + ', '.join(
        '(' + ', '.join(str(x) for x in b) + ')' for b in a) + '}'

import matplotlib.pyplot as plt
def draw(a, p, k):
    plt.subplot(2, 3, k)
    for b in a:
        plt.quiver(0, 0, b[0], b[1], color='r', units='width', scale=5)    
    for q in p:
        plt.quiver(0, 0, q[0], q[1], color='b', units='width', scale=5)
    plt.axis('equal')
    plt.axis('off')
    plt.xlim([-1, 1])
    plt.ylim([-1, 1])

try:
    plt.figure()
    k = 1
    for a in tests:
        p = parametric(a)
        print('implicit: ', vectorString(a), '; parametric: ', vectorString(p), sep='')
        draw(a, p, k)
        k += 1
    plt.show()
except NameError:
    pass

Problem 2.2

Write a function with header def implicit(p): that takes a list p with the generators of a parametric representation of a CCP cone on the plane and returns a list with the vectors for the implicit representation.

In [3]:
### Tests; Your code should be in a separate code cell above this one

try:
    plt.figure()
    k = 1
    for p in tests:
        a = implicit(p)
        print('parametric: ', vectorString(p), '; implicit: ', vectorString(a), sep='')
        draw(a, p, k)
        k += 1
    plt.show()
except NameError:
    pass

Problem 2.3

Let

$$ P = \{\mathbf{u}\in\mathbb{R}^2\ :\ \mathbf{a}_1^T\mathbf{u} \geq 0 \;\;\text{and}\;\; \mathbf{a}_2^T\mathbf{u} \geq 0\ \}$$

where

$$ \mathbf{a}_1^T = (0, 1) \;\;\;\text{and}\;\;\; \mathbf{a}_2^T = (-2, 1)\;. $$

What is the implicit (not parametric!) representation of the dual $D$ of $P$?

Hint: The previous problems may help.
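For reference, the dual of a cone is commonly defined as the set of all vectors that make a nonnegative inner product with every element of the cone:

$$ D = \{\mathbf{v}\in\mathbb{R}^2\ :\ \mathbf{v}^T\mathbf{u} \geq 0 \;\;\text{for all}\;\; \mathbf{u}\in P\}\;. $$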

Part 3: Convex Programs

This part explores convex programming in a very simple setting, in order to gain familiarity with the key concepts.

Problem 3.1

Use the definition of function convexity to prove that the function

$$ f(u, v) = u + v $$

is (weakly) convex in the vector $\mathbf{u} = (u, v)$.
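For reference, the definition requires that for all $\mathbf{u}, \mathbf{u}'$ and all $\lambda \in [0, 1]$,

$$ f(\lambda\mathbf{u} + (1 - \lambda)\mathbf{u}') \;\leq\; \lambda f(\mathbf{u}) + (1 - \lambda) f(\mathbf{u}')\;. $$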

Preamble

Let

\begin{eqnarray*}
f(\mathbf{u}) &=& (u_1 - a_1)^2 + (u_2 - a_2)^2 \\
c_1(\mathbf{u}) &=& u_1 \\
c_2(\mathbf{u}) &=& u_2
\end{eqnarray*}

and consider the following convex program (more specifically, quadratic program):

$$ \mathbf{u}^{\ast} = \arg\min_{\mathbf{u}} f(\mathbf{u}) \;\;\;\text{subject to constraints}\;\;\; c_1(\mathbf{u}) \geq 0 \;\;\;\text{and}\;\;\; c_2(\mathbf{u}) \geq 0\;. $$

The KKT conditions for a quadratic program in two variables and two constraints are as follows:

\begin{eqnarray*}
(1) &\;\;\;\;\;\;\;\;\;\;& \frac{\partial\mathcal{L}}{\partial u_1} = 0 \\
(2) && \frac{\partial\mathcal{L}}{\partial u_2} = 0 \\
(3) && c_1(\mathbf{u}) \geq 0 \\
(4) && c_2(\mathbf{u}) \geq 0 \\
(5) && \alpha_1 \geq 0 \\
(6) && \alpha_2 \geq 0 \\
(7) && \alpha_1 u_1 + \alpha_2 u_2 = 0
\end{eqnarray*}

where $$ \mathcal{L}(\mathbf{u}, \boldsymbol{\alpha}) = f(\mathbf{u}) - \boldsymbol{\alpha}^T \mathbf{c}(\mathbf{u}) $$

and
$$ \mathbf{u} = \left[\begin{array}{c}u_1\\u_2\end{array}\right] \;\;\;,\;\;\; \mathbf{c}(\mathbf{u}) = \left[\begin{array}{c}c_1(\mathbf{u})\\c_2(\mathbf{u})\end{array}\right] \;\;\;,\;\;\; \boldsymbol{\alpha} = \left[\begin{array}{c}\alpha_1\\\alpha_2\end{array}\right] $$

We will examine the KKT conditions for the three functions $f$ obtained by replacing
$$ \mathbf{a} = \left[\begin{array}{c}a_1\\a_2\end{array}\right] $$
in turn with
$$ \mathbf{a}_1 = \left[\begin{array}{r}1\\1\end{array}\right] \;\;\;,\;\;\; \mathbf{a}_2 = \left[\begin{array}{r}-1\\1\end{array}\right] \;\;\;,\;\;\; \mathbf{a}_3 = \left[\begin{array}{r}-1\\-2\end{array}\right] $$
and at the four test points
$$ \mathbf{v}_1 = \left[\begin{array}{c}0\\0\end{array}\right] \;\;\;,\;\;\; \mathbf{v}_2 = \left[\begin{array}{c}0\\1\end{array}\right] \;\;\;,\;\;\; \mathbf{v}_3 = \left[\begin{array}{c}1\\0\end{array}\right] \;\;\;,\;\;\; \mathbf{v}_4 = \left[\begin{array}{c}1\\1\end{array}\right] \;. $$

The figure below shows isocontours of the three functions, the two constraint boundaries (red and blue lines), and the four test points (black dots).

It is easy to verify that all four test points satisfy the KKT conditions (3) and (4), so we will ignore those two conditions from now on.

[Figure: isocontours of the three functions, the two constraint boundaries, and the four test points.]

Problem 3.2

Use the KKT conditions (1) and (2) to write expressions for $\alpha_1$ and $\alpha_2$ in terms of $u_1$, $a_1$, $u_2$, $a_2$.

Problem 3.3

For $\mathbf{a} = \mathbf{a}_1$ (given in the preamble), compute the values of the Lagrange multipliers $\alpha_1$ and $\alpha_2$ for each of the four test points using the expressions you found in Problem 3.2, and check which of the test points, if any, satisfies the remaining KKT conditions (5), (6), (7). Based on this check, is any of the four test points a solution to the convex program? If so, which?

Your answer should just

  • state which of the four points solve(s) the convex program,
  • list the corresponding values of $\alpha_1$ and $\alpha_2$, and
  • give the active set $\mathcal{A}$ for the solution.

If there is no solution, so state.

Problem 3.4

Same as Problem 3.3, but for $\mathbf{a} = \mathbf{a}_2$.

Problem 3.5

Same as Problem 3.3, but for $\mathbf{a} = \mathbf{a}_3$.