This homework assignment is meant to revisit some concepts from linear algebra you have already encountered in the past. It is also designed to help you get accustomed with Jupyter and Python 3, if you don't know these already, and with the process required to submit your work on Gradescope.

When you submit your work, follow the instructions on the submission workflow page * scrupulously* for full credit.

Before you start, make sure you install the Anaconda Python distribution, which contains essentially all of the code you will need for this course. Please do not use other installations of Python, and install the version for Python 3, not Python 2. Also, if you install new Python packages later on, do not mix package installation methods, unless you know very thoroughly what you are doing (for instance, do not use `pip`

to install new software, only use `conda`

). This will save you big headaches as the semester progresses.

Homework assignments in this course, including this one, are made available as HTML files, which you can only read and print. A separate template file is also made available for each assignment. The template is a Jupyter notebook, a file called `homework0n.ipynb`

, where `n`

is the homework number. This is a file you can also edit, and will become the ntebook you submit for grading.

As an exception, for this first assignment you get the notebook `source01.ipynb`

rather than the HTML file `assignment01.html`

. In this way, you can also open `source01.ipynb`

in the notebook editor and look at how this notebook was made. Once you are comfortable with notebook syntax, **make sure you start from the template notebook**`homework01.ipynb`

, **not from the source notebook, to produce your submission.**

To edit a notebook, start the Anaconda Navigator and launch Jupyter Notebook from the Navigator. This will open a new page in your web browser with a listing of your home directory. Navigate to the directory where you put the files `source01.ipynb`

and `homework01.ipynb`

and click on each of them to open them. The files will open in the notebook editor in separate tabs of your web browser. **Again, remember that**`source01.ipynb`

**is just for playing with, while**`homework01.ipynb`

**is the file you start from for your assignment.**

To edit some text or code in either file, double-click on it. This will change the display of that *cell* to edit mode. Try this now.

A *cell* is a block of content. There are a few different types of cell, but we will be concerned maily with `Markdown`

cells and `Code`

cells. The former contain text, including mathematics, and the latter contain snippets of code (Python 3 for us, although notebooks can deal with any programming language).

When you run a `Markdown`

cell, its contents are formatted and displayed. A `Markdown`

cell can also contain straight HTML code. However, there won't be any need for this in this course.

When you run a `Code`

cell, the code in it is executed in the current *kernel*, and any output from it is displayed after the cell. Any object created during execution (variables, function definitions,...) remains alive in the kernel, so you can assign a value to a variable in one cell and access it in a later cell. You can do various things to the kernel, such as restart it, interrupt it, and so forth. Please browse the `Kernel`

menu at the top of the page.

To make, edit, delete, run, and do several other things to and with cells, you use the commands in the `Edit`

, `View`

, `Insert`

, `Cell`

menus. Look at these menus and try out some of the commands. The icons just below the menus are shortcuts for some of the more common commands.

In `Markdown`

cells, all text except mathematics is formatted with *Markdown* syntax, a simplified document annotation language. The mathematics is enclosed between dollar signs when it is inline with the text like the equation $\frac{de^x}{dx} = e^x$, and between double dollar signs when it is displayed in the middle of a line on its own, like this:

$$\int_0^{\infty} e^{x} dx = 1$$

(look at the source of this text to see how this works). The easiest way to figure our how Markdown and LaTeX work is to look at the source of this notebook to see how text and math were composed, and how code was entered.

For your reference, here is a list of Markdown syntax elements

For your reference, here is a list of LaTeX commands for mathematics

After you make changes to a cell, you can see the results of your changes by selecting `Cell->Run Cells`

or `Cell->Run All`

. The first command runs only the cell you are in, or the cells you select. The second command runs all the cells in the notebook. The `Run`

button in the menu bar is a shortcut for `Cell->Run Cells`

.

To work on this assignment, open the HTML version `source01.html`

of the assignment in one browser window, and open the notebook `homework01.ipynb`

in the notebook editor (in another browser window) next to it. **It is strongly recommended that you close**`source01.ipynb`

to make sure you do not inadvertently put your work into that file.

As you work on `homework01.ipynb`

, add as many cells as you need to answer the questions (use the `Insert->Cell Below`

menu or the `+`

icon in the menu bar). When you do so, make sure you select the proper cell type (`Markdown`

for text, `Code`

for Python code) from the menu, or else your formatting of text will not work, and your code will not run.

Some of the problems below involve coding, some do not.

You will fit a degree-$k$ polynomial $$ h(x) = c_0 + c_1 x + \ldots + c_k x^k $$ for various values of $k$ to a set $$ T = \{(x_0, y_0), \ldots, (x_{N-1}, y_{N-1}) \}$$ of data samples, each sample containing two real numbers. We number the $N$ samples from $0$ to $N-1$ rather than from $1$ to $N$ to avoid confusion with Python-style subscripts, which start at $0$. Note that a polynomial of degree $k$ has $k+1$ coefficients.

To review the theory, the goal is to pick the vector
$$ \mathbf{c} = [c_0, \ldots, c_k]^T $$
of coefficients that minimize the *quadratic risk*
$$L_T(\mathbf{c}) = \sum_{n=0}^{N-1} [y_n - h(x_n)]^2\;.$$
In the expression for $\mathbf{c}$ above, the $T$ superscript denotes transposition, so we think of $\mathbf{c}$ as a column vector.

When this problem is solved, the sample $(x_n, y_n)$ yields the following *linear* equation in the unknowns:
$$ c_0 1 + c_1 x_n + \ldots + c_k x_n^k = y_n $$
which can be written in vector form as follows:
$$ \mathbf{x}_n^T \mathbf{c} = y_n $$
where we define
$$ \mathbf{x}_n = [1, x_n, \ldots, x_n^k]^T \;.$$

We can now form the $N\times (k+1)$ matrix and $N\times 1$ vector $$ A = \left[\begin{array}{c} \mathbf{x}_0^T\\\vdots\\\mathbf{x}_{N-1}^T \end{array}\right] \;\;\;\text{and}\;\;\; \mathbf{a} = \left[\begin{array}{c} y_0\\\vdots\\y_{N-1} \end{array}\right] $$ and write the linear system $$ A \mathbf{c} = \mathbf{a} $$ of $N$ equations in $k+1$ unknowns, to be solved in the least-squares sense: $$\widehat{\mathbf{c}} \;\in\; \arg\min_{\mathbf{c}} \|A \mathbf{c} - \mathbf{a}\|^2\;.$$ This system may be under-determined, exact ($A$ is square and full rank), or over-determined, so it may admit zero, one, or infinitely many solutions.

Let $N = 1$ and $k = 1$, so that we are fitting a straight line to one sample point $(x_0, y_0)$. There are infinitely many lines through a single point, so we expect infinitely many solutions.

That is, write out the equation explicitly in terms of $x_0$, $y_0$, $c_0$, and $c_1$. Do *not* use matrix notation.

Give expressions for two different possible solutions $\mathbf{c}$ to this equation in terms of $x_0$ and $y_0$.

There are infinitely many correct answers (and even more wrong ones!), just pick two easy ones. Make sure that your expressions work even if $x_0$ or $y_0$ (or both) are zero.

Write and run Python code to draw, in a single plot, the two lines corresponding to the two solutions you gave when $$(x_0, y_0) = (2, 1).$$

To help you, the plotting function `show`

is defined for you, and is written so that it will work also for more complex cases later on. The function `show`

takes `numpy`

vectors `x`

and `y`

, which together represent a training set $T$, and a list `cList`

of `numpy`

vectors of polynomial coefficients. It then plots the training points as well as each polynomial in the list. For this problem, `x`

and `y`

contain a single number each, and `cList`

contains the two vectors $\mathbf{c}'$ and $\mathbf{c}''$. A commented-out example is given for how the function is to be called for this problem, with just one (arbitrary) vector $\mathbf{c}$ in the list.

In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
def show(x, y, cList = []):
plt.ion()
plt.plot(x, y, marker='.', markersize=12, ls='')
npt = 100
xrange = [x - 1, x + 1] if x.size == 1 else [np.amin(x), np.amax(x)]
xFine = np.linspace(xrange[0], xrange[1], npt)
for c in cList:
nc = c.size
ycFine = np.zeros(xFine.shape)
xPow = np.ones(xFine.shape)
for i in range(nc):
ycFine += c.item(i) * xPow
xPow *= xFine
plt.plot(xFine, ycFine, label = 'degree ' + str(nc-1))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
# Usage example
# p = [2, 1]
# show(np.array(p[0]), np.array(p[1]), [np.array([1, 1])])
```

You may wonder why the code above says `c.item(i)`

rather than `c[i]`

. The reason is that the latter version will not work when `nc`

$=1$. In that case, `c`

is what `numpy`

calls a "zero-dimensional array", and therefore it expects zero subscripts, while the expression `c[i]`

uses one subscript. The notation `c.item(i)`

works even in that case.

Some may argue that this is a bug rather than a feature of the `numpy`

package. While expecting a number of subscripts equal to the number of dimensions is a good concept, the behavior with zero-dimensional arrays often surprises programmers with unexpected error messages (`IndexError: too many indices for array`

).

Let us now consider the case $N = 2$ and $k = 1$: Fitting a straight line to two points $(x_0, y_0)$ and $(x_1, y_1)$. We assume that the line is not vertical, that is, that $$x_0 \neq x_1 \;,$$ so that it can be written in the form $$ c_0 + c_1 x = y \;.$$ There is one such line, so we expect exactly one solution. You may remember from geometry that the line between these two points has equation $$ y - y_0 = \frac{y_1 - y_0}{x_1 - x_0}\ (x - x_0) $$

Derive this last equation from the system $$A\mathbf{c} = \mathbf{a}$$ spelled out for this special case.

Specifically, spell out this system for $N = 2$ and $k = 1$, similarly to what you did earlier, and solve the system by Gaussian elimination and back-substitution. Plug the values of $c_0$ and $c_1$ into the line equation and rearrange the resulting expression to derive the second equation for the line given above. Show your work.

Finally, let us consider the over-determined case through an example with $N = 3$ and $k=1$: Fitting a straight line to three points.

We want to write the line in the form $$ c_0 + c_1 x = y $$ and therefore we must assume that $x_0, x_1, x_2$ are not all equal to each other (or else the line would be vertical).

Then the matrix $A$ is full rank. The system admits an exact solution only if the three points happen to be aligned. We want a more general solution, so we solve the system in the Least-Squares sense.

Recall from linear algebra that the unique Least-Squares solution to the over-determined, full-rank system
$$A \mathbf{c} = \mathbf{a}$$
is the solution to the *normal equations*
$$ A^T A \mathbf{c} = A^T \mathbf{a} \;.$$

The key advantage of this system over the original one is that the matrix $A^T A$ is full rank (because $A$ is full rank), and can therefore be inverted. In our case, the matrix $A^T A$ is $2\times 2$, so we can invert it by hand.

To simplify the notation in the manipulations that follow, make the following definitions: $$ \begin{eqnarray*} X &=& \sum_{n=0}^{N-1} x_n \\ Y &=& \sum_{n=0}^{N-1} y_n \\ S &=& \sum_{n=0}^{N-1} x_n^2 \\ P &=& \sum_{n=0}^{N-1} x_n y_n \end{eqnarray*} $$

Write the matrix $A^T A$ and vector $A^T \mathbf{a}$ in terms of $N, X, Y, S, P$ for the special case $k=1$.

The value of $N$ does not matter, since $N$ is hidden by the definitions above.

Find expressions for $c_0$ and $c_1$ in terms of $N, X, Y, S, P$ by solving the normal equations. Recall that $$\left[\begin{array}{cc}a & b\\c & d \end{array}\right]^{-1} \;=\; \frac{1}{ad-bc}\ \left[\begin{array}{rr}d & -b\\-c & a \end{array}\right]$$

Use the function

`show`

given earlier to display the three points $(0, -1), (1, 1), (3, 0)$ and the line fit to them with the formulas you just found for $c_0$ and $c_1$.

Show your code, the numerical values of $c_0$ and $c_1$ with three decimal digits after the period, and the resulting figure.

(Hint: It's less error prone to use Python to compute and print $c_0$ and $c_1$, but if you want to do that by hand that's OK, too.)

**If you are comfortable with Python 3 and numpy, you may skip this section.**

Let us first load the data set $T$ from file `T.txt`

into a `numpy`

array. We use the `loadtxt`

function to load the data, since `T.txt`

is a text file with two numbers per row.

Read the documentation of `loadtxt`

. Do this for other functions as well, we won't keep telling you. Google is your friend.

Let's also assume that we want to fit a quadratic polynomial ($k = 2$), and that we want to assemble the matrix $A$.

In [2]:

```
x, y = np.loadtxt('T.txt', unpack=True)
N, k = len(y), 2
```

The `unpack`

flag transposes the array, so that every item in even position (starting at 0) in the original array, that is all items in the first row of the transposed array, goes into `x`

and every other item goes into `y`

. A `numpy`

vector makes no distinction between row and column vectors, so there is no need to transpose the results `x`

and `y`

.

For later reference, `N`

is the number of samples and `k`

is the degree of the desired polynomial.

If all works well, you should see no result. To check what happened you could display the pair `(x, y)`

with the following command (parentheses are not needed, but can be added without consequence):

In [3]:

```
x, y
```

Out[3]:

To make the array $A$, list comprehensions come in handy. The code below makes the desired array as a Python list of lists (no `numpy`

) and uses the result to initialize a `numpy`

array.

In [4]:

```
A = np.array([[x.item(i) ** j for j in range(k+1)] for i in range(N)])
A
```

Out[4]:

If list comprehensions confuse you, just use `for`

loops, but the code becomes more complicated and less efficient:

In [5]:

```
A = np.zeros((N, k+1))
for i in range(N):
for j in range(k+1):
A[i, j] = x.item(i) ** j
A
```

Out[5]:

The result is the same.

Initially, a good way to understand list comprehensions is to first write the loop version, then rewrite it "inside out" as a comprehension.

The examples in the previous parts were useful to refresh your memory about basic concepts from linear algebra, and to understand when a polynomial fitting problem has 0, 1, or infinitely many solutions. When the degree of the polynomial is greater than 1, however, calculations by hand become impractical or impossible.

Instead, we always solve the linear system $$ A \mathbf{c} = \mathbf{a} $$ in the Least-Square sense (that is, as a minimization problem), and we always use a numerical package. When there is exactly one solution, we obtain just that (the "least square" is 0). When there are infinitely many solutions, we obtain one of them, and which one we get depends on the solution method used.

A good way to make the solution unique in all cases is to stipulate that whenever multiple solutions are available the one of *minimum norm* is returned. "Norm" here refers to the Euclidean norm of the vector $\mathbf{c}$. The function `numpy.linalg.lstsq`

returns the minimum-norm solution when multiple solutions exist, and it achieves this result by using the singular value decomposition of $A$.

In problem 1.3 you found two different solutions to a data fitting problem. Let us call the corresponding coefficient vectors $\mathbf{c}$ and $\mathbf{d}$. Write the values of $\mathbf{c}$ and $\mathbf{d}$, and report their norms. Which is smaller?

Your answer may vary, as it depends on which lines you came up with in 1.3. If you write code (computation by hand is OK, too), you may want to use `numpy.linalg.norm`

to compute the norm of `numpy`

arrays.

Write a function with header

`def fit(x, y, k):`

that fits a polynomial of degree

`k`

to the data in`x`

and`y`

.

This function takes `numpy`

vectors `x`

and `y`

with $N$ coordinates each and a degree `k`

(a nonnegative integer), and returns a `numpy`

vector `c`

with the coefficients of the polynomial of degree `k`

that fits the points `(x[n], y[n])`

best in the Least-Squares sense. The vector of coefficients should have minimum norm when multiple solutions are available, so use `numpy.linalg.lstsq`

. Show your code.

To solve the linear system $A\mathbf{c} = \mathbf{a}$, call `numpy.linalg.lstsq`

as follows:

```
c = np.linalg.lstsq(A, a, rcond=None)[0]
```

The `rcond=None`

option silences a warning message that has to do with the matrix condition number. The `[0]`

at the end picks only the first out of four different data structures returned by the function. We don't need the other three in this assignment.

Your code should work also when `k`

is zero (fitting a constant), so make sure you use `.item(i)`

rather than `[i]`

for indexing `numpy`

arrays wherever the issue described in the programming note after Problem 1.3 is relevant.

Your code should also work when a single point is passed (so that `x.size`

and `y.size`

are equal to $N=1$). This corner case reveals another quirk of `numpy`

. Specifically, the matrix $A$ in that case has a single row (and is possibly a single number if $k=0$ as well). However, `numpy.linalg.lstsq`

expects its first argument to be a two-dimensional array, while we now have a zero- or one-dimensional array. This will raise an error when the function is called if you use `np.array`

to hold your data.

Instead, use `np.matrix`

, which distinguishes between row vectors and column vectors, and handles low-dimensional matrices more gracefully for linear algebra applications. We pass `np.array`

arguments to `fit`

because these are the more commonly used containers and are more convenient to use in several ways. Because of this, you need to cast `np.array`

objects to `np.matrix`

objects inside `fit`

. If `arr`

is an `np.array`

, just say

```
mat = np.matrix(arr)
```

to make a matrix `mat`

out of it. If `arr`

is a one-dimensional array, to make sure that the corresponding matrix has a single *column*, rather than a single row, you need to explicitly reshape the result:

`col = np.reshape(np.matrix(arr), [arr.size, 1])`

Show the plot and output obtained with the following code, where

`fit`

is your function. The file`T.txt`

should be in the same directory as your notebook.

In [6]:

```
try:
p13 = {'x' : np.array(2), 'y' : np.array(1), 'k' : [1]}
np.set_printoptions(formatter={'float': '{: .3f}'.format})
def run(p):
x, y = p['x'], p['y']
cList = [fit(x, y, k) for k in p['k']]
show(x, y, cList)
for c in cList:
print(c.T)
run(p13)
except NameError:
pass
```

What is the approximate norm of the solution (to three decimal digits), and is this result consistent with your results in problem 1.3? Why or why not?

a. Show the plots and outputs obtained with the code below, and answer the following questions:

b. Are the plot and value of $\mathbf{c}$ from the data in

`p33`

consistent with your answer to problem 3.3?c. Give a brief qualitative description of the way(s) in which the curves in the plot from the data in

`notes`

differ from the curves in Figure 1 of the class notes on data fitting. The plots in that Figure were computed from the same ten points as in`T.txt`

but with a different numerical package.

In [7]:

```
try:
p33 = {'x' : np.array([0, 1, 3]), 'y' : np.array([-1, 1, 0]), 'k' : [1]}
x, y = np.loadtxt('T.txt', unpack=True)
notes = {'x' :x, 'y' : y, 'k' : [1, 3, 9]}
run(p33)
run(notes)
except NameError:
pass
```

If you noticed any discrepancies between your plot and the Figure in the class notes, first state why the discrepancies should not occur under ideal circumstances. Then suggest some reasons for them. For your information, the coefficients found for $k=9$ for the Figure in the class notes are as follows:

$2.030, -295.642, 7107.144, -62956.114, 288218.499, -765400.622, 1224862.633, -1164835.634, 606187.939, -132887.534$