Linear Algebra in Machine Learning

Course Info
As Taught in:

2019/2020

Level:

Undergraduate

Learning Resource Types:

=> Problem Sets

=> Notes from Instructor Insights

=> Reading Resources

Syllabus
Course Description:

Linear algebra concepts are key to understanding and creating machine learning algorithms, especially as applied to deep learning and neural networks. This "Linear Algebra in Machine Learning" course reviews linear algebra with applications to probability, statistics, and optimization, and above all gives a full explanation of deep learning.

Course Meeting:

Lectures: 3 sessions / week, 1 hour / session

Prerequisites:

Introduction to Linear Algebra

Recommended Text:

There is no required text, but learners who have the book normally used as the textbook for "Introduction to Linear Algebra" may find it helpful.

Strang, Gilbert. 2018. Linear Algebra and Learning from Data. Published by Wellesley-Cambridge Press. ISBN: 9780692196380.

Practices

There will be written assignments, labs, and a final project. Your understanding of the course material will be assessed through a combination of these three elements.

Note:

The site includes problems assigned for every lecture, aligned with readings in the course textbook.

Topics
Topics covered include:
  1. The Column Space of A Contains All Vectors Ax
  2. Multiplying and Factoring Matrices
  3. Orthonormal Columns in Q Give Q'Q = I
  4. Eigenvalues and Eigenvectors
  5. Positive Definite and Semidefinite Matrices
  6. Singular Value Decomposition (SVD)
  7. Eckart-Young: The Closest Rank-k Matrix to A (see the sketch after this list)
  8. Norms of Vectors and Matrices
  9. Four Ways to Solve Least Squares Problems
  10. Survey of Difficulties with Ax = b
  11. Minimizing ||x|| Subject to Ax = b
  12. Computing Eigenvalues and Singular Values
  13. Randomized Matrix Multiplication
  14. Low Rank Changes in A and Its Inverse
  15. Matrices A(t) Depending on t, Derivative = dA/dt
  16. Derivatives of Inverse and Singular Values
  17. Rapidly Decreasing Singular Values
  18. Counting Parameters in SVD, LU, QR, Saddle Points
  19. Saddle Points Continued, Maxmin Principle
  20. Definitions and Inequalities
  21. Minimizing a Function Step by Step
  22. Gradient Descent: Downhill to a Minimum
  23. Accelerating Gradient Descent (Use Momentum)
  24. Linear Programming and Two-Person Games
  25. Stochastic Gradient Descent
  26. Structure of Neural Nets for Deep Learning
  27. Backpropagation: Find Partial Derivatives
  28. Computing in Class
  29. Completing a Rank-One Matrix, Circulants
  30. Eigenvectors of Circulant Matrices: Fourier Matrix
  31. ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule
  32. Neural Nets and the Learning Function
  33. Distance Matrices, Procrustes Problem
  34. Finding Clusters in Graphs
  35. JULIA Language
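
The SVD and the Eckart-Young theorem (items 6 and 7 above) run through much of the course. The short Python/NumPy sketch below, using a random matrix and rank k = 2 chosen only for illustration, shows how the closest rank-k matrix is built from the k largest singular values and why the Frobenius-norm error equals the size of the discarded singular values.

    # Minimal sketch (Python with NumPy assumed); A and k are arbitrary
    # illustration values, not part of the course materials.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))            # example matrix

    # Singular Value Decomposition: A = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Eckart-Young: keeping the k largest singular values gives the
    # closest rank-k matrix to A in the spectral and Frobenius norms.
    k = 2
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # The Frobenius-norm error equals sqrt(sum of the discarded s[i]^2);
    # the two printed values agree.
    print(np.linalg.norm(A - A_k, "fro"), np.sqrt(np.sum(s[k:] ** 2)))
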
Readings

Reading assignments are all in the recommended textbook:

Strang, Gilbert. 2018. Linear Algebra and Learning from Data. Published by Wellesley-Cambridge Press. ISBN: 9780692196380

Section 1.1: Multiplication Ax Using Columns of A – The Column Space of A Contains All Vectors Ax

Section 1.2: Matrix Multiplication AB – Multiplying and Factoring Matrices

Section 1.5: Orthogonal Matrices and Subspaces – Orthonormal Columns in Q Give Q'Q = I

Section 1.6: Eigenvalues and Eigenvectors

Section 1.7: Symmetric Positive Definite Matrices – Positive Definite and Semidefinite Matrices

Section 1.8: Singular Values and Singular Vectors in the SVD

Section 1.9: Principal Components and the Best Low Rank Matrix – Eckart-Young: The Closest Rank-k Matrix to A

Section 1.11: Norms of Vectors and Functions and Matrices

Section II.2: Least Squares – Four Ways to Solve

Intro Chapter 2: Introduction to Computations with Large Matrices – Survey of Difficulties with Ax = b

Section 1.11: Norms of Vectors and Functions and Matrices – Minimizing ||x|| Subject to Ax = b

Section II.1: Numerical Linear Algebra – Computing Eigenvalues and Singular Values

Section II.4: Randomized Linear Algebra – Randomized Matrix Multiplication

Section III.1: Changes in A^-1 from Changes in A – Low Rank Changes in A and Its Inverse

Section III.2: Interlacing Eigenvalues and Low Rank Signals – Matrices A(t) Depending on t, Derivative = dA/dt – Derivatives of Inverse and Singular Values – Counting Parameters in SVD, LU, QR, Saddle Points

Section III.3: Rapidly Decaying Singular Values – Rapidly Decreasing Singular Values

Section V.1: Mean, Variance, and Probability – Saddle Points Continued, Maxmin Principle

Section VI.1: Minimum Problems: Convexity and Newton's Method & Section VI.4: Gradient Descent Toward the Minimum – Minimizing a Function Step by Step

Section VI.4: Gradient Descent Toward the Minimum – Gradient Descent: Downhill to a Minimum & Accelerating Gradient Descent (Use Momentum)

Section VI.2: Lagrange Multipliers = Derivatives of the Cost & Section VI.3: Linear Programming, Game Theory, and Duality – Linear Programming and Two-Person Games

Section VI.5: Stochastic Gradient Descent and ADAM – Stochastic Gradient Descent

Section VII.1: The Construction of Deep Neural Networks – Structure of Neural Nets for Deep Learning

Section VII.3: Backpropagation and the Chain Rule – Backpropagation: Find Partial Derivatives

Section VII.2: Convolutional Neural Nets – Computing in Class

Section IV.8: Completing Rank One Matrices & Section IV.2: Shift Matrices and Circulant Matrices – Completing a Rank-One Matrix, Circulants

Section IV.2: Shift Matrices and Circulant Matrices – Eigenvectors of Circulant Matrices: Fourier Matrix & ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule

Section VII.1: The Construction of Deep Neural Networks & Section IV.10: Distance Matrices – Neural Nets and the Learning Function

Section IV.9: The Orthogonal Procrustes Problem & Section IV.10: Distance Matrices – Distance Matrices, Procrustes Problem

Section IV.6: Graphs and Laplacians and Kirchhoff's Laws & Section IV.7: Clustering by Spectral Methods and k-means – Finding Clusters in Graphs

Section III.3: Rapidly Decaying Singular Values & Section VII.2: Convolutional Neural Nets – JULIA Language

Projects
Project Descriptions:

The project takes the place of what would have been the last three homework assignments for the course. You can work in groups of two or three. Pick one of the problems that we are learning about and take it further: to numerical examples, to applications, to testing a solution algorithm, or certainly to computations (using any language). Submit a short proposal shortly before or after you begin work on it. The project is due at the end of the course.

Topic ideas:
  • SVD Applications to PCA
  • Comparison of Algorithms
  • Applications to Random Matrices
  • Least Squares Comparisons of 4 Ways: Speed and Accuracy
  • Comparison with Gradient Algorithms
  • Sparse Solutions Using l_1
  • Basis Pursuit and Other l_1 Optimizations
  • Matrix Completion
  • Gradient Descent and Stochastic Gradient Descent (see the sketch after this list)
  • Acceleration Methods
  • Learning Weights in a Neural Net
  • Many Parameters – Which Solution is Found by Descent?
  • Low Rank Approximations
  • Basic Applications
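
As a possible starting point for ideas such as "Least Squares Comparisons of 4 Ways" and "Gradient Descent and Stochastic Gradient Descent," the sketch below compares full gradient descent with stochastic gradient descent on a small least-squares problem. It assumes Python with NumPy; the problem size, step sizes, iteration counts, and function names are arbitrary illustrative choices, not part of the course materials.

    # Illustrative sketch only: minimize ||Ax - b||^2 by full gradient
    # descent and by stochastic gradient descent, then compare residuals.
    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 200, 5
    A = rng.standard_normal((m, n))
    x_true = rng.standard_normal(n)
    b = A @ x_true + 0.01 * rng.standard_normal(m)

    def full_gradient_descent(steps=500, lr=1e-3):
        x = np.zeros(n)
        for _ in range(steps):
            x -= lr * 2 * A.T @ (A @ x - b)       # gradient of ||Ax - b||^2
        return x

    def stochastic_gradient_descent(steps=5000, lr=1e-2):
        x = np.zeros(n)
        for _ in range(steps):
            i = rng.integers(m)                    # one random row per step
            x -= lr * 2 * A[i] * (A[i] @ x - b[i])
        return x

    for name, x in [("GD", full_gradient_descent()),
                    ("SGD", stochastic_gradient_descent())]:
        print(name, np.linalg.norm(A @ x - b))     # residual for each method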

Success on the project builds on the insight you gain from the written assignments below (logging in is required to access the problem sets):

Section 1.1: The Column Space of A Contains All Vectors Ax – problem set 1.1

Section 1.2: Multiplying and Factoring Matrices – problem set 1.2

Section 1.5: Orthonormal Columns in Q Give Q'Q = I – problem set 1.5

Section 1.6: Eigenvalues and Eigenvectors – problem set 1.6

Section 1.7: Positive Definite and Semidefinite Matrices – problem set 1.7

Section 1.8: Singular Value Decomposition (SVD) – problem set 1.8

Section 1.9: Eckart-Young: The Closest Rank-k Matrix to A – problem set 1.9

Section 1.11: Norms of Vectors and Matrices – problem set 1.11

Section II.2: Four Ways to Solve Least Squares Problems – problem set II.2 – 02, 08, 09

Introduction Chapter 2: Survey of Difficulties with Ax = b – problem set II.2 – 12, 17

Section I.11: Minimizing ||x|| Subject to Ax = b – problem set I.11 – 06 & problem set II.2 – 10

Section II.1: Computing Eigenvalues and Singular Values – problem set II.1

Section II.4: Randomized Matrix Multiplication – problem set II.4

Section III.1: Low Rank Changes in A and Its Inverse – problem set III.1

Section III.1 – III.2: Matrices A(t) Depending on t, Derivative = dA/dt – problem set III.2 – 01, 02, 05

Section III.1 – III.2: Derivatives of Inverse and Singular Values – problem set III.2 – 03, 12

Section III.3: Rapidly Decreasing Singular Values – problem set III.3

Appendix, Section III.2: Counting Parameters in SVD, LU, QR, Saddle Points – problem set III.2

Section III.2 & V.1: Saddle Points Continued, Maxmin Principle – problem set V.1 – 03, 08

Section V.1 & V.3: Definitions and Inequalities – problem set V.1 – 10, 12 & problem set V.3 – 03

Section VI.1 & VI.4: Minimizing a Function Step by Step – problem set VI.1

Section VI.4: Gradient Descent: Downhill to a Minimum – problem set VI.4 – 01, 06

Section VI.4: Accelerating Gradient Descent (Use Momentum) – problem set VI.4 – 05

Section VI.2 – VI.3: Linear Programming and Two-Person Games – problem set VI.2 – 01 & problem set VI.3 – 02, 05

Section VI.5: Stochastic Gradient Descent – problem set VI.5

Section VII.1: Structure of Neural Nets for Deep Learning – problem set VII.1

Section VII.2: Backpropagation to Find the Derivative of the Learning Function – problem set VII.2

Sections IV.8 & IV.2: Completing a Rank-One Matrix, Circulants – problem set IV.8 & problem set IV.2

Section IV.2: Eigenvectors of Circulant Matrices: Fourier Matrix – problem set IV.2 – 03, 05

Section IV.2: ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule – problem set IV.2 – 04

Sections VII.1 & IV.10: Neural Nets and the Learning Function – problem set VII.1 & problem set IV.10

Sections IV.9 & IV.10: Distance Matrices, Procrustes Problem – problem set IV.9

Sections IV.6 & IV.7: Finding Clusters in Graphs – problem set IV.6

Sections III.3 & VII.2: JULIA Language – for insight