CSCI152 PO-01 Neural Networks

Spring 2022

Tuesday and Thursday

9:35 to 10:50 AM Pacific Time

EDMS Room 229 (Edmunds)

Instructors

Professor Anthony Clark

Research Website

Philosophy

I’d like to give a few brief thoughts on my philosophy and the common traits I’ve seen in successful students.

Philosophy

Traits of Success

See my advising page for additional information on CS and being a student at Pomona College.

Teaching Assistants

Kevin Ayala
Millie Mince
Antonio Revilla

TAs have the following duties:

Office and Mentor Hours

Zoom links are pinned in the course Slack Channel.

You can also contact me on Slack to set up additional office hours.

I highly recommend showing up to mentor sessions with the intent that you’ll work on your homework and then ask the occasional question as it pops up.

Objectives

Catalog Description: An introduction to the theory and practical applications of neural networks. This course will cover simple perceptrons through modern convolutional and recurrent neural networks. Topics include gathering and processing data, optimization algorithms, and hyperparameter tuning. Application domains include computer vision, natural language processing, recommender systems, and content generation. Ethical implications of design decisions will also be considered throughout the course.

Prerequisites: Data Structures and Calculus

Learning Objectives: Upon completion of this course, students will be able to:

  1. Explain ethical considerations of NN applications.
  2. Explain recent advances in neural networks (i.e., deep learning).
  3. Explain how NNs compare to other machine learning techniques.
  4. List and understand the major application domains of NNs.
  5. Understand and code foundational NN techniques.
  6. Create, clean, and examine datasets.
  7. Understand the hyperparameters and dynamics (e.g., bias and variance) of training an NN model.
  8. Use an NN framework to build and deploy a model for a real-world application.

Resources

We will not have a book for this class, but our course structure is fairly close to that of the fastai book.

This document from fall 2020 is a nice starting point for help. It includes resources for:

I’d like to recommend listening to the Happiness Lab.

Please let me know if you have any other concerns.

Some helpful links to courses, tutorials, books, etc.

Courses

Books

Math

Extras

Python

Ethics

Libraries/Frameworks/Tools

Logistics

These plans are subject to change based on our experiences and your feedback.

For the most part, we will plan on meeting in person, but I have included plans for remote and asynchronous formats below since the need may very well come up.

Meeting Periods

Class periods will include a mix of lecture, live coding, and working on group projects. In general, you can expect the following format:

  1. Prepare for lecture by reading or watching provided materials.
  2. Complete a pre-lecture survey at least 24 hours prior to lecture.
  3. Attend lecture where we discuss points of confusion and work through problems.

Zoom Etiquette

You are not expected to have your video on, but it would be nice to turn it on during group discussions in breakout rooms. Also, please set your display name to your preferred name (and include your pronouns if you’d like; for example, “Prof. Clark (he/him/his)”).

During Zoom lectures, I will provide a link to a Google Sheet that we can use for anonymous discussions.

After lectures, I’d like to leave the Zoom room open for an additional 15 minutes for all of you, so I’ll likely make a willing attendee the host once I leave.

Communication with Slack

Please use Slack (not email) for all communication. If you email me questions, I will likely ask you to make a comment on Slack. This has several benefits:

Slack has text, video, and audio options that the TAs and I will use (along with Zoom) to hold office hours and mentoring sessions. I will also use Slack to solicit anonymous feedback on how the course is going. You can create group messages that include you, other classmates, the TAs, and me if you want to discuss a question asynchronously.

Useful Slack commands:

Working Together

I recommend using Slack (text, audio, or video) for communication, and using the Visual Studio Code Live Share Extension for pair programming. We will spend some time in class getting this set up, but these instructions will also be of use.

Feedback

If you have any questions for me, I would love for you to ask them on Slack. I will enable a bot that will allow you to make comments anonymously. The CS department also has an anonymous feedback form if you would like to provide feedback outside of class channels.

Grading

Grading for this course is fairly straightforward.

Grades of “D” and “F” are reserved for situations in which a student does not meet the criteria above. Mostly these will be for students who are part of a successful project but, as indicated by peer evaluations, do not contribute to it. Notice that you can get a “C” in the course by completing the assignments and skipping the project. I’ve not had anyone do this before, but the option is there for those that want to spend less time on this course and more time on other activities (courses, life, sports, etc.).

Feedback Surveys

In the Pre-Class column of the schedule you will find links to blog posts, research articles, YouTube videos, videos I create, etc. I expect you to read/watch these materials before class, and I will ask you to complete weekly feedback surveys on Gradescope. These surveys are completely optional, but I encourage you to participate if you want to get the most out of the course.

These will make lectures more concrete and provide you with a means of giving me feedback (for example, on a topic that I didn’t cover well).

Assignments

You will submit assignments to Gradescope. You may work on assignments individually or with a partner of your choice. If you’d like to be assigned a partner, please send me a message on Slack and I will randomly assign partners when I have a pool of students.

Assignments will not be graded per se; instead, you will meet with a TA and walk them through your answers and code. They will then mark your assignment as “passed” or “needs revisions”. You should not meet with a TA until you have completed your assignment. Of course, you can still visit them during mentor sessions to ask for help.

If you do not pass an assignment on the first meeting, then you will need to work on your answers, resubmit, and then schedule a new time to meet with a TA. As a rule-of-thumb, you can think of a “pass” as at least a 90% on the assignment.

Project

Everyone is expected to complete a course project related to neural networks (though you can still pass the course without the project). All project reports will be hosted as websites. Project grading will rely heavily on self-assessments and peer evaluations.

Project Milestones

Course projects have the following milestones (see the schedule for exact deadlines).

  1. Individual Proposals (due week 3)
  2. Introduction Outline (due week 4)
  3. Related Works Search (due week 5)
  4. Project Update 1 (due week 6)
  5. Self-Assessment and Peer Evaluations 1 (due week 6)
  6. Introduction and Related Works Draft (due week 7)
  7. Project Update 2 (due week 8)
  8. Methods Outline (due week 9)
  9. Self-Assessment and Peer Evaluations 2 (due week 10)
  10. Discussion Outline (due week 11)
  11. Complete Rough Draft (due week 14)
  12. Final Self-Assessment and Peer Evaluations (due Monday finals week)
  13. Complete Project and Revisions (due Wednesday finals week)

Why Develop a Project Like This?

I want you to work on your project this way because it makes it easier for:

Project Levels

A-level project milestones

All thirteen project milestones must be marked as “pass.”

B-level project milestones

All project milestones 1 through 11 (the Complete Rough Draft due week 14) must be marked as “pass.”

C-level project milestones

The following project milestones must be marked as “pass”:

Peer Evaluations

Peer evaluations are used as a way to ensure that all group members are contributing. If group members disagree in the evaluations I will ask for more information, and it is possible that I will make appropriate grade adjustments based on this feedback.

Policies

Accommodations

If you have a disability (for example, mental health, learning, chronic health, physical, hearing, vision, neurological, etc.) and expect barriers related to this course, it is important to request accommodations and establish a plan. I am happy to help you work through the process, and I encourage you to contact the Student Disability Resource Center (SDRC) as soon as possible.

I also encourage you to reach out to the SDRC if you are at all interested in having a conversation. (Upwards of 20% of students have reported a disability.)

Academic Honesty and Collaboration

I encourage you to study and work on assignments with your peers (unless otherwise specified). If you are ever unsure about what constitutes acceptable collaboration, please ask!

For more information, see the Computer Science Department and the Pomona College policies.

I take violations of academic honesty seriously. I believe it is important to report all instances, as any leniency can reinforce (and even teach) the wrong mindset (“I can get away with cheating at least once in each class.”).

Academic Advisory Notice

I will do my best to update you if I think you are not performing at your best or if you are not on pace to pass the class. I will first reach out to you and then use the system built into my.pomona.edu that notifies your advisor, so that you are encouraged to work with a mentor or advisor on a plan.

Attendance

I expect you to attend every class, but I will not penalize you for missing class. Know that there is a strong correlation between attendance and grades, and you will almost certainly be indirectly penalized.

You are responsible for any discussions, announcements, or handouts that you miss, so please reach out to me. If you need to leave class early for any reason, please let me know before class begins so that I am not concerned when you leave.

Late Submissions

Late assignments will not be accepted. However, if you plan ahead, you can ask for an extension at least four days prior to the assignment deadline.

Unless requested ahead of time, some assessments (e.g., exams) cannot be completed after the class period in which they are scheduled.

Covid Safety Awareness

During the past academic year, we built community remotely, and this year we will build on the pedagogical improvements we acquired last year. For example, we might meet on Zoom from time to time, or hold discussions online.

Our health, both mental and physical, is paramount. We must consider the health of others inside and outside the classroom. All Claremont Colleges students have signed agreements regulating on-campus behavior during the pandemic; in the classroom, we will uphold these agreements. We need to take care of each other for this course to be successful. I ask you therefore to adhere to the following principles:

The pandemic is fast-moving, and we might have to adjust these principles as the semester evolves. I am always happy to receive your feedback to make this course work.

Let’s care for each other, show empathy, and be supportive. While there will likely be some community transmission and breakthrough infections, together, we can minimize their effect on our community and on your learning.

Schedule

If the schedule below doesn’t look up-to-date, you might need to reload the page while bypassing the cache (a hard refresh).
Here are all resources from the previous time this course was taught.

The check-boxes are for your own use so that you can easily come back to this page and see where you left off (I can’t see what you do on this page). I’m using localStorage to get this working, so no guarantees it will work on your device.

Date | Pre-Class | In-Class | Assignment
Tue Jan 18
Thu Jan 20
Tue Jan 25
Wed Jan 26
Thu Jan 27
Tue Feb 01
Wed Feb 02
Thu Feb 03
Tue Feb 08
Thu Feb 10
Tue Feb 15
Wed Feb 16
Thu Feb 17
Tue Feb 22
Wed Feb 23
Thu Feb 24
Tue Mar 01
Wed Mar 02
Thu Mar 03
Tue Mar 08
Wed Mar 09
Thu Mar 10
Tue Mar 15
  • Spring break -- No class
Thu Mar 17
  • Spring break -- No class
Tue Mar 22
Wed Mar 23
Thu Mar 24
Tue Mar 29
Wed Mar 30
Thu Mar 31
Tue Apr 05
Wed Apr 06
Thu Apr 07
Tue Apr 12
Wed Apr 13
Thu Apr 14
Tue Apr 19
Wed Apr 20
Thu Apr 21
Tue Apr 26
Wed Apr 27
Thu Apr 28
Tue May 03
Thu May 05
  • Reading Day -- No class
Mon May 09
Wed May 11
  • No in-class final

Course Units

Unit 1: Introduction, Demos, and Ethics

We will start by digging into demos created using PyTorch, Hugging Face, and Streamlit. I want to quickly show you some application areas and give you an idea of the different possibilities for projects.

We will discuss applications of deep learning and ethical implications.

Unit 2: Neural Networks from First Principles

Next we will learn how to build neural networks from scratch by deriving the backpropagation algorithm by hand and using Python for implementations.
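
To make this concrete, here is a small sketch of the kind of from-scratch training we will work toward. It is illustrative only (the XOR data, layer sizes, and learning rate are arbitrary choices, not course-provided code), but it shows forward propagation, hand-derived backpropagation, and gradient descent in plain NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the XOR function (4 examples, 2 features each).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters for a 2-8-1 network.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward propagation.
    h = sigmoid(X @ W1 + b1)        # hidden activations
    y_hat = sigmoid(h @ W2 + b2)    # predictions

    # Backpropagation of the squared-error loss via the chain rule.
    d_out = (y_hat - y) * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent updates.
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)

print(np.round(y_hat, 2))  # should be close to [[0], [1], [1], [0]]
```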

Unit 3: Advanced Topics

The next major chunk of the class will be devoted to higher-level concepts and state-of-the-art techniques.

Unit 4: Project Demonstrations

We will end the semester with project presentations/demonstrations.

Terminology

Machine Learning

Artificial Intelligence (AI): computer systems that are capable of completing tasks that typically require a human. This is a moving bar–as something becomes easier for a computer, we tend to stop considering it as AI (how long until deep learning is not AI?).

Machine Learning (ML): learn a predictive model from data (e.g., deep learning and random forests). ML is related to data mining and pattern recognition.

Deep Learning (DL): learn a neural network model with two or more hidden layers.

Supervised Learning: learn a mapping from input features to output values using labeled examples (e.g., image classification).

Unsupervised Learning: extract relationships among data examples (e.g., clustering).

Reinforcement Learning (RL): learn a model that maximizes rewards provided by the environment (or minimize penalties).

Hybrid Learning: combine methods from supervised, unsupervised, and reinforcement learning (e.g., semi-supervised learning).

Classification: given a set of input features, produce a discrete output value (e.g., predict whether a written review is negative, neutral, or positive).

Regression: given a set of input features, produce a continuous output value (e.g., predict the price of a house from the square footage, location, etc.).

Clustering: a grouping of input examples such that those that are most similar are in the same group.

Model: (predictor, prediction function, hypothesis, classifier) the prediction function along with its learned parameters.

Example: (instance, sample, observation, training pair) a training/validation/testing input (along with its label in the case of supervised learning).

Input: (features, feature vector, attributes, covariates, independent variables) values used to make predictions.

Channel: subset of an input–typically refers to the red, green, or blue values of an image.

Output: (label, dependent variable, class, prediction) a prediction provided by the model.

Linear Separability: two sets of inputs can be divided by a hyperplane (a line in the case of two dimensions). This is the easiest case for learning a binary classification.

Neural Network Terms

Neural Network (NN): (multi-layer perceptron (MLP), artificial NN (ANN)) a machine learning model (very loosely) based on biological nervous systems.

Perceptron: a single layer, binary classification NN (only capable of learning linearly separable patterns).
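
As a quick illustration (my own sketch; the AND-gate data and number of passes are arbitrary), the perceptron learning rule only changes the weights when a prediction is wrong:

```python
import numpy as np

# Linearly separable toy data: the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b = np.zeros(2), 0.0

for _ in range(10):                          # a few passes over the data
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += (yi - pred) * xi                # update only when pred != yi
        b += (yi - pred)

print([1 if xi @ w + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```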

Neuron: (node) a single unit of computation in a NN. A neuron typically refers to a linear (affine) computation followed by a nonlinear activation.

Activation: (activation function, squashing function, nonlinearity) a neuron function that provides a nonlinear transformation (see this Stack Exchange Post for some examples and equations).

Parameter: (weights and biases, beta, etc.) any model values that are learned during training.

Layer: many NNs are simply a sequence of layers, where each layer contains some number of neurons.

Input Layer: the input features of a NN (the first “layer”). These can be the raw values or scaled values–we typically normalize inputs or scale them to either [0, 1] or [-1, 1].

Hidden Layer: a NN layer for which we do not directly observe the values during inference (all layers that are not an input or output layer).

Output Layer: the final layer of a NN. The output of this layer is (are) the prediction(s).

Architecture: a specific instance of a NN, where the types of neurons and connectivity of those neurons are specified (e.g., VGG16, ResNet34, etc.). The architecture sometimes includes optimization techniques such as batch normalization.

Forward Propagation: the process of computing the output from the input.

Training: the process of learning model parameters.

Inference: (deployment, application) the process of using a trained model.

Dataset: (training, validation/development, testing) a set of data used for training a model. Typically a dataset is split into a set used for training (the training set), a set for computing metrics (the validation/development set), and a set for evaluation (the testing set).
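
One simple way to make such a split by hand (a sketch, not a required workflow; the 80/10/10 ratios are just a common choice) is to shuffle indices and slice them:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000                      # pretend we have 1000 examples
indices = rng.permutation(n)  # shuffle so the splits are random

train_idx = indices[: int(0.8 * n)]             # 80% for training
val_idx = indices[int(0.8 * n): int(0.9 * n)]   # 10% for validation/development
test_idx = indices[int(0.9 * n):]               # 10% for final evaluation
```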

Convolutional Neural Network (CNN): a NN using convolutional filters. These are best suited for problems where the input features have geometric properties–mainly images (see 3D Visualization of a Convolutional Neural Network).

Filter: a convolution filter is a matrix that can be used to detect features in an image; they will normally produce a two-dimensional output (see Image Kernels Explained Visually, Convolution Visualizer, and Receptive Field Calculator). Filters will typically have a kernel size, padding size, dilation amount, and stride.

Pooling: (average-pooling, max-pooling, pooling layer) a pooling layer is typically used to reduce the size of a filter output.
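
Here is a tiny NumPy sketch of these two operations (the 6x6 "image" and the vertical-edge kernel are made up): a 3x3 filter applied with stride 1 and no padding, followed by 2x2 max-pooling.

```python
import numpy as np

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 single-channel image
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)       # vertical edge detector

# Apply the filter (cross-correlation, as in most NN libraries): stride 1, no padding.
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

# 2x2 max-pooling with stride 2 halves each spatial dimension.
pooled = out.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(out.shape, pooled.shape)  # (4, 4) (2, 2)
```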

Autoencoder: a common type of NN used to learn new or compressed representations.

Recurrent Neural Network (RNN): a NN where neurons can maintain an internal state or have backward connections, and thus exhibit temporal dynamics. One type of RNN is a recursive neural network.

Long Short-Term Memory (LSTM): a common type of RNN developed in part to deal with the vanishing gradient problem (see Understanding LSTM Networks and Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) (YouTube)).
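
In PyTorch, an LSTM usage sketch looks like the following (the batch size, sequence length, and feature sizes are arbitrary):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)

x = torch.randn(8, 20, 10)    # batch of 8 sequences, 20 time steps, 10 features per step
outputs, (h_n, c_n) = lstm(x)

print(outputs.shape)          # torch.Size([8, 20, 32]) -- hidden state at every time step
print(h_n.shape, c_n.shape)   # torch.Size([1, 8, 32]) each -- final hidden and cell states
```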

Learning Terms

Loss: (loss function) a function that we minimize during learning. We take the gradient of loss with respect to each parameter and then move down the slope. Loss is frequently defined as the error for a single example in supervised learning.

Cost: (cost function) similar to loss, this is a function that we try to minimize. Cost is frequently defined as the sum of loss for all examples.
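
For example (made-up regression numbers), with a squared-error loss the cost is just the average of the per-example losses:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])     # labels
y_pred = np.array([2.5,  0.0, 2.0, 8.0])     # model predictions

loss_per_example = (y_pred - y_true) ** 2    # loss: one value per example
cost = loss_per_example.mean()               # cost: averaged over the dataset
print(loss_per_example, cost)                # [0.25 0.25 0.   1.  ] 0.375
```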

Generalization: how well a model extrapolates to unseen data.

Overfitting: how much the model has memorized characteristics of the training input (instead of generalizing).

Regularization: a set of methods meant to prevent overfitting. Regularization reduces overfitting by shrinking parameter values (larger parameters typically means more overfitting).

Bias: when a model has larger-than-expected training and validation loss.

Variance: when a model has a much larger validation error compared to the training error (also an indication of overfitting).

Uncertainty: some models can estimate a confidence in a given prediction.

Embedding: a vector representation of a discrete variable (e.g., a method for representing an English language word as an input feature).
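
In PyTorch, for instance, an embedding is a learned lookup table from integer IDs to vectors (the vocabulary size and dimension below are arbitrary):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=50)  # 10k-word vocabulary

word_ids = torch.tensor([[2, 7, 512], [3, 3, 9]])  # two "sentences" of three word IDs each
vectors = embedding(word_ids)                      # look up one 50-d vector per ID
print(vectors.shape)                               # torch.Size([2, 3, 50])
```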

Activation Terms

Affine: (affine layer, affine transformation) the combination of a linear transformation and a translation (NN libraries often still refer to this as a linear layer).

Nonlinear: a function for which the change in the output is not proportional to the change in the input.

Sigmoid: (sigmoid curve, logistic curve/function) a common activation function that is mostly used in the output layer of a binary classifier. Gradient is small whenever the input value is too far from 0.

Hyperbolic Tangent: (tanh) another (formerly) common activation function (better than sigmoid, but typically worse than ReLu). Gradient is small whenever the input value is too far from zero.

ReLu: (rectified linear unit, rectifier) the most widely used activation function.

Leaky ReLu: a slightly modified version of ReLu where there is a non-zero derivative when the input is less than zero.

Softmax: (softmax function, softargmax, normalized exponential) a standard activation function for the last layer of a multi-class NN classifier. It turns the outputs of several nodes into a probability distribution (see The Softmax function and its derivative).
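
To make the definitions above concrete, here is a NumPy sketch of these activations (the leaky-ReLu slope of 0.01 is just a common default):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                      # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # zero for negative inputs, identity otherwise

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)   # small non-zero slope for negative inputs

def softmax(z):
    e = np.exp(z - z.max())                # subtract the max for numerical stability
    return e / e.sum()                     # outputs are positive and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```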

Learning Techniques

Data Augmentation: the process of altering inputs each epoch, thereby increasing the effective training set size.
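
With torchvision, for example, augmentation is typically a pipeline of random transforms applied to each training image every epoch (a sketch; the particular transforms and sizes are illustrative):

```python
from torchvision import transforms

# Each epoch every training image is randomly flipped, cropped, and recolored,
# so the network effectively never sees exactly the same input twice.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```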

Transfer Learning: use a trained model (or part of it) on an input from a different distribution. Most frequently this also involves fine-tuning.

Fine-tuning: training/learning only a subset of all parameters (usually only those nearest the output layer).
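
A common PyTorch recipe for transfer learning plus fine-tuning (sketched with torchvision's ResNet-18; the 10-class head is an arbitrary example) freezes the pretrained parameters and trains only a new output layer:

```python
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on ImageNet (older torchvision versions use pretrained=True).
model = models.resnet18(weights="IMAGENET1K_V1")

for param in model.parameters():
    param.requires_grad = False                    # freeze all pretrained parameters

model.fc = nn.Linear(model.fc.in_features, 10)     # new 10-class head, trained from scratch
```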

Dropout: a regularization technique in which neurons are randomly zeroed out during training.

Batch Normalization: a technique that speeds up training by normalizing the values of hidden layers across input batches. Normalizing hidden neuron values will keep derivatives higher on average.
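
In PyTorch, dropout and batch normalization are both just layers you drop into a model (the layer sizes and dropout probability below are placeholders):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalize hidden values across each batch
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zero half of the activations during training
    nn.Linear(256, 10),
)

model.train()  # dropout and batch norm behave differently in train() vs. eval() mode
```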

Attention: (attention mechanism, neural attention) a technique that enables a NN to focus on a subset of the input data (see Attention in Neural Networks and How to Use It).
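
The core computation behind most modern attention mechanisms is scaled dot-product attention; here is a NumPy sketch (the numbers of queries, keys, and dimensions are arbitrary):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                               # weighted sum of the values

Q = np.random.randn(5, 16)   # 5 queries of dimension 16
K = np.random.randn(7, 16)   # 7 keys of dimension 16
V = np.random.randn(7, 32)   # 7 values of dimension 32
print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 32)
```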

Optimization

Gradient Descent (GD): (stochastic GD (SGD), mini-batch GD) a first-order optimization algorithm that can be used to learn parameters for a model.

Backpropagation: application of the calculus chain-rule for NNs.

Learning Rate: a hyperparameter that adjusts the training speed (too high will lead to divergence).
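
Here is a small worked example tying these three terms together (made-up data; PyTorch's autograd performs the backpropagation, and the learning rate is an arbitrary choice):

```python
import torch

# Fit y = 2x with a single weight, starting from w = 0.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

w = torch.zeros(1, requires_grad=True)
lr = 0.05                              # learning rate (a hyperparameter)

for step in range(100):
    loss = ((w * x - y) ** 2).mean()   # cost over the whole (tiny) dataset
    loss.backward()                    # backpropagation fills in w.grad
    with torch.no_grad():
        w -= lr * w.grad               # one gradient descent step
        w.grad.zero_()

print(w.item())                        # approaches 2.0
```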

Vanishing Gradients: an issue for deeper NNs where gradients saturate (become close to zero) and training is effectively halted.

Exploding Gradients: an issue for deeper NNs where gradients accumulate and result in large updates causing gradient descent to diverge.

Batch: a subset of the input dataset used to update the NN parameters (as opposed to using the entire input dataset at once).

Epoch: each time a NN is updated using all inputs (whether all at once or using all batches).
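
In a typical training loop the two terms nest as shown below (a runnable sketch on random data; the model, batch size, and number of epochs are placeholders):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy regression dataset: 100 examples with 4 features each.
X, y = torch.randn(100, 4), torch.randn(100, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(5):          # one epoch = one full pass over all batches
    for xb, yb in loader:       # each batch triggers one parameter update
        loss = loss_fn(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```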

AdaGrad: a variant of SGD with an adaptive learning rate (see Papers With Code: AdaGrad).

AdaDelta: a variant of SGD/AdaGrad (see Papers With Code: AdaDelta).

Adam: a variant of SGD with momentum and scaling (see Papers With Code: Adam).

RMSProp: a variant of SGD with an adaptive learning rate (see Papers With Code: RMSProp).

Momentum: an SGD add-on that speeds up training when derivatives stay the same sign each update.

Automatic Differentiation (AD): a technique to numerically/automatically evaluate the derivative of a function.

Cross-Entropy Loss: (negative log likelihood (NLL), logistic loss) a loss function commonly used for classification.
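
A sketch comparing the definition with the PyTorch built-in (the logits and targets are made up); `F.cross_entropy` combines the softmax with the negative log likelihood of the correct class:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])   # raw outputs for 2 examples, 3 classes
targets = torch.tensor([0, 1])              # correct class index for each example

# By hand: softmax, then the negative log likelihood of the correct class.
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(2), targets]).mean()

built_in = F.cross_entropy(logits, targets)
print(manual.item(), built_in.item())       # the two values match
```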

Backpropagation Through Time: (BPTT) a gradient-based optimization technique for recurrent neural networks.