CSCI152 PO-01 Neural Networks

Spring 2022

Tuesday and Thursday

9:35 to 10:50 AM Pacific Time

EDMS Room 229 (Edmunds)

Instructors

Professor Anthony Clark

Research Website

Philosophy

I’d like to give a few brief thoughts on my philosophy and the common traits I’ve seen in successful students.

Philosophy

Traits of Success

See my advising page for additional information on CS and being a student at Pomona College.

Teaching Assistants

Kevin Ayala
Millie Mince
Antonio Revilla

TAs have the following duties:

Office and Mentor Hours

Zoom links are pinned in the course Slack Channel.

You can also contact me on Slack to set up additional office hours.

I highly recommend showing up to mentor sessions with the intent that you’ll work on your homework and then ask the occasional question as it pops up.

Objectives

Catalog Description: An introduction to the theory and practical applications of neural networks. This course will cover simple perceptrons through modern convolutional and recurrent neural networks. Topics include gathering and processing data, optimization algorithms, and hyperparameter tuning. Application domains include computer vision, natural language processing, recommender systems, and content generation. Ethical implications of design decisions will also be considered throughout the course.

Prerequisites: Data Structures and Calculus

Learning Objectives: Upon completion of this course, students will be able to:

  1. Explain ethical considerations of NN applications.
  2. Explain recent advances in neural networks (i.e., deep learning).
  3. Explain how NNs compare to other machine learning techniques.
  4. List and understand the major application domains of NNs.
  5. Understand and code foundational NN techniques.
  6. Create, clean, and examine datasets.
  7. Understand the hyperparameters and dynamics (e.g., bias and variance) of training an NN model.
  8. Use an NN framework to build and deploy a model for a real-world application.

Resources

We will not have a book for this class, but our course structure is fairly close to that of the fastai book.

This document from fall 2020 is a nice starting point for help. It includes resources for:

I’d like to recommend listening to the Happiness Lab.

Please let me know if you have any other concerns.

Some helpful links to courses, tutorials, books, etc.

Courses

Books

Math

Extras

Python

Ethics

Libraries/Frameworks/Tools

Logistics

These plans are subject to change based on our experiences and your feedback.

For the most part, we will plan on meeting in person, but I have included plans for remote and asynchronous formats below since the need may very well come up.

Meeting Periods

Class periods will include a mix of lecture, live coding, and working on group projects. In general, you can expect the following format:

  1. Prepare for lecture by reading or watching provided materials.
  2. Complete a pre-lecture survey at least 24 hours prior to lecture.
  3. Attend lecture where we discuss points of confusion and work through problems.

Zoom Etiquette

You are not expected to have your video on, but it would be nice to turn it on during group discussions in breakout rooms. Also, please set your display name to your preferred name (and include your pronouns if you’d like; for example, “Prof. Clark (he/him/his)”).

During Zoom lectures, I will provide a link to a Google Sheet that we can use for anonymous discussions.

After lectures, I’d like to leave the Zoom room open for an additional 15 minutes for all of you, so I’ll likely make a willing attendee the host once I leave.

Communication with Slack

Please use Slack (not email) for all communication. If you email me questions, I will likely ask you to make a comment on Slack. This has several benefits:

Slack has text, video, and audio options that the TAs and I will use (along with Zoom) to hold office hours and mentoring sessions. I will also use Slack to solicit anonymous feedback on how the course is going. You can create group messages that include you, other classmates, the TAs, and me if you want to discuss a question asynchronously.

Useful Slack commands:

Working Together

I recommend using Slack (text, audio, or video) for communication, and using the Visual Studio Code Live Share Extension for pair programming. We will spend some time in class getting this set up, but these instructions will also be of use.

Feedback

If you have any questions for me, I would love for you to ask them on Slack. I will enable a bot that will allow you to make comments anonymously. The CS department also has an anonymous feedback form if you would like to provide feedback outside of class channels.

Grading

Grading for this course is fairly straightforward.

Grades of “D” and “F” are reserved for situations in which a student does not meet the criteria above. Mostly these will be for students who are part of a successful project but, as indicated by peer evaluations, do not contribute to it. Notice that you can get a “C” in the course by completing the assignments and skipping the project. I’ve not had anyone do this before, but the option is there for those that want to spend less time on this course and more time on other activities (courses, life, sports, etc.).

Feedback Surveys

In the Pre-Class column of the schedule you will find links to blog posts, research articles, YouTube videos, videos I create, etc. I expect you to read/watch these materials before class, and I will ask you to complete weekly feedback surveys on Gradescope. These surveys are completely optional, but I encourage you to participate if you want to get the most out of the course.

These will make lectures more concrete and provide you with a means of giving me feedback (for example, on a topic that I didn’t cover well).

Assignments

You will submit assignments to Gradescope. You may work on assignments individually or with a partner of your choice. If you’d like to be assigned a partner, please send me a message on Slack and I will randomly assign partners when I have a pool of students.

Assignments will not be graded per se; instead, you will meet with a TA and walk them through your answers and code. They will then mark your assignment as “passed” or “needs revisions”. You should not meet with a TA until you have completed your assignment. Of course, you can still visit them during mentor sessions to ask for help.

If you do not pass an assignment on the first meeting, then you will need to work on your answers, resubmit, and then schedule a new time to meet with a TA. As a rule-of-thumb, you can think of a “pass” as at least a 90% on the assignment.

Project

Everyone is expected to complete a course project related to neural networks (though you can still pass the course without the project). All project reports will be hosted as websites. Project grading will rely heavily on self-assessments and peer evaluations.

Project Milestones

Course projects have the following milestones (see the schedule for exact deadlines).

  1. Individual Proposals (due week 3)
  2. Introduction Outline (due week 4)
  3. Related Works Search (due week 5)
  4. Project Update 1 (due week 6)
  5. Self-Assessment and Peer Evaluations 1 (due week 6)
  6. Introduction and Related Works Draft (due week 7)
  7. Project Update 2 (due week 8)
  8. Methods Outline (due week 9)
  9. Self-Assessment and Peer Evaluations 2 (due week 10)
  10. Discussion Outline (due week 11)
  11. Complete Rough Draft (due week 14)
  12. Final Self-Assessment and Peer Evaluations (due Monday finals week)
  13. Complete Project and Revisions (due Wednesday finals week)

Why Develop a Project Like This?

I want you to work on your project this way because it makes it easier for:

Project Levels

A-level project milestones

All thirteen project milestones must be marked as “pass.”

B-level project milestones

All project milestones 1 through 11 (the Complete Rough Draft due week 14) must be marked as “pass.”

C-level project milestones

The following project milestones must be marked as “pass”:

Peer Evaluations

Peer evaluations are used as a way to ensure that all group members are contributing. If group members disagree in the evaluations I will ask for more information, and it is possible that I will make appropriate grade adjustments based on this feedback.

Policies

Accommodations

If you have a disability (for example, mental health, learning, chronic health, physical, hearing, vision, neurological, etc.) and expect barriers related to this course, it is important to request accommodations and establish a plan. I am happy to help you work through the process, and I encourage you to contact the Student Disability Resource Center (SDRC) as soon as possible.

I also encourage you to reach out to the SDRC if you are at all interested in having a conversation. (Upwards of 20% of students have reported a disability.)

Academic Honesty and Collaboration

I encourage you to study and work on assignments with your peers (unless otherwise specified). If you are ever unsure about what constitutes acceptable collaboration, please ask!

For more information, see the Computer Science Department and the Pomona College policies.

I take violations of academic honesty seriously. I believe it is important to report all instances, as any leniency can reinforce (and even teach) the wrong mindset (“I can get away with cheating at least once in each class.”).

Academic Advisory Notice

I will do my best to update you if I think you are not performing at your best or if you are not on pace to pass the class. I will first reach out to you and then use the system built into my.pomona.edu that notifies your advisor, so that you are encouraged to work with a mentor or advisor on a plan.

Attendance

I expect you to attend every class, but I will not penalize you for missing class. Know that there is a strong correlation between attendance and grades, and you will almost certainly be indirectly penalized.

You are responsible for any discussions, announcements, or handouts that you miss, so please reach out to me. If you need to leave class early for any reason, please let me know before class begins so that I am not concerned when you leave.

Late Submissions

Late assignments will not be accepted. However, if you plan ahead, you can ask for an extension at least four days prior to the assignment deadline.

Unless requested ahead of time, some assessments (e.g., exams) cannot be completed after the class period in which they are scheduled.

Covid Safety Awareness

During the past academic year, we built community remotely, and this year we will build on the pedagogical improvements we acquired last year. For example, we might meet on Zoom from time to time, or hold discussions online.

Our health, both mental and physical, is paramount. We must consider the health of others inside and outside the classroom. All Claremont Colleges students have signed agreements regulating on-campus behavior during the pandemic; in the classroom, we will uphold these agreements. We need to take care of each other for this course to be successful. I ask you therefore to adhere to the following principles:

The pandemic is fast-moving, and we might have to adjust these principles as the semester evolves. I am always happy to receive your feedback to make this course work.

Let’s care for each other, show empathy, and be supportive. While there will likely be some community transmission and breakthrough infections, together, we can minimize their effect on our community and on your learning.

Schedule

If the schedule below doesn’t look up-to-date, you might need to reload the page while bypassing the cache (a hard refresh).
Here are all resources from the previous time this course was taught.

The check-boxes are for your own use so that you can easily come back to this page and see where you left off (I can’t see what you do on this page). I’m using localStorage to get this working, so no guarantees it will work on your device.

Date | Pre-Class | In-Class | Assignment
Tue Jan 18
Thu Jan 20
Tue Jan 25
Wed Jan 26
Thu Jan 27
Tue Feb 01
Wed Feb 02
Thu Feb 03
Tue Feb 08
Thu Feb 10
Tue Feb 15
Wed Feb 16
Thu Feb 17
Tue Feb 22
Wed Feb 23
Thu Feb 24
Tue Mar 01
Wed Mar 02
Thu Mar 03
Tue Mar 08
Wed Mar 09
Thu Mar 10
Tue Mar 15
  • Spring break -- No class
Thu Mar 17
  • Spring break -- No class
Tue Mar 22
Wed Mar 23
Thu Mar 24
Tue Mar 29
Wed Mar 30
Thu Mar 31
Tue Apr 05
Wed Apr 06
Thu Apr 07
Tue Apr 12
Wed Apr 13
Thu Apr 14
Tue Apr 19
Wed Apr 20
Thu Apr 21
Tue Apr 26
Wed Apr 27
Thu Apr 28
Tue May 03
Thu May 05
  • Reading Day -- No class
Mon May 09
Wed May 11
  • No in-class final

Course Units

Unit 1: Introduction, Demos, and Ethics

We will start by digging into demos created using PyTorch, Hugging Face, and Streamlit. I want to quickly show you some application areas and give you an idea of the different possibilities for projects.

We will discuss applications of deep learning and ethical implications.

Unit 2: Neural Networks from First Principles

Next we will learn how to build neural networks from scratch by deriving the backpropagation algorithm by hand and using Python for implementations.
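
To make this concrete, here is a small sketch of the kind of from-scratch training we will work toward. It is illustrative only (the XOR data, layer sizes, and learning rate are arbitrary choices, not course-provided code), but it shows forward propagation, hand-derived backpropagation, and gradient descent in plain NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the XOR function (4 examples, 2 features each).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters for a 2-8-1 network.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward propagation.
    h = sigmoid(X @ W1 + b1)        # hidden activations
    y_hat = sigmoid(h @ W2 + b2)    # predictions

    # Backpropagation of the squared-error loss via the chain rule.
    d_out = (y_hat - y) * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent updates.
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)

print(np.round(y_hat, 2))  # should be close to [[0], [1], [1], [0]]
```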

Unit 3: Advanced Topics

The next major chunk of the class will be devoted to higher-level concepts and state-of-the-art techniques.

Unit 4: Project Demonstrations

We will end the semester with project presentations/demonstrations.

Terminology

Machine Learning

Artificial Intelligence (AI): computer systems that are capable of completing tasks that typically require a human. This is a moving bar–as something becomes easier for a computer, we tend to stop considering it as AI (how long until deep learning is not AI?).

Machine Learning (ML): learn a predictive model from data (e.g., deep learning and random forests). ML is related to data mining and pattern recognition.

Deep Learning (DL): learn a neural network model with two or more hidden layers.

Supervised Learning: learn a mapping from input features to output values using labeled examples (e.g., image classification).

Unsupervised Learning: extract relationships among data examples (e.g., clustering).

Reinforcement Learning (RL): learn a model that maximizes rewards provided by the environment (or minimize penalties).

Hybrid Learning: combine methods from supervised, unsupervised, and reinforcement learning (e.g., semi-supervised learning).

Classification: given a set of input features, produce a discrete output value (e.g., predict whether a written review is negative, neutral, or positive).

Regression: given a set of input features, produce a continuous output value (e.g., predict the price of a house from the square footage, location, etc.).

Clustering: a grouping of input examples such that those that are most similar are in the same group.

Model: (predictor, prediction function, hypothesis, classifier) the prediction function along with its learned parameters.

Example: (instance, sample, observation, training pair) a training/validation/testing input (along with its label in the case of supervised learning).

Input: (features, feature vector, attributes, covariates, independent variables) values used to make predictions.

Channel: subset of an input–typically refers to the red, green, or blue values of an image.

Output: (label, dependent variable, class, prediction) a prediction provided by the model.

Linear Separability: two sets of inputs can be divided by a hyperplane (a line in the case of two dimensions). This is the easiest case for learning a binary classification.

Neural Network Terms

Neural Network (NN): (multi-layer perceptron (MLP), artificial NN (ANN)) a machine learning model (very loosely) based on biological nervous systems.

Perceptron: a single layer, binary classification NN (only capable of learning linearly separable patterns).
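
As a quick illustration (my own sketch; the AND-gate data and number of passes are arbitrary), the perceptron learning rule only changes the weights when a prediction is wrong:

```python
import numpy as np

# Linearly separable toy data: the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b = np.zeros(2), 0.0

for _ in range(10):                          # a few passes over the data
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += (yi - pred) * xi                # update only when pred != yi
        b += (yi - pred)

print([1 if xi @ w + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```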

Neuron: (node) a single unit of computation in a NN. A neuron typically refers to a linear (affine) computation followed by a nonlinear activation.

Activation: (activation function, squashing function, nonlinearity) a neuron function that provides a nonlinear transformation (see this Stack Exchange Post for some examples and equations).

Parameter: (weights and biases, beta, etc.) any model values that are learned during training.

Layer: many NNs are simply a sequence of layers, where each layer contains some number of neurons.

Input Layer: the input features of a NN (the first “layer”). These can be the raw values or scaled values–we typically normalize inputs or scale them to either [0, 1] or [-1, 1].

Hidden Layer: a NN layer for which we do not directly observe the values during inference (all layers that are not an input or output layer).

Output Layer: the final layer of a NN. The output of this layer is (are) the prediction(s).

Architecture: a specific instance of a NN, where the types of neurons and connectivity of those neurons are specified (e.g., VGG16, ResNet34, etc.). The architecture sometimes includes optimization techniques such as batch normalization.

Forward Propagation: the process of computing the output from the input.

Training: the process of learning model parameters.

Inference: (deployment, application) the process of using a trained model.

Dataset: (training, validation/development, testing) a set of data used for training a model. Typically a dataset is split into a set used for training (the training set), a set for computing metrics (the validation/development set), and a set for evaluation (the testing set).
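
One simple way to make such a split by hand (a sketch, not a required workflow; the 80/10/10 ratios are just a common choice) is to shuffle indices and slice them:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1000                      # pretend we have 1000 examples
indices = rng.permutation(n)  # shuffle so the splits are random

train_idx = indices[: int(0.8 * n)]             # 80% for training
val_idx = indices[int(0.8 * n): int(0.9 * n)]   # 10% for validation/development
test_idx = indices[int(0.9 * n):]               # 10% for final evaluation
```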

Convolutional Neural Network (CNN): a NN using convolutional filters. These are best suited for problems where the input features have geometric properties–mainly images (see 3D Visualization of a Convolutional Neural Network).

Filter: a convolution filter is a matrix that can be used to detect features in an image; they will normally produce a two-dimensional output (see Image Kernels Explained Visually, Convolution Visualizer, and Receptive Field Calculator). Filters will typically have a kernel size, padding size, dilation amount, and stride.

Pooling: (average-pooling, max-pooling, pooling layer) a pooling layer is typically used to reduce the size of a filter output.
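
Here is a tiny NumPy sketch of these two operations (the 6x6 "image" and the vertical-edge kernel are made up): a 3x3 filter applied with stride 1 and no padding, followed by 2x2 max-pooling.

```python
import numpy as np

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 single-channel image
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)       # vertical edge detector

# Apply the filter (cross-correlation, as in most NN libraries): stride 1, no padding.
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

# 2x2 max-pooling with stride 2 halves each spatial dimension.
pooled = out.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(out.shape, pooled.shape)  # (4, 4) (2, 2)
```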

Autoencoder: a common type of NN used to learn new or compressed representations.

Recurrent Neural Network (RNN): a NN where neurons can maintain an internal state or have backward connections, and thus exhibit temporal dynamics. One type of RNN is a recursive neural network.

Long Short-Term Memory (LSTM): a common type of RNN developed in part to deal with the vanishing gradient problem (see Understanding LSTM Networks and Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) (YouTube)).
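
In PyTorch, an LSTM usage sketch looks like the following (the batch size, sequence length, and feature sizes are arbitrary):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)

x = torch.randn(8, 20, 10)    # batch of 8 sequences, 20 time steps, 10 features per step
outputs, (h_n, c_n) = lstm(x)

print(outputs.shape)          # torch.Size([8, 20, 32]) -- hidden state at every time step
print(h_n.shape, c_n.shape)   # torch.Size([1, 8, 32]) each -- final hidden and cell states
```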

Learning Terms

Loss: (loss function) a function that we minimize during learning. We take the gradient of loss with respect to each parameter and then move down the slope. Loss is frequently defined as the error for a single example in supervised learning.

Cost: (cost function) similar to loss, this is a function that we try to minimize. Cost is frequently defined as the sum of loss for all examples.
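
For example (made-up regression numbers), with a squared-error loss the cost is just the average of the per-example losses:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])     # labels
y_pred = np.array([2.5,  0.0, 2.0, 8.0])     # model predictions

loss_per_example = (y_pred - y_true) ** 2    # loss: one value per example
cost = loss_per_example.mean()               # cost: averaged over the dataset
print(loss_per_example, cost)                # [0.25 0.25 0.   1.  ] 0.375
```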

Generalization: how well a model extrapolates to unseen data.

Overfitting: how much the model has memorized characteristics of the training input (instead of generalizing).

Regularization: a set of methods meant to prevent overfitting. Regularization reduces overfitting by shrinking parameter values (larger parameters typically means more overfitting).

Bias: when a model has larger-than-expected training and validation loss.

Variance: when a model has a much larger validation error compared to the training error (also an indication of overfitting).

Uncertainty: some models can estimate a confidence in a given prediction.

Embedding: a vector representation of a discrete variable (e.g., a method for representing an English language word as an input feature).
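
In PyTorch, for instance, an embedding is a learned lookup table from integer IDs to vectors (the vocabulary size and dimension below are arbitrary):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=50)  # 10k-word vocabulary

word_ids = torch.tensor([[2, 7, 512], [3, 3, 9]])  # two "sentences" of three word IDs each
vectors = embedding(word_ids)                      # look up one 50-d vector per ID
print(vectors.shape)                               # torch.Size([2, 3, 50])
```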

Activation Terms

Affine: (affine layer, affine transformation) the combination of a linear transformation and a translation (NN libraries often still refer to this as a linear layer).

Nonlinear: a function for which the change in the output is not proportional to the change in the input.

Sigmoid: (sigmoid curve, logistic curve/function) a common activation function that is mostly used in the output layer of a binary classifier. Gradient is small whenever the input value is too far from 0.

Hyperbolic Tangent: (tanh) another (formerly) common activation function (better than sigmoid, but typically worse than ReLu). Gradient is small whenever the input value is too far from zero.

ReLu: (rectified linear unit, rectifier) the most widely used activation function.

Leaky ReLu: a slightly modified version of ReLu where there is a non-zero derivative when the input is less than zero.

Softmax: (softmax function, softargmax, normalized exponential) a standard activation function for the last layer of a multi-class NN classifier. It turns the outputs of several nodes into a probability distribution (see The Softmax function and its derivative).
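
To make the definitions above concrete, here is a NumPy sketch of these activations (the leaky-ReLu slope of 0.01 is just a common default):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                      # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # zero for negative inputs, identity otherwise

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)   # small non-zero slope for negative inputs

def softmax(z):
    e = np.exp(z - z.max())                # subtract the max for numerical stability
    return e / e.sum()                     # outputs are positive and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```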

Learning Techniques

Data Augmentation: the process of altering inputs each epoch, thereby increasing the effective training set size.
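
With torchvision, for example, augmentation is typically a pipeline of random transforms applied to each training image every epoch (a sketch; the particular transforms and sizes are illustrative):

```python
from torchvision import transforms

# Each epoch every training image is randomly flipped, cropped, and recolored,
# so the network effectively never sees exactly the same input twice.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```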

Transfer Learning: use a trained model (or part of it) on an input from a different distribution. Most frequently this also involves fine-tuning.

Fine-tuning: training/learning only a subset of all parameters (usually only those nearest the output layer).
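
A common PyTorch recipe for transfer learning plus fine-tuning (sketched with torchvision's ResNet-18; the 10-class head is an arbitrary example) freezes the pretrained parameters and trains only a new output layer:

```python
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on ImageNet (older torchvision versions use pretrained=True).
model = models.resnet18(weights="IMAGENET1K_V1")

for param in model.parameters():
    param.requires_grad = False                    # freeze all pretrained parameters

model.fc = nn.Linear(model.fc.in_features, 10)     # new 10-class head, trained from scratch
```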

Dropout: a regularization technique in which neurons are randomly zeroed out during training.

Batch Normalization: a technique that speeds up training by normalizing the values of hidden layers across input batches. Normalizing hidden neuron values will keep derivatives higher on average.
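
In PyTorch, dropout and batch normalization are both just layers you drop into a model (the layer sizes and dropout probability below are placeholders):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalize hidden values across each batch
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zero half of the activations during training
    nn.Linear(256, 10),
)

model.train()  # dropout and batch norm behave differently in train() vs. eval() mode
```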

Attention: (attention mechanism, neural attention) a technique that enables a NN to focus on a subset of the input data (see Attention in Neural Networks and How to Use It).
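
The core computation behind most modern attention mechanisms is scaled dot-product attention; here is a NumPy sketch (the numbers of queries, keys, and dimensions are arbitrary):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                               # weighted sum of the values

Q = np.random.randn(5, 16)   # 5 queries of dimension 16
K = np.random.randn(7, 16)   # 7 keys of dimension 16
V = np.random.randn(7, 32)   # 7 values of dimension 32
print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 32)
```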

Optimization

Gradient Descent (GD): (stochastic GD (SGD), mini-batch GD) a first-order optimization algorithm that can be used to learn parameters for a model.

Backpropagation: application of the calculus chain-rule for NNs.

Learning Rate: a hyperparameter that adjusts the training speed (too high will lead to divergence).
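
Here is a small worked example tying these three terms together (made-up data; PyTorch's autograd performs the backpropagation, and the learning rate is an arbitrary choice):

```python
import torch

# Fit y = 2x with a single weight, starting from w = 0.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

w = torch.zeros(1, requires_grad=True)
lr = 0.05                              # learning rate (a hyperparameter)

for step in range(100):
    loss = ((w * x - y) ** 2).mean()   # cost over the whole (tiny) dataset
    loss.backward()                    # backpropagation fills in w.grad
    with torch.no_grad():
        w -= lr * w.grad               # one gradient descent step
        w.grad.zero_()

print(w.item())                        # approaches 2.0
```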

Vanishing Gradients: an issue for deeper NNs where gradients saturate (become close to zero) and training is effectively halted.

Exploding Gradients: an issue for deeper NNs where gradients accumulate and result in large updates causing gradient descent to diverge.

Batch: a subset of the input dataset used to update the NN parameters (as opposed to using the entire input dataset at once).

Epoch: each time a NN is updated using all inputs (whether all at once or using all batches).
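
In a typical training loop the two terms nest as shown below (a runnable sketch on random data; the model, batch size, and number of epochs are placeholders):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy regression dataset: 100 examples with 4 features each.
X, y = torch.randn(100, 4), torch.randn(100, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(5):          # one epoch = one full pass over all batches
    for xb, yb in loader:       # each batch triggers one parameter update
        loss = loss_fn(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```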

AdaGrad: a variant of SGD with an adaptive learning rate (see Papers With Code: AdaGrad).

AdaDelta: a variant of SGD/AdaGrad (see Papers With Code: AdaDelta).

Adam: a variant of SGD with momentum and scaling (see Papers With Code: Adam).

RMSProp: a variant of SGD with an adaptive learning rate (see Papers With Code: RMSProp).

Momentum: an SGD add-on that speeds up training when derivatives stay the same sign each update.

Automatic Differentiation (AD): a technique to numerically/automatically evaluate the derivative of a function.

Cross-Entropy Loss: (negative log likelihood (NLL), logistic loss) a loss function commonly used for classification.
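
A sketch comparing the definition with the PyTorch built-in (the logits and targets are made up); `F.cross_entropy` combines the softmax with the negative log likelihood of the correct class:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5,  0.3]])   # raw outputs for 2 examples, 3 classes
targets = torch.tensor([0, 1])              # correct class index for each example

# By hand: softmax, then the negative log likelihood of the correct class.
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(2), targets]).mean()

built_in = F.cross_entropy(logits, targets)
print(manual.item(), built_in.item())       # the two values match
```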

Backpropagation Through Time: (BPTT) a gradient-based optimization technique for recurrent neural networks.