**Connecting to the Pomona College HPC**
This document describes how to log in to the [Pomona College HPC](https://www.pomona.edu/administration/hpc) servers and how to use the tools I'll demo in class (the CLI, Python, Mamba, Jupyter, etc.).
**At no time will you be required to use the Pomona College HPC**. You can alternatively set up your personal computer to use the same tools and libraries, but the server will have more powerful GPUs, fewer installation issues, and the specific library versions that I will demo in class.
# VPN
To log in to the HPC servers, you must connect from on campus or first connect to the Pomona College Virtual Private Network (VPN). If you are connecting from off campus, please follow [these VPN instructions](https://servicedesk.pomona.edu/support/solutions/articles/18000021757) before continuing.
# SSH
You'll interact with the HPC using [Secure Shell (SSH)](https://en.wikipedia.org/wiki/SSH_(Secure_Shell)).
Your username will likely be the first part of your email address (the part before the "@" symbol). You should use your single sign-on (SSO) password to log in. If you are an off-campus student, you might need to contact Pomona ITS for information on how to create an account.
The server may change, but to start you can use `pom-itb-dgx01.campus.pomona.edu`. Access to this machine may change depending on how much it is used.
## Linux and macOS
SSH comes preinstalled on Linux and macOS. To use it, you should:
1. Open a terminal application.
- on Linux: Terminal, xterm, etc. (depending on your distribution)
- on macOS: Terminal.app (or [iTerm2](https://iterm2.com/) among others)
2. Type `ssh <username>@<server>` and press enter.
For example:
~~~bash
ssh abcd2022@pom-itb-dgx01.campus.pomona.edu
~~~
## Windows
I am familiar with three options for using SSH on Windows. Please feel free to use any method you prefer.
- (Easiest) Install [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/) and follow its instructions.
- (Pretty Easy) Install [Git for Windows](https://gitforwindows.org/) and then follow the instructions in the Linux and macOS section above (using the Git Bash terminal).
- (My Favorite) Install [Ubuntu on WSL](https://ubuntu.com/wsl) (or a different Linux distribution; Ubuntu is typically the best documented) and then follow the instructions in the Linux and macOS section above (using the Ubuntu application).
## SSH Keys
This part is not strictly necessary, but I prefer not to type my password every time I use SSH, whether directly or indirectly (for example, when copying files).
- [How to Set Up SSH Keys](https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys-2): these instructions work on pretty much any Linux distribution (including Git for Windows and Ubuntu on WSL) and on macOS.
- [How to Create SSH Keys with PuTTY on Windows](https://www.digitalocean.com/docs/droplets/how-to/add-ssh-keys/create-with-putty/)
## SSH With Port Forwarding
As you'll see when we get to the Jupyter Notebooks For Development section, we will want to type code in a web browser (or VS Code) running on your personal computer and have it run on the server.
To enable this, we need to point your local web browser (running on your personal computer) to a web application hosted (and running) on the server.
This command will do so:
~~~bash
ssh -L <local-port>:localhost:<remote-port> <username>@<server>
~~~
This will forward your local port `<local-port>` to the server's port `<remote-port>`. In theory, these can be two different numbers, but in practice you might as well always set the local and remote port to the same value. See [explainshell](https://explainshell.com/explain?cmd=ssh+-L+8888%3Alocalhost%3A8888+username%40server) for some details.
As for your `<port>` value, I think it is best if we each have our own port to use by default. This means you don't have to check for a free port before ssh'ing into the server.
Look for your initials (first initial followed by last initial as reported by the registrar) in the table below.
Initials | Port
---------|-----
AG | 8921
AL | 8923
CL | 8957
CL (Chuck) | 8961
CR | 8925
DA | 8927
IB | 8929
JA | 8931
JC | 8933
JK | 8935
ML | 8939
NM | 8941
SE | 8943
SP | 8945
SS | 8947
SS (Sam) | 8963
SZ | 8949
TA | 8951
TC | 8953
ZM | 8955
On the server, you can type `netstat -tunlp` to list all ports in use. I will frequently run the following commands to see which ports are taken by Jupyter Notebooks:
~~~bash
netstat -tunlpae | awk '{print $4}' | grep -P '8\d{3}'
lsof -i -n
~~~
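To check a single port (for example, your own) rather than scanning the whole list, a small helper along these lines may be useful. This is only a sketch: `port_in_listing` is a hypothetical name, and it assumes the usual `netstat -tunl` column layout, in which the fourth column is the local address.

~~~bash
# Hypothetical helper: succeed (exit status 0) if PORT shows up as a local
# port in a `netstat -tunl` style listing read from standard input.
port_in_listing() {
    local port="$1"
    # Column 4 is the local address, e.g. 127.0.0.1:8921 or :::8921
    awk -v port="$port" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Demo on a made-up line of netstat output
sample='tcp 0 0 127.0.0.1:8921 0.0.0.0:* LISTEN'
if printf '%s\n' "$sample" | port_in_listing 8921; then
    echo "port 8921 is already in use"
fi
~~~

On the server, you would feed it real data, for example `netstat -tunl | port_in_listing 8921 && echo "in use"` (with your own port in place of 8921).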
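**Starting multiple Jupyter sessions with the same command will result in you *stealing* someone else's port.** When the port you request is already in use (for example, by your own earlier session), Jupyter by default tries a different port, which may be one assigned to a classmate. This frequently happens when you abuse `tmux` (see the next section).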
# tmux for Managing SSH Sessions
[tmux](https://en.wikipedia.org/wiki/Tmux) is a neat tool. For our use-case, you can think of it as a way to
> detach processes from their controlling terminals,
> allowing remote sessions to remain active without being visible.
I (very) frequently forget the main commands for tmux, so I pretty regularly refer back to the [Tmux Cheat Sheet](https://tmuxcheatsheet.com/). Here are the highlights for a simple way to use tmux.
On first use, type the following command:
~~~bash
tmux # On first use
~~~
When you want to log out of the server, type `ctrl+b d` (press the ctrl and "b" keys at the same time, let go, and then type "d"), and then type `logout` at the prompt.
After you log back onto the server:
~~~bash
tmux a # All other uses
~~~
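If you would rather not remember which of the two commands applies, tmux can attach to a named session when it exists and create it otherwise. A minimal sketch, in which the function name `tm` and the default session name `main` are hypothetical choices:

~~~bash
# Hypothetical helper: attach to the named tmux session if it exists,
# otherwise create it (that is what the -A flag of new-session does).
tm() {
    tmux new-session -A -s "${1:-main}"
}
~~~

With this in your `~/.bashrc` on the server, typing `tm` after every login gives you the same single session, which also makes it harder to end up with several sessions by accident.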
I will occasionally check for spurious tmux/Jupyter sessions with:
~~~bash
who
ps aux | grep tmux
ps aux | grep python
~~~
# Mambaforge For Managing Your Development Environments
I highly recommend using [Mambaforge](https://github.com/conda-forge/miniforge#Mambaforge) to manage your Python libraries (among other software).
I have already created a Mambaforge environment with all of the libraries we'll need. You can ask me to install libraries if you have any additional needs while working on your projects. To use our class environment, type the following:
~~~bash
source /opt/mambaforge/bin/activate cs152
~~~
After you've done so, you should see `(cs152)` at the start of your prompt. Now you can run `python` and it will use the correct version and have access to fastai, pytorch, and many other libraries.
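If you want to check for the environment in a script instead of looking at the prompt, conda and mamba record the active environment's name in the `CONDA_DEFAULT_ENV` variable. A small sketch (the function name `in_cs152` is hypothetical):

~~~bash
# Hypothetical helper: succeed if the cs152 environment is the active one.
# conda/mamba set CONDA_DEFAULT_ENV to the active environment's name.
in_cs152() {
    [ "${CONDA_DEFAULT_ENV:-}" = "cs152" ]
}

# Print a reminder when the environment is not active
if ! in_cs152; then
    echo "cs152 is not active; run: source /opt/mambaforge/bin/activate cs152"
fi
~~~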
Now you are ready to start playing around with the Jupyter Notebooks.
# Jupyter Notebooks For Development
Each time you login to the server, you will do the following:
1. SSH with port forwarding,
2. attach to your tmux session, and
3. activate your conda environment (unless it is already activated).
(Note: you can automate all of these steps. For example, when I type `ssh dgx01tmuxfwd` it will automatically ssh with port forwarding and then start a tmux session. Let me know if you want some information on how to set this up.)
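One way to approximate that kind of shortcut is a shell function on your personal computer (for example, in your local `~/.bashrc`). This is a sketch and not necessarily the setup described in the note: the function name `hpc` and the `HPC_*` variables are hypothetical, and the username and port are example values to replace with your own.

~~~bash
# Example values; replace with your own username and assigned port
HPC_USER="abcd2022"
HPC_SERVER="pom-itb-dgx01.campus.pomona.edu"
HPC_PORT="8921"

# Hypothetical helper: SSH with port forwarding, then attach to (or create)
# a tmux session named "main" on the server.
# -t forces terminal allocation so that tmux can run interactively.
hpc() {
    ssh -t -L "${HPC_PORT}:localhost:${HPC_PORT}" \
        "${HPC_USER}@${HPC_SERVER}" \
        "tmux new-session -A -s main"
}
~~~

After that, typing `hpc` covers steps 1 and 2; activating the environment is still done inside the tmux session.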
If you've followed all steps in this document, you are ready to start a notebook and view it on your local computer. Let's view the class notebooks:
~~~bash
cd into-some-directory-of-your-choice
git clone https://github.com/anthonyjclark/cs152sp22.git
cd cs152sp22
jupyter notebook --port=<port>
~~~
In the above code, you should replace `<port>` with your port from the SSH With Port Forwarding section.
Now copy the link (something like `http://localhost:8888/?token=...`) and paste it into your browser.
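By default, Jupyter tries other ports when the requested one is busy, so the link it prints may not use your port. To make it stop with an error instead, something like the following sketch may help. It assumes the `jupyter notebook` command used above; `nb_start` is a hypothetical name.

~~~bash
# Hypothetical helper: start Jupyter on exactly the given port.
# --no-browser: do not try to open a browser on the server
# --port-retries=0: exit with an error instead of trying other ports
nb_start() {
    if [ -z "$1" ]; then
        echo "usage: nb_start PORT" >&2
        return 1
    fi
    jupyter notebook --no-browser --port="$1" --port-retries=0
}
~~~

For example, `nb_start 8921` (with your own port). If it refuses to start, you probably still have an earlier session running on that port.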
When running class code I recommend that you:
1. Select the notebook you want to run.
2. Duplicate it.
3. Run the copy.
This will make sure that you play nicely with the version controlled files.
When you are done running your Jupyter Notebook, you should save your notebooks and then kill the program by typing `ctrl+c` at the command line.
# Useful and Required Environment Variables and Aliases
Please put the following at the bottom of your `~/.bashrc` file on the server:
~~~bash
# Custom aliases that will make common commands shorter and easier to remember
alias c152="source /opt/mambaforge/bin/activate cs152"
alias d152="cd $HOME/courses/152/cs152sp22/"
# Replace 8920 with your own port from the SSH With Port Forwarding section
alias nb="jupyter notebook --port=8920"
alias gpulist="nvidia-htop.py -c -l 80"
# Required cache directories for better hard drive usage
export TORCH_HOME=/raid/cs152/cache/pytorch
export FASTAI_HOME=/raid/cs152/cache/fastai
export HF_HOME=/raid/cs152/cache/hf
~~~
You can do so by editing the file with `micro` (or `nano`, or `vim`):
~~~bash
micro ~/.bashrc
~~~
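Changes to `~/.bashrc` only take effect in new shells or after running `source ~/.bashrc`. To confirm that the cache variables are visible, a check along these lines may help (`show_cache_vars` is a hypothetical name):

~~~bash
# Hypothetical helper: print each cache variable, or note that it is not set.
show_cache_vars() {
    local name value
    for name in TORCH_HOME FASTAI_HOME HF_HOME; do
        # printenv fails if the variable is not exported in the environment
        value=$(printenv "$name") || value="(not set)"
        echo "$name=$value"
    done
}

show_cache_vars
~~~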
# Using git
I recommend [Creating a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token "Creating a personal access token - GitHub Docs") and [Caching your GitHub credentials in Git](https://docs.github.com/en/get-started/getting-started-with-git/caching-your-github-credentials-in-git#github-cli "Caching your GitHub credentials in Git - GitHub Docs") using the GitHub CLI. The GitHub CLI (`gh`) is already installed on the server.
# Storing Your Data
**We will run out of disk space if we have too much data stored in your home directories.**
The server has two physical storage devices connected, which you can see with [`df -h -T -x tmpfs -x squashfs -x devtmpfs`](https://explainshell.com/explain?cmd=df+-h+-T+-x+tmpfs+-x+squashfs+-x+devtmpfs). They are accessed at `/` (the root drive) and `/raid` (an additional physical drive for data).
All home directories are in the `/` directory structure (e.g., `/home/CAMPUS/ajcd2020`).
You should store all data on `/raid` by doing the following.
~~~bash
# 1. Create a directory on /raid
mkdir /raid/cs152/<your-directory>
# 2. Move your files to this directory
mv ~/<your-files> /raid/cs152/<your-directory>/
# 3. Set file permissions as needed. For example,
#    to make all files readable by your group members (and other users):
find /raid/cs152/<your-directory> -type d -exec chmod 755 {} \;
find /raid/cs152/<your-directory> -type f -exec chmod 644 {} \;
~~~
# Cleaning Your Home Directory
When you log on to the server, you start out in your home directory. Run the following command to see how much space you are using:
~~~bash
du -h --max-depth=1
~~~
This command will show you the size of each directory located in your home directory. Please delete or move files that don't need to be there.
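When the list is long, sorting it by size makes the largest directories easier to spot. A sketch, assuming GNU `du` and `sort` as found on the server (`biggest_dirs` is a hypothetical name):

~~~bash
# Hypothetical helper: list the largest entries one level below a directory
# (default: the current directory), biggest first.
biggest_dirs() {
    local dir="${1:-.}" count="${2:-10}"
    # sort -h understands the human-readable sizes (K, M, G) printed by du -h
    du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
~~~

For example, `biggest_dirs ~ 5` prints the five largest lines for your home directory; the first line is normally the total for the directory itself.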
I check usage with this ([du](https://explainshell.com/explain?cmd=du+-h+--max-depth%3D1+%2Fhome%2FCAMPUS%2F+2%3E+%2Fdev%2Fnull)):
~~~bash
du -h --max-depth=1 /home/CAMPUS/ 2> /dev/null
~~~
# CUDA Memory Issues
Here are some troubleshooting methods to use if you run into CUDA memory errors on the server. These instructions assume you are using a Jupyter notebook and PyTorch (instructions will be a bit different if you are using TensorFlow).
First, if you can, then you should just restart your notebook.
If you do not want to restart your notebook, then you should open a command prompt (you can do this inside the Jupyter interface if you prefer) and check GPU usage on the server with the following command:
~~~bash
nvtop
~~~
Look at the memory values and find the device index for a GPU that is less utilized.
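If you prefer not to read the values by eye, `nvidia-smi` can print them in CSV form, and the index with the least memory in use can be picked out with standard tools. A sketch (`least_used_gpu` is a hypothetical name, and the demo values are made up):

~~~bash
# Hypothetical helper: read "index, memory.used" lines from standard input
# and print the index of the GPU with the least memory in use.
least_used_gpu() {
    sort -t, -k2,2n | head -n 1 | cut -d, -f1
}

# Demo on made-up values (GPU index, MiB used)
printf '0, 30500\n1, 1200\n2, 15000\n' | least_used_gpu
~~~

On the server, the input would come from `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | least_used_gpu`.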
Next, add the following function calls into a new cell in your notebook:
~~~python
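import torch

# Index of the less-utilized GPU you found above (1 is only an example)
GPU_INDEX = 1
# Release cached GPU memory that is no longer referenced by any tensor
torch.cuda.empty_cache()
# Choose a different GPU
torch.cuda.set_device(GPU_INDEX)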
~~~
# Extra resources
- [How to Use Jupyter Notebook in 2020: A Beginner’s Tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) start at "Creating Your First Notebook"
- [The Linux command line for beginners](https://ubuntu.com/tutorials/command-line-for-beginners#1-overview)
- [Working with Jupyter Notebooks in Visual Studio Code](https://code.visualstudio.com/docs/python/jupyter-support)
Some other commands to try:
~~~bash
who
top
free
nvidia-smi
~~~
# Setting up your own computer
If you want to set up Mambaforge on your own system, you should do the following:
1. Install [Mambaforge](https://github.com/conda-forge/miniforge#Mambaforge) following their instructions.
2. Follow these instructions for creating a similar environment:
~~~bash
mamba create -n cs152
conda activate cs152
mamba install -c pytorch pytorch torchvision torchaudio cudatoolkit=10.2
mamba install transformers timm wandb isort black jupyter jupyter_contrib_nbextensions jupyterthemes jupytext streamlit seaborn stable-baselines3 gh
python -m pip install nvidia-htop pipdeptree torch-summary
~~~