**Connecting to the Pomona College HPC**

This document describes how to log in to the [Pomona College HPC](https://www.pomona.edu/administration/hpc) servers, and how to use the tools I'll demo in class (the CLI, Python, Mamba, Jupyter, etc.).

**At no time will you be required to use the Pomona College HPC**. You can alternatively set up your personal computer to use the same tools and libraries, but the server will have more powerful GPUs, fewer installation issues, and the specific library versions that I will demo in class.

# VPN

To log in to the HPC servers you must connect from on campus or first connect to the Pomona College Virtual Private Network (VPN). If you are connecting from off campus, please follow [these VPN instructions](https://servicedesk.pomona.edu/support/solutions/articles/18000021757) before continuing.

# SSH

You'll interact with the HPC using [Secure Shell (SSH)]("https://en.wikipedia.org/wiki/SSH_(Secure_Shell)"). Your username will likely be the first part of your email address (the part before the "@" symbol). You should use your SSO (single sign-on) password to log in. If you are an off-campus student, then you might need to contact Pomona ITS for information on how to create an account.

The server may change, but to start you can use `pom-itb-dgx01.campus.pomona.edu`. Access to this machine may change depending on how much it is used.

## Linux and macOS

SSH comes preinstalled on Linux and macOS. To use it, you should:

1. Open a terminal application.
    - on Linux: Terminal, xterm, etc. (depending on your distribution)
    - on macOS: Terminal.app (or [iTerm2](https://iterm2.com/) among others)
2. Type `ssh <username>@<server>` and press enter. For example:

~~~bash
ssh abcd2022@pom-itb-dgx01.campus.pomona.edu
~~~

## Windows

I am familiar with three options for using SSH on Windows. Please feel free to use any method you prefer.
- (Easiest) Install [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/) and follow its instructions.
- (Pretty Easy) Install [Git for Windows](https://gitforwindows.org/) and then follow the instructions in the Linux and macOS section above (using the Git Bash terminal).
- (My Favorite) Install [Ubuntu on WSL](https://ubuntu.com/wsl) (or a different Linux distribution; Ubuntu is typically the best documented) and then follow the instructions in the Linux and macOS section above (using the Ubuntu application).

## SSH Keys

This part is not strictly necessary, but I prefer not to type my password every time I use SSH, whether directly or indirectly when copying files.

- [How to Set Up SSH Keys](https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys-2): these instructions work on pretty much any Linux distribution (including Git for Windows and Ubuntu on WSL) and on macOS.
- [How to Create SSH Keys with PuTTY on Windows](https://www.digitalocean.com/docs/droplets/how-to/add-ssh-keys/create-with-putty/)

## SSH With Port Forwarding

As you'll see when we get to the Jupyter Notebooks For Development section, we will want to type code in a web browser (or VS Code) running on your personal computer and have it run on the server. To enable this, we need to point your local web browser (running on your personal computer) to a web application hosted (and running) on the server. This command will do so:

~~~bash
ssh -L <local-port>:localhost:<remote-port> <username>@<server>
~~~

This will forward your local port `<local-port>` to the server's port `<remote-port>`. In theory, these can be two different numbers, but practically you might as well always set the local and remote port to the same value. See [explainshell](https://explainshell.com/explain?cmd=ssh+-L+8888%3Alocalhost%3A8888+username%40server) for some details.

As for your port value (used for both `<local-port>` and `<remote-port>`), I think it is best if we all have our own port to use by default. This means you don't have to check for a valid port before ssh'ing into the server.
Look for your initials (first initial followed by last initial as reported by the registrar) in the table below.

Initials | Port
---------|-----
AG | 8921
AL | 8923
CL | 8957
CL (Chuck) | 8961
CR | 8925
DA | 8927
IB | 8929
JA | 8931
JC | 8933
JK | 8935
ML | 8939
NM | 8941
SE | 8943
SP | 8945
SS | 8947
SS (Sam) | 8963
SZ | 8949
TA | 8951
TC | 8953
ZM | 8955

On the server, you can type `netstat -tunlp` to list all ports in use. I will frequently run these commands to see which ports are taken by Jupyter Notebooks:

~~~bash
netstat -tunlpae | awk '{print $4}' | grep -P '8\d{3}'
lsof -i -n
~~~

**Starting multiple Jupyter sessions with the same command will result in you *stealing* someone else's port.** When your port is already taken, Jupyter moves on to the next available one, which may belong to a classmate. This frequently happens when you abuse `tmux` (see the next section).

# tmux for Managing SSH Sessions

[tmux](https://en.wikipedia.org/wiki/Tmux) is a neat tool. For our use case, you can think of it as a way to

> detach processes from their controlling terminals,
> allowing remote sessions to remain active without being visible.

I (very) frequently forget the main commands for tmux, so I pretty regularly refer back to the [Tmux Cheat Sheet](https://tmuxcheatsheet.com/). Here are the highlights for a simple way to use tmux.

On first use, type the following command:

~~~bash
tmux # On first use
~~~

When you want to log out of the server, type `ctrl+b d` (press the ctrl and the "b" key at the same time, let go, and then type "d"), and then type `logout` at the prompt.

After you log back onto the server:

~~~bash
tmux a # All other uses
~~~

I will occasionally check for spurious tmux/Jupyter sessions with:

~~~bash
who
ps aux | grep tmux
ps aux | grep python
~~~

# Mambaforge For Managing Your Development Environments

I highly recommend using [Mambaforge](https://github.com/conda-forge/miniforge#Mambaforge) to manage your Python libraries (among other software). I have already created a Mambaforge environment with all of the libraries we'll need.
You can ask me to install libraries if you have any additional needs while working on your projects.

To use our class environment, type the following:

~~~bash
source /opt/mambaforge/bin/activate cs152
~~~

After you've done so, you should see `(cs152)` at the start of your prompt. Now you can run `python` and it will use the correct version and have access to fastai, pytorch, and many other libraries.

Now you are ready to start playing around with the Jupyter Notebooks.

# Jupyter Notebooks For Development

Each time you log in to the server, you will do the following:

1. SSH with port forwarding,
2. attach to your tmux session, and
3. activate your conda environment (unless it is already activated).

(Note: you can automate all of these steps. For example, when I type `ssh dgx01tmuxfwd` it will automatically ssh with port forwarding and then start a tmux session. Let me know if you want some information on how to set this up.)

If you've followed all steps in this document, you are ready to start a notebook and view it on your local computer. Let's view the class notebooks:

~~~bash
cd into-some-directory-of-your-choice
git clone https://github.com/anthonyjclark/cs152sp22.git
cd cs152sp22
jupyter notebook --port=<port>
~~~

In the above code, you should replace `<port>` with your port from the SSH With Port Forwarding section.

Now copy the link (something like `http://localhost:8888/?token=...`) and paste it into your browser.

When running class code I recommend that you:

1. Select the notebook you want to run.
2. Duplicate it.
3. Run the copy.

This will make sure that you play nicely with the version-controlled files.

When you are done running your Jupyter Notebook, you should save your notebooks and then kill the program by typing `ctrl+c` at the command line.
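Before starting `jupyter notebook` on your port, it can help to confirm that nothing is already listening there, since that is how ports end up *stolen*. The following is a minimal sketch using only Python's standard library, meant to be run on the server. The function name `port_is_free` and the example port `8888` are illustrative placeholders, not part of the class setup, and the check only covers IPv4 listeners on `127.0.0.1`.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 when a listener accepts the connection
        # and a nonzero error code otherwise.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    my_port = 8888  # hypothetical value; use your own port from the table
    status = "free" if port_is_free(my_port) else "already in use"
    print(f"Port {my_port} is {status}")
```

If the port is reported as already in use, you most likely still have an older Jupyter session running (for example, inside a detached tmux session) and should stop it before starting a new one.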
# Useful and Required Environment Variables and Aliases

Please put the following at the bottom of your `~/.bashrc` file on the server:

~~~bash
# Custom aliases that will make common commands shorter and easier to remember
alias c152="source /opt/mambaforge/bin/activate cs152"
alias d152="cd $HOME/courses/152/cs152sp22/"
# Replace 8920 with your own port from the table above
alias nb="jupyter notebook --port=8920"
alias gpulist="nvidia-htop.py -c -l 80"

# Required cache directories for better hard drive usage
export TORCH_HOME=/raid/cs152/cache/pytorch
export FASTAI_HOME=/raid/cs152/cache/fastai
export HF_HOME=/raid/cs152/cache/hf
~~~

You can do so by editing the file with `micro` (or `nano`, or `vim`):

~~~bash
micro ~/.bashrc
~~~

# Using git

I recommend [Creating a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token "Creating a personal access token - GitHub Docs") and [Caching your GitHub credentials in Git](https://docs.github.com/en/get-started/getting-started-with-git/caching-your-github-credentials-in-git#github-cli "Caching your GitHub credentials in Git - GitHub Docs") using the GitHub CLI. The GitHub CLI (`gh`) is already installed on the server.

# Storing Your Data

**We will run out of disk space if we have too much data stored in your home directories.**

The server has two physical storage devices connected, which you can see with [`df -h -T -x tmpfs -x squashfs -x devtmpfs`](https://explainshell.com/explain?cmd=df+-h+-T+-x+tmpfs+-x+squashfs+-x+devtmpfs). They are accessed at `/` (the root drive) and `/raid` (an additional physical drive for data). All home directories are in the `/` directory structure (e.g., `/home/CAMPUS/ajcd2020`).

You should store all data on `/raid` by doing the following.

~~~bash
# Replace <your-directory> and <your-files> with your own names.

# 1. Create a directory on /raid
mkdir /raid/cs152/<your-directory>

# 2. Move your files to this directory
mv ~/<your-files> /raid/cs152/<your-directory>

# 3. Set file permissions as needed. For example,
#    to make all files readable by your group members (and other users):
find /raid/cs152/<your-directory> -type d -exec chmod 755 {} \;
find /raid/cs152/<your-directory> -type f -exec chmod 644 {} \;
~~~

# Cleaning Your Home Directory

When you log on to the server you start out in your home directory. Run the following command to see how much space you are using:

~~~bash
du -h --max-depth=1
~~~

This command will show you the size of each directory located in your home directory. Please delete or move files that don't need to be there.

I check usage with this ([du](https://explainshell.com/explain?cmd=du+-h+--max-depth%3D1+%2Fhome%2FCAMPUS%2F+2%3E+%2Fdev%2Fnull)):

~~~bash
du -h --max-depth=1 /home/CAMPUS/ 2> /dev/null
~~~

# CUDA Memory Issues

Here are some troubleshooting methods to use if you run into CUDA memory errors on the server. These instructions assume you are using a Jupyter notebook and that you are using PyTorch (instructions will be a bit different if you are using TensorFlow).

First, if you can, you should just restart your notebook.

If you do not want to restart your notebook, then you should open a command prompt (you can do this inside the Jupyter interface if you prefer) and check GPU usage on the server with the following command:

~~~bash
nvtop
~~~

Look at the memory values and find the device index for a GPU that is less utilized.
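If you would rather not read the memory values by eye, the same lookup can be sketched in Python. This is a minimal sketch, assuming `nvidia-smi` is on the `PATH` and reports plain integer memory values in MiB; the function name `least_used_gpu` is hypothetical. It parses the CSV output of an `nvidia-smi` query and returns the index of the GPU with the least memory in use.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV row per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def least_used_gpu(csv_text: str) -> int:
    """Return the index of the GPU with the least memory used (in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, _total = (field.strip() for field in line.split(","))
        rows.append((int(used), int(index)))
    if not rows:
        raise ValueError("no GPUs found in nvidia-smi output")
    # Ties on memory used are broken by the lower device index.
    return min(rows)[1]


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        result = subprocess.run(QUERY, capture_output=True, text=True)
        if result.returncode == 0 and result.stdout.strip():
            print("Least-used GPU index:", least_used_gpu(result.stdout))
        else:
            print("nvidia-smi did not report any GPUs")
    else:
        print("nvidia-smi not found; run this on the server")
```

The printed index is a candidate for the device index you switch to. Memory use changes quickly on a shared server, so treat it as a snapshot rather than a guarantee.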
Next, add the following function calls into a new cell in your notebook:

~~~python
import torch

# Clear your GPU cache (release cached GPU memory that is no longer in use)
torch.cuda.empty_cache()

# Choose a different GPU (replace GPU_INDEX with the device index you found)
torch.cuda.set_device(GPU_INDEX)
~~~

# Extra resources

- [How to Use Jupyter Notebook in 2020: A Beginner’s Tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/) (start at "Creating Your First Notebook")
- [The Linux command line for beginners](https://ubuntu.com/tutorials/command-line-for-beginners#1-overview)
- [Working with Jupyter Notebooks in Visual Studio Code](https://code.visualstudio.com/docs/python/jupyter-support)

Some other commands to try:

~~~bash
who
top
free
nvidia-smi
~~~

# Setting up your own computer

If you want to set up Mambaforge on your own system, you should do the following:

1. Install [Mambaforge](https://github.com/conda-forge/miniforge#Mambaforge) following their instructions.
2. Follow these instructions for creating a similar environment:

~~~bash
mamba create -n cs152
conda activate cs152
mamba install -c pytorch pytorch torchvision torchaudio cudatoolkit=10.2
mamba install transformers timm wandb isort black jupyter jupyter_contrib_nbextensions jupyterthemes jupytext streamlit seaborn stable-baselines3 gh
python -m pip install nvidia-htop pipdeptree torch-summary
~~~
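After creating the environment on your own machine, a quick sanity check is to confirm that a few key packages can be found from inside it. This is a minimal sketch using the standard library's `importlib`. The function name `missing_packages` is hypothetical, and the list of import names is an assumption about how some of the installed packages are imported (for example, `torch` for `pytorch`).

```python
from importlib.util import find_spec

# Import names assumed to correspond to some of the packages installed above.
PACKAGES = ["torch", "torchvision", "transformers", "timm", "wandb", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages were found.")
```

Run it with the `cs152` environment activated. Once `torch` is found, calling `torch.cuda.is_available()` in the same environment reports whether PyTorch can see a GPU on your machine.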