This guide walks through quickly spinning up a CUDA-enabled Jupyter notebook server on AWS EC2. There are better, more automated ways of doing this (SageMaker, Terraform, CDK, Docker, etc.), but as a quick first approach, this works.
In my case, I needed a CUDA-capable GPU attached to the instance, so I chose a g5.xlarge, the cheapest option I could find that still leaves room to scale up once the PoC is done.
You might want this setup if, like me, you need a configurable scratch server that you can adapt to your needs, for instance by adding other databases or software so that everything lives in a single instance.
1. Create instance
Create an Ubuntu-based EC2 instance and start it. Write down the IP address and the root password you created for it.
You can probably adapt these instructions to Amazon Linux instances, but the apt-get commands won't work there as written.
Make sure to enable the TCP:8888 and SSH ports in the security group for inbound connections, ideally from your IP only.
Make sure to have at least 20-30 GB of storage mounted at root for installing dependencies. I used 128 GB.
2. First time setup
Once it's created, connect to it and do the first-time setup.
On your local computer:
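Connect over SSH. The key path below is a placeholder for your own key pair, and <ip> is the instance address you noted earlier:

```bash
# the default login user on Ubuntu AMIs is "ubuntu"
ssh -i /path/to/your-key.pem ubuntu@<ip>
```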
After successfully connecting, run on the remote instance:
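A sketch of a reasonable first-time setup follows; the exact packages are an assumption based on what the later steps need (Python tooling for the virtual environment, plus the dedicated jupyter user referenced in step 3):

```bash
# bring the instance up to date
sudo apt-get update && sudo apt-get upgrade -y

# Python tooling needed for the user-specific virtual environment in step 3
sudo apt-get install -y python3-pip python3-venv

# dedicated user that will own and run the notebook server
sudo adduser --disabled-password --gecos "" jupyter
```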
3. Set up a user-specific virtual environment
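As a sketch of this step, assuming the dedicated jupyter user from step 2 and a virtual environment at ~/venv (both names are assumptions):

```bash
# switch to the notebook user
sudo su - jupyter

# create the user-specific virtual environment
python3 -m venv ~/venv

# open the shell profile so the environment can be activated on every login
nano ~/.bashrc
```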
Add the following line:
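Assuming the ~/venv path used above, a line that activates the environment on each login would be:

```bash
# auto-activate the user-specific virtual environment (path is an assumption)
source ~/venv/bin/activate
```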
Then keep executing as the jupyter user:
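A minimal sketch for installing the notebook server into that environment:

```bash
# make sure the virtual environment is active, then install Jupyter
source ~/venv/bin/activate
pip install --upgrade pip
pip install jupyter
```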
4. Set up Jupyter Notebook as a service
Back to the ubuntu user (run exit to leave the jupyter user's shell):
```bash
sudo nano /etc/systemd/system/jupyter-notebook.service
```
Add the following contents to the jupyter-notebook.service file:
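A minimal unit file sketch, assuming the jupyter user and the ~/venv environment from step 3 (adjust User, paths, and port to your setup):

```ini
[Unit]
Description=Jupyter Notebook server
After=network.target

[Service]
Type=simple
User=jupyter
WorkingDirectory=/home/jupyter
# ExecStart assumes the virtual environment created in step 3
ExecStart=/home/jupyter/venv/bin/jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser
Restart=on-failure

[Install]
WantedBy=multi-user.target
```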
Finally, enable the service:
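Assuming the unit file above lives at /etc/systemd/system/jupyter-notebook.service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable jupyter-notebook
sudo systemctl start jupyter-notebook

# verify it came up
systemctl status jupyter-notebook
```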
At this point, the jupyter notebook server should be running.
5. Set up CUDA
From your own local machine, visit <ip>:8888. You should be able to see the notebook server at this point.
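Back on the instance, a common way to list the GPU and its candidate drivers on Ubuntu (the exact command is an assumption) is:

```bash
# ubuntu-drivers-common provides the ubuntu-drivers helper
sudo apt-get install -y ubuntu-drivers-common
ubuntu-drivers devices
```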
You should be able to see an NVIDIA device installed. Look for the device driver line that says recommended.
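To install the recommended driver, one option (a sketch; you can also apt-get install the specific nvidia-driver-* package marked as recommended) is the autoinstall helper, followed by a reboot:

```bash
# installs whichever driver ubuntu-drivers marked as "recommended"
sudo ubuntu-drivers autoinstall

# reboot so the new kernel module gets loaded
sudo reboot
```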
Reconnect and continue:
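As before, the key path and <ip> are placeholders:

```bash
ssh -i /path/to/your-key.pem ubuntu@<ip>
nvidia-smi
```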
This should show you stats about the NVIDIA device.
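Next, install the CUDA toolkit so the compiler is available. Installing it from the Ubuntu repositories is one option (NVIDIA's own CUDA repository is another):

```bash
# provides nvcc and the CUDA runtime libraries
sudo apt-get install -y nvidia-cuda-toolkit
nvcc --version
```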
This should print the NVIDIA CUDA compiler version, meaning that everything works.
6. Extra steps if TensorRT is not found
If TensorFlow complains that it cannot initialize TensorRT, most likely because of missing libraries, the following patch fixes it.
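One common fix, sketched under the assumption that the notebook runs from the ~/venv environment, is to install the TensorRT Python wheels and make their shared libraries visible to TensorFlow:

```bash
# run as the jupyter user, with the virtual environment active
pip install tensorrt

# make the bundled libnvinfer libraries visible to TensorFlow; the exact
# directory varies by TensorRT version, so check where the .so files landed
export LD_LIBRARY_PATH="$(python3 -c 'import tensorrt, os; print(os.path.dirname(tensorrt.__file__))'):${LD_LIBRARY_PATH}"
```

For the systemd service, the same library path would need to go into an Environment= line in the unit file rather than a one-off shell export.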