User Guide

Learn how to use Skyway - RCC Cloud Solution Offering

Requirements for using Skyway

  • Have an active RCC user account
  • Experience using the Midway cluster
  • Experience using the SLURM resource scheduler

Login to Skyway

First, log in to the Midway cluster:
ssh [CNetID]@midway2.rcc.uchicago.edu
Then, log in to Skyway from Midway:
ssh skyway.rcc.uchicago.edu

File Systems

1. /home/[CNetID]
This is the temporary home directory (no backup) for users on Skyway. Note that this is NOT the home file system on Midway, so you will not see any contents from your home directory on Midway here. Please do NOT store any sizable or important data here.

Note: Since user content on Skyway is expected to be kept in the cloud scratch space, you may want to point your $HOME environment variable to /cloud/aws/[CNetID]. A sketch of one way to do this follows.
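A minimal sketch, assuming you want new shell sessions on Skyway to use the AWS cloud scratch directory as their working home (adjust the path to /cloud/gcp/${USER} if you use GCP); treat it as an illustration, not required configuration:

export HOME=/cloud/aws/${USER}   # assumption: point HOME at the cloud scratch space
cd ${HOME}                       # start the session in that directory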

2. /project and /project2
These are the RCC high-performance capacity storage file systems from Midway, mounted on Skyway with the same quotas and usage as on Midway. Just as when running jobs on Midway, /project and /project2 should be treated as the location where users store the data they intend to keep. Because the /project and /project2 file systems are mounted on both systems, they also provide a way to make data accessible from both Skyway and Midway.

Run cd /project/<labshare> or cd /project2/<labshare>, where <labshare> is the name of the lab account, to access the files owned by your lab or group. This works even if the lab share directory does not appear in a file listing, e.g., ls /project; see the example below.
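For example (replace <labshare> with your lab account name):

cd /project/<labshare>    # works even if ls /project does not list the directory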

3. /cloud/[cloud]/[CNetID]
Options of [cloud]: aws or gcp

This is the cloud scratch folder (no backup), intended for read/write by cloud compute jobs. For example, with Amazon Web Services (AWS), the remote S3 bucket storage is mounted on Skyway at this path. Before submitting jobs to the cloud compute resources, users must first stage the data, scripts, and executables their cloud job will use to the /cloud/aws/[CNetID] folder. After a cloud compute job finishes, users should copy the data they wish to keep from /cloud/aws/[CNetID] back to their project folder. Similarly, users of Google Cloud Platform (GCP) should use the scratch folder /cloud/gcp/[CNetID].
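For example, a minimal staging sketch for an AWS job might look like the following (the directory names are placeholders):

# stage data, scripts, and executables to the cloud scratch space before submitting
cp -r /project/<labshare>/myjob /cloud/aws/${USER}/

# ... submit and run the cloud compute job ...

# afterwards, copy the results you want to keep back to capacity storage
cp -r /cloud/aws/${USER}/myjob/results /project/<labshare>/myjob/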

Software Modules

Skyway uses the same module system as the Midway cluster to manage software packages, but the available software modules are not the same. To check the available software modules on Skyway, issue the command module avail (a short usage example follows the list below). For more information on the module commands, see the module user manual. If a particular software package that your workflow requires is missing, please write to help@rcc.uchicago.edu to request that it be added to Skyway.

The current list of software modules installed on Skyway includes the following:

  • anaconda2 — Python2 Anaconda distribution
  • anaconda3 — Python3 Anaconda distribution
  • parallelstudio — Intel compilers and mkl library
  • R — statistical analysis
  • cuda — CUDA toolkit and compilers for NVIDIA GPUs
  • gromacs — molecular dynamics software
  • plumed — metadynamics package
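For example, to list and load software modules (anaconda3 is used here purely as an illustration from the list above):

module avail            # show the software modules available on Skyway
module load anaconda3   # load a module, here the Python3 Anaconda distribution
module list             # confirm which modules are currently loaded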

How to Prepare Executable Binaries

It is not recommended to compile or install software packages directly on Skyway. Users should compile and install their own codes on the Midway2 cluster. Midway2 and Skyway share the same system architecture, so code compiled on Midway2 will likely run on Skyway without recompilation.

Note that the /project and /project2 folders are only visible from the Skyway login node. They are not visible from the cloud compute nodes, which is why users must copy the executables and other data required by their job into the scratch space (/cloud/aws) so that they are accessible from the cloud compute nodes.

Submit and Manage Jobs via SLURM

Skyway uses SLURM to submit jobs, just as on the Midway cluster. Some commonly used commands are listed below, followed by a brief usage sketch:

  • sinfo – Show compute nodes status
  • sbatch – Submit computing jobs
  • scancel – Cancel submitted jobs
  • sacct – Check logs of recent jobs
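For example, a typical submit-and-monitor cycle might look like the following sketch (the script name and job ID are placeholders):

sinfo -p rcc-aws        # check the status of the cloud compute nodes
sbatch sample.sbatch    # submit a job script
sacct                   # check the logs of your recent jobs
scancel 1234567         # cancel a submitted job by its job ID, if needed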

When submitting jobs, include the following two options in the job script:

  • --partition=rcc-aws
  • --account=rcc-aws

Specify the cloud compute resource:

To submit jobs to the cloud, you must specify a type of virtual machine (VM) with the option --constraint=[VM Type]. The VM types currently supported through Skyway are listed in the table below. You can also get an up-to-date listing of the machine types by running the command sinfo-node-types on a Skyway login node.

VM Type   Description                                                AWS EC2 Instance Type
t1        1 core, 1 GB memory (for testing and building software)    t2.micro
c1        1 core, 4 GB memory (for serial jobs)                      c5.large
c8        8 cores, 32 GB memory (for medium-sized multicore jobs)    c5.4xlarge
c36       36 cores, 144 GB memory (for large multicore jobs)         c5.18xlarge
m24       24 cores, 384 GB memory (for large-memory jobs)            r5.12xlarge
g1        1x V100 GPU                                                p3.2xlarge
g4        4x V100 GPUs                                               p3.8xlarge
g8        8x V100 GPUs                                               p3.16xlarge

To see more information about these instance types, please visit the AWS EC2 website. Please note that we are currently using the C5 compute-optimized instances for Skyway, and the number of cores for each type is half (physical cores) of the vCPU count (hyper-threaded cores) listed on the website.
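For instance, a job targeting a single-GPU VM would combine the constraint with a GPU request, as in the interactive GPU example later in this guide; a minimal header sketch (the constraint value comes from the table above):

#SBATCH --constraint=g1   # request a g1 VM (1x V100 GPU, p3.2xlarge)
#SBATCH --gres=gpu:1      # request the GPU itself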

A sample job script: sample.sbatch
#!/bin/sh

#SBATCH --job-name=TEST
#SBATCH --partition=rcc-aws
#SBATCH --account=rcc-aws
#SBATCH --exclusive
#SBATCH --ntasks=1
#SBATCH --constraint=t1 # Requests a t1 test VM (t2.micro instance)

cd $SLURM_SUBMIT_DIR   # run from the directory the job was submitted from

# Print node and hardware information to confirm the requested VM type
hostname
lscpu
lscpu --extended
free -h

Interactive Jobs

Example of requesting a testing node:
sinteractive --partition=rcc-aws --constraint=t1 --ntasks=1
Example of an interactive GPU job:
sinteractive --partition=rcc-aws --constraint=g1 --ntasks=1 --gres=gpu:1

User Packages for R and Python

The popular scripting languages Python and R manage their own package libraries. Because the system-wide installation locations for this software are read-only, regular users normally install local packages on their own in their home folder (i.e., /home/[username]) by default. However, this is not recommended on Skyway, since all user content is expected to be stored in the cloud scratch space at /cloud/aws/[username]. Therefore, a few extra steps are needed to change the default package paths for Python and R.

Setting user local packages path for R

You can run the following commands before starting R, or add them to ~/.bashrc:
export R_LIBS_USER=/cloud/aws/${USER}/pkgs-R
if [ ! -d "${R_LIBS_USER}" ]; then mkdir -p "${R_LIBS_USER}"; fi
After launching R, you can check that the default (first) path for packages is correct. Example:
[yuxing@rcc-aws-t2-micro-001 ~]$ module load R
[yuxing@rcc-aws-t2-micro-001 ~]$ R
...
> .libPaths()
[1] "/cloud/aws/yuxing/pkgs-R"         "/software/r-3.5/lib64/R/library"

Setting user local packages path for IPython

The pip tool is used to install and manage Python packages.

To install packages in a different location, you need to specify the "prefix" option. Example:
pip install --install-option="--prefix=${PKGS_PYTHON}" package_name

${PKGS_PYTHON} should point to the path for the package installations, and you also need to add it to PYTHONPATH before running IPython so that the installed modules can be found. Example:

The following commands can be added to ~/.bashrc:
export PKGS_PYTHON=/cloud/aws/${USER}/pkgs-python
if [ ! -d "${PKGS_PYTHON}" ]; then mkdir -p "${PKGS_PYTHON}"; fi
export PYTHONPATH=${PKGS_PYTHON}:${PYTHONPATH}
# Note: depending on the Python version, pip may place packages under
# ${PKGS_PYTHON}/lib/pythonX.Y/site-packages; add that directory to PYTHONPATH as well if imports fail.

Setting user local packages path for Conda

The best way to manage your local Python packages with Conda (Anaconda or Miniconda) is to use a virtual environment.

You can create your own virtual environment under the cloud scratch space. Example:
conda create --prefix=/cloud/aws/${USER}/conda
You should see output like:
Solving environment: done

## Package Plan ##

  environment location: /cloud/aws/yuxing/conda

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate /cloud/aws/yuxing/conda
#
# To deactivate an active environment, use:
# > source deactivate
#
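After creating the environment, activate it by its path and install packages into it. A brief sketch, assuming the anaconda3 module from the list above and example package names:

module load anaconda3
source activate /cloud/aws/${USER}/conda   # activate the environment created above
conda install numpy scipy                  # packages install under /cloud/aws/${USER}/conda
source deactivate                          # leave the environment when finished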