TORQUE resource manager

When to use TORQUE

You are expected to submit a TORQUE job if your task will take more than one hour OR use more than 8 cores. The maximum number of cores one can request is 40. The remaining 8 cores are reserved for short jobs and exploratory development. Please be mindful of your usage of these remaining 8 cores, as they are shared without any management.

Purpose of using a resource manager

We are using TORQUE as a scheduler for long-running jobs. Given some basic specifications (maximum runtime, number of CPUs required, etc.), TORQUE attempts to schedule the CPU time efficiently. The objective is to maximize the throughput of the system (for all users), given a set of jobs. This allows users to simply send their jobs to the queue and forget about them, rather than waiting around for the machine to become free at some undetermined time.

Overview of how TORQUE works

  1. A job is submitted using the command 'qsub' and a (short) script (see the minimal sketch after this list)
  2. If there are no other jobs ahead of it, the scheduler checks whether it can fit the job on the available resources and, if so, starts it
  3. If it cannot fit the job, it will attempt to schedule the job in the earliest window where the resource availability is met (according to the scheduler, which may not necessarily be the truth)
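
Concretely, a minimal submission might look like this (the script name and its one-line body are made up for illustration; with no #PBS directives the queue defaults shown in the next section apply):

echo "sleep 60" > myjob.sh   # hypothetical one-line job script
qsub myjob.sh                # submit to the default queue
qstat                        # check whether the job is queued (Q) or running (R)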

Looking at the current configuration

The current configuration can be checked by running:

qmgr -c 'print server'

On our system it looks something like this:

create queue batch
set queue batch queue_type = Execution
set queue batch resources_max.mem = 250gb
set queue batch resources_max.ncpus = 40
set queue batch resources_max.nodes = 1
set queue batch resources_max.walltime = 840:00:00
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = mcb
set server managers = pimentel@mcb.math
set server operators = pimentel@mcb.math
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 40

Note that the first line indicates 'create queue batch'. Since there are no other queues at the moment (this might change in the future), you should submit your jobs to the queue named 'batch'.
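
If you want to double-check which queues and nodes exist at any point (assuming the standard TORQUE client tools are on your path), a quick way is:

qstat -Q      # one line per queue, with its limits and job counts
pbsnodes -a   # state and resources of each configured node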

Writing a script

A TORQUE script is basically a BASH script with some special comments at the top. These comments (prefixed by #PBS, with no spaces) are read by the TORQUE parser and define the parameters of the job. Here are the minimum required arguments:

#PBS -l nodes=1:ppn=2,mem=4g,walltime=10:00:00
#PBS -q batch
#PBS -V

We will go through them line-by-line (a minimal complete script follows the list):

  1. -l specifies server-specific resource requirements
    1. nodes defines the number of nodes. We currently only have one node configured (so leave it at one)
    2. ppn denotes the number of processors per node. This can be set from 1 to 40. NB: if your job cannot run in parallel (i.e. it is not multithreaded), then requesting more than 1 CPU will only potentially hurt your scheduling time
    3. mem denotes the maximum amount of memory. In this case it is set to 4GB
    4. walltime denotes the maximum amount of time the job will run. The format is hours:minutes:seconds. This should be an upper bound: if the job runs longer than the specified walltime, it will automatically be terminated, but if the walltime is far greater than the realistic maximum, the job will take a long time to be scheduled
  2. -q batch defines the queue you will be submitting to. At the time of this writing, you should simply leave it as is
  3. -V is not required, but often helpful. This exports your current environment variables to the job

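Putting the required arguments together, a minimal complete script might look like this (the body is just a placeholder; PBS_O_WORKDIR is the directory from which the job was submitted):

#PBS -l nodes=1:ppn=1,mem=1g,walltime=00:30:00
#PBS -q batch
#PBS -V

# The rest of the file is an ordinary BASH script.
cd $PBS_O_WORKDIR
echo "Job started on $(hostname)"
sleep 60   # stand-in for the real work
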
Other useful arguments (see the example header after this list) are:
  • #PBS -N jobName changes the name of the job (useful in qstat output)
  • #PBS -o fileName denotes the file to which standard output is written
  • #PBS -e fileName denotes the file to which standard error is written
  • #PBS -d directory changes the default working directory (this can also be done with cd in the script, as in the example)
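
As an illustration (the job name and log paths here are hypothetical), these would sit alongside the required arguments at the top of the script:

#PBS -N bowtie2-exp1                      # job name shown in qstat
#PBS -o /home/pimentel/logs/bowtie2.out   # hypothetical location for standard out
#PBS -e /home/pimentel/logs/bowtie2.err   # hypothetical location for standard error
#PBS -d /home/pimentel/er/human           # working directory (same effect as the cd in the example below)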

Example script

Here is an example of doing a Bowtie2 alignment:

#PBS -l nodes=1:ppn=25,mem=40g,walltime=96:00:00
#PBS -q batch
#PBS -V
 
# $DIR is the directory
# $FASTQ is the prefix
# $IDX is the prefix to the bowtie2 index
# $OUT is the prefix for the BAM file
 
cd /home/pimentel/er/human
 
allFastq=`ls $DIR/$FASTQ*.fastq`           # list all matching FASTQ files
allFastq=$(sed 's/ /,/g' <<< $allFastq)    # replace spaces with commas so Bowtie2 can take the list
 
bowtie2 -a -p 25 -x $IDX -U $allFastq | samtools view -Sb - > $DIR/$OUT.bam

This job will use 25 processors with 40 GB of memory (probably much too high, but ok), with a maximum walltime of 96 hours.

The comments at the top of the script describe several variables. Notice that they are used within the code but never set there; these variables are set when the qsub command is executed.

The script changes directories, then lists all the files matching that particular ls pattern. It then replaces the spaces between the file names with commas (as Bowtie2 requires for multiple inputs) and runs Bowtie2 as you normally would.
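
As an illustration with hypothetical file names, the intent of those two lines is the following transformation:

# Suppose ls $DIR/$FASTQ*.fastq matches two files; the goal is
#   before: /home/pimentel/rawReads/exp1_1.fastq /home/pimentel/rawReads/exp1_2.fastq
#   after:  /home/pimentel/rawReads/exp1_1.fastq,/home/pimentel/rawReads/exp1_2.fastq
# i.e. a single comma-separated list, which is how Bowtie2 accepts multiple -U inputs.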

You could save this code in a file, say bowtie2.sh and then run your qsub command:

qsub -v DIR=/home/pimentel/rawReads/,FASTQ=exp1,IDX=/home/pimentel/bwt2_idx/ensembl-human,OUT=alignments bowtie2.sh

Since I used the -V option as well, I could have simply defined these variables in BASH and used a simpler qsub command:

$ export DIR=/home/pimentel/rawReads/
$ export FASTQ=exp1
$ export IDX=/home/pimentel/bwt2_idx/ensembl-human
$ export OUT=alignments
$ qsub bowtie2.sh

*Note: the -V flag only forwards variables that are present in the shell's environment, so the export is needed; a plain assignment (e.g. OUT=alignments) would not be passed to the job.

If you are running several alignments, this could be useful as you will likely be using the same OUT and IDX names, and just need to change DIR and FASTQ.
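
For instance (the experiment prefixes and per-experiment directories below are hypothetical), a loop submitting one job per experiment could look like:

export IDX=/home/pimentel/bwt2_idx/ensembl-human
export OUT=alignments
for exp in exp1 exp2 exp3; do
    export DIR=/home/pimentel/rawReads/$exp   # hypothetical per-experiment directory
    export FASTQ=$exp
    qsub bowtie2.sh                           # each job picks up the current values via -V
done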

This will find all exp1*.fastq files in /home/pimentel/rawReads, map them against the Bowtie2 index ensembl-human*, and write the file alignments.bam in the rawReads directory.

It will also write the files bowtie2.sh.oX and bowtie2.sh.eX for standard out and standard error, respectively, where X is the job number. This can be changed by specifying:

#PBS -e /place/to/standard/error
#PBS -o /place/to/standard/out

Checking the status of a job

You can check the status of a job with the qstat command.

My favorite quick check is the command
qstat -a

which has a bit more information than plain qstat.

If you are unsure about the configuration of your jobs, you can use
qstat -f
which will give you a lot more information.
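
You can also pass a specific job ID (as reported by qstat) to limit the output to a single job, for example:

qstat -f 2899.localhost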

While a job is running, its standard out and standard error files are located at:
/var/spool/torque/spool/[job id].localhost.OU
/var/spool/torque/spool/[job id].localhost.ER
respectively. The job ID can be found using qstat as described above. After completion, these files are moved to the directory from which the job was started (or to the locations specified with #PBS -o and #PBS -e, if used).
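
To watch a running job's output, one option is to tail these spool files directly (substitute your own job ID from qstat; the ID below is just an example):

tail -f /var/spool/torque/spool/2899.localhost.OU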

Changing the configuration of a job already running

The command to change an existing job configuration is qalter. You should first run qstat to find the job ID. Then run the appropriate qalter command. For example:

qalter 2899.localhost -l walltime=40:00:00,ppn=3

changes the walltime to 40 hours and the number of processors per node to 3.
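
To confirm the change took effect, you can inspect the job's resource list afterwards (again, the job ID is just an example):

qstat -f 2899.localhost | grep -i walltime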

TODO

  • Set up email notifications when a job terminates (completion or error). Any takers?
  • Set up different queues depending on our usage (e.g. a long-job queue, a shorter-job queue, etc.)
  • Include links to other sources