Week 4 Part 2: Nextgen Sequence Quality Control

Molb 4485/5485 – Computers in Biology

Nicolas Blouin and Vikram Chhatre

Wyoming INBRE Bioinformatics Core
Dept. of Molecular Biology
University of Wyoming
nblouin@uwyo.edu
vchhatre@uwyo.edu
http://molb4485.uwyo.online

Part 2

2.2 Sequence Quality Control with Trimmomatic

Many read-trimming tools are available. The one we will use today is called Trimmomatic. Its user manual describes every trimming step and setting.

First we need to decide how we want to trim. Note that Trimmomatic applies its trimming steps in the order they are given on the command line.

In the case of the ERR reads from the 1000 Genomes (Human) project, we will trim the first 10 bp from the 5’ end (the beginning of each read), matching the HEADCROP:10 setting in the script below, and then trim low-quality bases. Finally, we will remove any reads that are left too short.
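To make the first and last of these steps concrete, here is a toy sketch that mimics a head crop followed by a minimum-length filter using awk. It is not Trimmomatic itself: the read, the file names toy.fq and toy_trimmed.fq, and the crop and minlen values are all made up for illustration.

```shell
# Toy FASTQ with a single 12-base read (made-up data, illustration only).
printf '@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' > toy.fq

# Hypothetical settings, similar in spirit to HEADCROP:3 then MINLEN:8.
crop=3
minlen=8

# A FASTQ record is 4 lines: header, sequence, "+", quality.
# Drop the first $crop characters of the sequence and quality lines,
# then keep the record only if at least $minlen bases remain.
awk -v crop="$crop" -v minlen="$minlen" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = substr($0, crop + 1) }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
        qual = substr($0, crop + 1)
        if (length(seq) >= minlen) print header "\n" seq "\n" plus "\n" qual
    }
' toy.fq > toy_trimmed.fq

cat toy_trimmed.fq
```

Because the crop happens before the length check, a read can pass or fail the length filter depending on how much was removed earlier. This is why the order of the steps matters.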

Note: Trimmomatic can also remove adapter sequences (the ILLUMINACLIP step). Normally, for genome assembly, this is something that we would do. However, in the interest of time we will skip this step. Please review the Trimmomatic manual for all of the settings and suggestions.

Now we can build our job command. Below you will see a new symbol, the backslash (\), which lets us split one command across several lines. When the shell sees a \ at the very end of a line, it knows the command continues on the next line. After each \, press the Enter key and keep typing. Let’s make a batch script (also called a shell script) using your text editor (vim). Call this shell script Trim.sh
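If you want to convince yourself how \ behaves before writing the script, the small sketch below runs the same command once on a single line and once split across lines. The variable names one_line and multi_line are only for this illustration.

```shell
# The same command written on one line...
one_line=$(echo one two three)

# ...and split across three lines with a trailing backslash.
# Nothing may follow the backslash, not even a space.
multi_line=$(echo one \
    two \
    three)

echo "$one_line"
echo "$multi_line"
```

Both echo commands should print the same text, because the shell joins the continued lines before running the command. The Trim.sh script itself follows.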

#!/bin/bash
#SBATCH -J Trim
#SBATCH -n 1
#SBATCH --cpus-per-task=4
#SBATCH -t 30:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<your email address>
#SBATCH --account=inbre-train

echo "Loading required modules"

module load swset gcc trimmomatic

echo "Following modules have been loaded:"
module list

echo "Initiating Trimmomatic Run at $(date)"

trimmomatic PE \
    -threads 4 \
    ERR013161_1.filt.fastq.gz ERR013161_2.filt.fastq.gz \
    ./fwd_pair.fq ./fwd_unpair.fq ./rev_pair.fq ./rev_unpair.fq \
    HEADCROP:10 \
    SLIDINGWINDOW:4:24 \
    MINLEN:80

echo "Completed Trimmomatic Run at $(date)"
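The SLIDINGWINDOW:4:24 step in the script scans each read from the 5’ end with a 4-base window and cuts the read once the average quality in the window falls below 24. The function below is a simplified sketch of that idea for a single quality string. It assumes Phred+33 encoding, its name sliding_window_keep is hypothetical, and it does not reproduce Trimmomatic's exact implementation.

```shell
# Print how many leading bases would be kept for one quality string.
# $1 = quality string (Phred+33 assumed), $2 = window size, $3 = minimum average
sliding_window_keep() {
    printf '%s\n' "$1" | awk -v w="$2" -v min="$3" '
    BEGIN {
        # Lookup table from printable ASCII character to its code.
        for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
    }
    {
        n = length($0)
        keep = n                      # keep everything unless a window fails
        for (start = 1; start + w - 1 <= n; start++) {
            sum = 0
            for (j = start; j < start + w; j++)
                sum += ord[substr($0, j, 1)] - 33
            # Simplification: cut at the start of the first failing window.
            if (sum / w < min) { keep = start - 1; break }
        }
        print keep
    }'
}

# "I" encodes quality 40 and "#" encodes quality 2 in Phred+33.
sliding_window_keep 'IIIIIIII####' 4 24
sliding_window_keep 'IIIIIIII' 4 24
```

In the first call the window that covers two good and two bad bases is the first to average below 24, so the read is cut there. In the second call no window fails and the whole read is kept.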

2.3 Task: Compare Trimmed Reads to Raw, Un-Trimmed Reads

  1. Run FastQC on the trimmed paired-end reads (only the two “pair” files, i.e. fwd_pair.fq and rev_pair.fq). You may ignore the unpaired files.

  2. Download the HTML reports (scp) and look at them. What are at least three things that have changed compared with the raw reads? We encourage you to chat with your neighbors.

  3. Look in your Slurm output file (named slurm-<jobid>.out by default) to see how many reads were removed in your trim. What different choices, if any, might you make if this were your data? You might need to look at the User Manual for Trimmomatic for ideas.
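As a cross-check on the numbers in the Slurm output file, you can count reads directly, since every FASTQ record is exactly four lines. The helper below is a minimal sketch: the function name count_reads and the toy file toy_counts.fq are made up for this example.

```shell
# Count the reads in a FASTQ file (4 lines per read).
# Files ending in .gz are decompressed on the fly.
count_reads() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
}

# Toy file with two reads (made-up data) to show the helper working.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > toy_counts.fq
count_reads toy_counts.fq
```

On the cluster you could run count_reads on ERR013161_1.filt.fastq.gz and on fwd_pair.fq and compare the two numbers to see what fraction of forward reads survived as pairs.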