de novo Short Read Paired-End and Mate-Paired Assembly
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How do you perform a de novo short-read paired-end genome assembly that also uses mate-pairs?
Objectives
Assemble a genome using multiple libraries with different insert sizes.
Be able to explain the terms ‘scaffolding’ and ‘scaffolds’.
De novo assembly (paired-end and mate-pair libraries)
In most assembly projects multiple libraries with different insert sizes are used.
Here we will add a 2.5 kb mate-pair library.
Because of the library preparation, the read orientation differs between these libraries: PE: → ←, MP: ← → or → ←.
SPAdes PE and MP assembly
Exercise
Assemble the trimmed paired-end library together with a mate-pair library, located at:
~/asm_workshop/data/mp/MP_2.5kb_25x_?.fastq.gz
with SPAdes. Use ecoli_pe_mp as output directory.
In SPAdes we have to specify the paired-end (PE) and the mate-pair (MP) library with the --peX-X and --mpX-X flags (X is a number, for example --pe1-1), and we also have to provide the read orientation for the mate-pair library.
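The `?` in the path above is a shell wildcard that stands for exactly one character, so the pattern covers both the `_1` and the `_2` read file. The following is a small sketch that shows the expansion on throwaway empty files in a temporary directory (the directory and the `_10` file are made up for the demonstration; they are not part of the workshop data):

```shell
# Demonstrate how the shell expands `?` (exactly one character) in a file pattern.
# The temporary directory and the empty files are stand-ins, not the real reads.
demo_dir=$(mktemp -d)
touch "$demo_dir/MP_2.5kb_25x_1.fastq.gz" \
      "$demo_dir/MP_2.5kb_25x_2.fastq.gz" \
      "$demo_dir/MP_2.5kb_25x_10.fastq.gz"

# Only names with a single character in place of `?` match,
# so the hypothetical `_10` file is not listed.
matches=$(cd "$demo_dir" && ls MP_2.5kb_25x_?.fastq.gz)
echo "$matches"

# Remove the demonstration files again.
rm -rf "$demo_dir"
```

SPAdes itself needs each read file named explicitly, so the wildcard is only a compact way of describing the two files.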
Hint: run
spades.py -h
for the command options.
Solution
$ spades.py -h
With --pe1-1 for read 1 and --pe1-2 for read 2 we specify the paired-end library. With --mp1-1 for read 1 and --mp1-2 for read 2 we specify the mate-pair library.
With --mp1-rf we tell SPAdes the read orientation of the mate-pair library: <- (reverse), -> (forward).
The full command for the two libraries:
$ spades.py \
    --pe1-1 ~/asm_workshop/data/trimmed_fastq/PE_600bp_50x_1.trim.fastq.gz \
    --pe1-2 ~/asm_workshop/data/trimmed_fastq/PE_600bp_50x_2.trim.fastq.gz \
    --mp1-1 ~/asm_workshop/data/mp/MP_2.5kb_25x_1.fastq.gz \
    --mp1-2 ~/asm_workshop/data/mp/MP_2.5kb_25x_2.fastq.gz \
    --mp1-rf \
    -o ~/asm_workshop/results/ecoli_pe_mp
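A mistyped path is an easy mistake with four input files, so it can help to check them before starting the assembly. The helper below is a minimal sketch; `check_inputs` is a hypothetical name and not part of SPAdes:

```shell
# Return 0 if every file given as an argument exists and is non-empty;
# otherwise report the offending files on stderr and return 1.
# (check_inputs is a hypothetical helper, not a SPAdes command.)
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call with the four read files from this lesson
# (uncomment on the workshop server):
# check_inputs \
#     ~/asm_workshop/data/trimmed_fastq/PE_600bp_50x_1.trim.fastq.gz \
#     ~/asm_workshop/data/trimmed_fastq/PE_600bp_50x_2.trim.fastq.gz \
#     ~/asm_workshop/data/mp/MP_2.5kb_25x_1.fastq.gz \
#     ~/asm_workshop/data/mp/MP_2.5kb_25x_2.fastq.gz \
#     && echo "all inputs present"
```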
QUAST: compare PE and PE-MP assemblies
Exercise
Compare the PE-MP assembly with the assembly where we used only the paired-end library by using QUAST.
What are the major differences between the two assemblies?
Use quast_pe_mp as output folder.
Solution
$ quast.py \
    ~/asm_workshop/results/ecoli_pe/contigs.fasta \
    ~/asm_workshop/results/ecoli_pe_mp/contigs.fasta \
    -o ~/asm_workshop/results/quast_pe_mp
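Besides report.html, QUAST also writes a tab-separated report.tsv to the output folder, which is handy for a quick look on the server. The sketch below assumes that file is present and that the first column holds metric labels such as "# contigs" and "N50"; `quast_rows` is a hypothetical helper name:

```shell
# Print the header row plus selected metric rows from a QUAST report.tsv.
# $1: path to report.tsv; remaining arguments: metric labels (first column).
# (quast_rows is a hypothetical helper, not part of QUAST.)
quast_rows() {
    report=$1
    shift
    # The header row names the assemblies, one column per assembly.
    head -n 1 "$report"
    for metric in "$@"; do
        # Print the row whose first tab-separated field equals the label.
        awk -F '\t' -v m="$metric" '$1 == m' "$report"
    done
}

# Example call (uncomment on the workshop server):
# quast_rows ~/asm_workshop/results/quast_pe_mp/report.tsv \
#     "# contigs" "Largest contig" "Total length" "N50"
```

Each printed row then shows the PE and the PE-MP value side by side.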
On your local computer, open a new terminal tab and run:
$ scp YOUR-NETID@student-linux.tudelft.nl:~/asm_workshop/results/quast_pe_mp/report.html ~/Desktop/quast/report_pe_mp.html
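scp does not create missing destination folders, so the download fails if ~/Desktop/quast does not exist yet. A minimal sketch that creates the local folders used by the scp commands in this lesson (`make_download_dirs` is a hypothetical helper; the base folder defaults to your Desktop):

```shell
# Create the local download folders used in this lesson.
# $1 (optional): base folder, defaults to $HOME/Desktop.
# (make_download_dirs is a hypothetical helper name.)
make_download_dirs() {
    base=${1:-"$HOME/Desktop"}
    for d in quast bandage mummer; do
        # -p: create parents as needed and do not fail if the folder exists.
        mkdir -p "$base/$d"
    done
}

# Example call on your local computer (uncomment to run):
# make_download_dirs
```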
Visualise the assembly graphs with Bandage (Optional)
Exercise
Download the assembly graph of the PE-MP assembly and compare it with the assembly graph of the PE assembly.
Solution
On your local computer, open a new terminal tab and run:
$ scp YOUR-NETID@student-linux.tudelft.nl:~/asm_workshop/results/ecoli_pe_mp/assembly_graph.fastg \
    ~/Desktop/bandage/assembly_graph_pe_mp.fastg
Start Bandage and load the file assembly_graph_pe_mp.fastg.
Click on “Draw graph” and save as image (current view).
Compare with the PE assembly graph.
Filter PE-MP assembly
Exercise
Filter out the smaller fragments from the PE-MP assembly by applying filterFasta_500bp.py, as we did with the PE assembly. This time we have to use the file scaffolds.fasta, since the mate-pair library was used for scaffolding. Use scaffolds_500bp.fasta as output filename.
After filtering, apply assemblyStats.py to the filtered scaffold file.
Have we assembled the complete genome of E. coli K12 substr. MG1655? And how many scaffolds do we have?
Solution
$ filterFasta_500bp.py \
    -i ~/asm_workshop/results/ecoli_pe_mp/scaffolds.fasta \
    -o ~/asm_workshop/results/ecoli_pe_mp/scaffolds_500bp.fasta
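The internals of filterFasta_500bp.py are not shown in this lesson. To make the idea of length filtering concrete, here is an independent awk sketch that keeps FASTA records of at least a given length. It assumes the cut-off is inclusive (500 bp and longer are kept) and writes each kept sequence on a single line; `filter_fasta_min_len` is a hypothetical name:

```shell
# Keep only FASTA records whose sequence is at least min_len bases long.
# $1: input FASTA, $2: minimum length (for example 500). Output goes to stdout.
# (Independent sketch; not the workshop's filterFasta_500bp.py.)
filter_fasta_min_len() {
    awk -v min_len="$2" '
        # Print the record collected so far if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min_len) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }  # new record starts
        { seq = seq $0 }                               # join wrapped sequence lines
        END { emit() }                                 # do not forget the last record
    ' "$1"
}

# Example call (uncomment on the workshop server):
# filter_fasta_min_len ~/asm_workshop/results/ecoli_pe_mp/scaffolds.fasta 500 | grep -c '^>'
```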
Inspect the assembly statistics:
$ assemblyStats.py ~/asm_workshop/results/ecoli_pe_mp/scaffolds_500bp.fasta
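To judge whether the genome is complete, it helps to put the total scaffold length next to the length of the reference. The sketch below is independent of assemblyStats.py (whose output format is not shown here) and computes the number of sequences, the total length and the N50 of any FASTA file; the function names are hypothetical:

```shell
# Print the length of every sequence in a FASTA file, one per line.
fasta_lengths() {
    awk '
        /^>/ { if (seen) print len; seen = 1; len = 0; next }
        { len += length($0) }
        END { if (seen) print len }
    ' "$1"
}

# Print sequence count, total length and N50 (tab-separated) for a FASTA file.
# N50: the length of the sequence at which the running sum over the
# lengths, sorted from longest to shortest, first reaches half the total.
fasta_stats() {
    fasta_lengths "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running >= half) { n50 = lens[i]; break }
            }
            printf "sequences\t%d\ntotal_length\t%d\nN50\t%d\n", NR, total, n50
        }
    '
}

# Example calls (uncomment on the workshop server):
# fasta_stats ~/asm_workshop/results/ecoli_pe_mp/scaffolds_500bp.fasta
# fasta_stats ~/asm_workshop/reference/Ecoli_K12_reference.fasta
```

If the two total_length values are close and the number of scaffolds is small, the assembly covers most of the reference; the alignment in the next section shows whether the scaffolds are also arranged correctly.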
Scaffold alignment
Exercise
Align the filtered scaffolds to the reference (~/asm_workshop/reference/Ecoli_K12_reference.fasta). Use ecoli_pe_mp as a prefix.
Make sure you are working in the mummer folder:
~/asm_workshop/results/mummer
Inspect the resulting ecoli_pe_mp.png plot. Has the mate-pair library improved the assembly?
Solution
Use as working directory:
~/asm_workshop/results/mummer
$ cd ~/asm_workshop/results/mummer
Run nucmer:
$ nucmer --prefix ecoli_pe_mp \
    ~/asm_workshop/reference/Ecoli_K12_reference.fasta \
    ~/asm_workshop/results/ecoli_pe_mp/scaffolds_500bp.fasta
nucmer has aligned all scaffolds to the reference.
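You can check this claim yourself from the ecoli_pe_mp.delta file. In the delta format, every block of alignments starts with a header line of the form `>reference_id query_id reference_length query_length`. The sketch below counts the distinct query IDs on those header lines; `delta_aligned_queries` is a hypothetical helper name:

```shell
# Count the distinct query sequences that have at least one alignment
# recorded in a nucmer .delta file.
# Header lines look like: >ref_id query_id ref_len query_len
# (delta_aligned_queries is a hypothetical helper, not part of MUMmer.)
delta_aligned_queries() {
    awk '
        /^>/ { seen[$2] = 1 }                 # second field is the query ID
        END  { n = 0; for (q in seen) n++; print n }
    ' "$1"
}

# Example calls (uncomment on the workshop server); the two numbers
# should agree if every scaffold was aligned:
# delta_aligned_queries ~/asm_workshop/results/mummer/ecoli_pe_mp.delta
# grep -c '^>' ~/asm_workshop/results/ecoli_pe_mp/scaffolds_500bp.fasta
```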
Use mummerplot to plot the alignments:
$ mummerplot --png --layout --filter --prefix ecoli_pe_mp \
    ~/asm_workshop/results/mummer/ecoli_pe_mp.delta \
    -R ~/asm_workshop/reference/Ecoli_K12_reference.fasta \
    -Q ~/asm_workshop/results/ecoli_pe_mp/scaffolds_500bp.fasta
A plot file ‘ecoli_pe_mp.png’ has been created. Download it to your local computer and inspect it.
On your local computer, open a new terminal tab and run:
$ scp YOUR-NETID@student-linux.tudelft.nl:~/asm_workshop/results/mummer/ecoli_pe_mp.png ~/Desktop/mummer/
Key Points