de novo Short Read Paired-End and Mate-Paired Assembly
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How to do a de novo short read paired-end genome assembly using mate-pairs?
Objectives
Assembly by using multiple libraries with different insert sizes.
Being able to explain the terms ‘scaffolding’ and ‘scaffolds’.
De novo assembly (paired-end and mate-pair libraries)
In most assembly projects multiple libraries with different insert sizes are used.
Here we will add a 2.5 kb mate-pair library.
Because of the library preparation, the read orientation of these libraries differs: PE: → ← , MP: ← → OR → ←
SPAdes PE and MP assembly
Exercise
Assemble the trimmed paired-end library together with a mate-pair library, located at
~/asm_workshop/data/untrimmed_fastq/MP_2.5kb_?.fastq.gz
with SPAdes. Use spades_pe_mp as output directory.
We have to specify the paired-end (PE) and the mate-pair (MP) library in SPAdes with the --pex-x and --mpx-x flags, where the first x is the library number and the second x is the read number (1 or 2). We also have to provide the read orientation for the mate-pair library.
Hint: run spades.py -h for the command options.
Solution
$ spades.py -h
With --pe1-1 for read 1 and --pe1-2 for read 2 we specify the paired-end library.
With --mp1-1 for read 1 and --mp1-2 for read 2 we specify the mate-pair library.
With --mp1-rf we inform SPAdes of the mate-pair read orientation: <- (reverse), -> (forward).
The full command for the two libraries:
$ spades.py \
    --pe1-1 ~/asm_workshop/data/trimmed_fastq/PE_600bp_1.trim.fastq.gz \
    --pe1-2 ~/asm_workshop/data/trimmed_fastq/PE_600bp_2.trim.fastq.gz \
    --mp1-1 ~/asm_workshop/data/untrimmed_fastq/MP_2.5kb_1.fastq.gz \
    --mp1-2 ~/asm_workshop/data/untrimmed_fastq/MP_2.5kb_2.fastq.gz \
    --mp1-rf \
    -o ~/asm_workshop/results/spades_pe_mp
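Because an assembly run can take a while, it may be worth checking that all four read files exist before starting SPAdes. The following is a minimal sketch only; check_reads is a hypothetical helper name and is not part of SPAdes.

```shell
# Hypothetical helper (not part of SPAdes): report every read file
# that is missing or empty, and return non-zero if any was found.
check_reads() {
    _cr_missing=0
    for _cr_file in "$@"; do
        # -s is true only when the file exists and has a size above zero
        if [ ! -s "$_cr_file" ]; then
            echo "missing or empty: $_cr_file" >&2
            _cr_missing=1
        fi
    done
    return "$_cr_missing"
}
```

It could be called with the same paths as in the command above, for example: check_reads ~/asm_workshop/data/trimmed_fastq/PE_600bp_?.trim.fastq.gz ~/asm_workshop/data/untrimmed_fastq/MP_2.5kb_?.fastq.gz && echo "all inputs present"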
QUAST: compare PE and PE-MP assemblies
Exercise
Compare the PE-MP assembly with the assembly where we used only the paired-end library by using QUAST.
What are the major differences between the two assemblies?
Use quast_pe_mp as output folder.
Solution
$ quast.py \
    ~/asm_workshop/results/spades_pe/contigs.fasta \
    ~/asm_workshop/results/spades_pe/scaffolds.fasta \
    ~/asm_workshop/results/spades_pe_mp/contigs.fasta \
    ~/asm_workshop/results/spades_pe_mp/scaffolds.fasta \
    -R ~/asm_workshop/reference/Ecoli_K12_reference.fasta \
    -o ~/asm_workshop/results/quast_pe_mp
Open the generated report.txt file:
$ less ~/asm_workshop/results/quast_pe_mp/report.txt
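The full report is long, so for a quick side-by-side comparison it can help to pull out only a few rows. The sketch below assumes that the row labels in report.txt start with the metric names listed in the pattern, which may differ between QUAST versions; quast_summary is a hypothetical helper name.

```shell
# Hypothetical helper: print only the rows of a QUAST report.txt whose
# label starts with one of the listed metric names. The labels are an
# assumption and may need adjusting for your QUAST version.
quast_summary() {
    grep -E '^(Assembly|# contigs|Largest contig|Total length|N50|# misassemblies|Genome fraction)' "$1"
}
```

For example: quast_summary ~/asm_workshop/results/quast_pe_mp/report.txt. Rows such as "# contigs (>= 0 bp)" also start with a listed name, so they are printed as well.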
Scaffold alignment
Exercise
Align the filtered scaffolds to the reference (~/asm_workshop/reference/Ecoli_K12_reference.fasta). Use spades_pe_mp as prefix.
Make sure you are working in the mummer folder: ~/asm_workshop/results/mummer
Inspect the resulting spades_pe_mp.png plot. Has the mate-pair library improved the assembly?
Solution
Use as working directory: ~/asm_workshop/results/mummer
$ cd ~/asm_workshop/results/mummer
Run nucmer:
$ nucmer --prefix spades_pe_mp \
    ~/asm_workshop/reference/Ecoli_K12_reference.fasta \
    ~/asm_workshop/results/spades_pe_mp/scaffolds.fasta
nucmer has aligned all scaffolds to the reference.
Use mummerplot to plot the alignments:
$ mummerplot --png --layout --filter --prefix spades_pe_mp \
    ~/asm_workshop/results/mummer/spades_pe_mp.delta \
    -R ~/asm_workshop/reference/Ecoli_K12_reference.fasta \
    -Q ~/asm_workshop/results/spades_pe_mp/scaffolds.fasta
A plot file ‘spades_pe_mp.png’ has been created. Download the file to your local computer and inspect it.
In a new tab (local computer) in your terminal do:
$ scp YOUR-NETID@vm0X-bt-edu.tnw.tudelft.nl:~/asm_workshop/results/mummer/spades_pe_mp.png ~/Desktop/mummer/
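Besides the plot, the effect of scaffolding can be seen in the sequences themselves: contigs joined into a scaffold are typically separated by a run of N characters. The sketch below counts sequences and N runs in a FASTA file. count_gaps is a hypothetical helper name, and it assumes gaps are written as N or n.

```shell
# Hypothetical helper: count the sequences in a FASTA file and the
# number of gap runs (consecutive N or n characters) they contain.
count_gaps() {
    awk '
        # Header line: count the record and tally gaps of the previous one
        /^>/ { nseq++; if (seq != "") gaps += gsub(/[Nn]+/, "", seq); seq = ""; next }
        # Sequence line: join wrapped lines so a run spanning a line break counts once
        { seq = seq $0 }
        END {
            if (seq != "") gaps += gsub(/[Nn]+/, "", seq)
            printf "sequences=%d gap_runs=%d\n", nseq, gaps
        }
    ' "$1"
}
```

Running count_gaps on ~/asm_workshop/results/spades_pe_mp/scaffolds.fasta and on the matching contigs.fasta gives a rough idea of how many joins the mate-pair library contributed. The contigs file would be expected to contain few or no gap runs.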
Visualise the assembly graphs with Bandage (Optional)
Exercise
Download the assembly graph of the PE-MP assembly and compare it with the assembly graph of the PE assembly.
Solution
In a new tab (local computer) in your terminal do:
$ scp YOUR-NETID@vm0X-bt-edu.tnw.tudelft.nl:~/asm_workshop/results/spades_pe_mp/assembly_graph.fastg \
    ~/Desktop/bandage/spades_pe_mp_graph.fastg
Start Bandage and load the file spades_pe_mp_graph.fastg.
Click on “Draw graph” and save as image (current view).
Compare with the PE assembly graph.
Key Points