Polishing

Welcome back to this Metagenomics with High Performance Computing course!

In previous lessons we had an introduction to the command line and completed the first part of our workflow: we assessed the quality of long and short reads and used the long reads to generate a draft assembly.

Long read data are well suited to assembly because they produce a less fragmented assembly and are more likely to span repetitive regions. However, they are also more likely than short read data to contain sequencing errors.

We must therefore use further tools to improve the quality of our draft assembly. We can “polish” our assembly using both long and short read data. After that, we can perform quality control (QC) checks to see what impact the polishing has had.

By the end of this lesson you will be able to:

Schedule

00:00  1. Polishing an assembly
         Why do assemblies need to be polished?
         What are the different purposes of polishing with short and long reads?
         What software can we use to do long and short read polishing?
00:40  2. QC polished assembly
         Why would we quality control (QC) an assembly?
         How can we perform QC on an assembly?
         What metrics can we compare between assemblies to understand the quality of an assembly?
01:50  Finish

The actual schedule may vary slightly depending on topics and exercises chosen by the instructor.