QC & Assembly

A metagenome is the collection of genomic sequences from the various (micro)organisms coexisting in a sample. It is a snapshot that tells us about the taxonomic, metabolic or functional composition of the community we study.

In this lesson we will define metagenomics and consider the challenges this type of analysis can present. We will also outline a workflow for metagenomic analysis.

We will then log in to our cloud instance for the first time and take a look at some data. Then we'll work through the first two steps of our workflow: quality control and metagenome assembly.

By the end of this lesson you will be able to:

explain what metagenomics is and the challenges it presents
interpret a FastQC plot summarizing per-base quality across all reads
interpret the NanoPlot output summarizing a Nanopore sequencing run
filter Nanopore reads based on quality using the command line tool SeqKit
run a metagenomic assembly workflow
assess the quality of an assembly using SeqKit

IMPORTANT

You should be aware that some of the analyses in this lesson can take several hours to run, so these will be completed outside of the taught lesson. Your course instructors will give you guidance about this.

Schedule

00:00	1. Introduction to Metagenomics	What is metagenomics? When should we use metagenomics? What does a metagenomics project look like?
00:35	2. Logging onto the Cloud	How do I connect to an AWS instance?
01:20	3. Assessing Read Quality, Trimming and Filtering	How can I describe the quality of my data? How can we get rid of sequence data that doesn’t meet our quality standards? How do these methods differ when looking at Nanopore data?
02:50	4. Metagenome Assembly	Why do raw reads need to be assembled? How does metagenomic assembly differ from genomic assembly? How can we assemble a metagenome?
04:10	Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.