Extracting a Community Profile

In the previous episode we filtered our samples to separate rRNA and mRNA sequences. For our taxonomic analysis, however, we shall be using our unfiltered data.

We shall be using a tool called MetaPhlAn, which aligns sequences against a database of clade-specific marker genes drawn from reference genomes in order to estimate which taxon each sequence originates from. Because rRNA is often used as a taxonomic marker, we want to keep it in the input, which is why we use the unfiltered data here. The filtered data will be used instead for our functional analysis in the next lesson.

Taxonomic analysis methods

Often a metatranscriptomic analysis will also include data on the community structure of the microbiome being analysed. This data can be obtained in a few different ways:

(For more on the different methodologies, see the introduction to lesson 3.)

What is MetaPhlAn?

MetaPhlAn (Metagenomic Phylogenetic Analysis) is one tool from the bioBakery suite of software, all of which have bakery-related names. Whilst this is a fun naming convention, it can make it hard to tell from the name alone what each tool does.

Code

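# Make sure the output directory exists before writing the log into it
mkdir -p analysis/metaphlan

# Profile the unfiltered, merged reads. The trailing & runs the job in the
# background, and &> sends both stdout and stderr to log.txt.
metaphlan data/trimmedreads/SRR6820491_merged.fastq \
  --input_type fastq \
  --bowtie2db data/chocophlandb/ \
  --bowtie2out analysis/metaphlan/metaphlan_bowtie2.bz2 \
  --nproc 8 \
  --add_viruses \
  --tax_lev 's' \
  --min_cu_len 2000 \
  --read_min_len 70 \
  --stat_q 0.1 \
  --output_file analysis/metaphlan/metaphlan_output.txt \
  --min_ab 0.1 \
  &> analysis/metaphlan/log.txt &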
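
Because the command ends with `&`, it runs in the background and everything it prints goes to `log.txt` rather than to the terminal. The following is a minimal sketch of how you might check on the run, assuming you are still in the same shell session and using the paths from the command above. The `check_run` helper is a made-up name for illustration, not part of MetaPhlAn, and a non-empty output file is only a rough sign that the run has finished.

```shell
# Hypothetical helper: show the end of a log and say whether an output file exists yet.
# $1 = log file, $2 = expected output file
check_run() {
  if [ -f "$1" ]; then
    tail -n 5 "$1"          # last few lines written to the log
  fi
  if [ -s "$2" ]; then      # -s: file exists and is not empty
    echo "output present: $2"
  else
    echo "output not written yet: $2"
  fi
}

# List background jobs started from this shell; the list is empty once the job has exited.
jobs

check_run analysis/metaphlan/log.txt analysis/metaphlan/metaphlan_output.txt
```

If the log ends with an error message, fix the cause before starting the command again.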

## Note: if you want to re-run MetaPhlAn, you will first have to delete the outputs of the previous run.
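
A minimal sketch of that clean-up step, assuming the output paths used in the command above. MetaPhlAn will typically stop with an error if the file given to `--bowtie2out` already exists, so that is the file that matters most here.

```shell
# Remove the outputs of a previous run.
# -f keeps rm quiet (and successful) if a file is already gone.
rm -f analysis/metaphlan/metaphlan_bowtie2.bz2 \
      analysis/metaphlan/metaphlan_output.txt
```

The log file does not need deleting, since `&>` overwrites it on the next run.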