Introduction to Metatranscriptomics

What is the difference between Genomics, Metagenomics, and Metatranscriptomics?

Microbiomes play key roles in host health, disease, and environmental processes. Modern sequencing technologies now allow us to study these communities at multiple molecular levels: DNA (metagenomics), RNA (metatranscriptomics), proteins (metaproteomics), and metabolites (metabolomics).

In genomics, we sequence and analyse the genome of a single species. We often have a known reference genome to which we can align all our reads.

In metagenomics, we sequence DNA from samples composed of many genomes. These might be environmental samples from soil or anaerobic digestors, or samples from the skin or digestive tracts of animals. The goal is to understand which organisms are present and what functional potential they encode.

In metatranscriptomics, we sequence RNA from mixed communities. Instead of asking “What genes are present?”, we ask:

  • Which genes are actively being expressed?

  • How are organisms responding to environmental conditions?

Metatranscriptomics provides a snapshot of the active functional state of a microbial community at a specific point in time.

Metatranscriptomics

A metatranscriptome is the complete collection of RNA transcripts (usually mRNA, but also rRNA, tRNA and other RNAs) from all organisms in a given community at a given time.

While metagenomics tells us about functional capacity, metatranscriptomics tells us about functional activity.

Because RNA is unstable and expression levels change rapidly, metatranscriptomes are highly dynamic and sensitive to environmental conditions.

Analysing metatranscriptomes presents several key challenges:

  • RNA Stability: RNA degrades quickly, so careful sampling, storage and extraction are critical.
  • rRNA Dominance: Ribosomal RNA (rRNA) typically makes up 80–90% of total RNA. It usually needs to be depleted before sequencing so that the reads are not dominated by rRNA and the analysis can focus on mRNA.
  • Expression Variability: Gene expression levels vary across species, environmental conditions, time, and even individual cells.
  • Taxonomic Assignment: Assigning transcripts to the correct organism can be difficult, especially when reference genomes are unavailable.
  • Quantification: We must distinguish between high abundance because the organism is abundant and high abundance because the gene is highly expressed.
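The quantification point can be illustrated with a small sketch. One common idea, when matching metagenomic data are available, is to divide a gene's relative RNA abundance by its relative DNA abundance. The gene names and counts below are invented for illustration, and this is not necessarily the approach used later in the course.

```python
# Hypothetical read counts for three genes from the same sample.
# dna_counts: metagenome reads (how abundant the gene copies are)
# rna_counts: metatranscriptome reads (abundance AND expression combined)
dna_counts = {"geneA": 900, "geneB": 90, "geneC": 10}
rna_counts = {"geneA": 900, "geneB": 50, "geneC": 50}


def relative_abundance(counts):
    """Convert raw counts to fractions of the total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def expression_ratio(rna, dna):
    """RNA fraction divided by DNA fraction for each gene."""
    rna_frac = relative_abundance(rna)
    dna_frac = relative_abundance(dna)
    return {gene: rna_frac[gene] / dna_frac[gene] for gene in rna}


ratios = expression_ratio(rna_counts, dna_counts)
for gene, ratio in ratios.items():
    print(f"{gene}: RNA/DNA ratio = {ratio:.2f}")
```

In this toy example geneA has by far the most RNA reads, but its ratio is about 1, so its high read count is explained by abundance alone. geneC has few RNA reads but a ratio of about 5, suggesting it is highly expressed relative to how rare it is.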

A typical metatranscriptomic workflow is designed to answer two questions:

  1. Which organisms are metabolically active in the sample?
  2. Which genes and pathways are being actively expressed?

Metatranscriptomics therefore shows what a microbial community is doing, not just what it could do. It captures dynamic responses to environmental changes or treatments and, when combined with taxonomic profiling, can link specific taxa to the metabolic pathways they are actively using. It can also highlight rare organisms that are low in abundance but transcriptionally active, and which may be overlooked in DNA-based analyses.

Metatranscriptome sequencing approaches

In metatranscriptomics, RNA is extracted from the microbial community and converted into complementary DNA (cDNA) for sequencing. This approach captures:

  • mRNA: coding transcripts, revealing active metabolic pathways.

  • rRNA and tRNA: non-coding RNAs involved in translation; rRNA is usually depleted because it dominates total RNA.

Metatranscriptomic sequencing begins with careful sample collection, as RNA is far more labile than DNA and degrades rapidly if not preserved correctly.

Samples are therefore typically treated immediately with RNA stabilisation reagents or flash frozen to maintain transcript integrity.

The next step is RNA extraction, which must be performed under conditions that minimise degradation and contamination; in some cases, RNA and DNA are co-extracted to allow comparison between gene expression and genomic potential.

Once total RNA has been isolated, ribosomal RNA (rRNA), which usually constitutes 80–90% of the total RNA pool, is depleted to enrich for messenger RNA (mRNA), the fraction that reflects protein-coding gene expression.

The enriched RNA is then reverse transcribed into complementary DNA (cDNA), as most sequencing platforms require DNA rather than RNA as input.

Finally, the cDNA undergoes library preparation before sequencing, which can be performed using either short-read or long-read technologies depending on the research question and desired resolution.

Technical challenges

Metatranscriptomics comes with a number of technical and analytical challenges.

As mentioned above, RNA is inherently unstable, so careful sample handling and storage are essential, as degradation can quickly affect data quality. Bias can also be introduced during extraction, since some microbes are more difficult to lyse than others, which may skew the representation of transcripts recovered. Although rRNA depletion is routinely performed, it is rarely completely efficient, and residual rRNA can still dominate the sequencing output. Because mRNA makes up only a small proportion of total RNA, deeper sequencing is often required to capture sufficient coding transcripts.
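To see why residual rRNA and sequencing depth are linked, here is a rough back-of-the-envelope sketch. The 90% starting fraction is taken from the range quoted earlier; the depletion efficiencies and the target of 10 million mRNA reads are illustrative assumptions, and the calculation simplifies by treating all non-rRNA as mRNA.

```python
def mrna_fraction_after_depletion(rrna_fraction, depletion_efficiency):
    """Fraction of the remaining RNA pool that is not rRNA.

    rrna_fraction: share of total RNA that is rRNA before depletion (0-1)
    depletion_efficiency: share of rRNA molecules removed (0-1)
    Simplification: everything that is not rRNA is treated as mRNA.
    """
    rrna_left = rrna_fraction * (1 - depletion_efficiency)
    other = 1 - rrna_fraction
    return other / (other + rrna_left)


def reads_needed(target_mrna_reads, mrna_fraction):
    """Total reads to sequence to expect the target number of mRNA reads."""
    return target_mrna_reads / mrna_fraction


target = 10_000_000  # assumed target, for illustration only
for efficiency in (0.0, 0.90, 0.99):
    frac = mrna_fraction_after_depletion(0.90, efficiency)
    total = reads_needed(target, frac)
    print(f"depletion {efficiency:.0%}: mRNA fraction {frac:.1%}, "
          f"total reads needed {total:,.0f}")
```

Under these assumptions, only about a tenth of reads are informative without depletion, and even a 90% efficient depletion still leaves roughly half of the reads as rRNA.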

The downstream analysis is also computationally demanding, requiring specialised workflows for transcript quantification and normalisation. As with other sequencing approaches, functional interpretation relies on reference databases, which are incomplete for many microbial taxa, meaning that poorly characterised organisms may remain difficult to interpret.

Our data

This course uses data from the Galaxy Training Network tutorial on metatranscriptomics, which draws on a dataset hosted on Zenodo (Kunath et al., 2018, ISME J). The dataset comes from a time-series analysis of a microbial community inside a bioreactor. For this course, we focus on a single time point (the first) and one biological replicate (replicate A). The data consist of paired-end RNA-Seq reads in FASTQ format, representing the genes being expressed by the microbial community.

The samples originate from a cellulose-degrading microbial consortium (SEM1b) enriched from a thermophilic biogas reactor in Norway. The consortium is co-dominated by Clostridium thermocellum, a cellulolytic bacterium, and multiple strains of Coprothermobacter proteolyticus, which together degrade plant biomass under anaerobic, high-temperature conditions. This enrichment was performed to study the active expression of carbohydrate-degrading genes within the community.
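For orientation, each record in a FASTQ file spans four lines: an identifier starting with `@`, the sequence, a `+` separator, and a quality string. The sketch below parses such records and computes a mean quality score per read. The two records are invented, the parser assumes strict four-line records, and Phred+33 quality encoding is assumed (common for current Illumina data, but worth checking for any given dataset).

```python
import io

# Two invented records standing in for a real FASTQ file.
EXAMPLE_FASTQ = """@read1/1
ACGTACGT
+
IIIIIIII
@read2/1
TTGCA
+
!!5II
"""


def read_fastq(handle):
    """Yield (identifier, sequence, quality) tuples from a FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


records = list(read_fastq(io.StringIO(EXAMPLE_FASTQ)))
for name, seq, qual in records:
    print(name, len(seq), round(mean_phred(qual), 1))
```

With real data you would pass an open file handle instead of the `io.StringIO` wrapper. In practice, dedicated quality control tools do this job far more thoroughly.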

Bioinformatic workflows

When working with high-throughput RNA-sequencing (RNA-seq) data, the raw reads obtained from the sequencer need to pass through several tools to generate meaningful biological insights.

The use of a defined set of tools in a specific order is commonly referred to as a workflow or pipeline.

Here is an example of the workflow we will follow for metatranscriptomics analysis, with a brief description of each step:

  • Sequence reads – obtaining raw RNA-seq reads from a sample via sequencing.

  • Quality control – assessing read quality, trimming adapters, and filtering low-quality or unwanted sequences (e.g., rRNA).

  • Taxonomic profiling – estimating the microbial composition of the community from expressed transcripts.

  • Functional profiling – identifying expressed genes, gene families, and metabolic pathways to understand the active functions of the microbiome.

  • Visualisation and interpretation – generating plots, tables, and interactive charts to explore taxonomic and functional patterns.

Workflows in bioinformatics often adopt a modular approach, allowing the output of one tool to serve as the input for the next. Standard data formats such as FASTA and FASTQ make this possible, and many tools assume data are provided in these formats.
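The modular idea can be sketched in a few lines of Python. The step functions here are hypothetical stand-ins, not the real tools used in this course; the point is only that each step takes the previous step's output as its input.

```python
from functools import reduce

# Hypothetical stand-ins for real tools: each takes a list of reads
# (plain strings here) and returns a new list for the next step.


def quality_filter(reads, min_length=5):
    """Keep reads at least min_length bases long."""
    return [r for r in reads if len(r) >= min_length]


def remove_ambiguous(reads):
    """Drop reads containing the ambiguous base N."""
    return [r for r in reads if "N" not in r]


def deduplicate(reads):
    """Remove exact duplicates while preserving order."""
    return list(dict.fromkeys(reads))


def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda current, step: step(current), steps, data)


raw_reads = ["ACGTACGT", "ACG", "ACGTNACG", "ACGTACGT", "TTGCATGC"]
result = run_pipeline(raw_reads, [quality_filter, remove_ambiguous, deduplicate])
print(result)
```

Because every step accepts and returns the same kind of data, steps can be reordered, swapped, or removed without rewriting the rest. Shared file formats such as FASTQ play the same role between real command-line tools.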

A more detailed version of the workflow, including all steps and program names used throughout this tutorial, can be found in Extras → Workflow Reference.

Next steps

Hopefully you now feel ready to start following our workflow to analyse our data. We’ll be guiding you through the steps and giving more context for each one as we go along. Let’s go!


Licensed under CC-BY 4.0 2021–24 by Cloud-SPAN
