Data

The Data

This course uses data from the Galaxy Training Network tutorial on metatranscriptomics, which draws on a dataset hosted on Zenodo (Kunath et al., 2018, ISME J). The dataset comes from a time-series analysis of a microbial community inside a bioreactor. For this course, we focus on a single time point (the first) and one biological replicate (replicate A). The data consist of paired-end RNA-Seq reads in FASTQ format, representing the genes actively expressed by the microbial community.
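To make the paired-end FASTQ layout concrete, here is a minimal sketch in Python. It assumes the common four-line record layout (header, sequence, separator, qualities) and assumes that mates carry the same read name with an optional "/1" or "/2" suffix; neither the function names nor the example records come from the course dataset, they are hypothetical and only illustrate the format.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        raw = handle.readline()
        if not raw:  # end of file
            return
        header = raw.rstrip("\n")
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Keep only the read name, dropping any description after whitespace.
        yield header[1:].split()[0], seq, qual


def base_id(read_id):
    """Strip an optional /1 or /2 mate suffix (assumed naming convention)."""
    return read_id[:-2] if read_id.endswith(("/1", "/2")) else read_id


def count_pairs(forward, reverse):
    """Count read pairs, checking that both files list mates in the same order."""
    n = 0
    for fwd, rev in zip_longest(read_fastq(forward), read_fastq(reverse)):
        if fwd is None or rev is None:
            raise ValueError("forward and reverse files differ in read count")
        if base_id(fwd[0]) != base_id(rev[0]):
            raise ValueError(f"mate mismatch: {fwd[0]} vs {rev[0]}")
        n += 1
    return n


# Hypothetical two-read example; real files would be opened with open() or gzip.open().
FORWARD = "@read1/1\nACGTACGT\n+\nIIIIIIII\n@read2/1\nGGCATTAC\n+\nIIIIHHHH\n"
REVERSE = "@read1/2\nTTGACCGA\n+\nIIIIIIII\n@read2/2\nCCGTAAGT\n+\nHHHHIIII\n"

print(count_pairs(io.StringIO(FORWARD), io.StringIO(REVERSE)))
```

A check like this is useful before downstream processing, because most paired-end tools expect the forward and reverse files to contain the same reads in the same order.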

The samples originate from a cellulose-degrading microbial consortium (SEM1b) enriched from a thermophilic biogas reactor in Norway. The consortium is co-dominated by Clostridium thermocellum, a cellulolytic bacterium, and multiple strains of Coprothermobacter proteolyticus, which together degrade plant biomass under anaerobic, high-temperature conditions. The consortium was enriched to study the active expression of carbohydrate-degrading genes within the community.