Cloud-SPAN Genomics Course

Cloud-SPAN is a collaboration between the Department of Biology at the University of York and The Software Sustainability Institute, funded by a UKRI Innovation Scholars award. It aims to train researchers to generate and analyse a range of ’omics data effectively using cloud computing resources.

We have found that people taking the Genomics module vary in how much experience they have of navigating file systems and using the command line. We have designed another module, Prenomics, to prepare those with less experience for Genomics. Our Self-assessment Quiz can help you decide whether you would benefit from taking Prenomics before the Genomics module. Prenomics assumes no prior experience and is designed for absolute beginners.

The Prenomics and Genomics modules are based on Data Carpentry’s Genomics Workshop.

Genomics teaches data management and analysis for genomics research, including: (1) best practices for the organisation of bioinformatics projects and data, (2) use of command-line utilities to connect to and use cloud computing and storage resources, (3) use of command-line tools for data preparation, and (4) use of command-line tools to analyse sequence quality and to perform and automate variant calling.

The module is designed either to be delivered as a tutor-led workshop over four half-days or to be used for self-study.

Getting Started

Take our short Self-assessment Quiz to help you decide whether you would benefit from attending Prenomics before the Genomics module.

This lesson uses a cloud instance created from an Amazon Machine Image (AMI). If you are attending a tutor-led workshop, your instance will be created for you and you will be sent the login information you need to use it. If you are self-studying Genomics, you will need to set up your own instance using our Create Your Own AWS Instance module before starting.

Genomics assumes some familiarity with the relevant biological concepts, including genomic variation, and some previous experience of using the command line.

To get started, follow the directions in the Setup tab to get access to the required software and data for this workshop.

This course uses data from a long-term evolution experiment published in 2016: Tempo and mode of genome evolution in a 50,000-generation experiment by Tenaillon O, Barrick JE, Ribeck N, Deatherage DE, Blanchard JL, Dasgupta A, Wu GC, Wielgoss S, Cruveiller S, Médigue C, Schneider D, and Lenski RE. (doi: 10.1038/nature18959)

All of the data used in this workshop is included in the AMI. However, you can also download it from Figshare. You can read about the data on the Data page.
Lesson Overview
Project management for cloud genomics: Learn how to structure your data and metadata, plan an NGS project, log on to a cloud instance and start navigating it using the command line.
Data preparation and organisation: Learn how to automate commonly used workflows, organise your file system for a new project, and use command-line tools to perform quality control.
Assessing read quality then trimming and filtering reads: Learn how to assess the quality of sequencing reads, then trim and filter out poor-quality data.
Finding sequence variants: Learn how to align reads to a reference genome, and identify and visualise between-sample variation.