Data preparation and organisation

In a previous lesson, you learned how to organise a sequencing project. You also learned about metadata best practices: what information you will need and how best to store it. Finally, you learned how to connect to a cloud instance via a command line interface.

As you work through this lesson, keep in mind that even if you won’t be using this exact workflow in your own research, you will be learning important lessons about using command-line bioinformatics tools. What you learn here will help you use a wide variety of bioinformatics tools with confidence and work more efficiently and productively in your research.

Getting Started

This lesson assumes no prior experience with the tools covered in the course. However, learners are expected to have some familiarity with biological concepts, including the concept of genomic variation within a population, as well as some basic experience using a command line interface to navigate file systems.

For a beginner-level overview of the command line, see the Cloud-SPAN Prenomics pages. If you are unsure whether your skills and experience are sufficient, why not try our self-assessment quiz to test your knowledge?

This lesson is part of a course that uses data hosted on a cloud instance created from an Amazon Machine Image (AMI). Course participants will be given information on how to log in to their instance during the course. Information on preparing for the course is provided on the Cloud-SPAN Genomics setup page.

Schedule

00:00  1. Writing Scripts and Working with Data
       How can we automate a commonly used set of commands?
00:40  2. Project organisation
       How can I organise my file system for a new bioinformatics project?
       How can I document my work?
01:10  3. Background and Metadata
       What data are we using?
       Why is this experiment important?
01:25  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.