Case Studies in Experimental Design: Glossary

Key Points

RNAseq example
  • Biological replication contributes far more to the statistical power of an experimental design than technical replication does. Technical replication is still better than no replication.

  • We need appropriate controls to test hypotheses, and we need to be aware of confounding variables in our designs.
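As a minimal sketch of both points, the code below assumes a hypothetical sample sheet stored as Python tuples (the sample IDs, mouse names, conditions and batches are invented for illustration). It counts biological replicates per condition, so technical replicates of the same animal are not mistaken for extra replicates. It also flags the case where every processing batch contains only one condition, which means condition and batch effects cannot be separated.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample_id, biological_replicate, condition, batch).
# Rows sharing a biological_replicate ID are technical replicates of one animal.
samples = [
    ("S1", "mouse1", "control", "batch1"),
    ("S2", "mouse1", "control", "batch1"),  # technical replicate of mouse1
    ("S3", "mouse2", "control", "batch1"),
    ("S4", "mouse3", "treated", "batch2"),
    ("S5", "mouse4", "treated", "batch2"),
    ("S6", "mouse4", "treated", "batch2"),  # technical replicate of mouse4
]


def biological_n(samples):
    """Count distinct biological replicates per condition; technical replicates count once."""
    reps = defaultdict(set)
    for _sample_id, bio_rep, condition, _batch in samples:
        reps[condition].add(bio_rep)
    return {condition: len(ids) for condition, ids in reps.items()}


def condition_confounded_with_batch(samples):
    """True if every batch holds a single condition, so condition and batch effects cannot be separated."""
    conditions_in_batch = defaultdict(set)
    for _sample_id, _bio_rep, condition, batch in samples:
        conditions_in_batch[batch].add(condition)
    return all(len(conds) == 1 for conds in conditions_in_batch.values())


print(biological_n(samples))                     # {'control': 2, 'treated': 2}
print(condition_confounded_with_batch(samples))  # True
```

Although six libraries were sequenced here, each condition has only two biological replicates. Because each condition was also processed in its own batch, any batch effect is indistinguishable from the treatment effect. Spreading both conditions across both batches would make the second check return False.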

AD metabolomics example
  • Because of how LC-MS (liquid chromatography-mass spectrometry) data for proteomics or metabolomics are generated, we need to consider how the sample run order can be arranged to reduce bias in the collected data.

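  • Randomised block designs are better suited to LC-MS data because each block contains every group, so instrument drift between calibrations affects all groups evenly instead of biasing the comparison. They also reduce the chance of an accidental, severe imbalance in run order.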

  • Multiple QC injections and blanks are run to condition the column before the first samples are injected, reducing carryover from previous experiments.
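The run-order ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed protocol: the function name `run_order`, the group labels and sample IDs, and the default numbers of blanks and QC injections are all hypothetical, and the sketch assumes every group has the same number of samples so that each block holds exactly one sample per group.

```python
import random


def run_order(groups, n_blanks=2, n_qc=3, seed=None):
    """Build a hypothetical LC-MS injection list: conditioning blanks and QCs,
    then study samples in randomised blocks of one sample per group."""
    sizes = {len(ids) for ids in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes every group has the same number of samples")
    rng = random.Random(seed)
    # Shuffle each group so that which samples share a block is random.
    shuffled = [rng.sample(ids, len(ids)) for ids in groups.values()]
    # Conditioning injections before the first study sample (counts are placeholders).
    order = ["blank"] * n_blanks + ["QC"] * n_qc
    for block in zip(*shuffled):
        block = list(block)
        rng.shuffle(block)  # randomise the order of the groups within this block
        order.extend(block)
    return order


groups = {"AD": ["AD1", "AD2", "AD3"], "control": ["C1", "C2", "C3"]}
print(run_order(groups, seed=1))
```

Every consecutive pair of study injections contains one sample from each group, so a gradual drift in instrument response is shared across groups rather than accumulating in one of them. Complete randomisation achieves this only on average and can occasionally place most of one group at the start or end of the run.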

E. coli long-term evolution experiment example
  • Most experiments are short compared with the LTEE (long-term evolution experiment), but its long duration allows additional analyses that are not possible with shorter-term experiments.

  • For experiments like this one, where measurements are taken over a long time course, keeping all records up to date prevents the data from becoming unusable.

Glossary

accession
a unique identifier assigned to each sequence or set of sequences
categorical variable
a variable that takes one of a fixed set of values that are names or labels (e.g. treatment group); also called a qualitative variable, in contrast to a quantitative (numerical) variable
cleaned data
data that has been manipulated post-collection to remove errors or inaccuracies, introduce desired formatting changes, or otherwise prepare the data for analysis
conditional formatting
formatting that is applied to a specific cell or range of cells depending on a set of criteria
CSV (comma separated values) format
a plain text file format in which values are separated by commas
factor
a variable that takes on a limited number of possible values (i.e. categorical data)
Gb
gigabyte, a unit of file storage or file size (more strictly written GB, since a lowercase b conventionally denotes bits)
Gbase
a gigabase represents one billion nucleic acid bases (Gbp may indicate one billion base pairs of nucleic acid)
headers
names at the top of columns that describe the column contents (sometimes optional)
metadata
data which describes other data
NGS
common acronym for “Next Generation Sequencing”, a term currently being replaced by “High Throughput Sequencing”
null value
a value used to record observations missing from a dataset
observation
a single measurement or record of the object being recorded (e.g. the weight of a particular mouse)
plain text
unformatted text
quality assurance
any process which checks data for validity during entry
quality control
any process which removes problematic data from a dataset
raw data
data that has not been manipulated and represents actual recorded values
rich text
formatted text (e.g. text that appears bolded, colored or italicized)
string
a collection of characters (e.g. “thisisastring”)
TSV (tab separated values) format
a plain text file format in which values are separated by tabs
variable
a category of data being collected on the object being recorded (e.g. a mouse’s weight)
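Several of the terms above (plain text, TSV and CSV formats, headers, null value, observation, categorical variable and factor) can be seen together in a short sketch using Python's standard `csv` module. The table, its column names and the use of "NA" as the null value are hypothetical; real datasets may use a different null marker.

```python
import csv
import io

NULL = "NA"  # hypothetical null value marking a missing observation

# A small hypothetical TSV file: the first line holds the headers.
tsv_text = (
    "mouse_id\ttreatment\tweight_g\n"
    "m1\tcontrol\t21.4\n"
    "m2\tdrug\tNA\n"
    "m3\tcontrol\t22.9\n"
    "m4\tdrug\t20.1\n"
)


def read_table(text, delimiter="\t", null=NULL):
    """Parse plain-text tabular data into one dict per observation, keyed by header.
    Cells equal to the null value become None."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{k: (None if v == null else v) for k, v in row.items()} for row in reader]


def factor_levels(rows, column):
    """Sorted distinct non-missing values of a categorical variable (its factor levels)."""
    return sorted({row[column] for row in rows if row[column] is not None})


def to_csv(rows, null=NULL):
    """Write the rows back out as CSV, restoring the null marker for missing cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({k: (null if v is None else v) for k, v in row.items()})
    return out.getvalue()


rows = read_table(tsv_text)
print(list(rows[0]))                     # ['mouse_id', 'treatment', 'weight_g']
print(factor_levels(rows, "treatment"))  # ['control', 'drug']
print(to_csv(rows), end="")
```

The same reader handles CSV by passing `delimiter=","`, so converting between the two formats changes only the separator, not the headers or the null value.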