Setting Up Your Cloud and Terminal Environments: Glossary

Key Points

Create Your AWS Account
  • Your new AWS account is automatically entitled to the one-year AWS Free Tier.

  • The Free Tier has some limits that you must observe in order not to incur unwanted costs.

  • The link to log in to your account in the AWS Management Console is https://aws.amazon.com/console

Configure Your AWS Account
  • The Root account should only be used for high-level admin tasks.

  • IAM accounts should be created and configured for daily use.

  • Your IAM user account has the permissions of the AdministratorAccess policy for the account, which is sufficient for most practical purposes.

Configure Your Terminal Environment
  • Your terminal environment can be a Git Bash terminal running on a Windows computer, a Linux terminal running the Bash shell, or a Mac terminal running either the Bash shell or the Zsh shell. Whichever you use, Bash must be at version 5.0 or higher; install or update it if necessary.

  • You can also use a Linux or Mac terminal on a remote server through ssh.

  • The configuration of the AWS CLI sets your AWS account as the target account where the Scripts will create and manage AWS resources.
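
The Bash version requirement above can be checked with a short sketch like the following. The helper name bash_major_ok is hypothetical and is not part of the Scripts; the two AWS CLI commands in the closing comment are standard ones, left commented out because they need your own credentials.

```shell
# Sketch: confirm that the running shell is Bash 5.0 or higher.
# bash_major_ok is a hypothetical helper, not part of the Scripts.
bash_major_ok() {
    # $1 is a major version number, e.g. 5 for Bash 5.2.15
    [ "$1" -ge 5 ]
}

current="${BASH_VERSION:-0}"   # falls back to 0 when the shell is not Bash
major="${current%%.*}"         # keep only the text before the first dot

if bash_major_ok "$major"; then
    echo "Bash $current meets the 5.0 requirement"
else
    echo "Bash 5.0 or higher not detected; install or update Bash"
fi

# With the AWS CLI installed, these commands set and then confirm the
# target account (run them yourself, with your own access keys):
#   aws configure
#   aws sts get-caller-identity
```

On a Mac whose default shell is Zsh, run the check inside Bash itself (for example by saving it to a file and running it with bash), since BASH_VERSION is only set by Bash.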

Configure Your AWS CloudShell Environment
  • The AWS CloudShell is a Bash shell terminal that runs in the browser and has the AWS CLI already installed and configured with programmatic access to your AWS account. Hence, only the Scripts need to be installed.

  • You must be logged in to the AWS Console to launch the AWS CloudShell.

  • The AWS CloudShell terminal runs on a Linux AWS instance with 1 GB storage.

  • Data and software you install in your home directory will be deleted after 120 days of not using the AWS CloudShell. You will receive warning emails before they are deleted.

  • For some applications, the key bindings in the AWS CloudShell may be slightly different from the key bindings for the same applications on a desktop or laptop machine.
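
Because the AWS CloudShell storage is limited to 1 GB, it can help to see how much of it your home directory is using. The following is a minimal sketch under that assumption; usage_kb and percent_used are hypothetical helper names, and the only figure taken from the text above is the 1 GB limit.

```shell
# Sketch: report how much of a storage allowance a directory is using.
# usage_kb and percent_used are hypothetical helpers for illustration.
usage_kb() {
    # Print the size of directory $1 in kilobytes (first field of du -sk)
    du -sk "$1" 2>/dev/null | cut -f1
}

percent_used() {
    # $1 = kilobytes used, $2 = kilobytes allowed; prints a whole percentage
    echo $(( $1 * 100 / $2 ))
}

limit_kb=$((1024 * 1024))        # 1 GB expressed in kilobytes
used_kb=$(usage_kb "$HOME")
echo "Home directory uses ${used_kb} KB, about $(percent_used "$used_kb" "$limit_kb")% of 1 GB"
```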

Glossary

accession
a unique identifier assigned to each sequence or set of sequences
categorical variable
a variable that takes on a fixed number of values that are names or labels. Variables can be classified as categorical (aka qualitative) or quantitative (aka numerical).
cleaned data
data that has been manipulated post-collection to remove errors or inaccuracies, introduce desired formatting changes, or otherwise prepare the data for analysis
conditional formatting
formatting that is applied to a specific cell or range of cells depending on a set of criteria
CSV (comma separated values) format
a plain text file format in which values are separated by commas
factor
a variable that takes on a limited number of possible values (i.e. categorical data)
Gb
gigabyte of file storage or file size (strictly, GB is the abbreviation for gigabyte, and Gb denotes gigabit)
Gbase
a gigabase represents one billion nucleic acid bases (Gbp may indicate one billion base pairs of nucleic acid)
headers
names at tops of columns that are descriptive about the column contents (sometimes optional)
metadata
data which describes other data
NGS
common acronym for “Next Generation Sequencing”, a term currently being replaced by “High Throughput Sequencing”
null value
a value used to record observations missing from a dataset
observation
a single measurement or record of the object being recorded (e.g. the weight of a particular mouse)
plain text
unformatted text
quality assurance
any process which checks data for validity during entry
quality control
any process which removes problematic data from a dataset
raw data
data that has not been manipulated and represents actual recorded values
rich text
formatted text (e.g. text that appears bolded, colored or italicized)
string
a collection of characters (e.g. “thisisastring”)
TSV (tab separated values) format
a plain text file format in which values are separated by tabs
variable
a category of data being collected on the object being recorded (e.g. a mouse’s weight)
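
Several of the terms above (CSV, TSV, headers, variable, observation, null value) can be seen together in a small sketch. The sample data is made up for illustration, "NA" is just one common way to write a null value, and csv_to_tsv is a hypothetical helper that assumes no field contains a comma or a tab.

```shell
# Sketch with made-up sample data: two variables (mouse_id, weight_g),
# three observations, and "NA" recording a missing weight (a null value).
csv_to_tsv() {
    # Naive conversion: replaces every comma with a tab, so it is only
    # safe when no field itself contains a comma or a tab
    tr ',' '\t'
}

csv='mouse_id,weight_g
m01,21.4
m02,NA
m03,19.8'

printf '%s\n' "$csv" | head -n 1     # the headers line of the CSV
printf '%s\n' "$csv" | csv_to_tsv    # the same table in TSV format
```

Both forms are plain text, so either can be opened in any text editor.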