Managing AWS instances: Glossary

Key Points

Configure Instances Internet Access
  • If you are using a personal account, the base domain name you created will incur some minor cost.

  • If you are using an institutional account, the base domain name may need to be created by your IT department, and you may need to use a subnet managed by your IT department.

Instances Management Tasks Using the Scripts
  • The Scripts create and manage one or more AWS instances based on the configuration files “instancesNamesFile.txt”, “resourcesIDs.txt” and “tags.txt”.

  • You can choose a different name for “instancesNamesFile.txt”, and you can use multiple “instancesNamesFile…” files to handle unforeseen instance management requests.

  • These files must be placed in a directory called inputs, which must itself be inside a “course/workshop” directory whose name you can choose.

  • You run each of the Scripts by specifying only the path of your “instancesNamesFile.txt” as a parameter, for example csinstances_create.sh course-name/inputs/instancesNamesFile.txt.

  • You should not run the Scripts inside a “course” directory without specifying the name of the “course” directory — use the Tab key to help you complete the commands that run the Scripts.

  • It is convenient to be logged in to the AWS Console, on the EC2 - Instances page, so that you can check that each run of the Scripts changes the state of the target instances as expected, and quickly reboot an instance to rerun the configuration step if needed.
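
The layout and invocation described above can be sketched as follows. This is only a sketch under stated assumptions: course-name is a placeholder, the three files are created empty just to show where they go (their contents are not covered here), and the final command is printed rather than executed so that no instances are created by accident.

```shell
# Sketch of the directory layout the Scripts expect.
# "course-name" is a placeholder; pick any name for your course/workshop.
course_dir="course-name"

# The configuration files go in <course>/inputs.
# They are created empty here only to show their location.
mkdir -p "$course_dir/inputs"
touch "$course_dir/inputs/instancesNamesFile.txt" \
      "$course_dir/inputs/resourcesIDs.txt" \
      "$course_dir/inputs/tags.txt"

# Run the Scripts from the directory that CONTAINS the course directory,
# so that the path always starts with the course name.
run_cmd="csinstances_create.sh $course_dir/inputs/instancesNamesFile.txt"
echo "From $(pwd), you would run: $run_cmd"
```

Starting the path with the course name, with Tab completion filling in the rest, makes it harder to point a script at the wrong course.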

AMIs Management
  • Creating an AMI involves three main steps: (1) creating an instance, (2) configuring the instance as required regarding software and data, and (3) registering the configured instance as an AWS AMI.

  • Step (1): we create and manage the instances that will become AMIs in the same way as instances for courses, but in a separate directory called amis at the same level as courses. We also keep a tests directory at that level for instances used in tests.

  • Step (3): we register/create AMIs, make them public or private, and delete them using the AWS Console.

  • Step (2): configuring an instance that will become an AMI may be simple or rather complex, depending on what needs to be updated, and may involve configuration that enables the use of specific AWS (virtualised) “hardware platform” components (e.g. ENA, the Elastic Network Adapter). We use scripts whenever possible to help configure such instances. This helps to reduce mistakes, automate complex tasks, and document our work.
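
The directory arrangement mentioned in Step (1) might look like the sketch below. It assumes that amis, courses and tests are sibling directories, each holding one subdirectory per set of instances; the names my-ami, my-course and my-test are hypothetical.

```shell
# Assumed layout: amis, courses and tests are sibling directories.
# "my-ami", "my-course" and "my-test" are hypothetical names.
for d in amis/my-ami courses/my-course tests/my-test; do
    # Each one gets its own inputs directory for the configuration files.
    mkdir -p "$d/inputs"
done

# List the resulting inputs directories.
layout="$(find amis courses tests -type d -name inputs | sort)"
echo "$layout"
```

Keeping the three kinds of instances in separate trees means the same Scripts can be pointed at any of them just by changing the path.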

The Scripts Design
  • The Scripts are organised modularly: each script is a module with a specific task that can be applied to one or multiple instances.

  • The Scripts communicate through shared files in the inputs and outputs directories of a “course”.

  • We developed the Scripts starting from the code examples in the AWS CLI Command Reference.
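
The idea of modules communicating through shared files can be illustrated with the toy sketch below. It is not the Scripts' actual code: the functions fake_create and fake_report, the file created.txt and the directory demo-course are invented for illustration, and the sketch assumes that the instances names file holds one instance name per line.

```shell
# Illustration only: two hypothetical "modules" that communicate through
# files in the inputs and outputs directories of a course.
course_dir="demo-course"
mkdir -p "$course_dir/inputs" "$course_dir/outputs"
printf 'instance01\ninstance02\n' > "$course_dir/inputs/instancesNamesFile.txt"

# Module 1: reads instance names from inputs and writes one result line
# per instance to the outputs directory of the same course.
fake_create() {
    names_file="$1"
    out_dir="$(dirname "$(dirname "$names_file")")/outputs"
    while IFS= read -r name; do
        if [ -n "$name" ]; then
            echo "$name created"
        fi
    done < "$names_file" > "$out_dir/created.txt"
}

# Module 2: takes the same path argument, but works from the file that
# module 1 left in outputs; it prints how many instances were recorded.
fake_report() {
    names_file="$1"
    out_dir="$(dirname "$(dirname "$names_file")")/outputs"
    wc -l < "$out_dir/created.txt" | tr -d ' '
}

fake_create "$course_dir/inputs/instancesNamesFile.txt"
count="$(fake_report "$course_dir/inputs/instancesNamesFile.txt")"
echo "instances recorded: $count"
```

Because both functions derive the outputs directory from the same path argument, each one can be run on its own, which is the property the modular design relies on.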

Glossary

accession
a unique identifier assigned to each sequence or set of sequences
categorical variable
Variables can be classified as categorical (a.k.a. qualitative) or quantitative (a.k.a. numerical). Categorical variables take on a fixed number of values that are names or labels.
cleaned data
data that has been manipulated post-collection to remove errors or inaccuracies, introduce desired formatting changes, or otherwise prepare the data for analysis
conditional formatting
formatting that is applied to a specific cell or range of cells depending on a set of criteria
CSV (comma separated values) format
a plain text file format in which values are separated by commas
factor
a variable that takes on a limited number of possible values (i.e. categorical data)
Gb
gigabyte of file storage or file size (strictly, gigabyte is abbreviated GB, while Gb denotes gigabit)
Gbase
a gigabase represents one billion nucleic acid bases (Gbp may indicate one billion base pairs of nucleic acid)
headers
names at the tops of columns that describe the column contents (sometimes optional)
metadata
data which describes other data
NGS
common acronym for “Next Generation Sequencing”, a term currently being replaced by “High Throughput Sequencing”
null value
a value used to record observations missing from a dataset
observation
a single measurement or record of the object being recorded (e.g. the weight of a particular mouse)
plain text
unformatted text
quality assurance
any process which checks data for validity during entry
quality control
any process which removes problematic data from a dataset
raw data
data that has not been manipulated and represents actual recorded values
rich text
formatted text (e.g. text that appears bolded, colored or italicized)
string
a collection of characters (e.g. “thisisastring”)
TSV (tab separated values) format
a plain text file format in which values are separated by tabs
variable
a category of data being collected on the object being recorded (e.g. a mouse’s weight)