class: center, middle, inverse, title-slide

.title[
# Introduction to R for complete beginners.
]
.subtitle[
## Cloud-SPAN.
]
.author[
### Evelyn Greeves and Pasky Miranda (slides by Emma Rand)
]
.institute[
### University of York, UK
]

---

<style>
div.blue { background-color:#b0cdef; border-radius: 5px; padding: 20px;}
div.grey { background-color:#d3d3d3; border-radius: 0px; padding: 0px;}
</style>

# Your opinions count!

We will be sending you an online evaluation form after the course. Your feedback really helps us to plan future courses and justify our funding from UKRI.

**When you submit your form you will be automatically entered into a draw for a £25 Amazon voucher to say thank you 💸**

---

# Overview

* Finding your way round RStudio
* Typing in data, doing some calculations on it, plotting it
* Understanding the manual
* Working with a dataset
* Importing data: working directories and paths
* Summarising and visualising with the [`tidyverse`](https://www.tidyverse.org/)

<img src="images/tidyverse_logo.png" width="160px" style="display: block; margin: auto 0 auto auto;" />

---
class: inverse

# Finding your way round RStudio

---

# RStudio: live demonstration

Overview [Larger](http://www-users.york.ac.uk/~er13/RStudio%20Anatomy.svg).

**Will be followed by a recap**

<img src="http://www-users.york.ac.uk/~er13/RStudio%20Anatomy.svg" width="600px" />

There is an [RStudio cheatsheet](http://www-users.york.ac.uk/~er13/rstudio-ide.pdf) which covers more advanced RStudio features.
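---

# RStudio: a first script

A minimal sketch (not the exact demonstration code) of the recap topics: R as a calculator, assigning values, data types, and the functions `c()`, `class()` and `str()`. The object names `x`, `y`, `heights`, `animals`, `n` and `is_cat` are only illustrative. Type it into a script and run it line by line with Ctrl+Enter.

```r
# R as a calculator: multiplication is done before addition
3 + 4 * 2

# assign values to objects with <-; they then appear in the Environment panel
x <- 12
y <- x / 4

# vectors of different types made with c()
heights <- c(1.5, 1.62, 1.7)          # numeric
animals <- c("cat", "dog", "pigeon")  # character
n <- 3L                               # integer (the L suffix makes it an integer)
is_cat <- animals == "cat"            # logical: TRUE FALSE FALSE

# class() gives the type of an object; str() gives its structure
class(heights)
class(n)
class(is_cat)
str(animals)
```

Lines that only assign with `<-` print nothing in the Console; the other lines print their result.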
---

# RStudio: Recap

* the panels
* making yourself comfortable
* typing in the console and sending commands
* using R as a calculator
* assigning values
* where to see objects
* using a script: make sure to execute the code
* comments \#
* data types and structures
* functions `c()`, `class()` and `str()`
* types of R files: .R, .RData, .Rhistory

---

# RStudio: Recap

.pull-left[
Top left panel

* Script: write and edit code and comments to keep

<hr>

Bottom left panel

* Console: where commands are typed and executed
]

.pull-right[
Top right panel

* Environment: see your objects
* History: your previous commands

<hr>

Bottom right panel

* Files: a file explorer
* Packages: those installed, and a way of installing more
* Help: the manual
* Plots
]

---

# RStudio: Recap

Types of file

* .R: a script file containing code and comments
* .RData: an environment file, also known as a workspace. It contains objects but no code
* .Rhistory: everything you typed, mostly wrong!

Using a script

* any R code can be executed from a script
* code can (and should!) be commented
* comments start with a `#`

---

# RStudio: Recap

Data types and structures

These are the most commonly needed, but there are others.

.pull-left[
Types

* numeric
* integer
* logical
* character
]

.pull-right[
Structures

* vectors
* factors
* dataframes
]

---
class: inverse

# Typing in data and plotting it

---

# Typing in data and plotting it

## The goal

We will work with some data on the coat colour of 62 cats. You are going to type the data into R and plot them.

The data are given as a frequency table:

---

.pull-left[
The frequency table

<table style="width:30%; margin-left: auto; margin-right: auto;" class="table"> <caption>Frequency of coat colours in 62 cats</caption> <thead> <tr> <th style="text-align:left;"> Coat colour </th> <th style="text-align:right;"> No.
cats </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> black </td> <td style="text-align:right;"> 23 </td> </tr> <tr> <td style="text-align:left;"> white </td> <td style="text-align:right;"> 15 </td> </tr> <tr> <td style="text-align:left;"> tabby </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:left;"> ginger </td> <td style="text-align:right;"> 10 </td> </tr> <tr> <td style="text-align:left;"> tortoiseshell </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:left;"> calico </td> <td style="text-align:right;"> 1 </td> </tr> </tbody> </table>
]

.pull-right[
The aim

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-4-1.png" width="288" />
]

---

# Typing in data and plotting it

## Getting set up

In RStudio do File | New Project | New Directory

Be purposeful about where you create it and what you name it. I suggest `cloud-span`.

--

Make a new script ![new script](images/newscript.png) and save it as `analysis.R`. You will use it to carry out the rest of the work.

---

# Typing in data and plotting it

## Data structures

Start by making a vector called `coat` that holds the coat colours:

```r
# coat colours
coat <- c("black", "white", "tabby", "ginger", "tortoiseshell", "calico")
```

- Write your command in `analysis.R`

--

- Notice I have used a comment

--

- Put the cursor on the line you want to execute

--

- Execute with ![Run](images/runbutton.png) or Ctrl+Enter

---

# Typing in data and plotting it

## Data structures

Create a vector called `freq` containing the number of cats with each coat colour

--

```r
# numbers of cats with each coat colour
freq <- c(23, 15, 8, 10, 5, 1)
```

---

# Typing in data and plotting it

## Total number of cats

We can use `sum(freq)` to check that the total number of cats is 62.
```r
# the total number of cats
sum(freq)
```

```
## [1] 62
```

---
class: inverse

# Plotting the data with ggplot()

---
background-image: url(images/ggplot2.png)
background-position: 90% 75%
background-size: 200px

# Typing in data and plotting it

Commands like `c()`, `sum()`, and `str()` are part of the 'base' R system. Base packages (collections of commands) always come with R.

--

Other packages, such as `ggplot2` (Wickham, 2016), need to be installed separately. `ggplot2` is one of the `tidyverse` packages.

---
background-image: url(images/tidyverse.png)
background-position: 90% 75%
background-size: 200px

# Typing in data and plotting it

You should have already installed `tidyverse`, but we need to load it with `library()` before we can use it in a session.

```r
library(tidyverse)
```

You will probably be warned about some function name conflicts, but these will not be a problem for you.

--

Later we will also use functions from `dplyr` and `tidyr`, which are also `tidyverse` packages.

--

`ggplot2` is the name of the package; `ggplot()` is its most important command.

---

# Plotting using ggplot2

## Data structure for `ggplot()`

`ggplot()` takes a dataframe as its `data` argument.

We can make a dataframe of the two vectors, `coat` and `freq`, using the `data.frame()` function.

```r
coat_data <- data.frame(coat, freq)
```

--

`coat_data` is the name we have given the dataframe.

Click on `coat_data` in the Environment.

---

# Plotting using ggplot2

## A barplot

Create a simple barplot using `ggplot()` like this:

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col()
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-10-1.png" width="288" />

---

# Plotting using ggplot2

## A barplot

`ggplot()` alone creates a blank plot.

--

`ggplot(data = coat_data)` looks the same.

--

`aes()` gives the 'aesthetic mappings': how variables (columns) are mapped to visual properties (aesthetics), e.g., axes, colour, shapes. Thus...
---

# Plotting using ggplot2

## A barplot

`ggplot(data = coat_data, aes(x = coat, y = freq))` produces a plot with axes.

--

`geom_col()` is a 'geom' (geometric object). Geoms give the visual representation of the data: points, lines, bars, boxplots etc.

---
class: inverse

# Using the help manual

---

# Using the help manual

'Arguments' can be added to the `geom_col()` command inside the brackets.

Commands do something, and their arguments (in the brackets) can specify:

- what object to do it to
- how exactly to do it

--

Many arguments have defaults, so you don't always need to supply them.

---

# Using the help manual

Open the manual page using:

```r
?geom_col
```

## Demonstration

---

# Using the help manual: Recap

* **Description** gives an overview of what the command does
* **Usage** lists the arguments
    * form: argument name = default value
    * some arguments MUST be supplied; others have defaults
    * `...` means other arguments can also be passed
* **Arguments** gives the detail about the arguments
* **Details** describes how the command works in more detail
* **Value** gives the output of the command
* Don't be too perturbed by not fully understanding the information

---

# Using the manual: Alter a ggplot

Change the fill of the bars using `fill`:

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue")
```

.pull-left[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-12-1.png" width="288" />
]

--

.pull-right[
Colours can be given by their name, "lightblue", or their hex code, "#ADD8E6".

Look them up by [name](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf) or by [code](https://htmlcolorcodes.com).
]

---

# Using the manual: Alter a ggplot

`fill` is one of the arguments covered by `...`. `fill` is an 'aesthetic'.

If you look for `...` in the list of arguments you will see it says:

> Other arguments passed on to layer(). These are often aesthetics, used to set an aesthetic to a fixed value, like colour = "red" or size = 3. They may also be parameters to the paired geom/stat.

We just set the `fill` aesthetic to a fixed value.

---

# Using the manual: Alter a ggplot

Further down the manual, there is a section on **Aesthetics** which lists those understood by `geom_col()`.

We can *set* the `fill` aesthetic to a fixed colour inside `geom_col()` *or* *map* it to a variable from the dataframe inside `aes()` instead. Mapping means the colour will be different for different values of that variable.

---

# Using the manual: Alter a ggplot

```r
ggplot(data = coat_data, aes(x = coat, y = freq, fill = coat)) +
  geom_col()
```

.pull-left[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-13-1.png" width="288" />
]

--

.pull-right[
Mapping `fill` to a variable means the colour varies with each value of `coat`.

Note that we have taken `fill = "lightblue"` out of `geom_col()` and instead put `fill = coat` in `aes()`.
]

---

# Using the manual: Alter a ggplot

Can you use the manual to put the bars next to each other? Look for the argument that will mean there is no space between the bars.

.footnote[
<br>
<span style=" font-weight: bold; color: rgba(246, 250, 253, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Change the colour of the lines around each bar to black.]

---

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", width = 1)
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-14-1.png" width="288" />

---

# Using the manual: Alter a ggplot

<span style=" font-weight: bold; color: rgba(246, 250, 253, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Change the colour of the lines around each bar to black.
.pull-left[
```r
ggplot(data = coat_data,
       aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue",
           width = 1,
           colour = "black")
```
]

.pull-right[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-15-1.png" width="288" />
]

---

# Top Tip

<div class = "blue">
.font100[
Make your code easier to read by using white space and new lines

* put spaces around `=` and `<-`, and after `,`
* use a newline after every comma in a command with lots of arguments
]
</div>

---

# Alter a ggplot: axes

We can make changes to the axes using:

- `scale_x_discrete()` for a discrete *x*-axis
- `scale_y_continuous()` for a continuous *y*-axis

`ggplot()` automatically extends the axes slightly beyond the data. You can turn this behaviour off with the `expand` argument.

--

Each 'layer' is added to the `ggplot()` command with a `+`.

---

# Alter a ggplot: axes

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", width = 1, colour = "black") +
* scale_x_discrete(expand = c(0, 0)) +
* scale_y_continuous(expand = c(0, 0))
```

.pull-right[
.footnote[
<span style=" font-weight: bold; color: rgba(253, 249, 246, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Look up `scale_x_discrete()` in the manual and work out how to change the axis title "coat" to "Coat colour"]
]

---

# Alter a ggplot: axes

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-16-1.png" width="288" />

---

# Alter a ggplot: axes

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", width = 1, colour = "black") +
  scale_x_discrete(expand = c(0, 0),
*                  name = "Coat colour") +
  scale_y_continuous(expand = c(0, 0),
*                    name = "Number of cats")
```

---

# Alter a ggplot: axes

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-17-1.png" width="288" />

---

# Alter a ggplot: axes

I would prefer to see the *y*-axis extend a little
beyond the data, which we can do by changing the axis "limits" in `scale_y_continuous()`.

---

# Alter a ggplot: axes

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(fill = "lightblue", width = 1, colour = "black") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") +
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
*                    limits = c(0, 25))
```

---

# Alter a ggplot: axes

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-18-1.png" width="288" />

---

# Alter a ggplot: removing the background

```r
ggplot(data = coat_data, aes(x = coat, y = freq)) +
  geom_col(width = 1, colour = "black", fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") +
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
                     limits = c(0, 25)) +
* theme_classic()
```

---

# Alter a ggplot: removing the background

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-19-1.png" width="288" />

---

# Alter a ggplot: bar ordering

- The default ordering of a categorical variable like `coat` is alphabetical. Often we want to change the order.
- We can order the categories by the values in another variable using `reorder()`.

---

# Alter a ggplot: bar ordering

```r
ggplot(data = coat_data,
*      aes(x = reorder(coat, freq, decreasing = TRUE), y = freq)) +
  geom_col(width = 1,
           colour = "black",
           fill = "lightblue") +
  scale_x_discrete(expand = c(0, 0),
                   name = "Coat colour") +
  scale_y_continuous(expand = c(0, 0),
                     name = "Number of cats",
                     limits = c(0, 25)) +
  theme_classic()
```

---

# Alter a ggplot: bar ordering

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-20-1.png" width="288" />

---
class: inverse

# Working with imported data

---

# The goal

.pull-left[
Summarise

<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="2"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; "> Interorbital distance</div></th> </tr> <tr> <th style="text-align:left;"> Population </th> <th style="text-align:right;"> N </th> <th style="text-align:right;"> Mean </th> <th style="text-align:right;"> SE </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 11.24 </td> <td style="text-align:right;"> 0.12 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 40 </td> <td style="text-align:right;"> 11.70 </td> <td style="text-align:right;"> 0.09 </td> </tr> </tbody> </table>
]

.pull-right[
Plot

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-22-1.png" width="288" />
]

---
background-image: url(images/interorbital.png)
background-position: 100% 0%
background-size: 180px

# Working with Data

## Importing

This section will teach you about three concepts:

--

1. 'working directories', 'paths' and 'relative paths'

--

2. tidy data

--

3. dealing with data in more than one group

--

We will work with the interorbital distances of domestic pigeons in two different populations: A and B.

---

# Working with Data

## Organising

Restart R: Session | Restart R (Ctrl+Shift+F10)

Make a folder called `data` in your Project directory (this is also your working directory).

The easiest way to do this is in RStudio: see the Files panel at the bottom right.

--

Save a copy of [pigeon.txt](data/pigeon.txt) to the `data` folder.

---

# Working with Data

## Start coding

Make a new script called `pigeon_analysis.R`

--

Add this code:

```r
# load packages
library(tidyverse)
```

We need to load the `tidyverse` packages for several of the commands we will use.

---

# Working with Data

## Importing

To read the data into R you need to give the 'relative path' to the file in the `read_table()` command:

```r
pigeon <- read_table("data/pigeon.txt")
```

--

`data/pigeon.txt` is the 'relative path' to the file; the `data/` part gives the folder it is in.

--

It says where the file is **relative to your working directory**: pigeon.txt is inside a folder (directory) called `data`, which is in your working directory.

---

# Working with Data

## Understanding the dataframe

A dataframe is made of columns and rows.

The columns are the variables; the rows are the observations.

---

# Working with Data

## Tidy format

Instead of having a population in each column, we often have, **and want**, all measurements in one column with a second column giving the group.

--

This format is described as 'tidy'.

--

It has one variable in each column and only one observation (case) per row.

--

It captures the structure of the data and allows you to specify the role of variables in analyses and visualisations.

---

# Data Organisation

## What is tidy data?

One response per row.

Tidy data adhere to a consistent structure which makes them easier to manipulate, model and visualise. The structure is defined by:

1. Each variable has its own column.
2. Each observation has its own row.
3. Each value has its own cell.
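---

# Data Organisation

## Tidy data in code

A minimal sketch of building a small dataset directly in tidy form with base R's `data.frame()`, so that each variable has its own column and each observation its own row. It uses the same six distances as the three-pigeons-per-population example that follows; the object name `tidy_example` is only illustrative. `rep(c("A", "B"), each = 3)` writes the group label once for every measurement (A A A B B B).

```r
# tidy format: one variable per column, one observation per row
tidy_example <- data.frame(
  population = rep(c("A", "B"), each = 3),          # the group each pigeon is in
  distance = c(12.4, 11.2, 11.6, 12.6, 11.3, 12.1)  # one measurement per pigeon
)

# 6 observations (rows) of 2 variables (columns)
str(tidy_example)

# the number of observations in each population
table(tidy_example$population)
```

Because the group is stored in its own column, counting or summarising per population only needs that one column, rather than one column per population.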
---

# Data Organisation

## What is tidy data?

The term 'tidy data' was popularised by Wickham (2014).

It is closely allied to the relational algebra of relational databases (Codd, 1990).

It underlies the enforced rectangular formatting in SPSS, Stata and R's dataframes.

--

There may be more than one potential tidy structure.

---

# Working with Data

## Tidy format

Suppose we had just 3 individuals in each of two populations:

.pull-left[
NOT TIDY!

<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> A </th> <th style="text-align:right;"> B </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 12.4 </td> <td style="text-align:right;"> 12.6 </td> </tr> <tr> <td style="text-align:right;"> 11.2 </td> <td style="text-align:right;"> 11.3 </td> </tr> <tr> <td style="text-align:right;"> 11.6 </td> <td style="text-align:right;"> 12.1 </td> </tr> </tbody> </table>
]

.pull-right[
TIDY!

<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> population </th> <th style="text-align:right;"> distance </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 12.4 </td> </tr> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 11.2 </td> </tr> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 11.6 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 12.6 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 11.3 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 12.1 </td> </tr> </tbody> </table>
]

---

# Working with Data

We can make the data tidy with `pivot_longer()`<sup>1</sup>.

.footnote[
[1] `pivot_longer()` is a function from a package called `tidyr` which is one of the `tidyverse` packages.
]

--

`pivot_longer()` collects the values from the specified columns (`cols`) into a single column (`values_to`) and creates a column (`names_to`) holding the original column names, which indicate the group.

---

# Working with Data

.scroll-output-width[
```r
pigeon2 <- pivot_longer(data = pigeon,
                        cols = everything(),
                        names_to = "population",
                        values_to = "distance")
str(pigeon2)
```

```
## tibble [80 × 2] (S3: tbl_df/tbl/data.frame)
##  $ population: chr [1:80] "A" "B" "A" "B" ...
##  $ distance  : num [1:80] 12.4 12.6 11.2 11.3 11.6 12.1 12.3 12.2 11.8 11.8 ...
```
]

A 'tibble' `\(\approx\)` dataframe

---

# Working with Data

Now we have a dataframe in tidy format, which *will* make it easier to summarise, analyse and visualise the data.

--

To summarise data in this format we use the `group_by()` and `summarise()` functions.

--

We will also use the pipe operator: `|>`

---

# Working with Data

To summarise multiple group data in tidy form:

```r
pigeon2 |>
  group_by(population) |>
  summarise(mean = mean(distance))
```

--

This can be read as:

- take pigeon2 *and then*
- group it by population *and then*
- summarise it by calculating the mean

i.e., the mean is calculated separately for each population.

--

The `mean` before the `=` is just the name given to the new column.

---

# Working with Data

The result:

```
## # A tibble: 2 × 2
##   population  mean
##   <chr>      <dbl>
## 1 A           11.2
## 2 B           11.7
```

---

# Working with Data

We can add the number of pigeons in each group to the summary using the `length()` function.
```r
pigeon2 |>
  group_by(population) |>
  summarise(mean = mean(distance),
*           n = length(distance))
```

.footnote[
<span style=" font-weight: bold; color: rgba(253, 249, 246, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Add a column for the standard deviation

<span style=" font-weight: bold; color: rgba(253, 249, 246, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Add a column for the standard error given by `\(\frac{s.d.}{\sqrt{n}}\)`
]

---

# Working with Data

The result:

```
## # A tibble: 2 × 3
##   population  mean     n
##   <chr>      <dbl> <int>
## 1 A           11.2    40
## 2 B           11.7    40
```

---

# Working with Data

```r
pigeon2 |>
  group_by(population) |>
  summarise(mean = mean(distance),
            n = length(distance),
*           sd = sd(distance),
*           se = sd / sqrt(n))
```

```
## # A tibble: 2 × 5
##   population  mean     n    sd     se
##   <chr>      <dbl> <int> <dbl>  <dbl>
## 1 A           11.2    40 0.740 0.117 
## 2 B           11.7    40 0.573 0.0906
```

---

# Working with Data

To plot these data as a histogram:

```r
ggplot(data = pigeon2, aes(x = distance)) +
* geom_histogram(bins = 10, col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)")
```

.pull-left[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-35-1.png" width="288" />
]

--

.pull-right[
`geom_histogram()` is the `geom` and `bins` gives the number of bars.

This is the whole data set, not separated by population!
]

---

# Working with Data

To plot multiple group data in tidy form we map the `population` variable to the `fill` aesthetic:

```r
*ggplot(data = pigeon2, aes(x = distance, fill = population)) +
  geom_histogram(bins = 10, col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)")
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-36-1.png" width="288" />

.pull-right[
.footnote[
<span style=" font-weight: bold; color: rgba(253, 249, 246, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Remove the space between the axes and the bars]
]

---

# Working with Data

<span style=" font-weight: bold; color: rgba(253, 249, 246, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Remove the space between the axes and the bars

```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) +
  geom_histogram(bins = 10, col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)",
*                    expand = c(0, 0)) +
  scale_y_continuous(name = "Frequency",
*                    expand = c(0, 0))
```

The result is on the next slide.

---

# Working with Data

The result:

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-37-1.png" width="288" />

---

# Working with Data

`geom_density()` can also be used: `distance` is mapped to `x`, and `y` shows the estimated density, a smoothed measure of how often each value occurs.
```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) +
* geom_density(col = "black") +
  scale_x_continuous(name = "Interorbital distance (mm)")
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-38-1.png" width="288" />

---

# Working with Data

Alter the transparency of the fill using `alpha` (from 0, fully transparent, to 1, fully opaque):

```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) +
* geom_density(col = "black", alpha = 0.3) +
  scale_x_continuous(name = "Interorbital distance (mm)")
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-39-1.png" width="288" />

---

# Working with Data

Formatting figures for inclusion in reports?

All [elements can be customised individually](https://ggplot2.tidyverse.org/reference/theme.html), but `theme_classic()` takes care of many of the options you are likely to want.

---

# Working with Data

```r
ggplot(data = pigeon2, aes(x = distance, fill = population)) +
  geom_density(col = "black", alpha = 0.3) +
  scale_x_continuous(name = "Interorbital distance (mm)") +
* theme_classic()
```

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-40-1.png" width="288" />

---

# Working with Data

A different kind of plot:

.pull-left[
<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-41-1.png" width="288" />
]

.pull-right[
Note: we need to change the `aes()` as well as the `geom` because this figure has population on the *x*-axis.
]

---

# Working with Data

```r
ggplot(data = pigeon2, aes(x = population, y = distance)) +
  geom_boxplot() +
  scale_x_discrete(name = "Population") +
  scale_y_continuous(name = "Interorbital distance (mm)",
                     expand = c(0, 0),
                     limits = c(0, 15)) +
  theme_classic()
```

.footnote[
<span style=" font-weight: bold; color: rgba(253, 249, 246, 255) !important;border-radius: 4px; padding-right: 4px; padding-left: 4px; background-color: rgba(37, 73, 107, 255) !important;" >Extra exercise:</span> Can you (gratuitously) colour the boxes by population too?
]

---

# Working with Data

<img src="02_intro_to_r_and_working_with_data_files/figure-html/unnamed-chunk-42-1.png" width="288" />

---

# Summary

* Use a script and comment it
* organise analyses and use relative paths
* shortcuts: `<-` is Alt+- (Alt-minus) and `|>` is Ctrl+Shift+M (when the native pipe operator is switched on in Tools | Global Options | Code)
* objects are seen in the Environment panel
* data are read into R from files as dataframes
* the dataframe is a common data structure
* you'll eventually get used to the manual!
* a `ggplot()` has a `data` argument and aesthetic mappings set with `aes()`; layers are added with a `+`; `geoms` determine how the data are plotted

---

# Your opinions count!

We will be sending you an online evaluation form after the course. Your feedback really helps us to plan future courses and justify our funding from UKRI.

**When you submit your form you will be automatically entered into a draw for a £25 Amazon voucher to say thank you 💸**

---
class: inverse

# 🥳 Congratulations! Keep practising! 🎈

---

# References

.footnote[
.font60[
Slides made with xaringan (Xie, 2019) and xaringanExtra (Aden-Buie, 2020)
]
]

.font60[
Aden-Buie, G. (2020). _xaringanExtra: Extras And Extensions for Xaringan Slides_. R package version 0.2.3.9000. URL: [https://github.com/gadenbuie/xaringanExtra](https://github.com/gadenbuie/xaringanExtra).

Codd, E. F. (1990). _The Relational Model for Database Management: Version 2_. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.

Wickham, H. (2014). "Tidy Data". In: _Journal of Statistical Software, Articles_ 59.10, pp. 1-23.

Wickham, H. (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York. ISBN: 978-3-319-24277-4. URL: [https://ggplot2.tidyverse.org](https://ggplot2.tidyverse.org).

Xie, Y. (2019). _xaringan: Presentation Ninja_. R package version 0.12. URL: [https://CRAN.R-project.org/package=xaringan](https://CRAN.R-project.org/package=xaringan).
]

---

# Introduction to R for complete beginners

Emma Rand
[emma.rand@york.ac.uk](mailto:emma.rand@york.ac.uk)

Twitter: [@er13_r](https://twitter.com/er13_r)

GitHub: [3mmaRand](https://github.com/3mmaRand)

blog: https://buzzrbeeline.blog/

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.