
Introduction to R Workshop

This workshop is designed for people with little to no previous knowledge of R. It assumes that each student has a computer and will code along with the instructor. The design of the workshop is inspired by Software Carpentry.

Getting started

You will first need to download and install both R and RStudio; both are free to use. Install R before RStudio, since RStudio needs R to run.

  1. Open your browser and go to https://cran.r-project.org/
  2. In the top box, click the link that matches your operating system. For example, on a Windows PC, click "Download R for Windows."
  3. Follow instructions for downloading R for your specific system.
  4. Then go to https://posit.co/download/rstudio-desktop/ to download RStudio.
  5. Once both are installed, open RStudio by clicking its icon or selecting it from your computer's programs menu.

Orientation to RStudio

In RStudio, the panes for typing commands are on the left side: the console and, once you open a script, the script editor above it.

To start a new script file, click the button in the top-left corner that shows a white page with a green plus sign, then choose R Script from the drop-down menu (or use File > New File > R Script).

The > symbol in the console means that RStudio is ready for a command: type your R code after this symbol and press the Enter/Return key to run that line.

For example, try running 2 + 2 in the console after the >.
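A few more calculations you could try at the > prompt; the numbers are arbitrary. R prints each result prefixed with [1], which marks the position of the first value on that line.

```r
# Basic arithmetic typed at the console prompt
2 + 2      # addition: [1] 4
10 - 3     # subtraction: [1] 7
6 * 7      # multiplication: [1] 42
10 / 4     # division returns a decimal: [1] 2.5
3^2        # exponent: [1] 9
```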

R can handle much more complex queries than simple calculations!

 

To handle these more complex tasks, R lets you call functions. A function runs a pre-defined set of steps from a short line of code, and you give it input through arguments.

The structure of working with functions in R is:

functionname(argument1 = value, argument2 = othervalue, ...)

For example:

read.csv(file = "dungeness_crab1.csv", header = TRUE)
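Because read.csv() needs a file on disk, here is a sketch of the same function-and-argument pattern using the built-in round() function, which needs no file. It shows arguments given by name, by position, and left out so that a default is used.

```r
# Arguments named explicitly
round(x = 3.14159, digits = 2)   # [1] 3.14

# Arguments given by position (x first, then digits)
round(3.14159, 2)                # [1] 3.14

# Leaving out an optional argument uses its default (digits = 0)
round(3.14159)                   # [1] 3
```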

You can also include comments in your R Script file by using the # sign:

# this is a comment and will not be run as code

If you are typing code in a script file, pressing Enter/Return only starts a new line; it does not run the code (unlike in the console). To run code from a script, highlight the lines you want R to evaluate and click the Run button at the top of the script pane, or press Ctrl+Enter (Cmd+Return on macOS).

Variables, much like x and y in algebra, are named containers for data that we define. In R, variables can hold many different types of data.

Some common variable object classes:

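numeric: Any number, with or without a decimal (R's default for numbers). Examples: 25.32, 30.0, 222.8
integer: A whole number stored as an integer, written with an L suffix. Examples: 1L, 5L, 8503L
logical: A true/false value, often the result of a comparison. Examples: TRUE, FALSE
character: Text written inside quotes; not evaluated as a number. Examples: "Welcome", "fifty-five", "55"
data.frame: Tabular data (rows and columns, where each column can hold a different class)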
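The class() function reports which class an object belongs to. This short sketch, using arbitrary example values, checks each class in the list; note that typing 5 without the L suffix gives a numeric value, not an integer.

```r
# class() reports the class of any object
class(25.32)                              # "numeric"
class(5)                                  # "numeric" (no L suffix)
class(5L)                                 # "integer"
class(TRUE)                               # "logical"
class("55")                               # "character" (quotes make it text)
class(data.frame(height = c(1.2, 3.4)))   # "data.frame"
```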

Tips for naming variables:

  • Don't start variable names with a number or symbol
  • Variable names are case-sensitive (height and Height are two different variables)
  • Don't use spaces in the name (try an underscore or period)
  • Be descriptive! (a name like data isn't very informative once your script has more than a few lines)

The <- operator is how we assign values to variables. R will evaluate whatever is on the right side of the arrow first, and assign that to the object defined on the left side.

For example:

# I am assigning the value 55 to a variable named temperature_C

temperature_C <- 55
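To see the naming tips in action, here is a small sketch with hypothetical variable names (tree_height, Tree_height, tree.height.m are made up for illustration). The last line is commented out because R would reject it.

```r
# Names are case-sensitive: these are two separate variables
tree_height <- 12.5
Tree_height <- 30
tree_height               # [1] 12.5 (unchanged by the second assignment)

# Underscores or periods can stand in for spaces
tree.height.m <- 12.5

# This line would fail, because a name can't start with a number:
# 2trees <- 4
```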

Exercise: create a variable named canopy_height and assign the value 76.8 to it.
Exercise: create a variable named my_friend and assign a name to it. Check what kind of variable it is by running class(my_friend).
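One possible solution to both exercises is sketched below; "Sam" is just an example, and any name written in quotes works.

```r
canopy_height <- 76.8
my_friend <- "Sam"        # hypothetical name; quotes make it character data

class(canopy_height)      # "numeric"
class(my_friend)          # "character"
```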

For numeric and integer variables, we can perform mathematical and statistical operations with the variable.

For example: 

2.2 * canopy_height
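A slightly longer sketch: the result of a calculation can itself be stored in a new variable (canopy_scaled is a hypothetical name), and functions such as sqrt() also accept variables as input.

```r
canopy_height <- 76.8

# Multiply and store the result in a new variable
canopy_scaled <- 2.2 * canopy_height
canopy_scaled             # [1] 168.96

# Other arithmetic and functions work the same way
canopy_height + 10        # [1] 86.8
sqrt(canopy_height)       # square root of the stored value
```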

Basic and commonly-used functions

To read in tabular data, we can use the function read.csv(). This function reads the .csv file named in its arguments, and it accepts optional arguments such as whether the file has a header row (header = TRUE).

For any built-in R function, we can look up which arguments are required and which are optional, read more about how the function is used, and see examples by running ?functionname (for example, ?read.csv) to open its help page.

 

On its own, read.csv() only reads the .csv file named in its arguments: R prints the contents to the console but does not keep them. To use the data in the rest of our script, we need to assign it to a variable.

For example:

forestA_2020 <- read.csv(file = "filename.csv", header = TRUE)

Now we can look at the column headers and the first six rows of our dataframe by calling head(); in this example, head(forestA_2020).
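If you don't have a .csv file handy, this sketch makes one. It assumes a made-up table (the columns treenumber, species and height_m and their values are hypothetical), writes it to a temporary file with write.csv() and tempfile(), then reads it back with read.csv() and inspects it with head().

```r
# Build a small example table (hypothetical columns and values)
example_data <- data.frame(
  treenumber = c(1, 2, 3, 4, 5, 6, 7),
  species    = c("fir", "cedar", "fir", "hemlock", "cedar", "fir", "alder"),
  height_m   = c(31.2, 24.5, 28.9, 35.1, 22.0, 30.4, 18.7)
)

# Write it to a temporary .csv file so read.csv() has a file to read
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, file = csv_path, row.names = FALSE)

# Read the file back in and assign it to a variable
forestA_2020 <- read.csv(file = csv_path, header = TRUE)

head(forestA_2020)          # headers plus the first 6 rows
head(forestA_2020, n = 3)   # the n argument changes how many rows are shown
```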

Summary statistics functions

mean(data): Returns the mean of data
sd(data): Returns the standard deviation of data
median(data): Returns the median of data
length(data): Returns the number of elements in data
summary(data): Returns the minimum, first quartile, median, mean, third quartile, and maximum of data
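A quick sketch of these functions on a small made-up vector (the values in heights are arbitrary); c() simply combines the values into one object.

```r
heights <- c(2, 4, 6, 8)   # c() combines values into a vector

mean(heights)      # [1] 5
median(heights)    # [1] 5
sd(heights)        # about 2.58
length(heights)    # [1] 4
summary(heights)   # Min 2, 1st Qu. 3.5, Median 5, Mean 5, 3rd Qu. 6.5, Max 8
```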

So far we have been working with "base R" functions, which come built into R itself. Many additional packages can be downloaded as expansion packs that add more complex and specialized tools, including statistical analysis, publication graphics, and data manipulation.

To use a package, we need two functions: install.packages() and library(). install.packages() downloads and installs the package, so it only needs to be run once on your computer; you don't need to run it again as long as the package stays installed. library() loads the package into your current R session, so you need to run it again every time you reopen RStudio. Note that install.packages() takes the package name in quotes, while library() does not need them.

For example:

install.packages("somepackagename")

library(somepackagename)

Vectors

Another useful function in base R is c(), which combines a series of values into a vector. A vector is a data structure rather than a class of its own: all of its elements share a single class, and class() reports that class (for example, "numeric").

Example:

vector1 <- c(2, 3, 1, 6, 4, 2, 3, 7)
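A short sketch checking the vector above: class() reports the class of its elements, and length() counts them. The mixed example (a hypothetical name) shows that because a vector holds only one class, combining numbers with text turns every element into text.

```r
vector1 <- c(2, 3, 1, 6, 4, 2, 3, 7)

class(vector1)     # "numeric": the class of the elements
length(vector1)    # [1] 8

# Mixing numbers and text converts everything to character
mixed <- c(1, "two", 3)
class(mixed)       # "character"
```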

Vectors usually contain several elements. We can extract specific elements using square brackets [ ] and the position of the element we want.

For example:

vector1[3] returns the value 1, which is the third element in our vector. In R, indexing starts at 1, so the first element is vector1[1].

We can get multiple elements at once by putting a vector of positions, made with the c() function, inside the square brackets:

vector1[c(1, 5, 6)]
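Running the indexing examples on vector1 gives the results below. The subset can also be saved to a new variable (first_three is a hypothetical name).

```r
vector1 <- c(2, 3, 1, 6, 4, 2, 3, 7)

vector1[3]              # [1] 1
vector1[c(1, 5, 6)]     # [1] 2 4 2

# A subset can be stored like any other value
first_three <- vector1[c(1, 2, 3)]
first_three             # [1] 2 3 1
```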

Working with dataframes

We can think of a dataframe in R as a set of vectors placed side by side, one per column, so we can use the same square-bracket operators to subset our data.

Now, instead of one number in our square brackets, we'll need to use two - one for the row position and one for the column position. 

For example:

dataframe[row, column]

To select all rows and a specific column: dataframe[ , 1]

All columns and a specific row: dataframe[1, ]
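To try the [row, column] pattern, here is a sketch on a tiny made-up dataframe; the name forest and its columns are hypothetical.

```r
forest <- data.frame(
  treenumber = c(1, 2, 3),
  species    = c("fir", "cedar", "hemlock"),
  height_m   = c(31.2, 24.5, 35.1)
)

forest[2, 3]    # row 2, column 3: [1] 24.5
forest[ , 1]    # all rows of column 1: [1] 1 2 3
forest[1, ]     # all columns of row 1, returned as a one-row dataframe
```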

For dataframes with column headers, we can also select a column by name using the $ operator.

For example:

forest$treenumber

From here, we can apply summary statistics functions to that column:

max(forest$treenumber)

mean(forest$treenumber)
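A self-contained sketch of selecting columns with $ and summarizing them. Since the real forest data isn't included here, the dataframe and its values are made up for illustration.

```r
forest <- data.frame(
  treenumber = c(1, 2, 3, 4),
  height_m   = c(31.2, 24.5, 35.1, 28.0)
)

forest$height_m            # the height_m column as a vector
max(forest$treenumber)     # [1] 4
mean(forest$height_m)      # [1] 29.7
summary(forest$height_m)   # minimum, quartiles, median, mean and maximum
```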

Resources

Most of the exercises and instructions in this guide are adapted from these resources:

An Introduction to R, Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau.

Programming with R, lesson from Software Carpentry.

An Introduction to R, W. N. Venables, D. M. Smith and the R Core Team.

Where to get help

Stack Overflow

You can ask me!