Getting started with R

Installing R

  1. Download and install R for your system: https://cran.rstudio.com/

  2. Download and install RStudio for your system: https://www.rstudio.com/products/rstudio/download/

Packages

We will use a number of packages that extend the functionality of basic R, and make some operations easier/more intuitive. You can start by installing the tidyverse package using the code below.

install.packages('tidyverse')

Introduction to R

Writing analyses in R means writing code. If this idea is new to you, you might benefit from this excellent article on what code is, from this discussion of the two cultures of computer users, and from this harsh but accurate description of what it takes to really learn to code.

Getting started

There’s a large set of introductory R tutorials online, easily found via Google.

I recommend working through some interactive tutorials to get started:

  • Try R from codeschool
  • swirl offers interactive R lessons run within R
  • DataCamp offers interactive tutorials, but I’m not sure which are free and which require a subscription
  • Here is a handy tutorial R script
  • OpenIntro Statistics also has R labs
  • RStudio offers a list of such lessons

As you become familiar with the basics, you may want some quick reference sources.

You will also need to know how to find help:

  • Google “R [what you want to do]”
  • CrossValidated is a great resource: I often find solutions to my problems there.

You may also want to consult more advanced lessons to supplement labs/notes:

To write code well, you will need to know something about how a computer works.

Once you can actually write some code, it is worth learning to make it good.

  • Good code is readable by humans and self-documenting.
  • You can achieve this by adopting a consistent and sensible code style. Two good starting points are Google’s R style guide and Wickham’s style guide.
  • Avoid magic numbers. They make your code hard to read and brittle to change.
  • Use unique and meaningful names for scripts, functions, variables, data.frames, columns, etc.
  • Learn to type well, and pay attention to textual details. In most programming languages, letter order, letter case, visually similar symbols, etc. have to be correct for a computer to understand what you are saying. Human readers forgive typos; computers do not.
  • Learn to use your IDE (in our case, RStudio). Tab completion is amazing, and keyboard shortcuts are very handy.
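As a small illustration of the naming and magic-number advice, here is a sketch that replaces a bare threshold with a named value. The data frame, its column names, and the 18-year cutoff are hypothetical, made up only for this example. Note that it adds a descriptive column instead of splitting the data into separate subset variables.

```r
# Hypothetical survey data; names and values are invented for illustration
survey_responses <- data.frame(
  respondent_id = 1:5,
  age_years     = c(17, 25, 34, 16, 52)
)

# Hard to read and brittle: what does 18 mean, and where else does it appear?
# survey_responses$flag <- survey_responses$age_years >= 18

# Clearer: a named value documents the intent and lives in exactly one place
min_adult_age_years <- 18
survey_responses$is_adult <- survey_responses$age_years >= min_adult_age_years

print(survey_responses)
```

If the cutoff ever changes, you edit one line, and the column name `is_adult` tells a reader what the comparison means.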

Better data analysis code

The overarching flow of data analysis is something like:
data -> pre-processing code -> clean data -> analysis code -> results -> presentation code -> figures, tables, numbers

It is helpful to factor your code this way, as it allows you to muck around with various parts without disrupting the others.
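One possible way to factor code along this flow is to give each stage its own function, so you can change or re-run one stage without touching the others. This is a minimal base-R sketch under assumed conditions. The functions `clean_data()`, `analyze_data()`, and `present_results()` and the toy data are hypothetical, not part of any package.

```r
# Stage 1: pre-processing turns raw data into clean data
clean_data <- function(raw) {
  raw$score <- as.numeric(raw$score)   # enforce the intended type
  raw[!is.na(raw$score), ]             # drop rows without a usable score
}

# Stage 2: analysis turns clean data into results
analyze_data <- function(clean) {
  aggregate(score ~ group, data = clean, FUN = mean)
}

# Stage 3: presentation turns results into something to show
present_results <- function(results) {
  sprintf("%s: mean score %.1f", as.character(results$group), results$score)
}

# Hypothetical raw input: scores arrive as text, one is missing
raw_data <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c("10", "20", NA, "30"),
  stringsAsFactors = FALSE
)

results <- analyze_data(clean_data(raw_data))
cat(present_results(results), sep = "\n")
```

Because each stage only depends on the output of the previous one, you can, for example, try a different summary in `analyze_data()` without re-reading or re-cleaning anything by hand.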

A few suggestions for writing good data analysis code:

  • Make sure analysis code is state independent (it should re-run correctly after rm(list=ls()), or better yet in a fresh R session, since rm(list=ls()) does not unload packages or reset options) and self-sufficient (it should not require any human intervention, mouse clicks, etc.). Together, these make re-running your analysis painless and reproducible.
  • Don’t arbitrarily create data subsets stored in assorted variables; that’s a great way to make a mess of your code and confuse yourself. Subset the data as needed, while keeping the full data frame intact.
  • Build complicated commands piece by piece in the console, then assemble the final compact command in your script. This is especially helpful when chaining dplyr pipes (%>%) or nesting function calls.
  • When in doubt about whether your code is clear, pass named rather than positional arguments to functions.
  • Take explicit control of your data types and structures: when you read in a CSV file, don’t assume that every column has the right name and the right type (numeric, character, factor, etc.).
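Several of these suggestions fit in one short sketch. It assumes a hypothetical CSV with `subject`, `condition`, and `rt_ms` columns, embedded as text so the example runs on its own. It sticks to base R, so no extra packages are needed; the same ideas carry over to dplyr and readr.

```r
# Self-sufficient: everything the script needs is created or loaded here,
# so it gives the same result after rm(list = ls()) or in a fresh session.
csv_text <- "subject,condition,rt_ms
s01,control,512
s01,treatment,478
s02,control,601
s02,treatment,555"

# Explicit types: state what each column should be instead of trusting defaults
trials <- read.csv(
  text = csv_text,
  colClasses = c(subject = "character", condition = "factor", rt_ms = "numeric")
)
str(trials)

# Subset as needed, inside the computation, rather than storing
# control_trials, treatment_trials, ... in separate variables
mean_rt_by_condition <- tapply(trials$rt_ms, trials$condition, mean)

# Named arguments make the intent obvious at a glance
rounded_means <- round(x = mean_rt_by_condition, digits = 1)
print(rounded_means)
```

`str(trials)` is a cheap way to confirm that each column really has the type you intended before any analysis runs.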

Using R-markdown

We have been using R-markdown throughout this class: the website, lecture notes, and lab notes were all written in R-markdown (as are these instructions). R-markdown lets you embed code and figures in easy-to-read text documents, which makes it much easier to present data to collaborators, to yourself, and to us (who will be grading your assignments!).

You should already have R-markdown and knitr installed, but in case you don’t:

To install R-markdown, execute:

install.packages("rmarkdown")

You will also want to install the knitr package, which works with R-markdown to turn Rmd files into pretty HTML files.

install.packages("knitr")

In broad strokes, you will want to:

  1. Create an Rmd file by going to “File,” selecting “New File,” and then “R Markdown.” Alternatively, modify this file.
  2. Write text as you would in a Word or plain-text document. If you want to add formatting, check the cheat sheet or the tutorial/basics.
  3. Insert a code chunk by pressing “Command + Option + I” on a Mac or “Control + Alt + I” on a PC. This creates a code block demarcated by three backticks, starting with ```{r} and ending with ```, shaded grey in RStudio. When you knit the file, R-markdown will execute any code you’ve written in this space. See here for more details.
  4. Click “Knit” (labelled “Knit HTML” in older versions of RStudio) in the toolbar to turn your Rmd file into a pretty HTML file; it will be created in the same folder.
  5. NOTE: please set echo=TRUE in your global chunk options so that your code is printed alongside its output in the rendered HTML.
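For step 5, one common approach (a suggestion, not a course requirement) is to set global chunk options in a setup chunk near the top of the Rmd file, for example one whose header is {r setup, include=FALSE}. This sketch shows the R code that would go inside that chunk; it uses knitr’s `opts_chunk$set()`.

```r
# Place this inside the first code chunk of your Rmd file.
# echo = TRUE prints each chunk's code above its output in the rendered HTML.
knitr::opts_chunk$set(echo = TRUE)

# Individual chunks can still override the global setting in their header,
# e.g. {r, echo=FALSE} hides the code for that one chunk only.
```

Setting the option once at the top means you don’t have to remember to add echo=TRUE to every chunk.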

Here are some useful links:
  • A brief tutorial
  • R markdown cheat sheet
  • More thorough reference