1 Understanding R
This course builds up your R knowledge step by step, with many applications. We focus particularly on applications for economists and other quantitative social scientists. In this chapter we cover the essentials of the programming language R and what we can use it for.
1.1 Using R
R is a statistical programming language based on S. It is most easily used via RStudio. Essentially everything you do in R transforms one object into another using functions. A lot can be done with the basic functions, but the real strength comes from a large library of user-written functions, which you can add in the form of packages.
1.2 What can you do with R?
R can be used for many tasks:
- Load and manipulate data from almost any source
- Make descriptive statistics and advanced graphs
- Fit all sorts of statistical and econometric models
- Easily write simulations for statistical or other types of models
- Write your own functions/programs and share them
- Interact with other languages such as C, HTML, etc.
- Perform other tasks such as text recognition, web scraping, GIS,…
- This book is written in RStudio using RMarkdown + R + KnitR!
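To make a few of these tasks concrete, here is a minimal sketch that simulates data, computes descriptive statistics, and fits a simple regression using base R only. The variable names and the "true" model (intercept 1, slope 2) are assumptions made up for illustration.

```r
# Simulate a small data set and fit a linear model (base R only).
set.seed(42)                 # make the random draws reproducible
n <- 200                     # hypothetical sample size
x <- rnorm(n)                # hypothetical explanatory variable
y <- 1 + 2 * x + rnorm(n)    # assumed true model: intercept 1, slope 2

summary(x)                   # descriptive statistics for x
fit <- lm(y ~ x)             # fit the model by ordinary least squares
coef(fit)                    # estimated intercept and slope
```

The estimates should land close to, but not exactly on, the assumed values, because the data contain random noise.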
1.3 Installing R
Let’s start by downloading R itself and RStudio, an IDE (integrated development environment) optimized for R. Using an IDE makes it easier to write, run, and organize your code.
You can download R from http://www.r-project.org, and RStudio from http://www.rstudio.com.
Let’s have a first look at RStudio. Note the four panes that divide the screen: the script pane (top left), the console pane (bottom left), the files pane (bottom right), and the environment pane (top right).
1.4 A First Look at Using R
Activity
Open RStudio and just type simple math commands at the console prompt (bottom left).
1+1
a <- 1
b <- 2
a + b
sum(a,b)
Usually we write code into script files (top left) and run the current line or selection with CTRL + Enter. To open a new R script in RStudio, click on the plus button or press CTRL + SHIFT + N.
Two other important keyboard shortcuts are:
- CTRL + SHIFT + ENTER to run all lines, and
- CTRL + ALT + B to run from the beginning of the script up to the current line.
You can find out more about shortcuts in RStudio here: http://www.rstudio.com/ide/docs/using/keyboard_shortcuts.
1.5 A Closer Look at Objects and Functions in R
In R, we create objects by assigning a value to a name. With a <- 1, we created an object of value 1 and called it “a”. However, objects can contain more than one element. E.g. a <- c(1,2,3) assigns the elements 1, 2, and 3 to a vector called a.
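As a small illustration of working with the vector a from the text, the sketch below inspects it, does element-wise arithmetic, and then overwrites it. The name doubled is a hypothetical choice.

```r
a <- c(1, 2, 3)     # combine three numbers into one vector
length(a)           # number of elements: 3
a[2]                # second element: 2
doubled <- a * 2    # arithmetic is applied element by element
doubled             # 2 4 6
a <- "text"         # assigning to the same name overwrites the old object
class(a)            # "character"
```

Note that R does not warn you when an assignment replaces an existing object.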
Functions take and usually return objects. In fact, c() is a function. To see and understand a function’s syntax, we can open its help file. For example, ?list will open the help file of the list() function. If you want to keep a function’s result in memory, store it in a (new) object, e.g. sum_of_a <- sum(a).
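The difference between printing a result and storing it can be sketched as follows. The call to args() is an addition not mentioned in the text; it is a quick way to see which arguments a function accepts without opening the help file.

```r
a <- c(1, 2, 3)
sum(a)                # the result is printed, but not kept
sum_of_a <- sum(a)    # the result is stored; nothing is printed
sum_of_a              # typing the name prints the stored value: 6
args(paste)           # show the arguments that paste() accepts
```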
Besides predefined functions like sum, you can write your own functions with my_function <- function(x,y) {...}. Moreover, you can nest functions, as in sum(unique(as.numeric(x))), but always remember that R evaluates nested calls from the inside out. Many R functions are generic (polymorphic): they behave differently depending on the class of the object they are called on. For example, summary() gives very different output depending on what you ask it to summarize.
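A short sketch of these three ideas follows. The body of my_function and the example input x are invented for illustration; only the function names come from the text.

```r
# A user-written function (the body is a hypothetical example).
my_function <- function(x, y) {
  x + 2 * y                   # the last evaluated expression is returned
}
my_function(1, 2)             # 5

# Nested calls are evaluated from the inside out.
x <- c("1", "2", "2", "3")    # numbers stored as text
sum(unique(as.numeric(x)))    # as.numeric -> 1 2 2 3, unique -> 1 2 3, sum -> 6

# summary() is generic: the output depends on the class of the input.
summary(c(1, 2, 3))                 # numeric vector: quartiles and mean
summary(factor(c("a", "b", "a")))   # factor: counts per level
```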
1.6 Some Conventions in R
The assignment operator <-
By convention, we don’t use the equal sign (=) to assign content to a new object. Instead, R uses an arrow, <-, to assign something on the right to a name on the left. We do this to reserve the equal sign for passing arguments to functions and, doubled (==), for logical comparisons.
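The three roles can be seen side by side in this minimal sketch. The values are arbitrary examples.

```r
a <- 1                                 # <- assigns the value 1 to the name a
a == 1                                 # == compares two values: TRUE
b <- seq(from = 2, to = 10, by = 2)    # = passes named arguments to a function
b                                      # 2 4 6 8 10
```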
1.7 Finding Help
When we start learning R, the learning curve is very steep (much like when we learn an actual language). However, there are many ways to find help. To find help within R, type help(function_name) or ?function_name, replacing function_name with the name of the function. In addition, R can show you a number of examples for most functions. To run them, type example(function_name).
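For instance, the help tools could be used on mean() as sketched below. The last two calls, help.search() and apropos(), go slightly beyond the text: they are base R tools for the case where you do not yet know the name of the function you need.

```r
help(mean)                           # open the help page for mean()
?mean                                # shorthand for the same thing
example(mean)                        # run the examples from the help page
help.search("standard deviation")    # search all help pages for a phrase
apropos("mean")                      # list objects whose name contains "mean"
```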
If you find the help files provided within R hard to understand, which may be the case in the beginning, it makes sense to look for help online! Many people are learning R and there are countless FAQs and tutorials out there. An important website for slightly more complicated questions is: http://stackoverflow.com/questions/tagged/r. Always remember: There are nearly always multiple solutions to the same problem!
1.8 Packages
A great advantage of R is that it is open source. This means that anyone can write and share new packages. All users can install packages from CRAN, the central R package repository.
Before we can use a package, we have to:
- Install a package (only once)
- Load the package (every time we start a new session)
The R community recommends loading packages with library() rather than require(). Remember to update packages regularly.
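A minimal sketch of this workflow is shown below. The package name "dplyr" is only an example, and the install and update lines are commented out so that the snippet runs without internet access. The tools package is used for the loading step because it ships with every R installation.

```r
# install.packages("dplyr")    # install once ("dplyr" is just an example name)
library(tools)                 # load a package for the current session
file_ext("results.csv")        # use a function from the loaded package: "csv"
# update.packages()            # update installed packages from time to time
```

If library() is called for a package that is not installed, R stops with an error, which makes a missing package easy to notice.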
Exercise 1
Question 1: Simple Objects and Functions
- Create 4 new objects, name them from a to d
  - Save the string "Just" in the first object
  - Save the number 4 in the second object
  - Save the string "Fun" in the third object
  - Apply the paste() function to all three, save the result in d
- Now call the print() function on your object d
Answer
### 1. Create 4 new objects, name them from a to d ###
# Save the string "Just" in the first object
a <- "Just"
# Save the Number "4" in the second object
b <- 4
# Save the string "Fun" in the third object
c <- "Fun"
# Apply the "paste()" function to all the three, save them in d.
# Include also the command sep = " " when you call the function
d <- paste(a,b,c,sep=" ")
# Now call the print() function on your object d:
print(d)
Question 2: Use some basic mathematical functions
- Create two vectors, call them a and b
  - One should contain all even numbers between 1 and 10 (use the c() function)
  - The other should contain all integers from 1 to 100 (use 1:100 to do so)
- Calculate the mean and the sum of both vectors. Store them in new objects.
- Finally,
  - multiply the sum and the mean of a,
  - subtract b’s mean from its sum,
  - divide b’s sum by a’s sum.
You don’t need to save the results this time.
Answer
### 2. Use some basic mathematical functions ###
# Create two vectors, choose your own names:
# One should contain all even numbers between 1 to 10 [use the c() function]
# The other should contain all integers from 1 to 100 [use "1:100" to do so]
a <- c(2,4,6,8,10)
b <- 1:100
# Alternative way - the seq() function:
a <- seq(2,10,by=2)
b <- seq(1,100,by=1)
# Calculate the mean and the sum of both vectors. Store them in new elements
sum.a <- sum(a)
mean.a <- mean(a)
sum.b <- sum(b)
mean.b <- mean(b)
# Multiply the sum and the mean of a, subtract b's mean from its sum,
# and then divide b's sum by a's sum. You don't need to save the results this time
sum.a * mean.a
sum.b - mean.b
sum.b / sum.a