– 11 February 2016

– Author: Jin Hyun Ju (jj328@cornell.edu)

```
getwd() # check current working directory
setwd("./") # set working directory
dir() # check what is in your directory
dir.create("./test_dir") # In case you forgot to create a directory
```

```
QG16.lab.2 <- read.table("../Lab1/QG16-lab1-data.csv", sep = ",", header = TRUE) # Reading data from a csv file
QG16.lab.2.only.a <- subset(QG16.lab.2, factor1 == "a") # subsetting a data frame
```

```
write.table(QG16.lab.2.only.a, file = "./QG16_subset_only_a.csv", sep = ",", quote= FALSE, row.names=FALSE)
# the quote options remove the "" of the entries. try it with quote = TRUE and see how it is different.
# row.names = FALSE eliminates the numbers in front of each row
```

“Everything that exists is an object. Eveything that happens is a function call.” - John Chambers

We learned that a function is something that takes in an input and gives you an output.

The input for an R function is written between “( )” of a function, and the output(s) are the result(s) returned by the function.

R has many built in functions for commonly used methods in statistics.

```
# Examples of built in functions
example.vector1 <- c(5,2,3,7,1,1,2,9,9)
# a function that calculates the mean
mean(example.vector1)
```

`[1] 4.333333`

```
# a function that summarizes the counts
table(QG16.lab.2$factor1)
```

```
a b
50 50
```

- We can also build custom functions.

```
# the syntax for declaring functions, note the {} after function()
log10_add <- function(input1,input2){ # all the inputs are specified within the ( )
cat("This is a custom function \n")
cat("The inputs are = ",input1,input2,"\n") # showing you the inputs
output <- log10(input1) + log10(input2) # creating an output within the function
cat("The output is = ",output,"\n") # print the output
return(output) # return specifies the output
}
# Now we can call our custom functions like this
log10_add(100,1000)
```

```
This is a custom function
The inputs are = 100 1000
The output is = 5
```

`[1] 5`

```
# Note that the variable output is not created in our workspace
ls()
```

```
[1] "example.vector1" "log10_add" "QG16.lab.2"
[4] "QG16.lab.2.only.a"
```

```
# in order to save the result of a function to a variable we have to assign it to a variable
test.output <- log10_add(100,1000)
```

```
This is a custom function
The inputs are = 100 1000
The output is = 5
```

`test.output`

`[1] 5`

**Question 1**

- Can you guess what is going to happen?

```
x <- 11
test_function <- function(y){
output <- x + y
return(output)
}
test_function(2)
```

```
test_function2 <- function(x,y,z){
output1 <- x + y + z
output2 <- x * y * z
output3 <- x ^ y ^ z
}
x <- test_function2(1,2,3)
print(x)
```

```
x <- 10
test_function3 <- function(){
x <- 20
}
test_function3()
print(x)
```

We can also use functions by installing published packages if somebody else did the hard work for us.

Let’s see what happens if we try to run a function called mutate().

`mutate()`

- The mutate function is part of the “dplyr” package, so we need to install it first to be able to use the function.

```
# We can install packages that are published on CRAN by using this function
install.packages("dplyr") # in this case we are installing a package called "dplyr"
# you will need an internet connection to install a package using this function
```

- So, does it work now since we have it installed?

`mutate(QG16.lab.2,mean = (data1+data2+data3)/3)`

- Having the package installed does not mean you can use the function. You have to load the package into your current R session in order to use it.

```
# Once the installation is finished you can load the functions into the workspace
library(dplyr)
```

```
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
```

```
# or by
require(dplyr)
# Now we can use the functions from dplyr
QG16.lab.2 <- mutate(QG16.lab.2,mean = (data1+data2+data3)/3)
head(QG16.lab.2)
```

```
genename data1 data2 data3 factor1 factor2 mean
1 gene1 1.42866369 -0.157846249 1.3136225 a info1 0.86147998
2 gene2 -0.58165021 0.593995373 -0.2319272 b info2 -0.07319403
3 gene3 -1.03955628 1.083860155 0.7050685 a info3 0.24979079
4 gene4 0.58381526 -0.125866753 1.2627935 b info4 0.57358068
5 gene5 0.04377365 -0.002239948 0.1445219 a info5 0.06201853
6 gene6 0.26732986 -1.756292916 0.5459083 b info6 -0.31435160
```

```
# Let's try another function from dplyr
filter(QG16.lab.2,factor1 == "b" & factor2 =="info8") # Note the & operator which is an AND operator
```

```
genename data1 data2 data3 factor1 factor2 mean
1 gene8 0.5110170 1.3689777 0.9548534 b info8 0.94494937
2 gene18 1.0239683 0.1330547 0.9028215 b info8 0.68661481
3 gene28 0.8516973 -1.0032102 -1.2986571 b info8 -0.48338999
4 gene38 0.2830858 -0.6795029 1.8192317 b info8 0.47427152
5 gene48 -1.1049222 0.5409523 0.3202559 b info8 -0.08123801
6 gene58 0.7904040 0.5096778 -1.6116118 b info8 -0.10384333
7 gene68 -1.4591909 1.5071098 0.1016278 b info8 0.04984887
8 gene78 1.1132743 0.7142994 -0.2152775 b info8 0.53743207
9 gene88 -0.9999763 1.2684340 -0.9384624 b info8 -0.22333490
10 gene98 1.4704641 0.8852491 1.2723985 b info8 1.20937057
```

At this point you might be wondering why we need to install all those packages and load them? Shouldn’t it be less complicated if we just have all these functions by default, or automatically loaded onto the workspace after installing?

The reason we use the library() or require() function to load specific packages is because it will slow down R if we loaded everything we have installed at the beginning as we install more packages.

You can think of your R session as your workbench, and the packages as specific tools in your toolbox. You wouldn’t empty out the toolbox onto your workbench for a simple task like sanding a surface. The only thing you need is some sandpaper, and any additional tool just occupies additional space and contributes only to the aesthetics of your workspace (or acts against it).

For loops are mainly used in cases where you want to apply a function iteratively over a vector, matrix or data frame (or just simply do a task multiple times)

Structure of the code = for (counter in vector) { do stuff }

Syntax of a simple for loop (Printing out the process helps a lot)

```
N <- 3
for ( i in 1:N ){
cat("Processing loop number = ",i,"\n")
}
```

```
Processing loop number = 1
Processing loop number = 2
Processing loop number = 3
```

- You can use any legal variable name for the counter

```
for ( anything_goes in c(1,3,5) ){
cat("Processing loop number = ",anything_goes,"\n")
}
```

```
Processing loop number = 1
Processing loop number = 3
Processing loop number = 5
```

You can also create a loop within a loop

For this example, let’s say that you want to create multiple samples with different sample sizes

```
for ( outer in 1:3 ){
cat("Processing Outer Loop #", outer, "\n")
for ( inner in 1:2 ){
cat("Processing |_ Inner Loop #", inner, "\n")
}
}
```

```
Processing Outer Loop # 1
Processing |_ Inner Loop # 1
Processing |_ Inner Loop # 2
Processing Outer Loop # 2
Processing |_ Inner Loop # 1
Processing |_ Inner Loop # 2
Processing Outer Loop # 3
Processing |_ Inner Loop # 1
Processing |_ Inner Loop # 2
```

**Question2**

- What is the final value of N ?

```
N <- 3
for( i in 1:N){
cat("Processing loop = ", i, "\n")
N <- N + 1
}
print(N)
```

- If you want to modify each element of a vector by a scalar value you can use the math operations that we have learned last week.

`example.vector1`

`[1] 5 2 3 7 1 1 2 9 9`

`2 * example.vector1`

`[1] 10 4 6 14 2 2 4 18 18`

`1 + example.vector1`

`[1] 6 3 4 8 2 2 3 10 10`

`example.vector1 ^2`

`[1] 25 4 9 49 1 1 4 81 81`

- If you are interested in the dot product of two vectors you have to use a special operator

`example.vector1 %*% example.vector1`

```
[,1]
[1,] 255
```

- The same applies for matrices

```
example.matrix1 <- matrix(c(1,1,1,2,2,2), nrow = 2, ncol = 3, byrow= TRUE)
2 * example.matrix1
```

```
[,1] [,2] [,3]
[1,] 2 2 2
[2,] 4 4 4
```

`example.matrix1 ^ 2`

```
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 4 4 4
```

`example.matrix1 - 1`

```
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 1 1 1
```

- Here is how you can do matrix calculations

```
# t() is transposing the matrix
example.matrix1 %*% t(example.matrix1)
```

```
[,1] [,2]
[1,] 3 6
[2,] 6 12
```

`# Note the dimensions 2 x 3 %*% 3 x 2 = 2 x 2 `

- Here are some useful functions that can be used in matrix calculations

```
# creating a diagonal matrix with the first input as values on the diagonal
diag(2,nrow = 3)
```

```
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 0 2 0
[3,] 0 0 2
```

`diag(example.vector1)`

```
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 5 0 0 0 0 0 0 0 0
[2,] 0 2 0 0 0 0 0 0 0
[3,] 0 0 3 0 0 0 0 0 0
[4,] 0 0 0 7 0 0 0 0 0
[5,] 0 0 0 0 1 0 0 0 0
[6,] 0 0 0 0 0 1 0 0 0
[7,] 0 0 0 0 0 0 2 0 0
[8,] 0 0 0 0 0 0 0 9 0
[9,] 0 0 0 0 0 0 0 0 9
```

```
# calculating the inverse of a matrix
A <- matrix(c(2,-3,1,0.5),nrow = 2)
solve(A)
```

```
[,1] [,2]
[1,] 0.125 -0.25
[2,] 0.750 0.50
```

```
# we can check this by
A %*% solve(A) # which results in an identity matrix
```

```
[,1] [,2]
[1,] 1 0
[2,] 0 1
```

Can you generate a function that takes a vector with student names with “Weill Cornell” grades as input, and creates a list as an output that has the following entries?

“LetterGrade” should have the converted letter grades of each student in it.

“Overview” should have the number of students for each Letter grades

There should be an entry for each letter grade with name of students that received the corresponding letter grade.

```
scores <- sample(c("NP","P","LP","HP","H"), 20, replace = TRUE)
names(scores) <- paste0("student", 1:length(scores))
standard_letter_grades <- c("H" = "A", "HP" = "B", "LP" = "C", "P" = "D", "NP" = "F")
```

Your code should also work when the conversion table changes (no hard coding of letter grades are allowed!)

Your function should also be verbose, meaning that it should tell the user how many students are in the input in total, and the breakdown for each grade as well.

**Hints**

- Getting names of vectors

`names(example.vector1)`

- Checking if value is equal to something

```
value == 1
value == "A"
```

- Example output

```
Total number of students = 20
2 students received A
4 students received B
6 students received C
3 students received D
5 students received F
```

```
$Overview
student_letter_grade
A B C D F
2 4 6 3 5
$LetterGrade
student1 student2 student3 student4 student5 student6 student7
"A" "C" "F" "A" "F" "F" "C"
student8 student9 student10 student11 student12 student13 student14
"B" "B" "F" "F" "D" "C" "B"
student15 student16 student17 student18 student19 student20
"B" "C" "C" "C" "D" "D"
$A
[1] "student1" "student4"
$B
[1] "student8" "student9" "student14" "student15"
$C
[1] "student2" "student7" "student13" "student16" "student17" "student18"
$D
[1] "student12" "student19" "student20"
$F
[1] "student3" "student5" "student6" "student10" "student11"
```