Quantitative Genomics and Genetics

Computer Lab 3

– 11 September 2014

– Author: Jin Hyun Ju (jj328@cornell.edu)

1. Review

  • Instructions for submitting the R code for homework 2

  • What are functions?

  • How do we define functions?

  • Installing and loading packages

  • As always, set your working directory first!

setwd("~/Dropbox/Quantitative_Genomics_2014/Computer_lab_3/")

2. Vector and Matrix calculations

  • If you want to modify each element of a vector by a scalar value you can use the math operations that we have learned last week.
example.vector1 <- c(1,2,5,-3,15,20)
2 * example.vector1
[1]  2  4 10 -6 30 40
1+ example.vector1
[1]  2  3  6 -2 16 21
example.vector1 ^2
[1]   1   4  25   9 225 400
  • If you are interested in the dot product of two vectors you have to use a special operator
example.vector1 %*% example.vector1
     [,1]
[1,]  664
  • The same applies for matrices
example.matrix1 <- matrix(c(1,8,-2,5,-1,12), nrow = 2, byrow = T)
2 * example.matrix1
     [,1] [,2] [,3]
[1,]    2   16   -4
[2,]   10   -2   24
example.matrix1 ^2
     [,1] [,2] [,3]
[1,]    1   64    4
[2,]   25    1  144
example.matrix1 - 1
     [,1] [,2] [,3]
[1,]    0    7   -3
[2,]    4   -2   11
  • Here is how you can do matrix calculations
# t() is transposing the matrix
example.matrix1 %*% t(example.matrix1)
     [,1] [,2]
[1,]   69  -27
[2,]  -27  170
# Note the dimensions 2 x 3 %*% 3 x 2  = 2 x 2 
  • Here are some useful functions that can be used in matrix calculations
# creating a diagonal matrix with the first input as values on the diagonal
diag(2,nrow = 3)
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    2    0
[3,]    0    0    2
diag(example.vector1)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    0    0    0    0
[2,]    0    2    0    0    0    0
[3,]    0    0    5    0    0    0
[4,]    0    0    0   -3    0    0
[5,]    0    0    0    0   15    0
[6,]    0    0    0    0    0   20
# calculating the inverse of a matrix
A <- matrix(c(2,-3,1,0.5),nrow = 2)
solve(A)
      [,1]  [,2]
[1,] 0.125 -0.25
[2,] 0.750  0.50
# we can check this by 
A %*% solve(A) # which results in an identity matrix 
     [,1] [,2]
[1,]    1    0
[2,]    0    1

4. Visualization 101

  • Since we are dealing with many data points now it would be a good idea to plot them!

  • Generating a histogram is a good way to get an initial look on the data.

# most basic form
hist(normal.vector)

plot of chunk unnamed-chunk-8

# adding options
# col can be a color name defined in R or a color generated by the function rgb() or a 
# code for a color. (google R colors for more help!)
# breaks specifies the binning of the data
hist(normal.vector, xlab = "X label",main = "Histogram with options",col = "skyblue", breaks = 10)

plot of chunk unnamed-chunk-8

hist(normal.vector, xlab = "X label",main = "Histogram with density line",col = "skyblue", probability = TRUE)
lines(density(normal.vector)) # add a density line to the histogram

plot of chunk unnamed-chunk-8

more.normals <- rnorm(10000, 0,1)
hist(more.normals, xlab = "X label",main = "Bigger sample size",col = "skyblue", probability = TRUE)
lines(density(more.normals)) # add a density line to the histogram

plot of chunk unnamed-chunk-8

  • You can also save the plots directly into a file. (in this case a png file)
# the png function creates a png file at a custom location. 
# you can also specify the resolution and the size of the image, look up ?png for details
png(file = "./Normal_distribution_histogram.png")

more.normals <- rnorm(10000, 0,1)
hist(more.normals, xlab = "X label",main = "Bigger sample size",col = "skyblue", probability = TRUE)
lines(density(more.normals)) 

dev.off() # after plotting on the png file, the png file has to be closed in order to save the plot correctly. Otherwise, subsequent plots will be written to the same png file and that means trouble.
  • Scatter plots are also useful, but we will talk more about them in a future lab.

  • Here are the basics

par(mfrow = c(1,3))
plot(normal.vector)

plot(normal.vector, xlab = "X label", ylab = "Y label", main = "Plot title", pch = 19, col = "skyblue")

plot(normal.vector, xlab = "X label", ylab = "Y label", main = "Plot title",sub ="sub title", type ="l", col = "darkblue")

plot of chunk unnamed-chunk-10

5. for loops

  • For loops are mainly used in cases where you want to apply a function iteratively over a vector, matrix or data frame (or just simply do a task multiple times)

  • Structure of the code = for (counter in vector) { do stuff }

  • Printing out the process helps a lot sometimes.

# Let's say we are interested in generating 20 random samples from a normal distribution

# We are also interested in looking at the samples later, so we want to save the results of the sampling in a matrix. 

# In order to do that we need to create an empty matrix with the appropriate dimensions

# N is going to be the sample size for each run
N <- 100
normal.samples <- matrix(0, nrow = 20, ncol = N)

for (count in 1:20){
  normal.samples[count,] <- rnorm(N, mean = 0,sd = 1)
}


for (count in 1:20){
  
  cat("Creating Sample (N =",N,", mean = 0, sd = 1) #",count,"\n")
  normal.samples[count,] <- rnorm(N, mean = 0,sd = 1)
  
}
Creating Sample (N = 100 , mean = 0, sd = 1) # 1 
Creating Sample (N = 100 , mean = 0, sd = 1) # 2 
Creating Sample (N = 100 , mean = 0, sd = 1) # 3 
Creating Sample (N = 100 , mean = 0, sd = 1) # 4 
Creating Sample (N = 100 , mean = 0, sd = 1) # 5 
Creating Sample (N = 100 , mean = 0, sd = 1) # 6 
Creating Sample (N = 100 , mean = 0, sd = 1) # 7 
Creating Sample (N = 100 , mean = 0, sd = 1) # 8 
Creating Sample (N = 100 , mean = 0, sd = 1) # 9 
Creating Sample (N = 100 , mean = 0, sd = 1) # 10 
Creating Sample (N = 100 , mean = 0, sd = 1) # 11 
Creating Sample (N = 100 , mean = 0, sd = 1) # 12 
Creating Sample (N = 100 , mean = 0, sd = 1) # 13 
Creating Sample (N = 100 , mean = 0, sd = 1) # 14 
Creating Sample (N = 100 , mean = 0, sd = 1) # 15 
Creating Sample (N = 100 , mean = 0, sd = 1) # 16 
Creating Sample (N = 100 , mean = 0, sd = 1) # 17 
Creating Sample (N = 100 , mean = 0, sd = 1) # 18 
Creating Sample (N = 100 , mean = 0, sd = 1) # 19 
Creating Sample (N = 100 , mean = 0, sd = 1) # 20 
  • You can also create a loop within a loop

  • For this example, let’s say that you want to create multiple samples with different sample sizes

sample.size <- c(100,1000,1000)

# Lists come in very handy when you want to save results like these (multiple matrices)

sample.mx.list <- list()

# List index is used to save the results in a specific position (in this case 1,2,3) within the list 
list.index <- 1
for (N in sample.size){
  
  # The matrix has to be redefined for every sample size since the size is different.
  normal.samples <- matrix(0, nrow = 10, ncol = N)
  
  cat("Creating 10 samples with size = ",N,"\n")
  for (count in 1:10){
    cat("Creating Sample (N =",N,", mean = 0, sd = 1) #",count,"\n")
    normal.samples[count,] <- rnorm(N, mean = 0,sd = 1)
  }
  
  # The double "square" brackets are telling the list to save the matrix at position = list.index
  sample.mx.list[[list.index]] <- normal.samples
  
  # For the next run we need to move to the next index
  list.index <- list.index + 1
}
Creating 10 samples with size =  100 
Creating Sample (N = 100 , mean = 0, sd = 1) # 1 
Creating Sample (N = 100 , mean = 0, sd = 1) # 2 
Creating Sample (N = 100 , mean = 0, sd = 1) # 3 
Creating Sample (N = 100 , mean = 0, sd = 1) # 4 
Creating Sample (N = 100 , mean = 0, sd = 1) # 5 
Creating Sample (N = 100 , mean = 0, sd = 1) # 6 
Creating Sample (N = 100 , mean = 0, sd = 1) # 7 
Creating Sample (N = 100 , mean = 0, sd = 1) # 8 
Creating Sample (N = 100 , mean = 0, sd = 1) # 9 
Creating Sample (N = 100 , mean = 0, sd = 1) # 10 
Creating 10 samples with size =  1000 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 1 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 2 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 3 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 4 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 5 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 6 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 7 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 8 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 9 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 10 
Creating 10 samples with size =  1000 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 1 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 2 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 3 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 4 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 5 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 6 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 7 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 8 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 9 
Creating Sample (N = 1000 , mean = 0, sd = 1) # 10 
# Checking the dimensions will help us to find out if our code was doing the right thing or not

# To access the first element of the list we use double square brackets [[]]
dim(sample.mx.list[[1]])
[1]  10 100
dim(sample.mx.list[[2]])
[1]   10 1000
dim(sample.mx.list[[3]])
[1]   10 1000