Quantitative Genomics and Genetics

Computer Lab 4

– 18 September 2014

– Author: Jin Hyun Ju (jj328@cornell.edu)

1. Logical expressions and some useful functions

  • To find out which elements of a vector are greater or smaller than a certain value, use the comparison operators >, <, >= and <=. The result is a logical vector with one TRUE or FALSE per element
example.vector <- seq(1,25,by= 2)
example.vector
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25
example.vector > 10
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[12]  TRUE  TRUE
example.vector >= 10
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[12]  TRUE  TRUE
example.vector <= 5
 [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE FALSE
  • The same applies to matrices
example.mx <- matrix(c(2,5,7,-2,-5,-10), ncol = 3, byrow = TRUE)
example.mx > 5
      [,1]  [,2]  [,3]
[1,] FALSE FALSE  TRUE
[2,] FALSE FALSE FALSE
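  • Because R treats TRUE as 1 and FALSE as 0 in arithmetic, the logical results above can be summarised directly. The lines below are a minimal sketch of that idea; the variable names (n.above.10, frac.above.10, any.above.5, all.above.5) are only illustrative and are not part of the lab.

```r
example.vector <- seq(1, 25, by = 2)
example.mx <- matrix(c(2, 5, 7, -2, -5, -10), ncol = 3, byrow = TRUE)

# sum() of a logical vector counts the TRUE values
n.above.10 <- sum(example.vector > 10)

# mean() of a logical vector gives the fraction of TRUE values
frac.above.10 <- mean(example.vector > 10)

# any() asks whether at least one entry is TRUE, all() whether every entry is
any.above.5 <- any(example.mx > 5)
all.above.5 <- all(example.mx > 5)

n.above.10     # 8
any.above.5    # TRUE
all.above.5    # FALSE
```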
  • You can also look for a specific value with ==
example.vector == 3
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE FALSE
  • ! negates a logical value (TRUE becomes FALSE and vice versa), so != means "not equal to"
example.vector != 3
 [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[12]  TRUE  TRUE
  • You can also combine them with AND or OR operators
# Example of an AND operator
example.vector >5 & example.vector < 10
 [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE FALSE
# if you want to see the actual elements 
example.vector[example.vector >10 & example.vector < 20]
[1] 11 13 15 17 19
# Example of an OR operator
example.vector > 10 | example.vector < 20
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
example.vector < 10 | example.vector > 20
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE
[12]  TRUE  TRUE
  • If you want to check whether a certain element is present or absent in a vector use the %in% operator
fruits <- c("banana","apple","strawberry","peach","mango")

"mango" %in% fruits
[1] TRUE
"durian" %in% fruits
[1] FALSE
  • We can see what the ! operator is doing by wrapping the previous expression with a !()
!("durian" %in% fruits)
[1] TRUE
  • You can find out the index of a certain entry in a vector by using the which() function
which(fruits == "apple")
[1] 2
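  • which() accepts any logical vector, so it also works with the comparison operators from above. A short sketch (the names idx.above.10 and no.match are illustrative only); note that when nothing matches, which() returns an empty integer vector rather than an error.

```r
example.vector <- seq(1, 25, by = 2)
fruits <- c("banana", "apple", "strawberry", "peach", "mango")

# positions of all elements greater than 10
idx.above.10 <- which(example.vector > 10)
idx.above.10            # 6 7 8 9 10 11 12 13

# no element matches, so the result has length 0
no.match <- which(fruits == "durian")
length(no.match) == 0   # TRUE
```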
  • If you want to compare two vectors, %in% tests each element of the first vector for membership in the second
fruits2 <- c("orange","banana","durian","cherry","mango","apple")

fruits2 %in% fruits
[1] FALSE  TRUE FALSE FALSE  TRUE  TRUE
# show me the position
which(fruits2 %in% fruits)
[1] 2 5 6
# show me the elements
fruits2[fruits2 %in% fruits]
[1] "banana" "mango"  "apple" 
# There is also a function for this
intersect(fruits2, fruits)
[1] "banana" "mango"  "apple" 
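  • Base R also has setdiff() and union() as companions to intersect(). The sketch below shows the elements found only in fruits2, first with setdiff() and then with a negated %in%; the variable names are illustrative only.

```r
fruits  <- c("banana", "apple", "strawberry", "peach", "mango")
fruits2 <- c("orange", "banana", "durian", "cherry", "mango", "apple")

# elements of fruits2 that are NOT in fruits
only.in.fruits2 <- setdiff(fruits2, fruits)
only.in.fruits2                       # "orange" "durian" "cherry"

# the same result using ! and %in%
same.result <- fruits2[!(fruits2 %in% fruits)]

# every distinct element that appears in either vector
all.fruits <- union(fruits, fruits2)
length(all.fruits)                    # 8
```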

2. If / else statements

  • By using if and else statements you can make parts of your script run only when a certain condition is met

  • The structure looks like this

if (condition) {
  do stuff
} else {
  do stuff
}

# OR you can add more levels by using else if

if(condition){
  do stuff
} else if (condition 2){
  do stuff
} else {
  do stuff
}
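  • A runnable version of the else if template, as a minimal sketch. describe.sign() is a hypothetical helper written for this illustration and is not part of the lab. Note that the condition inside if() has to be a single TRUE or FALSE; for a whole vector, reduce it first with any() or all().

```r
# Hypothetical helper: label a single number by its sign
describe.sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

describe.sign(3)    # "positive"
describe.sign(-2)   # "negative"
describe.sign(0)    # "zero"
```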
  • For example, let’s say that we want to generate samples drawn from a standard normal distribution and keep only those whose sample mean is above 0.
kept.samples <- vector()

for(index in 1:10){
  samples <- rnorm(1000,0,1)
  sample.mean <- mean(samples)
  if(sample.mean > 0){
    cat("Keeping sample #",index,"with mean = ",sample.mean,"\n")
    kept.samples <- rbind(kept.samples, samples)
  } else {
    cat("Discarding sample #",index, "with mean = ",sample.mean,"\n")
  }
  
}
Keeping sample # 1 with mean =  0.02061 
Discarding sample # 2 with mean =  -0.01356 
Discarding sample # 3 with mean =  -0.03616 
Keeping sample # 4 with mean =  0.01217 
Keeping sample # 5 with mean =  0.03097 
Discarding sample # 6 with mean =  -0.01935 
Keeping sample # 7 with mean =  0.01314 
Discarding sample # 8 with mean =  -0.02584 
Discarding sample # 9 with mean =  -0.007784 
Keeping sample # 10 with mean =  0.01754 
# check how many samples were kept (5 in this run)
dim(kept.samples)
[1]    5 1000
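  • The same filtering can also be written without a loop by combining it with the logical indexing from section 1. This is only an alternative sketch, not the approach used above: it draws all samples at once into a matrix (one sample per row) and keeps the rows whose mean is above 0. The seed value and the names n.samples, sample.size, all.samples and keep are assumptions made for the illustration.

```r
set.seed(1)   # arbitrary seed, only so that reruns give the same draws

n.samples   <- 10
sample.size <- 1000

# one row per sample, all drawn from a standard normal distribution
all.samples <- matrix(rnorm(n.samples * sample.size, 0, 1), nrow = n.samples)

# mean of each sample, then a logical vector marking the ones to keep
sample.means <- rowMeans(all.samples)
keep <- sample.means > 0

# drop = FALSE keeps the result a matrix even if only one row is kept
kept.samples <- all.samples[keep, , drop = FALSE]
dim(kept.samples)
```

  Because the draws are random, the number of kept rows changes with the seed; sum(keep) reports it for a given run.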

Exercise

  • Create a function that can generate X number of random samples with sample size N from either a normal or uniform distribution. (You don’t have to specify additional parameters, just use the default values)

  • For example, if I tell it to generate 10 random samples (N=1000) from a uniform distribution, the output should be a matrix containing all 10 samples drawn from a uniform distribution.