Quantitative Genomics and Genetics 2016

Computer Lab 3

– 18 February 2016

– Author: Jin Hyun Ju (jj328@cornell.edu)

Announcements

Homework submission guidelines for NYC students

  1. File name format = QG16_HW#_Firstname_Lastname.pdf

  2. Submitting R code

  • Use R Markdown, and set all working directories to “./” if there are input datasets

  • Submit your .Rmd script (HTML optional). There will be penalties for scripts that fail to compile, except for errors caused by missing dependencies (e.g., packages not installed on my computer). Just make sure to mention which packages your script calls.

[Review]

  • Instructions for submitting the R code for homework 3

  • How do we define functions?

  • Installing and loading packages

  • As always, set your working directory first!

1. Visualization 101

plot()

  • plot() generates an X-Y plot

  • The basic syntax is plot(x-axis, y-axis)

  • You can add more information to the plot with axis labels and titles.

x <- seq(-5, 5, by = 0.5) 

y <- x - 3

plot(x, y, 
     xlab = "X axis label here", 
     ylab = "Y axis label here", 
     main = "Plot Title here")

# These two lines draw the red dashed lines (lty = 2) marking x = 0, y = 0
# the abline function adds custom lines to the plot generated above
abline(h = 0, lty = 2, col = 'red')
abline(v = 0, lty = 2, col = 'red')

  • You can also change the points into lines with the type option.

  • Ranges for x and y can be specified with xlim and ylim options

plot(x, y,                       # input data
     type = 'l',                 # specifying type: 'p' for points, 'l' for lines
     xlab = "X axis label here", # x axis label
     ylab = "Y axis label here", # y axis label
     main = "Plot Title here",   # main title
     xlim = c(-10,10),           # setting range for x
     ylim = c(-10,10),           # setting range for y
     col = 'blue')               # set the color of the plotted line

# col can be a color name defined in R, a color generated by the function rgb(),
# or a hexadecimal color code. (search for "R colors" for more help!)

# the abline function adds custom lines to the plot generated above
abline(h = 0, lty = 2, col = 'red')
abline(v = 0, lty = 2, col = 'red')
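
As a small sketch of the three ways to specify col mentioned in the comments above (a color name, rgb(), or a color code), the following draws the same data several times. The variable names by_name, by_rgb and by_hex are only illustrative, and the alpha argument of rgb() is shown as an optional extra for semi-transparent points.

```r
x <- seq(-5, 5, by = 0.5)
y <- x - 3

by_name <- "blue"        # a color name defined in R (see colors() for the full list)
by_rgb  <- rgb(0, 0, 1)  # red, green and blue intensities, each between 0 and 1
by_hex  <- "#0000FF"     # a hexadecimal color code

# all three values describe the same blue
plot(x, y, col = by_name, pch = 19, ylim = c(-10, 10))
points(x, y + 2, col = by_rgb, pch = 19)
points(x, y + 4, col = by_hex, pch = 19)

# alpha sets the opacity (0 = fully transparent, 1 = fully opaque)
points(x, y + 6, col = rgb(0, 0, 1, alpha = 0.3), pch = 19)
```

Since rgb() simply returns a hexadecimal string, any of the three forms can be stored in a variable and reused across plots.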

  • You can also plot functions directly with plot().

  • Here is an example that plots the p.d.f for the normal distribution.

normal_dist <- function(x, mu = 0, sigma =1) {
  1/ (sigma * sqrt(2 * pi)) * exp( - ((x - mu)^2) / (2 * sigma ^ 2) )
}

plot(normal_dist, xlim = c(-5,5))

  • To change the mean and standard deviation you would have to call it like this:
plot(function(x) normal_dist(x, mu = 2, sigma = 1), xlim = c(-5,5))
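
One way to sanity-check a hand-written density like normal_dist() is to compare it with R's built-in dnorm(). The sketch below assumes the same definition as above; grid and max_diff are illustrative names, and curve() is used as an alternative to plot() for drawing an expression in x.

```r
normal_dist <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-((x - mu)^2) / (2 * sigma^2))
}

# evaluate both functions on the same grid and compare
grid <- seq(-5, 5, by = 0.1)
max_diff <- max(abs(normal_dist(grid, mu = 2, sigma = 1.5) -
                    dnorm(grid, mean = 2, sd = 1.5)))
max_diff  # should be zero up to floating point error

# overlay the two curves; the dashed red line should sit on top of the black one
curve(normal_dist(x, mu = 2, sigma = 1), from = -5, to = 5, ylab = "density")
curve(dnorm(x, mean = 2, sd = 1), add = TRUE, lty = 2, col = "red")
```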

Histograms

  • The hist() function generates histograms

  • Histograms are a good way to get an initial look at the data.

normal.vector <- rnorm(1000) 

# rnorm() generates random points from a normal distribution
# more information about this function will be covered in the next section                     
     
# most basic form
hist(normal.vector)

# adding options

par(mfrow = c(1,2))

hist(normal.vector,                   # input data
     xlab = "X label",                # x axis label
     main = "Histogram with 5 breaks ", # main title 
     col = "skyblue", # setting the fill color for bars
     breaks = 5) # breaks suggests the number of bins (R may adjust it to get round cut points)

hist(normal.vector,                   # input data
     xlab = "X label",                # x axis label
     main = "Histogram with 20 breaks ", # main title 
     col = "skyblue", # setting the fill color for bars
     breaks = 20) # breaks suggests the number of bins (R may adjust it to get round cut points)
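
When breaks is a single number, hist() treats it as a suggestion and picks round cut points, so the number of bars may differ from the number requested. The sketch below inspects what R actually chose by calling hist() with plot = FALSE, and then passes an explicit vector of bin edges to get exact control. The seed and the names h5, h20, edges and h_exact are illustrative assumptions.

```r
set.seed(1)  # fixed seed so the example is reproducible
normal.vector <- rnorm(1000)

# plot = FALSE returns the histogram object without drawing it
h5  <- hist(normal.vector, breaks = 5,  plot = FALSE)
h20 <- hist(normal.vector, breaks = 20, plot = FALSE)

length(h5$counts)   # number of bins R actually used for breaks = 5
length(h20$counts)  # number of bins R actually used for breaks = 20
h5$breaks           # the cut points R picked

# for exact control, give the bin edges themselves; they must cover all the data
edges <- seq(floor(min(normal.vector)), ceiling(max(normal.vector)), by = 0.5)
h_exact <- hist(normal.vector, breaks = edges, plot = FALSE)
length(h_exact$counts)  # always length(edges) - 1
```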

  • Setting the option probability to TRUE will change the y axis from counts to probability densities.

  • If you would like a density curve added, you can use lines() with density()

hist(normal.vector,                   # input data
     xlab = "X label",                # x axis label
     main = "Histogram with Density", # main title 
     col = "skyblue", # setting the fill color for bars
     probability = TRUE) # Density instead of counts 

lines(density(normal.vector)) # add a density line to the histogram
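
To see what the density scale means, the following sketch checks that the bar areas (height times width) of a histogram add up to 1, which is why a density curve can be drawn on the same axes. The seed and the names h and bar_areas are illustrative; the density values are stored in the histogram object regardless of how it is plotted.

```r
set.seed(1)  # fixed seed so the example is reproducible
normal.vector <- rnorm(1000)

h <- hist(normal.vector, plot = FALSE)  # compute the histogram without drawing it

# area of each bar = density (height) * bin width
bar_areas <- h$density * diff(h$breaks)
sum(bar_areas)  # total area under the histogram

# each bar area is also the fraction of points that fall in that bin
all.equal(bar_areas, h$counts / sum(h$counts))
```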

  • You can also save plots directly to a file (in this case a png file).
# the png function creates a png file at a custom location. 
# you can also specify the resolution and the size of the image, look up ?png for details
png(filename = "./Normal_distribution_histogram.png")

hist(normal.vector, 
     xlab = "X label", 
     main = "Plot saved on disk", 
     col = "skyblue", 
     probability = TRUE)

lines(density(normal.vector)) # add a density line to the histogram

# after plotting on the png file, the png device has to be closed in order to save the plot correctly.
# Otherwise, subsequent plots will be written to the same png file and that means trouble.
dev.off()
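
Because forgetting dev.off() is an easy mistake, one possible pattern is to wrap the open/plot/close steps in a function and register dev.off() with on.exit(), so the device is closed even if the plotting code stops with an error. This is only a sketch: save_histogram() is a hypothetical helper, the width, height and res values are arbitrary choices, and the file is written to a temporary location rather than the working directory. It assumes png support is available in your R installation.

```r
save_histogram <- function(values, path) {
  # open the png device; width and height are in pixels, res in pixels per inch
  png(filename = path, width = 800, height = 600, res = 120)
  # close the device when the function exits, whether or not an error occurred
  on.exit(dev.off(), add = TRUE)

  hist(values,
       xlab = "X label",
       main = "Plot saved on disk",
       col = "skyblue",
       probability = TRUE)
  lines(density(values))

  invisible(path)
}

set.seed(1)  # fixed seed so the example is reproducible
normal.vector <- rnorm(1000)

devices_before <- length(dev.list())  # number of open graphics devices
out_file <- tempfile(fileext = ".png")
save_histogram(normal.vector, out_file)
devices_after <- length(dev.list())

file.exists(out_file)            # the file was written
devices_after == devices_before  # no device was left open
```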