## Quantitative Genomics and Genetics

### Computer Lab 7

– 9 October 2014

– Author: Jin Hyun Ju (jj328@cornell.edu)

## Mini expression Quantitative Trait Loci (eQTL) Analysis with Real Data

Today we are going to run a mini-eQTL analysis with some real data downloaded from the HapMap Project (http://hapmap.ncbi.nlm.nih.gov/index.html.en).

Since we would probably need something more powerful than a standard laptop to analyze the complete dataset, I have downloaded the genotype and phenotype data for a single population (YRI) and scaled down the data to 400 genotypes and 10 phenotypes for 107 individuals.

You will find 4 files posted with this lab note:

• HapMap_phenotypes.tsv = 10 phenotypes (gene expression levels) for 107 individuals
• HapMap_genoytpes.tsv = 400 genotypes (coded as -1,0,1) for 107 individuals
• HapMap_gene_info.tsv = Partial gene information (entrez gene id, gene symbol, position)
• HapMap_snp_info.tsv = Partial SNP information (chromosome, position)

All the information in the files is tab-separated (as you can guess from the .tsv extension).
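As a minimal sketch of what loading and checking one of these files might look like, the snippet below reads a tab-separated file with the standard `csv` module and reports its dimensions. Python is an assumption made for illustration, and the same check applies in whatever language you use for the lab. The helper names `read_tsv` and `describe` are hypothetical, and the code assumes that the first line of the file is a header row, which the lab note does not confirm.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file and return (header, rows) as lists of strings.

    Assumption: the first line is a header row. The real files may be
    laid out differently, so inspect them before relying on this.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = [row for row in reader if row]  # skip completely blank lines
    return header, rows


def describe(header, rows, n_preview=3):
    """Print the dimensions and the first few rows as a quick sanity check."""
    widths = sorted({len(row) for row in rows})
    print(f"{len(rows)} data rows, {len(header)} header fields, row widths: {widths}")
    for row in rows[:n_preview]:
        print(row[:5])  # show only the first five fields of each row
```

Calling `describe(*read_tsv("HapMap_phenotypes.tsv"))` would then show whether the row and column counts match the 107 individuals and 10 phenotypes described above. If the header has one field fewer than the data rows, the first column probably holds row names.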

### Exercise

1. Read in the data from the files. Check the dimensions and look at the first few lines to make sure the data is in the format that you expect.

2. Use the function that you created in last week's lab to calculate the p-value for $$\beta_a$$ (the additive effect) for every combination of phenotype and genotype. The total number of p-values will therefore be 10 × 400 = 4000.

3. Find the minimum p-value and identify the phenotype and genotype pair that produced it.

4. Draw a Manhattan plot for the phenotype that you identified in step 3 (p-values for every genotype). You will see that the plot looks more like the ones we have seen in lecture than the plots you generated from simulated data.

5. Using the information files (SNP and gene info), look up the gene and the SNP of the most significant pair and print out their positions. Are the gene and the SNP located close to each other?

Your output should look something like this:

Gene = Y (you have to figure out which one it is), position = Chr # Start 123000 End 124000
Genotype = X (you have to figure out which one it is), position = Chr # Start 123000 End 124000
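
The bookkeeping around steps 2 to 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference solution. Python is assumed for illustration only, and all function names (`scan_pairs`, `most_significant`, `manhattan_heights`, `format_position`) are hypothetical. The argument `pval_fn` stands in for the p-value function you wrote in last week's lab, which is not reproduced here, and its two-argument signature is assumed.

```python
import math


def scan_pairs(phenotypes, genotypes, pval_fn):
    """Compute one p-value for every phenotype-genotype pair.

    phenotypes, genotypes: dicts mapping a name to a list of per-individual values.
    pval_fn(y, x): placeholder for last week's p-value function (assumed signature).
    """
    return {
        (pheno, geno): pval_fn(y, x)
        for pheno, y in phenotypes.items()
        for geno, x in genotypes.items()
    }


def most_significant(pvals):
    """Return ((phenotype, genotype), p_value) for the smallest p-value."""
    return min(pvals.items(), key=lambda item: item[1])


def manhattan_heights(pvals, pheno):
    """Return -log10(p) for each genotype tested against one phenotype.

    Values come back in the order the genotypes were scanned.
    A p-value of exactly 0 would raise an error and needs separate handling.
    """
    return [-math.log10(p) for (ph, _), p in pvals.items() if ph == pheno]


def format_position(label, name, chrom, start, end):
    """Build one output line in the format requested above."""
    return f"{label} = {name}, position = Chr {chrom} Start {start} End {end}"
```

With the full dataset, the dictionary returned by `scan_pairs` would hold 10 × 400 = 4000 entries. The values from `manhattan_heights` are what you would plot on the y-axis of the Manhattan plot, with genotype order on the x-axis.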