XenBioinfoKR2017

From TaejoonLab

Xenopus Bioinformatics Workshop in Korea, 2017

This page collects the practical information and course notes for the Xenopus Bioinformatics Workshop.

Basic Info

  • When: 2017. 07. 11. (Tues) ~ 2017. 07. 14. (Fri)
  • Where: UNIST EB4(110) 710, Ulsan, Republic of Korea
  • Emergency contact
    • 052-217-2635 or 052-217-5359 (Lab. EB4(110) 704/705)
    • 052-217-2583 (Taejoon's Office)

List of participants

  • POSTECH: 윤석민, 구영무
  • Kyungpook National University (경북대): 김초원, 정영은
  • Yonsei University (연세대): 정제인

Prerequisite

Day 1. "Hello, Linux"

  • List of commands: ls, man, chmod, which, echo, pwd, cd, mkdir, wget, rm, mv
 $ export PATH=$PATH:~/miniconda3/bin/ 
  • Ubuntu maintenance
    • sudo apt-get update
    • sudo apt-get dist-upgrade
    • sudo apt-get install (package name)
    • man apt-get
  • Working with text file
    • head, tail
    • nano (vim)
    • grep, pipe("|")
    • more, less
  • BLAST
    • blastn (DNA), blastp (protein)
    • makeblastdb -dbtype (nucl|prot) -in (input name) -out (output name)
    • blastn -query (input name) -db (db name) -out (output filename)
    • blastn -query (input name) -db (db name) -out (output filename) -outfmt 7
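The text-file commands listed above can be tried on a small FASTA file before moving on to BLAST. The following is a minimal sketch, not part of the original course material: the file name toy.fa and its three sequences are made up for illustration, and the scratch directory is only there to keep the example self-contained.

```shell
# Work in a scratch directory so nothing in the current directory is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy FASTA file with three made-up sequences (not real Xenopus data)
cat > toy.fa <<'EOF'
>seq1 toy gene A
ATGGCCATTGTAATGGGCCGC
>seq2 toy gene B
ATGAAACGCATTAGCACCACC
>seq3 toy gene A-like
ATGGCCATTGTTATGGGCCGC
EOF

# head/tail: peek at the first and last lines of the file
head -n 2 toy.fa
tail -n 1 toy.fa

# grep: print only the header lines (they start with ">")
grep '^>' toy.fa

# grep -c counts matching lines, which here is the number of sequences
n_seq=$(grep -c '^>' toy.fa)
echo "number of sequences: $n_seq"

# pipe: feed the output of one command into the next
n_gene_a=$(grep '^>' toy.fa | grep 'gene A' | wc -l | tr -d ' ')
echo "headers mentioning gene A: $n_gene_a"
```

The same pattern (grep for ">" and count) is a quick sanity check on any FASTA file before it is passed to makeblastdb.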

Day 2. NGS mapping

  • Tip: run BWA in the background so that it keeps running after you log out
 $ nohup (program command) 2>(log file) & 
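Note that 2> captures only the error stream. Programs such as bwa mem write their alignments to standard output, so the results usually need their own redirection with >. The sketch below shows the pattern with a stand-in command instead of BWA; the file names job.out and job.log and the sh -c one-liner are placeholders chosen for illustration, not part of the workshop material.

```shell
# Work in a scratch directory so nothing in the current directory is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a long-running program (hypothetical; swap in the real command).
# It prints a result line to stdout and a progress message to stderr.
nohup sh -c 'echo "result line"; echo "progress message" >&2' > job.out 2> job.log &
job_pid=$!
echo "started background job with PID $job_pid"

# In a real session you could log out here; this sketch simply waits for the job
wait "$job_pid"
job_status=$?
echo "job finished with exit status $job_status"

# Results and log messages end up in separate files
cat job.out
cat job.log
```

Keeping the PID makes it easy to check on the job later, for example with ps -p followed by the PID.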

Day 3. Statistical analysis

$ cat translatome.tbl
Filename        SampleName      GroupName
SRR3157354.XENLA_JGIv18pV3_cdna_finalHS.bwa_mem.sam.gz.rpkm+cpm ctrl_d7 ctrl
SRR3157355.XENLA_JGIv18pV3_cdna_finalHS.bwa_mem.sam.gz.rpkm+cpm crush_d7        crush
SRR3157356.XENLA_JGIv18pV3_cdna_finalHS.bwa_mem.sam.gz.rpkm+cpm ctrl_d11        ctrl
SRR3157357.XENLA_JGIv18pV3_cdna_finalHS.bwa_mem.sam.gz.rpkm+cpm crush_d11       crush
$ ./cpm+rpkm-to-read_count_tbl.py translatome.tbl
$ ./cpm+rpkm-to-s_rpkm_tbl.py translatome.tbl
$ ls translatome.*txt
translatome.best_indiv_rpkmInt.txt  translatome.best_mean_rpkm.txt    translatome.mean_read_count.txt
translatome.best_indiv_rpkm.txt     translatome.indiv_read_count.txt
translatome.best_low_rpkm.txt       translatome.low_read_count.txt 
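Before running the conversion scripts it can help to check that the sample table is well formed. The sketch below recreates the table shown above and inspects it with awk. It assumes the columns are separated by tabs, which the listing suggests but does not state; the variable names are made up for this example.

```shell
# Work in a scratch directory so nothing in the current directory is touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample table from the listing (assumption: tab-separated columns)
suffix='XENLA_JGIv18pV3_cdna_finalHS.bwa_mem.sam.gz.rpkm+cpm'
{
  printf 'Filename\tSampleName\tGroupName\n'
  printf 'SRR3157354.%s\tctrl_d7\tctrl\n' "$suffix"
  printf 'SRR3157355.%s\tcrush_d7\tcrush\n' "$suffix"
  printf 'SRR3157356.%s\tctrl_d11\tctrl\n' "$suffix"
  printf 'SRR3157357.%s\tcrush_d11\tcrush\n' "$suffix"
} > translatome.tbl

# Every row, header included, should have exactly three tab-separated fields
bad_rows=$(awk -F'\t' 'NF != 3' translatome.tbl | wc -l | tr -d ' ')
echo "rows with a wrong number of columns: $bad_rows"

# Count samples per group, skipping the header line
group_counts=$(awk -F'\t' 'NR > 1 { n[$3]++ } END { for (g in n) print g, n[g] }' translatome.tbl | sort)
echo "$group_counts"
```

Each group should have at least two samples, because the edgeR steps later in the day estimate dispersion from replicates within a group.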
  • Boxplot & Clustering
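# Read the joined expression table; the SeqID column becomes the row names
tbl <- read.table("ctrl_kdm3aMO.joined.txt",header=T,row.names="SeqID");

# Widen the bottom margin so the rotated sample names fit under the boxplot
par(mar=c(6,4,4,2));
boxplot(log10(tbl), las=2);

# Cluster samples by Euclidean distance between their expression profiles
tmp_dist <- dist( t(as.matrix(tbl)) )
tmp_clust <- hclust( tmp_dist, method="average")
plot(tmp_clust)

# Cluster samples by correlation distance (1 - Spearman correlation);
# as.dist() uses the matrix as distances instead of recomputing them with dist()
tmp_cor <- as.dist( 1 - cor(as.matrix(tbl),method='spearman') )
tmp_clust <- hclust( tmp_cor, method="average")
plot(tmp_clust)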
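  • Run edgeR
# biocLite() was retired in Bioconductor 3.8; BiocManager is the current installer
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")
BiocManager::install("edgeR")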
> library(edgeR)
> groups <- gsub('_(7|11)d$','',colnames(tbl_tn))
> groups
[1] "crush" "crush" "ctrl"  "ctrl" 
> keep <- rowSums(cpm(tbl_tn)>1) >= 2;
> tbl_tn <- tbl_tn[keep,]
> 
> y_tn <- DGEList(counts=tbl_tn, group=groups)
> y_tn <- calcNormFactors(y_tn)
> y_tn <- estimateCommonDisp(y_tn)
> y_tn <- estimateTagwiseDisp(y_tn)
>
> write.table(cpm(tbl_tn), file='tn.cpm.txt', row.names=TRUE, col.names=TRUE, sep='\t')
> write.table( topTags( exactTest(y_tn, pair=c('crush','ctrl')), n=Inf), "tn.edgeR_out", sep="\t")
  • DE Gene Plotting
> tn_edgeR_tbl <- topTags( exactTest(y_tn, pair=c('crush','ctrl')), n=Inf)$table
> tbl_edgeR_DE <- subset(tn_edgeR_tbl, tn_edgeR_tbl$FDR<0.01 | abs(tn_edgeR_tbl$logFC)>2)
> smoothScatter(tn_edgeR_tbl$logCPM, tn_edgeR_tbl$logFC)
> points(tbl_edgeR_DE$logCPM, tbl_edgeR_DE$logFC, col='red', pch=18) 

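Figure: smoothScatter plot of logFC against logCPM, with the selected genes overlaid as red points (SmoothScatter Points Example.png)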

  • Heatmap
> par(mar=c(6,4,4,2))
> tn_cpm <- cpm(tbl_tn)
> tn_cpm_DE <- tn_cpm[rownames(tbl_edgeR_DE),]
> library(gplots)   # provides heatmap.2
> heatmap.2(as.matrix(tn_cpm_DE), trace='none', scale='row', margins=c(10,10))

Figure: heatmap.2 output for the selected genes, scaled by row (Heatmap2 example.png)

Day 4. Discussion, Presentation & Wrap-up