GATK HowTo

From TaejoonLab
Jump to: navigation, search

Prepare reference FASTA sequence

http://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference

$ samtool faidx <my_ref.fa>
$ java -jar /work/project/src/Picard/2.6.0/picard.jar CreateSequenceDictionary R=<my_ref.fa> O=<my_ref.dict>

Mapping reads to the reference

See BWA HowTo

Mark Duplicates & Add Read Groups

#!/bin/bash
PICARD="/work/project/src/Picard/2.6.0/picard.jar"

for IN_SAM in $(ls ../sam/*.sam)
do
  OUT_BASE=$(basename $IN_SAM)
  OUT_BASE=${OUT_BASE/.sam/}

  SORT_BAM=$OUT_BASE".sorted.bam"
  #if [ ! -e $SORT_BAM ]; then
  #  java -Xmx32G -jar $PICARD SortSam I=$IN_SAM O=$SORT_BAM SORT_ORDER=coordinate
  #fi

  RG_BAM=$OUT_BASE".RG.bam"
  if [ ! -e $RG_BAM ]; then
    java -Xmx32G -jar $PICARD AddOrReplaceReadGroups I=$SORT_BAM O=$RG_BAM RGID=1 RGLB=<library name> RGPL=illumina RGPU=<platform unit> RGSM=<sample name> SORT_ORDER=coordinate
  fi

  MD_BAM=$OUT_BASE".MarkDup.bam"
  MD_METRICES=$OUT_BASE".MarkDup.metrics.txt"
  echo "$MD_BAM"
  if [ ! -e $MD_BAM ]; then
    java -Xmx32G -jar $PICARD MarkDuplicates I=$RG_BAM O=$MD_BAM M=$MD_METRICES
    java -Xmx32G -jar $PICARD BuildBamIndex I=$MD_BAM
  fi
done


Run HaplotypeCaller

GATK="/work/project/src/GATK/GenomeAnalysisTK-3.6.jar"

REF_FA=<my_ref.fa>

for IN_BAM in $(ls ../bam/*.RG.bam)
do
  OUT_BASE=$(basename $IN_BAM)
  OUT_BASE=${OUT_BASE/.RG.bam/}

  OUT_VCF=$OUT_BASE".gatk.vcf"
  OUT_BAM=$OUT_BASE".gatk_out.bam"

  echo "$IN_BAM -> $OUT_VCF"
  if [ ! -e $OUT_VCF ]; then
    java -Xmx32G -jar $GATK -T HaplotypeCaller -R $REF_FA -I $IN_BAM -o $OUT_VCF -bamout $OUT_BAM -forceActive -disableOptimizations
  fi
done