RNA-seq reads - assessing distance from start codon - (Dec/02/2013)
I have RNA-seq data and I'm trying to relate the reads to their distance from the start codon. Does anyone have experience with this?
Do you need to extract reads that are, e.g., less than 1 kb from the start codon? First map your reads (with TopHat, for example) to get a SAM/BAM file. Then try bedtools (see the docs here: http://bedtools.readthedocs.org/en/latest/content/bedtools-suite.html) to intersect your mapped reads (bedtools reads BAM, so convert SAM to BAM first if needed) with a BED file that contains the regions near the start codons. Building this BED file takes some manual work: get the RefSeq gene track from the UCSC Genome Browser, find the CDS starts (don't forget about gene strand: on the minus strand the CDS begins at the higher genomic coordinate), extend them into 1 kb regions (upstream and downstream), and convert to BED format.
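The "manual work" for the BED file could look roughly like the sketch below. It assumes the RefSeq track was downloaded from the UCSC Table Browser as a tab-separated refGene table (columns: bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, ...) with 0-based, half-open coordinates. The function name `start_codon_windows` and the `flank` parameter are made up for illustration. Note that the windows are in genomic coordinates, so introns inside the window are not accounted for.

```python
def start_codon_windows(refgene_lines, flank=1000):
    """Yield BED6 tuples (chrom, start, end, name, score, strand),
    one window of +/- `flank` bases around each CDS start.

    Assumes UCSC refGene-style rows: 0-based, half-open coordinates,
    cdsStart in column 7 and cdsEnd in column 8 (1-based column count).
    """
    for line in refgene_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header line ("#bin ...") and malformed rows.
        if len(fields) < 8 or fields[0].startswith("#"):
            continue
        name, chrom, strand = fields[1], fields[2], fields[3]
        cds_start, cds_end = int(fields[6]), int(fields[7])
        # Non-coding transcripts have cdsStart == cdsEnd in this table.
        if cds_start == cds_end:
            continue
        # 0-based position of the first base of the start codon:
        # lowest CDS coordinate on '+', highest on '-'.
        atg = cds_start if strand == "+" else cds_end - 1
        start = max(0, atg - flank)   # clamp at chromosome start
        end = atg + flank + 1         # BED end is exclusive
        yield (chrom, start, end, name, 0, strand)


def write_bed(windows, path):
    """Write the windows to a tab-separated BED file."""
    with open(path, "w") as out:
        for window in windows:
            out.write("\t".join(str(value) for value in window) + "\n")
```

With the windows written to a file (say start_windows.bed, a placeholder name), something like `bedtools intersect -abam mapped.bam -b start_windows.bed > near_start.bam` should keep only the reads overlapping those regions; check the bedtools docs for the options that fit your version.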
Alternatively, you could iterate over all mapped reads in the SAM/BAM file using the Picard Java API and process them as desired.
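To illustrate the "process them as desired" idea, here is a minimal sketch of a per-read calculation. It is not Picard: it is plain Python working on text SAM lines, standing in for the same loop. The function name `signed_distance` and its arguments are hypothetical, and it assumes you already know the 0-based start codon position and strand of the gene the read maps to. The distance is genomic, so for spliced reads it will not equal the distance along the transcript.

```python
def signed_distance(sam_line, atg_pos0, strand):
    """Return the distance of a read's leftmost mapped base from the
    start codon, signed in the direction of transcription
    (negative = upstream, positive = downstream).

    Returns None for header lines, malformed lines, and unmapped reads.
    `atg_pos0` is the 0-based genomic position of the start codon.
    """
    if sam_line.startswith("@"):
        return None                      # SAM header line
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 4:
        return None                      # not a valid alignment record
    flag = int(fields[1])
    if flag & 0x4:
        return None                      # read is unmapped
    read_pos0 = int(fields[3]) - 1       # SAM POS is 1-based
    offset = read_pos0 - atg_pos0
    # On the minus strand, transcription runs toward lower coordinates.
    return offset if strand == "+" else -offset
```

Collecting these values over all reads of a gene gives a distribution of read positions relative to the start codon, which can then be binned or plotted.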