Cycle number and number of reads with NGS technologies. What do they mean!? - (Mar/17/2014)
Sorry if this is a stupid question, but I'm having trouble getting my head around a couple of things with regard to NGS tech. First, number of reads: the MiSeq V2 does 15 million reads, so does that mean it can read 15 million bases? Second, cycle number: that's the number of times it goes through the sequencing process, right? So if it's 500 cycles, does that mean it can only sequence 2 x 250 bp fragments, and does it read 15 million bases at each cycle?
I think I'm on the right lines... I hope. I just need to get the explanation clear!
15 million reads means that 7.5 million short DNA fragments have been sequenced. Each DNA fragment is read twice.
NGS machines are able to run many sequencing reactions in parallel.
500 cycles means each read is a maximum of 250 bp long (2 x 250 bp per fragment). That is how far the machine will sequence into each DNA fragment.
Each cycle is the incorporation of one nucleotide, linked to a fluorophore, into the growing DNA chain.
As for your last question, it would depend on how long your DNA fragments are. If they were long enough, 15 million bp per cycle would in theory be read.
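To make the arithmetic in this reply concrete, here is a small Python sketch. It assumes a paired-end run in which the kit's cycles are split evenly between the two reads of a fragment, and it takes the read count quoted in this thread at face value. The function names are made up for illustration.

```python
def max_read_length(total_cycles, reads_per_fragment=2):
    """One cycle adds one base, so the cycles are shared between the reads of a fragment."""
    return total_cycles // reads_per_fragment


def max_yield_bases(n_reads, read_length):
    """Upper bound on bases sequenced, reached only if every read runs to full length."""
    return n_reads * read_length


read_len = max_read_length(500)                    # a 2 x 250 bp run
total = max_yield_bases(15_000_000, read_len)      # read count as quoted in the thread
print(read_len, "bp per read")                     # 250 bp per read
print(total / 1e9, "Gb at most")                   # 3.75 Gb at most
```

Fragments shorter than the read length will give fewer usable bases than this upper bound.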
Great, thanks. So, it's the number of DNA fragments the machine can read. In this case that is 7.5 million on the flow cell, and each one is read twice because of the dual surface imaging (Illumina).
How does that equate to depth of sequencing? As I understand it, depth of sequencing means the number of times a sequence is read, but how do you know how many times your sequences will be read? I read that 100,000 reads per sample is enough to capture alpha diversity, but how do you make sure you get enough reads? I was assuming that if the machine did 15 million reads, I could put 150 samples on (in theory). I'm getting all confused!
Depth of coverage (sequencing): you need to do the experiment itself and count. Some sequences will amplify more easily than others and some will be missed completely.
There are several different ways to express coverage:
- the theoretical "fold-coverage" of an experiment: number of reads * read length / target size
- the theoretical or empirical "breadth-of-coverage" of an assembly: assembly size / target size
- the empirical average "depth-of-coverage" of an assembly: number of reads * read length / assembly size
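The three definitions above translate directly into code. The sketch below is only an illustration: the function names and the example numbers (1,000,000 reads of 250 bp, a 5 Mb target, a 4.5 Mb assembly) are invented, not taken from any real run.

```python
def fold_coverage(n_reads, read_length, target_size):
    """Theoretical fold-coverage: total bases sequenced / target size."""
    return n_reads * read_length / target_size


def breadth_of_coverage(assembly_size, target_size):
    """Fraction of the target that is represented in the assembly."""
    return assembly_size / target_size


def depth_of_coverage(n_reads, read_length, assembly_size):
    """Empirical average depth: total bases sequenced / assembly size."""
    return n_reads * read_length / assembly_size


# Hypothetical numbers, all sizes in bp.
n_reads, read_length = 1_000_000, 250
target_size, assembly_size = 5_000_000, 4_500_000

print(fold_coverage(n_reads, read_length, target_size))        # 50.0
print(breadth_of_coverage(assembly_size, target_size))         # 0.9
print(round(depth_of_coverage(n_reads, read_length, assembly_size), 1))  # 55.6
```

Note that the empirical depth comes out higher than the theoretical fold-coverage whenever the assembly is smaller than the target, because the same bases are spread over less sequence.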
Is 100,000 reads enough to capture alpha diversity? I am afraid I am unable to answer this one, as we have left my general knowledge of NGS and moved into the statistics and hands-on experience required to estimate the complexity of a microbial habitat. I think the number will depend on how strongly a habitat is dominated by a particular species, and on how easily we can extract DNA from the different species of cells. Some species will be more resilient to the DNA extraction protocol and thus contribute less DNA, independent of their biomass in the sample.
I have seen papers that say 100,000 reads is better than 20,000 (obvious, isn't it, and it didn't help much). Perhaps an experiment is required to see how many reads are needed when a species is 0.1%, 0.01%, 0.001% etc. of the total microbial population. 100,000 reads seems small to me, but sampling microbial populations by NGS isn't my field.
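Short of a real experiment, a back-of-the-envelope estimate is possible under a very simplified model: assume every read is an independent random draw, and that a species at relative abundance p contributes a fraction p of the reads. That assumption ignores exactly the extraction and amplification biases mentioned above, so treat the Python sketch below as a rough lower bound on the reads needed, not as a recommendation. The function names and the 95% threshold are my own choices.

```python
import math


def detection_probability(abundance, n_reads):
    """Chance of seeing at least one read from a species, assuming unbiased random sampling."""
    return 1.0 - (1.0 - abundance) ** n_reads


def reads_needed(abundance, confidence=0.95):
    """Smallest read count giving at least `confidence` chance of one or more reads."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))


# The abundances suggested in the post: 0.1%, 0.01% and 0.001%.
for percent in (0.1, 0.01, 0.001):
    p = percent / 100
    print(f"{percent}% abundance: {reads_needed(p)} reads for a 95% chance of detection, "
          f"{detection_probability(p, 100_000):.3f} chance with 100,000 reads")
```

Under this model a single read counts as detection; requiring several reads per species, or allowing for biased extraction, pushes the numbers up.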
There are 15 million reads, but only 7.5 million DNA fragments being read (each is being read twice). So at 100,000 reads (I guess DNA fragments), you can run 75 samples.
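The sample count is a simple division, sketched here in Python. It assumes the run's output is shared perfectly evenly between samples, which real multiplexed runs only approximate, so leaving some headroom is sensible. The function name is made up for illustration.

```python
def samples_per_run(fragments_per_run, fragments_per_sample):
    """How many samples fit on one run if the output is split evenly between them."""
    return fragments_per_run // fragments_per_sample


print(samples_per_run(7_500_000, 100_000))    # 75, counting fragments as above
print(samples_per_run(15_000_000, 100_000))   # 150, the figure from the original question
```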
Many thanks for your detailed reply, that's very helpful!