Non-redundant DNA - (May/04/2013)

Dear all,

Can someone explain to me in simple English what a non-redundant database is?
(I am asking about the NCBI non-redundant nucleotide database.)

Does it mean that each sequence is present only once in the database?

And if so, how does this work when you get your results? If I get a hit to a known sequence in the database, it is a hit to just one sequence and not to the "redundant" ones. Does this mean I will not get the other sequences as hits?
E.g., I BLAST the sequence AATTGGCCC against the non-redundant database and get a hit to the same sequence in B. cereus. If E. coli has the same sequence, will I still find out, given that the two sequences are seen as redundant?
I suppose it will show up?

Thanks in advance.


So you are talking about the RefSeq database, right? Your understanding is mostly correct. Since you used bacteria as examples: redundancy depends on how the database defines uniqueness. For mammals, for instance, human and mouse are two species and each has its own non-redundant RefSeq records. For bacteria, the situation can be different according to RefSeq, as found here:

Microbial strains: Microbial genome sequence data derived from different strains may be represented as additional RefSeq records. This introduces redundancy but may also add representation for some proteins that are unique to a strain. RefSeq records for a specific strain can be identified by the unique taxonomic ID for that

That means you may get two hits for the query sequence: one from B. cereus and one from E. coli.
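
As a rough illustration of that idea, here is a toy Python sketch. It is not how NCBI builds its databases or how BLAST searches them; the records and the helper names `make_non_redundant` and `exact_hits` are made up for this example, and it assumes "uniqueness" is defined per organism and sequence. Exact duplicates within one organism are collapsed, but the same sequence in another organism is kept, so the query from the question still hits both.

```python
# Toy records: (organism, sequence). All data here is invented for illustration.
records = [
    ("B. cereus", "AATTGGCCC"),
    ("B. cereus", "AATTGGCCC"),  # duplicate entry of the same sequence
    ("E. coli", "AATTGGCCC"),    # same sequence, different organism
    ("E. coli", "GGGTTTAAA"),
]


def make_non_redundant(records):
    """Keep one copy of each (organism, sequence) pair, preserving order."""
    seen = set()
    unique = []
    for organism, sequence in records:
        key = (organism, sequence)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def exact_hits(database, query):
    """Return organisms whose sequence contains the query (exact match only)."""
    return [organism for organism, sequence in database if query in sequence]


nr_db = make_non_redundant(records)
hits = exact_hits(nr_db, "AATTGGCCC")
print(hits)  # ['B. cereus', 'E. coli']
```

The duplicate B. cereus entry is dropped, yet both organisms still appear in the results. A database that defined uniqueness by sequence alone would have to behave differently, for example by keeping one record that lists every source organism.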


Ok thanks.

Could you explain the part about the mouse and humans?
Do you mean that a sequence that is the same in mouse and human will be "merged" into one?


Basically it means that each species will get its own specific sequence, but not more than one copy of that sequence. So mouse and human for gene XXX would both have a copy, but you wouldn't find human XXX1 and XXX2.
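
A small Python sketch of that rule, under the assumption that records can be keyed by species and gene name. The record IDs and the function name `one_per_species_gene` are hypothetical and are only there to show the idea of one representative per gene per species.

```python
# Hypothetical records: (species, gene, record_id). The IDs are made up.
records = [
    ("human", "XXX", "rec1"),
    ("human", "XXX", "rec2"),  # a second copy of human XXX
    ("mouse", "XXX", "rec3"),
]


def one_per_species_gene(records):
    """Keep the first record seen for each (species, gene) pair."""
    representatives = {}
    for species, gene, record_id in records:
        # setdefault only stores the value if the key is not present yet
        representatives.setdefault((species, gene), record_id)
    return representatives


reps = one_per_species_gene(records)
for (species, gene), record_id in reps.items():
    print(species, gene, record_id)
```

Human and mouse each keep one XXX record, and the second human copy is not stored.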


I see.

However, I am not really familiar with this.
So a copy is the same gene, right? For example, gene 1 that is present on chromosome 1 and the same gene 1 that is also present on chromosome 7?

Or are we talking about genes that have a mutation in their sequence?
Or sequences that are more or less the same but not 100% identical due to sequencing errors?