On the question of whether ORF = gene, I will give you a riddle (it is actually the fact that got me thinking about this a few years back): C. elegans has ~20,000 different proteins but ~36,000 genes. Where does the difference come from?
Hint: promoter/terminator are needed for transcription; rbs, start/stop codons are needed for translation. Think about the product of each process.
I love riddles, so I'll take a stab at this (scoffers, hold your peace):
- This could have everything to do with the nature of eukaryotic genes, where the RNA transcripts undergo a splicing step before translation. Any non-coding sequence would be sort of "filtered" out in the splicing step, so that only the parts encoding a protein are finally translated.
- Using the hint, anything that falls under the control (within the confines) of a promoter/terminator pair will be transcribed. But the rbs might not be present at all. Taking this a step further, the absence of the rbs and the start and stop codons could be due to a 'frame-shifted translation' (ok, I have no idea what that's called. And with that we have arrived at the two-promoter situation).
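To make the second point concrete, here is a rough toy sketch in Python. It is only my own illustration of the hint, not real biology software: the `Region` class, its fields, and the three example regions are all made up, and it deliberately ignores splicing and everything else that happens in a real cell. It just encodes "promoter + terminator gives a transcript; rbs + start/stop codons on top of that gives a protein".

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical stretch of DNA, described only by the signals it carries."""
    name: str
    promoter: bool
    terminator: bool
    rbs: bool
    start_codon: bool
    stop_codon: bool


def is_transcribed(r: Region) -> bool:
    # Per the hint: transcription only needs a promoter/terminator pair.
    return r.promoter and r.terminator


def is_translated(r: Region) -> bool:
    # Per the hint: translation needs a transcript plus rbs, start and stop codons.
    return is_transcribed(r) and r.rbs and r.start_codon and r.stop_codon


# Three made-up regions, purely for illustration.
regions = [
    Region("protein_coding", True, True, True, True, True),
    Region("rna_only", True, True, False, False, False),
    Region("no_promoter", False, True, True, True, True),
]

transcribed = [r.name for r in regions if is_transcribed(r)]
translated = [r.name for r in regions if is_translated(r)]

print(transcribed)  # ['protein_coding', 'rna_only']
print(translated)   # ['protein_coding']
```

In this toy picture, two regions give a transcript but only one gives a protein, which is the kind of gap between "things that are transcribed" and "things that are translated" that the riddle seems to be pointing at.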
Getting a little help: the difference between the number of genes and the number of proteins is caused by the presence of "transposons, pseudogenes, and other artifacts".
This article in EMBO Reports states that
"Non-protein-coding RNA transcription in the eukaryotes falls into two classes: introns and other non-protein-coding RNAs. In humans, introns account for ~95% of the pre-mRNA transcripts of protein coding genes, and are generally of high sequence complexity"
and it does seem to support my explanation in (2) that everything gets transcribed but not everything gets translated. I suspect that, whilst these are facts, you're looking for something simpler involving the promoter, terminator, rbs, and start/stop codons and the processes they take part in.