TCGA - The Cancer Genome Atlas - (Nov/15/2017)
I would appreciate it if someone could help me understand the TCGA. I recently read a paper in PNAS (2015) in which the authors used the TCGA database to report that low expression of a particular gene in breast cancer was associated with poor survival.
Does anyone know:
1. How do such databases gather data on genetic differences between cancer and normal tissues? Do they conduct genetic experiments on thousands of cancer tissues vs normal tissues and make the data publicly available? Who funds these projects? The TCGA has data describing tumor tissue and matched normal tissue from more than 11,000 patients.
2. How do you use the TCGA? Do we need computational biology knowledge to dig data out of the database? For example, how can I dig out expression and survival data for my gene of interest?
I can answer 1) - The research is largely funded by public sources such as the NIH in the USA or the BBSRC in the UK, but private bodies (e.g. drug companies) sometimes fund this sort of work too. The data from most, if not all, publicly funded research in the US and many other countries is required to be made publicly available (often as a condition of publication as well, so that others can verify the analysis). They have indeed analyzed many thousands of cancers and normal tissues - it's easier than you might think, as normal tissue is routinely collected as part of the tumour excision, so that the surgeons can be sure they got all the tumour and so that the pathologists can check that this tissue is clear of metastases too. The data is curated by the bodies that collect it (largely the NIH, e.g. GenBank), so there is some quality assurance in place when you use the data - many of the large-scale genome methods (e.g. Illumina sequencing) these days come with built-in measures of data quality and read depth, which are also reported as part of the data.
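On 2), a rough sketch only, not a definitive workflow: TCGA data can be downloaded from the NCI Genomic Data Commons portal, and web tools such as cBioPortal let you query a gene without writing any code. If you do want to do it yourself, the kind of analysis in the paper you describe usually comes down to splitting patients into low and high expressers and comparing Kaplan-Meier survival curves. The example below assumes you have already exported one record per patient; the field names (expr, months, dead), the median cutoff and the six toy records are all made up for illustration and are not TCGA's own format or real data.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of survival.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_median(records, key="expr"):
    """Split patients into low and high groups at the median expression."""
    cutoff = median(r[key] for r in records)
    low = [r for r in records if r[key] < cutoff]
    high = [r for r in records if r[key] >= cutoff]
    return low, high


# Hypothetical toy records, NOT real TCGA data.
records = [
    {"expr": 1.0, "months": 10, "dead": 1},
    {"expr": 2.0, "months": 14, "dead": 1},
    {"expr": 3.0, "months": 30, "dead": 0},
    {"expr": 4.0, "months": 25, "dead": 1},
    {"expr": 5.0, "months": 60, "dead": 0},
    {"expr": 6.0, "months": 72, "dead": 0},
]

low, high = split_by_median(records)
for name, group in (("low", low), ("high", high)):
    curve = kaplan_meier([r["months"] for r in group], [r["dead"] for r in group])
    print(name, curve)
```

For a real analysis you would also want a significance test (e.g. a log-rank test) and a justified choice of cutoff; an established survival-analysis package is a better choice than hand-rolled code like this.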