The advent of large amounts of data, coupled with the escalation of computing capability and data analytics, has led to a data revolution. In layman's terms, big data has been cleverly described by Yahoo chief Marissa Mayer as “the planet developing a nervous system”(1). Big data has recently become such a high priority in the national strategic plan that President Obama approved a $200 million Big Data Research and Development Initiative to glean discoveries from digital data.
Current biomedical big data, amassed from electronic medical records and digital image archiving (about 20 megabytes, or MB, per image), is reaching a staggering 100-250 exabytes, with annual growth of 1.2 to 2.4 exabytes(2), but remains extremely fragmented and disorganized. Until now, our traditional “top-down” approach to data has relied on one of two methods. The first is healthcare databases or registries, which involve manual data entry, with its inherent limitations of accuracy and completeness, followed by analysis with relatively basic statistical tools. The second is conventional hypothesis-driven research and randomized controlled trials, which have become prohibitively expensive, are limited in scope, and often fail to yield definitive answers.
Recently, this new big data paradigm has been successfully applied to biomedical science, mainly in the form of genomic medicine and its escalating genetic transcript big data(3). The vast magnitude and rapid acquisition of this genetic big data are vertiginous, as exemplified by Michael Snyder, a Stanford genetics PhD who has generated 30 terabytes of his own biological data alone. Despite this daunting challenge, a few groups have met it and made strides toward a positive impact on patient care(4,5). The capstone of this entire data transformation effort in genomic medicine is the ENCyclopedia Of DNA Elements (ENCODE) project, an international collaboration of research groups funded by the National Human Genome Research Institute with the aim of delineating the entirety of functional elements encoded in the human genome(6).
2 Hughes G MD. How Big is Big Data in Healthcare? From A Shot in the Arm blog, October 21, 2011.
3 Butte A (Chief, Systems Medicine at Stanford School of Medicine). Personal communication (February 2013).
4 Ashley EA et al. Clinical Evaluation Incorporating a Personal Genome. Lancet 2010; 375(9725): 1525-1535.
5 Butte A et al. Computationally Translating Molecular Discoveries into Tools for Medicine: Translational Bioinformatics Articles now Featured in JAMIA. J Am Med Inform Assoc 2011; 18(4): 352-353.
6 The ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004; 306(5696): 636-640.