It is not sufficient, he argues, to simply collect and store massive amounts of data; they must be intelligently curated, and that requires a global framework. “We have all the pieces of the puzzle — now how do we actually assemble them so we can see the big picture?”--The Mathematical Shape of Big Science Data--Jennifer Ouellette
"Extracting insights from the shape of complex data using topology" describes a method for analyzing large, complex data sets through a geometric representation of the data. Its advantage is the ability to identify smaller groups that are often overlooked in other analyses, allowing deeper stratification than standard methodology.
Standard methods rely on hypothesis validation and on the prior assignment of sound models or hypotheses. Topological Data Analysis (TDA), discussed in more detail at the link, describes a method in which no predefined hypothesis is required to explore the shape of the data in breast cancer databases.
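To make the idea of finding small groups without a prior hypothesis concrete, here is a minimal sketch in Python. It is not the method from the paper above. It is a much simpler illustration that looks only at the coarsest topological feature of a point cloud, its connected components at a chosen distance scale. The function name `connected_groups`, the toy points, and the threshold `eps` are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import dist


def connected_groups(points, eps):
    """Group point indices whose neighborhood graph is connected.

    Two points are joined by an edge when their Euclidean distance is <= eps.
    Returns a list of index groups, largest first.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of points that are close enough.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy data: one larger cluster and one small, distant group.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (1.0, 0.5),
          (5.0, 5.0), (5.4, 5.0)]
groups = connected_groups(points, eps=0.75)
print(groups)
```

No model or hypothesis about the number of groups is supplied here. The small group of two points emerges only from the distances between points, and varying `eps` shows how the grouping changes with scale.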
"After accounting for non-independence between risk factors, around a third of Alzheimer's diseases cases worldwide might be attributable to potentially modifiable risk factors. Alzheimer's disease incidence might be reduced through improved access to education and use of effective methods targeted at reducing the prevalence of vascular risk factors (eg, physical inactivity, smoking, midlife hypertension, midlife obesity, and diabetes) and depression."--The Lancet Neurology volume 13, issue 8, August 2014
“Now we have this new multimodal data [gleaned] from biological systems and human social systems, and the data is gathered before we even have a hypothesis.” The data is there in all its messy, multi-dimensional glory, waiting to be queried, but how does one know which questions to ask when the scientific method has been turned on its head?--The Mathematical Shape of Big Science Data