What is big data?

What does “Big Data” involve?

The term “Big Data” originated in computer science, where it was initially used to describe large, rapidly growing and complex databases. In a healthcare setting, Big Data describes extensive healthcare databases (such as electronic health record systems) or networks of interconnected healthcare databases (called ‘linked’ databases) drawn from multiple healthcare organisations.


Certain types of research require data on large numbers of patients. For example, we would use big data to identify specific or unusual patterns of a health condition, to investigate the impact of different treatments for a condition, or to discover rare side-effects or long-term health outcomes.

This growing research need requires greater investment by countries and companies to establish networks that allow large-scale data analysis. We can expect to see a growing number of new insights and advances from the analysis of big data. We hope these will include the rapid development of new medicines and medical devices, as well as smart applications to support healthcare professionals and patients.

Communities of patients with rare diseases, alongside the health professionals and scientists involved, are trailblazers in how to work together to advance our understanding of disease and the best treatments.

We strive to enable data science to transform public health. A few of the societal benefits that may arise from analysing big data include:

  • Increasing the effectiveness and quality of treatments.

  • Identifying risk factors and thus preventing diseases or conditions.

  • Improving patient safety by delivering patient information directly to healthcare professionals.

  • Predicting outcomes and identifying pathways in disease transmission, so that transmission can be prevented.

  • Disseminating knowledge.

  • Reducing inefficiencies by identifying healthcare systems that do not work well.

What does this look like in reality? So far, published findings derived from big data include:

  • Validating >200 novel biomarkers predicting cardiovascular risk.

  • Investigating how 174,000 observed national prescribing patterns vary from national guidelines for COPD.

  • Comparing ~8,000 treatment outcomes for leukaemia by age, uncovering a major unmet treatment need.

  • Mining >700 million records to develop new cancer risk stratification algorithms.

You can find these and other examples of research using big data in our case study collection.