Imperial College London and IBM Join Forces to Accelerate Personalized Medicine Research within the OpenPOWER Ecosystem


By Dr. Jane Yu, Solution Architect for Healthcare & Life Sciences, IBM

When the Human Genome Project was completed in 2003, it was a watershed moment for the healthcare and life science industry. It marked the beginning of a new era of personalized medicine where the treatment of disease could be tailored to the unique genetic code of individual patients.

We’re closer than ever to fully tailored treatment. To accelerate advances in personalized medicine research, IBM Systems is partnering with the Data Science Institute of Imperial College London (DSI) and its leading team of bioinformatics and data analysis experts. At the heart of this collaboration is tranSMART, an open-source data warehouse and knowledge management system that has already been adopted by commercial and academic research organizations worldwide as a preferred platform for integrating, accessing, analyzing, and sharing clinical and genomic data on very large patient populations. DSI and IBM Systems will be partnering to enhance the performance of the tranSMART environment within the OpenPOWER ecosystem by taking advantage of the speed and scalability of IBM POWER8 server technology, IBM Spectrum Scale storage, and IBM Platform workload management software.

At ISC 2015 in Frankfurt, representatives from Imperial College DSI and IBM Systems will be demonstrating an early prototype of a personalized medicine research environment in which tranSMART is directly linked to IBM text analytics for mining curated scientific literature on POWER8. For a demonstration, please visit us at IBM booth #928 at ISC.

How did we get here? In recent years, the advent of Next Generation Sequencing (NGS) technologies has significantly reduced the cost and time required to sequence whole human genomes. Sequencing the first human genome took roughly $3B USD and 13 years of laboratory bench work; today, a single human genome can be sequenced for roughly $1,000 USD in less than a day.
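To put those figures in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: the dollar and time values are the approximate ones quoted above, the days-per-year constant is an assumption, and the helper name `fold_reduction` is purely illustrative.

```python
# Approximate figures quoted in the text above.
FIRST_GENOME_COST_USD = 3_000_000_000  # roughly $3B
FIRST_GENOME_YEARS = 13
NGS_GENOME_COST_USD = 1_000            # roughly $1,000
NGS_GENOME_DAYS = 1                    # "less than a day", so treated as an upper bound

DAYS_PER_YEAR = 365.25  # assumption: average calendar year length


def fold_reduction(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


cost_fold = fold_reduction(FIRST_GENOME_COST_USD, NGS_GENOME_COST_USD)
time_fold = fold_reduction(FIRST_GENOME_YEARS * DAYS_PER_YEAR, NGS_GENOME_DAYS)

print(f"Cost reduction: about {cost_fold:,.0f}-fold")
print(f"Time reduction: at least {time_fold:,.0f}-fold")
```

Under these assumptions, the cost per genome has dropped by a factor of about three million, and the turnaround time by a factor of several thousand.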

The task of discovering new medicines and related diagnostics based on genomic information requires a clear understanding of the impact that individual sequence variations have on clinical outcomes. Such associations must be analyzed in the context of prior medical histories and other environmental factors. But this is a computationally daunting task: deriving such insights requires scientists to access, process, and analyze genomic sequences, longitudinal patient medical records, biomedical images, and other complex, information-rich data sources securely within a single compute and storage environment. Scientists may also want to leverage the corpus of peer-reviewed scientific literature that may already exist about the genes and molecular pathways influencing the disease under study. Computational workloads must be performed across thousands of very large files containing heterogeneous data, where just a single file containing genomic sequence data alone can be on the order of hundreds of megabytes. Moreover, biological and clinical information critical to the study must be mined from natural language, medical images, and other non-traditional unstructured data types at very large scale.

As drug development efforts continue to shift to increasingly complex and/or exceedingly rare disease targets, the cost of bringing a drug to market is projected to top $2.5B USD in 2015, up from about $1B USD in 2001. The ability of government, commercial, and academic research organizations to innovate in personalized medicine requires that the compute-intensive workloads essential to these efforts run reliably and efficiently. IBM Systems has the tools to deliver.

The high-performance compute and storage architecture must have the flexibility to address the application needs of individual researchers, the speed and scale to process rapidly expanding stores of multimodal data within competitive time windows, and the smarts to extract facts from even the most complex unstructured information sources. The financial viability of these initiatives depends on it. The tranSMART environment addresses each of these critical areas.

Code that demonstrates marked improvements in the performance and scalability of tranSMART on POWER systems will be donated back to the tranSMART open-source community. Early performance gains have already been seen on POWER8. In addition, IBM Systems will be working with DSI, IBM Watson, and other IBM divisions to enable large-scale text analytics, natural language processing, machine learning, and data federation capabilities within the tranSMART – POWER analytical environment.

We look forward to seeing you at ISC to show you how OpenPOWER’s HPC capabilities are helping to improve personalized medicine and healthcare.

About Dr. Jane Yu

Jane Yu, MD, PhD, is a Worldwide Solution Architect for Healthcare & Life Science within IBM Systems. Dr. Yu has more than 20 years of experience spanning clinical healthcare, biomedical research, and advanced analytics. Since joining IBM in 2011, Dr. Yu has been building on-premises and cloud-based data management and analytics systems that enable leading-edge clinical and basic science research. She holds an MD and a PhD in Biomedical Engineering from Johns Hopkins University School of Medicine, and a Bachelor of Science in Aeronautics & Astronautics from the Massachusetts Institute of Technology.


Joseph A. DiMasi, slides: “Innovation in the Pharmaceutical Industry: New Estimates of R&D Costs,” Tufts Center for the Study of Drug Development, November 18, 2014.