Tobias Kloepper, PhD, will be chairing the Research & Development round table at Precision Data SIG.

Unlike at a traditional conference, the bulk of the day consists of ‘Core Sessions’ – round-table discussion groups themed around different topic areas. This format allows participants to benefit from the shared experiences of senior-level drivers and implementers. Discussion points are captured from each table and from the meeting as a whole.

Held in Boston, Massachusetts on 12 September, the meeting will centre on four key themes; our commentary below highlights their importance:

Table 1: Applying machine learning to drug discovery and repurposing

Over the past decade, deep learning has achieved remarkable success in various artificial intelligence research areas.

The first wave of applications of deep learning in pharmaceutical research has emerged in recent years. Its utility has gone beyond bioactivity predictions, showing promise in addressing diverse problems in drug discovery.

Digital data, in all shapes and sizes, is growing exponentially. According to the US National Security Agency, the Internet processes 1,826 petabytes of data per day. In 2011 it was estimated that digital information had grown ninefold in volume over the previous five years, and by 2020 the world's total was expected to reach 35 trillion gigabytes.

The high demand for exploring and analysing big data has encouraged the use of data-hungry machine learning approaches such as deep learning.

Table 2: Large-scale multi-omics data integration and analysis for translational research

A new era of personalised medicine has arrived, promising an individualised health-care model with tailored, targeted treatment and management for each patient. Under this regime, not only the clinical profiles of patients but also their molecular profiles can be managed individually to drive personalised treatment.

Cancer studies that focus on one-dimensional ‘omics’ data have only been able to provide limited information about tumour progression. To overcome this, tremendous efforts have been made to obtain data from multiple ‘omics’ platforms and integrate them to draw better-supported conclusions.

Multi-omics data integration is one of the major challenges in the era of precision medicine. Considerable work has been done with the advent of high-throughput studies, which have enabled the generation of large datasets throughout the drug discovery and clinical pipelines.

However, there remain significant challenges in de-siloing data from the various stages of the pipeline so that clinical outcomes are more accessible for downstream analysis.  The biggest barriers companies face in extracting value from data and analytics are still largely organisational; many struggle to incorporate data-driven insights into day-to-day business processes.
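As a purely illustrative sketch of what such integration can mean in practice (the file names, column names and platforms below are hypothetical assumptions, not a prescribed workflow), a common first step is joining per-sample measurements from different omics platforms on a shared sample identifier and attaching the clinical outcomes:

```python
import pandas as pd

# Hypothetical per-sample tables from different omics platforms,
# each indexed by a shared sample identifier.
expression = pd.read_csv("rnaseq_expression.csv", index_col="sample_id")
methylation = pd.read_csv("methylation_beta.csv", index_col="sample_id")
clinical = pd.read_csv("clinical_outcomes.csv", index_col="sample_id")

# Keep only samples present on every platform, then join the
# molecular features with the clinical outcomes for downstream analysis.
merged = expression.join(methylation, how="inner", lsuffix="_rna", rsuffix="_meth")
merged = merged.join(clinical, how="inner")

print(f"{merged.shape[0]} samples with complete multi-omics and clinical data")
```

Even a simple join like this exposes the de-siloing problem described above: it only works once every source agrees on sample identifiers and terminology.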

Table 3: Making best use of statistical modelling methods of tumours for oncology research

We are currently in an era where high-throughput data is the norm, and the wealth of publicly available data offers an exciting opportunity to study cancer as a complex network.  However, our ability to interpret this data for knowledge and discovery has not kept pace with the data collection efforts.

Owing to large-scale improvements in ‘omics’ technologies, we can now track changes in biological processes in unprecedented detail, developing an understanding of how these data types relate to one another. Key considerations with statistical modelling are data/sample size, data quality (e.g. read depth, reproducibility) and metadata quality (e.g. coherent histopathology, unified clinical terminology/ontologies).
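A minimal sketch of how those considerations translate into pre-modelling checks is given below; the thresholds, file and column names are illustrative assumptions only, not recommendations:

```python
import pandas as pd

# Hypothetical sample sheet: one row per sequenced sample,
# with QC metrics and clinical metadata.
samples = pd.read_csv("sample_sheet.csv")

MIN_READ_DEPTH = 30_000_000   # illustrative read-depth threshold
REQUIRED_METADATA = ["histopathology", "stage", "treatment_arm"]

# Data quality: drop samples below the read-depth threshold.
deep_enough = samples[samples["total_reads"] >= MIN_READ_DEPTH]

# Metadata quality: drop samples missing any required clinical annotation.
complete = deep_enough.dropna(subset=REQUIRED_METADATA)

# Sample size: report how many samples survive before fitting any model.
print(f"{len(complete)} of {len(samples)} samples pass QC and metadata checks")
```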

Table 4: The role of machine learning in identifying novel targets and pathways

The AI pioneers of the 1950s discussed building machines that could sense, reason and think like people — a concept known as ‘general AI’ that is likely to remain in the realms of science fiction for some time. However, with the continued rapid growth in computer-processing power over the past two decades, the availability of large data sets and the development of advanced algorithms have driven major improvements in machine learning. This has helped to bring about ‘narrow AI’, which focuses on specific tasks.

This progress has also triggered a wave of start-ups that employ AI for drug discovery, with many of them using it to identify patterns and pathways hidden in large volumes of data. However, more time is needed to gauge accurately whether the adoption of such technologies has provided significant productivity gains in the industry.

A key issue which remains pertinent is metadata management. Data providers, CROs and clinical/medical records all use overlapping, non-interoperable and sometimes incoherent terminology. Unifying these metadata catalogues generally demands considerable manual curation, which can in turn impede machine learning algorithms from generating high-quality models.
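By way of illustration only, this kind of curation often amounts to mapping each provider's free-text terms onto a single controlled vocabulary; the synonym table below is a hypothetical sketch, not a real ontology:

```python
import pandas as pd

# Hypothetical mapping from provider-specific terms to one unified vocabulary.
SYNONYMS = {
    "nsclc": "non-small cell lung carcinoma",
    "non small cell lung cancer": "non-small cell lung carcinoma",
    "lung adeno": "lung adenocarcinoma",
    "adenocarcinoma of lung": "lung adenocarcinoma",
}

def harmonise(term: str) -> str:
    """Normalise case/whitespace and map known synonyms to a preferred label."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key, key)

records = pd.DataFrame({"diagnosis": ["NSCLC", "Lung adeno", "adenocarcinoma of lung"]})
records["diagnosis_unified"] = records["diagnosis"].map(harmonise)
print(records)
```

In practice such mappings are built and maintained against established ontologies rather than ad-hoc dictionaries, which is precisely where the manual curation effort lies.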