Deep Learning to Fight Cancer: from Molecular Research to Large-Scale Population Modeling

Projects that apply deep learning (DL) to cancer can not only advance cancer research and treatment, but also improve the capabilities and infrastructure of deep learning itself, and ultimately accelerate the development of exascale computers (capable of a billion billion calculations per second). Advances in biomedicine and the rise of the next generation of leading computers (exascale systems) are expected to push cancer treatment forward, and much of the credit goes to the rapid development of deep learning and data-driven science.

The Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program was created in 2016 to accelerate cancer research using emerging exascale computing capabilities. Its three pilot projects, which span studies from the molecular level to the population scale, are to support the CANcer Distributed Learning Environment (CANDLE) project. This work aims to gain insight into scalable machine learning tools, to reduce treatment time through deep learning, simulation and analysis technology, and to inform future computing programs. The program also hopes to build predictive models from the growing volume of cancer-related data, so as to better understand the disease and ultimately guide individual treatment.

The following is a brief introduction to the three pilot projects:

RAS Molecular Project: The “Molecular Level Pilot for RAS Structure and Dynamics in Cellular Membranes” aims to develop new computational methods, support the research already carried out under the RAS project, and ultimately improve our understanding of the RAS gene family and its related signaling pathways, while identifying new therapeutic targets in the RAS protein membrane signaling complex.

Pre-clinical screening: The “Cellular Level Pilot for Predictive Modeling for Pre-clinical Screening” will develop machine learning methods, large-scale data resources and predictive models based on experimental biological data from patient-derived xenografts (human tumor tissue grafted into animal hosts). These predictive models may point to new targets for cancer treatment and help identify new treatments.

Population Model: The “Population Level Pilot for Population Information Integration, Analysis and Modeling” aims to build a scalable framework that can efficiently extract, extend, integrate and organize the case information of cancer patients. Such an “engine” would be valuable in many areas of healthcare (delivery, cost control, research, etc.).

Work this complex clearly requires cooperation among many organizations. Within the National Cancer Institute, the Center for Biomedical Informatics and Information Technology (CBIIT) and the Division of Cancer Treatment and Diagnosis (DCTD) are involved, along with four U.S. Department of Energy national laboratories.

All three pilot projects need machine learning, but they will use it in different ways. JDACS4C has established collaborations with companies such as Intel, Cray, NVIDIA, and IBM. Companies like Google, Microsoft, and Facebook also have their own deep learning frameworks, so the frameworks best suited to each problem should be evaluated first, and the hardware optimized where necessary.

Because the RAS project works at the molecular level, it operates at the smallest scale of the three. RAS is a well-known family of cancer genes that encode signaling proteins embedded in the cell membrane. These proteins control signaling pathways that extend into the cell and drive many different cellular processes. RAS is implicated in about 30% of cancers, including pancreatic cancer. The pilot project will combine simulation and wet-lab screening data to work out the details of the RAS-related signaling cascade, in the hope of finding key points where new drugs could intervene.

The machine learning models built so far can predict a drug response or a tumor type or outcome very accurately, but they cannot explain why. They are neither explanatory nor mechanistic. Researchers therefore need to bring in mechanistic models or mechanistic data and combine them with machine learning models to obtain two things: highly accurate predictions and the ability to interpret them.

The third project is dedicated to developing predictive models at the population scale. Although the data is somewhat scattered, the volume of patient data (pathology reports, treatments, outcomes, lifestyle, demographics, etc.) held by the National Cancer Institute (NCI), the National Institutes of Health (NIH), the Food and Drug Administration (FDA), companies and payer organizations is huge. Unfortunately, like much biomedical data, it is largely unstructured. Researchers cannot compute on it the way they want, so they are using machine learning to turn unstructured data into structured data that can be used for computation.
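
To make the idea of turning free text into structured fields concrete, here is a minimal sketch in Python. It is not the pilot project's method: it swaps in a tiny bag-of-words naive Bayes classifier, and the report snippets, the `primary_site` field and the function names are all invented for illustration. A real system would rely on far larger labeled corpora and deep learning models.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy snippets standing in for free-text pathology reports.
# They are illustrative only and are not real patient data.
TRAIN = [
    ("invasive ductal carcinoma of the left breast", "breast"),
    ("breast biopsy shows lobular carcinoma", "breast"),
    ("adenocarcinoma in the right upper lobe of the lung", "lung"),
    ("lung mass consistent with squamous cell carcinoma", "lung"),
    ("pancreatic ductal adenocarcinoma in the head of the pancreas", "pancreas"),
    ("pancreas resection with adenocarcinoma at the margin", "pancreas"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count word frequencies per label (multinomial naive Bayes)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def to_record(model, report_id, text):
    """Turn one free-text report into a structured, queryable record."""
    return {"report_id": report_id, "primary_site": predict(model, text)}


model = train(TRAIN)
record = to_record(model, "R-001", "biopsy of the breast: ductal carcinoma")
print(record)
```

Once each report is reduced to a record like this, the fields can be counted, joined and modeled like any other tabular data, which is the property the population-scale work depends on.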
