In March, IBM announced alongside the White House Office of Science and Technology Policy (OSTP) that it would help coordinate an effort to provide hundreds of petaflops of compute to scientists researching the coronavirus. As part of the newly launched COVID-19 High Performance Computing (HPC) Consortium, IBM pledged to assist in evaluating proposals and to provide access to resources for projects that “make the most immediate impact.”
Much work remains, but some of the Consortium’s most prominent members — among them Microsoft, Intel, and Nvidia — claim that progress is being made.
Petaflops of compute
Powerful computers allow researchers to undertake high volumes of calculations in epidemiology, bioinformatics, and molecular modeling, many of which would take months on traditional computing platforms (or years if worked by hand). Moreover, because the computers are available in the cloud, they enable teams to collaborate from anywhere in the world.
The insights generated by the experiments can help advance humanity’s understanding of COVID-19 in key ways, such as viral-human interaction, viral structure and function, small molecule design, drug repurposing, and patient trajectory and outcomes. “Technology is a critical part of the COVID-19 research going on right now all over the world,” Thierry Pellegrino, VP of Dell Technologies, a member of the Consortium, told VentureBeat. “It’s crucial to the population of our planet that researchers have the tools to understand, treat, and fight this virus. Researchers around the world are true heroes doing important work under extreme and unfamiliar circumstances, and we couldn’t be prouder to support their efforts.”
Companies and institutions have matched 62 projects in the U.S., Germany, India, South Africa, Saudi Arabia, Croatia, Spain, the U.K., and other countries with supercomputers from Google Cloud, Amazon Web Services (AWS), Microsoft Azure, IBM, and dozens of academic and nonprofit research institutions, all free of charge. The projects are running on over 136,000 nodes containing 5 million processor cores and more than 50,000 graphics cards, which together deliver over 483 petaflops (483 quadrillion floating-point operations per second) of compute across hardware maintained by the Consortium’s 40 partners.
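As a quick check on the units, one petaflop is 10^15 floating-point operations per second. The short sketch below converts the reported 483-petaflop figure and derives a rough per-node average. The function name is illustrative, and the per-node number assumes the 136,000-node count is exact, which the reporting only gives as a lower bound.

```python
# One petaflop is 10**15 floating-point operations per second (FLOPS).
PETA = 10**15

def petaflops_to_flops(petaflops: int) -> int:
    """Convert a petaflops figure to raw operations per second."""
    return petaflops * PETA

# Figures reported for the Consortium; both are stated as "over" these values.
total_flops = petaflops_to_flops(483)
node_count = 136_000

# 1 quadrillion = 10**15, so the petaflops figure and the count of
# quadrillions of operations per second are the same number.
quadrillions = total_flops // 10**15

# Rough average only: real nodes vary widely (CPU-only vs. GPU-dense).
teraflops_per_node = total_flops / node_count / 10**12

print(f"{total_flops:.3e} FLOPS = {quadrillions} quadrillion operations per second")
print(f"roughly {teraflops_per_node:.1f} teraflops per node on average")
```

The average is only a sanity check on scale; it says nothing about how capacity is actually distributed among the partners’ machines.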
In addition to supercomputing infrastructure built atop its Azure cloud computing platform, Microsoft is providing researchers with networking and storage resources integrated with workload orchestration through Azure HPC. Running alongside this is the company’s AI for Health program, which in April allocated $20 million to bolster COVID-19-related work in five key areas: data and insights, treatment and diagnostics, allocation of resources, dissemination of accurate information, and scientific research.
As a part of its work with the Consortium, Microsoft says it’s providing teams access to its scientists spanning AI, HPC, quantum computing, and other areas of computer science at Microsoft Research and elsewhere. Much of their work to date has entailed basic scientific discovery about COVID-19 itself and how it interacts with the human host, including the design of therapeutics, through:
Molecular dynamics modeling.
3D mapping of virus protein structures.
Compound screening to see if existing drug molecules are able to inhibit virus cellular entry.
Microsoft says that each grantee with which it collaborates receives a full Azure HPC environment, including Azure CycleCloud with the Slurm workload manager, best-fit Azure Virtual Machines, and storage. The environments are configured to scale on demand as compute requirements grow, and they’re tailored to the specific research needs of the grantee.
Nepali modeling and ventilator splitting
Through the Consortium, Microsoft’s AI for Health is supporting the nonprofit research institute Nepal Applied Mathematics and Informatics Institute for Research (NAAMII), which is employing simulation to model how COVID-19 would spread among the Nepali population under different scenarios. These models, Microsoft says, can reveal patterns that might save Nepali lives and livelihoods.
Duke University, another grantee, is leveraging Azure to investigate ventilator splitting, a technique that enables multiple patients to use the same ventilator. The Matlab division of MathWorks teamed up with Microsoft to optimize the researchers’ analysis for distributed computing environments.
Google continues to provide compute, storage, and workload management services to Consortium grantees through Google Cloud Platform, and it recently set aside $20 million in computing credits for academic institutions and researchers studying COVID-19 treatments, therapies, and vaccines. As a part of its work with the Consortium, the company is collaborating on epidemiological modeling with Northeastern University researchers and applying AI to medical imaging with the Complutense University of Madrid.
Google also partnered with the Harvard Global Health Institute to fund companies, government agencies, nonprofit organizations, and institutions working on COVID-19-related research. Beyond this, the tech giant — along with Microsoft — kicked off a program with Microsoft-backed cloud company Rescale to offer HPC resources at no cost to teams working to develop COVID-19 testing and vaccines. Rescale provides the platform on which researchers launch experiments and record results, while Google and Microsoft supply the backend computing resources.
Amazon, like Google, is supplying compute and tools to researchers matched through the Consortium. Over 11 teams are currently using its infrastructure, and dedicated Amazon Web Services solution architects confer with the scientists every week.
As a part of its AWS Diagnostic Development Initiative, Amazon is also providing $20 million in computing credits to over 35 institutions and private companies leveraging AWS to further the development of COVID-19 point-of-care diagnostics — i.e., testing that can be done at home or at a clinic with same-day results. “This is a global health emergency that will only be resolved by governments, businesses, academia, and individuals working together to better understand this virus and ultimately find a cure,” Teresa Carlson, VP of worldwide public sector at AWS, said in a statement.
Developing protein decoys
At the MIT Media Lab, inspired by a researcher at Johns Hopkins University, a team is identifying “decoy” proteins of ACE2 receptors (the receptors to which coronaviruses bind inside the human body) that might render COVID-19 inert. Using a machine learning model trained on data about the ACE2 receptor and running on AWS, the researchers are attempting to predict which variants of the decoy won’t interact with other proteins in the body and cause harmful side effects. If all goes well, tests in mice will commence soon, with clinical trials beginning toward the end of summer.
In separate efforts, AWS is empowering researchers at the Children’s National Hospital to combine hundreds of data sets to identify genes that might be targeted to treat COVID-19. A team at Iowa State University is tapping evolutionary models with public genomic data sets to suss out the relationships of strains of COVID-19, to understand how they mutate and spread. And scientists at Emory University are developing a web-based tool — tmCOVID — to extract and summarize key concepts in COVID-19 scientific studies.
Nvidia says that 14 of the Consortium’s projects have consumed over 3 million GPU hours on the Nvidia-powered Summit supercomputer at Oak Ridge National Laboratory. Summit is the world’s fastest supercomputer, as ranked by the Top 500 list of supercomputers. Nvidia has also offered its own 20,000-GPU infrastructure, SaturnV, which the company’s researchers are using primarily to optimize COVID-19 research applications.
Nvidia has been using excess cycles on SaturnV to run Folding@home, a distributed computing project that simulates protein dynamics in an effort to help develop therapeutics for diseases, including COVID-19. It has also assisted in matching researchers to supercomputers based on each researcher’s requirements.
Quantum chemistry and virtual screening
In partnership with Microsoft, Nvidia is working with the University of California, Riverside on quantum chemistry solutions that benefit from GPU optimization. The number of possible COVID-19 inhibitors is immense, and carrying out experimental studies on all the candidates is both infeasible and cost-inefficient. The hope is that the project’s predictive, GPU-enabled simulations, which use up to 800,000 GPU hours on Azure, will provide guidance for efforts that narrow in on the most promising candidates.
According to Nvidia, in less than a week its experts helped project lead Bryan Wong package research code using HPC Container Maker, the company’s open source tool that ships with 30 containerized HPC applications. They also used Nvidia’s Nsight debugging tool to develop a bug fix that cut work scheduled to take 800,000 GPU-hours down to 300,000 GPU-hours, a savings of $500,000.
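The savings figure can be unpacked with a little arithmetic. The sketch below uses only the three numbers Nvidia reported. The per-GPU-hour rate it derives is implied by those numbers and is not a published price.

```python
# Figures as reported by Nvidia for the UC Riverside project.
planned_gpu_hours = 800_000
actual_gpu_hours = 300_000
reported_savings_usd = 500_000

# GPU-hours eliminated by the bug fix, and the fractional reduction.
saved_gpu_hours = planned_gpu_hours - actual_gpu_hours
reduction = saved_gpu_hours / planned_gpu_hours

# Cost per GPU-hour implied by the reported savings (a derived value,
# not a quoted Azure rate).
implied_usd_per_gpu_hour = reported_savings_usd / saved_gpu_hours

print(f"saved {saved_gpu_hours:,} GPU-hours ({reduction:.1%} of the plan)")
print(f"implied cost: ${implied_usd_per_gpu_hour:.2f} per GPU-hour")
```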
At Carnegie Mellon University, a team led by Olexandr Isayev worked with Nvidia to apply AI approaches to the task of high-throughput virtual screening, which uses algorithms to identify bioactive molecules. Unlike traditional scientific simulations, which take a brute force approach to problems by attempting to simulate every possible combination of molecular interaction, AI makes educated guesses that reduce the number of combinations to be simulated. This leads to theoretically faster candidate drug discovery (and quicker field trials); Isayev estimates that it might be as much as a million times faster than usual mechanical calculations.
The first step in the process is using AI to analyze a library of molecules that can be purchased from chemical companies, preparing them for screening in simulation. The best candidates from the screening will then be simulated using AI-enhanced molecular dynamics, and top hits from the final screening will be tested in partner laboratories.
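The staged process described here is essentially a funnel: a cheap scoring pass ranks the whole library, and only a shortlist goes on to the expensive simulation. The sketch below is a minimal illustration of that idea, not Isayev’s pipeline. The scoring functions, molecule strings, and cutoffs are all hypothetical stand-ins.

```python
from typing import Callable, List

Molecule = str  # placeholder; a real pipeline would use a chemical representation

def top_k(candidates: List[Molecule],
          score: Callable[[Molecule], float],
          k: int) -> List[Molecule]:
    """Return the k highest-scoring candidates."""
    return sorted(candidates, key=score, reverse=True)[:k]

def screening_funnel(library: List[Molecule],
                     fast_score: Callable[[Molecule], float],
                     slow_score: Callable[[Molecule], float],
                     shortlist_size: int,
                     hits: int) -> List[Molecule]:
    """Rank everything with the cheap model, then re-rank only the shortlist."""
    shortlist = top_k(library, fast_score, shortlist_size)
    return top_k(shortlist, slow_score, hits)

# Hypothetical scorers: the fast one stands in for an AI model, the slow one
# for a costly simulation. Neither has any chemical meaning.
def toy_fast_score(molecule: Molecule) -> float:
    return float(len(molecule))

slow_calls = 0

def toy_slow_score(molecule: Molecule) -> float:
    global slow_calls
    slow_calls += 1  # count how often the expensive step actually runs
    return float(molecule.count("N"))

library = ["CCO", "CCN", "CCNN", "NCCNN", "CCCCCC", "C"]
final_hits = screening_funnel(library, toy_fast_score, toy_slow_score,
                              shortlist_size=3, hits=1)
print(final_hits, "expensive evaluations:", slow_calls)
```

The point of the design is that the expensive scorer runs only on the shortlist, so its cost scales with the shortlist size rather than with the size of the library.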
At the conclusion of their work, Isayev and colleagues plan to deposit their data sets in the open source COVID-19 data lake, a centralized repository of curated data sets maintained by Amazon’s AWS division. There, they hope that other researchers will benefit from them.
IBM VP of technical computing Dave Turek says that COVID-19 research continues with partners across the spectrum — both on machines powered by its hardware and within laboratories and institutions with which it has relationships. “Without any large contracts or anything of the kind, [the Consortium] came together in a way to both share resources and manage a process of expediting the scientific proposals that came into the consortia and match it to the best resources,” he said in a statement. “The teams are making rapid progress, and these supercomputing-powered projects are using novel approaches to understanding the virus.”
For example, overseas, IBM researchers in Daresbury at the Hartree Centre partnered with University of Oxford scientists to combine molecular simulations with AI in discovering compounds that could be repurposed as anti-COVID-19 drugs. Using Summit and the Texas Advanced Computing Center’s (TACC) Frontera, the fifth-fastest system per the Top 500, they say they’re accomplishing months of research in a matter of hours.
Generating molecular compounds
With the help of IBM, researchers at the University of Utah tapped the National Center for Supercomputing Applications’ Blue Waters and TACC’s Longhorn and Frontera to generate more than 2,000 molecular models of compounds relevant for COVID-19. They ranked the models based on the molecules’ force field energy estimates, which they theorized could help design better peptide inhibitors of an enzyme to stop COVID-19.
The team investigated the structure of the COVID-19 main protease, an enzyme that breaks down proteins and peptides, in complex with a peptide inhibitor called N3. They then applied an approach developed to identify Ebola-stopping molecules involving molecular dynamics simulations and the optimization of specific structures. This enabled the COVID-19 protease to break down a series of similar, easy-to-detect probes that had already been designed, serving as the basis for assessments that test the inhibitors’ effectiveness.
The work built on a body of knowledge about how the potential energy generated by atoms can give a molecule a positively or negatively charged “force field” that attracts or repels other molecules. Using AMBER, a molecular dynamics code, the researchers observed experimental results within one hundred-millionth of a centimeter (one angstrom, roughly the scale of a single atom), a measure imperceptible to all but the strongest microscopes.
The University of Utah’s Schmidt lab will later transform the peptide leads into biopharmaceutical scaffolds called circular modified peptides. “Our hope is that we find a new peptide inhibitor that can be experimentally verified in the next couple of weeks. And then, we will engage in further design to make the peptide cyclic to make it more stable as a potential drug,” University of Utah professor and research lead Thomas Cheatham said in a statement.
Mapping how COVID-19 spreads
It’s well understood that COVID-19 spreads via virus-laden droplets, which are carried around environments by air conditioning units, wind, and other sources of turbulence. But there remains a great deal of contention around airborne transmission rates, and according to some experts, it could take years and cost lives to gather evidence of airborne transmission.
In safer pursuit of clarity, scientists at Utah State University, Lawrence Livermore National Lab, and the University of Illinois intend to use the Consortium’s supercomputing resources to study person-to-person transmission of airborne respiratory infectious diseases like COVID-19. They’re working from the hypothesis that aerosolized droplets from human airways contaminate rooms more quickly than initially assumed. They’ll leverage high-fidelity multiphase large-eddy simulations (LES), mathematical models for turbulence used in computational fluid dynamics, running on IBM hardware to determine cloud paths in typical hospital settings.
The short-term aim will be to understand how long a cloud persists and where the particles settle, which could inform non-pharmacological techniques to reduce the spread. “The [goal] of this study is to fundamentally improve our understanding of person-to-person transmission of airborne respiratory infectious diseases,” wrote the researchers in a statement. “Our findings will [make] it safer for health care professionals.”
Studying genetic susceptibility
Beyond isolating COVID-19-killing compounds and mapping viral spread, researchers are attempting to define risk groups by performing genome analysis and IBM supercomputer-enhanced DNA sequencing.
A team of scientists affiliated with NASA observes that COVID-19 appears to cause pneumonia, triggering an inflammatory response in the lungs called acute respiratory distress syndrome (ARDS). To investigate this, they plan to use the supercomputer at NASA’s Ames Research Center to sequence the genomes of patients who develop ARDS and those who don’t.
If all goes well, the team believes their study will result in practical tools for predicting which COVID-19 patients are likely to develop ARDS, and therefore which patients are likely to need intensive support prior to the emergence of severe symptoms. Such tools could help to guide intensive care resource usage for the sickest patients, helping to manage ongoing treatment.
Intel is actively involved in the design, development, and deployment of several Consortium-affiliated supercomputers as well as the upcoming Aurora at Argonne National Laboratory near Chicago. The company says it has a staff of engineers working on code optimizations for HPC applications including LAMMPS (a molecular dynamics code), GROMACS (a package for simulating proteins, lipids, and nucleic acids), NAMD (another molecular dynamics code), AMBER, and others. Intel is also sharing tools, architecture knowledge, and software with partners to enhance COVID-19 applications and scale their performance on Intel-based hardware.
One specific area of focus for Intel is a collaboration with NAMD to release a version of the code that provides faster simulations on Xeon processors that support AVX-512. The company says the significant performance boost will allow researchers to achieve longer timescales in the simulation of relevant molecules associated with COVID-19, by extension enabling them to better understand aspects of viral infection with “atom-level” detail. The update is expected to be made public for early use in June.
Hewlett Packard Enterprise
Some of Hewlett Packard Enterprise’s (HPE) work is through the Consortium, while the rest is directly with a number of customers and partners. As a result of its acquisition of Cray in September 2019 for approximately $1.3 billion, the company claims it now has the most supercomputers and HPC systems in use by leading research centers.
“High-performance computing is more powerful today than it’s ever been, and its massive computing power — along with other advanced capabilities — has significantly transformed drug discovery,” said Peter Ungaro, former Cray CEO and head of HPE’s HPC and mission-critical systems group, in a statement. “Supercomputing and HPC systems unlock greater potential for AI and machine learning applications, and when applied to 3D modeling and simulations, dramatically accelerates time-to-insight and increases scientific outcomes. Our work within the consortium provides the researchers with HPC capabilities they wouldn’t normally have access to independently, to help fast track the discovery of a cure for the pandemic.”
Drug design research
In partnership with Microsoft, HPE is working with a team at the University of Alabama in Huntsville (UAH) to supply its Sentinel supercomputer through the Azure cloud. With the supercomputer, along with a team of dedicated HPE experts, it’s supporting various stages of the drug design process at UAH.
The researchers are employing a molecular docking approach, a kind of bioinformatic modeling that predicts how two or more molecules interact to form a stable complex. Drawing on a large, open set of natural products found in plants, fungi, the sea, and animals, Sentinel is performing calculations to determine how natural compounds interact with COVID-19’s proteins. With Sentinel, 20,000 molecular dockings can be performed against a protein target in seven or eight minutes, versus the full 24 hours it used to take. The research team can now perform as many as 1.2 million molecular dockings per day.
Elsewhere, HPE is supporting the work of the Lawrence Livermore National Laboratory with the Theta supercomputer, which is housed at the Argonne Leadership Computing Facility. The researchers’ goal is to apply AI to accelerate the process of simulating billions of molecules from a database of drug candidates. They’ve narrowed down the number of potential candidates from 10^40 to a set of 20, and they’ve tapped Catalyst (an HPE-powered HPC cluster that generates predictions like experimental and structural biology data) to improve outcomes and speed up discovery.
HPE is also collaborating with France’s National Center for Scientific Research and GENCI to arm scientists at the Sorbonne University in Paris with GENCI’s Jean Zay supercomputer, which HPE designed. The team is using Jean Zay to optimize the Tinker-HP software, an approach to parallel computing enabled by multiple graphics cards and designed to simulate at the level of atoms for large biological molecules. Tinker-HP simultaneously performs a range of data-intensive calculations to create 3D simulations of molecular interactions faster and at high resolutions.
Contributions from the private sector
The nature of the Consortium’s work isn’t strictly academic. Startups hope to use the group’s vast computational resources to develop treatments, molecular designs, and drugs targeting COVID-19.
Kolkata-based Novel Techsciences is identifying phytochemicals from the more than 3,000 medicinal plants and anti-viral plant extracts in India that might act as natural drugs against COVID-19. It also plans to isolate plant-derived compounds that could help tackle multi-drug resistance that arises as the coronavirus mutates, with the goal of developing a comprehensive prophylactic treatment regimen.
Meanwhile, in London, Y Combinator-backed PostEra is overseeing the Moonshot Project, whose aim is to produce inhibitors based on over 60 fragment hits (i.e., molecules validated to bind to a target protein, making them a chemical starting point for drug discovery) isolated in experiments to determine the molecular structure of COVID-19. By running machine learning algorithms in the background to triage suggestions and generate synthesis plans, PostEra has identified around 21 highly effective volunteer-submitted molecular designs, which will be synthesized by chemical company Enamine. Within months, they will be tested in animals.
If successful, PostEra’s would be one of the first drugs developed in an open-source fashion. “[Machine learning] can reduce the time to determine optimal ways to make these compounds from weeks to days,” the company said in a statement. “[We believe] the worldwide scientific community [can suggest] drug candidates that might bind to, and neutralize, [COVID-19].”
Another private sector project is led by London-based AI startup Kuano. This team’s intention is to gain insights from diseases that are similar to COVID-19 — mainly other coronaviruses — to design a drug that could defeat COVID-19, with a genetic algorithm that searches the chemical space surrounding existing antiviral drugs and a deep learning-based classification model built on existing binding data. The company’s combining those tools with docking and molecular dynamics simulations to enhance the results and yield machine learning models that can be used to score molecular designs for synthesis as antiviral compounds.
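For readers unfamiliar with the technique, a genetic algorithm repeatedly keeps the best-scoring candidates, recombines them, and applies random mutations. The toy sketch below shows that loop on bit strings. It is not Kuano’s method: the encoding, the fitness function (which merely stands in for a learned scoring model), and all parameters are assumptions chosen for illustration.

```python
import random
from typing import Callable, List

Genome = List[int]  # toy encoding: a bit string standing in for a molecular design

def mutate(genome: Genome, rng: random.Random, rate: float) -> Genome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]

def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Single-point crossover: a prefix of one parent plus a suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(population: List[Genome],
           fitness: Callable[[Genome], float],
           rng: random.Random,
           generations: int = 30,
           mutation_rate: float = 0.05) -> Genome:
    """Keep the fitter half each generation and refill with mutated offspring."""
    pop = [list(g) for g in population]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]  # survivors carry over unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   rng, mutation_rate)
            for _ in range(len(pop) - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

def toy_fitness(genome: Genome) -> float:
    """Hypothetical score: the number of 1 bits, with no chemical meaning."""
    return float(sum(genome))

rng = random.Random(0)  # fixed seed so runs are repeatable
start = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
best = evolve(start, toy_fitness, rng)
print("best score:", toy_fitness(best))
```

Because the fitter half survives each generation unchanged, the best score never decreases. In a drug-design setting, the expensive part is the fitness function, which is why pairing the search with a fast learned model is attractive.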
As for AI and drug development startup Innoplexus, it’s also working with the Consortium’s supercomputers to accelerate the discovery of molecules that could lead to a drug to combat COVID-19. It expects to run permutations on five promising candidates — specifically, candidates that are non-toxic, can be manufactured, and have a high potency.
Despite the fact that much of the work remains in the early stages, momentum around the Consortium appears to be accelerating.
Last month, IBM announced that the UK Research and Innovation (UKRI) and Swiss National Supercomputer Center (CSCS) will join the Consortium, making available machines including the University of Edinburgh’s ARCHER; the Science and Technology Facilities Council’s DIRAC; the Biotechnology and Biological Sciences Research Council’s Earlham Institute; and Piz Daint, the sixth-ranked supercomputer in the world according to the Top 500. The new additions brought the total available petaflops up to 483; as of mid-March it was 300, and in May it hit 437.
“The COVID-19 HPC Consortium … is the largest public-private computing partnership ever created. What started as a series of phone calls […] five days later, more than two dozen partners came on board, many who are typically rivals,” said IBM’s Turek. “Without any large contracts or anything of the kind, this group [is coming] together in a way to both share resources and manage a process of expediting the scientific proposals that came into the consortia and match it to the best resources.”