A novel machine learning method that enables scientists to derive insights from systems of previously intractable complexity in record time

Berkeley Lab and UC Berkeley researchers say “iterative Random Forests” will deliver powerful scientific insights

While it may be the era of supercomputers and “big data,” without smart methods to mine all that data, it’s only so much digital detritus. Now researchers at the Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley have come up with a novel machine learning method that enables scientists to derive insights from systems of previously intractable complexity in record time.

In a paper published recently in the Proceedings of the National Academy of Sciences (PNAS), the researchers describe a technique called “iterative Random Forests,” which they say could have a transformative effect on any area of science or engineering with complex systems, including biology, precision medicine, materials science, environmental science, and manufacturing, to name a few.

“Take a human cell, for example. There are 10¹⁷⁰ possible molecular interactions in a single cell. That creates considerable computing challenges in searching for relationships,” said Ben Brown of Berkeley Lab’s Environmental Genomics and Systems Biology Division. “Our method enables the identification of interactions of high order at the same computational cost as main effects – even when those interactions are local with weak marginal effects.”

Brown and Bin Yu of UC Berkeley are lead senior authors of “Iterative Random Forests to Discover Predictive and Stable High-Order Interactions.” The co-first authors are Sumanta Basu (formerly a joint postdoc of Brown and Yu and now an assistant professor at Cornell University) and Karl Kumbier (a Ph.D. student of Yu in the UC Berkeley Statistics Department). The paper is the culmination of three years of work that the authors believe will transform the way science is done. “With our method we can gain radically richer information than we’ve ever been able to gain from a learning machine,” Brown said.

The needs of machine learning in science are different from those of industry, where machine learning has been used for things like playing chess, making self-driving cars, and predicting the stock market.

James (Ben) Brown of Berkeley Lab

“The machine learning developed by industry is great if you want to do high-frequency trading on the stock market,” Brown said. “You don’t care why you’re able to predict the stock will go up or down. You just want to know that you can make the predictions.”

But in science, questions surrounding why a process behaves in certain ways are critical. As a result, machine learning for science needs to peer inside the black box and understand why and how computers reached their conclusions. The long-term goal is to use that understanding to model or even engineer systems to improve or attain a desired outcome.

In highly complex systems – whether it’s a single cell, the human body, or even an entire ecosystem – there are a large number of variables interacting in nonlinear ways. That makes it difficult if not impossible to build a model that can determine cause and effect. “Unfortunately, in biology, you come across interactions of order 30, 40, 60 all the time,” Brown said. “It’s completely intractable with traditional approaches to statistical learning.”

The method developed by the team led by Brown and Yu, iterative Random Forests (iRF), builds on an algorithm called random forests, a popular and effective predictive modeling tool, translating the internal states of the black box learner into a human-interpretable form. Their approach allows researchers to search for complex interactions by decoupling the order, or size, of interactions from the computational cost of identification.

“There is no difference in the computational cost of detecting an interaction of order 30 versus an interaction of order two,” Brown said. “And that’s a sea change.”
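To see why interaction order matters so much for a brute-force search, consider a minimal sketch that simply counts how many feature subsets of a given size exist. The feature count of 1,000 is an assumed figure chosen for illustration, not a number from the paper, and the function name is hypothetical. The sketch shows the exhaustive-search cost that Brown's comparison is set against; it is not the iRF algorithm itself.

```python
from math import comb


def candidate_interactions(num_features: int, order: int) -> int:
    """Count the distinct feature subsets of size `order`.

    An exhaustive search would have to test each of these subsets
    as a possible interaction.
    """
    return comb(num_features, order)


# Hypothetical number of measured features; an assumption for illustration.
NUM_FEATURES = 1000

for order in (2, 3, 30):
    count = candidate_interactions(NUM_FEATURES, order)
    # Report only the order of magnitude (number of digits minus one).
    magnitude = len(str(count)) - 1
    print(f"order {order:>2}: roughly 10^{magnitude} candidate subsets")
```

Under this assumption, pairs number in the hundreds of thousands, while the count for order 30 is astronomically larger, which is why a method whose cost does not grow with interaction order changes what can be searched.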

In the PNAS paper, the scientists demonstrated their method on two genomics problems: the role of gene enhancers in the fruit fly embryo and alternative splicing in a human-derived cell line. In both cases, using iRF confirmed previous findings while also uncovering previously unidentified higher-order interactions for follow-up study.

Brown said they’re now using their method for designing phased array laser systems and optimizing sustainable agriculture systems.

“We believe this is a different paradigm for doing science,” said Yu, a professor in the departments of Statistics and Electrical Engineering & Computer Science at UC Berkeley. “We do prediction, but we introduce stability on top of prediction in iRF to more reliably learn the underlying structure in the predictors.”

“This enables us to learn how to engineer systems for goal-oriented optimization and more accurately targeted simulations and follow-up experiments,” Brown added.

Learn more: Teaching Computers to Guide Science: New Machine Learning Method Sees the Forests and the Trees
