This article clarifies the difference between predictive models and search algorithms—two concepts that are often confused in Materials R&D. It introduces four core models commonly used in practice, along with practical criteria for selecting the right approach based on your objectives and data characteristics. Designed as a practical guide, this article helps teams adopt a more data-driven approach to accelerate research and development.
When promoting AI-driven materials development (Materials R&D DX), the use of algorithms is unavoidable. However, even though they are all referred to as algorithms, did you know that those used in Materials Informatics (MI) can be broadly divided into two types?
One is machine learning algorithms (predictive models), which learn patterns from experimental data.
The other is search algorithms (optimization methods), which use those models to find optimal experimental conditions.
Although both are called algorithms, their roles are clearly different. If these are confused, it can lead to questions such as, “Which method should we use?” or “What is the difference between Random Forest and Bayesian Optimization?”
In this article, we focus on machine learning algorithms (predictive models), which form the foundation, and explain how data scientists in practice understand these models and on what basis they compare and select them.
First, let us clarify the terminology again. As mentioned above, in data-driven materials development, two types of algorithms are mainly used.
The focus of this article is the first type: predictive models.
When you use a search tool such as Bayesian Optimization, the predictive model itself may not be visible, but it is always operating behind the scenes.
A predictive model is, so to speak, a virtual experimental system built inside a computer.
No matter how excellent the search algorithm (optimization method) is, if the accuracy of this system (the predictive model), which serves as the basis of calculation, is low, it will never reach optimal conditions. Therefore, understanding the characteristics of predictive models is essential for successful exploration.
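As a concrete illustration, the relationship between the two kinds of algorithms can be sketched in a few lines of scikit-learn: a predictive model is trained on past "experiments," and a (deliberately simple) search step then asks that model which untested condition looks best. The dataset, the choice of Random Forest, and the grid search are all illustrative assumptions, not a recommended recipe.

```python
# Sketch: a predictive model acting as the "virtual experimental system"
# behind a simple search loop. Data and model choice are illustrative
# assumptions, not a real materials dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# 30 past "experiments": temperature (x) vs. a noisy property that
# peaks around 420 (a made-up ground truth for this sketch)
X_train = rng.uniform(300, 500, size=(30, 1))
y_train = -((X_train[:, 0] - 420) ** 2) / 100 + rng.normal(0, 1.0, 30)

# 1. Predictive model: learn the pattern from the data
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# 2. Search step: score unseen candidate conditions with the model
#    and propose the most promising one
candidates = np.linspace(300, 500, 201).reshape(-1, 1)
best = candidates[np.argmax(model.predict(candidates))]
print(f"suggested next condition: {best[0]:.1f}")
```

If the model's accuracy is poor, the argmax in step 2 simply points at the wrong place, which is exactly why model quality gates the whole search.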
Before looking at specific algorithms, the first thing to decide is what you want to predict.
In this article, we focus on Regression (numerical prediction), which is in highest demand in materials development.
It should be noted that many algorithms, such as Random Forest and Support Vector Machines, can be applied to both regression and classification.
In this article, we describe their characteristics when used for regression (numerical prediction). The selection of algorithms for classification problems will be explained in a future article.
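The regression/classification duality is visible directly in scikit-learn, which exposes separate Regressor and Classifier variants of the same algorithm family. The toy data below is an assumption for illustration only.

```python
# Sketch: the same algorithm family (Random Forest) used for regression
# vs. classification. The toy data is an illustrative assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y_value = X[:, 0] + X[:, 1]          # continuous target -> regression
y_label = (y_value > 0).astype(int)  # binary target -> classification

reg = RandomForestRegressor(random_state=0).fit(X, y_value)   # predicts numbers
clf = RandomForestClassifier(random_state=0).fit(X, y_label)  # predicts classes
```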
Many people may associate AI with deep learning (neural networks). However, in materials development, where the number of data points is typically on the order of tens to thousands, the following four groups are mainly used because they can achieve good accuracy even with relatively small datasets.
1. Linear models: methods that attempt to capture trends in data using a straight line (or plane).
2. Tree-based models: methods that make predictions by combining numerous conditional branches, such as "if the temperature is above a certain value, go right; otherwise, go left."
3. Kernel methods: methods that map data into a higher-dimensional space and make predictions based on similarity (distance) between data points.
4. Ensemble methods: methods that integrate predictions from multiple different models (e.g., Lasso and XGBoost) using a "consensus" approach.
(Note: In a broad sense, Random Forest is also an ensemble of decision trees, but here we refer to methods that combine different types of models.)
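The four groups above each map to concrete scikit-learn estimators. The particular classes and parameters below are illustrative assumptions (XGBoost is swapped for a second Random Forest so the sketch needs only scikit-learn), and the tiny fit at the end only shows that all four share the same fit/predict interface.

```python
# Sketch: one representative estimator per model family.
# Estimator choices and parameters are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.svm import SVR

models = {
    "linear":   LinearRegression(),                     # line/plane fit
    "tree":     RandomForestRegressor(random_state=0),  # conditional branches
    "kernel":   SVR(kernel="rbf"),                      # similarity via kernels
    "ensemble": StackingRegressor(                      # consensus of models
        estimators=[("lasso", Lasso(alpha=0.1)),
                    ("rf", RandomForestRegressor(random_state=0))],
        final_estimator=LinearRegression(),
    ),
}

# Smoke test on toy data: every family exposes the same interface
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * X[:, 0] + 1
predictions = {name: m.fit(X, y).predict(X) for name, m in models.items()}
```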
Unfortunately, there is no universal model. Professional data scientists identify strong initial candidates based on the purpose.
Although we have introduced the main methods and selection criteria, implementing and comparing them one by one requires significant effort and expertise. Even when the characteristics of each algorithm are understood, manually running a comprehensive validation every time is a heavy burden in practice.
Even professional data scientists rarely decide on a single model from the beginning.
Instead, they test multiple models under consistent conditions and select the most suitable one based on objective numerical evaluation.
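That "test multiple models under consistent conditions" workflow can be sketched with scikit-learn's cross-validation utilities. The dataset, candidate list, and hyperparameters below are illustrative assumptions; the point is that every model is scored by the same splits and the same metric.

```python
# Sketch: comparing candidate models under identical conditions with
# 5-fold cross-validation. Data and hyperparameters are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(60, 3))                       # 60 "experiments"
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(0, 0.05, 60)

candidates = {
    "Lasso": Lasso(alpha=0.01),
    "SVR": SVR(kernel="rbf", C=10.0),
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Same folds, same metric (R^2) for every candidate
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, "->", best_name)
```

Note how the nonlinear term in `y` should handicap the purely linear Lasso: the comparison itself, not any single model's score, is what informs the final choice.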
Our Materials R&D DX platform automates this comprehensive validation process.
Using properly structured data, it trains and compares major algorithms, allowing the system to handle the extensive trial-and-error process required for model selection.
Tasks that computers excel at—such as model selection and tuning—can be left to AI.
Researchers can instead focus their time on interpreting insights and making creative decisions about the next experiments.
In the next article, before moving on to search algorithms, we will explain evaluation metrics (R², RMSE, etc.) used to determine whether predictive models are sufficiently accurate for practical use.
Even if you use a search algorithm, an inaccurate model will only lead to incorrect guidance. Evaluating model reliability in advance is essential for successful exploration.
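As a small preview of those metrics, both R² and RMSE are one-liners in scikit-learn; the prediction values below are made up purely for illustration.

```python
# Sketch: computing R^2 and RMSE for a toy set of predictions.
# The values are illustrative, not real experimental results.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # measured values
y_pred = np.array([1.1, 1.9, 3.2, 3.8])   # model predictions

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")  # → R2 = 0.980, RMSE = 0.158
```

Taking the square root of `mean_squared_error` keeps the sketch compatible across scikit-learn versions.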