As MI tools become more accessible, it’s time to move beyond “just running analyses.” We explore how CRISP-DM, a widely used standard process for data analytics, applies to MI projects. What are the six essential steps to drive consistent results?
As the adoption of Materials Informatics (MI) continues to grow, have you encountered challenges like these?
“We built a model using our data, but don’t know how to connect it to actual product development.”
“We obtained results, but they don’t align with practical experience, and the project stalled.”
In recent years, MI tools have become more accessible, significantly lowering the barrier to entry. However, ease of access does not necessarily mean ease of effective use.
Now that tools are widely available, it is no longer enough to simply build models. What matters is what to do after building them and, even more importantly, why they are built in the first place.
This requires designing the overall framework of the project, which we call Materials R&D DX (Digital Transformation). In this article, we introduce a structured approach based on CRISP-DM (Cross-Industry Standard Process for Data Mining), a widely used standard process for data analytics, adapted specifically for MI projects.
These steps will help organizations move from “just trying things out” to consistently delivering results at scale.
CRISP-DM is a data analytics framework consisting of six iterative steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. It is not a linear process but a cycle in which you move back and forth between steps to improve outcomes. Running this cycle is what enables organizations to embed a culture of data-driven R&D, in other words, to practice Materials R&D DX. Let’s walk through the six steps.
Step 1, Business Understanding: Clarify “What kind of materials do we want to develop?” and “What problems are we trying to solve?”
Key point: The objective does not need to be perfect from the beginning. Even a simple goal like “Let’s see if we can predict this property” is sufficient.
Why it matters: Clear objectives make it much easier to evaluate model performance later.
This step answers the fundamental question: “What is the purpose of DX?”
Step 2, Data Understanding: Take inventory of your available data, such as experimental notebooks, Excel files on personal PCs, and historical reports in shared drives.
Assess data potential: Can this data be used for machine learning? Is the volume sufficient? Understanding both data quality and quantity at a high level is critical.
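A first pass at these two questions could look like the minimal sketch below. It assumes the inventory has already been loaded into a list of dictionaries. The column names, the values, and the `summarize` helper are hypothetical and are not taken from any particular MI tool.

```python
# Hypothetical inventory: one dictionary per experiment, gathered from
# notebooks, spreadsheets, or reports. None (or an absent key) means the
# value was never recorded.
records = [
    {"filler_wt_pct": 10.0, "cure_temp_c": 120.0, "strength_mpa": 41.0},
    {"filler_wt_pct": 20.0, "cure_temp_c": None, "strength_mpa": 47.5},
    {"filler_wt_pct": 30.0, "cure_temp_c": 140.0, "strength_mpa": None},
    {"filler_wt_pct": 40.0, "cure_temp_c": 140.0, "strength_mpa": 58.0},
    {"filler_wt_pct": None, "cure_temp_c": 160.0},
]


def summarize(rows, target):
    """Report data volume and the fraction of missing values per column."""
    columns = sorted({key for row in rows for key in row})
    n_rows = len(rows)
    missing_rate = {
        col: sum(1 for row in rows if row.get(col) is None) / n_rows
        for col in columns
    }
    # Only rows with a measured target can be used for supervised learning.
    rows_with_target = sum(1 for row in rows if row.get(target) is not None)
    return {
        "rows": n_rows,
        "rows_with_target": rows_with_target,
        "missing_rate": missing_rate,
    }


summary = summarize(records, target="strength_mpa")
print(f"{summary['rows_with_target']} of {summary['rows']} rows have a target value")
for column, rate in summary["missing_rate"].items():
    print(f"{column}: {rate:.0%} missing")
```

Even a rough report like this shows whether the volume is worth modeling yet and which columns need attention first.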
Step 3, Data Preparation: This is the most critical step, and generally the most labor-intensive and time-consuming one. First, collect and consolidate data scattered across individuals and departments into one place, and organize it into a common format (template). Then correct inconsistencies in notation, handle missing values, and structure the data for machine learning.
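The consolidation and cleanup described here can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended pipeline: the raw rows, the alias table standing in for a "template", and the choice to fill missing features with the column median are all hypothetical.

```python
import statistics

# Hypothetical raw rows from different sources: inconsistent column
# names, numbers stored as text, and several ways of writing "missing".
raw_rows = [
    {"Filler (wt%)": "10", "Cure Temp": "120", "Strength": "41.0"},
    {"filler_wt_pct": "20", "cure_temp_c": "", "strength_mpa": "47.5"},
    {"Filler (wt%)": "30", "Cure Temp": "140", "Strength": "N/A"},
    {"filler_wt_pct": "40", "cure_temp_c": "140", "strength_mpa": "58.0"},
]

# The "template": every known spelling maps to one canonical column name.
# An unknown spelling raises KeyError so that it gets noticed and added.
COLUMN_ALIASES = {
    "Filler (wt%)": "filler_wt_pct",
    "filler_wt_pct": "filler_wt_pct",
    "Cure Temp": "cure_temp_c",
    "cure_temp_c": "cure_temp_c",
    "Strength": "strength_mpa",
    "strength_mpa": "strength_mpa",
}
MISSING_TOKENS = {"", "n/a", "na", "-"}


def to_number(text):
    """Parse a cell as float, or return None for a missing-value token."""
    cleaned = text.strip()
    if cleaned.lower() in MISSING_TOKENS:
        return None
    return float(cleaned)


def standardize(rows):
    """Rename columns to the template and parse every cell."""
    return [
        {COLUMN_ALIASES[name]: to_number(value) for name, value in row.items()}
        for row in rows
    ]


def prepare(rows, features, target):
    """Drop rows without a target; fill missing features with the median."""
    labeled = [row for row in rows if row[target] is not None]
    medians = {
        col: statistics.median(row[col] for row in labeled if row[col] is not None)
        for col in features
    }
    prepared = []
    for row in labeled:
        clean = {col: medians[col] if row[col] is None else row[col] for col in features}
        clean[target] = row[target]
        prepared.append(clean)
    return prepared


table = prepare(standardize(raw_rows), ["filler_wt_pct", "cure_temp_c"], "strength_mpa")
for row in table:
    print(row)
```

Median imputation is only one option. Whether to fill, drop, or re-measure a missing value is a judgment call that depends on the material system.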
Step 4, Modeling: Build machine learning models. Start simple: there is no need to begin with complex algorithms. Start by creating a baseline model. By actually running models, you gain insights such as “We need more data here” or “This might be more predictable than expected.”
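As an illustration of what "start simple" can mean, the sketch below compares two baselines with leave-one-out validation: always predicting the mean, and a one-feature straight-line fit. The data points are made up for the example, and the helper names `fit_line` and `loo_mae` are hypothetical.

```python
# Hypothetical data: filler content (wt%) and measured strength (MPa).
x = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
y = [41.0, 47.5, 52.0, 58.0, 61.5, 68.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_mae(xs, ys, use_line):
    """Leave-one-out mean absolute error of a baseline model."""
    errors = []
    for i in range(len(xs)):
        # Hold out point i and train on the rest.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if use_line:
            slope, intercept = fit_line(train_x, train_y)
            prediction = slope * xs[i] + intercept
        else:
            prediction = sum(train_y) / len(train_y)
        errors.append(abs(prediction - ys[i]))
    return sum(errors) / len(errors)


mae_mean = loo_mae(x, y, use_line=False)
mae_line = loo_mae(x, y, use_line=True)
print(f"mean-only baseline MAE: {mae_mean:.2f} MPa")
print(f"linear baseline MAE:    {mae_line:.2f} MPa")
```

If a more complex model cannot clearly beat baselines like these, that is often a sign that the data, rather than the algorithm, is the bottleneck.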
Step 5, Evaluation: Evaluate the model not only by accuracy, but also by usability and interpretability.
(1) Forward Prediction
(2) Inverse Design
(3) Cross-check with domain knowledge (Interpretability)
(4) Perspective for improving accuracy: if accuracy or confidence is insufficient, how can it be improved?
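The difference between forward prediction and inverse design can be sketched in a few lines. This is a toy example under loud assumptions: `predict_strength` is a hypothetical stand-in for a trained model with made-up coefficients, and the inverse step is a plain grid search, which is only one of several possible strategies.

```python
from itertools import product


def predict_strength(filler_wt_pct, cure_temp_c):
    """Hypothetical stand-in for a trained model (coefficients are made up)."""
    return 20.0 + 0.5 * filler_wt_pct + 0.125 * cure_temp_c


# Forward prediction: conditions in, property out.
forward = predict_strength(filler_wt_pct=30.0, cure_temp_c=140.0)
print(f"predicted strength: {forward} MPa")


def inverse_design(target, filler_grid, temp_grid, top_k=3):
    """Rank candidate conditions by how close the prediction is to the target."""
    candidates = [
        (abs(predict_strength(filler, temp) - target), filler, temp)
        for filler, temp in product(filler_grid, temp_grid)
    ]
    # Sorting the tuples orders candidates by distance to the target first.
    return sorted(candidates)[:top_k]


# Inverse design: property target in, candidate conditions out.
suggestions = inverse_design(
    target=59.0,
    filler_grid=[10.0, 20.0, 30.0, 40.0, 50.0],
    temp_grid=[120.0, 140.0, 160.0],
)
for gap, filler, temp in suggestions:
    print(f"filler {filler} wt%, cure {temp} C -> off target by {gap} MPa")
```

Note that two different recipes land equally close to the target here. Inverse design typically returns several candidates, and choosing among them is where the cross-check with domain knowledge comes in.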
Step 6, Deployment: Integrate the model into actual R&D workflows. This is the true goal of Materials R&D DX.
Operation in Practice
Iterative Cycle
These six steps may seem complex, but in practice the opposite is true. Start by quickly running through steps 3–5 using your existing data. This will naturally reveal what data is missing and how the problem should be redefined.
Our Materials R&D DX platform is designed to accelerate the CRISP-DM cycle and enable it to be repeated continuously.
Data Preparation: Standardization and centralized management (assetization) of data through the use of templates
Modeling: Automated modeling capabilities that can be used even without specialized expertise
Deployment: Implementation of forward prediction and inverse design using the developed models
Whether you want to “just try it out” or “build a full-scale process,” this platform enables you to move forward smoothly without losing sight of where you are in your project. Why not start with a free trial and experience your “first cycle” using your own data?