Chemicals and Materials: Strategies for Data-Driven R&D

Dotmatics’ unified platform is making a big impact on how R&D organizations in chemicals and materials companies tackle digitalization challenges. Recently, Umicore Corporate R&D, Croda, and Arkema have issued press releases describing how Dotmatics allows their R&D to become more data-driven.

One of the unique features of the Dotmatics platform is that it covers the entire DMT (Design, Make, Test) innovation cycle. It implements workflows, simplifies how researchers get work done, and supports data access for decision support and data modeling (AI/ML).

This webinar covers:

  • The application of the above concepts to chemicals and materials R&D

  •  Hands-on demonstrations of the Dotmatics platform with examples of formulations, sample management, lab data automation, and data analytics

Welcome to the webinar. We will be talking about chemicals and materials today, and specifically about strategies for data-driven R&D. I'm Max Peterson, and I'm the AVP of Chemicals & Materials Marketing. My role within Dotmatics is twofold: I work with customers and prospects to understand their requirements, and I also work with our R&D team to define the strategy for the chemicals and materials market; specifically, I am responsible for the formulations capabilities within our platform.

So what we want to accomplish today is to give a very brief introduction to Dotmatics for those who don't know us yet. Then I will talk about what data-driven R&D is and why it's important, and about how a platform approach is really needed to implement data-driven R&D. I'm going to talk about our approach to implementing this platform, and then finish up with a couple of customer case studies.

So Dotmatics has as its goal to combine a scientific platform with best-of-breed applications to enable collaboration, automation, and insights. I already mentioned how the platform is really essential for data-driven R&D. I'm not going to talk so much about the applications, but ultimately what they do is tie everything together. Dotmatics is now a company that serves over 2 million scientists, has over 10 thousand customers, and is represented in many large pharma companies; one in five scientists use our solutions. And while that is an impressive list of pharma companies, today we are talking about chemicals and materials companies.

So with that, I'm going to talk about the general philosophy of our platform approach. The idea is that in a lab or in an R&D organization, we fundamentally have this design, make, test cycle, which requires different capabilities to be supported if we want to cover the entire innovation cycle. Within the design phase, we need to be able to predict properties and to visualize, analyze, and model data. This is very much at the core of what data-driven R&D aims at, but in order to support it, we need to be able to capture information, and that is where systems like ELNs come into play. Then we need to carry out the tests, where we need to analyze data and acquire information from instruments. And in order to apply this to different disciplines, we need to layer capabilities like biology, chemistry, formulations, and data management on top, so that scientists have the right context.

Now, carrying on to the topic of data-driven R&D: when you think about what your digitalization objectives are, these are the things that come to mind. At the lowest level there is individual productivity. If you think back to the disruptions that the pandemic brought us, there were questions like, “How do I support a remote or hybrid workforce?” “How can I get my work done?” “How can I remotely operate a lab infrastructure?” “How can I use this very expensive infrastructure when I'm away from my desk?” Digitalization certainly plays a very strong role in supporting that. And the amount of data being produced and entered is, of course, largest in the ‘test’ area, which is where most of the activity actually happens.

So it's really important to provide a streamlined capability around that: to be able to access this data and capture it in a high-quality manner. At the next level, we have things like operational excellence. How can I make sure that this remote or hybrid workforce can collaborate effectively? One thing we saw is that there were huge disruptions, and the question is whether we can provide a framework for operating in that environment. We had a very interesting presentation at our last user group meeting from Valenti Woodcraft from DuPont, who talked about how their decision to work with Dotmatics was really driven by this need for a framework in which they could carry on innovating through a situation where they could not work the way they were used to.

And then there is the question of data automation: how can I get decision-critical data to scientists and project teams faster? And on top of that, there is the desire to use the data in a way that accelerates innovation. That's really where data-driven R&D comes into play. Think about our sustainability crisis: in California we were just in a heat wave, we have droughts, and I cannot think of anything more pressing than finding solutions that address the crisis we are steering towards. Innovating faster and making more with what you already have: these are the driving forces behind data-driven R&D. The issue is that the desire is to innovate at the device level.

If I think about the changes that are coming for tires, for example, the changes in the context of mobility are quite drastic. But of course, nobody can operate at that level, just innovating on tires and testing them on the road. There's a whole sequence of abstractions of the problem that can go all the way down to the molecular level, but typically we work with many more intermediates: we have feedstocks or representative units that capture certain characteristics of the materials we are innovating on, and we carry out our tests against them. And when it comes down to it (and this is a quote from Michael Swartz, our SVP of Strategy), roughly 80% of decision support is based on connecting structural properties to test data.

And this is really at the core of what an informatics system has to provide. But it is, of course, becoming harder because we have so many different data types, and as a consequence about 80% of this R&D data is actually stuck in silos. What we see here is a life-sciences-centric view of it, but you could draw the same picture for chemicals and materials. We have many different data repositories, and this is illustrated here for a typical tire R&D team: they have advanced materials groups, compounding groups, raw materials groups, and plant engineering, and they all interact with different types of labs that provide different kinds of information that needs to be matched up to the properties of the materials they are trying to innovate on. So really the first thing that needs to be addressed is this data interoperability challenge.
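To make that concrete, here is a minimal sketch of what "connecting structural properties to test data" looks like once both live in an interoperable layer. This is not Dotmatics code; the tables, column names, and values are invented for illustration, using Python and pandas as one possible stand-in.

```python
import pandas as pd

# Hypothetical extracts from two silos: a materials/composition repository
# and a test-results repository. All names and values are illustrative.
materials = pd.DataFrame({
    "material_id": ["M-001", "M-002", "M-003"],
    "polymer": ["SBR", "SBR", "NR"],
    "filler_pct": [30.0, 35.0, 40.0],
})

tests = pd.DataFrame({
    "material_id": ["M-001", "M-002", "M-002", "M-003"],
    "test": ["abrasion", "abrasion", "rolling_resistance", "abrasion"],
    "value": [112.0, 98.0, 7.9, 87.0],
})

# "Which composition gives which test result?" becomes a simple join
# once the two silos are exposed through one interoperable layer.
joined = tests.merge(materials, on="material_id", how="left")
print(joined.groupby(["polymer", "test"])["value"].mean())
```

The point is simply that the decision-support question reduces to a join plus an aggregation once the silos expose their data consistently.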

Now, what really compounds the problem is that the way we do research has fundamentally changed in recent years with the advent of data modeling capabilities. In the old days, so to say, there was actually a pretty simple data life cycle that we needed to support. There was data generation, meaning that some kind of instrument or measurement would create output data, and that data was typically transient: an instrument would write a file, a technician or scientist would come along, make sure that the instrument ran properly, and then process the data and extract some kind of feature or signal. And at that point, maybe you would start thinking about saving some of this data: "I extracted these signals from my tests, let me put that into an ELN or a lab notebook."

But certainly I'm not going to store all these instrument files, because how am I actually going to use them five years from now? So data sharing may have happened at the point where I created the data and sent it to a scientist, but data retention actually happened much later. There may have been some data automation for the verification and processing steps. And then at the end, I would take these signals or results, interpret them, compare them, and draw some conclusions. I would again retain that data, write a report, design a new set of experiments, and do it all over again. What happens today is that I have a lot of these loops, and data management actually happens at the point of data generation, because what data modeling fundamentally requires is that I don't make an a priori determination of what data is important.

Maybe there's another signal that is relevant and I want to revisit the data. So there is now a requirement to do data retention much earlier in the data life cycle, and I may have to do data reduction and data analysis in a much more circular, much more intertwined way. With that come a lot of new challenges: anything from IoT environments to bias risk, lack of standardization, context tracking, metadata enrichment, retention policies, and fragmentation into data silos. So the data landscape has become increasingly complex and requires a much more holistic way of looking at the data. And that's where the importance of the platform approach comes into play. If I look at a traditional landscape of systems that store scientific data, I can divide them, first of all, into two big categories: ones that are concerned with data generation and ones that are concerned with data consumption.
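As a rough illustration of what "retention at the point of generation" means in practice, here is a generic Python sketch (not a Dotmatics API; the function name, file names, and metadata fields are assumptions) that archives the raw instrument file together with an enriched metadata sidecar so that any signal can be revisited later:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def retain_raw_output(raw_file: Path, archive_dir: Path, context: dict) -> Path:
    """Archive a raw instrument file and write a metadata sidecar
    (checksum, timestamp, sample/instrument context) next to it,
    so the data can be re-processed later, not just the extracted signals."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    data = raw_file.read_bytes()
    target = archive_dir / raw_file.name
    target.write_bytes(data)
    sidecar = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "source_file": raw_file.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        **context,  # e.g. instrument_id, sample_id, method, operator
    }
    target.with_suffix(target.suffix + ".meta.json").write_text(json.dumps(sidecar, indent=2))
    return target

# Hypothetical usage at the moment the instrument writes its file:
# retain_raw_output(Path("run_0421.raw"), Path("archive/hplc"),
#                   {"instrument_id": "HPLC-02", "sample_id": "S-1007", "method": "assay-A"})
```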

Within data generation, we have equipment, samples, and experiments. For equipment we are, for example, interfacing with an SDMS, or for chromatography data with a CDS. In the next step, I need to associate the equipment information with samples, which is typically done with a LIMS. We have sample management and assay management, maybe more upstream in life sciences, but in general a separate system comes into play to connect them. And when it comes to experiments, that's the domain of electronic lab notebooks; in more structured environments like manufacturing, I may just use the same LIMS, but then have request management systems, lab execution systems, and manufacturing execution systems to help me carry out the experimental workload. Now, all of this information has to bubble up into the data consumption systems: statistics, fundamental science and system-level modeling, dashboards, and the systems that are really about the decision-making process, data visualization, and project or portfolio management.

So really the challenge is that we have two problems here. The first is: how do I connect all the lab data to all the experiment data? And the second: how do I elevate all of that information into the data consumption layer? Now, the reality in many chemicals and materials applications is that not everything fits as neatly into these categories. If we take the example of a flavors and fragrances application, we have the typical data silos, which may or may not be mainstream applications like formulations, analytical tests, and ingredients, but there are very specialized data silos as well, like flavor and taste profiles, food safety data, and sensory testing and panel results. When we map them onto workflows, we see that in order to solve the workflow challenge, for example in formulation development, one may need access to many of these data silos to set up a design of experiments or to decide what to do next.

The same is true for testing: I may need food safety data, I may need information on the ingredients, so again I need to straddle many of these data silos, and the same holds for QA/QC. And on the data analytics side, we have problems like scent modeling or taste modeling, where I obviously need information on what's in a formulation, connected to the flavor profiles, but I may also want to consume other test data to do my modeling. So again, a pretty complex data interoperability challenge. And when you think that through, you logically arrive at the point where you need to implement a platform approach.
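As a sketch of why this matters for modeling: once those silos are accessible through one layer, assembling a training dataset is a straightforward join. The silo names, formulations, and scores below are purely illustrative (Python/pandas used as a stand-in, not the Dotmatics analytics layer):

```python
import pandas as pd

# Hypothetical extracts from three silos: formulation composition,
# sensory panel scores, and a stability test. Names and values are invented.
composition = pd.DataFrame({
    "formulation_id": ["F-10", "F-10", "F-11", "F-11"],
    "ingredient": ["vanillin", "citral", "vanillin", "limonene"],
    "pct": [0.8, 0.2, 1.1, 0.3],
})
panel = pd.DataFrame({"formulation_id": ["F-10", "F-11"], "sweetness_score": [6.2, 7.4]})
stability = pd.DataFrame({"formulation_id": ["F-10", "F-11"], "shelf_life_weeks": [26, 18]})

# Pivot composition into a feature matrix, then join the other silos onto it;
# this combined table is what a taste-modeling exercise would actually train on.
features = composition.pivot_table(index="formulation_id", columns="ingredient",
                                   values="pct", fill_value=0.0)
dataset = (features
           .join(panel.set_index("formulation_id"))
           .join(stability.set_index("formulation_id")))
print(dataset)
```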

Now I want to show you how we do that in our software and discuss our approach to implementing a platform that is useful for data-driven research. Instead of continuing with the PowerPoint presentation, I'm going to switch directly to the software.

We're now in the Dotmatics ELN, and you can see that there is a series of formulation experiments. Let's open one up. You see that there is metadata associated with it, and this is really important if you want to, for example, link this to a customer request or any other external project management data. What we really want to focus on here are the integration points and the platform capabilities that exchange information between different systems and allow you to centralize information. So here is a formulation; I'm actually showing a tableting example, and you can see that there's a list of ingredients. I could choose other ingredients from a database (more about that later). Once these are chosen, they can assume different roles: they can be filler materials, they can be coatings, they can be anti-adherents, and so on.

I can actually specify that here, and this is controlled by a dictionary that is stored centrally. But this is not where the story ends: of course, there are more ingredient properties that are relevant when I'm defining these experiments. So I can open this tab, where I can take a deeper look at the ingredient properties that define my formulation. I can see a list of the ingredients for the formulation we are specifying, and then more information associated with them. These all come from different data repositories that are now exposed to the platform and can be consumed here. I can also see where these ingredients actually live within the inventory: I have barcode information, container information, and so on, and I can see whether the ingredients are expired or not.
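Conceptually, what the ELN is handling here is a formulation made of ingredient lines whose roles are validated against a controlled dictionary. Here is a minimal Python sketch of that structure (the role names, class names, and amounts are assumptions for illustration, not the platform's actual data model):

```python
from dataclasses import dataclass, field

# Controlled vocabulary for ingredient roles (values invented for illustration;
# in the platform this would come from a managed dictionary).
ROLE_DICTIONARY = {"filler", "coating", "anti-adherent", "binder", "lubricant"}

@dataclass
class IngredientLine:
    name: str
    role: str
    amount_mg: float

    def __post_init__(self):
        if self.role not in ROLE_DICTIONARY:
            raise ValueError(f"Unknown role '{self.role}': not in the controlled dictionary")

@dataclass
class Formulation:
    formulation_id: str
    lines: list[IngredientLine] = field(default_factory=list)

    def total_mass_mg(self) -> float:
        return sum(line.amount_mg for line in self.lines)

# Hypothetical tableting example:
tablet = Formulation("FORM-0042", [
    IngredientLine("microcrystalline cellulose", "filler", 120.0),
    IngredientLine("magnesium stearate", "lubricant", 2.0),
])
print(tablet.total_mass_mg())
```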

Of course, here I'm operating against the Dotmatics inventory system, but that is not necessarily the typical situation. There may be other inventory locations; most famously, they could reside in your ERP. But let's just take a look at the information that sits in the Dotmatics inventory. I have here the different bins, and as you may have noticed, all our excipients are coming from this bin number two. I can see all the information listed: what's available, whether it is checked in or checked out, and so on. This allows me to tie these two systems together, and since they are connected via the platform, the information exchange is seamless. So I get this view not only of the inventory locations, but also of more detailed properties of these ingredients.
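Underneath, the inventory view is essentially answering "which containers of this ingredient are in which bin, checked in, and not expired". A toy sketch of that lookup, with invented container records and field names (in a real deployment this view would be served by the inventory system or an ERP):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Container:
    barcode: str
    ingredient: str
    bin_id: str
    checked_in: bool
    expiry: date

# Invented inventory snapshot; in practice this is served by the inventory
# system (or an ERP), not held in memory like this.
inventory = [
    Container("BC-0001", "microcrystalline cellulose", "BIN-2", True, date(2026, 3, 1)),
    Container("BC-0002", "magnesium stearate", "BIN-2", False, date(2024, 9, 1)),
]

def usable_containers(ingredient: str, today: date) -> list[Container]:
    """Containers of an ingredient that are checked in and not expired."""
    return [c for c in inventory
            if c.ingredient == ingredient and c.checked_in and c.expiry > today]

print(usable_containers("microcrystalline cellulose", date(2025, 1, 15)))
```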

These detailed ingredient properties are actually stored in a registration system that allows us to define these entities and give them properties. This is done in our biological registration system, which we are using here in the spirit of a generalized entity registration system, where I have defined equipment, excipients, and analytical solutions. If you take a look here at the excipients, I can open those up and see all the properties that are reflected in the ELN defined here. And this is something that the user can do in a self-service fashion; there's no dark magic involved. Of course, there need to be some controls for data quality purposes, but no programming is necessary: it's done via a CSV-type interface through which everything can be exposed.
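The "CSV-type interface" idea can be pictured like this: a flat file of entity rows becomes registered entities whose properties are then available to the ELN. The column names and ID scheme below are invented for illustration; this is not the actual registration API:

```python
import csv
import io

# An invented CSV export, in the spirit of a "no programming needed" interface:
# each row registers an excipient entity together with its properties.
excipient_csv = io.StringIO("""\
name,supplier,grade,bulk_density_g_ml
microcrystalline cellulose,SupplierA,PH-102,0.32
magnesium stearate,SupplierB,NF,0.16
""")

registry: dict[str, dict] = {}
for row in csv.DictReader(excipient_csv):
    entity_id = f"EXC-{len(registry) + 1:04d}"
    registry[entity_id] = row  # these properties then surface in the ELN layer

for entity_id, props in registry.items():
    print(entity_id, props["name"], props["bulk_density_g_ml"])
```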

The same is true for sample tracking. Each of these formulations can be associated with samples, and this is a one-to-many relationship: I can create arbitrarily many samples associated with each formulation, show them in this list of samples, and see where they're located. I can add them to the inventory, which is again operating against the inventory system. And then I can connect the samples up to our request management system, which allows me to farm out work to my colleagues and create a collaborative framework where I can ship work back and forth. So, for example, if I take this sample, uncheck the other ones, and place a request here, I could say that I want an HPLC analysis and transfer it into a queue for the team that will take care of it. When I submit this request, I have basically sent a request to the other team. And if I refresh this page (let's just go back to the experiment and then to the test requests), I can now see that I have a new request in the queue.
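The sample-tracking and request-management piece boils down to a one-to-many relationship between a formulation and its samples, plus a shared queue of test requests. A minimal sketch with invented identifiers (again illustrative, not the platform's request API):

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Sample:
    sample_id: str
    formulation_id: str
    location: str

@dataclass
class TestRequest:
    sample_id: str
    analysis: str           # e.g. "HPLC"
    status: str = "queued"

# One formulation, arbitrarily many samples: the one-to-many relationship.
samples = [
    Sample("S-0001", "FORM-0042", "BIN-2"),
    Sample("S-0002", "FORM-0042", "BIN-2"),
]

# A stand-in for the request-management queue shared with the analytical team.
hplc_queue: deque = deque()

def submit_request(sample: Sample, analysis: str) -> TestRequest:
    request = TestRequest(sample.sample_id, analysis)
    hplc_queue.append(request)
    return request

submit_request(samples[0], "HPLC")
print(len(hplc_queue), hplc_queue[0])
```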

So the point here is that I'm actually straddling various systems. Of course, I am in the ELN, which keeps track of my experiments and their metadata, and I may connect those into other systems that track project information. But even in this very simple example, I have tied into the inventory system, into the request management system, and into the registration system that defines the properties of my excipients. In real life, in many of our customer installations, these are actually third-party systems, but since the data is ingested into the platform, the operational framework is exactly the same.

So this is all I wanted to say about the capabilities of our approach; I'm happy to discuss more, and I hope that gave you a quick flavor. Now let's talk about some customer stories: how do our customers benefit from the Dotmatics platform? First, let's take a look at two press releases from last year that talk about how data-driven research is a major objective for these organizations. Let's start with Arkema. They state in their press release that their objective is to ensure digital continuity of all experimental data, and that this will allow faster innovation of chemicals and materials driven by statistics, calculation and modeling, and artificial intelligence. So a very clear goal, and they found in Dotmatics the right partner to provide a framework in which they can capture experimental data and make it usable for data-driven research to accelerate innovation. Croda has a very similar objective: they also want to accelerate innovation delivery, enhance customer collaborations, and generate growth by becoming data-driven, which will help them innovate, move the company towards data mining, and provide a foundation for artificial intelligence. So again, a recognition that if you are serious about implementing data modeling approaches, you need to get your data under control fast.

Now let's talk for a moment about Clariant. Clariant is an interesting case study because they basically came to Dotmatics and said: apart from a few LIMS, we really have no digital infrastructure for our R&D teams. They wanted to cover all their major business areas (care chemicals, catalysis, and natural resources) and to implement what they call a media-break-free environment, which means that a researcher at Clariant works within this environment and never does any data recording or administrative work outside of the platform. So this is a pretty large-scale project encompassing over 900 users, and we should mention that this is not a simple roll-out of a productivity tool; this is really something that fundamentally changes how business is conducted.

You can see this from the rather lengthy project timeline and also from the number of employees engaged in this project. But as a result, Clariant and Dotmatics have built a very strong partnership. We are covering platform strategy, integration of key productivity systems, and facilitating collaboration and knowledge management. We have covered the lab digitalization needs, which include sample management, request management, and general data management (some of which you have already seen in this very brief demo), as well as formulation development, where we have worked very closely together to build out these capabilities. And in other areas, we are also leveraging the chemistry and biology capabilities within the platform.

Firmenich is another very interesting one. During one of our user symposia, Marco Pai gave a presentation about how Dotmatics helped Firmenich become more data-driven. As you may know, Firmenich is the world's largest privately owned flavor and taste company. The Dotmatics project was not about starting from scratch like at Clariant, but about addressing very specific legacy IT shortcomings in terms of data accessibility. The ability to configure roles and workflows was really important to them, as was the fact that they could move everything into the cloud and find a partner with a collaborative spirit. In this presentation, he talked about how, using Dotmatics, they could increase data utilization and reduce workflow complexity.

Another very interesting use case was with BASF, in a project called Data to Value, and there is a very detailed use case document that you can access on our website. The gist of it is that this project ran in three phases, and this is actually pretty typical of how these projects evolve: they start out by looking at very specific gaps in capabilities. In this case that was scientific data visualization: they had a lot of data sitting in a bespoke data warehouse, and they were just using Excel to visualize it. So we came in with a data visualization capability that could deal with their chemistry and also with the scale of data they were looking at; they needed a tool to handle huge amounts of data. Next, we started building a more flexible query framework that sat on top of that bespoke data warehouse, which allowed the scientists to get more information out of these warehouses and really gain full access to the data. And in phase three, we started with the replacement of a legacy ELN, which took them full circle: they can now enter data, using Dotmatics, into a system that later provides them with data access.

So what were some of the key learnings? This was really about covering the design-make-test cycle that we mentioned earlier, and the recognition that projects rarely start from scratch; this specific Data to Value project actually ran in reverse, starting with the end point and ending with the starting point, which was the data capture. The other point is that you need to involve the end users: you need to create ownership, because a rollout is change management, and you have to take care of the end users and make sure there is something in it for them. And as mentioned in this case study, the three qualities of a good R&D informatics platform are flexibility, performance, and usability. Flexibility: the system will be under continuous improvement and should be adaptable, so that as research becomes more refined, the system can keep pace with it. Performance: it needs to serve up information to the scientists as fast as possible. Usability: the system needs to be intuitive, support research workflows, and make life easier for the scientists. That is almost a summary of the presentation I wanted to give, but of course, here are some take-home messages.

I just want to reiterate that data-driven R&D really aims at accelerating innovation, which requires solving the data interoperability challenge, and this can only be done with an open platform approach. It's really key for chemicals and materials because of the incredible diversity and variability of the scientific workflows and data we are dealing with. I have hopefully shown you a glimpse of how our customers are successfully implementing their data-driven R&D strategies using our platform. So thank you very much again. We are happy to answer questions in written form; please submit those via the platform and we will get back to you, and hopefully continue the conversation at a later stage. Thank you again for your time. It was a pleasure.