One of the perks of my job is that I experience first-hand how organizations implement their digitalization strategies. Within chemicals and materials, there is an accelerating drive towards data-driven R&D. That is a fascinating topic in itself, but there is an argument to be made that the surge is happening because R&D digitalization has reached, or is perceived to have reached, a point where “big data” technologies become applicable. The “big data” label should be taken with a grain of salt. Compared with “conventional” big data problems, such as banking fraud detection or customer sentiment analysis, R&D data volumes are orders of magnitude smaller, and the main objective is fundamentally different: while conventional big data problems center on finding known behaviors in vast volumes of data, data-centric research aims to make assertions about unknown behaviors in sparse data sets. In simple terms, we are trying to extrapolate to desired properties outside of our training sets.
The need for high-quality, contextualized data
To extrapolate in this way, we need high-quality data that maximizes the information content of the known parameter space. While this sounds obvious, it is the crux of data-driven R&D. The core of the challenge is that it is not enough to capture the data in isolation; we also need the right level of contextualization, or metadata, associated with them. Put simply, we need to understand how data relate to a successful (or unsuccessful) experimental outcome. Or, in machine learning terms, we need to be able to supervise our learning algorithms: experimental complexity is simply too high for automated signal detection in these datasets to be viable. This is also where domain knowledge comes into play: we (humans) often transfer knowledge to the algorithms, e.g., by connecting known, fundamental scientific principles to hard-to-predict problems.
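To make the idea of contextualized data a little more concrete, here is a minimal Python sketch. It is purely illustrative: the `ExperimentRecord` schema, the field names, and the example values are all assumptions made up for this post, not a description of any particular product or data model. It shows why a measurement captured in isolation cannot supervise a learning algorithm, while one stored together with its metadata and its experimental outcome can.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExperimentRecord:
    """One measurement plus the context needed to interpret it (hypothetical schema)."""
    measurement: dict[str, float]                           # raw values captured in the lab
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. protocol, instrument
    outcome: Optional[bool] = None                          # did the experiment meet its target?


def to_training_pairs(records):
    """Keep only labeled records and return them as (features, label) pairs."""
    pairs = []
    for record in records:
        if record.outcome is None:
            continue  # without an outcome label the record cannot supervise a model
        # Merge the measured values with their context into one feature mapping.
        features = {**record.measurement, **record.metadata}
        pairs.append((features, record.outcome))
    return pairs


# Illustrative values only; none of these come from a real experiment.
records = [
    ExperimentRecord({"viscosity_mPa_s": 12.5}, {"protocol": "A"}, outcome=True),
    ExperimentRecord({"viscosity_mPa_s": 48.0}, {"protocol": "A"}, outcome=False),
    ExperimentRecord({"viscosity_mPa_s": 30.1}),  # captured in isolation: no context, no label
]

training_pairs = to_training_pairs(records)
print(len(training_pairs))
```

Only the first two records end up in the training set. The third holds a perfectly valid measurement, but because nothing ties it to an outcome, it carries no information a supervised model can learn from.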
The impact of a unified data-centric platform
How do Dotmatics customers tackle this challenge? First, our data-centric unified platform allows them to contextualize data, regardless of whether they reside in third-party applications or are native to Dotmatics applications. Second, they can flexibly define workflows and roles that naturally contextualize data. We are lucky to have three recent examples illustrating how companies are using these capabilities to become more data-driven in R&D: the agrosciences division of BASF with their “Data 2 Value” initiative, and two digital transformations at Arkema and Croda.
BASF exemplifies how organizations can leverage existing data infrastructure to become more data-driven. They started by using Dotmatics data-visualization and analytics tools to deepen the insights they could obtain from their existing data extraction tools. Next, they implemented our data query infrastructure to ask questions they couldn’t ask before. Finally, they replaced their legacy ELN capability to modernize how data were captured in the lab. The full case study discusses this digital transformation in detail.
Intensification of R&D digitalization and data-driven R&D
Arkema and Croda are prime examples of organizations that accelerated their R&D digitalization journeys because they understood the potential of combining data-driven approaches with physical lab experimentation. While R&D digitalization had already been a focus for both organizations, an overarching platform strategy became key to supporting a unified view of all R&D data. Since they began working with Dotmatics, Arkema and Croda have issued press releases on these topics and have also spoken publicly about their efforts.