Enabling the Connected Digital Lab
Welcome, folks. My name is Adam Crown. I lead industry marketing for health care and life sciences here at Databricks, and I really appreciate you being on today's webinar, Smarter R&D with AI-Powered Insights. You're gonna hear a lot about data foundation. You'll hear a lot about conversational analytics. We've got some incredible thought leaders. So here are a few things you'll be able to take away as you hear from today's presenters, especially around data in the life sciences industry. First and foremost, we know that data silos are presenting a challenge in accelerating research and development insights. From fragmented systems to teams that are disconnected and not able to operate on the same data, that's preventing data analytics and AI from working correctly. So you'll hear about how you can fix the data foundation, and how getting it right first is gonna help enable everything from business intelligence to artificial intelligence, all the way through to agentic AI workloads. We know that in our industry there's a tremendous patent cliff happening, and that creates a lot of urgency for every single organization to unlock more of the potential of their data, and we'll help you figure out how to do that. And lastly, a big piece of today's conversation will be around conversational analytics. Traditionally, data has been put into the hands of technical folks, but more and more, conversations are happening with your data. And Ryan Bernhardt, who's the vice president and general manager at Dotmatics, is going to take us through how they're using Databricks today to build for the life sciences industry and help accelerate workflows for our customers.
Thank you very much, Michael, and I wanna thank Mark and Adam as well.
And a big thank you to the Databricks team for the invitation to participate today and share a little bit about our story at Dotmatics. I'm Ryan Bernhardt. I am the general manager of one of our application companies, called Virscidian, where we do automated chromatogram analysis to provide actionable intelligence for scientific workflows, and I'm also one of the leaders of our Luma platform, which I'll be discussing today. Luma is built on Databricks, and I'm gonna be talking about enabling the connected digital lab together with Databricks and how we can really orchestrate all aspects of the lab. So, as we get started, I wanna first set the stage with the problem statement that life sciences organizations all over the world are trying to solve. You've already heard a little bit about this from Mark and Michael: the journey from molecule to market, getting scientific innovation to consumers, takes a really long time and costs a lot of money. If we take medicine, for example, on average it takes over ten years and $2.2 billion to get a medicine to market. As part of solving this challenge, we've really focused on the fact that going from the design of a molecule all the way through manufacturing is hindered by fragmented systems, siloed data, and disparate data tools. There are certainly challenges in transferring technology from a very small scale, or batch scale, up to a scale that can be manufactured in the quantities and locations that patients need.
And so, for those who don't know, Dotmatics was acquired by Siemens in the summer of 2025 to build out the entirety of this portfolio. Siemens has a presence in the development and manufacturing space, and Dotmatics provides capabilities in the research and discovery laboratory space that complement that very large Siemens portfolio. What we've seen over time is that there's been a lot of focus in the manufacturing space on taking things digital and moving from historical, more antiquated systems to a much more digitally connected environment. However, R&D has largely lagged behind in that, and so challenges are still being experienced from fragmented systems, disparate data, and siloed expertise. The Siemens mission has really been about enabling organizations around the world to accelerate the path from molecule to market. The way we see this happening is through end-to-end digitalization, going from the design of a molecule all the way through manufacturing at scale. To do this, we must be able to create a digital thread from beginning to end that allows for efficiency gains and seamless technology transfer, and that shortens the time it takes to get an innovation or a medicine to market. And so if we focus the remainder of the time today on the R&D space, or the discovery space, many of you joining today may resonate with the image on the right of the screen, where scientific discovery is scaling faster than its digital foundation.
And so over on the right, what you see is a lot of the tools, instruments, and capabilities that are found in the laboratory environment of life science organizations, such as electronic notebooks, laboratory information systems, SOPs, and experimental protocols. There's instrumentation and dashboarding tools. But most of these have historically been siloed, isolated, and fragmented. A liquid handler that's running samples doesn't know anything about the data being produced on the LC-MS sitting next to it. What this has led to is an overall inefficient process of scientific innovation. So if we're serious about streamlining scientific innovation, we really need a way to manage every aspect along the way or, better yet, orchestrate it. When we refer to orchestration, what we're really talking about is bringing together the physical, the digital, and the logical aspects of the laboratory environment. Over on the left-hand side of the screen, you're seeing some of these aspects in terms of the resources that are used as starting points, such as the people, the scientists, and the tools, instruments, and laboratory instrumentation. We have SOPs. We have business rules. We have experimental protocols. We have incoming data that may be about the samples or the reagents and materials. We certainly have the different chemicals, cells, labware, and materials being used. All of those come together as part of the experimental process. And out of that experimental process comes the generation of results. These results may be in the form of a blockbuster molecule, but they could also be results that allow us to scale innovation or reduce cost because we were able to miniaturize an experiment in the amount of materials we're using.
It could allow us to embed the experimental protocol in automation, where we can now deploy it seamlessly at other locations around the world in a reproducible way, or it could be that we're able to capture and harmonize all of this data together in a way where it's ready for applying artificial intelligence and machine learning. And that's not only the successful experimental data but also the failed experimental data, which becomes so beneficial as we iterate on the scientific process. Although we think of an experimental protocol as being linear, science is anything but linear. It's an iterative process where we're looking to test a hypothesis, gather data, and iterate on that so that we can optimize or change course for the next cycle. And this brings me to the Dotmatics story, or the portfolio. Over the last decade or so, Dotmatics has acquired or built a number of industry-leading scientific capabilities, from the likes of GraphPad Prism, which is the most ubiquitously used analytical package in the laboratory environment, to the Dotmatics electronic notebook, to technology like Virscidian for automated chromatogram analysis and our OMIQ platform for flow cytometry applications. Labs all over the world have relied on these very deep and rich scientific capabilities for their own innovation and research and development. But over time, there was a need to bring these capabilities together in a meaningful way, in such a way where one plus one equals three. Out of that was born our Luma platform. Luma is a scientific intelligence platform built on Databricks that allows us to connect all of these different scientific capabilities through APIs, in a way where we can really leverage the benefits of connectivity.
And Luma is not only a connectivity hub; it's really the heart of the orchestrated solution, where we're not only connecting our own applications in the Dotmatics portfolio but can now also connect applications in the larger Siemens portfolio, such as gPROMS, shown on the outside of the wheel here, which is a process modeling digital twin application that allows us to enable this wet lab, dry lab paradigm. In addition to being a connectivity hub, Luma, being built on Databricks, incorporates the lakehouse capabilities that allow us to capture, harmonize, and store all of this data together in a usable, meaningful way. But maybe more important than showcasing the ability to connect our own applications is this plus on the bottom of the wheel here, which is all about offering an open and extensible ecosystem that allows us to connect the tools and data systems your lab relies on daily to drive scientific innovation. Our Luma platform, built on Databricks, is built in a way that allows us to connect a variety of different data tools and systems in as Luma applications. These applications follow an SDLC-like process for drafting, committing, and publishing configurations for use, so that data is highly structured with auditable changes, ensuring traceability and scalability in those governance models. So you can essentially create Luma applications for any data systems and tools that you're using. That's a core function of the Luma platform: it must be open and extensible. You can think of Luma as a bit of an operating system for science. And like operating systems out there, Luma has many core processing capabilities as part of that. One of the things Luma does is enable material registration.
It allows you to embed data management rules and business rules that are going to guide the science happening in your laboratory environment. Luma has the ability to create adaptive scientific workflows, or to digitalize scientific workflows. Luma is built on an AI-powered foundation. We have AI already in place with a large language model, and I'll talk a little more later about the agentic AI capabilities that we offer as part of the Luma platform as well. We're also using these capabilities to create statistical analyses, charts, and graphs that provide the business intelligence layer, and we have the ability to create graphical user experiences and visualizations that a scientist would interact with. In addition to being able to connect to a variety of the data tools and systems your lab relies on, we also have the ability to make connections down to standalone instruments and to integrated work cells that may be controlled by scheduling software. We can integrate automated transportation devices such as mobile robots, track systems, and conveyors. And we can also connect to archival file types, or files where historical data has been inaccessible, locked away for years. We have the ability to extract that previously inaccessible data, transform it, and harmonize it with the data being generated today in a more modern format. If we take a look at one use case where we're implementing this with a large pharma customer, we have connected over three thousand standalone instruments across their research campus. On a daily basis, we're capturing over five terabytes of data, which equates to over five hundred thousand files per day.
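To put those volumes in perspective, the figures can be turned into a few back-of-envelope rates. This is only a rough sketch: it assumes decimal units (1 TB = 10^12 bytes) and treats the "over" figures from the talk as if they were exact, so the results are order-of-magnitude estimates rather than measurements from the deployment.

```python
# Back-of-envelope rates derived from the numbers quoted in the talk.
# Assumptions: decimal units (1 TB = 10**12 bytes) and the quoted
# "over ..." figures used as-is, so these are rough estimates only.
BYTES_PER_DAY = 5 * 10**12   # "over five terabytes" per day
FILES_PER_DAY = 500_000      # "over five hundred thousand files" per day
INSTRUMENTS = 3_000          # "over three thousand" standalone instruments
SECONDS_PER_DAY = 24 * 60 * 60

avg_file_mb = BYTES_PER_DAY / FILES_PER_DAY / 10**6
files_per_second = FILES_PER_DAY / SECONDS_PER_DAY
throughput_mb_s = BYTES_PER_DAY / SECONDS_PER_DAY / 10**6
files_per_instrument = FILES_PER_DAY / INSTRUMENTS

print(f"average file size:       {avg_file_mb:.1f} MB")
print(f"sustained file rate:     {files_per_second:.2f} files/s")
print(f"sustained throughput:    {throughput_mb_s:.1f} MB/s")
print(f"files per instrument:    {files_per_instrument:.0f} per day")
```

Under these assumptions the average file is about 10 MB, and each instrument contributes on the order of a couple hundred files per day.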
And being built on Databricks, we can handle the billions of scientific data points that we're processing through this on a daily basis. I also mentioned that we can integrate with integrated work cells, so we can seamlessly integrate one or many different scheduling software packages and data systems together as part of Luma, through our REST APIs, to enable that end-to-end scientific workflow. We have four common data ingress patterns. I've touched a little on our ability to bring data in through files. We also have a REST API where we can make direct connections. We can use database connections for SQL, Oracle, or other database formats. And for organizations that are also using Databricks, we can enable Databricks Delta Sharing as the mode of transfer. So if we dive a little into building the foundation of orchestration in your laboratory environment, I'll go into a little more detail on what this ingress framework looks like. You may have a series of laboratory instruments that are capturing and generating data. You have your data tools, such as an ELN, a LIMS, and maybe statistical analysis, data analysis, or data reduction tools. You may have some sort of third-party database. And maybe you're also interfacing with CROs or other third-party organizations for your research. We have the ability to capture that data through files where that makes sense. We can make those connections through APIs. We can do direct database connections, or we can do Delta Sharing with other organizations using Databricks. And then all of that data comes together as part of the Databricks tech stack, which stores, governs, and activates this data.
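The four ingress patterns described above can be pictured as a simple routing decision per data source. The sketch below is illustrative only: the `Source` fields, the `choose_pattern` function, and especially the preference order are assumptions made for this example, not part of the Luma or Databricks APIs.

```python
from dataclasses import dataclass
from enum import Enum


class IngressPattern(Enum):
    """The four ingress patterns named in the talk."""
    FILE = "file-based capture"
    REST_API = "REST API"
    DATABASE = "direct database connection"
    DELTA_SHARING = "Delta Sharing"


@dataclass(frozen=True)
class Source:
    """A lab data source and the access paths it offers (hypothetical model)."""
    name: str
    on_databricks: bool = False
    has_rest_api: bool = False
    has_database: bool = False
    writes_files: bool = False


def choose_pattern(source: Source) -> IngressPattern:
    """Pick one ingress pattern. The preference order is an assumption."""
    if source.on_databricks:
        return IngressPattern.DELTA_SHARING
    if source.has_rest_api:
        return IngressPattern.REST_API
    if source.has_database:
        return IngressPattern.DATABASE
    if source.writes_files:
        return IngressPattern.FILE
    raise ValueError(f"no supported ingress pattern for {source.name}")


# Example sources mirroring the kinds of systems mentioned in the talk.
sources = [
    Source("LC-MS instrument", writes_files=True),
    Source("ELN", has_rest_api=True),
    Source("LIMS", has_database=True),
    Source("CRO partner", on_databricks=True),
]

plan = {s.name: choose_pattern(s) for s in sources}
for name, pattern in plan.items():
    print(f"{name}: {pattern.value}")
```

In practice a source may support several patterns at once, which is why the order of the checks matters; a real deployment would likely choose per source based on latency, volume, and governance needs.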
And then the layer closest to the user is the Luma experience. Luma provides highly scientific, very rich domain knowledge of science and the R&D environment on top of that Databricks infrastructure, so that we can gear things for the experience that scientists need and count on. This allows us to bring the data together in a meaningful way where it's ready for applying AI. We can do visualizations and decision making, and we can also pass that information on for, say, automated workflows, or even take it out to an external environment. So what does this actually look like within Luma? This is an example of a synthesis and purification scientific workflow, where what we have is scientific data coming off LC-MS instruments. We've generated chromatograms of the reaction mixture. You can see that those peaks correlate with a particular compound. We have our compound of interest there, but we also have other peaks present, which may be impurities or reactants that haven't gone all the way to completion. We have the ability to see those and also to provide information on what compound or what impurity each one is. We can quantify that. We can make decisions on the best mode of purification or on running a further analysis. This data is now coming together with several other systems in order to realize this. Scientific data also has very specific, unique needs in the R&D environment. So the data within Luma, and the experience in Luma, have been geared specifically for the laboratory environment and for those scientists. That's very, very important in that context.
But Luma, being built on Databricks, also lets us make that data available to other corporate data lakehouse applications, so that we can now pair scientific data with finance data, business intelligence, or other asset management types of data that would be needed more at the corporate level. I wanna spend a couple of minutes talking a little more about the artificial intelligence that I mentioned is at the foundation of Luma. We have built Luma as a platform with both AI and automation at the forefront of where we're headed in the future, and also where scientific organizations are headed in the future. Our AI in Luma is built for scientific scrutiny. We use deep learning models to provide traceable, explainable, and verifiable outcomes, as would be expected and counted on within the laboratory environment, in that very analytical scientific setting. On getting the most out of AI: AI does a really good job with unstructured data. We probably all see that in our personal lives when leveraging ChatGPT and other things. But AI is even more impactful when it can use structured data. We believe that the power of AI really is all about the data. And so Luma, being built on the Databricks infrastructure, gives us a place to harmonize all of that data together in a structured, FAIR format where it's ready for applying AI with that very deep scientific context. We can use AI in an assistant manner that is domain specific. We can leverage composite AI applications. And in addition to having AI at the foundation, we also have a number of Luma AI agents available to act as a co-scientist to scientists in the laboratory.
And so Luma is a platform that's tailored to specific scientific needs, where we start with being able to digitalize a workflow. We can apply specific experimental conditions and business rules. We have complete access to the lineage of the materials and samples that are part of this, including being able to register them and track them all the way through. We can bring together the historical data that came before this as a precursor, as well as capture the cheminformatics and the bioinformatics, and finally leverage AI on any and all parts of the scientific workflow. I wanna spend the remainder of the time showing a couple of Luma use cases that leverage the Databricks tech stack. One example will be based on a chemistry application, and the other is focused more on biology. We believe that the path forward to streamline scientific innovation is really all about creating and leveraging the lab-in-the-loop model, with the design, make, test, analyze cycle as part of that. Luma allows us to bring the design, make, test, analyze cycle to life, but it also allows us to bring AI into that entire cycle. As I mentioned, the scientific process is iterative: we're looking to go rapidly from hypothesis to result to deciding what to do next. We have the ability to bring AI into that, where it can happen autonomously while tracking how it came to that conclusion. But you can also use a human in the loop to assess that and make the decision, or go as far as keeping a human in the loop to facilitate the process as well. And so I'm gonna show a chemistry application that allows us to do design, make, test, analyze.
And this is our SYNTHIA Luma application, which is for the design and synthesis of compounds of interest. This is a small molecule compound that a chemist has drawn out. We're gonna submit that to the system, and Luma is gonna go out and look at all of the different possible synthetic pathways that could be used to make that compound of interest. It's also going to determine the best pathway by the number of reaction steps that would be required. But it's also looking at things like: can I get the building blocks and reagents needed to run those reactions? So it's looking out at commercially available chemical suppliers, such as Sigma-Aldrich and others. It's also looking at what this particular scientific organization has available within the walls of its own research campus. It's looking at the cost of getting those and the lead time. It's also mapping out the synthetic pathways, or workflows, that would be required. We then have the ability to drill down into those workflows to look specifically at each step in the reaction and at what intermediate chemicals are being created. We can look at what the predicted amounts and the purity would be, and then we can simply create a standard SOP in a natural language format that a chemist could follow to execute that experiment. We also have that available to pass seamlessly into the ELN, or to create a digital workflow that can be passed to an automated environment. Once we've decided which pathway we'd like to use, we can simply hit start experiment, and that spawns ordering the necessary building blocks and reagents, sending the workflow into the ELN, and so on.
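The route-selection criteria mentioned above (number of reaction steps, reagent availability, cost, and lead time) can be illustrated with a small ranking sketch. This is not how the application actually scores pathways, which the talk does not describe; the `Route` fields, the example values, and the tie-breaking order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A candidate synthetic pathway (all fields and values are hypothetical)."""
    name: str
    reaction_steps: int        # number of reaction steps required
    reagents_available: bool   # building blocks obtainable commercially or in-house
    reagent_cost_usd: float    # total cost of building blocks and reagents
    lead_time_days: int        # time until all materials are on hand


def rank_routes(routes: list[Route]) -> list[Route]:
    """Drop routes whose reagents cannot be sourced, then sort the rest.

    Assumed ordering: fewest steps first, then lowest cost, then shortest
    lead time. A real system may weight these criteria very differently.
    """
    feasible = [r for r in routes if r.reagents_available]
    return sorted(
        feasible,
        key=lambda r: (r.reaction_steps, r.reagent_cost_usd, r.lead_time_days),
    )


candidates = [
    Route("route-A", 5, True, 1200.0, 14),
    Route("route-B", 3, True, 2500.0, 7),
    Route("route-C", 3, True, 900.0, 21),
    Route("route-D", 2, False, 400.0, 3),  # shortest, but reagents unavailable
]

ranked = rank_routes(candidates)
for position, route in enumerate(ranked, start=1):
    print(position, route.name, route.reaction_steps, route.reagent_cost_usd)
```

Here the shortest route is excluded because its reagents cannot be sourced, and the two three-step routes are separated by cost. Swapping the order of the sort key would favor lead time over cost instead.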
Another application, which is focused more on the biology side, is our BioGlyph Luma, which allows us to model and create proteins and antibodies. It allows us to construct these from a series of different building blocks and construct formats. We have the ability to register these different constructs, and we also have the ability to sequence them. We can do in silico reactions, so we can run predictive reactions on those different antibody designs. Once we've screened those out in an in silico format, we can push the ones of interest down into an in situ environment where they're being made in the actual laboratory, in that wet lab environment. We then have the ability to connect that data back to various systems as well, both as part of the larger Dotmatics portfolio and with other capabilities out there. So what we have in Luma, which is, again, built on this Databricks infrastructure, is one orchestrated system for science that enables going from experiment all the way to insight. Luma supports a diverse number of applications. It's a multimodal platform, and it provides foundational tools that orchestrate discovery. Luma is also enterprise software that allows labs of all sizes to experience the benefits of this. It could be used by thousands of scientists in multiple locations around the globe, or by three scientists in a single laboratory, in such a way that large data can be brought together for leveraging AI. And Luma also allows us to harmonize all of that data in a structured, FAIR way, which provides the scientific context and empowers the use of AI for scientific innovation.
And as we look back at where we started, in terms of how Siemens and Dotmatics are enabling organizations around the world to accelerate getting a molecule to market, it's really all about creating the digital thread via the Luma platform, where we can go from the design of a molecule or experiment all the way through scale-up and into manufacturing, in such a way that we can deliver molecules to the market more quickly and more cost-effectively than ever before, with the goal of improving and enhancing the lives of people all over the world. So with that, I wanna thank you for your attention. And again, a huge thank you to the Databricks team for providing a technology stack that makes this possible, and also for the opportunity to join this group today to share some real-life use cases of what can be built on top of it.
Awesome. Ryan, thanks so much for presenting and giving us an overview of Luma and the use cases. We'll talk soon, and have a fantastic rest of your day, everybody. Take care. Bye now.
In this on-demand webinar, Dotmatics and Databricks walk through how the Luma platform, built on the Databricks data lakehouse, creates a connected, orchestrated digital lab from molecule design through experiment execution. Watch Ryan Bernhardt, VP and GM at Dotmatics, demonstrate how Luma harmonizes fragmented scientific data into one governed, AI-ready foundation for R&D.