As artificial intelligence (AI) gains traction in drug discovery, many companies will feel compelled to increase their use of AI. According to Deloitte, more than 60% of biopharma and medtech companies surveyed spent over $20 million on AI programs in 2019; that amount is only expected to grow over time. But AI investment can be complicated. It often takes longer than expected to see returns because of the time it takes to train models. And many companies struggle to successfully implement AI within their organizations, often because of data challenges.
AI Preparation Precedes Success
Simply put, you can’t benefit from AI if you’re not ready for AI.
With artificial intelligence, data are key. In fact, Gartner reports a notable shift from a model-centric approach toward a more data-centric approach, which looks to improve outcomes through better data management, labeling, and annotation, rather than through tweaking models. Therefore, an essential aspect of being AI-ready is having the infrastructure in place to efficiently collect and use data.
Google as a Pioneer
Let’s consider Google as an example. They spent years investing in, and training, their AI before applying it within the tools so many of us use daily, such as Maps, Search, and YouTube. Google’s AI approach centers around the reciprocal nature of data and models: the notion that plentiful, good data are needed to create models, and in turn, those models are needed to derive the most value from those data. This notion of reciprocity even extends into the company’s utilization of the wider public to help build their AI; for example, they’ve used Google Crowdsource to publicly collect training data, while in turn giving back to the community open-source data sets (e.g., Google Dataset Search) and AI/ML software (e.g., TensorFlow).
At this year’s annual I/O Developers Conference, Google CEO Sundar Pichai detailed recent ways the company has applied AI, such as to map rural areas in Google Maps, summarize documents in Google Workspace, and improve natural language processing and speech recognition in Google Chat and Google Pixel. Notably, Pichai took time to acknowledge the importance of setting up for success, commenting, “The advances we’ve shared today are possible only because of our continued innovation in our [technical] infrastructure.”
Preparing for AI in Drug Discovery
As life science and small molecule drug discovery innovators seek to better leverage AI, they can learn a lot from Google and their commitment to investing in an infrastructure that supports large-scale data collection and model refinement.
Innovation with AI typically demands that companies manage their data and workflows differently than they have in the past. In a recent Bio-IT World article, Dotmatics’ Science and Technology Specialist, Will Bowers, reviews best practices companies can adopt to automate data cleaning and AI pipelines for increased enterprise-wide adoption.
However, as detailed by Towards Data Science, legacy data and technology infrastructures typically cannot accommodate the level of integration and data fluidity needed for AI; instead, scalable, flexible data platforms are best suited to support AI. All too often, in our work with customers and prospects, we see incredibly innovative companies struggle with data management because of their technology infrastructure and workflow processes. The problem is quite pervasive. In fact, nearly 30% of the biopharma and medtech companies that Deloitte surveyed said data struggles negatively impact their AI initiatives. Specific pain points identified include poor-quality data and siloed data systems—two obstacles Dotmatics can help companies overcome.
Dotmatics Can Help You Get AI-Ready
Dotmatics can help life science and small molecule drug discovery companies get AI-ready by providing a unified scientific research-data management platform that:
enables the capture of clean and trustworthy AI-ready data, such as through automated instrument-data collection, database and application integration, and error-proof data entry via electronic laboratory notebooks (ELNs),
removes data silos by seamlessly integrating all the different data types that make up the experimental fabric, such as chemistry, biology, formulation, and physical characterization data, and
provisions the model-quality data needed for machine learning by breaking away from proprietary data formats, automating QC and QA, and eliminating time-consuming and error-prone data wrangling.
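To make the ideas of removing silos and automating QC a little more concrete, here is a minimal Python sketch. It is purely illustrative and is not Dotmatics' API or schema: the records, the field names (compound_id, mol_weight, ic50_nm), and the QC rules are all assumptions made up for this example. It joins chemistry and biology records on a shared compound identifier, then separates rows that pass basic checks from rows that need review.

```python
# Hypothetical records; field names are illustrative only, not a real schema.
chemistry = [
    {"compound_id": "CPD-001", "mol_weight": 312.4},
    {"compound_id": "CPD-002", "mol_weight": 287.3},
    {"compound_id": "CPD-003", "mol_weight": None},
]
biology = [
    {"compound_id": "CPD-001", "ic50_nm": 45.0},
    {"compound_id": "CPD-002", "ic50_nm": -3.0},
    {"compound_id": "CPD-004", "ic50_nm": 120.0},
]


def merge_by_compound(chem_rows, bio_rows):
    """Join two record sets on compound_id, keeping compounds present in both."""
    bio_by_id = {row["compound_id"]: row for row in bio_rows}
    merged = []
    for chem in chem_rows:
        bio = bio_by_id.get(chem["compound_id"])
        if bio is not None:
            merged.append({**chem, **bio})
    return merged


def qc_issues(row):
    """Return a list of QC problems for one merged row (assumed example rules)."""
    issues = []
    if row.get("mol_weight") is None:
        issues.append("missing mol_weight")
    ic50 = row.get("ic50_nm")
    if ic50 is None or ic50 <= 0:
        issues.append("non-positive or missing ic50_nm")
    return issues


def split_by_qc(rows):
    """Separate rows that pass every check from rows flagged for review."""
    passed, flagged = [], []
    for row in rows:
        issues = qc_issues(row)
        if issues:
            flagged.append((row["compound_id"], issues))
        else:
            passed.append(row)
    return passed, flagged


merged = merge_by_compound(chemistry, biology)
passed, flagged = split_by_qc(merged)
print("model-ready rows:", passed)
print("flagged for review:", flagged)
```

In this sketch, flagged rows are set aside with a reason rather than silently dropped, so a scientist can correct the source record instead of losing the data point.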
Next Steps
Learn more about developments in small molecule discovery with the on-demand webinar, "Reduce Risk, Cost, and Time in the New Era of Small Molecule Drug Discovery."