What is a Scientifically Aware Program?

By Tom Oldfield | Tuesday, August 7, 2018 - 00:31 UTC
Vortex and other Dotmatics software are often described as scientifically aware, but what does this really mean in practice for the user? Let’s consider the life-science domain, where we have biological data such as sequences, activities, expression sites, DNA vectors and so on.

Drawing the data

At its simplest, a scientifically aware program must be able to draw common life-science data in ways that a scientist recognises. For a biological sequence, this is a movable, coloured representation of the polymer structure; for chemical/biological structures, it must be a rotatable 3D rendered image.

Clearly, drawing scientific data requires that we can show all the different variations that users need in order to understand what they are looking at. This is about bringing the software to the user, not forcing the user into something they don’t recognise.

The meaning of “like”

A critical component of handling scientific data is the ability to sort the data into order, search for similarity within the data, or cluster the data into groups. The key to all of these is the definition of “like”. For life-science biological sequence data we need to know when one sequence is like another and, most importantly, we need a numerical value that indicates the similarity. Once we have this metric we can define a sort order from most like to least like, or cluster the data into groups of similar sequences. Of course, in life science the “like” problem is an NP-hard computational problem, so in Dotmatics we provide 5 different algorithms, each optimised to solve a different problem regarding likeness. The meaning of “like” opens up a vast range of data-analysis possibilities for the user, so it is critical that we take advantage of it with science-aware functionality built on “like”.
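
To make the idea of a numerical “like” value concrete, here is a minimal Python sketch. It is not one of the Dotmatics algorithms: it assumes a plain edit distance (Levenshtein) as a stand-in similarity measure, and the function names `edit_distance`, `similarity` and `rank_by_likeness` are hypothetical, chosen only for illustration. Real sequence comparison would normally use biologically meaningful scoring (substitution matrices, gap penalties), which this sketch ignores.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def rank_by_likeness(query: str, candidates: list) -> list:
    """Sort candidates from most like the query to least like it."""
    return sorted(candidates, key=lambda s: similarity(query, s), reverse=True)


# Toy sequences, invented for this example.
ranked = rank_by_likeness("ACGTAC", ["TTTTTT", "ACGTAC", "ACGTAA"])
print(ranked)
```

Once a single number like this exists, sorting and clustering both follow from it: the sort order is just the score, and a clustering method only needs the pairwise scores as input.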

Analytics

The final critical component of scientifically aware software is the ability to use life-science data immediately within key analysis methods, without effort on the user’s behalf. This means the user can run tasks that transform the scientific data for analysis with a single action. This is taken to the extreme when the data can be used directly within predictive methods such as the Bayes classification/probability methods found in Vortex. This allows the user to back-predict activity data based on sequence mutations, or to cluster activity based on mutation type and position.
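
As a rough illustration of predicting activity from sequence mutations, the Python sketch below turns each variant into a set of point mutations and feeds those to a small Bernoulli naive Bayes classifier. This is an assumption-laden toy, not the method implemented in Vortex: the helper `mutations`, the class `MutationNaiveBayes`, the reference sequence and the activity labels are all invented for the example, and the sequences are assumed to be pre-aligned and of equal length.

```python
import math
from collections import Counter, defaultdict


def mutations(reference: str, variant: str) -> set:
    """Point substitutions such as 'Y5W' (1-based), for pre-aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return {
        f"{r}{i}{v}"
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    }


class MutationNaiveBayes:
    """Bernoulli naive Bayes over mutation presence, with Laplace smoothing."""

    def fit(self, feature_sets, labels):
        self.vocab = sorted(set().union(*feature_sets))
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(feature_sets, labels):
            self.feature_counts[label].update(feats)
        return self

    def log_scores(self, feats):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for f in self.vocab:
                # Smoothed probability that this mutation occurs in the class.
                p = (self.feature_counts[label][f] + 1) / (n + 2)
                score += math.log(p if f in feats else 1.0 - p)
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy training data, invented for this example (not real assay results).
REFERENCE = "MKTAYIAK"
TRAINING = [
    ("MKTAYIAK", "active"), ("MKTAYIAR", "active"), ("MKSAYIAK", "active"),
    ("MKTAWIAK", "inactive"), ("MKTAWIAR", "inactive"), ("MATAWIAK", "inactive"),
]
model = MutationNaiveBayes().fit(
    [mutations(REFERENCE, seq) for seq, _ in TRAINING],
    [label for _, label in TRAINING],
)
print(model.predict(mutations(REFERENCE, "MKSAWIAK")))
```

The point of the sketch is the single transformation step: the user supplies sequences, and the conversion into features the classifier understands happens without any manual preparation.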
