Computational software design has the goal of producing algorithms that approach first order, that is, algorithms whose calculation time is directly proportional to the size and complexity of the data. Interestingly, within the graphical domain we can do better than this; how is this possible?
A human can only understand a limited amount of data at any one time, so any graphical representation must either be a window onto a larger body of data (a scrolling table, for example) or a reduced representation of the data (a pictogram of expressed DNA regions, for example). We immediately see that the limiting factor in graphics is drawing one screen of information in a way that a human can interpret. Additional information in the graphical view provides no further interpretive power for the user and does not allow greater knowledge discovery. The goal for a graphical program can therefore be zero order: we need only provide enough information to fill the user's view. Any data outside this view does not need to be drawn, and any detail not pertinent to knowledge discovery also does not need to be drawn. Of course, we can still offer scrolling and progressive disclosure to the user, but that does not change the fundamental implementation of the solution.
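The windowing idea above can be sketched in a few lines. This is a minimal illustration, not Vortex's actual implementation; the names `render_viewport` and its parameters are assumptions made for the example. The key point is that the work per frame depends only on the viewport size, never on the table size.

```python
# Minimal sketch of windowed ("zero-order") table rendering.
# render_viewport and its parameters are illustrative names,
# not Vortex's API.

def render_viewport(rows, scroll_top, viewport_height, row_height):
    """Draw only the rows visible in the viewport.

    Cost is proportional to viewport_height / row_height and
    independent of len(rows): zero order in table size.
    """
    first = scroll_top // row_height                 # first visible row index
    visible = viewport_height // row_height + 1      # rows that fit on screen
    drawn = []
    for i in range(first, min(first + visible, len(rows))):
        drawn.append(f"row {i}: {rows[i]}")          # stand-in for actual drawing
    return drawn

# Whether the table has 300 rows or a million, the per-frame work
# is bounded by the number of rows that fit on screen.
big_table = [f"seq{i}" for i in range(1_000_000)]
frame = render_viewport(big_table, scroll_top=4_000,
                        viewport_height=600, row_height=20)
print(len(frame))  # 31 rows drawn, regardless of table size
```

Scrolling simply changes `scroll_top`; the cost of each frame stays constant, which is exactly the zero-order behaviour described above.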
The typical measure of graphical performance is the frame rate, in frames per second (fps). Any frame rate above 25 fps is considered usable; any frame rate above 50 fps appears smooth to all users and is the preferred target. Stuttering can additionally occur when memory management within the computer cannot keep up with the data throughput, typically where the data size exceeds the physical memory of the computer.
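These frame rates translate directly into a per-frame time budget (1000 ms divided by the frame rate, the standard conversion), which is the figure a renderer actually has to hit:

```python
# Per-frame time budget implied by each frame rate mentioned above.
for fps in (25, 50, 70):
    print(f"{fps} fps -> {1000 / fps:.1f} ms per frame")
# 25 fps -> 40.0 ms per frame
# 50 fps -> 20.0 ms per frame
# 70 fps -> 14.3 ms per frame
```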
Vortex has been designed from the ground up to be big-data ready. If we take as a benchmark a life-science example of a small table (11 rows) of short amino-acid sequences (150 residues) with 4 further columns of textual metadata, this table is rendered at around 70 fps. This is on a business-class Dell laptop without a dedicated graphics card.
If we now take the entire human genome, downloaded from a public resource (GenBank) and divided into chromosomes and various important fragments, we have a table of just 557 rows. The sequence cells contain nucleic-acid sequences of up to 250,000,000 bases, with a total data size of over 3.5 GB. In this case the frame rate is still 70 fps: we see no performance loss. If we instead take a table of 970,000 rows of nucleic-acid sequences around 400 bases long, with columns of metadata, we again see a frame rate of 70 fps when we scroll and manipulate the sequence data in any way.
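The same windowing principle explains why a 250-million-base cell costs no more to draw than a 400-base one: only the slice of the sequence that falls inside the visible cell is ever touched. A small sketch, with illustrative names and not Vortex's internal representation:

```python
# Sketch: drawing a cell that holds a 250-million-base sequence.
# visible_slice is an illustrative helper, not Vortex's API; only
# the residues inside the cell are extracted, so the drawing cost
# is zero order in sequence length.

def visible_slice(sequence, h_scroll, cell_width_chars):
    """Return just the residues that fall inside the visible cell."""
    return sequence[h_scroll:h_scroll + cell_width_chars]

genome_like = "ACGT" * 62_500_000   # 250,000,000 bases, as in the example above
view = visible_slice(genome_like, 1_000_000, 120)
print(len(view))  # 120 characters drawn, not 250,000,000
```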
The point of this analysis is that the frame-rate measure indicates Vortex is close to zero order in both row count and data complexity (sequence length). This remains the case until we exceed the physical memory available; at that point the frame rate is maintained, but severe stutters arise from memory paging to disk. That is a separate problem, where the data is too big before reduction for a computer of a given size; the efficient handling of data within a computer program (i.e. Vortex) will be discussed in a forthcoming blog article. We can therefore see that the high-performance program Vortex has essentially reached the desired goal for big-data computing: a zero-order graphical program.