Building computed fields in a biological database

phsource · on Oct 18, 2017

Interesting write-up on the additional work of almost implementing a spreadsheet in the back-end! I'm curious what kinds of functions these are, exactly.

In the article, you mention that "We chose to compute asynchronously in case computation takes too long or runs into a fatal error". Coming from Excel, I'm wondering: how do you handle errors? And how long can these formulas potentially take? (As someone without a lifesci background, I find it hard to wrap my head around the concrete parts.)

joshma · on Oct 19, 2017

(Posting on behalf of Somak)

Hi there, thanks for the feedback! The 2 examples in the write-up are:

1) Molecular Weight: a weighted sum over the amino acids in a sequence, using the amino acid weights defined in [1]

2) List Concatenation: aggregates lists of added resistances (like ['Ampicillin', 'Kanamycin']) across an entity and all of its ancestors

These are simple and can be expressed as Excel formulas or as functions. They take < 0.5 s.
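
To make the two examples concrete, here is a rough Python sketch. It is not the code from the write-up: the `Entity` class, its single `parent` link, and the three-entry weight table are hypothetical and only for illustration (a real table, like the one in [1], would cover every amino acid).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative subset only: free amino acid weights in g/mol for three residues.
# Whether a water correction per peptide bond is needed depends on the table's
# convention (free amino acid weights vs. residue weights).
EXAMPLE_WEIGHTS: Dict[str, float] = {"G": 75.07, "A": 89.09, "S": 105.09}


def molecular_weight(sequence: str, weights: Dict[str, float]) -> float:
    """Weighted sum: (count of each residue) * (weight of that residue)."""
    counts = Counter(sequence)
    return sum(weights[residue] * n for residue, n in counts.items())


@dataclass
class Entity:
    """Hypothetical record with its own resistances and an optional parent."""
    name: str
    resistances: List[str] = field(default_factory=list)
    parent: Optional["Entity"] = None


def all_resistances(entity: Entity) -> List[str]:
    """Concatenate the resistance lists of an entity and all of its ancestors."""
    collected: List[str] = []
    node: Optional[Entity] = entity
    while node is not None:
        collected.extend(node.resistances)
        node = node.parent
    return collected


backbone = Entity("backbone", ["Ampicillin"])
construct = Entity("construct", ["Kanamycin"], parent=backbone)
combined = all_resistances(construct)  # own resistances first, then ancestors
weight = molecular_weight("GAG", EXAMPLE_WEIGHTS)
```

The sketch assumes each entity has at most one parent; with several parents per entity the loop would become a graph traversal.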

A more complex biochemical property for antibodies that can't be as easily expressed in Excel is isoelectric point ([2], see example BioJava implementation at [3]). It requires a binary search, but the search space is constant, so these calculations usually finish in < 20 s.
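
For readers without a life sciences background, the binary search can be sketched roughly as follows. This is a simplified illustration and not the BioJava code at [3]: it assumes a Henderson-Hasselbalch charge model, and the pKa values are approximate placeholders (published tables differ). The function names are made up for this example.

```python
from typing import Dict

# Approximate, illustrative pKa values. Published tables differ, so treat
# these numbers as placeholders rather than a reference set.
POSITIVE_PKA: Dict[str, float] = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA: Dict[str, float] = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a sequence at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # N-terminus (positive)
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # C-terminus (negative)
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tolerance: float = 0.001) -> float:
    """Binary search over pH 0-14 for the point where the net charge is zero.

    The net charge only decreases as pH rises, so a positive charge at the
    midpoint means the isoelectric point lies at a higher pH.
    """
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

The search interval is always pH 0 to 14, so the number of iterations depends only on the tolerance, not on the sequence; the per-step cost is the charge calculation, which grows with sequence length.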

Since our implementation is in Python, we can wrap the function in try/except and, in the except block, log the error and set the computation's status to failed.
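
A minimal sketch of that error handling, assuming a hypothetical `ComputedField` record with `status`, `value`, and `error` attributes (these names are invented for illustration and are not from the write-up):

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger("computed_fields")


@dataclass
class ComputedField:
    """Hypothetical record tracking the outcome of one computation."""
    status: str = "pending"
    value: Any = None
    error: Optional[str] = None


def run_computation(record: ComputedField, func: Callable[..., Any], *args: Any) -> ComputedField:
    """Run func(*args); on any exception, log it and mark the record as failed."""
    try:
        record.value = func(*args)
        record.status = "succeeded"
    except Exception as exc:
        # logger.exception records the message together with the traceback.
        logger.exception("Computed field failed")
        record.status = "failed"
        record.error = str(exc)
    return record


ok = run_computation(ComputedField(), len, "ACGT")
bad = run_computation(ComputedField(), lambda: 1 / 0)
```

In an asynchronous setup this wrapper would run inside the background job, so a failed formula only marks its own field as failed rather than taking down the request that triggered it.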

[1] https://www.promega.com/-/media/files/resources/technical-re...

[2] https://en.wikipedia.org/wiki/Isoelectric_point

[3] http://biojava.org/docs/api1.9.1/src-html/org/biojava/bio/pr...