
Building computed fields in a biological database - joshma
https://benchling.engineering/building-computed-fields-in-a-biological-database-5cb8774c4a2a?hn
======
phsource
Interesting write-up on the additional work of almost implementing a
spreadsheet in the back-end! I'm curious what kinds of functions these are,
exactly.

In the article, you mention that "We chose to compute asynchronously in case
computation takes too long or runs into a fatal error" -- coming from Excel,
how do you handle errors? And how long can these formulas take? (As someone
without a lifesci background, it's hard to wrap my head around the concrete
parts)

~~~
joshma
(Posting on behalf of Somak)

Hi there, thanks for the feedback! The 2 examples in the write-up are:

1) Molecular Weight: weighted sum across amino acid sequences, using amino acid
weights defined in [1]

2) List Concatenation: aggregate lists of added resistances (like
['Ampicillin', 'Kanamycin']) across the entity itself and all of its ancestors

These are simple and can be expressed as Excel formulas or as functions. They
take < 0.5 s.
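
To make the two examples concrete, here is a minimal Python sketch, not the actual Benchling code. The weight table is a small illustrative subset, the `entities` dictionary shape and both function names are hypothetical, and the weighted sum ignores the water lost per peptide bond, which a real protein mass would account for.

```python
from collections import Counter

# Illustrative subset of free amino acid molecular weights in g/mol
# (assumed values; a real table would cover all 20 amino acids).
AMINO_ACID_WEIGHTS = {"G": 75.07, "A": 89.09, "S": 105.09}


def molecular_weight(sequence, weights=AMINO_ACID_WEIGHTS):
    """Weighted sum: (count of each residue) * (weight of that residue)."""
    counts = Counter(sequence.upper())
    # Raises KeyError for residues missing from the table.
    return sum(weights[residue] * n for residue, n in counts.items())


def aggregate_resistances(entity_id, entities):
    """Collect resistances from an entity and all of its ancestors.

    `entities` maps id -> {"resistances": [...], "parents": [...]};
    this shape is hypothetical.
    """
    visited = set()
    result = []
    stack = [entity_id]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # shared ancestors are only processed once
        visited.add(current)
        for resistance in entities[current]["resistances"]:
            if resistance not in result:
                result.append(resistance)
        stack.extend(entities[current]["parents"])
    return result


entities = {
    "plasmid": {"resistances": ["Ampicillin"], "parents": []},
    "strain": {"resistances": ["Kanamycin"], "parents": ["plasmid"]},
}
print(molecular_weight("GAG"))
print(aggregate_resistances("strain", entities))
```

Both functions are pure, so they would be cheap to re-run whenever an input or an ancestor changes.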

A more complex biochemical property for antibodies that can't be expressed as
easily in Excel is isoelectric point ([2], see example BioJava implementation
at [3]). It requires a binary search, but the search space is constant, so
these calculations usually finish in < 20 s.
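
As a rough illustration of the binary search, here is a sketch in Python that bisects the pH range 0 to 14 for the point where the net charge is zero. It is not the BioJava implementation from [3]; the pKa values are approximate textbook numbers (published tables differ) and the function names are made up for this example.

```python
# Approximate textbook pKa values; treat these as assumptions rather than
# the values any particular tool uses.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERMINUS_PKA = 9.69
C_TERMINUS_PKA = 2.34


def net_charge(sequence, ph):
    """Net charge at a given pH via the Henderson-Hasselbalch relation."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERMINUS_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=0.001):
    """Bisect pH 0..14; net charge decreases monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI is at a higher pH
        else:
            high = mid  # zero or negative charge: pI is at a lower pH
    return (low + high) / 2.0
```

The interval is fixed at 0 to 14, so the number of bisection steps depends only on the tolerance, not on the sequence; each step costs one pass over the residues.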

Since our implementation is in Python, we can wrap the function in try/except
and, in the except block, log the error and set the failed computation status.
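
A minimal sketch of that error-handling pattern, assuming a hypothetical `run_computation` wrapper and a simple status record (neither is from the article):

```python
import logging

logger = logging.getLogger("computed_fields")


def run_computation(compute, *args):
    """Run `compute` and return a status record instead of raising."""
    try:
        value = compute(*args)
    except Exception:
        # Log the traceback, then record the failed status.
        logger.exception("computed field failed")
        return {"status": "failed", "value": None}
    return {"status": "succeeded", "value": value}


print(run_computation(lambda x: x * 2, 21))
print(run_computation(lambda: 1 / 0))
```

Returning a status record rather than re-raising lets an asynchronous worker store the failure next to the field and move on to the next computation.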

[1] https://www.promega.com/-/media/files/resources/technical-references/amino-acid-abbreviations-and-molecular-weights.pdf
[2] https://en.wikipedia.org/wiki/Isoelectric_point
[3] http://biojava.org/docs/api1.9.1/src-html/org/biojava/bio/proteomics/IsoelectricPointCalc.html

