I.e., I write some code, hit F5, and yeast comes out the other side of the compiler-machine. How confident can I be while debugging that the changes in behavior between the compiled output (yeast) and my code are bugs versus as-yet-unknown behaviors of complex molecules bumping into each other in a very crowded bag of other complex molecules sitting in a dish somewhere? As of right now, and, I posit as a biological researcher, for a very long time to come, the answer is "a lot less than 100%". Simple stuff like gene-toggle switches has been known, and has pretty much worked, since the late 90s, but scaling is deeply nonlinear.
This doesn't mean that synbio is useless, but it means that the error bars are a lot higher than they are when you're running code on printed circuits and can have a high degree of confidence that your computational substrate actually does what you think it does.
The fluorescent molecule at the center is not actually a nucleic acid, but is formed several hours after the protein has folded, via a secondary reaction in which some of the atoms at the center of the protein find a lower energy state. This seems to me (not a protein engineer) to be virtually impossible to design a priori.
 Meta-Meta-Article: https://www.bakerlab.org/index.php/2019/01/02/nature-article...
 Actual paper: https://www.nature.com/articles/s41586-018-0509-0
From the paper:
"Synthetic genes encoding the 56 designs were obtained and the proteins were expressed in E. coli. Thirty-eight of the proteins were well-expressed and soluble; SEC and far-ultraviolet CD spectroscopy showed that 20 were monomeric β-sheet proteins (Supplementary Table 3). Four of the oligomer-forming designs became monomeric upon incorporation of a disulfide bond between the N-terminal 3–10 helix and the barrel β-strands. The crystal structure of one of the monomeric designs (b10) was solved to 2.1 Å, and was found to be very close to the design model (0.57 Å backbone r.m.s.d., Fig. 3c)."
"Two of the 20 monomeric designs—b11 and b32—were found to activate DFHBI fluorescence by 12- and 8-fold with binding dissociation constant (KD) values of 12.8 and 49.8 μM, respectively (Extended Data Fig. 6f)"
De novo Design of Potent and Selective Mimics of IL-2 and IL-15: https://www.nature.com/articles/s41586-018-0830-7
Lab's accompanying media post: https://www.bakerlab.org/index.php/2019/01/09/potent-anti-ca...
" Potent anti-cancer proteins with fewer side effects. Today we report in Nature the first de novo designed proteins with anti-cancer activity. These compact molecules were designed to stimulate the same receptors as IL-2, a powerful immunotherapeutic drug, while avoiding unwanted off-target receptor interactions. ..."
A remarkable and powerful achievement.
As a side note, this is why I appreciate HN, never would have seen this paper otherwise.
Do you think that despite the high error resulting from the sheer complexity of those interactions, design tools would be useful to researchers? Or is it mostly just long shot proof of concepts with no practical purpose right now?
The domain is complex, the customers require a high level of correctness, and features are often implemented to try things out in the lab and then abandoned once they turn out to be a dead end in the real world. Oh, and most academics don't have money to pay for software, nor do they respect the software engineering process.
Commercial gui-based software is usually sold along with proprietary equipment like a microscope or something, and universally loathed even by people who managed to wrap their heads around it. It's much easier to work with different software from different people or groups if it's all the same syntax, rather than relearning clunky proprietary gui after clunky proprietary gui.
I also encourage you to look at the Qian Lab. Lulu has created neural networks out of DNA -- https://www.nature.com/articles/nature10262. Her research is entirely about programming molecular systems -- http://www.qianlab.caltech.edu/research.html.
I was fortunate enough to collaborate with Lulu and with Erik Winfree's DNA computation lab at Caltech a while back.
My work was about building switching circuits using DNA. We designed DNA sequences that would bind together probabilistically. Using these, we created "probabilistic switches" -- analogous to Shannon's original on/off switches. Then, we used some earlier work of mine to design (and build!) circuits of "pswitches" that realize certain probability distributions.
We published an open-access paper last year here -- https://www.pnas.org/content/115/5/903. I suggest looking at Figure 1 to get a good idea how it works.
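To give a flavor of the pswitch idea: a switch is closed (conducting) with some probability, and series/parallel composition of switches composes those probabilities, just as in Shannon's deterministic switching circuits. Here's a minimal software sketch (toy Monte Carlo, not the DNA implementation from the paper):

```python
import random

def pswitch(p):
    """A probabilistic switch: closed (conducting) with probability p."""
    return random.random() < p

def series(*closed):
    """A series circuit conducts only if every switch is closed."""
    return all(closed)

def parallel(*closed):
    """A parallel circuit conducts if any switch is closed."""
    return any(closed)

# Realize probability 3/4 from two fair (p = 1/2) pswitches in parallel:
# P(conduct) = 1 - (1/2)(1/2) = 3/4.
def circuit():
    return parallel(pswitch(0.5), pswitch(0.5))

trials = 100_000
freq = sum(circuit() for _ in range(trials)) / trials
print(round(freq, 2))  # ≈ 0.75
```

Composing fair pswitches this way lets you realize any dyadic probability, which is the kind of construction the paper carries out with actual DNA strand-displacement reactions.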
At what temperatures?
In what pH range?
In what species?
With what codon bias?
When should it be expressed?
At what rate should it mutate?
Do you want any other nearby 'functions' from mutation?
Is this a membrane protein?
Do you want restriction sites?
Is this for insertion into a plasmid or for insertion directly into a genome?
What bacterial cell line will you use to maintain the plasmid?
If you are expressing the protein for purification what cell line or bacterial strain will you be amplifying?
Do you want a single sequence that will work for all of these, or are you willing to compile a different version for each combination?
The list goes on and on.
The sequence you compile to will depend on those environmental parameters and the semantics of that 'environmental ISA' will likely be highly specific for many high level descriptions. I imagine you could produce sequences that were more robust, but the simulation time required to generate and validate them would grow accordingly.
All of this without even mentioning that you also absolutely must specify _all_ the things it should not do, such as cleave a bunch of other proteins, or bind non-specifically and form aggregates in the cytoplasm, etc. A language-level list of defaults here would certainly be a requirement, and that means the space you will be optimizing in is absolutely massive. I don't even want to imagine how slow it would be. Probably faster to synthesize a bunch of variants and test them all in real cells.
The simpler case of taking an NCBIGene identifier and smashing it into an Addgene plasmid identifier and setting codon biases and optimizing for expression is a much more manageable task, and would probably be a building block for the more complex version.
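The codon-bias piece of that simpler task is mechanical enough to sketch: back-translate a protein sequence using the host's preferred codon for each amino acid. The table below is a deliberately tiny toy fragment (real codon-usage tables cover all 20 amino acids plus stops and vary by host):

```python
# Toy fragment of a codon-usage table for an E. coli-like host:
# for each amino acid (one-letter code), a preferred codon.
PREFERRED_CODON = {
    "M": "ATG", "G": "GGC", "F": "TTT", "K": "AAA", "*": "TAA",
    # ... a real table would cover all 20 amino acids + stop codons
}

def codon_optimize(protein_seq, table=PREFERRED_CODON):
    """Back-translate a protein into DNA using the host's preferred codons."""
    try:
        return "".join(table[aa] for aa in protein_seq)
    except KeyError as missing:
        raise ValueError(f"no codon entry for amino acid {missing}")

print(codon_optimize("MGFK*"))  # ATGGGCTTTAAATAA
```

Real optimizers also avoid unwanted restriction sites, strong secondary structure, and repeats, which is where it stops being a lookup table.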
Bioinformatics deals with the stuff you mentioned in silico. There are a whole bunch of libraries for many different languages, but I've not come across any language specific to bioinformatics.
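For example, the basic central-dogma primitives those libraries provide (Biopython's `Bio.Seq` has these as methods) can be sketched with nothing but the stdlib:

```python
def transcribe(dna):
    """DNA coding strand -> mRNA: T becomes U."""
    return dna.replace("T", "U")

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement each base, then reverse (the opposite strand, 5'->3')."""
    return dna.translate(COMPLEMENT)[::-1]

print(transcribe("ATGGCGTAA"))          # AUGGCGUAA
print(reverse_complement("ATGGCGTAA"))  # TTACGCCAT
```

Libraries layer alignment, annotation parsing, and database access on top of primitives like these, but it's all general-purpose language plus library rather than a dedicated language.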
You might find https://youtu.be/8X69_42Mj-g interesting, they developed a DSL in common lisp to generate C++ code using LLVM.
"It turns out that there are lots of similarities between modelling concurrent systems and biological systems. Just like a computer, biological systems perform information processing, which determines how they grow, reproduce and survive in a hostile environment. Understanding this biological information processing is key to our understanding of life itself.
It’s probably easier to understand some of the output of this work – specifically the Stochastic Pi Machine, or SPiM as it’s often referred to. SPiM is a programming language for designing and simulating models of biological processes. The language features a simple, graphical notation for modelling a range of biological systems – meaning a biologist does not have to write code to create a model, they just draw pictures.
You can think of SPiM as a visual programming language for biology. In addition, SPiM can be used to model large systems incrementally, by directly composing simpler models of subsystems. Historically, the field of biology has struggled with systems so complex they become unwieldy to analyse. The modular approach that is often used in computer programming is directly applicable to this challenge."
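Under the hood, stochastic simulators of this kind typically run something like Gillespie's algorithm: repeatedly sample an exponential waiting time from the total reaction propensity, then pick which reaction fires in proportion to its rate. A minimal stdlib-only sketch (a toy degradation process, not SPiM's actual engine):

```python
import random

def gillespie(state, reactions, t_end):
    """Gillespie's stochastic simulation algorithm.
    reactions: list of (propensity(state) -> rate, effect(state)) pairs."""
    t = 0.0
    trace = [(t, dict(state))]
    while t < t_end:
        rates = [rate(state) for rate, _ in reactions]
        total = sum(rates)
        if total == 0:
            break  # nothing left to happen
        t += random.expovariate(total)  # exponential waiting time
        pick = random.uniform(0, total)  # choose reaction by rate
        for (rate, effect), w in zip(reactions, rates):
            pick -= w
            if pick <= 0:
                effect(state)
                break
        else:
            reactions[-1][1](state)  # float-rounding fallback
        trace.append((t, dict(state)))
    return trace

# Toy process: each molecule of X degrades at rate k = 1 per unit time.
state = {"X": 100}
k = 1.0
reactions = [
    (lambda s: k * s["X"],                        # propensity
     lambda s: s.__setitem__("X", s["X"] - 1)),   # effect
]
trace = gillespie(state, reactions, t_end=10.0)
print(trace[-1][1]["X"])  # decays toward 0
```

SPiM's contribution is letting you specify such systems compositionally, as communicating processes, rather than writing the reaction list by hand.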
They knew what those instructions would do, and they had 100% certainty about it.
This is not the case in biology, however. People still struggle to understand how proteins interact on a molecular level.
A few years ago I took part in a project that aimed to engineer a new, synthetic metabolic pathway in yeast.
A key enzyme in this pathway had to be introduced from an exotic bacterium. But no matter how much we tried, the enzyme worked very poorly (two orders of magnitude slower) when introduced into S. cerevisiae.
The problems we have in biology today aren't technical, they are still fundamental... Creating a programming language dedicated to wiring genetic circuits is nice, but won't be a game changer.
There are some problems with using nucleic acids, chief among which is that the central dogma of molecular biology isn't nearly as true (or as central) as we used to think. Don't know as much about RNA -- there certainly are RNA "machines" where the nucleic acid itself has a function and maybe those could be coded for specific functionality. RNA isn't as stable though (because it's single stranded and RNAses are everywhere) so it sounds hard to deal with experimentally, but I'm no expert.
Beyond that, we're not quite at the place where we can compute desired phenotype -> genetic code, except when single enzymatic reactions are all you are doing. Going bigger requires understanding the nonlinear complexities of the system in much deeper detail.
Transcription Factor + GFP -> green transcription factor -> codon-optimized plasmid
Membrane tag + light-activated conformational change + cytoskeleton -> optogenetic mechanotransduction
Small-molecule-activated degradation domain + transcription factor -> chemically-induced genetic switch.
And though there's still a bit of trial and error to debug which/how those components go together, it's a common enough thing to be reasonably accessible.
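The arrows above are essentially part composition, which you can caricature in a few lines. All the names here are invented for illustration (toy sequences, not a real parts library or its API):

```python
from dataclasses import dataclass

@dataclass
class Part:
    """A named protein part with a (toy) amino-acid sequence."""
    name: str
    seq: str

def fuse(*parts, linker="GS"):
    """Join protein parts end-to-end with a short flexible linker."""
    return Part("-".join(p.name for p in parts),
                linker.join(p.seq for p in parts))

tf  = Part("TF",  "MKR")   # toy transcription factor
gfp = Part("GFP", "MSK")   # toy fluorescent tag
fusion = fuse(tf, gfp)
print(fusion.name, fusion.seq)  # TF-GFP MKRGSMSK
```

The hard part in practice isn't the string concatenation, of course: it's whether both domains still fold and function in the fused context, which is where the trial and error comes in.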
The curious part, from our perspective, is that biology has massive surface area - and the surface area is 3D. It not only scales between species/functions, but it also scales up and down, from atoms to organs. And the expertise/abstraction layers that work at one scale become complicated if you try too hard to account for all variation at a different scale. HIV's genome is a backwards, upside-down, mirrored fugue of an engineering design that uses exotic molecules, exotic regulation, exotic proteins, and exotic physics. We're starting at a different place, just trying to write very simple scales.
In our case, we've chosen a single size scale to work with - proteins, but are wide enough to look across every species & discipline to understand those proteins as common tools. We compile all of our designs for a particular function that we're interested in, down to DNA, literally. And finding the niche where we do not have to deal with all of the DNA-regulation, or cellular regulation, or tissue synthesis, etc. allows us to expand and build in complexity at the protein level - while keeping other parts of biology constant. And that also allows us to interact and work with others who are working at different scales.
And there are others that build in complexity at other biological levels (gene regulation, pathway flux, etc.). Companies like Asimov are involved in similar work at some of those abstraction layers. The open-source design language SBOL is an attempt to standardize the DNA layer. And this contributes to the challenge in that a lot of people/companies/labs have projects to build an abstraction layer that compiles down to DNA - but they might be talking past each other and be doing separate projects.
We've built an entire API of 'high-level' commands at an abstraction layer above DNA, where the output compiles down to, literally, a JSON file specifying the DNA sequence to be manufactured by a 3rd party, as well as human-level citations to enable turning the new designs into intellectual property.
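A minimal sketch of what that compile-to-JSON output might look like; every field name and value here is invented for illustration, not the actual schema:

```python
import json

# Hypothetical compiler output: a synthesis order plus provenance.
design = {
    "construct": "reporter-fusion-v1",
    "sequence": "ATGAAACGT",            # toy DNA to be manufactured
    "vendor": "third-party-synthesis",
    "citations": ["doi:10.xxxx/placeholder"],  # human-level provenance
}
order = json.dumps(design, indent=2)
print(order)
```

The appeal of a plain JSON handoff is that the manufacturing step stays a dumb, swappable backend while the interesting design logic lives above it.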
There is still a LOT of data missing, and there's a lot of empirical work to do - and you need to keep your compiling system constant enough that when you make changes at your abstraction layer and hit a roadblock, you know it's because of a change you made and not just a bug in the system.
I’d heard about Asimov from their original work on Cello at MIT, but SBOL is news to me. I’ll check it out as well.
It sounds like this space is becoming pretty competitive, which is interesting.
Always looking for good work. The field is growing rapidly right now. You've got a good combination of talents to help.
In terms of protein design at that atomic level, the computation traditionally has relied on knowing or guessing at the structure (atomic arrangement) of the protein. And without that, there's not much to do (that's where our work picks up). A lot of that kind of protein design computation work is being done with software like Rosetta.
DNA seems to be able to detect lesions and mismatches based on conductivity of electrons down the double strands themselves.
For one example.
not one of the end products of the nucleic acid by itself... this is why creating a dinosaur from blood, etc., is such a fictional dream
Sure, you could devise something to take written instructions and produce peptides according to them, but it won't be a "programming language for Biology".
"Language" and "code" are shitty metaphors for what DNA does in living organisms, and they miss almost everything about what's needed to go from the molecule complex to the phenotype.
Divine beings (and possibly Real Programmers, who mastered the use and internals of C-x M-c M-butterfly [https://xkcd.com/378/]) may get away with doing their work despite thinking about it that way. Everyone else need not apply.
Incidentally a good deal of the "junk" DNA turns out not to be.