
Powering CRISPR with AWS Lambda - sajithw
http://benchling.engineering/crispr-aws-lambda/?hn
======
dankohn1
I know it's old hat on HN, but I just wanted to point out how close to science
fiction this article is. This technique to edit genomes is only a decade old.
This startup is able to run sub-second searches without requiring any of their
own infrastructure.

It costs them less than _$100_ a month.

It was written by an _intern_.

~~~
akulesa
>This technique to edit genomes is only a decade old.

Actually, this technique is about 3 years old.

[http://www.sciencemag.org/content/339/6121/819.short](http://www.sciencemag.org/content/339/6121/819.short)

------
phkahler
While reading, I just kept wondering why this search needs to be in the cloud
at all. Finding 20 byte strings in 3GB can be done on a laptop very quickly.
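
A minimal sketch of the kind of scan described here: slide a window the length of the guide across the sequence and count mismatches. The function name, the mismatch budget, and the toy sequence are illustrative assumptions, not anything from the article. A pure-Python loop like this would be far too slow for a 3GB genome; it only shows the shape of the search.

```python
def find_matches(genome, guide, max_mismatches=0):
    """Return (position, mismatch_count) for every window within the budget."""
    k = len(guide)
    hits = []
    for start in range(len(genome) - k + 1):
        mismatches = 0
        for a, b in zip(genome[start:start + k], guide):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # over budget, abandon this window early
        else:
            # Loop finished without breaking: window is within the budget.
            hits.append((start, mismatches))
    return hits


# Toy input (hypothetical); a real guide would be about 20 bases long.
toy_genome = "ACGTACGTTTACGA"
exact_hits = find_matches(toy_genome, "ACGT")
near_hits = find_matches(toy_genome, "ACGT", max_mismatches=1)
```

With a budget of zero this is a plain substring search; raising the budget is what makes the work grow, since every window has to be compared rather than skipped.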

~~~
dankohn1
That seems partly a reference to this classic article
[https://www.chrisstucchio.com/blog/2013/hadoop_hatred.html](https://www.chrisstucchio.com/blog/2013/hadoop_hatred.html)
"Your data isn't that big".

But in this case, it does seem incredibly advantageous to be able to scale up
to any number of parallel searches, and to be able to search arbitrary new
genomes.

------
aleem
AWS Lambda is great for inconsistent atomic workloads. However, I had a fairly
disappointing experience with Lambda when I tested it just last week.

For example, you cannot send dynamic response headers using the AWS API
Gateway (the complementary service to expose HTTP endpoints). In my case I
wanted to change the mime-type depending on JSON vs JSONP response.

It's also not possible to connect Lambda directly to ElastiCache and mostly
you are expected to work with S3 or DynamoDB (Amazon's proprietary JSON store
and what was mostly responsible for the data outage recently in US East).
ElastiCache would allow easy persistence which is why it's surprising it can't
be connected to given that it's an AWS service (you can connect to it by
creating an EC2 proxy but that would defeat the purpose of a serverless
architecture).

Some other oddities were sniffing the response body to set HTTP headers as
opposed to just allowing your Lambda function to set the HTTP header directly
or parsing the JSON response as opposed to doing a regex match.
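
For illustration, the behavior wanted here (picking the mime-type per response) is small when the function controls its own headers. This is a framework-agnostic sketch, not API Gateway's or Lambda's API; `build_response` and its return shape are hypothetical.

```python
import json
import re

# Conservative pattern for a JSONP callback name (an assumption of this sketch).
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$.]*")


def build_response(payload, callback=None):
    """Return (headers, body), switching content type for JSONP requests."""
    body = json.dumps(payload)
    if callback is None:
        return {"Content-Type": "application/json"}, body
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name")
    # JSONP wraps the JSON in a call to the client-supplied function.
    return {"Content-Type": "application/javascript"}, f"{callback}({body});"
```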

~~~
impostervt
I've been playing around with API Gateway & Lambda a bit lately, and it
definitely feels like these services are sometimes built by teams that don't
talk to each other.

API Gateway tries really hard to HIDE things from you. For instance, you can't
see what the requested URL was without using a fair bit of VTL to put it back
together from some other variables. And only lately can you get a full list of
query parameters, without having to specify them at the time of API creation.
In fact, it seems like most of the work on API Gateway, since its release,
has been to let end-users have more access to data they hid in the first
place.

------
ac360
Hi Vineet,

I'm a huge fan of CRISPR. I've been following it closely since I heard
Radiolab's podcast about it.

I'm also the founder of the JAWS framework, which is an open-source
application framework built entirely on AWS Lambda and AWS API Gateway:
[https://github.com/jaws-framework/JAWS](https://github.com/jaws-framework/JAWS)

I would LOVE to grab a coffee with you or anyone on your team some time, and
chat about lambda or CRISPR, or anything really :) I live in Oakland and my
email address is austen[at]servant.co

Also, will you be at Re:invent? I'm doing a breakout session on JAWS and I'll
be there all week.

Good luck to you!

Austen

------
taternuts
I kind of forgot that while lambda only supports node, you can use it as a
glorified wrapper to call your c++ code

~~~
netcraft
they also support java, and both node and java can launch other things on
amazon linux including bash, python and ruby according to their docs

~~~
rottencupcakes
I'd be nervous about java - the startup time of the JVM plus the slow
execution at the beginning until everything gets JITed makes me think overhead
could easily trump execution.

~~~
glibgil
Q: Will AWS Lambda reuse function instances?

To improve performance, AWS Lambda may choose to retain an instance of your
function and reuse it to serve a subsequent request, rather than creating a
new copy. Your code should not assume that this will always happen.

[https://aws.amazon.com/lambda/faqs/](https://aws.amazon.com/lambda/faqs/)

------
motoboi
"Our old server infrastructure cost thousands of dollars each month just for
server costs.

Using the new Lambda infrastructure, we pay for the number of Lambda
invocations, the total duration of the requests, and the number of S3
requests. This comes out to $60/month for hundreds of thousands of CRISPR
searches!"

Well, how much of that money did you spend on EBS storage for your copies of
genome data?

EC2 instances could read from S3 directly as lambda does, maybe that could
alleviate the cost a lot.

Using AMI S3 backed instances could save a lot too.

But great work, nonetheless!

~~~
vineetg
EBS costs are actually fairly small ($9 a month per instance for 90GB). More
than 95% of our costs were just paying for the EC2 servers.

~~~
ac360
Right.

My friend is refactoring an app at his company right now, using only Lambda
via JAWS and we ran some numbers on the cost savings. He's retiring 2 EC2
c3.large instances which were costing $2.97/day. On Lambda the app will cost
$0.05/day.

We don't hear about it nearly enough yet, but the cost savings of building
apps on Lambda are huge. Then you add in the time saved on devops... and you
realize how seriously disruptive this tech is.
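
As a rough worked version of those numbers: the two daily figures come from the comment above, while the 30-day month and the function name are assumptions made for the arithmetic.

```python
def compare_daily_costs(old_per_day, new_per_day, days=30):
    """Scale two daily costs to a month and report the difference and ratio."""
    return {
        "old_monthly": round(old_per_day * days, 2),
        "new_monthly": round(new_per_day * days, 2),
        "savings_monthly": round((old_per_day - new_per_day) * days, 2),
        "ratio": round(old_per_day / new_per_day, 1),
    }


# $2.97/day for the two c3.large instances vs. $0.05/day on Lambda.
summary = compare_daily_costs(2.97, 0.05)
```

Under those assumptions the monthly bill drops from about $89.10 to about $1.50, roughly a 59x difference.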

~~~
hyperpallium
Yes, this is the key point of microservices: it's not the modularity, nor
cross-language nor "webscale" etc.

It's cheaper, because more efficient use of resources, because finer-grained.
Each component of an app only gets what it needs; and the vendor can sell that
unused capacity to someone else.

Geometrically speaking, finer grains pack tighter, wasting less space.

They also utilize multi-core effectively.

------
Gatsky
Of note, the latest thing in reference genomes is representing them as a graph
data structure, which importantly allows variation to be incorporated. Some of
the newest methods for mapping short DNA fragments (that come out of the most
common type of sequencers) take this approach. They use a genome index though,
which takes a lot of computational effort to build beforehand.

Anyway, benchling wants to avoid genome indexes from the sounds of it, in case
users upload their own genomes. Having said that, if someone is doing multiple
searches, it would quickly become more efficient to just index the genome. I
would have thought most people seriously concerned about off target CRISPR
hits would be using high quality reference genomes though.

~~~
psycr
Are you referring to string-overlap graphs for de novo assembly? In that case,
isn't CRISPR addressing another problem?

~~~
akulesa
I think Gatsky is referring to the method described here:

[http://www.technologyreview.com/news/537916/rebooting-the-hu...](http://www.technologyreview.com/news/537916/rebooting-the-human-genome/)

------
netcraft
I recently have started looking harder at lambda after realizing that you can
use 1M requests / month for free indefinitely. I just worry about vendor
lock-in with services like this - if for whatever reason you want to move away
it's a rewrite at best. If Amazon were to open source the Lambda implementation,
allowing me to run my services somewhere else with a config change, I'd probably
buy into it completely and never move away...

~~~
crandycodes
If Lambda open sourced the engine, but didn't make it super easy (as in a
1-button deploy, nothing intentional) to stand up your own stand alone Lambda
service, would that be better?

On my projects that have been open sourced, we mostly open sourced to make
debugging easier and make extension authoring easier. I've gotten comments
from people that that makes them feel easier about vendor lock in, but
honestly, I haven't seen many people try and stand up their own service. Would
you say that matches your own expectations?

~~~
netcraft
yeah pretty much. I don't want to move, I want to have the ability to if
something goes bad.

------
JulianMorrison
I wonder if it would be possible to go the other way. How close is CRISPR to a
primitive of Turing complete computation?

Take it from s/xxx/yyy/ into being /bin/sed. And then run the search in
wetware.

~~~
toufka
Crispr is a homing system. It allows you to address a specific part of
(genetic) memory. It is a single component in a much larger system. A
required, and otherwise missing component. But it is just a component.

------
deegles
They might be able to save a bit on costs by caching locally. Lambda instances
can be reused if TPS is high enough. I think the limit is 500MB in the /tmp
directory.
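
A sketch of that idea, assuming a reused instance still has its temp directory from the previous invocation. `get_cached` and the `fetch` callable are hypothetical; `fetch` stands in for whatever actually downloads the object (for example an S3 read). No size limit is enforced here.

```python
import os
import tempfile


def get_cached(key, fetch, cache_dir=None):
    """Return the bytes for key, reading a local copy when one exists."""
    # tempfile.gettempdir() is typically /tmp on Linux.
    cache_dir = cache_dir or tempfile.gettempdir()
    path = os.path.join(cache_dir, key.replace("/", "_"))
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # warm instance: skip the download
    data = fetch(key)  # cold instance: download once, then keep a copy
    with open(path, "wb") as f:
        f.write(data)
    return data
```

Because reuse is not guaranteed, the cache can only ever be an optimization: the code has to work when the file is missing.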

------
jewbear48
How are you getting such quick responses from S3? In our own testing using
Java, it was taking over 500ms just to initiate the connection with S3 from
Lambda.

~~~
vineetg
We're getting around 100ms connection times. We're using the Node aws-sdk to
get things from S3: s3.getObject({Bucket, Key})

------
Sujan
Did I miss it, or doesn't the article mention which server does the splitting
and combining? Is this also done in Lambda?

~~~
vineetg
I briefly mentioned it in the "New Infrastructure" section, but we're doing
the splitting and combining of results on our web servers.
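
One way such a split-and-combine step could look, as a sketch only: the thread does not describe the actual scheme. The chunk size, the overlap rule, and the thread pool standing in for parallel Lambda invocations are all assumptions, and the search is an exact match for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(length, chunk_size, pattern_len):
    """Return (start, end) ranges; each extends pattern_len - 1 past its chunk
    so a match straddling a boundary is found by exactly one chunk."""
    return [
        (start, min(start + chunk_size + pattern_len - 1, length))
        for start in range(0, length, chunk_size)
    ]


def search_chunk(genome, guide, start, end):
    """Find exact matches in genome[start:end], as absolute positions."""
    chunk = genome[start:end]
    hits = []
    pos = chunk.find(guide)
    while pos != -1:
        hits.append(start + pos)
        pos = chunk.find(guide, pos + 1)
    return hits


def parallel_search(genome, guide, chunk_size, workers=4):
    """Search each chunk concurrently, then merge the per-chunk results."""
    ranges = split_ranges(len(genome), chunk_size, len(guide))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: search_chunk(genome, guide, *r), ranges))
    return sorted(pos for part in parts for pos in part)


toy_genome = "ACGT" * 3 + "TT" + "ACGT"  # hypothetical input
combined = parallel_search(toy_genome, "ACGT", chunk_size=5)
```

The overlap is the part that is easy to get wrong: without it, a match that crosses a chunk boundary is missed, and with too much of it the same match is reported twice.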

~~~
Sujan
Thanks!

