
Fake S3 – Save time, money, and develop offline - jubos
http://blog.getspool.com/2012/04/18/fake-s3-save-time-money-and-develop-offline/
======
Fluxx
In my opinion, having to replicate S3 in development and test isn't the best
idea. There are a few problems I see: you have tied yourself to S3's API; you
must maintain this "other" S3, making sure it behaves like the real S3; and
your test and development code never actually hits the real API you're
using...until staging or production.

There are a few better strategies I can see here:

1. For test, use something like VCR[1] to record real HTTP interactions with
the real S3 API during first test runs, serialize them to disk, and then
replay them later.

2. Go the more OO route and create an internal business object with a defined
interface that handles persistence of your objects. You could have an
S3Persister for production and staging, but then you can create a
LocalDiskPersister or even a MemoryPersister for tests. Hell, you can even keep
your own S3 and create an OurS3Persister as well. The main point here is that
your application code is coded to one API/interface - the "persister" - and
you can easily swap in different persisters for different reasons. All the
individual persisters can then have their own tests that guarantee they adhere
to the Persister interface and do their own individual things correctly.

3. Mock out the calls to your S3 library. It's the job of the library to
provide an API interface for you as the application developer to S3, so you
can mock out those API calls and trust the library works and is doing the
right thing. Since you're mocking things out, you should still have
integration tests with the real S3 to verify everything is working, but for
quick unit tests mocking works great.
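Strategy 2 can be sketched in a few lines of Ruby. All class and method names
below are hypothetical, invented for illustration rather than taken from any
particular library:

```ruby
# A minimal sketch of the "persister" idea: application code targets one
# interface, and each environment swaps in a different implementation.
class MemoryPersister
  def initialize
    @store = {}
  end

  def save(key, data)
    @store[key] = data
  end

  def load(key)
    @store.fetch(key)
  end
end

class LocalDiskPersister
  def initialize(dir)
    require 'fileutils'
    FileUtils.mkdir_p(dir)
    @dir = dir
  end

  def save(key, data)
    File.binwrite(File.join(@dir, key), data)
  end

  def load(key)
    File.binread(File.join(@dir, key))
  end
end

# Application code is written against the interface, not a concrete class,
# so tests can use MemoryPersister while production wires in an S3 persister.
def store_avatar(persister, user_id, bytes)
  persister.save("avatar-#{user_id}.png", bytes)
end
```

An S3Persister exposing the same `save`/`load` pair would slot in without any
change to the application code.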

The blog post mentioned they had gigabytes of data, so YMMV with these ideas,
but these are strategies I and others have used in the past when dealing with
APIs like S3, and they work great.

[1] <https://github.com/myronmarston/vcr>

~~~
jubos
Excellent points.

We work on the idea of different stages in the test and development pipeline.
At different stages mock objects make sense, and at other stages having
something like Fake S3 makes more sense.

For testing, the first stage would be unit testing. At that stage it is best
to mock out your S3 interactions (with something like VCR or WebMock) and use
an OO approach to wrap your persistence, so you could swap out S3 with another
persistence engine without breaking APIs.

The second stage for us is integration testing where you might have multiple
machines testing across the network. In this situation, I think it is great to
have real network requests happening rather than mock requests. Also you can
deal with real files (especially important with media files like images and
video).

The last stage is taking out Fake S3 and using a true S3 connection to ensure
that everything does work in a production environment (because Fake S3 could be
faking you out, especially on things like authentication and versioning). We
do that by launching a staging cluster and running a set of integration tests
on it before doing a production release. Ideally, the first and second stages
catch any errors before you start doing tests against the real AWS services.

As for the development pipeline, being able to work with real assets while you
are building mobile or web interfaces is really useful, and simulating latency
to see how interfaces respond over a slow network connection is something that
would be difficult to truly mock.

~~~
Fluxx
Awesome, thanks for the extra info. I think your setup sounds really good :)

------
ben1040
I had to do some work on an S3-backed project while out at sea on a cruise
ship a few months ago (let's save the discussion about working on vacation for
the 501 developer thread).

Thanks to git I was able to spool up my commits and then push when I pulled
into port and had cellular access, but I wasn't really able to do everything I
wanted with the paperclip-backed models without reliable/cheap network access.

An offline emulation mode for S3 sounds pretty nice, thanks for this!

~~~
mr_luc
On my last project, we used Dragonfly. Holy cow -- trivial to switch between
file-backed and s3-backed storage in the various environment config files.

It was a lifesaver, because the wifi at the place I was couch-surfing was a
little spotty.

~~~
andrewflnr
This dragonfly? <https://github.com/markevans/dragonfly>

It's kind of hard to google for "dragonfly".

~~~
knewter
Yeah, that one. That dragonfly is effing amazing and I <3 it. I have
proselytised for it for quite some time now and I'm still depressed more
people don't use it. I actually found it when I started contributing to
refinerycms.

------
justinsb
I'd recommend installing OpenStack's Swift component (S3 equivalent) and
evaluating that as well. You can run it on one node for development purposes,
you can scale it up if you want private object storage on your network, and
many public clouds are offering it: Rackspace Cloud Servers, HP Cloud, AT&T,
Korea Telecom, Internap, etc.

Wikipedia uses OpenStack Swift to store its images, and has some good
presentations on this.

~~~
jubos
Swift is a very powerful piece of technology, but it is also more involved to
set up. I'm curious to try RiakCS as well and see how it compares to Swift for
running production-level S3 object storage.

~~~
justinsb
Looks like I need to post my blog post about how to set up Swift really
easily!

~~~
jubos
My twitter handle is @jubos. I would love to see how you approach it.

------
DenisM
How about failure simulations? Also, S3 has eventual consistency, so a read
can miss a recent write. Frequently injecting errors and consistency issues
would make this very helpful.

~~~
jubos
Great idea. I like the idea of a command line flag (like the rate limit flag)
to run it with a percentage failure rate or something along those lines.
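A percentage failure rate could be as simple as a guarded raise per request.
This is only a sketch of the idea; Fake S3 has no such flag as of this post,
and the class name is made up:

```ruby
# Hypothetical failure injection: each request fails with probability `rate`,
# mimicking flaky S3 responses so clients exercise their retry/error paths.
class FailureInjector
  def initialize(rate, rng = Random.new)
    @rate = rate  # 0.0 = never fail, 1.0 = always fail
    @rng = rng
  end

  # Call once per incoming request before serving it.
  def maybe_fail!
    raise 'injected failure (pretend this is an S3 500)' if @rng.rand < @rate
  end
end
```

Seeding the `Random` instance would make a flaky test run reproducible.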

------
fennecfoxen
I'm mildly surprised you have in-application bandwidth limits instead of
setting up clever firewall rules on your local box. (Latency in particular is
a fun thing to add.)

~~~
jubos
I wanted a cross-platform way to test slow connections with a single
command-line parameter. Whether it be Linux, FreeBSD, or OS X (maybe Windows
(haven't tested :-P)), it is easy to set up.

iptables or putting nginx with rate limits in front of Fake S3 would be a more
powerful approach, but also harder to get going.
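In-application throttling can be done portably by sleeping between chunks.
This is just a sketch of the general technique, not Fake S3's actual
implementation:

```ruby
# Copy `src` to `dst` at roughly `bytes_per_sec`, sleeping after each chunk
# so the average rate stays near the target. Pure Ruby, so it behaves the
# same on Linux, FreeBSD, and OS X without touching iptables or pf.
def copy_throttled(src, dst, bytes_per_sec, chunk_size = 16 * 1024)
  start = Time.now
  sent = 0
  while (chunk = src.read(chunk_size))
    dst.write(chunk)
    sent += chunk.bytesize
    # How far ahead of schedule are we? Sleep off the difference.
    ahead = sent.to_f / bytes_per_sec - (Time.now - start)
    sleep(ahead) if ahead > 0
  end
  sent
end
```

An nginx or iptables rate limit would be more accurate under load, but this
kind of loop is a one-liner to expose as a `--limit` style flag.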

------
EricR23
Why not just change the storage strategy to saving files locally while in your
test environment? Fog lets you do this easily with its configuration options.
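With Fog that switch is a small conditional in your storage setup. A sketch,
using Fog's Local and AWS providers (check the option names against your Fog
version):

```ruby
require 'fog'

# Use Fog's Local provider in test/development and real S3 elsewhere.
storage =
  if %w[test development].include?(ENV['RACK_ENV'])
    Fog::Storage.new(provider: 'Local', local_root: '/tmp/fog-storage')
  else
    Fog::Storage.new(provider: 'AWS',
                     aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
                     aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'])
  end
```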

~~~
dennyabraham
I can only imagine this is specific to scenarios where you have to manipulate
S3 objects directly and the Fog::Storage abstraction used for S3 isn't
adequate (though I can't name such a scenario specifically).

------
hrabago
I did this on a smaller scale within our SOA environment. We're told our DEV
must connect to everyone else's DEV. The problem is everybody's DEV is
unstable, because by nature, everything deployed there is a work in progress.
If someone's service goes down, it can prevent me from testing and block my
progress.

So early on, I developed a mock web service which could serve mock data based
on the service I was calling. As a result, I always knew what data was
available, had coherent data (foreign keys across systems were always valid),
and whenever I needed to, I could bring a system down and test my own system's
resilience and error messages. It was great. And then we reengineered all the
systems and everything changed.

------
RandallBrown
This has little to do with the contents of the article, but I found it
interesting.

"For development, each engineer runs _her_ own instance of Fake S3 where _she_
can put gigabytes of images and video to develop and test against, and _her_
setup will work offline because it is all local."

Is spool a team of all women engineers? (I'm just curious as to whether or not
that's true because it's so rare. I don't want to turn this into a weird
opposite day version of the sexism in computer science debate.)

~~~
anadiazhernandz
Female pronouns are used in many contexts to counteract the too-common use of
male pronouns. I assume jubos was doing this because he knows the tech-world
is flooded with male pronouns.

~~~
dlgeek
I have no problem with this in general... but for the love of god, please
don't do what a coworker of mine did recently. In a threat model document, he
had A sending HIS messages to B, using HER public key. I automatically
translated the names to Alice and Bob (like anyone else would in a threat
model), hit the pronouns, and my brain segfaulted.

------
bdonlan
No license file? It's difficult to use software like this in many
organizations if the licensing situation isn't clear...

~~~
jubos
Good point. I will put an MIT License file in there shortly.

------
deepakprakash
Talk about the timing!

We currently have a setup that needs S3 access to reliably develop/test the
app I'm currently working on and I had just sat down planning to remove this
dependency, since I will be on the road the next week or so.

This will save me a bunch of time immediately and probably some money later
on. Thanks!

------
j45
While time will tell how hard it is to keep a fake in sync with the real S3's
spec and behavior, I think this is a neat idea.

One thought that comes to mind is whether I could get away with building
entire apps using this and switching over to real S3 where/if it's needed.

------
japherwocky
there was a python implementation of something like this in tornado (s3server
and s3client), though now I don't see it. Anyone follow that project and know
what happened to it?

~~~
dtwwtd
I think what you're talking about is one of Tornado's demos - not sure how
current it is though.

[https://github.com/facebook/tornado/tree/master/demos/s3serv...](https://github.com/facebook/tornado/tree/master/demos/s3server)

~~~
zackattack
if anyone wants to port the Ruby script to python, put it on kickstarter and
ill start u off with $50. i want a link to my webpage, CompassionPit, on the
page for the final tool though. and one on the kickstarter page if ur feeling
generous =)

p.s. anyone think we need a developer-tools kickstarter? paul, lmk before i
give it to my friends at tech stars

~~~
mryan
A Kickstarter for dev tools would be a great idea. As well as new projects, it
could also be beneficial for large changes to existing projects. Some
combination of Kickstarter and bounties for GitHub issues would be
interesting.

My only concern would be the potential drama caused by members of the
community who find the idea of paying for open-source development abhorrent.

~~~
zackattack
Concern? That's free marketing. zackster@gmail

------
kellysutton
Simple, elegant, awesome.

Have you tested it against paperclip?

~~~
jubos
Thanks! I haven't, but let me know if it works.

------
mikebabineau
A similar tool is available for SDB:

<https://github.com/stephenh/fakesdb>

------
deutronium
Could you use Eucalyptus for this?

~~~
LauriL
You could, but setting up Eucalyptus is a lot of work compared to Fake S3,
which is just a Ruby tool.

------
sparknlaunch12
Great concept. Bandwidth cost savings are a big plus.

------
hashfold
Great concept. Will use it this weekend.

