

Ask HN: How to build a resilient, scalable REST api that is open to the web? - volokoumphetico

I'm thinking about the recent URL-based API created by someone on here called mebe.co

How would you go about building something like this? Where would you host it?

I was thinking AppFog for Node.js to handle incoming HTTP requests, RabbitMQ to queue asynchronous requests, and Redis to back the data.
======
kellros
You should probably consider the process design more than the technology (for
now).

Goals:

1\. Whenever you receive a request for processing, return a task identifier
that can be used to check for progress (ex. you could simply store this in
redis as 'TaskId', 'Url', 'FileId', 'Type', 'Status', 'OutputId', with Type
being the type of input: URL or FileId).

2\. Once the client has received the TaskId, it can make a request to get the
status, which could be 'Pending' or 'Complete' with a URL to the output.
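These two goals could be sketched roughly as follows. This is a minimal
illustration, not a definitive implementation: a plain dict stands in for
Redis (in production this would be a Redis hash per task), and the field names
simply mirror the scheme above.

```python
import uuid

# In-memory stand-in for Redis; in production this would be a Redis hash
# keyed by the task id (e.g. HSET task:<id> field value ...).
store = {}

def create_task(input_ref, input_type):
    """Create a task entry and return its identifier.

    input_type is 'Url' or 'FileId', matching the scheme above.
    """
    task_id = uuid.uuid4().hex
    store[task_id] = {
        'TaskId': task_id,
        'Url': input_ref if input_type == 'Url' else None,
        'FileId': input_ref if input_type == 'FileId' else None,
        'Type': input_type,
        'Status': 'Pending',
        'OutputId': None,
    }
    return task_id

def get_status(task_id):
    """What the client polls: 'Pending', or 'Complete' plus an output URL."""
    task = store[task_id]
    if task['Status'] == 'Complete':
        return {'Status': 'Complete', 'URL': '/output/%s' % task['OutputId']}
    return {'Status': 'Pending'}
```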

Process:

===================

Request Processing:

\-------------------

1\. Receive the request with its parameters (can users upload files, or do
they simply pass in a URL?)

2\. Create a token based on the hash of the input file and the parameters
supplied (ex. the MD5 hash of the video plus the start and end times of the
segment to capture).

3\. Check whether a task identifier has already been assigned to the token;
if it has, return it. The client will then make a request to check the
status, receive { 'Status': 'Complete', 'URL': '...' } and can immediately
access the output via the URL.

4\. If the video is uploaded, check whether it has already been saved by
hashing the file (ex. MD5). If it has, use the existing FileId; otherwise,
save the file and generate a new FileId.

5\. Generate a new task identifier and create a task entry to store in redis
that contains metadata about the task.

6\. Create a command containing the task identifier and publish it to the
message queue for processing (which is asynchronous).

7\. Return the task identifier to the client.
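Steps 1 through 7 above can be sketched in a few lines. Again this is only an
illustration of the flow: the dicts and the list stand in for Redis and the
message queue, and all names here (`make_token`, `submit`, etc.) are made up
for the example, not part of any real API.

```python
import hashlib
import json
import uuid

tasks = {}   # TaskId -> metadata (a Redis hash in production)
tokens = {}  # token  -> TaskId   (makes repeated requests idempotent)
queue = []   # stand-in for the message queue (RabbitMQ/IronMQ)

def make_token(file_bytes, params):
    """Step 2: hash the input file plus the parameters into a dedup token."""
    h = hashlib.md5(file_bytes)
    h.update(json.dumps(params, sort_keys=True).encode())
    return h.hexdigest()

def submit(file_bytes, params):
    """Steps 1-7: return a task identifier, reusing earlier work if possible."""
    token = make_token(file_bytes, params)
    if token in tokens:                             # step 3: seen before
        return tokens[token]
    file_id = hashlib.md5(file_bytes).hexdigest()   # step 4: dedup storage
    task_id = uuid.uuid4().hex                      # step 5: new task entry
    tasks[task_id] = {'FileId': file_id, 'Params': params, 'Status': 'Pending'}
    tokens[token] = task_id
    queue.append(task_id)                           # step 6: publish command
    return task_id                                  # step 7
```

Note that the token hashes the file *and* the parameters, so the same video
requested with a different start/end time is still a new task, while an exact
repeat request returns the existing TaskId without touching the queue.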

Task Processing:

\-------------------

To process the video, you would simply grab the metadata from redis and update
the progress as you go along. If the task identifier doesn't exist (no
metadata entry on redis), simply abort the task processing.
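The worker side could be as small as this sketch. The `run_ffmpeg` callable is
a hypothetical stand-in for the actual conversion step, and the `tasks` dict
again stands in for Redis.

```python
def process_task(task_id, tasks, run_ffmpeg):
    """Grab the task metadata, update progress, abort if the entry is missing."""
    task = tasks.get(task_id)
    if task is None:                 # no metadata entry: abort the task
        return False
    task['Status'] = 'Processing'
    output_id = run_ffmpeg(task)     # the actual conversion, injected here
    task['Status'] = 'Complete'
    task['OutputId'] = output_id
    return True
```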

Although this is pretty simple, you should focus on optimizing file storage
(e.g. don't store the same file twice; reference files by MD5 hash or the
like, and perhaps have a cleanup routine to remove rarely-used or one-off
files) and processing time (if the task has already been processed, simply
return the output). You should also decide on limitations to prevent abuse
(DoS/DDoS attacks, large file uploads, what the service can be used for,
etc.) and limit your liability.
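One common way to enforce such limits is a per-client rate limiter. A minimal
fixed-window sketch, assuming counts live in memory (in production this is
typically a Redis INCR with an EXPIRE per window):

```python
import time

class RateLimiter:
    """Fixed-window rate limiter; one counter per (client, window) pair."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (client, window_index) -> request count

    def allow(self, client, now=None):
        """Return True if the client is still under the limit this window."""
        now = time.time() if now is None else now
        key = (client, int(now // self.window))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit
```

Fixed windows are the simplest choice; a token bucket or sliding window gives
smoother behaviour at window boundaries if bursts turn out to be a problem.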

Is this the answer you were looking for or are you asking about the specifics
of implementation (technology wise)?

~~~
volokoumphetico
It's 2 AM here, but I'm mustering up the strength to write this because the
answer provided here is so detailed and so specific to my question. Thank you
so much for taking the time to write all that. I am still reading it over and
over to fully comprehend it all; there's a lot to take in.

I see now the thinking flow I have to change; I have been window shopping, so
to speak, for frameworks and stacks.

iron.io is what I think will be used for handling long asynchronous tasks.

I wouldn't mind specific technology stacks. It's either Node.js or Python. I
built something in Flask (god bless its heart) but I don't know if there are
others who have made it scalable...

I keep hearing about this idea from things like this:
<http://www.slideshare.net/norbu09/rabbitmq-couchdb-awesome>

Thanks again, but I need to hit the sack.

------
volokoumphetico
As soon as a user types that into the browser, a few things will happen: at
that exact moment an HTTP request is received by ________.

It will then call iron.io's IronMQ or something like that, put a request in
the queue, fire off some ffmpeg scripts in an IronWorker asynchronously,
receive a message that it succeeded, and send the rendered animated GIF back
to the browser.

During this time, the user will just have to wait until the task completes.
I'm not sure I would want them to time out, because what if (edge case)
someone decides to convert one of those "Nyan cat loop for 10 hours" videos
and the server is stuck in a long task?
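One way around holding the browser's connection open is the task-id scheme
described earlier in the thread: return the id immediately and have the client
poll the status endpoint. A hypothetical client-side sketch, where
`get_status` is whatever function calls the status endpoint:

```python
import time

def poll_until_done(get_status, task_id, interval=2.0, max_tries=30):
    """Poll the status endpoint until the task completes or we give up."""
    for _ in range(max_tries):
        status = get_status(task_id)
        if status['Status'] == 'Complete':
            return status['URL']
        time.sleep(interval)
    return None  # still pending; the caller can retry later
```

This way a ten-hour Nyan cat video just means more polls, not a connection
that has to stay open (or time out) for the whole conversion.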

Then comes the question of whether the same long process will run again if
some other user wants to test out this exploit. That's where I think
persistent storage of the processed file would come in handy: the request URL
should first be checked against our previous URL requests.

Redis could be used in place of RabbitMQ or IronMQ, but I'm thinking of spike
cases; not in the sense that the app will become hugely popular overnight
(although making the front page of Hacker News would be awesome), but of
certain users: passionate users making lots of requests, and malicious
individuals abusing it.

I really don't want to lock the API behind an HTTP auth dialog or a web-based
login.

I envision it working the way you'd use Google, except you can convert
YouTube videos to animated GIFs just by entering the URL.

Of course, on the backend I'd need extensive monitoring, and realtime
analytics would be cool.

