Simple API with Nginx and PostgreSQL (rny.io)
197 points by ricny046 on July 26, 2013 | hide | past | web | favorite | 73 comments



This article gives a good example of why you SHOULD consider using an existing framework to create a REST API.

I see no concern about authentication, authorisation, scalability, protection from SQL injection attacks, nor making the output easily parseable by third-party applications.

None of these are issues you can simply say, "I'll deal with that later when it becomes a problem." They are reasons why an existing framework is helpful.

Any example of a web application using a database should NOT be using string concatenation for adding field values to an SQL statement. Ever. Really.


Frameworks don't magically solve these concerns. I can create my API with Django, and there's no guarantee that I will restrict my endpoints to authenticated parties or I guard article modifications to their authors, for example.

Django doesn't magically scale either; you still need to a) learn how its ORM works and where it can be greedy, or b) just use a "saner" ORM (e.g. SQLAlchemy), or write your own without the thousand features you don't need.

SQL... set variables with `postgres_escape` and not `set`, always.

Output... he returns appropriate responses, but among all the "RESTful" frameworks I've seen, they ALL have different opinions on what should be returned, which HTTP codes to use, etc. I'm not sure a framework helps, other than to inform you or lock you into its rigid paradigms.


Again with the middlebrow dismissal.

Please make a little effort and expand on why this article is interesting instead.

For instance, people use micro frameworks for a reason: they want less clutter and a more direct grasp of what the machine is doing. The proposed solution removes one big complexity element from the setup, which is very interesting.


"protection from SQL injection attacks"

In the module he's using, there is a postgres_escape function, so there's no reason he couldn't have used it.
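Per the module's README, postgres_escape assigns an escaped copy of a variable that can then be interpolated into the query. A hypothetical location using it might look like this (location, table, and variable names are illustrative, not from the article):

```nginx
location = /articles/search {
    postgres_pass     database;
    rds_json          on;

    # postgres_escape quotes the value before it reaches the query;
    # note that empty strings are escaped to NULL by default.
    postgres_escape   $escaped_title $arg_title;

    postgres_query    GET "SELECT * FROM articles WHERE title = $escaped_title";
    postgres_rewrite  GET no_rows 404;
}
```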

http://labs.frickle.com/nginx_ngx_postgres/README


Unfortunately when defaults are inherently insecure, they lead to people building insecure systems. If every single parameterized SQL command needs to include (possibly multiple) escapes then it'll be missed in some places.

It's unnecessary anyway. It would be far better if the parameters were bound as named parameters. Ex:

    location ~ /articles/(?<id>\d+) {
      postgres_pass     database;
      rds_json          on;

      postgres_query    HEAD GET  "SELECT * FROM articles WHERE id = $id";
      postgres_rewrite  HEAD GET  no_rows 410;
    }
See how $id has no quotes around it? The DB driver should parse the parameter and bind it as a string in that position. If you need to use it as a different data type (ex: integer) then you can do an explicit type conversion. That's how you prevent SQL injection.
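That binding behavior is easy to demonstrate in any driver that supports placeholders. Here is a sketch using Python's stdlib sqlite3 as a stand-in (libpq's PQexecParams binds parameters the same way, sending values separately from the query text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO articles (title) VALUES (?)", [("hello",), ("world",)])

user_input = "1 OR 1=1"  # a classic injection payload

# Unsafe: string concatenation lets the payload rewrite the query,
# so the WHERE clause matches every row.
unsafe_rows = conn.execute(
    "SELECT * FROM articles WHERE id = " + user_input
).fetchall()

# Safe: the value is bound as data and never parsed as SQL, so no row
# matches the literal string "1 OR 1=1".
safe_rows = conn.execute(
    "SELECT * FROM articles WHERE id = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

The difference from postgres_escape is that escaping splices a quoted literal into the query string, while true bind variables keep the value out of the query text entirely.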


"The DB driver should parse the parameter and bind it as a string in that position. If you need to use it as a different data type (ex: integer) then you can do an explicit type conversion."

Not sure exactly what you mean here. The protocol and libpq support sending literal values entirely outside of the query itself. There is no reason for the DB driver to do any parsing.


Miscommunication on my part. By "driver" I meant the nginx module that's calling out to libpq (which would more accurately be called the DB driver).

I meant that libpq supports bind variables and the nginx module should be using them rather than performing a string substitution.


Well, it would appear to me this is one of the few posts on OpenResty. I recently discovered it myself as one of the faster options on the TechEmpower benchmarks. I know that this will lead to a benchmark flamewar, but that is where I first heard of it and started researching it.

http://www.techempower.com/benchmarks/#section=data-r6&hw=i7...

However, you might also notice that this project uses Lua (specifically LuaJIT) embedded in the server, which I would think partially avoids these problems. There is even a new MVC project budding out of this for Lua/MoonScript called Lapis, if that assuages your concerns. It seems pretty micro, so I am not sure it is meant to handle your security concerns. It was used for a demo site for a web game shop that made it here a few months ago. Check it out; it is pretty cool.

https://github.com/leafo/lapis

As pointed out before, few frameworks I have researched care about security. Ironically, web2py is the only one I remember dedicating page space to an OWASP evaluation of its code base, from a long time back: http://www.web2py.com/book/default/chapter/01#Security. Others rarely mention it and are total crap security-wise. As others pointed out, frameworks are little guarantee of security unless you, as with the language and dev experience itself, know what the hell you are doing.

Final point: a much cooler complement might be the "Web Application Firewall" projects built into the web server itself to block typical injection attacks. NAXSI, specifically for Nginx, is a cool project that I will look into soon. Maybe this will interest you too.

https://code.google.com/p/naxsi/


Edit: he should have escaped the $title and $body.


It was a bad move not to include protection against SQL injection to begin with. I was trying to keep it as simple as possible and just show a simplified example, but I think that was a bad call, so I have updated the example to use postgres_escape.

And yes, this approach should only be used for very simple APIs. If you're building something bigger, use a framework.


If you use lapis[1] things will be much easier.

[1]: http://leafo.net/lapis/


Doesn't couchdb accomplish this? (out of box REST API with little config)


CouchDB supports "views" and can use comet/websockets (have they implemented websockets yet?) to communicate with a JS app served from the DB itself.

Riak does too - any DB that implements a RESTful interface to the database instead of a binary protocol.

CouchDB is particularly well-suited for this use-case though (I would never use Riak in that fashion).


Exactly. Couchdb is preferable to this for so many reasons. The author is effectively using the Nginx configuration language for server-side scripting. The result ends up looking a lot like a bad PHP crud app. With couchdb, the REST API is already built in.


Except if you already have a postgres-based system and you want to build a simple REST api around some small piece of it.


Interesting. I'd like to see how you implement role-based access control and SQL injection safeguards. How would you define complex validation schemes? In stored procedures?


Simple, you'd use a framework.


I thought as much. If the goal is to not write code then writing stored procedures is akin to writing code, just in a different language.


I made it to the bottom and thought the article was cut off. Turns out, it really is that simple. (Sans the sql injection and auth stuff).

Fantastic.


I thought the same thing... the page could use some whitespace at the end, or a footer.


I really like the simplicity of this approach, but are the Postgres calls asynchronous? If not, it won't scale...


I think the answer is yes, it performs the calls asynchronously. I've been searching through the source code for the postgres module (https://github.com/FRiCKLE/ngx_postgres), and it appears to use the libpq asynchronous API (http://www.postgresql.org/docs/9.2/static/libpq-async.html).

Of course, one could easily use the async API in a non-async manner, so I'm not 100% certain.


The MySQL driver is async, the Postgres one should be as well.


I want to chime in that I have been using a similar setup, nginx, ngx_postgres, PostgreSQL 9.2.4, in production for a while. It has been very snappy and rock solid up to this point.

I used Android async-http to talk to the server from my app to keep everything online. This is an intranet app, so latency is not really an issue.


If you created stored procedures in postgresql that accepted a json/hstore argument, you'd have a pretty flexible/fast system here.

i.e. PUT to "/api/blah" with {"id": 1, "name": "Joe"} could call the update_blah stored proc passing in that json.
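A sketch of what such a stored procedure might look like (the update_blah function, its columns, and the nginx line are hypothetical; the json ->> operators require PostgreSQL 9.3+):

```sql
-- Accept the PUT body as a json argument and apply it.
CREATE FUNCTION update_blah(payload json) RETURNS void AS $$
  UPDATE blah
     SET name = payload->>'name'
   WHERE id = (payload->>'id')::int;
$$ LANGUAGE sql;

-- From the nginx location, something like:
--   postgres_query PUT "SELECT update_blah($escaped_body::json)";
```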


Pretty neat, what about SQL injections?


This example is not secured against SQL injections but you can do it easily. Look at the documentation here: https://github.com/FRiCKLE/ngx_postgres/


To clarify, it looks like "postgres_escape" is the way to do escaping.

Unfortunately, it seems a bit awkward to differentiate between empty and NULL strings. That's something to be careful of.

Also, I really think this should be included in the blog post, even if it's simple. Protecting against SQL injection is not optional, so leaving it out only muddies the comparison with more traditional frameworks. Also, there's always going to be someone that copies and pastes it without thinking much about it.

That being said, this is very cool and I hope to see more development and exploration in this area.


From the docs re postgres_escape: "Because nginx cannot tell the difference between empty and non-existing strings, all empty strings are by default escaped to NULL value."

This behavior actually is what anyone who has used Oracle is accustomed to (empty string is NULL). I don't recall the default behavior in Postgres but Postgres is "Oracle-ish" in a lot of ways so I would be surprised if this is not the default there as well.


The "empty string is NULL" behavior you describe is a standards violation. Postgres does not follow Oracle off of that particular cliff; postgres does differentiate between the two.


... And Oracle is going to change that any day now for varchar, which is why they've been hounding us to use "varchar2" from time immemorial. (They wouldn't have any other reason to encourage varchar2, would they?) :)


Good point. I will update the examples in the blog post as soon as possible.


Yeah but by the time we've taken care of injection issues (and handled all the other gotchas that haven't been articulated yet), we've written about as much boilerplate as we would for just about any other scripting language / framework, wouldn't we?

Don't get me wrong -- it's still a pretty neat idea. It's just there's that "other 90%" (i.e. the necessary work to create anything seriously production grade) that always comes to mind when evaluating a new interface.


I would implore blog writers to not do this. I realize it makes the code easier to read, but even if you loudly disclaim "THIS CODE IS INSECURE DO NOT USE IT AS-IS" there are plenty of copy/paste coders in the world who will do exactly that, perhaps even intending to come back later and add in the protections, but never actually get around to it.


looks like that door is wide open


Wouldn't you be throwing away your ability to add a caching layer? I guess there are ngx modules for Memcached/Redis, but coding even a simple API in nginx.conf can get difficult.

I still like the idea of nginx being able to communicate with different datastores. For example, say you want to serve a file that requires authentication: you can pass a signed request via the query string and cross-check it against a session in Redis/Postgres or wherever.


If you want a really good REST API for Postgres you're better off using HTSQL. http://htsql.org


Every time I see something like this I think it is an opportunity to finally do something with http://www.eclipse.org/Xtext/. How cool would it be to come up with a simple (external!) DSL to mount SQL queries over HTTP routes, and transform the whole thing to a nginx configuration file?


Is this a "look how cool/flexible nginx is!" article or a real proposal for how to do APIs?


Hopefully the former. It would be difficult to build anything non-trivial with it. However, I suppose you could build a basic counter or key-value system over HTTP with it.


It is by definition difficult to build anything non-trivial regardless.

This looks like a very nice way to build small simple REST APIs especially if you're already using nginx and postgres anyway.


Can this be done with redis, or memcache?



There is http://wiki.nginx.org/HttpMemcachedModule, but it's not really the same thing.... Cool though.


Very cool. Add in some Lua scripting and you can get some pretty solid single-purpose endpoints: http://wiki.nginx.org/HttpLuaModule

Maybe not perfect for a whole app, but it could be, depending on your goals.
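As a sketch of what a single-purpose Lua endpoint can look like (the location is hypothetical, and it assumes a `lua_shared_dict counters 1m;` declared in the http block):

```nginx
location = /counter {
    content_by_lua '
        local counters = ngx.shared.counters
        local n, err = counters:incr("hits", 1)
        if not n then
            -- first hit: the key does not exist yet
            counters:set("hits", 1)
            n = 1
        end
        ngx.say(n)
    ';
}
```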


Can you recommend a public repo which demonstrates use of the HttpLuaModule? [Edited to add:] Another comment mentions the Lapis framework which gives [me] a good starting point.

@subs: thanks!


Hope this will help you http://blog.cloudflare.com/pushing-nginx-to-its-limit-with-l...

This helped me setting up nginx/lua/redis setup on Ubuntu https://github.com/ehazlett/nginx-rt-log/blob/master/readme....


You can also check out a simple example[1] used for the TechEmpower Benchmarks[2].

[1]: https://github.com/TechEmpower/FrameworkBenchmarks/tree/mast...

[2]: http://www.techempower.com/benchmarks/


There is a presentation here that goes through the basics. http://www.londonlua.org/scripting_nginx_with_lua/index.html


An oldie but goodie on this sort of stuff on nginx ... http://agentzh.org/misc/slides/nginx-conf-scripting/nginx-co...


I wonder what the difference in performance would be compared to NodeJS and Express.



Why use a SQL database when all you really want is a key-value store?


Because PostgreSQL is very good at being a key-value store?


Who says he wants a key value store?

This setup is powerful in its simplicity. Only two layers. And stored procedures can help you do validation, so one could imagine a full Twitter clone implemented this way (less the UI, obviously).


The API he implements suggests a key-value scheme.


This is amazing; I always wondered if I could move things from the app server to the web server level.

But are there any performance benchmarks? I suppose nginx would have better performance.


Very curious - why would you want to do that?


On an unrelated note, timestamptz is advisable over timestamp.


Timestamptz is one amazing reason to use Postgres over MySQL frankly. Proper time + timezone storage is quite useful.


Postgresql doesn't offer proper timezone storage, or in fact any timezone storage. Timestamptz is just a UTC timestamp that is converted to/from your connection-specified timezone. You completely lose the "what timezone was this originally specified in" data.



The point is that the timestamp type is meant more for calendaring, unless you're manually normalizing the dates you're passing into postgres to a certain timezone like UTC. Timestamptz is what you actually want 99% of the time.


Yes, timestamptz is the one you want 100% of the time. But there are a ton of people who don't realize it doesn't work the way they think it does. Lots of people think it stores the timezone, and it is really important to understand that is not the case when designing your app. Postgresql is actually completely lacking in the ability to store and validate timezones, as well as converting values to other timezones in useful ways. It is quite possibly the biggest weak spot postgresql has.
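The behavior is easy to check in psql (the expected outputs assume the session timezone is set as shown):

```sql
-- timestamptz stores an absolute instant (internally UTC); the input
-- offset is used for conversion and then discarded.
SET timezone = 'UTC';
SELECT '2013-07-26 12:00:00+02'::timestamptz;
-- 2013-07-26 10:00:00+00   (the "+02" is gone; only the instant remains)

-- To keep the original zone you must store it yourself (e.g. in a
-- separate text column) and convert on read:
SELECT '2013-07-26 10:00:00+00'::timestamptz AT TIME ZONE 'Europe/Paris';
-- 2013-07-26 12:00:00      (a plain timestamp in the requested zone)
```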


Postgres already offers HTTP interface:

http://wiki.postgresql.org/wiki/HTTP_API


I'm looking for information on how to return a homepage or template files from such a pure REST interface. If you have any ideas, please share.


Great for a quick little hack, but I wouldn't want to maintain that conf file once the project grows.


Well, this is for a very simple REST API and the post acknowledges that much. I mean if this does become a "project", I'm sure it's time to move it to a proper framework, but as you say for a quick hack, it's a pretty nice way to do it (with proper sanitization, of course).

I'd also venture that this is an order of magnitude faster and more responsive than a framework so there are definitely some plusses and minuses to consider.


Yep, agreed. If you're comfortable with nginx and just want to test something quickly, this is many times better than downloading a framework, setting it up, routing, etc.


How does one do user authentication and resource authorization?


Auth Basic can be done directly against postgres, plus a few more modules (headers-more and eval were how I did it, IIRC). Could probably do it with just OpenResty too.

After authentication, you have the credentials needed to check authorization; it'd probably just involve a few changes to the SQL in the article.


Presumably with an 'api_key' table and an 'Authorization' header. Alternatively, Nginx can do cookies and have if-then logic to check them.
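One rough sketch of the api_key idea, folding the key lookup into the query itself (the table, variable names, and location are hypothetical, and no_rows can't distinguish "bad key" from "no articles", so treat this as illustration only):

```nginx
location = /articles {
    postgres_pass     database;
    rds_json          on;

    # $http_authorization exposes the raw Authorization header.
    postgres_escape   $escaped_key $http_authorization;

    postgres_query    GET "SELECT a.* FROM articles a
                           WHERE EXISTS (SELECT 1 FROM api_keys k
                                         WHERE k.key = $escaped_key)";
    postgres_rewrite  GET no_rows 403;
}
```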


You can easily do AuthBasic in nginx with a user file



