This is one of those remarks that is obvious to someone who knows what it means but mysterious to someone who doesn't. So what does it really mean?
- Don't use a technology that, no matter how good, is just too niche for widespread adoption?
- Don't use a technology that we'll have trouble finding programmers to support?
- Look for stuff that will take better advantage of impending improvements in hardware and communications?
- Avoid frameworks for which support may wane?
- Avoid technology that we may have to support ourselves?
- Avoid anything proprietary whose owner may disappear?
- Make sure that whatever you choose, it will work well on mobile technologies, even as native apps?
- Choose technologies abstract enough to minimize our costs but not too abstract to be inefficient?
- Any combination of the above?
- What considerations have I missed?
A couple of years ago I considered both MongoDB and CouchDB immature for exactly that reason. The recent CouchDB/Couchbase confusion shows that was a reasonable view.
A lot of software development teams are releasing their own RPM and .deb Linux binary packages for just that reason: to encourage people to use up-to-date packages instead of the stale OS distro packages.
In a way, it's rather like security updates. Who would refuse to install security updates because they're not part of the Ubuntu 10.04 LTS release? Almost nobody even thinks of doing that. So why would you use old, obsolete releases of mission-critical software?
"If it ain't broke, don't fix it"
Because it's mission critical, and you can't afford for it to break. Once you hit a certain complexity, upgrades almost always break something:
- APIs change.
- Undefined behavior changes.
- New bugs are introduced.
- A feature critical to your app starts performing worse.
- The above changes break something else you depend on (libraries, proxies, etc.)
Upgrading to a significantly changed version of a mission-critical app/library/language is a lot of work, and is sometimes impossible: many projects couldn't be reasonably ported to Python 3 if they wanted to; a lot of important libraries don't work on Python 3.
This is exactly why bug and security fixes are often backported into old versions. Python 2.5 is still receiving security patches. Apache 1.3 was maintained for years after 2.0 was considered stable.
And now for the only valid reason, from my POV: we sometimes have to denormalize and forgo the power of relational data because we do not yet know how to store (read and write) huge amounts of relational data conveniently. Notice the "yet"? Well, I think it is a technical problem, and its solution may not be so far off.
One example of a solution showing up recently: "Google Plus with Your World". It strikes me that any query I make to the nearest Google server responds /instantly/ to any random word with a join over a very possibly monstrous matrix of all the likes of all my circled users.
I don't know how they store this, or where and how they denormalize, but in any case it seems to me to be just "relational data as usual".
I may be wrong, however, and would love more insights on this.
Google talked a little about how personalized search works in a paper about Bigtable; it's worth a review:
> Personalized Search stores each user's data in Bigtable. Each user has a unique userid and is assigned a row named by that userid. All user actions are stored in a table. A separate column family is reserved for each type of action (for example, there is a column family that stores all web queries). Each data element uses as its Bigtable timestamp the time at which the corresponding user action occurred. Personalized Search generates user profiles using a MapReduce over Bigtable. These user profiles are used to personalize live search results.
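To make the quoted layout concrete, here is an illustrative-only sketch: row key = userid, one column family per action type, and the time of the action as the cell timestamp. All names and values below are made up for illustration, not Google's actual schema.

```javascript
// Sketch of the Bigtable layout the quoted paper describes.
const personalizedSearch = {
  "user:1234": {                      // row named by the userid
    query: {                          // column family for web queries
      "cheap flights": [1326240000]   // cell values keyed by timestamp
    },
    click: {                          // column family for result clicks
      "example.com/deals": [1326240042]
    }
  }
};

// A profile would be built by a MapReduce walking each row's families;
// here, just count the recorded actions for one user.
const row = personalizedSearch["user:1234"];
const actionCount = Object.keys(row)
  .reduce((n, family) => n + Object.keys(row[family]).length, 0);
console.log(actionCount);
```

The appeal of the design is that everything about a user lives in one row, so both live lookups and batch profile-building scan a single contiguous key range.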
Regardless, even in your scenario with the perfect RDBMS, the future web stack wouldn't change much. You still have the same issues with blocking and different languages for client and server. As a developer myself, it doesn't matter at all to me if my call to a method is backed by a relational, document or key/value database. It's all an abstraction somewhere. It just needs to come back quickly and be easy to scale up.
The big change we're seeing is the client becoming primarily JS driven and the server more or less relegated to sending/receiving JSON. It's a much richer experience, but a pain when the toolsets on either end are completely different.
In other words, it's nice to see Joel greenlight something like this. I'd say it's kind of a sign of the times in terms of the industry's overall comfort level with what would be termed 'hip' technology, or 'new', or whatever moniker you want to attach to it.
"The Socket.io server currently has some problems with scaling up to more than 10K simultaneous client connections when using multiple processes and the Redis store, and the client has some issues that can cause it to open multiple connections to the same server, or not know that its connection has been severed."
I wonder if they ran into Redis's hard-coded 10k connection upper limit. As it turns out, its configuration for "unlimited" connections actually caps at 10k. I believe this is going away in master but, in case anyone else runs into this, on Redis <= 2.4 you have to patch the daemon manually if you need more than 10k connections.
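For reference, the relevant knob is the `maxclients` directive. A hypothetical redis.conf fragment (assuming a patched daemon, or a release newer than 2.4, that actually honors values above 10k):

```conf
# Illustrative redis.conf fragment. On Redis <= 2.4, "maxclients 0"
# nominally means unlimited but is internally capped around 10k,
# so raising this alone is not enough without patching the daemon.
maxclients 20000
```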
Also note stuff like https://news.ycombinator.com/item?id=3419693: node.js is not free of issues itself. (To be fair, lots of people got that wrong and it was patched quickly.)
It seems like at the end of the day you still need to validate and sanitize user input before doing anything with it.
You might, for instance, find that simply sanitizing user input isn't enough when you're using the same interpreted language at multiple stages. If an attacker could cause the right front-end code to be executed on the app server, or backend code to be executed in the database, they could potentially compromise a lot.
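As a hedged illustration of the multi-stage point (the login shape and names here are hypothetical, not from the article): with JSON flowing end to end, an unvalidated request body can carry a MongoDB query operator straight into a selector.

```javascript
// The attacker sends an object where a string was expected, which
// would turn a Mongo-style selector into "password > ''" (matches all).
const body = JSON.parse('{"user": "admin", "password": {"$gt": ""}}');

// Naive: drops the attacker's object straight into the query selector.
const naiveSelector = { user: body.user, password: body.password };

// Safer: validate the expected type before it reaches any interpreter.
function requireString(value) {
  if (typeof value !== "string") throw new TypeError("expected a string");
  return value;
}

let rejected = false;
try {
  requireString(body.password);
} catch (e) {
  rejected = true; // the operator object never reaches the database
}
console.log(JSON.stringify(naiveSelector.password), rejected); // {"$gt":""} true
```

The type check matters precisely because the payload is valid JSON at every stage; only its shape is wrong.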
Our stack is Redis, MongoDB, Nginx, SCSS, HAML, Coffee, Rails and NodeJS. I'm extremely happy with these choices.
Recently a friend and I did a small weekend project: www.bubblefap.com (NSFW). The design and code are a homage to ugliness. We only used PHP-ActiveRecord, and that's it. And I had so much fun!
I just hacked away! I was cowboy coding, hacking away, and I didn't need to think about frameworks and architecture and integration with fog and hacking Rack to support flash uploads. Oh, good times :-)
In all seriousness, though, it looks like they are using Redis for exactly the right reasons, and the larger architecture is pretty much the definition of a sane forward-looking design.
I've always been hesitant to get too far away from my LAPP(ython) stack, but I'll almost certainly be hacking something together with these components to see how I like writing everything in CoffeeScript.
Would love to hear some more about the "bleeding" experiences you've had.
MongoDB (on FreeBSD): https://jira.mongodb.org/browse/SERVER-3927
. . . are a few.
Isn't that one of the major drawbacks of using a bleeding-edge tech stack? Albeit one mitigated by choosing code that is easy to comprehend and maintain (like Backbone!)
What do you think you will do? E.g. With backbone: Merge up, status quo -- maintain your branch, change libraries, or something else?
Is this related at all to backbone.iosync.js?* Or, if not, is it something that Fog Creek will be open to sharing in the future?
It's news to me.
If you want to look at a couple of really small examples, I have a couple of apps on GitHub that I used for learning backbone.js/coffeescript. They both use the underscore.js template engine.
The backend is Sinatra; it takes an HN story URL and polls the comments in reverse chronological order.
The backend is Rails; it polls images from Dribbble and lets you save your favorites.
That lets you use existing libraries (Devise, etc.) for the authentication and once that's out of the way, you can go have "fun" with coffeescript and mustache. :)
Does anyone happen to know if this is an open source or in house library?
- Is there an acronym/name (ala LAMP) for the Node/MongoDB/Redis stack?
I am writing a sizable Node app myself. In the end, you just get used to the callback style.
EDIT: search for "De-sugared syntax" on this page for an example.
* Mostly monadic (creating a deferred is "return", the "then" method is "fmap" and "bind").
* Can be implemented as a library, without a separate compilation step or having to patch the runtime.
* Avoids most of the CPS inversion-of-control madness. You can return and store promises, and you can also add callbacks after the fact, so code is much more flexible. (Writing sequential async loops is still annoying, though.)
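A small sketch of those properties using the now-standard Promise API (rather than any specific 2012-era deferred library): `Promise.resolve` plays "return", and `.then` acts as "fmap" or "bind" depending on what its callback returns.

```javascript
// Promise.resolve(x) is "return"; .then(f) is "fmap" when f returns a
// plain value and "bind" when f returns another promise. Because a
// promise is a first-class value, it can be stored and have callbacks
// attached after the fact - the inversion-of-control point above.
const stored = Promise.resolve(1)          // return 1
  .then((x) => x + 1)                      // fmap (+1)
  .then((x) => Promise.resolve(x * 10));   // bind (x => return (x * 10))

// Attach a callback later, after the chain was built and stored.
stored.then((x) => console.log(x)); // logs 20
```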
I liked this article that discusses three different solutions to callbacks with different compromises: http://blog.willconant.com/post/7523275566/continuations-in-...
Jeremy discusses #350 and tamejs here: http://news.ycombinator.com/item?id=2777196
Tamejs has a great writeup about the callback-spaghetti problem here: http://tamejs.org/
- MoNsteR? =)
My experience has been that you can only do this in a few cases. It's not an issue that can be worked around elegantly in libraries; you need language support.
They could have used something like Pusher for about half of their implementation (the WebSockets, message pushing, polling, etc.)
I bring this up because I built an OSS Pusher clone, for those people that want to deploy their own, built on the Play! framework.
The socket.io codebase has been shrunk dramatically, and as a result it's easier to scale and maintain.
Can you guys share what (if any) test suites were used? I was curious about this point.
Mikito Takada (Zendesk) has some really helpful information regarding socket.io + haproxy specific workarounds via http://blog.mixu.net/2011/08/13/nginx-websockets-ssl-and-soc...
Also, I took a look at the article; it does not mention an all-Node approach using something like node-http-proxy for load balancing. Any thoughts on such a setup? I need HTTPS as well.
With load balancing, I would recommend going with Stud/Stunnel and HAProxy. Terminating SSL with a specialized piece of software is nicer (separate SSL overhead to another box), and using a separate load balancer allows for more flexible options, e.g. serving static assets from Nginx. There is nothing wrong with node-http-proxy, it's just that these two projects have been around for longer and are better understood from an ops perspective.
You cannot use round robin load balancing, you need to have at least IP-based stickiness with Socket.io for now. Well, you can round robin if you use the Redis store but you'll run into inefficiencies with https://github.com/LearnBoost/socket.io/issues/686 . I'll do a bit more coverage on other SIO deployment-related issues once I get the chapter on Socket.io finished for my (free) book in a few weeks.
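For reference, a sketch of what IP-based stickiness looks like in an haproxy.cfg (names and addresses below are placeholders, not their actual config): `balance source` hashes the client IP, so a given client keeps landing on the same Socket.io process.

```conf
# Placeholder haproxy.cfg fragment - illustrative only.
listen socketio
    bind *:80
    mode tcp          # pass WebSocket/long-poll traffic through untouched
    balance source    # hash the client IP -> same backend every time
    server sio1 10.0.0.1:8000 check
    server sio2 10.0.0.2:8000 check
```

The trade-off versus round robin is uneven load when many clients share a NAT'd IP, which is why the Redis store (with its own caveats) is the other option mentioned above.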
If I were starting now, I'd just ignore the Flash sockets transport since it makes the whole stack more complex due to not looking like HTTP to load balancers, and start with Engine.io / WebSocket.io to take advantage of their simplicity.
This inspired me to write about how I think they could've gone even further: http://wuher-random.blogspot.com/2012/01/single-page-web-app...
Alternatively, it could be for the same reason people usually say they're going "up" to a big city, regardless of the direction of travel. The server is the hub and all the clients are down from it. This possibly dates back to when towns/fortifications were usually located on hills.
I mention it because they mention other parts of the client stack (Backbone.js for ex), but not said libs.
For IE and others (Android, I'm looking at you!), there exists a polyfill, though I have yet to try it and so can't vouch for it: https://github.com/Yaffle/EventSource
The three alternatives I know about are SockJS:
Faye - which implements the Bayeux pubsub protocol:
And node-browserchannel (mine) which implements google's browserchannel protocol:
(BrowserChannel works all the way down to IE5.5!)