

MongoDB singleton connection in Node.Js - afshinmeh
http://afshinm.name/mongodb-singleton-connection-in-nodejs/

======
joshguthrie
Nice job. Here are some tips for areas you could improve on:

\- Query all your collections at startup and attach them to your singleton so
you can just call MongoDbSingleton[collection].find(). This way you only look
up each collection once and gain a lot of speed on subsequent requests.

\- Since you are querying all your collections anyway, you might as well call
.ensureIndex() at that point to make sure your application won't insert two
users with the same e-mail.

\- Now that you have sane collections, you can rid your application code of
one "useless" instruction: give each collection a method that wraps the call
to .find() in another function that will perform the .toArray() for you in the
background. (This is NOT removing the step, it is merely moving it elsewhere
for code readability).

\- Play a bit more with your new singleton: you know how to craft a BSON
ObjectId from a hex string? Simple, call
"mongodb.BSONPure.ObjectID.createFromHexString()". Okay, too long, so why not
have a method in your singleton to wrap that?

\- All these function (err [, data]) callbacks flying around... Why not have a
single function that is always called to check whether "err" exists and output
the error and trace?

I've been using most (if not all) of these tips in my own MongoDb wrapper,
both in professional apps and side-projects, and it's made my MongoDb use much
simpler. Hit me up if you want some more tips ;)
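In case it helps, here's a rough sketch of those tips in one place (all naming is mine: `init`, `guard`, the stub `db` API and the "users" e-mail index are illustrative, not from the article):

```javascript
// Sketch of the wrapper tips above. `db` is assumed to be an already-open
// driver connection exposing collection(name, cb), with collections that
// have ensureIndex() and find().
var wrapper = { collections: {} };

// Fetch every collection once at startup and attach it to the singleton.
function init(db, names, done) {
  var remaining = names.length;
  names.forEach(function (name) {
    db.collection(name, function (err, col) {
      if (err) return done(err);
      wrapper.collections[name] = {
        raw: col,
        // Hide the .toArray() step behind the wrapper for readability.
        find: function (query, cb) {
          col.find(query).toArray(cb);
        }
      };
      // Example unique index: refuse two users with the same e-mail.
      if (name === 'users') col.ensureIndex({ email: 1 }, { unique: true });
      if (--remaining === 0) done(null, wrapper);
    });
  });
}

// Single place to check for `err` before touching `data`.
// (An objectId(hex) helper would similarly wrap
// mongodb.BSONPure.ObjectID.createFromHexString(hex).)
wrapper.guard = function (cb) {
  return function (err, data) {
    if (err) return console.error(err.stack || err);
    cb(data);
  };
};
```

The `guard` helper is what replaces all the ad-hoc `if (err)` checks scattered through application code.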

~~~
philbo
Is your wrapper open-source and available in NPM? If not, it should be! :)

~~~
joshguthrie
Was thinking about it but never imagined people would actually be interested
in it ^^;

I'll clean it up a bit first. It should appear in
<https://github.com/joshleaves> and <https://npmjs.org/~joshleaves> by the end
of the day :)

------
lukesandberg
This isn't actually a singleton. If the singleton is requested while it is
initially being opened (by another request), then two connections will be
opened. There needs to be some additional state tracking whether or not the
connection is currently being opened, so concurrent requests can wait for it.
Unless the author just doesn't care... but this isn't a singleton.

~~~
chimeracoder
Be careful of double-checked locking errors:
<http://en.wikipedia.org/wiki/Double-checked_locking>

~~~
lukesandberg
seeing as this is node you won't need a lock (and since it is single threaded
there are no cross thread visibility issues).

What you do need is a state variable that indicates if the connection is
uninitialized, loading, or ready. And if it is in the loading state you need
to have a list of callbacks so that concurrent requests can be added to the
list.
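Sketched out, that could look like this (all names mine; `openFn` stands in for whatever actually opens the MongoDb connection):

```javascript
// Connect-once helper with explicit state tracking. While a connection
// attempt is in flight, concurrent callers are queued instead of opening
// a second connection.
var state = 'uninitialized'; // 'uninitialized' -> 'loading' -> 'ready'
var connection = null;
var waiters = [];

function getConnection(openFn, callback) {
  if (state === 'ready') return callback(null, connection);
  waiters.push(callback);
  if (state === 'loading') return; // a connect is already in flight: just wait
  state = 'loading';
  openFn(function (err, conn) {
    state = err ? 'uninitialized' : 'ready'; // allow a retry on failure
    connection = conn;
    var pending = waiters;
    waiters = [];
    pending.forEach(function (cb) { cb(err, conn); });
  });
}
```

The callback list is what lets N concurrent requests share the single in-flight connect.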

------
aninimus
I recently read a chapter of David Herman's Effective JavaScript[1] that
talked about preserving the async contract even if you cache results (i.e.
even when you could return without an async call). Try:

    
    
    if (connectionInstance) {
      setTimeout(function () { callback(connectionInstance); }, 0);
      return;
    }
    

That way your async functions will always behave the same way.

[1] <http://effectivejs.com/>

~~~
afshinmeh
Or maybe with process.nextTick.
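For example, the cached branch could look like this (a sketch; `openFn` is a placeholder for the real connect call):

```javascript
var connectionInstance = null;

function getConnection(openFn, callback) {
  if (connectionInstance) {
    // Cache hit: defer the callback with process.nextTick so callers
    // always get the connection asynchronously, same as the first time.
    process.nextTick(function () { callback(null, connectionInstance); });
    return;
  }
  openFn(function (err, conn) {
    if (!err) connectionInstance = conn;
    callback(err, conn);
  });
}
```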

------
impostervt
I like to initialize the connection to the database in my app.js, and create a
global variable for each of my app's collections. The collection variables are
then always usable throughout the rest of the app, so I don't have to connect
to the database or look up the collection again.

Is there any reason to not do it my way? I never see it shown that way in
examples so I wonder if I'm missing something.

~~~
anton-107
do you keep them updated? or are they read-only collections?

also some collections could be just a waste of ram for your app

~~~
joshguthrie
EDIT: As bellwether pointed out, in MongoDb, loading a collection only means
something roughly similar to "pre-allocating the connection and meta-
information": when querying a collection, no "actual" data is transmitted from
server to client.

As strange as it may sound... there's no need to update them (or my apps have
just been getting lucky for the last six months?).

When you use fewer than 20 collections, this is not the kind of thing that'll
become a "waste of RAM". You could try an LRU to evict unused collections and
fetch them again later... but you would still lose more RAM to the LRU
manager itself...

------
latchkey
This highlights the fact that nobody has created a good appserver for node
yet. There is such a focus on web frameworks still that everyone keeps writing
this code over and over again for their own apps. In my case, I wrote more of
a 'service' layer with a set of lifecycle events because mongo generally isn't
the only service that needs a connection. We have redis, neo4j, apns, etc. I
then just 'require' my service wherever I need it in my application and have
access to its connection. Services can also have dependencies on other
services, such as a ram based caching layer that depends on pulling data from
mongo.

------
zaim
Am I correct to assume that something like this would already be implemented
in a mongodb wrapper/lib? A quick examination of mongojs
(<https://github.com/gett/mongojs/>), the wrapper I'm using now, would suggest
so.

Also, I thought the 'official' mongodb library already does connection
pooling, or is this something different? Just curious.

------
outside1234
If you are using mongoose, is there a similar advantage to be had or is this
handled internally?

~~~
jameswyse
What I tend to do is create a database file (db.js) which exports an object
with getters for mongoose, redis, etc. The getter connects to the db and
returns the connection the first time and returns a cached copy on subsequent
calls.
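Something along these lines (a sketch with placeholder connect functions rather than real mongoose/redis calls):

```javascript
// db.js-style module: each getter connects lazily on first access and
// hands back the cached connection on every later call.
function lazy(connect) {
  var cached = null;
  return function () {
    if (!cached) cached = connect(); // connect on the first call only
    return cached;                   // cached copy on subsequent calls
  };
}

// The real versions would call e.g. mongoose.createConnection(url) and
// redis.createClient(); stubs are used here so the sketch stands alone.
var db = {
  mongoose: lazy(function () { return { kind: 'mongoose connection' }; }),
  redis: lazy(function () { return { kind: 'redis client' }; })
};
```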

One big tip for mongoose is to use the .lean() function on your queries
whenever you don't need any of mongoose's special features (statics,
virtuals, etc); it returns plain JS objects instead of mongoose document
instances. It can be quite a bit faster!

------
mudetroit
I would be really curious to know what the overhead of actually creating the
connection and loading the collections on each request would be. Is this
anything more than a micro-optimization?

~~~
joshguthrie
Coming from a WAMP background, I had the same question when I first tried
MongoDb with Node.JS, and the consensus I read online back then (so, 3 years
ago) was "Just connect once and don't bother anymore".

Though the overhead can be rather minimal, it still exists:

\- Connect to the database

\- Query the Collection

\- Make a request

\- Close the connection

Also, these four separate requests are asynchronous, which means you could end
up with X web requests having connected to the DB but not yet holding a
collection, Y requests holding a collection, and N requests waiting to be
closed.

This is not a real problem (as long as you don't have 4 servers connecting to
the same Db with 65 thousand connections each :-) ) but having multiple
CONCURRENT connections to the same server is pretty useless in a SINGLE-
THREADED context like a Node.js app.

Now, if you were to use PHP, that's a different matter...

~~~
andypants
Actually, it can help in a node.js app if a single db connection can only
complete one request at a time. The javascript in node.js executes in a single
thread but a database connection is I/O. This is exactly what node.js is good
at.

With that said, you shouldn't create a connection per request, you should use
a connection pool.

~~~
joshguthrie
Look at christkv's comment for "why this is not a good idea in node.js".

You may be thinking you can fire multiple requests at the same time (ie: fire
the requests, let the I/O do the job, get all your responses back), but since
you are in a single-threaded environment, you get exactly the same behaviour
as I/O on a single connection (plus the overhead of opening new sockets and
handshaking with the server).

~~~
andypants
christkv's comment is about creating a new connection per request. As I said,
I agree, you should use a connection pool instead. When the web request comes
in, you already have multiple database connections waiting for you. The
connection overhead is not an issue then.

And yes, that is exactly what I am thinking. More database connections = more
concurrent database requests. I don't understand your reasoning about being in
a single threaded environment. You made a statement but you haven't explained
it.

From what I can see, this problem is simply serial vs parallel. Is there
something I'm missing? I don't see how the fact that node.js is single
threaded changes the situation since the database requests aren't working
within the node.js context.

Edit: sorry, my previous reply may have been confusing. I was responding to
your comment that having multiple connections is useless in a single-threaded
environment. I was not suggesting that it may be helpful for the OP to make a
new connection for each incoming http request.

~~~
joshguthrie
Why not try to benchmark the results and see how it really goes? :-)

In both cases, the same process of reading/writing to one or more connection
sockets happens, and you send and receive the same content to MongoDb: you
cannot send twice as many queries; you're gonna send a query, THEN you're
gonna send another, then you get an answer, THEN you get another answer.
Everything happens in the same sequence.

Also take into account MongoDb's own concurrency properties.

~~~
andypants
Well then why would the nodejs mongodb driver support connection pooling?

~~~
christkv
for a very simple reason :) the current mongodb database has a thread pinned
to each socket connection. This thread will serially execute the incoming wire
protocol messages. So say you have 2 queries running at the same time. The
first query takes 230 ms and the second one takes 100 ms. If you use only a
single socket, the first one will have to finish processing before the second
one can be processed. This means that the total time of execution is 330 ms.
If you fire each message on its own socket, they will be executed in parallel.

The balance is between how big your pool is and how many messages are in
flight. In the future this will change as MongoDB moves to some sort of
eventloop or multiplexing of threads to sockets. But for the moment you have
to strike a balance between operations in flight and how many sockets you open
on the server.

Another thing to keep in mind is that the more socket connections you open on
the mongodb server, the more memory needs to be allocated on the heap and the
more context switches between threads will happen.

------
nbevans
People are still writing Singleton pattern code in 2013? Wow.

~~~
joshguthrie
I wish I could downvote a comment that condescending.

The Singleton Pattern is, as its name implies, a pattern. It's not a "good
pattern", it's not an "anti-pattern"... It is ONLY a pattern whose rightful
use relies upon the developer.

Is a singleton justified in this case, where the developer wants ONE
connection object to be used across his whole application? I don't see why
not.

Would a non-singleton object that gets re-instantiated upon each
file/request/w/ever be better suited? I doubt it.

But if your comment can teach us one thing, it's that for every pattern and
every HN comment, the malice resides only between keyboard and chair.

~~~
nbevans
If my comment can teach you one thing, it's that you need to learn IoC/DI and
lifetime management, and stop using the antipattern that is the singleton.
It's a pattern that only makes sense on a hack day and nothing more. HN has a
low-brow audience if my comment has seriously earnt a -4 vote.

