

MongoDB's New Query Matcher - francesca
http://blog.mongodb.org/post/51574091391/mongodbs-new-matcher

======
rogerbinns
I wish the 10gen folks would do some work on various issues that have been in
MongoDB for a long time. For me the most serious one is data size. Storage
efficiency is pretty terrible, especially compared to how cheap CPU is these
days. I would far prefer to use more CPU in order to reduce storage.

Records in a collection mostly use the same set of keys, yet every record
stores the full key names, which makes the storage that much larger. Key
strings could be interned and replaced with a far smaller numerical token to
reduce storage. (I'd be happy for it to be attempted for string values as
well.)
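
To illustrate what I mean by interning, here's a toy Python sketch (the table
layout and helper names are mine, not anything MongoDB actually does):

    key_table = {}      # field name -> small integer token, shared per collection
    reverse_table = {}  # token -> field name

    def intern_keys(doc):
        # Store each distinct key string once and replace it with a token.
        out = {}
        for key, value in doc.items():
            token = key_table.setdefault(key, len(key_table))
            reverse_table[token] = key
            out[token] = value
        return out

    def restore_keys(doc):
        # Expand tokens back into the original field names on read.
        return {reverse_table[t]: v for t, v in doc.items()}

    docs = [{"first_name": "Ada", "last_name": "Lovelace"},
            {"first_name": "Alan", "last_name": "Turing"}]
    compact = [intern_keys(d) for d in docs]  # keys shrink to 0 and 1
    assert [restore_keys(d) for d in compact] == docs

Each key name would then be stored once per collection instead of once per
record.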

The other is compression, which is a straightforward CPU-versus-storage
tradeoff.

<https://jira.mongodb.org/browse/SERVER-863>

<https://jira.mongodb.org/browse/SERVER-164>

~~~
mayank
Until they do, you could always write a thin wrapper over the client driver to
gzip/gunzip non-indexed fields. Something like Google's Snappy would be well
suited to that.
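
Here's a minimal sketch of such a wrapper, using zlib from the standard
library in place of Snappy (the field names and the pymongo calls in the
comments are just for illustration):

    import zlib
    from bson.binary import Binary  # ships with the pymongo driver

    COMPRESSED_FIELDS = {"body", "payload"}  # non-indexed fields only

    def compress_doc(doc):
        # Compress selected string fields before handing the doc to the driver.
        out = dict(doc)
        for field in COMPRESSED_FIELDS & out.keys():
            out[field] = Binary(zlib.compress(out[field].encode("utf-8")))
        return out

    def decompress_doc(doc):
        # Reverse the transform on documents coming back from queries.
        out = dict(doc)
        for field in COMPRESSED_FIELDS & out.keys():
            out[field] = zlib.decompress(out[field]).decode("utf-8")
        return out

    # collection.insert(compress_doc({"user_id": 7, "body": long_text}))
    # doc = decompress_doc(collection.find_one({"user_id": 7}))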

~~~
gerner
That sounds great if you've got large fields with lots of redundancy. In fact,
we do this.

But small fields won't compress so well on their own. Often there's a lot of
redundancy across records for the same fields, which is great for compression.
This might be a great way to achieve the benefits of field name tokenization
too (which is similar to part of how most compressors work). I'd like to see
block compression rather than field compression.
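
The difference is easy to demonstrate (a toy Python comparison with made-up
records):

    import json
    import zlib

    records = [{"status": "active", "country": "US", "plan": "free"}
               for _ in range(100)]

    # Compressing each small value on its own barely helps (or even hurts,
    # since zlib adds per-stream overhead).
    per_field = sum(len(zlib.compress(v.encode()))
                    for rec in records for v in rec.values())

    # Compressing a block of records lets the compressor see the redundancy
    # across records -- field names and repeated values alike.
    block = len(zlib.compress(json.dumps(records).encode()))

    print("per-field:", per_field, "bytes")
    print("block:", block, "bytes")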

~~~
mayank
Hmm...interesting. I don't know if this will work, but you could try storing
your MongoDB database on a compressed ZFS partition. Since MongoDB uses mmap,
this would have the nice side-effect of your working set remaining
uncompressed, and only being compressed when written to or read from disk.

~~~
gerner
You're not the first person to suggest that to me :) although I hadn't thought
about using ZFS for this. Why that and not compression in btrfs (or something
else entirely)?

_hopefully_ the mmap interface would provide the best of all worlds: mongodb
continues to be simple with respect to how it handles getting data from disk,
and the kernel/fs can do its magic behind the scenes of mmap. Of course, it
could be that mmap + a compressed filesystem leads to some unexpected (and
bad) perf results. But then again, I've never tried :) have you?

~~~
mayank
No reason for ZFS in particular -- I was just unsure about how stable btrfs
currently is. I haven't tried this out, but I think I might. My email's in my
profile on the site if you beat me to it. :)

------
nasalgoat
As a large commercial user of MongoDB (almost 300 instances of it running in
production), I've seen some big shifts in 10gen's focus lately. They've really
ramped up the customer service, and they are making more regular releases that
seem like improvements.

Unfortunately, the better customer service doesn't provide better answers or
solutions to problems, and the improvements aren't targeting long-standing,
basic issues with the platform.

This feature announcement fits the pattern: an improvement, but not in a
critical area like indexing, document-level locking, or sharding stability.
There are basic fundamentals that need addressing, like overcoming the 20K
maximum connections per server or the mongos CPU usage. These are the things I
deal with in production that are business critical, but those feature requests
sit untouched in JIRA for months or years.

This feature seems interesting, but it solves a problem I don't have. I'd
prefer they solve real-world problems.

~~~
outworlder
Why do you need > 20k connections to your database?

~~~
nasalgoat
Because I want to run more than 100 instances of a webserver per mongos
process, but I can't because it causes the connections per server to go over
20K.

Right now my application stack is woefully underutilized due to this
completely arbitrary decision on their part not to allow the end user to set
the connection limit. They've even admitted that they just picked that number
four years ago and haven't looked at it since.

Just like they limit replica sets to 12 members maximum. What if I have higher
read requirements than that limitation allows? Well, too bad.

I think this points at a fundamental issue with MongoDB at the design level -
they don't allow end users to make any decisions about how to configure the
product, even if those decisions might turn out bad.

Every Enterprise-level DB allows end user tuning of parameters, so clearly
MongoDB isn't Enterprise-level.

~~~
jasondc
The 20,000 connection limit was removed in 2.5:
<https://jira.mongodb.org/browse/SERVER-8943>

------
ShabbyDoo
My name might be "Mud" at a previous client because I recommended MongoDB and
implemented a first-rev persistence interface based on my presumption that
this issue would be fixed in the near term:

<https://jira.mongodb.org/browse/SERVER-3253>

To be fair, I educated the client on the alternatives should 10Gen not improve
the aggregation framework to handle arbitrary transforms without failing when
the output size exceeds 16MB. [Use other MongoDB mechanisms, custom
compilation with a higher per-document size cap, etc.] What annoys me is that
10Gen did not mention this incredibly important limitation when they were
touting the planned features of 2.2. My client would not have minded had
queries with large result sets simply been somewhat slow. What this client
could not tolerate were failures to deliver any results at all. In retrospect,
I wish I had pushed harder for a solution based on the Hadoop stack. While it
seems to have its own demons, at least there is an ecosystem dedicated to
fixing its most blatant limitations.
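
For anyone who hasn't hit it: the aggregate command returned its entire result
as a single BSON document, so anything past 16MB just errored out. A rough
pymongo sketch of the failure mode and the eventual fix (the collection and
pipeline are hypothetical; the cursor form and allowDiskUse only arrived with
2.6):

    from pymongo import MongoClient

    coll = MongoClient().analytics.events  # hypothetical collection

    pipeline = [
        {"$group": {"_id": "$user_id", "total": {"$sum": "$amount"}}},
    ]

    # Pre-2.6: results came back inline in one document, so a pipeline
    # whose combined output exceeded the 16MB BSON cap failed outright.

    # 2.6+ streams results through a cursor instead, and allowDiskUse
    # lets large pipeline stages spill to disk:
    for row in coll.aggregate(pipeline, allowDiskUse=True):
        print(row)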

------
ghc
Do I regret choosing MongoDB a couple years ago to augment Postgres for
several data analysis applications? Not really. But I do wish Mongo had been
more mature. I'm happy to see each step forward taken, but it does bother me
that serious issues often get overlooked in favor of press-friendly feature
additions. Until Mongo adopts some form of compression I won't be using it for
new projects, aside from personal ones where development ease trumps
everything else.

~~~
nasalgoat
How about compression on the replication links?

When I inquired about that, they suggested using an ssh tunnel. I can see why
they like mmapped files for storing data.

------
mgamache
I use MongoDB. I hope they will be able to manage the technical debt and stay
competitive with newer alternatives. Seems like this is a step forward.

~~~
jsemrau
Just out of curiosity: is this a real live application or a personal project?

