Ditching mmap won't be that easy, because most of Mongo's speed and simplicity comes from using mmap.
Unfortunately, a central requirement of a write ahead log is that the data pages must not get written before the WAL. If that were to occur, and there were a crash, the system wouldn't know that it had to undo the changes to the data pages, leaving you with corrupt data. mmap generally doesn't provide the ability to pin your dirty pages in memory - they're subject to getting flushed any time the system is under memory pressure - so it makes logging a lot more complicated to get right.
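The write-ahead invariant described above can be sketched in a few lines. This is an illustrative toy, not MongoDB's actual implementation; the file layout and log record format are made up. The key point is the fsync on the log before the data file is touched - exactly the ordering that mmap'd data files make hard to guarantee, since the OS may flush dirty mapped pages at any time.

```python
# Toy write-ahead log: the log record must be durable *before* the data
# page is modified, so a crash mid-write can always be recovered.
import os

def logged_write(log_path, data_path, offset, new_bytes):
    # 1. Append a redo record to the log and force it to disk.
    with open(log_path, "ab") as log:
        record = b"%d:%s\n" % (offset, new_bytes.hex().encode())
        log.write(record)
        log.flush()
        os.fsync(log.fileno())  # log is on disk before any data-page write

    # 2. Only now is it safe to touch the data file. These pages may be
    # flushed lazily; a crash here is recoverable from the log.
    with open(data_path, "r+b") as data:
        data.seek(offset)
        data.write(new_bytes)
```

With mmap you lose control of step 2: a dirty mapped page can hit disk before the fsync in step 1, which is the corruption scenario described above.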
For a tool like Redis, it's fairly easy for me to accept the limitation that my data size can't exceed available RAM.
But for an indexed document store with full-text query capabilities, it's a lot harder for me to accept that limitation.
You do have to be able to keep your indexes in RAM, but that's much less limiting.
They hear you and are working on it.
What you linked to is a consequence of memory mapping. Mapping a single 2GB file in a 32 bit process will use up virtually all the address space and you couldn't map more than one at a time.
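The back-of-envelope arithmetic behind that 32-bit point, assuming a typical split where the OS reserves about 1 GB of the 4 GB address space for itself:

```python
# A 32-bit process has 2**32 bytes of address space. After a typical
# ~1 GB kernel reservation, a single 2 GB file mapping leaves almost
# nothing for the heap, stacks, and shared libraries.
GB = 2**30

total_address_space = 2**32          # 4 GB
usable = total_address_space - 1 * GB  # assumed 3 GB user portion
mapping = 2 * GB                     # one 2 GB memory-mapped file

remaining = usable - mapping
assert remaining == 1 * GB  # no room for a second 2 GB mapping
```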
With one producer, it doesn't matter.
Though the point of Mongo is to be webscale, which implies to me many writers.
If there is a single high-volume data pump, for example machine generated data, will readers be affected by a continuous "fire hose" of incoming data?
That single producer caused wide-scale locking/hanging for all readers on the website and I had to manually stop the task during business hours because of that. Oy!
This is exciting even if I don't expect it to happen soon.
How do the two databases compare now? Is search improving in Mongo, or is that something they aren't really worrying about at the moment?
In terms of how they compare, I'm not familiar enough with Elasticsearch to comment, but for basic text-search needs, the implementation in MongoDB is pretty decent. More here:
Our pricing starts as low as $15 per month
The cheapest Amazon CloudSearch instance is around $79 per month. You then have to deploy a transactional DBMS, and most likely S3 for storage as well.
The hosted version goes up to 8 GB of RAM; don't you think that's low?
SQL Server and Oracle are also insanely priced.
By the way, partitioning by hash and range, as we have stated, means "sharding".
What I meant is that the correct price is $700, not $350, since the $350 tier doesn't include sharding/clustering.
$700/year for enterprise-level solutions is generally regarded as suspiciously inexpensive.
"the upgrade from MongoDB 2.4 to 2.6 is a binary-compatible drop-in upgrade: shut down the mongod instances and replace them with mongod instances running 2.6."
And you have to be on 2.4 first (I know you said you are, but it's best to make sure for others).
I'm leaning heavily towards couchdb though.
Not saying MongoDB is bad (I've used MongoDB and like it), but much of its API design is quirky/bad, locking is an issue, and mmap feels weird too.
Also, it has a very nice HTTP interface, so you can talk to it straight from the web via a proxy.
And, more node.js specific (but also offline capable, and available any time):
The latter is especially useful. I'll pass it along to people for extra reference.
Same with some other JSON document-oriented databases (like Elasticsearch), but MongoDB is a good compromise in many areas. The query language is easy to understand and powerful. The biggest issue is the difficulty of doing complex computation and aggregation: map-reduce helps, and the aggregation framework helps too, but in this area SQL is generally much faster, for instance.
It could have been built with CouchDB, but features such as ad-hoc querying and partial document updates make MongoDB a more compelling choice (albeit one prone to some scalability issues until MongoDB 2.8, hopefully lol!).
I use Elasticsearch for ad-hoc querying: http://daemon.co.za/2012/05/elasticsearch-5-minutes/
And CouchDB does have atomic in-place updates: http://wiki.apache.org/couchdb/Document_Update_Handlers
I used the latter recently to track the last view-time on images, as well as to build a schema migration routine in like 100 lines of code.
Edit: fixed link
I can't hold my breath much longer! :-)
Edit: docs don't make any mention of it but then again they probably haven't updated them yet (fingers crossed!) http://docs.mongodb.org/manual/faq/concurrency/#what-type-of...
Don't asphyxiate; TokuMX has document-level locking right now. http://github.com/Tokutek/mongo
I wonder why 10gen hasn't made any official comment on the work the folks at Tokutek are doing to enhance MongoDB's features.
Why would they comment? What would they say? Toku is basically trying to steal MongoDB's customers, they even use the same basic pricing model.
We generally don't criticize indiscriminately. MongoDB has a lot of good sides and we embrace and extend those, and where it has faults we try to work around or replace them. Our core strength is fast, reliable, compressed storage and MVCC semantics, so obviously we talk about that a lot, but we also understand and acknowledge that a large amount of TokuMX's success, to the degree it has some, is due to the excellent parts of MongoDB.
As an example, I personally am really excited about what MongoDB has done with aggregations in 2.6 (and what seems to be coming down the pipe soon), and I can't wait to merge it in to TokuMX. We all get stronger together.
This means no automatic upgrades to 2.6, and sysadmin action is needed to correct the config file name, etc.
I'm sure it's a viable choice for some use cases; it's just that I haven't found a use case for it yet.
Being able to choose from PostgreSQL, Redis, Cassandra, heck, even ElasticSearch made me always choose one of those over MongoDB, at least for the problems which I had been trying to solve.
Or does your data fit in one csv file?
Essentially you say something akin to "Death metal is just pulsating air-waves, like jazz, so what's the big difference?".
The index portion of a file system is made up of files and directories.
Several file names can refer to the same data. Those are called hard links.
So with hard links, I can refer to a Foo by their related Bar.
If I am not mistaken, this accurately describes the limits of MongoDB in terms of mapping relations. I'm not a Mongo expert, since nobody has managed to convince me to date; somebody correct me?
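The hard-link point above can be sketched concretely. The file names here are made up for illustration; the key observation is that two directory entries end up pointing at the same inode, i.e. the same underlying data:

```python
# Two names, one inode: a hard link is a second directory entry for the
# same data, which is the filesystem's only built-in "relation".
import os
import tempfile

d = tempfile.mkdtemp()
original = os.path.join(d, "foo.txt")
alias = os.path.join(d, "bar-link.txt")

with open(original, "w") as f:
    f.write("shared data")

os.link(original, alias)  # create a second name for the same inode

assert os.stat(original).st_ino == os.stat(alias).st_ino
assert open(alias).read() == "shared data"
```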
I'm not sure why you imply that a full search is incompatible with an index.
Perhaps you meant "full scan", that is reading everything while searching, instead of "full search" (searching everything). The first is not a prerequisite for the second.
In any case, grep is a very inefficient way of doing a full search. An index is so much faster it's not even funny.
>The index portion of a file system are called files and directories.
Those are just indexes for the names of the files and folders, plus a few other select metadata fields. Nothing like a full-text search index, or even actual indexes on metadata.
(Some filesystems allow those too, e.g. in BeOS, but nowhere as comprehensive and flexible as using a dedicated tool for this, be it MongoDB or something else).
>Several file names can refer to the same data. Those are called hard links. So with hard links, I can refer to a Foo by their related Bar.
Sounds like a convoluted and inefficient way of building something somewhat like a "document database" with 1/10 the features (if that).
>I'm not a Mongo expert because no one could convince me otherwise to date, somebody correct me?
I'm far from a fan of Mongo, but you seem like you have already made up your mind, and nothing will change it.
Plus, if a filesystem is enough of a document database for you (with no cheating, e.g. piling up tons of hacks and add-ons like external full-text scanning tools), then by all means, use one.
Why would you be calling grep from an online application anyway?
You are missing the big picture here of linear search versus indexed search. Optimizations in grep don't magically make it better than O(n).
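The linear-vs-indexed distinction can be shown with a toy inverted index. The documents are made up; the point is that the grep-style scan touches every document on every query, while the index pays that cost once at build time and then answers each term with a single dictionary lookup:

```python
# Linear scan (grep-style) vs. inverted index: same results, very
# different per-query cost.
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "lazy dogs sleep all day",
    3: "quick thinking saves the day",
}

def linear_search(term):
    # O(n) in total text size, paid on *every* query
    return [doc_id for doc_id, text in docs.items() if term in text.split()]

def build_index():
    # O(n) once, up front
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

index = build_index()

assert linear_search("quick") == [1, 3]
assert index["quick"] == {1, 3}  # one dict lookup, no scan
```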
You can also use GridFS to store files in the document database; it breaks files into chunks and stores them in collections, much like a FAT table.
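A rough sketch of the GridFS idea: split a file into fixed-size chunks and store each chunk as its own document, keyed by file id and chunk index, much as a FAT chains blocks. The 255 KB chunk size matches GridFS's default, and the `files_id`/`n`/`data` field names follow its convention, but the dict- and list-based "collections" here are just stand-ins, not a real driver:

```python
# Toy GridFS: fs.files holds metadata, fs.chunks holds the ordered pieces.
CHUNK_SIZE = 255 * 1024

files = {}   # stand-in for the fs.files collection (per-file metadata)
chunks = []  # stand-in for the fs.chunks collection (one doc per chunk)

def gridfs_put(file_id, data):
    files[file_id] = {"length": len(data), "chunkSize": CHUNK_SIZE}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunks.append({"files_id": file_id,
                       "n": offset // CHUNK_SIZE,
                       "data": data[offset:offset + CHUNK_SIZE]})

def gridfs_get(file_id):
    # Reassemble by walking the chunks in order, like following a FAT chain.
    parts = sorted((c for c in chunks if c["files_id"] == file_id),
                   key=lambda c: c["n"])
    return b"".join(c["data"] for c in parts)

payload = b"x" * (CHUNK_SIZE * 2 + 100)  # spans three chunks
gridfs_put("img1", payload)
assert gridfs_get("img1") == payload
```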