

MongoDB 2.4 Released: Text Search, Security, Hash-based Sharding  - francesca
http://blog.mongodb.org/post/45754637343/mongodb-2-4-released

======
dmytton
Using the new working set analyser is going to make figuring out how much
memory you need significantly easier. Giving MongoDB enough memory for your
working set is the easiest way to get performance, but it used to be quite
difficult to figure out: you had to know both data and index sizes across
your most common usage patterns.

~~~
diminoten
Do you know of any resources that give some info on this topic in general?
I've been looking but I haven't found anything that's sufficiently plainly-
spoken that I fully grasp the contents.

~~~
kstirman
from: <http://docs.mongodb.org/manual/reference/server-status/>

serverStatus.workingSet.pagesInMemory

pagesInMemory contains a count of the total number of pages accessed by mongod
over the period displayed in overSeconds. The default page size is 4
kilobytes: to convert this count to the amount of data in memory, multiply it
by 4 kilobytes.

If your total working set is less than the size of physical memory, over time
the value of pagesInMemory will reflect your data size. Conversely, if your
data set is greater than the size of the physical RAM, this number will
reflect the total size of physical RAM.

Use pagesInMemory in conjunction with overSeconds to help estimate the actual
size of the working set.
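
The arithmetic above can be sketched with a small helper (hypothetical name,
assuming the default 4 KB page size; the field names mirror the serverStatus
output quoted above):

```python
def working_set_bytes(server_status, page_size=4 * 1024):
    """Estimate working-set size from a serverStatus document.

    pagesInMemory counts pages touched over the overSeconds window,
    so multiplying by the page size gives the data volume in memory.
    """
    ws = server_status["workingSet"]
    return ws["pagesInMemory"] * page_size

# Example: a serverStatus document reporting 250,000 pages touched
# over a 900-second window.
status = {"workingSet": {"pagesInMemory": 250_000, "overSeconds": 900}}
print(working_set_bytes(status))  # 1024000000 bytes, i.e. ~1 GB
```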

Also, there's a discussion of working set in this guide starting on page 9:
<http://info.10gen.com/rs/10gen/images/10gen-MongoDB_Operations_Best_Practices.pdf>

------
gregjor
Great to see the Pick database reinvented bit by bit. Takes me back to 1980.
Forget the past, doomed to repeat, etc.

~~~
julien_c
Can you elaborate for those of us who weren't around in 1980?

~~~
shin_lao
Basically the guys from MongoDB are rebuilding databases as they existed in
the 1980s with the motto "we will do better than relational databases!" or
"Sybase, Oracle, prepare to die!".

It's not clear which problem MongoDB is trying to solve or if it is an
improvement over existing technology (this is my personal opinion).

~~~
base698
I think it's pretty clear to anyone who's used it. It allows for rapid, low-
overhead changes to your data model in the very early stages of building
something new. Of course, the further along you get, the less low-overhead
those changes become, but at the start of a new project it's very easy to get
up and running.

~~~
pkulak
At the start of a project it's also really easy to change an SQL schema.

~~~
base698
It's not easier than nothing. You don't have to do _anything_ in Mongo. In
any reasonably sized web app with a well-designed ERD, changing the schema
means stopping coding and running some ALTER TABLE, DROP INDEX, etc. commands
before you can get back to coding.

Also, when you want to create a report you can just write a script that
aggregates the data and fills another collection on the fly, without creating
the table first. You can even try out 6 different reports at the same time,
without issuing separate CREATE TABLE statements.
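
The report-building workflow described above can be sketched without a
running server; plain dicts stand in for schemaless documents, and in a real
deployment the final list would be written to a new collection via a driver
such as pymongo:

```python
from collections import defaultdict

# Stand-in for an "orders" collection: schemaless documents (plain dicts).
orders = [
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 20},
    {"customer": "ann", "total": 25},
]

# Build a report "collection" on the fly -- no CREATE TABLE step needed;
# the output documents simply come into existence as they are written.
totals = defaultdict(int)
for doc in orders:
    totals[doc["customer"]] += doc["total"]

report = [{"_id": cust, "spent": amount}
          for cust, amount in sorted(totals.items())]
print(report)  # [{'_id': 'ann', 'spent': 55}, {'_id': 'bob', 'spent': 20}]
```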

~~~
gregjor
QED.

------
davidkellis
I was hoping for collection-level locking to be a part of the 2.4 release. I
didn't see any mention of it in the release notes. Last I heard they were
going to implement collection-level locking and then begin work on document-
level locking. I'm still hoping document-level locking isn't too far off.

~~~
spf13
Collection level locking isn't in 2.4. Working on more granular locking for
2.6/2.8. May skip collection level entirely for something more granular. More
details can be found in the collection level locking ticket
<https://jira.mongodb.org/browse/SERVER-1240>

~~~
davidkellis
Skipping collection level locking and going for something more granular (like
document level) would be awesome.

~~~
tomsthumb
Don't you pretty much get document-level locking by using multiple update
modifiers ($set, $inc, etc.) in a single query?

~~~
apendleton
From an atomicity perspective, yes. From a performance perspective, no; other
concurrent operations affecting anything else on the whole database wait on
your write.
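
To illustrate the atomicity point: every modifier in a single update lands on
the document as one operation. A toy interpreter for two of the modifiers
(hypothetical helper name; this is a sketch of the semantics, not the server's
implementation):

```python
def apply_update(doc, update):
    """Apply a MongoDB-style update document ({"$set": ..., "$inc": ...})
    to a single document. All modifiers land together, which is why a
    one-document update is atomic regardless of lock granularity."""
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    for field, delta in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + delta
    return doc

post = {"_id": 1, "views": 9, "title": "draft"}
apply_update(post, {"$inc": {"views": 1}, "$set": {"title": "final"}})
print(post)  # {'_id': 1, 'views': 10, 'title': 'final'}
```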

------
Lionga
It is great to have a stable version with fast index-backed counts; see
<https://jira.mongodb.org/browse/SERVER-1752>

Best feature for me.

~~~
friendly_chap
They are not fast, they are "normal speed" now. They were slow before. That
was one of the problems which made me question the competency of the Mongo
team.

Best feature for me too anyway.

------
c-oreills
"db.killOp() Can Now Kill Foreground Index Builds"

That's going to save a lot of accidental "shit-the-whole-db-is-locked" pain.

------
dkhenry
I am curious to play with the text searching. I wonder how it stacks up
against Lucene or Solr in the text indexing space.

~~~
mumrah
I am quite sure it will not come close to Lucene/Solr in terms of performance
or capabilities.

~~~
andersnolsen
Only one thing to do - test

~~~
Argorak
Or just read the blog post:

tl;dr: use it if a simple reverse index fits your needs; use Lucene for the
grown-up stuff.

> MongoDB text search is still in its infancy and we encourage you to try it
> out on your datasets. Many applications use both MongoDB and Solr/Lucene,
> but realize that there is still a feature gap. For some applications, the
> basic text search that we are introducing may be sufficient. As you get to
> know text search, you can determine when MongoDB has crossed the threshold
> for what you need.
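
A "simple reverse index" of the kind alluded to above can be sketched in a
few lines; this illustrates the concept only, not MongoDB's implementation:

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = {1: "MongoDB adds text search", 2: "Solr text search is mature"}
index = build_index(docs)
print(sorted(index["text"]))     # [1, 2] -- both documents mention "text"
print(sorted(index["mongodb"]))  # [1]
```

Real engines like Lucene layer stemming, ranking, and phrase queries on top
of this basic structure, which is where the feature gap the post mentions
comes from.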

------
leothekim
According to the upgrade instructions [1], the only supported upgrade path
from sharded 2.0 clusters is via 2.2.

[1] <http://docs.mongodb.org/manual/release-notes/2.4-upgrade/#upgrade-a-sharded-cluster-from-mongodb-2-2-to-mongodb-2-4>

~~~
c-oreills
And in fact you can't have a mix of 2.2 and 2.4 boxes in the same cluster;
you have to go via 2.2.1.

------
derricki
I was hoping the security enhancements would include SSL certificate
validation. Anyone know why they don't do that, or how a user should approach
that limitation?

~~~
matthewlucid
We'll be moving away from MongoDB because it doesn't support certificate
validation. What is the point of SSL connections if you don't validate the
certificate? It seems that you get all the drawbacks of encryption (overhead,
reduced throughput) with none of the benefits (security).

I'd love to see a solution.

~~~
kstirman
<http://docs.mongodb.org/manual/administration/ssl/#ssl-configuration-for-clients>

------
jonesjim
Whoop! Multithreaded JavaScript with the V8 engine!

~~~
apendleton
I can't actually find any details about how this works in practice. Are
multiple maps and reduces in a map-reduce executed simultaneously within a
single mongod process? If so, how many? Is it based on the number of cores in
the server?

Edit: not asking you specifically, just generally curious.

~~~
tbrock
Each map-reduce job can still only use one thread.

Before, under SpiderMonkey, only one job could be executed per mongod
instance. Now you can have many jobs executing in parallel on a single server
instance, but each one of them still only uses a single core.
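
That concurrency model (many independent jobs, each single-threaded) can be
mimicked client-side; a hedged sketch, with a pure function standing in for a
map-reduce job:

```python
from concurrent.futures import ThreadPoolExecutor

def map_reduce_job(values):
    """Stand-in for one map-reduce job: single-threaded on its own data."""
    mapped = [v * v for v in values]  # map step
    return sum(mapped)                # reduce step

# Several jobs run in parallel, but each job itself uses one thread --
# mirroring the post-2.4 behaviour of concurrent mapReduce on one mongod.
jobs = [[1, 2, 3], [4, 5], [6]]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(map_reduce_job, jobs))
print(results)  # [14, 41, 36]
```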

~~~
apendleton
Do you know if this is likely to be permanent? In an application I work on, I
do map-reduces on Mongo data using a third-party framework at the moment
specifically to work around the inability to easily parallelize map-reduce
jobs on a single machine.

------
vailripper
Love seeing the text indexing!

------
ranman
V8 is another important change that isn't really touted much in this release.

~~~
tbrock
Yeah, this is a big deal. It opens up very interesting opportunities to
create a map-reduce engine that isn't awful and doesn't require Java.

People use Hadoop because they have to, not because it is great. Isn't it time
for a better alternative now that the toothpaste is out of the tube re:
map/reduce?

Java doesn't even have hash literal syntax! Why would you ever want to query
document oriented data with it as your language of expression?

~~~
lucian1900
It's not hard to write Hadoop map/reduce queries in other languages: mrjob
for Python is particularly nice.

It's also not relevant that SpiderMonkey is being replaced with V8.

Mongo's map/reduce is also just a toy, not at all comparable with Hadoop's,
and is being deprecated in favour of the aggregation framework.

Also, there already are decent alternative map/reduce implementations. Disco
is a good example, with a similar design to Hadoop.

~~~
kstirman
MongoDB's MapReduce implementation is not being deprecated. The primary
beneficiary of the V8 implementation is MapReduce, so it should be seen as
further investment in this area.

You can also run Hadoop MapReduce over data in MongoDB directly (slides 24-29):
<http://www.slideshare.net/spf13/mongodb-and-hadoop>

------
leif
I'm glad this is out. I'm really going to enjoy plugging fractal trees in
under FTS.

------
dschiptsov
any row-level locking?)

------
L0j1k
[Comment about supporting only one master]

