With DocumentDB, not having a local version severely limits what I'd consider it for. Losing that flexibility is a big deal. Maybe this is just a limited preview and they haven't built the management side for local installs.
I am a Program Manager for Azure Search and, as curiousDog mentioned, yes, along with DocumentDB we also announced Azure Search, a PaaS-based full-text search service. We actually leverage ElasticSearch at the core of this service and, as chippy says about spatial search, we do have the ability to provide a pretty solid geo-spatial search capability thanks to ElasticSearch and Lucene. To nemothekids's point, it is very unlikely that we will offer this as a local (non-hosted) version, because we found that although ElasticSearch is awesome, one of the common complaints many admins have is the complexity of managing systems such as ElasticSearch/SOLR/Lucene at high scale, and how difficult it is to implement more advanced search capabilities such as tuning and relevancy. Those are areas where we think we can add a lot of value as a fully managed service. Longer term, we think this will also allow us to bring even more value on top of search by adding in other Microsoft technologies. For example, we could tie in Bing Maps to let you easily add reverse geo-coding right into your search. Or perhaps let you leverage Bing's synonym list, so that people searching could still find results that are synonyms of commonly searched words (e.g., a user types in "shoes", but in your content it is referred to as "footwear"). Multi-language support is actually one of the big things we want to tackle in the short term, and we believe the NLP from Office will really help jump-start us with this.
I know everyone is all about running huge clusters behind the scenes, but most people simply don't need that, and being able to start small and buy bigger would be a nice option.
For example, a simple Node application using leveldb (via levelup) as the storage interface could be very effective as a backend for development/testing. From there, you provide an API-compatible interface. Open-source that version, with the disclosure/understanding that the Azure-hosted version is much more robust.
I think you'd see a lot more buy-in from the open-source community... even more so if you accepted PRs to make the open version more robust.
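To make the suggestion concrete, here is a minimal sketch of what such a local, API-compatible dev backend could look like. Everything here is hypothetical: `LocalDocumentStore`, `create_document`, and `query` are invented names, not part of any real DocumentDB API; the point is only that a hosted client and a dev stand-in could share the same interface.

```python
import json

class LocalDocumentStore:
    """Dev/test stand-in: stores JSON documents in a dict keyed by id.

    A hosted client implementing the same methods could be swapped in
    for production without changing application code.
    """
    def __init__(self):
        self._docs = {}

    def create_document(self, doc_id, doc):
        # Deep-copy via JSON round-trip so callers can't mutate stored docs.
        self._docs[doc_id] = json.loads(json.dumps(doc))
        return doc_id

    def query(self, predicate):
        """Return all documents matching a predicate function."""
        return [d for d in self._docs.values() if predicate(d)]

store = LocalDocumentStore()
store.create_document("1", {"city": "Melbourne", "rating": 4.8})
store.create_document("2", {"city": "Perth", "rating": 3.9})
results = store.query(lambda d: d["city"] == "Melbourne" and d["rating"] > 4.5)
print(len(results))  # 1
```

A serious open version would obviously need persistence (leveldb, as suggested) and an HTTP layer, but even something this thin would let people develop and test offline.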
If you are at a point where you are considering the likes of ES/SOLR/Lucene etc, you are likely ready to make the jump to self-host in the cloud, or use a SaaS provider. Where people get a bit concerned is in the lock in. I know why Azure would want to present that, but I think it's a bad idea without an open implementation that allows for self-hosting for development and on the small scale.
Right now, the company I work for is hosting in Azure. I recently switched from a couple of RabbitMQ queues to Azure Queues. It works fine and was a really simple replacement for a flow of near-real-time but temporary data keys. I would be much more open to using a hosted MongoDB from MS than DocumentDB, in much the same way I'm happy to see you guys embracing Redis for a cache system.
I hear you on this and to add another justification for the need for openness, let me give you another example. With ElasticSearch there are some amazing tools that are available. A few that come to mind are Kibana and LogStash. Given that we have our own API layer on top of ElasticSearch, we are not able to support these tools even though we are using ElasticSearch at our core. This is most unfortunate.
There are many reasons why we put an API layer on top of Azure Search. One key reason is that for our particular customers, we found there were things we could do in our API to simplify the interaction with Search. In fact, we have a system called Scoring Profiles that allows you to easily (I hope) set weights on important fields and attributes to quickly tune your search results based on what is important to you. No coding required. Another reason we don't just expose ElasticSearch is that ElasticSearch allows you to run arbitrary code. This is generally not a great idea in a PaaS service and can often lead to issues. There are a number of other reasons that I'll skip for now.
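The Scoring Profiles idea (per-field weights that boost relevance without code) can be illustrated with a rough sketch. This is a deliberate simplification I'm assuming for exposition; real Azure Search scoring is far more involved, and the field names and weights below are made up.

```python
# Rough illustration of field-weighted scoring, in the spirit of the
# Scoring Profiles described above: a per-field weight boosts matches
# found in the fields you declare important.
def score(doc, terms, weights):
    """Sum weight * term-occurrence-count over each weighted field."""
    total = 0.0
    for field, weight in weights.items():
        text = doc.get(field, "").lower()
        total += weight * sum(text.count(t.lower()) for t in terms)
    return total

doc = {"title": "Running shoes",
       "description": "Lightweight footwear for running"}
weights = {"title": 3.0, "description": 1.0}  # title matches count 3x
print(score(doc, ["shoes"], weights))  # 3.0
```

The appeal of doing this declaratively in the service, rather than in client code, is that tuning relevance becomes a configuration change instead of a redeploy.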
We still need to do some thinking in this area. I hope we can get Azure Search to a point that we can enable tools such as Kibana and LogStash to work with our service without compromising the goals of what we are trying to build. Not only would this allow us to really open up the types of things people can do with the service, but I suspect it would really help reduce concern around vendor lock in. We'll see if we can get there...
Also the Search API is terrible.
> Want to edit or suggest changes to this content? You can edit and submit changes to this article using GitHub.
Pretty remarkable given Microsoft's approach to open source in the 1990s that they're now using a service built around Linus's bespoke open source version control system to allow people to suggest changes to their documentation.
Edit: the downvoters clearly weren't in the industry in the 80's and 90's and haven't dealt with them in the enterprise / volume licensing department recently. Comparing the two, they're even more ruthless, unfair and incompetent than ever and will screw you as hard as they can once you're locked in.
This stuff gets you through the door, as does BizSpark etc, then you're not a friend but a cow for the milking via VL, audits and licensing changes.
I speak from experience working for 4 paid up gold partners over the last decade and then dealing with them in a corporate capacity back to '95. Every game ends the same.
Edit 2: appears you can't tell the truth about Microsoft these days in the same fashion you couldn't tell the truth about Apple about 3-4 years ago...
> ...in the same fashion you couldn't tell the truth about Apple about 3-4 years ago...
Yeah, if only there had been lots of people highly critical of Apple in 2011. Whatever happened to that Android thing, anyway?
Only the marketing and front end has changed. The cogs that drive the machine and the revenue mill have the same components and structure.
The market position is pretty much the same. Bar some new consumer markets, they have almost total domination of the business and enterprise sector. They even made a big dent in the entertainment sector with XBox with the piles of cash and losses they incurred and came out on top.
Because most were trite BS? Like the same advice market pundits used to give about Apple that in hindsight was always wrong: "Zune will crush them", they "need to make a netbook NOW" (in 2010), "stuff is overpriced", etc. Heck, people were even championing the Dell Ditty in forums...
Now, if you have something serious to say about Apple, e.g. regarding their technology, or the consequences of a walled-garden approach (and you say it without assuming that everybody in the discussion "ought" to be against a walled garden), then I don't think there would be a problem. We have had serious discussions criticizing Apple on HN for ages.
It makes me forget they killed my pappy!
Because, I'd say, if the software is good and fits its users' needs, then "releasing proprietary software behind a walled garden" is totally celebration-worthy too.
Releasing something for the public good, open source, open for all to use is an entirely different, more applaudable action.
You don't get brownie points for baiting your traps. Even if it's fancy cheese.
Software that's a turd, even if libre, is still a turd. I won't celebrate it just because someone offers it for free.
I'm not talking in general: there's excellent libre software, and excellent proprietary software too.
But there's specific libre software that's just plain crap for most use cases compared to its proprietary alternative (e.g. consider a high-end DAW and the libre DAWs, or a high-end NLE and the libre NLEs).
And some other libre software is mighty fine in itself, but lacks other characteristics that some proprietary software has (from 24/7 paid phone support, to quality documentation, to working with your preferred OS or the rest of your infrastructure, etc.). So some people can make good use of it, while for others it's not suitable.
It's weird that you think these two marks cannot both be hit in one stroke.
I was there in past decades. We're better off than ever, both with regard to accessibility and pricing of proprietary software and with regard to abundance of libre software.
- 1PB for assets
- 10TB for database
Even today, after disposing of numerous divisions, it's still a $100bn company, i.e., still larger than Microsoft. (So you pay IBM more than you pay MS every year, and you always have done.)
However, it is true that Microsoft is the IBM of PC software, just as Google is the IBM of the web, Intel is the IBM of processors, Facebook is the IBM of social networking, Cisco is the IBM of routers, Oracle is trying to be the IBM of server software, Apple is the IBM of hipster status symbols... Well, you get the idea.
IBM used to be the IBM of everything in IT, and started getting into other areas (telephone switches, copiers, cash machines etc) before the wheels started to come off.
Today, Google is a lot more powerful than Microsoft, much scarier, and much more ambitious in IBM-like ways (self-driving cars, robots, superbrains etc).
I lived through it.
My only point was about how IBM shifted its focus into other areas like consulting to stay relevant in the PC world.
If they want to be the next IBM, they have a lot of work ahead of them.
Actually, there is one way that Microsoft did become the new IBM. Long ago, I went to a talk by a senior IBMer and he said "We used to be the Evil Empire. Somebody else has that job now."
The Microsoft of the 1990s wouldn't be using Github.
> Microsoft's approach to open source
has become... when they're only using GitHub as a free hosted CMS for docs.
I can't see how releasing a proprietary "DocumentDB" on a Microsoft-only Azure cloud is a glowing endorsement or valued contribution to OSS. Despite what their marketing messaging says about how "Open and approachable" it is: http://blogs.msdn.com/b/documentdb/archive/2014/08/22/introd...
Pricing is based on "capacity units". It starts at $22.50 per month (this includes the 50% preview-period discount). One capacity unit (CU) gives 10GB of storage and can perform 2,000 reads per second, 500 inserts/replaces/deletes per second, or 1,000 simple queries per second returning one document each.
In order to see pricing details, change the region to "US West":
Very interesting addition to Microsoft offering. I was actually just yesterday wondering if they have any plans for this kind of service. Table Storage is quite primitive and Azure SQL on the other hand gets expensive when you have lots of data.
One potential "problem" with this is the bundling of storage capacity and processing power. If I understand this correctly, I would need to buy 10 CUs per month to store 100GB of data even if I'm not very actively using that data.
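Under that reading, the arithmetic works out as follows. The figures come from the preview numbers quoted above; the assumption that you must provision for whichever dimension (storage or throughput) demands more CUs is my interpretation, not confirmed billing behavior.

```python
import math

# Preview figures quoted above.
CU_STORAGE_GB = 10
CU_READS_PER_SEC = 2000
PRICE_PER_CU = 22.50  # USD/month, with the 50% preview discount

def capacity_units(storage_gb, reads_per_sec=0):
    # Assumption: you pay for whichever dimension demands more CUs.
    return max(math.ceil(storage_gb / CU_STORAGE_GB),
               math.ceil(reads_per_sec / CU_READS_PER_SEC),
               1)

# 100GB of mostly idle data still needs 10 CUs on the storage dimension.
cus = capacity_units(storage_gb=100, reads_per_sec=100)
print(cus, cus * PRICE_PER_CU)  # 10 CUs -> $225.00/month
```

If that's right, cold data is expensive relative to pure storage services, since you're paying for 20,000 reads/second of capacity you never use.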
How does that work? Isn't that going to incur a major performance hit? If not, why don't other databases get rid of indexes?
Also, if anyone from MS is reading, http://azure.microsoft.com/en-us/documentation/articles/docu... links to http://azure.microsoft.com/en-us/documentation/articles/docu... which is a 404 error.
I don't think it would be feasible to have "indexes for everything", as the numbers involved would scale geometrically.
So if I have three fields - a, b and c - and by hypothesis I want to map all possible queries in a naive way, I will need to build, at the very least, the following indexes:
[a, b, c]
[a, c, b]
[b, a, c]
[b, c, a]
[c, a, b]
[c, b, a]
all of which represent a different and valid way of querying my data. Anything less than this would leave some valid query uncovered by the indexes. That's 6 indexes, which is 3! (factorial). Add a fourth field and we have 24 indexes, and so on. Hence, geometric growth (to be fair, factorial growth is even faster than geometric).
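The enumeration above can be checked directly with `itertools.permutations`; note that four fields give 4! = 24 orderings, and the count keeps growing factorially from there.

```python
from itertools import permutations

# Enumerate every field ordering, i.e. every composite index needed to
# cover all possible queries under the naive mapping described above.
fields = ["a", "b", "c"]
indexes = list(permutations(fields))
for idx in indexes:
    print(list(idx))

print(len(indexes))                        # 6 == 3!
print(len(list(permutations("abcd"))))     # 24 == 4!
print(len(list(permutations("abcde"))))    # 120 == 5!
```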
Which is why I'm thinking that either (i) indexes get created automatically based on the most frequent / heavy queries, or (ii) indexing works differently for DocumentDB and they are actually able to map the document space in a more efficient way (but I'd say that we lack the technical details to jump at this conclusion, at the moment).
I wouldn't tie a product to a single cloud vendor.
However there is an overwhelming trend towards hosting key parts of your infrastructure including data storage. Managing a database is surprisingly difficult in particular at scale.
Not sure if your interpretation of his point was wilfully incorrect for some reason, but it was quite obvious what he meant.
I'm totally willing to believe that DocumentDB beats the pants off MongoDB on just about every axis (in fact, that seems pretty likely), but it's going to take some actual numbers and a better description of the internals.
All I can really say is that the replication model provides a significant performance boost over MongoDB in the multiple replica (i.e., production) scenario.
We were using MongoDB at Microsoft for a while (I left MS almost a year ago). I was developing a real-time metrics system with it. It was very unstable at our target load (500k increments per minute, high percentage of tomorrow's documents preallocated the day before). We only managed maybe 10% of that with MongoDB, IIRC. Sometimes it would choke and not come back until I restarted the cluster (~30 machines total, I believe. 3 replicas * 10 shards).
We were so sure that MongoDB should be able to handle this scenario, since they talk about it in their documentation. After talking with the MongoDB devs, we came to the conclusion that even though we were issuing increment operations on preallocated documents, MongoDB was:
a) using a global lock on the "local" db used for replication, and
b) "replicating via disk" instead of via the network. In other words, replication requires writing to the journal before other members of the replica set have a chance to apply the change and ack back. This results in a loss of concurrency.
The lack of async query support in the C# driver didn't help either.
Eventually we used a replicated, write-back cache which sits atop the framework DocDB uses. Not a fair comparison, but the goal was achieved easily with 1/3rd the hardware. We just backed it onto Azure Table Storage. Our queries were all range queries, which table storage supports.
I can't talk about the framework, unfortunately.
I wanted to avoid having to deploy and maintain a database system, so using table storage was a solid choice.
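The replicated write-back cache mentioned above can be sketched minimally like this. To be clear, everything here is illustrative; the actual framework isn't public (per the comment above), and this toy omits replication entirely. It only shows the core trade: increments hit memory immediately and are flushed to the backing store in batches.

```python
class WriteBackCounterCache:
    """Toy write-back cache for counters: buffer increments in memory,
    flush to the backing store in batches (trading durability for
    throughput, as in the metrics scenario described above)."""

    def __init__(self, backing_store, flush_threshold=100):
        self._store = backing_store   # e.g., a table-storage client; here a dict
        self._dirty = {}              # key -> pending delta
        self._threshold = flush_threshold

    def increment(self, key, delta=1):
        self._dirty[key] = self._dirty.get(key, 0) + delta
        if len(self._dirty) >= self._threshold:
            self.flush()

    def flush(self):
        for key, delta in self._dirty.items():
            self._store[key] = self._store.get(key, 0) + delta
        self._dirty.clear()

store = {}
cache = WriteBackCounterCache(store, flush_threshold=3)
for key in ["a", "a", "b", "c"]:   # third distinct dirty key triggers a flush
    cache.increment(key)
cache.flush()
print(store)  # {'a': 2, 'b': 1, 'c': 1}
```

The catch, of course, is that buffered increments are lost if the node dies before a flush, which is why the real system replicated the cache.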
I pray Microsoft is looking for Python developers: https://gist.github.com/whalesalad/2142f0075c6896f4547c
for i in range(len(path_parts) - 1, -1, -1):
if not path_parts[i] in resource_types and path_parts[i] in resource_tokens:
Have I missed something, or have MS delivered a novel and valuable feature? I'm not aware of support for transactions across documents in other NoSQL platforms. I'd be grateful if someone has any experience or better information in that regard, thanks.
Quite an amazing piece of technology, check how they test cluster failures, there is a blog post about it :)
What's the max duration of a database query, and the max size of a query result?
What kind of performance can be expected? Does it decrease as the size of the database increases, or does it remain constant?
I'm going to wait a few days until the hype settles.
Great find, though. Thanks.
Airport, Bootcamp, Safari, Time Machine, FaceTime, Grand Central Dispatch, QuickTime...
and I won't even start on the open source side of things...
"Oh cool, which document database is it? MongoDB? CouchDB? Cassandra?"
DocumentDB seems much more similar to MongoDB and appears to have very flexible query abilities. In my opinion, one of the best features of DynamoDB is that you can tune the number of reads/writes each individual table requires. This lets you scale your database up and down and greatly helps keep costs down. It's a feature that only a hosted database service can offer. I haven't yet read any pricing info on DocumentDB, but hopefully they offer something similar, as this is really where a hosted database service can shine.
Instead it's a key-value wide column store, with 1st and 2nd level index support.
I haven't delved into the MSFT pricing model, but DynamoDB is pay-for-throughput. You provision your table with a certain amount of guaranteed read/write performance, and you pay for that.
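The pay-for-throughput model reduces to simple arithmetic: you reserve capacity units and pay for the reservation whether you use it or not, so scaling a table down directly cuts the bill. The per-unit prices below are placeholders I made up for illustration, not real AWS rates.

```python
HOURS_PER_MONTH = 730
# NOTE: placeholder prices, not real AWS rates.
PRICE_PER_READ_UNIT_HOUR = 0.00013
PRICE_PER_WRITE_UNIT_HOUR = 0.00065

def monthly_cost(read_units, write_units):
    """Cost of a provisioned-throughput reservation held for a month."""
    return HOURS_PER_MONTH * (read_units * PRICE_PER_READ_UNIT_HOUR +
                              write_units * PRICE_PER_WRITE_UNIT_HOUR)

# Provisioning 100 read units and 25 write units around the clock:
print(round(monthly_cost(read_units=100, write_units=25), 2))
# Halving provisioned throughput at night halves that portion of the bill,
# which is the tuning knob praised above.
```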
Generating a list of all used tags:
r.table('posts')('tags').reduce((a, b) => a.setUnion(b))
r.table('posts').filter((doc) => doc('tags').contains('foo'))
return doc.city == "Melbourne" && doc.rating > 4.5;
Then the introduction needs some work.
"DocumentDB enables complex ad hoc queries using the SQL dialect"
"Azure DocumentDB offers the following key capabilities and benefits: Ad hoc queries with familiar SQL syntax"
The "Query DocumentDB" article also seems to be focused on that SQL dialect.
I would say it allows flexibility in thinking about your storage and access patterns, which in turn allows flexibility in use. Hopefully without sacrificing performance and integrity.
But there are cases where you need to build the query object programmatically (and dynamically), and I believe it's a bit awkward to do that to obtain an SQL statement. But possibly it's just my personal taste :)
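For the dynamic case, the usual pattern is to accumulate conditions in a query object and only render SQL at the very end. The sketch below is illustrative only: `Query`, `where`, and `to_sql` are invented names, not any real DocumentDB client API, and real code would use parameterized values rather than interpolating them.

```python
class Query:
    """Toy programmatic query builder that renders to a SQL-ish string."""

    def __init__(self, collection):
        self._collection = collection
        self._conditions = []

    def where(self, field, op, value):
        self._conditions.append((field, op, value))
        return self  # allow chaining

    def to_sql(self):
        sql = f"SELECT * FROM {self._collection} c"
        if self._conditions:
            clauses = [f"c.{f} {op} {v!r}" for f, op, v in self._conditions]
            sql += " WHERE " + " AND ".join(clauses)
        return sql

# Conditions can be appended at runtime, e.g. from user-supplied filters.
q = Query("posts").where("city", "=", "Melbourne").where("rating", ">", 4.5)
print(q.to_sql())
# SELECT * FROM posts c WHERE c.city = 'Melbourne' AND c.rating > 4.5
```

With a builder like this, adding or removing a filter is a list operation rather than string surgery, which is the awkwardness being described.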
I remember a while back doing work on a system that (ab)used MS Exchange Server 5 as a database, mostly because of the Outlook integration.
But it will be very interesting to compare features, Raven really has a lot of advanced features.
Not only that, there are better products that are free.
My thinking is, scale-out deployments aren't all that likely to pay at SQL-Server-like rates per CPU anyway, and you're helping give the Windows world better parity with the horde of Linux options for folks that need scale-out (or, just as commonly, hope to need it someday).
On the flipside, by doing that you might forego some sales of SQL Server (though I suspect that's limited; most folks that need SQL Server really need SQL Server) or sales on Azure that could theoretically help recover the dev costs. But greatly improving the scale-out-on-Windows story seems like a big deal, the kind of thing that might justify going to lots of effort to make a DB product then giving it away.
Yes it is a simple question, and the answer is NO.
No thanks. I think this will do just fine.
And looking at their diagram, I can already see where they could make this strategy work. While their collection/document/attachment structure is quite straightforward to move to another document DB, their UDFs, sprocs and triggers, I bet, will be Microsoft-specific. And Microsoft will do everything possible to "optimize data fetching by running processing code close to the data" and lure developers into using these, in the end locking the software solution to the Microsoft platform.
I've been working with MongoDB for the last year and DocumentDB features look very interesting to me.
Everybody else uses open source on premises or their cloud of choice.