

Azure Search as a Service - shliachtx
http://azure.microsoft.com/en-us/services/search/

======
ChuckMcM
So using their numbers (max 36 'units', 15 qps per unit, 25GB), that is a max
search corpus of 540M documents or 900GB, 540qps, and $4,500 (approx) per month
(which is a discount of 50%, so $9K/month when full up). Does anyone know if the
36 host limit arises out of the requirement that all units are in the same
rack?

I'm wondering what the target market for that is?

~~~
liamca
Hi ChuckMcM,

I'm a Program Manager on the Azure Search team. I am going to correct your
numbers a bit. Even though you can have a maximum of 36 search units,
the number of partitions you can create (currently) is 12. Partitions, by the
way, are what you increase to raise the number of documents you can store.
With this limit of 12 partitions, the maximum size of an index is actually
180M documents or 300 GB (not 900 GB as you stated). So far, we have found
that the vast majority of customers we have been working with fit well below
these limits, and in fact most of them fit into the 1 partition
(15M documents / 25 GB) range.

For the very few customers we have talked to who need more than this,
we can actually allocate a much larger system that has much
higher limits. We have an azuresearch_contact email address on the pricing
page ([http://azure.microsoft.com/en-us/pricing/details/search/](http://azure.microsoft.com/en-us/pricing/details/search/)) with more details if you need this.

To your other question about racks and search units: you can think of a search
unit as a dedicated Azure VM for your usage. Each additional search unit
you create is an additional VM for your use. Each VM has a certain amount of
capacity that it can handle. If your needs grow beyond what you can get with a
single search unit, you can move the dial up, whether by increasing the
replica count to add more QPS / high availability or by increasing
partitions to add more documents / faster data ingestion. The number of
search units you have is replicas x partitions, and
each search unit (during public preview) is $125 US / month. By the way, a
single replica can handle about 15 QPS, which for most customers is more than
enough. But even with this, the ability to scale up and down is pretty
important to a lot of people. Imagine Black Friday in the US, where a retailer
gets hammered with searches yet only wants to allocate increased replicas for
that day to handle the increased query load. There is a bit more information
on this here: [http://azure.microsoft.com/en-us/documentation/articles/sear...](http://azure.microsoft.com/en-us/documentation/articles/search-manage/#sub-6)

Hope that helps, Liam
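
The sizing rules described above can be sketched as a small calculation. This is only an illustration built from the figures quoted in this thread (15M documents / 25 GB per partition, roughly 15 QPS per replica, $125 per search unit per month, at most 12 partitions and 36 search units); the function name `plan` and its parameters are hypothetical and are not part of any Azure Search API, and real QPS depends heavily on the index and the queries.

```python
import math

# Figures quoted in the thread for the public preview. Treat them as
# assumptions, not guaranteed limits.
DOCS_PER_PARTITION = 15_000_000
GB_PER_PARTITION = 25
QPS_PER_REPLICA = 15          # approximate and workload dependent
USD_PER_UNIT_MONTH = 125
MAX_PARTITIONS = 12
MAX_SEARCH_UNITS = 36


def plan(documents, index_gb, target_qps):
    """Return (replicas, partitions, search_units, monthly_usd).

    Hypothetical helper: partitions are sized by document count and
    index size, replicas by query load, and search units are
    replicas x partitions.
    """
    partitions = max(
        1,
        math.ceil(documents / DOCS_PER_PARTITION),
        math.ceil(index_gb / GB_PER_PARTITION),
    )
    replicas = max(1, math.ceil(target_qps / QPS_PER_REPLICA))
    units = replicas * partitions
    if partitions > MAX_PARTITIONS or units > MAX_SEARCH_UNITS:
        raise ValueError("exceeds the preview limits quoted in the thread")
    return replicas, partitions, units, units * USD_PER_UNIT_MONTH
```

Under these assumptions, a one-partition index that has to absorb 60 QPS on a busy day would need 4 replicas, which is 4 search units, or $500 per month at the preview rate. How billing is prorated for a single day is not covered in the thread.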

~~~
ChuckMcM
It does help Liam, thanks. I'm coming at this from a web search perspective.
Checking our crawler we have about 16M documents from Wikipedia indexed, which
would presumably fit inside your single partition. The 'hot' crawl (things
that change with a frequency <= 7 days) is a lot bigger than that though :-)

I'm guessing your target market is folks that want to corral their documents?
(sort of like the Google appliance but in the cloud?) What is your privacy
policy on that? (lawyers, for example, have a lot of documents but rarely put
them in the cloud) And when you say 15 qps, what is the SLA? Is that
at the 50th percentile? 95th? 99th? I've noticed it seems to be hard to pin
down in Elastic Search.

~~~
liamca
ChuckMcM, you are absolutely right. Nailing down QPS rates is an incredibly
tough thing, not just for Azure Search but also for most search engines that I
am aware of. Things like the number of facets and the complexity of queries all
play a part in the QPS rate a search engine can serve. When we say ~15 QPS, we
try to point out that this is based on an average index of the ones that we
have seen from our typical customers. Certainly some customers may see way
more QPS on a single search unit and others will see less.

The main markets (or scenarios) that we target with Azure Search are eCommerce
Retail, User Generated Content sites (such as a recipe site or Hacker News)
and internal organizational apps. The interesting thing about internal
organizational apps is that we are seeing more and more users finding that
search is a natural way to navigate and explore their data. Users are
typically far more comfortable using search to explore their data, thanks
to engines like Google and Bing, than they are with, say, SQL.

We actually don't have an official SLA yet for this preview. One of
the goals of this public preview is to determine what we can
realistically promise for our v1 release.

Yes, privacy is a thing for sure. It is interesting that you say lawyers,
because we have had a number of companies in the law field that have wanted to
use Azure Search. Things like indexing of case documents are quite popular from
what I have seen. In many of these examples (and especially with healthcare),
privacy, or more specifically encryption at rest, as well as
compliance (such as HIPAA) often become critical. As of today we don't have
either. We don't have encryption at rest and we do not have HIPAA compliance
for Azure Search. Of course, this will be a goal and I guess we need to start
somewhere. Encryption as it relates to search is actually going to be a
really hard thing to do properly, so that will be an interesting thing for the
future.

By the way, Wikipedia is one of the datasets we often test with our service.
Feel free to ping me, as we have a loader for the Wikipedia dataset that I
could look into sharing with you if you would like to play with it and Azure
Search. My email address is my YCombinator username + microsoft.com.

Liam

------
arafalov
"Currently Azure Search does not offer configurable analysis modes" (source:
[http://msdn.microsoft.com/en-us/library/azure/dn798920.aspx](http://msdn.microsoft.com/en-us/library/azure/dn798920.aspx))

The configurable analysis is the cornerstone of the search functionality and
is a huge portion of Apache Lucene (and therefore both Solr and
ElasticSearch).

So, in my eyes, this offering has not outgrown the pure web-search domain yet.

(edit) Which is strange, because in another comment they do say they use
ElasticSearch under the covers. I even thought that the API interface looked
somewhat similar to ES.

~~~
liamca
arafalov,

The purpose of starting with this "Simple query syntax" was to try to keep
things as simple and straightforward as possible for both the developer and
the users of search. We have only exposed the query syntax that the
customers we have been working with so far have needed. I am sure there
will be more, and as you say, since the core is ElasticSearch, if the demand
for things such as configurable analysis is there, we can certainly look to
expose it in our API. For things like this it would be great if you could cast
a vote on our UserVoice page ([http://feedback.azure.com/forums/263029-azure-search](http://feedback.azure.com/forums/263029-azure-search)). By the way,
this is a great place to go to see the feedback and suggestions from customers
we have been working with so far.

Liam

------
tmarman
My main problem with this is that a standard instance is $125/mo for anything
beyond the free limits (10k documents, 50 MB). It would be great to see pricing
that followed, say, Azure Websites or SQL pricing... $20-40/mo for smaller
instances (smaller by either search volume or index size).

I mean, after all, if SQL Azure supported Full Text Indexes, this wouldn't be
critical either.

~~~
liamca
tmarman,

Agreed, the jump from $0 to $125 (preview pricing) is a large one,
and I can say we are definitely considering something in the middle ground. I
am curious, what types of features would you be willing to give up for a lower
price point? For example, one option might be for us to consider using smaller
VMs, but that would greatly reduce the document count (perhaps 1M docs max) and
QPM (perhaps max 1 QPS), and/or possibly limit the ability to support
high availability.

Do these sound like reasonable things to give up for this lower price point?
Do you have alternate ideas?

Liam

~~~
tmarman
From my perspective, I think limiting QPM is better than limiting the number
of documents that you can index. In my current case, I have millions of
"documents" (in a Lucene sense of documents) but relatively low usage.
Obviously, my goal long-term is to increase the usage. So being able to pay to
index a lot of documents but limit the resources (i.e., VM size, but not
storage size) to search would be the best way to scale.

In theory, the more users I have the more $ I have to scale the search.

The solution I'm currently using (mostly because SQL Azure doesn't support
Full-Text Search) is Lucene.NET (which is still on a very old version but
supposedly 4.8 is coming) and AzureDirectory (which leverages Blob storage).
It's clunky, but it works... at least for the scale I use it at currently. I
would love to be able to use Azure Search and scale it up again just like with
all my other services.

------
blutoot
This comes out at a time when I've finally decided to start digging into
ElasticSearch. Since it's a side project, I don't have to worry about scaling
and management issues. So just from the standpoint of functionalities, is
there any advantage of this Search as a Service over an ElasticSearch cluster
on Azure's Virtual Machines?

~~~
Maarten88
A colleague of mine also experimented with ElasticSearch on Azure. His server
got hacked and was shut down by Microsoft after they discovered it generated
large amounts of traffic and participated in DoS attacks. (There was a
vulnerability, and many ElasticSearch servers were hacked a few months back.)

So I'd argue that using Azure Search Service, being managed, will free you
from worries of having to manage and update yet another technology.

~~~
tatalegma
Was his server hacked due to the recent vulnerability found in ElasticSearch?

------
ryanburk
this looks fairly promising if the performance is there and you are already
using azure for your data since you push into an index versus crawl/pull.

looking at the tech specs [1] you can only search a single index at a time
which is pretty limiting. hopefully just a limit of the preview.

[1]
[http://msdn.microsoft.com/library/azure/dn798933.aspx](http://msdn.microsoft.com/library/azure/dn798933.aspx)

~~~
liamca
Hi ryanburk,

I am a Program Manager for the Azure Search team. I am glad to hear you think
our service looks promising. You are right about searching across indexes.
This is something we heard often from a number of customers we worked with
before today's announcement. There are often ways of working around this, for
example by merging content into a single index, but this is obviously not
workable for everyone. You are also correct that this is really just a
matter of time.

------
jsmeaton
Is this based on Elastic Search in the background?

~~~
liamca
Yes, the core of Azure Search leverages ElasticSearch. It does however have an
API layer on top of ElasticSearch.

