
Amazon CloudSearch - Start Searching in One Hour for Less Than $100 / Month - jeffbarr
http://aws.typepad.com/aws/2012/04/amazon-cloudsearch-start-searching-in-one-hour.html
======
jtchang
You know, this doesn't seem like a bad deal, though $100/mo might be high for
someone just starting out. Right now my options for search are:

    
    
      Full text SQL search
      Apache Solr or something similar
      Google Search Appliance
      Custom search
      Google free search on your site
    

Yay for search as a service.

~~~
rb2k_
And if you want a hassle-free, scalable search service with automatic
sharding/scaling, a Lucene underpinning and a nice REST API:

Elasticsearch

~~~
Mikushi
And if you want an actually good search product there is always SphinxSearch.

~~~
lobster_johnson
Last I checked, Sphinx had a huge design flaw in that it indexed directly from
an SQL database. In other words, your Sphinx configuration not only needs to
have read access to the database, it needs to contain the required SQL
queries.

This tightly couples Sphinx to your application and your schema, and creates
serious issues for your ops team since every app change potentially needs to
modify the Sphinx config. It gets particularly hairy when you want to host
multiple applications using a single Sphinx daemon.

We started out with Sphinx for our apps but quickly discarded it in favour of
ElasticSearch, a much more elegant and orthogonal piece of software.

~~~
paulsmith
That's not true, you can pipe in data from any source:

<http://sphinxsearch.com/docs/2.0.4/xmlpipe2.html>
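
For illustration, here is a minimal sketch of a script that feeds Sphinx
through xmlpipe2 without Sphinx ever touching the database. The element names
(`sphinx:docset`, `sphinx:schema`, `sphinx:field`, `sphinx:document`) are the
ones from the linked docs; the `to_xmlpipe2` helper, the field names and the
sample record are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xmlpipe2(records, fields):
    """Render records as an xmlpipe2 document stream.

    `records` is an iterable of dicts with an integer "id" key; `fields`
    lists the full-text fields to declare in the schema.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    for name in fields:
        lines.append(f"<sphinx:field name={quoteattr(name)}/>")
    lines.append("</sphinx:schema>")
    for rec in records:
        lines.append(f'<sphinx:document id="{int(rec["id"])}">')
        for name in fields:
            # Escape &, < and > so arbitrary text stays well-formed.
            lines.append(f"<{name}>{escape(str(rec.get(name, '')))}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data; a real script could read from any source.
    sample = [{"id": 1, "title": "Hello & welcome", "body": "First post"}]
    print(to_xmlpipe2(sample, ["title", "body"]))
```

The app owns this script, so schema changes stay on the app side; the Sphinx
config would only need a source of `type = xmlpipe2` whose `xmlpipe_command`
runs it.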

------
staunch
Feeling for the guys who started IndexTank replacements and other Search-aaS
companies. Infrastructure is a poor place to be with AWS around. Just a matter
of time until they offer _every_ low level service.

~~~
diego
They can have one advantage over AWS: customer service. That's what made
IndexTank; we would never have made it as user-friendly without our close
relationship with our users.

~~~
cannuk
I can definitely attest to IndexTank's fantastic customer service making a
difference. We used the service for our startup. I remember dealing with
Ignacio. Not only did he take the time to help us when we were using the free
service, he took the time to look at our site (and give us a much needed ego
boost by calling it a good idea). He took the time to understand the idea and
figure out just what we needed. That is not going to happen with AWS...

------
fellowshipofone
I am super excited about this announcement, but for now CloudSearch seems to
support only 3 types of index fields
([http://docs.amazonwebservices.com/cloudsearch/latest/develop...](http://docs.amazonwebservices.com/cloudsearch/latest/developerguide/configureindexfields.html)),
so probably no geo-search yet. That and other filters would be welcome in
upcoming releases.

------
cpg
A fun, theoretical exercise for the reader: how much would it cost in AWS fees
to index the same amount as Google and make it available to search?

The answer is in two parts: 1) the "fixed" cost to upload the data (say in one
shot) and 2) the hourly/daily/monthly cost to keep these search instances
running.

------
alexro
IndexTank had one appealing option: it let you change associated statistics,
like the number of votes, without re-indexing the document. It also let you
apply ranking functions dynamically.

Also, other solutions allow indexing in different languages; I can't find this
in CloudSearch.

~~~
codexon
How can you change statistics like number of votes without reindexing?

I don't see how that is possible unless that field wasn't indexed.

~~~
mthreat
This would be stored in a numeric value. IndexTank (and Searchify which runs
IndexTank) stores numeric values in a normal array in RAM. So an update is
essentially equivalent to:

array[index] = updatedValue;

And then you can sort or filter by these numeric values, as well as the usual
text searching. More info in the docs here:
[http://www.searchify.com/documentation/python-
client#documen...](http://www.searchify.com/documentation/python-
client#document-variables)

Note that if you update a text field, it does a normal reindex (although it's
true real-time in IndexTank).
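
To make that concrete, here is a toy sketch of the idea, not IndexTank's
actual code: the text lives in an inverted index, while the numeric variable
sits in a plain list indexed by internal document number. The `TinyIndex`
class and all its names are made up for illustration.

```python
class TinyIndex:
    """Toy index: postings for text, a flat array for one numeric variable."""

    def __init__(self):
        self.postings = {}   # term -> set of internal doc numbers
        self.variables = []  # internal doc number -> numeric value (e.g. votes)
        self.doc_ids = []    # internal doc number -> external id
        self.doc_nums = {}   # external id -> internal doc number

    def add(self, doc_id, text, value=0.0):
        num = len(self.doc_ids)
        self.doc_ids.append(doc_id)
        self.doc_nums[doc_id] = num
        self.variables.append(value)
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(num)

    def update_variable(self, doc_id, value):
        # A single array write; the postings are not touched at all.
        self.variables[self.doc_nums[doc_id]] = value

    def search(self, term):
        # Match on text, then sort by the current value of the variable.
        hits = self.postings.get(term.lower(), set())
        ranked = sorted(hits, key=lambda n: self.variables[n], reverse=True)
        return [self.doc_ids[n] for n in ranked]


if __name__ == "__main__":
    idx = TinyIndex()
    idx.add("a", "cloud search", value=1)
    idx.add("b", "cloud storage", value=5)
    print(idx.search("cloud"))
    idx.update_variable("a", 10)  # e.g. new vote count, no reindex
    print(idx.search("cloud"))
```

The vote count is never tokenized into the postings, which is why changing it
does not require touching the text index.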

------
matan_a
It's nice to see Amazon entering this space as they do have the expertise to
keep this up and running as well as provide the scale needed as apps grow.

What I always wanted is the ability to run search queries that also involve
dynamic grouping (like grouping by random combinations of facets) and
providing those aggregated results.

The only thing I've seen that can do this "on the fly" is SenseiDB.
CloudSearch/Solr/etc. seem to need preprocessing to get this right.

------
cbg0
It would be interesting to see how this performs, compared to running Lucene,
Solr, Sphinx, or what-have-you on an EC2 instance with equal resources.

~~~
huoju
Lucene and Solr eat too much memory; they are not a good fit for VPS users.
CloudSearch is cheaper than setting up a dedicated server for search.

~~~
Ecio78
It seems that the latest Solr release, 3.5.0, uses less memory: _Bug fixes and
improvements from Apache Lucene 3.5.0, including a very substantial (3-5X) RAM
reduction required to hold the terms index on opening an IndexReader_
<http://lucene.apache.org/solr/solrnews.html>

------
edouard1234567
They claim to support realtime indexing. I wonder what they really mean by
that and how it impacts performance. Solr is a great piece of software. I
started using it 5 years ago and more recently implemented a near-realtime
indexing (publishing) integration, but I always had to make some kind of
trade-off between high performance and near "realtime indexing".

~~~
stdbrouw
It says "near-real time", which is really just a synonym for "fast".

------
edbloom
Very interesting. I have a project that I'm about to start which is going to
have an index of 4 million plus data records where I need high performance
faceted search. I was exploring using Solr but may now give this a look as I'm
planning on putting the app on EC2. Are there any technical details/tutorials
on how their facets are configured?

~~~
mthreat
Hope you'll excuse the shameless plug, but check out Searchify's hosted
search. It covers the requirements you mention - easy-to-use, fast faceting.
<http://www.searchify.com/documentation/tutorial-faceting>

------
intended
Looks like some people would want help in converting document X into "JSON
or XML that conforms to our Search Document Format (SDF)".

This is still going to be a painful task for legacy data: it has to be
massaged into shape. Should be interesting to see how this gets applied
though.

~~~
spullara
Why in the world would transforming your structured data into XML or JSON be a
painful task? Essentially every application built is massaging data from one
format to another, this being one of the more straightforward
transformations. Is building an RSS feed hard? Seems about the same level of
difficulty.
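
As a rough sketch of how small that transformation can be: the envelope keys
below (`type`, `id`, `version`, `lang`, `fields`) are how I recall the JSON
flavour of SDF from the developer guide, so treat them as assumptions and
check the docs; the `to_sdf_batch` helper and the sample row are hypothetical.

```python
import json


def to_sdf_batch(rows, version, lang="en"):
    """Turn dict-like rows (e.g. from a DB cursor) into one JSON batch.

    Every row needs an "id" key; all other non-null columns become fields.
    """
    batch = []
    for row in rows:
        fields = {k: v for k, v in row.items() if k != "id" and v is not None}
        batch.append({
            "type": "add",          # assumed SDF operation name
            "id": str(row["id"]),
            "version": version,     # assumed: a number that grows per update
            "lang": lang,
            "fields": fields,
        })
    return json.dumps(batch)


if __name__ == "__main__":
    # Hypothetical legacy row.
    rows = [{"id": 42, "title": "Old report", "body": "Legacy text", "notes": None}]
    print(to_sdf_batch(rows, version=1))
```

The painful part with legacy data is usually cleaning the values themselves,
not wrapping them in this envelope.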

------
sdfjkl
Having added my own "site search" for an existing project using django-
haystack + Xapian, I was surprised how easy the whole implementation was
(spent about a day on it, plus a bit of tweaking). Of course that was a
comparatively tiny index.

------
mfringel
This sounds like the way you get to creating a Web Commerce Server-as-a-
Service. Pricing and Inventory goes into the cloud, and you get all the
standard retail searchability "for free" i.e. dis-contiguous price searches,
etc.

------
uptown
Can anyone recommend the best book or books to learn about the Amazon cloud
stuff? I do better with books than online docs, and am coming from a VPS
background.

------
marathe
Quick write-up here: [http://webdev360.com/amazon-finally-moves-into-search-
with-n...](http://webdev360.com/amazon-finally-moves-into-search-with-new-aws-
tool-41966.html)

Interesting to note that Pando Daily reported this as a rumour almost three
months ago (though they got the announcement date drastically wrong):
[http://pandodaily.com/2012/01/17/good-news-for-
ec2-customers...](http://pandodaily.com/2012/01/17/good-news-for-
ec2-customers-amazon-may-launch-new-cloud-search-tomorrow/)

------
zargath
Interesting, I will check it out for sure. Any plans for an EU release?

I am very interested in the facet-search functionality. Does anybody know if
there are sort options on the returned facets? Most search engines just sort
facets by number of hits.

~~~
edouard1234567
You can sort facets in your application. I'm not sure why you'd want the
search engine to sort them; it doesn't seem like a feature that belongs in
the "back-end" search engine.

~~~
jahewson
If a query returns a million results, it makes much more sense to sort them
on the search server and return the top 10 than to transfer them all to the
application server and sort them there. Another use case is a custom ranking
function which boosts the score of newer documents.
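
A small sketch of both points, assuming hits arrive as `(doc_id, relevance,
age_days)` tuples: the server keeps only the best k with a bounded heap, and
the score is decayed by document age. The half-life formula and all names here
are illustrative, not any engine's actual ranking function.

```python
import heapq
import math


def boosted_score(relevance, age_days, half_life_days=30.0):
    # Halve the relevance for every `half_life_days` of document age.
    return relevance * math.pow(0.5, age_days / half_life_days)


def top_k(hits, k=10):
    """Return the k best hits without fully sorting the whole result set.

    `hits` is an iterable of (doc_id, relevance, age_days) tuples.
    """
    return heapq.nlargest(k, hits, key=lambda h: boosted_score(h[1], h[2]))


if __name__ == "__main__":
    hits = [("old", 1.0, 60), ("new", 0.5, 0), ("mid", 0.9, 30)]
    for doc_id, relevance, age_days in top_k(hits, k=2):
        print(doc_id, boosted_score(relevance, age_days))
```

Only the k winners cross the wire, whatever the size of the full result set.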

------
joshu
Anyone know what it is underneath the covers? Elasticsearch?

~~~
mryan
It says it is based on the same search that powers Amazon.com, so presumably
it is A9: <http://a9.com/>

~~~
hboon
Just like they said AWS is the infrastructure that powers Amazon _when_ AWS
started out. I'd read this with a pinch of salt.

~~~
aphexairlines
Amazon never claimed this.

~~~
peyton
Here's the initial announcement of EC2: [http://aws.amazon.com/about-
aws/whats-new/2006/08/24/announc...](http://aws.amazon.com/about-aws/whats-
new/2006/08/24/announcing-amazon-elastic-compute-cloud-amazon-ec2---beta/)

> run on Amazon’s proven computing environment

Yes, Amazon never explicitly stated, "We run our site on a fleet of EC2
instances, and you should too," but they certainly weaseled a connection with
their main site that didn't exist.

~~~
StavrosK
This says that EC2 runs on Amazon, not the other way around.

~~~
peyton
What does running on Amazon mean (in 2006)? Remember, the company is
officially _Amazon.com, Inc._ , i.e. the ecommerce site turning tens of
billions in revenue.

Here's an excerpt from the Businessweek cover story[1] a few weeks after EC2's
launch:

> Amazon is starting to rent out just about everything it uses to run its own
> business, from rack space in its 10 million square feet of warehouses
> worldwide to spare computing capacity on its thousands of servers, data
> storage on its disk drives, and even some of the millions of lines of
> software code it has written to coordinate all that.

Weeks later Bezos discussed the $2 billion investment in _Amazon.com_ 's
infrastructure [2]. In effectively the same breath he mentioned 200,000
developers signed up for AWS. _This is deliberately deceptive._ Bezos linked
AWS with Amazon.com's infrastructure, when the two are totally separate.

Signs point towards EC2 coming about through the intransigence of a couple
people [3], not through a deliberate effort by top brass to rent out
Amazon.com's infrastructure. Implicating the main site was a tactic to inspire
confidence and provoke experimentation. It's unclear whether EC2 would have
found the same success had Amazon not papered over the inchoate AWS
architecture by invoking the Amazon brand.

I love AWS. Their blog is the only corporate blog I subscribe to, because
everything they post is so friggin' cool. Sentences like these...

> If you have ever searched Amazon.com, you've already used the technology
> that underlies CloudSearch.

...are weasely and unnecessary. Maybe CloudSearch is functionally identical to
Amazon.com search or maybe it isn't. Everyone understands there are tradeoffs
to be made. I just wish Amazon were more transparent about its architecture.

[1]:
[http://www.businessweek.com/magazine/content/06_46/b4009001....](http://www.businessweek.com/magazine/content/06_46/b4009001.htm)
[2]: <http://aws.typepad.com/aws/2006/09/we_build_muck_s.html> [3]:
[http://itknowledgeexchange.techtarget.com/cloud-
computing/am...](http://itknowledgeexchange.techtarget.com/cloud-
computing/amazons-early-efforts-at-cloud-computing-partly-accidental/)

~~~
nickm12
> If you have ever searched Amazon.com, you've already used the technology
> that underlies CloudSearch.

I don't see what is "weasely and unnecessary" about this. It's vague, sure,
because "technology" is a vague word. But it seems a bit of an overreaction to
believe this statement is actively trying to deceive. I read it as "we use
this technology to run the amazon.com website", implying that it is up to the
task of running your own web service. Time will tell if that actually proves
to be true.

------
brainless
Does this literally kill the search-as-a-service providers like IndexTank,
Unbxd, etc.? It becomes a really tough market with AWS offering the same
(~very similar) service.

~~~
tybris
Literally kill? I certainly hope not.

