Hacker News new | comments | ask | show | jobs | submit login

A few of these points are just ridiculous:

1) If you want "SQL and Joins", use SQL. This is like complaining that you can't play Halo on Linux.

1A) There isn't full text search. If you need full text search, use a system with full text search as a feature.

2) Some of the points are out of date (or will be out of date soon). The 30 second limit for cron jobs will be 10 minutes after the next release. As noted, the 1000 results per query limit is gone already.

3) Anything can fail. If you assume your own system won't fail, you're going to be in worse shape later.

4) What objects would you cache that are >1MB anyhow? In almost any case, you'd be better off caching it as multiple, smaller objects.




I've gotten full text search working using techniques similar to those described here:

http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-A...


This type of full-text search works on a small dataset or a dataset which doesn't have a wide variety of data. It uses the internal merge-join functionality of AppEngine, which as the name suggests takes two queries and joins them together. Problem is the the merge join has a very small time limit. I forget the exact timeout (it's undocumented), but it's something 500 milliseconds.

The problem we ran into with the merge join functionality was the following:

Let's say you're searching for "lcd monitor", your code could do a search for lcd and monitor then merge the result (select * from ngrams where ngram in ['lcd', 'monitor']). There are many lcd monitors so the merge join will find 1000 results very quickly.

Let's say you search for "dell monitor". Unlike the previous search, there aren't many dell monitors but there are lots of dell products and lots of monitors. Your merge join will timeout because there isn't enough time to perform a query for dell and another for monitor then merge the results because of the internal merge-join limitations.

Also, it was VERY expensive to index every document (our data is in a constant flux) so we decided to use a different solution.


Not sure if this will solve your problem, but have you looked at GAELucene? http://code.google.com/p/gaelucene/


I find it amusing that Google doesn't have a full-text search engine. I seem to remember them having code to do that...somewhere. Hmm.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: