1) If you want "SQL and Joins", use SQL. This is like complaining that you can't play Halo on Linux.
1A) There isn't full text search. If you need full text search, use a system with full text search as a feature.
2) Some of the points are out of date (or will be out of date soon). The 30 second limit for cron jobs will be 10 minutes after the next release. As noted, the 1000 results per query limit is gone already.
3) Anything can fail. If you assume your own system won't fail, you're going to be in worse shape later.
4) What objects would you cache that are >1MB anyhow? In almost any case, you'd be better off caching it as multiple, smaller objects.
The problem we ran into with the merge join functionality was the following:
Let's say you're searching for "lcd monitor", your code could do a search for lcd and monitor then merge the result (select * from ngrams where ngram in ['lcd', 'monitor']). There are many lcd monitors so the merge join will find 1000 results very quickly.
Let's say you search for "dell monitor". Unlike the previous search, there aren't many dell monitors but there are lots of dell products and lots of monitors. Your merge join will timeout because there isn't enough time to perform a query for dell and another for monitor then merge the results because of the internal merge-join limitations.
Also, it was VERY expensive to index every document (our data is in a constant flux) so we decided to use a different solution.