I don't know if it qualifies as "big data", but to fetch, index, classify (metadata analysis) and rerank daily (6 hours) a set of more than 5 million medical publications, the stack is simple:

- Glassfish 3 (EJB, JPA, JSF)

- PostgreSQL 9

- Lucene (used as an index and as a NoSQL store)

- hardware: blazing fast SSD and 8GB+ RAM for PostgreSQL

I'm impressed by PostgreSQL, which handles tables containing more than 60 million entries without a fuss (well, I like tuning the DB conf and the ORM; it usually gives the best effort/result ratio in development :)

