
Building a Full-Text Search App Using Docker and Elasticsearch - quotable_cow
https://blog.patricktriest.com/text-search-docker-elasticsearch/
======
jcadam
The sync issues between an RDBMS and ElasticSearch are definitely the most
annoying aspect of getting something like this working.

In my current project, I've got an 'indexer' service whose sole job is to
await messages from RabbitMQ and create, update, or delete the corresponding
records in the index.
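
A rough sketch of the translation step in an indexer like this, under stated assumptions: the message shape (`op`, `id`, `doc`) and the index name `records` are hypothetical and not taken from the comment. It only builds the newline-delimited body that Elasticsearch's `_bulk` endpoint accepts; the RabbitMQ consumer and the HTTP call are left out.

```python
import json

INDEX_NAME = "records"  # hypothetical index name


def message_to_bulk_lines(raw_message: str, index: str = INDEX_NAME) -> list[str]:
    """Translate one queue message into lines for the Elasticsearch _bulk API."""
    msg = json.loads(raw_message)
    op, doc_id = msg["op"], str(msg["id"])
    meta = {"_index": index, "_id": doc_id}

    if op == "delete":
        # A delete is a single action line with no document body.
        return [json.dumps({"delete": meta})]
    if op in ("create", "update"):
        # The "index" action overwrites any document with the same _id,
        # so replaying the same message leaves the index unchanged.
        return [json.dumps({"index": meta}), json.dumps(msg["doc"])]
    raise ValueError(f"unknown operation: {op!r}")


def build_bulk_body(raw_messages: list[str]) -> str:
    """Join the action lines for a batch of messages into one _bulk body."""
    lines: list[str] = []
    for raw in raw_messages:
        lines.extend(message_to_bulk_lines(raw))
    # The _bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Mapping both create and update to the `index` action is a design choice here: it keeps the handler idempotent, which matters if the queue redelivers a message.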

But sometimes an update may get missed for any number of reasons. So I've got
a systemd timer that fires off a little script to go through my audit tables
in Postgres and make sure all changes since the last run were actually synced
to ElasticSearch.
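
The catch-up check described above could be sketched roughly as follows. This is an illustration only: the `AuditRow` fields and the `indexed_at` mapping are hypothetical, and fetching them from Postgres and ElasticSearch is left out. The function compares the latest audited change per record against what the index reflects and reports what still needs to be applied.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditRow:
    """One hypothetical audit-table entry."""
    record_id: int
    op: str              # "insert", "update", or "delete"
    changed_at: datetime


def find_unsynced(audit_rows, indexed_at):
    """Return {record_id: action} for records whose indexed copy is stale.

    audit_rows: audit entries since the last run, in any order.
    indexed_at: {record_id: timestamp the indexed copy reflects}; a record
                missing from this dict is assumed absent from the index.
    """
    # Only the most recent change per record matters.
    latest = {}
    for row in audit_rows:
        current = latest.get(row.record_id)
        if current is None or row.changed_at > current.changed_at:
            latest[row.record_id] = row

    actions = {}
    for record_id, row in latest.items():
        seen = indexed_at.get(record_id)
        if row.op == "delete":
            # Deleted in the database but still present in the index.
            if seen is not None:
                actions[record_id] = "delete"
        elif seen is None or seen < row.changed_at:
            # Missing from the index, or indexed before the latest change.
            actions[record_id] = "index"
    return actions
```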

And I'm still wondering if what I'm doing is 'good enough.' How else could
this thing fail?

~~~
amelius
What if your server crashes after you commit to the database, but before you
submit to Elasticsearch? Even if submitting is flawless, this still presents
a problem.

~~~
jcadam
Depends on which server :D The Database, ElasticSearch, RabbitMQ, and the
application services are on different VMs (in my case, Linodes).

Committing to the database will trigger an entry (via a trigger function) in
an audit table. So there is still a record of the transaction. After the
system is restored to a healthy state, the next time the sync script is run
(either automatically or manually if I'm feeling anxious), it _should_ catch
up.

Of course, in a worst-case scenario, I could always rebuild the index :O.

------
sailfast
This is a great, really comprehensive walk-through!

For folks who want to work with a bit more than books, there are a bunch of
examples on Elastic's GitHub profile. I've found these useful for testing
queries that pull back visualizations, etc.:
[https://github.com/elastic/examples/](https://github.com/elastic/examples/)

If you want to test query results without building a whole front-end / API,
you can also use Sense / Kibana to query directly via the query language, or
enable CORS and use something like Postman.
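
For a quick check from a script instead, a minimal sketch using only the Python standard library is below. The URL assumes a local single-node instance on the default port, and the index and field names passed in are hypothetical. It builds a `match` query against the `_search` endpoint; sending the request is kept in a separate function so the request can be inspected first.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local instance on the default port


def build_search_request(index: str, field: str, text: str,
                         size: int = 10) -> urllib.request.Request:
    """Build a full-text `match` query against <index>/_search."""
    body = {"size": size, "query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `run_search(build_search_request("library", "text", "white whale"))` would return the raw response, with matching documents under `hits.hits`.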

If you don't want to use Node, I'd also recommend taking a look at their
Python client.

------
marknadal
This is epic! The COO at Docker is one of my advisors, and some customers of
ours are doing really neat open source ElasticSearch integrations. This one
even comes with a dashboard for dataviz and querying the FTS system:
[https://github.com/lmangani/gun-elastic/blob/master/README.M...](https://github.com/lmangani/gun-elastic/blob/master/README.MD#screenshots)

One of the other comments asked how to keep an RDBMS synced with
ElasticSearch. That is also what the author of the project linked above is
working on; I know they already have it syncing with Cassandra. I'm sure
Postgres is just a plug-and-play away.

------
amelius
When I run Elasticsearch out of the box, I've noticed that search is quite
slow, on the order of 5 seconds per search over, say, 10,000 PDF documents
(papers) on a single node. Is this normal? And is there an obvious place to
look, or a quick fix that doesn't require me to delve into the ES details
too much (as I'm not a Java person)?

~~~
rpedela
Sounds slow, but it depends on the total size of the index. I could see 10K
PDFs being big enough to slow things down. The easiest thing to try is setting
the Java heap to half the machine's RAM, up to a maximum of about 32GB. More
RAM is better in general. I have also found that setting the number of shards
to the number of CPU cores is optimal in many cases; however, if a shard gets
too big (>50GB), it can start causing other problems. Again, more CPU cores
are better in general.
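
As a rough illustration of those two rules of thumb, the sketch below computes a heap size and an index-settings body. The helper names are hypothetical, and the default cap of 31GB is an assumption chosen to stay slightly under the 32GB mark mentioned above. `-Xms`/`-Xmx` are standard JVM flags that would go in `jvm.options` or `ES_JAVA_OPTS`, and `number_of_shards` has to be supplied when the index is created.

```python
import os


def suggest_heap_gb(total_ram_gb: float, cap_gb: int = 31) -> int:
    """Half of RAM, capped (assumed cap: 31GB), and never below 1GB."""
    return max(1, min(int(total_ram_gb // 2), cap_gb))


def jvm_heap_flags(total_ram_gb: float) -> list[str]:
    """Min and max heap set to the same value, as JVM command-line flags."""
    heap = suggest_heap_gb(total_ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


def index_settings(cpu_cores=None, replicas: int = 0) -> dict:
    """Create-index body with one primary shard per CPU core.

    replicas=0 is an assumption for a single-node setup, where replica
    shards could not be allocated anyway.
    """
    shards = cpu_cores or os.cpu_count() or 1
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}
```

On a 16GB machine with 4 cores this gives `-Xms8g -Xmx8g` and 4 primary shards; whether that actually helps depends on the index size and query mix.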

Here are the relevant docs, and don't worry you don't need to know Java to get
good performance.

indexing:
[https://www.elastic.co/guide/en/elasticsearch/reference/curr...](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html)

searching:
[https://www.elastic.co/guide/en/elasticsearch/reference/curr...](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html)

disk usage:
[https://www.elastic.co/guide/en/elasticsearch/reference/curr...](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html)

~~~
amelius
Thanks! I'll have a look at that!

------
hux_
This is a great overview. Some numbers from step 4 (size of the data set, time
to load the index, and disk space used by that index plus ES) would be
interesting to see. I have a feeling that for small datasets, FTS in Postgres,
Mongo, or even SQLite would perform better.

------
dqoo
Great tutorial

------
jarl-ragnar
It's really nice to see a full-stack example like this that builds out a
useful application while demonstrating how to utilise a broad spectrum of
technologies. Good job.

