
Using TLA+ for fun and profit in the development of ElasticSearch [video] - ngaut
https://www.youtube.com/watch?v=qYDcbcOVurc
======
jaytaylor
If you're already familiar with Elasticsearch, the interesting TLA+
presentation begins ~8 minutes in.

[https://www.youtube.com/watch?v=qYDcbcOVurc&t=479](https://www.youtube.com/watch?v=qYDcbcOVurc&t=479)

------
ignoramous
I frequently spent time mucking with the Elasticsearch codebase from 5.x
through 6.4, strictly from a durability and resiliency point of view [0]. The
improvements it is seeing in 7.x are tremendous. It still had the occasional
tendency to go haywire and show behaviour that simply didn't make sense and
couldn't be recovered from, but I'm confident 7.x will make it even more
resilient than 5.x+ [1].

The sheer number of configurable options for cluster bringup, index setup, and
pluggable components turns it into a very hard-to-tame beast. There is an
opportunity here somewhere for someone to make Elasticsearch on Rails
(convention over configuration). With 7.x, I feel the time is just about right
to try something like that.

What excites me most about 7.x is the overhaul of the underlying _distributed
engine_ that's been going on for over two years [2] and that will enable a
whole slew of super nice features [3], putting it on par with state-of-the-art
distributed databases in terms of not only features but also resiliency.

[0]
[https://www.elastic.co/guide/en/elasticsearch/resiliency/cur...](https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html)

[1]
[https://www.microsoft.com/en-us/research/publication/pacific...](https://www.microsoft.com/en-us/research/publication/pacifica-replication-in-log-based-distributed-storage-systems/)

[2]
[https://github.com/elastic/elasticsearch/issues/10708](https://github.com/elastic/elasticsearch/issues/10708)

[3]
[https://www.elastic.co/blog/elasticsearch-sequence-ids-6-0](https://www.elastic.co/blog/elasticsearch-sequence-ids-6-0)

------
hcnews
Can someone summarize what TLA+ does in 2-3 sentences? It keeps coming up
on Hacker News, but I haven't heard of it in any other context (and I have
worked at FAANG, so I should've heard of it if it was generally useful). Just
trying to figure out if it's niche or something I should know as a distributed
systems person.

~~~
gamegoblin
TLA+ allows you to encode your system as a state machine, define invariants
that must always be true, and the checker will explore your state machine and
verify that the invariants are always true.

An example would be encoding a distributed locking system, with an invariant
saying "no lock is owned by more than 1 node at a time". You would encode all
of the locking and unlocking behavior in your state machine, and then the
checker would verify it.
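
To make that concrete, here is a rough sketch in Python rather than TLA+. It
is not how TLC (the TLA+ model checker) is implemented, only an illustration
of the same idea: enumerate every reachable state of a small state machine and
test an invariant in each one. The protocol, the state names ("idle",
"checked", "holding"), and the `atomic_acquire` switch are all made up for this
example.

```python
from collections import deque

NODES = 2  # hypothetical cluster size, kept tiny so the state space stays small


def next_states(state, atomic_acquire):
    """Yield every state reachable from `state` in one step.

    `state` is a tuple with one entry per node: "idle", "checked" (the node
    saw the lock as free), or "holding".
    """
    for i, status in enumerate(state):
        others_holding = any(
            s == "holding" for j, s in enumerate(state) if j != i
        )

        def with_status(new_status):
            return state[:i] + (new_status,) + state[i + 1:]

        if status == "idle" and not others_holding:
            # Step 1 of acquiring: observe that the lock looks free.
            yield with_status("checked")
        elif status == "checked":
            if atomic_acquire and others_holding:
                # Fixed design: re-check at acquire time and back off.
                yield with_status("idle")
            else:
                # Buggy design when atomic_acquire is False: take the lock
                # based on the earlier, possibly stale, observation.
                yield with_status("holding")
        elif status == "holding":
            yield with_status("idle")  # release the lock


def mutual_exclusion(state):
    """Invariant: no lock is owned by more than one node at a time."""
    return sum(1 for s in state if s == "holding") <= 1


def check(atomic_acquire):
    """Breadth-first search over all reachable states.

    Returns None if the invariant holds everywhere, otherwise the shortest
    sequence of states leading to a violation.
    """
    init = ("idle",) * NODES
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not mutual_exclusion(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state, atomic_acquire):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

Calling `check(atomic_acquire=False)` returns a counterexample trace in which
both nodes observe the lock as free before either takes it, while
`check(atomic_acquire=True)` finds no violation. A real TLA+ spec would express
the same transitions and invariant declaratively and let the checker do the
search.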

~~~
jsjolen
So to ensure that your system follows this state machine, you need to write a
proof for that in some other theory (like an abstract interpretation of your
program)?

~~~
dwohnitmok
In theory (no pun intended) yes.

In practice this is not done. Perhaps never done. I know of only one toy
example that you could maybe extend to do this. TLA+ is meant to double-check
design-level questions and intuitions rather than code-level or
implementation-level ones. That is, it attempts to give guidance on whether a
design is flawed rather than whether an implementation successfully follows a
specified design.

This may sound limiting, but the fact that this is an artifact completely
separate from code makes TLA+ much more attractive from a business risk
perspective in my opinion.

------
pron
For other reports/examples, visit
[https://old.reddit.com/r/tlaplus/](https://old.reddit.com/r/tlaplus/)

