

CircleCI DB performance issue post-mortem - misframer
http://status.circleci.com/incidents/hr0mm9xmm3x6

======
numlocked
Thanks for the detailed write-up, guys. Doesn't sound particularly fun. We're a
customer and were somewhat inconvenienced by the downtime, but overall we've
been thrilled with your service. Keep up the great work.

I'd love to hear how you identified the code that changed while live-patching
the Clojure app in order to integrate it into your repo later on. Is there a
way to automatically identify the diffs between your codebase and what's
running in production? Or did you just review repl logs to see what had
changed? Or do you have in-house tooling that allows you to do that?

~~~
arohner
I have no knowledge about what Circle did during the downtime, but I can talk
as a Clojure expert.

> Is there a way to automatically identify the diffs between your codebase and
> what's running in production?

In general, no, because Clojure functions aren't serializable, so it's not
possible to get the source for a running function in production. There are
hacks [1] and approximations [2], but I'm not aware of anyone running them in
production.

> Or did you just review repl logs to see what had changed?

Logging the REPL is possible, but it is far easier to do client side than
server side, meaning every dev would have to have it installed, and it's easy
to "circumvent" by e.g. SSHing into the production box and running a REPL in
the SSH terminal. (I don't mean circumvent negatively, just that 'install this
tool on every dev's IDE and use it at all times' is a somewhat fragile process
during a firefight).

Also, the last time we investigated REPL-logging, it was too verbose to be
convenient. Most nREPL tooling evals a significant number of expressions to
e.g. grab docstrings and function arities, so it's hard to isolate the actual
changes made. The difference of a single keystroke is also the difference
between "eval this function" and "eval this entire file". If the developer
evaled the file, you'd then need to diff the string buffer that went over the
wire with the committed file, rather than just saying "oh, Alice eval'd this
one fn".
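
As a rough sketch of that diffing step (in Python rather than Clojure, and with
made-up file contents and a hypothetical `changed_lines` helper; this is not
anything Circle is known to run), you could compare the captured buffer to the
committed source like so:

```python
import difflib


def changed_lines(committed_src: str, evaled_src: str, name: str = "core.clj") -> list[str]:
    """Return unified-diff lines between the committed file and the evaled buffer.

    `name` is only used to label the diff headers; it is a placeholder.
    """
    return list(
        difflib.unified_diff(
            committed_src.splitlines(),
            evaled_src.splitlines(),
            fromfile=f"committed/{name}",
            tofile=f"evaled/{name}",
            lineterm="",
        )
    )


# Hypothetical contents: what is in the repo vs. what went over the wire.
committed = "(ns app.core)\n(defn timeout-ms [] 5000)\n"
evaled = "(ns app.core)\n(defn timeout-ms [] 30000)\n"

for line in changed_lines(committed, evaled):
    print(line)
```

This only tells you which lines differ, not which function was intentionally
patched, so someone still has to read the diff.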

If you have live-patched code that you want to keep, the simplest solution is
to just push it to production as part of your normal deploy process and reboot
all servers, so they're guaranteed to match the committed code.

[1] https://github.com/technomancy/serializable-fn
[2] https://github.com/gtrak/no.disassemble

~~~
hga
There's no good nREPL tool that keeps a transcript of what you did??? Yuck.

While I grant that I'm an old-school Lisper (e.g. Maclisp), when I did some
work in Clojure a while ago, none of the nREPL tools looked compelling, and I
did this sort of thing in an Emacs shell buffer associated with a file,
talking to a plain REPL.

Although, yeah, evaling a file isn't well captured by this approach.

