If I recall rightly, their example was a customer searching their old orders. The customers with the slowest response times were their very best customers, and they wanted those people to have at least as good an experience as the median customer.
That definitely changed my attitude to performance tests: now whenever I'm picking metrics, I think hard about the real-world implications of the levels I'm setting.
Here is a bit more info: http://codesith.blogspot.com/2012/06/tp99.html
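For what it's worth, here's a rough sketch in Python of what a TP99-style metric looks like when computed from raw response times (using the nearest-rank definition; the exact percentile convention varies by tool, and the sample numbers are made up):

    import math

    def tp(percentile, samples):
        # Nearest-rank percentile: the value that `percentile`% of samples fall at or below.
        ordered = sorted(samples)
        rank = math.ceil(percentile / 100 * len(ordered))
        return ordered[rank - 1]

    response_times_ms = [120, 95, 110, 2300, 105, 130, 98, 4100, 101, 115]
    print("median:", tp(50, response_times_ms))  # what the "typical" customer sees
    print("tp99:  ", tp(99, response_times_ms))  # what the slowest (often best) customers see

The parent anecdote is exactly about that gap: the median can look healthy while the tail is terrible.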
If I'm a human waking up at 3AM and I look at the logs for this server, the first thing I'm going to think is "is the time on this server the same as where I am?" followed by "how long ago did these events occur in relation to me?" The easiest way for any human to do this is to compare timezones. If you condition yourself to know the UTC difference of every time zone, then this works automatically, but [I'd argue] most humans are better at estimating timezone differences from other timezones, such as how California and New York are 3 hours apart, and New York and London are 5 hours apart.
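As a trivial illustration of the 3AM question, assuming the log line carries an explicit UTC offset (the timestamp below is made up), the "how long ago" and "what is that in my time" conversions are one-liners in Python:

    from datetime import datetime, timezone

    # Hypothetical log timestamp, recorded with an explicit UTC offset
    event = datetime.fromisoformat("2014-07-15T03:15:42+00:00")

    now = datetime.now(timezone.utc)
    print(f"happened {(now - event).total_seconds() / 60:.0f} minutes ago")
    print("in my local time:", event.astimezone())  # converts to this machine's zone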
In terms of correlating log events across the globe, you really, really want a log parser and correlating tool. They make your life easier and reduce time and complication during events. Splunk, Loggly, Graylog, Logstash, ELSA, etc. Don't look at your logs by hand. You'll be sitting there all day with 6 split windows looking at mysql, nginx, app logs, kernel logs, mail logs, security logs, blah blah blah, just on one server. When you horizontally scale your app, whether it's 2 or 2,000 servers, you need something to parse and correlate your logs so you can say "show me all logs from 11:00PM to 2:00AM from Web Cluster B", you get to save 10 minutes on your outage, and you don't miss anything.
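To make "show me all logs from 11:00PM to 2:00AM from Web Cluster B" concrete, here's a toy Python version of the query an aggregator runs for you once logs are parsed into structured records (the field names and entries are invented):

    from datetime import datetime

    records = [  # pretend these came out of the aggregator's parser
        {"ts": "2014-07-15T23:12:01+00:00", "cluster": "web-b", "line": "GET /orders 200"},
        {"ts": "2014-07-16T01:47:55+00:00", "cluster": "web-a", "line": "GET / 500"},
        {"ts": "2014-07-16T01:59:03+00:00", "cluster": "web-b", "line": "upstream timed out"},
    ]

    start = datetime.fromisoformat("2014-07-15T23:00:00+00:00")
    end = datetime.fromisoformat("2014-07-16T02:00:00+00:00")

    for r in records:
        if r["cluster"] == "web-b" and start <= datetime.fromisoformat(r["ts"]) <= end:
            print(r["ts"], r["line"])

Doing that by hand across six split windows and a couple thousand servers is exactly the time sink described above.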
I'm not saying that a log aggregator is not needed; they are still an important part of any system, as your 3rd paragraph clearly explains, but your 2nd paragraph actually makes the case for keeping everything in UTC regardless.
If the company's staff is all in one time zone, I'm inclined to use that for servers, as otherwise people have to mentally juggle two time zones: local and UTC.
* it's not affected by daylight saving time, which in many candidate timezones causes at least two complex and error-fraught time discontinuities every year, and which requires manual intervention whenever the rules change at the whims of the US Congress in the USA.
* UTC is the lingua franca of machine time communication -- if you ever have to, e.g., send data about transaction times, server event logs, order histories, etc., to any third party or API, then because there are 23 other timezones besides yours, and data normalization is important, they will almost certainly expect ISO 8601 UTC (a minimal sketch follows this list). Yes, ISO 8601 has time offsets. No, not all libraries implement them properly on either side.
* The first time you ever have logs from two different timezones -- e.g., a service provider log denominated in UTC and yours in PST -- you will hate computers so much that you will likely quit your job, throw your keys on the floor, and go run a hotdog stand down by the pier rather than deal with it. This is an honest response to time and localization issues but think of your children.
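To make the ISO 8601 point in the second bullet concrete, here's what emitting and re-parsing a UTC timestamp looks like in Python; nothing here is specific to any particular vendor or third party:

    from datetime import datetime, timezone

    # Produce an unambiguous ISO 8601 timestamp in UTC for logs or API payloads
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    print(stamp)  # e.g. 2014-07-15T11:02:33+00:00

    # The receiving side can parse it without guessing which zone you meant
    parsed = datetime.fromisoformat(stamp)
    assert parsed.utcoffset().total_seconds() == 0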
There's your conversion. I'm not saying it's not easy, I'm saying you're just pulling yourself out of a hole that your tools were configured to dig.
Multiple applications in different languages, running in different environments and managed by different teams, pose a real challenge.
Just one question: I saw a lot of logs being copy-pasted; aren't you concerned about any security-sensitive data leaking out?
We're worried about sensitive data, see my other answer https://news.ycombinator.com/item?id=8003800
So now, two specific questions: 1. Why did you have a big read spike? And 2. Are you going to share the TODOs as well? :)
BTW We're using gitlab internally (at EverythingMe) and are very happy about it, especially the flow of fixes and features implemented.
2. Yes, most of the things after the hashrockets (=>) currently in the doc are TODOs.
BTW Awesome to hear that you are happy users of GitLab.
Where's the source code? Their GitHub page is rather sparse, and the only hint I've found is this tweet:
The source code of which is here:
Hackpad is apparently a fork of that:
"9. gitlab-ctl start starts GitLab with the staging data in /var/opt/gitlab on the root filesystem, which doesn’t have any production data. At this point logs report that the production db doesn’t exist, which is correct because we are not on the production file system. No production data has been touched at this point. The GitLab web UI is not responding (502 error from nginx)"
This is what strikes me as needing addressing. There shouldn't be staging data that is normally hidden by mounting the production filesystem. If the production database isn't there, Postgres should fail to start. The Postgres team is pretty adamant that otherwise bad things can happen; see http://email@example.com...
Personally, I honestly wouldn't trust running a database on top of DRBD, and I'm unsure whether using a DRBD volume as a database directory would even leave the other replica with correct and usable data in case of an error.
Personally, I would use a real disk or maybe LVM as the store for the database, in order to reduce points of failure and remove less-tested components from the picture.
Then of course you need a database slave, but it looks like you had that anyway, which, again, leads me to question why DRBD was even involved.
If you use it for a database, then everything committed to disk is immediately available on the partner, which you can spin up instantly with some scripts. And look, your transaction log is exactly where you left it! With a database and built-in replication you could go either way, but there are some applications with persistent state that you need to make highly available, and that's another area where DRBD shines.
The major advantage is that no other specialized components (NAS/SAN/etc.) are needed, and you still get decent shared storage between the nodes.
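For readers who haven't used it: a DRBD resource is just a pair of block devices kept in sync over the network. A minimal resource definition looks roughly like the sketch below (hostnames, devices, and addresses are placeholders, not anything from the setup under discussion); protocol C is the synchronous mode that gives the "everything committed to disk is immediately on the partner" behaviour described above:

    # /etc/drbd.d/r0.res -- illustrative only, placeholder names and addresses
    resource r0 {
      protocol C;              # synchronous: a write completes only once both nodes have it
      on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }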
Three nodes in such a cluster are highly recommended to resolve/avoid any insane-in-the-split-brain scenarios; DRBD storage needs to be protected from split-brain and cannot help you resolve it the way a traditional NAS/SAN LUN can.
We had to do some tuning to get the performance we needed, but this was a few years ago and the situation has probably improved since. Databases usually like the vm.dirty_* settings to be very low or 0.