I've seen this in action. Using code generators to convert XML configuration to a few API endpoints. Or using a DSL/rules engine because you don't want to write code. Or having APIs that hit other APIs ad infinitum when the whole thing runs on one server because "micro services are the only right way". The result was that we spent our time gluing together what was really a monolith disguised as microservices, rather than adding features the customers wanted.
More recently I had to solve time drift on 1000s of devices. The problem was someone installed puppet to manage those devices which uses NTP. The devices are behind firewalls, so if they block the puppet master or mess with SSL, puppet doesn't even phone home. Or worse, it gets incorrect time from NTP peers on the network. The solution was to throw out the shiny tool "puppet" and just call "date". Puppet and NTP are great in theory for getting time down to the millisecond, but they totally backfired when some devices were off by over 24 hours. For our purposes, as long as all devices were within 5 minutes we were good. The irony was that after disabling NTP, puppet just started it again. And we couldn't use puppet to fix that, since 50% of our users had it blocked. No other choice but to throw out puppet and start over from scratch. The guy who spent months setting up puppet was not happy.
With all due respect, you don't know the real issue. Your response is the same thing the guy who installed Puppet said to me... just have them unblock it.
Our sales pitch is "these devices use plain http and will work behind your corporate firewall". The blockage wasn't an issue that could be solved; it was our whole business model to work around the blocks by using simple http instead of https, proxying everything through our IP, and things like that.
Even the puppet documentation says not to run a puppet master when you have devices that are behind firewalls or limited network. The guy who added puppet apparently didn't read that.
I wasn't the one who decided the business model, just the guy who fixed it to work as advertised while dealing with the pressure of everything crashing & burning. You're right, no one had experience, but that's not the point.
My point was that the fancier tools sometimes just add new issues without solving your real issue. Despite my lack of experience I solved the time drift using the Linux built-in "date" to set the date and time. It didn't account for network lag like NTP, and an NTP developer would probably laugh at my solution, but now all devices are accurate to within a few minutes & that particular problem was solved. So don't always go for the most complex tool is all I'm saying.
For what it's worth I do plan to bring back puppet but run it in "puppet agent" (offline) mode. We'll use custom scripts to copy in new puppet configs so puppet does not need to phone home.
It's worth knowing that NTP's round-trip delay compensation is not at all magic — it works like this (see RFC 5905, section 8):
1. Client records "origin" timestamp T1 when it sends a query to the server.
2. Server records "receive" timestamp T2 when it receives the query from the client.
3. Server records "transmit" timestamp T3 when it sends a response back to the client (the response includes T2 and T3).
4. Client records "destination" timestamp T4 when it receives the response from the server.
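The four timestamps above combine into the round-trip delay and clock offset given in RFC 5905. Here is a minimal Python sketch of that arithmetic, not a real NTP client: the `NtpSample` class, the function names, and the sample numbers are made up for illustration, and timestamps are plain seconds as floats.

```python
from dataclasses import dataclass


@dataclass
class NtpSample:
    t1: float  # origin: client sends the query (client clock)
    t2: float  # receive: server receives the query (server clock)
    t3: float  # transmit: server sends the response (server clock)
    t4: float  # destination: client receives the response (client clock)


def round_trip_delay(s: NtpSample) -> float:
    # Total time the client waited, minus the time the server spent processing.
    return (s.t4 - s.t1) - (s.t3 - s.t2)


def clock_offset(s: NtpSample) -> float:
    # Estimated amount to add to the client clock to match the server.
    # This assumes the outbound and return network paths take equal time.
    return ((s.t2 - s.t1) + (s.t3 - s.t4)) / 2


# Hypothetical sample: the client clock is 60 s behind the server,
# each network leg takes 0.5 s, and the server takes 0.25 s to answer.
sample = NtpSample(t1=100.0, t2=160.5, t3=160.75, t4=101.25)
print(round_trip_delay(sample))  # 1.0
print(clock_offset(sample))      # 60.0
```

The offset is only exact when the two network legs are symmetric; an asymmetric path shows up as an error of up to half the round-trip delay.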
I would love to get into it more deeply as you continue to supply details! :)
Distributing hardware outside of your network and using puppet in master/client mode is obviously a bad idea, just like having any dependency is difficult to manage (sometimes like NTP).
However, clocks will drift. Consider ntpdate in a cron or an easier-to-manage sntp client vs ntpd, which is a little nutty.
So the point is that a tool like puppet, when properly configured, is probably a great asset for your use case of distributing hardware, as it can help keep things working as expected.
Yes Puppet solved one problem... How do I add a cron to all devices, and retry it if it failed without adding it twice to devices where it worked. Puppet is amazing. It solved that problem....
But then it created a whole new world of problems since it violated our business model to have it phone home. Thanks for the suggestions on NTP. We'll likely add features that do require more accurate time in the future & your suggestions will probably come in handy!
You can use date to set the time, but you must use some other way to get the correct time to be set. So you essentially had to re-implement NTP. Your solution was the more engineered one, not his; he used the standard tool for syncing dates.
I'm not saying your option wasn't correct (if NTP didn't work, you can't use it), but I don't think it's a good example of the errors described in the article.
> you must use some other way to get the correct time
When the devices phone home to our IP (which is whitelisted in all their firewalls) I compare to the server time & set the device to the server's time if it is off by more than 5 minutes. Our server in turn uses NTP (which is not whitelisted on the devices' networks). The devices will still be off by 30 seconds or so if there is a network delay. On one hand we could have just told them to whitelist NTP; on the other hand we tried that & you get one department who blames another department, or worse they just don't return our calls. Plus we originally told them they only needed to whitelist one IP.
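Roughly, the check looks like the Python sketch below. This is not the actual script: the function names are made up, it assumes the server's time has already been parsed out of the phone-home response, and it assumes a `date` that accepts `-u -s "YYYY-MM-DD HH:MM:SS"` (GNU or BusyBox) run with root privileges.

```python
import subprocess
from datetime import datetime, timedelta, timezone

MAX_DRIFT_SECONDS = 5 * 60  # anything inside 5 minutes is good enough for us


def needs_correction(device_time, server_time, max_drift=MAX_DRIFT_SECONDS):
    # True when the device clock is off by more than the allowed drift.
    return abs((server_time - device_time).total_seconds()) > max_drift


def date_command(server_time):
    # Build the "date -s" call; -u makes date read the string as UTC.
    stamp = server_time.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return ["date", "-u", "-s", stamp]


def sync_clock(server_time, device_time=None, run=subprocess.run):
    # Set the clock only when the drift is too large.
    # Returns True if the clock was set, False if it was left alone.
    if device_time is None:
        device_time = datetime.now(timezone.utc)
    if not needs_correction(device_time, server_time):
        return False
    run(date_command(server_time), check=True)
    return True


# Hypothetical example: a device that is 25 hours behind the server.
server = datetime(2024, 1, 1, 4, 20, 0, tzinfo=timezone.utc)
print(needs_correction(server - timedelta(hours=25), server))    # True
print(needs_correction(server - timedelta(seconds=30), server))  # False
print(date_command(server))  # ['date', '-u', '-s', '2024-01-01 04:20:00']
```

Nothing here compensates for network delay, which is why a slow response leaves the device a little behind.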
> you essentially had to re-implement NTP
No, I called "date -s". I didn't re-implement NTP. My solution does not attempt to deal with most of the things NTP handles, like compensating for network delay. If I had originally written this, I would have just used NTP but proxied it through our domain. Instead I was called in when things were "on fire" & had to come up with a quick fix.
> Your solution was the more engineered one
That's your opinion. My solution took 15 minutes while he spent months setting up Puppet. His solution resulted in devices being off by multiple days, whereas my solution has proven to keep devices accurate to within +/- 5 minutes.
> he used the standard tool for syncing dates.
What "standard" says I need to use NTP? Surely the "date" command can be considered standard as well.
> don't think it's a good example of the errors described in the article.
It's exactly what's described in the article. Our problem was to do a job at a certain time. No one cared if it happened at 4:21 instead of 4:20. Big companies like Google need more accurate time & control their network, so ntpd is a good fit for them. I don't need as accurate a time & don't control my network, so ntpd was not a good fit. So just because Google does it doesn't mean you/I should. Because sometimes you/I are solving a different problem than Google had to solve.
Just to be clear, I'm not in any way criticizing your decision. It seems clear to me that it was correct, from what you've told us. I just don't think it's a good example, that's all.