People always -- *always* -- pull cattle-vs-pets on me after I advocate for this...

hnlmorg · on Sept 26, 2020

> I've slaughtered more machines than any of you.

I'd rather not engage in pointless unprovable arguments, if you don't mind.

> The problem is that you are treating your logs as pets when the truth is the logs are also cattle. Virtually all debug logs will pass through their life cycle without being read, so indexing them is just a flagrant waste of energy.

You shouldn't have debugging enabled on production systems unless you have an interim process that filters out debug messages before they get indexed (and thus you can toggle which logs get indexed there rather than reconfiguring / redeploying all of your application nodes).
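As a rough illustration of that interim filtering step, here is a minimal sketch. It assumes logs arrive as JSON lines with a `level` field; the function name `filter_for_indexing` and the `min_level` toggle are hypothetical, not part of any particular log shipper.

```python
import json
from typing import Iterable, Iterator

# Numeric severities, mirroring the values used by Python's logging module.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_for_indexing(lines: Iterable[str], min_level: str = "INFO") -> Iterator[dict]:
    """Yield only the events at or above min_level (hypothetical helper).

    Changing min_level here toggles what gets indexed without
    reconfiguring or redeploying the application nodes.
    """
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: malformed lines are simply dropped
        if not isinstance(event, dict):
            continue
        level = str(event.get("level", "INFO")).upper()
        # Unknown levels are treated as INFO (an assumption of this sketch).
        if LEVELS.get(level, LEVELS["INFO"]) >= threshold:
            yield event
```

Passing `min_level="DEBUG"` during an incident would let debug messages through to the index for as long as they are needed.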

Also, nobody is suggesting logs should be treated as "pets". You still want to purge older logs. The problem is that you cannot always replicate reported errors, so if you don't have those log messages captured, you're sod out of luck.

Don't get me wrong, there is a certain allure to the traditional method of systems administration - I've been on both sides of the fence - but central logging services have so many other benefits, such as security (tamper-proof logs, users don't require SSH), ease of use, persistent logging, etc. The only real downside is cost, but that quickly becomes absorbed in your pricing plan when customers start asking for SLAs.

R0b0t1 · on Sept 27, 2020

I have to agree with the other guy. Going too far into "Don't touch the machine" leads you to ridiculous places. Designing your system to be instrumented is invaluable, and sometimes that means logging on to a machine. Should you minimize this? Yes, but not to the point you invest tens of thousands or more into a way to not use SSH.

Aeolun · on Sept 27, 2020

Since we already know you can put a more or less adequate Elasticsearch cluster on DO for $60 per month, I don’t think that argument holds a lot of water.

By the time you want centralized logs, you are probably already spending much more than the logging will cost on infrastructure.

majkinetor · on Sept 27, 2020

An ES cluster might be easy to install, but doing all the other chores with your logs, along with learning the ES search language and its quirks, is far from trivial. I personally spent months building a viable on-premises solution and it's not something I want to do again. It's way less hassle to access the server directly for daily logs and just move them to some archival storage during the night.
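For reference, the nightly move described above could look something like this minimal sketch. The function name `archive_old_logs`, the `*.log` pattern, and the one-day cutoff are all assumptions for illustration; in practice this would be run from cron or a similar scheduler.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_old_logs(log_dir, archive_dir, max_age_seconds=86400, now=None):
    """Compress *.log files older than max_age_seconds into archive_dir.

    Hypothetical helper: returns the list of archive paths it created.
    """
    now = time.time() if now is None else now
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = []
    for path in sorted(Path(log_dir).glob("*.log")):
        if now - path.stat().st_mtime < max_age_seconds:
            continue  # still a "daily" log, leave it on the server
        target = archive_dir / (path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # remove the original only after the copy succeeded
        archived.append(target)
    return archived
```

Here `archive_dir` stands in for whatever archival storage is mounted on the host. A log that the application still holds open would need rotating first, which this sketch does not handle.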

hnlmorg · on Sept 27, 2020

Nobody said "you should never log into the machine!" We just said reading log files shouldn't require SSH access.

What you're doing here is constructing a straw man argument while agreeing with the same point I was making.

R0b0t1 · on Sept 27, 2020

Hold up, "never log into the machine" is exactly what the discussion is about, with not logging into machines for log access being one aspect of it.

hnlmorg · on Sept 27, 2020

I'm guessing you skim read most of the replies after the OP? The OP did touch upon it when describing their own architecture (albeit he wasn't actually suggesting that should be how everyone operates) but everything after that has been more narrowly focused. This particular branch of the discussion was specifically discussing log access:

> > ship log events out of the box and into something searchable, indexable, and can derive metrics

> That's just a way to spend a ton of money. There's really not a reason to ship or index logs...

R0b0t1 · on Sept 28, 2020

They're explicitly talking about not logging into machines for log access. One of the ways to make it seem a reasonable suggestion is the broader topic of not logging into machines at all, ever, and why that may be a good thing.

hnlmorg · on Sept 28, 2020

> They're explicitly talking about not logging into machines for log access.

That's what I said. ;)

> One of the ways to make it seem a reasonable suggestion is the broader topic of not logging into machines at all, ever, and why that may be a good thing.

Except none of that was being discussed. Only log access.

R0b0t1 · on Sept 29, 2020

The conversation involves both.

hnlmorg · on Sept 29, 2020

No it doesn't. By your own admission:

> They're explicitly talking about not logging into machines for log access.

Why can't you just admit that you didn't read the thread properly, rather than insulting all of our intelligence with these piss-poor mental gymnastics where you redefine the context of what people had very clearly written?

R0b0t1 · on Sept 30, 2020

Because I read the thread? How many times do you want me to repeat myself? They're talking about the specific case of not logging in for log access, but that is partially justified by the wider desire to never have people log in at all. You can separate them but the original conversation didn't.

hnlmorg · on Oct 4, 2020

> You can separate them but the original conversation didn't.

The conversation you're replying to, however, did. I know this for a fact because I was involved in that conversation and I made that distinction myself ;)

tristor · on Sept 27, 2020

> I've slaughtered more machines than any of you.

I guarantee that you haven't, as the number of people that can make that claim in relation to my background is vanishingly small and I know almost all of them by name.

I centralize logs. Not only does it make more sense for administration at scale, it's invaluable for security reasons and assists in compliance by providing a controllable guaranteed audit trail.

You may have a valid argument about cost here for some applications, but it's unwise to make arrogant claims you cannot back up.

jdxcode · on Sept 27, 2020

> I've slaughtered more machines than any of you.

Not only is that vanishingly unlikely to be true with this crowd, it’s also not at all the kind of rhetoric that they’ll listen to.

markbnj · on Sept 27, 2020

When I read that comment I was thinking about our auto-scaling nodepools on GKE that have been running for a couple of years and scale up/down every business day by 20-40 nodes or so depending on load... but I am not sure I get to take credit for slaughtering them.

seized · on Sept 27, 2020

So that's the new replacement for pointless uptime boasting.... Except even less provable.

reactordev · on Sept 27, 2020

The logs in my case are audit trails required for compliance and for investigation by 3rd parties.