Why would you say Twitter is a high-profile place?
I'm lost on this. I've never seen ads on Twitter, and I only read it to check whether someone actually posted whatever tweet a news site is quoting. I have never seen anything being driven by Twitter.
If you pushed something to swap, you didn't have enough RAM to run everything at once, or you have a serious memory leak or the like.
If you can take the latency hit to load what was swapped out back in, and don’t care that it wasn’t ready when you did the batch process, then hey, that’s cool.
What I've had happen way too many times is something like this: the 'colder' data paths on a database server get pushed out under memory pressure, but the pressure doesn't abate before those cold paths get called again (and the kernel rarely pulls those pages back out of swap for no reason). That leads to slowness, which leads to bigger queues of work and more memory pressure, which leads to a doom loop of maxed-out I/O, super high latency, and 'it would have been better off dead'.
These death spirals are particularly problematic because the machine isn't 'dead yet', and may never get so dead that it stops, for instance, accepting TCP connections. That de facto kills services in ways that are harder to detect and repair, and that take far longer to fix, than if the machine had just flat-out died.
This certainly won't happen every time, and if your machine never gets that loaded and always has time to recover before the next burst of work, then hey, maybe it never doom-spirals.
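One way to catch the spiral before it reaches 'better off dead' is to watch the swap-in/out *rate* rather than just swap usage. A minimal sketch using psutil; the thresholds and the alert() hook are made up, tune them for your workload:

```python
# Sketch: sample cumulative swap-in/out counters and alert on sustained
# paging activity instead of waiting for latency graphs to explode.
import time
import psutil  # third-party: pip install psutil

SAMPLE_SECONDS = 10
SUSTAINED_SAMPLES = 6            # ~1 minute of continuous paging
RATE_THRESHOLD = 10 * 1024**2    # 10 MiB/s of swap traffic, arbitrary

def alert(msg: str):
    # hypothetical hook: page someone, or proactively restart the
    # service before it enters the 'alive but useless' state
    print("ALERT:", msg)

def watch_swap():
    prev = psutil.swap_memory()
    hot_samples = 0
    while True:
        time.sleep(SAMPLE_SECONDS)
        cur = psutil.swap_memory()
        # sin/sout are cumulative bytes swapped in/out since boot (Linux)
        rate = ((cur.sin - prev.sin) + (cur.sout - prev.sout)) / SAMPLE_SECONDS
        prev = cur
        hot_samples = hot_samples + 1 if rate > RATE_THRESHOLD else 0
        if hot_samples >= SUSTAINED_SAMPLES:
            alert(f"sustained swap traffic: {rate / 1024**2:.1f} MiB/s")
            hot_samples = 0

if __name__ == "__main__":
    watch_swap()
```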
I do a lot of CI/CD work where the load is just spiky, and it would be a waste of money/resources to shell out for the maximum memory.
Another example would be something like Prometheus: when it crashes and replays the WAL, memory spikes.
Also, it's probably an unsolved problem to tell applications how much memory they are actually allowed to consume. Java alone has the heap, direct buffers, and so on.
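For what it's worth, on Linux under cgroup v2 a process can at least discover its own budget; the unsolved part is getting every allocator inside the process (heap, direct buffers, ...) to respect that one number. A hedged sketch, assuming the unified hierarchy at /sys/fs/cgroup:

```python
# Sketch: discover this process's memory budget under cgroup v2.
# Paths differ on cgroup v1 or non-Linux systems.
from pathlib import Path

def my_memory_limit() -> int | None:
    # /proc/self/cgroup looks like "0::/some/group" under cgroup v2
    rel = Path("/proc/self/cgroup").read_text().strip().split("::")[-1]
    limit_file = Path("/sys/fs/cgroup") / rel.lstrip("/") / "memory.max"
    raw = limit_file.read_text().strip()
    return None if raw == "max" else int(raw)  # None = unlimited

if __name__ == "__main__":
    limit = my_memory_limit()
    print("budget:", "unlimited" if limit is None else f"{limit} bytes")
    # The hard part the comment above points at: nothing forces the JVM's
    # heap, direct buffers, page cache use, etc. to sum to this number.
```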
I have plenty of workloads where I prefer to get a warning alert and act on that, instead of dealing with broken builds and the like.
Because once swap activates, the build takes hours instead of tens of minutes. So it would time out anyway, just after wasting lots of resources. And even if you increase the timeout a lot instead, your machine now has a bunch of things swapped out, so your tests time out, which is even worse.
Yes, killing that part of the build did destroy hours of work. It was still better to disable swap than to try to "ride it out".
Funny that you advocate for manual processes, while I feel we already do too much manually and advocate for the opposite.
The error scenarios are especially bad: they cost you time that no one measures.
Manual processes are also difficult to sync across multiple team members, and you need tooling around them to make sure the manual steps actually happen.
My mantra / priority looks more like this:
1. Try not to do it at all
2. Make it automated
3. Do it manually, with a heartbeat system to catch misses (see the sketch after this list)
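The heartbeat in step 3 can be as dumb as a timestamp file plus a cron'd checker, a classic dead-man's switch. A minimal sketch; the path and threshold are made up:

```python
# Minimal dead-man's switch: the manual (or cron) task touches a
# timestamp file when it completes; this checker runs from cron and
# alerts if the task hasn't run recently.
import sys
import time
from pathlib import Path

STAMP = Path("/var/run/cleanup.last-run")   # hypothetical task stamp
MAX_AGE = 3 * 24 * 3600                     # task should run every ~3 days

def record_run():
    STAMP.write_text(str(time.time()))

def check():
    if not STAMP.exists() or time.time() - float(STAMP.read_text()) > MAX_AGE:
        print("ALERT: cleanup task has not run in time", file=sys.stderr)
        sys.exit(1)   # nonzero exit -> cron mail / wrapper alert

if __name__ == "__main__":
    check() if "--check" in sys.argv else record_run()
```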
I don't want to do things manually; I prefer to be able to go to a beer garden in the summer and stay flexible.
And as a final point: for me, automation is the necessary base for adding additional value at high return. Only with an automation base can you extend it to fix more and more issues automatically. While you fix the full-disk issue a hundred times, I fix it once.
I think you alluded to it wonderfully: a small team of fewer than 10 people will be fine sharing knowledge and doing parts of the pipeline manually. The overhead of creating and maintaining stable automation for edge cases quickly exceeds the time saved.
It's a different story altogether if there are multiple teams that are all supposed to use the same pipeline.
My practical experience is with small teams below 10 people.
As soon as you have a well-understood base system for automation (running code with cron, plus monitoring and alerting), all further automation steps are easy to add to that system.
The initial effort was always worth it.
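As a concrete example of that base system: wrap every cron job so it reports its last success to the monitoring you already have, then one generic alert rule covers every job. A sketch using the Prometheus Pushgateway pattern; the gateway address and job name are placeholders:

```python
# Sketch: a cron wrapper that pushes a last-success timestamp to a
# Prometheus Pushgateway, so one alert rule like
#   time() - job_last_success_seconds > threshold
# covers every wrapped job.
import subprocess
import sys
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def run_and_report(job_name: str, command: list[str]):
    subprocess.run(command, check=True)  # raises if the job fails
    registry = CollectorRegistry()
    g = Gauge("job_last_success_seconds",
              "Unix time the job last finished OK", registry=registry)
    g.set_to_current_time()
    push_to_gateway("pushgateway.internal:9091", job=job_name,
                    registry=registry)

if __name__ == "__main__":
    run_and_report(sys.argv[1], sys.argv[2:])
```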
And the big issue is that quality is elastic; it degrades silently.
If you need to do something every few days, forget it once, and only find out when someone tells you, did you hurt anyone?
Probably not, but your quality suffered.
We even had a process that was broken for three weeks, and a customer noticed the issue, not us.
Automation was missing, and so were monitoring and alerting.
One solution for a manual process was a Jira plugin that would create a ticket every Monday describing what to do. Half automated; purely manual would again lead to quality issues.
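You don't even need a plugin for that pattern; a cron'd script against Jira's REST API does the same. A sketch where the URL, project key, and credentials are placeholders:

```python
# Sketch: create the weekly "do the manual thing" ticket via Jira's
# REST API instead of a plugin; run it from cron every Monday.
import os
import requests  # third-party: pip install requests

def create_weekly_ticket():
    resp = requests.post(
        "https://jira.example.com/rest/api/2/issue",
        auth=("automation-bot", os.environ["JIRA_TOKEN"]),
        json={"fields": {
            "project": {"key": "OPS"},
            "summary": "Weekly manual maintenance",
            "description": "Checklist: ... (what to do, step by step)",
            "issuetype": {"name": "Task"},
        }},
        timeout=30,
    )
    resp.raise_for_status()  # make cron notice failures

if __name__ == "__main__":
    create_weekly_ticket()
```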