One day I was working with my colleague and when the fileserver was full he went to a project folder and removed a file called balloon.txt which immediately freed up a few percent of disk space.
Turned out that we had a number of people who, as soon as the disk had some free space, created large files in order to reserve that free space for themselves. About half the capacity of the fileserver was taken up by balloon.txt files.
At the end of the budget period they've only spent 80% of their allocated budget, so they throw out a bunch of perfectly good equipment/furniture/etc. and order new stuff so that their budget doesn't get cut the following year, rather than accepting that maybe they were over-budgeted to begin with.
Rinse, repeat, thus continuing the cycle of wasting X% of the budget every year.
Definite case of misaligned incentives.
Another surprising place where this happens is project scheduling. We budget time for each individual step of a project based on our guess of a 90% or 95% success rate. Then our "old-timers' experience" kicks in and we double or triple our time for all the steps together. Then our boss adds 50% before giving the estimate to their boss. That sounds gratuitous, but it's there to protect you: their boss looks at how grotesquely long the estimate is and barks out a cut of 20%, so the overall effect of those two moves is (3/2) × (4/5) = 1.2. Your boss still netted you a 20% buffer while making the skip-level feel very productive and important.
Say the 50%-confidence-to-95%-confidence conversion gives you 30% more time as safety buffer, you only double the estimate, and the work you missed in your initial assessment is, generously, a third of the project or so (not half). Then the project actually takes 1.5 units of time measured properly, while together you have budgeted 1.3 × 2 × 1.2 = 3.12 units. The total project deadline is more than half safety buffer. And we still consistently overrun!
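The compounding is just multiplication; a quick check of the factors used in this comment:

```python
# Nested safety buffers compound multiplicatively (factors from the comment above).
confidence_buffer = 1.3        # 50%- to 95%-confidence padding
oldtimer_factor = 2.0          # "double the estimate"
boss_net_factor = 1.5 * 0.8    # boss adds 50%, skip-level cuts 20% -> net 1.2

budgeted = confidence_buffer * oldtimer_factor * boss_net_factor
actual = 1.5                   # missed work was ~a third of the real project

print(f"budgeted: {budgeted:.2f}x, actual: {actual}x")
print(f"share of deadline that is pure buffer: {(budgeted - actual) / budgeted:.0%}")
```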
But if Alice needs to work on some step after Bob, and Bob finishes early, when does Alice start on it? Usually not when Bob finishes. Alice has been told that Bob has until X deadline to complete, and has scheduled herself with other tasks until X. Bob says "I got done early!" and Alice says "that's great, I'm still working on other things but I will pick my tasks up right on time." Bob's safety buffer gets wasted. This does not always cause any impact to the deadline, but it does for the important steps.
Of course, if you are a web developer you already know this intuitively, because you work on servers, and you don't run your servers (Alice, in this analogy) at 100% load: if you do, you can't respond to new requests (Bob's completion event) with low latency. It's worth asking, in an "efficient" workplace, how much you are deliberately not working so that you have the excess capacity to operate efficiently.
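The server analogy can be made quantitative with the textbook M/M/1 queueing result: mean response time is the service time divided by (1 − utilization), so latency blows up as load approaches 100%. A minimal sketch, with service time normalized to 1:

```python
# M/M/1 queue: mean response time T = S / (1 - rho), where S is the mean
# service time and rho the utilization. As rho -> 1, T diverges.
def response_time(utilization: float, service_time: float = 1.0) -> float:
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

for rho in (0.5, 0.8, 0.9, 0.99):
    print(f"at {rho:.0%} load, requests take {response_time(rho):.0f}x the service time")
```

At 50% load a request takes 2x its service time on average; at 99% load, 100x. The same math applies to a fully-scheduled Alice.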
It's a revelation. You get to have some hard conversations with other managers. But in the end everyone finds it easier to deal with "it'll be ready when it's ready" rather than endless missed deadlines and overruns.
When people who are experts on the topic evaluate the work needed over a period of say 3 months, even in something as notoriously hard to plan as video game production, it can hold. This entails being willing to adjust scope and resources though, when planning, in order to ensure the objectives are likely to be met.
Sorry, but this frustrates the hell out of me! What am I missing here? What arcane bit of finance lore leads us down this path? Am I just hopelessly naive? Is saving money such a bad thing?! I just don't get it...
But I'd say it's not a feature of governments only; any sufficiently big organisation which centralises power ends up being like this. That's why large corporations need nimble startups to innovate: startups either innovate or die.
The government is just the largest example of this phenomenon - and they don't have any analogue for startups, they're just doomed to grow larger and larger over centuries until they collapse.
Look at the USA, once the perfect minarchist experiment and now the largest employer in the world.
But god help the poor people who file simple tax returns that can be easily audited, however.
Zero based budgeting is one answer to the moral hazards of either over or under-estimating your budget on purpose. If each year you start with a blank spreadsheet and then add (with justification) expenses for the year, it avoids some of the pitfalls. Not a panacea however.
Ahah, that would be nice. In most companies I know, budgets shrink every year because, you know, “cost reduction plans”.
1) separation of duty: you might not be the best department to invest surplus
2) cost effectiveness: if you're operating with a deficit, as is generally the case with governments these days, this money is not free, so it could effectively be cheaper to give it back and re-borrow it when you actually need it
But with this reasoning there is never a surplus, because departments will spend their money at all costs.
> 2) cost effectiveness: if you're operating with a deficit, as is generally the case with governments these days, this money is not free, so it could effectively be cheaper to give it back and re-borrow it when you actually need it
That's totally fine; when GP said “save the money” they didn't mean “in their own bank account”. It just means: top management owes them this money for when they need it later.
Anecdote: I'm currently working on a project started in a rush earlier this month, which must be done before the end of the month (because it's the end of the accounting year at this company) for this exact reason. And this project is overpriced by a factor close to two, because this money really had to be spent!
Budgets are meant to estimate costs and manage cash flow. From a greedy team perspective it’s best (and self interested) to try to game the system as much as possible so you get the largest share of the pie. From the organizational perspective it’s best to reallocate capital efficiently, especially if a team consistently over budgets.
No, but if you accurately forecast that dinner will cost $100 on average, and this time it only happened to cost $75, you should put most of the savings aside for the other times when it will cost $125 and not reallocate it to be spent on something else.
Consistent over-budgeting is still an issue which would need to be addressed, of course, but a system where any annual cost underrun is treated as over-budgeting and punished by reallocating that part of the budget to other groups ignores the inevitable presence of risk in the budget forecast.
For example, if a team says they need $100 a year and comes in at $90 then I don’t think next year’s budget should be $110 while some people in this thread think it should be. That makes no sense. Neither do I think the budget should be cut to $90. Unless something has changed, the budget should stay the same.
Your point about average cost just means that you’re budgeting on the wrong timeframe. If you estimate your average dinner is $100 but you’re spending $75 most of the time except for one huge dinner every month then you should be budgeting $75 for dinner and then budget separately for one large dinner a month. Similarly, if a team says they need $10MM a year but half of that is them trying to amortize a $25MM cost over 5 years then they are budgeting incorrectly. Their budget should be $5MM with a $25MM side fund contributed to on a risk adjusted basis.
The worst case scenario is the team budgeting $10MM when they only need $5MM and losing control of their budget so that when the real charge comes due they’re fucked because they’ve been spending $10MM for the past 5 years without realizing the fixed charge is coming or, worse, realizing the fixed charge is coming but just ignoring it so they can buy new office furniture and exhaust their budget this year selfishly.
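The split described above, in miniature (numbers taken from the comment; the risk adjustment of the side-fund contribution is omitted for simplicity):

```python
# Split the $10MM/year ask into genuine run-rate plus a sinking fund for the
# known $25MM charge, instead of hiding the amortization inside one number.
run_rate = 5.0          # $MM/year of real recurring costs
future_charge = 25.0    # $MM, due once every 5 years
period_years = 5

sinking_contribution = future_charge / period_years
annual_budget = run_rate + sinking_contribution

print(f"${annual_budget:.0f}MM/year = ${run_rate:.0f}MM run-rate "
      f"+ ${sinking_contribution:.0f}MM sinking fund")
```

Keeping the two numbers separate is what prevents the sinking fund from quietly becoming office furniture.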
IMHO it depends on why the expenses were less than the budget. If it's a matter of probability or essential uncertainty then the savings should be set aside for other occasions where luck isn't as favorable. If the department realized cost savings by improving business practices then most or all of the savings should stay with the department to be invested in future improvements (a one-time carry-over into the next budget period) and/or distributed as reward to those responsible for the improvements, as an incentive to continue making such improvements. If costs were lower because the department didn't accomplish everything they set out to do then that might be a justification for reallocating their budget, and/or implementing more drastic changes to get them back on track.
> Your point about average cost just means that you’re budgeting on the wrong timeframe.
The timeframe for the budget would generally be predetermined (e.g. one fiscal year) and not set by the department itself.
> If you estimate your average dinner is $100 but you’re spending $75 most of the time except for one huge dinner every month then you should be budgeting $75 for dinner and then budget separately for one large dinner a month.
Sure, but I was referring to probabilistic variation due to uncertainty in the forecast, not a predictable mix of large and small expenses. And the "dinners" in this analogy would be once per budget period (i.e. annual for most organizations), not frequent enough to average out.
Unfortunately I couldn't think of much. I suggested maybe we buy some more computers with it but I'm sure he'd already thought of that himself. I don't know what he ended up doing, but I'm sure he'd have decided to buy something with it rather than just losing it entirely.
You contact a department whose services you use a lot, then you arrange to pre-pay for services. Ideally you negotiate a discount.
Then you use the service and state which grant to draw from.
This way you have grants paying for things that are completely unrelated to their intent, you have one nightmare of a billing system which no one understands, and you get to use every cent.
We shot so much we destroyed some of the rifles, apparently that was better than getting a smaller allocation next year.
The non-government sector isn’t immune to this.
Only if the inefficiency is large enough to overcome other forces.
Or to put it another way: picture if every single team at Google did this to the tune of $100k a year, per team, and assume among 135,000 employees there are 13,500 teams.
That's $1.35 billion. Well under 1% of their revenue.
No way is a competitor going to appear that is identical to Google in every way except they have better budget management. Google has too many moats around their business, they can be really inefficient in many many ways and still dominate in multiple markets.
It was not even believed by Adam Smith. He writes that it only works that way in a controlled environment. That's why European countries usually rank higher in market freedom than the US: we don't have companies getting so cancerously big that they have very real effects on lawmaking (how lobbying is legal is still beyond me).
It's less blatant, but just as pernicious.
Nowadays the USA jurisdiction is comparable to the EU one, but they still have more $$$.
If you like freedom rankings I recommend these ones: https://www.heritage.org/index/ranking
I work at one of those US banks.
The amount of inefficiency in the form of red tape, confusing processes, and custom half-baked tools that crash half the time is just mind-boggling. I've spent more than a week now on opening a firewall for one IP/port on one host, just to test my prototype in the dev environment (a local machine or Docker is not an option due to lack of admin rights), and it's still in the change-approval stage.
If we weren't this giant too-big-to-fail bank we'd be out of business by now.
I see that as the SOP for large companies too...
What strikes me as unique to government is the tendency for sufficiently powerful appendages to secure enough resources to start wagging the dog (e.g. the military industry in the US), although now that I think about it, it seems possible that it would happen within companies too.
Any business with a natural monopoly, high migration costs, etc. can support a surprising amount of inefficiency even if most of their customers find the experience unsatisfying.
I think we imagine a lot of market forces that no doubt exist, but people aren't logical in the face of them.
But large companies tend to have MBA types scurrying around rooting this stuff out as it pops up, or shortly thereafter. Government has no such immune system to fix these problems on the go. It just gets sicker and sicker until the taxpayers vote for something drastic, or revolt.
You see this in nonprofit entities too. They get big, drift away from their mission and waste a lot of money until someone gets tasked with cleaning house, or a more mission-driven organization comes along and replaces them.
The department either uses or loses the budget, so there was a push to make sure nothing was left.
He told me that the school had to spend it to prevent those automatic budget cuts. His reasoning was that it's nearly impossible to get a higher budget when some big expense has to be made.
And suddenly needing a higher budget, after for example 3 years of low expenses, doesn't make a good impression on higher-level administrators.
Sounds like the eighties!
That's the way in every major corporation I ever worked for too.
By slowly releasing supply you prevent anyone from having to self-regulate (which requires unreasonable deprivation, OR global knowledge), and everyone bases their decisions off of the only global signal: free space.
More like perverse incentives.
The conventional game-theory take is that this is a prisoner's dilemma, and everyone creating balloon.txt files is defecting. They are making the most rational choice under the rules of the game (no communication, thus no reliable cooperation). It's not globally optimal, but it is locally optimal for each of them. This take also suffers from the same assumption: that rationality is centered on self-interest only.
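The balloon.txt game can be written down as a payoff matrix; the numbers below are illustrative, chosen only to satisfy the prisoner's-dilemma ordering:

```python
# (my_move, their_move) -> my payoff; higher = more usable space for me.
payoff = {
    ("share", "share"): 3,   # everyone gets fair, usable space
    ("share", "hoard"): 0,   # I get squeezed out
    ("hoard", "share"): 5,   # I reserved space at their expense
    ("hoard", "hoard"): 1,   # disk is full of balloons, everyone loses
}

# Whatever the other player does, hoarding pays strictly more for me...
assert payoff[("hoard", "share")] > payoff[("share", "share")]
assert payoff[("hoard", "hoard")] > payoff[("share", "hoard")]
# ...yet mutual hoarding is worse for both than mutual sharing.
assert payoff[("hoard", "hoard")] < payoff[("share", "share")]
print("defection dominates; (hoard, hoard) is the bad equilibrium")
```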
If we are to evolve as a species, then we need to get beyond such limited thinking. We need to transcend our base natures. That is the whole point of culture: to transcend as a group what our genes otherwise program us as individuals to do.
Understanding sociology as ecology at human scale is core to libertarianism.
Though did you mean, "Understanding sociology as Darwinian ecology at human scale is core to libertarianism."? Because the notion that ecology is characterized only by "the law of the jungle" is also strongly debated. Even "the selfish gene" is debatably simplistic reductionism. Individuals aren't the only actors; there are higher order emergent entities, e.g. species and ecosystems, that also evolve to perpetuate themselves and flourish, much like our own bodies are cooperative and interdependent systems of cells (with native and foreign DNA, the latter existing primarily in our GI tract) that originally evolved as single-celled "selfish" organisms.
And as you point out, "we have the cognitive ability" that nature lacks. We can do at least as well.
As to "lofty", I agree. But let's consider other things that were once considered lofty if not insanity:
- in ancient Greece, that democracy should be extended beyond the aristocracy
- in Medieval Europe, that democracy should exist at all, that the divine right of kings should be seen as a scam
- in the 19th century United States, that democracy should include women and blacks
- in the 1970s United States, that lesbians, gays, bisexuals, transsexuals and queers should be treated with the same dignity as straights, should be able to marry, adopt children and serve in the military. And that we stop using "he/him" by default as you just did because that is an artifact of patriarchy as well as outmoded thinking about even binary gender.
- in India today, that when a woman is raped, she should be protected by law and the male rapist should be punished, not the other way around. The same proposition would have sounded just as lofty in America or Europe not all that long ago.
- I can make a really long list but you get it :)
Human selfishness IS nature. It is not just about humans either, all evolution is guided by environment (resource availability).
For anything else you need ALL people to NOT be selfish, only some being altruistic does not cut it. Your only other option is to punish selfishness, but then you will ban progress.
If most people don't create the balloon.txt file, BUT, there is no punishment for creating one, then if I believe I have a good idea and that I DESERVE more resources to pursue it, I'll create a nice big balloon.txt file. Your only option is to punish me for doing so. I would not want to live in a world where people are punished for trying to gather resources to make things that most other people won't. Some people have bright ideas, and they need resources to pursue them. Most people don't have many ideas and they don't want to do anything. If you prevent the means of passionate people to gather big resources to do big things, and want to live in a zero entropy world where everything is equal (made sure through the use of force / punishment, which will eventually be corrupt, because by definition punishers can't be equals to others) and nothing moves because of it, keep dreaming. It is not even scary because that literally cannot happen.
Refusing to accept the human nature as-is and always requiring some sort of "evolved new man" is one of the characteristics of the communist/socialist ideology.
Also a handy excuse when the system inevitably fails: it wasn't the system, it was the selfish people who did not implement it correctly.
Let's assume one could even call those failures communism/socialism. How long have we experimented with and developed socialism/communism? 100 years.
How long have we been trying to get democracy right? 2,500 years. With many starts, fits and failures, devolving into dictatorships many, many times. The self-proclaimed "greatest democracy in history" is guilty of genocide and slavery. Even today how much it is a democracy as opposed to an oligarchy/kleptocracy/plutocracy is questionable.
How about capitalism? 500-800 years. And in that time it has exploited, enslaved and murdered people, pillaged entire nations and continents, raped the environment, and poisoned every culture that has adopted it with the notion that "selfishness is a virtue".
The only reason capitalism hasn't collapsed (yet) is because capitalists are smart enough to not do pure capitalism, knowing that it would lead quickly to revolution, and because the environment's revolt is just getting started.
“The west called [the Soviet Union] Socialism in order to defame Socialism by associating it with this miserable tyranny; the Soviet Union called it Socialism to benefit from the moral appeal that true Socialism had among large parts of the general world population.” ~ Chomsky
The United States: "look how many people died in the Soviet Union's industrialization program!"
Socialists: "how did the United States industrialize again?"
The United States: "look, you need to do a BIT of genocide and slavery to kick things off…" ~ Existential Comics
One of the most beneficial things about immersing yourself in deep study of American history is that you get to a point where this country can no longer effectively lie to you about why it is the way it is. It disabuses you of the notion that the inequality we see is an accident. ~ Clint Smith
Exploiting, enslaving and murdering is purely what socialist countries do - and they can get away with all of this, just because they can socialise the cost of all their evil deeds and force people into paying them money.
The only reason capitalism hasn't collapsed is that it's the only way to have a profitable economy. The crooks that you call government recognise that they can steal only so much from the economy before a country collapses.
I'd also argue that we've experimented with elements of socialism and elements of capitalism for the entire existence of civilisation.
Communism can't work unless you have either perfect individuals or a tyrannical state which forces resource distribution. In the real world, you end up with socialism. Because people are not perfect, the government which redistributes resources won't do a perfect job in the best case, and will just be completely corrupted in the worst case.
And still, communism was attempted. When did we ever attempt to have an entire capitalist society without a government to ruin it?
You could make that so in this shared computing scenario, but our broader world is systemically rigged in favor of some people and against others. Capitalism depends on the un-levelness of the playing field for cheap labor.
i.e. while it can be useful if prices are attached to commodities (with caveats around externalities etc), it is not a good thing that prices are attached to humans, making some people's being and work less valued than others.
I made a bunch of 100MiB files of `/dev/random` noise (so they don't compress, compressed size was part of the quota) and emailed them to myself before the migration, to get a few GiB of quota buffer.
My co-workers were constantly having to delete old emails in Outlook to stay under quota, but not me. I'd just delete one of my jumbo attachment emails, as needed. ;)
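Generating that kind of incompressible ballast is a short loop; a sketch (filename is illustrative, and `os.urandom` stands in for `/dev/random`):

```python
# Random bytes don't compress, so ballast like this counts fully against a
# compressed-size quota.
import os

def make_ballast(path: str, size_mib: int) -> None:
    with open(path, "wb") as f:
        for _ in range(size_mib):
            f.write(os.urandom(1 << 20))  # 1 MiB of noise per write

make_ballast("ballast-001.bin", 100)  # one 100 MiB attachment-to-be
```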
My first full-time job had an unexplained email expiry policy. After being frustrated several times at losing some explanation of how/why, I started forwarding all my emails to Gmail. In retrospect, that's probably a worse result for whoever imposed the expiration.
Fortunately, these days people are better about consolidating knowledge on wikis or some kind of shared docs instead of only email.
The excuse of resource contention provides plausible deniability
But nobody ever talks about it (except in said un-recorded meetings. That reminds me, I should explain this to our junior today, so that he knows for the future).
(apologies it's on medium, I couldn't find it anywhere else)
iirc at the time the only industries that required retention were health, legal and government
With SOX (PCI, FDIC, et al) retention laws we had another explosion of work rolling out all the compliance features of Exchange
Those were crazy times getting everybody either migrated with email or onto corporate email - there's a similar explosion of work right now with migration to M365
Imagine getting sued and having the entire paper trail in your email going back 3+ years. I expire all email after 1 year.
The IT people get smart about looking for OST or PST files, but let's see them catch that :-)
Then configure a new mail account in Outlook and connect to the IMAP server. It's optional, but convenient for replies, to configure the account to send via postfix if you have an internal SMTP server to connect to.
I gave up on email folders years ago, so at the end of the month I would just create two new folders in the archive account (YYYYMM and YYYYMM_Sent) and drag all the mail from the Exchange account into the IMAP folders. Et voilà! You now have your own local email archive.
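That monthly scheme can also be scripted; below is a rough sketch using Python's `imaplib`, with hostnames, credentials, and mailbox names as placeholders (real Exchange setups may need different auth):

```python
# Sketch: copy a mailbox's mail from one IMAP account into a YYYYMM folder
# on a personal archive server. Hosts/credentials below are placeholders.
import imaplib
from datetime import date

def archive_folder_name(d: date, sent: bool = False) -> str:
    """The folder scheme from the comment: YYYYMM and YYYYMM_Sent."""
    name = f"{d.year:04d}{d.month:02d}"
    return name + "_Sent" if sent else name

def archive_mailbox(src: imaplib.IMAP4, dst: imaplib.IMAP4,
                    mailbox: str, folder: str) -> None:
    dst.create(folder)                    # harmless if it already exists
    src.select(mailbox, readonly=True)
    _, (ids,) = src.search(None, "ALL")
    for num in ids.split():
        _, msg_data = src.fetch(num, "(RFC822)")
        dst.append(folder, None, None, msg_data[0][1])  # raw RFC822 bytes

# Usage sketch (placeholder hosts/credentials):
# src = imaplib.IMAP4_SSL("mail.example.com"); src.login("me", "secret")
# dst = imaplib.IMAP4_SSL("archive.example.com"); dst.login("me", "secret")
# archive_mailbox(src, dst, "INBOX", archive_folder_name(date.today()))
```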
The reason being that it's hard to show intent to defraud, and much easier to threaten bad press.
Email hosts love 50/100gb/unlimited mailboxes because nobody wants to migrate a bunch of giant mailboxes
A bit like when investigating police/government misconduct and a lot of files turn out to have been destroyed - but of course our data gets kept forever
I use some scripts that monitor disk space, and monitor disk usage by "subsystem" (logs, mail, services, etc.) using Nagios. And as DevOps Borat says, "Disk not full unless Nagios say 'Disk is full'" :-) Although long before it is full it starts warning me.
It doesn't go off very much, but it did when I had a bunch of attacks on my web server that started core dumping, and that filled up the disk reasonably quickly.
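A disk check in the Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, plus a one-line message) is only a few lines; the thresholds here are illustrative:

```python
# Nagios-style disk check: plugins report status via exit code and a
# one-line message. Warn/crit thresholds are illustrative.
import shutil

def check_disk(percent_used: float, warn: float = 80.0, crit: float = 90.0):
    if percent_used >= crit:
        return 2, f"DISK CRITICAL - {percent_used:.0f}% used"
    if percent_used >= warn:
        return 1, f"DISK WARNING - {percent_used:.0f}% used"
    return 0, f"DISK OK - {percent_used:.0f}% used"

usage = shutil.disk_usage("/")
code, message = check_disk(100.0 * usage.used / usage.total)
print(message)  # a real plugin would then sys.exit(code)
```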
Back in the day we actually put different things in different partitions so that we could partition failures but that seems out of favor with a lot of the distros these days.
This is actually kind of clever. How the tribal knowledge for how to "reserve space" was developed and disseminated would be pretty interesting to study.
To help, some students had put in place a crawler that generated statistics about who was using the space, for all classes. And usually, once bitten, you made your own space-requisition script which would take any bytes left, when available, until it hit some reasonable size.
It's basically reserving part of the disk for very important things only, which scares off less important uses. Like making the commons seem more polluted than it actually is to get some action taken.
If those files weren't there, the space would probably fill up, but now without any emergency relief valves.
It would be better if these files were a smaller fraction of space and had more oversight... but that's just a quota system. This is something halfway in between real quotas and full-on tragedy of the commons.
Similarly for file storage and "reserving" it by creating huge but useless files. If everyone was charged a fee per gigabyte per day, then people would be less likely to create those placeholder files. You probably have to be careful about how you measure, otherwise you'll get automated processes that delete the placeholder files at 11:59pm and create them at 12:01am.
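One hedged way to blunt the 11:59pm trick: meter usage at random instants and bill on the average, so a placeholder deleted only around midnight still costs (in expectation) nearly a full day. A sketch with made-up rates:

```python
# Bill storage on usage sampled at random times of day rather than at a
# fixed cutoff. Rates and sample counts are made up for illustration.
import random

def daily_charge(usage_gib_at, rate_per_gib=0.01, samples=24, seed=None):
    """usage_gib_at(t) -> GiB in use at time t (hours into the day)."""
    rng = random.Random(seed)
    avg_gib = sum(usage_gib_at(rng.uniform(0, 24)) for _ in range(samples)) / samples
    return avg_gib * rate_per_gib

# A hoarder who drops their 100 GiB placeholder only from 23:59 to 00:01
# usually pays nearly the full-day charge anyway:
hoarder = lambda t: 0.0 if (t > 23.983 or t < 0.017) else 100.0
print(f"hoarder's charge: ${daily_charge(hoarder):.2f}")
```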
1) It's not intended to be a Real Solution(tm). It's intended to buy the admin some time to solve the Real Issue.
2) Having a failsafe on standby such as this will save an admin's butt when it's 2am and PagerDuty won't shut up, and you're just awake enough to apply a temp fix and work on it in the morning.
3) Because "FIX IT NOW OR ELSE" is a thing. Okay, sure. Null the file and then fill it with 7GB. Problem solved, for now. Everybody is happy and now I can work on the Real Problem: Bob won't stop hoarding spam.
That is all.
I noticed on the way to pick them up that the truck was running on empty in the main tank but I checked and the aux tank was full. Then I remembered the first time my dad let the tank run down and start sputtering down the road and decided to keep going on the empty tank.
Make it to pick them up and start heading down the highway (where we were, it was a good 3-4 miles to the nearest gas station) and then the truck finally started to sputter. I proceed to play along with it, pretending to panic for a good 20 seconds, and then I turned and saw the look on their face and couldn't help but start laughing. Switched to the aux tank, and when the truck started running again I turned and the look I was getting indicated I was being mentally murdered. Then they punched the crap outta my arm and started laughing and calling me not so nice things.
Ended up being an awesome night out with someone I'd end up being friends with for a long time. It's weird how this kind of random conversation in an unrelated internet post can drag you way back down memory lane.
I kept spare inline fuel filters in a tool roll just in case after a while.
suppose then that you go fill up and forget to set the petcock back to normal. 8ball says: "I see a long walk in your future."
one time i was eastbound on the bay bridge when my bike started to sputter. i'd just reassembled the tank and had left the screw-style reserve fuel valve open, so there was no reserve fuel to be had. a very kind lady put her blinkers on behind me and followed as i coasted the last few hundred yards toward yerba buena island.
i pushed my bike up the ramp and looked in the tank to assess. it's a dirtbike, so the tank has two distinct "lobes" to accommodate the top tube of the frame. I had a few ounces in the tank but they were not in the lobe with the fuel pickup, so i dumped the bike on its side to get the fuel to slosh over to where i wanted it.
i got back on the highway and, going quite slowly and gently, managed to get to the gas station at west oakland bart, the engine leaning out and sputtering right as i rolled into their lot.
Normally you take for granted that the engine works for hours at a time.
When you've come to a stop and found those last few ounces of fuel, it's such a relief that the engine can run again, and you know it won't run for very long, but every minute that it continues running saves you many minutes of walking or pushing. You appreciate every minute that the engine produces that amazing amount of power (compared to your own power when you're pushing a 300+ pound bike)
OTOH, Honda Goldwings have stereo systems. They might grow an automatic fuel reserve switcher-backer someday too. :)
Carb jets can get clogged, too, but are wider since they're not under as much pressure. Also, since they're a wear item they're a lot easier to clean and/or replace.
I am one of those who likes things old school. My bike still has a carburetor, has no fuel light or tachometer, and I have certainly had some practice reaching down to turn the fuel petcock to reserve while sputtering on the highway. If they didn't intend for me to do that, why did they put it on the left side? :)
See multiple Honda bikes with DCT (dual clutch transmission). This is what I'm planning to get as my first bike.
My new bike has a vacuum-actuated fuel valve, no reserve. It does have a fuel gauge, but since the tank is not a nice simple rectangle and the lean angle makes a difference, the gauge is basically untrustworthy. So I go by the mileage and hope I don't get it wrong. How hard would it be for them to add a reserve setting, so the valve could just flip between On and Reserve as needed?
If you don't have monitoring, will you even be aware that your disk is filling up?
If you do have monitoring, why are you artificially filling up your disk so that it will be at 100% more quickly instead of just setting your monitoring up to alert you when it's at $whateverItWasSetToMinusEightGB?
A second argument is that it's not opened by any process. One problem I've had fixing disk-full errors was figuring out which process still had a file open.
(For any POSIX noobs: the space occupied by a file is tracked by its inode. Deleting a file "unlinks" the inode from the directory, but an open file descriptor also keeps the inode alive. Until all links are gone and every descriptor is closed, the OS won't release the space occupied by the file. Particularly with log files, you need to kill any processes that still have the file open to actually reclaim the disk space.)
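The inode behavior is easy to demonstrate on a POSIX system:

```python
# After unlink() the file has no directory entry, but the open descriptor
# keeps the inode (and its blocks) alive until the last close().
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)
os.unlink(path)                  # directory entry gone...

st = os.fstat(fd)
assert st.st_nlink == 0          # ...link count is zero...
assert st.st_size == 4096        # ...but the data still occupies space
os.close(fd)                     # only now can the OS reclaim it
print("space is reclaimed only after the last descriptor closes")
```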
I think the idea is that once you are at the system you can try to find out the cause without removing the file, or worse case remove the file and act fast (you may be on a short timer at this point). So for example if you find out that process X broke and is writing a ton of logs you can disable that process, remove the file, then most of your system is operational while you can properly fix the root cause or at the very least decide how to handle the data that filled up the disk in the first place. (You can't always just delete it without thought)
I think a more refined approach would be disk quotas that ensured that root (or a debugging user) always had a buffer to do the repairs. This file just serves as a system wide disk quota (but you need to remove it to take advantage of that reserved space).
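ext-family filesystems already ship a version of this: by default 5% of blocks are reserved for root (tunable with `tune2fs -m`). The reserve is visible as the gap between blocks free to root (`f_bfree`) and blocks free to ordinary users (`f_bavail`):

```python
# statvfs exposes the root-reserved buffer: f_bfree counts all free blocks,
# f_bavail only those an unprivileged process may use.
import os

st = os.statvfs("/")
reserved = (st.f_bfree - st.f_bavail) * st.f_frsize
print(f"reserved for root on /: {reserved / 2**20:.1f} MiB")
```

Unlike the spacer file, this buffer is usable by a debugging root shell without first deleting anything.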
There's always a big gap between what should never happen because you planned well and what does happen
And if your monitoring is working correctly, the spacer file really serves no purpose other than lowering the available disk space.
2. Maybe you didn't get it, but "nullmailer not forwarding cron email due to mailgun problems" was a bit too specific to be an example I just made up, wasn't it? Again, the premise "if your monitoring is working correctly" is not a good one to base your reasoning upon. Especially if you have 1 VM (VPS) and not a whole k8s cluster with a devops team with rotational on-call assignments.
When you actually fill up your disk, many Linux commands simply fail to run, which makes getting out of that state extremely difficult. Deleting the file means you have room to move files around / run emacs / whatever, to fix the problem.
I understand the benefit of being able to quickly delete a file so you can run some command that needs space, though I find it highly theoretical. If it's your shell that requires space to start, you won't be able to run the command to remove the spacer; and once you're in a shell, I've never found it hard to clean up space (path autocompletion is usually the only noticeable victim). And at that point the services are down anyhow, and you likely don't want to restart them before figuring out what the problem was, so I don't see the point of quickly being able to make some room.
It feels like "having two flat tires at the same time is highly unlikely, so I always drive with one flat tire just to make sure I don't get an unforeseen flat tire". It's cute, but I'd look for a new job if anyone in the company suggested that unironically.
I run a bunch of websites for pet projects and for friends' clubs etc. They don't need monitoring, and even if they go down for a couple of hours (or days) it doesn't really matter.
I do monitor them, but mostly as an excuse to test various software that I don't get to use during my day job (pretty sure a bunch of static sites and low-use forums don't need an Elastic cluster for log storage :) )
And sometimes you simply don't have the time to deal with it right now. So you do a quick hack, and fix it properly later.
So when you're running out of space, you immediately delete the junk file. Suddenly there's "No Problem" and you've reset the symptom back to hopefully well before it was an issue. Now you can run whatever you need to, do reports, do traces etc. Even add more storage if necessary.
More importantly, as soon as you delete that junk file now you have space for logs. You have space and time for investigation.
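Creating the junk file itself is worth doing with real preallocation rather than a sparse file. A sketch, assuming a Linux host where `os.posix_fallocate` is available; the path and helper name are made up for illustration:

```python
import os

def make_ballast(path: str, size_bytes: int) -> None:
    """Preallocate real blocks, so deleting this file later frees real
    space. (A sparse file of the same nominal size would free nothing.)"""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)

# e.g. make_ballast("/var/ballast", 8 * 1024**3); in an emergency,
# a single os.unlink("/var/ballast") returns the space instantly.
```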
It doesn't do that though. If you don't have monitoring/alerting that can either a) give you sufficient notice that you're trending out of disk space, b) take action on its own (e.g. defensively shutting down the machine), or c) both of the above, then having your server disks fill up is bad whether you have a ballast file or not.
If your database server goes to 100%, you can't trust your database anymore whether you could ssh in and delete an 8GB file or not.
If you really need some reserve space (physical server), I'd much rather store it in a vg (or zfs/btrfs subvolume). Will you remember the file exists at 2am? What about the other admins on your team?
As someone who has been woken up at 2am for this exact issue, emphatically yes. I would much rather be back in bed than trying to Google the command to find large files on disk.
Hopefully if you were doing something like this it would be part of your standard incident response runsheet/checklist.
So it mails you when it goes over 80% disk usage? And what if you are on holiday? Does it mail all your colleagues? Then who picks it up ("I thought Bob picked it up, but Bob thought Anne did", so no one did)? Does it come and wake you in person when it reaches 92%?
Will it catch that async job that fails (but should never) in an endless loop, creating 20MB JSON files as fast as the disk allows?
Is it an alerting that finds anomalies in trends? Will it be fast enough for you to come online before that job has filled the disk?
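Trend-based alerting of the kind this comment asks about boils down to projecting the current consumption rate forward. A deliberately naive sketch from two samples (real systems, e.g. Prometheus's `predict_linear`, fit over many samples instead; the function name is my own):

```python
def seconds_until_full(free_then: int, free_now: int, interval_s: float):
    """Naive linear projection of time-to-full from two free-space
    samples taken interval_s seconds apart. Returns None when usage
    is flat or shrinking."""
    rate = (free_then - free_now) / interval_s  # bytes consumed per second
    if rate <= 0:
        return None
    return free_now / rate

# consuming 5 units/sec with 50 units free -> 10.0 seconds left:
# seconds_until_full(100, 50, 10) == 10.0
```

Which also shows the commenter's point: if a runaway job writes as fast as the disk allows, the projected time-to-full can be shorter than any human response time.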
I've been doing a lot of hosting management and such. And there is one constant: all unforeseen issues are unforeseen.
I work in hosting too, and have been for a long time. I feel ya.
On a current gig, we host at heroku. Our monitoring is all about 95th percentile response-times, secondary services, backlogs, slow-queries and whatnots. For another job, "disk space filling up" is important. Again another job will need to monitor email-delivery-rates and so on and so forth.
My org is in the middle of an SRE introduction and for some reason I'm getting a lot of pushback on the topic of 'error budgets' and what to do with alerts when they are exceeded. Can't imagine why.
This 8GB file idea isn't to replace monitoring. It's to offer a quick stopgap solution so you can do things in a hurry and give yourself a little extra "out" when things go awry. Because believe me, they WILL go awry. And if you're not prepared for that eventuality, then I don't know what else to say.
Yes. If I didn't feel that I can trust it, I would get another solution.
> You've never had a server fill up before you can respond to the alert?
I have. And with the hack proposed in this article it would fill up even sooner: sooner by exactly the time it takes the runaway process to write 8GB of data.
> Because believe me, they WILL go awry.
In my experience: not in any way that this would help. If your disk fills up, it's either slow (and your monitoring alerts you days or at least hours before it's a problem) or it's really, really fast. In the latter case, it's much faster than you can jump on your computer, ssh into the machine and delete your spacer file.
Invest in better monitoring, that's much, much, much, much better than adding spacer files to fill up your disk or changing the wall clock to give you more time.
If you've had a disk go full on you, what's the first thing you do? For me, I log in and start looking for a log file to truncate to buy me a few megs of space, at least. This spacer file is just a guaranteed way to find the space you need without having to hunt for it.
Also it doesn't HAVE to be 8GB. On most systems I think a 500MB file would be every bit as effective.
> he had put aside those two megabytes of memory early in the development cycle. He knew from experience that it was always impossible to cut content down to memory budgets, and that many projects had come close to failing because of it. So now, as a regular practice, he always put aside a nice block of memory to free up when it's really needed.
Would've been nice if someone had reserved some space ahead of time. Maybe they did, but nobody was around who remembered that codebase.
Consider how long it takes to edit or recreate art assets to reduce their size. Depending on the asset, you might be basically starting over from scratch. Rewriting code to reduce its size is likely to be an even worse option, introducing new bugs and possibly running slower to boot. At least smaller, simpler art assets are likely to render faster.
This is also the kind of problem that's more likely to occur later in the schedule, when time is even more scarce. Between these two factors (lack of time and amount of effort required to get art assets which are both decent looking and smaller), I think in practice you're actually more likely to get better quality art assets by having an artificially reduced memory budget from the outset.
1. Deal with possibly multiple issues possibly involving multiple people with the politics that entails resulting in a lot of stress for all involved as any one issue could render it a complete failure.
2. Have extra space you can decide to optimise if you want. You could even have politics and arguments over what to optimise, but if nothing happens it all still works so there is a lot less stress.
I pick 2.
Also, don't forget you're hearing decades-later retellings of someone else's story. I don't doubt that they trickled this extra space out as changing requirements mandated it, but that they kept from doing so until the team had actually reached a certain level of product-maturity and reclaimed all of their own waste first.
Remember that the PM's goal is to ship. Blocking some assets but actually shipping is a success. Better 95% of the product than 0%.
As developers, we need to be better at handling edge cases like out of disk space, out of memory, pegged bandwidth and pegged CPU. We typically see the bug in our triage queue and think: "Oh, out of disk space. Edge case. P3. Punt it to the backlog forever." This is how we end up in a place where every tool in the toolbox simply stops working when there's zero disk space.
Especially on today's mobile devices, running out of disk space is common. I know people who install apps, use them, then uninstall them when they're done, in order to save space, because their filesystem is choked with thousands of pictures and videos. It's not an edge case anymore, and should not be treated as such.
Consider the hypothetical scenario of being totally out of memory. I mean completely: not a single page free, all buffers and caches flushed, everything else taken up by data that cannot be evicted. As a result, you cannot spawn a process, you cannot do any filesystem operation that would end up allocating, and you can't even get new page tables.
Hence things like Linux's OOM killer, which judiciously kills processes--not necessarily the ones you would like killed in such a situation. And again, a lot of preventative measures to not let it come that far.
Our Turing Machines still want infinite tapes, in a way.
It felt like everything was falling apart. As soon as I deleted something, another app filled it up within minutes. Even Bash tab completion breaks... There really should be a 98% disk-usage threshold in Linux so that you can at least use all the system tools to try and fix it.
/home still has space, though, so nothing truly breaks. Perhaps I should file a bug report about that.
"I have an unlimited data plan, I'll just store everything in the cloud." only to discover later that unlimited has an asterisk by it and a footnote that says "LOL it's still limited".
In what situation, though? Let's consider disk space. This certainly does not apply to all developers or all programs. Making your program understand that the system has no space left does not seem productive in the vast majority of cases. Like running out of memory, it is not something the program can recover from by itself, unless it knows it created temporary files somewhere that it could go and delete. And if that scenario applies to your program, it's not even an edge case: the program should be deleting temporary files it no longer needs anyway. If the P3 was created to add support for that exact function, then I agree it should be acted upon.

A P3 is fine as long as it's eventually reached; if you never reach your P3s, there are different issues that need addressing. I'd even say that something littering users' disks deserves higher than a P3, but the point is that this is a specific case where handling that error makes sense. In every other case, your best bet is a _generic_ exception handler for write operations that catches any failure and informs the user (e.g. "[Errno 28] No space left on device"), but that should already be a habit.
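The generic handler this comment argues for might look like the following sketch (function name and messages are illustrative, not from any real app):

```python
def save_image(path: str, data: bytes) -> str:
    """Generic write-failure handling: ENOSPC gets no special branch;
    whatever the OS reports is simply surfaced to the user."""
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError as e:
        return f"Could not save image: {e.strerror or e}"
    return "Saved"
```

The same `except OSError` covers disk-full, permission errors, missing directories, and read-only filesystems alike, which is the point: one honest message instead of hundreds of errno-specific branches.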
There are cases when you want to try to avoid running out of disk space because your program might know that it needs to consume a lot of it (e.g. installers) so it will be checked preemptively. Even then you probably do want to try to handle running out of disk space (e.g. in the unfortunate event that something else consumed the rest of your disk _after_ you preemptively calculated how much was required) so you can attempt a rollback and inform the user to try again.
Other than that, when else is that _specific_ error more important than knowing that the data just couldn't be written in general? Let's say you have a camera app that tries to save an image. Surely you'd have a generic exception handler for not being able to save the image, rather than a specific handler for "out of space", which seems oddly specific considering there are literally hundreds of specific errnos you could be encountering that would prohibit you from writing. I'm sure the user doesn't want to see something like "Looks like you're out of disk space. Do you want to try save this image in lower quality instead?"
So my point in all of this is I agree that we should _consider_ the impact of disk space but it doesn't need to be prioritized by developers unless it's actually important like in the first few examples I gave.
For example, I'm working on an NVR project. It has a SQLite database that should be placed on your SSD-based root filesystem and puts video frames on spinning disks. It's essentially a specialized DBMS. You should never touch its data except though its interface.
If you misconfigure it, it will fill the spinning disks and stall. No surprise there. The logical thing for the admin to do is stop it, go into the config tool, reduce the retention, and restart. (Eventually I'd like to be able to reconfigure a running system but for now this is fine.)
But...in an earlier version, this wouldn't work. On startup it updates a small metadata file in each video dir to help catch accidents like starting with an older version of the db than the dir, or vice versa. It used to do this by writing a new metadata file and then renaming it into place; with the disk full, writing the replacement file would fail, and then you couldn't delete anything. Ugh.
I fixed it through a(nother) variation of preallocation. Now the metadata files are a fixed 512 bytes. I just overwrite them directly, assuming the filesystem/block/hardware layers offer atomic writes of this size. I'm not sure this assumption is entirely true (you really can't find an authoritative list of filesystem guarantees, unfortunately), but it's more true than assuming disks never fill.
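The fixed-size in-place overwrite described here can be sketched as follows; the names and sizes are my own, not the project's actual code, and the atomicity caveat from the comment still applies:

```python
import os

META_SIZE = 512  # small enough that the FS/block layers may write it atomically

def write_meta(path: str, payload: bytes) -> None:
    """Overwrite a fixed-size metadata file in place. No temp file and
    no rename means no new allocation, so this works on a full disk."""
    assert len(payload) <= META_SIZE
    fd = os.open(path, os.O_WRONLY)  # file was preallocated at creation time
    try:
        os.pwrite(fd, payload.ljust(META_SIZE, b"\0"), 0)
        os.fsync(fd)
    finally:
        os.close(fd)
```

Padding the payload to the full 512 bytes keeps the file size constant, so the overwrite never extends the file and never triggers a block allocation.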
It might also not start if your root filesystem is full because it expects to be able to run SQLite transactions, which might grow the database or WAL. I'm not as concerned about this. The SQLite db is normally relatively small and you should have other options for freeing space on the root filesystem. Certainly you could keep a delete-me file around as the author does.