> doing the devops/sysadmin parts of a project is a huge upfront drain.
> Many projects don’t really talk about this. Instead you see a nice README that explains how to do things, but that README doesn’t reflect the human pain and suffering it took to figure out that the path for your Steam directory depends on the version of Proton you have. You can look at our final instructions (less than three hundred words) and compare them to the literal five months of trying to get things working; that ratio probably isn’t unique to us.
I had to move my system to a new SSD with more space, mostly because the great Python venv system means that each project takes up between 5 GB and 30 GB of disk space, so 224 GB was not enough anymore.
The Nvidia driver didn't install properly, so the second reboot was to a black screen. Thankfully, after some fiddling with packages, it worked again.
Then the old way I loaded ssh-agent didn't work anymore, so it took a few logouts to find a new one.
I tried to recompile my CV, but somehow the LaTeX dependencies had all changed names, so I had to track down missing packages. Then it still would not compile because the names of the fonts had changed. Configuring the font line was frustrating, as xelatex and pdflatex did not expect the same syntax, and each crashed with a different error that did not lead to an easy solution.
When I tried some game on Steam, it just crashed without an explanation. After checking ways to boot it from the CLI -- which is unsupported, but search results led to partial solutions to that problem anyway -- I found that it was possible to add a DEBUGGING argument to the proton command. Searching for the error led to finding that some lib was missing (actually the challenge was in finding the proper error to search for). I'm the most amazed at this one, since this is not in any way "developer software" but standard user software, for which at least an hour of pain would have been avoided by an optional dependency.
I'm cutting down the list but it took days to solve every single broken thing.
By coincidence I watched "How to Prevent the Collapse of Civilization" by Jonathan Blow. His point is that nobody is surprised anymore that everything breaks and is buggy; it has become the baseline expectation.
In my Fortran IV class fifty years ago, it was not unusual for the Burroughs mainframe to be down when one showed up with punch cards, or showed up to punch them.
Modern cars are vastly more reliable than the ones I grew up driving. Modern tires last a lot longer. It is true that sixty years ago one could pull tubes out of the TV and test them at the hardware store's plugboard; but it is also true that new TVs without such troubleshooting options are much more reliable. I think that we got our current TV because my wife wanted to watch the 2008 presidential debates. It still works.
As others mentioned, anything AI (Stable Diffusion, for instance, is 5.86 GB, ComfyUI 5.19 GB, then there is Whisper, OpenLyrics, LLM, ...), and this is not counting the model files, of course.
Some client projects are packaged in a way that brings in a lot of unneeded dependencies. They could be repackaged to be 'lite', but no one thinks this is worth the time investment or is going to pay for it.
In almost all cases venvs are needed because the dependencies are brittle, and changing the Python version or some package version breaks the program.
Considering that those sizes don't even include the models, what is actually taking up several gigabytes? How is it so much code? Or is it more than code?
I know it includes dependencies, but I'm still baffled.
Like the other comment mentions, my venvs also regularly reach 5 GB thanks to deep learning libraries.
The Nvidia packages alone take 2.8G and torch another 1.4G. Numpy, transformers, matplotlib and two dependencies named sympy and triton add another 500 MB, and with that alone we're already at 4.7G.
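If you're curious where the space actually goes, one way to check is to sum the size of each top-level directory in the environment's `site-packages`. A minimal sketch (the venv path and Python version below are assumptions, adjust to your setup):

```python
# Rough sketch: report the largest packages in a virtual environment's
# site-packages. The venv path here is an assumption; adjust to yours.
from pathlib import Path

site_packages = Path(".venv/lib/python3.11/site-packages")  # hypothetical path

def dir_size(path: Path) -> int:
    """Total size in bytes of all files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())

sizes = sorted(
    ((dir_size(p), p.name) for p in site_packages.iterdir() if p.is_dir()),
    reverse=True,
)
for size, name in sizes[:15]:
    print(f"{size / 2**30:6.2f} GB  {name}")
```

Run against a deep learning venv, the top entries are typically the nvidia CUDA wheels and torch, which lines up with the numbers above.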
I dunno about the Blow part. IME since the late 80s computing was mostly wrestling with bugs, crashes, and issues you had to solve mostly on your own, or just live with (t. Guru Meditation vet), and it was only worse during the 9x era. The relatively recent "everything works and is stable" era was more of an anomaly than how it always was.
I think it's a growing experience to some degree anyway. Suffering through Windows 98SE taught me some pretty useful practical skills and expectations that are still applicable in some capacity even today.
Did you want to reinstall or did you not know better? For the latter case, dd (or cp actually, you rarely need dd!) and resize2fs (similar tools exist for other filesystems) have served me well to move to new storage with minimal effort. Copy the whole disk, resize the partition, resize the FS, set new boot drive, boot from new drive.
I thought it would be a good opportunity to improve some of my setup, in particular because I migrated my work config, which covered way more cases. Also, since I had all those setup scripts that I wrote to set up servers (git, python, postgresql, nginx, ...), I expected it would barely take more time (this part went fine actually!). In hindsight I regret not going the dd way, which I had initially planned, even if the new setup is a bit cleaner.
I have been suffering from this venv problem as well. Anyone know if there is a programme or effort to dedup dependencies across projects or something? Feels like it is probably doable.
You can create a virtual environment that you never source/activate, use it to `pip install` the libraries you want to share across projects, and add its `site-packages` to your `PYTHONPATH`.
Since the path is searched in order, your project virtual environment will override any other libraries you have on the path, say ones with a different version.
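As a rough sketch of what that could look like (the paths, Python version, and package names are assumptions for illustration, not from the comment above): set up the shared environment once, point `PYTHONPATH` at its `site-packages`, and from inside a project you can check which copy of a package Python actually resolves:

```python
# Sketch of the shared-environment idea. Assumed one-time setup (shell):
#   python3 -m venv ~/.shared-venv
#   ~/.shared-venv/bin/pip install torch numpy        # the heavy, shared libraries
#   export PYTHONPATH=~/.shared-venv/lib/python3.11/site-packages
#
# From inside any project venv, verify which copy of a package will be imported:
import importlib.util
import sys

print(sys.path)  # inspect the search order your interpreter actually uses

spec = importlib.util.find_spec("numpy")  # example package name
print(spec.origin if spec else "numpy not found on this path")
```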
Will pip also see that and not install what’s installed there? This is most useful in AI tools, but they tend to have their own “setup” process that pip-installs everything regardless. If not, it’s no use.
> Perhaps the best lesson is this: you just need it to work. On your machine. For now. It doesn’t need to be perfect, just good enough to let you do what you actually want to do.
And that's how you end up with the kind of trail of flakiness that brought them so much trouble. Disappearing tutorials and deprecated plugins are a result of this kind of attitude.
A lot of this honestly sounds like "skill issue" - they're trying to do a cascade of things that is multiple layers of unsupported, and a lot of the problems are ones they've made for themselves (e.g. why not set it up on a local machine first, get your setup steps working there, and then put it on your headless machine, rather than having to deal with Xvnc at the same time as everything else? Why not use a working computer and OS instead of buying a new one?). But also this is what doing real work, especially research, tends to look like. The stuff that is hard is rarely the deep mathematical parts, it's the ops rough edges.
Never make multiple changes at once. Test every case where you have a known answer or oracle, and start with the simplest possible case, because it'll be broken anyway. Log the heck out of everything. Make sure you can visualize the agent's state and inputs. Start with the simplest possible agents, like a hardwired sequence of actions, and only gradually ramp up the complexity. Know your domain very well, so you don't have surprises like the NetHack DRL dev recently discovering a bizarre performance regression was caused by the phase of the moon differing between testing runs and training data.
Jumping into a complex proprietary video game stack where this use case is totally unsupported, and wanting to apply fancy DRL on top, is a recipe for suffering, and for Heisenbugs, and for interactions, and for papercuts.
You might be getting a downvote for trivializing the issues I think, but there is definitely something to be said for stacking the deck against yourself when it comes to debugging.
I focus a lot on reducing iteration time to keep devs efficient, and the largest part of that is making sure debugging can happen locally, so you can test your changes or step through the code with a debugger.
If your setup doesn't allow that because you have inimitable third party APIs, enterprise data or hosting restrictions for example, you will see a significant cost in development speed, and tricky problems become massive delays.
Also it's unpleasant to work so hamstrung by the tools, and unhappy devs leave.
> I focus a lot on reducing iteration time to keep devs efficient, and the largest part of that is making sure debugging can happen locally, so you can test your changes or step through the code with a debugger.
I think people don't realize how big of a deal this is. I hate that not being able to run and debug your project locally (without crazy mocking) is becoming the new normal these days.
> Many projects don’t really talk about this. Instead you see a nice README that explains how to do things, but that README doesn’t reflect the human pain and suffering it took [to get working]
This makes me feel better about wasting the last 2 days debugging a failed unit test that was due to duplicate dependencies (with different versions).
expect(mockDynamoDBClient).toHaveReceivedAnyCommand(); // passes
expect(mockDynamoDBClient).toHaveReceivedCommand(UpdateCommand); // fails
Received number of calls: 2
Expected number of calls: >= 1
Expected command: UpdateCommand
Received commands:
- UpdateCommand
- UpdateCommand
This has happened to me multiple times with AWS SDK V3 (and other packages like react, cassandra-driver, etc.) and while there are ways to avoid it, it's not entirely simple to prevent.
Sadly, the most robust solution is often to use duck typing, e.g. instead of checking if the error is an instance of ResourceNotFoundError, check if error.code equals "ResourceNotFound" which works even if there are multiple library versions present. This is especially the case if you're writing a library that might be provided an instance of an AWS SDK v3 client, as it could be a different version.
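To make that concrete, here's a minimal, language-agnostic sketch (written in Python, with hypothetical class names) of why the instance check breaks when two copies of a library are loaded, while the attribute check keeps working:

```python
# If two copies of a library are loaded, each has its own exception class,
# so isinstance() against "your" copy fails for errors raised by the other
# copy. Matching on an attribute (duck typing) works either way.
# The class and attribute names below are hypothetical.

class ResourceNotFoundError(Exception):            # the copy your code imports
    code = "ResourceNotFound"

class _OtherCopyResourceNotFoundError(Exception):  # same class, second installed copy
    code = "ResourceNotFound"

err = _OtherCopyResourceNotFoundError()

print(isinstance(err, ResourceNotFoundError))            # False: different class object
print(getattr(err, "code", None) == "ResourceNotFound")  # True: duck typing still matches
```

The same reasoning applies to the AWS SDK case above: each duplicated copy of the package defines its own error classes, so only the attribute check survives the duplication.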
I've been doing this stuff for 20 years so I'm used to it, I just feel bad for the novices.
Shit, I've been working for 10 years (but only a few months with these tools).
I actually considered the looser validation when I discovered that toHaveReceivedAnyCommand() worked, but ultimately decided there *had* to be something wrong with my setup, and I refused to leave it for someone else to hit.
Shame the AWS SDK doesn't communicate the actual error in a more human readable way. I was legitimately questioning my sanity and/or the validity of the library for a minute there.
I think the real lesson here is how bad the state of support has gotten. Whenever they tried to reach someone, they were forwarded to people who don't know what they're doing, to forums where no one answers, to dead Reddit threads, all while it was hinted that paying extra might solve their issue.
Imagine a world where you call Verizon and the person on the phone knows what port forwarding is, or where you write on an official Discord channel and someone from the team replies with a fix.
> What’s the problem? Trackmania 2020 is available on Steam, so just install it through Steam and use Lutris. The problem, just like in the Xvfb section, is that there’s a middleman: Steam opens the Ubisoft Connect Launcher, which itself opens Trackmania
Every single time Ubisoft comes up in an article, I'm reminded how much I despise that company.
I understand your struggles. However, I think that instead of bloatbuntu, an Arch-based solution would be "less hassle": the AUR has several solutions for Steam in various ways, and for me to get Trackmania running, all it took was `yay -Syyu steam`, launch Steam, install Trackmania, and play. I think when it comes down to doing customized projects like this, Arch-based distros seem to be less troublesome in some way, or there are packages in the AUR that greatly help with customized stuff. Alternatively, the Heroic Games Launcher also has Trackmania.
I think server-based distros are less "flexible" for custom projects like this. Under the hood it is all GNU+Linux, but I think Arch offers more flexibility for customizations. And tbh it isn't any harder, less stable, or anything else they say than any other distro, certainly if you are already familiar with GNU+Linux. I would even recommend Arch for beginners, because of the flexibility and the AUR.
However, I won't say there won't be any struggles to get it working the way you want, since I have had similar "problems" in the past. I think that will always be the case to some degree, and you will only learn to solve them faster by experiencing more of them. But as you gain experience it will improve.
It is also a case of knowing what will work and what won't. I know that I will always make a Python venv instead of a conda one, because conda environments are doomed with torch, CUDA, and glibc, and conda environments seem to get bigger in size as well. However, my venvs range from 10 to 30 GB too, certainly for something like a ComfyUI with a ton of custom nodes requiring their own dependencies.
I understand your struggles, congratulate you on your accomplishments, and thank you for sharing your experience, since it's going to be helpful for someone sometime.