Hacker News
Wipe and reinstall a running Linux system via SSH (2017) (github.com)
399 points by 1nvalid 41 days ago | 79 comments

> This script does not have any provisions for exiting out of the new environment back into something sane. You will have to reboot when you're done. If you get anything wrong, your machine won't boot. Tough luck.

So then "without rebooting" in the title is inaccurate, no?

I haven't studied TFA carefully. But reinstalling running Linux systems via SSH is pretty routine, no? Using debian-installer with network-console, I mean.

I occasionally reinstall remote servers with LUKS via SSH. I just log in, build the installer, and reboot into it. Then I SSH to the installer, and almost complete it. Just before rebooting, I go to single-user mode and set up dropbear in the initramfs. Then I reboot.

So OK, that's two reboots, not one. And if it fails, I just reboot using the control panel. If I've really fucked up, I reinstall and try again.
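For anyone curious, the dropbear-in-initramfs step looks roughly like this on Debian/Ubuntu. This is a sketch only: the package name and the authorized_keys path vary by release, so the commands are written to a file for review rather than run here.

```shell
# Sketch only: the procedure is written to a script for inspection,
# since the real steps require root on a Debian-like system.
cat > setup-dropbear-initramfs.sh <<'EOF'
#!/bin/sh
set -e
# Install dropbear support for the initramfs (Debian/Ubuntu package name)
apt-get install -y dropbear-initramfs
# Authorize your key for the pre-boot SSH session that unlocks LUKS
# (path is /etc/dropbear-initramfs/authorized_keys on older releases)
cp ~/.ssh/authorized_keys /etc/dropbear/initramfs/authorized_keys
# Rebuild the initramfs so dropbear is included
update-initramfs -u
EOF
bash -n setup-dropbear-initramfs.sh && echo "syntax OK"
```

After rebooting, you SSH to the dropbear listener, unlock the LUKS volume, and boot proceeds normally.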

Wipe and reinstall, not reinstall and load/run. I think the title is fair.

I guess. But after reinstalling with LUKS, the box is pretty thoroughly wiped.

Maybe kexec-ing into the new kernel would be feasible?

I investigated this many years ago to see if it was, but I found scant information on compatibility requirements when using kexec to hand over execution to an arbitrary kernel.

The problem seems really hard though. One issue that stands out to me is that even if you properly shutdown the old kernel, will all system devices be in a 'good enough' state to be reinitialized properly by the new one? Or do some devices require a reboot for some reason?

I have no clue.

When I've built custom kernels, I'm pretty sure that the new kernel wasn't active until I rebooted. But rebooting after even dist-upgrade has just become automatic.

For unattended upgrades, you can disable automatic reboot. But then, I think there's risk that some upgrades won't take effect.

kexec is basically the same as a reboot, as far as userspace is concerned. Your sshd is going to go away.

Userspace, sure, but what about the underlying hardware? How will device drivers react if they come up and encounter hardware that is not coming out of an ACPI-induced reboot? Will some devices and their corresponding drivers be OK? Or will the drivers panic when they encounter a device in a weird state?

I'm genuinely curious is all. At the time I was pursuing this I decided it was going to get too complicated and that I had to live with a reboot.

This is a concern but it usually works for two reasons:

1. Most firmware is sufficiently broken that Linux drivers are already hardened against devices being brought up in arbitrary states.

2. kexec walks the device tree to shut down all devices before starting the new kernel. This usually gets devices closer to a startup state, or at least a smaller number of known shutdown states.
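As a sketch, the kexec handover amounts to staging the new kernel and then asking the init system to shut everything down cleanly before jumping into it. Kernel and initrd paths here are placeholders, and the script is written to a file rather than executed.

```shell
# Sketch only: staging and executing a kexec handover (needs root and
# a real kernel image, so this is just written out and syntax-checked).
cat > kexec-new-kernel.sh <<'EOF'
#!/bin/sh
set -e
# Stage the new kernel; --reuse-cmdline keeps the current boot parameters
kexec -l /boot/vmlinuz-new --initrd=/boot/initrd.img-new --reuse-cmdline
# Let systemd stop services and walk devices down cleanly, then jump
# into the staged kernel (no firmware/BIOS pass on the way)
systemctl kexec
EOF
bash -n kexec-new-kernel.sh && echo "syntax OK"
```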

It's clever how one can restart sshd, after editing sshd_config, without losing the ssh connection. Very thoughtful, and counter-intuitive.
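It works because sshd forks a separate child process per connection, so replacing the listening daemon leaves established sessions alive. A hedged sketch of the careful way to do it, validating the config before restarting (a broken config would otherwise leave no listener to reconnect to):

```shell
# Sketch only: written to a file and syntax-checked, since restarting
# sshd requires root and a real sshd installation.
cat > restart-sshd.sh <<'EOF'
#!/bin/sh
set -e
# Validate the edited config first; sshd -t exits nonzero on errors
sshd -t
# Restarting replaces only the listening daemon; each established
# session lives in its own forked child, so this connection survives
systemctl restart sshd
EOF
bash -n restart-sshd.sh && echo "syntax OK"
```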

I use this method too (for personal projects I host on vultr/do); I call it "initram shimming".

The Stack Overflow answer linked is horrifyingly delightful: https://unix.stackexchange.com/questions/226872/how-to-shrin...

I love how the first comment praises it for being "straightforward". For reference, this is how you shrink a file system on Windows:

  select volume C:
  shrink desired=4096
I guess it's nice that the Linux version lets you move the partition as well (or "shrink from behind", so to speak), but damn, "straightforward" is not a word I would have used to describe it.

I love that this line sounds like an old sea shanty:

    for i in dev proc sys run; do mount --move /oldroot/$i /$i; done


I now want to make it my goal for every line of code I write to be singable as a shanty.

I think by "straightforward" they mean you can follow it reliably: it's a full step-by-step guide that explains each step in enough detail to be usable, and it even includes details for "what to do if you're not in the state expected".

A lot of such guides have a fair amount of "then use tool X to do important part Y" without explaining any of it, making it horrible if you don't know X or the details of Y.

Source: I once tried to fix and resize half-broken LVM volumes, and if the Synology guys hadn't taken the time to help me I would probably have just ended up wiping it all and recovering all 16 TB from backups, because internet self-help is still a harsh place on some Linux filesystem matters.

So you are saying the Synology guys stepped you through it?

That would be amazing support.

(My Synology just works so I don't know what the support is like.)

They asked permission and had me fill out a form so that they could remote-connect to my NAS. After investigating, they gave me (over the support chat/list thingy) the commands they thought would fix it, though they wouldn't run them themselves. It ended up working. The issue was that a power failure happened during an SHR2 reshape: while the LVM volume had recovered, mdadm hadn't, and was stuck in a failed-reshape loop. Since it was a mix of mdadm and Synology's own SHR2 (which uses LVM to achieve its goal), finding help on the internet was next to impossible, which is why without them I would have wiped the whole thing.

This was as a personal customer with no paid support (besides having bought the product, of course), within the second year of my purchase. Overall I've had a fair number of support requests with them for my personal NASes and the couple dozen I manage for professional purposes, and I'm very happy with that relationship.

PS: my original support request was very detailed though; I did not just go and ask "doesn't work, fix it!"

Had some quite exceptional experiences with them as well. Good support for a decent product.

And this is how to shrink a mounted filesystem in Linux when you use btrfs:

  btrfs filesystem resize 4g /
It's just not supported for ext4. You can grow an ext4 filesystem, even when it is your currently mounted root, just not shrink it. While it's great that NTFS does (now) support it, few other filesystems do, because it's a relatively uncommon and quite risky operation.

I certainly wouldn't shrink mounted filesystems even on those that support it because storage systems are quite fickle and the threat of silent corruption is real.

For most filesystems what you normally do is a full dump and restore. Closing all file handles and live-migrating to a moved root filesystem is someone's idea of showing off; it is absolutely not something anyone would do in production.

Does that also shrink the partition? Otherwise you're not comparing equivalent things...

P.S. I realize an existence proof makes us mathematically happy, but it's a little disingenuous to suggest file system experience on Linux and Windows is comparable merely because of the existence of some uncommon file system with similar capabilities. The reality is most Linux users are on ext4 rather than on btrfs or zfs, and most Windows users are on NTFS rather than ReFS or FAT32, so their experience with shrinking file systems is not going to be remotely comparable.

> Does that also shrink the partition? Otherwise you're not comparing equivalent things...

yes, it does.

btrfs is already the default in SUSE derivatives. It remains to be seen how the other distributions will handle it within the next ~3 years.

It's only just become stable enough to be usable in production.

The btrfs resize command only changes the size of the filesystem. A fdisk or similar command is needed to adjust the partition size as well.
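As a sketch, the two-step version looks like the following: shrink the filesystem first, then the partition. Device name, partition number, and sizes are placeholders, and `resizepart` takes the partition's new end position, which must leave the partition at least as large as the filesystem.

```shell
# Sketch only: destructive, so written to a file and syntax-checked.
cat > shrink-btrfs-and-partition.sh <<'EOF'
#!/bin/sh
set -e
# 1. Shrink the filesystem, online, below the intended partition size
btrfs filesystem resize 4g /
# 2. Then move the partition's end down, but not below the new
#    filesystem size (device and partition number are placeholders)
parted /dev/sda resizepart 2 5GiB
EOF
bash -n shrink-btrfs-and-partition.sh && echo "syntax OK"
```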

Well, the answer is more like "it depends on what you want, exactly".

In the specific example of btrfs there is an extra layer of indirection just like there is with ZFS. Filesystems live inside pools of devices, and when one is shrunk that leaves room for another one to grow. You wouldn't want to resize the physical volume or partition unless the device is shared with other types of filesystems.

(This is why ZFS and btrfs sometimes are referred to as "layering violations". Other filesystems expect the logical volume manager to pool devices into logical block devices.)

So resizing a btrfs filesystem absolutely makes sense even in isolation.

What you describe is accurate for ZFS, but not for btrfs. Shrinking a btrfs filesystem with the resize command only makes sense if you are going to follow up by shrinking the partition or other underlying logical block device (eg lvm). btrfs does not have ZFS's concept of a device pool from which multiple filesystems can be allocated.

It's close enough. A completely valid use case for shrinking a btrfs file system is to remove one of the physical volumes from it, of which there can be several. They can also be split into subvolumes, which is the canonical use case for partitioning your block storage.

Neither of those use cases involves the `btrfs filesystem resize` command. Removing a physical device is accomplished with `btrfs device remove`, and subvolumes don't have a capacity unless you enable the optional quota features (and even then, quota groups use thin provisioning, so setting a lower quota limit doesn't free up any space for anything else).
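For illustration, the device-removal case looks like this (device and mountpoint are placeholders; the command migrates the device's data onto the remaining devices before dropping it):

```shell
# Sketch only: requires a real multi-device btrfs filesystem, so this
# is written out and syntax-checked rather than run.
cat > remove-btrfs-device.sh <<'EOF'
#!/bin/sh
set -e
# Rebalances the removed device's data onto the remaining devices,
# then drops it from the filesystem
btrfs device remove /dev/sdb /mnt
EOF
bash -n remove-btrfs-device.sh && echo "syntax OK"
```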

You can totally shrink an ext4 filesystem.

Not without unmounting it first.

You can also resize partitions online in Linux. The point of the article was how to migrate the running system into memory and then wipe and reinstall the system. Try doing that from Windows :)

He's not talking about the article but about the Stack Exchange question and answer of the parent; and the point of that is to resize the root partition without booting into another OS, not to wipe it or migrate to memory (ergo the question being asked: "How to shrink root filesystem without booting a livecd").

You can do that just fine on Windows, hell you can even do it with a nice GUI using computer management.

Yeah, but the Linux method is so much more generic; so can be used for more than just downsizing the partition.

It's far more generic but also practically useless for an end-user. Really, its only use seems to be for someone who is a remote sysadmin of some sort. You have to stop pretty much everything on the computer, and still go through a reboot. The only thing it buys you is being able to stay SSH'd in, at the cost of wasting much more time and going through much more risk and inconvenience. On Windows you'd just shrink and keep using the system as usual.

>It's far more generic but also practically useless for an end-user.

Do end-users really mess with partitioning usually (outside of formatting brand new disks, I suppose)? I'm not asking rhetorically; I suppose there must be a use case if MS implemented this (tricky) feature, but I can't really imagine any of my non-techie friends and relatives deciding to shrink a partition (actually most of them probably aren't aware of the concept of a partition in the first place).

Replacing your hard drive with a larger one, duplicating the partitions, and then resizing them is very common in the Windows end-user world, including for the root partition (e.g. replacing a drive with an SSD, or a larger SSD). There are lots of paid tools to help do that, notably the duplication part.

Now personally I find it weird: I would rather use the excuse to wipe and start clean if it's the root partition, and for non-root partitions, just making the partition you want and copying the content over feels cleaner. But it is fairly common nonetheless.

They indeed don't know the concept of partitions, but they google "replace my hard drive with a larger one" and usually follow a guide (and such a guide contains a link to a specific duplication tool they can buy, of course).

"End-user" is not a synonym for "layman". Are you a developer? Do you work on your laptop? Great, you're an end-user.

More generally, I was trying to approximate "the set of users who aren't solely sysadmins of remote systems". Substitute for it whatever word you see fit.

I fully expect a developer to be able to follow the instructions given in the post then. It's not more complicated than your average framework or build system.

> I fully expect a developer to be able to follow the instructions given in the post then. It's not more complicated than your average framework or build system.

And I never suggested a developer would be incapable of following these. Seems like you got so sidetracked in arguing that you forgot what the discussion was actually about. See my original comment: https://news.ycombinator.com/item?id=19357672

I recently installed Ubuntu and there was no warning that the old default (from many years ago) of a swap partition sized large enough to allow hibernation was no longer the case. I installed using the defaults without much thought, because of course, why wouldn't hibernation be possible with the defaults? Now I want to resize my partitions.

I thought you'd still need to reboot on Windows? Last time I tried to do any changes to the system partition I had to reboot for it to take effect.

If this is still the case then it's not really a fair comparison to Windows, because you could then do the same thing from boot media in Linux (i.e. boot into it, resize, and you're done). In fact you probably could also do that via a GUI (maybe gparted?).

In any case I do agree that Linux does still leave a lot to be desired when it comes to making some of the more advanced file system operations far more complicated for end users than they need to be.

> I thought you'd still need to reboot on Windows?

Not if all you're doing is shrinking from the end. When was the last time you tried? I'm guessing the XP days?

> I'm guessing the XP days?

To be fair, it might have been. I'm not a heavy Windows user but the fact I can't recall the last time I resized the system volume probably says more about how long ago it was.

I'm glad you at least phrased it as a question rather than as an assertion then! There is little more infuriating than a *nix fan blasting Windows in 2019 based on their experience with it 15 years ago.

Can't you do the same (in Linux) e.g. with GParted?

The "good" Linux answer would be to do the equivalent of your script with parted. I have no idea why an explanation of how to remotely install Linux on a machine while running Linux became the accepted answer.

But about Windows: last year I had to resize the partitions on my work desktop, and there are enough restrictions around changing things on the disk Windows is installed on that it became a week-long research task for our IT support people.

> But about Windows: last year I had to resize the partitions on my work desktop, and there are enough restrictions around changing things on the disk Windows is installed on that it became a week-long research task for our IT support people.

You're suggesting the research phase for partition management is somehow easier on Linux than on Windows?

> I have no idea why an explanation of how to remotely install Linux [mess with partitions] on a machine while running Linux

The question was explicitly about shrinking the root filesystem without booting a livecd or any other OS.

Gparted cannot do that directly because the root filesystem cannot be unmounted, and the appropriate answer for nearly everyone else (boot from a livecd and use gparted) doesn't apply because the question explicitly bars this option.

The SO answer creates a root environment and uses `pivot_root`, seems you could "just" duplicate the root filesystem, pivot to that full copy, then unmount the original root and do what you wanted with it?

You can remount the root filesystem elsewhere, but not unmount it while it is still the rootfs. Your suggestion would work but it involves restarting nearly everything and having another partition with enough space to copy everything into. It can be done but is pretty hard in general, so gparted just doesn't support it. If you really want/need to do it, you can do so. Otherwise just use a livecd and make your life easier :)
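For reference, the core of the pivot trick from the linked answer looks roughly like this. It's a heavily abridged sketch: the real answer also copies a minimal userland into the tmpfs and restarts every process still holding files open on the old root, and none of this is executed here.

```shell
# Sketch only: this would tear the running system out from under you,
# so it's written to a file and syntax-checked instead.
cat > pivot-to-ram.sh <<'EOF'
#!/bin/sh
set -e
# Build a minimal root in RAM and switch the running system into it,
# leaving the old root free to be unmounted and repartitioned
mkdir -p /tmp/tmproot
mount -t tmpfs none /tmp/tmproot
# ... copy a minimal userland (busybox, libs, sshd) into /tmp/tmproot ...
mkdir -p /tmp/tmproot/oldroot
pivot_root /tmp/tmproot /tmp/tmproot/oldroot
# Move the API filesystems over, then restart everything still holding
# files open on /oldroot before unmounting it
for i in dev proc sys run; do mount --move /oldroot/$i /$i; done
EOF
bash -n pivot-to-ram.sh && echo "syntax OK"
```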

I assumed that the point was it's a remote situation and so mounting a liveUSB would be costly.

The question explicitly stated that booting another os wasn't an option, so you are right in this case.

However, you should avoid being in that situation entirely, which is nowadays very feasible: all modern servers include remote management facilities, and all sensible virtual server providers give you some way or another to boot from a network image (and get a console session through VNC).

If your provider doesn't give you these options you should definitely switch providers. Not having this option means that you are always one mistake away from total doom (server won't boot -> you will never ever be able to access it again).

Yeah, admittedly if you don't know what's going on there it might feel pretty convoluted and "wtf am I doing here". OTOH, on Linux you could do this since forever; it wasn't that long ago that Windows finally shipped tools to do so. Now actually, Windows seems even more convenient since you can easily shrink a FS that is online. On Linux I'm not even sure that is possible; maybe with btrfs, but I never really needed to do that, so I have no idea.

> it's not been that long ago that Windows finally shipped tools to do so

I beg to differ: Computer Management > Disk Management has allowed you to do that in a visual and safe way since at least Windows 7 in 2009. God knows how long the underlying CLI commands have been available.

While setting up a new laptop I tried to use it and I ended up having to force kill Disk Management after it was "shrinking" for a couple hours. I think the underlying command had succeeded but it just froze. I've never had that kind of experience with (G)Parted, and I think I still trust the Linux tools more...

IIRC it couldn't resize mounted partitions (that was Win8), and shrinking always just failed with no meaningful error message.

This is mostly filesystem-dependent. If you have a LVM or ZFS volume for instance resizing is (usually) trivial. When you think about it resizing and in particular shrinking a filesystem is very much a non-trivial operation behind the curtains, it's not surprising that some filesystems don't support it out of the box especially since it's not a super common operation in the wild in my experience.

It's not trivial, and it's also likely to be a very poorly tested scenario. I would recreate the FS even if it supports resizing.

Thank you for making me double-check. You're right that ZFS can't shrink volumes, but you can use resize2fs to shrink ext2+ filesystems offline relatively easily; see for instance https://blog.shadypixel.com/how-to-shrink-an-lvm-volume-safe... for an LVM + ext example. What makes you say that it's poorly tested, exactly?

Of course if you want to shrink the rootfs without rebooting you'll have to do it while mounted and I'm not sure if that's supported by any Linux FS out there (outside of NFS I suppose). That being said I think that's understandable, implementing resizing of a live FS seems very tricky to get right and not extremely useful IMO.
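For the offline case, the usual LVM + ext4 shrink sequence looks like this. VG/LV names and sizes are placeholders; the key point is that the filesystem is first shrunk below the target LV size, then grown back to fill the LV exactly, so the LV shrink can never cut into live data.

```shell
# Sketch only: destructive, so written to a file and syntax-checked.
cat > shrink-lv-ext4.sh <<'EOF'
#!/bin/sh
set -e
# The filesystem must be unmounted and checked before shrinking
umount /dev/vg0/data
e2fsck -f /dev/vg0/data
# Shrink the filesystem below the target LV size ...
resize2fs /dev/vg0/data 9G
# ... shrink the LV to the target ...
lvreduce -L 10G /dev/vg0/data
# ... then grow the filesystem back to fill the LV exactly
resize2fs /dev/vg0/data
EOF
bash -n shrink-lv-ext4.sh && echo "syntax OK"
```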

Can Windows really let you shrink NTFS while mounted? That's a pretty impressive feat if that's true, I wonder what motivated that.

> Can Windows really let you shrink NTFS while mounted?

Yes, I have done it, even on the running system partition, and no reboot required. Just right click the partition in Disk Management and select Shrink. It will calculate the smallest size the partition can shrink to, and you can use that or any larger size.

The amount of shrinkage available seems to be semi-random though, you can't shrink away all of the free space.

That will happen when there are unmovable files. There are a few tricks you can do to unlock some of these files, or you can use an external utility that you reboot into.

This article is from a vendor of such a utility, but it also describes how to unlock some of the unmovable files within Windows:


For a lot of filesystems, growing is indeed trivial and even shrinking is explicitly supported. Why would this be less tested than any other feature?

Because users put their systems under various I/O workloads every day, but very few people resize their file systems and they do it very rarely.

Case in point: I tried to shrink an NTFS filesystem offline with ntfsresize (offline!) and this happened:


This is a tool that is part of most Linux installers and tested by huge numbers of people, and yet things still went wrong. Shrinking filesystems is hard, and this was offline. Shrinking filesystems online is much harder.

I've rarely had this work successfully on Windows... normally after several hours it pops up an indecipherable error message

presumably because it couldn't make the filesystem as small as I asked for, for whatever reason

It's straightforward for me, but to each his own.

I can't believe I didn't know about column before. I've really been missing out.
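For anyone else discovering it: `column -t` from util-linux pads whitespace-separated fields into aligned columns, which is handy for eyeballing command output.

```shell
# Align whitespace-separated fields into a readable table
printf 'FS SIZE USED\next4 20G 11G\nbtrfs 100G 42G\n' | column -t
```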

Not needed anymore since DO supports custom OS images now, but I found this script quite interesting: it sets up a "blockplan" and applies it from a minimal root FS in RAM to entirely replace a Debian OS with an Arch Linux install, unattended and without rebooting (except as a very last step, to actually boot into the Arch kernel using the replaced GRUB2).


I forked it to try and convert Debian installs from ext4 to btrfs. Unfortunately, while it does work, that's a very bad idea, since btrfs-convert produces a filesystem that fails to operate properly in the long run (IIRC there is - was? - some ever-increasing random space usage that can never be reclaimed).


BTW, the process to convert to btrfs is excellent: it only creates btrfs metadata in unallocated ext3 space, writes the btrfs header at the last minute, and uses subvolumes, allowing you to keep the ext3 metadata around as long as you want in order to roll back (obviously losing any subsequent modifications), because barring the header, the whole ext filesystem and its data are untouched.


I suppose Apple did something similar to convert from HFS+ to APFS so swiftly and so reliably.

I actually implemented something similar on an embedded Linux product once: a busybox initramfs you could boot into as a "recovery mode," then, along with dropbear, SSH into a completely in-memory system and re-flash the entire system image without having to pop out an SD card or connect a cable for DFU.

The recovery mode could even be initiated remotely, so you could re-flash a device without ever touching it. Of course you have to be careful, if the re-flash failed you could be SOL :) Apparently I need to go back and improve it so we can re-flash without rebooting!

These days you can use things like containers (Balena also looks very cool) to achieve a similar goal in possibly a "safer" way. But the idea of being able to re-flash the entire system while running it felt sort of like changing the engine of a car while driving it down the freeway!

I've implemented something very similar for upgrading headless Linux mobile robots. One nice property of this approach is that the in-memory installer environment and associated scripts can be common between USB-based install media and a remotely-triggerable kexec type installer.

At first, it surprised me there wasn't more standard tooling out there for this kind of thing, but as I got more into it, I realised how specific to our particular needs my solution had become, and I could see how it would be hard to offer something generic that would be a good fit for a wide range of use-cases without being super-bloated.

We upgraded a whole fleet of AWS / Digitalocean instances without floating IPs from Ubuntu Precise to Xenial based on this method back in the summer of 2017. While obviously not having to do crazy stuff like this would be better, it's nice to know that it is possible if you really need it.


Recently we ran into another use-case for this in production actually, we needed to wipe a lot of servers in our datacenter remotely and we figured one of the options would be to install some OS in memory with the relevant wiping tools, pivot_root to that, unmount all disks and then perform the wipe. In the end we went a different route and opted for a custom PXE-boot image instead that the servers would boot into that scripted the whole thing.

The closest I think I came to doing this was to migrate a running Debian system from being an i386 system to being an amd64 system, in place:


The first step is to update the kernel from an i386 one to one that can run both i386 and amd64 binaries; then you essentially overwrite every package with the version from the new architecture and hope like hell it doesn't mess up.

At the time I had a pair of servers, a mail-host, and a web-host, and I managed to successfully upgrade both, although it was a little scary. At least I had console access if things did get horribly screwed up.
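A heavily hedged sketch of how such a crossgrade starts on a multiarch-capable Debian (the package names are from current Debian and may not match what was available back then; written to a file, not executed, since the rest of the procedure replaces every installed package):

```shell
# Sketch only: the full crossgrade is long and risky, so only the
# opening moves are shown, written out and syntax-checked.
cat > crossgrade-amd64.sh <<'EOF'
#!/bin/sh
set -e
# Teach dpkg about the new architecture, then pull in an amd64 kernel
dpkg --add-architecture amd64
apt-get update
apt-get install -y linux-image-amd64
# After rebooting into the 64-bit kernel, packages are replaced with
# their amd64 builds one dependency cluster at a time (the scary part)
EOF
bash -n crossgrade-amd64.sh && echo "syntax OK"
```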

You could first chroot into a minimal stable system and then update everything, without having to rely on hope so much (but yeah, big updates are always "exciting", and yum/apt/etc. never get it completely right).

The first two links you posted have 1 and 0 comments respectively. The third one is helpful though with lots of discussion.

This whole thread just makes me glad I refuse to work on bare metal anymore for anything but the most niche of cases.

AWS everything.

"you know you want to" Okay, I'm in.

You can also do this with NixOS...
