I don't know why dd remains the standard for so many tutorials when there are competent GUI tools that can take all the guesswork out of it. It's like a perpetual hazing ritual for new Linux users.
You can also do something like
dd if=file.iso of=/dev/disk/by-id/ata-Samsung_SSD_840_EVO_120GB
I've also been using paths under `/dev/disk/by-id/` in the ops documentation and scripts for my startup.
It's a tiny convention for us to promote, and one of countless good practices we exercise, but this particular one can save much misery at negligible cost.
And early, precarious startups are one of the contexts in which some of these tiny negligible-cost good practices seem to pay off especially well: in an early startup, it's easy for me to imagine a data loss, bad downtime, or missed opportunity ending the company (when a more-established company might be able to weather a mess-up better).
In short, I think ignorance is the simpler answer ;)
Not even then. The chance of making an occasional mistake, while low, is still clearly much higher than when you select the disk by its name. Being experienced doesn't excuse you from the responsibility of using a less risky tool.
Usually on Linux I only use /dev/disk/by-id/foo in scripts and config files. But using it when doing routine stuff with dd is a pretty good idea too.
It's too bad that macOS and FreeBSD don't have anything similar, to my knowledge. And since I use both of those operating systems so much, and so often for things involving dd, I think in my case I unfortunately don't gain much from using by-id for dd on Linux.
Even this is a faulty solution. By default, it only lists the devices by their "/dev/sd*" names.
It's full of other great nuts-and-bolts advice.
I feel it's important to mention you want by-id as there are 5 permutations of /dev/disk/by-*:
[0:0:0:0] disk ATA ST2000DX001-1NS1 CC41 /dev/sda
[3:0:0:0] cd/dvd ASUS SH-224FB 1.00 /dev/sr0
[4:0:0:0] disk Generic- Multiple Reader 1.11 /dev/sdb
/dev/disk/by-id/ata-Samsung_SSD_860_PRO_1TB_S42NNF0K123456N -> ../../sdc
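For reference, those variants live side by side under /dev/disk/. On a typical Linux box the listing looks roughly like this (exact entries vary with the distro, udev version, and whether your partitions have labels):
$ ls /dev/disk/
by-id  by-label  by-partuuid  by-path  by-uuid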
For folks just getting started or more comfortable with a GUI, I'd recommend giving USBimager a look. It does exactly what you'd expect based on the name, it's performant, and they have native apps. No affiliation, just a fan of a KISS app done right.
Though if it's just for ISOs, ventoy is fantastic, just drag and drop the file and no burning at all :)
I ran into this very issue when trying to make a bootable Windows 10 USB on macOS. No amount of fiddling with dd, unetbootin or Etcher resulted in a bootable USB. Despite being principled about it, I had to admit defeat and just pulled an old <4GB Windows 10 iso and flashed that to the stick.
I know I could have installed Linux through a virtual machine and got it done that way, but that seemed horrible overkill. Oh well.
This has always worked for me to boot UEFI installers
Recent Windows ISOs have a file that is >4GB, so you can't have the partition formatted as FAT32.
exFAT isn't compatible with UEFI.
These are the two largest files in that image.
Dism /Split-Image /ImageFile:C:\folder_name\sources\install.wim /SWMFile:C:\folder_name\sources\install.swm /FileSize:3800
Wouldn't one (at least partial) solution here be that the kernel should refuse to let you write directly to a block device in use by a mounted filesystem? (Maybe combined with some special ioctl/whatever to bypass that restriction if you ever really need to.) Then, if you are running dd from an OS running on your main drive, the kernel will refuse to let dd overwrite the main drive, but will let it overwrite the flash drive (which presumably is not mounted, and anyway shouldn't be if you are about to overwrite it).
I have written a few wrappers like that on my own system preventing me from making a few common mistakes of mine (like scp'ing a file locally to a filename resembling an IP address instead of to a remote server ;)
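A minimal sketch of that kind of wrapper for the dd case, assuming lsblk is available (illustrative only, not battle-tested):
#!/bin/sh
# dd wrapper: refuse to write to a block device that (or whose partitions)
# currently back a mounted filesystem.
for arg in "$@"; do
    case "$arg" in
        of=/dev/*)
            dev="${arg#of=}"
            # lsblk lists the device and its children; any non-empty
            # MOUNTPOINT line means something on it is mounted.
            if lsblk -no MOUNTPOINT "$dev" | grep -q .; then
                echo "refusing: $dev has mounted filesystems" >&2
                exit 1
            fi
            ;;
    esac
done
exec dd "$@"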
I think this could be addressed by the idea of a "parent block device". So if /dev/sda1 is mounted, then its parent /dev/sda would be classified as mounted, but /dev/sda2 would not be (assuming no partition is mounted there).
I'm sure one could work something out that would work for device-mapper, LVM, etc as well
> Secondly, the Linux kernel has a very strong backwards compatibility guarantee
What about a sysctl knob? Turn it on, you get this new behaviour, turn it off, you get the backwards compatible behaviour. Each distribution can decide what to default it to. If it defaults to off in Linus' tree, that should satisfy his backwards compatibility concerns.
> With your idea, many disk management programs will be broken.
There would need to be some escape hatch, e.g. an ioctl, to allow unsafe writes. And disk management programs would have to be patched to invoke that escape hatch. A distribution wouldn't ship the sysctl as defaulting to on until it had patched all the disk management programs in that distribution. And, if you download a third-party tool, either its developers have patched it to use that ioctl, or else you can just temporarily turn off the sysctl knob while you use it.
While dd isn't the best tool for writing images to devices (if only because of its arcane and bizarre command-line syntax), it is a valuable tool when you want to recover data from media that has errors. The 'conv=noerror' option is a lifesaver for when you want to recover something from your media.
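Something along these lines, for example (device and filename are placeholders; the 'sync' part pads unreadable blocks with zeros so the rest of the data keeps its offsets):
dd if=/dev/sdX of=rescued.img bs=4096 conv=noerror,sync status=progress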
But overall I agree: plenty of people recommend using dd just because "it's always been done that way" and often also because it makes them look smart :-)
I hate tutorials that start out with “I’m gonna show you how to do X. I like to use Y, but since the Y tool isn’t installed, we have to get it. Start by editing sources.list. You might need to be root to do this. Here is how you do that.” Halfway through the tutorial we’re still Yak shaving.
ddrescue -b 2048 -d -r 3 -R -v /dev/sr0 image.iso image.log
If you try to read with a larger block size than the media's native block size, you'll get errors for chunks of that size even when part of the data may have been recoverable.
The above is true whether or not the OS has access to the media's raw ECC data.
From experience of recovering bad DVDs and BRs, there were discs I had to pass through 100+ times to get a valid read on all blocks.
This is also because the underlying hardware will always do a full ECC block read (the only way for it to determine that it read the block correctly is to read the whole block and verify it), so any smaller reads are pointless.
The way optical media works is that the drive reads bits from the disc in ECC-block-sized chunks, then verifies/fixes the block and passes it back to the OS if it has a valid block, otherwise returns an error to the OS. Hence the logic that I describe below.
Optical media is an unreliable medium in general and hence depends on ECC codes to ensure blocks are read correctly, and they are used a lot. Back in the day of CD and DVD burning there were fancier burners that provided APIs for reading the error-correction stats into user space (i.e. how many errors of each type were corrected); I don't know if they still exist. It was never 0 across the board, but that's how the medium was designed: it doesn't require it to be 0 across the board.
As you said, the nature of the medium requires ECC (side note: modern hard drives do too). So if I ask for a 2048 byte sector, the drive has to read the ECC. So why ask for more than that? It already knows the sector boundaries. In other words, if I tell `dd` to use a block size of 2048+$ECC, won’t that actually work a sector and a half (well, 1 + $ECC/2048) at a time?
In fact, unlike CDs, I don't even know, or have any way of finding out, how many ECC "bytes" there are in my hard drives' sectors.
You just care about the size of the data the ECC is protecting. If the ECC protects 16K or 32K of data, you want to read on those physical boundaries, as then you'll read a whole ECC block and it will either pass or fail. If it passes, you never have to try to read that ECC-protected block again (and maybe fail).
Of course, there is one hitch to my scheme: ensuring that you always read on ECC-protected block boundaries. I'm pretty sure that if you tell ddrescue to always read the right block size it will, but not 100% sure (why not? perhaps the ECC protects data not visible to the end user in some way, say the first block is only 8kb, not 16kb, in practice).
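With ddrescue that would presumably mean making each read span one ECC block; e.g., if the ECC block is 32 KiB (16 sectors of 2048 bytes, roughly what DVDs use), something like:
ddrescue -b 2048 -c 16 -d -r 3 /dev/sr0 image.iso image.log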
On the issue of hard drives, there is a lot more going on that puts you at the mercy of the firmware (relocatable sectors and the like). I did lose a RAID5 once (one drive totally died, and then during the rebuild another drive threw an error), and I was able to use ddrescue to recover all but a 4K block of it. As I was using a 128KB stripe size, that meant I probably lost somewhere between half a MB and a MB of data, depending on whether the 4K was contained within a single stripe or not (it probably was). I was content with that. I never did discover what data, if any, was corrupted, but I was able to recover the RAID5.
Simplistic case: imagine we have 1 ECC block of 16k but we read at 2k, so we'll number the 2k blocks 0-7:
T0 - read block 0, fails
T1 - read block 1, succeeds!
T2 - read block 2, fails
T3-T7 repeat for blocks 3-7, all fail
In practice, if we had read a 16k block at T1, we would be golden (and finished). Instead we did 8 steps and only got 1/8 of the data.
This is because the OS doesn't have a concept of the hardware's ECC block size, so the optical hardware in a sense virtualizes it, and the OS will just keep on rereading the same ECC block on the media and possibly continue to get errors.
I actually like dd's command-line syntax the most out of the Linux coreutils, and I wish more programs used a similar "key=value" argument system. It's pretty easy to remember too, since there's really only a few keys you need to remember to do 99% of what dd typically gets used for.
It is not arcane, you just have to read docs, like for every command line tool. Different cli applications have different syntaxes usually influenced by their domains, eg compare find, tcpdump, iptables, cut, docker.
The tool was originally meant to do various conversions of formats of data on 8-track tapes, and both its name and syntax are a reference to the (arguably less bizarre) syntax used by IBM's JCL to produce contents of tapes that need that kind of conversion to be usable on Unix.
I used dd a lot to burn images on USB. And that command is simple as expected.
I didn't say it wasn't simple. But it is very weird.
Do you realize that anyone can make a cli tool with whatever syntax they like? :) Also, see "man xm", which uses quite similar approach. And there are probably many other examples.
One trick to be sure that you've typed it correctly is to start with 'echo':
# echo dd if=xxx of=yyy [enter]
This way you'll be able to check that command will be executed with the parameters you want. Especially helpful if you're running loops, e.g:
# for i in xxx; do echo yyy; done
I would never use GUI for that, as you can't be sure what it will execute, no matter what it shows.
You miss the point entirely then. The point is to use the command line because what you think is a competent GUI tool isn't nearly as intuitive as you think, especially when a GUI isn't available.
The root user is special in that it can overwrite devices. Look at what you’re typing before using its powers.
Other than that, someone could replace the whole of Etcher with a simple bash script that lists available block devices with nice descriptive names taken from sysfs, lets the user select the one they want to flash to, and handles flashing a potentially compressed image and verifying the result automatically.
It would probably not be much longer than 100 LoC, at least on Linux.
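A rough, untested sketch of what that could look like on Linux (the tool names are standard, but everything else here is an assumption; don't point it at a disk you care about):
#!/usr/bin/env bash
# Minimal "Etcher in a few dozen lines of bash": list removable disks, ask
# which one to flash, write a (possibly compressed) image, then verify it.
set -euo pipefail

img="$1"

# Decompress to a temp file first so the same bytes can be written and verified.
case "$img" in
    *.gz) tmp=$(mktemp); gunzip -c "$img" > "$tmp"; img="$tmp" ;;
    *.xz) tmp=$(mktemp); xz -dc "$img" > "$tmp"; img="$tmp" ;;
esac

# Whole, removable disks only, with size and model as the descriptive name.
echo "Removable disks:"
lsblk -dno NAME,SIZE,MODEL,RM | awk '$NF == 1 { $NF = ""; print "  /dev/" $0 }'

read -rp "Target device (e.g. /dev/sdX): " dev
read -rp "This will erase $dev. Type YES to continue: " answer
[ "$answer" = YES ] || exit 1

# Write the image and have dd flush it to the device before exiting.
sudo dd if="$img" of="$dev" bs=4M conv=fsync status=progress

# Verify: compare the first <image size> bytes of the device with the image.
sudo cmp -n "$(stat -c %s "$img")" "$dev" "$img" && echo "Verified OK."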
If a user blindly copy-pastes and destroys their main drive, that's a very good lesson to thoroughly double check before running destructive operations.
Also, the advantage of dd is that you can run it with sudo easily, while with cat it will not work, since the redirection is done by the shell and not by the cat binary itself; you either have to open a root shell, or pipe the output of cat into `sudo tee filename >/dev/null`, which is less than ideal.
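i.e. the workaround looks something like this (device name is a placeholder):
cat image.iso | sudo tee /dev/sdX > /dev/null
versus simply:
sudo dd if=image.iso of=/dev/sdX bs=4M status=progress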
Also I think the problem is that Linux lets you write to disks that are mounted. On macOS it is forbidden and you must unmount the drive first (so overwriting your root partition by mistake is impossible).
I just ran gparted:
When I realized what I had done, I couldn't recover the partition table, but I managed to rsync everything elsewhere. I'm sure there was a better/safer way but everything was recovered in the end.
I've often used whatever disk utility came with my distro. Or, rufus (rufus.ie) if I'm on windows.
Just yesterday I copied my wife's entire (Windows 8.1) HDD to an SSD using the free (GUI) version of "Macrium Reflect". I'm pretty sure you can make the exact same mistake there: copying destination onto source instead of the contrary. I used a Windows program and not dd because it was a Windows computer and I didn't feel like booting a live Linux CD to do the dump, but under Linux I always use dd.
Is it really that hard to learn that in dd the 'i' in "if" means "input" and that the 'o' in "of" means "output"?
The problem with "making things simple using a GUI" is that you typically totally lose the ability to do not just advanced things but even "average" things. For example for read-only media I always write the checksum on the media itself, using a sharpie. That way I can easily verify that my disk (or the copy) is ok doing a dd and piping into sha256sum (there are some gotchas to keep in mind but it works fine when you know how to do it).
Like, say, a Debian install ISO. I like to have these on read-only medium and make sure the checksum matches the official one (so I prefer a write-once / read-only DVD to a read/write memory stick).
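The main gotcha is that the burned disc (or the stick) is usually larger than the image, so you have to read back exactly the image's size before hashing; roughly like this, with the filename and drive as placeholders:
# ISO9660 images are a multiple of 2048 bytes, so count in 2048-byte sectors
blocks=$(( $(stat -c %s debian.iso) / 2048 ))
dd if=/dev/sr0 bs=2048 count=$blocks | sha256sum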
How do you do that with the GUI? Piping into a cryptographic hash?
I can understand that people prefer GUI over command line for many things but I think that people imaging entire disks are at least power users and can learn the difference between "input" and "output". Heck, maybe it's even doing them a service to teach them to use the CLI. Maybe it's the opportunity to teach them about piping, about cryptographic hashes, etc.
And once again: you can totally screw up with a GUI too.
I don't mean it in a bad way at all but... Linux on the desktop (which I've been using for 20 years) ain't exactly enjoying a huge market share compared to Windows or OS X, and I think that's fine. If a Linux user cannot or isn't willing to learn dd, maybe that user is better served by Windows or OS X. And really: I don't mean it in a bad way. I don't think Linux has to "win" the desktop war. I don't think it's wrong not to use Linux.
But I do think it's wrong to believe users willing to learn Linux cannot learn the difference between input and output.
Also, a strong case can be made that someone for whom it's a "disaster" to overwrite their main drive is one hard disk failure away from disaster anyway.
I'm for educating users, not spoon-feeding them with tools that are going to keep them in the dark and reinforce their bad practices (like not doing proper backups and hence being "one hard drive failure" away from disaster).
DVDs are for the most part pretty easy to duplicate identically. CDs have a lot of quirks, and it's very easy to end up with a useless "backup". Especially if the file format is ISO instead of bin/cue or similar. If one is trying to duplicate CDs/CD-ROMs, it's really vital to cross all your Ts and dot all your Is.
I'm simplifying here somewhat for space, but the CD-ROM specifications allow for at least two ways of storing the data on the disc.
A 100% vanilla data CD-ROM with no copy protection uses 2048 bytes per sector for the data that's visible to you as a user, and the remaining 304 bytes in that sector are used for data that helps recover the user-level data if any of it is unreadable (kind of like a RAID5 setup).
Mode 2/XA discs (or mode 2/XA sections on mixed-mode discs) use those 304 bytes per sector to store user-level data instead. i.e. they are trading more space for less reliability. PlayStation games are the most common example. If you've ever tried to copy XA audio or STR video files off of a PlayStation disc in Windows and wondered why you got an error, or why the files copied but were corrupted, that's why. Your PC was only copying 2048 out of every 2352 bytes in each sector.
The ISO file format for discs can ONLY support 2048 bytes per sector. This is why groups like Redump use bin/cue for anything that comes off of a CD. If you convert a bin/cue to ISO, you are throwing away a little over 10% of the data in every sector.
If you want to learn more about this, read the specs for different types of CDs and CD-ROMs. It's a big mess, and I think everyone is happier that the industry standardized down to fewer options in the DVD era.
 This is in addition to the physical-level redundant data encoded in the pits on the disc itself, but AFAIK almost nothing can read discs at that level.
This is my mental model for what's going on:
There's a bunch of bytes stored on a CD-ROM in a defined order. Zeroes and ones. Copy them in order. You should now have a file on magnetic disk or flash that is precisely those bytes in precisely that order. Anything that can make sense of one should be able to make sense of the other.
What am I missing here?
You may not care about copying the inode structure when you are copying your files from point a to point b, but you should when you are cloning the disk.
Many times the software on the CD is looking at the physical layout of the disk, not just the logical data, to function correctly.
That works fine most of the time. But imagine if the computing industry had come up with a "RAID5 Mode 2" that was actually RAID0, where the parity disk was just used to store more user data instead of parity data, but most of the copying tools didn't know the difference between "RAID5" and "RAID5 mode 2", and so they just copied 2/3 of the data, on the assumption that the parity data would be recreated on the receiving end. 1/3 of the user data just went down the drain. That's basically what's happening when you try to store anything other than a 100% vanilla data CD-ROM as an ISO file instead of bin/cue.
Windows had a lot of great disc software in the 90s/00s which would handle just about any disc format one would encounter, e.g. Alcohol 120%, CloneCD, Disc Juggler. Some of these were capable of backing up the various protections of the time such as SecureROM and Starforce.
dd is a powerful tool, but I often find it the "wrong" tool; e.g. do I really want to clone a 2GB partition layout onto a 128GB USB stick for a live CD and then have to edit the partition table manually?
Reasons to use dd include being able to set the block size, seek and skip to resume copies or extract parts of the data, and a whole bunch of special options for the very specific use case of large binary files or block devices.
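For instance (all names and numbers are placeholders): pulling a slice out of a device, or resuming an interrupted image at a given offset:
# 512 MiB starting 1 GiB into the device
dd if=/dev/sdX of=slice.img bs=1M skip=1024 count=512
# resume an interrupted copy at the 2 GiB mark without truncating what's there
dd if=/dev/sdX of=disk.img bs=1M skip=2048 seek=2048 conv=notrunc status=progress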
Reasons to use cp include it being file based, so metadata and recursive copying is natural, efficient, and easy to use. This probably includes most copying on a daily basis. See also rsync.
I agree that the use cases for all tools are different, so comparing them fairly is impossible. dd is still my go-to for making disk images, but it's unnecessary and has high overhead without a lot of tweaking for simple tasks like file operations. cat is often a good tool for things like piped commands where you need a flag cat supports (so you can't use shell redirection), and cp/rsync are obviously superior for copying files rather than just data. Sure, you could hack my commands in such a way that one tool can do another's job, but why would I?
I'd previously used cdparanoia, but found it not paranoid enough. (For some reason it always outputs AIFF regardless of options as well.)
I'm looking at defeating the drive cache by actually reading the disc in a second and 3rd drive and doing a NASA style "all 3 must agree".
It's annoying, but I've got 800,000 CDs to rip right now, and I have to make sure each one is perfect. (Without proprietary solutions.)
I also need to be able to positively reduplicate heavily scratched discs. It's a tough problem.
There's a community of people ripping audio CDs to FLAC files and making sure all their rips are bit-perfect.
I've ripped my own collection and used, as far as I remember, "whipper" on Fedora Linux (for whatever reason I couldn't make that software work on Debian back then). After ripping a track/CD the software would automatically verify the checksum with an online DB of rips.
You say you can't use proprietary solutions: as far as I know there are several free rippers on Linux, and the online DB of checksums isn't proprietary either (?).
In case you rip with an error, it's near impossible that someone else who ripped the CD would have hit exactly the same error and ended up with the same checksum.
Now: for all the audio CDs I tried, there was at least one other person who had already ripped them, but that's not always the case. For example I've read about some collections of classical music coming in packs of 300 CDs (!) where nobody had bothered, at home, to rip them / to share the checksums.
But it can already help for all the "common" CDs you own.
How does any of that prevent you from using those checksums as an indicator of whether your own ripping process is working as expected?
The scripts I have are very streamlined, and make multi-machine ripping very fast. I will probably just use md5 or sha1 as my checksum.
But also, I'll have all the flac files so I can always do a full comparison to see exactly what the difference is.
I've got enough copies of each disc that I don't need their data, and I want a better check than they do.
Let's say you do this 8 hours per day, 5 days a week. The above series of movements can take around 30 seconds. That's 400k minutes, i.e. about 6,667 hours, which at 40 hours per week is roughly 167 weeks. At 50 weeks per year (only 2 weeks vacation each year) that's still more than 3 years of full-time work!! Even if you do one CD every second, it's still around 220 hours of nothing but handling discs.
People don't understand the scale, and why I might want to make sure I do it right the first time rather than get halfway through and have to start over because I missed something.
I suppose you're going to pay people to do it, so lemme help you a little.
6667 hours at, let's say, $15/hour is roughly $100,000 for this task. I hope you have large pockets ;).
That's 6 per minute (about 10 seconds per disc).
360 per hour.
800,000 should take about 2,222 hours.
I've got a bunch of people that will work for about $10/hour, and that's only $22,220 worth of labor over 277 days for 1 person.
I'd like to be able to take the pieces of a broken disc and verify somehow that it's a legit copy. Then I'd treat the pieces like every other copy of that disc and give the owner access.
Redbook audio CDs do use CIRC error correction, storing 8 bytes of parity data inside each 33-byte F3 frame. It is not enough to correct all errors though, and Yellowbook data CDs store extra correction codes on top of it (276 bytes for each 2352-byte sector).
(And there's also an issue below the F3 frame level with EFM modulation: whether merging codes are generated properly to keep the DSV low enough.)
See: https://byuu.net/compact-discs/structure/ https://john-millikin.com/%F0%9F%A4%94/why-i-ripped-the-same... https://john-millikin.com/%F0%9F%A4%94/error-beneath-the-wav...
This is going to be quite a task, even with a bunch of the big-boy 1000-disc autoloaders. I'm not familiar with any solutions capable of opening CD jewel cases. You might want to contact a few hackerspaces to inquire about building specialized machines just for the task of loading discs onto spindles.
btw why not proprietary? like http://wiki.hydrogenaud.io/index.php?title=Exact_Audio_Copy
You can't (easily/legibly) present a one-liner that combines cat and sudo because sudo doesn't give the shell redirection (sudo cat >> /dev/whatever) permissions. But "sudo dd of=/dev/whatever" works fine.
Edit: Yes, you can make a one liner via sh -c, tee, etc, but depending on cut/paste with quotes is tricky, or the one-liner gets long.
(This obviously ignores that you should almost never use `su`)
Performance. This is an I/O-bound process; the main influence on performance is the buffer size: the tool reads a chunk from the source, writes the chunk to the destination, and repeats. If the chunk is too small, the computer spends its time switching between tasks. If the chunk is too large, the read and write operations can't be parallelized. The optimal chunk size on a PC is typically around a few megabytes, but this is obviously very dependent on the OS, on the hardware, and on what else the computer is doing. I made benchmarks for hard-disk-to-hard-disk copies a while ago, on Linux, which showed that for copies within the same disk, dd with a large buffer size has the advantage, but for cross-disk copies, cat won over any dd buffer size.
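A crude way to reproduce that kind of comparison looks like this (paths are placeholders, and the numbers mean little unless the page cache is dropped between runs):
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
time dd if=/mnt/src/big.img of=/mnt/dst/big.img bs=4M
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
time cat /mnt/src/big.img > /mnt/dst/big.img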
Apparently it's nothing to do with that and originally comes from "data definition" though.
cdrdao read-cd --read-raw
I have no idea if the resulting iso is "more accurate" than using dd - but they've always worked, even from CDs with copy protection and obscure file systems (old mac CDs).
The other important part is that cat will not have issues with binary data, which I am embarrassed to admit I assumed was not the case, for the reasons stated in TFA. pv is the tool I want to use for things like copying images to SD cards, but I can never remember the exact syntax needed to get it to show me the percentage, since I don't use it often. dd has a decent syntax for its command parameters and has some fun options like the ability to skip and seek, so I can quickly create large empty files.
You're in luck: the "syntax needed" is to literally not use any options:
pv your.img > /dev/sdb
Some pointers: dd has oflag; maybe oflag=direct is faster?
You can also use conv=sparse and sometimes save space by creating a sparse file.
oflag=direct does direct I/O => copied data won't go into the buffercache.
On Linux search for 'O_DIRECT' in the open(2) manpage.
oflag (for output) and iflag (for input) are indeed useful. During/after a massive non-'direct' copy, a system running other processes which benefit from data in the buffer cache may crawl if the system, while copying, replaces some of that cache with the copied data and then has to re-read it.
In other terms, this seems adequate when copying data which will not be read by any process soon after the copy. A raw filesystem image is a good candidate.
As usual, YMMV. If most of the data to be copied is already in the buffer cache, or if it will occupy some unused part of core memory, such optimization is useless. However, in most cases (on most adequately-dimensioned, non-idle systems) 'O_DIRECT' induces less systemwide load than cp, cat, pv (...) when copying a large set of data, if most of it will not then be immediately read by anything.
Other tools (cp, cat, pv...) just cannot easily work in 'O_DIRECT' mode. Using some trick to enable it, thanks to a local version of openat() (which calls openat in O_DIRECT mode) and LD_PRELOAD, albeit possible, isn't realistic in most contexts.
$ cd ~/tmp
$ strace -e openat dd if=/etc/hosts of=useless.tmp count=1 >& nodirect
$ strace -e openat dd if=/etc/hosts of=useless.tmp iflag=direct oflag=direct count=1 >& direct
$ diff direct nodirect
< openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_DIRECT) = 3
< openat(AT_FDCWD, "useless.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_DIRECT, 0666) = 3
> openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3
> openat(AT_FDCWD, "useless.tmp", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
Moreover 'dd' has many options without equivalent in most other readily available tools.
Looks like it was originally written for optical disks, but I suppose it would work for other media: