Dd is not a disk writing tool (2015) (vidarholen.net)
359 points by Ianvdl 343 days ago | 198 comments

Yes "D" is not for disk/drive/device/...

It comes from the DD (data definition) statement of OS/360 JCL, which is why dd has such unusual option syntax compared to other Unix utilities.

BTW, if you are using dd to write USB drives etc., it's useful to bypass the Linux VM as much as possible to avoid system stalls, especially with slow devices. You can do that with O_DIRECT. Also dd recently got a progress option, so...

    dd bs=2M if=disk.img of=/dev/sda... status=progress iflag=direct oflag=direct
Note dd is a lower-level tool, which is why there are some gotchas when using it for higher-level operations. I've noted a few at:


You don't have to wait for updates to newer versions to get dd to report its progress. The status line that dd prints when it finishes can also be forced at any point during dd's operation by sending the USR1 or INFO signals to the process. E.g.:

    ps a | grep "\<dd"
    # [...]
    kill -USR1 $YOUR_DD_PID

    pkill -USR1 ^dd
It also doesn't require you to get everything nailed down at the beginning. You've just spent the last 20 seconds waiting and realize you want a status update, but you didn't think to specify the option ahead of time? No problem.

I've thought that dd's behavior could serve as a model for a new standard of interaction. Persistent progress indicators are known to cause performance degradation unless implemented carefully. And reality is, you generally don't need something to constantly report its progress even while you're not looking, anyway.

To figure out the ideal interaction, try modeling it after the conversation you'd have if you were talking to a person instead of your shell:

"Hey, how much longer is it going to take to flash that image?"

The way dd works is close to this scenario.
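For what it's worth, the same ask-when-you-care pattern is easy to sketch outside of dd. The following is a minimal Python sketch, not how dd itself is implemented: a copy loop that stays silent until the process receives SIGUSR1, then prints a one-line status. The class name `ProgressCopier` and its methods are made up for illustration, and it assumes a Unix-like system where SIGUSR1 exists.

```python
import signal
import sys


class ProgressCopier:
    """Hypothetical helper: copies a stream and reports progress only when asked."""

    def __init__(self, block_size=1 << 20):
        self.block_size = block_size
        self.bytes_copied = 0
        self.report_requested = False

    def _on_signal(self, signum, frame):
        # Only set a flag in the handler; the printing happens in the main loop.
        self.report_requested = True

    def report(self, out=sys.stderr):
        print(f"{self.bytes_copied} bytes copied", file=out)

    def copy(self, src, dst):
        # Install the handler for the duration of the copy, then restore the old one.
        previous = signal.signal(signal.SIGUSR1, self._on_signal)
        try:
            while True:
                block = src.read(self.block_size)
                if not block:
                    break
                dst.write(block)
                self.bytes_copied += len(block)
                if self.report_requested:
                    self.report_requested = False
                    self.report()
        finally:
            signal.signal(signal.SIGUSR1, previous)
            self.report()  # final status line, like dd prints when it finishes
        return self.bytes_copied
```

While `copy()` is running, `kill -USR1 <pid>` from another terminal would trigger one status line; nothing is printed otherwise until the copy ends.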

Yes, this is true. Note that BSD supports this better with Ctrl-T to generate SIGINFO, which one can send to any command even if not supported, in which case it's ignored. Using kill on Linux, where an unhandled USR1 kills the process by default, is decidedly more awkward.

It's also worth noting the separate "progress" project which can be used to give the progress of running file based utilities.

We generally have pushed back on adding progress to each of the coreutils for these reasons, but the low overhead of implementation and high overlap with existing options was deemed enough to warrant adding this to dd

Also a gotcha on BSD (FBSD at least): SIGUSR1 kills dd.

Ctrl-T for SIGINFO is pretty useful, it would be good if Linux could pick this up.

Been waiting for years, frankly. I guess if I was motivated enough and had the time, I could do the research and submit patches..

They don't want it. Even if you got the patches accepted to the kernel they'd never accept patches to GNU Coreutils to support it.

The most recent comment I've seen on this is lukewarm at best: http://lkml.iu.edu/hypermail/linux/kernel/1411.0/03374.html

> in which case it's ignored

Ignored by the application, sure. But FreeBSD always prints useful stuff like load, current command, its pid and state:

$ dd if=/dev/random of=/dev/null

load: 0.72 cmd: dd 5820 [running] 0.70r 0.02u 0.68s 6% 2008k

263276+0 records in

263276+0 records out

134797312 bytes transferred in 0.708372 secs (190291809 bytes/sec)

(here, the "load:..." line is from the system, and the other 3 lines are from dd)

Yes! I'm thinking of building something like this for my neural net training (1-2 days on AWS, 16 GPUs/processes on the job). In this case the "state" that I'd like to access is all the parameters of the model and training history, so I'm thinking I'll probably store an mmapped file so I can use other processes to poke at it while it's running. That way I can decouple the write-test-debug loops for the training code and the viz code.

> I'm thinking I'll probably store an mmapped file so I can use other processes to poke at it while it's running.

That seems to run substantial risk of seeing it in an inconsistent state, yeah?

I generally use a semaphore when I'm reading and writing from my shm'd things. The data structure will also likely be append-only for the training process, as I want to see how things are changing over time.

Also I meant shm'd, not mmap'd.

I am new to the shared memory concept. I am familiar with named pipes. Could you please elaborate a bit? I'm curious.

Are you passing a reference to an mmap address, or using the shm system calls? What language are you programming in? Do race conditions endanger the shared memory? If so, how does using semaphores help?

Sorry, if I asked a lot of questions, feel free to answer any/none of them :)

Sure! SHM is really cool, I just found out about it. It's an old Posix functionality, so people should use it more!

I'm using shm system calls in Python. Basically I get a buffer of raw bytes of a fixed size that is referred to by a key. When I have multiple processes running I just have to pass that key between them and they get access to that buffer of bytes to read and write.

On each iteration first I wait until the semaphore is free and then I lock it (P). That prevents anyone else from accessing the shared memory. I have the process read from the shared memory a set of variables - I have little helper functions that serialize and deserialize numpy arrays into raw bytes using fixed shapes and dtypes. Those arrays are then updated using some function combining the output of the process and the current value of the array. Then those arrays are reserialized and written back to the shm buffer as raw bytes again. Finally, the process releases the semaphore using V() so other processes can access it. The purpose of the semaphore is to prevent reading the arrays while another process is writing them - otherwise you might get interleaved old and new data from a given update. In a process-wise sense there is a race condition, as each process can update at different times or in a different order, but for my purposes this is acceptable since neural net training is a stochastic sort of thing and it shouldn't care too much.

[0] http://nikitathespider.com/python/shm/ - original library which works fine for me

[1] http://semanchuk.com/philip/PythonIpc/ - updated version
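The lock / read / update / write back / release cycle described above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the poster's code: it uses `multiprocessing.shared_memory` (Python 3.8+) and a `multiprocessing.Lock` instead of the SysV shm libraries linked above, and packs a fixed-size vector of doubles with `struct` instead of serializing numpy arrays. `N_PARAMS`, `apply_update` and `demo` are hypothetical names. For simplicity the demo attaches a second handle by name in the same process; a real worker process would be handed the segment name and the lock and do the same thing.

```python
import struct
from multiprocessing import Lock, shared_memory

N_PARAMS = 4                 # hypothetical fixed-size parameter vector
FMT = f"{N_PARAMS}d"         # fixed layout: N doubles
SIZE = struct.calcsize(FMT)


def read_params(shm):
    """Deserialize the parameter vector from the start of the segment."""
    return list(struct.unpack_from(FMT, shm.buf, 0))


def write_params(shm, values):
    """Serialize the parameter vector back into the segment."""
    struct.pack_into(FMT, shm.buf, 0, *values)


def apply_update(name, lock, deltas):
    """Attach to the segment by name, then read-modify-write under the lock."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        with lock:  # acquire (P) on entry, release (V) on exit
            current = read_params(shm)
            write_params(shm, [c + d for c, d in zip(current, deltas)])
    finally:
        shm.close()


def demo():
    lock = Lock()
    owner = shared_memory.SharedMemory(create=True, size=SIZE)
    try:
        write_params(owner, [0.0] * N_PARAMS)
        apply_update(owner.name, lock, [1.0, 2.0, 3.0, 4.0])
        apply_update(owner.name, lock, [0.5, 0.5, 0.5, 0.5])
        return read_params(owner)
    finally:
        owner.close()
        owner.unlink()  # the creator is responsible for removing the segment
```

The lock only guarantees that nobody sees a half-written vector; the order in which different workers apply their updates is still unspecified, which is the acceptable race mentioned above.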

>I've thought that dd's behavior could serve as a model for a new standard of interaction. Persistent progress indicators are known to cause performance degradation unless implemented carefully. And reality is, you generally don't need something to constantly report its progress even while you're not looking, anyway.

Progress bars by default are also garbage if you are scripting and want to just log results. ffmpeg is terrible for this.

> Persistent progress indicators are known to cause performance degradation unless implemented carefully.

Are you referring to that npm progress bar thing a few months back? I'm pretty sure the reason for that can be summed up as "javascript, and web developers".

Anyway, he's not proposing progress bars by default, he's proposing a method by which you can query a process to see how far it's come. I think there's even a key combination to do this on FreeBSD.

Or, for example, you could write a small program that sends a USR1 signal every 5 seconds, splitting out the responsibility of managing a progress bar:

% progress cp bigfile /tmp/

And then the 'progress' program would draw you a text progress bar, or even pop up an X window with a progress bar.

That's great! I think due to the way it's implemented it wouldn't be able to do progress reporting for e.g. "dd if=/dev/zero of=bigfile bs=1M count=2048", but that's a less common case than just cp'ing a big "regular" file.

Yes, C-t for SIGINFO, works on all BSDs (including macOS).

I've always used pv to get progress from dd, or other pipes:

  pv image.img | dd of=/dev/rdisk2 bs=1M
This adds another pipe though. I don't know the effect this has on performance.

On OS X (and BSD?) be sure to use /dev/rdisk[0-9]+ instead of /dev/disk[0-9]+.

Details as to exactly why it's faster are welcome. (I just know it bypasses stuff).

EDIT: someone mentioned this below http://superuser.com/questions/631592/why-is-dev-rdisk-about...

In FreeBSD, cached/block disk devices are long gone: https://www.freebsd.org/doc/en/books/arch-handbook/driverbas... so all disk devices in /dev are implicitly O_DIRECT.

Though, read cache can be enabled manually by creating separate device via gcache(8). This is usually not required, because caching is done at the filesystem layer.

It's important to specify block size for uncached devices, of course. dd(1) with bs= option will surely work, and with cp(1) your mileage may vary, depending on whether underlying disk driver supports I/O with partial sector size or not.

Ah wish I knew this last week. Writing Xubuntu to my USB took something like 2900 seconds from a Mac.

Usually just specifying a reasonable blocksize works for me. bs=1m or so.

Without that it does literally take hours.

I suspect the default blocksize is really small (1?) and combined with uncached/unbuffered writes to slower devices, it just kills all performance outright.

Edit: answered! https://news.ycombinator.com/item?id=13350002

Per the sibling comments, you just need to specify a sane block size. dd's default is really low and if you experiment a bit with 2M or around that you'll get near-theoretical throughput.

NB: Remember the units! Without the units you specify it as bytes or something insanely small like that. I've made that mistake more than once!

In other words, about 48 minutes for a ~1.2 GB file?

About 3 Mbit/s, or 400 kB/s. I'd expect something 50-100 times faster.
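The arithmetic behind those figures, as a quick sketch. The 2900 seconds come from the comment above; the ~1.2 GB image size is the guess made in this subthread, not a measured value, and `throughput` is just an illustrative helper.

```python
def throughput(total_bytes, seconds):
    """Return (bytes per second, bits per second) for a transfer."""
    bytes_per_s = total_bytes / seconds
    return bytes_per_s, bytes_per_s * 8


image_bytes = 1.2e9      # assumed image size (~1.2 GB)
elapsed_s = 2900         # reported write time in seconds

bytes_per_s, bits_per_s = throughput(image_bytes, elapsed_s)
print(f"{elapsed_s / 60:.0f} minutes")
print(f"{bytes_per_s / 1e3:.0f} kB/s, {bits_per_s / 1e6:.1f} Mbit/s")
```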

"Yes "D" is not for disk/drive/device/..."

But that's the very beauty of unix!

If you can find a way to use 'dd' for disk/drive/device you can use it in interesting new manners (pipelines, etc.) and have very good confidence that it won't break in weird ways. It will do the small, simple thing it is supposed to do even if you are abusing it horribly.

Like this, for instance:

  pg_dump -U postgres db | ssh user@rsync.net "dd of=db_dump"

Is there a benefit to use dd over cat in this case?

You could use it to rate limit... or arbitrarily set block sizes per use case. I've used it for the former when doing 'over the wire' backups through ssh

Thanks for the tips.

Clueless noob here . . . most guides I've seen use bs=1M for writing e.g. a Linux installer to a USB drive. Does 1 MB vs 2 MB change anything?

The setting controls the block size. When writing to block devices, you can maximize throughput by tuning the block size for the filesystem, architecture, and specific disk drive in use. You can tune it by benchmarks and searching over various multiples of 512K block sizes.

For most modern systems, 1MB is a reasonable place to start. Even as high as 4MB can work well.

The block size can make a major difference in terms of sustained write speed due to reduced overhead in system calls and saturation of the disk interface.

A similar thing happens when writing to sockets where lots of small messages kill throughput, but they can decrease latency for a system that passes a high volume of small control messages.
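To make the "benchmark a few block sizes" advice concrete, here is a rough Python sketch. It is an assumption-laden illustration rather than a proper benchmark: it copies a temporary regular file (not a block device), goes through the page cache, and the function names are made up. The number of write calls is the reliable part of the output; the timings will vary a lot between machines and say little about a slow USB stick.

```python
import os
import tempfile
import time


def copy_with_block_size(src_path, dst_path, block_size):
    """Copy src to dst using unbuffered reads/writes of at most block_size bytes."""
    writes = 0
    total = 0
    start = time.perf_counter()
    # buffering=0 gives raw file objects, so each call maps onto a read/write syscall
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            view = memoryview(block)
            while len(view) > 0:          # handle short writes, just in case
                written = dst.write(view)
                view = view[written:]
                writes += 1
            total += len(block)
        os.fsync(dst.fileno())            # include the time to flush to the device
    return total, writes, time.perf_counter() - start


def benchmark(size=4 << 20, block_sizes=(512, 128 << 10, 1 << 20)):
    """Return {block_size: (bytes_copied, write_calls, seconds)} for a temp file."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.img")
        with open(src, "wb") as f:
            f.write(os.urandom(size))
        for bs in block_sizes:
            dst = os.path.join(tmp, "dst.img")
            results[bs] = copy_with_block_size(src, dst, bs)
    return results
```

Calling `benchmark()` and comparing the write counts for 512 bytes versus 1 MiB shows where the per-call overhead comes from.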

>it's useful to bypass the Linux VM as much as possible to avoid systems stalls

Oh man, I didn't even know that was the cause of these problems.

Nice! Thanks. Built-in status and that Linux bypass trick are beautiful.

Hasn't the ability to check progress been around forever? You could always send it SIGUSR1 and get back a progress report on stderr.

Sure it's been around for a while, GNU version on linux at least. Personally found pipe viewer (pv) quite handy too https://www.ivarch.com/programs/pv.shtml - available in most distros

Yup. Just don't do what I did earlier and `pkill -USR1 -f dd` if your desktop session is currently being provided to you courtesy of `sddm` …

As a programmer, it usually pays off to be on the lazy side, but every once in a while it comes back and bites me in the arse ;)

Oh? I had read that its name was originally "copy and convert" but `cc` was already taken by the compiler

Yeah, this article is wrong. Have you ever noticed the syntax for DD is unusual? It is set up more like a JCL syntax.

I thought it stands for copy and convert but cc was taken by the C compiler.

Ask a graybeard.

Thank you for the historical information.

Most of the time it's much better (as in, faster) to just use cat (or pv, to get a nice progress bar) for writing a file to a block device, because it streams, and lets underlying heuristics worry about block sizes and whatnot.


  cat foobar.img > /dev/sdi
will stream the file rather than what dd does, i.e. read block, write block, read block, write block and so on.

Usually I also lower the vm.dirty_bytes and vm.dirty_background_bytes to 16 resp. 48 MB (in bytes) beforehand, which limits the buffer sizes to those amounts. Else it will seem that the progress bar indicates 300MB/s is written, and when it completes you still have to wait a really long time for things to have been written out. Afterwards I restore back vm.dirty_ratio and vm.dirty_background_ratio to respectively 10 and 5 - the defaults on my system.

I wish that all of those projects, tutorials etc. that explain how to write their image to a block device, like an SD card, would start advising the use of cat, because there is no reason to use dd; it's just something that people stick with because others do it too.

I only use dd for specific blocks, like writing back a backup of the mbr, or as a rudimentary hex editor.

> So:

> cat foobar.img > /dev/sdi

> will stream the file rather than what dd does,

Sorry, that's not quite right. `cat` (and your shell, presumably bash) does the same fundamental thing as `dd`, ie, read block, write block. There's not really an underlying 'stream' primitive that `cat` (or `bash`, as you're using redirection to write the file) is using compared to `dd`

What `cat` does do, is it does a better job of trying to find an optimal blocksize than a naive `dd` call does. `dd` simply defaults to a 512-byte block size, which is just inherited from history when 512-bytes was the alignment for, well, everything.

There are numerous optimizations upon the fundamental read-block, write-block primitives to make it go faster (`cat` makes use of some of these). The linux kernel actually has a "stream file to socket" syscall to avoid the copy to user-land and back to "stream" a file out to the network, but that's not happening here, and there's still reading and writing of blocks in the kernel happening.

See also: http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f...

That's still not quite right.

Once cat is spawned, the shell gets out of the way, and has nothing to do with it. (Exception: zsh will "helpfully" insert itself with `tee`-like operation in certain situations.)

What `cat` doesn't do is finding an optimal block size for writes. It does a good job of finding optimal block size for reads, but not for writes. Many older block devices perform very poorly when the write block size is not optimal.

The fact that zsh is a parent doesn't necessarily hurt performance. The parent and child can share the stdin fd, and the child could be the only one reading from it.

That's not what I was referring to. Though I should have been more clear: I don't believe that zsh inserts itself in the mentioned situation; just that it does insert itself in some situations, and that makes it an exception to my general "once the process is spawned, the calling shell doesn't matter" statement.

As an example of what I was referring to, in zsh, this

    a >&2 | b
is equivalent to Bourne shell:

    a | tee /dev/stderr | b
except that `tee` is implemented as part of zsh, rather than a separate program (for no difference in performance).

That is, zsh inserted itself into the middle of the pipeline. There are two pipes instead of one; zsh/tee reads from the a pipe, and then writes that data to the b pipe (and stderr). This does hurt performance.

This doesn't seem to be the case for me

    -> % python test.py 2>&1 | python test.py
    python  24150 someone    0u   CHR  136,0      0t0        3 /dev/pts/0
    python  24150 someone    1w  FIFO    0,8      0t0 72772851 pipe
    python  24150 someone    2w  FIFO    0,8      0t0 72772851 pipe
    python  24150 someone    3w   CHR    5,0      0t0     1049 /dev/tty
    python  24151 someone    0r  FIFO    0,8      0t0 72772851 pipe
    python  24151 someone    1u   CHR  136,0      0t0        3 /dev/pts/0
    python  24151 someone    2u   CHR  136,0      0t0        3 /dev/pts/0
    python  24151 someone    3w   CHR    5,0      0t0     1049 /dev/tty
    someone@arch-two [01:29:00] [~]
    -> % echo $ZSH_VERSION
    someone@arch-two [01:29:16] [~]
    -> % cat test.py
    import os
    import sys
    import subprocess

    with open("/dev/tty", "w") as f:
        subprocess.check_call(["lsof", "-p", str(os.getpid())], stdout=f)

Edit: OK, so I misread your comment and you were talking about >&2. Which is weird because it actually has different behaviour in bash vs zsh. Having said that, I can't think why you would want the bash behaviour, although the bash behaviour does seem more consistent.

In the particular example I used, the Bourne shell behavior probably has no practical use (but I can write code of no practical use in many languages).

It's not particularly surprising that the zsh behavior is different than bash; zsh redirections are only Bourne shell-like in simple cases.

I dislike zsh's behavior because it isn't what the user typed. It should be simple to look at a command line and see how many pipes it makes; see the processes that will be involved in the pipeline. Zsh's behavior seems to me to be clever and implicit; I'd much rather have dumb and explicit.

> I wish that all of those projects, tutorials etc. that explain how to write their image to a block device, like an sdcard, would start advise using cat, because there is no reason to use dd, it's just something that people stick with because others do it too.

I'd wondered whether dd or cat were faster, and indeed cat is faster, but not by much. Also, for some embedded devices, you have to write to specific offsets, so dd is more convenient and explicit. Lastly, cat composes poorly with sudo.

  $ sudo dd if=foobar.img of=/dev/sdi # works
  $ sudo cat foobar.img > /dev/sdi    # fails unless root b/c redirection is done by shell
> I only use dd for specific blocks, like writing back a backup of the mbr, or as a rudimentary hex editor.

xxd / xxd -r is much nicer, but I suppose sometimes vim is not available...

> I'd wondered whether dd or cat were faster, and indeed cat is faster, but not by much.

That performance difference often comes from block size; "dd bs=1M" typically runs much faster than the default block size of 512 bytes.

> xxd / xxd -r is much nicer, but I suppose sometimes vim is not available

`od` is ubiquitous--it's POSIX and a requirement of the Single UNIX Specification


    sudo (cat foobar.img > /dev/sdi)

No, you should use

  sudo sh -c "cat foobar.img > /dev/sdi"

  sudo -s "cat foobar.img > /dev/sdi"

I was originally going to put that in my examples, but opted to leave it out, because with sh -c you have to think about escaping special characters. Most of the time it doesn't matter, but when running commands as root you ought to be absolutely sure.

What about using a here-doc (as in http://stackoverflow.com/a/16514624/2954547)?

    sudo sh <<EOF
    cat foobar.img > /dev/sdi
    EOF

No, unless you have a shell where sudo is a built-in, and I don't know of any.

This is actually impossible (at least with UNIX shells), as shell builtins cannot have the SUID bit set - and your shell itself also shouldn't ;)

You could have a shell with a "sudo" builtin that knew how to invoke a separate "sudo" program with the right shell syntax and quoting, such that "sudo somecommand > /path/to/root-writable-file" did the right thing.

Not really - to sudo, your shell would have to be setuid - and constantly fork stuff as you to get user permissions. Alternatively your shell could maintain a separate process for privileged access, but that puts a whole lot of your security on the assumption that your shell has no bugs that might allow escalation.

In short, you could do it, but it'd be ripped out of every server that's been hardened, and for users that don't want to care - they're just running 'sudo su' anyhow.

Speaking only for myself, the thought of my shell having a magical escalation process would scare the bejeezus out of me - and I'm supposed to have root on our boxes!

The shell wouldn't need to be setuid if it just performed a syntax transformation and then called the existing sudo binary.

Exactly. The shell would just treat "sudo" as a builtin prefix similar to "time", but would then run the real "sudo" with an appropriate shell invocation and proper quoting.

I finally got home to try it, and indeed it does fail (Zsh).

    zsh: parse error near `)'
Is it just ignoring the parentheses? Does the redirection "take precedence" over parentheses and brackets?

one could use:

cat foobar.img | sudo tee /dev/sdi > /dev/null

I've used that to write short files (such as settings in /sys or /proc), but for large files, tee has the disadvantage of writing everything twice, and the pipe adds another write and read of every byte.

I just realized this too and tried to close stdout instead. tee complains about it but goes on with its business:

cat foobar.img | sudo tee /dev/sdi >&-

Why's everyone so against dd? :-) If you're going to use tee, might as well not bother with cat at all[0]:

  $ sudo tee /dev/sdi >&- < foobar.img
Or better yet, pv[1]; you even get a progress bar that way!

[0] https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_...

[1] http://www.ivarch.com/programs/pv.shtml

Thanks for the reminder :) I do fall quite often for the useless use of cat, but most of the time I don't really care about it much. I omitted pv in the belief that tee will always be available, but pv is great and absolutely preferred when available.

Sudo is anachronistic anyways.

Laptops don't need to follow multiuser best practices, and frankly a complex root password offers little.

    sudo (cat foobar.img > /dev/sdi)

That is a syntax error in bash.

Learn something new every day. Thanks.

This is all wrong.

I am not even sure what you mean by stream or how it would be any different from "read", "write", "read", "write". Because that is literally what cat does, and so does dd.

Many people feel cat is faster than dd because dd's default block size is 512 bytes as you can see from a simple dd command.

  strace dd if=10M of=junk 

  read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
  write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
  read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
  write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
  read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
  write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512

  strace cat 10M > junk2

  read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
  write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
  read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
  write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
  read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
  write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
So when benchmarking dd vs cat make sure you are comparing apples to apples.

The beauty of dd is you don't have to tinker with things like vm.dirty_bytes and vm.dirty_background_bytes before using a command. Those should not be messed with on most systems and should definitely not be messed with for the sake of running a single command. When you use cat, it makes some decisions for you. Most of the time it makes good choices, but it happens to make less than optimal choices for large files being written to other devices or file systems.

To avoid the pitfalls of using cat to copy image files (and not have to change global system settings) you use dd. With dd you can use O_DIRECT to bypass the VFS caching layer so your vm.dirty_* settings are ignored in general. You can also specify the block size optimal for the device you are writing to and even reading from.

While dd may not stand for disk or drive, and its name may have nothing to do with disks, it is a powerful tool and the appropriate one to use when working with large files, precise movement of data and, yes, copying image files to devices.

Almost, but not quite. You really want buffer in there to smooth out IO time variances.

    <foo.img pv -cNsrc | buffer -p75 -s256k -b128 | pv -cNdst >/dev/whatever
buffer lets the rest of the pipeline proceed while pv is blocked writing to the output file. Yes, conventional filesystem readahead helps to some extent, but IME, not enough, especially when foo.img is a block device itself.

Holy moly. I have spent a lot of years typing Unix commands, but it never occurred to me to put the input redirection first. But that's much more pipeline-ish, so I like it a lot.

"Legend has it that the intended name, “cc”, was taken by the C compiler, so the letters were shifted by one to give “dd”."

Legend is wrong, this clearly derived from the mainframe JCL DD command. This is also why the syntax is so non unix-like.


I've seen people try to use dd to clone a bad disk before tossing it or attempting recovery - while dd is not the right tool for the job, ddrescue [0] is.

ddrescue gives you options for error handling and will skip past bad blocks, it handles read errors much more gracefully.

[0]: https://www.gnu.org/software/ddrescue/

The other cool thing ddrescue does is that it will 'mine' away at the failing blocks by reading them both forward and in reverse, clawing out every last bit of data it can. That, coupled with the resumption capability, logging feature and ability to specify the number of retries on bad blocks, makes it a very useful tool, and I've used it to save both my drives and acquaintances' drives many times over the years.

I can't sing the praises of ddrescue enough. NB: ddrescue and dd_rescue are not the same!

ddrescue and gddrescue also have restartable mode for failing drives, very very handy feature.

I've recovered numerous otherwise unmountable Mac/Windows drives using ddrescue, mmls, sleuthkit, and foremost.

Usually I'm able to lsblk to determine the block device, rip the data partition with ddrescue and mount it loopback with mount once I use mmls to confirm the partition type.

For Mac volumes that don't mount, either run fsck.hfsplus or get the file to a Mac to run DiskWarrior (Alsoft has been repairing HFS volumes for a couple of decades).

If nothing else works once you've got the raw bits saved, foremost will usually scrape something worthwhile off your image file.

Agreed, using dd on media with flaws is not straightforward.

See http://www.noah.org/wiki/Dd_-_Destroyer_of_Disks

Breaking your silence! Are you Mr. Spurrier? Ciao.

You can specify options with dd to handle errors as well, such as padding unreadable blocks on the destination.

If you use block multiples on media with defects to speed things the good data fills the buffer first, then the remainder is zero padded for the blocks in error. This shifts good data relative to its original position and most likely further corrupts your attempts at recovery - BTDT.

This page explains the challenge - https://wiki.archlinux.org/index.php/disk_cloning
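To illustrate the position problem, here is a small Python sketch of a copy that keeps every byte at its original offset: it tries large reads first and, when one fails, retries sector by sector and zero-fills only the sectors that really are unreadable. This is a toy under stated assumptions, not what dd or ddrescue actually do internally: `FlakySource` is a hypothetical stand-in for a failing disk, the 512-byte sector size is assumed, and there is no retrying, reverse reading or log file.

```python
SECTOR = 512  # assumed sector size for the illustration


class FlakySource:
    """Hypothetical failing disk: any read touching a bad sector raises OSError."""

    def __init__(self, data, bad_sectors):
        self.data = data
        self.bad = set(bad_sectors)

    def read_at(self, offset, length):
        first = offset // SECTOR
        last = (offset + length - 1) // SECTOR
        if any(s in self.bad for s in range(first, last + 1)):
            raise OSError("simulated read error")
        return self.data[offset:offset + length]


def rescue_copy(src, size, dst, block_size=8 * SECTOR):
    """Copy size bytes from src to dst; return the list of unreadable sectors."""
    bad = []
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        try:
            chunk = src.read_at(offset, length)
        except OSError:
            # Fall back to single sectors so good data keeps its position.
            parts = []
            for sector_off in range(offset, offset + length, SECTOR):
                n = min(SECTOR, offset + length - sector_off)
                try:
                    parts.append(src.read_at(sector_off, n))
                except OSError:
                    parts.append(b"\0" * n)   # zero-fill exactly the bad sector
                    bad.append(sector_off // SECTOR)
            chunk = b"".join(parts)
        dst.seek(offset)
        dst.write(chunk)
    return bad
```

With an in-memory `io.BytesIO()` as `dst`, everything except the listed bad sectors comes out byte-identical and at the same offsets as the source.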

There are likely a number of reasons for using dd in tutorials designed for beginners:

- it is available by default on all Unix systems

- it distinguishes between input and output (i.e. if= and of=)

- it reports results

- it avoids using a common command for a dangerous operation that the user may not understand

dd also has another benefit: the ability to select a range of blocks to copy from and to. This isn't the most common scenario, but it certainly pops up on some devices.

Sometimes, it seems like people try to outsmart common knowledge. It's fine to present new ideas, but one doesn't have to call what most people know works, and use, inappropriate for the tasks they use it for.

It reports results horribly. Nothing like copying 1+TB of data and having your SIGINFO signals print nothing to the screen.

> This belief can make simple tasks complicated. How do you combine dd with gzip? How do you use pv if the source is raw device? How do you dd over ssh?

These operations are not that complicated. Behold the magic of UNIX pipes:

dd if=/dev/sdb | pv | gzip -c | ssh name@host "gzip -dc | dd of=/dev/sdc"

It's even easier once you realize that dd is not required to read/write disks:

pv /dev/sdb | gzip -c | ssh name@host "gzip -dc > /dev/sdc"

Not only is this shorter with less cargo cult steps, it also gives a much better `pv` output now that it can determine the size and show a percentage.

Not that the parent poster's example is better than yours, since it also doesn't do it, but I believe the biggest strength of dd over cat or redirection is that you can specify the block size. With the right block size you will get far better performance and reduce wear on the device when writing block by block than when writing data byte by byte.

The key though is to be aware of the block size, and to use a multiple of it.

It's not quite as simple as that when you're dealing with pipelines. For example:

    cat /dev/zero | strace dd bs=1M of=/dev/null
will show that you're not actually writing 1M blocks. You're just reading whatever `cat` outputs (128KiB blocks on my system), so all you've done is an unnecessary pipe and copy. You would have been better off just redirecting.


    dd if=/dev/zero bs=1M | strace gzip -1 > /dev/null
will show you that `gzip` will not start processing larger blocks. You're just adding an additional copy. Linux already has readahead, so I don't know if you'll ever see a benefit to this.

On my system, adding dd with higher block size on both sides of a gzip -1 pipeline just ends up being slower.

That's because bs means "up to", so if input arrives in smaller blocks, dd will write it out in those. You should use obs, and preferably specify ibs as a multiple of obs.
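The "up to" behaviour is a property of read(2) on pipes rather than of dd, and it is easy to reproduce. A minimal Python sketch (the helper names are made up): a single read of 1 MiB from a pipe returns only what the writer has put there so far, while a loop that keeps reading until the block is full collects the rest, which is roughly what GNU dd's `iflag=fullblock` asks for.

```python
import os


def read_once(fd, block_size):
    """One read(2): on a pipe this may return fewer bytes than requested."""
    return os.read(fd, block_size)


def read_full_block(fd, block_size):
    """Keep reading until block_size bytes are collected or EOF is reached."""
    chunks = []
    remaining = block_size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:          # EOF: the writer closed its end
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    r, w = os.pipe()
    try:
        os.write(w, b"a" * 100)            # writer produces a small chunk
        short = read_once(r, 1 << 20)      # ask for 1 MiB, get what is there
        os.write(w, b"b" * 100)
        os.write(w, b"c" * 100)
        os.close(w)                        # signal EOF to the reader
        w = None
        full = read_full_block(r, 1 << 20)
        return len(short), len(full)
    finally:
        os.close(r)
        if w is not None:
            os.close(w)
```

So a large bs on the reading side of a pipe does not by itself produce large writes; the writes are only as big as the reads that fed them.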

I omitted block size to simplify it.

That said, what you say is correct on writes. On reads, specifying block size is mostly to reduce syscall overhead because readahead will prefetch data to accelerate things.

You don't need any of those gzip options.

pv /dev/sdb | gzip | ssh name@host "gunzip > /dev/sdc"

You don't need gzip at all; ssh offers compression too:

pv /dev/sdb | ssh -C name@host "cat > /dev/sdc"

Then I would not have been able to demonstrate how to use the gzip command with dd.

+1 for using pv, available in the AUR

Another useful tip is `curl https://...iso | sudo dd of=/dev/sdb` if you don't have enough disk space to hold the ISO. And sometimes internet speeds are faster than disk speeds anyway.

+ "On UNIX, the adage goes, everything is a file."

- not all the things on unix are abstracted as files (or 'byte streams' to be more accurate). however, i/o resources and some ipc facilities are defined so. an operating system provides many other abstractions in addition to these such as processes, threads, non-stream devices, concurrency and synchronization primitives, etc.; thus it's absolutely wrong to say that everything is a file on unix.

Most notably: network sockets do not have file-like semantics, mainly because they were introduced as a concept and implemented long after the system was designed. Plan 9 is an effort to revise all system objects to be accessible with the same file-like open/close system calls.

"Everything is a file on UNIX" is as true as "Vulcans cannot lie."

Great analogy.

I assume Vulcans can lie?

Technically they cannot lie. But they can exaggerate. Or omit facts.

Only one ipc primitive (fifo) can be used with man 2 read or write. Sysv ipc cannot, sockets (any domain) cannot (unless there is a kernel interface via /proc or another pseudo fs). What is meant by this saying (and the way it has been used in my experience) is that everything in unix _looks_ like a file. That is usually meant in reference to the C API which has some r[ec]v|read|write|snd type namespace.

You can read(2) and write(2) to a socket, although it is more normal to recv and send on it. On Linux, you can also (and pretty much MUST) read(2) and write(2) to an eventfd; you can read(2) from a timerfd. That said, the semantics of the latter are considerably different from that of fifos or sockets.

You are right. I've long forgotten about read() and write() compat with sockets for very good reason. I don't lump an eventfd into the same IPC category as those mentioned in parent for the semantic(s) reason you mention and also for its intended usage.

Could you provide an example please?

Only for those like me, who are wondering...^^

The unique advantage of dd over cat, cp and friends: you can specify the block size.

Just try (on OS X) dd if=disk.img of=/dev/disk1... first speedup is gained by using rdisk1, but the real improvement comes with the bs=1m. 2 vs 16 vs 300 MB/s on my machine, when cloning via USB-SATA adapter.

Funny you should mention OS X. Something I learned only yesterday is that if you're using dd to write an .img file to an SD card (may apply to other disk types as well I imagine), using /dev/rdisk devices instead of /dev/disk devices can be much faster; in my case using the /dev/rdisk device wrote to the SD card at nearly 20 MB/sec, vs the 2 MB/sec speed I got when using /dev/disk.

See the second answer here as to why: http://superuser.com/questions/631592/why-is-dev-rdisk-about...

I always wondered: what determines the optimal block size, and how can I know?

In theory the native block size (512 bytes for most drives these days) should be the fastest, but the problem is that if you're doing such small sized I/O, you introduce a shitload of overhead for all the individual r/w calls - I guess that a huge blocksize benefits from DMA and look-ahead reads.

For some reason I cannot reply to your message directly, only to a parent.

Flashbench [1] should be able to tell you what the erase-block and page size is.

[1] https://github.com/bradfa/flashbench

> native block size (512 bytes for most drives these days)

Not anymore. Advanced Format (4096-byte sector) hard drives have taken the market like a storm, and SSDs benefit even more from using larger I/O sizes (because erase sectors are way larger).

Ah, I thought it was the other way round. Thanks for the information.

Do you know a way to get an SSD's native erase sector size?

I would guess that using the erase sector size as a block size won't help, because the SSD controller rearranges blocks all the time anyway.

There is no simple answer to this. Stack Overflow has some hints but no definite answer.

Isn't it: the larger the better until you face diminishing returns as with small block sizes the system call overhead becomes noticeable?

I've always used bs=4M for writing .iso or .img files to USB flash drives, as it gives me the best times. This is on Linux, OpenBSD, and macOS (using /dev/rdiskn for the latter two).

Which theory suggests that this should be the fastest?

The optimal block size is probably the amount of data which can be transferred with one DMA operation.

For NVMe disks on Linux, you can find out this size with the nvme-cli [0] tool. Use "nvme id-ctrl" to find the Maximum Data Transfer Size (MDTS) in disk (LBA) blocks and "nvme id-ns" to find the LBA Data Size (LBADS). The value is then 2^MDTS * 2^LBADS bytes.

For example, the Intel SSD 450 can transfer 32 blocks of 4096 bytes per NVMe command, so you'd want a block size of 128 KiB.
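
The arithmetic for that example, as a small sketch. The two exponents are just the numbers from the comment rewritten as powers of two (32 = 2^5, 4096 = 2^12), not something read from a real drive, and I'm taking the comment's reading of the MDTS units at face value; the variable names are mine.

```shell
# Example values from the comment above: 32 blocks = 2^5, 4096 bytes = 2^12.
mdts=5     # assumed MDTS exponent
lbads=12   # assumed LBADS exponent

max_bytes=$(( (1 << mdts) * (1 << lbads) ))
max_kib=$(( max_bytes / 1024 ))
echo "max transfer: $max_bytes bytes ($max_kib KiB)"   # 131072 bytes (128 KiB)
```

On a real system you would plug in whatever "nvme id-ctrl" and "nvme id-ns" report for your device.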

[0] https://github.com/linux-nvme/nvme-cli

On FreeBSD, use diskinfo -v.

Also check out the BUGS section of the manpage :) https://www.freebsd.org/cgi/man.cgi?diskinfo

stat(2) the file, and use the value from the st_blksize member of struct stat.

Nononono, that's waaaayy too small. This will be something like 512B or 4K, meaning you'll burn the CPU on syscalls instead of doing meaningful work.

Even the 32K read/write size used by many utilities (shells, XYZsum, rsync and so on) can slow things down with modern/fast IO devices.

Today you'll want to use something like 32K to 2MB. Doesn't really matter around there.

If you're writing synchronously (which you shouldn't), then it becomes a tad more difficult to figure out for optimal performance.

Sync writes have a place, if I'm copying a disk image to a flash drive I typically want to know it's done when dd finishes so I can yank the drive without worrying if the write cache has been flushed.
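
One way to get that "done means done" behaviour, as a sketch. It assumes a dd that understands conv=fsync (GNU dd does; check your man page elsewhere), and it uses a temp file as a stand-in for the flash drive so it is safe to run. The file names are made up.

```shell
workdir=$(mktemp -d)
# Stand-ins: a small fake image and a regular file playing the flash drive.
dd if=/dev/urandom of="$workdir/disk.img" bs=65536 count=4 2>/dev/null

# conv=fsync asks dd to flush the output before it exits, so "dd returned"
# also means the data has been pushed out of the page cache.
dd if="$workdir/disk.img" of="$workdir/target" bs=65536 conv=fsync 2>/dev/null
dd_status=$?

cmp -s "$workdir/disk.img" "$workdir/target" && echo "target matches image"
```

Unlike fully synchronous writes, this still lets the kernel batch the writes and only pays for a single flush at the end.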

At least a couple years ago, it was said one should not rip out USB sticks without ejecting them before, because their controller might do invisible maintenance work and that might lead to data corruption...

Hmm. Microsoft doesn't agree: https://i.imgur.com/JIZveQz.png

"Disables write caching […] you can disconnect the device safely without [unmounting it]"

Broadly speaking, flash GC/management journaling is the problem of the controller and is usually not the most broken part of it.

$ sync; sync; sync


More importantly: dd is not a benchmarking tool. I can't count how many times people have complained about dd being slow on a distributed filesystem. Well, yeah. When you're writing only one block at a time with no concurrency/parallelism at all, over a network interface that has much higher latency than your disk interface, of course it's slow. When you're using tiny block sizes, as about 80% of these people do, the effect is only magnified. Unless your use case for a distributed system is a single user who doesn't even multitask, use iozone or fio instead.

Sure it is. It tells you exactly what performance a single threaded application can expect with serial read()s and write()s (and whatever options you choose to invoke dd with) against whatever file source and constraints/conditions are extant at the time. Perfectly valid.

OK, so it tells you how a very poorly written application - one which probably shouldn't be running on such a system in the first place - will perform. And it's useless for everyone else. Here, have an internet point.

It also tells you how a bunch of common tools will fare against your wonderfully designed solution that is only 'truly' testable by sophisticated methods.

Thanks for whatever it is that you gave me. I don't keep up with nomenclature these days.

Let me try to be a bit clearer. Yes, dd will tell you how one instance of a common tool might perform. That's one piece of information, but very likely the least interesting piece of information for most use cases. It would be far more useful to know how performance changes as you run many instances of those same common tools simultaneously, or what kind of performance a well written application can achieve. Iozone or fio can give you all of the answers dd would have, and many more answers besides.

Using dd in this role and then complaining about the result before running any other kind of test is a waste of everyone's time. No filesystem, even local, is optimized for that kind of performance. You do know the difference between performance and scalability, don't you? People who evaluate server-oriented systems in 2017 based on a methodology more appropriate for a 1997 desktop are doomed to fail. In everything.


Also, seriously people, use the bs flag on dd.

I came from Solaris land originally and it always surprises me how seldom people use filesystem level utilities to copy such stuff, i.e. ufsdump | ufsrestore or dump | restore

Works a treat and using the fs level tool you know everything will be properly copied, much safer.

Obviously I'm not talking about raw devices, but people copying disks which often don't have that many filesystems on them.

Can we have a single-purpose tool for getting the text between two delimiter strings?

I know it's possible with regex, but given how frequently that parsing logic is needed, and the difficulty of getting sed right, I think a "tb" tool would be very helpful.

At least for delimiter characters, that's what cut is.

Unless those characters are Unicode (unpatched GNU cut).

Does cut(1) work for you?

cut only supports single ASCII characters as delimiters.

Reading a value from "<td>2017-01-09 <b>08:30</b></td>" is harder than it should be.

cut needs single-character delimiters, so only splitting on "<" or ">" won't work.

sed trips up on the "/", ":", and "-" without proper escaping.

This is before even mentioning Unicode. High-level scripting languages handle all of these just fine. I'd much rather have a standard tool and library for this purpose though.

There's loads of things you can easily do with tools and I'm sure someone will come up with something, but sometimes it's just easier to use Python:

    cat dates | python3 -c 'import sys; [print(line.split("<td>")) for line in sys.stdin]'

    php -r 'foreach (explode("\n", file_get_contents("dates")) as $line) { print_r(explode("<td>", $line)); }'
Or with Perl or Ruby or whatever floats your boat.

While writing this I did think of something though, depending on what you're actually trying to do it might work:

    cat dates | tr "<>" "  " | awk '{print $yourColumnOfInterest}'

As I say, "High-level scripting languages handle all of these just fine".

Back to the article. dd is not a disk writing tool. Everything on UNIX is a disk writing tool. dd has one job, and does it right.

Again, I propose a single-purpose text delimiter cutting program. It doesn't need to be a whole scripting language, e.g. awk, perl, python, etc. Just get the text between two strings, and have sensible syntax.

Maybe use awk?

  $ echo '<td>2017-01-09 <b>08:30</b></td>' | awk -F'[<>]' '{print $5}'
This will give you '08:30'. Unicode seems to be also supported:

  $ echo 'λf. (λx. f (x x)) (λx. f (x x))' | awk -F'[λ.]' '{print $4}'
This prints 'x'. For clarity some other matches:

  $1: ''
  $2: 'f'
  $3: ' ('
  $4: 'x'

Fine, until the next row doesn't have <b></b> tags. So the $5 no longer matches where it should.

I'm just suggesting easier syntax, and a custom tool. Like dd is for disk copying, instead of cp/gzip/etc. A simple tool that will just work.

This is the syntax I'm going for: textBetween("<td>", "</td>") | replace ("<b>", "") | replace ("</b>", "")

Maybe with some magic strings,

e.g. textBetween(":", "end")
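
For what it's worth, something close to that can be sketched in plain shell. `text_between` is a hypothetical helper name, not an existing tool, and this version only handles the first START and the next END on each line; the tag stripping is left to sed.

```shell
# text_between START END: hypothetical helper. For each stdin line that
# contains START followed by END, print the text between them.
text_between() {
    start=$1 end=$2
    while IFS= read -r line; do
        case $line in
            *"$start"*"$end"*)
                rest=${line#*"$start"}    # drop everything up to the first START
                found=${rest%%"$end"*}    # drop everything from the next END on
                printf '%s\n' "$found"
                ;;
        esac
    done
}

cell=$(printf '%s\n' '<td>2017-01-09 <b>08:30</b></td>' | text_between '<td>' '</td>')
plain=$(printf '%s\n' "$cell" | sed -e 's|<b>||g' -e 's|</b>||g')
echo "$plain"
```

Since the delimiters are matched as literal strings, none of "/", ":" or "-" need escaping, and rows without the <b></b> tags pass through the sed step unchanged.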

Yeah you are right, but if the input is more sophisticated I would probably use a scripting language and its full-fledged XML parser.

Your replace part is already covered by sed (although I agree with you that the escaping can be awkward).

Hello! I'm the author of this article.

Sorry for the confusion: dd is still a very useful tool for copying disks.

The point is that you should not feel like you have to shoehorn dd into any command dealing with disks, because only dd is somehow "raw" or "low level" enough to access them.

For example, if you have a command like this:

    pv file1 | gzip | ssh host "gzip -d > file2"
and you want to make it work with disks, just replace file1 and file2 with /dev/yourdisks and it's fine.

Yes, but you can't set block size to n bytes in cat.

Also, it's useful to have in the back of your mind that dd can very easily mean Disk Destroyer, especially because of its sui generis syntax.

On the other hand, cat's block size can increase over time to remain reasonable. When I first became aware of it, it was 4 KiB. Now strace shows my GNU cat using 128 KiB.

If you're building disk images with blank space in them (say, for an 8GB EMMC, but your root is only 2GB) you may want to use bmap-tools [1] [2]

This way only actual data is written to the device, blocks of zeros can be skipped.

1: https://lwn.net/Articles/563355/ 2: https://github.com/01org/bmap-tools


    $ man dd
           dd - convert and copy a file

           dd [OPERAND]...
           dd OPTION

           Copy a file, converting and formatting according to the operands.
More at http://man7.org/linux/man-pages/man1/dd.1.html

See the part after "Each CONV symbol may be:"

It may not be a disk writing tool, but it works wonders for swapping from HDD to SSD without having Windows freak out and decide you are stealing it.


My favorite thing to do with 'dd' is to break up multi-gigabyte log files into, say, 500MB chunks, so I can easily view and search them in XEmacs (this is 'csh' syntax as I use 'tcsh'):

  foreach i (0 1 2 3 [...])
    dd <big.log >big.log.$i bs=500m count=1 skip=$i
  end
(XEmacs is very fast at reading large files but has a 1GB limit.)

How about using

    split -db 500M big.log big.log.
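
A scaled-down sketch of the two approaches side by side, in case anyone wants to convince themselves they agree. It assumes a split that supports -d (GNU coreutils does); the 2500-byte file and 1000-byte chunk size are stand-ins for the real log and 500M.

```shell
workdir=$(mktemp -d)

# Stand-in for the big log: 2500 bytes, to be cut into 1000-byte pieces.
head -c 2500 /dev/zero | tr '\0' 'x' > "$workdir/big.log"

# Same invocation as above, just with a tiny chunk size.
split -d -b 1000 "$workdir/big.log" "$workdir/big.log."

# The dd loop's first iteration (skip=0) for comparison.
dd bs=1000 count=1 skip=0 < "$workdir/big.log" > "$workdir/dd.chunk0" 2>/dev/null

parts=$(ls "$workdir"/big.log.[0-9]* | wc -l | tr -d ' ')
cat "$workdir"/big.log.[0-9]* | cmp -s - "$workdir/big.log" \
    && echo "$parts pieces reassemble to the original"
```

The nice part of split is that you don't have to work out the number of chunks in advance.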

Oh wow, I never knew about that. Thanks!

Reminds me of "find", which is another tool that is mostly just used for its most boring application.

What are some of its more interesting applications?

If you need to do anything on a list of files/dirs, but they are not in the same place, or not easily filterable (e.g., all sub directories "/Fillion" but not the ones that start with "Nathan") you can use find to prefilter everything, then hand it over via xargs (if you want all of them to be handled by one instance of the action program) or with "-exec" (if you want each file to be handled by a separate instance of the action program).

Also check out "find tutorial" articles like this one: http://www.grymoire.com/Unix/Find.html (e.g., did you know you can filter by size?)
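
A throwaway sketch of that prefiltering, under my own reading of the example (directories directly under Fillion, minus the ones starting with "Nathan"). The directory names are invented, and -mindepth/-maxdepth/-print0 are assumed to be available (GNU and BSD find have them).

```shell
root=$(mktemp -d)
mkdir -p "$root/Fillion/Nathan 2009" "$root/Fillion/Castle" \
         "$root/Fillion/Firefly" "$root/Other/Castle"

# Prefilter, then hand the whole list to ONE ls via xargs.
# -print0 / xargs -0 keep names with spaces intact.
batch=$(find "$root/Fillion" -mindepth 1 -maxdepth 1 -type d ! -name 'Nathan*' -print0 \
        | xargs -0 ls -d | wc -l | tr -d ' ')

# Same filter, but -exec ... \; starts a separate basename per match.
single=$(find "$root/Fillion" -mindepth 1 -maxdepth 1 -type d ! -name 'Nathan*' \
         -exec basename {} \; | sort | tr '\n' ' ')

echo "$batch directories: $single"
```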

How about "awk"? It's an entire programming language, but it's only ever used to get single fields.

AWK is also awesome. AWK is in my category of languages you should learn even if you never use it (but you will for sure), because it enables another kind of thinking.

> It’s unique in its ability to issue seeks and reads of specific lengths, which enables a whole world of shell scripts that have no business being shell scripts. Want to simulate a lseek+execve? Use dd!

How would one simulate a call to execve with dd? Seems like a totally different problem domain.

There's a common idiom of skipping a file header and handing off processing to some other program, like this:

    cat foo | ( dd bs=$HEADERSIZE count=1 of=/dev/null; process-foo-contents )

Isn't this the same as:

    tail -c +$((HEADERSIZE + 1)) <foo

The dd idiom can be used to split a file into parts with known block size, something like

    (dd bs=$SIZE1 count=1 of=file1; dd bs=$SIZE2 ...
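
Spelled out as a runnable sketch, with a made-up 4-byte header and 13-byte body. This reads from a regular file, where each dd gets its full block in one read; on a pipe a short read is possible, and I believe GNU dd's iflag=fullblock is the usual answer there.

```shell
workdir=$(mktemp -d)
printf 'HDR1payload-bytes' > "$workdir/foo"   # 4-byte header, 13-byte body
HEADERSIZE=4
BODYSIZE=13

# Both dd processes share one file descriptor, so the second starts
# reading exactly where the first stopped.
(
    dd bs="$HEADERSIZE" count=1 of="$workdir/header" 2>/dev/null
    dd bs="$BODYSIZE"   count=1 of="$workdir/body"   2>/dev/null
) < "$workdir/foo"

header=$(cat "$workdir/header")
body=$(cat "$workdir/body")
echo "header=$header body=$body"
```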

Cool, although this doesn't really help with my most common-case, skipping the first line.

    (head -1 > /dev/null; cat -) < file

This is undefined behavior: while `head -1` will only output a single line, it may read more.

It happens to work on GNU head when stdin is a seekable file, because GNU head specifically rewinds the stream before exiting:

    $ (strace  -e read,write,lseek head -1 > /dev/null; cat -) < file
    read(0, "hello\nworld\n", 8192)         = 12
    lseek(0, -6, SEEK_CUR)                  = 6    # <-- here
    write(1, "hello\n", 6)                  = 6
    +++ exited with 0 +++
If not for that explicit `lseek`, `head -1` would have skipped the entire 8k buffer.

As far as I know, this is exclusive to GNU head. Neither Busybox nor OSX head will do this, and will therefore throw away an entire buffer instead of just the first line. You can try it out:

    (busybox head -1 > /dev/null; cat -) < file

Interesting. Is this true of `tail -n +2` as well? (On mobile, can't test at the moment).

Tail employs a large read buffer as well, but it does not matter because you wouldn't use it in the same manner.

Tail is the right tool for the job here. But if you wish to stick with your idiom, read will reliably consume a single line of input, regardless of how it is implemented:

  (read -r; cat) < file

If anyone is wondering why `read` can reliably read a single line while head can't, it's because it reads byte by byte.

This is just as inefficient as it sounds, but it doesn't matter much in practice since you rarely read a lot with it.
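
A quick way to check this on a pipe, where nothing can seek back after over-reading. Just a sketch; the three-line input and the variable names are made up.

```shell
# Input arrives on a pipe, so the reader cannot rewind after reading too much.
rest=$(printf 'header\nrow1\nrow2\n' | { IFS= read -r first_line; cat; })
printf '%s\n' "$rest"

# tail -n +2 gives the same result when no hand-off to another program is needed.
same=$(printf 'header\nrow1\nrow2\n' | tail -n +2)
```

Because read stops right after the first newline, cat picks up at the start of the second line.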

That always reads to eof, so it can't be used in the same way.

Unintuitively, tail is the utility you want:

  $ tail -n +2 file
From tail(1):

  -n, --lines=K
    output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth

Yeah, I know about that, I just prefer not to use that option because `head -1 > /dev/null` is clearer

head is line-wise. dd is byte-wise.

Yeah, I was just saying that generally, when I need to strip off a header, the header is the first line of the file

dd is for converting EBCDIC to ASCII and vice versa :)

  $ echo how now brown cow > text.ascii
  $ dd conv=ebcdic < text.ascii > text.ebcdic
  0+1 records in
  0+1 records out
  18 bytes copied, 0.000261094 s, 68.9 kB/s
  $ od -xc text.ebcdic 
  0000000    9688    40a6    9695    40a6    9982    a696    4095    9683       
          210 226 246   @ 225 226 246   @ 202 231 226 246 225   @ 203 226
  0000020    25a6
          246   %
  $ dd conv=ascii < text.ebcdic
  how now brown cow
  0+1 records in
  0+1 records out
  18 bytes copied, 0.000140529 s, 128 kB/s

What? Yes dd absolutely is a disk writing tool, although that is not the only or even the tool's intended primary purpose.

It is useful for generating serial IO for a variety of purposes. For example, writing data with specific target block size; allocating contiguous blocks for use by an application (be it zeroing out a thin LUN before partitioning, or a file system); or simply dumping the content of one device to another (or to a file).

Good luck stretching out a thin LUN or creating an empty file that allocates contiguous space with cat.
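
For the "empty file with real blocks behind it" case, a minimal sketch. The path and the 1 MiB size are made up, and whether the blocks end up contiguous is up to the filesystem, not dd.

```shell
workdir=$(mktemp -d)

# 256 blocks of 4096 bytes = 1 MiB of real zeros written out,
# as opposed to a sparse file that only records a length.
dd if=/dev/zero of="$workdir/prealloc.bin" bs=4096 count=256 2>/dev/null

size=$(wc -c < "$workdir/prealloc.bin" | tr -d ' ')
echo "$size bytes"
```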

It also doesn't stand for 'destroy-disk' as was thought by a junior admin I employed once, and eventually had to fire because the level of incompetence was getting to the point of almost being destructive.

Nope, that's not hyperbole. I had to stop the kid from almost installing software that would have connected to a known botnet to help a user connect a personal computer to the VPN. He passed enough checks during the interview we figured "Okay, we can train him in the rest of the things"

Lesson learned.

What software connects to a known botnet to allow connection to a VPN?


A user wanted to bypass some of our network restrictions, and the intrepid jr. admin suggested Hola unblocker to watch Netflix, with me sitting 3 feet away.

This was effectively the final straw and convinced me I had made the wrong hire. He was out two days later.

> How do you combine dd with gzip?

# dd < /dev/ada0 bs=8m | gzip -c -9 > /mnt/file.raw.gz

> How do you use pv if the source is raw device?

# dd < /dev/ada0 bs=8m | pv | dd > /dev/ada1 bs=8m

> How do you dd over ssh?

# dd < /dev/ada0 bs=8m | gzip -9 | pv | ssh user@host 'dd > /dev/da1 bs=8m'

> This belief can make simple tasks complicated.

As master Dennis Ritchie once said - "UNIX is very simple, it just needs a genius to understand its simplicity."

Did you really want an un-gzip'd copy on your target in that last one?
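
To get a plain copy on the far side, the receiving end has to decompress before writing. Here is a local sketch of the shape, with `sh -c` standing in for `ssh user@host` and temp files standing in for the devices (both are my substitutions so it can run anywhere):

```shell
workdir=$(mktemp -d)
dd if=/dev/urandom of="$workdir/src.img" bs=65536 count=2 2>/dev/null

# sh -c stands in for: ssh user@host '...'
dd if="$workdir/src.img" bs=65536 2>/dev/null \
    | gzip -1 \
    | sh -c "gunzip | dd of='$workdir/dst.img' bs=65536 2>/dev/null"

cmp -s "$workdir/src.img" "$workdir/dst.img" && echo "target is a byte-for-byte copy"
```

Without the gunzip on the receiving side, the target would hold the compressed stream instead of the disk contents.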

"Not a disk-writing tool?!?"

Who knew?

Now we have pretty good confirmation that this little utility is performing way more effectively than designed.

Software itself could probably benefit from some of the same approaches that allowed this little computer program to outperform its original design goals, in ways that might not have been anticipated.

IIRC, there was a windowing system called "W", which pre-dated X. However, it was crude and there was good reason for wanting to replace it.

Funny how the new windowing system is also "W"… (https://wayland.freedesktop.org especially the logo)

The best thing about dd is that you can use it with conv=noerror, which will let you recover as much data as possible from an otherwise damaged device.

That loses you a dd blocksize chunk, which is likely much bigger than the underlying damaged sector.

ddrescue or recoverdisk (part of FreeBSD base) will both skip over unreadable blocks, then retry with smaller block sizes along the damaged areas to save as much data as possible.

Indeed. Also, you'd actually want 'conv=noerror,sync'. The 'sync' keeps the input and output block counts in sync (if an input block can't be read, it writes an output block of zeros to keep the block counts in 'sync').
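
You can see the padding half of that without a failing disk. This sketch feeds dd a short block (3 bytes against bs=8, numbers made up) and counts what comes out; it does not simulate read errors, only what 'sync' does to a short input block.

```shell
# 3 bytes of input with an 8-byte block size: conv=sync pads the short
# block with NUL bytes so a full 8-byte block is written.
padded_len=$(printf 'abc' | dd bs=8 conv=sync 2>/dev/null | wc -c | tr -d ' ')

# Without conv=sync the short block is written as-is.
plain_len=$(printf 'abc' | dd bs=8 2>/dev/null | wc -c | tr -d ' ')

echo "with sync: $padded_len bytes, without: $plain_len bytes"
```

That padding is what keeps later blocks at their original offsets when a read fails, at the cost of a whole block of zeros per error.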

The arch wiki page is very useful: https://wiki.archlinux.org/index.php/disk_cloning

So what is the best way to clone a disk (or in my case a Raspberry Pi SD card)?

I tried to back up one of my cards last week using dd > .iso file and then tried to put it on a new card. I tried with /dev/rdisk (faster) but none of the new cards was bootable.

So this is saying just use copy.

(I ended up just creating a second boot disk, and ftping the files over which seems less than ideal...)

The article doesn't say to stop using `dd` to write disks.

It's just saying that if you have other commands that can read/write files, such as `pv /dev/thing > file.img` (to show a progress bar), you don't have to try to shoe-horn dd into it just because /dev/thing happens to be a drive.

In my experience dd is great for binary data. Yes, pretty much everything on Unix operates on files, as does dd. But so many utilities are either line based or don't handle null bytes, and it's a pain to have to determine how a given program handles binary data when I know dd will at least not mess with it.

One of those Unix tools that deserves to be better known is dcfldd (http://dcfldd.sourceforge.net/). It is basically dd with extra powers including on-the-fly hashing, progress reporting, multiple outputs...

>> The reason why people started using it in the first place is that it does exactly what it’s told, no more and no less.

Oh yes indeed. And for this exact reason, "dd" is commonly backronymed to "Data Death" (or, indeed, "Disk Death").

This is not strictly true. For example, good luck trying to scp a device to a remote host.

> scp a device to a remote host

  dd if=/dev/hda bs=512 | (ssh root@remote dd of=/dev/hdX bs=512)
where bs=512 is the block size in bytes. Of course, the hdX drive must be different from the remote host's main drive, otherwise dd won't complete :-)

That is not my point. I meant to say that scp itself will refuse to open the block device file for reading.

"Everything is a stream" is closer to the truth until you learn the dark secret of ioctl and similar fun.

The statement

  cp myfile.iso /dev/sdb
is simply wrong. That will overwrite the device node for your flash drive with the ISO, instead of writing the data to the drive.

bmaptool is my image writing tool, and I think it should be yours too.
