
The Cult of DD - eklitzke
https://eklitzke.org/the-cult-of-dd
======
cat199
"This is a strange program of obscure provenance that somehow, still manages
to survive in the 21st century."

-> links to a Wikipedia page with a direct description of its lineage back to 5th Edition Research Unix

"That weird bs=4M argument in the dd version isn’t actually doing anything
special—all it’s doing is instructing the dd command to use a 4 MB buffer size
while copying. But who cares? Why not just let the command figure out the
right buffer size automatically?"

Um -

a) it is 'doing the special thing' of changing the _block_ size (not buffer
size)

b) Because the command probably doesn't figure out the right size
automatically, much like your 'cat' example above which also doesn't

c) And this can mean massive performance differences between invocations

> Another reason to prefer the cat variant is that it lets you actually string
> together a normal shell pipeline. For instance, if you want progress
> information with cat you can combine it with the pv command

Umm:

    
    
      dd if=file bs=some-optimal-block-size | rest-of-pipeline
    

that was hard.

> If you want to create a file of a certain size, you can do so using other
> standard programs like head. For instance, here are two ways to create a 100
> MB file containing all zeroes:

    
    
      $ uname -sr
      OpenBSD 6.0
      $ head -c 10MB /dev/zero 
      head: unknown option -- c
      usage: head [-count | -n count] [file ...]
    

well.. guess that wasn't so 'standard' after all.. I must be using some
nonstandard version...

    
    
      $ man head |sed -ne 47,51p
      HISTORY
         The head utility first appeared in 1BSD.
    
      AUTHORS
         Bill Joy, August 24, 1977.
      $ sed -ne 4p /usr/src/usr.bin/head/head.c
       * Copyright (c) 1980, 1987 Regents of the University of California.
    

Hmm..

> So if you find yourself doing that a lot, I won’t blame you for reaching for
> dd. But otherwise, try to stick to more standard Unix tools.

Like 'pv'?

edit: added formatting, sector size note, head manpage/head.c stuffs..
apologies.

~~~
gkya
> pv

(You probably know this as you use OpenBSD, but) something I really like about
BSDs is that most of the core commands respond to ^T with progress info of some
kind, dd included.

~~~
ksherlock
cool - by default, ^T generates the non-standard SIGINFO. That seems to be
true of other BSDs as well, including OS X.

~~~
gkya
Also, Linux dd responds to SIGUSR1, writing progress info to stderr. Something
to be wary of is that the same signal kills BSD dd.

------
viraptor
There's one good (?) reason to use dd with devices: it specifies target in the
same command. For devices, writing to them usually requires root privileges,
so it's easy to:

    
    
        sudo dd .... of=/dev/...
    

But there's no trivial cat equivalent:

    
    
        sudo cat ... > target
    

Will open target as your current user anyway. You can play around with tee and
redirection of course. But that's getting more complicated than the original.

~~~
thom_nic
This. The alternative:

    
    
        sudo sh -c 'cat some.img > /dev/sdb'
    

or even more baroque:

    
    
        cat some.img | sudo tee /dev/sdb > /dev/null
    

is a pain by comparison, and the `sudo sh -c` variant has env implications
when spawning a sub-shell.

I have an ARM/linux installer script that writes the u-boot image to a
specific offset _before_ the first partition:

    
    
        dd if=${UBOOT_DIR}/MLO of=$LO_DEVICE count=1 seek=1 bs=128k
        dd if=${UBOOT_DIR}/u-boot.img of=$LO_DEVICE count=2 seek=1 bs=384k
    

This is admittedly somewhat esoteric, but it seems like a stretch to say `dd`
does not have some place, especially when transferring binary data in very
specific ways.
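
A minimal sketch of the same offset-write idea on a scratch file rather than a
real device, so it runs without root. The file name, the 512-byte block size,
and the `BOOT` payload are made up for illustration; `conv=notrunc` is added
because the target here is a regular file that dd would otherwise truncate.

```shell
# Scratch image standing in for the block device (hypothetical, not $LO_DEVICE).
img=$(mktemp)

# 8 blocks of 512 zero bytes.
dd if=/dev/zero of="$img" bs=512 count=8 2>/dev/null

# Write the payload at block 1 (byte offset 512) without truncating the rest.
printf 'BOOT' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null

# Read 4 bytes back from offset 512 to confirm where they landed.
payload=$(dd if="$img" bs=1 skip=512 count=4 2>/dev/null)
size=$(wc -c < "$img" | tr -d ' ')
echo "payload=$payload size=$size"

rm -f "$img"
```

The image keeps its original length because of `conv=notrunc`; only the four
bytes at the chosen offset change.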

~~~
majewsky
Since we're sharing shell tricks: The "sudo tee > /dev/null" may be baroque,
but I find it useful whenever I start editing stuff in /etc in vim, only to
find that I cannot write my changes because I'm not root. In that case,

    
    
      :w !sudo tee %
    

does the trick. (What ":w !" does is send the buffer into the given shell
command as stdin.)

------
colemannugent
One thing I'll often use dd for is recovering data from a failing drive. Can
head ignore read errors? dd can.

As far as I'm concerned, dd is lower-level than most of the other utilities
and provides more control over what's happening.

The author does have a point that the syntax is strange though.

~~~
Freaky
dd is awful and error-prone for this sort of use.

Use noerror, but forget sync? Corrupt output file if there is an error. Use a
bigger bs so it's not slow as treacle? A single faulty sector blows away a
whole bs of data, and your output image may get unwanted padding appended to
the end. Recoverable error? dd's not going to retry.

Use ddrescue or FreeBSD's recoverdisk(1). They're faster, they're safer,
they're more effective, and they're easier to use.

~~~
sillysaurus3
ddrescue is excellent.

~~~
ianai
ddrescue is so good that that particular example feels a little strawman-ish

------
stirner
This article is full of Useless Uses of Cat[1] that could just use redirection
operators. For instance,

    
    
        cat image.iso | pv >/dev/sdb
    

could be rewritten as

    
    
        pv < image.iso > /dev/sdb
    

A related mistake is the Useless Use of Echo, since any command of the form

    
    
        echo "foo" | bar
    

can be written using here strings as

    
    
        bar <<< "foo"
    

or even

    
    
        bar <<WORD
        foo
        WORD
    

[1]
[http://porkmail.org/era/unix/award.html](http://porkmail.org/era/unix/award.html)

~~~
j3097736
>pv < image.iso > /dev/sdb

Huh? pv can cat stuff on its own, and it will be able to make a progress bar
based on the filesize

    
    
      pv image.img > /dev/sdb

~~~
amelius
So I suppose that the command

    
    
        pv < image.iso > /dev/sdb
    

would actually need to buffer the complete file, before commencing to write to
the device (and showing the progress), which would defeat the whole idea of
showing progress.

~~~
heinrich5991
If `pv` doesn't know the input size, it doesn't show it. In your case, it can
determine that its stdin is a file by looking at the `/proc/self/fd/0`
symlink:

    
    
       $ ls -l /proc/self/fd/0 < /tmp/x
       lr-x------ 1 user users 64 <date> /proc/self/fd/0 -> /tmp/x

------
hvs
For those of you that are blissfully unaware of what the JCL DD command looks
like, here's an example (with only the DD section of the JCL shown):

    
    
      //SYSPRINT DD SYSOUT=*                                                          
      //SYSLIN   DD DSN=&&OBJAPBND,                                                   
      //            DISP=(NEW,PASS),SPACE=(TRK,(3,3)),                                
      //            DCB=(RECFM=FB,LRECL=80,BLKSIZE=3200),                             
      //            UNIT=&SAMPUNIT                                                    
      //SYSLIB   DD DSN=SYS1.MACLIB,DISP=SHR                                          
      //SYSIN    DD DSN=&SAMPLIB(IEWAPBND),DISP=SHR

~~~
DrScump
It links a file name (as referenced within a program) to the proper physical
file[0], conceptually like an environment variable in UNIX and Windows.

Ah, I miss elements of the mainframe days.

[0]
[https://www.ibm.com/support/knowledgecenter/zosbasics/com.ib...](https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zjcl/zjclc_jclDDstmt.htm)

------
tambourine_man
_But who cares? Why not just let the command figure out the right buffer size
automatically?_

Because it can be a lot slower. dd is low level, hence powerful and dangerous.

And, if we are going down that rabbit hole, you don't need cat[1]

“The purpose of cat is to concatenate (or "catenate") files. If it's only one
file, concatenating it with nothing at all is a waste of time, and costs you a
process.”

[1][http://porkmail.org/era/unix/award.html#cat](http://porkmail.org/era/unix/award.html#cat)

~~~
schoen
But if you don't use _any_ process, what process is doing the reading?

(This sounds like a zen koan somehow.)

~~~
dredmorbius
I like the koan. But the point should be an _additional_ process.

In a UUOC avoidance case, it's the _current_ process which reads, generally
via stdin. Say, the shell, or dd itself with an 'if=' parameter.

Which I strongly suspect you know.

~~~
schoen
Yep, my thought was that the UUOC critique doesn't apply to most attempts to
substitute cat for dd, because typically those are copying from one (regular
or special) to another, and you can't simply use redirection to accomplish
this in the absence of a reader.

------
gens
The Ignorance Of Err Ignorant People

dd is a tool. dd can do a lot more than cat. dd can count, seek, skip
(seek/drop input), and do basic-ish data conversion. dd _is_ standard, even
more standard than cat (the GNU breed). I even used it to flip a byte in a
binary, a couple of times.

New-ish gnu dd even adds a nice progress display option (standard is sending
it sigusr1, since dd is made to be scripted where only the exit code matters).

> Actually, using dd is almost never necessary, and due to its highly
> nonstandard syntax is usually just an easy way to mess things up.

Personally I never messed it up, nor was confused about it. This sentence also
sets the tone of the whole article, a rather subjective tone that is.

edit: Some dd usage examples:
[http://www.linuxquestions.org/questions/linux-newbie-8/learn...](http://www.linuxquestions.org/questions/linux-newbie-8/learn-the-dd-command-362506/)

~~~
jaclaz
Just in case (same site):

[http://www.linuxquestions.org/linux/answers/Applications_GUI...](http://www.linuxquestions.org/linux/answers/Applications_GUI_Multimedia/How_To_Do_Eveything_With_DD)

------
electrum
Don't cat a file and pipe it into pv. Use "pv file" as a replacement for "cat
file" and it will show you the progress as a percentage. When it's in the
middle of a pipeline, it doesn't know the total size (unless you tell it with
-s), so it can only show the throughput.

~~~
apostacy
Actually, that's not completely true. pv will detect the file size if you use
the shell to read a file into it, like pv < file.

~~~
angry_octet
At first I thought, no, that's not possible. Then I thought, no, they wouldn't
do THAT would they?

But I guess they do...

[http://stackoverflow.com/questions/1734243/in-c-how-do-i-pri...](http://stackoverflow.com/questions/1734243/in-c-how-do-i-print-filename-of-file-that-is-redirected-as-input-in-shell)

I've seen that kind of brokenness from programs trying to find their binary
image on disk. Don't do it, it's bad.

~~~
kam
It doesn't need to hunt for the directory entry, just needs to call `fstat()`
on the stdin file descriptor.
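
A rough way to see this from the shell, assuming Linux with GNU `stat` (the
`-c` format flag and the `/dev/stdin` link are not portable): the command only
receives an already-open descriptor, yet the size is available without being
told the file name. The scratch file is illustrative.

```shell
f=$(mktemp)
printf '0123456789' > "$f"

# stat -L follows /dev/stdin to whatever the shell opened as descriptor 0,
# much as fstat(0, ...) would inside a program.
size=$(stat -L -c %s /dev/stdin < "$f")
echo "size=$size"

rm -f "$f"
```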

~~~
rwmj
Unless the input file is a device, and then you need to call:

    
    
        ioctl(STDIN_FILENO, BLKGETSIZE64, &size)

~~~
ithkuil
But I guess this is orthogonal to where the file descriptor comes from (i.e.
stdin or opening a file whose name is passed in the args)

~~~
rwmj
It's an orthogonal issue, yes, but calling stat or fstat on any block device
whether from stdin or argv will return .st_size == 0, so your progress bar
won't display the correct answers (or could display better answers if it used
the ioctl).

------
gunnihinn
A counterpoint: dd survives not because it's good or makes sense, but
explicitly because it doesn't.

You wanna format a usb key? Google this, copy/paste these dd instructions, it
works, move on with your life.

You wanna format a usb key using something related to cat you once saw and
didn't fully understand? Have fun.

Both approaches have their weak points, but in any OS the answer to "How do I
format a usb key" should not start with "Oh boy, let's have a Socratic dialog
over 10 years on how to do that."

~~~
Aloha
This probably has more truth than most unix admins would like to admit.

"Why do we do it like that? I dunno, that's how I learned how, how do you do
it?"

~~~
xelxebar
Definitely this. I have found many times that I'm an offender of these "bad
practices", and usually that's because a certain pattern I learned way back in
the beginnings of my Linux days still hangs around.

Embarrassingly, it took me a long time before I started reaching for man pages
instead of Google. That has probably had the biggest effect on tightening
up my command line fu.

find is another tool that seems to get only one specific use case that ignores
its rather large and useful toolset.

~~~
xenithorb
I learned Linux this way a decade and a half ago when it was far (and still is
imho!) more convenient to quickly search a man page than google something.
(with slow internet start times, browser startup times, etc)

Now, sometimes when people watch me work in a shared session they comment on
my "peculiar" (to them) usage of flipping between -h --help and man $command,
because there's a whole lot of switches I have memorized over time, but even
more that I just have good reference points for.

But, bar none, what I've noticed among my peers is that the people that have
always bowed to quick google solutions never really have taken the time to
learn what they're doing. They almost always seem to be the 'quick fix', 'get
it working now, sort it out later' types.

~~~
Aloha
I often use google to find the answer, then go read the man page for it.

------
knz42
What about the `seek` argument which skips over some blocks at the beginning
but still allocates them (unix "holes")?

Also note that there are still unix systems out there which do not support
byte-level granularity of access to block devices. On those devices you must
actually use a buffer of exactly the size of the blocks on the device. Heck,
linux was like this until at least v2.
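
A small sketch of the `seek` behaviour on a regular file: dd skips the leading
blocks of the output before writing, so the file ends up with its full apparent
size, and on file systems with sparse-file support the skipped range is usually
left as a hole rather than written out. The file name and sizes are
illustrative.

```shell
f=$(mktemp)

# Skip 1024 blocks of 1024 bytes in the output, then write a single byte.
printf 'x' | dd of="$f" bs=1024 seek=1024 2>/dev/null

apparent=$(wc -c < "$f" | tr -d ' ')
echo "apparent size: $apparent bytes"

# Blocks actually allocated; the number depends on the file system.
du -k "$f"

rm -f "$f"
```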

~~~
AdamJacobMuller
Also keep in mind that specifying the block size can be important, especially
for efficiently reading data. standard shell tools don't just "figure it out"
automatically. They guess, and sometimes those assumptions can be incorrect
resulting in lower (orders of magnitude) performance.

~~~
angry_octet
Very useful when dealing with raid (make the blocks stripe sized) or tape (512
byte) or esoteric devices.

An essential tool for low level repair, like when you can guess the partition
table values but there is no partition table anymore.

------
chrisfosterelli
I think dd is primarily so popular because it is used in mostly dangerous
operations. Sure, using cat makes logical sense, but if we are talking about
writing directly to disk devices here I'll trust the command I read from the
manual and not explore commands I _think_ would work.

dd's "highly nonstandard syntax" comes from the JCL programming language, but
it's really just another tool to read and write files. At the end of the day
it's not more complex or incompatible than other unix tools. For example, you
can also use tools like `pv` with dd no problem to get progress statements.

~~~
gfody
I always thought dd stood for disk destroyer, only ever used it for making low
level copies of whole disks or shredding them with if=/dev/random. This thread
has been informative and terrifying as I learn cat and cp are every bit as
dangerous as dd! I never would expect something like cp xxx /dev/sda to
actually work. Thinking about it, why should cp even support something like
that? I'll copy files but I'll also DESTROY YOUR SHIT if you say so?

~~~
luca_ing
> I never would expect something like cp xxx /dev/sda to actually work.
> Thinking about it, why should cp even support something like that? I'll copy
> files but I'll also DESTROY YOUR SHIT if you say so?

That's the beauty of Unix.

Everything is a file. Thus every program that can work with files, can in fact
work with everything.

It's actually very liberating.

------
donaldihunter
Cult of pv. It looks to have more command-line complexity than dd.
[https://linux.die.net/man/1/pv](https://linux.die.net/man/1/pv)

~~~
merlincorey
This is a good point as well. On BSD, we don't have `pv`, but we do have ^T.
This will print some sort of status for just about any long running process.
It prints very specialized status for certain programs aware of it.

~~~
emmelaich
Nice! MacOS has it too (unsurprisingly)

Linux, not. I wonder why.

~~~
yjftsjthsd-h
You're not the first to wonder:
[https://unix.stackexchange.com/questions/179481/siginfo-on-g...](https://unix.stackexchange.com/questions/179481/siginfo-on-gnu-linux-arch-linux-missing)

But it looks like the answer is just "it was complicated to implement so Linux
didn't add it."

------
angry_octet
This is a great example of why downvoting submissions should be a thing. Or at
least showing the up/down tuple. I would say every upvote represents someone
misled and likely to further propagate this nonsense.

~~~
Kiro
You can flag submissions.

~~~
angry_octet
I kinda thought that was supposed to be reserved for submissions that are non
permissible, e.g., hate speech, sites hosting malware, or very off-topic &
uninformative.

------
merlincorey
The author doesn't even give correct invocations of dd (on BSD, at least, for
their last example with head).

I certainly agree the syntax of the arguments is strange, due to its age, but
I don't agree that learning it is difficult or a waste of time.

All I've learned is that the author doesn't like dd well enough to learn it.

------
betaby
The author is wrong: bs IS useful. Try to dd one hard drive to another with
and without a reasonable bs (1-8M) and you will see a difference.

~~~
tmccrmck
It also doesn't care about file type. If you want to copy a malformed file dd
will do it - cat won't.

~~~
pwdisswordfish
cat doesn't care about file type either.

------
snickerbockers
OP, your alternatives to DD are more complicated, not less complicated. I
shouldn't need to pipeline two commands together just to cut off the first
100MB of a file.

~~~
chungy
Honestly, this is a problem with all of the examples.

* Using "cat source > target" instead of "cp source target"

* Using "cat source | pv > target" instead of "pv source > target"

* Using "head -c 100MB /dev/zero > target" instead of "truncate -s 100MB target"

------
ocschwar
Dude's missing an important point:

If you mess up the syntax on a dd invocation, a nice thing happens: nothing.

Use a shell command and pipes, and your command better be perfect before you
hit return.

~~~
cat199
though I'm not a fan of this article by any means, I've definitely dd'ed the
wrong disk before and slapped myself for doing so..

Usually it's not past the easily reproducible system partition yet or on a
data disk that is backed up regularly so I can recover in an hour or so..

~~~
bigbugbag
If you dd'ed the wrong disk the syntax of your dd invocation was correct.

------
sndean
Somewhat related short story: Earlier this week my friend said that he dd'd
away just over 50 bitcoins, back when they were worth ~$3 each.

"One of the biggest regrets of my life."

~~~
ajross
If one of the biggest lifetime regrets is a $50k financial loss, your friend
is doing pretty well.

~~~
TheAdamAndChe
$50k is a massive sum for a lot of people...

~~~
sndean
Well yeah, he's in med school. Maybe one day it won't be a massive sum, but,
for now, he's broke.

~~~
ajross
And I state again: if you're a future doctor and your biggest regret is that
you could be $50k richer right now, I'm not inclined to do much weeping.
Basically everyone has been a broke student.

------
AdamJacobMuller
I'll point out that dd also allows you to control lots of other filesystem and
OS-related things that other tools do not. See: fsync/fdatasync. I'm not aware
of any shell tools that allow you to write data like that.
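
For instance, GNU dd accepts `conv=fsync` and `conv=fdatasync`, which flush the
output file to stable storage before dd exits. These are extensions rather than
POSIX, so treat this as a sketch that assumes GNU dd; the scratch file and
sizes are only illustrative.

```shell
out=$(mktemp)

# Write 16 blocks of 4 KiB, then fdatasync() the output before exiting.
dd if=/dev/zero of="$out" bs=4096 count=16 conv=fdatasync 2>/dev/null

written=$(wc -c < "$out" | tr -d ' ')
echo "written=$written"

rm -f "$out"
```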

------
gravypod
An even easier solution: don't make people fall into the command line to
format a USB reliably.

The command line should be reserved for times where you need the fine grain
control to do something that DD is meant to do. A GUI should implement
everything else in a reliable way that doesn't break half the time or crash on
unexpected input.

~~~
empath75
I'm a Linux sysadmin/developer and I've literally never used a Linux GUI.

~~~
xelxebar
I want to do this so bad, but the two things that keep me running X are my
browser and mpv. Oh, and viewing PDFs.

~~~
throwanem
You might find
[https://github.com/saitoha/libsixel/blob/master/README.md](https://github.com/saitoha/libsixel/blob/master/README.md)
of interest.

------
kev009
Ignorance on the blocksize arg.

Also, I only need to remember one progress command for my entire operating
system: control+t. I also get a kernel wait channel from that which is
phenomenally pertinent to rapidly understanding and diagnosing what the heck a
command is doing or why it is stuck.

I hate what Linux has done to systems software culture.

------
emmelaich
Specifying a large block size used to help a LOT with performance. From
memory shell redirection used a tiny blocksize. On Solaris at least.

And _if_ you use dd then you probably should specify a bigger block size than
the default of 512 bytes.

But yeah, most usage is obsolete.

~~~
barsonme
plain cat (no options) uses max(128 * 1024, st_blksize) aligned to page_size
for reads and writes.

so you get the best block size for reads and writes. I can't speak to what the
shell does, though.

------
gabrielblack
I think this article is full of "alternative computer science" and reminds me
of another article, published here as well, about the obsolescence of Unix. The
only good thing is this discussion thread.

------
ori_b
To be fair, dd was mostly a tongue-in-cheek reference to the overly baroque
JCL command for IBM mainframes.

------
jsd1982
Interesting assertion. Can you show me a shell invocation without using dd
that cuts off the first 16 bytes of a binary file, for example? This is a
common reason I use dd.

~~~
advisedwang
tail -c +17

~~~
to3m
This nearly tells you all you need to know. The other bit of info you'll want
to note is that head -c +N produces as many bytes as you ask. So if you try to
get the prefix using "head -c +N" and the suffix using "tail -c +N" then
you'll have 1 byte of overlap.

(dd's corresponding options do not suffer from this problem.)
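
To make the off-by-one concrete, a quick sketch on a ten-byte scratch file. It
assumes a `head` that supports `-c` (the OpenBSD 6.0 output earlier in the
thread shows that is not universal) and uses plain `head -c N` for the prefix.

```shell
f=$(mktemp)
printf 'abcdefghij' > "$f"

prefix=$(head -c 4 "$f")        # first 4 bytes
overlap=$(tail -c +4 "$f")      # starts AT byte 4, so 'd' appears twice
suffix=$(tail -c +5 "$f")       # starts at byte 5: the true remainder
ddsuffix=$(dd if="$f" bs=4 skip=1 2>/dev/null)   # skip one 4-byte block

echo "$prefix | $overlap | $suffix | $ddsuffix"

rm -f "$f"
```

With dd the same number (here 4) serves as both the block size for the prefix
and the amount skipped for the suffix, which is the point being made above.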

~~~
xelxebar
That seems pretty intuitive to me:

Grab the first N bytes vs. grab everything starting from the Nth byte.

------
tardo99
One of the charms of dd is its hilarious syntax. And, used properly, it's a
bit of a swiss army knife for a few different disk operations.

------
noir_lord
not sure status=progress is that obscure a command, it was added relatively
recently as well (in terms of dd).

~~~
ajdlinux
The fact that it was added relatively recently is exactly why it's so obscure.
Unlike if, of, bs and count, I haven't had status=progress drilled into my
head by every single dd command I've read out of a manual or tutorial, so even
now I still forget whether it's "status=progress" or "progress=status" or
something else.

Also it's a victim of dd's bizarre non-Unix syntax - an option like
"\--status" or "\--progress" would be more in keeping with expectations.

------
kazinator
dd precisely controls the sizes of read, write and lseek system calls. This
doesn't matter on buffered block devices; there is no "reblocking" benefit.

Some kinds of devices are structured such that each write produces a discrete
block, with a maximum size (such that any bytes in excess are discarded) and
each read reads only from one block, advancing to the next one (such that any
unread bytes in the current block due to the buffer being too small are
discarded). This is very reminiscent of datagram sockets in the IPC/networking
arena. dd was developed as an invaluable tool for "reblocking" data for these
kinds of devices.

One point that the blog author doesn't realize (or neglects to comment upon)
is that "head -c 100MB" relies on an extension, whereas "dd if=/dev/zero
of=image.iso bs=4MB count=25" is ... _almost_ POSIX: there is no MB suffix
documented by POSIX, only "b" and "k" (lower case). The operator "x" is in
POSIX: bs=4x1024x1024.

Here is a non-useless use of dd to request exactly one byte of input from a
TTY in raw mode:

file:///usr/share/doc/bash-doc/examples/scripts/line-input.bash

Wrote that myself, back in 1996; was surprised years later to find it in the
Bash distribution.
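
A sketch of the `x` multiplier in action, scaled down so it finishes instantly:
25 blocks of 4x1024 bytes instead of 4x1024x1024. It assumes a dd that
implements the POSIX `x` operator (GNU dd does); the output is a scratch file.

```shell
out=$(mktemp)

# bs=4x1024 means 4 * 1024 = 4096 bytes per block; 25 blocks in total.
dd if=/dev/zero of="$out" bs=4x1024 count=25 2>/dev/null

bytes=$(wc -c < "$out" | tr -d ' ')
echo "bytes=$bytes"

rm -f "$out"
```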

------
paulddraper
My most common use of dd is warming up AWS EBS volumes.
[http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initi...](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html)

Though fio is better because it can work in parallel.

------
diegorbaquero
Question: Will cat do a bit-to-bit copy between disks?

------
dom0
dd is for handling blocked data, while cat, redirection and pipelines are
completely useless for that, since they are not meant to manipulate blocks of
data, but streams. They do not compare (apart from really simple cases where
either will do, like copying a file into some other file); this blog post
mainly highlights that neither the author nor many tutorial writers know the
difference.

------
nwah1
Someone should write a wiki bot to crawl through the wikis for Arch, Debian,
and so forth to help rewrite all these bad instructions.

~~~
boondaburrah
I mean, are they bad instructions? They work for most people, and they're what
most people are familiar with. If I ask a random *ix user what's wrong with
my shell command, they're more likely to know about dd.

~~~
gunnihinn
They're good instructions, Brent. 13/10 would follow again.

------
rurban
Instead of

    
    
        cat image.iso | pv >/dev/sdb
    

just do

    
    
        pv image.iso >/dev/sdb

------
bigbugbag
A self submitted opinion blog post pretty much entirely wrong ending up on HN
front page. What gives ?

------
gbin
Instead of `cat file | pv > dev` why not `pv file > dev` ?

------
jeffdavis
What about writing a block into the middle of a file?

------
number6
This is cat abuse

------
badatusernames
TLDR This has nothing to do with dunkin donuts

