date 2017-01-09
It comes from the DD (data definition) statement of OS/360 JCL, which is why dd has such unusual option syntax compared to other Unix utilities.
BTW, if you are using dd to write USB drives etc., it's useful to bypass the Linux VM as much as possible to avoid system stalls, especially with slow devices.
You can do that with O_DIRECT. Also dd recently got a progress option, so...
dd bs=2M if=disk.img of=/dev/sda... status=progress iflag=direct oflag=direct
http://www.pixelbeat.org/docs/coreutils-gotchas.html#dd
ps a | grep "\<dd"
# [...]
kill -USR1 $YOUR_DD_PID
pkill -USR1 ^dd
I've thought that dd's behavior could serve as a model for a new standard of interaction. Persistent progress indicators are known to cause performance degradation unless implemented carefully. And reality is, you generally don't need something to constantly report its progress even while you're not looking, anyway.
To figure out the ideal interaction, try modeling it after the conversation you'd have if you were talking to a person instead of your shell:
"Hey, how much longer is it going to take to flash that image?"
The way dd works is close to this scenario.
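A minimal sketch of this ask-when-you-care model, in Python rather than dd's C: the copy loop only updates a counter, and a report is printed only when someone sends SIGUSR1, the same signal used with dd elsewhere in this thread. The names `OnDemandProgress` and `copy_file` are made up for illustration, and the sketch assumes a POSIX system.

```python
import os
import signal
import sys


class OnDemandProgress:
    """Tracks bytes copied and reports only when asked."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.copied = 0

    def summary(self):
        return f"{self.copied} of {self.total_bytes} bytes copied"

    def handle_signal(self, signum, frame):
        # Runs only when SIGUSR1 arrives; the copy loop pays nothing otherwise.
        print(self.summary(), file=sys.stderr)


def copy_file(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst, answering SIGUSR1 with a one-line progress report."""
    progress = OnDemandProgress(os.path.getsize(src_path))
    previous = signal.signal(signal.SIGUSR1, progress.handle_signal)
    try:
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                block = src.read(block_size)
                if not block:
                    break
                dst.write(block)
                progress.copied += len(block)
    finally:
        # Restore whatever handler was installed before the copy started.
        if previous is not None:
            signal.signal(signal.SIGUSR1, previous)
    return progress
```

While a copy is running, `kill -USR1 <pid>` from another terminal would print one line to stderr; otherwise nothing is printed at all.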
It's also worth noting the separate "progress" project which can be used to give the progress of running file based utilities.
We generally have pushed back on adding progress to each of the coreutils for these reasons, but the low overhead of implementation and high overlap with existing options was deemed enough to warrant adding this to dd
Ignored by the application, sure. But FreeBSD always prints useful stuff like load, current command, its pid and state:
$ dd if=/dev/random of=/dev/null
load: 0.72 cmd: dd 5820 [running] 0.70r 0.02u 0.68s 6% 2008k
263276+0 records in
263276+0 records out
134797312 bytes transferred in 0.708372 secs (190291809 bytes/sec)
(here, the "load:..." line is from the system, and the other 3 lines are from dd)
That seems to run substantial risk of seeing it in an inconsistent state, yeah?
Also I meant shm'd, not mmap'd.
Are you passing the reference to an mmap address, or using the shm system calls? What language are you programming in? Do race conditions endanger the shared memory? If so, how does using semaphores help?
Sorry, if I asked a lot of questions, feel free to answer any/none of them :)
I'm using shm system calls in Python. Basically I get a buffer of raw bytes of a fixed size that is referred to by a key. When I have multiple processes running I just have to pass that key between them and they get access to that buffer of bytes to read and write.
On each iteration, I first wait until the semaphore is free and then lock it (P). That prevents anyone else from accessing the shared memory. The process then reads a set of variables from the shared memory - I have little helper functions that serialize and deserialize numpy arrays into raw bytes using fixed shapes and dtypes. Those arrays are then updated using some function combining the output of the process and the current value of the array. Then those arrays are reserialized and written back to the shm buffer as raw bytes again. Finally, the process releases the semaphore using V() so other processes can access it. The purpose of the semaphore is to prevent reading the arrays while another process is writing them - otherwise you might get interleaved old and new data from a given update. In a process-wise sense there is a race condition, as each process can update at different times or in a different order, but for my purposes this is acceptable since neural net training is a stochastic sort of thing and it shouldn't care too much.
[0] http://nikitathespider.com/python/shm/ - original library which works fine for me
[1] http://semanchuk.com/philip/PythonIpc/ - updated version
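The locked read-modify-write cycle described above can be sketched as follows. This is not the poster's code: it swaps in the standard library's `multiprocessing.shared_memory` (Python 3.8+) and `multiprocessing.Lock` for the SysV shm and semaphore calls, and `struct` for the numpy helpers, so that it stays self-contained. `FMT`, `create_buffer`, and `locked_update` are hypothetical names.

```python
import struct
from multiprocessing import Lock, shared_memory

# Fixed shape and dtype, agreed on by every process: four little-endian float64s.
FMT = "<4d"
SIZE = struct.calcsize(FMT)


def create_buffer():
    """Allocate the shared block and zero it; other processes attach by name."""
    shm = shared_memory.SharedMemory(create=True, size=SIZE)
    struct.pack_into(FMT, shm.buf, 0, 0.0, 0.0, 0.0, 0.0)
    return shm


def locked_update(shm, lock, delta):
    """Read, combine, and write back the values while holding the lock."""
    with lock:  # acquiring corresponds to P, releasing on exit corresponds to V
        current = struct.unpack_from(FMT, shm.buf, 0)
        updated = [c + d for c, d in zip(current, delta)]
        struct.pack_into(FMT, shm.buf, 0, *updated)
    return updated
```

A second process would attach with `shared_memory.SharedMemory(name=shm.name)`, which plays the role of passing the key, and would need to be handed the same lock.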
Progress bars by default are also garbage if you are scripting and want to just log results. ffmpeg is terrible for this.
Are you referring to that npm progress bar thing a few months back? I'm pretty sure the reason for that can be summed up as "javascript, and web developers".
Anyway, he's not proposing progress bars by default, he's proposing a method by which you can query a process to see how far it's come. I think there's even a key combination to do this on FreeBSD.
Or, for example, you could write a small program that sends a USR1 signal every 5 seconds, splitting out the responsibility of managing a progress bar:
% progress cp bigfile /tmp/
And then the 'progress' program would draw you a text progress bar, or even pop up an X window with a progress bar.
pv image.img | dd of=/dev/rdisk2 bs=1M
Details as to exactly why it's faster are welcome. (I just know it bypasses stuff).
EDIT: someone mentioned this below http://superuser.com/questions/631592/why-is-dev-rdisk-about...
Though, read cache can be enabled manually by creating separate device via gcache(8). This is usually not required, because caching is done at the filesystem layer.
It's important to specify block size for uncached devices, of course. dd(1) with bs= option will surely work, and with cp(1) your mileage may vary, depending on whether underlying disk driver supports I/O with partial sector size or not.
Without that it does literally take hours.
I suspect the default blocksize is really small (1?) and combined with uncached/unbuffered writes to slower devices, it just kills all performance outright.
Edit: answered! https://news.ycombinator.com/item?id=13350002
NB: Remember the units! Without the units you specify it as bytes or something insanely small like that. I've made that mistake more than once!
But that's the very beauty of unix!
If you can find a way to use 'dd' for disk/drive/device you can use it in interesting new manners (pipelines, etc.) and have very good confidence that it won't break in weird ways. It will do the small, simple thing it is supposed to do even if you are abusing it horribly.
Like this, for instance:
pg_dump -U postgres db | ssh user@rsync.net "dd of=db_dump"
Clueless noob here . . . most guides I've seen use bs=1M for writing e.g. a Linux installer to a USB drive. Does 1 MB vs 2 MB change anything?
For most modern systems, 1MB is a reasonable place to start. Even as high as 4MB can work well.
The block size can make a major difference in terms of sustained write speed due to reduced overhead in system calls and saturation of the disk interface.
A similar thing happens when writing to sockets where lots of small messages kill throughput, but they can decrease latency for a system that passes a high volume of small control messages.
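To make the system call overhead concrete, here is a rough sketch that counts how many read and write calls a plain copy loop issues for a given block size. `copy_counting_calls` is an illustrative name, and the count corresponds to real syscalls only when the streams are unbuffered file objects; the demo uses in-memory buffers just to show the arithmetic.

```python
import io


def copy_counting_calls(src, dst, block_size):
    """Copy src to dst in block_size chunks and return the number of calls made."""
    calls = 0
    while True:
        block = src.read(block_size)
        calls += 1  # one read call, including the final one that hits EOF
        if not block:
            break
        dst.write(block)
        calls += 1  # one write call per block
    return calls


if __name__ == "__main__":
    data = bytes(1024 * 1024)  # 1 MiB of zeros
    for bs in (512, 128 * 1024, 1024 * 1024):
        calls = copy_counting_calls(io.BytesIO(data), io.BytesIO(), bs)
        print(f"bs={bs}: {calls} calls")
```

For 1 MiB of data this works out to 4097 calls at 512 bytes versus 17 at 128 KiB, which is the gap the block size is closing.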
Oh man, I didn't even know that was the cause of these problems.
As a programmer, it usually pays off to be on the lazy side, but every once in a while it comes back and bites me in the arse ;)
So:
cat foobar.img > /dev/sdi
Usually I also lower the vm.dirty_bytes and vm.dirty_background_bytes to 16 resp. 48 MB (in bytes) beforehand, which limits the buffer sizes to those amounts. Else it will seem that the progress bar indicates 300MB/s is written, and when it completes you still have to wait a really long time for things to have been written out.
Afterwards I restore back vm.dirty_ratio and vm.dirty_background_ratio to respectively 10 and 5 - the defaults on my system.
I wish that all of those projects, tutorials etc. that explain how to write their image to a block device, like an SD card, would start advising people to use cat, because there is no reason to use dd; it's just something that people stick with because others do it too.
I only use dd for specific blocks, like writing back a backup of the mbr, or as a rudimentary hex editor.
> cat foobar.img > /dev/sdi
> will stream the file rather than what dd does,
Sorry, that's not quite right. `cat` (and your shell, presumably bash) does the same fundamental thing as `dd`, ie, read block, write block. There's not really an underlying 'stream' primitive that `cat` (or `bash`, as you're using redirection to write the file) is using compared to `dd`
What `cat` does do, is it does a better job of trying to find an optimal blocksize than a naive `dd` call does. `dd` simply defaults to a 512-byte block size, which is just inherited from history when 512-bytes was the alignment for, well, everything.
There are numerous optimizations upon the fundamental read-block, write-block primitives to make it go faster (`cat` makes use of some of these). The linux kernel actually has a "stream file to socket" syscall to avoid the copy to user-land and back to "stream" a file out to the network, but that's not happening here, and there's still reading and writing of blocks in the kernel happening.
See also:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f...
Once cat is spawned, the shell gets out of the way, and has nothing to do with it. (Exception: zsh will "helpfully" insert itself with `tee`-like operation in certain situations.)
What `cat` doesn't do is find an optimal block size for writes. It does a good job of finding an optimal block size for reads, but not for writes. Many older block devices perform very poorly when the write block size is not optimal.
As an example of what I was referring to, in zsh, this
a >&2 | b
a | tee /dev/stderr | b
That is, zsh inserted itself into the middle of the pipeline. There are two pipes instead of one; zsh/tee reads from the a pipe, and then writes that data to the b pipe (and stderr). This does hurt performance.
-> % python test.py 2>&1 | python test.py
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[snip]
python 24150 someone 0u CHR 136,0 0t0 3 /dev/pts/0
python 24150 someone 1w FIFO 0,8 0t0 72772851 pipe
python 24150 someone 2w FIFO 0,8 0t0 72772851 pipe
python 24150 someone 3w CHR 5,0 0t0 1049 /dev/tty
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[snip]
python 24151 someone 0r FIFO 0,8 0t0 72772851 pipe
python 24151 someone 1u CHR 136,0 0t0 3 /dev/pts/0
python 24151 someone 2u CHR 136,0 0t0 3 /dev/pts/0
python 24151 someone 3w CHR 5,0 0t0 1049 /dev/tty
someone@arch-two [01:29:00] [~]
-> % echo $ZSH_VERSION
5.2
someone@arch-two [01:29:16] [~]
-> % cat test.py
import os
import sys
import subprocess
with open("/dev/tty", "w") as f:
subprocess.check_call(["lsof", "-p", str(os.getpid())], stdout=f)
It's not particularly surprising that the zsh behavior is different than bash; zsh redirections are only Bourne shell-like in simple cases.
I dislike zsh's behavior because it isn't what the user typed. It should be simple to look at a command line and see how many pipes it makes; see the processes that will be involved in the pipeline. Zsh's behavior seems to me to be clever and implicit; I'd much rather have dumb and explicit.
I'd wondered whether dd or cat were faster, and indeed cat is faster, but not by much. Also, for some embedded devices, you have to write to specific offsets, so dd is more convenient and explicit. Lastly, cat composes poorly with sudo.
$ sudo dd if=foobar.img of=/dev/sdi # works
$ sudo cat foobar.img > /dev/sdi # fails unless root b/c redirection is done by shell
xxd / xxd -r is much nicer, but I suppose sometimes vim is not available...
That performance difference often comes from block size; "dd bs=1M" typically runs much faster than the default block size of 512 bytes.
`od` is ubiquitous--it's POSIX and a requirement of the Single UNIX Specification
sudo (cat foobar.img > /dev/sdi)
sudo sh -c "cat foobar.img > /dev/sdi"
sudo -s "cat foobar.img > /dev/sdi"
sudo sh <<EOF
cat foobar.img > /dev/sdi
EOF
In short, you could do it, but it'd be ripped out of every server that's been hardened, and for users that don't want to care - they're just running 'sudo su' anyhow.
Speaking only for myself, the thought of my shell having a magical escalation process would scare the bejeezus out of me - and I'm supposed to have root on our boxes!
zsh: parse error near `)'
cat foobar.img | sudo tee /dev/sdi > /dev/null
cat foobar.img | sudo tee /dev/sdi >&-
$ sudo tee /dev/sdi >&- < foobar.img
[0] https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_...
[1] http://www.ivarch.com/programs/pv.shtml
Laptops don't need to follow multiuser best practices, and frankly a complex root password offers little.
I am not even sure what you mean by stream or how it would be any different from "read", "write", "read", "write". Because that is literally what cat does, and so does dd.
Many people feel cat is faster than dd because dd's default block size is 512 bytes as you can see from a simple dd command.
strace dd if=10M of=junk
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
strace cat 10M > junk2
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072
The beauty of dd is you don't have to tinker with things like vm.dirty_bytes and vm.dirty_background_bytes before using a command. Those should not be messed with on most systems and should definitely not be messed with for the sake of running a single command. When you use cat, it makes some decisions for you. Most of the time it makes good choices, but it happens to make less than optimal choices for large files being written to other devices or file systems.
To avoid the pitfalls of using cat to copy image files (and not have to change global system settings) you use dd. With dd you can use O_DIRECT to bypass the VFS caching layer so your vm.dirty_* settings are ignored in general. You can also specify the block size optimal for the device you are writing to and even reading from.
While dd may not stand for disk or drive, and its name may have nothing to do with disks, it is a powerful tool and the appropriate one to use when working with large files, precise movement of data and, yes, copying image files to devices.
<foo.img pv -cNsrc | buffer -p75 -s256k -b128 | pv -cNdst >/dev/whatever
The legend is wrong; this is clearly derived from the mainframe JCL DD command. This is also why the syntax is so un-Unix-like.
https://en.wikipedia.org/wiki/Job_Control_Language#In-stream...
ddrescue gives you options for error handling and will skip past bad blocks, it handles read errors much more gracefully.
[0]: https://www.gnu.org/software/ddrescue/
I can't sing the praises of ddrescue enough. NB: ddrescue and dd_rescue are not the same!
I've recovered numerous otherwise unmountable Mac/Windows drives using ddrescue, mmls, sleuthkit, and foremost.
Usually I'm able to lsblk to determine the block device, rip the data partition with ddrescue and mount it loopback with mount once I use mmls to confirm the partition type.
For Mac volumes that don't mount, either run fsck.hfsplus or get the file to a Mac to run DiskWarrior (Alsoft has been repairing HFS volumes for a couple of decades).
If nothing else works once you've got the raw bits saved, foremost will usually scrape something worthwhile off your image file.
See http://www.noah.org/wiki/Dd_-_Destroyer_of_Disks
This page explains the challenge - https://wiki.archlinux.org/index.php/disk_cloning
- it is available by default on all Unix systems
- it distinguishes between input and output (i.e. if= and of=)
- it reports results
- it avoids using a common command for a dangerous operation that the user may not understand
dd also has another benefit: the ability to select a range of blocks to copy from and to. This isn't the most common scenario, but it certainly pops up on some devices.
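As a rough illustration of what selecting a block range means, the sketch below mimics dd's `skip=`, `seek=`, and `count=` operands in Python. It is a simplification, not a description of how dd is implemented: the function name and argument layout are invented here, and the output file must already exist.

```python
def copy_block_range(src_path, dst_path, block_size, skip=0, seek=0, count=1):
    """Copy `count` blocks, starting `skip` blocks into the input and
    writing `seek` blocks into the output, without truncating the output."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(skip * block_size)  # like skip=: offset in input blocks
        dst.seek(seek * block_size)  # like seek=: offset in output blocks
        for _ in range(count):
            block = src.read(block_size)
            if not block:
                break  # input ended before `count` blocks were read
            dst.write(block)
            copied += len(block)
    return copied
```

Opening the output with `"r+b"` leaves the rest of the file in place, so only the selected blocks are overwritten.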
These operations are not that complicated. Behold the magic of UNIX pipes:
dd if=/dev/sdb | pv | gzip -c | ssh name@host "gzip -dc | dd of=/dev/sdc"
pv /dev/sdb | gzip -c | ssh name@host "gzip -dc > /dev/sdc"
Not only is this shorter with less cargo cult steps, it also gives a much better `pv` output now that it can determine the size and show a percentage.
The key, though, is to be aware of the block size and to use a multiple of it.
cat /dev/zero | strace dd bs=1M of=/dev/null
Similarly,
dd if=/dev/zero bs=1M | strace gzip -1 > /dev/null
On my system, adding dd with higher block size on both sides of a gzip -1 pipeline just ends up being slower.
That said, what you say is correct on writes. On reads, specifying block size is mostly to reduce syscall overhead because readahead will prefetch data to accelerate things.
pv /dev/sdb | gzip | ssh name@host "gunzip > /dev/sdc"
pv /dev/sdb | ssh -C name@host "cat > /dev/sdc"
Another useful tip is `curl https://...iso | sudo dd of=/dev/sdb` if you don't have enough disk space to hold the ISO. And sometimes internet speeds are faster than disk speeds anyway.
- not all the things on unix are abstracted as files (or 'byte streams' to be more accurate). however, i/o resources and some ipc facilities are defined so. an operating system provides many other abstractions in addition to these such as processes, threads, non-stream devices, concurrency and synchronization primitives, etc.; thus it's absolutely wrong to say that everything is a file on unix.
Only for those like me, who are wondering...^^
Just try (on OS X) dd if=disk.img of=/dev/disk1... first speedup is gained by using rdisk1, but the real improvement comes with the bs=1m. 2 vs 16 vs 300 MB/s on my machine, when cloning via USB-SATA adapter.
See the second answer here as to why:
http://superuser.com/questions/631592/why-is-dev-rdisk-about...
Flashbench [1] should be able to tell you what the erase-block and page size is.
[1] https://github.com/bradfa/flashbench
Not anymore. Advanced Format (4096-byte sector) hard drives have taken the market by storm, and SSDs benefit even more from using larger I/O sizes (because erase sectors are way larger).
Do you know a way to get an SSD's native erase sector size?
For NVMe disks on Linux, you can find out this size with the nvme-cli [0] tool. Use "nvme id-ctrl" to find the Maximum Data Transfer Size (MDTS) in disk (LBA) blocks and "nvme id-ns" to find the LBA Data Size (LBADS). The value is then 2^MDTS * 2^LBADS bytes.
For example, the Intel SSD 450 can transfer 32 blocks of 4096 bytes per NVMe command, so you'd want a block size of 128 KiB.
[0] https://github.com/linux-nvme/nvme-cli
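The arithmetic is small enough to sketch directly. This follows the formula exactly as stated in the comment above; `max_transfer_bytes` is an illustrative name, and it is worth checking the nvme-cli output and the NVMe specification for the exact units on your device.

```python
def max_transfer_bytes(mdts, lbads):
    """Apply the formula from the comment: 2^MDTS blocks of 2^LBADS bytes each."""
    return (1 << mdts) * (1 << lbads)


if __name__ == "__main__":
    # 32 blocks (2^5) of 4096 bytes (2^12), as in the example above.
    print(max_transfer_bytes(5, 12) // 1024, "KiB")
```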
Also check out the BUGS section of the manpage :) https://www.freebsd.org/cgi/man.cgi?diskinfo
Even the 32K read/write size used by many utilities (shells, XYZsum, rsync and so on) can slow things down with modern/fast IO devices.
Today you'll want to use something like 32K to 2MB. Doesn't really matter around there.
If you're writing synchronously (which you shouldn't), then it becomes a tad more difficult to figure out for optimal performance.
"Disables write caching […] you can disconnect the device safely without [unmounting it]"
:D
Thanks for whatever it is that you gave me. I don't keep up with nomenclature these days.
Using dd in this role and then complaining about the result before running any other kind of test is a waste of everyone's time. No filesystem, even local, is optimized for that kind of performance. You do know the difference between performance and scalability, don't you? People who evaluate server-oriented systems in 2017 based on a methodology more appropriate for a 1997 desktop are doomed to fail. In everything.
Also, seriously people, use the bs flag on dd.
Works a treat and using the fs level tool you know everything will be properly copied, much safer.
I know it's possible with regex, but given how frequently that parsing logic is needed, and the difficulty of getting sed right, I think a "tb" tool would be very helpful.
Reading a value from "<td>2017-01-09 <b>08:30</b></td>" is harder than it should be.
cut needs single-character delimiters, so only splitting on "<" or ">" won't work.
sed trips up on the "/", ":", and "-" without proper escaping.
This is before even mentioning Unicode. High-level scripting languages handle all of these just fine. I'd much rather have a standard tool and library for this purpose though.
cat dates | python -c 'import sys; sys.stdout.writelines(repr(line.split("<td>")) + "\n" for line in sys.stdin)'
php -r 'foreach (explode("\n", file_get_contents("dates")) as $line) { print_r(explode("<td>", $line)); }'
While writing this I did think of something though, depending on what you're actually trying to do it might work:
cat dates | tr "<>" " " | awk '{print $yourColumnOfInterest}'
Back to the article. dd is not a disk writing tool. Everything on UNIX is a disk writing tool. dd has one job, and does it right.
Again, I propose a single-purpose text delimiter cutting program. It doesn't need to be a whole scripting language, e.g. awk, perl, python, etc. Just get the text between two strings, and have sensible syntax.
$ echo '<td>2017-01-09 <b>08:30</b></td>' | awk -F'[<>]' '{print $5}'
$ echo 'λf. (λx. f (x x)) (λx. f (x x))' | awk -F'[λ.]' '{print $4}'
$1: ''
$2: 'f'
$3: ' ('
$4: 'x'
...
I'm just suggesting easier syntax, and a custom tool. Like dd is for disk copying, instead of cp/gzip/etc. A simple tool that will just work.
This is the syntax I'm going for:
textBetween("<td>", "</td>") | replace ("<b>", "") | replace ("</b>", "")
Maybe with some magic strings,
e.g. textBetween(":", "end")
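To make the proposal concrete, here is a minimal sketch of what such a helper might look like in Python. Everything here is hypothetical: `text_between` is an invented name, and `end=None` is just one possible reading of the "end" magic string.

```python
def text_between(text, start, end=None):
    """Return the text after the first `start` and before the next `end`.

    `end=None` means "up to the end of the string". Returns "" if a
    delimiter is missing.
    """
    begin = text.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    if end is None:
        return text[begin:]
    stop = text.find(end, begin)
    return "" if stop == -1 else text[begin:stop]


cell = text_between("<td>2017-01-09 <b>08:30</b></td>", "<td>", "</td>")
print(cell.replace("<b>", "").replace("</b>", ""))
```

The delimiters are plain strings, so characters like "/", ":", and "-" need no escaping.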
Your replace part is already covered by sed (although I agree with you that the escaping can be awkward).
Sorry for the confusion: dd is still a very useful tool for copying disks.
The point is that you should not feel like you have to shoehorn dd into any command dealing with disks, because only dd is somehow "raw" or "low level" enough to access them.
For example, if you have a command like this:
pv file1 | gzip | ssh host "gzip -d > file2"
Also, it's useful to have in the back of your mind that dd can very easily mean Disk Destroyer, especially because of its sui generis syntax.
This way only actual data is written to the device, blocks of zeros can be skipped.
1: https://lwn.net/Articles/563355/
2: https://github.com/01org/bmap-tools
$ man dd
NAME
dd - convert and copy a file
SYNOPSIS
dd [OPERAND]...
dd OPTION
DESCRIPTION
Copy a file, converting and formatting according to the operands.
...
See the part after "Each CONV symbol may be:"
foreach i (0 1 2 3 [...])
dd <big.log >big.log.$i bs=500m count=1 skip=$i
end
split -db 500M big.log big.log.
?
Also check out "find tutorial" articles like this one: http://www.grymoire.com/Unix/Find.html (e.g., did you know you can filter by size?)
How would one simulate a call to execve with dd? Seems like a totally different problem domain.
cat foo | ( dd bs=$HEADERSIZE count=1 of=/dev/null; process-foo-contents )
tail -c +$HEADERSIZE <foo
(dd bs=$SIZE1 count=1 of=file1; dd bs=$SIZE2 ...
(head -1 > /dev/null; cat -) < file
It happens to work with GNU head when stdin is a seekable file, because GNU head specifically rewinds the stream before exiting:
$ (strace -e read,write,lseek head -1 > /dev/null; cat -) < file
...
read(0, "hello\nworld\n", 8192) = 12
lseek(0, -6, SEEK_CUR) = 6 # <-- here
write(1, "hello\n", 6) = 6
+++ exited with 0 +++
As far as I know, this is exclusive to GNU head. Neither Busybox nor OS X head will do this, so they throw away an entire buffer instead of just the first line. You can try it out:
(busybox head -1 > /dev/null; cat -) < file
Tail is the right tool for the job here. But if you wish to stick with your idiom, read will reliably consume a single line of input, regardless of how it is implemented:
(read -r; cat) < file
This is just as inefficient as it sounds, but it doesn't matter much in practice since you rarely read a lot with it.
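One way to get that guarantee on any input, including a pipe that cannot be rewound, is to read a single byte at a time until the newline, which is presumably the inefficiency meant here. The sketch below shows the idea in Python; `consume_line` is an illustrative name, and this is not a claim about how any particular shell implements `read`.

```python
import os


def consume_line(fd):
    """Read exactly one line from a raw file descriptor, one byte per call,
    so that nothing past the newline is consumed."""
    line = bytearray()
    while True:
        byte = os.read(fd, 1)
        if not byte:
            break  # EOF before a newline was seen
        line += byte
        if byte == b"\n":
            break
    return bytes(line)
```

Whatever follows the first newline stays in the descriptor for the next reader, which is what the `cat` in the idiom above relies on.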
$ tail -n +2 file
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth
$ echo how now brown cow > text.ascii
$ dd conv=ebcdic < text.ascii > text.ebcdic
0+1 records in
0+1 records out
18 bytes copied, 0.000261094 s, 68.9 kB/s
$ od -xc text.ebcdic
0000000 9688 40a6 9695 40a6 9982 a696 4095 9683
210 226 246 @ 225 226 246 @ 202 231 226 246 225 @ 203 226
0000020 25a6
246 %
0000022
$ dd conv=ascii < text.ebcdic
how now brown cow
0+1 records in
0+1 records out
18 bytes copied, 0.000140529 s, 128 kB/s
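For comparison, roughly the same round trip can be sketched in Python with the `cp037` codec, one of several EBCDIC code pages. This assumes cp037 agrees with dd's conversion table for the characters in this sample; the two are not guaranteed to match for every byte.

```python
EBCDIC_CODEC = "cp037"  # assumption: one common EBCDIC variant, not necessarily dd's table


def to_ebcdic(text):
    """Encode a string to EBCDIC bytes."""
    return text.encode(EBCDIC_CODEC)


def from_ebcdic(data):
    """Decode EBCDIC bytes back to a string."""
    return data.decode(EBCDIC_CODEC)


if __name__ == "__main__":
    encoded = to_ebcdic("how now brown cow\n")
    print(len(encoded), "bytes:", encoded.hex())
    print(from_ebcdic(encoded), end="")
```

The leading bytes 0x88 0x96 0xa6 0x40 correspond to the 210 226 246 @ shown in the od output above.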
It is useful for generating serial IO for a variety of purposes. For example, writing data with specific target block size; allocating contiguous blocks for use by an application (be it zeroing out a thin LUN before partitioning, or a file system); or simply dumping the content of one device to another (or to a file).
Good luck stretching out a thin LUN or creating an empty file that allocates contiguous space with cat.
Nope, that's not hyperbole. I had to stop the kid from almost installing software that would have connected to a known botnet to help a user connect a personal computer to the VPN. He passed enough checks during the interview we figured "Okay, we can train him in the rest of the things"
Lesson learned.
A user was wanting to bypass some of our network restrictions and the intrepid jr. Admin suggested Hola unblocker to watch Netflix with me sitting 3 feet away.
This was effectively the final straw and convinced me I had made the wrong hire. He was out two days later.
# dd < /dev/ada0 bs=8m | gzip -c -9 > /mnt/file.raw.gz
> How do you use pv if the source is raw device?
# dd < /dev/ada0 bs=8m | pv | dd > /dev/ada1 bs=8m
> How do you dd over ssh?
# dd < /dev/ada0 bs=8m | gzip -9 | pv | ssh user@host 'dd > /dev/da1 bs=8m'
> This belief can make simple tasks complicated.
As master Dennis Ritchie once said - "UNIX is very simple, it just needs a genius to understand its simplicity."
Who knew?
Now we have pretty good confirmation that this little utility is performing way more effectively than designed.
Software itself could probably benefit from some of the same approaches that allowed this little computer program to outperform its original design goals, in ways that might not have been anticipated.
ddrescue or recoverdisk (part of FreeBSD base) will both skip over unreadable blocks, then retry with smaller block sizes along the damaged areas to save as much data as possible.
The arch wiki page is very useful: https://wiki.archlinux.org/index.php/disk_cloning
I tried to backup one of my cards last week using dd > .iso file and then tried to put it on a new card. I tried with /dev/Rdisk (faster) but none of the new cards was bootable.
So this is saying just use copy.
(I ended up just creating a second boot disk, and ftping the files over which seems less than ideal...)
It's just saying that if you have other commands that can read/write files, such as `pv /dev/thing > file.img` (to show a progress bar), you don't have to try to shoe-horn dd into it just because /dev/thing happens to be a drive.
Oh yes indeed. And for this exact reason, "dd" is commonly backronymed to "Data Death" (or, indeed, "Disk Death").
dd if=/dev/hda bs=512 | (ssh root@remote dd of=/dev/hdX bs=512)
cp myfile.iso /dev/sdb
