
Show HN: rewrite, a rust-powered in-place file rewrite utility - ComputerGuru
https://neosmart.net/blog/2017/rewrite-a-rust-powered-in-place-file-rewrite-utility/
======
ComputerGuru
So before anyone chimes in and asks why the title (or the OP) feels the need
to point out that this was written in rust and perhaps any upvotes this
submission has garnered are just from the rust fanclub.. you're right, I
mostly wrote this in rust because of a desire to further familiarize myself
with the language (it's been a year since I last used it).

However, I was going to write this anyway as it satisfies an itch I've had for
a while. I had an implementation of `rewrite` as a bash script previously, but
was not satisfied with it because I'm using `rewrite` in critical parts of my
workflow and just don't trust bash the same way I do rust (or C/C++ or any
other language, for that matter).

If you're also new to rust, there are a few things I've learned about it that
show up in the commit history (though not as much as I would have liked). In
particular, using .expect() and .to_string_lossy() to avoid using .unwrap()
directly (except where user input isn't a possible cause of failure). I had
some try! code in there that was replaced by the new ? operator (not to be
confused with the same operator in other languages like C#, the ? in rust is
more of an inline try..catch rather than a null test/coalescing operator) but
that code didn't even make the first commit.
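
For readers who haven't met these calls, here is a minimal sketch of the three idioms mentioned above. It is not taken from the `rewrite` source; `read_len` and `display_name` are hypothetical helpers invented for illustration.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: `?` returns the error to the caller instead of panicking.
fn read_len(path: &Path) -> io::Result<usize> {
    let data = fs::read(path)?; // on Err, read_len returns that Err immediately
    Ok(data.len())
}

// Hypothetical helper: to_string_lossy() cannot fail; bytes that are not
// valid UTF-8 are replaced with U+FFFD rather than causing a panic.
fn display_name(name: &OsStr) -> String {
    name.to_string_lossy().into_owned()
}

fn main() {
    // The path comes from "user input", so failure is reported, not unwrapped.
    let path = std::env::temp_dir().join("rewrite_sketch_missing_file.txt");
    match read_len(&path) {
        Ok(n) => println!("{} bytes", n),
        Err(e) => eprintln!("{}: {}", display_name(path.as_os_str()), e),
    }

    // expect() still panics on failure, but with a message that says why the
    // author believed failure was impossible here.
    let n: u32 = "42".parse().expect("literal is a valid u32");
    println!("{}", n);
}
```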

------
jstimpfle
There is also the "sponge" tool from Joey Hess' moreutils.

> You see, sort is too smart to read the entire file into memory, sort it,
> then spit it out on stdout, which is how it can manage sorting huge amounts
> of data without completely trashing your system memory.

This is not why the file is empty. sort doesn't need the original file for
very long. Very likely it starts a merge sort by sorting smaller chunks of the
input file to temporary files. As soon as all these initial-size chunks have
been read, the input file is no longer needed.

The reason for the file being truncated is the semantics of I/O redirection:

    
    
        anyprogram somefile > somefile
    

means that the file is truncated by the shell (because of the redirection)
before "anyprogram" even has the chance to open that file.
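
The ordering described above can be reproduced without a shell. This is a rough sketch, assuming that the redirection amounts to an open-with-truncate, which is what `File::create` does; `redirect_then_read` is a made-up name for illustration.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

// Mimics `anyprogram somefile > somefile`: the output side is opened
// (and truncated) first, and only then does the program read its input.
fn redirect_then_read(path: &Path) -> io::Result<String> {
    // Step 1, done by the shell for `> somefile`: create or truncate.
    let _out = File::create(path)?;
    // Step 2, done by "anyprogram": read the (now empty) input file.
    fs::read_to_string(path)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("truncation_sketch.txt");
    fs::write(&path, "b\na\n")?;
    let seen = redirect_then_read(&path)?;
    println!("program read {} bytes", seen.len());
    fs::remove_file(&path)?;
    Ok(())
}
```

The program sees zero bytes no matter how it would have processed them, which is why the behavior is the same for sort, cat, or anything else.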

~~~
ComputerGuru
You are right and I was wrong.

I do however routinely run into cases where

    
    
        some chain | tee inputfile.txt > /dev/null 
    

also results in lost data. I will try to find an example.

~~~
jstimpfle
That's a similar case, with the difference that there is a race condition
here. "some chain" and "tee" are started in parallel, and which one gets there
first is essentially random: "some chain" opening inputfile for reading, or
"tee" opening it for writing (which truncates it).

You can provoke it by

    
    
        echo a > a
        (sleep 1; cat a) | tee a
    

After this, "a" is very likely empty.

~~~
ComputerGuru
That makes sense. Thank you. `rewrite` should work in those cases.
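
For illustration, the general way tools of this kind avoid both problems is to drain their input completely before touching the target file. The sketch below shows that idea only; it is not the actual `rewrite` (or `sponge`) implementation, and `soak_and_write` and the `.tmp` sibling name are assumptions made for the example.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

// Hypothetical helper: read `input` to EOF, then replace `target`.
fn soak_and_write<R: Read>(mut input: R, target: &Path) -> io::Result<()> {
    let mut buf = Vec::new();
    // Upstream commands have finished reading `target` by the time we see EOF.
    input.read_to_end(&mut buf)?;
    // Assumption: a sibling file with a .tmp extension is free to use.
    let tmp = target.with_extension("tmp");
    fs::write(&tmp, &buf)?;
    // Swap the finished file into place in a single step.
    fs::rename(&tmp, target)
}

fn main() -> io::Result<()> {
    let target = std::env::args_os().nth(1).expect("usage: soak <file>");
    soak_and_write(io::stdin(), Path::new(&target))
}
```

Built as a binary named `soak` (a made-up name), it would be used as `sort a | soak a`. The target is never opened for writing until the pipeline feeding it has closed its end.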

