
How do I delete bytes from the beginning of a file? (2010) - userbinator
https://devblogs.microsoft.com/oldnewthing/20101201-00/?p=12153
======
oblio
I swear that every time someone posts a generic Microsoft blog entry, I
automatically think: "oldnewthing".

I hovered over the link, and there it was.

Raymond Chen is a machine, the T-1000 of Windows development. I've never
developed Windows applications using C++ and I still read his blog articles.

------
LanceH
Reading this, I couldn't help but try to find the rhythm of a poem to match
the formatting.

~~~
blt
I guess Raymond uses a line oriented editor like vim, and the original line
breaks have been preserved. If you use one line per clause, it's much easier
to change sentence and paragraph structures without having to seek around by
the word.

~~~
oblio
Vim's not line oriented... at least as far as I know. ed/ex, now those are
line oriented.

But I wouldn't call vim line oriented. Unless you also want to call Notepad
line oriented, since you can use Shift + End to do operations on an entire
line :D

------
derefr
I've often thought it would be useful (especially for software that manages
container or archive file-formats, or for databases) to have exposed to the
user a file-system object that operates _like_ a file, but which—instead of
being exposed by the OS as a seekable byte stream that can be
appended/truncated on one end—is exposed by syscalls as a _vector_ of
arbitrary, non-uniformly-
sized extents, where the user is expected to ask the OS to pre-allocate extent
buffers (think mmap(2) with anonymous private disk pages), and then,
separately, stuff those into this vector object (which would be an operation
almost exactly like hard-linking an existing file into a directory, in terms
of its time and space complexity.)
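
Roughly, the interface I imagine looks something like this. To be clear, none
of these calls exist in any OS; the names and signatures are invented purely
to illustrate the shape of the idea:

    /* Hypothetical extent-vector API -- nothing here exists anywhere. */
    #include <stddef.h>

    typedef int extent_t;  /* handle to a pre-allocated on-disk extent */

    /* Pre-allocate an anonymous extent of `len` bytes and map it into the
     * process: roughly mmap(2) with anonymous private disk pages. */
    extent_t extent_alloc(size_t len, void **addr);

    /* Splice an extent into the file's extent vector at index `pos`.
     * Metadata-only, so about as cheap as hard-linking an existing file
     * into a directory. */
    int file_extent_insert(int fd, unsigned pos, extent_t ext);

    /* Pop the extent at index `pos` out of the vector. Deleting bytes
     * from the front of a file becomes a removal of extent 0. */
    int file_extent_remove(int fd, unsigned pos, extent_t *out);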

Of course, for most filesystems (and especially ones with sparse-file support,
and _especially_ ones with copy-on-write support), such an on-disk data
structure is exactly what's already underlying the byte-stream abstraction. So
this would just be a passthrough to allow people to directly manipulate that
data structure. (In the process probably breaking certain preconditions the
filesystem relies on, though, so it would need to track these "low-level
extent vectors" as a separate filesystem object type. Programs would still be
able to use regular file-abstraction syscalls on them, though.)

Interestingly, despite filesystems themselves not exposing the lower-level
extent-vector abstractions, in some other systems that have file-stream-like
abstractions, you _can_ operate on "files" this way.

Postgres's BLOBs, for example, are seekable byte-streams, which also (at least
theoretically) allow you to insert into the middle of them. (I say
theoretically because it's not an implemented API, but it's not exactly hidden
from the user, either. BLOBs just get broken out into records in a table
representing their extents; you can rewrite the keys of said records in that
table to do whatever low-level operations you like.)

Or, for another example, S3 and its competitors let you arbitrarily compose
objects as “components” of other virtual objects, which then read back as the
concatenation of the objects they contain. (Sadly you don’t get any other
vector-manipulation ops than this, but if you keep references to the leaf
extents around, that’s often enough to rebuild your extent-vectors any time
they change.)

Also, of course, back on the real filesystem, you can just manage your "vector
of parts of a file" as multiple files in a directory, and then abstract over
that directory by using a FUSE server that exposes a view where the files are
one contiguous file.

~~~
nine_k
This, and more, has been tried on mainframes. When you could not afford to
install a free RDBMS with a click of a mouse (the mouse not being available
yet, either), you could really appreciate record-oriented files you could use
as a database, and even have indexed access to the records.

~~~
panic
Has anyone written a high-level introduction to mainframe tech that's
accessible to people outside the mainframe culture? I've always been curious
about it, but the IBM manuals I've come across have been somewhat
impenetrable.

~~~
lboc
How about:

'Introduction to the New Mainframe: z/OS Basics'
[http://www.redbooks.ibm.com/abstracts/sg246366.html?Open](http://www.redbooks.ibm.com/abstracts/sg246366.html?Open)

This, and the 'ABCs of Systems Programming' series were pretty good I thought.

------
burmecia
Another approach I can think of is using a rolling hash
([https://en.wikipedia.org/wiki/Rolling_hash](https://en.wikipedia.org/wiki/Rolling_hash))
to split the file into chunks of varying sizes; those chunks can be saved on
different sectors, which don't need to be contiguous. The file keeps a list of
indexes for all those chunks, just like an inode does for blocks. When you
insert some bytes at the beginning of the file, then thanks to the rolling
hash, most likely only the first several chunks need to be re-chunked and
re-hashed, so the change is localised and can be done cheaply; all the
remaining chunks stay untouched. When this is combined with append-only
storage, things are even easier, because it only needs to deal with the bytes
around the beginning and then update the chunk index list in the file.
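
To make the "only the first chunks change" property concrete, here's a minimal
sketch of content-defined chunking with a Rabin-Karp style rolling hash. The
window size, mask and multiplier are illustrative values, not what ZboxFS
actually uses:

    #include <stdint.h>
    #include <stdio.h>

    #define WIN   48                /* rolling-hash window, in bytes     */
    #define MASK  0x1FFFu           /* boundary mask: ~8 KiB avg chunks  */
    #define PRIME 1099511628211ULL  /* multiplier (FNV prime, arbitrary) */

    /* Print the chunk boundaries of buf. */
    static void chunk(const uint8_t *buf, size_t len)
    {
        uint64_t h = 0, pow = 1;
        size_t start = 0;

        for (int i = 0; i < WIN - 1; i++)  /* pow = PRIME^(WIN-1) mod 2^64 */
            pow *= PRIME;

        for (size_t i = 0; i < len; i++) {
            if (i >= WIN)
                h -= pow * buf[i - WIN];   /* slide: drop the oldest byte */
            h = h * PRIME + buf[i];
            if (i + 1 >= WIN && (h & MASK) == 0) {  /* boundary found */
                printf("chunk: [%zu, %zu)\n", start, i + 1);
                start = i + 1;
            }
        }
        if (start < len)
            printf("chunk: [%zu, %zu)\n", start, len);
    }

    int main(void)
    {
        static uint8_t buf[1 << 20];
        uint64_t x = 88172645463325252ULL; /* xorshift64 pseudo-random fill */
        for (size_t i = 0; i < sizeof buf; i++) {
            x ^= x << 13; x ^= x >> 7; x ^= x << 17;
            buf[i] = (uint8_t)x;
        }
        chunk(buf, sizeof buf);
        return 0;
    }

Because a boundary depends only on the last WIN bytes, an insert at the front
shifts boundaries only until the hash resynchronizes; every later chunk is
byte-identical and deduplicates.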

This approach is already implemented in ZboxFS
([https://github.com/zboxfs/zbox](https://github.com/zboxfs/zbox)) to do
content-based deduplication.

------
gpvos
I didn't get the "minus 100 points" reference, but it was not hard to find it
on the web:
[https://blogs.msdn.microsoft.com/ericgu/2004/01/12/minus-100...](https://blogs.msdn.microsoft.com/ericgu/2004/01/12/minus-100-points/)

------
fjfaase
Why would you want to delete bytes from the beginning of a file? A file should
be seen as a low-level data structure with a limited set of operations. If you
want to implement a more complex data structure with a wider range of
operations, there are usually multiple ways of implementing this on a low-
level data structure with a limited set of operations. This is basically what
a lot of software engineering is about. If you want to implement an array of
bytes with delete and insert operations, there are many ways you could
implement this on a low-level file. Which way is the best depends on so many
other factors. Do you want undo-redo functionality, transaction properties,
distributed access, and so on?

~~~
posix_me_less
Because sometimes I want to remove old and irrelevant lines in a logfile but
want to retain the recent lines.

> A file should be seen as a low-level data structure with a limited set of
> operations.

Why should it? For many people, a file is the basic way to permanently store
data. Changing the data at any point in the file should be easy.

~~~
fjfaase
Actually, most file systems let you randomly read and write to files. Usually,
they implement this by reading and writing blocks (of 512 bytes or a
multiple). One could create file systems with more advanced operations, but
most likely these would be based on the more primitive operations. For
example, to add a function that removes 10 bytes from the start of the file,
one could read the whole file, block by block, and move the data back 10
bytes. For a large log file, this will take a long time. Another solution is
to have a begin-of-file offset for all files; this means that some additional
information needs to be stored with every file and that this information needs
to be accessed with all file operations, resulting in some performance loss
for all file operations. Existing programs that access files as blocks of a
certain size might see significant performance penalties when the offset is
not a multiple of the block size, because reading the data of one logical
block then needs to read two actual blocks from the underlying medium.
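
For concreteness, a sketch of that read-and-shift approach in C (most error
handling omitted); the file is rewritten in 4 KiB chunks and then truncated,
so it is O(file size):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static int drop_head(const char *path, long n)
    {
        FILE *f = fopen(path, "r+b");
        if (!f)
            return -1;

        char buf[4096];
        long rd = n, wr = 0;

        for (;;) {
            fseek(f, rd, SEEK_SET);              /* read a chunk at rd... */
            size_t got = fread(buf, 1, sizeof buf, f);
            if (got == 0)
                break;                           /* hit EOF: done shifting */
            fseek(f, wr, SEEK_SET);              /* ...write it back at wr */
            fwrite(buf, 1, got, f);
            rd += got;
            wr += got;
        }
        fclose(f);
        return truncate(path, wr);               /* chop the leftover tail */
    }

    int main(int argc, char **argv)
    {
        if (argc != 3)
            return 1;
        return drop_head(argv[1], atol(argv[2])) == 0 ? 0 : 1;
    }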

If you often want to strip old data from a logfile, a solution might be to
create new logfiles at regular intervals and delete the old ones. Or, maybe
even better, use a database to store the log messages. That gives you query
functionality and allows you to perform more advanced operations, such as
removing certain types of log messages.

------
tyingq
See fallocate() for Linux/ext4. [http://man7.org/linux/man-pages/man2/fallocate.2.html](http://man7.org/linux/man-pages/man2/fallocate.2.html)

~~~
maxxxxx
It looks like this will zero out the data but not remove it.

~~~
andoma
It will remove if you use FALLOC_FL_COLLAPSE_RANGE
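
A minimal sketch of using it to chop the first 4 KiB off a file (Linux-only;
the block size is assumed to be 4096 here, see the granularity caveat below):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s FILE\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Remove [0, 4096) and shift the rest of the file down; fails
         * with EINVAL if the range is not block-aligned. */
        if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, 4096) < 0)
            perror("fallocate");

        close(fd);
        return 0;
    }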

~~~
maxxxxx
But: " A filesystem may place limitations on the granularity of the operation,
in order to ensure efficient implementation. Typically, offset and len must be
a multiple of the filesystem logical block size, which varies according to the
filesystem type and configuration. If a filesystem has such a requirement,
fallocate() fails with the error EINVAL if this requirement is violated."

Seems it's limited.

~~~
andoma
Correct, my bad. Did some tests and AFAICT neither EXT4, XFS nor BTRFS allows
arbitrary byte ranges here.

------
vkaku
Theoretically, if the number of bytes to be removed were a multiple of the FS
data block size, it should be possible to just deallocate those blocks and
rewrite the file's metadata. That probably involves fewer shifts and less
wasted I/O and time.

In general, if you intend to create structures that allow one to reduce file
size, reclaim the data, defragment - that sort of thing - a rewrite/coalesce
happens to be the viable solution. Left free to grow, fragmentation will
worsen with time.

------
ALittleLight
I'm not sure I really understood this. If you wanted to delete 10 bytes,
couldn't you just read a sector, then copy it back starting from sector[10:]?

Edit: and then truncate the last ten bytes at the end.

~~~
maxxxxx
Does the whole sector thing still make sense for SSDs? Do they still work by
sector or can every byte be addressed directly?

~~~
pstrateman
The "native" sector size of an SSD is the erasure block size, typically 64KiB.

~~~
vardump
I'm fairly sure SSD erase block sizes are more like 2 MB or more. Even 8 MB
wouldn't surprise me anymore.

------
JulianMorrison
To delete n bytes from the beginning of a file, overwrite offset 0 from
offset n, offset 1 from offset n+1, and so on, then truncate the file when the
read offset hits EOF.

------
davidork
dunno about windows, but in linux it's fairly simple. I've had to do a lot of
this sort of thing on big disk images my friend made with some weird program
that pads the start of everything with 512 bytes of some random crap.

dd skip=10K iflag=skip_bytes if=original.file of=first10ktrimmed.file

despite being old, and despite the insanely horrible things you can do when
you mess up the syntax or mistype something, dd is fucking amazing.

to be fair, it's not directly editing the file, but making a copy with the
first 10K skipped.

you could have a jump/pointer that points to the shortened first block of the
file and then at the end of that, a pointer/jump to pick up where the rest of
the file continues, but you'd wind up with misalignment and fs fragmentation.

~~~
CrowFly
Linux doesn't implement sparse files.

~~~
fl0wenol
Yes it does? The details are filesystem-specific, and if a filesystem supports
mmap then there's likely also a page-alignment requirement, but that's not
atypical.

------
bagels
Is this a problem that really needs a solution?

I can't recall any time where this would have been useful to me, or think of a
situation where it'd be a deal breaker. I could contrive a bunch of other
operations that are inefficient on file systems like this (remove every other
byte from the file, why not?), but that doesn't mean that literally any idea
we can think up should be supported by the file system APIs.

~~~
buckminster
One place this would be useful is a DVR. You want to record the last _n_ hours
of video and discard anything older. And it's not inefficient. It just
requires a lot of careful engineering that's probably not worth the bother.

~~~
frei
I don't think filesystem-level support is necessary, since you could make a
circular buffer holding the video stream in a file of some constant size that
is approximately n hours of video. With some clever tricks you could even keep
the format compatible with existing video container formats.
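
A minimal sketch of that circular-buffer file, assuming a tiny header that
stores the current write offset (the sizes and layout are invented; a real DVR
would size the ring to roughly n hours of video and pre-allocate the file
once):

    #include <stdint.h>
    #include <stdio.h>

    #define HDR  sizeof(uint64_t)        /* header: current write offset  */
    #define RING (64u * 1024 * 1024)     /* fixed-size data area (64 MiB) */

    static void ring_append(FILE *f, const uint8_t *data, size_t len)
    {
        uint64_t pos = 0;
        fseek(f, 0, SEEK_SET);
        fread(&pos, sizeof pos, 1, f);   /* load write offset (0 if new) */

        while (len > 0) {
            size_t room = RING - (size_t)pos;  /* bytes until wrap point */
            size_t n = len < room ? len : room;
            fseek(f, (long)(HDR + pos), SEEK_SET);
            fwrite(data, 1, n, f);       /* oldest data gets overwritten */
            pos = (pos + n) % RING;      /* wrap back to the ring start  */
            data += n;
            len -= n;
        }
        fseek(f, 0, SEEK_SET);
        fwrite(&pos, sizeof pos, 1, f);  /* persist the new write offset */
    }

A reader that starts at the stored offset and wraps around once gets the
retained stream in oldest-to-newest order.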

~~~
eps
Or just use a set of files, each holding N minutes with, perhaps, some overlap
+ an index file. No real need to over-engineer things here.

------
newnewpdro
The filesystem just needs a concept of a file start offset.

The initial value is 0; if you want to support truncating at the head, you
just add the amount truncated to this offset.

In a sector-based filesystem, whenever this offset moves far enough into the
file that it crosses a sector boundary, you reclaim those sectors as free
space. When it lands somewhere within a sector, that sector is pinned and you
waste some space.

The problem is more that the userspace APIs don't expose well-supported
mechanisms for doing this. Implementing it at the filesystem level is trivial.
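
No mainstream syscall API exposes such an offset, but the same idea is easy to
sketch in userspace with a small header that stores the logical start of the
file (the layout is invented for illustration; unlike a filesystem-level
version, this never reclaims the dead sectors):

    #include <stdint.h>
    #include <stdio.h>

    #define HDR sizeof(uint64_t)  /* header: logical start offset */

    /* Logically discard n bytes from the head: O(1), no data moved. */
    static void head_truncate(FILE *f, uint64_t n)
    {
        uint64_t start = 0;
        fseek(f, 0, SEEK_SET);
        fread(&start, sizeof start, 1, f);
        start += n;
        fseek(f, 0, SEEK_SET);
        fwrite(&start, sizeof start, 1, f);
    }

    /* Read from the logical file, i.e. past the discarded head. */
    static size_t head_read(FILE *f, uint64_t off, void *buf, size_t len)
    {
        uint64_t start = 0;
        fseek(f, 0, SEEK_SET);
        fread(&start, sizeof start, 1, f);
        fseek(f, (long)(HDR + start + off), SEEK_SET);
        return fread(buf, 1, len, f);
    }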

~~~
cjhanks
I can think of a cool case for this.

What if you have an append-only system that is making remote backups? One
service writes the activity log append-only; a second service reads it,
checkpoints, and then truncates the head of the file once the checkpoint has
been committed. No need for the tricky file-swap trickery.

~~~
minaguib
Linux and fallocate() allow precisely this - we use it to "log locally" for
applications, with a "tail this ASAP, publish to Kafka, then delete locally"
helper.

See (
[https://gist.github.com/minaguib/1cbe29922b06d50755a2f580b8c...](https://gist.github.com/minaguib/1cbe29922b06d50755a2f580b8c343fa)
) for some test notes I took a couple of years ago.

~~~
gmueckl
fallocate() is not supported on network file systems, is it?

