Hacker News | datastack's comments

I agree it seems more common. However, backup time and data movement should be equivalent if you follow the algorithm's steps.

According to ChatGPT, the forward-delta approach is common because it can be implemented purely append-only, whereas reverse deltas require the latest snapshot to be mutable. That doesn't work well with backup tapes.
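To make that append-only distinction concrete, here is a toy illustration of the two write patterns (my own sketch, not from any real backup tool; the "deltas" here are just labels, not real diffs):

```python
# Toy illustration of write patterns only: forward deltas append to the
# backup store, reverse deltas must rewrite the "latest" full copy.

def forward_backup(store, snapshot):
    """Forward deltas: write one full copy, then only *append* deltas.
    Nothing already written is ever touched again (tape-friendly)."""
    if not store:
        store.append(("full", snapshot))
    else:
        store.append(("delta", snapshot))

def reverse_backup(store, snapshot):
    """Reverse deltas: the newest entry is always a full copy, so each
    backup demotes the old full copy to a delta and rewrites 'latest'."""
    if store and store[-1][0] == "full":
        _, old = store.pop()           # mutate: latest is no longer full
        store.append(("delta", old))
    store.append(("full", snapshot))   # rewrite the full copy every time

fwd, rev = [], []
for snap in ["v1", "v2", "v3"]:
    forward_backup(fwd, snap)
    reverse_backup(rev, snap)

print([kind for kind, _ in fwd])  # ['full', 'delta', 'delta']
print([kind for kind, _ in rev])  # ['delta', 'delta', 'full']
```

The forward store is strictly append-only; the reverse store rewrites its tail on every backup, which is the mutability ChatGPT was pointing at.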

Do you also think that the forward delta approach is a mere historical artifact?

Perhaps backup tapes are still widely used; I have no idea, I'm not in this field. If so, the reverse-delta approach would not work well in industrial settings.


Nobody[1] backs up directly to tape any more. It’s typically SSD to cheap disk with a copy to tape hours later.

This is more-or-less how most cloud backups work. You copy your “premium” SSD to something like a shingled spinning rust (SMR) that behaves almost like tape for writes but like a disk for reads. Then monthly this is compacted and/or archived to tape.

[1] For some values of nobody.


Exciting!

Yes, the deduplicated approach is superior, if you can accept requiring dedicated software to read the data, or can rely on a file system feature that supports it (like hard links on Unix).

I'm looking for a cross-platform solution that is simple and can restore files without any app (in case I don't maintain my app for the next twenty years).

I'm curious whether the software you worked on used a proprietary format, relied on Linux, or used some other method of deduplication.


The deduplication in the product I worked on was implemented by me and a colleague of mine, in a custom format. The point was to do inline deduplication on a best-effort basis, i.e. handling the case where the system does NOT have enough memory to store hashes for every single block. This might have resulted in some duplicated data if you didn't have enough memory, instead of slowing to a crawl by hitting the disk (spinning rust, at the time) for each block we wanted to deduplicate.
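A minimal sketch of that trade-off (my own reading of the comment, not the actual product code): index block hashes only while memory allows, and accept duplicates once the index is full rather than paying a disk lookup per block.

```python
# Toy best-effort inline dedup with a bounded in-memory hash index.
import hashlib

class BestEffortDedup:
    def __init__(self, max_entries):
        self.max_entries = max_entries  # memory budget for the hash index
        self.seen = {}                  # block hash -> stored block id
        self.blocks = []                # stored blocks (may contain dupes)

    def write(self, block: bytes) -> int:
        digest = hashlib.sha256(block).digest()
        if digest in self.seen:
            return self.seen[digest]    # known block: deduplicated
        block_id = len(self.blocks)
        self.blocks.append(block)
        if len(self.seen) < self.max_entries:
            self.seen[digest] = block_id  # index only while memory allows
        return block_id
```

With a tiny budget, repeats of un-indexed blocks get stored twice; that wasted space is exactly the price paid for never touching the disk on the dedup path.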


Nice to realize that this boils down to copy on write. Makes it easier to explain.


Is there a reason NOT to use ZFS or BTRFS?

I mean, the idea sounds cool, but what are you missing? ZFS even works on Windows these days, and with tools like zrepl you can configure time-based snapshotting, auto-sync, and auto-cleanup.


Great resource in general; I'll look into whether it describes how to implement this backup scheme.


This sounds more like a downside of single-site backups.


Totally. Which is exactly what your post outlines. You said it yourself: "Only one full copy is needed." You would need to update your logic to have a 2nd copy pushed offsite at some point if you wanted to resolve this edge case.


You can see in steps 2 and 3 that no full copy is written each time. It's only move operations to create the delta, plus copies of new or changed files, so IO is quite minimal.


Thank you for bringing this to my attention. Knowing that there is a working product using this approach gives me confidence. I'm working on a simple backup app for my personal/family use, so good to know I'm not heading in the wrong direction


These types of projects can easily get sidetracked without an overarching goal. Are you looking to do something specific?

An app (that requires remote infrastructure) seems a bit overkill, and if you're going through the hassle of doing that, you might as well set up the equivalent of what MS used to call the Modern Desktop Experience, which is how many enterprise-level customers have their systems configured now.

The core parts are a cloud-based IdP, storage, and a slipstreamed deployment image which, given network connectivity, pulls down the config and sets the desired state, replicating the workspace down as needed (with OneDrive).

The backup data layout/strategy/BCDR plan can then be automated from the workspace/IdP/cloud-storage backend with no user interaction or learning curve.

If hardware fails, you use the deployment image to enroll new hardware, log in, and replicate the user-related state down, etc. Automation for recurring tasks can be matched up to the device lifecycle phases (Provision, Enrollment, Recovery, Migration, Retirement). This is basically what a professional setup does with Entra ID/Autopilot MDM and Microsoft 365 plans. You can easily set up equivalents, but you have to write your own glue.

Most of that structure was taken from Linux greybeards ages ago; MS just made a lot of glue and put it in a nice package.


In this algo nothing is rewritten. A diff between source and latest is made, the changed or deleted files are archived to a version folder, and the latest folder is updated from the source, like rsync. No more IO than any other backup tool. Versions other than the latest are never touched again.


NoSQL doesn't solve the schema migration problem; it just means you don't formalize your schema. But your code will implicitly require a certain schema anyway. Changing the schema means changing the code and migrating data. You'll have to write migration scripts and think about backward compatibility. Same problems as in SQL.
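A small illustration of the point (field names made up for the example): even in a schemaless document store, renaming a field still forces a migration, here done lazily on read.

```python
# Toy example: code implicitly expects a document shape, so a field rename
# still needs migration logic and backward compatibility, "schemaless" or not.

def migrate(doc):
    """Upgrade a v1 document ({'name': ...}) to v2 ({'full_name': ...})."""
    if "full_name" not in doc and "name" in doc:
        doc["full_name"] = doc.pop("name")
    return doc

def get_full_name(doc):
    # The application code now requires the v2 shape.
    return migrate(doc)["full_name"]
```

The `migrate` shim is exactly the backward-compatibility code you'd otherwise put in a SQL migration script; skipping the formal schema doesn't make it disappear.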


The trick is maintaining a full graph of all data dependencies through the entire codebase. Then migrations can be done with ease. But no one does this. They shovel data from one database to the next, with tons of little ad hoc data stores along the way.


Nice, thanks for sharing. Curious what your motivation was for making it. Perhaps include a "why" at the start of the repo. As evidenced by some comments, whether the goal was to write HTML with a Lisp-compatible syntax, vs. a more concise but simple XML alternative, creates different expectations. Personally I'm just interested from a parser/compiler perspective.


> Curious what your motivation was for making it. Perhaps include a "why" at the start of the repo. As evidenced by some comments, whether the goal was to write HTML with a Lisp-compatible syntax, vs. a more concise but simple XML alternative, creates different expectations.

You are quite correct: my goal appeared to be "Lisp-interpreted HTML", but my actual goal is "write HTML tag trees more easily".

