Show HN: Baxx – Unix-friendly backup service (txt.black)
424 points by zulgan 62 days ago | 198 comments

(Alternative product recommendation, please downvote/remove if you feel that isn't appropriate)

For Unix/Linux backups, may I suggest Borg Backup? It encrypts and does dedupe astonishingly well. It also works over SSH incredibly fast, and restores are done via a mounted FUSE filesystem, so it's easy to pick and choose what you need. It prunes really well too, and is a single executable, so it's easy to distribute via Ansible/Puppet/etc. I've been using it for several years and it hasn't failed me yet.
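For readers who haven't used Borg, a typical init/backup/restore cycle looks roughly like the sketch below. It is an illustration, not the commenter's setup: the repository address is a placeholder, and the `DRY_RUN=1` switch and the helper function names are hypothetical conveniences added here so the commands can be printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch of a Borg backup/restore cycle over SSH (not the commenter's script).
# REPO is a placeholder; DRY_RUN=1 prints each command instead of running it.
set -euo pipefail

REPO=${REPO:-backup@backup.example.com:borg-repo}
DRY_RUN=${DRY_RUN:-0}

run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"   # show the command line only
  else
    "$@"
  fi
}

# One-time: create an encrypted repository (key kept in the repo, protected by a passphrase).
borg_init()   { run borg init --encryption=repokey "$REPO"; }

# Each run: a new deduplicated archive; Borg itself expands {hostname} and {now}.
borg_backup() { run borg create --stats "$REPO::{hostname}-{now}" "$@"; }

# Restore: mount the repository via FUSE, copy out what you need, then unmount.
borg_browse() { run borg mount "$REPO" "$1"; }
borg_done()   { run borg umount "$1"; }
```

After sourcing it, `DRY_RUN=1 borg_backup /etc /home` prints the `borg create` command line without touching anything, which is a cheap way to check paths before the first real run.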

If y'all want to laugh at my bash scripting skills, I have a backup script that sends backup status to a Zabbix monitor server at https://gist.github.com/anthonyclarka2/cef41d201dd5b890dae67... (I'd appreciate improvements to that script if anyone wants to critique)

I evaluated Borg and Restic and found that both of them fall over once you get to (what I consider to be) production level volumes; in my case that's ~1 PB and in the range of a billion files.

Sadly, the only thing I've found so far that works at all at those scales is Bacula, and that is file-based -- i.e., if you have a gigabyte file that changes by one byte, it backs up the whole gigabyte again. Not ideal.

I can't say I've been testing with PB sizes, but for my "meager" 8TB backup, Borg works well. A daily backup & prune operation takes less than 20 minutes.

Restic falls over as soon as you cross 2TB, and prune operations are painfully slow. Backing up 8TB took 4 weeks with Restic, and I gave up waiting for prune to finish. It was crossing the 24-hour mark, making it unsuitable for daily backups.

Yeah. I back up a few terabytes a day with no problem in Bacula.

> Sadly, the only thing I've found so far that works at all at those scales is Bacula, and that is file-based -- i.e., if you have a gigabyte file that changes by one byte, it backs up the whole gigabyte again. Not ideal.

Would it be possible to split those bigger files before doing the backup (log files with log splitters, SQL databases with specialised incremental backup tools such as xtrabackup), or are these e.g. image files with different versions?

I really like the concept of byte level deduplication, but have often thought the price you pay for the space reduction might not be worth it considering today's network speed as well as storage sizes and prices.

Would be interesting to hear your experience concerning this!

I'm definitely looking into what I can do as far as getting logs rotated (and thus split) better, but a lot of what I back up are legacy medical systems where making any kind of change is extremely difficult both technically and politically.

A slightly hacky method might be an overlay filesystem (using FUSE or similar) that mirrors the underlying filesystem for small files but represents the larger ones as smaller units (so bigfile becomes bigfile.block0000, bigfile.block0001, ...). That way only the changed block would get transferred if your one-byte change is a modification or an append rather than an insert or delete.

If the backup service uses file timestamps as its only key for deciding what to refresh, the overlay would have to store the last modification date and a good hash of each block. When the mirrored file's date is updated, it would scan each block for changes and update the stored hashes and timestamps accordingly. This would need to be orchestrated to reduce the risk of temporary corruption if there are several updates and the rechecking process coincides with a backup sweep (i.e. make sure you don't present updated dates for any block until you can present them for all the blocks that need them).

For restoration, you either manually concatenate the parts or have an overlay filesystem that operates in reverse: showing the smaller block files as a single large unit.

You'd have to very thoroughly test the overlays and their interaction with the backup service before risking it on important data, so it might not be something you would genuinely consider...
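The block-splitting half of that idea can be tried without any FUSE layer. A rough sketch, assuming GNU coreutils (`split -d -a`, `sha256sum`); `changed_blocks`, `restore_blocks` and the state-directory layout are made up for illustration. Like the proposal, it only helps when bytes are modified in place or appended: an insertion shifts every later block, so they would all show up as changed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split FILE into BLOCKSIZE-byte pieces (FILE.block0000, ...)
# under STATEDIR/blocks and print the pieces whose SHA-256 changed since the last run.
# Requires GNU coreutils (split -d -a, sha256sum).
set -euo pipefail

changed_blocks() {
  local file=$1 blocksize=$2 statedir=$3 name
  name=$(basename -- "$file")
  mkdir -p "$statedir/blocks"
  rm -f "$statedir/blocks/$name".block*
  split -b "$blocksize" -d -a 4 -- "$file" "$statedir/blocks/$name.block"
  # One "hash  filename" line per block.
  (cd "$statedir/blocks" && sha256sum -- "$name".block*) > "$statedir/manifest.new"
  if [[ -f "$statedir/manifest" ]]; then
    # Lines only in the new manifest = blocks that are new or whose hash changed.
    comm -13 <(sort "$statedir/manifest") <(sort "$statedir/manifest.new") |
      awk '{print $2}'
  else
    awk '{print $2}' "$statedir/manifest.new"   # first run: every block is new
  fi
  mv "$statedir/manifest.new" "$statedir/manifest"
}

# Restoring is plain concatenation; fixed-width suffixes keep the glob in order.
restore_blocks() {
  local statedir=$1 name=$2 dest=$3
  cat "$statedir/blocks/$name".block* > "$dest"
}
```

Only the names printed by `changed_blocks` would need to be handed to the backup tool; `restore_blocks STATEDIR bigfile bigfile.restored` is the "manually concatenate the parts" option.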

Dude. That’s more than just “slightly” hacky. Points for the idea though.

Manic me has some wonderful ideas that are simultaneously excellent and terrible. I have to keep a careful eye on him in the day job! One of these days I intend to implement and collect a few of his musings under the banner "the ministry of silly code".

Okay, I see. Thanks for the feedback!

In what way do they fall over? Also, did you file bug reports?

They start taking unrealistically large amounts of time to deal with their catalogs. For instance, with a 500 TB dataset, Restic on my (128 GB ram) server took over 4 days to do its prune operation.

In that case, what do you use?

We're using Bacula, because in our case it's the only thing that works with our volume, and also works with tape (which is the only cost effective way we've found to archive multi petabyte datasets).

But I'm not very happy about it, because it's insanely overcomplicated for no really good reason, and because of it being file-based.

Backblaze has done some cost comparisons between LTO and cloud storage:


If you're interested, I'm doing experiments with another site with 500T to backup, where I added sampling and sharding to HashBackup (I'm the author).

Sampling allows you to do faster simulated backups to determine the best backup parameters to use. In his case, we determined that a very large block size - 64M - was the best way to back up his data.

Sharding automatically partitions the filesystem so multiple backups can run simultaneously, getting backup speed into the 250-400 MB/s range.

It's more at the proof of concept stage, but having another large site to work with would be fantastic! A couple of the larger sites using HashBackup are EURAC (European Research Center) and HMDC (Harvard MIT Data Center).
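HashBackup's sharding is built into the tool; the general idea of partitioning a tree so several backup processes run at once can be sketched in plain shell. This is not how HashBackup does it: `shard_members` is a hypothetical helper that deals the top-level entries of a directory round-robin into N shards, and the commented driver loop uses a placeholder backup command.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: split the top-level entries of ROOT into N shards
# round-robin, so N backup jobs can each take one shard in parallel.
set -euo pipefail

# Print the entries of ROOT that belong to shard K (0-based) out of N.
shard_members() {
  local root=$1 n=$2 k=$3
  find "$root" -mindepth 1 -maxdepth 1 -print | LC_ALL=C sort |
    awk -v n="$n" -v k="$k" '(NR - 1) % n == k'
}

# Example driver (my-backup-tool is a placeholder; xargs -d is GNU-specific):
# for k in 0 1 2 3; do
#   shard_members /data 4 "$k" | xargs -d '\n' my-backup-tool &
# done
# wait
```

Round-robin by name is the simplest split; balancing shards by size would need `du` output and is left out here.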

Yeah, I read that Backblaze thing, and used their spreadsheet. Tape came out to be nearly 9x cheaper for us, given our large data set but extremely slow growth rate. Basically, we have a lot of data we need to protect from disaster, but it changes/gets added to very slowly, so capex (buy some tapes, put the data on tape, put the tapes in an offsite box and forget about them) is vastly cheaper than opex (pay by the month for hundreds of terabytes).

For bash scripting: have you seen shellcheck? It's a very solid first-pass for any script, and the codes it emits have excellent documentation: https://www.shellcheck.net/

Yes I have, I made sure it passes shellcheck before posting it here :)

I do sort of wish that Visual Studio Code had the "auto-fixes" already built for shellcheck's error messages. I should probably create those and make them available somewhere.

I don't see anything wrong with that bash script. It's well formatted and very simple, there aren't any functions or anything, just a single switch and two conditionals. Not much to optimize.

Simple and explicit is better in shell scripts than being fancy.

Thank you, I really appreciate the feedback.

Borg is hands-down the best backup utility I've used. I also use Borgmatic for some added niceties around checks/notifications/etc.

I recently launched a backup service for Borg – https://www.borgbase.com. Also big on monitoring (email, pushover, webhook), as this is a common concern for backups.

It just lacks terminal-based registration. :-) You could use the GraphQL API to manage everything from the command line if you really wanted.

This looks pretty cool, well done! I use rsync.net and am very satisfied, but yours looks great as well. I'm not a big fan of the initial price hike (the first 100 GB is 2 cents per GB instead of 1), but it looks like the service is well worth it.

This looks good, and competitively priced too! What's the backend cloud storage?

Just guessing here, but probably Backblaze B2? It has similar pricing.

Currently it's on RAID boxes, but I'll move to CephFS soon. The data volume is getting too big to copy it around. Ceph allows you to nicely expand your storage over time and account for some failures on the way.

Borg needs a "smart" backend with a filesystem, so it can't use only object-storage.

I've been using borgbase for a couple of weeks and am very happy with it.

Is there a limit to how many times I can download my backups?

No limits. Downloads are unmetered and free, as long as they look reasonable. If you download your full backup every hour, I'll probably ask you for a reason.

Does anyone know how it compares to Restic? That's what I've been using, and I've been pretty pleased. It seems like the main difference is that Restic supports integrations with other services, whereas Borg seems to use SSH for its connections.

I've used both Borg and Restic pretty heavily. I settled on Restic because it was much lighter on resources. A few of the servers I manage have small amounts of ram, but large amounts of data. Borg would often get oom-killed, but Restic wouldn't.

Restic also starts uploading changes as soon as it sees them, whereas borg has to scan the entire list of files before it starts uploading changes. Unfortunately for multi-tb datasets with millions of files, borg would take hours just to scan all of the files.

Pruning old archives also worked a lot better in Restic compared to Borg.

If you're happy with Restic, I'd keep sticking with it.

Interesting. I had deployed restic to my fleet of 100 servers around a year ago to backup to a common Backblaze B2 destination, and rolled it back after running into a production service getting OOM killed because of restic memory use.

That's interesting, did you have a lot of small files or large files?

I don't know if it's an option for you, but have you considered something like rsync from each server to a central backup server and then running restic/borg/whatever on that server? Sometimes that works out nicely too, and would be less impactful to other services on the machine.

What did you switch to instead of restic, if you don't mind me asking?

Restic looks pretty interesting, I wasn't aware of it, thank you. I don't know how Restic can achieve its goals, since AFAIK you need a binary running on the server to diff/check data/verify restores/etc without transferring all that over the network.

Borg runs a binary on the server to do this, that's why it can't easily back up to S3 et al.

IIRC Restic keeps indexes of data it uploads to configured backends. I think it caches that index locally, but it can also download it from the remote backend to compare.

One big disadvantage of restic at the moment is that it doesn't do compression (https://github.com/restic/restic/issues/21).

I was using restic and switched to borg because of that, and saved hundreds of gigabytes.

IMO compression in backups is overrated. (I run Relica, a backup service built atop restic: https://relicabackup.com)

Most large files are already in compressed formats: mp4, zip, tgz, jpg, etc.

What you _really_ want is deduplication (which restic does), so that chunks are only stored once, even if multiple machines sharing some data back up to the same repo. Dedup saves me about 30% per backup repo in my own experience.
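To make the deduplication point concrete, here is a toy content-addressed store: each chunk is saved under its SHA-256, so a chunk already present (from any file, or any machine writing to the same store) costs nothing extra. It is a sketch only: it uses fixed-size chunks for simplicity, whereas restic and Borg use content-defined chunking, and `store_chunks` is a made-up name. Assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Toy content-addressed chunk store illustrating deduplication.
# Fixed-size chunks for simplicity (restic/Borg use content-defined chunking).
set -euo pipefail

# store_chunks FILE CHUNKSIZE STORE: move each chunk into STORE/<sha256>
# unless that hash is already stored; print how many chunks were actually new.
store_chunks() {
  local file=$1 size=$2 store=$3 tmp chunk hash new=0
  mkdir -p "$store"
  tmp=$(mktemp -d)
  split -b "$size" -- "$file" "$tmp/chunk."
  for chunk in "$tmp"/chunk.*; do
    hash=$(sha256sum < "$chunk" | cut -d' ' -f1)
    if [[ ! -e "$store/$hash" ]]; then
      mv "$chunk" "$store/$hash"
      new=$((new + 1))
    fi
  done
  rm -rf "$tmp"
  echo "$new"
}
```

Storing a file made of two identical chunks adds one chunk to the store, and a second file sharing one of those chunks adds only its other chunk.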

Not saying compression wouldn't be nice (done properly, it is somewhat at odds with encryption), but I don't think it's worth making it the #1 deciding factor.

I agree that compression is overrated for most backups. It'd help with a lot of text-oriented data, but honestly that takes up so little space compared to a vm image or something like that.

If I was backing up a single computer, compression would matter more to me. But if I'm backing up 10s or 100s of servers, there's going to be a lot of duplicated data that would make a much bigger difference.

You can always back up onto a compression-enabled filesystem to save further space, and dedupe with restic.

Borg is a fantastic tool, but on a large system the time it takes to start up and scan the files negates the benefits of compression for me, unfortunately.

Encryption totally defeats any raw compression. It would need to be inside the blobs themselves.

Restic also works on Windows, whereas Borg doesn't (yet), if that's a consideration.

I use it on Windows via the Windows Subsystem for Linux on Windows 10. It's not a pure native solution, but it works quite well for me.

Is "it" Borg or Restic here?

it = Borg

It works very well under Cygwin, so no problem, if you can install that.

I've been quite happy with Restic, doing always-incremental encrypted backups to S3.

Borg is also big on deduplication

Deduplication is one of the things that drove me to Borg: it has the speed and space saving of an incremental backup, but every single backup can be used as if it were a full backup.

It is also very easy to set up a pruning policy for old backups, so that you can say that you want to keep one backup per day for 90 days and after that only one backup per month for 2 years.
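That policy maps onto Borg's `borg prune` retention options. A sketch with a placeholder repository; running with `--dry-run --list` first is a cheap way to confirm the policy keeps what you expect before anything is deleted.

```shell
#!/usr/bin/env bash
# Sketch: keep one archive per day for 90 days, then one per month for about 2 years.
set -euo pipefail

REPO=${REPO:-backup@backup.example.com:borg-repo}   # placeholder
PRUNE_ARGS=(--keep-daily=90 --keep-monthly=24)

# Preview: --dry-run with --list shows what would be kept or pruned, deleting nothing.
preview_prune() { borg prune --dry-run --list "${PRUNE_ARGS[@]}" "$REPO"; }

# Apply the policy for real.
apply_prune()   { borg prune --list "${PRUNE_ARGS[@]}" "$REPO"; }
```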

The dedupe in Borg is downright magical. I know I lose some space by keeping my hosts in separate repositories, but even then, I get insane space savings.

You should make a habit of setting

  set -euo pipefail
  if "${DEBUG:-false}"; then set -x; fi

The -o pipefail is irrelevant in this case, as you're not using any pipes, but it's better to have it there in case you ever extend the script.

As others have pointed out: readability comes right after correctness in shell scripts, so keeping it simple as you have done is always a good idea.

I hope you see this reply, I am sorry for it being so late:

Do you have a link to a good explanation of why you'd set those? The "set -e" seems to indicate that the script would fail immediately if the borg backup job fails, which would prevent the Zabbix "send" command from alerting the monitoring system.

Thank you for your feedback, it's given me more to learn and think about!

There are quite a few examples around, just google the parameters 'set euo pipefail' (first hit [0])

You're right that just putting the set options at the top of your file would be a bad idea. You need to write your script to actually account for these errors, making the possible exit codes entirely transparent:

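    #!/bin/bash

    set -euo pipefail
    if "${DEBUG:-false}"; then set -x; fi

    # Returns a random exit code from 0 to 4.
    function random_exitcode(){
            return $(( RANDOM % 5 ))
    }

    # Only called when the previous command failed.
    function handle_error(){
      local rc=$?   # capture the failed command's exit code before anything overwrites it
      case "$rc" in
        1)  echo "handling exitcode 1!";;
        2)  echo "handling exitcode 2!!";;
        3)  echo "handling exitcode 3!!!";;
        *)  echo "encountered unexpected exitcode: $rc"; exit 2;;
      esac
    }

    # A failing command on the left of || does not trigger set -e.
    random_exitcode || handle_error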
or, if you don't like functions:

  random_exitcode || LAST_EXITCODE=$?
  case "${LAST_EXITCODE:=0}" in ...

[0] https://coderwall.com/p/fkfaqq/safer-bash-scripts-with-set-e...
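On the original `set -e` concern: the usual trick is to capture the backup's exit code with `|| rc=$?`, which `set -e` tolerates, and always report it afterwards. A sketch assuming the standard `zabbix_sender` CLI (`-z` server, `-s` host, `-k` item key, `-o` value); the server name and the `backup.status` item key are placeholders for whatever your Zabbix template defines, and `$HOSTNAME` must match the host name configured in Zabbix.

```shell
#!/usr/bin/env bash
# Run a backup command under set -e, but always report its exit code to Zabbix.
set -euo pipefail

ZABBIX_SERVER=${ZABBIX_SERVER:-zabbix.example.com}   # placeholder
ITEM_KEY=${ITEM_KEY:-backup.status}                  # hypothetical item key
SENDER=${SENDER:-zabbix_sender}

report_status() {
  "$SENDER" -z "$ZABBIX_SERVER" -s "$HOSTNAME" -k "$ITEM_KEY" -o "$1"
}

# run_backup CMD [ARGS...]: run CMD, send its exit code, then return that code.
run_backup() {
  local rc=0
  "$@" || rc=$?   # '|| rc=$?' keeps set -e from exiting before we report
  report_status "$rc"
  return "$rc"
}

# Usage (placeholder repository):
# run_backup borg create --stats backup@backup.example.com:borg-repo::'{hostname}-{now}' /etc
```

Because `run_backup` returns the original code after reporting, the script still exits non-zero on failure, so cron mail or other wrappers see the failure too.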

Many thanks, I very much appreciate the effort you put into your reply, and your followup.

I would follow up with a suggestion to also consider restic, which is very similar to Borg Backup -- except it doesn't have the restriction that the server needs to be "clever" (you can upload to S3 or any dumb blob store).
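For comparison, a minimal restic-to-S3 cycle might look like the following. The bucket name is a placeholder; restic reads credentials from the standard `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` variables and the repository password from `RESTIC_PASSWORD` (or `RESTIC_PASSWORD_FILE`). The retention numbers are only an example.

```shell
#!/usr/bin/env bash
# Sketch of a restic backup to S3 ("dumb" object storage, no server-side binary).
set -euo pipefail

export RESTIC_REPOSITORY=${RESTIC_REPOSITORY:-s3:s3.amazonaws.com/example-backup-bucket}  # placeholder
# Credentials are expected in the environment:
#   AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, RESTIC_PASSWORD (or RESTIC_PASSWORD_FILE)

restic_setup()  { restic init; }                          # once per repository
restic_backup() { restic backup --one-file-system "$@"; }
# Drop old snapshots and reclaim space (example retention only).
restic_prune()  { restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune; }
```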

I just run borgbackup to back up to a local folder, and rsync it to my home NAS when I'm connected to my home network.

If I need to force a sync over the internet, a small shell script rsyncs the backup folder to the home NAS.

This allows me to run hourly backups even if there is no network connection (or limited bandwidth), and then just auto sync when I get home, or force a backup when I get a good connection.
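A "back up locally, sync when home" setup can be as small as this. Everything specific here is an assumption for illustration: the paths, the NAS hostname, and detecting the home network by whether the NAS answers one ping (`-W 1` is the Linux iputils timeout flag; other `ping` implementations differ).

```shell
#!/usr/bin/env bash
# Sketch: hourly backup to a local Borg repo, copied to the NAS only when it is reachable.
set -euo pipefail

LOCAL_REPO=${LOCAL_REPO:-/var/backups/borg-repo}           # placeholder
SOURCE_DIR=${SOURCE_DIR:-/home}                            # placeholder
NAS_HOST=${NAS_HOST:-nas.lan}                              # placeholder
NAS_DEST=${NAS_DEST:-$NAS_HOST:/volume1/backups/laptop/}   # placeholder

# Crude "am I at home?" check: does the NAS answer one ping within a second?
on_home_network() {
  ping -c 1 -W 1 "$NAS_HOST" > /dev/null 2>&1
}

# Mirror the whole repository directory (trailing slash = copy its contents).
sync_to_nas() {
  rsync -a --delete "$LOCAL_REPO/" "$NAS_DEST"
}

hourly() {
  borg create "$LOCAL_REPO::{hostname}-{now}" "$SOURCE_DIR"
  if on_home_network; then
    sync_to_nas
  fi
}
```

Running `sync_to_nas` by hand is the "force a sync" case; since Borg has already finished writing by then, rsync copies a consistent repository.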

I am using zfs send and it works like a charm. Once you have set up filesystems that hold only data, you can enable dedup on the receiving system.
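For the `zfs send` approach, an incremental replication step typically looks like this. Dataset, snapshot, and host names are placeholders, and the hypothetical `replication_cmd` helper only prints the pipeline so it can be reviewed before running; the commented `zfs set dedup=on` is the receiving-side setting mentioned above (ZFS dedup needs plenty of RAM).

```shell
#!/usr/bin/env bash
# Sketch: incremental ZFS replication of one dataset to a backup host.
set -euo pipefail

# replication_cmd DATASET PREV_SNAP NEW_SNAP REMOTE TARGET
# Prints the pipeline; after review, run it with: bash -c "$(replication_cmd ...)"
replication_cmd() {
  local ds=$1 prev=$2 new=$3 remote=$4 target=$5
  printf 'zfs snapshot %s@%s && zfs send -i %s@%s %s@%s | ssh %s zfs receive -F %s\n' \
    "$ds" "$new" "$ds" "$prev" "$ds" "$new" "$remote" "$target"
}

# On the receiving side, once (placeholder dataset name):
#   zfs set dedup=on backuppool/replicas
```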

The encryption in Borg is weak but otherwise it works well if it fits your use case.

Can you give more details on how the encryption is weak?


> When the above attack model is extended to include multiple clients independently updating the same repository, then Borg fails to provide confidentiality (i.e. guarantees 3) and 4) do not apply any more).

Thanks for pointing this out.

The first entry on the Borg FAQ [1] is “can I backup from multiple servers into a single repository” and the answer is basically “yes, but it might be slow”. Seems to me like it should say “not really, it's insecure and might be slow”, plus the tool should make it clear you're doing something unwise if you try it anyway.

I've always shied away from shared backup repositories anyway, because it usually means data from host A can be read by host B, and that's just not what I usually want. The deployment instructions appear sane to me. [2]

[1] https://borgbackup.readthedocs.io/en/stable/faq.html#can-i-b...

[2] https://borgbackup.readthedocs.io/en/stable/deployment/centr...

I'm not the person you responded to, but here are my observations:

1. Borg's authenticated encryption is a composition of AES-256 in CTR mode and HMAC-SHA256 (or alternatively, Blake2b256). This consists of two distinct constructions. Most cryptographic vulnerabilities are introduced in the combination of distinct primitives and constructions. While combining AES-CTR and HMAC is a way of doing authenticated encryption, it's not the safest way. As a specific example - in order to avoid introducing a nonce-misuse vulnerability, Borg needs to implement specific logic beyond the construction to ensure that the CTR counter (containing a randomized nonce) is not reused. This kind of overhead is dangerous because it adds a lot of room for a developer to make a fatal mistake.

It would be much better to use a dedicated authenticated encryption scheme, like AES-GCM. AES-GCM is a more complex construction, but it also has very convenient interfaces and language bindings which obviate the need for developers to interact with raw primitives and constructions entirely. You don't need to implement AES-GCM so much as call its seal and unseal operations from your language of choice. There's far less room to footgun yourself here. The even more modern method would be to use something in the ChaPoly family. For example, XSalsa-Poly1305 goes one step further than AES-GCM by using an extended nonce. ChaPoly constructions also have ubiquitous language bindings and easily used interfaces to safe implementations.

2. Borg uses OpenSSL directly. This is not great because libcrypto exposes raw primitives directly to the developer. To its credit, Borg's documentation does provide a few reasons why its use of OpenSSL is trustworthy. But those reasons have more to do with TLS implementations than cryptographic constructions. As with above, working directly with raw primitives provides a lot of room for implementation errors and avoidable security vulnerabilities. It's generally better to abstract away your cryptography implementation to a known-good, trustworthy source which you can just call via a straightforward (and opinionated) interface. For example, Borg could use NaCl instead, which would provide XSalsa-Poly1305 encryption out of the box. I hypothesize they'd eliminate a few hundred lines of code by using NaCl instead of OpenSSL.

That being said, I don't know if I'd call Borg's encryption "weak." It's not ideal, and I don't personally trust it. But it's not like they're using AES in ECB mode or Mac-Then-Encrypt CBC mode.

The AES/MAC composition itself in Borg is done correctly, the problem is the way Borg uses the composition: with a single AES/MAC key for the entire repository which a) never changes b) cannot be changed and c) IVs/nonces used with those keys are primarily tracked by server-side, i.e. untrusted, state. This means that using multiple clients with one repository is never secure in Borg.

> It would be much better to use a dedicated authenticated encryption scheme, like AES-GCM.

Actually, in the scheme Borg uses, which is vulnerable to repeating IVs/nonces, using AES-GCM or Chapoly (or indeed any Wegman-Carter authenticator) would destroy not only confidentiality but also authenticity, i.e. it wouldn't just disclose plaintext to the attacker, but would potentially allow an attacker to change your backups [1].

> That being said, I don't know if I'd call Borg's encryption "weak." It's not ideal, and I don't personally trust it. But it's not like they're using AES in ECB mode or Mac-Then-Encrypt CBC mode.

I'm calling it out as weak because it is an insufficient and poor design. If you say "encrypted" today, I should be able to throw my encrypted data onto a random cloud server and be sure that it's actually encrypted. This is not the case with Borg; partial credit for getting some aspects of the construction right (e.g. EtM, separate keys, properly salted master key encryption key derivation) isn't worth much when it fails to provide confidentiality in not-at-all unreasonable, practical usage.

[1] Probably not because by necessity Borg uses a separate HMAC over the plaintext for deduplication, and the manifest has a third layer of HMACing due to protocol issues, so it should be impossible to just change stuff even if you break the ciphertext authentication.

Can you walk me through how AES-GCM or ChaPoly breaks authenticity and confidentiality, but AES-CTR doesn't? I'm not following you.

Moreover, I'm pretty sure Borg's developers are actively considering ChaPoly for the future.

GCM/Poly1305 require unique nonces, so if they are repeated, the authenticity is broken. HMAC doesn't need nonces, so authenticity is preserved as long as the key is secret.
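The difference can be written out in standard textbook notation (generic CTR/GCM/HMAC definitions, not taken from Borg's documentation). In CTR mode the keystream depends only on the key and nonce; GCM's tag additionally depends on a hash key $H$ that a repeated nonce exposes; the HMAC tag has no nonce input at all:

$$
\begin{aligned}
\text{CTR:}\quad & C_i = P_i \oplus \mathrm{AES}_K(N \,\|\, i)
  && N_1 = N_2 \;\Rightarrow\; C^{(1)}_i \oplus C^{(2)}_i = P^{(1)}_i \oplus P^{(2)}_i \\
\text{GCM tag:}\quad & T = \mathrm{GHASH}_H(A, C) \oplus \mathrm{AES}_K(J_0),\quad H = \mathrm{AES}_K(0^{128})
  && N_1 = N_2 \;\Rightarrow\; T_1 \oplus T_2 = \mathrm{GHASH}_H(A_1, C_1) \oplus \mathrm{GHASH}_H(A_2, C_2) \\
\text{HMAC tag:}\quad & T = \mathrm{HMAC}_{K_m}(C)
  && \text{no nonce input}
\end{aligned}
$$

Since GHASH is a polynomial in $H$ over $\mathrm{GF}(2^{128})$, the GCM line gives the attacker a known polynomial whose roots are candidate values of $H$ (Joux's "forbidden attack"), after which tags can be forged. With CTR plus HMAC, a repeated nonce leaks the XOR of plaintexts but leaves the MAC key untouched.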

I'm aware of that, and mentioned that point in my original comment. What I'm not following is why Borg can't safely use AES-GCM or ChaPoly instead of AES-CTR + HMAC.

Again: the team is actively considering using AES-GCM and ChaPoly in the future. I don't see anything intrinsic to either that preempts their use in Borg.

Yes, in a different construction there would be no problem. And there have been plans since at least 2016 to replace/augment the current construction with something that uses a master key to derive per-chunk encryption keys; it just never has been implemented.

IIRC AES-GCM was kinda low on the list with a preference for just using Chapoly, because Chapoly just works and is also secure on any processor, unlike AES-GCM, which is very nasty to implement without hardware support for the arithmetic over GF(2^128).

How so? When I glanced over it it looked trustworthy enough.

This is really neat. I really love the idea and the presentation.

I would not trust my backups to your service yet, just because of the "this is a prototype" language. My immediate thought is, "this seems great, I'll have to come back and check it out once it's more of a real business". But therein lies the rub, I think: what will drive me back to check it out later? There doesn't seem to be a mailing list to sign up for. Maybe you'll hit the front page of HN with a full launch later, and I'll see it that way, which would be great, but maybe not!

In any case, nice work!

good point, I added 'make a mailing list' to the todo

i made a mailing list and slack

  slack: https://baxx.dev/join/slack
  google groups: https://baxx.dev/join/groups

Nobody has commented on what jumped at me as a great feature: detecting if the backup seems bad.

  * get notified if the file is too small  
  * get notified if the file is too old
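The same two checks are easy to run locally, e.g. from cron against your own backup artifacts. A sketch; `check_backup` and its thresholds are made up for illustration, and `find -mmin +N` serves as the age test because GNU and BSD `find` both support it.

```shell
#!/usr/bin/env bash
# Warn if a backup file is missing, smaller than MIN_BYTES, or older than MAX_MINUTES.
set -euo pipefail

check_backup() {
  local file=$1 min_bytes=$2 max_minutes=$3 size status=0
  if [[ ! -f "$file" ]]; then
    echo "MISSING: $file"
    return 1
  fi
  size=$(wc -c < "$file")
  if (( size < min_bytes )); then
    echo "TOO SMALL: $file ($size bytes < $min_bytes)"
    status=1
  fi
  # find prints the path only if it was last modified more than max_minutes ago.
  if [[ -n "$(find "$file" -mmin +"$max_minutes")" ]]; then
    echo "TOO OLD: $file (older than $max_minutes minutes)"
    status=1
  fi
  return "$status"
}
```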

I have a feeling rsync.net is far more Unix-friendly as a backup service (it's basically a Unix shell provider with automated ZFS snapshots). I do think the website has a neat aesthetic though, and it is cute you can register over SSH (not that this is practically advantageous over a website-based registration -- registration is a one-time operation).

The repo on GitHub doesn't have a license AFAICS.

Your email provider's domain appears to have been hacked and is being used for black-hat SEO purposes: http://sofialondonmoskva.com/blog/

whoa, this is my friend's company domain, I've just had email there for 10 years or so, will let him know, thanks!

Some very basic html with no css would be much easier to read than parsing markdown in my head. Machines should do machine work, not people.

Edit: still love idea and execution, thanks for sharing!

This is kind of a funny comment; markdown was designed specifically to be readable as-is, which is why so much of its syntax looks like conventional email and other plain-text usages that predate it.

True, but markdown was meant to be readable for a markup language.

Which is likely still less readable than something whose sole purpose is to present to humans, such as HTML/CSS.

I never got the sense that markdown was even designed, per se - the original markdown.pl script, so far as I understand it, was not intended to implement a formal language at all; its attitude was pure descriptivism, simply rendering what the author thought of as "plain text" in a more typographically-pretty format.

I think a good portion of it would parse as Markdown, but I don't think that was the intent. There's at least 1 header that isn't actually a typical header's content.

What markdown? I don't see markdown anywhere.

The "# headlines", the "* bullet points" would parse as valid markdown. Which would give the headlines visual style to distinguish them from the other part of the content.

Headers in markdown are denoted as # and bullet points as * or - usually.

> Trial 1 Month 0.1E

> Subscription: 5E per Month

> ...

> I decided to charge 5$ (0.1$ trial)

So which is it? $ or E? And is E supposed to be €? Also, in English the currency marker goes to the left of the number.

You are wrong about the usage of currency symbols. Quoting Wikipedia [0]:

> When writing currency amounts, the location of the symbol varies by currency. Many currencies in the English-speaking world and Latin America place it before the amount (e.g., R$50,00). The Cape Verdean escudo places its symbol in the decimal separator position (i.e., 20$00). In many European countries such as France, Germany, Greece, Scandinavian countries, the symbol is usually placed after the amount (e.g., 20,50 €).


Excerpt from Wikipedia's article for the Euro symbol [1]:

> Placement of the sign also varies. Countries have generated varying conventions or sustained those of their former currencies. For example, in Ireland and the Netherlands, where previous currency signs (£ and ƒ, respectively) were placed before the figure, the euro sign is universally placed in the same position. In many other countries, including France, Belgium, Germany, Italy, Spain, Latvia and Lithuania, an amount such as €3.50 is usually written as 3,50 € instead, largely in accordance with conventions for previous currencies.

> The European Union did indeed usher a guideline on the use of the euro sign, stating it should be placed in front of the amount without any space in English, but after the amount in most other languages.

[0] https://en.wikipedia.org/wiki/Currency_symbol

[1] https://en.wikipedia.org/wiki/Euro_sign

> You are wrong about the usage of currency symbols.

Wikipedia is wrong if it says that the placement is based on currency. Wikipedia can also be edited by anyone with a pulse, so there you go.

Here are some official government resources on currency code placement in English.



> The European Union did indeed usher a guideline on the use of the euro sign, stating it should be placed in front of the amount without any space in English

So...what you're saying is that I'm right. I'm very confused now.

The currency symbol's placement is a matter of convention, which varies from person to person, mostly influenced by what they're used to.

I'm attacking your pedantic attempt to reduce the matter to an inflexible rule.

Some English speaking countries prefix the currency symbol, so if let's say my audience is from Australia, and I want to make the gesture of adhering to their customs, I will prefix it. However, in other (most) scenarios, I'll write it the way I write it 99% of the time, in the postfix position. And that's ok. Everyone doesn't need to use the same customs, words, grammar, or even language. Regarding these things, the only thing we care about, at the end of the day, is being able to understand the other, and the other being able to understand you.

As you've noticed, I said _some_ English-speaking countries. That's because there are a lot of people who use a lot of English, but only a fraction of it with people from the US, Canada, Australia or Great Britain, so the customs of English users from those countries become irrelevant.

There's customs and there's language. This is an obvious example where one does not have to entail the other.

I used E because € does not render on xterm, maybe I should switch it to EUR to remove the confusion

also fixed the $ to E in the blog, thanks for the heads up

The EU currency code guidelines do say to use EUR in such instances.

Why not just say "euros"? It's not much longer and less ambiguous.

EUR is the common term; I thought E was some kind of crypto stuff.

Just to empathize: before I figured out how to type funky foreign characters on my en-us keyboard, I tried to use L for £ for a while and...it didn't work. I thought I was being creative and industrious in an obvious way, but it didn't even scan well for me. Heck, that may have been the one character that got me to learn. :)

> because € does not render on xterm,

It does for me?

Yes, no problem rendering a € in an xterm.

at least not with Terminus

Yeah as an individual who doesn't use the euro, I didn't know it was euros until I read it in the thread here.

Hm. Yeah I thought it was priced in Ether.

changed the text to EUR thanks!

> Also, in English the currency marker goes to the left of the number.

But in the majority of european countries where they use euros, the symbol is placed to the right

That is only correct when writing in a language other than English. Doing it in English is a mistake transferred from doing it in other languages.


In general it is standard to put the unit after the quantity though.

Again, not with currencies in English.

Maybe I should have said "it is commonly standard to put the unit after the quantity". My point is that this format is ubiquitous and that English language formatting of currency values is the outlier. It is not even internally consistent, as this rule only holds for the symbol, not for the ISO code, the full name or for speech.

I don't see how you can find language prescriptivism as a tenable position in this day and age

The link seems broken.

Lol, woah. Maybe HN broke the doc site. Here's an archive.org backup https://web.archive.org/web/20190323083630/http://publicatio...

> Also, in English the currency marker goes to the left of the number.

I always thought this depended on the currency; like $5 vs 5€.

This is a common misconception, but proper placement is dependent on the language not the currency.

This is correct. Visit English and French versions of a Canadian retailer, e.g. Amazon.ca. The English version follows English rules ($29.99) and the French version follows French rules (29,99 $) even though the currency is Canadian dollars for both.

EN: https://www.amazon.ca/?language=en_CA

FR: https://www.amazon.ca/?language=fr_CA

Writing 29.99 € or 29.99 EUR is still super common even if it's English. This might be in part because the UK never really used euros (and will probably never use them).

I'd say 29.99 EUR is very common, but not 29.99 €. I only ever see the € symbol after the value in documents from Europe, never in those from the UK (I see a lot of quotes, purchase orders and other financial documents from various places and companies)

I want to thank you for using a good name for your service.

Too many times some developer will pick a common word to launch their service/product thus making it impossible to search for it.

Your name (a) is a single syllable

(b) almost hints at what it does in the name (Baxxup?)

(c) is not easily confused with other products

(d) Your closest competitor from a google SEO standpoint is a tanning salon (single word google)

Great job! I wish I were that creative. Usually my crap looks like: sbs (server backup script). Works for me, but my co-workers hate me.

thanks! my process is actually much simpler than you think: i just make small transpositions of really cool (to me) names i have seen in World of Warcraft :) such as zulgan and juun, judoc; jaxx -> baxx because backups start with b

others I chose are horrible, such as https://scrambled-eggs.xyz which is effectively unsearchable, though you could say it was by design haha

For some reason there's a lot of negativity in this thread. I thought the txt file was really cool, signed up immediately, and have been having a lot of fun with the interface. I don't remember the last time I found registering for a SaaS product to be fun.

"I don't remember the last time I found registering for a SaaS product to be fun."

Agreed. This:

"The way you register is through ssh, just `ssh register@ui.baxx.dev`"

Is very cool and makes me happy.

That is very cool! How is this accomplished?

Enable empty password for a particular user, and set the default login shell to the registration program.



not the author, but you can force a login command for a user in the ssh configuration (for example with sshd's ForceCommand), so make that your script and away you go

thanks! i plan to make a whole Midnight Commander-like ui to list files and manage tokens, notification rules, etc.

started playing with https://github.com/rivo/tview last week, but didn't have much time.

How does this compare to Tarsnap?


I like this.

Another down-to-earth, unix-oriented, fair-deal style "cloud" service worth investigating is rsync.net. I've been using it to sync data between my devices, and I like the way it leaves the user in control.

I have not tried Baxx yet, but it seems to be a project in a similar spirit, and I am happy to see more investment in that kind of future.

This is really cool. The aesthetic is amazing. This really speaks to the hacker in me, the no-website thing is a really inventive way to stand out from other backup services.

Great execution!

If there's no website, what am I looking at when I follow the link?

I think the intent is that there is no web UI for the product itself. The product's UI is a terminal program, accessed either via SSH or a web-based terminal client. The product UI has zero HTML/CSS/etc.

The website https://ui.baxx.dev is, in fact, serving HTML, CSS, and JS.

I get the point, but the "without a website" claim is kind of a cutesy attention-grabbing prevarication.

I'm usually pretty tolerant of those, but personally find that here it mixes poorly with the idea of a backup service. From such a service, I'd prefer to see total honesty, and a sort of buttoned-down, almost military humorlessness/seriousness.

I set up shellinabox for https://ui.baxx.dev just because some of my friends don't have ssh access from work and I wanted to show it to them

You're of course right, that page is serving HTML/CSS/JS - but that's kinda nitpicking.

It's roughly akin to me saying "my Golang on Linux program is 100% C code free - no C is used in the execution of my code!" .. well, clearly that statement would be false.

However, it's perfectly legitimate to say my project is C code free - and that's what this project is saying. It's saying the project is web-tech free. There just happens to be an off-the-shelf execution environment which uses web-tech.

Perhaps, if you don't mind fuzzy hand-wavy claims that are as likely to mislead as inform, they could say "our project's source code includes no 'web-tech'".

But that's not the headline. The headline was instead that it's a service "without a website". But there is absolutely, positively, no-nitpicking-involved an HTTPS-accessible website for the service, set up by the service's own creators.

It is a plain text file, served over the HTTP protocol and only incidentally displayed by your web browser, but it could be displayed in a plain text editor or on your console as well.

It's a grabby title. However you're basically looking at the repo readme.

author here: I am going to my daughter's birthday, so I won't reply to comments for a while, but if anyone runs into trouble please send an email to jack@baxx.dev

thanks a lot for the feedback, much appreciated!

May I suggest Kimsufi as well as Hetzner for when you scale up? It's similar pricing and they have some NA locations (although less storage). I've had good experiences with them.

i will check it out thanks!

For my home directory/active data I use 'rdiff-backup' to keep 7 days of hourly snapshots. If I screw anything up I never lose more than an hour. My bulk data gets a snapshot every 3 days due to the size/time. All that data is already on raid1 but the extra filesystem level backup will protect from any low level disk hosings. I mirror that backup drive weekly and keep an extra copy off site. If I'm super paranoid about losing something I'll also save it with 'duplicity' to S3, although I also like remote mounting an AWS volume and using it with 'EncFS'.

If I'm going anywhere with my laptop and want access to my data faster than internet speeds I'll preload everything with 'unison' onto the laptop and temporarily have my own Dropbox clone while mobile.

Using those open source tools lets me do basically the same thing as these services while keeping full control of my data.

My key objection is that it doesn't offer any additional value: if you are the sort of customer who would do this, you are also the sort of customer who would implement the same chain of commands to backup to a virtual machine that you owned, eliminating the attractiveness of baxx.dev as an aggregated target.

What additional value can you add?

* it took me a good 2 weeks to implement it

* i have actually implemented these kinds of systems multiple times and always failed at the 'who watches the watchers' part, and usually ended up with broken backups (as in, i usually needed them a few years after the system was written, and by then something was always wrong)

* in the future it will use contextual bandits to probe for broken backups (such as: hey, is file XYZ with size X uploaded at Y weird?) and learn from all the customers triggering the AI flywheel, more backups more data, better contextual bandits, better service, more customers (the idea is to use bandits[or something else with exploration factor] for probability of "good" notification)

so in a few words: value = out-of-the-box alerting + nice api + machine learning (not implemented yet) + good price (no cloud)

To me, the feature is that it's not in the same place as the rest of my data.

If you can do it at a price competitive with buying the storage myself (plus everything that comes with housing it and worrying about it), then it's one of the few things I'm happy to have a managed service for.

edit for a bit more context: I do a lot of video work so my storage needs are constantly growing.

If it can do the “Dropbox thing” and make it seamless, then that’s a definite plus.

I get that it's probably "trendy" to say you don't have or need a website but with it being so easy to create a static page, I'm failing to understand why you wouldn't just do that, which is already the bare minimum, if you care about your project and want to sell it?

Why is an HTML document the bare minimum? I absolutely appreciate this far more accessible form, author's dedication to function and disregard for showmanship. Somewhat paradoxically, for a potential client like me, this is way better marketing than any amount of superfluous CSS sugaring or formatting.

I think this text/plain presentation is a form of showmanship.

Put another way, do you think there is ever any amount of CSS that is not superfluous?

It's showmanship in that it's so nonstandard that it's theatrical. We shouldn't stigmatize that.

CSS can be non-superfluous in documents, like when formatting text or a table.

I'd like to focus on the fact that this required orders of magnitude less effort than designing a web site, and most importantly – the presentation did not suffer. There were no compromises to be made.

Some have mentioned that this lacks functionality, but I'm having a hard time imagining what that might be. Maybe mobile-readiness, in that it's preformatted. Considering that this is a Unix-y tool for power-users, I think it's ok to expect them to read this on a PC.

HTML is the standard form of internet document delivery, that's why it's the bare minimum. It's marginal effort to set up and objectively more accessible than not having a website.

I'm all for reducing complexity, but to this extent you are removing pretty huge pieces of functionality for extremely marginal streamlining.

It's a gimmick

> Why is an HTML document the bare minimum?

HTML isn't a technical requirement, but they do say their goal is to sell the service.

I believe OP has created an ingenious way to qualify early users, which is also the stated purpose of this release.

I would say the linked url IS a website even though it is made by a single txt page.

I'm not sure, though, that a txt document is "simpler" than a bare (no JS, no images) HTML document; on the one hand you can read it with something as simple as curl, on the other the txt didn't render properly on my phone.

Had he done that, I doubt it would have done so well on HN. It’s a differentiation point from yet another bootstrap css SaaS with three tier pricing, faded monotone brand logos and circular portrait photos of the “team”.

tbh i just wanted to have fun building it. having the ssh registration just makes me excited; i feel happy every time i try it out. using the api and playing with it is the same, like making a todo list on baxx https://github.com/jackdoe/baxx/blob/master/examples/todo.sh

to me the `yet another bootstrap css SaaS with three tier pricing, faded monotone brand logos and circular portrait photos of the “team”` is just depressing

Love the idea and implementation! However I have two main questions/concerns that might prevent me from being a customer:

Cost. Currently I pay $10/month to CrashPlan to store 2TB. If I understand correctly, your plan is to simply pass through the Glacier price, but wouldn't even that be much higher?

Large files. Because I only have finite monthly bandwidth and a limited upload speed, it would be good to back up large binaries incrementally, uploading only the parts that changed.

Two reasons why I do not want to use the service in its current state (meant as constructive criticism):

1. If I were to send my backups to some SaaS, there is no way I would do that without encryption.

2. I like to backup my filesystem and not just the files in it (to make sure I've got everything and make restoring easy). Currently, I just dd my block devices, but I am sure that could be optimized to not upload a complete image every time.

Sure, I could solve those two problems myself, but then I could also use AWS S3/Backblaze ;-)

> Currently, I just dd my block devices

That's actually not a good idea at all.

It's very fragile, the slightest problem and you may lose the entire backup.

Silent corruption may get backed up for months and you won't notice until it's too late.

Doing a restore means having space for the full image, just to restore a single small file.

If you don't know when your file changed you may have to do that multiple times instead of just being able to check when that single file changed.

In short: don't do that. Copy the full directory structure, sure. But not the disk image.

Why is dd fragile? I mean, yes doing so while the filesystem is mounted is like Russian Roulette, but if the filesystem is unmounted? I see that it isn't very efficient, but I never had any problems with reliability.

Silent corruption might actually be a problem, but that is something you won't solve by backing up files and directories instead of block devices. After all, a block-device backup just contains more information than a files-and-directories backup. The only advantage of file-level backups is that you can more easily check whether a file should have changed, but even if you detect that it changed when it shouldn't have, you still need some kind of checksum to find out which version is correct.

On the other hand, if you back up files and directories you have to care about the filesystem type and about special file types like links, device nodes and the like. I have had enough unpleasant experiences there that I try to avoid that trouble.

To restore single files I can simply mount the image on a loop device, so no problem there.

there is absolutely no technical problem doing it on an unmounted filesystem, but it imposes certain restrictions that I don't think are worth their cost

considering that in many cases you want to copy live files, you have to copy them to a place you can unmount, so now you have 2 problems

if you don't gzip the dd output, you are exposed to cosmic-ray bitflips with no checksums, and they can corrupt superblocks, etc.

if you gzip it (to get CRC checksums), you can't restore partial files from it as easily, unless you use squashfs and significantly complicate the process

so tar and gzip is more robust (compared to dd) just because of the CRC (from gzip, not tar), size and portability

Rather than "don't do that", I would say "also, backup individual files".

Fragile seems to be a matter of perspective. I do a dd block-device backup before doing anything "risky" with a machine (i.e. upgrading major OS version, etc), because it is by far the most bullet-proof / fail-proof way to go back in time.

tbh I wouldn't trust anyone else with my encryption; that is why in Baxx I advise people to simply upload encrypted files:

  cat file | encrypt -k .pass | curl --data-binary @- https://baxx.dev/io/$BAXX_TOKEN/file
  cat file | encrypt -k .pass > file.enc && curl -T file.enc https://baxx.dev/io/$BAXX_TOKEN/file
(encrypt: https://github.com/jackdoe/updown/blob/master/cmd/encrypt/ma...)
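The linked `encrypt` tool is the author's, and its format isn't described here. The sketch below shows the same "encrypt before upload" idea with AES-256-GCM from Go's standard library. It assumes a 32-byte random key kept in a local file. Storing the nonce in front of the ciphertext is a choice made for this sketch and may not match the tool's output format.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// seal encrypts plaintext with AES-256-GCM and prepends the random nonce.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal; it fails if the ciphertext was modified.
func open(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, _ := seal(key, []byte("backup contents"))
	plain, err := open(key, sealed)
	fmt.Println(string(plain), err) // backup contents <nil>
}
```

This one-shot form keeps the whole file in memory. For large backups you would encrypt in chunks, giving each chunk its own nonce.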

offtopic: it is a bit annoying that cat | curl makes curl buffer the whole standard input, so for big files you have to either pass -H "Transfer-Encoding: chunked" or use -T file

2) dd-ing block devices is very fragile; it's better to just tar with --exclude=/dev ... (https://help.ubuntu.com/community/BackupYourSystem/TAR - Alternate backup)

As cool as this could be, this service does not meet any company-grade backup quality requirements. Also, the price is a complete no. Not because 5€/m is much, but because 5€/m can get you hundreds of GBs of storage in AWS Glacier, with contractual guarantees about your data, how it's managed, etc.

Basically, this is a toy. Use it as a toy, not as a service where you'd actually put your company files.

That said, good job!

> because 5€/m can get you hundreds of GBs of storage in AWS Glacier

Yes... until you want to take anything back out - then you're going to have to wait, and you're going to pay a lot more than €5

Indeed, but it matters how much of the data you want to recover and how fast. Getting 100 GB from Glacier is not expensive at all if you can wait a couple of days.

When Glacier first launched I looked at pricing, and it was going to cost a lot to restore from backup if needed. But after your comment I checked the pricing again, and you're right - it's very much cheaper now than it used to be!

it is absolutely in alpha stage, and yes please use it as a toy and have fun

in the future my plan is to charge 5€/m for 1TB, which is pretty much the Glacier price

I'm confused by the `find | xargs | sha256 | curl` trick… Is it even remotely as efficient as the rolling checksum of an actual rsync?
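For comparison, rsync gets its efficiency from a weak rolling checksum. The checksum can slide along a file one byte at a time at O(1) cost per step, so matching blocks are found even when the data shifts. A whole-file `sha256` approach, by contrast, presumably re-uploads the entire file on any change. Below is a sketch of that checksum, the Adler-32-like sum from the rsync technical report with both halves taken mod 2^16. Real rsync confirms candidate matches with a strong hash; this sketch uses `bytes.Equal` in its place.

```go
package main

import (
	"bytes"
	"fmt"
)

const mod = 1 << 16

// weakSum computes rsync's weak checksum (a, b) over block. uint32
// wrap-around is harmless because 2^32 is a multiple of 2^16.
func weakSum(block []byte) (a, b uint32) {
	n := uint32(len(block))
	for i, x := range block {
		a += uint32(x)
		b += (n - uint32(i)) * uint32(x) // weight n for the first byte, 1 for the last
	}
	return a % mod, b % mod
}

// roll slides a window of length n one byte to the right: out leaves the
// window, in enters it. It runs in O(1) regardless of n.
func roll(a, b uint32, n int, out, in byte) (uint32, uint32) {
	a = (a + mod - uint32(out) + uint32(in)) % mod
	sub := (uint32(n) % mod) * uint32(out) % mod
	b = (b + mod - sub + a) % mod
	return a, b
}

// findBlock returns the first offset in data where block occurs, using the
// weak checksum to skip non-candidates and bytes.Equal to confirm a match.
func findBlock(data, block []byte) int {
	n := len(block)
	if n == 0 || len(data) < n {
		return -1
	}
	wa, wb := weakSum(block)
	a, b := weakSum(data[:n])
	for off := 0; ; off++ {
		if a == wa && b == wb && bytes.Equal(data[off:off+n], block) {
			return off
		}
		if off+n >= len(data) {
			return -1
		}
		a, b = roll(a, b, n, data[off], data[off+n])
	}
}

func main() {
	data := []byte("xxhello world")
	fmt.Println(findBlock(data, []byte("hello"))) // 2
}
```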

This is great. Building a company from the terminal. Good luck on your journey.

Don't need a backup solution right now, but I'll keep it in mind.

".... 750E profit - 50% tax = 375E profit"[1]. 50% Taxes in DE, nice place to do business ...

[1] https://github.com/jackdoe/baxx/blob/master/infra-and-pricin...

Are you sure this is Germany? The author has their location set to 'Amsterdam' on GH, and has no impressum on their site.

FWIW, corporate taxes in Germany aren't 50%, but 15%. What is close to 50% is personal income tax + health care if you pass a certain threshold. However, even if you're running just a sole tradership (Freier Beruf or Gewerbe), this would only be your personal income - all business expenses are not counted towards your personal income tax base and you get the VAT you paid back.

I would guess it’s someone with a job, and this is hitting income tax.

But corporate taxes in DE are about 30%, not 15. Don’t forget your Gewerbesteuer.

Free schools, healthcare and nice infrastructure have a cost.

Too bad that cost is levied on people who don’t even need or want those services. The rich, childless, houseless, and single are subsidizing services for everyone else regardless of whether they personally derive any benefit.

If that doesn’t strike you as unfair I’m not sure what would.

Unfair to me are situations where people die because they are unable to pay for their own needs like food and healthcare. I gladly pay 50% of my income for basic infrastructure, healthcare and welfare for everyone, and i still have more than enough money at the end of the month after all expenses like rent, food, entertainment, etc.

Without government assistance it would be very foolish to not have children, for most people at least. So this cuts both ways. If anything it is more than fair. Rich people avoid most of the taxes anyways also in Germany.

Since they're rich and childless, they should vote with their wallet and move to a country with less taxes.

loving the transparency of this. i'm definitely becoming converted as I'm reading…

If this calculation is for Germany, it's nonsense.

How so? I pay exactly 51% in Germany (and I’m ok with that).

> As of 1 January 2008, Germany’s corporation tax rate is 15%. Counting both the solidarity surcharge (5.5% of corporation tax) and trade tax (averaging 14% as of 2008), tax on corporations in Germany is just below 30%.
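Here is the sum in that quote spelled out. It uses the quote's 14% average trade tax; the actual trade tax rate varies by municipality.

```latex
\[
\underbrace{15\%}_{\text{corporation tax}}
+ \underbrace{0.055 \times 15\%}_{\text{solidarity surcharge}}
+ \underbrace{14\%}_{\text{trade tax}}
= 15\% + 0.825\% + 14\% = 29.825\% \approx 30\%
\]
```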


You're paying income tax at the highest rate. The calculation in the linked document is about "business taxes". So subtracting just half of the money "because of taxes" is not correct as you'll probably not pay that much in taxes for a business.

Aha, thanks. Sounded to me like extra income... but I was clearly not paying full attention.

this is the tax for me in The Netherlands, given i am not a company, so it is normal income tax

Issuing invoices and paying directly in the terminal via Bitcoin's Lightning Network would be a pretty cool add-on.

Machine-to-machine payments: say you deploy a bitcoin miner to stranded gas. It starts to mine coins and can start paying you to back up its data without ever needing to create a PayPal account.

Registration over ssh is cute but not safe from MITM attacks. Even if the ssh key was published somewhere (which as far as I can tell it isn't) you would be dependent on people manually adding the key to their known_hosts file which you can't reasonably expect most people to bother with.

You can't reasonably expect most people who would register to a service over ssh and make backups with cUrl in a shell script to update their known_hosts file?

The instructions on the main page would change from:

  ssh register@ui.baxx.dev
to something like

  curl https://ui.baxx.dev/ssh_keys -o /tmp/baxx_ssh_keys
  cat /tmp/baxx_ssh_keys
  cat /tmp/baxx_ssh_keys >> ~/.ssh/known_hosts
  rm /tmp/baxx_ssh_keys
  ssh register@ui.baxx.dev
or in one line:

  curl https://ssh_key.baxx.dev >> ~/.ssh/known_hosts && ssh register@ui.baxx.dev

The author of this service didn't think it important to provide the ssh key and based on the comments in this thread people have already signed up for this service without caring about the key. So yes, I think even people who would use this service mostly can't be bothered to manually update their known_hosts file.

In that sense then I agree with you.

Edit: I thought you meant that, given the command, most would still not do it.

You are right.

it is super cute though, I still get excited every time I try the registration flow.

the registration api is quite easy (https://baxx.dev/help/register)

  curl -d '{"email":"your.email@example.com", "password":"mickey mouse"}' \

I will probably make a register.sh endpoint that returns a bash script running a local dialog, like: curl https://baxx.dev/register.sh | sh (just an idea, not implemented yet)

this will be fun, making portable tty dialogs :D

Interesting concept to sell a shell service. I hope we will see this more in the future.

That is a website.

I love the UX for this.. why haven't we seen more of going back to UNIX principles? I'm also curious.. why and what ML do you want to add?

* in the future it will use contextual bandits to probe for broken backups (such as: hey, is file XYZ with size X uploaded at Y weird?) and learn from all the customers triggering the AI flywheel, more backups more data, better contextual bandits, better service, more customers (the idea is to use bandits[or something else with exploration factor] for probability of "good" notification)

i think the problem needs exploration; i have been bitten by bad anomaly detection, and by setting up alerts post-mortem after losing data, one too many times :)

i imagine creating a bunch of features such as:

  * file extension
  * time of upload
  * delta from previous version
  * size difference from other files in the directory
  * etc..
and having enough customers we should be able to do good-ish prediction

when a file version is "weird", so we could send notifications such as:

> hey is this file ok?

with some exploration factor, even using UCB will(should) be better than nothing
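Here is a minimal UCB1 sketch with the two actions the author names, "send notification" and "don't send notification". Plain UCB1 ignores context. The Vowpal Wabbit contextual bandits the author plans to use would also condition on per-upload features. The reward scheme (1 when the user confirms an alert was useful) and the toy environment in `main` are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// UCB1 keeps per-arm pull counts and reward sums.
type UCB1 struct {
	counts  []int
	rewards []float64
	total   int
}

func NewUCB1(arms int) *UCB1 {
	return &UCB1{counts: make([]int, arms), rewards: make([]float64, arms)}
}

// Select returns the arm with the highest upper confidence bound,
// trying every arm once first.
func (u *UCB1) Select() int {
	for i, c := range u.counts {
		if c == 0 {
			return i
		}
	}
	best, bestScore := 0, math.Inf(-1)
	for i := range u.counts {
		mean := u.rewards[i] / float64(u.counts[i])
		bonus := math.Sqrt(2 * math.Log(float64(u.total)) / float64(u.counts[i]))
		if s := mean + bonus; s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

// Update records the reward (in [0, 1]) observed for arm.
func (u *UCB1) Update(arm int, reward float64) {
	u.counts[arm]++
	u.rewards[arm] += reward
	u.total++
}

const (
	DontNotify = 0
	Notify     = 1
)

func main() {
	u := NewUCB1(2)
	for t := 0; t < 1000; t++ {
		arm := u.Select()
		// Toy environment: notifying is "useful" 70% of the time, deterministically.
		reward := 0.0
		if arm == Notify && t%10 < 7 {
			reward = 1
		}
		u.Update(arm, reward)
	}
	fmt.Println(u.counts) // Notify ends up chosen far more often
}
```

The `bonus` term is the exploration factor. It keeps occasionally trying the less-favoured action, so the policy can recover if its early estimates were wrong.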

I plan to use vowpal wabbit's contextual bandits with only 2 actions, "send notification"/"dont send notification" given all the context

this of course will be just extra on top of the manual alert rules, but hopefully it will save some data :)

If the users agree we could also publish anonymized datasets with labeled data (such as: given context alert was sent: was [good]/[bad])

Which will be awesome.

Interesting! How did you integrate PayPal payment in the shell without a website?

using paypal IPN and redirect to paypal subscribe page, literally as simple as https://github.com/jackdoe/baxx/blob/master/api/account_rout...

I use borgbackup to a NAS on the LAN. I then use rclone to copy the repositories to a Google Drive that happens to be unlimited because of the subscription of my previous university. :-) Luckily my account persisted when I left the uni.

What's a good way to encrypt filenames on the remote side?

i use something like curl -T path/to/file https://baxx.dev/io/$BAXX_TOKEN/$(printf '%s' path/to/file | encrypt -k ~/.pass | base32)


Go isn't as portable as it should be.

Where is this located (for GDPR purposes)?

Deducing from [1]: Currently somewhere on Digital Ocean infrastructure. Long term on Hetzner servers (apparently SX62 [2]), so at that stage it's either Germany or Finland.

1: https://github.com/jackdoe/baxx/blob/master/infra-and-pricin...

2: https://www.hetzner.com/dedicated-rootserver/matrix-sx

Roach motel?

Calling it Unix friendly but not supporting ssh (and thus sftp/rsync/etc) seems like quite a weird choice, and one that’s lost you a potential customer.

thanks for the feedback! having `scp file $BAXX_TOKEN@scp.baxx.dev:file` support is definitely on my list
