Upspin – Another option for file sharing (googleblog.com)
619 points by andybons on Feb 21, 2017 | 164 comments

Seems to be very much in the space of kbfs and IPFS.

For the folks who are building this: can you compare and contrast this to both kbfs and IPFS? Why have you chosen to start another project in an already crowded space instead of contributing to either of those projects? They are both open source and much further along in development . . .

We are aware of kbfs, ipfs, and several other systems. There are many trade-offs one can make in this space, and I think Upspin's set of tradeoffs is somewhat unique.

One reason we started this project instead of contributing to others is that it's not clear that the trade-offs made by extant systems are really working for users, at a fundamental level. Maybe Upspin's will, maybe they won't. We'll see.

We wrote a bit about this in https://upspin.io/doc/overview.md but there is still more to say. I filed an issue a while back to write a substantial document that compares Upspin to other systems. Hopefully the community can help us flesh it out in time: https://github.com/upspin/upspin/issues/177

Cool. I'd love to see a section detailing the most important trade-offs that differentiate upspin from other projects, compare the approach to the alternatives and explain why upspin's approach is superior (for $TARGET_USE_CASE)

That's the plan! :-)

What problem does this solve for the end-user that isn't already or better solved by Google Drive, Dropbox, or sharing in chat apps like Hangouts or WhatsApp?

Or is it like, "Here is some cool technology!" without identifying a clear user need this solves, like Google Wave?

Dropbox uses one key for all content...

Google drive means google can see all of your stuff...

Sharing in hangouts is just like drive...

WhatsApp isn't close to this robust...

Wave was just really far ahead of its time but you see the modern incarnation in google docs and paper. This is like the infra to build spideroak on your own hardware.

It's not the same as a service where you have to trust the operator.

Seems niche. Most people trust at least one of the above operators.

How about MEGA? It's sort of like multi-key Dropbox.

Centralized, but not inherently so; you could stand up clones using its same API, and then create a discovery service mapping bucket names (users, whatever) to servers using that API.

SpiderOak is a Zero-Knowledge service. You don't have to trust them.

And you shouldn't.

I wanted to link to various people who have experienced data loss in SpiderOak (me included) due to bugs in the client which have gone unsolved for at least three years.

But....they seem to have redesigned their homepage and the forums that used to be available seem to have gone.

Here is one example though: https://spekxvision.wordpress.com/2015/10/13/more-spideroak-...

So I can only talk from a personal perspective and my experience with their CTO and support team. I've lost data on multiple occasions due to bugs in the client. This data was not recoverable. I've since switched to Crashplan and am much happier.

Agree, SpiderOak is far from good. Haven't yet experienced data loss, but their sync is very slow and, with custom synced folder (not the 'Hive' one) pretty unreliable.

From Crashplan's web page I don't see if they do file sync, what do you recommend for this domain? I'd use OneDrive, but supported Linux client is a must.

You're right. I use Crashplan for backup. Haven't really found a solution for sync that I like. Missing Linux client is a problem for me as well.

> google docs and paper


Dropbox Paper. It's multiplayer Wordpad -- a reversion to the "big white space that (ironically) doesn't pretend to simulate sheets of printer paper," now with emojis.

Does upspin have a known "set of tradeoffs [that are] somewhat unique" or does the community need to flesh them out? High-level bullets would be nice.

We haven't written the docs. That's what the issue I linked in the parent post is about.

Are there going to be docs aimed at developers of Upspin-compatible & -interoperable implementations?

Don't know if you are allowed to comment but:

Is this an early sign of Google going back to its "not evil" roots or do I read too much into this?

Edit: for context, I used to be a google fanboy and I still to some degree recommend some of their products. I just got a bit fed up with the butchering of xmpp and possibly a few things I can't remember right now.

It's not a Google product. So no. (not that I'd agree with it not adhering to those roots currently. Just that this project has nothing to do with company policy)

"This is the official list of people who can contribute (and typically have contributed) code to the Upspin repository. The AUTHORS file lists the copyright holders; this file lists people. For example, Google employees are listed here but not in AUTHORS, because Google holds the copyright."

There is a difference between "a Google Product" and "Code to which Google holds copyright". Most importantly, the latter may include all code written by any Google Employee during their employment.

There is a difference but it is only interesting once you get really close like in a lawsuit IMO.

Google has copyright on the code and runs the infrastructure?

Very much a Google product to me. Paid or not. Official or not.

But it makes a philosophical difference to the involvement of the company and of the people working on it, and also a practical difference, because products and non-products have very different launch requirements. You are perceiving Google as far more monolithic than it really is; the difference between a product and a non-product is how different employees of the company interact. To the outside world, that might or might not have any meaning. But I feel for the original comment that I replied to, it does.

From the docs:

> Terms of Service

> The Upspin website (the “Website”) is hosted by Google. By using and/or visiting the Website, you consent to be bound by Google’s general Terms of Service and Google’s general Privacy Policy.

Really a shame.

I think KBFS is sort of an add-on to Keybase, which is really quite different from this. Keybase's whole shtick is identity and key management. How do you share a file with a stranger without exchanging emails? What happens when you lose the email you signed up with? How do you share keys between your devices without trusting a for-profit company's keystore? Keybase solves those for you, and IMHO is quite novel and useful.

Upspin is a separate thing that focuses entirely on how files should be stored and identified, and (AFAICT) does nothing new regarding identity management. Which is unfortunate, because that seems like the more urgent problem right now.

The comparison to kbfs and IPFS seems relatively straightforward based on a cursory reading of the three web sites...

kbfs still stores files, and I still depend on them to keep their service running. kbfs adds key management via keybase, but the file sharing model doesn't seem significantly, or at all, different than Dropbox.

IPFS on the other hand is completely peer-to-peer, meaning I have little control over content I publish. I can publish new versions, but my old content is still on the network, and forever out of my control.

Upspin provides a middle ground by providing a protocol for key management and server lookup. Unlike kbfs, the files live on servers that I maintain control of. Unlike IPFS, I can delete or update content after it is published, as needed.

Rob Pike is the 2nd biggest contributor to this project: https://github.com/upspin/upspin/graphs/contributors

I could tell this without looking at the stats because Renee French definitely made their logo.

Looks like someone wants to bring plan9 into the 21st century !

Yeah, I noticed that David Presotto, another major Plan 9 developer, is contributor #4.

and eric grosse too

Several questions, some minor, some major:

AIUI, the keyserver is centralized in two regards: a) if it's down, I can't access anyone's data and b) it centralizes trust, key.upspin.io has complete control over which key belongs to which person and where the data is, so it can just take over accounts. Why not use a federated model, e.g. putting the directory into the DNS or have the directory server listen on a well-defined port or something? Or, if you want the usability of not having the E-Mail hoster necessarily deal with upspin, do the federated thing first and if that fails, fall back to the centralized keyserver?

There seems to be a 1:1 mapping of usernames to keys, meaning I have to share my private key with all my devices. If one device gets lost or compromised, I now have to revoke my key and rotate it on all devices. Why not use a 1:n mapping of usernames to keys, so that each device gets its own key?

Why store the upspin-specific keys in ~/.ssh? I won't use them to ssh anywhere and there already is a perfectly fine directory ~/upspin for upspin-related data.

The way sharing works means that storage for a file grows linearly with the number of users the file is shared with. This would seem to preclude sharing files with large non-public audiences (say, I have a group for all employees of my company, or members of my hackerspace, or attendees of a conference…).

The centralized keyserver puzzles me a little too but it does have some advantages:

1. The email address adds a layer of indirection on top of your upspin directory and storage server addresses. You can migrate upspin providers without changing your identifier in the global namespace.

2. The email address is instantly recognizable to friends and family. It's more user-friendly than introducing a separate decentralized upspin address (like Jabber IDs). You did mention a hybrid approach which uses the centralized keyserver as a fallback though.
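To make the indirection in point 1 concrete, here is a minimal sketch, assuming the key server can be modelled as a plain lookup table from user name to directory and store endpoints. The names `KEY_SERVER`, `split_upspin_path`, `resolve` and `migrate`, and the `*.example` hosts, are hypothetical and are not Upspin's real API or data format.

```python
# Hypothetical model of a key-server record; not Upspin's real data format.
KEY_SERVER = {
    "ann@example.com": {"dir": "dir.example.com", "store": "store.example.com"},
}


def split_upspin_path(path):
    """Split 'user@domain/dir/file' into (user, 'dir/file')."""
    user, _, rest = path.partition("/")
    if "@" not in user:
        raise ValueError("path must start with an email-style user name")
    return user, rest


def resolve(path):
    """Return (directory server, path inside the user's tree) for a full path."""
    user, rest = split_upspin_path(path)
    record = KEY_SERVER[user]  # the only lookup that depends on the central table
    return record["dir"], rest


def migrate(user, new_dir, new_store):
    """Point an existing user name at different servers; the name is unchanged."""
    KEY_SERVER[user] = {"dir": new_dir, "store": new_store}
```

After a call such as `migrate("ann@example.com", "dir.other.example", "store.other.example")`, the same shared path resolves to the new directory server, so links that other people hold keep the same name.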

A decentralized keyserver still seems more natural to me. Would be great to hear more on the reasoning from the creators of upspin.

"An Upspin user joins the system by publishing a key to a central key server. We’re running our own server for the moment but anticipate converting to Key Transparency." https://upspin.io/doc/security.md

I noted the disclaimer: "Upspin is not an official Google product". Yet it is announced on the Google Blog? Is it a separate "skunk works" team effort?

Also, I note that they can set access levels based on email addresses, but I am unsure how that would work? Would those emails have to be linked to a Google account so that Upspin could check the currently logged in Google account to allow/disallow access to the files?

Upspin is one of Google's many Open Source projects. It's not really skunkworks, per se, just something that we wanted to work on and were lucky enough to be supported by Google in doing so.

The email addresses are Upspin user names whose public keys are registered with a central server, key.upspin.io. To act as an Upspin client, you need to sign up: https://upspin.io/doc/signup.md

Requests made by Upspin users are signed with the private keys that correspond to those public keys. Servers validate those users by validating the signatures against the public key published by key.upspin.io.
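The sign-then-verify flow can be sketched as follows. This is only an illustration under loud assumptions: it uses textbook RSA with tiny primes (completely insecure), the request string is made up, and `KEY_SERVER`, `sign`, `verify` and `server_accepts` are hypothetical names. Upspin's actual key type and wire format are not described in this thread. It needs Python 3.8 or later for `pow(E, -1, m)`.

```python
import hashlib

# Toy textbook-RSA key pair (tiny primes, insecure, illustration only).
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, stays with the user

# Hypothetical stand-in for the central key server: user name -> public key.
KEY_SERVER = {"ann@example.com": (E, N)}


def digest(message, n):
    """Hash the request and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message, d, n):
    """Client side: sign the request digest with the private exponent."""
    return pow(digest(message, n), d, n)


def verify(message, signature, e, n):
    """Server side: check the signature using only the public key."""
    return pow(signature, e, n) == digest(message, n)


def server_accepts(user, request, signature):
    """Look up the claimed user's public key, then verify the signature."""
    key = KEY_SERVER.get(user)
    if key is None:
        return False  # unknown user: nothing to verify against
    e, n = key
    return verify(request, signature, e, n)
```

The point of the sketch is that the server never sees the private exponent `D`; it only needs the public key it fetched for the claimed user name.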

Gotcha, thanks for the extra info. So you still have to sign up with the Upspin service to be able to consume shared files? I was under the impression from the post that it was an 'account free' type of service, in as much as you can publish a URL where people can grab files, but still provided a level of access control, and I was having a hard time reconciling that in my head.

You need to register your public key with key.upspin.io in order to speak to any Upspin servers, so in that sense it is not "account free". You need to prove to others that you are who you say you are.

There is an "all" permission that shares with the world (see https://upspin.io/doc/access_control.md) but in most cases we expect people will share with a specific set of named users.

Will Upspin key servers be operated by a non-Google entity so as to allow the global filesystem to continue to function after Google decides to no longer support the project?

What happens when that non-Google entity decides to no longer support the project? Now you're dependent on two organisations continuing support instead of just one.

It sounds like the project code is all published though, so it should be possible to operate a private or alternative upspin network.

This doesn't really clarify what an "unofficial" Google project is. Does it mean that people don't work on it full-time and it's a 20% project? Does it mean that it was developed at home without using Google's resources, so they don't have an ownership interest in it, but they are letting it be hosted on their infrastructure and announced on their blog to be nice? Does it mean that Google isn't sure how long it will want to continue to contribute resources? Does it mean that support is only available on an ad-hoc basis (isn't that the case with all free Google products)? Does it mean that serious customers shouldn't rely on it to work reliably? Does it not really mean anything, and just thrown on there for legal protection (against what?)?

I didn't write the "unofficial" part, but I can answer the questions:

> Does it mean that people don't work on it full-time and it's a 20% project?

There are a few of us paid full-time to work on this, plus some 20% contributors.

> Does it mean that it was developed at home without using Google's resources, so they don't have an ownership interest in it, but they are letting it be hosted on their infrastructure and announced on their blog to be nice?

No, this was developed by Google employees on Google's time.

> Does it mean that Google isn't sure how long it will want to continue to contribute resources?

I'm not sure what "Google" thinks about this, but I do know that for the foreseeable future this is what I and my other teammates will be working on.

> Does it mean that support is only available on an ad-hoc basis (isn't that the case with all free Google products)?

There is definitely no support for Upspin beyond us and the community helping people on the mailing list.

> Does it mean that serious customers shouldn't rely on it to work reliably?

As is stated in the README, Upspin is still very rough. It's early days.

> Does it not really mean anything, and just thrown on there for legal protection (against what?)?

I'm not sure what "unofficial" means in this case but I hope I have answered your questions sufficiently.

"This doesn't really clarify what an "unofficial" Google project is."

Okay, so since i wrote this disclaimer, and the policy that requires it, let me try to explain this simply.

Historically, people were discovering projects were made by googlers, etc, and thinking this somehow meant it was an official google thing that google was supporting (though, admittedly, i have no better idea of what that really means at heart than anyone else). You'd even see tons of press stories about how Google had done x, y, or z, and worse, people would take it as a sign of strategy or best practices or whatever. IE "Google released j2objc, and thus thinks you should write your ios apps in java". But Google hadn't done anything. A bunch of people who worked there had done something, and someone discovered they were googlers (usually. There were a number of cases of people trying to associate their project with google in order to try to gain publicity, etc, but this was significantly more rare)

So I got asked to go solve this problem, and tried to do so in the lowest effort (for everyone involved) and simplest way:

Anything that was not an official google product was marked as such. Now press, etc, can't claim they thought it was official anything :) People look at it and get the right impression, even if they don't know precisely what it means. It raises a few questions sometimes (what does it mean to be official), but for over 5000 projects, i think the number of questions from people trying to understand what it means is "small".

In that regard, i believe doing this was a resounding success. [1]

This was simple back in the day. These days, there are projects, like upspin, that seem like they are somewhere in the middle (honestly, i haven't looked at all to determine it).

So maybe i'll reevaluate.

[1] Almost as successful as me marking Chromium "copyright the chromium authors" (now copied everywhere), with nobody really understanding why that was done.

Looks like it's an open source project by individuals who also happen to be employed by Google, and the extent of Google's involvement is the copyright.

Edit: looking at the LICENSE file, scratch that, it appears Google doesn't even own the copyright.

Not really. Google owns the copyright for all code contributed by Google employees on Google time, and currently they are the only copyright holder [1].

In order to contribute you have to sign the standard Google Contributor License Agreement (either [0] or the corporate version) which gives Google a perpetual irrevocable copyright and patent license. Since the project is BSD licensed anyway, unless you're contributing something you intend to patent, you're giving Google nothing by signing the CLA: you still own the actual copyright. (Based on my skimming the agreement; IANAL)

[0] https://cla.developers.google.com/about/google-individual [1] https://github.com/upspin/upspin/blob/master/AUTHORS

Is there a list anywhere of these unofficial Google Open Source projects?

Kudos to you for asking this question. I was under the same impression that it was an official Google product but the repos didn't show any sign of this.

> If one wants to post a Facebook picture on one’s Twitter feed, one does that by downloading the data from Facebook and then uploading it to Twitter. Shouldn’t it be possible to have the image flow directly from Facebook to Twitter?

How does my upspin file "ann@example.com/pub/hello.jpg" solve the problem here? I would have a single source for my image to share but still no way to describe an image hosted by FB as an upspin address.

I think it's heavily implied ("its real contribution is a set of interfaces, protocols, and components from which an information management system can be built") that given enough buy-in, existing content stores could hook into the Upspin namespace, but that's of course a strategic decision that such content stores must evaluate.

Aside from the content-addressed bit, it sounds to me like in many ways this idea is similar to OAuth or even SAML both in purpose and ambition, prescribing a standard way one can punch a delegated-access hole into restricted space. Upspin would then act like an overlay network to locate files, then hand the authorization decision down to the implementing system.

Back in the day, lots of people bought into OAuth because staying out offered no competitive advantage, whereas joining resulted in an explosion of tools and integrations that greatly benefited all those services. Upspin seems to be betting that others would be tempted in the same way.

I remember when OAuth was first announced, and the use case was basically this - although this was before social networks became ubiquitous so it was more like sharing your Flickr photos with a photo printing site. It was a nice idea 10 years ago before everyone built their walls.

To describe the content, the idea was everyone would use microformats:


There's another interesting use-case: OAuth, together with its extensions (OpenID Connect & XACML), can be used to create an attributes-based access control system (ABAC) (as distinct from an identity-based or roles-based one). The benefits of such a system are:

- It can be privacy preserving. Say you want to buy booze from an online bottle shop. In real life, you'd usually flash your ID to meet the over 18/21 legal requirement. But by doing this you're leaking a tonne of unnecessary information (your name, your address, even your date of birth). One implementation of an ABAC might be that you prove your identity to a mutually trusted third-party, who then asserts to the bottle shop that your attribute of 'over18' == True (this is a rough description of OpenID Connect). So the only thing the bottle shop learns about you is the only thing it needs to legally know.

- Flexibility. Say you're unemployed and receive welfare from the government. Under a roles-based system, they'd attach a role to your identity along the lines of 'is entitled to receive welfare'. However, if any aspect of government policy changes (e.g. an additional requirement that you have to be over 6 feet tall), they would have to audit each account and update the roles. Under an ABAC, they'd simply update the ABAC policy, and access control decisions would be made according to the new policy (assessed against your asserted attributes). It also means that access to new types of resources don't necessarily require new roles to be added to the system: it might already be covered by existing attributes within the system (i.e. attributes can be recombined in different ways to allow very finely grained access control without the overhead of maintaining an ever-growing list of roles).

> without the overhead of maintaining an ever-growing list of roles

For me it sounds like roles are being replaced with new set of derived attributes (isOver18, is6feetTall) that you'd still need to compute when the government policy changes. Maybe the main benefit would be the possibility of reuse but that could also be achieved with a sufficiently sophisticated role system.

I probably chose a bad example. Sticking with welfare, in many countries the payment is means tested. To simplify, let's imagine there's a policy that says "people earning over $1000 a month are not entitled to welfare". At some point politicians decide this is too strict and change it to $1500. In an RBAC, you have to do the roles audit and reassignment. Whereas in the ABAC, because you've already been collecting this attribute, you just change the policy and keep making access-control decisions as normal.
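The income example can be sketched in a few lines, assuming a policy is just a rule evaluated against attributes at decision time. `Policy`, `max_monthly_income` and the `monthly_income` attribute are hypothetical names chosen for illustration, not part of OpenID Connect or XACML.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical ABAC rule: entitled if income does not exceed the limit."""
    max_monthly_income: int

    def allows(self, attributes):
        # The decision is computed from the attribute at request time,
        # so no per-account role has to be stored or audited.
        return attributes["monthly_income"] <= self.max_monthly_income


# Attributes are collected once per person and do not change with the policy.
ann = {"monthly_income": 1200}

old_policy = Policy(max_monthly_income=1000)
new_policy = Policy(max_monthly_income=1500)

print(old_policy.allows(ann))  # False: 1200 is over the old limit
print(new_policy.allows(ann))  # True: same attributes, new policy
```

In a role-based version, the equivalent change would mean revisiting every account to add or remove an "entitled" role; here only the policy object changes.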

Nor would Twitter know what to do with an upspin file.

The problem seems to be distributing the content, not the naming of it.

Edited to add: Ideally that's what the URL is for, it should be possible to get the URL of the image on FB and post it to twitter, since both speak DNS/HTTP.

Yeah, this use case can be solved in the first place by FB making it possible to share outside their walled garden, and Twitter ingesting the appropriate URL and creating thumbnails to display inline for non-Twitter images.

No new tech is needed, only the will to open up the walled gardens.

By the way, this was sort of taking shape for a while with 3rd-party aggregators ("all your social media in one feed!"), but that approach just withered away with the growth of FB into an unstoppable juggernaut.

EDIT: Which isn't to say that Upspin doesn't have some very interesting uses, but the barrier of UGC walled gardens not wanting to open up in the first place makes this particular use-case moot.

also, I probably don't want to share a url/uri that contains my email address on Twitter...

(I guess it depends if Twitter would _then_ ingest the image to their servers or simply continue to reference the upspin url/uri publicly - which was the original use-case)

It's a bit unclear how email-based identifiers should work when publishing long-lived, public files.

In particular, which email address should you use? Will the general public learn your email address? What about people who want to post publicly and keep their email addresses private?

What happens if you switch email providers?

Also, is there any provision for actually making public files available via https? How would that work?

For the moment, using your github username feels safer, since they are public from the beginning, intended to be permanent, and detached from your email addresses. A Twitter account is another intentionally-public userid that's not tied to anything else.

> Will the general public learn your email address?

Yes, if you store ann@example.com/dir/file and share the file path publicly then the valid email address "ann@example.com" leaks out. If you own your own domain then you could have one or more email address only associated with Upspin and store at files@mydomain.com/dir/file. The downside is the username in such paths is less recognizable to your friends than your regular email.

The reality today is anyone who uses any Google service or who has a Facebook page has an exposed @gmail.com or @facebook.com email address that can be spammed, and if he or she signs up for Upspin it's another way for it to leak out. You can use any email address that can receive the confirmation email; I'm not sure if yourusername@users.github.com is a valid email address, and there's no email address with your Twitter handle.

> What happens if you switch email providers?

You would execute a new `upspin signup -dir=dir.example.com -store=store.example.com you@newEmailAddress.com` command, keeping the same directory and storage servers. The question is whether you can tell your directory server and storage server to reuse or alias your directory and storage to those of your previous email address.

I saw nothing about requiring email address usernames to be backed by an email provider in the documentation. The email address is literally just a username format and is authenticated by public key. Anyone concerned with the privacy of their actual email address would obviously not use that address as their user. As I understand it, you can just make one up for a username. If this is incorrect, someone more informed please reply.

The signup process sends a verification message to your email address to confirm that you own it.

Seems like that results in a different problem? Someone could take your email address.

Perhaps the new-user process requires authentication initially via email to confirm that you are in control of that address.

That is exactly how it's described in the documentation, yes.

Is their hope or stated objective to move all file sharing to this format? My takeaway was that this was designed for the use case of "trivial day-to-day sharing without hassle" (or at least that's their focus on the consumer case), i.e. taking the "another option" at face value.

Long-lived public files seem like a different use case, and one that is already generally handled okay by traditional hosting or file sharing.

They used the example of sharing on Facebook or Twitter, which are archived indefinitely, so it seems like someone should be thinking about link rot and how to avoid it?

(Should a link be shared or the data copied?)

Cool to see in terms of Go style how they handle errors -- https://github.com/upspin/upspin/blob/master/errors/errors.g... It's a common pattern I've seen, but given that this comes from Rob Pike etc., glad to see that the pattern is considered a good one

Edit: I sound like a total fangirl :/

It's such a good pattern, that other languages have even standardized it. Java's exceptions are basically the same thing: The stacktrace and the underlying exception come by default. Arguably, Go made a mistake here by not having featureful errors, since now each project has to reimplement them.

That's one take on it, but these error types are vastly more lightweight than exceptions when the code isn't hitting the happy path (and there's little difference when it is).

Plus, when I write Java I have to constantly specify new Exception types. I don't see anything wrong with having to do it for Go when I really want more or less data depending on circumstance.

I'm excited to see what kind of discussion comes up around this codebase.

It's worth noting enneff's comment about the initial code though: https://twitter.com/enneff/status/834167073508978689

Nice thanks for sharing that :)

Hmm. Most are focusing (understandably) on the mechanics of using Upspin for different use-cases.

To me, the more interesting aspect is that the server has zero knowledge of the contents of the file, which makes it more comparable to SpiderOak One: https://spideroak.com/manual/send-files-to-others

This sounds a whole lot like the Keybase filesystem (in intent, if not implementation): https://keybase.io/docs/kbfs

big difference: Keybase hosts everything, all your data is belong to Keybase. In upspin only your public key is centralized; the data can be anywhere.

Actually if you read the design document it sounds like they want you to provide some wrapper server to access, say, your photos on Google Photos through the Upspin protocol. Your photos would stay there, but authentication would be centralized on Upspin.

Keybase files are encrypted though, right? So if you provide your own PK, they can't read your files.

This was my understanding as well. Keybase goes further in that they do not have access to the file metadata either (only size, not even count - though by their own admission they could probably reverse engineer count if they tried hard).

You don't even register a single root PK with them anymore - you end up with a network of device-specific and paper keys in their current security model. IMO, their current client is the most nontechnical-user friendly security app I've ever had the privilege to use.

Does anyone have more context to compare this to Camlistore, which is made by another member of the Go team?

Hi, Camlistore author here.

Andrew Gerrand worked with me on Camlistore too and is one of the Upspin authors.

The main difference I see is that Camlistore can model POSIX filesystems for backup and FUSE, but that's not its preferred view of the world. It is perfectly happy modeling a tweet or a "like" on its own, without any name in the world.

Upspin's data model is very much a traditional filesystem.

Also, upspin cared about the interop between different users from day 1 with keyservers etc, whereas for Camlistore that was not the primary design criteria. (We're only starting to work on that now in Camlistore).

But there is some similarity for sure, and Andrew knows both.

This is pretty much correct. Upspin is way more filesystem-like than Camlistore, but I would emphasize Upspin's single global name space, rather than the filesystem-ness.

I think both Camlistore and Upspin have promising models.

Would it make sense to provide an upspin server interface to Camlistore?

Seems inevitable. :)

For other reasons I checked camlistore website a few days ago and it seemed totally stale.

You mean work is progressing on camlistore? That would make me a little bit happier : )

bradfitz has already replied, I just wanted to add what seems to me to be the focus of both projects, because they are different:

Camlistore aims to be your repo of all your stuff that you may want to share with other people at a later time. It wants to be the repository of all your life and everything that happens.

Upspin aims to be a unified protocol for all applications to access (and possibly modify) data, wherever it is: maybe it is your website on your server, maybe it's a NASA dataset on some S3 tenant, maybe it's an imgur gallery, maybe it's an OpenStreetMap dump of their DB; the goal AFAICS is to give any application access to data regardless of where and how it is stored, so that applications can do what they're best at.

> Our target audience is personal users, families, or groups of friends. Although Upspin might have application in enterprise environments, we think that focusing on the consumer case enables easy-to-understand and easy-to-use sharing.

This definitely does not look "easy to use" unless the target audience of families only includes families where all have technical backgrounds. If I gave anyone in my family that list of instructions they would stare at me blankly. There is a reason Dropbox does so well, it's simple and doesn't require you to do any manual fiddling.

In case it's not clear, this project is in its very early stage of development. There's a ton of work to do to get it into the hands of non-technical users.

From the readme: "Upspin has rough edges, and is not yet suitable for non-technical users."

It seems like they're currently designing the APIs which, once solidified, can be used to make a user friendly service. The foundation, as described, isn't easy for customers. But it is easy for developers and engineers!

And once this framework/API work is complete, people will be able to use this to create customer-friendly solutions.

Public key, private key, etc... Yeah, next to Facebook, Dropbox, GDrive, etc, it won't be easy to convince grandma, "Jenny the hairdresser" and "Joe Sixpack" to use this thing.

> There is a reason Dropbox does so well, it's simple and doesn't require you to do any manual fiddling

Cue someone linking to that HN chain of messages about how Dropbox just won't work.

Sounds a little like IPFS: a global file system based on end-to-end cryptographically verified files distributed through untrusted nodes. Anyone want to compare the two?

As far as I can make out, this isn't distributed --- while there may be multiple file servers, each piece of data is stored on exactly one file server, and you need to use the directory service to allow you to identify which file server stores the data. Then you contact that file server and it's a perfectly ordinary remote file system.

It sounds like it ought to be nigh-trivial to bolt IPFS support onto it, so that the file server is replaced by IPNS (can't be IPFS because Upspin is mutable). So, you look up elvis@theking.com/fnord and the directory server hands you '<hash>/fnord', and the client library then goes and looks up the data via IPNS. No file servers required.

Of course, this means that the only way of doing ACL enforcement is via encrypted data which is unlocked via ephemeral keys handed out by the keyserver, which as a solution is pretty terrible, but ACL enforcement in a decentralised system is practically impossible anyway.
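
For anyone who wants the two-step lookup described above spelled out (directory first, then exactly one file server), here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DIRECTORY` and `STORES` dicts, the `store-a.example` endpoint, the `ref-123` reference and the function names are hypothetical stand-ins, not Upspin's actual API or wire format.

```python
# Hypothetical in-memory stand-ins for a directory server and file servers.
# None of these names come from Upspin itself.
DIRECTORY = {
    # full path -> (file server endpoint, opaque reference on that server)
    "elvis@theking.com/fnord": ("store-a.example", "ref-123"),
}

STORES = {
    # each piece of data lives on exactly one file server
    "store-a.example": {"ref-123": b"hound dog"},
}


def lookup(path):
    """Step 1: ask the directory which file server holds the data."""
    return DIRECTORY[path]


def fetch(path):
    """Step 2: contact that single file server and read the bytes."""
    endpoint, ref = lookup(path)
    return STORES[endpoint][ref]


data = fetch("elvis@theking.com/fnord")
```

Swapping the second step for an IPNS lookup, as suggested above, would only change what `fetch` does with the reference it gets back from the directory.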

After a quick read, to me the difference vs. something like IPFS+IPNS seems to be the quasi-"federation" angle, such that Upspin acts like a protocol you can retrofit into an existing content store instead of self-hosted software with its own "captive" private datastore.

For these reasons, I find it conceptually most akin to BitTorrent when combined with Magnet Links that are shared by strongly-authenticated identities than to any other distributed content store project.

This seems to focus on the human part (you can just type it) while the tech stays underneath. IPFS seems to focus on machine availability, and the devs would have to create a front-end for humans when needed. I think it would be better compared to IPNS.

From my early reading, this isn't quite as distributed as it relies on a directory server and email for identification.

This looks really interesting. I will play around with it, but I'm curious if there's any sense of how much this would add to an existing command-line tool? For example if all I want to do is get an upspin file, make changes, and put a (possibly new) upspin file back, would it entail significant configuration and/or a much larger executable?

A critical part of storage for non-techies is having great search. However, in this case, since blobs are encrypted, the server cannot index them: it does not hold the private keys. It requires the client to make a full copy of the files locally to index them (and presumably re-index them when a different client modifies them?)

Has any thought been given to that?

It is odd they do not use URLs (or URIs) with 'upspin' scheme.

> It is odd they do not use URLs (or URIs) with 'upspin' scheme.

Yes it is. An Upspin file path like ann@example.com/dir/file is not obviously part of any scheme. If Upspin takes off, all the automagic parsing of plain text (that makes ann@example.com and news.ycombinator.com/formatdoc automatically turn into a clickable e-mail link and an HTTP URL) could be extended to turn ann@example.com/dir/file into a link that fires up a GUI to the Upspin service. The problem is that the file path also looks like an HTTP/S URL for https://example.com/dir/file specifying the username ann, so copying it into your browser's location bar without a scheme is fundamentally ambiguous.

Even though Upspin uses HTTPS for its API (e.g. https://store.upspin.io/api/Store/Get/something), there doesn't seem to be an HTTPS URL for access to a particular file. Since ann@example.com is by design just an email identifier disconnected from the directory server and storage server that ann uses to store her files, clients can't skip the root key server and talk to example.com to request ann's files, even though in some cases it will be the same server name.
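
The ambiguity is easy to show with a small Python sketch. `split_upspin_path` is a hypothetical helper written for this comment (it is not part of any Upspin library), and it assumes the user name is simply everything before the first slash. The URL side uses the standard library's `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def split_upspin_path(name):
    """Split 'user@domain/dir/file' into (user, path).

    Hypothetical helper: assumes the user name is everything before the
    first '/' and must contain an '@'.
    """
    user, _, rest = name.partition("/")
    if "@" not in user:
        raise ValueError("expected user@domain before the first '/'")
    return user, "/" + rest


# Read as an Upspin-style path: the whole e-mail address is the user.
upspin_user, upspin_path = split_upspin_path("ann@example.com/dir/file")

# Read as an HTTPS URL: 'ann' becomes the userinfo part, example.com the host.
as_url = urlsplit("https://ann@example.com/dir/file")
url_user, url_host, url_path = as_url.username, as_url.hostname, as_url.path
```

The same characters give the user `ann@example.com` in one reading and the user `ann` on host `example.com` in the other, so a bare string pasted into a location bar cannot be disambiguated without a scheme.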

A better godoc link: https://godoc.org/upspin.io

The only thing I dislike about this idea is that it appears to be another centralised service.

Why can't we come up with something similar but using HTTPS and Public Key Infrastructure that I can host on my own server but still be interoperable?

I don't think you fully read how it works. It said the storage providers etc can be self hosted.

I might have misinterpreted the overview, but the key server/service that provides authentication for users sounded like a central point in the system.

They mention in the docs that there's no reason to keep it only at key.upspin.io, and that they expect it'll be replicated and hosted elsewhere.

Although I prefer a bit more federation - maybe like DNS - overall the idea is really cool!

One other downside: what if the owner of a shared file passes away, or a worker leaves the company, etc.? I guess if the file's that important, then I would have to have downloaded the file ahead of time, and hosted it on "my upspin"? Not saying this is an easy thing to solve for...but maybe ipfs at least addresses that...unless I'm missing something, and upspin does accommodate that use-case?

Overview that explains the problem this is meant to solve: https://upspin.io/doc/overview.md

"This “information silo” model we have migrated to over the last few years makes sense for the service providers but penalizes the users"

Well, there were many discussions about this topic. Ultimately non-tech users have to use the software that somebody has to make, market, maintain, etc., so for most users the Upspin model simply pushes service providers from web browsers to apps where service providers are still very much in control, not users.

I think only protocols could give some control to the users for the longest time. They could enable competition among implementations essentially preventing implementations from taking roads that punish users. They would have to find different and user respecting ways to make money. But even then a big corporation with too much money could monopolize the whole market, "extend" protocols and drive all of the user respecting competition out of business or make them play by their rules. And we are back at square one, thinking about taking back control. Something in software has to change very profoundly to change this, like making software that doesn't need maintenance or at least needs so little of it that anyone could afford to do it for their entire life.

> I think only protocols could give some control to the users for the longest time.

They do. If you run your own web or blog server you can provide resources at HTTP URLs forever. The server cost is dropping to zero either by running it on your home router or paying pennies/free for a small cloud instance. The biggest expense is owning and maintaining a domain name forever.

If Upspin takes off, you could provide resources at Upspin file paths forever without needing your own domain.

The problem is people like social sites, most of us now expect to be able to like and share and comment on resources. So although I will consider Upspin as a way to provide pictures to friends and family who aren't on Facebook, I'm unlikely to ever stop also putting them in the walled garden on Facebook for the upvotes. And building a social protocol in which Facebook will give a damn competing with other implementations seems vanishingly unlikely right now.

Can this enable the following scenario:

I have a 10 GB file I want to share with the world. But my bandwidth is that of a typical cable modem, and I certainly can't take on 100,000 simultaneous downloads. Does the Upspin protocol enable peer-to-peer file sharing so I can share large files with the world while still on limited bandwidth?

In Upspin you typically store your files in a remote server somewhere on the net (our default implementation uses Google Cloud Storage as its storage backend), which is something you need to pay for.

The solution to the problem you describe is BitTorrent.

Dat just put out a desktop application. Unlike BT, it supports mutable archives.


Awesome! Mac only for now though :(

or IPFS.

I think it is a brilliant idea to merge file paths with email addresses. In hindsight it feels like a logical next step from services such as Dropbox, where everything is about sharing files. You share a file and the other person gets sent an email notification that you've shared that file with them.

It's great to see that they try to think outside the box and actually innovate on something that the Dropbox engineers also could've (and now also "should have") thought of.

Now when I think about it, Dropbox could still beat them to market by implementing and marketing this type of file addressing with their existing userbase.

To be fair, the way to reference remote files like this is hardly novel. For example the Unix rcp(1) command has supported a very similar username@example.com:path/to/file syntax since at least 4.3BSD-Reno (1990): https://www.freebsd.org/cgi/man.cgi?query=rcp&sektion=1&manp...

I know this syntax from ssh. But up until now nobody framed this technicality into the "email" context where even my grandmother could understand the concept!


The left hand doesn't seem to know what the right hand is doing. Why not use another Google-backed project, gRPC, for their RPC stuff? I can understand not wanting extra dependencies, but then why depend on protobuf?

Looking at the overview, this seems to have a central key server at key.upspin.io... Why not use some decentralized mechanism like dns txt records?

Domain names are harder to get than email addresses.

If enneff is still listening...

How many files per directory has this been tested on?

I work on a very similar system with similar goals (global file namespace which will return a list of physical file locations and a collection of software based on that), but for scientific data and we've been trying to scale to millions of files per directory. It'd be great to have a baseline.

> How many files per directory has this been tested on?

Not a massive number. The current DirServer implementation will likely run into problems once the corpus reaches a certain size. But there can be more than one DirServer implementation, as long as it satisfies the interface. (https://godoc.org/upspin.io/upspin/#DirServer)

What protocol does this use or does it require a specific application? Is it HTTP or something else? If I wanted to use Upspin to host files for my biddle project (https://github.com/prettydiff/biddle) how would I retrieve those files?

There is a client library for accessing the files.

Similar in intention and scope to Keybase file sharing scheme.

> The full path might be ann+camera@example.com/video.jpg, with the idea that every read of the file retrieves the most recent frame.

Curious how this would work implementation-wise. Since the DirServer knows the file size (or at least has block descriptors), how would a custom StoreServer for the camera send back a dynamic-sized block? Would you define one block in the DirServer with a max size and allow the StoreServer to respond with a frame of any size below that max? Does the client validate the block sizes equate between the DirServer and StoreServer? Would the DirServer have to be "in on" the camera implementation alongside StoreServer for any of this to work?

Yes, I think it would require a custom directory server as well (as do other proposed features in the intro document), but at least for special addresses like this that shouldn't be much of an issue

> Other than occasional workarounds using a URL, information given to these services becomes accessible only through those services. If one wants to post a Facebook picture on one’s Twitter feed, one does that by downloading the data from Facebook and then uploading it to Twitter. Shouldn’t it be possible to have the image flow directly from Facebook to Twitter?

I'm not sure I understand the problem with URLs. "Copy share link" is a standard operation both on the web and in apps.

The share link is usually an HTML page, rather than an image. Either the user needs to figure out the image URL or the service needs to implement individual decoder for every type of share link.

Exactly. Upspin seems like the absolute golden ideal of file sharing, at least to me. Even as a kid I remember wishing I could do something as easy as share a link to "C:\picture.jpg" and just have the person I sent it to be able to view it. Right now, I can get close with Dropbox's "copy sharable link", but I am definitely looking forward to an even better user experience, either via IPFS or Upspin.

I do really like the whole approach of putting the email in the filename.

Also regarding Renee French's artwork - I much prefer this logo over the Go mascot. Upspin's mascot has a fully awkward tone and appearance, whereas Go's mascot has a more polished tone that contrasts too much with the otherwise awkward appearance.

Why no federation for key servers?

It'd be interesting to talk about how this might work.

SRV record for the domain part of the "email" address pointing to the key server for that domain. Isn't that the standard solution for this sort of thing?

Yeah, that's exactly what I had in mind.
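
A rough sketch of what that SRV-based discovery could look like, in Python. This is purely an assumption for discussion: the `_upspin-key` service label is made up here and is not registered or used by Upspin, and the records are hard-coded rather than fetched from DNS. The only part taken from the SRV spec is that lower priority values are tried first; weighted selection among equal priorities is left out.

```python
def srv_query_name(upspin_user, service="_upspin-key", proto="_tcp"):
    """Build the SRV name to query for a user's key server.

    The service label is hypothetical; the domain is whatever follows
    the last '@' in the user name.
    """
    local, sep, domain = upspin_user.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected a user name of the form local@domain")
    return f"{service}.{proto}.{domain.lower()}"


def order_by_priority(records):
    """Sort (priority, weight, port, target) tuples, lowest priority first."""
    return sorted(records, key=lambda record: record[0])


query = srv_query_name("ann+camera@example.com")

# Hard-coded answers standing in for a real DNS response.
answers = [
    (20, 0, 443, "backup-keys.example.com"),
    (10, 0, 443, "keys.example.com"),
]
first_choice = order_by_priority(answers)[0]
```

A client would query the name in `query`, then try the targets in the order returned, falling back to a well-known default key server if the domain publishes no record.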

I'm worried that there are no user-facing hashes, it seems.

Wow, this is really brilliant, and there's the potential for high disruption. Only for file sharing? Not as I see it. I'll have to make time to try this out asap.

Please share your thoughts for other uses. I'm curious.

I will, once I've had a chance to look at it more closely. It's captured my imagination. I'd forgotten about ipfs.

And, not that it's the main idea, but merging the file paths and an email address is something 'everyone' can understand. Twitter built a whole business around @somebody.

edit: Many potential uses are described in the overview. Read the whole thing. https://upspin.io/doc/overview.md

You could use this to gather a large amount of social-graph data. I wonder, are people considered intimate in proportion to the amount of data they share?

"I hope not"

--the CERN team

it's not clear to me if they store the files on their servers so others can access them when my computer is offline

They do not.

This could really use a demo video of usage.

I feel like it's still early days, but that's a good suggestion!

I filed an issue: https://github.com/upspin/upspin/issues/209

From the company that doesn't offer Linux support to their "drive"...

There are 14 competing standards... We need to develop one standard that covers everyone's use cases.


"It's early but again, we want all of your and your family's files and we want to train our AI from it"

How do you deal with the piracy problem? What steps guarantee that I cannot share a copy of e.g. a movie I do not own via the service?

Edit: google drive for example has a list of known hashes of pirate files and you can't share links to files matching those. I imagine something similar will be in effect here?

Upspin users maintain their own storage servers, or use storage servers maintained by someone else. The Upspin project itself doesn't have any authority or control over the type of content stored or shared with the system.

> google drive for example has a list of known hashes of pirate files and you can't share links to files matching those

It's actually impossible for us to know what people have stored. It's all encrypted.

BTW, if you can find a way to "guarantee" that you cannot share a copy of a movie, you should talk to the movie industry and make a billion dollars. :-)
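
To make the "it's all encrypted" point concrete, here is a toy Python sketch of why a list of known file hashes stops matching once clients encrypt before upload. The XOR pad below is only a stand-in chosen because it fits in the standard library; it is an assumption for illustration and not the encryption Upspin uses, and the keys and file contents are made up.

```python
import hashlib


def xor_pad(data, key):
    """XOR data with an equal-length key (toy stand-in for client-side encryption)."""
    if len(key) != len(data):
        raise ValueError("key must be as long as the data")
    return bytes(a ^ b for a, b in zip(data, key))


def sha256_hex(data):
    """Hex SHA-256 digest, the kind of fingerprint a hash blocklist would hold."""
    return hashlib.sha256(data).hexdigest()


movie = b"pretend this is a movie file"
blocklist = {sha256_hex(movie)}  # what a server-side filter would know about

# Two users encrypt the same file with different (made-up) keys.
key_a = bytes(range(1, len(movie) + 1))
key_b = bytes(range(101, 101 + len(movie)))
stored_a = xor_pad(movie, key_a)
stored_b = xor_pad(movie, key_b)

plaintext_flagged = sha256_hex(movie) in blocklist
ciphertext_flagged = sha256_hex(stored_a) in blocklist or sha256_hex(stored_b) in blocklist
```

The server only ever sees `stored_a` and `stored_b`, which differ from each other and from the original, so a fingerprint of the plaintext gives it nothing to compare against.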

Why do they have to deal with the piracy problem?

Because it's 2017 already. We, as programmers, should strive to make ethical software that doesn't infringe others' rights. Even Google's core motto is "don't be evil". The RIAA and MPAA are the supporting pillars of our culture - if they go down, our own culture dies. Making software that facilitates eroding the foundations on which our culture is built, seems pretty evil to me. Leveraging your position as a too-big-to-fail software company to steal billions of profits from poor artists seems pretty damn evil to me.

I just want Google to be better - I want them making ethical software that respects people's rights, not havens for thieves.

Sorry, my question was more about the "they" than about the "why". Why does ZFS have an ethical responsibility to prevent you from saving a copyrighted file? Why does TCP have an ethical responsibility to prevent you from transmitting copyrighted data? Why do we have an ethical responsibility to build DRM into more things?

If the MPAA and RIAA can't find a business model that works without restricting my freedom, maybe they should go down, and maybe your culture should die.

Unfortunately, the cost of preventing piracy is building a global censor with authority to veto any use of the system. Which can also be seen as "evil".

[Update] I've missed the upspin.io link, which wasn't in the article. Now I see (unless I'm misunderstanding it) it's something similar to Keybase Filesystem, except distributed. That's a good thing, then. My apologies for not getting this before writing the original comment.


So... they had reinvented HTTP, and had thrown in some email-based authentication module, and formalized aeons-old /~username/ spec into /user@email.address.tld/?

Anyone with the HTTP server can host files. Traditionally, .htpasswd or similar measures were used for access controls. This seems to be different only in a sense that it uses emails (found no docs and too lazy to read the code).

I've glanced over the repo and I don't get what all that code is for. A re-implementation of webdavfs that maps first-level directories (emails) to hosts? Or... what? Can anyone link to some docs?

You're definitely missing a lot. Please take a look at the overview doc, it should be mostly explained here: https://upspin.io/doc/overview.md

One of the key elements of Upspin is that all content is signed and encrypted by the client. Sharing is totally in control of the content owners, and the servers need not be trusted at all.

Ah, thanks. Article didn't have a link to upspin.io, so I've missed this link.

Yes, with client-side signing it makes more sense.

So it's something similar to Keybase Filesystem, except distributed? That's a good thing, then.

Yeah it was a mistake not to include that link in the blog post.

It is definitely similar to Keybase Filesystem, and a few other things too. There are many axes for making trade-offs in this space, and I think Upspin's set of trade-offs is unique. We'll see if it works well for people.

There is a reason people use Dropbox not Apache to share files. Unless you can explain why that is, I'm not sure what criticizing their open source work achieves.

I get the reason, but don't think it's similar. (And, AFAIK, Dropbox uses nginx to share files.)

Yet, article clearly said this is not a product, but a set of protocols. So I was wondering what's the point, if the protocols (HTTP, WebDAV) are already there for a long time, and even have native support and can be easily mounted as local drives in most modern OSes.

I've missed the encryption and signing bits. Sorry about that.
