For the folks who are building this: can you compare and contrast this to both kbfs and IPFS? Why have you chosen to start another project in an already crowded space instead of contributing to either of those projects? They are both open source and much further along in development . . .
One reason we started this project instead of contributing to others is that it's not clear that the trade-offs made by extant systems are really working for users, at a fundamental level. Maybe Upspin's will, maybe they won't. We'll see.
We wrote a bit about this in https://upspin.io/doc/overview.md but there is still more to say. I filed an issue a while back to write a substantial document that compares Upspin to other systems. Hopefully the community can help us flesh it out in time: https://github.com/upspin/upspin/issues/177
Or is it like, "Here is some cool technology!" without identifying a clear user need this solves, like Google Wave?
Google Drive means Google can see all of your stuff...
Sharing in Hangouts is just like Drive...
WhatsApp isn't close to this robust...
Wave was just really far ahead of its time but you see the modern incarnation in google docs and paper. This is like the infra to build spideroak on your own hardware.
It's not the same as a service where you have to trust the operator.
Centralized, but not inherently so; you could stand up clones using its same API, and then create a discovery service mapping bucket names (users, whatever) to servers using that API.
I wanted to link to various people who have experienced data loss in SpiderOak (me included) due to bugs in the client which have gone unsolved for at least three years.
But... they seem to have redesigned their homepage, and the forums that used to be available seem to be gone.
Here is one example though: https://spekxvision.wordpress.com/2015/10/13/more-spideroak-...
So I can only talk from a personal perspective and my experience with their CTO and support team. I've lost data on multiple occasions due to bugs in the client. This data was not recoverable. I've since switched to Crashplan and am much happier.
From Crashplan's web page I can't tell whether they do file sync. What do you recommend for this domain? I'd use OneDrive, but a supported Linux client is a must.
Is this an early sign of Google going back to its "not evil" roots or do I read too much into this?
Edit: for context, I used to be a google fanboy and I still to some degree recommend some of their products. I just got a bit fed up with the butchering of xmpp and possibly a few things I can't remember right now.
Google has copyright on the code and runs the infrastructure?
Very much a Google product to me. Paid or not. Official or not.
> Terms of Service
Really a shame.
Upspin is a separate thing that focuses entirely on how files should be stored and identified, and (AFAICT) does nothing new regarding identity management. Which is unfortunate, because that seems like the more urgent problem right now.
kbfs still stores files, and I still depend on them to keep their service running. kbfs adds key management via keybase, but the file sharing model doesn't seem significantly, or at all, different than Dropbox.
IPFS on the other hand is completely peer-to-peer, meaning I have little control over content I publish. I can publish new versions, but my old content is still on the network, and forever out of my control.
Upspin provides a middle ground by providing a protocol for key management and server lookup. Unlike kbfs, the files live on servers that I maintain control of. Unlike IPFS, I can delete or update content after it is published, as needed.
AIUI, the keyserver is centralized in two regards: a) if it's down, I can't access anyone's data and b) it centralizes trust, key.upspin.io has complete control over which key belongs to which person and where the data is, so it can just take over accounts. Why not use a federated model, e.g. putting the directory into the DNS or have the directory server listen on a well-defined port or something? Or, if you want the usability of not having the E-Mail hoster necessarily deal with upspin, do the federated thing first and if that fails, fall back to the centralized keyserver?
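To make the "federated first, central fallback" idea concrete, here is a rough sketch in Python. None of this is Upspin's actual API: `federated_lookup`, `central_lookup`, the record layout and the hostnames are all invented for illustration, and the dictionaries stand in for real network lookups (DNS, or a well-known port on the user's domain).

```python
from typing import Optional

# Hypothetical stand-ins for network lookups. A real client would query DNS
# (or a well-known port) and the central key server instead of these dicts.
FEDERATED = {
    "example.com": {
        "ann@example.com": {"key": "ann-pubkey", "dir": "dir.example.com"},
    },
}
CENTRAL = {
    "bob@mail.test": {"key": "bob-pubkey", "dir": "dir.hosted.test"},
}


def federated_lookup(user: str) -> Optional[dict]:
    """Ask the user's own domain first; None means it publishes nothing."""
    domain = user.rsplit("@", 1)[-1]
    return FEDERATED.get(domain, {}).get(user)


def central_lookup(user: str) -> Optional[dict]:
    """Fall back to the single shared key server."""
    return CENTRAL.get(user)


def lookup(user: str) -> dict:
    """Try the federated source first, then the central key server."""
    record = federated_lookup(user) or central_lookup(user)
    if record is None:
        raise KeyError(f"no key record for {user}")
    return record
```

With this ordering, a domain that runs its own directory answers for its users, and everyone else still resolves through the central server.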
There seems to be a 1:1 mapping of usernames to keys, meaning I have to share my private key with all my devices. If one device gets lost or compromised, I now have to revoke my key and rotate it on all devices. Why not use a 1:n mapping of usernames to keys, so that each device gets its own key?
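A minimal sketch of what a 1:n mapping could look like, assuming the key record simply holds one public key per device. The `UserKeys` class and its methods are hypothetical, not anything Upspin provides.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserKeys:
    """Hypothetical 1:n key record: one public key per device."""
    user: str
    device_keys: Dict[str, str] = field(default_factory=dict)  # device -> public key

    def add_device(self, device: str, public_key: str) -> None:
        self.device_keys[device] = public_key

    def revoke_device(self, device: str) -> None:
        # Only the lost device's key is removed; other devices keep theirs.
        self.device_keys.pop(device, None)

    def is_trusted(self, public_key: str) -> bool:
        return public_key in self.device_keys.values()


keys = UserKeys("ann@example.com")
keys.add_device("laptop", "pk-laptop")
keys.add_device("phone", "pk-phone")
keys.revoke_device("phone")  # the phone was lost
```

After the revocation the laptop key is still accepted, so nothing has to be rotated on the remaining devices.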
Why store the upspin-specific keys in ~/.ssh? I won't use them to ssh anywhere and there already is a perfectly fine directory ~/upspin for upspin-related data.
The way sharing works means that storage for a file grows linearly with the number of users the file is shared with. This would seem to preclude sharing files with large non-public audiences (say, I have a group for all employees of my company, or members of my hackerspace, or attendees of a conference…).
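A back-of-the-envelope sketch of that growth, assuming (as this comment does) that each reader gets their own wrapped copy of the file's encryption key. The 100-byte figure per wrapped key is an invented number for illustration, not something measured from Upspin.

```python
WRAPPED_KEY_BYTES = 100  # assumed size of one per-reader wrapped key


def sharing_overhead(readers: int, wrapped_key_bytes: int = WRAPPED_KEY_BYTES) -> int:
    """Extra bytes stored per file when it is shared with `readers` users."""
    return readers * wrapped_key_bytes


for readers in (5, 500, 50_000):
    print(readers, "readers ->", sharing_overhead(readers), "bytes per file")
```

Under these assumptions the overhead is paid again for every file, which is where large groups would start to hurt.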
1. The email address adds a layer of indirection on top of your upspin directory and storage server addresses. You can migrate upspin providers without changing your identifier in the global namespace.
2. The email address is instantly recognizable to friends and family. It's more user-friendly than introducing a separate decentralized upspin address (like Jabber IDs). You did mention a hybrid approach which uses the centralized keyserver as a fallback though.
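The indirection in point 1 can be sketched roughly like this, assuming the key server is just a table from user name to that user's current directory and storage servers. The table layout, `resolve`, `migrate` and the hostnames are all made up for the example.

```python
from typing import Tuple

# Hypothetical key-server table: user name -> where that user's data lives.
key_server = {
    "ann@example.com": {"dir": "dir.old-host.test", "store": "store.old-host.test"},
}


def resolve(path: str) -> Tuple[str, str, str]:
    """Split 'user/rest/of/path' and look up the user's current servers."""
    user, _, rest = path.partition("/")
    record = key_server[user]
    return record["dir"], record["store"], "/" + rest


def migrate(user: str, new_dir: str, new_store: str) -> None:
    """Point the user at new servers; paths already shared stay the same."""
    key_server[user] = {"dir": new_dir, "store": new_store}


before = resolve("ann@example.com/pub/hello.jpg")
migrate("ann@example.com", "dir.new-host.test", "store.new-host.test")
after = resolve("ann@example.com/pub/hello.jpg")
```

The shared path never changes; only the record behind the user name does.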
A decentralized keyserver still seems more natural to me. Would be great to hear more on the reasoning from the creators of upspin.
Also, I note that they can set access levels based on email addresses, but I am unsure how that would work? Would those emails have to be linked to a Google account so that Upspin could check the currently logged in Google account to allow/disallow access to the files?
The email addresses are Upspin user names whose public keys are registered with a central server, key.upspin.io. To act as an Upspin client, you need to sign up: https://upspin.io/doc/signup.md
Requests made by Upspin users are signed with the private keys that correspond to those public keys. Servers validate those users by checking the signatures against the public key published by key.upspin.io.
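As a toy illustration of the sign-then-verify flow: the client signs with a private key it never shares, and the server checks the signature against the public key it fetched from the key server. This sketch uses unpadded textbook RSA purely because Python's standard library has no signature primitives; it is insecure and is not the scheme Upspin uses. `KEY_SERVER`, `sign` and `verify` are hypothetical names.

```python
import hashlib

# Toy textbook-RSA key pair built from two known Mersenne primes.
# Illustration only: no padding, not secure, not Upspin's actual scheme.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

# Hypothetical key server: user name -> public key (modulus, exponent).
KEY_SERVER = {"ann@example.com": (N, E)}


def digest(message: bytes, modulus: int) -> int:
    """Hash the message and reduce it into the key's range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


def sign(message: bytes, private_exponent: int, modulus: int) -> int:
    """Client side: sign with the private key, which stays on the client."""
    return pow(digest(message, modulus), private_exponent, modulus)


def verify(user: str, message: bytes, signature: int) -> bool:
    """Server side: check the signature against the published public key."""
    modulus, exponent = KEY_SERVER[user]
    return pow(signature, exponent, modulus) == digest(message, modulus)


request = b"GET ann@example.com/dir/file"
sig = sign(request, D, N)
```

The server never needs the private key: `verify("ann@example.com", request, sig)` succeeds, while the same signature attached to a different request is rejected.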
There is an "all" permission that shares with the world (see https://upspin.io/doc/access_control.md) but in most cases we expect people will share with a specific set of named users.
It sounds like the project code is all published though, so it should be possible to operate a private or alternative upspin network.
> Does it mean that people don't work on it full-time and it's a 20% project?
There are a few of us paid full-time to work on this, plus some 20% contributors.
> Does it mean that it was developed at home without using Google's resources, so they don't have an ownership interest in it, but they are letting it be hosted on their infrastructure and announced on their blog to be nice?
No, this was developed by Google employees on Google's time.
> Does it mean that Google isn't sure how long it will want to continue to contribute resources?
I'm not sure what "Google" thinks about this, but I do know that for the foreseeable future this is what I and my other teammates will be working on.
> Does it mean that support is only available on an ad-hoc basis (isn't that the case with all free Google products)?
There is definitely no support for Upspin beyond us and the community helping people on the mailing list.
> Does it mean that serious customers shouldn't rely on it to work reliably?
As is stated in the README, Upspin is still very rough. It's early days.
> Does it not really mean anything, and just thrown on there for legal protection (against what?)?
I'm not sure what "unofficial" means in this case but I hope I have answered your questions sufficiently.
Okay, so since i wrote this disclaimer, and the policy that requires it, let me try to explain this simply.
Historically, people were discovering projects were made by googlers, etc, and thinking this somehow meant it was an official google thing that google was supporting (though, admittedly, i have no better idea of what that really means at heart than anyone else).
You'd even see tons of press stories about how Google had done x, y, or z, and worse, people would take it as a sign of strategy or best practices or whatever. IE "Google released j2objc, and thus thinks you should write your ios apps in java".
But Google hadn't done anything. A bunch of people who worked there had done something, and someone discovered they were googlers (usually. There were a number of cases of people trying to associate their project with google in order to try to gain publicity, etc, but this was significantly more rare)
So I got asked to go solve this problem, and tried to do so in the lowest effort (for everyone involved) and simplest way:
Anything that was not an official google product was marked as such. Now press, etc, can't claim they thought it was official anything :) People who look at it get the right impression, even if they don't know precisely what it means.
It raises a few questions sometimes (what does it mean to be official), but for over 5000 projects, i think the number of questions from people trying to understand what it means is "small".
In that regard, i believe doing this was a resounding success.
This was simple back in the day.
These days, there are projects, like upspin, that seem like they are somewhere in the middle (honestly, i haven't looked at all to determine it).
So maybe i'll reevaluate.
Almost as successful as me marking Chromium "copyright the chromium authors" (now copied everywhere), with nobody really understanding why that was done.
Edit: looking at the LICENSE file, scratch that, it appears Google doesn't even own the copyright.
In order to contribute you have to sign the standard Google Contributor License Agreement (either the individual or the corporate version) which gives Google a perpetual irrevocable copyright and patent license. Since the project is BSD licensed anyway, unless you're contributing something you intend to patent, you're giving Google nothing by signing the CLA: you still own the actual copyright. (Based on my skimming the agreement; IANAL)
How does my upspin file "email@example.com/pub/hello.jpg" solve the problem here? I would have a single source for my image to share but still no way to describe an image hosted by FB as an upspin address.
Aside from the content-addressed bit, it sounds to me like in many ways this idea is similar to OAuth or even SAML both in purpose and ambition, prescribing a standard way one can punch a delegated-access hole into restricted space. Upspin would then act like an overlay network to locate files, then hand the authorization decision down to the implementing system.
Back in the day, lots of people bought into OAuth because not doing so offered no competitive advantage, whereas adopting it resulted in an explosion of tools and integrations that greatly benefited all those services. The bet here seems to be that others would be tempted as well.
To describe the content, the idea was everyone would use microformats:
- It can be privacy preserving. Say you want to buy booze from an online bottle shop. In real life, you'd usually flash your ID to meet the over 18/21 legal requirement. But by doing this you're leaking a tonne of unnecessary information (your name, your address, even your date of birth). One implementation of an ABAC might be that you prove your identity to a mutually trusted third-party, who then asserts to the bottle shop that your attribute of 'over18' == True (this is a rough description of OpenID Connect). So the only thing the bottle shop learns about you is the only thing it needs to legally know.
- Flexibility. Say you're unemployed and receive welfare from the government. Under a roles-based system, they'd attach a role to your identity along the lines of 'is entitled to receive welfare'. However, if any aspect of government policy changes (e.g. an additional requirement that you have to be over 6 feet tall), they would have to audit each account and update the roles. Under an ABAC, they'd simply update the ABAC policy, and access control decisions would be made according to the new policy (assessed against your asserted attributes). It also means that access to new types of resources doesn't necessarily require new roles to be added to the system: it might already be covered by existing attributes within the system (i.e. attributes can be recombined in different ways to allow very finely grained access control without the overhead of maintaining an ever-growing list of roles).
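The two points above can be sketched in a few lines, assuming a policy is just a function over a dictionary of asserted attributes. The attribute names, the 183 cm threshold (roughly 6 feet) and the policy functions are made up for the example.

```python
from typing import Any, Callable, Dict

Attributes = Dict[str, Any]
Policy = Callable[[Attributes], bool]

# Attributes asserted about one person by a trusted third party (made-up values).
alice: Attributes = {"over18": True, "unemployed": True, "height_cm": 170}


def welfare_policy_v1(attrs: Attributes) -> bool:
    return bool(attrs.get("unemployed"))


def welfare_policy_v2(attrs: Attributes) -> bool:
    # The policy change from the example: now also requires being over 6 feet.
    return bool(attrs.get("unemployed")) and attrs.get("height_cm", 0) > 183


def bottle_shop_policy(attrs: Attributes) -> bool:
    # The shop only ever needs the single attribute it is legally required to check.
    return attrs.get("over18") is True


def allowed(policy: Policy, attrs: Attributes) -> bool:
    """Evaluate a policy against whatever attributes were asserted."""
    return policy(attrs)
```

Swapping `welfare_policy_v1` for `welfare_policy_v2` changes the decision for `alice` without touching her record, and the bottle shop can be handed `{"over18": True}` and nothing else.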
To me it sounds like roles are being replaced with a new set of derived attributes (isOver18, is6feetTall) that you'd still need to compute when the government policy changes. Maybe the main benefit would be the possibility of reuse but that could also be achieved with a sufficiently sophisticated role system.
The problem seems to be distributing the content, not the naming of it.
Edited to add: Ideally that's what the URL is for, it should be possible to get the URL of the image on FB and post it to twitter, since both speak DNS/HTTP.
No new tech is needed, only the will to open up the walled gardens.
By the way, this was sort of taking shape for a while with 3rd-party aggregators ("all your social media in one feed!"), but that approach just withered away with the growth of FB into an unstoppable juggernaut.
EDIT: Which isn't to say that Upspin doesn't have some very interesting uses, but the barrier of UGC walled gardens not wanting to open up in the first place makes this particular use-case moot.
(I guess it depends if Twitter would _then_ ingest the image to their servers or simply continue to reference the upspin url/uri publicly - which was the original use-case)
In particular, which email address should you use? Will the general public learn your email address? What about people who want to post publicly and keep their email addresses private?
What happens if you switch email providers?
Also, is there any provision for actually making public files available via https? How would that work?
For the moment, using your github username feels safer, since they are public from the beginning, intended to be permanent, and detached from your email addresses. A Twitter account is another intentionally-public userid that's not tied to anything else.
Yes, if you store firstname.lastname@example.org/dir/file and share the file path publicly then the valid email address "email@example.com" leaks out. If you own your own domain then you could have one or more email addresses only associated with Upspin and store at firstname.lastname@example.org/dir/file. The downside is the username in such paths is less recognizable to your friends than your regular email.
The reality today is anyone who uses any Google service or who has a Facebook page has an exposed @gmail.com or @facebook.com email address that can be spammed, and if he or she signs up for Upspin it's another way for it to leak out. You can use any email address that can receive the confirmation email; I'm not sure if email@example.com is a valid email address, and there's no email address with your Twitter handle.
> What happens if you switch email providers?
You would execute a new `upspin signup -dir=dir.example.com -store=store.example.com you@newEmailAddress.com` command, keeping the same directory and storage servers. The question is whether you can tell your directory server and storage server to reuse or alias your directory and storage to those of your previous email address.
Long-lived public files seem like a different use case, and one that is already generally handled okay by traditional hosting or file sharing.
(Should a link be shared or the data copied?)
Edit: I sound like a total fangirl :/
Plus, when I write Java I have to constantly specify new Exception types. I don't see anything wrong with having to do it for Go when I really want more or less data depending on circumstance.
It's worth noting /u/eneff's comment about the initial code though:
To me, the more interesting aspect is that the server has zero knowledge of the contents of the file, which makes it more comparable to SpiderOak One: https://spideroak.com/manual/send-files-to-others
Actually if you read the design document it sounds like they want you to provide some wrapper server to access, say, your photos on Google Photos through the Upspin protocol. Your photos would stay there, but authentication would be centralized on Upspin.
Andrew Gerrand worked with me on Camlistore too and is one of the Upspin authors.
The main difference I see is that Camlistore can model POSIX filesystems for backup and FUSE, but that's not its preferred view of the world. It is perfectly happy modeling a tweet or a "like" on its own, without any name in the world.
Upspin's data model is very much a traditional filesystem.
Also, upspin cared about the interop between different users from day 1 with keyservers etc, whereas for Camlistore that was not the primary design criteria. (We're only starting to work on that now in Camlistore).
But there is some similarity for sure, and Andrew knows both.
I think both Camlistore and Upspin have promising models.
You mean work is progressing on camlistore? That would make me a little bit happier : )
Camlistore aims to be your repo of all your stuff that you may want to share with other people at a later time. It wants to be the repository of all your life and everything that happens.
Upspin aims to be a unified protocol for all applications to access (and possibly modify) data, wherever it is: maybe it is your website on your server, maybe it's a NASA dataset on some S3 tenant, maybe it's an imgur gallery, maybe it's an OpenStreetMap dump of their DB; the goal AFAICS is to give any application access to data regardless of where and how it is stored, so that applications can do what they're best at.
This definitely does not look "easy to use" unless the target audience of families only includes families where all have technical backgrounds. If I gave anyone in my family that list of instructions they would stare at me blankly. There is a reason Dropbox does so well: it's simple and doesn't require you to do any manual fiddling.
From the readme: "Upspin has rough edges, and is not yet suitable for non-technical users."
And once this framework/API work is complete, people will be able to use this to create customer-friendly solutions.
Cue someone linking to that HN chain of messages about how Dropbox just won't work.
It sounds like it ought to be nigh-trivial to bolt IPFS support onto it, so that the file server is replaced by IPNS (can't be IPFS because Upspin is mutable). So, you look up firstname.lastname@example.org/fnord and the directory server hands you '<hash>/fnord', and the client library then goes and looks up the data via IPNS. No file servers required.
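A rough sketch of that split, assuming a mutable name table in front of an immutable content-addressed blob store. The two dictionaries are in-memory stand-ins (for the IPNS or directory lookup and for IPFS respectively), and `put` and `get` are hypothetical helpers rather than any real client API.

```python
import hashlib
from typing import Dict

blobs: Dict[str, bytes] = {}  # content-addressed store: hash -> bytes
names: Dict[str, str] = {}    # mutable pointers: path -> current hash


def put(path: str, data: bytes) -> str:
    """Store data under its own hash and repoint the name at it."""
    ref = hashlib.sha256(data).hexdigest()
    blobs[ref] = data
    names[path] = ref
    return ref


def get(path: str) -> bytes:
    """Resolve the name to a hash, then fetch the blob by that hash."""
    return blobs[names[path]]


v1 = put("ann@example.com/fnord", b"first version")
v2 = put("ann@example.com/fnord", b"second version")
```

The name now resolves to the second version, but `blobs[v1]` is still there for anyone who kept the old hash, which is the "can't take it back" property of content-addressed storage.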
Of course, this means that the only way of doing ACL enforcement is via encrypted data which is unlocked via ephemeral keys handed out by the keyserver, which as a solution is pretty terrible, but ACL enforcement in a decentralised system is practically impossible anyway.
For these reasons, I find it conceptually most akin to BitTorrent when combined with Magnet Links that are shared by strongly-authenticated identities than to any other distributed content store project.
Has any thought been given on that?
Yes it is. An Upspin file path like email@example.com/dir/file is not obviously part of any scheme. If Upspin takes off, all the automagic parsing of plain text (that makes firstname.lastname@example.org and news.ycombinator.com/formatdoc automatically turn into a clickable e-mail link and an HTTP URL) could be extended to turn email@example.com/dir/file into a link that fires up a GUI to the Upspin service. The problem is that the file path also looks like an HTTP/S URL for https://example.com/dir/file specifying the username ann, so copying it into your browser's location bar without a scheme is fundamentally ambiguous.
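The ambiguity is easy to show with Python's standard URL parser. The path `ann@example.com/dir/file` is an assumed example; the first reading treats everything before the first slash as the Upspin user name, the second is what a URL parser sees once a scheme is assumed.

```python
from urllib.parse import urlsplit

path = "ann@example.com/dir/file"

# Reading 1: an Upspin-style path, user name up to the first slash.
user, _, file_path = path.partition("/")

# Reading 2: what a URL parser sees once "https://" is assumed.
url = urlsplit("https://" + path)

print(user, "/" + file_path)                 # ann@example.com /dir/file
print(url.username, url.hostname, url.path)  # ann example.com /dir/file
```

Same string, two different meanings: in one the whole address is the identity, in the other `ann` is just login info for the host example.com.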
Even though Upspin uses HTTPS for its API (e.g. https://store.upspin.io/api/Store/Get/something), there doesn't seem to be an HTTPS URL for access to a particular file. Since firstname.lastname@example.org is by design just an email identifier disconnected from the directory server and storage server that ann uses to store her files, clients can't skip the root key server and talk to example.com to request ann's files, even though in some cases it will be the same server name.
More on https://godoc.org/github.com/upspin/upspin/upspin
Why can't we come up with something similar but using HTTPS and Public Key Infrastructure that I can host on my own server but still be interoperable?
One other downside: what if the owner of a shared file passes away, or a worker leaves the company, etc.? I guess if the file's that important, then I would have to have downloaded the file ahead of time, and hosted it on "my upspin"? Not saying this is an easy thing to solve for...but maybe ipfs at least addresses that...unless I'm missing something, and upspin does accommodate that use-case?
Well, there were many discussions about this topic. Ultimately non-tech users have to use the software that somebody has to make, market, maintain, etc., so for most users the upspin model simply pushes service providers from web browsers to apps where service providers are still very much in control, not users.
I think only protocols could give some control to the users for the longest time. They could enable competition among implementations essentially preventing implementations from taking roads that punish users. They would have to find different and user respecting ways to make money. But even then a big corporation with too much money could monopolize the whole market, "extend" protocols and drive all of the user respecting competition out of business or make them play by their rules. And we are back at square one, thinking about taking back control. Something in software has to change very profoundly to change this, like making software that doesn't need maintenance or at least needs so little of it that anyone could afford to do it for their entire life.
They do. If you run your own web or blog server you can provide resources at HTTP URLs forever. The server cost is dropping to zero either by running it on your home router or paying pennies/free for a small cloud instance. The biggest expense is owning and maintaining a domain name forever.
If Upspin takes off, you could provide resources at Upspin file paths forever without needing your own domain.
The problem is people like social sites, most of us now expect to be able to like and share and comment on resources. So although I will consider Upspin as a way to provide pictures to friends and family who aren't on Facebook, I'm unlikely to ever stop also putting them in the walled garden on Facebook for the upvotes. And building a social protocol in which Facebook will give a damn competing with other implementations seems vanishingly unlikely right now.
I've a 10 GB file I want to share with the world. But my bandwidth is the usual cable modem and I certainly can't take on 100,000 simultaneous downloads. Does the Upspin protocol enable peer-to-peer file sharing so I can share large files with the world while still on limited bandwidth?
The solution to the problem you describe is BitTorrent.
It's great to see that they try to think outside the box and actually innovate on something that the Dropbox engineers also could've (and now also "should have") thought of.
The left hand doesn't seem to know what the right hand is doing. Why not use another Google backed project, grpc, for their rpc stuff? I can understand not wanting extra dependencies, but then why depend on protobuf?
How many files per directory has this been tested on?
I work on a very similar system with similar goals (global file namespace which will return a list of physical file locations and a collection of software based on that), but for scientific data and we've been trying to scale to millions of files per directory. It'd be great to have a baseline.
Not a massive number. The current DirServer implementation will likely run into problems once the corpus reaches a certain size. But there can be more than one DirServer implementation, as long as it satisfies the interface. (https://godoc.org/upspin.io/upspin/#DirServer)
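To illustrate the "more than one implementation behind one interface" point, here is a toy sketch in Python. The `DirServer` class below is invented for the example and has only two made-up methods; it is not the real Go interface linked above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DirServer(ABC):
    """Toy interface; the method set is invented, not Upspin's real one."""

    @abstractmethod
    def put(self, path: str, ref: str) -> None: ...

    @abstractmethod
    def list_dir(self, directory: str) -> List[str]: ...


class InMemoryDir(DirServer):
    """Simple implementation: one flat dict, scanned on every listing."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def put(self, path: str, ref: str) -> None:
        self.entries[path] = ref

    def list_dir(self, directory: str) -> List[str]:
        parent = directory.rstrip("/")
        return sorted(p for p in self.entries if p.rsplit("/", 1)[0] == parent)


class ShardedDir(DirServer):
    """Alternative: one bucket per directory, so a listing never scans
    entries that belong to other directories."""

    def __init__(self) -> None:
        self.shards: Dict[str, Dict[str, str]] = {}

    def put(self, path: str, ref: str) -> None:
        parent = path.rsplit("/", 1)[0]
        self.shards.setdefault(parent, {})[path] = ref

    def list_dir(self, directory: str) -> List[str]:
        return sorted(self.shards.get(directory.rstrip("/"), {}))
```

Client code written against `DirServer` works with either class, so a deployment with millions of files per directory could swap in a backend designed for that scale.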
Curious how this would work implementation-wise. Since the DirServer knows the file size (or at least has block descriptors), how would a custom StoreServer for the camera send back a dynamic-sized block? Would you define one block in the DirServer with a max size and allow the StoreServer to respond with a frame of any size below that max? Does the client validate the block sizes equate between the DirServer and StoreServer? Would the DirServer have to be "in on" the camera implementation alongside StoreServer for any of this to work?
I'm not sure I understand the problem with URLs. "Copy share link" is a standard operation both on the web and in apps.
And, not that it's the main idea, but merging the file paths and an email address is something 'everyone' can understand. Twitter built a whole business around @somebody.
edit: Many potential uses are described in the overview. Read the whole thing. https://upspin.io/doc/overview.md
--the CERN team
I filed an issue: https://github.com/upspin/upspin/issues/209
Edit: google drive for example has a list of known hashes of pirate files and you can't share links to files matching those. I imagine something similar will be in effect here?
> google drive for example has a list of known hashes of pirate files and you can't share links to files matching those
It's actually impossible for us to know what people have stored. It's all encrypted.
BTW, if you can find a way to "guarantee" that you cannot share a copy of a movie, you should talk to the movie industry and make a billion dollars. :-)
I just want Google to be better - I want them making ethical software that respects people's rights, not havens for thieves.
If the MPAA and RIAA can't find a business model that works without restricting my freedom, maybe they should go down, and maybe your culture should die.
So... they had reinvented HTTP, and had thrown in some email-based authentication module, and formalized aeons-old /~username/ spec into /email@example.com/?
Anyone with the HTTP server can host files. Traditionally, .htpasswd or similar measures were used for access controls. This seems to be different only in a sense that it uses emails (found no docs and too lazy to read the code).
I've glanced over the repo and I don't get what all that code is for. A re-implementation of webdavfs that maps first-level directories (emails) to hosts? Or... what? Can anyone link to some docs?
One of the key elements of Upspin is that all content is signed and encrypted by the client. Sharing is totally in control of the content owners, and the servers need not be trusted at all.
Yes, with client-side signing it makes more sense.
So it's something similar to Keybase Filesystem, except distributed? That's a good thing, then.
It is definitely similar to Keybase Filesystem, and a few other things too. There are many axes for making trade-offs in this space, and I think Upspin's set of trade-offs is unique. We'll see if it works well for people.
Yet the article clearly said this is not a product, but a set of protocols. So I was wondering what the point is, if the protocols (HTTP, WebDAV) have already been there for a long time, and even have native support and can be easily mounted as local drives in most modern OSes.
I've missed the encryption and signing bits. Sorry about that.