

My proposal for a distributed social networking protocol - theli0nheart
http://dlo.github.com/2010/05/07/openprovider-a-distributed-social-protocol.html

======
TomOfTTB
The flaw in your proposal is that the problem here really isn't a technical
one. Your argument seems to be "once you can export your connections that
solves all other problems because companies will have to listen to you". But
that's not true.

Right now you could easily create an export system using the e-mail addresses
of your connections. But once you do that, you can't interact with Facebook
members.

That's the problem with Facebook, and its root is a business reality, not a
technical issue. Facebook and other social networks realize they lose their
greatest (and maybe only) customer-retention mechanism if they become
interoperable. Once that happens, they become a cheap commodity (much like
e-mail servers are today).

So becoming interoperable means the end of companies like Facebook being
major players in the industry and they aren't going to do that willingly. THAT
is the problem.

~~~
arethuza
That is true, but I'm pretty sure back in the old days there were multiple
proprietary email systems that didn't interoperate, and they all eventually
ended up supporting SMTP.

------
chrischen
Email is already distributed. And two people are linked together simply by
knowing each other's email addresses. Might as well build on top of that.

~~~
Qz
So what differentiates 'social networking' from email? Anything besides making
some/all connections/messages automatically public?

~~~
alanh
Something I have found interesting and useful about Facebook is that it
outlasts email changes as a way for people to stay in touch, and connects
friends who don't even know each other's email addresses.

------
arethuza
"That value you see is not reversible"

Strictly speaking, that is true. But with a brute-force dictionary attack, all
you do is generate likely email addresses (which are often very easy to
predict), apply the same hashing process (which has to be public), and look
the result up in your local copy of the database.
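To make the attack concrete, here is a minimal sketch. It assumes the public hashing step is SHA-1 over the lowercased, trimmed address; the table contents, `hash_email`, and `candidate_addresses` are hypothetical stand-ins, not part of the proposal:

```python
import hashlib

def hash_email(email: str) -> str:
    # Assumed public hashing step: SHA-1 of the lowercased, trimmed address.
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()

# Hypothetical local copy of the published table: hash -> (id, provider).
published = {
    hash_email("alice.smith@example.com"): ("alice42", "provider-a.example"),
    hash_email("bob@example.org"): ("bob7", "provider-b.example"),
}

def candidate_addresses(first_names, last_names, domains):
    # Generate likely addresses from common naming patterns.
    for first in first_names:
        for last in last_names:
            for domain in domains:
                for local in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}", first):
                    yield f"{local}@{domain}"

def dictionary_attack(table, candidates):
    # Hash each guess with the same public function and look it up locally.
    found = {}
    for email in candidates:
        entry = table.get(hash_email(email))
        if entry is not None:
            found[email] = entry
    return found

hits = dictionary_attack(
    published,
    candidate_addresses(["alice", "bob"], ["smith", "jones"], ["example.com", "example.org"]),
)
print(hits)
```

Both entries are recovered from a handful of guesses; the hash only hides addresses nobody would think to guess.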

~~~
theli0nheart
True, but why would a spammer do that if he already had the address to begin
with?

~~~
arethuza
Actually, I just realized this would be a horrible thing as it provides a
means of checking the validity of email addresses, presumably a spammer's delight.

~~~
theli0nheart
This could be easily solved if we used some field such as first_name as a
salt. Thoughts?

~~~
arethuza
That makes things a bit better, but a lot of first names are embedded in email
addresses...
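A rough sketch of why a first_name salt helps only a little: the attacker guesses the salt from the address itself. It assumes the salted input is the lowercased email concatenated with the lowercased first name; `salted_hash` and `guess_first_names` are hypothetical helpers:

```python
import hashlib

def salted_hash(email: str, first_name: str) -> str:
    # Assumed variant: SHA-1 over email + first_name (per-user but guessable salt).
    data = (email.strip().lower() + first_name.strip().lower()).encode("utf-8")
    return hashlib.sha1(data).hexdigest()

# Hypothetical published table built with the salted hash.
published = {salted_hash("alice.smith@example.com", "Alice"): ("alice42", "provider-a.example")}

def guess_first_names(email: str):
    # Guess the salt from the local part, e.g. "alice.smith" -> "alice".
    local = email.split("@", 1)[0]
    for sep in (".", "_", "-"):
        if sep in local:
            yield local.split(sep, 1)[0]
    yield local

def attack(table, candidates):
    found = {}
    for email in candidates:
        for first in guess_first_names(email):
            entry = table.get(salted_hash(email, first))
            if entry is not None:
                found[email] = entry
                break
    return found

result = attack(published, ["alice.smith@example.com", "carol@example.com"])
print(result)
```

The salt multiplies the attacker's work by the number of first-name guesses per address, which is small when the name is embedded in the address.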

------
underscore
I like the protocol-agnostic (and locally-stored) connection information, but
I don't like the fact that I (I'm assuming) have to manually synchronize them
between my devices -- if I understand right, if I friended someone on my
phone, I'd have to somehow synchronize that information to my laptop if I
wanted to use it there. Probably the provider would keep that information
locally, and then prompt my browser to accept a new copy -- I know I wouldn't
want to deal with anything much more fiddly than that. Is there a compelling
reason to have that (edit: where "that" is the connection information stored
in the browser, as suggested in the article when I wrote this) instead of a
defined-in-the-protocol way of on-demand data export?

I guess the hash is just to fool spammers? It seems to add a fair amount of
complication to the protocol to address just that one use case.

How do you define privacy? How does your protocol protect it?

edit: clarify clumsy wording

~~~
theli0nheart
Regarding your first point, yes, I agree it doesn't make much sense to
manually synchronize between devices. Providers would probably need to store
that information to begin with, and you would have the option to export that
information at your leisure.

Yeah, that's the primary use of the hash. It does add a bit of complication,
but I think it's necessary for widespread adoption.

Privacy is provider-specific, so it's not up to the protocol to say what
should and shouldn't be private. It's up to the provider.

~~~
underscore
I assume that your centralized repository of (email, id, provider) tuples
wouldn't just spill out all of its contents to the world, preventing mass
email harvesting. Instead, it would reply with the appropriate (email, id,
provider) tuple when asked for an email (hashed or not) that it knows about,
and return nothing or an error or whatever when asked about an email that it
doesn't know about. In either case, as arethuza points out elsewhere in this
thread, it is essentially an email address validity oracle for emails that it
knows about -- hashed or not -- since a spammer can sit there all day and ask
it about random email addresses and record its responses. If you hash an email
address, the only thing you buy is to make the spammer add a line of code to
their script. In short, I think that using a hash is functionally equivalent
to not using a hash, at least for spam purposes. What do you think?
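The "one extra line" argument can be sketched like this. The `Repository` class is a hypothetical stand-in for the central (email, id, provider) store: it never lists its contents and only answers single lookups by hash (SHA-1 of the lowercased address is assumed):

```python
import hashlib
from typing import Optional, Tuple

def hash_email(email: str) -> str:
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()

class Repository:
    """Hypothetical central store of (email hash, id, provider) tuples.
    It never dumps its contents; it only answers one lookup at a time."""

    def __init__(self, entries):
        self._by_hash = {hash_email(e): (i, p) for e, i, p in entries}

    def lookup(self, email_hash: str) -> Optional[Tuple[str, str]]:
        return self._by_hash.get(email_hash)

repo = Repository([("alice@example.com", "alice42", "provider-a.example")])

def probe(repo: Repository, guesses):
    valid = []
    for email in guesses:
        # Hashing costs the spammer exactly this one call per guess.
        if repo.lookup(hash_email(email)) is not None:
            valid.append(email)
    return valid

valid = probe(repo, ["alice@example.com", "nobody@example.com"])
print(valid)
```

Rate limiting or authenticated queries would be the kind of measure that actually changes this picture; the hash alone does not.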

(note that a partial email validity oracle isn't necessarily a deal killer;
PGP/GPG keyservers, for example, are partial email validity oracles, and have
not only existed for years but have also gained acceptance amongst (some of)
the security conscious and the paranoid)

(also note that a hash would be a safeguard (whose effectiveness is dependent
on how resistant the hash function is to first preimage attacks and other
factors) against disclosure following a compromise of the central
database/authority, in addition to its questionable utility at fighting spam
in normal situations)

~~~
theli0nheart
What if a dynamic salt were added to the hash, i.e., instead of hashing just
the email, we hashed email+first_name? I think that would solve most of the
email harvesting issues that have been brought up.

------
petercooper
I never got into it when it had a brief spell of popularity, but hasn't
similar stuff been attempted with FOAF?
<http://en.wikipedia.org/wiki/FOAF_(software)>

------
Qz
What's the point of the id? Coordinating this across multiple providers seems
like a big headache.

~~~
warfangle
What if the ID were similar to a domain name? Domain names stay the same, but
the IP address (in this case, the SHA-1 of the email address) can change.

What if the DNS model were taken in order to find which social provider a
given user was on? That combined with a trust metric (automatically gleaned?)
between providers would go a long way, I think...
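One way to picture the DNS analogy is a sketch with DNS-style delegation: a root resolver hands each ID off to the provider that is authoritative for it, and that provider holds the current email hash. The ID format `label.provider` and all class names are hypothetical illustrations, not anything the thread specifies:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

def hash_email(email: str) -> str:
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()

@dataclass
class ProviderZone:
    """A provider's authoritative records: user label -> current email hash."""
    name: str
    records: Dict[str, str] = field(default_factory=dict)

@dataclass
class RootResolver:
    """Delegates each provider suffix to that provider's zone, like DNS NS records."""
    zones: Dict[str, ProviderZone] = field(default_factory=dict)

    def resolve(self, user_id: str) -> Optional[Tuple[str, str]]:
        # Hypothetical ID format "label.provider", e.g. "alice.provider-a.example".
        label, _, provider = user_id.partition(".")
        zone = self.zones.get(provider)
        if zone is None or label not in zone.records:
            return None
        return provider, zone.records[label]

zone = ProviderZone("provider-a.example", {"alice": hash_email("alice@example.com")})
root = RootResolver({zone.name: zone})
print(root.resolve("alice.provider-a.example"))

# Alice changes her address: the ID stays the same, only the record changes.
zone.records["alice"] = hash_email("alice@newmail.example")
print(root.resolve("alice.provider-a.example"))
```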

~~~
theli0nheart
Sounds like a pretty interesting idea...can you elaborate a bit more?

~~~
warfangle
Essentially, social platform providers would be hosts to their own users and
each user's respective social graph. Nodes in that graph can be native to the
platform, or external.

External nodes/users can decide how much they want to share with external
providers, on a provider-by-provider basis.

Each provider, thus, would have a record of how each of its hosted nodes
perceives each external provider, trust-wise.

To find a new external node, a node would use that contact's email address.
The provider would send a request to all providers it
has on record. Depending on how the requestee perceives the requester, he can
either ignore or answer the query. If he "knows" the email hash, he can prompt
that node/user for permission to send data to the requester - and what level
of data to send. If the requestee does not know the hash, the request
recursively travels throughout the social platform providers.
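The lookup described above can be sketched as a recursive search. This is an illustration under stated assumptions: the stored per-requester share level stands in for prompting the user, `ignored` stands in for the provider's trust judgment, and the `visited` set plus `ttl` hop limit are additions (not in the description) to keep the recursion from looping or flooding forever:

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple

def hash_email(email: str) -> str:
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()

class Provider:
    def __init__(self, name: str):
        self.name = name
        self.peers: List["Provider"] = []   # providers this one has on record
        self.ignored: Set[str] = set()      # requesters this provider distrusts
        # Hosted users: email hash -> {requester name: share level the user granted}.
        self.users: Dict[str, Dict[str, str]] = {}

    def handle(self, email_hash: str, requester: str,
               visited: Set[str], ttl: int) -> Optional[Tuple[str, str]]:
        if self.name in visited or ttl <= 0 or requester in self.ignored:
            return None
        visited.add(self.name)
        if email_hash in self.users:
            # Stand-in for asking the user for permission and a share level.
            level = self.users[email_hash].get(requester)
            return (self.name, level) if level else None
        # Unknown hash: pass the request on to this provider's peers.
        for peer in self.peers:
            found = peer.handle(email_hash, requester, visited, ttl - 1)
            if found is not None:
                return found
        return None

def search(origin: Provider, email: str, ttl: int = 5) -> Optional[Tuple[str, str]]:
    visited = {origin.name}
    h = hash_email(email)
    for peer in origin.peers:
        found = peer.handle(h, origin.name, visited, ttl)
        if found is not None:
            return found
    return None

a, b, c = Provider("a.example"), Provider("b.example"), Provider("c.example")
a.peers = [b]
b.peers = [c, a]
c.users[hash_email("carol@example.com")] = {"a.example": "profile"}
c.users[hash_email("dave@example.com")] = {}  # Dave shares nothing with a.example

print(search(a, "carol@example.com"))
print(search(a, "dave@example.com"))
```

Note that an ignoring provider also stops forwarding, so a distrusted requester cannot route around it through that provider's peers.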

One of the necessary requirements to being a part of this kind of distributed
social network, of course, would be the ability for a user to download their
social graph (nodes of email hashes connected to them) and easily port it to
another provider. Sort of like exporting your email box contents to bring to
another provider.
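A portable export could be as simple as a JSON document of email hashes and their last known providers. The format name, field names, and helpers below are hypothetical; the point is only that raw addresses never have to leave the old provider:

```python
import hashlib
import json
from typing import Dict

def hash_email(email: str) -> str:
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()

def export_graph(owner_email: str, contacts: Dict[str, str]) -> str:
    """contacts: contact email -> provider currently hosting that contact."""
    doc = {
        "format": "social-graph-export",  # hypothetical format name
        "version": 1,
        "owner": hash_email(owner_email),
        "connections": [
            {"email_hash": hash_email(e), "provider": p}
            for e, p in sorted(contacts.items())
        ],
    }
    return json.dumps(doc, indent=2)

def import_graph(text: str) -> Dict[str, str]:
    """Returns email hash -> last known provider, for the new provider to re-resolve."""
    doc = json.loads(text)
    if doc.get("format") != "social-graph-export" or doc.get("version") != 1:
        raise ValueError("unsupported export format")
    return {c["email_hash"]: c["provider"] for c in doc["connections"]}

dump = export_graph("me@example.com", {"carol@example.com": "c.example", "bob@example.org": "b.example"})
graph = import_graph(dump)
```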

If we throw out the idea of users having unique IDs other than their email
address, it would be possible to have different social networks on different
providers - e.g., business from one provider and family on another. If a user
wanted to, they could also link several email addresses to the
same account on one provider.

This kind of scheme would have the qualities of being:

* Robust: any platform that follows the protocol can participate; potentially, if a platform goes AWOL and the nodes within it had sufficient transparency to nodes outside of it, any other provider could rebuild their social graph for them.

* Trust: based on how the nodes on a given provider dole out their social graph / user information to other providers, that provider can gauge how much its users trust each of those other providers. It can pass this metric on to other providers who request it.

* Portability: need I say more? :)
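The trust metric in the list above could be derived roughly as follows: average, per external provider, how much the hosted users chose to share with it. The level names and their 0-to-1 scores are hypothetical choices for illustration:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict

# Hypothetical share levels, scored from 0 (nothing) to 1 (everything).
LEVEL_SCORE = {"none": 0.0, "basic": 1 / 3, "profile": 2 / 3, "full": 1.0}

def provider_trust(user_settings: Dict[str, Dict[str, str]]) -> Dict[str, float]:
    """user_settings: hosted user -> {external provider: share level}.
    Returns the mean score per external provider over users who set a level."""
    scores = defaultdict(list)
    for per_provider in user_settings.values():
        for provider, level in per_provider.items():
            scores[provider].append(LEVEL_SCORE[level])
    return {provider: mean(vals) for provider, vals in scores.items()}

settings = {
    "alice": {"b.example": "full", "c.example": "none"},
    "bob": {"b.example": "basic", "c.example": "none"},
    "carol": {"b.example": "profile"},
}
trust = provider_trust(settings)
print(trust)
```

A plain mean is the simplest choice; a real metric would probably need to weight by how many users have an opinion at all.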

------
xtacy
You might find more information in the research community's current work on
such architectures: <http://prpl.stanford.edu/> is one such project.

