
The Data Transfer Project - l2dy
https://datatransferproject.dev/
======
jedberg
I was shocked to see the list of partners involved (which is why I assume
their logos are big and bold and center).

I'm glad to see them participating, but I wonder what their motivation is? Is
it truly genuine?

I know for example that when we made the reddit API people questioned our
motives since "the data was everything" and "how can reddit be willing to make
the data so accessible", but we knew that in reality "the community is
everything".

So I really hope these companies are similarly motivated, knowing that it is
the community and their platform that are their true assets.

~~~
ballenf
> shocked to see the list of partners involved

Google and Microsoft?

Think of these companies' retail successes:

\- Google has basically zero retail services that are not designed to support
its ads business. Chromebook and Pixel devices are arguable exceptions, but as
hardware devices they are irrelevant to this proposal anyway.

\- MS has Surface, Windows and Xbox.

Google's moneymaker - Ads - wouldn't be affected by portability since it has
no retail customers. MS's products are equally impervious.

Facebook, on the other hand, would be existentially threatened if this
achieved ubiquity. If social media becomes federated, the monopoly rates they
charge advertisers would fall to competitive levels.

There is Microsoft's LinkedIn and it would be very interesting to be a fly on
the wall of the discussions between its execs and the MS execs overseeing this
data portability project. Either they don't take the project seriously or MS
thinks they may gain more imports into LinkedIn than exports. Or at least
enough to come out even.

~~~
golem14
<snark> I guess this is why Microsoft has a great portable format for word,
and moving from/to Word is a piece of cake. </snark>

~~~
int_19h
Word supports ODT.

------
ilovetux
I'm very suspicious of this. The most dangerous thing here, IMO, would be if
it were to allow these companies to share data among themselves, a data cartel
so to speak. Currently the website says (emphasis mine) "enabling a seamless,
direct, _user initiated_ portability of data". I worry that they might simply
remove the "user initiated" part after adoption hits critical mass. I'm now
following the development on github [0]...

[0] [https://github.com/google/data-transfer-
project](https://github.com/google/data-transfer-project)

~~~
bwillard
Howdy (I work on DTP),

I'd say suspicion is always warranted with things like this. If you know of
other services you would like to see data transferred to/from, please let us
know; we want this to be open to everyone, big and small, and are looking for
suggestions.

FWIW, the team building this at Google is the same team that builds Takeout,
so we've been trying to give users useful tools for leaving Google for a while
now. We think giving users the ability to directly move data to a new service
provider is the next evolution of the Takeout ethos of not locking users in.

~~~
ilovetux
Thanks for the reply. I've had some downtime today to look over the
documentation. It looks pretty solid so far, though I still have a way to go;
I'm actually planning on becoming active on github for your project. The whole
Java thing kind of irks me, but hey, I did fine with it in college.

How welcoming of contributors is the project?

~~~
bwillard
Super welcoming :).

Re: Java, yeah, it's on the roadmap that the adapters should be
language-agnostic. Forcing them all into one language, regardless of which it
is, is kind of lame.

Check out [https://github.com/google/data-transfer-
project/blob/master/...](https://github.com/google/data-transfer-
project/blob/master/Documentation/Developer.md) for ways to stay in contact
with us and start contributing.

~~~
ilovetux
Awesome! Thank you for taking the time to respond.

I will try to refrain from criticizing the project until I am more familiar.
The next time I interact with this project will be on github, my username
there is the same as it is here.

------
clavalle
Let me repeat this loudly so people in the back can hear:

TRANSPORT MECHANISMS ARE NOT DATA STANDARDS!

It is great that they want to build on REST and use common authentication
standards and what-not: but that's not the hard problem. The hard problem is
the data itself (including the critical structure of the data) and getting it
to agree.

I liken it to someone who wants to build a rail system out of LEGO and then
calls it a 'standard system' because the LEGO bricks are all the same... but
says nothing about the size of the tracks, the overhead clearance, or how the
trains should behave at junctions or in case of a possible collision.

This is yet another 'data transfer standard' that hand-waves away the hard
part -- the inter-compatible data model.
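To make the gap concrete, here is a tiny sketch (the services and field names
are hypothetical, assumed purely for illustration): both payloads travel over
the same REST+JSON transport, yet without a shared data model every pair of
services still needs a hand-written translation.

```python
# Two hypothetical services exposing "the same" contact over REST+JSON.
# The transport is identical; the data models are not.
service_a_payload = {"name": "Ada Lovelace", "phone": "+44 20 7946 0000"}

service_b_payload = {
    "givenName": "Ada",
    "familyName": "Lovelace",
    "phoneNumbers": [{"type": "work", "value": "+44 20 7946 0000"}],
}

def import_into_b(record: dict) -> dict:
    """The kind of ad-hoc translation every pair of services must
    hand-write until a shared data model (the hard part) exists."""
    # Naive split: already lossy for names with more than two parts.
    given, _, family = record["name"].partition(" ")
    return {
        "givenName": given,
        "familyName": family,
        # Service A doesn't say what kind of phone number this is.
        "phoneNumbers": [{"type": "other", "value": record["phone"]}],
    }
```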

~~~
red0point
Let me shout from the back so you can hear:

BUT THEY HAVE AN INTER-COMPATIBLE DATA MODEL?!

Or at least they're trying to create one. They talk about it in their
overview: [https://datatransferproject.dev/dtp-
overview.pdf](https://datatransferproject.dev/dtp-overview.pdf)

And the actual implementation of the model is on Github as well:

[https://github.com/google/data-transfer-
project/tree/master/...](https://github.com/google/data-transfer-
project/tree/master/portability-types-
transfer/src/main/java/org/datatransferproject/types/transfer/models)

~~~
edraferi
They have a couple Data Models:

\- Calendars [0] is a simple Event + Attendees model

\- Contacts [1] re-uses vCard

\- Mail [2] wants RFC 2822-compliant strings

\- Photos [3] have just basic metadata with a URL to fetch pixels

\- Tasks [4] is a simple to-do model

Overall the data model seems like a significant amount of wheel-reinvention.
These data models are all just JSON records... they should be JSON-LD with
@context pointing to a shared schema, probably defined at schema.org.

[0] [https://github.com/google/data-transfer-
project/tree/master/...](https://github.com/google/data-transfer-
project/tree/master/portability-types-
transfer/src/main/java/org/datatransferproject/types/transfer/models/calendar)

[1] [https://github.com/google/data-transfer-
project/blob/master/...](https://github.com/google/data-transfer-
project/blob/master/portability-types-
transfer/src/main/java/org/datatransferproject/types/transfer/models/contacts/ContactsModelWrapper.java)

[2] [https://github.com/google/data-transfer-
project/tree/master/...](https://github.com/google/data-transfer-
project/tree/master/portability-types-
transfer/src/main/java/org/datatransferproject/types/transfer/models/mail)

[3] [https://github.com/google/data-transfer-
project/tree/master/...](https://github.com/google/data-transfer-
project/tree/master/portability-types-
transfer/src/main/java/org/datatransferproject/types/transfer/models/photos)

[4] [https://github.com/google/data-transfer-
project/tree/master/...](https://github.com/google/data-transfer-
project/tree/master/portability-types-
transfer/src/main/java/org/datatransferproject/types/transfer/models/tasks)
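The JSON-LD suggestion could look something like this sketch (the field names
and the schema.org mapping are my assumptions, not DTP's actual model): an
@context maps local field names onto shared schema.org terms, so a consumer
that understands schema.org/Person needs no DTP-specific code.

```python
import json

# Hypothetical flat record, roughly the shape of DTP's JSON models.
plain_contact = {
    "fullName": "Ada Lovelace",
    "emailAddress": "ada@example.com",
}

# The same record as JSON-LD: @context maps the local field names
# onto shared schema.org vocabulary terms.
jsonld_contact = {
    "@context": {
        "@vocab": "http://schema.org/",
        "fullName": "name",
        "emailAddress": "email",
    },
    "@type": "Person",
    **plain_contact,
}

print(json.dumps(jsonld_contact, indent=2))
```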

------
corobo
This is how I find out I've still got *.dev pointing to localhost... somewhere

~~~
pvinis
I think all .dev is owned by Google, right? I guess they are using that since
they are a contributor on DTP.

~~~
corobo
Oh yeah, it is. I just meant that for my local projects I used to use .dev,
and I still had *.dev pointing to 127.0.0.1, but not on my local machine.

It turns out I'd also set it up in my network's DNS server at some point.

------
koolba
The linked PDF has more info on what this is about:
[https://datatransferproject.dev/dtp-
overview.pdf](https://datatransferproject.dev/dtp-overview.pdf)

The use cases and data model sections are particularly interesting. My initial
thoughts are that this is wildly unrealistic and will lead to the usual N+1
data formats issue that plagues other multi-company initiatives.

------
lkoolma
GitHub: [https://github.com/google/data-transfer-
project](https://github.com/google/data-transfer-project)

Several other companies are working on it in various forms. Disclaimer: I am
working for one of those companies.

~~~
arxpoetica
What's your company?

------
flying_sheep
The intention of this project is great, but I doubt it can work well. The
whole project depends on the assumption that everything to be synchronized can
be standardized, and that is almost impossible.

Take "Date of Birth". Some contact providers support a DOB without the year,
but some support only the whole date. If the companies cannot even agree on
this tiny little item, how can they agree on larger deviations? Like the
maximum number of phone numbers in a contact profile, the size/format of the
profile image, the address format, and so on.

~~~
rdiddly
To me it doesn't seem like the rocket science you're making it out to be. Most
teams store data in non-silly ways that correspond to the common reality the
data represent. For example, everybody knows what a date of birth is, and that
it has a day, month and year. If the year is missing from the source data, you
set it to null in your intermediary data model. If it's required in the
destination data, the basic solution would be to choose a default and notify
the user; the deluxe solution would be to give them some choices up front
about what to do (since you'll already know about the mismatch based on
knowing the source & destination providers).

Max number of phone numbers: if the source contains more than the
destination's max, stop at the max (basic) or display a list and ask the user
to eliminate some (deluxe).
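The approach above can be sketched as a minimal intermediary model (the names
and the default year are illustrative, not DTP's actual types): the year is
nullable, and the mapping to a stricter destination fills in a default and
flags the record for the user.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DateOfBirth:
    """Intermediary model: year is optional, so sources that only
    store day/month can still be represented without loss."""
    day: int
    month: int
    year: Optional[int] = None

def to_full_date(dob: DateOfBirth, default_year: int = 1904) -> dict:
    """Map to a destination that requires a complete date: the 'basic'
    solution of choosing a default and notifying the user."""
    return {
        "day": dob.day,
        "month": dob.month,
        "year": dob.year if dob.year is not None else default_year,
        # Flag lets the UI notify the user about the filled-in default.
        "needs_review": dob.year is None,
    }
```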

------
politician
> A user’s decision to move data to another service should not result in any
> loss of transparency or control over that data.

> It is worth noting that the Data Transfer Project doesn’t include any
> automated deletion architecture. Once a user has verified that the desired
> data is migrated, they would have to delete their data from their original
> service using that service’s deletion tool if they wanted the data deleted.

This project has copy, not move semantics. Therefore, in contrast to the
stated purpose of allowing users to control their data, it actually has the
opposite consequence of making it simpler to spread users' data around.
Without a delete capability, the bias is towards multiple copies of user data.

By establishing this as an open-source effort, the project normalizes using
existing APIs to export data from non-participating services, which project
partners benefit from asymmetrically. In other words, API providers that do
not offer export tools will nonetheless be subject to DTP adapters that
exfiltrate data and send it to the (no doubt excellent) DTP importers
maintained by DTP partners. This has the effect of creating a partial vacuum,
sucking data from non-participants into participants' systems.

The economics of maintaining a high-volume bidirectional synchronization
pipeline between DTP partners guarantees that these toy DTP adapters will not
be the technology used to copy data between DTP partners, but rather, a
dedicated pathway will be established instead. In other words, the public
open-source DTP effort could be understood as a facade designed to create a
plausible reason for why DTP partners have cross-connected their systems.

TLDR:

\- Copy semantics are counterproductive to the goal of providing user control
of their data.

\- The approach of using existing APIs to scrape data from non-participating
vendors is a priori hostile.

\- Economics dictate that the lowest-cost option for providing bidirectional
synchronization between vendors involves dedicated links and specialized
transport schemes that the DTP project itself does not provide.

There is some merit to providing abstract representations of common data
formats -- look at EDI, for instance. I'd welcome someone from the project
stopping by to explain away my concerns.

~~~
bwillard
Howdy (I work on DTP),

I wanted to provide my thinking on some of these very valid worries.

Re: Copy vs. Move: This was a conscious choice that I think has solid backing
in two things:

1) In our user research for Takeout, the majority of users who use Takeout
don't do it to leave Google. We suspect that the same will be true for DTP:
users will want to try out a new service, or use a complementary service,
instead of a replacement.

2) Users should absolutely be able to delete their data once they copy it.
However, we think that separating the two is better for the user. For
instance, you want to make sure the user has a chance to verify the fidelity
of the data at the destination. It would be terrible if a user ported their
photos to a new provider and the new provider down-sampled them and the
originals were automatically deleted.

Re: Scraping: It's true that DTP can use the APIs of companies that aren't
formally 'participating' in DTP. But we don't do it by scraping their UIs; we
do it like any other app developer, by asking for an API key, which that
service is free to decline to give. One of the foundational principles we
cover in the white paper is that the source service maintains control over
who, how, and when it gives data out via its API. So if they aren't interested
in their data being used via DTP, that is absolutely their choice.

Re: Economics: As with all forward-looking statements, we'll have to wait and
see how it works out. But I'll give one anecdote on why I don't think this
will happen. Google Takeout (which I also work on) allows users to export
their data to OneDrive, DropBox, and Box (as well as Google Drive). One of the
reasons we wanted to make DTP is that we were tired of dealing with other
people's APIs, as it doesn't scale well: Google should build adapters for
Google, and Microsoft should build adapters for Microsoft. With Takeout we
tried the specialized transport method, but it was a lot of work, so we went
with the DTP approach specifically to avoid having specialized transports.

DTP is still in the early phases, and I would encourage you, and everyone
else, to get involved in the project ([https://github.com/google/data-
transfer-project](https://github.com/google/data-transfer-project)) and help
shape the direction of the project.

~~~
politician
Hey! Thanks for the response. If you don't mind, I have some questions and
comments after reading through your feedback.

> We suspect that [the majority of users who use Takeout don't do it to leave
> Google] will be true for DTP, users will want to try out a new service, or
> use a complementary service, instead of a replacement.

Interesting, thanks. I think this sort of worldview makes sense from a certain
perspective.

> 2) Users should absolutely be able to delete their data once they copy it.

This is an aspirational statement and not a requirement of DTP, so it's
problematic from a public perception standpoint to make the claim that DTP
provides the user with more control of their data when the control very much
remains at the mercy of the data controller. Indeed, this project directly
facilitates the opportunity for more data controllers to obtain copies of the
subject's data.

> If they aren't interested in their data being used via DTP, that is
> absolutely their choice.

Can you clarify whether you are saying that the DTP Project will honor
takedown requests from parties targeted by DTP tooling?

> Google should build adapters for Google, and Microsoft should build adapters
> for Microsoft.

Can you explain the business drivers that incentivize these companies to
provide parity between their import and export capabilities? Does the DTP
Project require parity between these capabilities?

~~~
bwillard
> This is an aspirational statement and not a requirement of DTP, so it's
> problematic from a public perception standpoint to make the claim that DTP
> provides the user with more control of their data when the control very much
> remains at the mercy of the data controller. Indeed, this project directly
> facilitates the opportunity for more data controllers to obtain copies of
> the subject's data.

I don't really disagree with what you said, but I interpret things differently:

Without DTP, if you ask a data controller to delete your data, you have to
trust that they do. There is very little a user can do to verify that the
deletion actually happened; you more or less need to rely on the reputation of
the company. Nowadays they all should have published retention statements
which state their deletion practices in more detail, so that helps some, and
allows for some recourse if in fact they aren't following them. But in
general, for the average user, it comes down mostly to trust.

With DTP, nothing is worse, but users can now get their data into a new
service more easily.

If DTP had move semantics you still have the same problem as above, it mostly
comes down to trust.

It is true that after a copy there are now two copies of the data, which isn't
ideal in terms of data minimization. But because of the reasons I outline
previously, I think it is important to keep deletion as a separate action from
copy. I do think that after a copy the option to delete the data should be
presented to the user prominently to make that as easy as possible if that is
what they want to do.

So DTP isn't trying to solve every problem, but my take is that it makes some
things better without making anything else significantly worse, so it's a net
win.

> Can you clarify whether you are saying that the DTP Project will honor
> takedown requests from parties targeted by DTP tooling?

DTP doesn't really store data, so I don't think it is in scope for a
traditional takedown request. But, more to the spirit of the question: yes, if
a service doesn't want to grant a DTP host an API key, or revokes one, we
wouldn't condone trying to work around that.

(One super detailed note, DTP is just an open source project, and doesn't
operate any production code. A Hosting Entity can download/run the code. A
Hosting Entity could be a company letting users transfer data in or out, or a
user running DTP locally. Each Hosting Entity is responsible for acquiring API
keys for all the services they want to interact with; including agreeing to
and complying with any restrictions that that service might impose for access
to their API.)

> Can you explain the business drivers that incentivize these companies to
> provide parity between their import and export capabilities? Does the DTP
> Project require parity between these capabilities?

This is a little bit of a bet on our part. I think Google has demonstrated,
through its almost decade-long investment in Takeout, that giving users more
control over their data leads to greater user trust, and that is good for
business.

As for requiring parity, we cover this a bit in the white paper, but as you
say, we recognize that reciprocity is key, and we need to incentivize services
to invest equally in import and export, otherwise the whole thing falls apart.

Right now the stance we are taking is that reciprocity is _strongly_
encouraged, and we will be collecting stats/metrics to try to measure it so we
can name and shame folks that aren't following it. We hope that providing
transparency around different services' practices in this area will allow
users to make informed decisions about where to store their data.

An interesting thought experiment in this area: if a user wants to transfer
data from service A to service B, but service B doesn't allow export back out,
what should service A do? Ideally you force service B to support export, but
on the other hand, the user should be in control, and who is service A to say
no? It's almost pitting the good of an individual user against the good of the
ecosystem.

We are hoping that as the project, and the larger portability ecosystem,
evolves, some kind of neutral governance model emerges that can help mediate
some of these issues. It is problematic for service A to decide that question;
a neutral group representing the interests of users will have more legitimacy
in making these tough calls.

~~~
politician
Thanks for taking the time to provide these detailed follow ups. I'm still
pretty wary of this project, but you've demonstrated that at least one person
on the team is thinking through some of this stuff.

> An interesting thought experiment in this area is that if a user wants to
> transfer data from service A to service B, but service B doesn't allow
> export back out, what should service A do? Ideally you force service B to
> support export, but on the other hand the user should be in control, and who
> is service A to say no. Its almost putting the good of an individual user
> against the good of the ecosystem.

I'll offer that the European Union's answer to this -- the GDPR -- is to put
the data subject first. It would be nice to see the DTP Project align with
that position.

------
Bedon292
What would this actually gain me? Say I transfer my Twitter data to Facebook:
what happens? Does it create new posts with all my tweets, or what? Why would
I bother? It would be nice to have a simple chart showing what it actually
does.

------
brunoluiz
It would be nice to have Apple on board with this project. As an iPhone user I
am thinking of using iCloud, BUT if I migrate to an Android phone I will have
to migrate all my data manually to Google Drive or OneDrive.

------
dyarosla
It’s an interesting idea, and at the same time strange that some of the
companies listed (eg facebook) would be open to contributing to this... esp
given that for them data is their business model.

~~~
CraigRood
I don't think it's strange at all; if anything, it gives them access to data
they wouldn't have otherwise.

~~~
dyarosla
But also shares THEIR data they’ve collected with competing companies for
free. Not to mention potentially lowering switching costs for users looking to
move away from their platform.

------
wmf
Data portability is interesting for private data (like the example of photos
given in the white paper) but I don't think it's useful for social data. It
doesn't solve privacy problems; in fact it would give more companies access to
your data. It doesn't solve monopoly problems; when you port data from
Facebook to Google you still have a Facebook account.

------
j88439h84
Is this a competitor to Segment.io?

------
gant
oh look who's once again abusing their monopoly over .dev

~~~
scrollaway
Ok, first of all, how is it a monopoly if you just bought the thing? That's
like saying facebook has a monopoly over the domain name facebook.com.

Second, how is _using it_... _abusing_ it?

~~~
ocdtrekkie
There's a reasonable argument to be made that ICANN should not have sold an
extremely desirable TLD like .dev to a company intending to use it internally
rather than permitting open sale.

Not to mention that it was well-known to be commonly used for internal domains
at other organizations, and they then intentionally acted to break everyone's
internal workflows on a global scale. It'd be kind of like if ICANN went and
sold .local (which Windows servers default to as internal domains.) The other
correct option for ICANN was just to refuse to sell .dev and mark it as an
internal use only TLD.

