
Docracy Terms of Service Tracker - matt2000
https://www.docracy.com/tos/changes
======
dmethvin
Buy.com is now Rakuten.com? Talk about choosing the wrong domain name to
consolidate under!

~~~
binarymax
Indeed. Having recently sat in an Executive style naming competition meeting,
I can confirm the state of multinational corporate decisions involving brands
and names is so intensely bureaucratic and complex that the rationale behind
the decision is far removed from common sense. I imagine the process involved
dozens of executives and at least two outsourced firms, marketing managers,
and brand evangelists. Charts were produced and powerpoints created, showing
naming relevance and cultural sensitivities. SEO experts formulated brand
uniqueness prospects for virality and search, self declared linguists gauged
locality based pronunciation effectiveness. Translators estimated ease of use
and consistencies globally. In the end it's obvious to them that they made the
correct decision.

~~~
willis77
"We need to ditch this short, easily typed, prescriptive domain name into
something that sounds like a combo move in Street Fighter."

~~~
minikomi
More like 'Rakuten means “positive spirit.” The name Rakuten Ichiba literally
means a “market of positive spirit,” where shopping is entertainment. You
see?'

------
jawns
While documenting small changes probably is OK, I would think that reproducing
long passages of a site's TOS, even in the context of documenting changes as a
public service, could run into copyright-infringement problems.

~~~
uvdiv
Yes, some of these TOS have clear copyright notices. Are these valid?

~~~
_delirium
The copyright notices are valid, but fair-use exceptions still apply, and
fair-use tends to be read fairly broadly with this kind of document, compared
to, say, lengthy excerpts from a novel. I'm not aware if there's any solid
caselaw specifically on quoting excerpts from contracts or licenses, though.

~~~
wisty
IANAL, but I think a big factor would be whether it acts as a substitute.
Since they aren't being used as a TOS on docracy, then it shouldn't be an
issue. If you copy a TOS and use it on your own site, then it could be a
problem.

------
DannyBee
Looks nice, but slightly buggy

1\. <https://www.docracy.com/doc/versions?docId=0b0kbmmpoon>

version 1 looks pretty wrong
([https://www.docracy.com/0b0kbmmpoon/local-com-privacy-policy...](https://www.docracy.com/0b0kbmmpoon/local-com-privacy-policy-tos?version=1)).

Wayback has versions from Jan 15th and Jan 16th
(<http://web.archive.org/web/*/http://www.local.com/privacy/>) around the same
time downloaded, and they look more normal.

2\. Edit and download seems to pull the wrong version in some cases:

[http://www.docracy.com/0xk2nizy6sk/fidelity-com-privacy-poli...](http://www.docracy.com/0xk2nizy6sk/fidelity-com-privacy-policy-tos?version=1)

(It says version 1, displays version 1, Click edit and download, you get
version 2)

3\. The diff engine doesn't seem to try very hard in certain cases:

[http://www.docracy.com/doc/diff?revisedId=0razhem25wh&or...](http://www.docracy.com/doc/diff?revisedId=0razhem25wh&originalId=0sr8cku71j2#tab_summary)

(The first paragraphs of these documents are a lot closer than it makes them seem).
It seems to have a bunch of stream alignment issues, which makes me think you
are using a line based diff here, and post-processing the result.

Anyway, besides the above just found playing around, it looks otherwise nice.

------
onassar
Really rad implementation of a diff system. I've recently been working on a
service for diff/versioning. If you're interested in checking it out, it's
<http://imnosy.com>

~~~
uptown
Cool service. I've always used ChangeDetection, but wished they offered more
control (better scheduling, frequency of polling, etc.). Are these features
you're thinking about including?

~~~
onassar
Hey uptown, Long term I'd really love to make this service as robust and
feature-complete as possible, while also maintaining an approachable and
welcoming user experience.

Being able to schedule when you are sent notifications, batch them together
(e.g. if you're watching two web pages, you get one email each day at a
specified time), as well as increasing the frequency of polling (goal is
1/hour) is on the list.

I'm working on a feature I see as unique in that it will allow you to choose
which part of the page counts as a change, thus minimizing false-positives.

Can you think of anything else that would be useful? Anything that frustrates
you about ChangeDetection?
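
The "choose which part of the page counts as a change" idea can be sketched roughly as follows. This is only an illustration using Python's standard library, not how imnosy works: the helper name `region_fingerprint` and the choice to select a region by its `id` attribute are assumptions made for the example. Only text inside the watched element is hashed, so edits elsewhere on the page do not register.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RegionText(HTMLParser):
    """Collect the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the watched element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def region_fingerprint(html, target_id):
    """Hash the whitespace-normalized text of one region of the page."""
    parser = RegionText(target_id)
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

A poller would store the fingerprint per watched page and notify only when it differs from the previous run. The sketch assumes reasonably well-formed HTML; unbalanced tags would throw off the depth counter.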

~~~
uptown
I think those are the big things. ChangeDetection doesn't tell you when it'll
poll. I think it's driven by the setup-time, and their queue, but ideally I'd
like the ability to set up a specific time to check a given page, with a
recurring-check frequency, and a way to show the diff between the two in an
intuitive way.

If-this-then-that integration might be cool - but that's probably an edge-case
not useful to most of your potential clients.

~~~
onassar
Agreed. Frequency control would be really great. That, and an even more
intuitive interface for quickly seeing what changed, are both on my list
:)

------
noibl
Is there a way to subscribe to change notifications on ToS docs from a
specific company?

~~~
pents90
[I work at Docracy] There isn't, although you may be able to filter the main
RSS feed. That's a good idea, though, we should add a feed to every
document... something I will probably do... right now.

~~~
toomuchtodo
Would you guys be willing to submit the diffs to the Internet Archive in WARC
format?

[http://www.digitalpreservation.gov/formats/fdd/fdd000236.sht...](http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml)

------
rquirk
This is great, though there are loads of spurious differences caused by what
looks like encoding problems. e.g. the blogspot privacy policy
[https://www.docracy.com/0rl3vthb6b7/blogspot-com-privacy-
pol...](https://www.docracy.com/0rl3vthb6b7/blogspot-com-privacy-policy-
tos?version=2) has the classic "WTF UTF?" â€ symbols in there. Previously they
were long dashes or quote marks.
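
For anyone wondering where symbols like these come from: they typically appear when UTF-8 bytes are decoded with a single-byte encoding such as Windows-1252. The sketch below reproduces the effect and shows a best-effort repair. The function names are made up for the example, and whether this is the exact cause of Docracy's problem is an assumption.

```python
def mojibake(text):
    """Simulate the bug: UTF-8 bytes wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text):
    """Reverse the mis-decoding when possible, else return text unchanged."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# \u2014 is a long dash, \u201c is an opening curly quote.
original = "terms \u2014 and \u201cconditions"
broken = mojibake(original)

print(broken)          # each dash or quote is now three junk characters
print(repair(broken))  # the original text again
```

The repair is deliberately conservative: text that was never mangled either round-trips unchanged or raises inside the `try` and is returned as-is.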

~~~
pents90
[I work at Docracy] C'mon, I wouldn't say "loads"! We are still working on a
few lingering encoding problems, and we usually filter them out when we spot
them. Other spurious changes are from the sites themselves. For example, every
couple of days the IRS changes the date format in their privacy policy for
some mysterious reason:
[https://www.docracy.com/doc/diff?originalId=1frr1ml4lt&r...](https://www.docracy.com/doc/diff?originalId=1frr1ml4lt&revisedId=0p9iqzn4re3#tab_summary)

~~~
rquirk
heh, weasel words are weasely. sorry. Skype had a lot of changes due to quote
marks changing too. don't take it the wrong way though, diffs are tricky i
guess. (mobile typing, forgive brevity)
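
One common way to keep quote-mark churn out of a diff is to normalize typographic punctuation and whitespace before comparing. A minimal sketch follows; the mapping table and the name `normalize` are illustrative assumptions, not something Docracy has described.

```python
# Map curly quotes and long dashes to plain ASCII equivalents.
PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def normalize(text):
    """Return text with typographic punctuation and whitespace flattened."""
    return " ".join(text.translate(PUNCT).split())


def same_after_normalizing(old, new):
    """True when two versions differ only in punctuation style or spacing."""
    return normalize(old) == normalize(new)
```

The trade-off is that a genuine change from one dash style to another is hidden as well, so the normalized form is better suited to deciding whether to report a change than to displaying it.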

------
jakub_g
I was thinking one day to build a similar kind of service, but rather per-user
(you upload a URL of your thing and we'll track it). But after some analysis I
never got started. It's not trivial to do it in a general automated manner
on a massive scale due to the problems with diffing PDFs, changing URLs etc.
(I was considering primarily the tables of banking fees & provisions). Maybe
one day... ;)

~~~
demoo
Let's say I have a url of a certain file, how would I go about comparing and
highlighting the changes (in an automated way)?

~~~
jakub_g
Command-line tools for that [0] have been around for many years. While it's
easy to do on plain text (ASCII) files, it's a bit more work on HTML files
or binary files. For those you would probably extract the textual content first
(e.g. stripping HTML tags) and then compare the clear text. Alternatively you
may render the HTML/PDF file and do a visual comparison, then extract the diff
text from the images.

By default diff programs create a line-based output, but you can change it to
minimum per-word highlighting via options (e.g. 'git diff --color-words').
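
The "strip the tags, then compare word by word" approach can be sketched with Python's standard library alone. The helper names are invented for the example, and the `[-removed-]` / `{+added+}` markers simply imitate the style that `git diff --word-diff` prints.

```python
import difflib
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep only character data, discarding tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_words(html):
    """Return the visible text of an HTML fragment as a list of words."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts).split()


def word_diff(old_html, new_html):
    """Describe the change between two HTML fragments at word granularity."""
    old, new = html_to_words(old_html), html_to_words(new_html)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(old[i1:i2])
            continue
        if i2 > i1:  # words present only in the old version
            out.append("[-" + " ".join(old[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the new version
            out.append("{+" + " ".join(new[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("<p>We do <b>not</b> share your data.</p>",
                "<p>We share your data.</p>"))
# prints: We [-do not-] share your data.
```

Note that this naive extractor also keeps the contents of `<script>` and `<style>` elements, so a real page would need those filtered out first.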

The thing with PDF is that often even when you re-save the same PDF file in
the same editor, you would probably get entirely different files. I'm not a
PDF expert but from what I've learned, PDF is the type of file that saves a kind
of vector representation of glyphs and their placements and is often unaware
of what each glyph represents (depending perhaps on the program used to create
the PDF and its options). Importing a PDF back into e.g. OpenOffice is ugly work
for the plugins.

There are some existing solutions for diffing PDFs [1] however I haven't played
with them really.

[0] <http://en.wikipedia.org/wiki/Diff>

[1] [http://stackoverflow.com/questions/887186/java-pdf-diff-libr...](http://stackoverflow.com/questions/887186/java-pdf-diff-library)

------
Karunamon
Oh look, another inferior change to a thread title.

I really wish you guys would knock that off.

------
drawkbox
This is a great use of technology and programming/diffs to both cool and
useful ends. I wish all my agreements, insurance policies, etc had something
like this cleanly through one place. Maybe there could be a policy standard for
just that, so companies publish changes that way and there are no copyright
problems. Also, companies not doing it would eventually be seen as devious.

~~~
pents90
[I work at Docracy, prepare for a plug!] Well, note that we have a great
negotiation and e-signing service that uses these same diff tools, try it out!
<http://www.docracy.com/supersigning>

------
zt
This service is really cool. I think all companies should host diffs of their
terms. Stripe, for example, has both the previous versions and diffs of their
terms (<https://stripe.com/us/terms>) available at
<https://github.com/stripe/terms>.

------
jnazario
this is great, glad someone did this. i've often wanted such a diff instead of
"dig through 27 pages and see what possibly changed" with iTunes' TOS for
example.

that said some of these terms look scary.

------
unavoidable
Wow. I was just thinking of making something like this yesterday. Guess great
ideas don't last long. I was also interested in seeing how similar terms of
services are. I've been working on a paper studying how boilerplate terms seem
to be growing more dense over time, and one of the things I've noticed looking
at a lot of TOSes is how similar they all are. For example, there are really
only two or three variations on the wordings of choice of law clauses, and
almost all arbitration clauses look the same. Wonder if you can run a
comparison on your database across various TOSes to see how similar they are
(a la turnitin for essays).

I have a sneaking suspicion that the lawyers (or webmasters) are just copying
and pasting a lot of these terms from standard repositories or otherwise from
other services.
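
A rough way to test the copy-and-paste suspicion is to compare documents by the word sequences they share. The sketch below uses Jaccard similarity over three-word shingles, a common near-duplicate measure. It is not what turnitin does, and the function names and the choice of `k=3` are assumptions for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in text (lowercased)."""
    words = text.lower().split()
    # Texts shorter than k words yield a single, shorter shingle.
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b, k=3):
    """Shared shingles divided by all distinct shingles, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


california = "these terms are governed by the laws of the state of california"
delaware = "these terms are governed by the laws of the state of delaware"
unrelated = "we may update this policy at any time"

print(jaccard(california, delaware))   # high: only the last word differs
print(jaccard(california, unrelated))  # 0.0: no three-word run in common
```

Run pairwise over clauses rather than whole documents, a score like this would surface the "two or three variations" pattern in choice-of-law and arbitration clauses.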

------
siliconviking
Awesome idea!! Any chance someone could curate it and highlight the changes
that were meaningful? For example, Geico just took out the following:

"We do not save this data nor disclose it to any third parties."

(anything but comforting...)

------
perssontm
Very nice, this is very much needed. I had the exact same idea, which I worked
on for a while about a year ago. As with lots of projects, it didn't get
finished. :)

As others have commented, a discussion area for each change would be very
interesting, especially if there are multiple changes happening at the same
time.

I can imagine not everyone wants this focus on changed ToS, but it's very good
that the user can easily get the information.

------
politician
Would you add a "top 20" list of the sites with the most words or characters
changed/month? I'd like to see which sites like to churn their terms.
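
A churn ranking like this could be computed from stored versions roughly as follows. This is a sketch only: the `history` structure (site name mapped to its versions in chronological order) and both function names are hypothetical, and "words changed" is defined here as the larger of the words removed and words added in each changed span.

```python
import difflib


def words_changed(old, new):
    """Count words touched between two versions of a document."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for op, i1, i2, j1, j2 in matcher.get_opcodes()
               if op != "equal")


def top_churn(history, n=20):
    """Rank sites by total words changed across consecutive versions."""
    totals = {
        site: sum(words_changed(old, new)
                  for old, new in zip(versions, versions[1:]))
        for site, versions in history.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Restricting `history` to versions captured within a given month would give the per-month figure asked for above.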

------
mapleoin
_Using Docracy's unique document change analysis,_ etc.

I'm really curious to know what's so unique about this as compared to classic
diff plus colours?

~~~
pents90
[I work at Docracy] What is unique is how we handle diffing within a
hierarchical HTML structure and how our algorithm is tuned to display sensible
diffs for written language text, which requires some more nuance than what is
typically used to diff code lines. It's our own, homespun algorithm.

------
salgernon
Please license this to the IRS and lobby for all civil and criminal laws to be
subject to obvious revision history markup.

------
ghuntley
A similar service is run by the EFF:

<http://www.tosback.org/timeline.php>

Source code is available at:

<https://github.com/pde/tosback2>

Historical crawl data is available at:

<https://github.com/pde/tosback2-data>

------
libria
The Amazon TOS notes it was `Last updated: January 28th, 2013` and includes
the recent Elastic Transcoder terms. This precedes its announcement, and
journalists may try to use this for scoops.

However, the timestamp on it is January 29. How often do you check for
updates?

~~~
pents90
[I work at Docracy] We check once a day. We've seen some strange stuff going
on with "Last updated:" though. For example, if you check out the history of
Skype's ToS, they changed it in mid-January, but with a last-updated date of
"February 2013"! Then, just a couple of days ago, they reverted it. Here's the
first change:
[http://www.docracy.com/doc/diff?revisedId=0xgv5wfb72v&or...](http://www.docracy.com/doc/diff?revisedId=0xgv5wfb72v&originalId=0am1glu0a0q#tab_summary),
then they changed it back:
[http://www.docracy.com/doc/diff?revisedId=0jxg6uxmx4d&or...](http://www.docracy.com/doc/diff?revisedId=0jxg6uxmx4d&originalId=0xgv5wfb72v#tab_summary)

------
raghavsethi
Really excellent idea. Thanks for doing this.

