
Hey Dropbox, why can't I compare file versions like this? - azhenley
http://web.eecs.utk.edu/~azh/blog/whycanticomparefileversions.html
======
peteforde
Simple: just install MS Visual SourceSafe!

In all seriousness, though, I wanted this so badly that I started (and failed)
a startup with 12 employees nine years ago to build it. It was conceived for
use in working with "big data" but the system essentially provided Etherpad-
like scrubbable versioning of all common office document formats as a side-
effect. All of it was structured in an environment more similar to the social
aspects of Github than Dropbox, but you could sync up to your filesystem via a
FUSE wrapper. That is, people could easily follow or fork your work in
progress. If we'd continued, you'd have been able to accept the equivalent of
PRs on your Word docs.

It was so awesome that we couldn't find anyone to pay for it, sadly. Armchair
quarterbacks would fairly accuse us of failing to do proper customer
development.

I can't speak to the technical limitations of Dropbox's versioning
implementation, but given that they already have both viewing AND versioning
running for a decade, I honestly can't believe it would take more than a few
months for a small team to implement Etherpad-like editing functionality for
the office suite document formats.

~~~
Enginerrrd
That sounds pretty cool, though I'd have to think about how such a thing would
function with the workflow of my team. I think it could be done though! It
sounds like a more reliable way to do shared work on office docs.

I see some issues with it. No matter how smooth you make this, any software
with pull requests is going to be considered "technical". Christ, people think
basic Excel skills are "technical". So you have to get over that hurdle. But
personally, if I've already gotten people over that hurdle, I might as well
just use git and LaTeX documents.

I don't know, I think it sounds awesome, but I also think it might be tough to
sell.

~~~
peteforde
We tried to attack the market with a Github model: share your open source data
and docs publicly (our true passion) and pay to have private repos.

Our mistake, as I said elsewhere, is that there was no market. We had many
sales conversations and zero takers. For context, we were fully under the
spell of The Great Big Data Hype of 2011 (see the current AI hype for
reference) and convinced ourselves that there would be so many opportunities
that we'd have our pick of which path to take.

In fairness to our past selves, for a while this seemed true; valuations were
insane for companies with vague value props in the space. And we met with
dozens of influencers in the data world and they all professed to be excited
to use it. Most of them ultimately logged in once, realized they had no hair-
on-fire problem for us to solve, and stopped returning our emails. It was
frustrating in the extreme.

I just found one of our product videos on Youtube:
[https://www.youtube.com/watch?v=EWMjQhhxhQ4](https://www.youtube.com/watch?v=EWMjQhhxhQ4)

------
ramraj07
The irony here is that Drew jokes about how Dropbox is going to solve these
ridiculous file name versioning conventions with their product in their famous
YC application:

> Please tell us something surprising or amusing that one of you has
> discovered. (The answer need not be related to your project.)

> The ridiculous things people name their documents to do versioning, like
> "proposal v2 good revised NEW 11-15-06.doc", continue to crack me up.

[https://www.ycombinator.com/apply/dropbox](https://www.ycombinator.com/apply/dropbox)

And yet here we are a decade and change later and Dropbox, while having solved
"a" problem, sits like a ridiculous behemoth leaving its users hungry for so
many other pain points to be addressed by another savior, including especially
this one problem they said they were going to solve.

~~~
pier25
Dropbox still hasn't solved many of its core issues but it has been investing
in Paper (which I personally have never seen anyone using) and all that design
crap from a couple of years ago.

I introduced a lot of people to Dropbox like 8-9 years ago and after using it
to share files with other people I found out the hard way it's a terrible tool
for that. I then used it for a couple more years to share files between my
machines but they have been introducing so much crap in their desktop app
that I moved to sync.com.

------
anvisha
Former dbx employee here. They always wanted to do it, but it is technically
challenging to build a fully functional product here that accounts for things
like formatting, comments, etc. When you have such a large enterprise user
base there are often trade-offs: ship a basic prototype and risk customer
confusion and complaints, or invest lots of resources and draw away from other
projects.

~~~
jbverschoor
This is such a non-argument. They could've easily just started with text-
diffs, then photo diffs. And later do doc diffs. Perhaps with some
disclaimers. Heck, they don't do that for their main product, so why would
they even do that for such a product. It's probably in the terms somewhere.

The reason they're not doing it is because they want a piece of the
productivity pie.

They're not getting it from me. Ever.

------
BLanen
A feature like this is VERY application specific for a lot of files since you
can't just take out the rendering engine and would need to usually have a
third party make the software to render to web-views, whether opensource or
proprietary. It's not even as if first party software is allowed to run as
server. Example, psd rendering to web. AFAIK photoshop has no server license.
Pretty much all services that need to render psd files use ImageMagick afaik.
I looked it up and iirc. Photoshop's own api is pretty terrible to interact
with and iirc licensing for servers is weird and expensive even if available.

EDIT: This comment is almost a word salad, I need to sleep lol.

~~~
mercer
While true, as others have said it could be rolled out piecemeal. For me, for
example, simple text diffing would scratch a major itch, both for text and
code.
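
As a rough illustration of how little is needed for the plain-text case, here is a minimal sketch using Python's standard `difflib`. The function name `text_diff` and the sample strings are made up for this example; nothing here reflects how Dropbox stores or compares versions.

```python
import difflib


def text_diff(old: str, new: str, old_name: str = "yesterday", new_name: str = "today") -> str:
    """Return a unified diff between two versions of a text file."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


# Hypothetical file contents, standing in for two stored versions.
old_version = "Dear team,\nThe budget is 100.\nRegards\n"
new_version = "Dear team,\nThe budget is 120.\nRegards\n"
print(text_diff(old_version, new_version))
```

The same function works for source code, since it only compares lines.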

~~~
Traster
Right, but as the article points out - github already has the feature for some
file formats, but it's not good enough for him because he wants it
specifically for powerpoint and word.

In other words, someone wants a diff tool for Microsoft products but
specifically wants Dropbox to implement it.

------
rsync
Serious question - does anyone want this:

    
    
      ssh user@rsync.net diff some/file .zfs/snapshot/yesterday/some/file
    

We can implement this later today if it sounds useful ... I was sort of
surprised that 'diff' was not already a whitelisted command[1] ...

[1]
[https://www.rsync.net/resources/howto/remote_commands.html](https://www.rsync.net/resources/howto/remote_commands.html)

~~~
fragmede
Does rsync.net automatically run and name zfs snapshots?

~~~
rsync
Yes. By default, every rsync.net account has 7 daily snapshots that are
created and rotated automatically - no intervention on your end is required.

You may optionally set any arbitrary schedule you like
(day/week/month/quarter/year) and you simply pay for the bits on disk that
those (efficient, changes only) snapshots take up. Sometimes they take up
almost nothing.

My favorite part of all of this is that the snapshots are _immutable, or read-
only_. No matter who attacks your rsync.net login or what password you lose,
the snapshots cannot be destroyed by any outside action.

This allows for some interesting insurance against ransomware / Mallory ...

------
narak
Am I missing something? This already exists in Google Docs. Much easier to
implement when the doc is database-backed (recording every keystroke for OT)
vs file-based.

~~~
cwyers
Yeah, the thing is, you can't do this in a file-format-agnostic way (you need
to know what a Word doc or an Excel sheet is), which makes the file system
layer the wrong level of abstraction to consider.

~~~
enriquto
You can, as long as your files have a meaningful representation as a text
file. This is a good idea anyways.

~~~
eru
That representation is one way to encode the file-format knowledge you need.

Any other way to implement would probably also involve one common format (or a
small few) behind the scenes.

------
Shorel
There are third party document compare tools.

Dropbox could just buy one of these companies and work on integrating the
solution with its platform.

All arguments about the complexity of this feature are bogus when it has been
solved several times by different vendors over the last decades.

One such tool found via DDG:
[https://draftable.com/compare](https://draftable.com/compare)

~~~
cpach
One thing that looks weird about Draftable is that many of the comparisons are
publicly-viewable. I wonder if that’s intended? See e.g.
[https://www.google.com/search?q=site%3Adraftable.com+%22busi...](https://www.google.com/search?q=site%3Adraftable.com+%22business%22)

~~~
klohto
Trying ‘site:draftable.com "confidental"’ yields interesting results. Pretty
sure it’s not intended

------
drglitch
Word already has a diff view implementation that is pretty robust - it’s very
useful for figuring out what changed across manually-versioned documents. This
is in addition to classical track changes feature.

Adobe Acrobat also has a diff (including visual diff) feature that can be used
to do advanced comparisons if necessary.

Granted, author’s suggestion is more user friendly and integrated.

~~~
cpach
_“Word already has a diff view implementation that is pretty robust”_

It does? Didn’t know that. How does one activate it?

~~~
social_quotient
They call it “compare”

~~~
drglitch
Open a document and go to "Review" ribbon and then click "Compare...".

You can do "Combine" which is effectively a merge interface.

------
vivekkalyan
My solution is to use pandoc to generate the diffs. It keeps the benefits of
Word formatting while letting me see the changes in git. (I use it mainly for
my resume.)

I wrote up about it here for the curious:
[https://www.vivekkalyan.com/using-git-for-word](https://www.vivekkalyan.com/using-git-for-word)

~~~
ramraj07
Have you seen any options take advantage of the fact that docx files are just
zipped xml files? I can see the git repo ballooning if you have a few images
and you commit frequently!

~~~
cpach
Diffing the contents of an Office document is not trivial. See
[https://news.ycombinator.com/item?id=22222667](https://news.ycombinator.com/item?id=22222667)

------
woozyolliew
Isn’t the track changes feature of Word sufficient? I don’t use Dropbox though
so perhaps it loses that..?

~~~
Terretta
Author writes:

> _...but this is useless. Timestamps??? Tell me what changed! Let me see the
> changes over time. Word has a change tracking feature, but my PhD in
> computer science isn't enough for me to figure it out._

> _But but but Austin, you should be using a proper version control system!
> Just use Git and GitHub!_

Found that aside curious, as track changes in Word is a first-class versioning
implementation that is savvy about word processing and editors, just as Git is
a first-class versioning implementation that is savvy about code lines and
commits.

Surely headspace around track changes is less "PhD" than git.

~~~
cpach
_“Surely headspace around track changes is less ‘PhD’ than git.”_

Have you tried track changes? :)

------
willvarfar
I can see a company like github or dropbox developing visual versioning and
promoting it to make users dependent upon it. It would be an extremely sticky
feature that made it hard for users to like competing products.

Imagine how github could push for MS Office integration and become a
versioning powerhouse for non-code-stuff.

But I can't see it standing as a stand-alone product that people would really
pay for. It has to be part of something else.

~~~
cpach
_”But I can't see it standing as a stand-alone product that people would
really pay for”_

IMHO, if it could be smoothly integrated into e.g. Git then there would probably
be quite a few companies that would pay good money for it.

------
juped
Version handling is built into Office 365, and many comments here indicate
it's even in the relatively crappy Google Docs, but I'm sure there's a market
for pretending it isn't and selling incredibly shitty half-baked attempts as a
B2B SaaS offering (this is not a sarcastic "I'm sure", I know about this
market space and it disgusts me on a deep level)

~~~
nihonde
I’m a lawyer who uses Word’s track changes as an integral component of my
work. I haven’t seen a single meaningful improvement in that feature in 20
years. Right down to the fact that I have to open “compare...” from within a
document, but then have to go hunt the same document down in the file system
to set it as the original. Don’t get me started on every other reason that
Word has failed to innovate on this front.

The solution is to dump Office and use text files, if you can get away with
it.

~~~
hcurtiss
What’s wild to me is that a third-party product, Workshare Compare, actually
does a better job of this than Microsoft does with its own product. Workshare
is used widely in the AmLaw 100.

~~~
nihonde
Also a garbage product IMO. The distance from something like git to Workshare
is measured in light years.

~~~
hcurtiss
I use Workshare several times a day, every day. Most transactional attorneys
do. I find it works well for its intended purpose.

------
bnj
I associate features like this with etherpad[0] which I was in the habit of
using for years for collaborative projects.

[0]: [https://etherpad.org](https://etherpad.org)

~~~
usaar333
Or Dropbox Paper which has origins in Etherpad.

~~~
bhl
Huh, Etherpad was acquired by Google in 2009 and a fork of Etherpad, Hackpad,
was acquired by Dropbox in 2014 [1]. Both projects got folded however:
Etherpad into Google Wave, and Hackpad into Dropbox Paper.

[1]
[https://en.wikipedia.org/wiki/Hackpad](https://en.wikipedia.org/wiki/Hackpad)

~~~
muxator
Etherpad lite is still alive:
[https://github.com/ether/etherpad-lite](https://github.com/ether/etherpad-lite)

------
hbcondo714
This looks similar to redline and blackline document comparisons[1]. We do
this on our site[2] where we display large financial documents that average
100 pages. Identifying what text and tables were removed, added and changed
from one year to another is useful information for predicting future company
earnings[3]

[1]
[https://en.wikipedia.org/wiki/Document_comparison](https://en.wikipedia.org/wiki/Document_comparison)

[2] [https://Last10K.com/compare.gif](https://Last10K.com/compare.gif)

[3]
[https://www.bloomberg.com/opinion/articles/2018-05-22/10-k-c...](https://www.bloomberg.com/opinion/articles/2018-05-22/10-k-company-filings-are-actually-worth-reading)

Update: The site in reference is [https://Last10K.com](https://Last10K.com)

~~~
cpach
I’m afraid I don’t follow… What do you mean by “our site”? Is it a system you
developed for internal use or is it a product/service you sell?

~~~
hbcondo714
Sorry, I just updated the comment with a direct link to the site which is a
freemium SaaS. The other site link is an animated gif that shows how to toggle
between the redline and blackline views.

------
Lanrei
Modern MS document files are zipped XML. To do this comparison they would
need to unzip each file, run it through a rendering engine and hold it in
memory, and then do version comparison. For this to be feasible you would need
to use a file format that supports this sort of comparison in a way that isn't
very resource intensive.

~~~
oblio
It's not that, it's not like 100% of your users will be diffing documents 100%
of the time. The real reason is that office formats are super, super complex
and diffing them is a hard problem, even more so for the proprietary Microsoft
formats.

[https://www.joelonsoftware.com/2008/02/19/why-are-the-micros...](https://www.joelonsoftware.com/2008/02/19/why-are-the-microsoft-office-file-formats-so-complicated-and-some-workarounds/)

The "zipped XMLs" you mention are basically XML dumps of the former binary
format that evolved organically from the 1980s, when resources were scarce and
they had to hack together a working office solution.

~~~
steerablesafe
The proper way to diff .docx documents would be for Microsoft to release a diff
tool for .docx documents. If they released a three-way merge tool as well then
it could be used in git too. git supports 3rd party diff and merge tools for
specific file formats.

~~~
oblio
It might be a lot of work and the benefits are not super obvious for them
(other than community goodwill :-) ).

~~~
markus92
They've already got the functionality to diff between two documents in Word. I
use it all the time to see if legal made any changes while "forgetting" track
changes.

------
vxNsr
This is a great idea, both one drive and gdrive have a versions feature but
the UI isn’t great. This is terrific UX.

It’s obvious UI on the level of pinch to zoom and mouse input. Hard to come up
with but obviously the right choice once suggested.

~~~
vxNsr
I’d also like to add on a different note, I don’t really get why git can’t
support docx, pptx, and xlsx. They’re open standards not binary blobs.
Basically just zipped xml.

~~~
scrollaway
You can configure it to support it. Git supports configuring different diff
programs for different file types. And there are tools to diff docx etc.
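
One way this is commonly wired up is Git's `textconv` mechanism: a `.gitattributes` entry assigns `*.docx` to a diff driver, and that driver's `textconv` setting names a program that receives a file path and prints a text rendering for Git to diff. Below is a minimal sketch of such a converter. The function name `docx_to_text` is invented for this example, and it only pulls paragraph text out of `word/document.xml`; formatting, comments, images and tracked-change semantics are ignored, so treat it as an illustration and not a complete docx differ.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(path) -> str:
    """Extract paragraph text from a .docx file, one paragraph per line.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread over one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        lines.append("".join(runs))
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Called as: script.py <file>, which is how a textconv program is invoked.
    sys.stdout.write(docx_to_text(sys.argv[1]))
```

With a converter like this registered, `git diff` would show line-level changes in the document text, though merges would still operate on the binary file.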

~~~
giornogiovanna
Can I configure Git such that the merge of Haskell source code (any language
will do) with base A:

    
    
        add x y = x + y
    

left B:

    
    
        add z y = z + y
    

and right C:

    
    
        add x w = x + w
    

succeeds without a conflict?

~~~
thaumasiotes
I don't like the example. Unless I'm missing something, all three of these are
exactly equivalent, so you could accept any of them as the result of a merge.

But the problem with that idea is that two different people explicitly made a
change that looks meaningless. That tells us that we're evaluating
"equivalent" incorrectly, which means we don't actually have any remaining
justification for picking one over another, and the conflict is hopeless
without further input.

~~~
giornogiovanna
The correct merge, in my _opinion_ would be D:

    
    
        add z w = z + w
    

My justification is that if you put each identifier on a separate line like
this:

    
    
        fn add(
            x: i32,
            y: i32,
        ) -> i32 {
            x
            +
            y
        }
    

then as far as I know, Git would happily merge B and C into D.
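
The granularity argument can be checked with a deliberately naive, position-wise three-way merge, sketched below. This is not Git's algorithm: `merge3` and `split_form` are names invented for this example, and the sketch only handles edits that keep the line count unchanged. Git's real merge works on diff hunks and may be more conservative, for example when the two sides edit adjacent lines, so the sketch shows only why one-identifier-per-line makes the edits separable in principle.

```python
def merge3(base, left, right):
    """Naive position-wise three-way merge of equal-length line lists.

    Returns (merged_lines, conflict_indices). Conflicting lines keep the
    base text in the merged output.
    """
    if not (len(base) == len(left) == len(right)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged, conflicts = [], []
    for i, (a, b, c) in enumerate(zip(base, left, right)):
        if b == c:        # both sides agree (same change, or no change)
            merged.append(b)
        elif b == a:      # only the right side changed this line
            merged.append(c)
        elif c == a:      # only the left side changed this line
            merged.append(b)
        else:             # both sides changed the line differently
            merged.append(a)
            conflicts.append(i)
    return merged, conflicts


def split_form(p, q):
    """The function from the comment above, one identifier per line."""
    return ["fn add(", f"    {p}: i32,", f"    {q}: i32,", ") -> i32 {",
            f"    {p}", "    +", f"    {q}", "}"]


# One-line form: A, B and C all differ on the same line, so it conflicts.
print(merge3(["add x y = x + y"], ["add z y = z + y"], ["add x w = x + w"]))

# Split form: every line is changed by at most one side, so B and C merge to D.
merged, conflicts = merge3(split_form("x", "y"), split_form("z", "y"), split_form("x", "w"))
print(conflicts, merged == split_form("z", "w"))
```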

------
forrestthewoods
I have the exact same question for Git. I miss Perforce timelapse view so
much. :(

~~~
azhenley
Yep, I wish GitHub made it easier to do diffs between a specific commit and
many other commits, like with a timeline. Would be great for visually tracking
down when a change was introduced or how the code has evolved over time.

That was an inspiration for a tool I built called Yestercode [1] (though it
uses undo history, not version control).

[1]
[http://web.eecs.utk.edu/~azh/pubs/Henley2016VLHCC_Yestercode...](http://web.eecs.utk.edu/~azh/pubs/Henley2016VLHCC_Yestercode.pdf)

~~~
judge2020
They've at least improved it for release comparison:
[https://github.blog/changelog/2020-01-13-shortcut-to-compare...](https://github.blog/changelog/2020-01-13-shortcut-to-compare-across-two-releases)

~~~
TheRealPomax
technical but important nit: github is not git.

------
lewisjoe
Cloud word processors like Zoho Writer & Google Docs already have version
comparison features. But this idea of a sliding time traveler for documents is
very intuitive!

Also Zoho Writer has a combine feature, that lets you upload a docx and
combine it with another docx - with the changes highlighted as tracked-
changes. Pretty handy for comparing docx files.

[https://writer.zoho.com](https://writer.zoho.com)

------
Pfhreak
Google docs, at least, has version history with named versions.

I don't think I'd want a scrub bar like that though, maybe? I suppose I've
never tried it.

~~~
kurthr
I think you'd probably want a tree for undo/redo... it works for git and
photoshop.

------
dexen
Reminds me of the _yesterday_ [1] tool from Plan 9. Pretty much what the
author expects, except with textual interface rather than a slider.

OHTF Vg'f uneq gb hfr guvf pbzznaq jvgubhg fvatvat.

--

[1]
[http://man.cat-v.org/plan_9/1/yesterday](http://man.cat-v.org/plan_9/1/yesterday)

------
brandon272
I've always wanted to be able to right-click on a file that is synced in
Dropbox and either have a submenu with versions to select or an option that
pops up a window with the file's version history. Without having to open the
Dropbox web app.

------
bogidon
What if it’s time to add features to the undo/redo construct as a whole? Maybe
not discarding redo history when a modification is made in the past for
example. Computers have improved a giant amount since undo was designed
(clipboard too, for that matter). We should be redesigning these common*
features to keep up with the times.

*often (not in MS Office probably) the undo buffer is managed by the OS. It’s conceivable that some rethinking could happen at the OS level.

~~~
manwe150
vim 7+ keeps an ‘undo tree’ that permits this (it can also be instructed to
keep it in a file to persist across sessions). It’s helpful to install a
visualization extension to seek around easier
([https://stackoverflow.com/questions/1088864/how-is-vims-undo...](https://stackoverflow.com/questions/1088864/how-is-vims-undo-tree-used))
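
The idea of an undo history that does not discard redo branches can be sketched in a few lines. This is a toy model, not vim's implementation: the class name `UndoTree` and its methods are invented for illustration, and it stores whole states where a real editor would more likely store deltas.

```python
class UndoTree:
    """Undo history that keeps every branch instead of discarding redo states."""

    def __init__(self, initial):
        self.states = [initial]   # state stored at each node
        self.parent = [None]      # parent node index, None for the root
        self.children = [[]]      # child node indices, in creation order
        self.current = 0

    def edit(self, new_state):
        """Record a new state as a child of the current node and move to it."""
        node = len(self.states)
        self.states.append(new_state)
        self.parent.append(self.current)
        self.children.append([])
        self.children[self.current].append(node)
        self.current = node
        return node

    def undo(self):
        """Move to the parent state, if there is one."""
        if self.parent[self.current] is not None:
            self.current = self.parent[self.current]
        return self.states[self.current]

    def redo(self, branch=-1):
        """Move to a child state; by default the most recently created branch."""
        kids = self.children[self.current]
        if kids:
            self.current = kids[branch]
        return self.states[self.current]


tree = UndoTree("a")
tree.edit("ab")
tree.edit("abc")
tree.undo()            # back at "ab"
tree.edit("abX")       # a new branch; "abc" is not thrown away
tree.undo()            # back at "ab" again
print(tree.redo(0))    # the older branch is still reachable
print(tree.redo())     # no further children, so the state is unchanged
```

A linear undo stack would have lost "abc" at the moment "abX" was typed; here both remain as siblings under "ab".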

------
zweep
Google Wave had this exact thing, practically to the pixel.

~~~
raister
Google Wave was so innovative that it had to be shut down... sadly...

~~~
jacobush
Yeah, one of the Big Data Lizards (Google, FB) had to close one of their
services. Somehow, I don't feel sad.

------
slashink
Wasn’t the timeline concept a core part of Google Wave? If I recall correctly
you were able to scrub through the entire history of a document in a similar
way.

~~~
locusofself
That's the first thing that came to my mind as well. RIP Wave?

------
firefoxd
I'm still waiting for dropbox to support editing .txt files in the browser.

It supports docx, excel and whatnot. But .txt file? That's too complex.

------
KuhlMensch
I remember ~4 years ago, I needed to retrieve an old file, and I discovered
that DropBox does write a diff-based repo, which you can restore at different
points. I don't remember the details, but I needed to use some sort of CLI to
access/navigate it on the host system.

In short, it is probably possible now, using what DropBox already exposes.

~~~
cpach
_“DropBox does write a diff-based repo”_

Are you sure? I’ve never heard about that.

~~~
mhdhn
me neither, any more info?

------
diegof79
> So why isn't this built directly into Dropbox, Google Docs, and Microsoft
> Office?

macOS includes a built-in version history since OSX Leopard. Sadly the flashy
version UI with 3D effects is not the best for finding differences, and many
programs don’t use the native frameworks that bring this feature (e.g. MS
apps, Adobe apps).

~~~
cpach
_“macOS includes a built-in version history since OSX Leopard”_

I had no idea. Do you have any pointers for further reading? That would be
very interesting.

~~~
jannes
I believe the grandparent is referring to Time Machine. Some of its versioning
features may have been integrated into iCloud storage as well (not sure).

~~~
diegof79
Yes, I was referring to the file version history. Sorry I couldn’t recall if
it was introduced in Leopard or Snow Leopard.

I totally agree with the comment from dreamcompiler, the file history feature
is a great idea but the execution needs a lot of improvement.

The main issue to me is not the change to the Save As... behavior. To me the
problem is that most apps didn’t adopt it. In particular, cross-platform
apps ignore it. So you never get used to the behavior change.

The version history UI is also too “heavy”. It has a slider to go back in
time, but surrounded by a faux app window simulating traveling in time with
your app state... sounds cool but is distracting and not so useful to find
differences.

And yes, it works with iCloud, but only in apps that use auto save APIs:
[https://developer.apple.com/design/human-interface-guideline...](https://developer.apple.com/design/human-interface-guidelines/macos/system-capabilities/auto-save/)

------
sixhobbits
I used [0] for a while when I was exploring this space and was briefly in
touch with the developer.

It's pretty useful and impressive in its current state but isn't being
actively developed from what I understand.

[0] [https://www.revisionsapp.com/](https://www.revisionsapp.com/)

------
ehmorris
This is a really similar idea to a Chrome extension that exists for Google
Docs, Draftback:
[https://features.jsomers.net/how-i-reverse-engineered-google...](https://features.jsomers.net/how-i-reverse-engineered-google-docs/)

------
bane
It's almost like people just want to use sharepoint and onedrive.

I avoided both for as long as possible, but we switched to office365 at work
and the integration between the two, and teams, is pretty great tbh.

------
roland35
Wrike (project management software) actually does this, but it is for only
text descriptions which makes it easier. Wrike also color codes changes by
user.

------
jannes
Version History already exists in MS Word for documents that are saved in
OneDrive with AutoSave.

------
shultays
Because Dropbox is not version control (I think?)

------
PavlovsCat
Wikipedia could use something like this, too.

~~~
rasz
What I would love for Wikipedia is the ability to select a block of text and click
"show me when this was last changed"

