
Dropbox opening my docs? - johns
http://www.wncinfosec.com/dropbox-opening-my-docs/
======
yesbabyyes
LibreOffice has a pretty powerful document conversion, which you can run
headless. I'm guessing they are converting to HTML and perhaps other formats
-- do they offer anything like that?

Edit: You can invoke it something like this:

    
    
        soffice --headless --convert-to html file.doc
    

I'm just speculating, but it seems reasonable that it would open the document
just like the regular LibreOffice, fetch external resources and so on.
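
For anyone who wants to try this from a script, here is a minimal Python sketch that wraps the same command. The helper names and the `previews` output directory are made up for illustration, and nothing here is confirmed to be what Dropbox actually runs. It assumes `soffice` is on your PATH and that `fmt` is a plain format name such as `html` or `pdf`.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, fmt="html", outdir="previews"):
    """Return the argv list for a headless LibreOffice conversion."""
    return ["soffice", "--headless", "--convert-to", fmt,
            "--outdir", str(outdir), str(src)]


def convert(src, fmt="html", outdir="previews", timeout=60):
    """Run the conversion and return the path where the output is expected.

    Raises FileNotFoundError if soffice is not on PATH.
    """
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH")
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(src, fmt, outdir),
                   check=True, timeout=timeout)
    # Assumes fmt is a bare extension (no ":filter" suffix).
    return Path(outdir) / (Path(src).stem + "." + fmt)
```

Keeping the command construction separate from the call makes it easy to inspect what would be run before running it on untrusted files.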

~~~
the_mitsuhiko
That's pretty much exactly what is happening. Dropbox converts documents into
HTML for easy viewing on the web interface.

~~~
skeletonjelly
You seem pretty confident! Is your source you? A Dropbox employee?

~~~
icoder
abortz from Dropbox stated just this, in this thread, 4 hours before your
post

------
milkshakes
really?

you've already determined that it's running on an ec2 instance, but it's
somehow "suspicious" that the user-agent is libreoffice? and you're a
"security researcher" but "curious if this is an automated process"? please.

sure, dropbox might owe an explanation (even though you certainly gave them
permission to do this in their TOS), and you can call me cynical and jaded,
but this seems like pretty shameless FUD that appears to be tied to an effort
to shill a new product.

EDIT: first i thought this was written by the HoneyDocs founder. now i'm
actually unsure who the author is.

~~~
danielweber
There's really no need to attack him.

~~~
milkshakes
i'm not attacking him as a person, i'm attacking his actions, specifically in
this case his shamelessly disingenuous linkbait guerrilla marketing
masquerading as a public service announcement

------
sspiff
> Further digging into the HoneyDocs data reveals a suspicious User Agent,
> LibreOffice. Now I’m curious if this is still an automated process or one
> that involves human interaction?

Yes, because humans use LibreOffice over SSH/X11 from an EC2 instance.
Probably LibreOffice is being used for the parsing/rendering on a server.
Probably for something innocent like generating thumbnails or text-only
previews.

------
nigma
They are generating PDFs for online viewing. Go to your files on the Dropbox
site and click on a .doc file. A preview popup will appear.

OpenOffice/LibreOffice with the Python bridge is quite handy for converting
documents to PDF format and can be run in headless mode (using a virtual frame
buffer like Xvfb) on a server.

------
dweekly
Dropbox uses (used?) Crocodoc to do its document previews, which would be
interesting now that Crocodoc has been acquired by Box (a Dropbox competitor).
Crocodoc actually ran full Windows VMs to have Word interpret Word, unlike
what was speculated elsewhere here (using LibreOffice) - it turns out pretty
much everything else sucks pretty badly at rendering Word docs, largely
because the format is a bloody nightmare of binary encoded blobs including OLE
embeds, etc. My understanding was that these VMs were run on AWS Windows
instances, which explains why the document was seen opened on an AWS cluster.
I know they had a fun nightmare of a time getting the right licenses from
Microsoft to do this.

~~~
dweekly
Whoops; I'm an idiot. The request had a UA of LibreOffice. Looks like Dropbox
has indeed moved on from Crocodoc. My bad.

------
pwg
Much ado about nothing.

If you don't want your cloud storage provider reading the data you give them,
then _encrypt_ that data _before_ you upload it.

~~~
rexreed
You might want to check out SafeMonk that does this exact thing.
[http://www.safemonk.com](http://www.safemonk.com)

~~~
gboudrias
Am I reading this right? A third-party service that protects you from
third-party services? And you have to install it everywhere? And it's not
FLOSS? Please tell me I'm reading this wrong.

Edit: Okay I see it's based on FLOSS and that's great, but as far as I can
tell they're still asking you to install binary blobs, which makes the whole
thing pointless.

~~~
blcknight
Install EncFS and use it on your Dropbox. No account, no binary blobs needed.
You can compile all the bits for EncFS yourself, if you want.

~~~
CCs
TrueCrypt would work too - it has block-level encryption and Dropbox syncs at
the block level too.

~~~
bigiain
Hmmm, I wonder if TrueCrypt adequately secures a hidden volume's existence
from an attacker (Dropbox) who can watch the patterns of your block level
writes?

~~~
CCs
It's a content encryption app, not Rubberhose File System or similar.

------
eli
Did you bother asking Dropbox what's going on?

This kinda reads like an ad for HoneyDocs...

~~~
B-Con
I hate it whenever an article mentions a service or drops an affiliate link
and someone's verdict is that the article looks like advertising. Do you
prefer your reading content to be devoid of mentioning any products or brands?
Should bloggers never make a dime off affiliate links?

Be concerned with the content and only the content. If the article has it,
it's legit.

~~~
mhurron
Content is modified by the context. Someone trying to raise warnings about a
competitor's product should make you question the motives.

~~~
fleitz
This is equivalent to an appeal to authority.

Content is not modified by the context; a fact is either true or it is not.
Everyone has a motive. It reminds me of how people call into question research
sponsored by corporations, as if people who work in government-sponsored
research are somehow automatically saints with no ulterior motive.

To trust someone based on affiliate links is a quite silly line of deductive
reasoning.

From the information provided it seems simple enough to verify: embed an image
via URL into a doc file, upload it to Dropbox, and see if the URL is accessed.
No need to argue about motive.
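
A rough Python sketch of that test, in case anyone wants to reproduce it. The beacon host `canary.example.com` is a placeholder for a server you control, and the file layout (plain HTML saved under a .doc name) is an assumption based on how the HoneyDocs samples are described elsewhere in this thread.

```python
import secrets
from html import escape
from pathlib import Path

# Hypothetical endpoint you control; replace with your own logging server.
BEACON_BASE = "https://canary.example.com/img"


def make_canary_doc(path, body_text="test document", base=BEACON_BASE):
    """Write an HTML file under the given name with a unique remote image.

    Returns the token so hits in the web server log can be matched
    to this particular file.
    """
    token = secrets.token_hex(16)
    html = (
        "<html>" + escape(body_text) + "<br />\n"
        '<img src="' + base + "/" + token + '.gif">\n'
        "</html>\n"
    )
    Path(path).write_text(html, encoding="utf-8")
    return token
```

Upload the resulting file, then watch the access log of the beacon host for the returned token. A hit tells you the image was fetched, along with the requesting IP and User-Agent.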

~~~
takluyver
Certainly context is relevant. Understanding who is saying something and what
their motives are helps you to judge how likely facts are to be true, and how
much weight you should attach to opinions.

Calling some work into question because of the authors' motives isn't a claim
that some other group has no ulterior motives at all. Certainly everyone has
some motives, otherwise we would never get out of bed. But some of those
motives will change the discussion more than others. E.g. when HP sponsors a
study that finds that their own ink works out cheaper than buying
remanufactured cartridges, it's perfectly sensible to be more suspicious of
that than if a study by a consumer organisation found the same thing.

------
amvp
LibreOffice is commonly used as part of a system to convert and generate
previews for MS Office files. I would assume it has something to do with
thumbnail generation or preview generation. However I don't seem to see
thumbnails or previews of .doc files (I do for images - for example) on the
Dropbox webapp - so maybe it's something they're testing?

------
Guillaume86
Isn't that thumbnail generation by Dropbox? I remember a thumbnail entry
in their API.

~~~
hoopism
That would make sense to me. Especially since he only sees it on .doc files.
Probably a thumbnail generator utility that uses a LibreOffice plugin. Very
interested to find out...

There's a saying that likely applies here:

"When you hear hoofbeats, think of horses not zebras"

------
VBprogrammer
I hope that the servers running LibreOffice only have that job. LibreOffice
has a pretty massive attack surface and it's not the kind of thing I'd like to
leave running on a server with another purpose while accepting documents from
pretty much anyone.

The only thing to see here is that Dropbox is potentially opening themselves
up to a vulnerability. It would be interesting to see if GET file:///etc/passwd
worked...

------
steven777400
On the one hand, it seems unlikely that an automated process would trigger
external resource retrieval. In the same way, most processes that scan
webpages for content or similarities don't run JavaScript, unless they are
very sophisticated (this used to be a good way to protect against spam bots,
for instance).

On the other hand, given how many files are uploaded to Dropbox every hour,
it's inconceivable that a human, whether through deliberate management
direction or mischief, is opening all these documents. I would be more
concerned about human intervention if, occasionally, a document triggered a
buzz some days after it had been uploaded.

If all documents are showing as opened within 10 minutes, then surely it is
just an anti-duplication automated agent at work.

~~~
blcknight
I just tried this with a doc file, and the buzz was nearly instantaneous,
within seconds -- 3 buzzes total.

Certainly it's automated.

Paranoid part of me says it's NSA keyword scanning. I feel a little insane
suggesting that, but it's certainly conceivable these days.

The other possibility is Dropbox is indexing the files for search?

Anyway, using Dropbox unencrypted is a terrible idea. EncFS has user-friendly
frontends like Boxcryptor.

~~~
gjulianm
Keyword scanning shouldn't resolve images, unless they're using OCR to read
any text they have. In that case, they'd be wasting a lot of resources.

~~~
blcknight
Where do you get that this is using images?

Honey Docs doesn't actually explain what the callback looks like in the doc
file, but it doesn't look like it has anything to do with images.

~~~
mkopinsky
The HTML, DOC, and XLS files all have identical structure (though different
content). They are all HTML, and Honey Docs is relying on Word/Excel parsing
the HTML in those files to fetch the image (a 1px gif).

I downloaded the credit card Honeydoc. The content looks like:

    
    
        <html>Nicole  Davis  4556062729618215<br />
        Brian  Baker  4556767839126624<br />
        Patrick  Jones  4916615717158539<br />
        ....
        <br>
        <br>
        <br>
        ....
        <img src="https://honeydocs.herokuapp.com/img/html/202719bb5717d5621068780180abc593b0fedda692bd63727a510911d21fdcbf.gif">
        </html>

~~~
mkopinsky
Oh, and FTR, Excel gives a warning before opening the file. So they at least
have thought through this vector (if you want to call it that).

------
dotmanish
Could be this (thumbnails):
[https://www.dropbox.com/developers/core/docs#thumbnails](https://www.dropbox.com/developers/core/docs#thumbnails)

------
phaer
LibreOffice is not necessarily a sign of human involvement in the process, as
it comes with a command-line interface to convert documents between various
formats. So it could be thumbnail generation as Guillaume86 suggested.

------
nonchalance
If someone from honeydocs is reading this ...

The tracking behavior depends on a tracking pixel which may not always be
processed by the client.

For example, with the credit_cards sample, the xls file is actually an HTML
file with an img at the end (url linking to
[https://honeydocs.herokuapp.com/img/xls/...](https://honeydocs.herokuapp.com/img/xls/...))
and a client that only reads the plaintext (there are a boatload of command
line utilities that fit the bill) won't fetch the image.
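
To illustrate, here is a small Python sketch of a text-only reader. It pulls the visible text out of the markup and merely records remote image URLs, so nothing is ever requested from the tracking server. The class and function names are invented for this example.

```python
from html.parser import HTMLParser


class TextOnlyReader(HTMLParser):
    """Collect visible text and note remote image URLs without fetching them."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src.startswith(("http://", "https://")):
                self.remote_images.append(src)  # recorded, never requested

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def read_text_only(markup):
    """Return (list of text chunks, list of remote image URLs)."""
    reader = TextOnlyReader()
    reader.feed(markup)
    reader.close()
    return reader.text_parts, reader.remote_images
```

A reader like this sees every card number in the sample but would never show up in the HoneyDocs log.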

------
ryanackley
Dropbox uses crocodoc for MS Office file previews in the browser as html and
my guess is crocodoc's tech is based on a custom print driver for LibreOffice
that converts it into html.

------
mrbill
Dedupe (at least for NetApp systems) only cares about data blocks; it wouldn't
"open" a document or parse contents.

[https://communities.netapp.com/community/netapp-blogs/drdedu...](https://communities.netapp.com/community/netapp-blogs/drdedupe/blog/2010/04/07/how-netapp-deduplication-works--a-primer)
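
A toy Python sketch of the idea, to show why block-level dedupe never needs to understand the file format. This is not NetApp's implementation; the fixed 4096-byte block size and the use of SHA-256 are assumptions chosen for illustration.

```python
import hashlib


def dedupe_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and store each distinct block once.

    Returns (store, recipe): store maps SHA-256 digest -> block bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original bytes from the block store."""
    return b"".join(store[d] for d in recipe)
```

The code only ever sees opaque byte ranges, so it would not fetch an embedded image or trigger any callback.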

------
guiambros
Coincidentally (or not), I just received the invite to beta test Sync.com [1]
today. It seems to be a Dropbox clone for the privacy-conscious user. They claim that
all files are encrypted, and they don't have access to the keys. The
encryption algorithm is still private, but they say they'll open source it
soon.

While I like the approach a lot more than Dropbox (that fights to obfuscate
its own algorithm), I still don't feel safe. Anyone with access to the server
could intercept your keys, and thus have access to your data.

TrueCrypt over some cloud-based solution is still the ideal option, but the
lack of support for sparse images makes me hesitant.

EDIT: no affiliation with Sync.com (or Dropbox, for that matter). Just trying
to find a decent cloud-based storage solution that fixes the exact problem
exposed by the OP.

[1] [http://www.sync.com/your-privacy](http://www.sync.com/your-privacy)

------
yk
For further analysis, I would suggest embedding something nasty into a .doc.
[1] Seriously, why would Dropbox execute code in arbitrary files; the only
reason I can see is some virus scanner heuristic. So then they could spin up a
new vm, load the file and diff the vm with a clean one. Or as others
suggested, generate thumbnails; that, together with the 10 minute delay, would
imply that they are running remote code on some batch processing machine
(where a lot of other files are up for grabs). Either way, it does smell
somewhat.

[1] I am not sure how LibreOffice does handle active content and furthermore I
am not sure if there is a way to generate a ping back from LibreOffice without
some kind of active content embedded. But to me at least, it somewhat implies
that Dropbox, or whoever, runs LibreOffice in a not maximally locked down
configuration.

------
sdfjkl
When you click on a .doc file in the Dropbox web interface, you get a preview
of the file in PDF format. To do this, Dropbox must open and convert the file.
LibreOffice is popular for this, as it can be run in a headless API mode,
reads a wide range of files and can output PDF format. So this is what happens
here.

The wisdom of executing "active" content embedded in such files is of course
doubtful and something Dropbox should investigate. But if you want your files
to be safe, you should instead use a service that encrypts them client side,
which has the downside of losing the web interface that Dropbox offers (as
this requires it to be able to access the decrypted files in order to serve
them to you).

------
rexreed
Posted this reply elsewhere, but SafeMonk encrypts your files before they hit
your harddrive and keeps them encrypted in the Dropbox cloud. It's free for
personal use: [http://www.safemonk.com](http://www.safemonk.com). Note: this
is not my product, just using it after I saw it demo'd at a TechBreakfast.

------
hoopism
In retrospect this was a very well done ad for HoneyDocs... I checked out the
service and thought it was novel... wouldn't have looked if not for this.

The article is written in such a way that they are saying a lot by playing
dumb... so hard to say it's misleading... but I know few security people who'd
write something up with this tone.

------
VuongN
I think this is a great example of why we should ask questions about cloud
security & privacy. I've written down some thoughts about this:
[http://vuongnguyen.com/personal-business-cloud-security.html](http://vuongnguyen.com/personal-business-cloud-security.html)

-V.

------
four12
Yay Little Snitch...

[http://imgur.com/FfbenAb](http://imgur.com/FfbenAb)

------
ValG
Site is up and down. Quick Cache:
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://www.wncinfosec.com/dropbox-opening-my-docs/)

------
jayd16
To me, the interesting part isn't that the file was read. What has me
interested is that this is a clear attack vector.

Want some free EC2 time? Wrap your workload in a .doc and have Dropbox foot
the bill.

------
jasonj79
[https://crocodoc.com/customers/](https://crocodoc.com/customers/)

Crocodoc is likely generating web previews of your documents.

------
Michael_Murray
What was that article the other day about "Stealing Traction" from an
established player in an adjunct space?

Well played, HoneyDocs... Well played.

------
gocard
In case you were wondering, I descrambled the blanked out "png" files and the
filenames were "jennymccarthy[01-04].png"

------
devx
It's so annoying when Google completely opens up archive files in Drive, too.
Why would they do that?!

------
whywhywhy5
I'm sure it's just the perfectly legal NSA browsing through your files. No
need to worry.

------
ck2
It's probably just a MITM review by the NSA Flying Pig

[http://www.techdirt.com/articles/20130910/10470024468/flying...](http://www.techdirt.com/articles/20130910/10470024468/flying-pig-nsa-is-running-man-middle-attacks-imitating-googles-servers.shtml)

------
shmageggy
google cached version:

[http://webcache.googleusercontent.com/search?q=cache:www.wnc...](http://webcache.googleusercontent.com/search?q=cache:www.wncinfosec.com/dropbox-opening-my-docs/)

------
jlkinsel
Time to write a little VBScript to port scan me some Dropbox servers...

~~~
atmosx
I don't know how to do that in VB and I'm sooo proud of it! :-P

------
madaxe
I would wager that they're opening it in order to generate a thumb or preview,
or maybe for search indexing, and libreoffice is a good way to achieve this on
linux - particularly if they're only opening it once, as they probably use the
hash of the file.

We do exactly this on our eCommerce platform, before wanging stuff into s3 or
glacier and just keeping a reference kicking around.
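
Roughly, the "open it once per hash" part looks like the Python sketch below. The `render` callable is a stand-in for whatever converter you use, and the in-memory dict is only for illustration; a real system would presumably keep the mapping in a database or object store.

```python
import hashlib


class PreviewCache:
    """Render a preview once per distinct file content, keyed by SHA-256."""

    def __init__(self, render):
        self._render = render      # hypothetical callable: bytes -> preview
        self._previews = {}

    def get(self, content):
        """Return the cached preview, rendering only on first sight."""
        key = hashlib.sha256(content).hexdigest()
        if key not in self._previews:
            self._previews[key] = self._render(content)
        return self._previews[key]
```

With this shape, re-uploading an identical file would not open it a second time, which would match a callback firing only once per document.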

On the other hand, you have just discovered an information disclosure (host
IPs) vulnerability in dropbox.

~~~
tptacek
This seems unsafe; if I understand what this person has done, he'd essentially
be coercing Dropbox's backend services to open arbitrary links on his behalf.
That's a very dangerous capability to expose to adversaries.
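
One common mitigation is to vet every URL before the backend fetches it. A minimal Python sketch follows, assuming the caller supplies a `resolve` function (hostname to list of IP strings); the function name and policy are illustrative, not anything Dropbox is known to use.

```python
import ipaddress
from urllib.parse import urlsplit


def is_fetch_allowed(url, resolve):
    """Return True only for http(s) URLs whose host resolves to public IPs.

    `resolve` is a caller-supplied function: hostname -> list of IP strings.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False  # rejects file://, ftp://, and malformed URLs
    try:
        addresses = [ipaddress.ip_address(a) for a in resolve(parts.hostname)]
    except (OSError, ValueError):
        return False
    # Every resolved address must be globally routable: no loopback,
    # private ranges, or link-local metadata endpoints.
    return bool(addresses) and all(a.is_global for a in addresses)
```

The fetcher then has to connect to the same addresses that were checked. Otherwise a second DNS lookup could return something internal.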

~~~
milkshakes
to be fair, it's _possible_ that dropbox understands this and has taken steps
to sandbox and isolate the process that does this fetching from the rest of
their internal infrastructure. if this is done for the purposes of generating
thumbnails/online previews, and the .doc includes external resources, what
other choice do they have but to fetch it?

~~~
abstractbill
The machine isn't the only thing at risk. Given this setup, it seems possible
to use dropbox nodes to ddos an external target, just by uploading lots of
documents, each containing lots of these links. It doesn't seem like they
should be fetching external resources at all.

~~~
danielweber
There are lots of services that generate traffic on your behalf. A very
general rule is that you should have to send at least as many bytes as the
service does, lest you become a DDOS multiplier.

I don't see a .doc file getting small enough that an HTTP request embedded in
it outsizes the file itself, even if you used some funky compression, but I'm
willing to hear otherwise.
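
The rule of thumb is simple arithmetic. Here it is as a Python sketch; the byte counts in the usage note are invented for illustration, not measurements.

```python
def amplification_factor(bytes_sent_by_client, bytes_emitted_by_service):
    """Ratio of traffic the service generates to traffic the client paid for."""
    if bytes_sent_by_client <= 0:
        raise ValueError("client must send at least one byte")
    return bytes_emitted_by_service / bytes_sent_by_client


def is_multiplier(bytes_sent_by_client, bytes_emitted_by_service):
    """True when the service emits more bytes than the client uploaded."""
    return amplification_factor(bytes_sent_by_client,
                                bytes_emitted_by_service) > 1.0
```

For example, a 20,000-byte document that triggers ten 300-byte requests gives `amplification_factor(20000, 3000)`, a ratio well below 1, so the uploader spends more bandwidth than the target receives.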

One question would be if you could upload the document once and then somehow
trigger a very tiny edit that causes them to rescan it.

~~~
abortz
Hi everyone, this is Andrew from Dropbox.

We do use LibreOffice to render previews of Office documents for viewing in a
browser, and have permitted external resource loading to make those previews
as accurate as possible. While this could theoretically be used for DDoS, we
haven’t seen any such behavior. However, just to be extra cautious we’ve
temporarily disabled external resource loading while we explore alternatives.

~~~
danielweber
As one part of your solution, I recommend restricting the machines that can
make outbound requests to a certain pool, and then limit that pool's total
bandwidth, throwing an alarm whenever the limit is hit.

It may be that you are big enough that even the limited bandwidth you need for
normal operations is enough to take out smaller hosts, so you'd need to
measure and monitor to see how well this works.
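
A minimal Python sketch of the "limit the pool and alarm" idea, using a fixed time window. The class name, the window length, and the `on_limit` callback are all hypothetical; in practice you would probably enforce this at the network layer rather than in application code.

```python
import time


class OutboundBudget:
    """Fixed-window byte budget shared by a pool of fetcher machines."""

    def __init__(self, max_bytes_per_window, window_seconds=60.0,
                 clock=time.monotonic, on_limit=None):
        self.max_bytes = max_bytes_per_window
        self.window = window_seconds
        self.clock = clock
        self.on_limit = on_limit or (lambda used, limit: None)
        self.window_start = clock()
        self.used = 0

    def try_spend(self, nbytes):
        """Return True if nbytes fits in the current window, else alarm."""
        now = self.clock()
        if now - self.window_start >= self.window:
            self.window_start = now   # start a fresh window
            self.used = 0
        if self.used + nbytes > self.max_bytes:
            self.on_limit(self.used, self.max_bytes)  # raise the alarm
            return False
        self.used += nbytes
        return True
```

The fetcher calls `try_spend` before each outbound request and skips the fetch when it returns False.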

