
Facebook crawls links in PDFs you send in Messenger - ortekk
https://twitter.com/vah_13/status/1187755829371555840
======
archie2
Facebook spies on everything you do. Stop using it. This is the only way this
will end. It's not even a useful product. Stop being sheep.

~~~
harry8
I did, more than a decade ago. Do you think they stopped? How would I know
there is no shadow profile being constantly updated?

It's apparently not my data and privacy and I have no say in it at all when
someone uploads their goddamn contact list to these frickin' crooks. Is that
your understanding too?

So this solution is not a solution. This solution only works if we stop other
people from using it. How's that going to work? Well, it isn't.

Cool? Now welcome to the "what regulation is appropriate" debate because this
is clearly an example of market failure, same as national defence, same as
national parks, same as pollution, same as any other. The sooner we treat
"...using computers" exactly the same as any other industry the better. This
is nuts.

~~~
dropin685
> This solution is not a solution.

No, not a complete solution, but it's a step in the right direction. One of
the reasons advertisers as well as ordinary users come to FB is to be in
contact with you -- to have your attention. Stop using FB and you take that
away. FB ends up with less of an audience with which to entice others.

~~~
harry8
There are enough users remaining that your shadow profile still has value and
Facebook marches on.

This solution is not a solution. Definitely do it, and you'll really enjoy
doing it too - life improves! Just don't think it's any kind of solution to
any of the systemic problems that are being discussed because it isn't. Not
even a partial solution. Not even a meaningful start of one.

~~~
travisporter
But if my Facebook profile is deactivated or deleted or whatever, and I’m no
longer active, how can they still have information about me? I had about 300
“friends” on there but no one really tagged or mentioned me or anything.

------
djohnston
And if they didn't the headline would read: "Facebook fails to stop malicious
and illegal content from being shared on their Network! Should they be shut
down?!"

~~~
Barrin92
this sounds like a strawman to be honest because I haven't heard anyone rant
about illegal music since probably 15 years, and if anything ever only
politicians and not ordinary people.

If we were talking about really malicious stuff like child pornography, then in the
context of filesharing these companies already have systems in place to
distinguish content, so blanket banning of torrent files seems blatantly
unnecessary.

~~~
wutbrodo
> this sounds like a strawman to be honest because I haven't heard anyone rant
> about illegal music since probably 15 years

This is the real strawman, as nobody on this entire thread is talking about
illegal music. On the other hand, there's a strong and persistent thread of
calls for tech platforms like Facebook to control "malicious or illegal"
information being spread on their platforms: an obvious example is the NZ
shooter's manifesto + video.

~~~
jakeogh
Twitch didn't get deplatformed because a mentally ill person streamed murder on
it, and you won't hear a peep about the other big chan. That had nothing to do
with "protecting" (seriously? wtf) people from reading some mentally ill
person's note, it was a problem-reaction-solution to axe an inconvenient site.

------
javagram
This will keep happening until they enable e2e.

I’ve had Facebook block several links sent in private message groups, to
completely legal and safe sites (Messenger prints out an obscure API error and
refuses to send the content). They have done this for a long time.

~~~
rohug
Worth noting WhatsApp also provides link previews now. Although it is
supposedly e2e communication, the link previews are likely generated by
reaching out to a similar facebook unfurl service.

They can then have a single map of phone num -> links rendered between fb and
whatsapp.

~~~
yawaramin
WhatsApp fetches a link preview on the sender's device before the message is
encrypted, and packages it up with the message before sending. Depending on
how exactly they implement the fetch, they may or may not know what links you
sent.

~~~
Sami_Lehtinen
At least links in PDF over WA didn't get visited from FB servers. I just
personally tested it moments ago.

~~~
Thorrez
What's visiting the links? Your phone?

------
steelframe
My company's malware detection crap on my work laptop once scanned a PDF of a
security research paper I was reading for my project, found a link to a web
site with malware on it because that's what the research was about, and then
it summarily deleted the PDF to "protect" the company from that link.

~~~
Spivak
I mean, how is the security scanner supposed to know that you’re working on a
project which is a super specific edge case?

I mean almost all of the time that PDF will be malware designed to trick the
reader into clicking that link and it did the right thing.

~~~
archie2
Most security scanners allow you to create exclusion folders, where it doesn't
scan files in those folders. Something somebody researching malware on a
company computer with a malware detector should probably be aware of.

------
badrabbit
Not in the least surprising. Wouldn't be surprised if Gmail does this
to... "detect phishing" (PDFs containing phishing links are common). There is always a
plausible reason they can use.

~~~
supernova87a
There's no surprise. Gmail does.

If you search for a text string in Gmail, it will return emails that contain
that text only in _scanned images_ or PDFs that are in your mailbox.

~~~
odensc
That doesn't mean they're crawling the URLs, just that they're indexing the
content of PDFs/images for your search. Which is, honestly, a pretty useful
feature. Whereas Facebook is doing this without providing any value to the
user.

~~~
badrabbit
I've worked with email security appliances that will visit the PDF URL when
detonating the attachment in a sandbox. Dynamic analysis helps find 0-days;
sometimes the link leads to something worse, and many times the link is a URL
shortener too.

------
criddell
Microsoft does this with Skype too. They say it's for detecting malicious
links.

~~~
knzhou
As always in big tech, you're damned if you do and damned if you don't.

~~~
st0le
Honestly, this is good for preventing malware, but I imagine it breaks a bunch of
things, e.g. if the link has a limited visit count: the link will
"expire" before the recipient gets a chance to view it.

~~~
Nextgrid
To be fair, an HTTP GET request should never modify the state of the system -
hitting a link should not change anything.

If you need to expire links then make the initial link display a form with a
submit button (which does a POST) to reveal the content (and expire the link).
Legitimate crawlers don’t submit forms so it should be safe.
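
A minimal sketch of that pattern, assuming an in-memory store and hypothetical names (`OneTimeLinks`, the `/reveal/` path); a real service would sit behind a web framework and persistent storage:

```python
import secrets


class OneTimeLinks:
    """In-memory store of single-use secrets (hypothetical helper, not a real library)."""

    def __init__(self):
        self._secrets = {}

    def create(self, content):
        # Generate an unguessable token that identifies the link.
        token = secrets.token_urlsafe(16)
        self._secrets[token] = content
        return token

    def handle_get(self, token):
        # GET is safe: it only renders a confirmation form and never consumes the link,
        # so a crawler that fetches the URL does not expire it.
        if token not in self._secrets:
            return 404, "Link expired or unknown."
        form = (
            f'<form method="post" action="/reveal/{token}">'
            "<button>Reveal</button></form>"
        )
        return 200, form

    def handle_post(self, token):
        # POST is the state-changing step: pop() returns the content and expires the link.
        content = self._secrets.pop(token, None)
        if content is None:
            return 404, "Link expired or unknown."
        return 200, content
```

Repeated GETs leave the link intact; only the first POST reveals the content, and every later request gets a 404.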

~~~
mehrdadn
> To be fair, an HTTP GET request should never modify the state of the system

In theory. But that's not how the world I live in seems to work.

~~~
bagacrap
I think it's pretty common practice. Otherwise search engine web crawlers
would be wreaking havoc.

~~~
mehrdadn
No, both your logic and premise are incorrect. To give just one example, rate
limiting is clearly a widespread stateful practice applied to GET requests, and
it doesn't cause web crawlers to wreak havoc on anything.

------
worldofmatthew
This should not be news to anyone. Facebook scans all links posted in
Messenger.

~~~
sushid
These are links INSIDE a PDF. That's one step further than most people assumed.

~~~
manojlds
Mostly to scan the PDF and ensure it's safe, I believe, or at least that would
be the stated reason.

~~~
strstr
I can believe that (despite the obvious creepiness), since FB users have a
knack for getting owned and spreading that ownage to other users.

------
gravypod
Could someone effectively DOS another site using this method by including a
bunch of links that generate a lot of load?

Would be interesting to see if Facebook has a maximum number of links it'll
follow.

~~~
GhostVII
The cost of uploading a PDF of links is probably not much less than the cost
of following those links on your own. So I don't think you gain much by
leveraging Facebook in this case.

~~~
ahbyb
What if I create a PDF with this content...

[https://news.ycombinator.com/item?id=1](https://news.ycombinator.com/item?id=1)

[https://news.ycombinator.com/item?id=2](https://news.ycombinator.com/item?id=2)

[https://news.ycombinator.com/item?id=3](https://news.ycombinator.com/item?id=3)

and so on, until 10,000,000? Perhaps Facebook starts opening every link using
10,000 parallel threads. Can you really replicate that from your connection at
home? Perhaps even the sysadmin of your victim site has whitelisted all
Facebook IP addresses so their crawlers get a free ride.

~~~
gojomo
There's a fair chance the service which generates these outbound requests
throttles itself, both with respect to how many requests it makes against any
one domain/IP, or how many errors it will provoke per time period, or before
human review.
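
Purely as an illustration of what such self-throttling could look like (nothing here is known about Facebook's actual service; `PerHostThrottle` and its parameters are hypothetical):

```python
import time
from urllib.parse import urlsplit


class PerHostThrottle:
    """Allow at most one outbound request per host every `min_interval_s` seconds."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock          # injectable so the logic can be tested without sleeping
        self._last_hit = {}          # host -> time of the last allowed request

    def allow(self, url):
        host = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last_hit.get(host)
        if last is not None and now - last < self.min_interval_s:
            return False             # too soon: the caller should skip or queue this URL
        self._last_hit[host] = now
        return True
```

With a limiter like this, a PDF full of links to one domain would produce a trickle of requests rather than a burst, whatever the number of links.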

~~~
zeptoon
Maybe, maybe not

------
buboard
Side note: I wonder why FB doesn't launch a search engine, since they crawl most
of the web anyway.

~~~
saagarjha
That would let people leave the Facebook platform and explore the open web.

~~~
kortex
I remember the sheer awe when I first learned there was a huge open web
_outside_ of AOL. I'm sure people nowadays are aware of the rest of the web,
but if the draw is minimal, they will likely get stuck in the same loops of
well-trodden space.

~~~
sodosopa
A lot of the same people who kept that AOL walled garden alive, just migrated
to Facebook. To them Facebook is the web, the restaurants they like are there,
the tired celebs they worship are there and whatever crazed conspiracy
theories someone told them at work or at church are under Facebook News. It's
comfortable and I agree with you and like how you phrased it as "the same
loops of well-trodden space."

~~~
symlinkk
Sort of like how all of us check Hacker News daily? Don’t act like we’re any
better

~~~
saagarjha
I go to websites other than Hacker News.

~~~
skinnymuch
Yeah those FB people don’t go to other sites! /sarcasm

------
mikorym
You can also use canary tokens for this. [1]

I am not personally affiliated with them, but I believe they are South
African.

[1]
[https://www.canarytokens.org/generate](https://www.canarytokens.org/generate)

------
egypturnash
The galaxy brain maneuver here is to start trying to fuzz whatever machines
Facebook is using to do this.

------
h1fra
While I don't like FB, this is not something specific to PDFs; they provide a quick
preview for all links, like almost all social networks.

~~~
lostmyoldone
This isn't about rendering a preview; it's about following links inside
the PDF. While there may be a case to do this to prevent phishing and malware
attacks, they really should ask/tell the user they are doing it!

------
dontbenebby
This feels like it could be an attack vector. Gather intel on what the user
agent is, nmap the IP, possibly find a vulnerability in the parser or the
server.

~~~
ressetera
I doubt the downloader isn't restricted in some kind of jail.

~~~
Enginerrrd
You'd think... but after that story about Microsoft just executing random
threatening code it found on someone's computer and allowing it access to the
internet, I have to question some of the wisdom these big companies show.

~~~
ahbyb
That's assuming Microsoft didn't do things properly. Who's to say the amount
of connections that could be opened, the bandwidth, or the max traffic that
could be recv/sent wasn't limited?

Thinking that you can make an HTTP request using this method and that that
means you can unleash a DoS is... worth a try, but not something you can take
for granted.

------
crazygringo
Huh, but why?

I can totally understand _scanning_ a PDF for links to look for malicious
links to protect users.

But that wouldn't involve actual HTTP requests to them.

I'm struggling to imagine what purpose this could have.

~~~
mdasen
How do you know if they're malicious if you don't make HTTP requests to them?

One of the things that phishers and others do is use link wrapping and other
services to hide malicious links. So, I get something.wordpress.com/something-
clean. I then put in an HTML or JS redirect on that page to something
malicious. Given that browsers don't warn about HTTP, HTML, or JS redirects,
it's an easy way for scammers to get around a list of malicious pages.

These kinds of attacks are very common in the email space.
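
A rough sketch of why a scanner has to make the requests: the final destination is only visible after following each hop. This assumes a hypothetical `fetch(url)` callable returning `(status, headers, body)` with a `Location` header key, and it handles only HTTP redirects and HTML meta refresh; JS redirects would need a real browser engine.

```python
import re
from urllib.parse import urljoin, urlsplit

# Matches <meta http-equiv="refresh" content="0; url=..."> and captures the target.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\'][^"\']*url=([^"\'>]+)',
    re.IGNORECASE,
)


def resolve_chain(url, fetch, max_hops=10):
    """Follow HTTP and meta-refresh redirects, returning every URL visited in order."""
    chain = [url]
    for _ in range(max_hops):
        status, headers, body = fetch(chain[-1])
        target = None
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            target = headers["Location"]
        else:
            match = META_REFRESH.search(body)
            if match:
                target = match.group(1).strip()
        if target is None:
            break                                  # no further redirect: final page
        next_url = urljoin(chain[-1], target)      # resolve relative targets
        if next_url in chain:
            break                                  # redirect loop
        chain.append(next_url)
    return chain


def hosts_in_chain(chain):
    """Hostnames seen along the chain, e.g. for comparison against a blocklist."""
    return {urlsplit(u).hostname for u in chain}
```

A blocklist check against only the first URL would see the clean-looking host; checking `hosts_in_chain(...)` sees every host the reader would actually be sent to.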

~~~
gruez
But in this case, that doesn't help at all because facebook's crawler uses a
predictable user agent string. You give a clean result to the facebook crawler
and a malicious result to everyone else.

~~~
idoco
That is a very good point. Security crawlers should probably use a masked
user-agent.
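
One way to check for that kind of cloaking, sketched under assumptions: fetch the same URL once with the crawler's announced user agent and once with a browser-like one, then compare. The user-agent strings and function names below are made up for illustration, and an exact hash comparison will flag any page with dynamic content, so a real check would need something fuzzier.

```python
import hashlib
import urllib.request

CRAWLER_UA = "example-security-crawler/1.0"                  # hypothetical announced UA
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"  # hypothetical masked UA


def http_fetch(url, user_agent, timeout=10):
    """Fetch a URL with the given User-Agent header and return the body as text."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fingerprint(body):
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def looks_cloaked(url, fetch=http_fetch):
    """True if the page body differs depending on which user agent asked for it."""
    as_crawler = fetch(url, CRAWLER_UA)
    as_browser = fetch(url, BROWSER_UA)
    return fingerprint(as_crawler) != fingerprint(as_browser)
```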

~~~
marksomnian
I'm fairly sure Google's search crawler already uses a masked UA, to detect
when pages serve it different content than they do to users.

------
w1nst0nsm1th
There is a way to send any type of file through messenger without facebook
snooping into your private life.

1. Compress the file you want to send into a password-protected zip.
2. Change the extension of the zip file from .zip to .txt.
3. Send the file through Messenger.

I already did this to send a macOS application to a friend. To avoid the size
restriction, compress into several zip parts, rename the extensions to .txt and
send.

File size can be as high as 50mb+.

------
lukeschlather
Are there any comparable hosted messaging services that don't do this?

~~~
s09dfhks
signal/telegram

~~~
tialaramex
Signal actually does optionally offer previews for a handful of services, and
they really jump through some hoops to make that safer:

[https://support.signal.org/hc/en-us/articles/360022474332-Link-Previews](https://support.signal.org/hc/en-us/articles/360022474332-Link-Previews)

The service being previewed doesn't know who you are because Signal acts as a
proxy, and Signal doesn't know what you previewed on that service because their
client deliberately sends overlapping Range requests so that the preview size
is rounded.

------
beager
A good way to enable delivery tracking for PDFs over messenger, I guess!

~~~
NullPrefix
I assume the tracking happens before the recipient sees the message. This
could be used to track sent messages, not received.

~~~
beager
Yep there’s still no “open” tracking on PDFs, at least afaik from a
pixel/beacon standpoint. Entire businesses like docusign are built with that
value prop in mind.

------
zzo38computer
Would making a passworded PDF file help a bit? Then you can tell them the
password by a separate message.

(And, I don't use Facebook, anyways.)

------
omani
seriously. how many times has the world told you to stop using facebook?

stop using facebook. and no, there is not a single reason for you to be there.
trust me. how do you think we lived our lives before there was a facebook?

------
sys_64738
I only use the messenger thing on the Facebook webpage. I don’t generally
install apps on my phone as I don’t trust them. I trust an ad company like
Facebook the least.

------
ariyadi
Let’s play this game

------
burtonator
Facebook: we can crawl you but you can't crawl us!

------
tus88
Not to be flippant, but if you choose to use Facebook you are already throwing
your privacy to the wind, and you are probably getting what you deserve.

