

Google has also indexed thousands of publicly accessible Panasonic webcams - tlrobinson
http://tlrobinson.net/projects/bigbrother/
These appear to mostly be in China.

I haven't updated the list of cameras in a while (and I seem to have lost my script to do it).
======
franze
OK, I just googled:

[https://www.google.com/search?q=inurl%3A%22viewerframe%3Fmod...](https://www.google.com/search?q=inurl%3A%22viewerframe%3Fmode%3D%22&pws=0&hl=en)

10,600 results

When you click on one result, e.g.:

202.212.193.26:555/CgiStart?page=Single&Mode=Motion&Language=0

then you see this in the head of the frameset (and similar tags in every framed HTML document):

    <META NAME="robots" CONTENT="none">
    <META NAME="robots" CONTENT="noindex,nofollow">
    <META NAME="robots" CONTENT="noarchive">

So basically, these HTML abominations should not get indexed if Google followed these indexing directives (which Google basically invented themselves).

Is Google evil? Nope, they really do follow these directives.

So why is this indexed?

Take a look at

<http://202.212.193.26:555/robots.txt>

    User-Agent: *
    Disallow: /

robots.txt is a crawling directive: Google can't crawl the (current) version of these pages, so it never sees the indexing directives. But as crawling is optional for indexing URLs, they get indexed anyway.

How could this be solved? Well, either get rid of the robots.txt, or use:

    User-Agent: *
    Disallow: /
    Noindex: /

The Noindex robots.txt directive isn't officially specified anywhere, but it works nonetheless.
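
You can see the catch-22 with a quick sketch using Python's stdlib robot parser (the URL is the one from above):

    # sketch: a compliant crawler is blocked from fetching the page,
    # so it never sees the meta robots tags; but nothing in robots.txt
    # alone forbids indexing the bare URL
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://202.212.193.26:555/robots.txt")
    rp.read()  # fetches and parses the robots.txt

    url = ("http://202.212.193.26:555/CgiStart"
           "?page=Single&Mode=Motion&Language=0")
    print(rp.can_fetch("*", url))  # -> False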

~~~
MiguelHudnandez
Can you elaborate on how crawling is optional for indexing? Isn't crawling a
prerequisite to indexing?

The only exceptions I can think of are scary, like operating a caching proxy
and scraping the cached data. Or scraping data from browsers that have loaded
pages by user request.

~~~
nostrademons
You can discover a URL through finding a link to it on a publicly-accessible
web page, even if crawling that link itself is not possible.

~~~
MiguelHudnandez
Ohh, got it, thank you. So Google is aware that the URL exists, even though
they know nothing about the content served at that URL.

I am just surprised that a URL with no associated content would be included in
the index.

But now that I think about it more, why not? It will not show up except in
extremely specific searches, and in those cases it is useful to the searcher.

~~~
coderdude
I find this behavior annoying. Here's why:

<https://www.google.com/search?q=unicorn+admin>

The 4th result down (wbpreview.com) is shown in the search results despite the site blocking crawling/indexing with robots.txt. The result displays "A description for this result is not available because of this site's robots.txt – learn more" and the title seems to be auto-generated. The goal was to de-index the listing, but apparently that's not an option.

~~~
gizmo686
As franze pointed out, you can specify not to index in robots.txt (I have not confirmed this). The intent of disallowing crawling is ambiguous: maybe they do not want their content cached, or the extra load on their server, or any number of reasons. If you need to de-index a site, you should use the Noindex robots.txt directive. If it has already been indexed and you need it de-indexed quickly, Google offers tools to do so [1].

[1]
[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663427)

~~~
coderdude
Thank you for pointing that out to me.

------
tlrobinson
I haven't updated the list of cameras in a few years, and seem to have lost my
script to do it. I'll see what I can do later.

------
schabernakk
If you can find those cams via a Google search, doesn't that mean they are linked from some other public site (which has also been indexed), which would indicate they were left public intentionally (at least most of them)?

If I set up a web-accessible cam without password protection, how would Google find it? It's a crawler, right? It doesn't just search random IPs and try to connect to them.

I was always under the assumption that there is a pretty big part of the internet which is just not indexed by the major search engines (and thus more or less private).

~~~
0x0
Google is amazingly good at digging up sites out of nowhere. I wonder if it is a combination of URLs passing through Chrome, GMail, any Android phone, and so on. It's always a hassle keeping staging/dev sites out of the index if you're not careful with all the right meta noindex tags and robots.txt. (robots.txt with disallow-all, on its own, won't keep sites/URLs from showing up in the results; at best it just hides the cached body text summary below the link.)
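
For staging/dev boxes, one approach is to leave crawling open but send an X-Robots-Tag header (Google honors it like the meta tag, and it also covers non-HTML responses). A minimal sketch with Python's stdlib, just to show the idea:

    # minimal sketch: serve everything with a noindex header; since
    # crawling is NOT blocked, the directive actually gets seen
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class NoIndexHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("X-Robots-Tag", "noindex, nofollow")
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>staging</body></html>")

    HTTPServer(("", 8000), NoIndexHandler).serve_forever()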

~~~
diminoten
> I wonder if it is a combination of URLs passing through Chrome, GMail, any
> android phone, and so on.

That would be incredibly alarming, and quite possibly the largest breach of
trust perpetrated by a company so far this decade.

~~~
MiguelHudnandez
With the Bing toolbar installed in IE, any URL you type or visit is submitted to Microsoft, and they actively use this data to tune their Bing search results.

<http://www.wired.com/business/2011/02/bing-copies-google/>

I agree it'd be alarming and terrible, but hardly a new development.

Edit: it's doubtful that an e-mail provider would automatically fetch links from e-mails -- think about it clicking 'unsubscribe' links and links that reject the transfer of domain names. It would break in very obvious ways. IMs and texts, on the other hand, might be more opaque to that kind of meddling.

~~~
0x0
It'd be interesting to set up a wildcard DNS entry for *.some-experiment.example.com, send various links like <http://links-via-gmail.some-experiment.example.com/somepath> and <http://links-via-skype.some-experiment.example.com/anotherpath> through a bunch of services, and see which domain names and which full URLs show up in the logs!

~~~
MiguelHudnandez
That's a really good idea.

I see it getting quite complicated, though! Dimensions I see are: User's OS,
User-agent, ISP/Cell carrier, Transmission protocol (smtp, xmpp, http),
service provider (google, microsoft/skype, microsoft/msn).

    android.att.xmpp-gtalk.example.com
    android.verizon.http-gtalk.example.com
    win8.verizon-fios.https-gtalk.example.com
    ios.sprint.skype.example.com

Then you might have to also include the sender AND receiver information in the
domain, so based on a single request you could see all possible implicated
parties.

I also thought about putting the sender in the path of the URI, but I think it
should be in the domain name, too. This is because you might get a hit on
robots.txt and in that case, you'd only have one half of the route in the
domain name.

Finally, including everything in the DNS lets you evaluate whether the name
was even resolved, and potentially by whom. Getting a hit that the name was
resolved but not fetched over HTTP gives you information about which services
might be analyzing links in order to queue them for further investigation.
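
A tiny sketch of what the link generator could look like (Python; the domain and the dimension values are made up for illustration):

    # hypothetical honeypot-link generator: every dimension goes into
    # the hostname, so even a bare DNS or robots.txt hit identifies
    # the full route; the path carries nothing essential
    def honeypot_url(os, carrier, proto, service, sender, receiver,
                     base="some-experiment.example.com"):
        host = ".".join([sender + "-" + receiver, proto + "-" + service,
                         carrier, os, base])
        return "http://%s/" % host

    print(honeypot_url("android", "att", "xmpp", "gtalk", "u01", "u02"))
    # -> http://u01-u02.xmpp-gtalk.att.android.some-experiment.example.com/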

~~~
0x0
Good call on logging DNS, that'd be a very nice early indicator even if no
HTTP requests are sent!

I think maybe the domain should be of the format "www.encodedonlywithatoz.yourdomain.com" to maximize the chance that whatever regex parsers are scanning for URLs will pick them up (i.e., a www. prefix, a .com suffix, and no special chars). You could encode the dimensions via a lookup table to make it less verbose and slightly more obfuscated ("aa" = at&t, "ab" = verizon, etc).

You shouldn't expect data in the path info to be preserved, but it'd be a nice
bonus, as you say.

Even more interesting would be some custom DNS software that replies with
perhaps a CNAME or something, where you could encode a unique serial number
per request. If you had a huge IP range available, you could even resolve to
unique IP addresses for every lookup, so you could correlate DNS requests with
any HTTP requests that show up later on. A low/near-zero DNS TTL would come in
handy.
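
A minimal sketch of such a responder, assuming the third-party dnslib package (the IP range and names are just illustrative):

    # per-query serial responder: every lookup gets a unique answer out
    # of 198.18.0.0/15 (a benchmarking range, used here illustratively),
    # so later HTTP hits can be correlated with the exact DNS lookup
    import itertools
    from dnslib import RR, QTYPE, A
    from dnslib.server import DNSServer, BaseResolver

    class SerialResolver(BaseResolver):
        def __init__(self):
            self.serial = itertools.count()

        def resolve(self, request, handler):
            n = next(self.serial)
            ip = "198.18.%d.%d" % (n // 256 % 256, n % 256)
            print("query #%d: %s from %s -> %s"
                  % (n, request.q.qname, handler.client_address[0], ip))
            reply = request.reply()
            reply.add_answer(RR(request.q.qname, QTYPE.A,
                                rdata=A(ip), ttl=0))
            return reply

    DNSServer(SerialResolver(), address="0.0.0.0", port=53).start()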

~~~
MiguelHudnandez
I like the idea of encoding the data. Or it can be like a URL shortener, where
the metadata gets recorded, and a short hash is generated. It complicates the
back-end but allows for more comprehensive data storage, and eventual
reporting.

Regarding custom DNS software, I might draw from this excellent write-up
featured on HN recently:

<http://5f5.org/ruminations/dns-debugging-over-http.html>
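
Something like this sketch for the shortener-style variant (the domain is illustrative):

    # record full metadata server-side, hand out only a short token in
    # the hostname; dimensions stay recoverable at reporting time
    import hashlib, json, time

    links = {}  # stand-in for a real datastore

    def make_link(**meta):
        meta["created"] = time.time()
        token = hashlib.sha1(
            json.dumps(meta, sort_keys=True).encode()).hexdigest()[:10]
        links[token] = meta
        return "http://%s.example.com/" % token

    print(make_link(os="ios", carrier="sprint", service="skype",
                    sender="u01", receiver="u02"))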

~~~
0x0
Nice find!

Also, it'd be interesting to just crank the log level to maximum on a normal piece of DNS software, post some links around in IM clients and elsewhere, and see if anything anywhere kicks in. The experiment could be repeated (on different subdomains) with more clever implementation tricks later.

~~~
MiguelHudnandez
I ended up just setting up bind with a wildcard entry, and setting its log
level for queries to debug. It is working now, but I need to build a little
web app to generate the unique links. Also only one DNS server is running at
the moment.

I can't wait to send some around in facebook messages and IMs.

Here's a maiden honeypot link: <http://hn0001.hnypot.info/Welcome-Internets>!

...Though posting it publicly nearly guarantees I will see a hit, I can at
least see if code running on HN resolves it immediately.

Edit: There is activity coming in on that name, but mostly it is from browsers pre-loading DNS to prepare for the next potential pageview. My browser did this (Chrome on Mac). I suppose that is a form of information disclosure we often overlook: on a page you can inject a link into, you can get some very basic analytics.

In the 15 minutes following the posting of that link, there have been zero
clicks, 36 IPv4 lookups, 6 IPv6 lookups.
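
For tallying those hits, a rough sketch that greps the BIND query log (the exact log line format varies by BIND version, so the regex is an assumption):

    # counts lookups per (name, qtype) and per client from a BIND query
    # log; assumes lines roughly like:
    #   ... client 203.0.113.5#53124 ...: query: hn0001.hnypot.info IN A ...
    import re
    from collections import Counter

    QUERY_RE = re.compile(r"client (?P<ip>[0-9a-fA-F.:]+)#\d+.*?query: "
                          r"(?P<name>\S+) IN (?P<qtype>\S+)")

    names, clients = Counter(), Counter()
    with open("/var/log/named/queries.log") as f:
        for line in f:
            m = QUERY_RE.search(line)
            if m:
                names[(m.group("name"), m.group("qtype"))] += 1
                clients[m.group("ip")] += 1

    for (name, qtype), count in names.most_common(10):
        print("%6d  %-4s %s" % (count, qtype, name))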

------
davidtyleryork
So this has actually been a pretty common 4chan prank for a while now. People post instructions for Googling webcam IP addresses (typically just searching for an IP string or something similar) and then try to find something worth sharing. Just like the printer post above, it's incredible how many webcams are left completely unsecured.

~~~
eli
I'm pretty confident this actually predates the founding of 4chan. I mean,
it's not rocket science: if there is a device that can be administered via the
web, then you can probably find at least a couple unsecured via a web search.

~~~
vasco
What you wouldn't have had, though, is a large group of people dedicated to actively monitoring what is being captured.

~~~
eli
Don't be silly. USENET? IRC?

------
achillean
This is a similar website that gathered webcam data using Shodan
(<http://www.shodanhq.com>):

<http://cryptogasm.com/webcams/>

~~~
Nux
lol, someone's taking an exam
<http://cryptogasm.com/webcams/webcam.php?id=36739>

awesome resource, connects you to the world in a weird way :)

~~~
icebraining
This is funny: a dog hotel: <http://cryptogasm.com/webcams/webcam.php?id=166>

------
uptown
Looks like Heisenberg is ready to cook:

<http://207.68.47.143:8080/anony/mjpg.cgi>

~~~
mr337
I couldn't agree more

------
brd
In school I did a little research project to see if there were opportunities to map out these cameras and use them for disaster response scenarios. It seemed like a promising approach, but I don't believe my prof ever took it past my proof of concept.

We were even able to track down some cameras located on campus which made for
some hilarious phone calls.

~~~
Ao7bei3s
How?

------
smallegan
Clearly this wasn't made for the front page of HN. It'd be nicer if it polled the cameras server-side, cached the images, and served them up for the previews, refreshing every so often.

------
wyck
This was news about 6 years ago.

------
hippich
So many wasted IPv4s...

------
NathanKP
While we are on the topic of Google indexing things and revealing security holes, I think that VoIP devices should also be mentioned.

I remember when I was taking a network security class in college, the professor was guiding us through the steps required to scan a network for vulnerabilities, specifically detecting services and control panels which are left open and vulnerable. Naturally we were using the college network for this, and in addition to the expected control panels of printers in various professors' offices, I accidentally found the control panel for the school's VoIP system, and it was not properly secured. I believe it was a Cisco system. Anyway, the control panel seemed to offer access to modify various settings of the college VoIP phone system, with no password protection.

Now granted it could be that I only had access to this because I was doing the
scan from "inside the system" instead of outside via the web, but I'm sure
there are vulnerable VoIP systems which have accidentally exposed their
control panels to the internet.

~~~
gizmo686
>Now granted it could be that I only had access to this because I was doing the scan from "inside the system" instead of outside via the web

If 'inside the system' means from a university internet connection, then it is very much a security hole, as anyone physically present could exploit it. (Or at best anyone the university allows on their network, which is a much larger group than the people who should be able to touch those settings.)

------
kvasan
So anyone want to take on the challenge of building a "Person of Interest"
like system and hook it up to these publicly accessible cams, it would
obviously be less early-warning as its fictionally one. Maybe I would try it
but I assume the computing resources needed would be large, but wouldn’t a
bot-net solve that. Or maybe my first brush of ML experience has left me naive
of its capability’s.

Of course a 'Global Citizen Operative' would have to take action also, it is
not like I proposed a "Global responsive network of autonomous drone to
enforce peace and harmony" to would do that instead.

Sorry for the up in the clouds comment but the merciless hand of insomnia
grabbed me and prompted my mind to wander.

------
bane
For fun, I used to troll open webcams in Japan. The best were the ones you could pan around and zoom in and out of. Lots of nice vistas, sea ports, city scenes. I'd put them on in the background to provide some visual "noise" during the day.

~~~
kaybe
But those are intended to be public, are they not?

~~~
bane
I don't think so. They seem more like security cams or similar. Plus, being able to pan and zoom them around makes me think they really aren't meant for anyone other than the person who set them up.

------
stickydink
This might raise a few eyebrows...

<http://bit.ly/V4VtJJ>

------
dfamorato
There is actually a crawler for "machines and devices", such as routers, IP phones, webcams, Dell DRACs, HP iLO, VMware ESX, and so on:

<http://www.shodanhq.com/browse>

Also, check your server IP on Shodan to see if your firewall rules aren't exposing a little too much.
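
A quick way to check, sketched with the official shodan Python package (the key and IP here are placeholders):

    # look up what Shodan has indexed for one of your IPs; requires an
    # API key from shodanhq.com
    import shodan

    api = shodan.Shodan("YOUR_API_KEY")   # placeholder key
    info = api.host("203.0.113.10")       # your server's public IP
    print(info.get("ports"))              # ports Shodan has seen open
    for svc in info.get("data", []):
        print(svc.get("port"), svc.get("data", "")[:60])  # banner snippet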

