

What technologies are under your site - suyash
http://underthesite.com/
Super Happy Dev House at Google today lightning talk presenter - Ken
======
tectonic
Hey guys, I made this site and just gave a talk about it at SHDH. Someone must
have submitted it. Thanks for all your feedback, I really appreciate it!

~~~
mmaunder
Awesome! Suggestions:

Do a reverse lookup on the site's IP and add matchers for the hostnames that
show up.

Mention whether they're using round-robin DNS.

Add nmap OS fingerprinting.

Do a traceroute and log the IP of the closest router (final hop) to the site
and add matching for that.

Add a wiki interface and build a CrunchBase-like app.

Add archiving of data and monitoring over time (as Netcraft did in their
original app).
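A rough sketch of the first two suggestions, assuming Python's standard `socket` module: resolve the site's addresses, reverse-look-up each one, and match the PTR hostname against a suffix table. The `PTR_SUFFIXES` entries and all function names are illustrative assumptions, not anything UnderTheSite actually ships.

```python
import socket

# Illustrative suffix table (an assumption): reverse-DNS (PTR) hostname
# suffixes mapped to the hosting platform they usually indicate.
PTR_SUFFIXES = {
    ".compute-1.amazonaws.com": "Amazon EC2",
    ".compute.amazonaws.com": "Amazon EC2",
    ".members.linode.com": "Linode",
}


def resolve_addresses(hostname):
    """Return the distinct IPv4 addresses that DNS returns for hostname."""
    infos = socket.getaddrinfo(hostname, 80, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def hosting_from_ptr(ptr_name):
    """Match a PTR hostname against the suffix table; None if nothing matches."""
    if not ptr_name:
        return None
    ptr_name = ptr_name.lower().rstrip(".")
    for suffix, platform in PTR_SUFFIXES.items():
        if ptr_name.endswith(suffix):
            return platform
    return None


def looks_round_robin(addresses):
    """Several distinct A records for one name suggest round-robin DNS."""
    return len(set(addresses)) > 1
```

Putting it together, `[hosting_from_ptr(reverse_lookup(ip)) for ip in resolve_addresses("example.com")]` needs network access; the matching step itself is a pure function.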

~~~
JoshTriplett
nmap OS fingerprinting seems a bit aggressive, but reverse-DNS checking seems
like a great way to check which hosting platform a site uses.

~~~
16s
nmap is too aggressive. It's a prelude to actual hacking attempts and labeled
by IDS systems as such. Don't use it for this or you may end up in legal
trouble.

~~~
tectonic
Agreed. UnderTheSite makes a specific point of only considering information
that would be returned when a user's browser hits a website. It doesn't scan
or probe ports or URLs.

------
chime
Minor bug: For my site, it says YUI. Having written it from scratch, I'm
pretty sure there is no YUI anywhere. It appears to match jqueryui.js as
yui.js.

Link: [http://underthesite.com/technologies/YUI-Library/matchers/12...](http://underthesite.com/technologies/YUI-Library/matchers/124?site=zetabee.com)
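The false positive is easy to reproduce: `jqueryui.js` ends in the string `yui.js`, so a plain substring test fires on it. A minimal sketch of a stricter pattern, assuming matchers are regular expressions run against script URLs (the pattern is an illustration, not the community matcher in question), requires `yui` to start a path segment:

```python
import re


def naive_yui_match(script_src):
    """Substring test: also fires on 'jqueryui.js', which ends in 'yui.js'."""
    return "yui.js" in script_src


# Illustrative stricter pattern: "yui" must follow "/" or start the string,
# optionally as "yui-min", and ".js" must end the path (before any ?query or #fragment).
YUI_SCRIPT = re.compile(r"(?:^|/)yui(?:-min)?\.js(?:[?#]|$)", re.IGNORECASE)


def strict_yui_match(script_src):
    return YUI_SCRIPT.search(script_src) is not None
```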

~~~
tectonic
You're right, this was a community-submitted matcher. I just reported the
matcher as inaccurate: [http://underthesite.com/technologies/YUI-Library/matchers/12...](http://underthesite.com/technologies/YUI-Library/matchers/124?site=zetabee.com)

------
JoshTriplett
Please consider allowing a small icon for a technology, to make lists of
technologies more immediately recognizable.

------
kingkilr
Interesting, but seems a little primitive compared to the cleverness in
<https://github.com/mitsuhiko/probe>

~~~
tectonic
Does probe try to fetch signal URLs (like admin pages)? UnderTheSite.com
makes a point of only looking at data that would normally be fetched by your
browser. Probing a site for URLs could be considered offensive.

~~~
kingkilr
Yes, it does probe at site URLs, you can look at libprobe.py to see all the
indicators it takes into account.

------
mikelbring
Same idea as <http://builtwith.com/> ?

~~~
tectonic
Similar, but anyone can submit a new technology. It's a wiki of technology
matchers.

------
mike-cardwell
I'd add SSL/HTTPS. I'd also add "Strict-Transport-Security" and
"Content-Security-Policy", both of which can be seen by looking at HTTP
response headers.
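Both headers can be checked from a single ordinary response. A minimal sketch, assuming the caller already has the final URL and the response headers (for instance `resp.geturl()` and `resp.headers` from `urllib.request.urlopen`); `header_checks` is a hypothetical name:

```python
def header_checks(final_url, headers):
    """Report HTTPS use and two security headers from one fetched response.

    `headers` is any mapping-like object with .items(); names are compared
    case-insensitively because HTTP header names are case-insensitive.
    """
    names = {name.lower() for name, _ in headers.items()}
    return {
        "SSL/HTTPS": final_url.lower().startswith("https://"),
        "Strict-Transport-Security": "strict-transport-security" in names,
        # X-Content-Security-Policy is the older prefixed name early Firefox used.
        "Content-Security-Policy": bool(
            names & {"content-security-policy", "x-content-security-policy"}
        ),
    }
```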

------
JoshTriplett
I _love_ this. I particularly like the broad definition of "technologies", and
the variety of ways to write matchers.

More importantly, I like that users can easily add their own technologies and
matchers.

------
wiradikusuma
I checked my website, <http://underthesite.com/sites/lelanggokil.com>

Your analyzer mistakenly detected my website as using Microsoft IIS and
ASP.NET, which is weird since it also detected Google App Engine (which is
correct). ASP.NET doesn't run on GAE.

Just a thought: do you think website owners should mention the technologies
they use in an HTTP header? For example, your analyzer can't detect that I'm
using Java and the Spring framework.
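Some stacks already advertise themselves this way through the `Server` and `X-Powered-By` response headers (PHP and ASP.NET commonly set `X-Powered-By`). A sketch of turning those values into technology names, assuming `Product/version` tokens; a Java/Spring site could opt in by sending its own `X-Powered-By` value, though that header is a convention rather than a standard. The function name is hypothetical:

```python
import re


def technologies_from_headers(headers):
    """Split Server / X-Powered-By values into (name, version) pairs."""
    lowered = {name.lower(): value for name, value in headers.items()}
    found = []
    for field in ("server", "x-powered-by"):
        value = lowered.get(field)
        if not value:
            continue
        # Drop parenthesised comments such as "(Unix)", then split the
        # remaining comma- or space-separated "Product/version" tokens.
        value = re.sub(r"\([^)]*\)", " ", value)
        for token in re.split(r"[,\s]+", value):
            if not token:
                continue
            name, _, version = token.partition("/")
            found.append((name, version or None))
    return found
```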

~~~
tectonic
You're correct - I'm looking into this, thanks for the bug report!

~~~
tectonic
I think I've found and fixed the bug, thanks for the report. Does your site
show correctly now?

------
christangrant
FYI, if you click on the kid in the bottom-right corner of
<http://underthesite.com/> he blinks.

~~~
arc_of_descent
Nice. But you have to click on the kid, NOT!

------
bstar
So yahoo.com is using jQuery now? There's like 10 references to YUI and zero
for jQuery in the source... interesting algorithm.

~~~
tectonic
It was found on yahoo.match.com, which is a bug because it shouldn't go off
domain. I'll look into it.

[http://underthesite.com/technologies/jQuery/matchers/5?site=...](http://underthesite.com/technologies/jQuery/matchers/5?site=yahoo.com)
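A minimal sketch of the missing check, assuming the crawler compares each fetched URL's host against the site being analyzed and accepts only the site itself and its subdomains. `is_on_site` is a hypothetical helper:

```python
from urllib.parse import urlsplit


def is_on_site(url, site):
    """True if url's host is `site` itself or one of its subdomains.

    "yahoo.match.com" is not on "yahoo.com": it merely contains the
    string "yahoo", which a naive substring check would accept.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lowercased
    site = site.lower().rstrip(".")
    return host == site or host.endswith("." + site)
```

One trade-off: a strict rule like this also ignores assets a site serves from its own separate CDN domain, so a real implementation might need a per-site allow-list.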

------
araneae
Nice tool! Usually when I see a site I like I view source to see what it's
made in, but it's not always that obvious.

------
ArchD
To test the d3 matcher I just added, I tried the following URLs and get the
error "Sorry, we were unable to reach the site":

<http://whatdoyouworkfor.appspot.com/index.html>

[http://dustinphoto.iriscouch.com/gerrit/_design/app/index.ht...](http://dustinphoto.iriscouch.com/gerrit/_design/app/index.html)

I could reach them with no problems. These sites are just from some comments
on this thread:

<http://news.ycombinator.com/item?id=2746449>

~~~
tectonic
Interesting, I'll take a look at this and see if I can figure out what's going
on. Thanks for the bug report.

------
aorshan
Noticed that not a single website is using Django. Am I missing something?

~~~
tectonic
I'm not sure that there is a django matcher yet. Django is hard to reliably
detect. Do you know of a good signature (like a header or form token) that it
always uses?

~~~
lloeki
You could implement "maybe" matchers, that look for stuff which has a good
chance of telling you what's underneath but not with 100% certainty.

Now, the following is a hack, but Django's CSRF protection [0] gives a good indication:

If there's a form, there's a good chance you'll find a construct like:

    <form action="." method="post"><input type='hidden' name='csrfmiddlewaretoken' value="thetokenvalue" />

Also, there may be a 'csrftoken' value in the cookie.

You will also most probably find the jQuery function that sets X-CSRFToken on
XHRs (see the doc [0] at #ajax). For Prototype, it will look like this [1].

[0] <https://docs.djangoproject.com/en/dev/ref/contrib/csrf/>

[1] [http://stackoverflow.com/questions/5551914/protecting-protot...](http://stackoverflow.com/questions/5551914/protecting-prototype-js-based-xhr-requests-against-csrf)
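One way to express a "maybe" matcher is a score built from the Django hints described above: the `csrfmiddlewaretoken` hidden field that `{% csrf_token %}` renders, the default `csrftoken` cookie (renameable via `CSRF_COOKIE_NAME`), and `X-CSRFToken` in page scripts. The weights are made-up assumptions, and `django_confidence` is a hypothetical helper, not part of any library:

```python
import re

# Hypothetical weights: none of these hints proves Django on its own.
CSRF_FIELD = re.compile(r"""name=["']csrfmiddlewaretoken["']""")


def django_confidence(html, cookie_names):
    """Return (score in [0, 1], list of hints found) for a fetched page."""
    score, hints = 0.0, []
    if CSRF_FIELD.search(html):
        score += 0.6
        hints.append("csrfmiddlewaretoken form field")
    if "csrftoken" in cookie_names:
        score += 0.3
        hints.append("csrftoken cookie")
    if "X-CSRFToken" in html:
        score += 0.2
        hints.append("X-CSRFToken header set in page scripts")
    return min(score, 1.0), hints
```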

------
nbertram
Nice site guys; reminds me of www.quarkbase.com in some ways. But I like the
focus on what's under the hood - keep up the good work.

------
JoshTriplett
What triggers the error message "The pattern that you entered appears to be
too general - can you make it more specific for this technology?"? I tried to
add a technology for the use of rel="nofollow", using an XPath expression
//a/@rel[contains(., "nofollow")] , and got that error message. What do I need
to do to make it more specific?

~~~
tectonic
Good question, I think rel=nofollow must match too many websites. I'll look
into it.
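tectonic's guess suggests the check compares a new pattern against sites already analyzed. A sketch of that idea, assuming a sample of stored pages and a made-up 30% threshold (`too_general` and `uses_nofollow` are hypothetical names); the rel test mirrors the XPath `contains()` above using the standard library's `HTMLParser`:

```python
from html.parser import HTMLParser


class _RelNofollowFinder(HTMLParser):
    """Sets .found when an <a> tag's rel attribute contains "nofollow"."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and "nofollow" in (dict(attrs).get("rel") or ""):
            self.found = True


def uses_nofollow(html):
    finder = _RelNofollowFinder()
    finder.feed(html)
    return finder.found


def too_general(matcher, sample_pages, max_fraction=0.3):
    """Reject a matcher that fires on more than max_fraction of sample pages."""
    if not sample_pages:
        return False
    hits = sum(1 for page in sample_pages if matcher(page))
    return hits / len(sample_pages) > max_fraction
```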

------
fus
I cannot stand that _unreal_ cogs image in the bottom-right corner. Please make
sure search works with special characters.

~~~
tectonic
What's wrong with the image?

~~~
Vivtek
There are three cogs in a triangle. They couldn't move. That's all the
explanation I can come up with - artistically I love the image.

------
shuri
my 2c. Nice design (graphics). I would make the first page seem less busy.
BuiltWith is going to be a tough competitor. You need to match them
(precision/recall) and add stuff that they don't have (trending techs? Add
info: who is the host provider? where in the world is it hosted? Response
times? Bad link stats?...?)

------
davidcuddeback
It says that my company's site [1] is running ASP.NET on Microsoft IIS, which
it's not. To be fair, it also mentions Ruby on Rails, Apache, and Phusion
Passenger, which are all correct. Aside from these minor glitches, this is a
pretty cool project.

[1] <http://identified.com>

~~~
tectonic
This should be fixed now.

------
aphexairlines
This doesn't really tell you anything about a site. Most of the interesting
stuff happens on the server side.

~~~
djcapelis
That's getting less and less true with the amount of stuff happening in
JavaScript.

~~~
aphexairlines
Underthesite itself does its work behind a server:

<http://underthesite.com/sites/underthesite.com>

All you can tell from it querying itself is that the work happens somewhere
behind Rails.

------
arctangent
Great idea. How about saving the results you generate and allowing people to
search for sites based on the technologies they use. For example, I might want
to see all sites using jQuery which are using something other than Apache.

~~~
tectonic
Results are saved, just not exposed yet. I'm still trying to figure out how
best to do that.
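Once results are exposed, arctangent's example query is a set operation. A sketch, assuming saved results are available as a mapping from site to the set of detected technology names (the function name and sample data are made up for illustration):

```python
def sites_using(results, required, excluded=()):
    """Sites whose technologies include all of `required` and none of `excluded`."""
    required, excluded = set(required), set(excluded)
    return sorted(
        site
        for site, techs in results.items()
        if required <= techs and not (excluded & techs)
    )


# Made-up sample data for illustration only.
SAMPLE = {
    "a.example": {"jQuery", "Apache"},
    "b.example": {"jQuery", "nginx"},
    "c.example": {"Prototype", "nginx"},
}
```

With the sample data, `sites_using(SAMPLE, {"jQuery"}, {"Apache"})` returns `["b.example"]`.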

------
westi
This is cool.

Some bugs in the information returned though.

For example: [http://underthesite.com/technologies/WordPress-Batcache-Plug...](http://underthesite.com/technologies/WordPress-Batcache-Plugin)

"WordPress Batcache Plugin is closed source. "

Batcache is Open Source :)

~~~
tectonic
Fixed, thank you.

------
nfm
Bang on for my app's site! Very cool.

In terms of features, I'd love to see more emphasis on the
aggregate/comparison data. For example, most popular server side framework,
most popular JS libraries, most popular hosting platforms and so on.

I like it a lot :)

~~~
tectonic
It's not fully baked yet, but I'm starting to do just that:
[http://underthesite.com/compare/jquery_and_prototype-javascr...](http://underthesite.com/compare/jquery_and_prototype-javascript-framework)

Thanks for the feedback!
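Aggregate "most popular" lists fall out of the same kind of saved results by counting. A sketch, assuming a site-to-technologies mapping plus a hypothetical technology-to-category table; `most_popular` is an illustrative name:

```python
from collections import Counter


def most_popular(results, category_of, category, n=3):
    """Top-n technologies in one category, counted across all analyzed sites."""
    counts = Counter(
        tech
        for techs in results.values()
        for tech in techs
        if category_of.get(tech) == category
    )
    return counts.most_common(n)
```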

------
a3_nm
I love how it gets confused by Hacker News:
<http://underthesite.com/sites/news.ycombinator.com>

~~~
tectonic
I challenge you to come up with a matcher for a technology detectable on HN.
It's hard!

------
yeahsure
Great idea - I like it.

I've been using a Firefox add-on for this, <http://wappalyzer.com>. It doesn't
give as much info, but works fairly well.

------
namank
Cool!

Though it cannot reach this site: <http://www.olin.wustl.edu/pages/default.aspx>
Why?

------
spontaneus
I had this same idea a while back, but never had the time to build it. It
could be a great way to generate some affiliate money. Good luck!

------
SallyG
Good start. Looks like <http://w3techs.com/sites> does a bit more.

------
paraschopra
<http://builtwith.com/> is another great alternative!

~~~
dorkitude
builtwith.com does a better job with my site.

Under the Site misses MooTools, which is being loaded asynchronously on my
site via the Google Libraries API.

Impressively, Built With even managed to detect that I was making an AJAX call
to StackOverflow's API.
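A page that pulls MooTools in through Google's loader may never contain a `<script src>` for it in the static HTML. If the loader call or the CDN URL appears as text in inline scripts (an assumption about how the site does it), a matcher can still spot it. A sketch with hypothetical names:

```python
import re

# google.load("name", "version") was the loader call of Google's AJAX
# Libraries API; the CDN path form is ajax.googleapis.com/ajax/libs/<name>/.
GOOGLE_LOAD = re.compile(r"""google\.load\(\s*["']([\w.-]+)["']""")
GOOGLE_CDN = re.compile(r"ajax\.googleapis\.com/ajax/libs/([\w.-]+)/")


def libraries_from_google(html):
    """Library names referenced via Google's loader or CDN anywhere in the page text."""
    found = set(GOOGLE_LOAD.findall(html)) | set(GOOGLE_CDN.findall(html))
    return sorted(found)
```

If the script URL is instead assembled at runtime, only executing the page's JavaScript would reveal it, which is a much heavier design.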

------
JoshTriplett
Looks like you've already started to get spam in the technologies list.

------
Nican
I would like to see the list of websites using IIS.

------
mcorrientes
At first I thought this was a copy of lineofthought.com, but it is different
enough to stand on its own.

------
ashwinurao
A Chrome/FF extension would be much more useful.

~~~
tectonic
There is a bookmarklet. Browser extensions are forthcoming.

------
jpr
I'm going to build my next site in UTF-8, just like news.ycombinator.com.

------
clobber
Tested this on a website written in ColdFusion and it's reporting back as
ASP.NET.

~~~
arkitaip
I tried the ColdFusion-powered MetaFilter and it reported nothing.

~~~
tectonic
Please add a matcher for ColdFusion - I don't know it, but this is a wiki, so
anyone can add it.

~~~
arkitaip
I just understood the concept of tech matchers, which could be incredibly
powerful. I think you should highlight this more.
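As a starting point for a ColdFusion matcher, assuming it scans links and form actions in the fetched HTML: ColdFusion pages conventionally use the `.cfm` extension (`.cfc` for components). Sites that hide extensions behind URL rewriting would not be caught. `links_to_coldfusion` is a hypothetical name:

```python
import re
from urllib.parse import urlsplit

# Values of href/action/src attributes, in single or double quotes.
LINK_TARGET = re.compile(r"""(?:href|action|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def links_to_coldfusion(html):
    """True if any link, form action, or src points at a .cfm/.cfc path."""
    for target in LINK_TARGET.findall(html):
        path = urlsplit(target).path.lower()
        if path.endswith((".cfm", ".cfc")):
            return True
    return False
```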

------
eurohacker
Does it recognize PHP frameworks like CakePHP and CodeIgniter, or libraries
like Flourish?
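Frameworks often betray themselves through default session-cookie names sent in `Set-Cookie`, which any browser receives without probing. A sketch, assuming the defaults `CAKEPHP` (CakePHP), `ci_session` (CodeIgniter) and `PHPSESSID` (plain PHP sessions); all are configurable, so a hit is a hint and a miss proves nothing. `php_hints` is a hypothetical name:

```python
# Default session-cookie names (all configurable per site).
DEFAULT_SESSION_COOKIES = {
    "CAKEPHP": "CakePHP",
    "ci_session": "CodeIgniter",
    "PHPSESSID": "PHP",
}


def php_hints(cookie_names):
    """Technologies suggested by default cookie names seen in Set-Cookie headers."""
    return sorted(
        {DEFAULT_SESSION_COOKIES[c] for c in cookie_names if c in DEFAULT_SESSION_COOKIES}
    )
```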

------
NARKOZ
<http://lineofthought.com/> is much better

