What technologies are under your site (underthesite.com)
187 points by suyash on July 10, 2011 | 87 comments

Hey guys, I made this site and just gave a talk about it at SHDH. Someone must have submitted it. Thanks for all your feedback, I really appreciate it!

Awesome! Suggestions:

Do a reverse lookup on the site's IP and add matchers for the hostnames that show up.

Mention if they're round robining their DNS.

Add nmap OS fingerprinting.

Do a traceroute and log the IP of the closest router (final hop) to the site and add matching for that.

Add a wiki interface and build a CrunchBase-like app.

Add archiving of data and monitoring over time (as Netcraft did in their original app).

nmap OS fingerprinting seems a bit aggressive, but reverse-DNS checking seems like a great way to check which hosting platform a site uses.
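
For anyone curious, the reverse-DNS idea could be sketched roughly like this in Python. The suffix table and both function names are made up for illustration (they are not UnderTheSite's matchers), and a real table would need many more entries:

```python
import socket

# Illustrative only: a tiny, hypothetical table of PTR-name suffixes.
HOSTING_SUFFIXES = {
    ".amazonaws.com": "Amazon EC2",
    ".linode.com": "Linode",
}


def hosting_from_ptr(ptr_name):
    """Guess the hosting platform from a reverse-DNS (PTR) hostname."""
    name = ptr_name.lower().rstrip(".")
    for suffix, provider in HOSTING_SUFFIXES.items():
        if name.endswith(suffix):
            return provider
    return None


def reverse_lookup(ip):
    """Return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None
```

This only asks DNS a question, so it stays on the passive side of the line drawn in this thread: no ports are scanned and no extra URLs are requested from the site.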

nmap is too aggressive. It's a prelude to actual hacking attempts and labeled by IDS systems as such. Don't use it for this or you may end up in legal trouble.

Agreed. UnderTheSite makes a specific point of only considering information that would be returned when a user's browser hits a website. It doesn't scan or probe ports / urls.

Thanks for the great feedback!

Awww, I wish I didn't have to leave SHDH before the lightning talks started!

I wrote a Ruby program to do something similar to what you're doing: https://github.com/jpf/domain-profiler - If you ever start profiling sites using information from places other than what the server returns, perhaps what I've done can help inspire you?

Is there a working example of domain profiler please?

Thanks, I'll check it out!

Cool site! I'm having a lot of fun. However, I was confused by this flow:

First: http://drktd.com/8Jgn

After clicking the 'Add It!' link: http://drktd.com/8Iaf

The wording on the previous page had me expecting a page for manually adding technologies to my site's stack a la Bagcheck.com.

Ah! This is a feature that I plan to add. Do you want to select technologies from a list, or write free-form text about your technology stack?

I'd like to suggest avoiding manual additions of technology for as long as possible. Focus on adding more ways to match specific technologies. After all, a site could always advertise more technologies in server headers or meta generator tags.

What if we propose an extension to humans.txt called technologies.txt where a site can self-describe their stack?

A mechanism for sites themselves to advertise their stack seems like a great idea (though I'd prefer it not occur via fixed URLs like humans.txt or robots.txt, but via headers or meta tags). I'd just suggest not allowing arbitrary additions to a site's stack without any way to verify them.
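
A rough sketch of reading what a site already advertises about itself, assuming the response headers arrive as a plain dict and the page as a string. `advertised_stack` and `GeneratorFinder` are hypothetical names, not part of UnderTheSite:

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collects the content of <meta name="generator"> tags."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name == "generator" and content:
            self.generators.append(content)


def advertised_stack(headers, html):
    """List what the site says about itself in headers and meta tags."""
    lowered = {key.lower(): value for key, value in headers.items()}
    found = [lowered[key] for key in ("server", "x-powered-by") if key in lowered]
    finder = GeneratorFinder()
    finder.feed(html)
    return found + finder.generators
```

Everything it returns is self-reported by the site, so it can be wrong or deliberately misleading, but it cannot be added by a third party.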

Follow @underthesite for updates as I implement all of the great suggestions that you guys have made here.

Would love to see if it could detect more scripting languages and frameworks than PHP.

It incorrectly stated that my site uses ColdFusion.

What's the site?

I posted it buddy, I liked your demo a lot!

Thanks :)

Minor bug: For my site, it says YUI. Having written it from scratch, I'm pretty sure there is no YUI anywhere. It appears to match jqueryui.js as yui.js.

Link: http://underthesite.com/technologies/YUI-Library/matchers/12...

You're right - This was a community-submitted matcher. I just reported the matcher as inaccurate: http://underthesite.com/technologies/YUI-Library/matchers/12...
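
The false positive is easy to reproduce, since "jqueryui.js" contains "yui.js" as a substring. A minimal sketch, assuming the matcher was a regex run against script URLs; both patterns here are hypothetical because the actual matcher isn't shown in the thread:

```python
import re

# Hypothetical loose pattern: matches anywhere inside the file name.
LOOSE_YUI = re.compile(r"yui\.js")

# Hypothetical stricter pattern: "yui" must start the file name.
STRICT_YUI = re.compile(r"(?:^|/)yui(?:-min|-debug)?\.js(?:[?#]|$)")


def matches(pattern, script_url):
    """True if the pattern is found anywhere in the script URL."""
    return pattern.search(script_url) is not None
```

Anchoring on a path separator (or the start of the string) is what keeps "jqueryui.js" from matching.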

Please consider allowing a small icon for a technology, to make lists of technologies more immediately recognizable.

Interesting, but seems a little primitive compared to the cleverness in https://github.com/mitsuhiko/probe

Does probe try to fetch signal URLs (like admin pages)? UnderTheSite.com makes a point of only looking at data that would normally be fetched by your browser. Probing a site for URLs could be considered offensive.

Yes, it does probe at site URLs, you can look at libprobe.py to see all the indicators it takes into account.

Same idea as http://builtwith.com/ ?

Similar, but anyone can submit a new technology. It's a wiki of technology matchers.

Even same idea as http://netcraft.com

I'd add SSL/https. I'd also add "Strict-Transport-Security" and "Content-Security-Policy", both of which can be seen by looking at HTTP response headers.
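
A minimal sketch of that check, assuming the fetched URL and its response headers are already in hand. `security_features` is a made-up name, and header names are compared case-insensitively because HTTP header names are case-insensitive:

```python
SECURITY_HEADERS = ("Strict-Transport-Security", "Content-Security-Policy")


def security_features(url, headers):
    """Report HTTPS use and which security headers the response carries."""
    lowered = {name.lower() for name in headers}
    found = []
    if url.lower().startswith("https://"):
        found.append("HTTPS")
    for name in SECURITY_HEADERS:
        if name.lower() in lowered:
            found.append(name)
    return found
```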

I love this. I particularly like the broad definition of "technologies", and the variety of ways to write matchers.

More importantly, I like that users can easily add their own technologies and matchers.

I checked my website, http://underthesite.com/sites/lelanggokil.com

Your analyzer mistakenly detected my website using Microsoft IIS and ASP.NET. Which is weird since it also detected Google App Engine (which is correct). ASP doesn't run on GAE.

Just a thought, do you think website owners should mention technologies they use in HTTP header? For example your analyzer can't detect that I'm using Java and Spring framework.

You're correct - I'm looking into this, thanks for the bug report!

I think I've found and fixed the bug, thanks for the report. Does your site show correctly now?

FYI, if you click on the kid on the bottom right corner of http://underthesite.com/ he blinks.

Nice. but you have to click on the kid, NOT!

So yahoo.com is using jQuery now? There's like 10 references to YUI and zero for jQuery in the source... interesting algorithm.

It was found on yahoo.match.com, which is a bug because it shouldn't go off domain. I'll look into it.
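
One way to guard against going off domain is to compare hostnames exactly before counting a resource as part of the site. This is only a sketch: the helper names are invented, and treating a leading "www." as the same site is an assumption, not necessarily what UnderTheSite does:

```python
from urllib.parse import urlsplit


def _host(url):
    """Lower-cased hostname with a leading "www." removed."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_on_site(site_url, resource_url):
    """True only when both URLs point at the same host."""
    return _host(site_url) == _host(resource_url)
```

With an exact comparison, yahoo.match.com is a different host from yahoo.com even though the name contains "yahoo".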


Nice tool! Usually when I see a site I like I view source to see what it's made in, but it's not always that obvious.

To test the d3 matcher I just added, I tried the following URLs and get the error "Sorry, we were unable to reach the site":



I could reach them with no problems. These sites are just from some comments on this thread:


Interesting- I'll take a look at this and see if I can figure out what's going on. Thanks for the bug report.

So index.html is treated the same as /, and you return a 404 on /.

Noticed that not a single website is using Django. Am I missing something?

I'm not sure that there is a Django matcher yet. Django is hard to reliably detect. Do you know of a good signature (like a header or form token) that it always uses?

You could implement "maybe" matchers, that look for stuff which has a good chance of telling you what's underneath but not with 100% certainty.

Now, the following is a hack, but CSRF stuff[0] gives a good indication:

If there's a form, there's a good chance you'll find a construct like:

    <form action="." method="post"><input type='hidden' name='csrfmiddlewaretoken' value="thetokenvalue" />

Also, there may be a 'csrftoken' value in the cookie.

You will also most probably find the jQuery function that sets X-CSRFToken on XHRs (see the doc[0] at #ajax). For Prototype it'll look like this [1]

[0] https://docs.djangoproject.com/en/dev/ref/contrib/csrf/

[1] http://stackoverflow.com/questions/5551914/protecting-protot...
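
A "maybe" matcher along those lines could look something like the sketch below. It assumes the page HTML and the names of the cookies the response set are available; the function names and the two-signal threshold are arbitrary choices for illustration. Django's csrf_token template tag renders the hidden field with the name csrfmiddlewaretoken, and the default cookie name is csrftoken:

```python
import re

HIDDEN_FIELD = re.compile(r"""name=['"]csrfmiddlewaretoken['"]""")


def django_hints(html, cookie_names):
    """Collect weak signals that a page was rendered by Django."""
    hints = []
    if HIDDEN_FIELD.search(html):
        hints.append("csrfmiddlewaretoken form field")
    if "csrftoken" in cookie_names:
        hints.append("csrftoken cookie")
    if "X-CSRFToken" in html:
        hints.append("X-CSRFToken set from script")
    return hints


def verdict(hints):
    """Turn the number of independent hints into a hedged answer."""
    if len(hints) >= 2:
        return "probably"
    if hints:
        return "maybe"
    return "no evidence"
```

None of these signals is proof on its own: settings can rename the cookie, and a site without forms shows nothing at all.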

Nice site guys; reminds me of www.quarkbase.com in some ways. But I like the focus on what's under the hood - keep up the good work.

What triggers the error message "The pattern that you entered appears to be too general - can you make it more specific for this technology?"? I tried to add a technology for the use of rel="nofollow", using an XPath expression //a/@rel[contains(., "nofollow")] , and got that error message. What do I need to do to make it more specific?

Good question, I think rel=nofollow must match too many websites. I'll look into it.
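
If the guess above is right, the check might amount to something like this. It is purely a sketch: the thread doesn't say how the site decides a pattern is too general, so the function name and the 50% cutoff are assumptions:

```python
def too_general(matched_sites, sampled_sites, max_share=0.5):
    """True if a pattern matches more than max_share of a site sample."""
    if sampled_sites <= 0:
        raise ValueError("sampled_sites must be positive")
    return matched_sites / sampled_sites > max_share
```

A pattern like rel="nofollow" that appears on most pages would trip a rule like this no matter how the XPath is written.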

I cannot stand that unreal cogs image in the bottom-right corner. Please make sure search works with special characters.

What's wrong with the image?

There are three cogs in a triangle. They couldn't move. That's all the explanation I can come up with - artistically I love the image.

my 2c. Nice design (graphics). I would make the first page seem less busy. BuiltWith is going to be a tough competitor. You need to match them (precision/recall) and add stuff that they don't have (trending techs? Add info: who is the host provider? where in the world is it hosted? Response times? Bad link stats?...?)

It says that my company's site [1] is running ASP.NET on Microsoft IIS, which it's not. To be fair, it also mentions Ruby on Rails, Apache, and Phusion Passenger, which are all correct. Aside from these minor glitches, this is a pretty cool project.

[1] http://identified.com

This should be fixed now.

This doesn't really tell you anything about a site. Most of the interesting stuff happens on the server side.

That's getting less and less true with the amount of stuff happening in Javascript.

Underthesite itself does its work behind a server:


All you can tell from it querying itself is that the work happens somewhere behind Rails.

Great idea. How about saving the results you generate and allowing people to search for sites based on the technologies they use. For example, I might want to see all sites using jQuery which are using something other than Apache.

Results are saved, just not exposed yet. I'm still trying to figure out how best to do that.
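
The "jQuery but not Apache" query could be sketched with plain set operations, assuming the saved results can be loaded as a mapping from site to a set of detected technologies. That layout and the name `sites_using` are assumptions, since the real storage isn't described here:

```python
def sites_using(index, required, excluded=()):
    """Sites that use every required technology and none of the excluded."""
    required, excluded = set(required), set(excluded)
    return sorted(
        site
        for site, technologies in index.items()
        if required <= technologies and not (excluded & technologies)
    )


# Example with made-up data:
index = {
    "a.example": {"jQuery", "Apache"},
    "b.example": {"jQuery", "nginx"},
    "c.example": {"Prototype", "nginx"},
}
result = sites_using(index, ["jQuery"], excluded=["Apache"])
```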

This is cool.

Some bugs in the information returned though.

For example: http://underthesite.com/technologies/WordPress-Batcache-Plug...

"WordPress Batcache Plugin is closed source. "

Batcache is Open Source :)

Fixed, thank you.

Bang on for my app's site! Very cool.

In terms of features, I'd love to see more emphasis on the aggregate/comparison data. For example, most popular server side framework, most popular JS libraries, most popular hosting platforms and so on.

I like it a lot :)

It's not fully-baked yet, but I'm starting to do just that: http://underthesite.com/compare/jquery_and_prototype-javascr...

Thanks for the feedback!

I love how it gets confused by Hacker News: http://underthesite.com/sites/news.ycombinator.com

I challenge you to come up with a matcher for a technology detectable on HN. It's hard!

Great idea - I like it.

I've been using a Firefox add-on for this, http://wappalyzer.com. It doesn't give as much info, but works fairly well.


Though it cannot reach this site: http://www.olin.wustl.edu/pages/default.aspx ... Why?

I had this same idea a while back, but never had the time to build it. It could be a great way to generate some affiliate money. Good luck!

Good start. Looks like http://w3techs.com/sites does a bit more

http://builtwith.com/ is another great alternative!

builtwith.com does a better job with my site.

Under the Site misses MooTools, which is being loaded asynchronously on my site via Google Libraries API

Impressively, Built With even managed to detect that I was making an AJAX call to StackOverflow's API.

Looks like you've already started to get spam in the technologies list.

I would like to see the list of websites using IIS.

At first I thought this was a copy of lineofthought.com, but it has been created in a way that lets it stand on its own.

A Chrome/FF extension would be much more useful.

There is a bookmarklet. Browser extensions are forthcoming.

I'm going to build my next site in UTF-8, just like news.ycombinator.com.

Tested this on a website written in ColdFusion and it's reporting back as ASP.NET

I had the same experience with several sites that I own. There's no ASP on the server; some others report just Apache or IIS and completely miss ColdFusion.

Even Adobe.com, a ColdFusion site, doesn't list it.

I tried ColdFusion-powered MetaFilter and it reported nothing.

Please add a matcher for ColdFusion - I don't know it, but this is a wiki, so anyone can add it.

I just understood the concept of tech matcher, which could be an incredibly powerful concept. I think you should highlight this more.

Does it recognize PHP frameworks like CakePHP and CodeIgniter, or libraries like Flourish?

http://lineofthought.com/ is much better
