

Ask HN: An ideea about a link checker - sirrocco

I've been thinking lately about building an application that takes in a URL and crawls that entire site.

It would be an easy way of finding problems with a site (404s, 500s, etc.).

I tried Google's Webmaster Tools, but it didn't find a 404 link that I know is on my site. I also tried some other sites that crawl a website, but they had a limit of 200-300 links and then stopped. To go further you would have to buy an app that was a bit expensive.

I'm thinking of a web app where you can point it at your site and just receive a report when it's done.

Would anyone need something like this?
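A minimal sketch of the crawler described above, using only Python's standard library. It assumes a URL counts as broken when it answers with HTTP 400 or above or cannot be reached at all, that only pages on the starting host are crawled, and that external links are checked but not followed. `crawl`, `fetch` and `max_pages` are hypothetical names, not an existing tool.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# (tag, attribute) pairs whose values point at other resources worth checking.
LINK_ATTRS = {("a", "href"), ("link", "href"), ("img", "src"), ("script", "src")}


class LinkExtractor(HTMLParser):
    """Collects every linked URL (as written in the page) from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in LINK_ATTRS:
                self.links.append(value)


def fetch(url, timeout=10):
    """Return (status, html). status is None if the server could not be reached;
    html is '' for errors and non-HTML responses."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            is_html = "html" in resp.headers.get("Content-Type", "")
            body = resp.read().decode("utf-8", "replace") if is_html else ""
            return resp.status, body
    except HTTPError as err:  # 4xx/5xx responses are raised as HTTPError
        return err.code, ""
    except (OSError, ValueError):  # DNS failure, timeout, refused connection, bad URL
        return None, ""


def crawl(start_url, fetch=fetch, max_pages=10_000):
    """Breadth-first crawl of start_url's host. Returns {url: status} for every
    URL that failed (status >= 400) or could not be reached (status None)."""
    host = urlparse(start_url).netloc
    queue, seen, broken = deque([start_url]), {start_url}, {}
    checked = 0
    while queue and checked < max_pages:
        url = queue.popleft()
        checked += 1
        status, html = fetch(url)
        if status is None or status >= 400:
            broken[url] = status
            continue
        if urlparse(url).netloc != host:
            continue  # external link: it works, but don't crawl that site
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop #fragments so each page is fetched once.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return broken
```

Calling `crawl("http://example.com/")` returns a dict of broken URLs, which is essentially the report a web app could email out. Because `urlopen` follows redirects silently, 301/302 chains are not reported, and a real service would also want rate limiting and robots.txt handling; the `fetch` parameter is there so those can be swapped in (or faked in tests).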
======
Tichy
Something with a spell checker would be nice.

Edit: actually, no pun intended, even though there is a spelling error in the
title.

~~~
sirrocco
Yeah, my bad about the title. But I don't really see a spell checker as
something people would want. If you have, say, 1,000 pages and I find an
error on 10% of them, that's 100 pages to fix, which is a lot of work. I'm not
sure anyone would actually go and correct them.

But it could be a premium service, I guess.

------
timanglade
I do see the added value of having it as a web app for some people, but as a
developer, I'd rather use Tarantula in my test suite. It hunts down 404s and
500s, but can also do HTML validation and check against common attacks (CSRF,
XSS, etc.).

<http://github.com/relevance/tarantula>

~~~
simplegeek
Nice, thanks for sharing. Do you know of any similar Python software?

~~~
timanglade
Nope, not off the top of my head. Though it was designed for Rails or
Rack-based platforms, Tarantula doesn't seem to be too tightly coupled to them.
I'm guessing that with some minor Ruby work, you could run it against any
website.

It just emulates a browser session and doesn't actually interact with the
server-side code, just your front-end HTML.

------
edo
Hi everybody. I'm from Linkvive and we've actually been building such a
service for the last few months. We're very excited to see interest in it on
HN and look forward to sharing our service with you guys soon. Sign up using
the form at <http://linkvive.com> and we'll send you a single e-mail when it
goes live in the next few months. Cheers!

------
daleharvey
This is on the list of things I have wanted to do at some point.

I believe wget --spider should help you find any 404s, but I wanted to have
each link validated as well, and it'd be nice to have as a simple web service.

~~~
sirrocco
Yeah, it could validate the HTML on the page and show some statistics, like
the link count and the time it took to download the HTML (it could then also
check the links in the img tags).

I was thinking that there could be a plan where you get one scan a month to
see whether any problems have appeared in the meantime.
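For the monthly plan, the useful part of the report is what changed since last time. A small sketch, assuming each scan produces a report mapping broken URL to HTTP status (None when the server could not be reached); `compare_scans` is a hypothetical name.

```python
def compare_scans(previous, current):
    """Compare two scan reports of the form {url: status}.

    Returns (new_problems, fixed): URLs that are broken now but weren't before,
    or whose error status changed, and URLs that were broken but no longer are.
    """
    # Check membership explicitly so a new unreachable URL (status None) is not missed.
    new_problems = {
        url: status
        for url, status in current.items()
        if url not in previous or previous[url] != status
    }
    fixed = sorted(set(previous) - set(current))
    return new_problems, fixed
```

Storing only the broken URLs keeps each monthly report small; the trade-off is that a URL which disappears from the site entirely shows up as "fixed".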

~~~
stcredzero
Offer varying frequencies and opt-in tests. This becomes a "freemium" model
very easily.

Here's a challenge: Could a service also try to detect common problems with
appearance and rendering? Can this be done without some kind of AI akin to
OCR? Certain things would be easier to detect, like content text being
overlapped by another element.

Maybe just Mechanical Turk it?
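The "content text overlapped by another element" case is the one part that doesn't need anything like OCR: once each element's rendered bounding box is known, it is plain rectangle intersection. This sketch assumes the boxes have already been collected from a rendering engine (for example, a headless browser reporting element geometry), which is not shown; `Box`, `overlap_area` and `find_overlaps` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """An element's rendered bounding box in page pixels (x, y = top-left corner)."""
    name: str
    x: float
    y: float
    width: float
    height: float


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes (0 if they don't touch)."""
    dx = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    dy = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0


def find_overlaps(boxes, min_area=1.0):
    """Return (name, name, area) for every pair overlapping by at least min_area pixels."""
    hits = []
    for a, b in combinations(boxes, 2):
        area = overlap_area(a, b)
        if area >= min_area:
            hits.append((a.name, b.name, area))
    return hits
```

Checking every pair is O(n²), which is fine for a single page; the harder part, deciding which overlaps are intentional (dropdown menus, tooltips), would still need human judgment.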

------
maxklein
Do your research well. There are quite a number of such apps out there
already.

~~~
sirrocco
It would be great if you could give a couple of links. I did search, but
didn't find anything that made me think: OK, they are doing this so well that
it's pointless to even start.

I didn't really like the ones I found, which is why I'm asking here: if I
can't find others doing this, maybe nobody wants something like this.

------
zen53
Yep, I need a tool like this, but I already use Xenu
<http://home.snafu.de/tilman/xenulink.html>

