

Ask HN: Review my mini-app. Easily validate all the (X)HTML pages in a website. - aarongough
http://aarongough.com/easy_web_qa

======
godDLL
INPUT: <http://google.com/>

Fatal error: Out of memory (allocated 47710208) (tried to allocate 35 bytes)
in /home/aarongou/public_html/easy_web_qa/simple_html_dom.php on line 760

SUGGESTION: Sane-itize input.

~~~
JimmyL
SUGGESTION: Have a limit on how far down (and across) you'll recurse

I got the same error when trying CNN.com
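
A minimal sketch of what a "down and across" limit could look like, written in Python rather than the app's PHP. The function name, the `get_links` callback, and the default limits are all made up for illustration; nothing here is taken from the actual app.

```python
from collections import deque


def crawl_limited(start, get_links, max_depth=3, max_links_per_page=20, max_pages=100):
    """Breadth-first crawl with a depth limit ("down") and a per-page link limit ("across").

    get_links is a hypothetical callback returning the list of URLs found on a page.
    """
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth) pairs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # deep enough: do not follow links from this page
        # Only follow the first few links on each page.
        for link in get_links(url)[:max_links_per_page]:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With limits like these a huge site such as CNN.com is cut off early instead of exhausting memory.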

~~~
aarongough
There actually is a limit of 100 pages per submission, however it is obviously
not storing the data for those 100 pages too well.

Looks like tomorrow is debugging day!

I wrote this app over the course of a day a couple of months ago and it's been
languishing on my server since then. At the moment it's written in PHP and is
not exactly what you'd call 'well tested'.

I'm considering migrating it to Rails (or Camping). Any suggestions for mini-
frameworks (like Camping) suitable for a single-page app would be welcomed!

~~~
mitchellh
Very cool tool. Useful too. =]

Framework isn't the issue in this case. It looks like you're either loading
all 100 pages into memory at once or keeping the previous pages in memory as
you go "deeper" into the hierarchy.

I recommend you just keep a few pages in memory at any given time, keeping a
simple list of the URLs scraped out of that HTML. If you're using PHP remember
to unset() the variables storing the HTML and so on.

I haven't used PHP in a long time and of course I'm just speculating based on
the error message, so slight disclaimer there.
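
Roughly what I mean, sketched in Python rather than PHP. The `fetch` and `validate` callbacks are hypothetical stand-ins, and the regex link extraction is a simplification that a real crawler would replace with a proper HTML parser.

```python
import re
from collections import deque

# Simplification: a regex is enough for a sketch, not for real-world HTML.
LINK_RE = re.compile(r'href="([^"]+)"')


def crawl_one_page_at_a_time(start, fetch, validate, max_pages=100):
    """Hold one page's HTML at a time; keep only URLs and small results."""
    pending = deque([start])  # plain list of URLs still to visit
    seen = {start}
    results = {}
    while pending and len(results) < max_pages:
        url = pending.popleft()
        html = fetch(url)
        results[url] = validate(html)  # store a small summary, not the page
        for link in LINK_RE.findall(html):
            if link not in seen:
                seen.add(link)
                pending.append(link)
        del html  # drop the reference, much like unset() in PHP
    return results
```

Memory use then grows with the number of URLs seen, not with the total size of the pages fetched.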

~~~
aarongough
I also meant to say: re: "Framework isn't the issue in this case" I agree. It
is possible to turn out nice code in PHP and write fast, reliable apps. That
being said Rails has me addicted to the idea of unit tests within easy reach.
I'm not exactly a massive fan of any of the unit testing systems that I have
seen for PHP and that is my main reason for wanting to switch away in this
case...

I use PHP all the time, I am just a little dissatisfied with it in light of
the alternatives that are out there.

------
aarongough
I wrote this app to streamline the process of checking medium/large websites
for (X)HTML compliance... I basically want to get it out there so that other
web designers/developers don't have to go through the pain of checking
everything manually!

Let me know if you have any suggestions for improvements or find any bugs.

------
pbhj
Is it really that different to the results with
<http://www.htmlhelp.com/tools/validator/> ? You're dragging in W3C results
too it seems.

I find that once I have the site chrome (structural features) valid then I
don't really care too much about small errors that don't impact the
appearance. When you've got code dragged in, with javascript calls, that
doesn't validate anyway (Google, etc.) ...

These things always give millions of entity errors (other people dictate the
URLs used). Perhaps an initial view that just says which kinds of errors are
on a page (broad categories: entity errors, failed to close a tag, incorrect
attributes, etc.), with drill-down to the error details.

These details in the summary would be good too, as would the number of clean
pages.
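
As a rough sketch of that summary (Python, with a made-up input shape: a dict mapping each URL to a list of `(category, detail)` pairs; the category names are only examples):

```python
from collections import Counter, defaultdict


def summarize(results):
    """Group each page's errors by broad category and count the clean pages.

    results is assumed to look like {url: [(category, detail), ...]}.
    """
    per_page = {}
    totals = Counter()
    clean_pages = 0
    for url, errors in results.items():
        by_category = defaultdict(list)
        for category, detail in errors:
            by_category[category].append(detail)  # details kept for drill-down
        per_page[url] = dict(by_category)
        totals.update({cat: len(details) for cat, details in by_category.items()})
        if not errors:
            clean_pages += 1
    return {"clean_pages": clean_pages, "totals": dict(totals), "pages": per_page}
```

The top-level view would show `totals` and `clean_pages`; clicking a category on a page would reveal the details stored under `pages`.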

I don't like the dark theme.

~~~
aarongough
Both validators are consulted because they can pick up different errors. The
WDG validator is _much_ more reliable when it comes to finding Unicode issues
and that has saved my ass at least once.

Your other thoughts are things I'll keep an eye on when I do the re-write.

-A

------
tjmc
Useful tool. While waiting for results, a progress bar would definitely be
more useful in terms of feedback than the spinning logo.

~~~
aarongough
Noted! It may not be feasible for the first stage of the process though as the
system does not actually know how many URLs are in the site.

I'll definitely have a look into that though.

~~~
godDLL
You might want to look at doing this in several passes:

\- Build a site-map, like parsers build a syntax tree.

\- Follow that to validate one page at a time.
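
A sketch of the two passes in Python. The `get_links` and `validate_page` callbacks are hypothetical and so is the 100-page default. Once the first pass has finished the total is known, so the second pass can report real progress.

```python
def build_site_map(start, get_links, max_pages=100):
    """Pass 1: collect the URLs of the site, breadth-first, up to max_pages."""
    site_map = [start]
    seen = {start}
    i = 0
    while i < len(site_map) and len(site_map) < max_pages:
        for link in get_links(site_map[i]):
            if link not in seen and len(site_map) < max_pages:
                seen.add(link)
                site_map.append(link)
        i += 1
    return site_map


def validate_site(site_map, validate_page):
    """Pass 2: validate one page at a time, yielding progress as we go."""
    total = len(site_map)
    for done, url in enumerate(site_map, start=1):
        yield done, total, url, validate_page(url)
```

Each item yielded by `validate_site` carries `done` and `total`, which is all a progress bar needs.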

~~~
aarongough
That is certainly a valid way of doing it and is the methodology that I have
used in several search-engine projects in the past (like this:
<http://intrasitesearchsupport.com/>)

However in this case I am offloading the work of building the site-map to the
WDG Validation service as I didn't want to have to obtain new servers to
provide a free service. This means that I don't get the site-map until the WDG
results come back...

------
natch
Curious: Why would people use this instead of the W3 validator?

And for your fix list:

it's entirety -> its entirety.

~~~
aarongough
Difficulty with the W3C validator is actually why I designed this. Their
validator doesn't allow you to check all of a site in one go. You have to go
through page by page. Even on a relatively small site that can take a lot of
time!

------
bazookaaa
An extremely useful tool. Thank you so much.

~~~
aarongough
No, thank you! I just hope that it saves some developers from unnecessary
pain!

