Peer-review and constructive criticism are valuable forces for change and improvement.
Clearly there's some personal politics going on (insults having been traded) but jgc does raise some good points: the original code is very tightly coupled to whatever Berlios outputs, and would incur a large maintainability cost.
It's also brought BeautifulSoup to my attention, which seems quite a neat utility built exactly for this sort of thing.
For these reasons it's interesting, and is why I've upvoted it.
It would be foolish of me to pretend and try to hide my dislike for Eric Raymond, but it comes down not to a personal problem, but a problem with the way he presents himself.
In this case, the juxtaposition of the quality of the code (which was truly poor) and Raymond's opinion of himself (recall that he claimed to be a Core Linux Developer at one point) made plain the problem that many people have with the man.
Interestingly, his response to my criticism was not to say something like "You are right, but don't be nasty about it". Instead he wrote a blog posting going on about how right he is. Oddly, I share his concerns about HTML parsing, but I think he's wrong to not use an HTML parser and do everything by hand. Having done a lot of screen scraping work, I can say that dealing with all the edge cases is a pain.
>Oddly, I share his concerns about HTML parsing, but I think he's wrong to not use an HTML parser and do everything by hand
It looks like he didn't really understand how BeautifulSoup (or for that matter, XPath) works. For example, he seems completely unaware of the '//node' syntax, which matches a node at any depth and so would completely sidestep the issue of encoding the page structure in the code.
And in his defense, someone who has not used BeautifulSoup might not realize how robust it is. I was deeply impressed with it when I used it for a scraping project - it's really good at handling tag soup and giving you powerful access to the parse tree. I would not have expected one tool to perform well in both those areas.
ESR probably just made some assumptions about the capabilities of available parsing options that would be correct but for a few exceptional tools. Let's fault him for not checking his assumptions rather than name-calling about poor programming.
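To make the '//node' point concrete, here is a minimal sketch using only Python's standard library. The two page layouts, the `/bug/1` link, and the function names are all made up for illustration; nothing here is taken from the actual Berlios pages or from ESR's code. ElementTree only accepts well-formed markup, so for real tag soup you'd want a tolerant parser such as BeautifulSoup, but the idea is the same: a descendant search doesn't care where the node sits in the tree.

```python
import xml.etree.ElementTree as ET

# Two hypothetical layouts of the same page: the link moves from a
# table cell into some nested divs after a site redesign.
LAYOUT_A = "<html><body><table><tr><td><a href='/bug/1'>Bug 1</a></td></tr></table></body></html>"
LAYOUT_B = "<html><body><div><div><p><a href='/bug/1'>Bug 1</a></p></div></div></body></html>"


def link_targets(markup):
    """Collect hrefs with a descendant search ('.//a' is ElementTree's '//a')."""
    root = ET.fromstring(markup)
    # './/a' matches every <a> element at any depth below the root.
    return [a.get("href") for a in root.findall(".//a")]


def link_targets_hardcoded(markup):
    """Collect hrefs with the full path spelled out, i.e. structure baked into the code."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall("./body/table/tr/td/a")]


if __name__ == "__main__":
    for name, page in (("A", LAYOUT_A), ("B", LAYOUT_B)):
        print(name, link_targets(page), link_targets_hardcoded(page))
```

The descendant search finds the link in both layouts, while the spelled-out path only works for the layout it was written against. That is the maintainability cost being argued about.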
but in reality, his ego is quite fragile. My theory is that the left-over dregs of being a kid with CP have left him a) prone to overstatement and b) quite shy of being challenged.
I actually got him to admit (in public, on his blog) that half of all he claims is untrue.