Hacker News

Garbage in, garbage out. If your data set comes from scraping Wikipedia, you're going to have these kinds of flaws and omissions. What's the alternative, though, other than hiring an army of grad students? Flawed as it is, at least it's interesting. I wish the author had gone into more detail about counterintuitive results (like Rommel), but part of the point of an exercise like this is to find instances where the model disagrees with common wisdom. If you jump straight to rejecting the model without asking why, you don't learn anything.

Some other thoughts:

- Analysis is limited to results of individual battles. That's a very narrow slice of a general's actual job.

- He ties WAR (wins above replacement) to overall W/L, which isn't great, but the data doesn't give you many options.

- The model rewards "underdog" wins. This sounds like a decent proxy for skill, but it seems like a big part of the job is avoiding being an underdog in the first place.

- Army size and casualty figures for anything pre-17th century (and that's generous) are extremely suspect.

Agreed. Trying to compare generals from ~10 different centuries without taking into account the specifics and context of each campaign renders the results useless.

I wouldn't trust this model to compare Montgomery to Patton during the Sicily invasion, let alone compare Caesar to Napoleon.

One thing I'd be curious to generate from this data set, given the span of historical periods it covers, is the correlation between expected and actual battle outcomes.
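Computing that per era is straightforward once you have an expected outcome for each battle. A minimal sketch, assuming each battle record is a hypothetical `(era, expected_win_prob, won)` tuple, where `expected_win_prob` comes from whatever baseline you choose (troop ratio, Lanchester, etc.) and `won` is 0 or 1. With a binary outcome, the Pearson correlation below is the point-biserial correlation. The function names are illustrative, not from the article.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; with 0/1 ys this is the point-biserial correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def correlation_by_era(battles):
    """battles: iterable of (era, expected_win_prob, won) tuples (hypothetical layout)."""
    groups = defaultdict(lambda: ([], []))
    for era, expected, won in battles:
        groups[era][0].append(expected)
        groups[era][1].append(won)
    # One correlation per era; each era needs at least 2 battles with varied values.
    return {era: pearson(xs, ys) for era, (xs, ys) in groups.items()}
```

A correlation near 1 for an era would mean the baseline predicts outcomes well there; a weaker value would suggest something other than the baseline's inputs (generalship, terrain, luck) is driving results.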

As someone who took a computational military modeling course in college: modeling ancient warfare is a vastly different task from modeling modern battles.

My gut feeling is that outcomes in modern warfare correlate less strongly with numerical advantage, due to the increased speed and lethality of available force types.

Also, a suggestion for the author: if you want more accuracy, start calculating expected outcomes from the actual forces involved. Lanchester's Laws are as good a place as any to start.
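For reference, Lanchester's laws give a closed-form winner and surviving strength from the starting forces. A rough sketch, assuming `alpha` is the rate at which each unit of side A destroys side B and `beta` the reverse. The square law (aimed fire, dA/dt = -beta*B, dB/dt = -alpha*A) conserves alpha*A^2 - beta*B^2; the linear law (often applied to one-on-one melee or unaimed fire) conserves alpha*A - beta*B. The function name and signature are hypothetical, and it ignores terrain, morale, reinforcements, and everything else a general actually does.

```python
from math import sqrt


def lanchester(a0, b0, alpha, beta, law="square"):
    """Predict the winner and the winner's surviving strength.

    alpha: rate at which each unit of side A destroys side B
    beta:  rate at which each unit of side B destroys side A
    law:   "square" (aimed fire) or "linear" (one-on-one / unaimed fire)
    Returns (winner, survivors); winner is "A", "B", or "draw".
    """
    if law == "square":
        # Invariant: alpha*A^2 - beta*B^2 stays constant during the fight.
        strength_a, strength_b = alpha * a0 ** 2, beta * b0 ** 2
        if strength_a > strength_b:
            return "A", sqrt((strength_a - strength_b) / alpha)
        if strength_b > strength_a:
            return "B", sqrt((strength_b - strength_a) / beta)
        return "draw", 0.0
    if law == "linear":
        # Invariant: alpha*A - beta*B stays constant during the fight.
        strength_a, strength_b = alpha * a0, beta * b0
        if strength_a > strength_b:
            return "A", (strength_a - strength_b) / alpha
        if strength_b > strength_a:
            return "B", (strength_b - strength_a) / beta
        return "draw", 0.0
    raise ValueError(f"unknown law: {law!r}")
```

With equal per-unit effectiveness (`alpha = beta = 1`) and a 2,000 vs 1,000 matchup, the square law leaves side A about 1,732 survivors while the linear law leaves 1,000, which shows how much the choice of law changes the payoff of numerical advantage.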


Eh, scraping any encyclopedia would introduce insurmountable bias and error. As you say, only primary sources are usable here.

Yeah, didn't mean that as a knock against Wikipedia specifically.
