

Document Classification for Newspaper Articles (2009) [pdf] - beshrkayali
http://web.mit.edu/6.863/www/fall2012/projects/writeups/newspaper-article-classifier.pdf

======
PaulHoule
Training sets too small, film at 11.

~~~
mark_l_watson
I was not surprised that they didn't use larger training sets. They are
comparing three different approaches, one of which (maximum entropy) they had
trouble running without reducing the number of features used.

~~~
PaulHoule
With the training set sizes they are using, it is no surprise they did not get
good results. They could have done better with them had they used some kind of
dimensionality reduction.
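
For readers new to the idea, here is a minimal sketch of one crude way to cut
the number of features: keep only the k terms that appear in the most
documents and count just those. This is an illustration, not the method used
in the paper. The function names (top_k_vocabulary, vectorize) and the toy
corpus are made up for the example, and a real system would more likely use
something like chi-squared selection or SVD.

```python
from collections import Counter


def top_k_vocabulary(docs, k):
    """Return the k terms found in the most documents (ties broken alphabetically)."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(doc, vocabulary):
    """Map a document to term counts over the reduced vocabulary only."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


# Toy corpus, invented for illustration.
docs = [
    "the team won the match",
    "the market fell sharply",
    "the team lost the final match",
]

vocab = top_k_vocabulary(docs, 3)
vectors = [vectorize(doc, vocab) for doc in docs]
```

Nine distinct terms shrink to three features per document here. With a small
training set, fewer features means fewer parameters to estimate, which is the
reason to try it.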

However, having seen a lot of systems wind up on the wrong side of the
"commercialization valley of death", I wouldn't start this kind of project
unless I had a bigger training set and some latitude to adjust the categories
to produce something more learnable.

~~~
beshrkayali
I don't know a lot about this topic as I started getting into it recently, but
I think the purpose of this is more of a study than an actual solution ready
for commercialization.

