

Predicting the Oscars with Machine Learning - mikemacpherson
http://blog.megafaunasoft.com/2013/01/predicting-oscars-with-machine-learning.html

======
aficionado
I used Megafauna Software's parsed data to quickly create a model using BigML:
<http://bml.io/W6lKqe>. It uses information gain mix to create a decision tree.
It also concluded that winning 'best director' is the most important
field. However, 'makeup' didn't rank among the most important fields. See the
results below.

    
    
        Data distribution:
        n: 16.94% (82 instances)
        y: 83.06% (402 instances)
    
        Predicted distribution:
        n: 16.74% (81 instances)
        y: 83.26% (403 instances)
    
        Field importance:
        1. Directing_1: 84.47%
        2. Music (Scoring)_0: 3.59%
        3. Music (Scoring)_1: 2.68%
        4. Art Direction_0: 2.39%
        5. Sound_0: 1.75%
        6. Film Editing_0: 1.55%
        7. Costume Design_1: 1.40%
        8. Film Editing_1: 0.82%
        9. Costume Design_0: 0.54%
        10. Cinematography_1: 0.52%
        11. Art Direction_1: 0.29%
        12. Special Achievement Award_0: 0.00%
        13. Scientific and Technical (Bonner Medal)_1: 0.00%
        14. Foreign Language Film_0: 0.00%
        15. Sound Editing_1: 0.00%
        16. Scientific and Technical (Bonner Medal)_0: 0.00%
        17. Sound Editing_0: 0.00%
        18. Scientific and Technical (Academy Award of Merit)_1: 0.00%
        19. Sound_1: 0.00%
        20. Animated Feature Film_1: 0.00%
        21. Scientific and Technical (Academy Award of Merit)_0: 0.00%
        22. Documentary (other)_1: 0.00%
        23. Animated Feature Film_0: 0.00%
        24. Music (Song)_1: 0.00%
        25. Documentary (other)_0: 0.00%
        26. Actress -- Supporting Role_1: 0.00%
        27. Short Film (Live Action)_1: 0.00%
        28. Music (Song)_0: 0.00%
        29. Documentary (Short Subject)_1: 0.00%
        30. Actress -- Supporting Role_0: 0.00%
        31. Short Film (Live Action)_0: 0.00%
        32. Documentary (Short Subject)_0: 0.00%
        33. Actress -- Leading Role_1: 0.00%
        34. Short Film (Animated)_1: 0.00%
        35. Documentary (Feature)_1: 0.00%
        36. Short Film (Animated)_0: 0.00%
        37. Makeup_1: 0.00%
        38. Scientific and Technical (Technical Achievement Award)_1: 0.00%
        39. Actress -- Leading Role_0: 0.00%
        40. Documentary (Feature)_0: 0.00%
        41. Actor -- Supporting Role_1: 0.00%
        42. Makeup_0: 0.00%
        43. Actor -- Supporting Role_0: 0.00%
        44. Scientific and Technical (Technical Achievement Award)_0: 0.00%
        45. Jean Hersholt Humanitarian Award_1: 0.00%
        46. Directing_0: 0.00%
        47. Actor -- Leading Role_1: 0.00%
        48. Scientific and Technical (Special Awards)_1: 0.00%
        49. Jean Hersholt Humanitarian Award_0: 0.00%
        50. Actor -- Leading Role_0: 0.00%
        51. Writing_1: 0.00%
        52. Scientific and Technical (Special Awards)_0: 0.00%
        53. Irving G. Thalberg Memorial Award_1: 0.00%
        54. Acting (other)_1: 0.00%
        55. Writing_0: 0.00%
        56. Scientific and Technical (Scientific and Engineering Award)_1: 0.00%
        57. Irving G. Thalberg Memorial Award_0: 0.00%
        58. Acting (other)_0: 0.00%
        59. Visual Effects_1: 0.00%
        60. Scientific and Technical (Scientific and Engineering Award)_0: 0.00%
        61. Honorary Award_1: 0.00%
        62. Cinematography_0: 0.00%
        63. Visual Effects_0: 0.00%
        64. Scientific and Technical (Gordon E. Sawyer Award)_1: 0.00%
        65. Honorary Award_0: 0.00%
        66. Category_1: 0.00%
        67. Special Achievement Award_1: 0.00%
        68. Scientific and Technical (Gordon E. Sawyer Award)_0: 0.00%
        69. Foreign Language Film_1: 0.00%
        70. Category_0: 0.00%
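
For anyone unfamiliar with information gain, here is a minimal textbook sketch
in Python. It is not BigML's actual implementation, and the function names are
my own. The parent counts come from the data distribution above; the child
counts for the split are hypothetical and only there to show the arithmetic.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` counts into `children` counts."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Class counts from the data distribution above: 82 "n", 402 "y".
parent = [82, 402]
print("entropy before split:", round(entropy(parent), 3))

# HYPOTHETICAL split on one binary field (made-up counts, not from the model).
children = [[70, 12], [12, 390]]
print("information gain:", round(information_gain(parent, children), 3))
```

A tree builder would compute this gain for each candidate field and split on
the one with the largest value, then repeat on each branch.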

~~~
megafaunasoft
Very cool -- I'm glad someone else could use the data!

BigML seems useful. The BigML decision tree looks quite a lot like the J48
tree result. I wonder if the performance (ROC) is closer to J48 than logistic
regression. I couldn't see an option to cross-validate on BigML.
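
One way to check this outside BigML would be to export the predictions and
score them locally. Below is a rough, standard-library-only sketch of shuffled
k-fold splitting and a pairwise ROC AUC. The function names are my own, and the
labels and scores in the usage lines are toy values, not Oscar data.

```python
import random

def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled folds over n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Toy usage: four rows with made-up labels and classifier scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))

# 484 rows matches the instance count reported above; 10 folds is an assumption.
for train, test in k_fold_indices(484, 10):
    pass  # fit on `train`, score on `test`, then average the per-fold AUCs
```

Averaging the per-fold AUC for each model (J48-style tree, logistic regression)
would give numbers that can be compared directly.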

------
kenjackson
It would be interesting to use some other data to predict the Oscars -- data
that comes out before the Oscars begin, e.g., Golden Globe and other film
festival awards, date the movie was released, box office revenue, availability
on DVD, history of actors/directors/writers (e.g., have they won or been
nominated for Oscars in the past).

------
Breakthrough
Perhaps this illustrates how much of an effect a film's director can have on
its perception.

As Stanley Kubrick once said, "The screen is a magic medium. It has such power
that it can retain interest as it conveys emotions and moods that no other art
form can hope to tackle."

