My name’s Jim, and I created MovieChat.org as an archive of and replacement for IMDb’s message boards, which are shutting down this week. For those of you not familiar, the IMDb message boards let you discuss any single movie or TV show with others (there was a separate forum for each movie/show). IMDb recently announced it was shutting down the message boards, and its users were furious (there's a petition with close to 10k signatures here: https://www.ipetitions.com/petition/petition-to-keep-the-imd...). I set out to create an archive (of all the existing posts) and a replacement, and hence MovieChat.org was born.
Key Features of MovieChat.org:
1. Any movie/show on IMDb is also on MovieChat.org (over 4 million and counting) - we have separate boards for each movie/show, just like IMDb
2. I backed up most of the posts for IMDb’s top 10,000 movies/shows - most existing conversations on IMDb should also appear on MovieChat.org - we have over 3 million posts already (and I'm working non-stop to back up even more from IMDb)!
Please visit http://MovieChat.org, join or start a discussion, and let me know what you think. If you like it, please spread the word. If there’s anything I can improve, just email me (jim@moviechat.org) and I’ll get on it.
Jim
jim@moviechat.org
(IMDb adopted the same practices as other message boards from its era. Compare: the default configuration for software like phpBB and vBulletin [at least in the early 2000s] was to preserve only the top N pages of the freshest threads, and older ones would fall off the face of the site unless they were specifically targeted for preservation and stickied.)
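To make that retention policy concrete, here is a minimal sketch in Python. It is not code from phpBB, vBulletin, or IMDb; the `Thread` class, the `visible_threads` function, and the `pages` / `per_page` parameters are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    title: str
    last_post_ts: int     # Unix timestamp of the most recent reply (assumed field)
    sticky: bool = False  # stickied threads are exempt from pruning


def visible_threads(threads, pages=2, per_page=3):
    """Return stickied threads plus the freshest pages * per_page other threads.

    Everything else "falls off" the board, as described above.
    """
    sticky = [t for t in threads if t.sticky]
    normal = sorted(
        (t for t in threads if not t.sticky),
        key=lambda t: t.last_post_ts,
        reverse=True,  # most recently active first
    )
    return sticky + normal[: pages * per_page]
```

Under this sketch, a thread survives only while it keeps getting replies or has been stickied, which is why an archive has to be taken before the pruning happens.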
Jim: do you plan to snarf in data from other sources (e.g., archive.org) for the posts that IMDb has already removed during its regular course of operation? Any plans to allow linking new accounts to IMDb accounts? Where does the code for the moviechat.org backend live?
Just throwing it out there, but would you consider making a dump of the data you scraped that could be used by data scientists? Maybe as a torrent or something like that? Data about movies and what people say about them could form the basis of a lot of NLP projects.
What other big datasets are there for forum post text data? The reddit dataset most immediately comes to mind, and I've also seen a similar one for HN comments. Any others?
One thing I didn't like: if I search for a show, I get a hit for each episode.
It would be good to have one entry per show, and then let you drill down into individual episodes.
It would also be good to be able to rate shows as a registered user (though I guess IMDb is still offering that functionality).
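The one-entry-per-show suggestion could be sketched roughly like this. It assumes search hits arrive as dictionaries with hypothetical `show` and `episode` keys; nothing here reflects how MovieChat.org actually stores or returns results.

```python
def group_by_show(hits):
    """Collapse episode-level hits into one entry per show.

    Shows keep the order in which they first appear in the results
    (dicts preserve insertion order in Python 3.7+).
    """
    grouped = {}
    for hit in hits:
        entry = grouped.setdefault(
            hit["show"], {"show": hit["show"], "episodes": []}
        )
        # A hit for the show itself carries no episode title.
        if hit.get("episode"):
            entry["episodes"].append(hit["episode"])
    return list(grouped.values())
```

The search page would then render one row per returned entry and expand `episodes` only when the user drills down.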
One actually hopes that IMDb doesn't in fact have any claim on posts written by users that happened to be hosted on their site.
I did my homework before starting this project and I'm confident we're in a good position :)
I'm surprised more people don't recognize TMDb as relevant, especially since IMDb is ignoring its users by taking down the forums.
However, I'm curious about exactly how you're going to promote this site. The people currently on IMDb likely don't know much about it, and it's unlikely the administration there will redirect people over when they view the pages for a movie or TV show on their site.
What's the plan to get the users whose data you scraped to come to this new domain and participate again?
Despite your comment and the (well-intentioned) plans from browser vendors to do what they can to squash unencrypted HTTP, failure to use HTTPS, even with Let's Encrypt, is still a totally forgivable sin today.
With nginx:
- `certbot certonly`
- press `2`
- type in your domain name
- press return, done
Add a few lines to your nginx config, done.
```
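# Inside the server { } block for your domain:
listen 443 ssl;  # replaces `ssl on;`, which is deprecated and removed in newer nginx
ssl_certificate /etc/letsencrypt/live/<yourdomain>/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/<yourdomain>/privkey.pem;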
```
