Frinkiac: Simpsons quote search engine (frinkiac.com)
340 points by asicboy on Feb 5, 2016 | 84 comments

One of the authors here. I blogged a bit the other day about how we built this: https://langui.sh/2016/02/02/frinkiac-the-simpsons-screensho...

Where did the subtitle data come from? It's got a misspelling of Horatio McCallister in it:

https://frinkiac.com/?p=search&q=captain+mcallister https://www.google.com/search?q=horatio+mccallister

Also, when are you going to get seasons 16+? I really need to be able to find Jeff Albertson! (Comic Book Guy)

Interesting, thanks for the report there. Coincidentally, this misspelling is also present in all the subtitles that Simpsons World uses!

We chose the season 15 cutoff pretty much arbitrarily. We're not necessarily opposed to later seasons, but we'd like to have some better season/episode filtering in place before expanding more.

Makes sense with the seasons. It certainly helps when you've got issues like season 11 being off like it is. There are other reports of the search being too literal about punctuation, and other things to work out. It'd be really neat with more polish and possibly more TV shows added. I imagine the data needs are pretty substantial, but I wonder if there might be a good way to deal with that by having the server generate the JPGs on the fly from the better-compressed video files. That might be a big win if BPG or other formats actually take off and begin replacing JPG.
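To make the on-the-fly idea a bit more concrete, here is a rough Python sketch that builds an ffmpeg command to pull a single frame out of a video at a given millisecond offset. It assumes ffmpeg is installed on the server; the function names and file names are made up for illustration, and nothing here reflects how Frinkiac actually stores or serves images.

```python
import subprocess
from pathlib import Path


def build_frame_command(video: Path, timestamp_ms: int, out_jpg: Path) -> list[str]:
    """Build an ffmpeg command that grabs one frame at timestamp_ms as a JPEG."""
    seconds = f"{timestamp_ms / 1000:.3f}"
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-ss", seconds,    # seek position in seconds, given before -i (input seeking)
        "-i", str(video),
        "-frames:v", "1",  # emit exactly one video frame
        "-q:v", "2",       # JPEG quality scale (lower means better quality)
        str(out_jpg),
    ]


def extract_frame(video: Path, timestamp_ms: int, out_jpg: Path) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_frame_command(video, timestamp_ms, out_jpg), check=True)


# Hypothetical file names, for illustration only.
command = build_frame_command(Path("S13E08.mkv"), 794266, Path("frame.jpg"))
```

In practice you'd probably want to cache the generated files, since decoding a frame per request costs far more than serving a static image.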

Awesome work. I saw when you announced it on r/thesimpsons. What do you think of Josh Weinstein getting such a kick out of this on Twitter?

Also please give the text a slight drop shadow, if possible.

Ps. This makes me so happy. Thank you for making it.


"We also parse subtitle files and correlate each subtitle line's timecode with the timecode of the screenshot. Finally, the frinkiac binary can upload the data set to frinkiac-server. "

Could you elaborate on this parsing of the subtitle files? I've seen the "open source" Star Wars gifs file with the dialog and time codes[1], but I'm not sure how they pulled the text from the closed captioning. (edit: someone else asked something similar... sorry).

Also, aside from the two-character search index you describe, how are you searching the quotes with Postgres? Are you using Postgres's full text search[2] or something else?

Thanks, I love The Simpsons and this is really cromulent[3] and cool.

[1] https://github.com/LindseyB/starwars-dot-gif/blob/master/sub...
[2] http://www.postgresql.org/docs/current/static/textsearch.htm...
[3] https://frinkiac.com/?p=caption&q=cromulent&e=S07E16&t=10420....
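For what it's worth, the timecode correlation quoted at the top of this comment could be sketched like this in Python. This is a guess at the general technique (a binary search over sorted subtitle start times), not the authors' actual code. `Cue` and `cue_at` are made-up names and the sample cues are illustrative values.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Cue(NamedTuple):
    start_ms: int
    end_ms: int
    text: str


def cue_at(cues: list[Cue], frame_ms: int) -> Optional[Cue]:
    """Return the cue on screen at frame_ms, or None if there is none.

    Assumes cues are sorted by start_ms and do not overlap.
    """
    starts = [c.start_ms for c in cues]
    i = bisect_right(starts, frame_ms) - 1  # last cue starting at or before the frame
    if i >= 0 and frame_ms < cues[i].end_ms:
        return cues[i]
    return None


cues = [
    Cue(137440, 140375, "Senator, we're making our final approach into Coruscant."),
    Cue(140476, 142501, "Very good, Lieutenant."),
]
```

Rebuilding `starts` on every call keeps the sketch short; for a whole episode's worth of screenshots you'd compute it once.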

> Could you elaborate on this parsing of the subtitle files.

Not the creator, but subtitles are easy to find, and super easy to parse.

    00:02:17,440 --> 00:02:20,375
    Senator, we're making
    our final approach into Coruscant.

    00:02:20,476 --> 00:02:22,501
    Very good, Lieutenant.

.srt files are quite straightforward, but some of the other formats can get quite annoying in my experience.
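A minimal Python sketch of parsing that format, assuming well-formed input with blank lines between blocks. `to_ms` and `parse_srt` are names invented here; real-world files (positioning tags after the timecodes, odd encodings) would need more care.

```python
import re

TIMECODE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    match = TIMECODE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timecode: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def parse_srt(data: str) -> list[tuple[int, int, str]]:
    """Parse SRT text into (start_ms, end_ms, text) tuples.

    Blocks are separated by blank lines; an optional numeric index line
    before the timing line is skipped.
    """
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = [line.strip() for line in block.splitlines()]
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = line.split("-->")
                cues.append((to_ms(start), to_ms(end), " ".join(lines[i + 1:])))
                break
    return cues


sample = """\
00:02:17,440 --> 00:02:20,375
Senator, we're making
our final approach into Coruscant.

00:02:20,476 --> 00:02:22,501
Very good, Lieutenant.
"""
cues = parse_srt(sample)
```

Multi-line captions are joined with a space, which is usually what you want for search.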

This is fucking awesome.

Consider Lucene indexing if you want free fuzzy searching.

or Sphinx search, similar to Lucene but coded in C++: https://en.wikipedia.org/wiki/Sphinx_(search_engine)
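To show what fuzzy matching buys you here, a tiny Python sketch using the standard library's `difflib` as a stand-in. To be clear, this is not Lucene or Sphinx and would not scale the way they do; the vocabulary list and the `suggest` helper are hypothetical.

```python
import difflib


def suggest(term: str, vocabulary: list[str], cutoff: float = 0.8) -> list[str]:
    """Return up to three indexed words that are close to a possibly misspelled term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)


# Hypothetical vocabulary, as if collected from the subtitle text.
vocabulary = ["mcallister", "cromulent", "homer", "embiggens"]
matches = suggest("McCallister", vocabulary)
```

With something like this, a search for the correct spelling could still land on captions that contain the misspelled one.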

Are there plans for an API? I think it'd be fun to try and get a Pebble Time app working with this.

There is one - open up the page and look in the web inspector.

Was this done using the DVDs? I'm curious about any potential licensing issues with the screen caps and subtitles. Did you have to get permission/sign something - or does this fall under fair use?

No way this is licensed (no copyright notice, even; not even a mention of Fox), and no way it is fair use. It has frame-by-frame, full resolution images and full transcripts of every episode up for browsing. This is textbook mass copyright infringement. Short of offering unlicensed video downloads for a fee, it could hardly be more clear-cut.

Yeah, it's cool, I get it, but you can't just steal and redistribute content en masse for your cool project. Well, he did, but I expect he'll be hearing from Fox's lawyers soon.

It is arguably fair use in the U.S. I don't think there is enough case law to be sure. It's hard to predict how it would go in litigation. I think you're right that the defendants wouldn't have a particularly strong case, but they wouldn't have the weakest.

The courts have generally treated significant "transformation" of the source material as a powerful factor in determining fair use, which I think would work in their favor. It could also be argued that this has very little effect on the market for the original copyrighted material, which would help them too. Of course, the copyright holder would see and argue it differently if they chose to sue. And "the amount and substantiality of the portion taken" would not look good for the defendants. But although common belief focuses on this factor almost exclusively (as long as you copy only 10 pages or whatever you're good, and otherwise you're definitely not), that's not how it works. It's just one factor, and one that the courts have somewhat de-emphasized in the past couple of decades.

But I don't think we can say "no way it is fair use", or "it could hardly be more clear cut." It could go either way. Fair use in the U.S. for novel things, not already well established as fair use or not, almost always looks like this.

Counterpoint: copying every single page of every book and making it searchable can be fair use. It just takes only 10 years of litigation and appeals to determine that. See Authors Guild v. Google. https://www.eff.org/document/ruling-appeals-court

Also, the "TV Eyes" system that recorded television newscasts and made them searchable was fair use, though certain features were found to be infringing. See https://www.eff.org/deeplinks/2015/08/dangerous-decision-fai....

Point is, the law is hardly clear cut and never is with new technologies. Without someone willing to take a risk and develop a potentially infringing technology we would never have had VCRs, MP3 players, YouTube.... I applaud the creators for making an incredibly useful resource and I hope if they do face legal threats they get a zealous pro-bono defense from someone like the EFF or Larry Lessig.

This is impressive. It found everything I tried. If the author is reading, showing GIFs or a small video clip instead of a static image would be preferable.

My favorite Simpsons quote: https://frinkiac.com/?p=caption&q=up+and+atom&e=S07E02&t=673....

Coach: Up and atom!

Rainier Wolfcastle: Up and at them.

Coach: Up and atom!

Rainier: Up and at them!

Coach: [annoyed] Up and atom!

Rainier: [louder] Up and at them!

Coach: Better.

Simpsons gifs as a service... à la:


It's open source and maybe could be repurposed?


Ooh, that's an awesome idea, and should be quite do-able technically, animated GIFs.

I love this. It found everything I threw at it. I hope the Fox lawyers don't take it down.


I searched for "moon pie" and didn't find what I was looking for. :(

Yeah, I was saying Boo-urns, and it couldn't find it.

Also, yeah, this is coming down as soon as the lawyers get ahold of it.

They may or may not (be allowed to) have a sense of humor about it. Our 24 Hours of LeMons car's publicity was sent to Matt Groening by a friend, and he passed it around the office. Apparently he asked their publicity folks if they could invite us up to show off the car about the same time that legal asked about sending us a cease and desist.

In the case of the car, it's probably fair use and the only issue was likely that we have non-Fox-approved sponsorship on it, but they probably decided their advertisers wouldn't complain about it because it's not exactly big bucks changing hands here.

So yeah, we got to meet Matt Groening and David X Cohen and Al Jean and a lot of the writers. It was definitely a cool experience.


The spelling in the captions can be hard to predict, and it's not good at fuzzy matching:


Or even with just one quote:


It looks like it uses really naive word breaking, so it considers the punctuation part of the word.
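A small Python sketch of the difference, assuming the index is built from whitespace-split words (which is only my guess from the behavior). `tokenize` and `matches` are illustrative names, not anything from the site.

```python
import re

# A word is a run of letters/digits, optionally joined by inner apostrophes or hyphens.
WORD = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    """Lowercase text and return words with surrounding punctuation removed."""
    return WORD.findall(text.lower())


def matches(query: str, caption: str) -> bool:
    """True if every query word appears among the caption's words."""
    caption_words = set(tokenize(caption))
    return all(word in caption_words for word in tokenize(query))


caption = "( gavel pounding ) So, Professor,"
naive_words = caption.lower().split()  # keeps "professor," with its trailing comma
clean_words = tokenize(caption)        # punctuation no longer sticks to the words
```

With the naive split, a query for "professor" misses because the indexed word is "professor," with the comma attached.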

They could definitely use some better text indexing/relevancy ranking implementations. I had mixed success. I'd recommend lucene or something based on lucene (Solr, ElasticSearch).

Just in time for the Grammys!


Once the AV Club finds this I think a black hole will open and consume us all. The website is quite cool though!

If I could use this to get subtitled gifs of the scene in question, not just screenshots, it would go from amazing to godlike. On the roadmap for v2, hopefully?

For me I don't get gifs, just a list of stills from the same scene

I think he's making a joke about clicking the "next image" fast enough that it appears to be animated.

I tell people every day that you don't win friends with salad. Glad I finally have the images to go with it!

This is great and long overdue.

For those of us who grew up having conversations in Simpsons dialog, this will help those, like my wife, who don't have such habits develop them :)

small point - the encaptionator should put a 1/2px black stroke around the white text so it is visible against any background colour

edit - after reading the FAQ I see you are working on this

I withdraw my question https://frinkiac.com/?p=caption&q=withdraw&e=S08E14&t=688870....

This is the reason the internet exists.

Some great screencaps compiled in this article:


I can't believe how fast it is.

Who can write a Simpsons quote search engine?


Any chance of OCR? I searched for "Pharm Team", which was the name of the company at https://frinkiac.com/?p=caption&q=major+league+baseball&e=S1... though the name was never said.

This is great! My only complaint is that it comes up with lots of near duplicates. The images look like they are different frames, but the quotes they reference are the same.

I think that's a feature. You get to choose which frame you prefer.

This is amazing. I hope there's an API for this.

Break out your Chrome inspector and follow along in our exciting home version:


Results look like this:


Concatenate the episode and timestamp to get the image:


Caption here:


Look for the Subtitles array:

"Subtitles":[{"Id":138914,"Episode":"S13E08","StartTimestamp":794266,"EndTimestamp":796533,"Content":" ( gavel pounding ) So, Professor,"},{"Id":138915,"Episode":"S13E08","StartTimestamp":796533,"EndTimestamp":799834,"Content":"tell us about Operation Hoyvin-Mayvin."}]

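If you want to play with that from a script, here is a short Python sketch that turns the Subtitles array above into a single caption. Only the array quoted above is used; the rest of the response shape is unknown to me, and wrapping the fragment in braces is my own assumption to make it valid JSON. `caption_from` is a made-up helper name.

```python
import json

# The fragment quoted above, wrapped in braces so it is a complete JSON object.
raw = (
    '{"Subtitles":['
    '{"Id":138914,"Episode":"S13E08","StartTimestamp":794266,'
    '"EndTimestamp":796533,"Content":" ( gavel pounding ) So, Professor,"},'
    '{"Id":138915,"Episode":"S13E08","StartTimestamp":796533,'
    '"EndTimestamp":799834,"Content":"tell us about Operation Hoyvin-Mayvin."}'
    ']}'
)


def caption_from(response: dict) -> str:
    """Join the subtitle lines, in time order, into a single caption."""
    lines = sorted(response["Subtitles"], key=lambda s: s["StartTimestamp"])
    return " ".join(line["Content"].strip() for line in lines)


data = json.loads(raw)
caption = caption_from(data)

# The timestamps appear to be milliseconds into the episode (an assumption).
duration_ms = data["Subtitles"][-1]["EndTimestamp"] - data["Subtitles"][0]["StartTimestamp"]
```
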
You'd have to be stupider than a monkey to not like this. Are you stupider than a monkey?


I made this Chrome extension to generate animated GIFs from frinkiac https://chrome.google.com/webstore/detail/frinkiac-gif/dlaba...

On the legal/lawyer talk: there have been a few other notable Simpsons screencap repositories (like Lardlad) that have remained online for years. I wonder if there's some leeway, or if they can't chase after single frames (as opposed to video with picture and sound, which they are notoriously strict about on YouTube).

This is a great tool. Any copyright issues? I tried it, "but it disappeared into 'fat air'."

Every time you type a character in the search box, it adds a browser history entry. That is not great...

They should be using replaceState; it looks like they may be using pushState instead.

You're right! But that behavior should be a little better now.

No Milpool! :(

More seriously:

1) Awesome!!!

2) It would be great if the search results page listed the quotes in addition to showing the images.

It totally does have milpool:


It's just that nobody ever says "Milpool" in the dialogue.


curious about how it works/was developed https://news.ycombinator.com/item?id=11036894

First thing I searched for: "Kids, you tried your best, but failed miserably. The lesson is, never try." Got the exact episode. This is great :)

I was hoping to find the quote in which Grandpa Simpson mentions Estes Kefauver, but searching for "Kefauver" yields no results. :-(

It looks like there are no episodes indexed past season 15, and this quote is from season 20.

https://frinkiac.com/?p=episode&e=S20E14 Should be the episode.

Authors: Can you describe the backend infrastructure?

I'm just a bit curious here about the costs of running a toy service like this.

"And that is why The Lord of the Rings can never be filmed!"

Stumped ya Frinky. It didn't have to go down like this.

This is an amazing feat of human ingenuity.

Awesome. I could find an episode about "tiger-repellent rock" just by searching for "tiger".

Much cooler than expected. So I assume this is fairly trivial to adapt to any other set of subtitled videos?

"Hi Super Nintendo Chalmers" LOL...this is freaking awesome. GIFs would be an improvement :)

Though you may be rat-like in appearance, you are truly king among men for sharing this!

Also, why didn't this get picked up by HN's duplicate-post algorithm? For the blog writeup, @reaperhulk, you should have put 'Show HN' in your original post to get more traction or something.

I'm getting a nothing found error? Is this a mobile bug?

Getting the same results from my laptop at the moment also. Everything is returning 'Nothing Found' Error. It was working earlier today. (Can you tell it is Friday?)

needs a random button

How is this so fast?

Brilliant. Just what I need to needle the wifey.

