Show HN: Findka – Personalized recommendations for any type of content (findka.com)
96 points by jacobobryant on June 16, 2020 | hide | past | favorite | 45 comments

(Reposting, again, with permission from dang)

Hi, OP here. I started working on recommender systems in 2016 during my undergrad, specifically doing music recommendation (I was dissatisfied with the quality of Pandora/Spotify recommendations). I spent about five months in 2019 trying to make a music startup based on that. However, during that time I realized that there would probably be more value in having a really good general-purpose/cross-domain recommender system. i.e. if you're looking for something specific, use Google, and if you're not looking for something specific, use Findka. That's the vision anyway.[1][2]

More specifically, the benefits I see from cross-domain recommendation are:

- More data per user => better recommendations.

- More potential users => (eventually) better recommendations. For example, to get users for a podcast recommender, you have to find people who like podcasts above a certain threshold. With Findka, anyone who's interested in getting recommendations for at least one content type is a potential user. (And even people who aren't above that threshold for podcasts might appreciate an occasional podcast recommendation).

- Lots of potential applications. I'm particularly interested in trying to use Findka for social networking (opt-in of course). Data from Findka could be useful for dating, job opportunities, forming online communities, etc. This is more long-term, but I also think Findka data could be useful for search.[3]

The algorithm currently is dead simple. Just collaborative filtering without explicitly taking into account content type. So it's naively cross-domain. Since the data set is still small, there's no need for matrix factorization. I recompute the whole matrix every hour and store it in memory. See [4] for the implementation (it only took 30 LOC). That's a little out-of-date but the general approach hasn't changed.
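As a rough illustration of what "collaborative filtering without taking content type into account" can look like, here is a minimal Python sketch. It is not the code from [4] (which is written in Clojure); the function names, the `(user, item_url, liked)` tuple layout, and the scoring rule are all assumptions made for the example. Items are opaque URLs, so songs, books and podcasts all share one co-occurrence matrix.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(ratings):
    """Count how often two items were liked by the same user.

    `ratings` is an iterable of (user, item_url, liked) tuples; this layout
    is hypothetical. Content type is never consulted, so the matrix is
    "naively cross-domain".
    """
    liked_by_user = defaultdict(set)
    for user, item, liked in ratings:
        if liked:
            liked_by_user[user].add(item)

    matrix = defaultdict(lambda: defaultdict(int))
    for items in liked_by_user.values():
        # Every ordered pair of distinct items liked by this user.
        for a, b in permutations(items, 2):
            matrix[a][b] += 1
    return matrix


def recommend(matrix, liked_items, n=3):
    """Rank unseen items by total co-occurrence with the user's liked items."""
    scores = defaultdict(int)
    for item in liked_items:
        for other, count in matrix.get(item, {}).items():
            if other not in liked_items:
                scores[other] += count
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


ratings = [
    ("u1", "song-a", True), ("u1", "book-b", True),
    ("u2", "song-a", True), ("u2", "book-b", True), ("u2", "podcast-c", True),
    ("u3", "song-a", False), ("u3", "podcast-c", True),
]
matrix = build_cooccurrence(ratings)  # cheap enough to rebuild on a timer
print(recommend(matrix, {"song-a"}))
```

With a small data set, rebuilding the whole matrix on a schedule and keeping it in memory stays cheap, which is the trade-off described above.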

For the tech stack, I'm using a Clojure web framework + deployment solution that I made.[5] It's like a self-hosted version of Firebase (I'm running it on DigitalOcean).

[1] https://www.ben-evans.com/benedictevans/2015/6/24/search-dis....

[2] https://www.aaai.org/ojs/index.php/aimagazine/article/view/2....

[3] https://news.ycombinator.com/item?id=23449754

[4] https://findka.com/blog/rec-sys-in-30-lines/

[5] https://findka.com/biff/

> I started working on recommender systems in 2016 during my undergrad, specifically doing music recommendation (I was dissatisfied with the quality of Pandora/Spotify recommendations). I spent about five months in 2019 trying to make a music startup based on that.

Hi! I'm very interested in this domain. I have long been dissatisfied with the results of music recommendation across almost every platform (and the respective apps, but I digress). So frequently it seems my own personal preferences are washed away in the breadth of ML and algorithmic recommendation systems: no matter what I begin listening to, I invariably find myself within the mainstream for that genre/artist. It also seems to me that many algorithms can't properly deduce that I would want songs centered around a specific year, and instead draw from what's "popular" in the genre regardless of the time period of the song I started from.

As an example, I can begin a GPM station based on I Gotta Feeling, a song from 2009. When I start a playlist like that, I am almost never rewarded with a song from that time period. I tried it just now, and it immediately jumped to Roar by Katy Perry, a song from 2013 with an entirely different vibe.

I suppose my question is, how meaningful are the music recommendations Findka produces?

I would love to talk more about the music startup you were pursuing, and what you accomplished there, as well as any roadblocks you may have faced. I have been whittling down an idea for a music startup myself and can't help but wonder where that road led for you.

Also your site link is broken/not resolving.

I'd love to chat more about music etc. You can shoot me an email at hn at jacobobryant.com. Definitely a pain point I feel as well, though another reason I switched to Findka is because I figured it would be a lot easier to bootstrap it. The music industry is pretty tough to get started in.

As for the current music recommendations, it's nothing intelligent. Just pure collaborative filtering, i.e. it has no understanding of "this is a rock song" or "this song is from 2005" or even "this is a song". Right now, the algorithm just sees a bunch of URLs and their rating data. I am interested in making the algorithm more intelligent over time though.

Could the site not resolving be an issue with your network? It Works For Me, and I'm getting plenty of traffic right now.

Pretty surprised that the music recommendation results you've been experiencing have been quite poor. Your feedback would have been invaluable in retraining the algorithm to make recommendations that suit your preference. Feels like a lost opportunity. Moreover, I don't know if much thought around self-discoverability/popularity has been factored into the design of the recommendation system - ideally, the recommendation system should surface interesting things that I like that I wouldn't necessarily discover on my own.

I've wondered about this a lot. It seems to be hit or miss--some people love e.g. Spotify's discover weekly/daily mix playlists, and some people hate them. I suspect that some users are outliers and it's hard to give them recommendations, and maybe tuning the system for them decreases the performance for other users. Also there may be limited business value in optimizing for those users anyway vs. pushing popular music.

I suspect you’re right about trading off between pushing popular music vs tuning for outlier users. Feels like there are too many smart engineers to miss it. Plus you’re building a personalisation system which should be personalising things for individual users by default.

I do think the idea has a lot of merit. I've seen collaborative filtering work very well in movie recommendations, and I've observed in real life how often social groups overlap with multiple similar interests.

I think that scaling will be a challenge, both computationally and in gathering enough data to be meaningful. The biggest wins, I suspect, are when you can infer people's preferences incidentally (say, from watching what podcasts they actually talk about on social media) rather than from their self-reported preferences, where conscious intentions often override true emotions.

But I'd love to see this work. I know there are things out there that I'd enjoy but don't manage to connect with.

I've thought a fair amount about inferring preferences (i.e. explicit vs. implicit feedback). Particularly for music, tracking skip data and feeding recommendations to the music player would make it a lot more convenient. My plan is to integrate more deeply with specific content types as time goes on/after the business is working.

This sounds a lot like Google, which is in a sense a giant cross-domain recommender system.

Yeah. I think of recommender systems as one level of abstraction/automation above search engines: the only difference is that with recommender systems, you don't have to type in a search query.

Very cool and nice vision statement!

Quick heads up, the book links aren't working for me.

Thanks for the heads up. Turns out the default url that google books returns often gives a 404. I fixed that last night but haven't updated all the previously imported books yet.

Couple of ideas for you

- Consider injecting information with "oracles". An oracle is a kind of virtual user that likes one thing and only one thing. For example, they only watch movies that have been tagged sci-fi. This sci-fi oracle adds information about sci-fi-ness to your data, which is useful for several things. It helps with the cold start problem, as new items can be automatically tagged by the appropriate oracles and get past the zero information horizon quickly. Also, you can measure a user's sci-fi affinity by measuring that user's similarity to the sci-fi oracle.

- Another way to think about co-occurrences is as connected nodes in a digraph. You have users and items and connections between them (user watched video). Start with an item and traverse all the links to the other side (all the users who watched this video), then for each user traverse back to the items side (you can roll up the occurrences for a score) and you have similar items. This works equally well for finding similar users.

- Create an "average user" and use that as a seed for new users. If we know nothing else we should expect a new user to be close to average. This means they will probably get recommended the most popular items but

- Find items with divisive scores or groups and ask new users their opinion on those items to find out about them. After a new user gets created consider asking them their opinion on five of these divisive items. Their ratings should swiftly put them in an informed space the way taking five steps down a binary tree does a lot to reduce search space.

- I like the way you use simple plus one smoothing for your scores. I'm not sure why this doesn't get used more often.
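Two of the ideas in this list, the graph traversal and the oracle users, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `(user, item)` edge format, the `oracle:<tag>` naming scheme, and the use of Jaccard similarity for "affinity" are all invented for the example and are not how Findka works.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Index (user, item) edges in both directions."""
    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for user, item in edges:
        items_by_user[user].add(item)
        users_by_item[item].add(user)
    return items_by_user, users_by_item


def similar_items(item, items_by_user, users_by_item, n=3):
    """Walk item -> users -> items and roll up how often each item is reached."""
    counts = Counter()
    for user in users_by_item.get(item, ()):
        counts.update(items_by_user[user] - {item})
    return sorted(counts, key=lambda i: (-counts[i], i))[:n]


def oracle_edges(tag, tagged_items):
    """Edges for a virtual user that likes every item carrying `tag`, and nothing else."""
    return [(f"oracle:{tag}", item) for item in tagged_items]


def affinity(user, tag, items_by_user):
    """Jaccard similarity between a real user and the oracle for `tag`."""
    mine = items_by_user.get(user, set())
    oracle = items_by_user.get(f"oracle:{tag}", set())
    union = mine | oracle
    return len(mine & oracle) / len(union) if union else 0.0


edges = [
    ("ann", "alien"), ("ann", "dune"), ("ann", "heat"),
    ("bob", "alien"), ("bob", "dune"),
    ("cat", "heat"),
]
edges += oracle_edges("sci-fi", ["alien", "dune"])
items_by_user, users_by_item = build_graph(edges)

print(similar_items("alien", items_by_user, users_by_item))
print(affinity("bob", "sci-fi", items_by_user))
```

Because the oracle is stored as an ordinary user, a newly tagged item immediately co-occurs with every other item under the same tag, which is the cold-start benefit described above.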

Good luck with the project!

Thanks, I'm interested in all these ideas. I'm working on figuring out a marketing channel at the moment, and then once Findka is growing consistently I'd like to focus primarily on recommendation quality and experimenting with ideas like these. (If things are going really well, hopefully I can hire a couple people to continue working on marketing/UX etc while I work on the algorithm).

Reminds me of Gnod, the Global Network Of Discovery:


For music, Gnod definitely gives me better suggestions than iTunes and Youtube.

Definitely. Gnoosic in particular was an inspiration for the current iteration of Findka (https://www.gnoosic.com/).

I am going to share an anecdote.

There once was a very nice German-based application called Foundd. The app specialised in movie recommendation and it was awesome: great interface, good filtering, good recommendations.

Then, they introduced TV shows recommendations. As soon as I started rating TV shows, the quality of my movie recommendations plummeted. I guess my general dislike for TV shows wasn't helping the algorithm.

It seems really hard to build useful cross-domain recommendations.

I love the sound of this. But wouldn't it be productive for users to initially "seed" a category (say, Movies) by entering a handful of films they liked and ones they didn't like?

That would beat being at the mercy of suggested movies, 80% of which, in my experience of using Findka, I hadn't watched. I couldn't like or dislike them, so the system carried on showing me more and more suggestions that weren't increasing in relevance, as it had no input data.

I'm sure by simple maths over extended use I would inevitably see movies I could express a preference on, but I must be on "Refresh 50" now and only managed to vote on a tiny number of suggestions.

Also in the name of speeding up the process of identifying movies (or whatever) that I liked and disliked, would it be possible to like/dislike all three suggested items per screen before moving on to the next batch? atm as soon as one "vote" is placed the system throws out three more - when I could well have wanted to express a view on the other items that were presented.

Findka is a great idea btw! :-)

Both of those things are in fact implemented :). For seeding items, go to the "Add" page and there's a search bar (on desktop, it's in the sidebar; on mobile, you have to open the hamburger menu). I'll probably move the search bar to the main page to make it more discoverable.[1]

When you rate an item, are you sure all three items get replaced? If you rate one item, the ones below it should be moved up, and then the item on the bottom will be new. So if you rate the top item, you'll see a visual change in all items, but the other two items will still be there. Perhaps I should replace rated items in place without moving the others.

[1] https://news.ycombinator.com/item?id=23542880

Neat idea, this could become a valuable resource for discovering interesting things.

From a UI perspective: Maybe there could be a searchbox for books, songs etc. so that one can quickly enter things one likes. With the current system of entering preferences for the 3 suggested items, the problem is that I don't know many of the items and so I can't enter a preference.

There is such a search box. On desktop, click on "Add" in the sidebar. On mobile, you'll have to open the hamburger menu. There's also URL input for articles/content types I haven't added search for.

Thanks for the reply - that's great. I'm on mobile and looked for 'search', but now I see how to do it.

I thought about calling it "search" but went for "add" since that is technically more accurate with the URL inputs--and it's already a fairly common occurrence for people to enter search terms in those instead of URLs. Though your comment makes me think that maybe I should just call it "search"...

Another option could be to add a search box to the landing page which when submitted takes you to the 'add' page

I'd thought about putting the search bar on the main page but wasn't sure how to make the UX good. It never occurred to me to just switch the page. I'll probably do that, thanks.

Biggest advice I can give is that you should do a better job at content seeding to ensure that first-time users have a good experience. Otherwise, you'll never get better from more usage because people won't have a reason to stick around. Then your collaborative filtering will struggle as your user likes become more and more sparse.

It's an interesting idea. Do you have reason to believe that people's preferences in different areas are correlated? Not saying they're definitely not, but that's my first skeptical thought.

I think some areas will be correlated and some will not. e.g. I see plenty of overlap between articles, podcasts and books, but less overlap between those things and music.

I'm pretty curious to see what the correlation ends up being.

I agree. I think it's definitely worth trying and maybe you can update the algorithm later to take advantage of the domain correlations you find.

That being said, music etc. is at least correlated with demographics. I could see it being useful to a degree for recommending other content types.

I'd be surprised if you get much signal out of demographic info, but I would think music correlates more with general aesthetic taste, which may transfer.

Very interesting. Learned about Findka in your music recommendation service newsletter.

- I'd like to change how many recommendations I get with each newsletter.

- I'd like to add some music, but it didn't find anything I searched. I could just add a link. That would however link the music with the source/url, which does not make any sense semantically.

- If you don't rate suggestions (because you don't know them), they'll soon appear again. Maybe it would be nice to block them in the current session.

Thanks for the feedback.

- I'll add an option for this.

- What songs were you searching for? The search feature uses Last.fm for music. They've had pretty much everything I've ever searched for, but maybe they're missing certain categories. I might investigate other/additional search APIs in the future. In the mean time, adding songs via URL is a decent option. The way the algorithm is implemented currently, it won't make a difference, and eventually I'm planning to add a cron job that'll go through the URL items and classify them as the correct content type and fetch additional metadata (with manual intervention as needed).

- I started working on this today; should be fixed tonight or tomorrow.

- Cool!

- Some Swiss German music. However, a search for "Not afraid" didn't return anything either... Didn't manage to get any result so far.

- Again: nice.

It's very motivating to get a response this fast and you seem to care about the input/feedback from users. The world needs more of that, thank you. :)

EDIT: My bad, uBlock Origin was blocking requests to ws.audioscrobbler.com. Can search for music now.

Ah, glad to hear it's working now. I was just about to ask you to check the network tab. Also, I pushed an update last night that renames the refresh button to "skip all" and keeps track of which items are skipped, so items won't show up again after you've seen them once (there's another button under Account which will reset the skip history).

Also you're welcome! It's been really nice to actually have some users, a luxury which escaped me in previous startup attempts.

This reminds me of Stumbleupon[1] from back in 2005-ish. I used to love using it before they decided to pivot and remove all of the things that made it great.

[1] https://en.wikipedia.org/wiki/StumbleUpon

Just tried findka out and going back to my comment about Stumbleupon, what I used to love the most about it was reading the reviews about the content I ^stumbled upon^ before even looking at the content. Pretty much the same manner in which most people use HN.

Additionally, reviews led to discovery of people with shared interests, when I saw the same people liking, sharing and commenting on stuff I liked. Following these people then added another layer of filtering to my ^stumbles^.

Might be food for thought.

I've thought about how this relates to Stumbleupon as well. I never used it myself, but it came up while I was looking for things similar to Findka.

I'm actually just starting to implement features like these. If you make an account then go to the Account tab, you can enable a public profile like this[1]. So far I'm planning to add follow/subscribe, commentary (i.e. comment on items instead of just rating them), and showing users who've rated the same items as you.

[1] https://findka.com/u/f962c926-43e1-406d-825d-3d7a4880befa/

You mention storing the matrix in memory and recomputing hourly because of the small data set. What are your estimates for scaling ceiling with current implementation? How many users / data-items (order of magnitude) do you think you can handle with current set-up?

I haven't thought too deeply on it yet, but:

- ram usage today has gone from ~1.1GB to ~1.6GB

- DigitalOcean prices ram linearly at $5/GB (I'm currently on a $10/month droplet)

Today is a big spike obviously (the number of thumbs up/thumbs down events has gone from 3K to almost 10K today, vs going from 0 to 3K since February). But say I continue to grow ram usage at 0.5GB per week (growth hopefully won't be linear, but I'd say that's a pretty steep linear growth rate for now). That's roughly 2GB per month, so at $5/GB my hosting bill would go up by about $10/month each month, which is not bad at all.
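The back-of-the-envelope estimate above can be written out as a tiny Python calculation. The $5/GB price, the 0.5GB/week growth rate and the ~1.6GB starting point come from this comment; the 4-weeks-per-month rounding and the function names are assumptions, and the sketch prices RAM in use rather than actual droplet sizes.

```python
PRICE_PER_GB_MONTH = 5.0      # USD per GB of RAM per month (from the comment)
RAM_GROWTH_GB_PER_WEEK = 0.5  # assumed linear growth rate (from the comment)
WEEKS_PER_MONTH = 4           # assumption: round a month to four weeks


def monthly_bill_increase():
    """How much the monthly bill grows for each month of linear RAM growth."""
    return RAM_GROWTH_GB_PER_WEEK * WEEKS_PER_MONTH * PRICE_PER_GB_MONTH


def projected_bill(start_gb, months):
    """Monthly RAM cost after `months` of growth, starting from `start_gb`."""
    ram_gb = start_gb + RAM_GROWTH_GB_PER_WEEK * WEEKS_PER_MONTH * months
    return ram_gb * PRICE_PER_GB_MONTH


print(monthly_bill_increase())
for months in (0, 3, 6, 12):
    print(months, round(projected_bill(1.6, months), 2))
```

Under these assumptions the bill grows linearly, so even a year of this growth rate stays at hobby-project cost; the real ceiling is more likely to be recompute time than RAM price.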

That's probably the crappiest scale estimation ever made ha ha, but I believe at least that I should have plenty of time to figure out a reliable marketing channel before I do any major re-architecting. Maybe some time I'll do a stress test on my laptop to get a better sense of how RAM usage increases with additional data.

I've been at 10s of weekly active users. I guess this should handle 100s just fine, but maybe not 1000s.

I've been using findka for a few months, and love getting relevant recommendations every Friday. I've discovered lots of new music, books, movies and articles through it.

This would be much better if it allowed you to import ratings from other popular rating services (letterboxd, goodreads, RYM, etc.)

I did this in a previous version of Findka. I had integrations for Last.fm, Spotify, Youtube, Goodreads, Pocket, RSS and a couple others I think. I did a "reset" on Findka in February, removing most of the features in order to make it simpler. Eventually I would like to add integrations back in. I'm working on some social networking features right now--hopefully that'll work as a reliable user acquisition channel. Once I've figured out a way to keep getting new users, I'll be able to spend more time on things like integrations and improving the algorithm.

Also I hadn't heard of letterboxd or RYM. I'll check those out more when the time comes.

> Eventually I would like to add integrations back in

I might be missing the big picture, but wouldn't log-in with oauth[0] solve the integration problem and remove an impediment for new users coming in? Also, if you do not mind, which algo do you use to find matches?

[0] https://en.m.wikipedia.org/wiki/List_of_OAuth_providers

Edit: formatting

I don't think it'd make any difference. In the list you linked, only a few of those provide log-in with oauth. Letting people log in with Google won't let us import data from e.g. Letterboxd etc. Even if Letterboxd provided log-in with oauth, you'd still have to go through another authorization flow if you wanted to import data from an additional service.

The algorithm uses a simple item-based neighborhood model. i.e. if you like song A, Findka looks up all the content liked by other users who liked song A and probabilistically chooses an item that was well-liked. To help the algorithm keep learning, 35% of the recommendations are purely random ("epsilon-greedy"). I describe the implementation here[0], though it's changed slightly (now I export the database every day or so and generate a model on my laptop, then I load it into memory on the server). I experimented with a machine learning-based model (a latent factor model) last week, but it seems I don't yet have enough rating data for that to be useful.
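To make the description above concrete, here is a hedged Python sketch of an item-based pick with epsilon-greedy exploration. It is not the code from [0]; the function name, the `likes_by_user` layout and the exact weighting (sampling proportionally to like counts) are assumptions. Only the 35% exploration rate comes from the comment.

```python
import random
from collections import Counter


def recommend_one(liked_item, likes_by_user, all_items, epsilon=0.35, rng=random):
    """Pick one recommendation for someone who liked `liked_item`.

    `likes_by_user` maps user -> set of liked items (hypothetical layout).
    With probability `epsilon` the pick is uniformly random (exploration);
    otherwise it is sampled from items liked by users who also liked
    `liked_item`, weighted by how many such users liked each one.
    """
    pool = sorted(i for i in all_items if i != liked_item)
    if not pool:
        return None
    if rng.random() < epsilon:
        return rng.choice(pool)

    counts = Counter()
    for items in likes_by_user.values():
        if liked_item in items:
            counts.update(i for i in items if i != liked_item)
    if not counts:
        # No neighbours yet: fall back to a random item.
        return rng.choice(pool)

    candidates = sorted(counts)
    weights = [counts[c] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


likes_by_user = {
    "u1": {"song-a", "book-b"},
    "u2": {"song-a", "book-b", "podcast-c"},
    "u3": {"podcast-c"},
}
all_items = {"song-a", "book-b", "podcast-c", "article-d"}
print(recommend_one("song-a", likes_by_user, all_items, rng=random.Random(0)))
```

The random picks are what let items with no ratings yet (like `article-d` here) collect their first feedback, at the cost of some less relevant recommendations.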

[0] https://findka.com/blog/rec-sys-in-30-lines/
