There's a tangential problem I've been thinking about a lot recently: the long tail of open source projects on GitHub.

There are 20M+ repos on there, which leads to a serious discoverability and signal-to-noise problem around 'true' open source projects. As someone who wants to put up an open source project for collaboration (and is willing to maintain it), it's hard to get the right eyes on it. Too often, it seems like no one cares. Conversely, as someone who wants to contribute to open source projects, it's hard to find small, early ones where I feel like I can make a meaningful contribution (yes, the big ones maintained by well-known companies are easy to find, but I'd much rather work on something put up by an individual programmer).

How do you cut through the noise on GitHub to find projects you actually want to work with (and that want you, too)?

Sounds like a good idea for a better unofficial open source search engine. I would be interested in contributing. It could do intelligent search of GitHub, Bitbucket, etc. for projects which might be interesting (by programming language, objectives, etc.).
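As a rough sketch of what the GitHub side of such a search could start from, here is a small helper that builds a query URL for GitHub's public repository search endpoint, filtering for small, recently updated projects in a given language. The function name, parameter names, and example values are hypothetical; only the endpoint and the `language:`, `stars:` and `pushed:` qualifiers come from GitHub's search syntax. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# GitHub's public repository search endpoint.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(language, max_stars, pushed_after, per_page=30):
    """Build a search URL for small, recently active repos.

    All parameter names here are illustrative choices for this sketch.
    pushed_after is a date string in YYYY-MM-DD form.
    """
    # Search qualifiers are space-separated inside the single "q" parameter.
    query = f"language:{language} stars:<{max_stars} pushed:>{pushed_after}"
    params = {
        "q": query,
        "sort": "updated",   # most recently updated first
        "order": "desc",
        "per_page": per_page,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("python", 50, "2024-01-01"))
```

The star ceiling is the part aimed at the long tail: it deliberately filters out the big, well-known projects that are already easy to find.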

You could crawl GitHub and Bitbucket and make a public dataset available. Others could slice and dice it.

I think this is a great idea.

Having access to the data set itself could unlock a lot of new, creative ideas and applications beyond the expected ones. That's one great thing I've learned from the open source community.

It could be used not only for search, but also for data analysis and whatnot. I think it would be fairly beneficial for GitHub to do it, actually. Easy-to-work-with, up-to-date dataset -> interesting projects -> GitHub brand value++.

I was thinking of something maybe even simpler - a place for people to post projects they're actually interested in maintaining as open source projects. A subreddit could be the MVP.

Or go the other way, and train a search algorithm on a dataset of actively maintained open source projects on GitHub, with things like commit history, contributors, etc. as features.
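To make the feature idea concrete, here is a minimal sketch that ranks repos by a hand-weighted activity score. It is a stand-in for a trained model, not one: the weights, the 30-day decay scale, and the `RepoStats` field names are all made up for illustration, and a real version would learn them from labeled examples of maintained vs. abandoned projects.

```python
import math
from dataclasses import dataclass


@dataclass
class RepoStats:
    # Hypothetical feature record; these fields are illustrative,
    # not a GitHub API schema.
    name: str
    commits_last_90_days: int
    contributors: int
    days_since_last_commit: int


def activity_score(repo):
    """Hand-weighted score; higher suggests the repo is actively maintained."""
    # log1p dampens very large projects so small active ones can still rank well.
    commit_signal = math.log1p(repo.commits_last_90_days)
    contributor_signal = math.log1p(repo.contributors)
    # Recency decays exponentially on an assumed 30-day scale.
    recency = math.exp(-repo.days_since_last_commit / 30.0)
    # The 0.5 weight on contributors is an arbitrary choice for this sketch.
    return recency * (commit_signal + 0.5 * contributor_signal)


def rank(repos):
    """Return repos sorted from most to least active by the score above."""
    return sorted(repos, key=activity_score, reverse=True)
```

Multiplying by recency means a once-popular but long-abandoned repo scores near zero no matter how many contributors it had, which is the behavior you'd want when looking for projects someone is still willing to maintain.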

Using a subreddit, you basically have a marketing problem and no way to seed the marketplace. One would need to do both. You are trying to restart http://freecode.com/

How about a README badge?

