Impressive! Really fast, full-featured code search across a huge corpus.
1. How did you build the index? Did you use a GitHub dump of some sort? How often do you refresh it?
2. Is it Elasticsearch or similar or a completely custom engine?
3. What kind of RAM/CPU are you using to power it?
4. Any plans to open source the code or commercialize the technology?
I could absolutely imagine paying for a private code search engine like this to run against a large internal company codebase spread across many repositories.
Thanks! It's built on top of Solr. It fetches the repos from GitHub and should pick up any updates to them within a few days. It's running on a couple of servers with 20 cores each, which is not really enough for the traffic it's getting right now.
I'd be curious how you built the step from regex to Elasticsearch. My guess would be an n-gram (3-gram) index in Elasticsearch, with the regexes translated into queries against it. Did you build that custom or use something off the shelf? Love the site!
> I'd be curious how you built the step from regex to Elasticsearch. My guess would be an n-gram (3-gram) index in Elasticsearch, with the regexes translated into queries against it. Did you build that custom or use something off the shelf? Love the site!
I'm pretty sure Elasticsearch supports regex search natively; it's just horrendously slow and can blow up the system.
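For anyone wondering what the 3-gram guess above would look like in practice, here's a rough sketch in plain Python. To be clear, this is not how the site works (the author only said it's built on Solr), and it uses no Elasticsearch or Solr APIs. The names `trigrams`, `build_index`, and `search` are made up for illustration, and the caller passes in a `required_literal` by hand; a real engine would have to derive the required trigrams from the regex itself, which is the hard part.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, docs, pattern, required_literal):
    """Return sorted ids of docs matching the regex `pattern`.

    `required_literal` is an assumption of this sketch: a substring that
    every match must contain, supplied by the caller rather than derived
    from the regex.
    """
    # Narrow the candidates to docs containing every trigram of the literal.
    # A literal shorter than 3 characters yields no trigrams, so all docs
    # remain candidates and the regex alone decides.
    candidates = set(docs)
    for gram in trigrams(required_literal):
        candidates &= index.get(gram, set())

    # Run the real regex only on the surviving candidates.
    regex = re.compile(pattern)
    return sorted(doc_id for doc_id in candidates if regex.search(docs[doc_id]))


docs = {
    "a.py": "def parse_args(argv):",
    "b.py": "def main():\n    parse(argv)",
    "c.py": "print('hello')",
}
index = build_index(docs)
print(search(index, docs, r"parse_\w+\(", "parse_"))
```

The index only filters; the regex still runs on whatever survives, so the slow part touches a small slice of the corpus instead of all of it. That's the difference from running a regex query over every document, which is where the "horrendously slow" part comes from.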