
Project: Hosted & On-Prem fast full-text search with faceting, filtering, multiple ranking algorithms and plenty of other features.

Not yet ready for launch, but I built a simple demo while trying to get into Startup School (failed, unfortunately :( ). It lets you search every Hacker News post and filter by domain / user / story type.

http://searchhn.com




Thank you all.. I am finally getting some search requests. I applied for Startup School and was desperately checking the logs every day for someone to try it out :)

With 50K-plus amazing companies applying, it is very difficult to stand out :( I am slowly building a team with some of the best engineers I have had the pleasure of working with. We badly wanted to get into Startup School for guidance on getting this to the next level. Wish we were part of the program, but glad everybody gets to view the lectures :)


Cool project. Could you briefly talk about

* the backend you use and how it will scale to sites with large amounts of data across servers

* can third party sites integrate your search service?

* how is it different from, e.g., Algolia?

Good luck with the project!


Thank you !

The backend is custom-built, written in C and assembly. It supports sharding, plus replication that is rack-aware and data-center-aware.

> can third party sites integrate your search service?

Yes of course.. that is the end goal.

Algolia is awesome.. but you end up paying a lot depending on how many ways you sort / rank your data. This one uses an on-the-fly ranking model, so you can rank on any field in any direction at query time. There are also different ranking algorithms, extensibility with Lua, and a lot more that I'll cover when I officially do a Show HN.
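
To illustrate what "rank on any field in any direction" could mean at query time, here is a minimal Python sketch. It is not the engine's code (that is written in C); the `rank` function, the `(field, direction)` format, and the sample hits are all made up for illustration.

```python
from operator import itemgetter


def rank(hits, sort_by):
    """Order hits by a list of (field, direction) pairs chosen at query time.

    Hypothetical helper: direction is "asc" or "desc". Python's sort is
    stable, so sorting by the least significant key first and the most
    significant key last yields a multi-key ordering.
    """
    ordered = list(hits)  # copy so the caller's list is left untouched
    for field, direction in reversed(sort_by):
        ordered.sort(key=itemgetter(field), reverse=(direction == "desc"))
    return ordered


# Made-up sample data, only to show the call shape.
hits = [
    {"id": 1, "points": 10, "comments": 5},
    {"id": 2, "points": 30, "comments": 2},
    {"id": 3, "points": 10, "comments": 9},
]
by_points = rank(hits, [("points", "desc"), ("comments", "asc")])
by_comments = rank(hits, [("comments", "desc")])
```

Nothing is precomputed per sort order here: the order is decided per request, which is the property described above.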


Have you tried any of the open-source ones in C/Rust? Are you doing anything differently or better (what, and how)? What are you using for replication/sharding? Is it possible to split shards? What are you using as a server backend framework (e.g. Seastar)? Are there any libraries you're using (I'm interested)?

You have to write a really long blog post on why you've chosen this way.


Yes.. this will take a very long blog post. This started many months ago as a project to learn Go and as a way to index my ever-growing collection of music / movies / documents / subtitles / lyrics and everything else on my servers.

I got hooked on it, became obsessed with speed, and rewrote everything in C. Replication is based on Raft, specifically the multi-raft variant proposed by the amazing folks at Cockroach (https://www.cockroachlabs.com/blog/scaling-raft/)

It does not use a backend framework. It is a simple HTTP/HTTPS server (epoll + multi-threaded) which talks JSON. I use Jansson for JSON and utf8proc for Unicode handling. The index is custom-built.

I have been working on low-powered distributed systems for over 10 years, which certainly helped. Will definitely let you know when I get that blog post written :)


This is amazing. Are you also going to implement Soundex?
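
For anyone who hasn't met Soundex: it maps a name to a letter plus three digits so that similar-sounding names collide. A rough Python sketch of the common American Soundex rules, just to show the idea (the function name is mine, and this says nothing about how the engine would implement it):

```python
# Consonant groups used by American Soundex; vowels, H, W and Y have no digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name):
    """Return the 4-character Soundex code, or "" if name has no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4 characters


robert = soundex("Robert")
rupert = soundex("Rupert")
```

"Robert" and "Rupert" get the same code, which is what makes it useful for fuzzy name matching.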


Thanks.. yes, that is something I'm planning. I'm keeping the engine super flexible. It currently does TF-IDF, Okapi BM25, or an Algolia-like tie-breaking algorithm.
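
For reference, a small Python sketch of Okapi BM25 scoring. This is the textbook formula with the commonly used non-negative IDF variant and whitespace tokenization; the parameter defaults (`k1=1.2`, `b=0.75`) and the sample documents are assumptions for illustration, not what the engine actually uses.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    Assumptions: lowercase whitespace tokenization, and the
    ln((N - n + 0.5) / (n + 0.5) + 1) form of IDF, which never goes negative.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            f = tf[term]
            if f == 0:
                continue  # term absent from this document
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = f + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up documents, only to show the call shape.
docs = [
    "fast full text search",
    "search search search engine in c",
    "raft replication notes",
]
scores = bm25_scores("search", docs)
```

Plain TF-IDF drops the saturation (`k1`) and length normalization (`b`) terms, which is the main practical difference between the two.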


really cool! is there a way to search by domain?


Thank you.. Yeah, the current UI sucks, I'm sorry :(.. Just search for a post that you know is from your domain and tick the domain checkbox on the right. You can then change the search text or any other filters after that.

Will probably fix the UI this weekend and do a proper 'Show HN' with more options, charts and analytics.
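
The domain checkbox is basically a facet filter. A tiny Python sketch of the idea, assuming each hit carries a `url` field (the field name, helper names, and the example.com hit are hypothetical, and the real engine does this inside the index rather than over a result list):

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Extract the host from a URL, dropping a leading "www."."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def facet_counts(hits):
    """Count hits per domain: these are the numbers shown next to checkboxes."""
    return Counter(domain_of(h["url"]) for h in hits)


def filter_by_domain(hits, domain):
    """Keep only the hits whose URL belongs to the selected domain."""
    return [h for h in hits if domain_of(h["url"]) == domain]


# Sample hits for illustration only.
hits = [
    {"title": "Scaling Raft", "url": "https://www.cockroachlabs.com/blog/scaling-raft/"},
    {"title": "Search HN", "url": "http://searchhn.com"},
    {"title": "Another post", "url": "https://example.com/a"},
    {"title": "Yet another post", "url": "https://example.com/b"},
]
counts = facet_counts(hits)
only_example = filter_by_domain(hits, "example.com")
```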



