

Tell HN: Starting a search startup, please review our project - lkozma

The website is http://www.metahint.com

Currently we have a working prototype for searching some of our favorite blogs. We are soon sending out beta invites, and opening it up for general use.

Our idea is to make an embeddable search widget for websites that generates suggestions from the content of the site. This came from the frustration that the search boxes on the vast majority of websites today do not suggest anything. While Google and Bing suggest previous queries, for small websites that might not be useful, so instead we extract phrases from the text and rank them by how well they describe a given page.

Let us know what you think, we are lkozma and pceelurd, lkozma has been on HN since the first days, when it was still SN.

Thanks!
======
chime
In a similar vein, I used Google's Ajax Search API and made this last year:
<http://chir.ag/projects/drop-search/> (Google even interviewed me for it:
<http://googledata.org/google-code/google-narratives-series-chirag-mehta/>).

'Drop Search' lets any site owner create a search box customized to their own
domains using just a few lines of JS code. There's an example of drop-search
at my own site in the top-right corner: <http://chir.ag> (search: cat). It
does not have the neat lightbox + readability view that metahint has. But it
does have excerpts from search results. Metahint users will most likely want
that.

Personally, I think metahint is a great execution of a good idea and that's
partly why I made Drop Search. It didn't become popular but then I created it
as a mini project for fun and not a startup with marketing goals. I wish you
the best of luck. I would suggest to you that you take a look at Google's
APIs. I don't think the search results are bad at all. I haven't tried Bing.
Creating your own search engine is kind of a big deal. Ask Gabriel Weinberg @
DDG.

~~~
lkozma
Very interesting, Chirag, and thanks for the suggestions. I was following your
projects in the past but somehow I missed drop-search until now. I think
there is room for more experimentation and new services, as the majority of
sites currently lack a usable search, so I'm happy to see others attacking the
same problem.

------
pjscott
I tried it out, and the bottom line is: this is really slick, and I want it on
my blog immediately. How soon are you sending out those beta invites?

Now to make myself useful:

1\. This pulls in about 50 kB of very useful JavaScript: jQuery, jQuery-ui,
TopUp, jCarousel Lite. Your own JS code is very light in comparison. All this
is minified and gzipped, which is good, but that's still a fair amount of
stuff. Do you have any plans to lighten this a bit? Perhaps use something like
Closure Compiler with advanced optimizations to get rid of functionality that
you don't use, and package it all into a single file?

2\. How are you going to check for updates to the blog's content, so you can
re-index it? The traditional way would be polling the RSS/Atom feed; the shiny
new way is to get realtime updates via PubSubHubbub where available, and fall
back to polling when that's not available. This can be simplified by using a
service like Superfeedr to handle the polling fallback for you and just
provide everything as PubSubHubbub feeds.

3\. It's probably too early to talk about tweaking your ranking algorithm
before you've started getting actual user data (you are storing complete logs,
right?), but I'm sure there's a lot of room for improving the results. Again,
this can happen after you've started to get more blogs and more user data.

4\. A lot of people have blogs on Blogger, and despite being run by Google,
their search box is pathetic. However, terrible as it is, it occupies some
prime screen real-estate. I would like to have some Blogger-specific
JavaScript I can drop in to replace that search box with yours.

5\. If I type in a search query and press Enter without selecting one of your
drop-down menu suggestions, nothing happens. It would be nice if pressing
Enter did something, even if it just sent the user to a Google search.

6\. Seriously, when are you sending out beta invites? ;-)
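
A minimal sketch of the fallback in point 5, assuming the widget knows which domain it is embedded on. The function name and the example domain are illustrative, not anything metahint exposes; the only outside piece is Google's `site:` query operator.

```python
from urllib.parse import urlencode


def fallback_search_url(query: str, site: str) -> str:
    """Build a Google search URL restricted to one site.

    Hypothetical helper: used when the visitor presses Enter
    without picking any of the drop-down suggestions.
    """
    # The `site:` operator limits results to the given domain.
    params = {"q": f"site:{site} {query.strip()}"}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(fallback_search_url("lisp macros", "paulgraham.com"))
```

The widget would redirect the browser to this URL only when no suggestion is selected, so the suggestion flow itself stays unchanged.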

~~~
lkozma
Hi, great questions, I'll try to answer them:

1\. Yes, it is critical that we minimize the amount of JavaScript necessary.
Most of the libraries you observed are needed for the demo webpage: the
horizontal scrollbar, the popup box, etc. The widget, if embedded on a webpage,
will need very little code, only the part that does autocomplete. We'll do our
best to make that minimal.

2\. Yes, we want to check for updates, and we will try to use all methods that
are available. So far we use our own HTTP crawler and, where they are
available, RSS/Atom feeds or sitemaps. It would be easiest if the sites would
notify us automatically of changes, but otherwise we can recrawl at a given
rate.

3\. Yes, we are still tweaking the algorithms for ranking and building
phrases. One of the reasons behind doing this demo is to get feedback on what
could be improved there.

4\. Good idea, we will try to do something like this. This is one of the
reasons for going for blogs as the first target: they have a well-defined format
so we can do specific things for Blogger, Wordpress, etc.

5\. Yes, we want to do this. This demo showed just the suggestions, but we
will have traditional search as well.

6\. Soon, hopefully... We are aiming for a few weeks from now. Thanks for your
interest!
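
For point 2, here is a rough sketch of how the "push where available, poll otherwise" decision could look. It relies on the PubSubHubbub convention that a feed advertises its hub with a `link` element carrying `rel="hub"`. The function names, the sample feed, and the 60-minute default are made up for illustration; this is not a description of our crawler.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple, Union

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def find_hub(feed_xml: str) -> Optional[str]:
    """Return the hub URL advertised by the feed, or None if there is none."""
    root = ET.fromstring(feed_xml)
    # Atom feeds (and RSS feeds using atom:link) declare hubs as rel="hub".
    for link in root.iter(ATOM_NS + "link"):
        if link.get("rel") == "hub":
            return link.get("href")
    return None


def update_plan(feed_xml: str, poll_minutes: int = 60) -> Tuple[str, Union[str, int]]:
    """Choose push (subscribe to the hub) or pull (re-poll at a fixed rate)."""
    hub = find_hub(feed_xml)
    if hub:
        return ("subscribe", hub)
    return ("poll", poll_minutes)


SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="hub" href="https://hub.example.com/"/>
  <link rel="self" href="https://blog.example.com/feed.atom"/>
</feed>"""

if __name__ == "__main__":
    print(update_plan(SAMPLE_FEED))
```

Sites with no feed at all would still need the plain recrawl path, which this sketch leaves out.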

------
drats
Looks great! I can think of countless sites that need this.

Even if when you embed you take the colours of the site you are embedded in
you still need to improve your own site to make the pitch; it just looks too
plain jane at the moment. I'd recommend heading over to colour lovers for some
inspiration from their most popular palettes.
<http://www.colourlovers.com/palettes/most-loved/all-time/meta>

You could build a database from those palettes, detect colours on a user's page
and automatically select a complementary colour from the database for your
widget when you are presenting the demo (or three different colours to select
from as little boxes in the top right, also - perhaps provide an inverted
colour option for the readability widget). A test embeddable phase that works
from their last week of posts/RSS feed would also be something cheap/fast to
implement. So they can point it at their blog and then get a quick demo of the
search working on their last 20 posts in a handsome widget that already looks
like it belongs on their site.
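
As a rough illustration of the "select a complementary colour" step, assuming the dominant colour of the page has already been detected and is available as a hex string: rotate the hue by half a turn and keep lightness and saturation. The function name is made up, and a real version would probably pick from the palette database rather than compute the colour directly.

```python
import colorsys


def complementary(hex_colour: str) -> str:
    """Return the complementary colour of a '#rrggbb' string."""
    hex_colour = hex_colour.lstrip("#")
    # Split into red, green, blue components scaled to the 0..1 range.
    r, g, b = (int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Rotate the hue by 180 degrees; lightness and saturation are unchanged.
    r2, g2, b2 = colorsys.hls_to_rgb((h + 0.5) % 1.0, l, s)
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in (r2, g2, b2)))


if __name__ == "__main__":
    print(complementary("#ff0000"))
```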

Also rather than site owners, perhaps you could work towards the readability
widget thing you have going. So a user comes to you, plugs in all their sites
and then they can read cleanly with that widget and/or do a mass export to pdf
or their kindle. So bookmarks+readability+pdf/ebook = personal magazine
specifically for bulk reading/offline reading/distraction-free reading. I know
there are endless RSS readers out there, but how many are catering to ebook
readers with a pleasant search or even older types who like to print things
off. Your main strengths are in user interface and simplifying various
workflows, rather than search technology, so I'd play to those strengths.

~~~
netmau5
Just wanted to thank you for that link, one of those palettes was exactly what
I needed for my work today. I've used Kuler before but never seen
colourlovers. tyty

------
ryanwaggoner
I like it. How would integration work with my site? How would you make money
with this?

I'd also be curious to hear what you're doing behind the scenes. Also, what's
the general method for extracting the primary content of a page the way you do
for the lightbox preview?

Overall, it's very fast and the results seem relevant, plus you've identified
a niche where you can get a foothold, so I think you're on the right track :)

~~~
lkozma
Thanks for the comment.

We want to make the search widget customizable and in general unobtrusive, so
it should fit well on any site. Initially we target blogs mostly.

Behind the scenes we crawl the websites with our own crawler, we filter the
content and run our own algorithms for building and ranking expressions.

For the preview we use TopUp (lightbox clone) and we filter the page with
arc90's readability algorithm. If someone is interested we can write a blog
post with more details about the tools and libraries used.
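
To give a flavour of what "building expressions" can mean, here is a generic sketch of one common approach: collect word n-grams from the filtered text and drop those that start or end with a stop word. This is not our actual algorithm; the stop-word list, the function name, and the three-word limit are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def candidate_phrases(text: str, max_len: int = 3) -> Counter:
    """Count n-grams (1..max_len words) that do not start or end with a stop word."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Phrases like "of lisp" or "history of" make poor suggestions.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


if __name__ == "__main__":
    phrases = candidate_phrases("The history of Lisp and the future of Lisp")
    print(phrases.most_common(5))
```

Stop words are still allowed in the middle of a phrase, so "history of lisp" survives while "of lisp" does not. Ranking the surviving candidates is a separate step.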

For monetization, freemium would be the default route. What would you suggest
otherwise?

~~~
hardik988
I would love a blog post about how you guys went about doing this.

------
ch
Just wanted to throw my company's product out for a bit of shameless
promotion: www.picosearch.com.

We support suggestions (which drive an auto-complete feature) based on the
site's content, much like you guys.

I always welcome new competition into the search space, good luck on your
startup.

~~~
lkozma
Interesting, I was not aware of picosearch. Do you have an example link where
auto-complete is used with suggestions from website contents?

I agree that competition is good in this space, especially since most websites
don't have a usable search, let alone one with meaningful suggestions.

Thanks for your comment.

------
nostromo
I like it -- very useful.

A few questions:

* Why only blogs? I'd love to use it on other types of sites. (For example, maybe a wiki.)

* I like the keyboard support. (Search, hit down, press enter.) However, I'd like to be able to hit Esc to return to the list -- without using my mouse. It's pretty standard for Esc to close modal windows.

* When I click to see an entry full-screen, I'd rather it just open in the current window rather than a pop-up.

* If you don't mind, I'd love to hear a little about the backend. Is it Lucene?

~~~
pceelurd
* We said we'll start out with blogs, because that's where we miss such a feature dearly. In the long run, anything which contains plenty of text could be "metahinted".

* Very useful suggestion, thank you. It is implemented now: ESC should close the popup window.

* A few commenters have suggested that. Would you prefer to see the entry occupy the entire screen or fit in a text area of some sort? Anyway, reworking the preview functionality is on our TODO list.

* We'd respectfully defer answering this question in ample details to an upcoming blog post. We appreciate your understanding in this matter. To give you a short answer, however: no, it's not Lucene.

Thank you for your comment!

------
revorad
Wow, it's really cool!

I searched for some obscure words just for fun (like "nirvana" on Scott
Aaronson's blog). It came up with no suggestions. A Google search revealed that
the term nirvana only occurred in comments. Are you not indexing comments?

Also, why can't I just search for any words of my choice? It looks like I have
to choose one of your suggestions.

Edit: Oh and it would be a lot nicer if the results are just displayed below
the search box instead of the animated lightbox, which just makes me wait
more.

~~~
lkozma
Currently we only index the blog posts and the demo shows the suggestion
feature only. We will probably add the option of searching for arbitrary
queries. When the widget is integrated on websites, the result will take
you to the given page, instead of the lightbox used in the demo.

------
db42
Though I haven't fully explored it yet, I would like to give you a suggestion.
After a user enters some search keywords and presses Enter, the resulting blog
(or article) is shown in a different window (and control no longer remains on
the main site page). This discourages the user from searching more on your
site, as every time they would have to close that new window and return to your
main page. I think it would be better if you used the space below the search
fields on the main page itself for this purpose.

------
arethuza
Do site owners get some kind of analytics to see what people have been
searching for (and maybe to tune the search process)?

~~~
pceelurd
Yes, we've got a nifty set of statistics in the making.

~~~
arethuza
Cool - looks like a really nice product. Good luck!

------
nl
I found that pretty impressive.

I'm pretty sure that the Google AJAX search API bases suggestions on your site
though, and they have just launched the ability to edit suggestions. But I
think your thing compares quite well with Google AJAX API, assuming it's going
to be easy to set up.

------
ritonlajoie
<http://www.metahint.com>

~~~
arethuza
Doing a search for lisp in Paul Graham's essays (predictable, I know) gave a
list of results where the first three look identical (Lisp Lisp Code Lisp) -
clicking on any one of them gives a message saying "Looks like we couldn't
find the content.".

~~~
lkozma
Thanks, good catch, we fixed it. It was for the page:
<http://paulgraham.com/lisp.html>

We will have to watch out for such pages with minimal content.

~~~
arethuza
The dialog box that opens when you click on one of the suggestions pretty much
always seems to have a scrollbar - which is on the right.

The X icon to close the dialog box is on the left - meaning that if I have
been scrolling I have to move across to the left side to close it. Which is a
bit annoying...

------
iworkforthem
Any reason why I can't search for words like "the", "a", etc.?

~~~
pceelurd
These words (and a few more) have been filtered out because they are far too
common in everyday English and statistically not [so] important. Had we left
them in, they would probably outweigh the meaningful suggestions.

~~~
kakaylor
Pretty cool site.

You might want to look into TF-IDF (term frequency-inverse document frequency)
weighting as an alternative to filtering for common English words.

<http://en.wikipedia.org/wiki/Tf%E2%80%93idf>

~~~
pceelurd
Thank you!

We are using both techniques (filtering + TF-IDF weighting), actually :)
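
For anyone curious how the two fit together, here is a toy sketch: filter stop words first, then weight what is left by TF-IDF. The stop-word list, the function names, and the plain `tf * log(N / df)` formula are assumptions for the example, not our production code.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def tokenize(text: str) -> list:
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOP_WORDS]


def tf_idf(docs: list) -> list:
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


if __name__ == "__main__":
    posts = ["The cat sat in the hat", "The cat is a pet", "A hat is a hat"]
    for post, weights in zip(posts, tf_idf(posts)):
        print(post, "->", sorted(weights, key=weights.get, reverse=True))
```

A term that appears in every document gets a weight of zero under this formula, which is the same effect the stop-word filter achieves by hand for words like "the".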

------
nithyad
Your site isn't working now!

------
TheSOB88
Pretty cool. Seems to work awesomely. The animation when you preview a site is
a bit gaudy, though.

