Show HN: Hello – A conversational search engine powered by transformers (sayhello.so)
28 points by rushingcreek 4 days ago | 25 comments

Maker here. Hello is part of a research project I'm working on at the University of Texas at Austin. The goal is to reimagine search as a conversation you're having with your best friend, where your questions are answered directly and in detail and you can ask follow-ups in the context of your conversation. There are still some rough edges and our independent index of ~350 million web pages is not yet large enough to answer every question, but we'll give it our best shot!

Looks like it's getting the HN hug of death. The progress bar goes to 100% and just freezes there. What's the performance bottleneck? Clearly it's not the CDN/hosting since the site is loading fine.

Yep, the backend which runs the ML models is currently experiencing issues from the HN onslaught. Thanks for your patience, should be fixed soon.

Same here, the progress bar animation seems to be a dummy. The actual search result is eventually returned over a `text/event-stream`... which seems to be struggling currently.
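For context, a `text/event-stream` response is a Server-Sent Events stream: plain text in which each event is a group of `field: value` lines terminated by a blank line. A minimal sketch of how a client could parse such a stream follows. It assumes nothing about Hello's actual payload; the event names and data in the sample are made up for illustration.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of text/event-stream lines."""
    event_type = "message"  # default event type in the SSE format
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # "id" and "retry" fields are ignored in this sketch


# Hypothetical stream: a progress event followed by a two-line answer event.
sample = [
    "event: progress\n",
    "data: 50\n",
    "\n",
    ": keep-alive\n",
    "event: answer\n",
    "data: first line\n",
    "data: second line\n",
    "\n",
]
events = list(parse_sse(sample))
```

Because events only arrive when the server writes them, a stalled backend would leave a client like this waiting with nothing to show, which would be consistent with the frozen progress bar.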

What transformer models are you using? How are you training them? Are you using them for semantic search, or only for “transforming” the output (from a found snippet of text to an actual answer sentence)? On what hardware are you running this? With such a large index, and using transformers, I would guess that would be quite expensive, wouldn’t it (GPU, lots of RAM)?

I tried a couple of the prompts from the WebGPT paper: "How do trains turn?", "How do neural networks work?", "Why was the Suez Canal blocked in March 2021?" and got a 'No Relevant Answer' response on each. Are there no search APIs you could integrate instead of maintaining your own index? Either way this is a very cool project and I'm excited to see this space progress.

It should be able to answer all of those. There was a backend issue which has since been fixed.

I also got many of those. “Where do I find food?” Nothing...

This is a neat idea. Would love to run some queries against this with a larger index. Here's one instance of the question just going over the transformer's head, I guess:

My question:

> how does google work

The response:

> A decentralized autonomous organization (DAO) is a software running on a distributed ledger that offers users a built-in model for the collective management of its code. To become a member of a DAO, users need to first join the DAO by buying its currency. Unlike traditional organizations, there is no hierarchy in DAOs. One of the key features of DAOS is the internal capital that is used to incentivize these actors and ensure that the organization runs smoothly.

There was a race condition that leaked context between users' sessions; it has now been fixed. The engine now gives a relevant answer for this question.
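The fix itself is not described in the thread. As a generic illustration of how this class of bug is usually avoided, here is a minimal sketch that keys conversation context by session ID and guards it with a lock, rather than keeping it in state shared across requests. `SessionStore`, `ask`, and the session names are hypothetical, not Hello's code.

```python
import threading
from collections import defaultdict


class SessionStore:
    """Keeps each session's conversation history separate and thread-safe."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._history = defaultdict(list)

    def append(self, session_id: str, message: str) -> None:
        with self._lock:
            self._history[session_id].append(message)

    def context(self, session_id: str) -> list:
        with self._lock:
            # Return a copy so callers cannot mutate the stored history.
            return list(self._history[session_id])


store = SessionStore()


def ask(session_id: str, question: str) -> None:
    # A real handler would run the models here using store.context(session_id).
    store.append(session_id, question)


# Two concurrent "users" asking unrelated questions.
threads = [
    threading.Thread(target=ask, args=("alice", "how does google work")),
    threading.Thread(target=ask, args=("bob", "what is a DAO")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the history looked up by session ID on every request, a question from one session cannot pick up another session's context, whichever order the threads run in.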

Oh cool! Never seen anything quite like this before, especially the concept of two search fields. Did you have any inspiration for creating this, OP?

Of course, the major problem with this (I just tested it) seems to be performance. Have you performed any performance optimisations / have anything in mind?

I was always curious about what search would be like if it were more conversational, so I became interested in research on long-form question answering.

As for performance, this is tricky because every search involves running multiple large transformers (3 billion+ parameters). I'm working on optimizing the GPU infrastructure (using TensorRT and DeepSpeed Inference) to make it faster. There's a lot of work to be done :))

> poop

> Poop is a term used to refer to a person's feces. It is also a noun describing someone in their life. Poop poop is the opposite of the word pee pee, a word to describe one ' s penis or vagina.

Uh. Hm. I’m… not sure what I expected, but I didn’t expect that. Especially the middle part?

“Who was smarter, Neanderthals or humans?” (something like this, was suggested)

Reply: Relevant information that didn’t answer the question.

Follow up: “what language did neanderthals speak”

Reply: Happiness is a state of well-being, characterized by a sense of satisfaction and enjoyment.

Not a very interesting conversation, I’m afraid :)

Slightly related, but could I make something similar using Haystack[0] and feeding it with a large index?


Haystack could be a good starting point, but I evaluated it at the beginning and it did not suit the project requirements. It's fine for managing the document store (I use Elasticsearch), but there needs to be article reranking and relevant-info extraction logic beyond what Haystack provides.
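The retrieve-then-rerank shape described here can be sketched in a few lines. This is a toy illustration only: the in-memory list stands in for the Elasticsearch document store, and both scoring functions are simple word-overlap heuristics rather than the transformer models the project uses. All names and documents are hypothetical.

```python
def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization; enough for a toy example."""
    return set(text.lower().split())


def retrieve(query: str, corpus: list, k: int = 3) -> list:
    """First stage: cheap recall by counting shared words."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), doc) for doc in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(query: str, candidates: list) -> list:
    """Second stage: Jaccard similarity, standing in for a learned reranker."""
    q = tokenize(query)

    def score(doc: str) -> float:
        d = tokenize(doc)
        return len(q & d) / len(q | d)

    return sorted(candidates, key=score, reverse=True)


corpus = [
    "how do trains turn on curved track without a differential and why do wheels squeal",
    "how trains turn using conical wheels",
    "the suez canal was blocked by a container ship",
]
query = "how do trains turn"
candidates = retrieve(query, corpus)
best = rerank(query, candidates)[0]
```

The first stage favors the long document because it shares the most words with the query, while the second stage promotes the shorter, more focused one. A real reranker would score relevance with a model, but the two-stage structure is the same.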

Is it possible it's using context from other users' sessions, instead of just your own? It seemed to be answering questions unrelated to the "conversation" I was having.

There was a race condition that leaked context between users' sessions; it has now been fixed.

I typed in "hello" and it replied that it only accepted queries in English. (Maybe because the backend had crashed.)

I tried Immanuel Kant and Søren Kierkegaard. But yes, search is extremely difficult.

What's the source/breakdown of the 350 million pages? Thanks!

Well… It's almost exactly like a chat with friends. ;-)

> how many people live on the moon

> The Moon is the only celestial body in the Solar System that has been inhabited by humans since the first landing of Apollo 11 on 20 July 1969. The Moon has a surface area of about 220 km (133 sq mi) and a volume of about 1.7 km (0.9 mi). The Moon's surface is covered by a thin layer of ice, and the atmosphere is thin and thin.

> https://www.livescience.com/how-many-humans-could-live-moon....

> https://doorcountypulse.com/mooning-the-lunar-landing/


> why do reptilians rule the world

> The Reptilians are a group of beings that have been influencing humanity for a very long time. They have been manipulating humans for many centuries. They are primarily a reptilian species, but they also have a human-like form. They use the human form as a vehicle to communicate with humans.

> https://humansarefree.com/2015/02/reptilians-why-are-they-ob...

> https://discover.hubpages.com/politics/The-Reptilian-Enslave...


> why is faster than light travel so easy

> In physics, the speed of light is the speed at which light travels through a vacuum. In a non-vacuum environment, light can travel faster than light, but only if the medium is non-relativistic. In this case, the medium would be a spherical object, such as a cube, a ball, or a particle. In the case of a photon, the photon would be traveling through scalar energy, which is a form of electromagnetic radiation.

> https://physics.stackexchange.com/questions/459946/can-light...

> https://physics.stackexchange.com/questions/571458/if-a-phot...


> why ai is dumb

> Artificial Intelligence (AI) is a branch of computer science that aims to make computers smarter. It is based on the idea that computers can learn from their experiences, and that they can be programmed to be more intelligent than humans.

> https://www.businessinsider.com.au/googles-artificial-intell...

> https://bigthink.com/technology-innovation/the-dumb-watch-th...


> how man software engineers does it take to change a light bulb

> Approximately 8000 engineers are needed to change a light bulb.

> https://mitadmissions.org/blogs/entry/how_many_engineers_doe...

> https://form.jotform.com/210328454077151


Asked "what is italian cube clan" (my obscure gaming clan certainly unindexed by hello)

The Italian Cube Clan is a mafia group based in Calabria, Italy. The group is composed of members of the Farao and Marincola families, who are alleged to have muscled into the wine, food, garbage, and funeral services trades in both countries.

https://diablo-archive.fandom.com/wiki/Mage_Clans https://www.euronews.com/2018/01/09/german-italian-police-sm...

The reason for LOLing is that two surviving members of the clan have consistent family ties with Calabria, even if the fact most likely was not on the web anywhere (now it is).

So basically the engine connects dots very well but the index needs more care on the semantic side. Congrats anyway!

