Show HN: Travel search engine calculating results in single digit ms

huhtenberg · on Sept 1, 2023

For a date in October in Paris it returned a single hotel option with a price of $20860361.04.

For another city it also showed a single option, but priced at $0.00.

That's not very good. It was very fast though.

mgl · on Sept 1, 2023

Thanks, and this is because we use real world hotel information but with synthetic prices.

However, the price and availability are all calculated in real time, and the base price of accommodation components can be modified in real time as well, which is rather uncommon in OTA search engines.

huhtenberg · on Sept 1, 2023

I'm not sure what to make of your "real world hotel information" statement when your search engine clearly thinks there's just one hotel in all of Paris.

mgl · on Sept 1, 2023

Precisely speaking: one hotel is available, which means the search engine found the others fully booked.

huhtenberg · on Sept 1, 2023

That's hardly based on "real world data" then because it's completely unrealistic.

thanzex · on Sept 1, 2023

As they state, the demo only contains fake pricing data, presumably randomly generated.

mgl · on Sept 1, 2023

This is correct, thanks.

solardev · on Sept 1, 2023

The synthetic data makes it really hard to understand what you've done here. Is it fast because of some revolutionary tech or is it just fast because it's broken and thinks Vegas doesn't exist and there's only one hotel in Paris?

If you're going to use fake bookings, how about having a real feed of bot agents booking hundreds of rooms a second across the world and having the index reflect those in real time? As it is, I can't even find a real city with bookings available. Hard to know what the demo is showing when I can't do the basic thing it wants to show off...

mgl · on Sept 1, 2023

Thanks solardev, this is probably something we need to fix

potamic · on Sept 1, 2023

This appears to be a demo and not searching real data? If so, the title is a bit misleading.

Also, what makes this interesting from a tech point of view? I presume conventional travel sites are slow not because of search but because of a bunch of other things they do like personalization, recommendations, promotions, user content etc.

mgl · on Sept 1, 2023

Thank you for your comment.

Travel search sites are slow because the search space explodes as the number of search parameters (dimensions) grows, and a modern search engine with a personalized experience has to support many of them.

Many large travel operators and hubs still pre-compute search results using expensive RDBMS clusters, which comes with two major problems:

a) you are almost always a bit out of sync, so any price and availability confirmation requires multiple real-time lookups,

b) more importantly: you are bound to supporting a limited number of search options, and these options have to be static, whereas nowadays more and more people would like to travel to "a nice location where the forecasted sea temperature next weekend is between this and that".
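
To make point b) concrete, here is a minimal sketch of the difference between looking up pre-computed results and evaluating arbitrary predicates at query time. It is not the demo's implementation: the `Offer` fields, the `sea_temp_c` forecast value, the `PRECOMPUTED` table and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Offer:
    hotel: str
    city: str
    price: float
    sea_temp_c: float  # hypothetical forecast value attached to each offer


# Pre-computed approach: answers exist only for keys chosen ahead of time.
PRECOMPUTED = {("Nice", "under_200"): ["Hotel A"]}

# Query-time approach: keep the raw offers and filter them on demand.
OFFERS = [
    Offer("Hotel A", "Nice", 150.0, 24.0),
    Offer("Hotel B", "Split", 90.0, 21.0),
    Offer("Hotel C", "Valletta", 260.0, 25.0),
    Offer("Hotel D", "Palma", 180.0, 23.5),
]


def search(offers: List[Offer], *predicates: Callable[[Offer], bool]) -> List[Offer]:
    """Return the offers that satisfy every predicate supplied by the caller."""
    return [o for o in offers if all(p(o) for p in predicates)]


# A key nobody pre-computed simply has no answer.
missing = PRECOMPUTED.get(("any_city", "sea_temp_22_26"))

# The same question asked at query time is just two more predicates.
result = search(
    OFFERS,
    lambda o: 22.0 <= o.sea_temp_c <= 26.0,
    lambda o: o.price < 200.0,
)
```

The trade-off is that the query-time version has to scan and compute on every request, which is why the data layout and the cost per record matter so much.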

The point of this demo is to show that you can solve these problems with the right software engineering and cheap hardware.

tyingq · on Sept 1, 2023

Seems related to your case study here: https://stratoflow.com/case_studies/highly-scalable-travel-s...

Basically, you worked with a hotel aggregator that happened to do direct SQL queries for searches. Then you migrated them to this pattern:

"decided to extract the availability search into a separate cache layer based on in-memory data grid (IMDG) platform. This new solution holds in memory a complete set of data required to return a hotel availability information. We also implemented a custom, fast data loading mechanism to populate the memory grid from database as well as a message queue pipeline supporting incremental intraday data updates."

So, putting something like Apache Ignite in place, with intraday updates.

That's an improvement for the client, for sure. But, most hotel aggregators aren't doing live SQL queries against a single database. You happened to find a client that was at the tail end of figuring out that wouldn't scale for them. Any of them with serious volume is doing something similar where they keep the main relational database for data entry and updates. But the live search is done with ElasticSearch, Redis, Geode/Gemfire, Cassandra, plain old read replicas or some other tech that's better suited for the complex availability queries. And some mechanism for intraday updates.
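
The pattern described here (bulk-load a read-optimized copy, then keep it current with a stream of intraday updates) can be sketched roughly as follows. This is an illustration only: `SearchIndex`, the event shape and the `deque` standing in for a message queue are assumptions, not any particular product's API.

```python
from collections import deque
from typing import Dict, List, Tuple


class SearchIndex:
    """In-memory availability index kept separate from the system of record."""

    def __init__(self, snapshot: Dict[str, int]) -> None:
        # Bulk load: copy the rooms-left counts out of the main database once.
        self.rooms_left = dict(snapshot)

    def apply(self, event: Tuple[str, int]) -> None:
        # Incremental update: adjust one hotel's count by a signed delta.
        hotel, delta = event
        self.rooms_left[hotel] = self.rooms_left.get(hotel, 0) + delta

    def available(self) -> List[str]:
        # Searches read only the in-memory copy, never the main database.
        return sorted(h for h, n in self.rooms_left.items() if n > 0)


updates = deque()  # stand-in for a real message queue
index = SearchIndex({"Hotel A": 2, "Hotel B": 0})
before = index.available()

updates.append(("Hotel A", -2))  # two rooms booked
updates.append(("Hotel B", 1))   # one cancellation

while updates:
    index.apply(updates.popleft())

after = index.available()
```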

mgl · on Sept 1, 2023

Hi tyingq, thank you for your comment.

Your deduction goes a bit too far - yes, the technology demo in this post is in the same space as that case study, but no - the implementation is completely different.

Many firms replace their in-database processing with in-memory-like stores but it would be very difficult to achieve similarly low latency and still provide real-time updates with the solutions that you mentioned like Redis, ElasticSearch, etc. due to a number of the internal design features of these data stores.

The most fundamental issues are: a) an inefficient memory allocation model that hurts CPU cache usage, b) an implementation model based on costly data serialization/deserialization, which wastes both CPU and memory resources.
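
As a rough picture of those two issues, the sketch below stores the same three records twice: once as serialized blobs that must be decoded on every query, and once as contiguous typed columns that can be scanned directly. It only illustrates the layouts. Python hides CPU cache behaviour, so no timing claim is made here, and the record fields and sample data are hypothetical rather than taken from the demo.

```python
import json
from array import array

HOTELS = [("Hotel A", 120.0, 3), ("Hotel B", 95.5, 0), ("Hotel C", 210.0, 5)]

# Layout 1: one serialized blob per record, as a generic cache might hold it.
blobs = [
    json.dumps({"name": n, "price": p, "rooms": r}).encode("utf-8")
    for n, p, r in HOTELS
]


def search_blobs(max_price: float) -> list:
    """Decode every blob just to look at two of its fields."""
    matches = []
    for blob in blobs:
        record = json.loads(blob)
        if record["rooms"] > 0 and record["price"] <= max_price:
            matches.append(record["name"])
    return matches


# Layout 2: one contiguous typed array per field, aligned by position.
names = [n for n, _, _ in HOTELS]
prices = array("d", (p for _, p, _ in HOTELS))  # packed 64-bit floats
rooms = array("i", (r for _, _, r in HOTELS))   # packed C ints


def search_columns(max_price: float) -> list:
    """Scan the packed columns without any decoding step."""
    return [
        names[i]
        for i in range(len(prices))
        if rooms[i] > 0 and prices[i] <= max_price
    ]
```

Both functions return the same answers. The difference is the work done per record, which in a language like Java or C would also show up as allocation and cache-miss cost.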

This is why we built and shared this demo, which enables you to achieve 20x lower latency than other available options.

You may find a bit more about what's behind the demo in the Technology section here: https://ultrafastsearchengine.com/#/details

tyingq · on Sept 1, 2023

> You may find a bit more about what's behind the demo...

That only says "the engine is a Java-based proprietary solution which was about 20x faster than comparable implementations in Coherence, Ignite or Hazelcast. (Yes, we tested it.)".

So, comparable to those. Some sort of Java friendly in-memory cache. Perhaps, for this specific kind of data (dates/location/room-type/price/availability), you've figured out an optimal storage/retrieval layout that's fast. That seems plausible. What doesn't seem plausible is that your cache is generally 20x faster than every other cache in the space.

mgl · on Sept 1, 2023

The combination of storage/retrieval/calculation, executed close to the actual data, is what gives the 20x advantage.

CPU cache is unbelievably fast, see:

https://stratoflow.com/latency-numbers-every-java-developer-...

ulfw · on Sept 1, 2023

I can make anything fast if I don't care about up-to-date prices. Sorry but meta search is slow for a reason. It takes a while to go from reservation systems to OTA inventory to meta results. And there's a cost involved to it too. So this is a demo you'd have to explain more.

mgl · on Sept 1, 2023

Thanks for this comment. If we think about a simple metasearch engine which just redirects your search to other search engines, this is true; there is way less to optimize there.

In reality, if you are a provider of travel bookings or a travel operator, any final calculation of price and availability is the result of multiple conditional sub-calculations driven by actual contractual agreements between the underlying parties. Making that fast, along with multiparameter search (where parameters may also change their values in real time), is non-trivial.
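
A minimal sketch of what "conditional sub-calculations" can look like, assuming a made-up contract with a weekend surcharge and a long-stay discount. The `Contract` fields and the rules themselves are hypothetical; real agreements are far more involved.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contract:
    base_price: float          # per night, may be changed at any time
    weekend_multiplier: float  # applied to Friday and Saturday nights
    long_stay_nights: int      # stays at least this long get a discount
    long_stay_discount: float  # fraction taken off the total, e.g. 0.10


def total_price(contract: Contract, check_in: date, nights: int) -> float:
    """Price a stay from the contract's current values at query time."""
    total = 0.0
    for offset in range(nights):
        night = check_in + timedelta(days=offset)
        price = contract.base_price
        if night.weekday() in (4, 5):  # Monday is 0, so 4 and 5 are Fri and Sat
            price *= contract.weekend_multiplier
        total += price
    if nights >= contract.long_stay_nights:
        total *= 1.0 - contract.long_stay_discount
    return round(total, 2)


contract = Contract(
    base_price=100.0,
    weekend_multiplier=1.25,
    long_stay_nights=3,
    long_stay_discount=0.10,
)

# Three nights starting Friday, 1 September 2023: Fri, Sat and Sun.
first_quote = total_price(contract, date(2023, 9, 1), 3)

# A base price change is picked up by the very next calculation.
contract.base_price = 120.0
second_quote = total_price(contract, date(2023, 9, 1), 3)
```

Because nothing is cached, the second quote reflects the new base price immediately. The cost is that every search has to run these rules for every candidate offer.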

kunley · on Sept 1, 2023

Hmm. Not sure about the point.

It doesn't search the real data, right? (I checked). It happens to have a predefined set of data. So, how do we know if the speed comes from genuine qualities of the tech, not from a simple fact that the data set is small?

mgl · on Sept 1, 2023

We use real world hotel information but with synthetic prices, so the structure and distribution of the data are realistic, as are the underlying pricing/availability calculations, which happen in real time.

You may find this summary interesting: https://ultrafastsearchengine.com/#/details

deepspace · on Sept 1, 2023

No results for Toronto. It was quick to return no results, though. Fast != Good.

mgl · on Sept 1, 2023

Yes, it's just a demo site. We use real world hotel information but with synthetic prices and availability.

gardenhedge · on Sept 2, 2023

Title should say "demo". In your replies here you say that you're using synthetic prices. Why not implement upper and lower bounds? It really ruins the demo.
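
For illustration, bounding a synthetic price could be as simple as clamping the generated value. The bounds, the spread and the log-normal noise below are arbitrary assumptions, not how the demo generates its data.

```python
import random


def synthetic_price(
    rng: random.Random,
    base: float,
    low: float = 30.0,
    high: float = 2000.0,
    spread: float = 0.3,
) -> float:
    """Scale a base price by log-normal noise, then clamp it to [low, high]."""
    raw = base * rng.lognormvariate(0.0, spread)
    return round(min(max(raw, low), high), 2)


rng = random.Random(42)  # fixed seed so the demo data is reproducible
prices = [synthetic_price(rng, 150.0) for _ in range(1000)]

# Even absurd inputs stay inside the bounds.
too_high = synthetic_price(rng, 20860361.04)
too_low = synthetic_price(rng, 0.0)
```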

chfritz · on Sept 1, 2023

Cool! Any plans to make this also work for flights?

umeshunni · on Sept 1, 2023

Flight Search is a much harder problem: See http://www.demarcken.org/carl/papers/ITA-software-travel-com...