Show HN: Instantly search 2M recipes (typesense.org)
349 points by jabo 7 months ago | 126 comments



Some quick context: I was inspired to build this by this HN post earlier today [1]. So thank you, glorf, for making the recipe dataset available.

Thought this would take me 1-2 hours to build, ended up taking about 6 hours - engineering estimates and all!

> Details about the Tech Stack:

The dataset has 2,231,142 recipes and is indexed on Typesense [2], an open source alternative to Algolia/ElasticSearch that a friend and I are working on.

The UI was built using the Typesense adapter for InstantSearch.js [3] and is a static site bundled using ParcelJS.

The app is hosted on S3, with CloudFront for a CDN.

The search backend is powered by a geo-distributed 3-node Typesense cluster running on Typesense Cloud [4], with nodes in Oregon, Frankfurt and Mumbai.

Here's the source code: https://github.com/typesense/showcase-recipe-search

[1] https://news.ycombinator.com/item?id=25356156

[2] https://github.com/typesense/typesense

[3] https://github.com/typesense/typesense-instantsearch-adapter

[4] https://cloud.typesense.org


Jason, sorry to ask here rather than read your GitHub docs, but how does Typesense fare against non-romance languages that can't be segmented by whitespace?


I'm guessing you meant to say logographic languages. We don't yet support tokenization for logographic languages (like Chinese, Japanese, etc) but it's on our medium-term radar: https://github.com/typesense/typesense/issues/86


Basically...but I didn't know to use that term! So thanks for teaching me. Then there's also languages like Thai that are not whitespace separated on the word level, but that use an alphabet. So...I meant more 'non-Latin' but I think that's not actually a tight category. It's actually quite difficult to come up with the right term. I guess I was trying to be too clever; the best term is probably "non-whitespace delimited languages". Thanks for your response, and awesome speed to index the dataset and have it up and running in the same day.

Could I ask you a few more questions? What was the dataset size? What was the size of your index? How long (and how much RAM) did it take to index the dataset and what machine (and how many cores) did you do it on?


> Then there's also languages like Thai that are not whitespace separated on the word level, but that use an alphabet.

I did not know that! Good to know.

> What was the dataset size?

2.2GB in size, with ~2.2M records

> What was the size of your index?

2.7GB

> How long (and how much RAM) did it take to index the dataset

It took about 8 minutes to index that data. Typesense stores the entire index in memory, so the index took 2.7GB of RAM.

> What machine (and how many cores) did you do it on?

It's running on a 3-node cluster, with each node having 4vCPUs and 8GB of RAM. The nodes are distributed across data centers, so search requests are served by the closest node (like a CDN).
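The figures above allow a quick back-of-envelope check. A minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and that each node holds a full replica of the index; the derived ratios are computed here, not reported by the author:

```python
# Figures quoted in this thread; decimal GB and full per-node replicas are assumptions.
RECORDS = 2_231_142
RAW_GB = 2.2            # source dataset size
INDEX_GB = 2.7          # in-memory index size
INDEX_SECONDS = 8 * 60  # "about 8 minutes"
NODE_RAM_GB = 8         # RAM per node

def summarize() -> dict:
    return {
        "index_to_raw_ratio": INDEX_GB / RAW_GB,
        "bytes_per_record": INDEX_GB * 1e9 / RECORDS,
        "records_per_second": RECORDS / INDEX_SECONDS,
        # share of one node's RAM taken by the index, if every node holds all of it
        "ram_fraction_per_node": INDEX_GB / NODE_RAM_GB,
    }

for name, value in summarize().items():
    print(f"{name}: {value:,.2f}")
```

Roughly 1.2 KB of index per recipe and a third of each node's RAM, which leaves headroom on the 8GB nodes.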


That's great, thank you for that info! Very impressive performance specs for your indexing.


> non-romance languages that can't be segmented by whitespace

That can't be right? Surely Greek, Russian, Turkish, etc are whitespace delimited?


Yeah, I meant some concept like "non-Latin derived" or "non-Roman alphabet" languages, but then there's Cyrillic, etc. I was pretty sure "non-Romance" was the right term, but not totally sure. I looked it up afterwards and yeah, it wasn't. Actually I have no deep knowledge of the terms here and just grabbed the first one that came to me. I thought I did pretty well and I appreciate the learning experience!


Chinese and Japanese wouldn’t be.


You have a UI bug: dismissing the modal recipe popup isn’t entirely reliable, and the site can get stuck in a state that doesn’t allow user interaction. This even survives the back button.


Hmm interesting, I can't seem to replicate this issue. What browser are you using?


iOS 14’s Safari. It only happened once.


Very nicely done! Also, I appreciate you giving due credit to the stories this builds on.


Thanks! I was just looking for a cheap search engine alternative today (Algolia is cool but very expensive if you need to index millions of records). I will check out Typesense.


Typesense is looking great! I was going to use it on a side project. Not sure why I didn’t, but it must have been missing something I needed. I've been using Meilisearch, but I’ll definitely be checking out Typesense again.

Huge fan of instant search results, well done!


Hi! Would Typesense be good for a general web page search system (Algolia-like), or is it designed for structured entities only (products, recipes...)?

Why did you build it instead of using other open source engines, e.g. Postgres text search?


Typesense would indeed be good for a general web page search system, just like Algolia. In fact, even Algolia stores web page data as structured JSON entities.

re: why not just postgres text search, I'll post a more detailed response in the github issue you opened (thank you) for posterity: https://github.com/typesense/typesense/issues/167


Thanks :)


This is refreshingly fast! Definitely going to try typesense in my projects.


Did you use CloudFormation to create the infra? If not, I'd love to hear some details on how you did it. Is any API Gateway being used? It seems to be offline at the moment.


The front-end is a static site. I used Terraform to set up the S3 bucket & CloudFront.

The search backend is running on Typesense Cloud, which is point and click to provision.

This is my 1-line deployment command: https://github.com/typesense/showcase-recipe-search/blob/7b5...

That's it for the infra! No API Gateway.

Hmmm, seems to be up for me. Could you show me a console+browser screenshot of what you see?


We run a similar setup on CloudFormation. Interesting that you use ‘aws s3 cp’; nowadays I use ‘aws s3 sync --delete’ after running into issues with the pre-cleanup that ‘cp’ requires.


In my case since the assets have fingerprinted filenames and index.html references them, if I delete old files with each deployment, then a user who has a cached version of index.html will see a broken page. So I just leave old asset files as is.
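The reasoning above relies on content-hashed ("fingerprinted") asset names: each build emits new filenames, while a cached index.html keeps pointing at the old ones. A minimal sketch of the idea; the `fingerprint` and `stale_assets` helpers are hypothetical, and the 8-character SHA-256 prefix is an arbitrary choice that may not match ParcelJS's own naming scheme:

```python
import hashlib
from pathlib import PurePosixPath

def fingerprint(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))

def stale_assets(old_build: set[str], new_build: set[str]) -> set[str]:
    """Files present only in the previous build. A sync with deletion would remove
    these, breaking clients whose cached index.html still references them."""
    return old_build - new_build

v1 = fingerprint("app.js", b"console.log('v1')")
v2 = fingerprint("app.js", b"console.log('v2')")
print(v1, v2, stale_assets({v1}, {v2}))
```

Because identical content always hashes to the same name, unchanged assets keep their URLs across deploys and only changed files get new names.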


It boggles my mind that a two-person team built Typesense...


This is awesome. If nothing else, the ability to view a recipe without having to scroll through a 10 page story about the first time the author saw an avocado makes it worth it.


Yeah, true. I hate how simple recipe sites have been bloated with a) trackers, b) ads and c) long text to improve their SEO. Finding "just the recipe" has great value for me nowadays.


Once upon a time, Yummly was this.


Have you considered the ability to filter out recipes that don't contain a particular ingredient?

Obvious use cases are food allergies and dislikes in general.

My specific use case is that I searched `guacamole` and got 2k that contained `salt` but only 1.8k that contained `avocados`. I want to see the recipes that don't use avocados.

EDIT: It appears that `avocados` and `avocado` are separate ingredients, as are `tomatoes` and `tomato`. I know pluralization rules are hard, particularly in English, but any chance of a cleanup pass for the... low hanging fruit?


As an avocado fan I have trouble grappling with the concept of guac made without them. Isn't that sorta like orange juice made from things that are not oranges? Feels almost oxymoronic. Good suggestion though.


When I say "I want to see the recipes without avocados" it's kind of like when you witness a car crash. You're not looking because you want to see someone crash, you're looking because it's horrifying. Same with avocado-less guacamole. The concept is horrifying so I need to know.

And yes, I would absolutely love to see the recipes for orange juice that don't include any oranges for the same reason.

Either way though, I think it's just because `avocado` and `avocados` are counted as different ingredients.


Orange flavouring, orange coloring, citric acid, sugar, and water.



Ah interesting use case with food allergies and exclusions. Added this to my todo list.

I see what you did there with low hanging fruit! I'll take a pass at de-duping plurals in ingredients. Good catch!
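A plural de-duplication pass could start from a few suffix rules. A naive sketch, assuming English-only ingredient names; the rules, the irregular-word table and the "already singular" list are illustrative and incomplete, not what the site actually ships:

```python
IRREGULAR = {"leaves": "leaf", "loaves": "loaf", "halves": "half"}
# Words ending in "s" that are already singular (hypothetical, incomplete list).
UNCHANGED = {"asparagus", "hummus", "couscous", "molasses"}

def singularize(word: str) -> str:
    w = word.lower()
    if w in UNCHANGED:
        return w
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("oes"):                          # tomatoes -> tomato
        return w[:-2]
    if w.endswith("ies") and len(w) > 4:           # cherries -> cherry
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):   # avocados -> avocado
        return w[:-1]
    return w

def normalize_ingredient(name: str) -> str:
    """Singularize only the head noun (last word): 'lime leaves' -> 'lime leaf'."""
    words = name.lower().split()
    if not words:
        return ""
    return " ".join(words[:-1] + [singularize(words[-1])])
```

Rules like these inevitably misfire on some words, so a real pass would pair them with a curated exceptions list.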


Not just allergies. I was just looking for biscotti recipes and there's basically two styles of biscotti: made with butter (the soft chewy kind) and made without (the crispy you'd better have a drink to dip it in kind). It's impossible to search for the latter without excluding butter.


It appears that you can do -<word_to_be_filtered>
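The `-word` exclusion can also be modelled client-side over ingredient lists. A hypothetical sketch (`parse_query` and `matches` are made-up names, and this is not how Typesense evaluates exclusions internally) using the biscotti case from above:

```python
def parse_query(query: str) -> tuple[set[str], set[str]]:
    """Split a query into required terms and '-'-prefixed excluded terms."""
    include, exclude = set(), set()
    for token in query.lower().split():
        if token == "-":
            continue
        if token.startswith("-"):
            exclude.add(token[1:])
        else:
            include.add(token)
    return include, exclude

def matches(ingredients: list[str], query: str) -> bool:
    include, exclude = parse_query(query)
    words = {w for item in ingredients for w in item.lower().split()}
    return include <= words and not (exclude & words)

recipes = {
    "soft biscotti": ["flour", "butter", "sugar", "eggs"],
    "crisp biscotti": ["flour", "sugar", "eggs", "almonds"],
}
print([name for name, ing in recipes.items() if matches(ing, "flour -butter")])
```

Matching is word-based, so `-avocado` would not exclude a recipe listing "avocados" unless plurals are normalized first.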


>"My specific use case is that I searched `guacamole`"

Guacamole was my first search as well. Must be something in the air ...


Easy to make, few ingredients, and easy to imagine people screwing it up by putting all sorts of nonsense in it.

In other words, a great test of a recipe search.


Back when I worked on eHow, we used to find the most ridiculous pages to test with. There was one that was something like “how to make ice water”. Apparently enough people searched for it for it to be worth writing up.

Another good recipe I see butchered is quesadillas. Something with Mexican food and people messing up.


@Jabo: Great site. Some errors though, for example the "4 Ingredient Sauce for Roasted Lamb" says to use 12 cups of brown sugar. The source site has 1/2 cup (0.5), so I'm guessing it's a scraping issue. Wouldn't want to give someone diabetes with their lamb!



I'd be more interested if it could also automatically convert between voodoo and actual metric measurements.


This rules! SO FAST

Search 'vegan' and Colorful Guac displays

Ingredients 2 large ripe avocados 1 lime juice 2 garlic cloves, Minced 13 cup choppled scallion 13 cup choped red bell pepper 14 12 ounces diced tomatoes with jalapenos 14 cup snipped fresh cilantro 2 tablespoons Braggs liquid aminos


With all those scallions, and that massive amount of cilantro you won't be able to taste the avocado. This reads more like some sort of chutney.

/s


This is really good. I would love to integrate this as an extension into our mobile browser. Hope that’s ok.

https://insightbrowser.com/collections/cooking

Also, I wanted to mention that the most important piece of metadata we’ve learnt users care about, after ingredients, is the website it’s sourced from. It’d be great to have that.


I actually sourced the aggregate dataset from this other Show HN earlier today: https://news.ycombinator.com/item?id=25356156

I do include the source website in the search results. It's the little icon on the bottom right of the search result card and it's also linked from the modal that opens up if you click on "Read cooking directions".


Oh cool I think you may have just hit the API limit for favicons if you use one from the HN traffic

https://share.icloud.com/photos/0XKxqxHuzZV-7sO2wOx0y53YQ


That's actually the icon I intended to show! :) It's an icon of a ticket that gets printed in restaurant kitchens.


I see, that’s cool but I was referencing something different — That it was important for users to quickly recognize the publisher on the search result page itself. They develop more trust with some sources than others.

I wasn’t able to discern that. But if your goal is to showcase search capabilities to developers instead of a daily driver search engine it wouldn’t matter much.


As an SI unit user who's been looking up more recipes during lockdown, I find the cups, spoons, etc. units of measurement really annoying. Some recipe sites have on-the-fly conversion; it would be nice to have here as well. But my first look didn't inspire much confidence that this site will go anywhere (example search: pizza, top result: fruit pizza. Ohkayyy...)


I unfortunately did not find a good field to sort by in the original dataset from earlier today [1]. So it's just sorted by text match relevance scores, and then the order that they appear in the dataset.

I'm hoping they publish a popularity metric, which will fix the issues like the one you pointed out. Or, once I have sufficient data on popular searches from this site, I can append that metric to the dataset. Early stages, so please pardon the dust in the meantime!

re: SI units, I hear you. There's definitely scope for improvement! :)

[1] https://news.ycombinator.com/item?id=25356156
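The ordering described above (text-match score first, then position in the dataset) is what a stable sort on the score alone produces, and a popularity metric could later be added as a tie-breaker. A sketch with hypothetical field names and made-up popularity values; Python's `sorted` is guaranteed stable, so tied hits keep their dataset order:

```python
from typing import NamedTuple

class Hit(NamedTuple):
    title: str
    score: float         # text-match relevance
    popularity: int = 0  # hypothetical metric, e.g. derived from search logs

def rank(hits: list[Hit], use_popularity: bool = False) -> list[Hit]:
    if use_popularity:
        return sorted(hits, key=lambda h: (-h.score, -h.popularity))
    return sorted(hits, key=lambda h: -h.score)  # ties keep dataset order

hits = [Hit("Fruit Cookie Pizza", 1.0, 3),
        Hit("Margherita Pizza", 1.0, 90),
        Hit("Pizza Dough", 0.8, 50)]
print([h.title for h in rank(hits)])
print([h.title for h in rank(hits, use_popularity=True)])
```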


Suggestion: Add OpenSearch support, so browsers prompt me to add it as a search engine. I added it to Firefox with the "eat" keyword, so typing "eat butter chicken" gets me straight to the results page.
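An OpenSearch description is a small XML file that browsers discover through a `<link rel="search" type="application/opensearchdescription+xml" href="...">` tag in the page head. A sketch that generates one, assuming the `r[query]` URL parameter visible in the typo-tolerance link elsewhere in this thread; the short name and description are placeholders:

```python
import xml.etree.ElementTree as ET

NS = "http://a9.com/-/spec/opensearch/1.1/"

def opensearch_xml(short_name: str, description: str, template: str) -> str:
    root = ET.Element("OpenSearchDescription", xmlns=NS)
    ET.SubElement(root, "ShortName").text = short_name
    ET.SubElement(root, "Description").text = description
    ET.SubElement(root, "InputEncoding").text = "UTF-8"
    # {searchTerms} is the placeholder the browser replaces with the user's query
    ET.SubElement(root, "Url", type="text/html", template=template)
    return ET.tostring(root, encoding="unicode")

print(opensearch_xml(
    "Recipes",
    "Search 2M recipes",
    "https://recipe-search.typesense.org/?r%5Bquery%5D={searchTerms}",
))
```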


How do I tell it I really want to search for just "harissa" and not "Harrison", "Harriet", "Harris", "harrissee", etc.? I tried putting it in quotes, but no such luck.


This is not possible at the moment, but it's on my todo list.
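Until exact-term search is supported, one workaround is to post-filter hits whenever the user wraps a term in quotes. A hypothetical client-side sketch: it only filters results the typo-tolerant search has already returned and does not change the server query:

```python
import re

def quoted_terms(query: str) -> list[str]:
    return [t.lower() for t in re.findall(r'"([^"]+)"', query)]

def exact_filter(titles: list[str], query: str) -> list[str]:
    """Keep only titles that contain every quoted term as a whole word."""
    terms = quoted_terms(query)
    if not terms:
        return titles
    def has_all(title: str) -> bool:
        return all(re.search(rf"\b{re.escape(t)}\b", title, re.IGNORECASE) for t in terms)
    return [t for t in titles if has_all(t)]

titles = ["Harissa Chicken", "Harrison's Meatloaf",
          "Chicken With Harissa Yogurt", "Harris Ranch Steak"]
print(exact_filter(titles, '"harissa"'))
```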


This was my first search too lol! I just love harissa :)


It seems that the index has dropped. All searches are getting 503 as a result right now: "Not Ready".


I've been looking for an Algolia-style alternative (in terms of result quality/fuzziness) and this is great. Definitely going to use this on a project soon.


Wait a minute... What about copyright? Like I would love to have a blog where I can just copy and paste my favorite recipes, and add a few notes myself. But I don't do that because it seems like plagiarism.

Or another option is to use this site, and then use some kind of 1-5 star rating. And then just see my favorites without all the other bs that food sites show you.


I got the source dataset from this Show HN earlier today: https://news.ycombinator.com/item?id=25356156

Here's a comment thread that talks about copyright: https://news.ycombinator.com/item?id=25358813


IANAL, but as I understand it, the ingredient list is not copyrightable, though the description may contain sufficient creative content to qualify. There’s even an FAQ covering it: https://www.copyright.gov/help/faq/faq-protect.html


From your linked page:

> However, where a recipe or formula is accompanied by substantial literary expression in the form of an explanation or directions, or when there is a collection of recipes as in a cookbook, there may be a basis for copyright protection.

Emphasis mine. A website full of recipes certainly seems to be a collection in the sense that protects a cookbook.


Seriously. I did a similar massive recipe scraping project a few years ago but never would have redistributed it because it’s available for me to use, but not available for me to redistribute.

For a site that gets so opinionated about GPLv3 vs LGPL vs the rest, we really seem to have no qualms about licenses when it comes to actually using other people’s things.


The authors can file a DMCA. If they don't, there's no issue. If they do, the law will sort it out.

Though it sounds like a flippant response, I've spent a lot of time trying to decide how to feel about this. (I released 194k plaintext books as the books3 dataset.)


> If they don’t there’s no issue.

This doesn’t seem right, like saying if you shoplift and nobody comes after you, then there’s no issue.


Copyrights apply to the literal text of a recipe, but not to a recipe itself. Recipes are not copyrightable last time I checked, just the literal text.


A collection of recipes is copyrightable though. I think recipe vs collection of recipes is one of the canonical examples.


I’d love it if this had a recommendation engine like punchfork[1] [had[2]] has.

That said the search is nicer than Yummly so I’ll have to give it a try.

[1] https://www.punchfork.com/

[2] Nice to see they are back to being independent after Pinterest bought them out sometime back.


Yeah, the search could use some improvements. The first result for "pizza" is "Fruit Cookie Pizza", which I think almost no one searching for pizza will want.


I unfortunately did not find a good field to sort by in the original dataset from earlier today [1]. So it's just sorted by text match relevance scores, and then the order that they appear in the dataset.

I'm hoping they publish a popularity metric, which will fix the issues like the one you pointed out. Or, once I have sufficient data on popular searches from this site, I can append that metric to the dataset.

[1] https://news.ycombinator.com/item?id=25356156


@Jabo Although the search is really great, you instantly know the recipes are "aggregated or marketing" junk when one recipe contains quantities in millilitres, mystical cups and a oven temp in unknown centigrade scale.

So here's an idea for you: automatically rate recipes by detecting unknown units, and help convert them to multiple temperature scales and to single/multiple comparable metric units. Do that and you've achieved the ultimate.

You are missing direct links to search results and to individual results, and it's a hell of a task to find the link that opens the little modal with recipe information (clicking on the card should be enough).


Cups, spoons, etc, are convertible to millilitres.

Most recipes call for a preheated oven at 180ºC unless stated otherwise.

Cooking does not observe reproducible builds. You always, always need to taste, poke, look. If your flour is from a different type of grain or not as fine, if you use different varieties of vegetables, or if your kitchen is a few degrees warmer or cooler, you WILL get different results.

So go ahead and use any mystical cup you want. Ingredient proportions are what matters. If you fail, write down what went wrong so you know better next time.
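The volume conversions mentioned above are fixed ratios, so on-the-fly conversion is mostly a lookup table. A sketch assuming US customary units (a US cup is exactly 236.5882365 mL; recipes from other regions may mean a 250 mL metric cup), plus Fahrenheit to Celsius for oven temperatures:

```python
# Millilitres per US customary unit (exact by definition of the US fluid ounce).
ML_PER_UNIT = {
    "fl oz": 29.5735295625,
    "cup": 236.5882365,          # 8 fl oz
    "tablespoon": 14.78676478125, # 1/2 fl oz
    "teaspoon": 4.92892159375,    # 1/6 fl oz
}

def to_ml(amount: float, unit: str) -> float:
    return amount * ML_PER_UNIT[unit]

def f_to_c(fahrenheit: float) -> float:
    return (fahrenheit - 32) * 5 / 9

print(f"1/2 cup = {to_ml(0.5, 'cup'):.0f} mL")
print(f"350F = {f_to_c(350):.0f}C")
```

350°F works out to about 177°C, which is why it is commonly treated as the equivalent of the 180ºC default.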


I've been cooking and programming for a long time now and wholeheartedly agree with everything you said except the default oven temp thing. Baking is a cargo cult of chemistry and I'd say most folks are well advised to follow those recipes exactly.


>> quantities in millilitres, mystical cups and a oven temp in unknown centigrade scale

As somebody who likes to cook and has not the slightest idea what Fahrenheits, Ozs, Yards, Feet and Gallons are, nor how Freedom Unit Related & Co. converts to simple decimal metric measures, you are speaking Klingon.


This is really great! I will use that going forward :)

Two things to consider: a) put it onto a real domain. It's a great product, it deserves a domain :)

b) make the links clickable. They have the same color as the links below and I tried to click them with no success.


Any ideas for a domain name? You know what they say about naming things... :)

Just made the links clickable!


Great idea and works quite well.

However, it's generally frowned upon to show the actual directions for a recipe, which is why most recipe aggregators only show ingredients and link directly to the source for the actual directions.


Very, very impressive.

One small nitpick: the result URLs don't behave like regular links.

Can you open the recipe, preferably the original url, in a new tab when a middle click happens on a 'Read Cooking Directions' link?


Yup, updated it!


My team used typesense for a recent project. 16 million records combined with https://www.algolia.com/doc/guides/building-search-ui/what-i.... Really fast and worth the investment. It doesn't have all the enterprise features of ES or Solr but for basic search features it's great.


What features are you missing?


Very fast and interesting, but I couldn't figure out how to search for ingredients like lime leaf. If I search for "salmon lime leaf" in the title section, none of the top hits have lime leaf in them. If I search in the ingredients section, I get variations on limes (juice, zest, etc.) but no lime leaf. Curiously, if I just search for "lime leaf" in the title (without salmon) I get things that have lime leaf in them.


The top bar only searches within recipe titles. So if a recipe title has those keywords, it will show up in the results. The best way to search for ingredients would be in the sidebar.

In this particular example, the issue is that the source dataset [1] has "lime leaves" in plural, so if you click on "show more" in the sidebar after searching for "lime", you should see it in the list. I'm going to work on normalizing singular / plural ingredients as much as I can as part of this: https://news.ycombinator.com/item?id=25368628

[1] https://news.ycombinator.com/item?id=25356156
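To see why "lime leaves" only appears under "show more", picture the sidebar as distinct ingredient values with counts, ordered by frequency and truncated. A toy sketch over hypothetical in-memory data (the real sidebar is driven by Typesense facet counts, not by this code):

```python
from collections import Counter

def facet_values(recipes: list[list[str]], prefix: str, limit: int = 10):
    """Count distinct ingredient values containing a word that starts with `prefix`."""
    p = prefix.lower()
    counts = Counter()
    for ingredients in recipes:
        for ing in dict.fromkeys(ingredients):  # count each value once per recipe
            if any(word.startswith(p) for word in ing.lower().split()):
                counts[ing] += 1
    return counts.most_common(limit)

recipes = [["lime juice", "salmon"], ["lime juice", "lime zest"],
           ["lime leaves", "salmon"], ["lime zest"]]
print(facet_values(recipes, "lime", limit=2))  # rarer values need "show more"
```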


Very nice.

One issue I noticed: I looked up Chana Masala and the first recipe (Vegan Chana Masala) calls for “12 tsp salt” but the source calls for “1/2 tsp salt” :)


This is unfortunately an issue with the source dataset that I got from here[1].

I've now added a prominent warning to the UI, to check with the original site if the ingredient measurements seem off. Don't want any ruined dinners on my hands!

[1] https://news.ycombinator.com/item?id=25356156


Fast enough and a good idea to showcase Typesense. Well done! From a UX perspective, I would add a link from the recipe title to the recipe page.


Thank you!

re: UX, I was trying to prevent accidental clicks to an external (ad-ridden) site from the search result cards. The little icon on the bottom right takes you to the source though.


Really love this, but I do see some formatting issues on some recipes. When I was looking into recipe-scraping solutions, I found they were only compatible with certain sites, since sites are formatted so differently, and they often had to be updated for sites that changed their formatting.

Impressive all the same! Keen to dive into the source and learn a bit.


Very fast and very nice.

One thing that can be improved is the way the history is populated. Right now, every time a search is performed, a new entry is added to the history. I was thinking about how to spell Cauliflower as I was typing, and now I have to press the back button for each character I typed. It would make more sense to only add a history entry when the input is blurred (onblur()).


Ah yes, good catch. I'll fix that history issue.

Btw, the search backend has typo tolerance enabled. So you should be able to get away with typos as bad as "cliflowre" and still get the right results: https://recipe-search.typesense.org/?r%5Bquery%5D=cliflowre
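Typo tolerance of this kind is usually based on edit distance. A plain Levenshtein implementation for illustration; Typesense's own typo handling has its own limits and ranking and may differ. It also suggests why a rare word like "cubeb" (reported further down this thread) can be pulled toward "cubed": the two are one substitution apart:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("cubeb", "cubed"))
```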


Congrats! This is a great start.

It's impressive how fast it is, even if this mostly says something about the state of the web today.

2M recipes is a huge database, and without any indication of the quality of a recipe this makes it really hard to tell which ones are worth trying. I hope you can add some sort of rating system in the future.


I'd make the title of the recipe the link to the summary, rather than the "Read Cooking Directions" text, which is sort of confusingly worded. Do you "cook" salad?

Also, as others have said, you're going to get sued if the originator of the recipe isn't linked prominently.


your import seems to have omitted slashes from 1/2 measurements, e.g.

6 chicken breasts, cut into 2 inch cubes 2 eggs 2 cups breadcrumbs 12 cup olive oil 12 cup white wine 12 lemon, juice of 3 garlic cloves, chopped 1 teaspoon dried oregano 12 teaspoon dried parsley


Uh oh! I hope I didn’t ruin anyone’s evening. Fixing it.


Hmmm, I spot checked a couple of records and I do see the / in measurements: https://imgur.com/IhBMBBD

Could you give an example record that doesn't have the right measurements? I can then verify with the source dataset, to see if it's an indexing bug.


Shouldn't that be 1/2 of a green pepper and 1/4 cup blue cheese?


Hmm, I'd think so too. Looks like the source dataset [1] has it wrong :(

This is how it shows up (CSV format):

2230984,Buffalo Chicken Pizza!,"[""1 (9 3/4 ounce) canswanson premium white chunk chicken breast in water, drained"", ""2 tablespoons butter, melted"", ""1 (10 ounce) packageprepared thin pizza crust (12-inch)"", ""12 of a green pepper, thinly sliced"", ""14 cup crumbled blue cheese""]","[""Heat the oven to 425F Stir the chicken, hot sauce and butter in a medium bowl."", ""Spread the chicken mixture on the pizza crust to within 1/2-inch of the edge."", ""Top with the pepper and cheese."", ""Bake for 10 minutes or until the chicken mixture is hot and bubbling.""]",www.food.com/recipe/buffalo-chicken-pizza-394731,Recipes1M,"[""chicken"", ""butter"", ""crust"", ""green pepper"", ""blue cheese""]"
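The row above parses with the standard library: the list-valued columns are JSON arrays inside quoted CSV fields, with inner quotes doubled per CSV escaping. A sketch; the column names are assumed from the row's shape, since the header line is not shown here:

```python
import csv
import io
import json

ROW = (
    '2230984,Buffalo Chicken Pizza!,'
    '"[""1 (9 3/4 ounce) canswanson premium white chunk chicken breast in water, drained"", '
    '""2 tablespoons butter, melted"", '
    '""1 (10 ounce) packageprepared thin pizza crust (12-inch)"", '
    '""12 of a green pepper, thinly sliced"", ""14 cup crumbled blue cheese""]",'
    '"[""Heat the oven to 425F Stir the chicken, hot sauce and butter in a medium bowl."", '
    '""Spread the chicken mixture on the pizza crust to within 1/2-inch of the edge."", '
    '""Top with the pepper and cheese."", '
    '""Bake for 10 minutes or until the chicken mixture is hot and bubbling.""]",'
    'www.food.com/recipe/buffalo-chicken-pizza-394731,Recipes1M,'
    '"[""chicken"", ""butter"", ""crust"", ""green pepper"", ""blue cheese""]"'
)

# Assumed column order; "id" stands in for the unnamed first column.
COLUMNS = ["id", "title", "ingredients", "directions", "link", "source", "NER"]

def parse_row(line: str) -> dict:
    values = next(csv.reader(io.StringIO(line)))  # handles "" escaping and commas
    record = dict(zip(COLUMNS, values))
    for key in ("ingredients", "directions", "NER"):
        record[key] = json.loads(record[key])     # JSON arrays stored as text
    return record

recipe = parse_row(ROW)
print(recipe["ingredients"][3])
```

The slash is already missing from the fourth ingredient at this stage, which is consistent with the problem being in the source data rather than in indexing.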

I'll see if I can open a PR with the fix.

[1] https://news.ycombinator.com/item?id=25356156


I see in the original post's GitHub link that they have a scrubbed list as well [1]. I am not sure when that was added, but it explains the 12 vs 1/2 thing exactly.
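For data that was not scrubbed, a cleanup pass could try to restore the dropped slash. A deliberately narrow heuristic sketch (the pattern is an assumption, not RecipeNLG's actual scrubbing): it rewrites a leading "12", "13", "14", "18", "23" or "34" only when followed by a singular unit or "of", since "12 cup" is almost certainly "1/2 cup" while "12 cups" or "12 eggs" may be genuine:

```python
import re

FRACTIONS = {"12": "1/2", "13": "1/3", "14": "1/4", "18": "1/8", "23": "2/3", "34": "3/4"}
PATTERN = re.compile(
    r"^(12|13|14|18|23|34)\s+(?=(?:cup|teaspoon|tablespoon|tsp|tbsp)\b|of\b)"
)

def restore_fraction(ingredient: str) -> str:
    return PATTERN.sub(lambda m: FRACTIONS[m.group(1)] + " ", ingredient)

for line in ["12 cup olive oil", "12 of a green pepper, thinly sliced", "12 cups water"]:
    print(restore_fraction(line))
```

It deliberately leaves cases like "12 lemon, juice of" untouched, so it would complement rather than replace a check against the source site.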

[1] https://github.com/Glorf/recipenlg#where-is-the-dataset


It is amazing how fast the pages render. Amazing work, even if it took 6 instead of 1 hour!!


Thank you!


Love the concept but please could you show the domain name of the source websites? I use that as a filter for quality and it's time-consuming hovering over the source icon to see the linked URL in the status bar for each recipe.


This looks great!

A small bug: Filtering by ingredient "Sugar" gives results like "Sugar free" and "no-sugar-added".

And a small UX request: Have the recipe not just be a JS modal, I want to be able to open them in a new tab.


I don’t count that as a bug, but I see what you mean. The search engine only has so much context about the content.


This is great! I thought that I could filter on ingredients with an intersection of those ingredients and instead got the union. I would love if this could toggle between those and possibly the exclusion as well.


Added to my todo list, to toggle between AND / OR when filtering by ingredients.
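The AND / OR toggle maps directly onto set operations over each recipe's ingredients: AND keeps recipes containing every selected ingredient (subset test), OR keeps recipes containing any of them (non-empty intersection). A hypothetical sketch; `filter_recipes` and its `mode` values are made up here:

```python
def filter_recipes(recipes: dict[str, list[str]], selected: set[str],
                   mode: str = "and") -> list[str]:
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    out = []
    for name, ingredients in recipes.items():
        have = set(ingredients)
        if (mode == "and" and selected <= have) or (mode == "or" and selected & have):
            out.append(name)
    return out

recipes = {"guac": ["avocado", "lime", "salt"],
           "salsa": ["tomato", "lime", "salt"],
           "toast": ["bread", "avocado"]}
print(filter_recipes(recipes, {"avocado", "lime"}, "and"))
print(filter_recipes(recipes, {"avocado", "lime"}, "or"))
```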


This is really neat! My challenge with random recipes from the internet is they are all 4 or 5 stars, but sometimes they are kind of gross :/

Can anyone recommend a curated recipe list / website, even if it is paid?


Very nice! I will definitely be using.

Quick question, are the recipes sorted by anything?


I unfortunately did not find a good field to sort by in the original dataset. So it's just sorted by text match relevance scores, then the order that they appear in the dataset.


I too will be using! Didn't really expect to see much when I searched for "Sri Lankan". Boy was I pleasantly surprised!


This is cool! It seems like a great way to find recipes altered for special diets like vegan, vegetarian, gluten free, keto, etc.


I searched for “cubeb” to find recipes which use this spice. It was changed to “cubed” with no way to revert.


Nice! Is there a way to get a link to a specific entry rather than just the full list of query results?


If you click on the little icon on the bottom right of each search result card, that should take you to the source website from where the recipe is from.


Ah, it does, but I'd like to second the suggestion to put the domain name at the bottom. All the cards I looked at looked like they had room.

First, it makes a big difference to me if it's from a site I know and trust. Second, I think that clear attribution is a good thing, even if you may be legally in the clear for copyright.
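Showing the domain on each card is a small transformation of the source link. A sketch, assuming links may lack a scheme (like the `www.food.com/...` value in the CSV row quoted earlier in the thread) and that a leading `www.` should be dropped for display; `display_domain` is a hypothetical helper:

```python
from urllib.parse import urlsplit

def display_domain(link: str) -> str:
    if "://" not in link:
        link = "https://" + link  # without a scheme, urlsplit treats the host as a path
    host = urlsplit(link).hostname or ""  # hostname is lowercased
    return host.removeprefix("www.")

print(display_domain("www.food.com/recipe/buffalo-chicken-pizza-394731"))
```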


Chrome wouldn't load it without a valid SSL cert. I'm very interested in seeing this though. Direct message me if you want a bit of free help with the infra.

In the meantime, is the recipe dataset open sourced and available somewhere, for other builders?


Hmmm, I do see a valid cert on my end: https://imgur.com/a/xgYwGFc

Could you show me a screenshot of what cert gets loaded for you? I did switch between infra providers in the last few hours. So I wonder if you're hitting the older infra due to stale DNS.

I got the dataset from this other Show HN post from earlier today: https://news.ycombinator.com/item?id=25356156


It's working for me now. I think my company's internet vpn was blocking it, bad on me. Thanks for the info and site. It's super cool!


Seeing as this is likely an open source Algolia, can this be used for searching logs?


If it can fit in memory, yes.

But TBH, logs are a unique beast in that searches are usually temporal and only a tiny portion of the dataset is typically queried. So it will be wasteful to store the entire index in memory 24x7, which is what Typesense (and Algolia) do. ElasticSearch on the other hand has mastered searching log datasets by storing the primary index on disk, so I'd recommend using ES for log data, instead of Algolia / Typesense. The tradeoff with ES is performance, since the ES index needs to be fetched from disk.

For any other structured dataset (like the dataset in this app), Typesense would be a good fit.


Thanks for that detailed answer. That really helped me understand the tradeoffs which in hindsight seem obvious :-)

We currently use Algolia for 3 public sites and will further explore Typesense for something like this, for a site that hosts healthcare patient info.

And will stick to our original plan of setting up elasticsearch for logs.


Wow it's super fast.

Someone please make this over web archive data, for web search.


If you have a structured dataset, I can build an instant search experience around it!


The page doesn't load for me (Location: Chennai, India)


I can't access the page and it's stuck in a loop of "Warning: Potential Security Risk"

Error code: SEC_ERROR_UNKNOWN_ISSUER

Peer’s Certificate issuer is not recognized.

HTTP Strict Transport Security: false

HTTP Public Key Pinning: false


Hmmm, I do see a valid cert on my end: https://imgur.com/a/xgYwGFc

Could you show me a screenshot of what cert gets loaded for you? I did switch between infra providers earlier today. So I wonder if you're hitting the older infra due to stale DNS.


this is quite amazing and also inspiring! thanks for doing this and showcasing typesense.


Inability to open selected recipes in a new tab (e.g., to read later) is really annoying


Is not working



