Thought this would take me 1-2 hours to build, ended up taking about 6 hours - engineering estimates and all!
> Details about the Tech Stack:
The dataset has 2,231,142 recipes and is indexed on Typesense, an open source alternative to Algolia/ElasticSearch that a friend and I are working on.
The UI was built using the Typesense adapter for InstantSearch.js and is a static site bundled using ParcelJS.
The app is hosted on S3, with CloudFront for a CDN.
The search backend is powered by a geo-distributed 3-node Typesense cluster running on Typesense Cloud, with nodes in Oregon, Frankfurt and Mumbai.
Here's the source code: https://github.com/typesense/showcase-recipe-search
Could I ask you a few more questions? What was the dataset size? What was the size of your index? How long (and how much RAM) did it take to index the dataset and what machine (and how many cores) did you do it on?
I did not know that! Good to know.
> What was the dataset size?
2.2GB in size, with ~2.2M records
> What was the size of your index?
> How long (and how much RAM) did it take to index the dataset
It took about 8 minutes to index that data. Typesense stores the entire index in memory, so the index took 2.7GB in RAM
> What machine (and how many cores) did you do it on?
It's running on a 3-node cluster, with each node having 4vCPUs and 8GB of RAM. The nodes are distributed across data centers, so search requests are served by the closest node (like a CDN).
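For a rough sense of scale, the figures quoted above can be turned into a back-of-envelope calculation. This is only a sketch: it assumes decimal gigabytes and that each node holds a full copy of the index, neither of which is spelled out in the thread.

```python
# Figures quoted in the thread; units are assumed to be decimal GB.
RECORDS = 2_231_142
DATASET_BYTES = 2.2e9       # "2.2GB in size"
INDEX_RAM_BYTES = 2.7e9     # "the index took 2.7GB in RAM"
INDEX_SECONDS = 8 * 60      # "about 8 minutes"
NODE_RAM_BYTES = 8e9        # "8GB of RAM" per node


def summarize() -> dict:
    """Derive rough per-record and per-node numbers from the quoted figures."""
    return {
        "records_per_second": RECORDS / INDEX_SECONDS,
        "ram_bytes_per_record": INDEX_RAM_BYTES / RECORDS,
        "index_to_dataset_ratio": INDEX_RAM_BYTES / DATASET_BYTES,
        # Assumes every node keeps the whole index in memory.
        "node_ram_fraction_used": INDEX_RAM_BYTES / NODE_RAM_BYTES,
    }


for name, value in summarize().items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions the index uses roughly a third of each node's RAM, which leaves headroom for the dataset to grow.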
That can't be right? Surely Greek, Russian, Turkish, etc are whitespace delimited?
Huge fan of instant search results, well done!
Why did you build it instead of using other open source engines, e.g. Postgres text search?
re: why not just postgres text search, I'll post a more detailed response in the github issue you opened (thank you) for posterity: https://github.com/typesense/typesense/issues/167
The search backend is running on Typesense Cloud, which is point and click to provision.
This is my 1-line deployment command: https://github.com/typesense/showcase-recipe-search/blob/7b5...
That's it for the infra! No API Gateway.
Hmmm, seems to be up for me. Could you show me a console+browser screenshot of what you see?
Obvious use cases are food allergies and dislikes in general.
My specific use case is that I searched `guacamole` and got 2k that contained `salt` but only 1.8k that contained `avocados`. I want to see the recipes that don't use avocados.
EDIT: It appears that `avocados` and `avocado` are separate ingredients, as are `tomatoes` and `tomato`. I know pluralization rules are hard, particularly in English, but any chance of a cleanup pass for the... low hanging fruit?
And yes, I would absolutely love to see the recipes for orange juice that don't include any oranges for the same reason.
Either way though, I think it's just because `avocado` and `avocados` are counted as different ingredients.
I see what you did there with low hanging fruit! I'll take a pass at de-duping plurals in ingredients. Good catch!
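A de-duping pass for plurals could start out as something like the sketch below. It is deliberately naive and is not the author's implementation: the exception tables and the function names are made up for illustration, and real ingredient data would need many more special cases.

```python
from collections import Counter

# Hypothetical exception tables; a real cleanup pass would need far more entries.
IRREGULAR = {"leaves": "leaf", "tomatoes": "tomato", "potatoes": "potato"}
ALREADY_SINGULAR = {"hummus", "asparagus", "couscous", "molasses"}


def singularize(word: str) -> str:
    """Return a naive singular form of a single lowercase-able word."""
    w = word.lower()
    if w in ALREADY_SINGULAR:
        return w
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # berries -> berry
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                # avocados -> avocado
    return w


def normalize_ingredient(name: str) -> str:
    """Lowercase the ingredient and singularize only its last word."""
    words = name.strip().lower().split()
    if not words:
        return ""
    words[-1] = singularize(words[-1])
    return " ".join(words)


def merge_counts(counts: dict) -> Counter:
    """Sum facet counts for ingredients that normalize to the same name."""
    merged = Counter()
    for name, n in counts.items():
        merged[normalize_ingredient(name)] += n
    return merged


# Made-up counts, only to show the merge.
print(merge_counts({"avocado": 3, "avocados": 2, "lime leaves": 1}))
```

Only the last word is singularized so that multi-word ingredients such as `lime leaves` collapse to `lime leaf` without touching the qualifier.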
Guacamole was my first search as well. Must be something in the air ...
In other words, a great test of a recipe search.
Another good recipe I see butchered is quesadillas. Something with Mexican food and people messing up.
Search 'vegan' and Colorful Guac displays
2 large ripe avocados
1 lime juice
2 garlic cloves, Minced
13 cup choppled scallion
13 cup choped red bell pepper
14 12 ounces diced tomatoes with jalapenos
14 cup snipped fresh cilantro
2 tablespoons Braggs liquid aminos
Also I wanted to mention that the most important piece of metadata we’ve learnt users care about, after ingredients, is the website it’s sourced from. It’d be great to have that.
I do include the source website in the search results. It's the little icon on the bottom right of the search result card and it's also linked from the modal that opens up if you click on "Read cooking directions".
I wasn’t able to discern that. But if your goal is to showcase search capabilities to developers instead of a daily driver search engine it wouldn’t matter much.
I'm hoping they publish a popularity metric, which will fix the issues like the one you pointed out. Or, once I have sufficient data on popular searches from this site, I can append that metric to the dataset. Early stages, so please pardon the dust in the meantime!
re: SI units, I hear you. There's definitely scope for improvement! :)
Or another option is to use this site, and then use some kind of 1-5 star rating. And then just see my favorites without all the other bs that food sites show you.
Here's a comment thread that talks about copyright: https://news.ycombinator.com/item?id=25358813
> However, where a recipe or formula is accompanied by substantial literary expression in the form of an explanation or directions, or when there is a collection of recipes as in a cookbook, there may be a basis for copyright protection.
Emphasis mine. A website full of recipes certainly seems to be a collection in the sense protecting a cookbook.
For a site that gets so opinionated about GPLv3 vs LGPL vs the rest, we really seem to have no qualms about licenses when it comes to actually using other people’s things.
Though it sounds like a flippant response, I've spent a lot of time trying to decide how to feel about this. (I released 194k plaintext books as the books3 dataset.)
This doesn’t seem right, like saying if you shoplift and nobody comes after you, then there’s no issue.
That said the search is nicer than Yummly so I’ll have to give it a try.
Nice to see they are back to being independent after Pinterest bought them out sometime back.
So here's an idea for you: automatically rate those things by just investigating unknowns and either help them to be converted to multiple centigrade scales, and single/multiple comparable metrics, and you've achieved the ultimate.
You are missing direct links to search results and to a single search result, and it's a hell of a task to find a link to click that opens the little modal with recipe information (a click on the square should be enough).
Most recipes call for a pre-heated oven at 180ºC unless stated otherwise.
Cooking does not observe reproducible builds. You always, always need to taste, poke, look. If your flour is of a different type of grain or not as fine, if you use different varieties of vegetables, or if your kitchen is a few degrees warmer or cooler, you WILL get different results.
So go ahead and use any mystical cup you want. Ingredient proportions are what matters. If you fail, write down what went wrong so you know better next time.
As somebody who likes to cook and has not the slightest idea what Fahrenheit, Ozs, Yards, Feet, or Gallons are, nor anything about how Freedom Unit Related & Co. converts to simple decimal metric measures, you are speaking Klingon.
Two things to consider:
a) put it onto a real domain. it's a great product, it deserves a domain :)
b) make the links clickable. They have the same color as the links below and I tried to click them with no success.
Just made the links clickable!
However, it's generally frowned upon to show the actual directions for a recipe, which is why you see most recipe aggregators only show ingredients and link directly to the source to get the actual directions.
One small nitpick: The result URLs don't behave like regular links.
Can you open the recipe, preferably the original url, in a new tab when a middle click happens on a 'Read Cooking Directions' link?
In this particular example, the issue is that the source dataset has "lime leaves" in plural, so if you click on "show more" in the sidebar after searching for "lime", you should see it in the list. I'm going to work on normalizing singular / plural ingredients as much as I can as part of this: https://news.ycombinator.com/item?id=25368628
One issue I noticed: I looked up Chana Masala and the first recipe (Vegan Chana Masala) calls for “12 tsp salt” but the source calls for “1/2 tsp salt” :)
I've now added a prominent warning to the UI, to check with the original site if the ingredient measurements seem off. Don't want any ruined dinners on my hands!
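Beyond a blanket warning, suspicious measurements like "12 tsp salt" could be flagged automatically. The sketch below is a heuristic and nothing more: the lookup table, the unit list and the function name are all assumptions for illustration, and it only flags a likely lost slash rather than rewriting the data.

```python
import re

# Two-digit integers that would read as a common fraction if a "/" was lost.
# Hypothetical table for illustration.
LOST_SLASH = {
    "12": "1/2", "13": "1/3", "14": "1/4",
    "18": "1/8", "23": "2/3", "34": "3/4",
}

# Units for which a two-digit count is unusual in a home recipe (assumption).
SMALL_UNITS = r"(?:tsp|teaspoons?|tbsp|tablespoons?|cups?)"
PATTERN = re.compile(rf"^\s*(\d{{2}})\s+{SMALL_UNITS}\b", re.IGNORECASE)


def suspected_fraction(ingredient: str):
    """Return the fraction the leading quantity may have been, or None."""
    m = PATTERN.match(ingredient)
    if m is None:
        return None
    return LOST_SLASH.get(m.group(1))


for line in ["12 tsp salt", "2 cups breadcrumbs", "12 lemon, juice of"]:
    print(line, "->", suspected_fraction(line))
```

Lines without a recognised unit, such as "12 lemon, juice of", are not flagged by this heuristic, so it would complement the warning rather than replace it.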
re: UX, I was trying to prevent accidental clicks to an external (ad-ridden) site from the search result cards. The little icon on the bottom right takes you to the source though.
Impressive all the same! Keen to dive into the source and learn a bit.
One thing that can be improved is the way the history is populated. Right now, every time a search is performed, a new entry is added to history. I was thinking about how to spell Cauliflower as I was typing, and now I have to press the back button for each character I typed. It would make more sense to add a history entry only when the input is blurred (onblur).
Btw, the search backend has typo tolerance enabled. So you should be able to get away with typos as bad as "cliflowre" and still get the right results: https://recipe-search.typesense.org/?r%5Bquery%5D=cliflowre
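Typo tolerance is often discussed in terms of edit distance between the query and an indexed term. The sketch below is a plain Levenshtein implementation for comparing the two spellings above; it is not a description of how Typesense matches typos internally, which the thread does not go into.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


print(levenshtein("cauliflower", "cliflowre"))
```

Note that plain Levenshtein counts a swapped pair of letters as two edits; variants that treat a transposition as a single edit would score this query lower.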
It's impressive how fast it is, even if this mostly says something about the state of the web today.
2M recipes is a huge database, and without any indication of the quality of a recipe this makes it really hard to tell which ones are worth trying. I hope you can add some sort of rating system in the future.
Also, as others have said, you're going to get sued if the originator of the recipe isn't linked prominently.
6 chicken breasts, cut into 2 inch cubes
2 cups breadcrumbs
12 cup olive oil
12 cup white wine
12 lemon, juice of
3 garlic cloves, chopped
1 teaspoon dried oregano
12 teaspoon dried parsley
Could you give an example record that doesn't have the right measurements? I can then verify with the source dataset, to see if it's an indexing bug.
This is how it shows up (CSV format):
2230984,Buffalo Chicken Pizza!,"[""1 (9 3/4 ounce) canswanson premium white chunk chicken breast in water, drained"", ""2 tablespoons butter, melted"", ""1 (10 ounce) packageprepared thin pizza crust (12-inch)"", ""12 of a green pepper, thinly sliced"", ""14 cup crumbled blue cheese""]","[""Heat the oven to 425F Stir the chicken, hot sauce and butter in a medium bowl."", ""Spread the chicken mixture on the pizza crust to within 1/2-inch of the edge."", ""Top with the pepper and cheese."", ""Bake for 10 minutes or until the chicken mixture is hot and bubbling.""]",www.food.com/recipe/buffalo-chicken-pizza-394731,Recipes1M,"[""chicken"", ""butter"", ""crust"", ""green pepper"", ""blue cheese""]"
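To check whether a bad measurement comes from the dataset or from indexing, a record like the one above can be parsed directly. A minimal sketch, using a shortened copy of that record: the column names are assumptions (the record is quoted without a header row), and the list-valued columns are treated as JSON arrays inside CSV fields.

```python
import csv
import io
import json

# Assumed column names; the quoted record has no header row.
COLUMNS = ["id", "title", "ingredients", "directions", "link", "source", "ner"]

# Shortened copy of the record quoted above.
RAW = (
    '2230984,Buffalo Chicken Pizza!,'
    '"[""2 tablespoons butter, melted"", ""12 of a green pepper, thinly sliced"", '
    '""14 cup crumbled blue cheese""]",'
    '"[""Top with the pepper and cheese.""]",'
    'www.food.com/recipe/buffalo-chicken-pizza-394731,Recipes1M,'
    '"[""butter"", ""green pepper"", ""blue cheese""]"'
)


def parse_record(line: str) -> dict:
    """Parse one CSV line and decode the JSON-encoded list columns."""
    row = next(csv.reader(io.StringIO(line)))
    record = dict(zip(COLUMNS, row))
    for key in ("ingredients", "directions", "ner"):
        record[key] = json.loads(record[key])
    return record


record = parse_record(RAW)
for ingredient in record["ingredients"]:
    print(ingredient)
```

The parsed ingredients still read "12 of a green pepper" and "14 cup crumbled blue cheese", which suggests the missing slash is already present in the source data rather than introduced at indexing time.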
I'll see if I can open a PR with the fix.
A small bug: Filtering by ingredient "Sugar" gives results like "Sugar free" and "no-sugar-added".
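One plausible cause, and this is a guess rather than something confirmed in the thread, is that the filter matches on individual tokens instead of on the whole ingredient value. The sketch below contrasts the two behaviours on made-up recipes; the function names are hypothetical and it does not call any search engine.

```python
import re


def tokens(value: str) -> list:
    """Split a value into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def token_match(filter_value: str, ingredient: str) -> bool:
    """True if every token of the filter appears somewhere in the ingredient."""
    have = tokens(ingredient)
    return all(t in have for t in tokens(filter_value))


def exact_match(filter_value: str, ingredient: str) -> bool:
    """True only if the ingredient is the filter value, ignoring case and punctuation."""
    return tokens(filter_value) == tokens(ingredient)


def filter_recipes(recipes: list, filter_value: str, match) -> list:
    """Return titles of recipes with at least one ingredient accepted by `match`."""
    return [
        r["title"]
        for r in recipes
        if any(match(filter_value, i) for i in r["ingredients"])
    ]


# Made-up recipes for illustration only.
RECIPES = [
    {"title": "Lemonade", "ingredients": ["sugar", "lemon"]},
    {"title": "Diet Jello", "ingredients": ["sugar free gelatin"]},
    {"title": "Light Jam", "ingredients": ["no-sugar-added pectin"]},
]

print(filter_recipes(RECIPES, "Sugar", token_match))
print(filter_recipes(RECIPES, "Sugar", exact_match))
```

With token matching all three recipes come back for "Sugar"; with exact matching only the one that lists plain sugar does.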
And a small UX request: Have the recipe not just be a JS modal, I want to be able to open them in a new tab.
Can anyone recommend a curated recipe list / website, even if it is paid?
Quick question, are the recipes sorted by anything?
First, it makes a big difference to me if it's from a site I know and trust. Second, I think that clear attribution is a good thing, even if you may be legally in the clear for copyright.
in the meantime, is the recipe dataset open sourced and available somewhere, for other builders?
Could you show me a screenshot of what cert gets loaded for you? I did switch between infra providers in the last few hours. So I wonder if you're hitting the older infra due to stale DNS.
I got the dataset from this other Show HN post from earlier today: https://news.ycombinator.com/item?id=25356156
But TBH, logs are a unique beast in that searches are usually temporal and only a tiny portion of the dataset is typically queried. So it will be wasteful to store the entire index in memory 24x7, which is what Typesense (and Algolia) do. ElasticSearch on the other hand has mastered searching log datasets by storing the primary index on disk, so I'd recommend using ES for log data, instead of Algolia / Typesense. The tradeoff with ES is performance, since the ES index needs to be fetched from disk.
For any other structured dataset (like the dataset in this app), Typesense would be a good fit.
We currently use Algolia for 3 public sites and will further explore Typesense for something like this for a site that hosts healthcare patient info.
And will stick to our original plan of setting up elasticsearch for logs.
Someone please make this over web archive data, for web search.
Error code: SEC_ERROR_UNKNOWN_ISSUER
Peer’s Certificate issuer is not recognized.
HTTP Strict Transport Security: false
HTTP Public Key Pinning: false
Could you show me a screenshot of what cert gets loaded for you? I did switch between infra providers earlier today. So I wonder if you're hitting the older infra due to stale DNS.