I just got a fabulous recipe from the uncanny valley of cooking. Almost viable, with just a few mistakes that would have made it both impossible to follow and, if followed as closely as possible, disgusting.
A glorious effort :) Reminds me of Douglas Adams' Nutrimatic machine:
The way it functioned was very interesting. When the _Drink_ button was pressed it made an instant but highly detailed examination of the subject's taste buds, a spectroscopic examination of the subject's metabolism and then sent tiny experimental signals down the neural pathways to the taste centers of the subject's brain to see what was likely to go down well. However, no one knew quite why it did this because it invariably delivered a cupful of liquid that was almost, but not quite, entirely unlike tea.
Funny you quote this. I just brought it up in a conversation about ad recommendation engines.
It's funny what a difference a generation makes. These days, we know to think of the Nutrimatic machine's neurometabolism probe as simply data farming. All we need is to substitute click history for microfloral ratios and we can have our own cupfuls of liquid that are almost, but not quite, entirely unlike tea.
I can't find any details on the license or copyright status of these recipes. It looks like a scraper was used (Scrapy). The only note I could find in the paper just says:
> Additional recipes were gathered from multiple cooking web pages, using automated scripts in a web scraping process.
Is it legal to republish these recipes without explicit permission from the origin sites? I would be wary of using these for anything without more clarity.
EDIT: There's no license information in the full dataset, but it does list the source URL of the scraped recipes. Summary of sites used:
Ingredients lists are not copyrightable. The text and photos around them can be (as in a recipe book or blog).
“A mere listing of ingredients is not protected under copyright law. However, where a recipe or formula is accompanied by substantial literary expression in the form of an explanation or directions, or when there is a collection of recipes as in a cookbook, there may be a basis for copyright protection.”
I’ve always wondered about this. Say you take the recipe and translate it into another language using ML, but with fuzzy logic so the result is not identical to the original and has added or missing text.
Would that be a breach of copyright? Do you hold the copyright over the new content? Can you claim the original was just inspiration and there are enough differences between the two items?
opinionated side note:
A recipe is a set of instructions on how to alter and combine foods. This is "obviously" not worthy of copyright. A set of instructions on how to combine and alter text on the other hand (99.999% of software) is 100% copyrightable
This is all covered under derivative works for copyright: you can't take ownership of something just by changing some things. The copyright office has a summary of derivative works here: https://www.copyright.gov/circs/circ14.pdf
Mere listing... no. But once you add quantities and a specific order of preparation, a list becomes a copyrightable work. The "mere listing" language refers to the simple content lists we see on product labels.
Even the quantities and order of preparation are not inherently copyrightable. Per the parent comment's citation, there may be a basis for protection only where a recipe or formula is accompanied by substantial literary expression.
"Fill pitcher with water, add lemon juice and sugar" is not going to pass muster as "substantial literary expression".
But if I republished your 5 page story about how your grandma used to make this lemonade and you remember drinking it on the porch of her cottage on Cape Cod, then I would be infringing on your copyright.
Just the ingredients, quantities, and procedure explained in a factual manner is explicitly not copyrightable in the US:
> A recipe is a statement of the ingredients and procedure required for making a dish of food. A mere listing of ingredients or contents, or a simple set of directions, is uncopyrightable. As a result, the Office cannot register recipes consisting of a set of ingredients and a process for preparing a dish.
Your explanations of why a process is followed, and other supplemental information beyond the instructions, can be copyrightable.
Substantial literary expression is old law, written when software was poorly understood by courts. Procedures are now regularly copyrightable. They will find expression somewhere, even in a typo or apparent error (see the old phonebook and map cases). While one can pull and copy the facts, verbatim copying of every minutia is very dangerous.
Remember that a recipe isn't just an instruction set for making food. For purposes of copyright it is also letters on a page generated by a person.
Especially creative language describing the recipe might be. Like paragraphs of text about the cultural context or how much you like it, or even especially poetic language describing how to combine the ingredients.
But not the instructions for preparation themselves, not simple instructions of how to combine things in what order. No matter how specific, they are considered to be facts rather than creative expression. "ideas, procedures, processes, systems, and methods of operation" are not protected by copyright -- no matter how useful, no matter how much energy went into making them, that is not the standard.
Other countries may treat this differently from the USA.
> Second step is to fill Additional information. First of them - an email address.
> Someone may ask: 'Why Recipe Generator might need my email address?' That's a pretty good question, especially when you concern your privacy, but the answer is rather straightforward. Recipe Generator is quite a complex piece of software and it needs some time to prepare your recipe, especially when there are many orders. The generator will email you a recipe as soon as it's ready, so you don't have to keep hanging on the website.
> Recipe Generator is not only excellent artificial chef, but also very caring about it's customers privacy, so after he emails you a recipe, he immediately forgets the address (that's why you have to provide it many times).
I think you have a great tool if it works, pretty well monetised if you want to go that route, but I cannot tell if it works and am not willing to submit an email address, sorry. So, just some basic potential user feedback. Perhaps a tool for more, of course.
Perfectly understood! We just cannot afford to run it on a GPU in the backend, so the generation takes up to 3 minutes, which is a terrible thing to do interactively :(
You're invited to check it with 10minutemail or so, or you can try our model on your own if you have some python experience: https://huggingface.co/mbien/recipenlg
For privacy conscious and if you're willing to go that far[0], you can generate a unique URL for a recipe and store it for a limited amount of time. With that (and the URL being generated beforehand), you can tell the user to either provide an email address so you can email them, or just to visit the link later and check if the recipe is ready.
[0] Personally I wouldn't worry. It's open source and people can run it themselves if they're squeamish, and probably your time would be better spent on the engine itself :-)
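To make the "unique URL, stored for a limited time" idea above concrete, here is a minimal sketch. It is not how the Recipe Generator site works; the `RecipeStore` class, its method names, and the one-hour default are all hypothetical, and a real service would keep this in a database or cache rather than a dict in memory.

```python
import secrets
import time


class RecipeStore:
    """Hypothetical in-memory store: token -> (expiry time, recipe or None)."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # how long a link stays valid (assumed value)
        self._clock = clock          # injectable so expiry can be tested
        self._jobs = {}

    def create_job(self):
        # Generate the unguessable URL token up front, before generation starts.
        token = secrets.token_urlsafe(16)
        self._jobs[token] = (self._clock() + self._ttl, None)
        return token

    def complete(self, token, recipe):
        # Called by the (slow) generator once the recipe is ready.
        expires_at, _ = self._jobs[token]
        self._jobs[token] = (expires_at, recipe)

    def status(self, token):
        # What the page behind the unique URL would show when the user returns.
        job = self._jobs.get(token)
        if job is None:
            return "unknown", None
        expires_at, recipe = job
        if self._clock() >= expires_at:
            del self._jobs[token]    # forget the recipe after the time limit
            return "expired", None
        if recipe is None:
            return "pending", None
        return "ready", recipe
```

The user would get a link containing the token right away and could either leave an email address or just revisit the link; nothing personal needs to be stored for the second path.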
Hard to say per recipe, as in the current setup they are generated on demand and not pre-generated. But as you can see, we're hosting the solution in academia, so it's basically free, vs. around $2/hour with a GPU on AWS.
Yes, you might be right about the timer, we'll think about changing it!
How about using browser notifications? That way, one could leave the website, receive a notification about data being ready, and click on it to find the generated recipe.
It does take a long while though (I have been waiting for... I didn't pay attention but more than 15 minutes) so I can understand how using email would be the easiest solution for them.
I'm willing to make many concessions anyway to get something that is as good as Chef Watson was. Losing this was a real tragedy.
I don't like alt accounts. And while I have a couple of alt emails for 'emergency purposes' I don't like logging into them because that defeats the purpose. That's all.
This is great! I cooked recipes generated by GPT-2 for Thanksgiving dinner this year and it was surprising how a recipe can look good on paper but then come out in a completely unexpected way. Wrote about the experience here if anyone wants to see the results: https://link.medium.com/DtHmAeBU4bb
Also we got access to GPT3 and it is remarkable how much better it is with just 2 example recipes compared to GPT2 fine tuned on hundreds of thousands of recipes.
This is pretty cool! I’m working on an open source alternative to Algolia / ElasticSearch called Typesense [1]. Going to use this 2M recipe dataset to build an instantsearch showcase like this [2]. Should make for an interesting search/discovery experience. Will post this shortly!
Wow, I'm really thrilled about this. Around 2014, I had a concept for a recipe app that I wanted to build to scratch my own itch: unit conversions, integrated timers, parallel flows (do X while Y). I had been hoping to work with an annotated recipe corpus to facilitate some of the work, but after some research, I couldn't see how to pursue it meaningfully given my time and budget. I might have to pull that idea out of the heap of abandoned projects now.
That sounds pretty nice. I particularly struggle with density conversions. One teaspoon of flour is how many grams? It's just a quick search away, but having it natively in the app would be a big help.
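For what it's worth, the volume-to-weight part is a small lookup plus one multiplication. A rough sketch in Python follows; the density figures are ballpark assumptions for illustration only (flour in particular varies a lot with how it is packed), and the function name is made up.

```python
# One US teaspoon in millilitres.
ML_PER_US_TEASPOON = 4.92892

# Rough, illustrative densities in g/mL. These are assumptions, not
# authoritative values: real numbers depend on brand, sifting and packing.
DENSITY_G_PER_ML = {
    "water": 1.0,
    "all-purpose flour": 0.53,
    "granulated sugar": 0.85,
}


def teaspoons_to_grams(ingredient, teaspoons):
    """Convert a volume in US teaspoons to an approximate weight in grams."""
    density = DENSITY_G_PER_ML[ingredient]  # KeyError for unknown ingredients
    return teaspoons * ML_PER_US_TEASPOON * density


if __name__ == "__main__":
    grams = teaspoons_to_grams("all-purpose flour", 1)
    print(f"1 tsp of flour is roughly {grams:.1f} g under these assumptions")
```

The hard part in a real app would be the table, not the code: it needs a density per ingredient, which is exactly where an annotated recipe corpus could help.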
Someone should make a text-only recipe search website. Recipe websites are so heavy and full of interstitials, banners, auto-play videos, and ads. Two million recipes can be easily stored in memory. (2 million * 8000 characters per recipe text ~= 16 GB of RAM.) Put a CDN in front of it and you're golden.
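The back-of-the-envelope figure above checks out if you assume one byte per character, which is my assumption here (plain ASCII text; UTF-8 recipes with accented characters, plus any index overhead, would need more). A quick sketch of the arithmetic:

```python
RECIPES = 2_000_000        # dataset size quoted in the comment above
CHARS_PER_RECIPE = 8_000   # per-recipe estimate from the comment above
BYTES_PER_CHAR = 1         # assumption: mostly ASCII text

total_bytes = RECIPES * CHARS_PER_RECIPE * BYTES_PER_CHAR
total_gb = total_bytes / 10**9    # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes, what RAM sizes usually mean

print(f"{total_bytes:,} bytes = {total_gb:.1f} GB = {total_gib:.1f} GiB")
```

So the raw text fits in about 16 GB (a little under 15 GiB) before any search index is built on top of it.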
Ask and you shall receive! I indexed the data from this post in Typesense and built a static site hosted on S3 with Cloudfront in front: https://news.ycombinator.com/item?id=25365397
My wife and I cook most of our meals at home, and I have yet to encounter a recipe site I don't hate. All Recipes is about as good as it gets, but only because they have a print button at the top of every recipe that takes you to a printer friendly version.
FWIW, I'm no fan of giving my email either, so I tried to load the transformer on my own system. Would you mind providing a bit more, e.g. an example, than just the following three lines?
from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("mbien/recipenlg")
model = AutoModelWithLMHead.from_pretrained("mbien/recipenlg")
FYI, I got this from running the above lines:
...AutoModelWithLMHead is deprecated and will be removed in a future version...
FYI2, I just entered on your huggingface page "Get salt, apples and sugar and you get" and got "Get salt, apples and sugar and you get a soft, sticky mess. Then add water, add a lemon or lime and add a 1/8 teaspoon vanilla. Add a pinch of cinnamon and a dash of nutmeg. When you are ready", where is the recipe, what is this?? A bit more docs would be nice...
We use custom control tokens, so you have to provide the specific structure of the recipe; simply typing it in the huggingface prompt won't work. The details are available in our papers about the solution.
"The dataset we publish contains 2231142 cooking recipes (>2 millions)"
Where are the actual recipes? Sorry if I missed it, but I've been opening folders and couldn't find a huge file with all the recipes (nor a folder with many files or similar).
I've also tried searching for common terms and I've only managed to match those against some sparse tests and URLs. Is the actual data not here and supposed to be fetched from their source urls?
I'm currently scraping recipes, too. The scrapers extract the data from the html itself, however most recipe blogs/sites provide the recipe as JSON-LD.
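In case it helps anyone, here is a minimal sketch of the JSON-LD route using only the Python standard library. It assumes the page embeds a schema.org `Recipe` object in a `<script type="application/ld+json">` tag; the sample HTML and the helper names (`JsonLdExtractor`, `find_recipes`) are made up for illustration, and real sites will have more edge cases than this handles.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def _is_recipe(item):
    # "@type" may be a single string or a list of strings.
    types = item.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return "Recipe" in types


def find_recipes(html):
    """Return all schema.org Recipe objects found in the page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    recipes = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            # Some sites wrap everything in an "@graph" array.
            for candidate in item.get("@graph", [item]):
                if isinstance(candidate, dict) and _is_recipe(candidate):
                    recipes.append(candidate)
    return recipes


# Made-up sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Lemonade",
 "recipeIngredient": ["water", "lemon juice", "sugar"],
 "recipeInstructions": "Fill pitcher with water, add lemon juice and sugar."}
</script>
</head><body><p>A long story about grandma's porch.</p></body></html>
"""

if __name__ == "__main__":
    for recipe in find_recipes(SAMPLE_HTML):
        print(recipe["name"], recipe["recipeIngredient"])
```

The nice part of this approach is that one extractor can cover many sites, whereas HTML scrapers tend to need per-site selectors.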
Attempted to max out the ingredient list, and got a recipe that would probably feed 20 people[0]. Most impressive thing in my mind is it did correctly figure out it is a soup.
However, I'm not sure 60 cups of vegetables is useful for the common family ;).
I tried ingredients for my current obsession: sauerkraut, just cabbage and salt. Naturally it did not reinvent sauerkraut. How many of those 2M recipes included a fermented portion? Will I get instructions to combine ingredients and let sit for two weeks?
Whenever I read about any tech involving recipes I think of Gnutella, the archetypal peer-to-peer filesharing scheme developed to facilitate the sharing of Nutella recipes. I immediately look for the alternative and honestly unintended use.
Good question! I'd say, ultimate task is to achieve both, but what is fully achievable atm is to get the right recipe structure. The idea of "validity" is very hard to evaluate and often has high regional and personal variance, but many teams around the world are working on this problem at the moment!
IBM Chef Watson was quite good at it though, already five years ago. But I haven't found any similar service since it was turned off. What makes it so hard to replicate?
"This is an archive of code which was used to produce dataset and results available in our INLG 2020 paper: RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation"