Finding Free Food with Python (jamesbvaughan.com)
278 points by jamesbvaughan on Mar 7, 2017 | 65 comments

So "back in the day" when I was an undergrad at MIT, I signed up for every mailing list I could get myself onto and then trained a Bayesian spam filter to recognize free food emails. I threw all free-food e-mails into a freshly-untrained spam box, and sure enough, after a few hundred e-mails it kept putting free food e-mails in one folder, which was extremely convenient. Spamicity (i.e. free-food-icity) for phrases like "bring spoon" and "thesis defense" was particularly high.
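
The approach can be sketched with a toy naive Bayes scorer (the training phrases below are made up for illustration; this is not the original filter):

```python
import math
from collections import Counter

# Toy naive-Bayes "free-food-icity" scorer in the spirit of the comment
# above. Training phrases are hypothetical stand-ins for real emails.
food = ["free pizza bring spoon", "thesis defense refreshments served"]
other = ["problem set due friday", "room change for lecture"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

food_counts, other_counts = word_counts(food), word_counts(other)
vocab = set(food_counts) | set(other_counts)

def log_odds(message):
    # Laplace-smoothed log odds that the message is about free food;
    # positive means "probably free food", negative means "probably not"
    score = 0.0
    for w in message.lower().split():
        p_food = (food_counts[w] + 1) / (sum(food_counts.values()) + len(vocab))
        p_other = (other_counts[w] + 1) / (sum(other_counts.values()) + len(vocab))
        score += math.log(p_food / p_other)
    return score

print(log_odds("free pizza at the thesis defense"))
```

With enough real training mail, the highest-weighted words would be exactly the "bring spoon" / "thesis defense" kind of phrases the comment mentions.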

I don't know how it was implemented, but there was a similar free food blitz[1] list at Dartmouth in my day. I swear, just catering the campus-funded student organizations' events must have floated half the restaurants in town. Most days, you could eat like a hobbit.

[1] Dartmouth is a weird place and for the longest time ran a unique homegrown emailish/im protocol called BlitzMail, with an entire vocabulary and culture associated with it.

Interesting. I read a little about BlitzMail. MIT had Zephyr as an IM system, which was kind of fun.

I miss the days of those old IM systems that I could write my own processing pipelines and UIs for. It's sad that in these walled garden days of WhatsApp, WeChat et al. there is no way for me to programmatically access my own incoming messages.

I was always impressed with Blitzmail's speed. It was certainly faster than any of the email clients I use/try out these days, or at least it seems that way in my memory.

Blitzmail was awesome. It was also used at Reed College.

There is (was) a mailing list at MIT that was the result of your or some other person's filter across all of MIT's mailing lists, forwarding free food offers on campus. It was a lot of pizza, mostly.

vultures@ ?

That takes me back 33 years...

mooch master

I've really wanted the ability to make ad-hoc filters like this in email clients. You'd use it for free food, I'd use it to avoid having to maintain dozens of filters to correctly group a single category of work emails that refuse to be consistent.

What did you use? I'm assuming a CLI-based thing, since those seem to be the only ones that are truly flexible, but I haven't found any I actually like.

Haha, I love that idea. I may try something like that next.

I've been wanting to make my students do, as a side project, a free-food searcher for Stanford. Wouldn't have to be hard at all: download (or even wget) the CS/Engineering calendar, do the minimal scraping needed to figure out whether the words "food", "meal", "dinner", "will be served", or "refreshments" appear anywhere in the text, then return True.

That's the main thing, the rest is just gravy: returning the event serialized as JSON/CSV (event name, date/time, URL) to be used in a web app or notification system -- so a simple web scraper can lead to exploring web dev (even just a simple Flask app) or fun APIs like AWS SNS and Twilio. You could even fit in a good "cache invalidation is a hard problem" lesson.
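
The keyword-filter-plus-JSON part of that assignment might look like this (the event records and URLs are made-up stand-ins for whatever the scraped calendar would actually yield):

```python
import json
import re

# Keywords from the comment above, matched case-insensitively
FOOD_WORDS = re.compile(r"food|meal|dinner|will be served|refreshments", re.I)

def free_food_events(events):
    """Return only the events whose description mentions food."""
    return [e for e in events if FOOD_WORDS.search(e["description"])]

# Hypothetical scraped events (name, date/time, URL, description)
events = [
    {"name": "Systems Seminar", "time": "2017-03-07T16:00",
     "url": "http://example.edu/ev/1", "description": "Pizza will be served."},
    {"name": "Faculty Meeting", "time": "2017-03-07T10:00",
     "url": "http://example.edu/ev/2", "description": "Closed session."},
]

# Serialize the hits as JSON for a web app or notification system
print(json.dumps(free_food_events(events), indent=2))
```

From here the JSON output is what you'd hand to a Flask route, an SNS topic, or a Twilio SMS call.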

I never get around to assigning it because most people don't think of it as "serious work". Also, I'm afraid the CS dept. will obfuscate their calendar if random people start showing up to things for free food. But I keep telling students: the only/best way to learn coding is to do something that directly affects your life or your bottom line. It's the best way to put constraints on a project, i.e. to think of it as the MVP that improves your life.

I learned Ruby and Ruby on Rails much faster than I had any right to, when my new job in NYC required it. I practiced not by writing Ruby on the job, but writing Ruby to scrape Craigslist apartment listings and feeding them into a spreadsheet.

I've thought about creating "personal data/programming projects"...in which the data comes from the student. Such as the SQLite that stores their Chrome/Firefox/Safari history. Or the parseable HTML dump that Facebook gives you when you request your records. Or your Twitter data dump. Or your Google search history.

But I've been hesitant to do this. Partly because there's no guarantee that every student uses Google or Facebook or has an Instagram. And partly because I'm deeply paranoid that students (especially those who are novices about programming and operating systems) will accidentally upload or otherwise expose this sensitive data dump.

All you need to do is drop a hint and put the idea in their head, however obliquely you want to do it.

When I was at University, there was a public directory of employees and students available internally. You could opt-out, but very few people did. You could search by last name, but you had to use at least two letters and the responses cut off after 50 or something like that, although those 50 were in order. When I was in my senior year, somebody breathlessly ran to the student newspaper to loudly proclaim that they had scraped it and had a full copy of the database. I took a little impromptu survey at my next computer science class... "Who here hasn't at least mentally designed how to scrape the student database?" The answer was, nobody.

The student body at large was at least modestly impressed but that guy won no points with the computer science students that day.

Oh, and nobody had to "suggest" this to us.

> I practiced not by writing Ruby on the job, but writing Ruby to scrape Craigslist apartment listings and feeding them into a spreadsheet.

On a related note, PadMapper [1] is an amazing scraper / browser for saving time apartment hunting on Craigslist.

[1]: https://www.padmapper.com

Gravy, you say?

> I never get around to assigning it because most people don't think of it as "serious work". Also, I'm afraid the CS dept. will obfuscate their calendar if random people start showing to things for free food.

It's a practical problem to solve, and the calendar obfuscation angle is a real world issue. If the information itself is valuable and you're trying to aggregate it, it's naturally going to become a moving target. I think it's a solid lesson.

Someone else in this thread did it at MIT: https://news.ycombinator.com/item?id=13809108

I learned Ruby and Ruby on Rails much faster than I had any right to, when my new job in NYC required it. I practiced not by writing Ruby on the job, but writing Ruby to scrape Craigslist apartment listings and feeding them into a spreadsheet.

I started teaching myself computer programming in 2011/2012: I bought a bunch of Head First books published by O'Reilly, watched the entry-level CS video lectures that Stanford provided for free, participated in the first few Udacity courses, and tried a couple of Coursera classes. But I think I learned the most (in the shortest amount of time) when I went through your online book, 'The Bastards Book of Ruby'[0].

I loved the real-world examples you used: fetching Tweets through the Twitter API, web scraping with Nokogiri, and manipulating images with ImageMagick/RMagick are a few that stick out to me.

I'm sure you're doing great things at Stanford, but I'm also confident that you could come out with a new book (covering the topics you spoke of above) that would help motivate people who are on the fence about whether or not they should start/continue learning computer programming.

I'd be happy to collaborate on something, but I would argue that Python would be a better language to use than Ruby. Here are some of the topics I'd like to share with people (which I've found to be useful in my career/side-projects): reading data from CSV/Excel files (xlrd[1]), fetching data from APIs (requests[2]), web scraping (BeautifulSoup[3], Selenium[4]), connecting to a SQL database from a Python script (psycopg[5]), complex mathematical computations (numpy[6]), and downloading videos/metadata from YouTube (youtube-dl[7]) are a few that come to mind.

[0] http://ruby.bastardsbook.com/ [1] https://github.com/python-excel/xlrd [2] http://docs.python-requests.org/en/master/ [3] https://www.crummy.com/software/BeautifulSoup/ [4] http://selenium-python.readthedocs.io/ [5] http://initd.org/psycopg/ [6] http://www.numpy.org/ [7] https://rg3.github.io/youtube-dl/

Thanks for the kind comment. I have been thinking of putting together a Python and SQL book, since I teach those primarily and no current book fits my needs. I hope it would have the same appeal, except that I'm a much better programmer these days :)

I wonder how those "free food" promotions make sense, business-wise. As this article shows, by doing such a promotion you hit a completely different (and useless for you) client base: people just waiting for the promotions. This feels similar to the way most people seem to use Groupon: they're interested in using whatever's currently on big discount, and they won't be coming back to a place when it's at its regular pricing.

Presumably, regular customers also react to these free offers. You always get your pizza from A, but because it's free you try B. Perhaps B thinks its pizzas are better. Perhaps B just offers you free pizzas long enough to break your habit with A, and your habit becomes B. Or you tend to stick with Italian, but because it's free you try Thai, and you find you like it. Businesses have many reasons for subsidising free promotions.

Mind you, I haven't done the math on this, and presumably neither have the participating restaurants. However, the logic is sound, and becomes even sounder if you treat a restaurant owner to a couple of glasses of wine.

I guess I fit this profile, only I feel guilty about it. So if I do like it, I will come back. Really, it only helps get me back if it's something I would do frequently enough to use the service. For example, a deal for a hotel or a massage I will probably only use once and not return; a deal for a restaurant I end up liking will get me back.

I'm not necessarily saying you should feel guilty about this; companies invent plenty of ways to screw you over, so I'd say a small amount of exploiting the very rules they set up is in order. The point is, some of these promotional strategies have obvious problems that have already been demonstrated in the past, so I'm not sure why people keep implementing them.

There is no need to feel guilty about this, because companies do the exact opposite: they try to figure out the perfect breakpoint for €€€ per item per customer. Say your favorite pizza place does a few calculations and figures out that it can raise the price of its pizza by about €3 and actually make more money... should it feel guilty?

That's not always the case though. I don't doubt that a proportion of people are just waiting for the promotions, but I'd be confident that others will become return customers – and in practice, it probably doesn't require that many return customers to offset the cost. I've definitely taken advantage of similar promotions and ended up purchasing products or starting subscriptions for that reason.

Many startups just chase having as many active users as possible. Uber has been losing money since day one.

Right. Shame on me, I forgot they may be a startup, not a normal business. Startups like doing things like that, because the majority of their money is disconnected from their sales.

I assume that the hope is to get a bunch of people to drop by, discover they like the food, and come back another day when it isn't free. No idea if that actually works; probably for some.

Probably they won't and will go away quickly -- as soon as someone turns this into a successful app that can hit a variety of sources.

There's a term for it. It's called "adverse selection".

Finding free food with PHP: http://trashwiki.org/en/Main_Page

Not sure why you're being downvoted, I laughed - and I'm a PHP developer.

Great read that shows how web scraping can easily be utilised in a meaningful way.

However, doesn't this script either send out the same text very often, or potentially send it out too late (by e.g. only letting the cronjob run every 6 hours)?

I assume that time is of the essence in this situation. Some sort of log on sent texts would surely be helpful.

Right, that's a very good point! That actually got very annoying when I first deployed this script, but I ended up adding a kind of logging to it. I wanted to keep the script in the post as simple as possible, but here is the one I'm actually using: https://gist.github.com/jamesbvaughan/4c501fc99acb75852756a4...
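
One simple way to do that kind of dedup logging (this is a sketch, not the author's actual gist) is to persist a set of already-sent offers to disk and check it before texting:

```python
import json
from pathlib import Path

def notify_once(offer_text, send, seen_file=Path("seen_offers.json")):
    """Call send(offer_text) only if this exact offer hasn't been sent before.

    `send` is whatever actually delivers the alert (e.g. a Twilio call);
    `seen_file` stores the set of offers already alerted on, as JSON.
    """
    seen = set(json.loads(seen_file.read_text())) if seen_file.exists() else set()
    if offer_text in seen:
        return False  # already alerted on this offer; stay quiet
    send(offer_text)
    seen.add(offer_text)
    seen_file.write_text(json.dumps(sorted(seen)))
    return True
```

With this in the cron job, running every few minutes instead of every six hours becomes cheap: the text goes out once, as soon as the offer appears.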

Cool project. I do something similar for the daily free ebook from Packt. Also for when new episodes of Silicon Valley are posted.

Tiny detail - you can do the regex match case insensitive and avoid the call to lower() for every string.

    re.match(..., re.I)
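
Compiling the pattern once with `re.I` avoids both the per-string `lower()` and re-parsing the pattern on every check (the strings below are made up for illustration):

```python
import re

# Case-insensitive match without lowercasing each string first
pattern = re.compile(r"free", re.I)
strings = ["FREE Pizza Friday", "Lecture notes posted"]
hits = [s for s in strings if pattern.search(s)]
```

Note that `search` scans the whole string, whereas `match` only anchors at the beginning, which matters if "free" isn't the first word.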

Do you have that Packt script handy?

There's a lot of junk that gets put up there, but also some gems. Unfortunately it seems like the gems are always on the days I don't check it...

So today I noticed that they announced additional free ebooks on Twitter, besides the daily one. I'm thinking of writing custom code that subscribes to their tweets and filters for mentions of "free". May be doable with IFTTT.

For Packt, I'm just using ChangeDetection.com with an email alert. I noticed in the stats for that URL that others are doing the same. It's super easy to set up.

Is the 4th season airing already?

Nope, it's back April 24th

Ahh nice something to look forward to in life haha

Oh cool, thanks for that!

Seems to me you don't need the regex, nor the call to list(), at all for that:

  free_food = [s for s in soup.stripped_strings if "free" in s.lower()]

You can also avoid building the list and terminate at the first true value with `any`:

  free_food = any(s for s in soup.stripped_strings if ...)

Well on that note, if you just want the boolean, I think it'd be enough to say:

    free_food = 'free' in soup.get_text().lower()

Oh even better. I'm kind of new to Python, so I hadn't seen that before!

Right, if you're just matching a substring in a list of strings then you don't need regex for that.
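
The three variants suggested in this subthread can be compared over stand-in strings (no real soup object needed; `" ".join(strings)` plays the role of `soup.get_text()`):

```python
# Stand-ins for soup.stripped_strings from the scraped page
strings = ["Welcome!", "Get a FREE burrito today", "Hours: 9-5"]

# 1. Keep every matching string (the list-comprehension version)
matches = [s for s in strings if "free" in s.lower()]

# 2. Just a boolean, stopping at the first hit (the any() version)
found = any("free" in s.lower() for s in strings)

# 3. One substring check over the whole text (the get_text() version)
found_text = "free" in " ".join(strings).lower()
```

If you only need the boolean, variant 3 is the shortest; variant 1 is the one to use if you also want to text yourself which offer matched.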

Also, lxml is over 20x faster than BeautifulSoup. It doesn't really matter with such a low frequency of requests, but it's something to note if you had another project that required parsing, e.g., thousands of HTML pages.

I wonder how difficult it would be to do this for cloud services, I mean to scrape voucher codes for free credit. For instance, when DigitalOcean announces some promo and you can get 10 USD worth of credit.

There are some sites that allegedly publish coupons, but I feel like a dummy only scrolling through those, it's full of ads and crap.

What would be the proper channels to scrape for promo codes of the cloud providers? Twitter feeds, something else?

> What would be the proper channels to scrape for promo codes of the cloud providers? Twitter feeds, something else?

Affiliate programs.

If you're interested in free food, I recommend checking out www.freefoodguy.com. He's a blogger whose sole mission in life seems to be finding free food deals and sharing them via his email list.

Regarding free food, you could also try natched:


Disclaimer: I helped build this.

This automation will increase the consumption of free food offers by removing friction. Restaurants will not make money on their promotions, since the script users will only consume the free food in their local market. Restaurants will stop using free food promotions. ;-D


The Cloudflare captcha is blocking me from testing this.

Try changing the user agent string of the requests library to something more legit, like a Chrome user agent string. Makes most websites less suspicious of your traffic.
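
With the requests library, that's just a custom header on each call (the exact string below is only an example of the kind of value a real Chrome build sends):

```python
# Browser-like User-Agent header to pass to requests.get(..., headers=headers)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/56.0.2924.87 Safari/537.36"
}
# Hypothetical usage against the scraped site:
# resp = requests.get(url, headers=headers)
```

No guarantee it gets past Cloudflare specifically, but it does make the traffic look less like a default Python script.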

Haha, very nice. I think this might work with https://www.groupon.com/ too.

I dunno how well this would work in practice. In my experience, signing up for promotional emails results in tons of junk, with actual deals only offered occasionally. It'd be hard to parse out the real value.

Postmates uses Cloudflare, and it may show a captcha page sometimes.

New User = Free Food


(Equivalent library for Vegetarian people) Free food with Python: http://24.media.tumblr.com/tumblr_mdjfls2B5x1r0uk07o1_1280.p...

This is so easy, I wonder how it made it all the way up to the front page.

I write these kinds of mini-scripts all the time.

It could be because the author took the time to give a detailed explanation of how he implemented it. It shows a beginner programmer that simple tasks like this can be automated easily, and demonstrates a way to approach it.

Absolutely - I spend a fair bit of time on here, but I'm only a beginning programmer, using Python. This is exactly the sort of thing that inspires me, and also it's well written in terms of understanding for someone of my level. Obviously it's trivial to 99% of HN readers at a technical level, but I think there's room for this sort of thing.

I teach for a living (music technology), and it's incredible how badly a lot of things are explained (in all fields). Clear explanation and worked examples, combined with appropriate progression in difficulty is what makes for good learning, and I think this is a good example.

Thanks, that makes me happy to hear! My goal with posts like this is to do exactly that, and I hope to post more things like it in the future.

I would doubt your claim that 99% of HN readers are at a high technical level.

I spend a lot of time here, mostly in awe of the expertise of others and hoping to learn something from them.

HN isn't populated solely by hackers any more. Personally, I like reading stuff like this because it's "bite-sized" and doesn't take very long. It's also simple and clear enough for someone that doesn't have your god-level scripting skills to implement.
