I worked on this project at my former employer for nearly 8 months. We worked as a good-faith partner with McDonald’s, only to find out eight months in that McDonald’s had never had any intention of working with us and was only using us as a negotiating tactic for the acquisition they eventually made to get this technology.
I learned two things from this project. Our team built a complete English language model for every possible permutation of ordering every menu item. No LLMs, no gpt required.
The hilarious part is that McDonald’s stopped responding to our team and abandoned the project only about a month before we were going to tell them that it was not technically feasible to do at scale. The problem is not bad AI nor a lack of data. The problem is all of the real world interface challenges, keeping speakers and microphones working outdoors across many different climates and weather conditions, and temperatures and humidity is incredibly expensive. And on top of that, the per-store edge hardware and the permanent network infrastructure required to keep everything running make the whole system substantially more expensive than just having a single human being run the drive-through, even at wages much higher than minimum wage.
So McDonald’s never got this conclusion from us and instead spent several billion more dollars and another six years of R&D to come to the same conclusion.
My other takeaway from this project was that I will never give McDonald’s another dollar of my money. Of the more than 200 Fortune 500 companies I’ve worked with in my career, McDonald’s is by far the most evil, heartless, and ruthless company I’ve ever dealt with.
They don’t care about their customers, their users, their franchisees or their employees. The only thing they care about at all is their stock price.
Doesn't surprise me in the least. McDonald's is such a bad company on many levels, even from its founding roots.
They make fast food (an industry notorious for crapping all over its employees), and it was usurped by a greedy business man (Ray Kroc) from its original founders.
Their food is okay, but I would never, ever do work for them.
> usurped by a greedy business man (Ray Kroc) from its original founders
It's a weird aside, but maybe worth noting that Ray Kroc was probably not just greedy; he was probably also racist against the Irish. A tell-tale sign is using a clown as the company mascot. The US strain of clowns was heavily influenced by "pale white face" racist jokes about Irish immigrants [0] from some of the same minstrel shows notorious for "black face", "yellow face", and "red face". Ray Kroc was from a generation that would easily have been aware of that and would have been "entertained" by it. Ray Kroc's behavior toward the actual McDonald's founders is rather easier to explain if you assume it included quite a bit of Old Fashioned Racism than if you assume pure greed. Sometimes it is useful to remind ourselves that the past isn't as clean as the corporate memos papering over it would suggest.
[0] Notably, among other things: red hair, big feet, freckles, red drunken noses, and loutish drunken behavior. Even the "clown car joke" is the exact same "joke" as the "Mexican pickup truck" one, transposed across a couple of decades and aimed at a different working-class immigrant population. (Racists seem pretty lazy about reusing old material.) So yeah, if you ever wondered why clowns don't seem all that funny in the modern era, congratulations, you probably aren't a racist. Also, now that this past horror is in your head, I'm sorry for ruining Disney's Dumbo, which packs all the worst clown stereotypes and "jokes" into one place and spends a lot of its runtime on them, in case you weren't already bothered by the "black face" crows in the movie or thought you could dismiss them as minor characters who don't take up much of the runtime.
Why is this grey? Does that mean people are reporting it? If you report it or want to, then why?
I haven’t verified these claims, nor do I have the background knowledge to form an opinion. I don’t care. It seems plausible, and I think the Irish were within the definition of “black” a century ago.
Is it because you don’t believe this? Or do you think it is astroturfing with bad intentions, similar to the anti-women memes plastering social media (where some wrong committed by a woman gets hundreds of bot comments saying all women are bad)? Kind of like repeating long-ago marginalizations of white people to stir up a sense of grievance among older, more conservative white men?
I don't even consider the food okay... I liked a few breakfast options, but the prices of even those have gone up so much in the past 3 years that I won't even go then. It wasn't that long ago that the breakfast burrito was $1 (then 2 for $2/3) and they had some sandwiches at 2 for $2/3/3.33/4/5... When I'm paying over $10 for a couple of breakfast sandwiches and a drink, I'm out.
They definitely seem to be squeezing more out of their franchises than ever at this point. They will push technology optimizations, but they're at a point where the service and price just aren't there. If I'm spending $15+ for lunch, I may as well go to Applebee's, Chili's, etc.
Thanks for sharing, everything makes sense except this:
> The problem is all of the real world interface challenges, keeping speakers and microphones working outdoors across many different climates and weather conditions, and temperatures and humidity is incredibly expensive.
I thought this was a solved problem already, by McDonald’s in fact?
I am not the poster, but my guess is that a point-to-point or radio setup, which just needs to be good enough for humans to understand (and even then just barely), is likely much worse quality than what would be needed for recording and feeding audio to a program.
Treat each drive through ordering system as an edge compute location, run the software right there and then transmit only the order instead of streaming audio from the ordering system.
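A minimal sketch of that idea, assuming the on-site box has already turned speech into a structured order: only a small JSON payload leaves the store instead of a continuous audio stream. The `Order` and `OrderLine` types, their field names, and the 16 kHz / 16-bit mono audio format used for comparison are hypothetical choices for illustration, not McDonald's or any vendor's schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical schema for illustration only; not a real POS or vendor format.
@dataclass
class OrderLine:
    item: str
    quantity: int = 1
    modifiers: List[str] = field(default_factory=list)


@dataclass
class Order:
    store_id: str
    lane: int
    lines: List[OrderLine]


def encode_order(order: Order) -> bytes:
    """Serialize only the recognized order; the raw audio never leaves the store."""
    return json.dumps(asdict(order), separators=(",", ":")).encode("utf-8")


def pcm_bytes(seconds: float, sample_rate: int = 16_000, bytes_per_sample: int = 2) -> int:
    """Size of uncompressed mono PCM audio (assumed 16 kHz, 16-bit) for the same exchange."""
    return int(seconds * sample_rate * bytes_per_sample)


order = Order(
    store_id="store-0001",
    lane=1,
    lines=[
        OrderLine("Big Mac"),
        OrderLine("Medium Fries"),
        OrderLine("Coke", modifiers=["no ice"]),
    ],
)
payload = encode_order(order)
print(len(payload), "bytes of order vs", pcm_bytes(30), "bytes of raw audio for a 30 s exchange")
```

Even uncompressed, 30 seconds of that assumed audio format is 960,000 bytes, versus a couple hundred bytes for the order. The trade-off is that the recognition compute now sits in every store.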
Edge compute requires field IT for servicing. You can hire 3-4 employees for the price of a single field service specialist.
Servicing industrial edge compute is insanely expensive for brick-and-mortar (B&M) businesses. This is a big part of why so many companies have spent the last decade offloading everything humanly possible to public cloud workloads.
With background noise? My wife and I had an interesting unintentional experiment regarding this a few weeks ago at a Slowdive concert. She doesn't know the band well and asked me the name of each song as it began, and also asked Google's song identifier service. I knew all of them, and Google couldn't even produce an answer, incorrect or not, either because of the crowd noise or because a live version is sufficiently different from any recorded example it has heard before.
> The problem is all of the real world interface challenges, keeping speakers and microphones working outdoors across many different climates and weather conditions, and temperatures and humidity is incredibly expensive.
This sounds like a problem for humans too - are humans just better equipped to deal with bad audio?
Yes, but not in the ways you think. Humans can do about as good a job as a computer in understanding poor audio quality in context, but the compute needed for the latter, in realtime, is pretty substantial.
As the commenter says below, a human can intervene in many more ways when equipment malfunctions or customers have special needs that an AI just gets blocked by.
There are literally hundreds of edge cases where a voice-powered drive-through just stops working: high winds, pouring rain, thick accents, broken equipment, out-of-stock items, seasonal items that aren't available. Those are just a few of the ones I encountered personally while tagging 15,000+ orders in the wild.
Humans have had to evolve to be able to communicate with each other effectively in this and far worse conditions - on battlefields, in driving hurricanes, while being stalked by other humans (and animals), yelling across vast distances, over raging rivers, while wounded, sick, etc.
Yeah this sounds strange to me - every drive thru has had an intercom system like this working outside for decades. Doesn't seem like an insurmountable challenge to me.
You're not entirely wrong, but these AI systems often need pretty clear audio to work. It's kind of shocking how good we are at working around bad audio in conversation, and I'm certain most people know how bad these intercom systems get. The issue isn't whether they need to be fixed at all; it's how far they can degrade before they must be fixed. And the one thing we can do that AI can't is have face-to-face conversations. If the speaker simply doesn't work, it's a bit of a drag, but you can just pull up to the window directly and skip the entire audio system. Or just walk inside. Both options eliminate the problem hardware, whereas AI would need additional hardware to do those jobs.
> The problem is not bad AI nor a lack of data. The problem is all of the real world interface challenges, keeping speakers and microphones working outdoors across many different climates and weather conditions, and temperatures and humidity is incredibly expensive.
That's a great point. Shazam, which has a "simpler" challenge, still trips up with identifying music in challenging environmental conditions.
Having worked for a variety of corporate structures, I'd agree that franchise models tend to have... a bit more ruthless approach to everything. (Even unintentionally)
When "the business" is literally outside your company, you get the same dynamics as when it's in another org structure, except even worse.
I can answer that: because I'm sick of installing a hundred apps, one for every single company, and setting up accounts and verification so I can spend $10 and never use it again.
And when you are traveling and just taking the highway exit for a quick order, what, now I have to stop, park, download an app, and then order?
Or do you mean, like, mounting iPads in kiosk mode? I think that gets back to the expense and weather.
I think it gets better over time. It's like cars having no knobs or physical buttons anymore; we have to figure it out. We have big screens in cars now, so why isn't the menu integrated when you get close to your McDrive? Why not give each shop a virtual space in your car when you drive nearby?
And by the way, we all wanted employees to have better pay. It is just pure survival mode now for McD. Since paying humans is expensive now, they have to find ways to make them even more expendable, so that the company remains profitable for the franchisees and the product price stays low.
(And after all, was McD not a real estate company?)
I hate the self-serve kiosks inside the places... I'm relatively tall, and trying to use them is often an exercise in frustration. Especially when there's no mechanism to adjust the height or angle.
This is why McDonalds literally spends tens of millions of dollars per quarter giving away free food to attempt to get more people to just use the app.
Their plan longterm is exactly this - have a robot box with a window that only accepts orders through your phone and removes all humans from the process.
You might be surprised how many people absolutely hate using apps, and do not want to interact with a business through a mobile phone. Older folks especially, but also there is a large portion of the population that only use technology when necessary.
We built a heuristic model from scratch. We used BART for some NLP bits and Azure speech-to-text for the basic mic -> raw text step. My memory is very muddy on this part, as I didn’t work on the algorithm portion of the project; I was working on the UX workflows and the interaction/conversational workflow design and validation.
Fun fact: McDonald's menu is a graph database that tracks every single ingredient as a purchasable entity. You can order damn near any combination of elements and they will sell it to you. Want 32 pickles and the bottom piece of a bun with a chocolate chip cookie on top? No problem.
And every promotional item and name ever put on sale is persisted in the menu forever.
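A toy sketch of what "every ingredient is a purchasable entity" could look like, assuming a simple model where each menu item is a node with weighted edges (quantities) to ingredient nodes, and any custom order is just another bag of ingredients. The class names, prices, and the retired "old promo" item are invented for illustration; this is not McDonald's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ingredient:
    name: str
    price_cents: int  # hypothetical per-unit price


@dataclass
class MenuItem:
    name: str
    components: Counter  # edges: Ingredient -> quantity
    active: bool = True  # retired promos stay in the graph, just flagged inactive


PICKLE = Ingredient("pickle slice", 5)
BUN_BOTTOM = Ingredient("bun bottom", 20)
BUN_TOP = Ingredient("bun top", 20)
PATTY = Ingredient("beef patty", 120)
COOKIE = Ingredient("chocolate chip cookie", 99)

MENU = {
    "hamburger": MenuItem("hamburger", Counter({BUN_BOTTOM: 1, BUN_TOP: 1, PATTY: 1, PICKLE: 1})),
    "old promo": MenuItem("old promo", Counter({PATTY: 2}), active=False),  # hypothetical retired item
}


def price(components: Counter) -> int:
    """Price any combination by summing the ingredient nodes it points to."""
    return sum(ing.price_cents * qty for ing, qty in components.items())


def customize(base: Counter, add: Optional[Counter] = None, remove: Optional[Counter] = None) -> Counter:
    """Apply add/remove modifiers; Counter arithmetic drops non-positive counts."""
    result = base + (add or Counter())
    result -= remove or Counter()
    return result


# The commenter's example: 32 pickles and a bottom bun with a cookie on top.
weird = Counter({PICKLE: 32, BUN_BOTTOM: 1, COOKIE: 1})
no_pickle_burger = customize(MENU["hamburger"].components, remove=Counter({PICKLE: 1}))
orderable = [name for name, item in MENU.items() if item.active]
print(price(weird), price(no_pickle_burger), orderable)
```

Because pricing only walks ingredient edges, the 32-pickle order needs no special case, and keeping retired promos as inactive nodes mirrors the "persisted forever" behavior described above.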
> You might be surprised how many people absolutely hate using apps, and do not want to interact with a business through a mobile phone. Older folks especially
It is not just older folks. It is also anyone even mildly security- or privacy-aware. Apps offer far more opportunities for spying, tracking, and "advertising" (push messages), none of which I am interested in giving the business.
In my case, if an app is required, I move on to the next business where an app is not required.
> You might be surprised how many people absolutely hate using apps, and do not want to interact with a business through a mobile phone. Older folks especially
Yep, but not because I'm older but because:
1. I do not want McDonald's (or anyone else) to link my orders together. It's none of their business.
2. App fatigue. I don't even eat at McDs, but I don't want an app for every grocery chain I buy from either.
3. Any app, even if not "AI" based, is going to fail in more annoying ways than a human.
McDonald's has screwed up my order less than any other restaurant, provided better, warmer food than any other restaurant, is cheaper for the quality than any other restaurant, and their Ronald McDonald House charity rescued my family during our darkest hour.
>> There are over 380 Ronald McDonald Houses in 64 countries. These accommodate families with hospitalized children under 21 years of age (or 18 or 26, depending on the House), who are being treated at nearby hospitals and medical facilities. [...] Ronald McDonald Houses allow families to stay free of charge.
That hasn't been my experience at all. Not sure where you are, but in the Phoenix area, I have gone several multi-year periods not spending anything at McDonald's. They had messed up my order more times than getting it right.
Worst was going ketovore... how the hell hard is it to put two pieces of meat in a box without a bun and nothing on it? Or getting 3x "2 strips" of bacon and counting to 6.
This might be a case where a kiosk touchscreen is better. I've used them in-store. It's not a bad system, plus you would pay there instead of "pull up to the first window".
Plus a touchscreen works for hearing impaired people.
I would prefer jobs like burger flipping getting automated instead, because when the ordering lacks human touch it feels like getting served at the animal feeder. Although I use the kiosk often, I'm not happy about it.
Automating the jobs that humans are good at and pushing people to do repetitive, soul-wrenching jobs that machines should do better is the wrong path, IMHO.
The human at the counter sets the tone and the mood, which is part of the eating experience. That's true even at fast-food mega chains, even when the order-taking employees follow a script and repeat it time after time.
The best thing about touchscreens and apps is that I don't have to worry about the frontline human and the kitchen human misunderstanding my custom request. Now it's down to just the kitchen human, and they tend to make fewer mistakes it seems.
As far as the human at the counter setting the tone and mood: I have had very few fast food interactions where this human made it a positive experience. "Meh" at best, and often not-good to bad.
And that's assuming the underpaid kitchen staff, now at a cheaper hourly rate because they don't "need" front-line social skills, are literate enough to read your custom order and care enough to follow its instructions to the letter. By that point they certainly aren't paid to care about customer service requests.
The biggest challenge to this is weather. Do you really want to roll down your window and get soaked in a rainstorm to pick a #1 with a diet coke?
Most drive-thrus I've ever seen (Ohio, US) have zero cover and you are completely exposed to whatever mother nature wants to throw at you. Even the service window itself may have a tiny 24" awning that barely helps. In fact in my experience during rain it makes it worse as you get runoff right on top of the vehicle since they don't have rain gutters.
I would guess - and correct me if I'm wrong - that building a cover over that window would be much cheaper than this whole exercise they're trying. I mean, low-tech solutions don't become invalid just because of AI.
That may be because it freezes there, and there's a hazard of ice falling on vehicles, etc. In both Texas and California, the drive-through ordering station generally has a ~8x8' cover over it. It never freezes for more than a day or two in the populated areas of TX and CA.
I've seen lots of fast food places where there is covering. They avoid this problem by installing thick steel tubing in front of the covering so overheight vehicles hit that instead.
Those kiosks are a master class in bad UI/UX. Just to start your order, you have to tell it (twice!) that you don’t want to log in.
And where Amazon “pioneered” one-click checkout, what, 20 years ago? Those kiosks have a path like view cart -> checkout -> decline upsells -> don’t use app to pay -> choose credit card -> enter table number -> tap card -> choose receipt.
Each one of those is a whole new screen. On a 40”-ish monitor! It is insane, and could easily be one screen with a couple of options. And just tapping a card should checkout and pay at any time.
Not a fan of McDonald’s but I wanted to see what they were doing with self serve tech. Nothing good, it turns out.
My first experience with one was just trying to order a large black coffee. Took me probably three full minutes. Would have been, I dunno, maybe 20 seconds with a register and a human taking the order.
Another case, like self-checkouts, of “automation” that makes the customer do more work than it took a paid worker to accomplish the same thing, without actually automating a damn thing.
> you have to tell it (twice!) that you don’t want to log in.
That's the result of the advertising department desperately wanting to track you for their metrics.
> Each one of those is a whole new screen. On a 40”-ish monitor! It is insane, and could easily be one screen with a couple of options.
This was most likely the result of some UI designer fearing that they would scare away too many folks with choice fatigue if they put all the options on a single screen all at once.
Sadly they forgot to add a button for "I'm a technically adept user who would like a single screen with all options please" to let those of us who would not be scared away by that complexity select that version as an option.
If this is really a big money saver, you'll see fast food restaurants reorganize their restaurant to be more like old Sonics - multiple ordering stations that are serviced by the window.
If you've never seen a Sonic, it's organized like a gas station. Replace gas pump with kiosk.
Simply select what you want, have your favorites saved, and go. When I had my first BlackBerry, I envisioned that you'd be able to order everything there.
Apps are a huge dealbreaker for anyone who isn't a frequent customer. They are a huge hassle on first use, pretty much guaranteed to be privacy-invading, and won't stop nagging you with annoying notifications.
Ordering a "big mac with fries and a coke" at the drive-thru window takes less time than reading this comment. Why would anyone want to spend five minutes installing an app, creating an account, figuring out how the app works, and finally placing an order?
They do have an app that helps with that. But only a small percentage of customers use it.
The argument is: don't force all, 100% of customers to install an app, because it gets frustrating, and then you lose some percentage of customers who just walk away to order somewhere easier.
Cognitive load for installing "yet another app" is pretty real. It doesn't matter how easy or seamless the experience is, part of the value for an app is to make it easier for repeat customers to use their service, and make them sticky. That starts with wanting to buy from them in the first place though.
If I'm on a road trip, and I need a quick meal, and my choices are "buy from place that makes me use their app" or "buy from place that doesn't make me use their app", I'm gonna choose the option that is more convenient to me, which is the one that doesn't make me download anything, sign up for an account, and get pestered 2 hours after I have already forgotten about that particular fast food joint to let me know that if I only buy 39 more Big Macs I can get a free small fry.
I don't really want fast food to begin with. The food quality at every vendor has continuously gone downhill (or my standards have risen, or some combination of the two). If they want any hope of capturing my business still, they'll only get it if they keep that process as frictionless as possible.
The only time I dislike these kiosks is when they’re inexplicably laggy and horrible to trigger interactions. How does that happen?! It’s at least 2008.
Because they wait for a corporate server half the world away to log every click over a crappy 2G connection before they can advance to the next screen. How do I know? I built some telemetry devices that worked over 2G connections and tried to optimise the hell out of them (other hardware vendors look at you funny when you tell them your devices use only 2MB/month). Minimum response time was about 1.5s, and 20s was still pretty normal.
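A back-of-the-envelope sketch of what that means at a kiosk, assuming (hypothetically) that every screen transition blocks on exactly one server round trip, and borrowing the eight-screen checkout path listed in the earlier kiosk comment along with the 1.5s and 20s response times quoted here.

```python
# Checkout path from the earlier kiosk comment; assumes each transition waits on one round trip.
CHECKOUT_SCREENS = [
    "view cart", "checkout", "decline upsells", "don't use app to pay",
    "choose credit card", "enter table number", "tap card", "choose receipt",
]


def blocking_wait_s(screens: int, round_trip_s: float) -> float:
    """Total time spent waiting if every screen blocks until the server acknowledges the click."""
    return screens * round_trip_s


best_case = blocking_wait_s(len(CHECKOUT_SCREENS), 1.5)     # minimum response time quoted
normal_case = blocking_wait_s(len(CHECKOUT_SCREENS), 20.0)  # "still pretty normal"
print(f"{best_case:.0f} s best case, {normal_case:.0f} s normal case")
```

Under those assumptions, even the best case adds 12 seconds of pure waiting to a checkout, and the "pretty normal" case adds well over two minutes.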
With 2 kids and a wife who, I won't say are "picky" but definitely want their meal a specific way (and typically aren't quite ready to order)... the kiosk is a life saver.
Fortunately, in the store, we can both do this in our preferred way. Kiosks are there for those who prefer them, and those who don't can still talk to a human at the counter (although ordering from the counter has been made worse by the fact that the menu boards no longer seem to show the whole menu).
But in a drive-through, somehow, I don't think there will be options. I bet it would be one thing or the other.
Mostly, they're clumsy to use. It takes me much longer to place an order with one of those than to just tell a person what I want. They give me no benefit and make my experience worse. Especially when I've mistakenly chosen the wrong thing.
I also dislike having to use a screen rather than interacting with a person for this sort of thing in the first place, but that's personal preference rather than anything about the kiosk system in particular.
I think they do a great job telling me what's available and what the options are. Much better than standing at the counter and reading distant displays while talking to a cashier.
Those giant touchscreens are a petri dish for every germ imaginable. McDonald's are open 24 hours with all kinds of people tapping away on them. There's no way the screens are cleaned more than a couple of times a day.
> people need to open their car door and half lean out to reach the screen
Couldn't we just have the screen move towards the user, using a basic proximity sensor? We've had car wash systems adapt to the car's size for quite a few years now, so it should be a relatively cheap solution, no?
I really doubt it would be cheap. As soon as you bring motors and the electronics to control them into the mix, you're going to have an increased rate of failure. Even if the initial install is inexpensive (which I doubt), the maintenance costs would increase a lot.
Reading the comment above from noen, it seems McDonalds sabotaged the project as part of a plan for an acquisition.
When reading the article I also jumped to the conclusion that it was IBM's problem. But then I thought it sounded an awful lot like McDonald's was 'hinting' that it was IBM's problem while not being able to say so.
So kind of a reverse stab. McDonalds is acting like there is contract language saying they can't say what happened, and subtly making it seem like it was IBM when really McDonalds is saying: "It was IBM but we can't say because of contracts, wink wink, but really we were a horrible customer and wrecked the project and trying to spin this onto the vendor".
Makes sense. I think the technology will get there but there are most likely too many unknown variables in the ordering process. McD would rather you order through the app for pickup.
I remember seeing a WSJ video about the drive-thru AI chatbot from https://presto.com/ being used in a fast-food restaurant. To be honest, it works quite well; I wonder if McDonald's will switch to them. The service is already being used by Checkers, Hardee’s, Carl’s Jr., and Taco Johns.
I don’t have any actual data to inform my feelings (I’m sure it wouldn’t be hard to find), but any time this stuff comes up I have to wonder about the economics of building out the AI, instrumenting a store, building or buying/licensing software for the customers to interface with, the compute, bandwidth, etc…
Versus paying some kid $7 an hour to hit buttons on a screen.
As uninformed as I may be here, my gut says that it doesn’t make a lot of sense.
In addition to the environmental issues, surely the wide disparity in dialects and speech patterns must have played a role? ML/NLP models trained largely on clear, precise, and predictable English must struggle once you add in accents, slang, and age- or region-specific word choice?
Looking at the state of most drive-through electronics, this is pointless and won't matter.
Keeping that system working is already difficult enough with a highly adaptable human at the other end of the line who can interpret your order through a broken microphone, poor language skills, a loud motor interfering, and any weather.
And OpenAI cannot handle "please drive through to the first booth" if all else fails.
On another, completely off-topic anecdotal note: I wonder how it would interpret a horse. Back when I worked at the local McD, we had plenty of them going through the drive-through.
Surely you don't intend to imply that the horse is ordering. Given that, there are all manner of sensors to detect the presence of an automobile or horse in the drive-through and notify staff or the AI that someone wants to place an order.
My understanding is that this is a solved problem. Staff get an alert/announcement chime or recording about someone waiting because the electronics detected something. Why a horse would pose a challenge is a mystery to me. Sure, there are some devices that will work less effectively as sensors for a horse than a car, but your particular franchise just adapts to the correct sensors for its traffic.
> Surely you don't intend to imply that the horse is ordering.
Haha no (should have explained), but riders are too high to be heard due to the directional microphones installed and you only effectively hear the horse. Same for very high seated cars (though I suspect this might be different in the US). Trucks are too tall and would hit the building so they are not a consideration for the one I worked at.
As you said, a simple alert system might work to solve this.
The new one can interpret a broken-af accent with a baby crying on speaker at full volume just fine. If it can't understand, it just hands off to the normal operator. Also, the solutions seem to be all-in-one (AIO) rather than adaptations of existing tech, so I'd assume the microphone quality is very good, with noise cancellation built in.
What a play by McDonalds, sell the team to recoup the risk, sign some initial extension to make it feel worth it to IBM, then pull the rug. IBM nothing changes...
The price of errors is too high, I'd imagine. I see this everywhere in my own domain - AI stuff is super easy to demo and prototype, but running in production in scenarios where errors are not cheap is a recipe for disasters.
We are currently only pushing it where the existing process involving people is riddled with imperfections, and AI can do no worse, or even better, while still not being perfect.
I don't install apps from food companies on my phone. Some have already been caught collecting data nefariously. The reason they have the app is to steal your data. McDonald's used to run its summer drink and ice cream cone specials for everyone; now you have to use the app, so they can take your data. A lot of people just opt out.
> But the company did not dismiss the prospect of drive-thru AI, suggesting that McDonald’s plans to find a new partner for its automated order taking efforts.
Yeah, it's ending the current partnership, not the imminent doom of fully automated restaurants.