
Airbnb's preferred smart lock vendor accidentally bricks 500 door-locks - edward
http://boingboing.net/2017/08/14/airbnbs-preferred-smart-lock.html
======
tlb
I've pushed dozens of OTA updates to WiFi devices (telepresence robots), and
it's always terrifying. We'd test thoroughly, and we'd always start with
devices in our office, and wait to see them reboot before pushing to
customers. In addition, we had a scheme where we'd push an update, it'd
reboot, and if it didn't connect to our server within 20 minutes it'd revert
to the previous version and reboot again. WCGW? We never bricked one in the
field, but I can think of a million ways it could have happened.

The scariest times were pushing updates that affected WiFi connection for
customers with unusual network configurations. In particular, we had one with
a multi-campus Cisco WiFi system with WPA-Enterprise, which requires logging
into a RADIUS server to get WiFi. We were running on FreeBSD which had a half-
assed port of the Linux WPA authenticator. We never managed to replicate their
network perfectly for in-house testing, so pushing updates to them was always
nerve-wracking.

~~~
cperciva
_we had a scheme where we'd push an update, it'd reboot, and if it didn't
connect to our server within 20 minutes it'd revert to the previous version
and reboot again_

Was this handled in userland? Or with something lower level (e.g., a hardware
watchdog)?

I'm asking because the question has come up a few times recently about making
FreeBSD recover (unattended) in the event that an unbootable kernel is
installed.

~~~
tlb
Userland. Basically during upgrades we installed a shell script in
/etc/init.d:

    
    
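      # Give the new version 20 minutes to boot and get online.
      sleep 1200
      if ping -c 5 central-services.anybots.com; then
        # Update is healthy: remove this script so it doesn't run again.
        rm -- "$0"
      else
        # No contact with the server: reinstall the last known-good build.
        cd /home/dist && git checkout STABLE && make install
        reboot
      fi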
    

(We used git to distribute binaries, so we could checkout and install a
previous version easily).

This was a last ditch recovery. I could imagine lots of things that break, but
which still allow ping. And installing a previous version over a newer version
is flaky, because a newly created file won't be deleted.

An unbootable kernel is a whole different kettle of fish. I believe GRUB can
be told to fallback to a safe mode.

~~~
cperciva
_I believe GRUB can be told to fallback to a safe mode._

I'd assume so; FreeBSD's loader can also do that. The problem was that if a
kernel hung during boot, we wouldn't get back to the loader until someone
power cycled the system, so the "try the other kernel" logic wouldn't run.

~~~
srcmap
This is where setting up a HW watchdog should help.

After the upgrade, before the reboot, set the HW watchdog for 2-5 minutes.
Set the boot-old-kernel flag as the default in U-Boot/EFI/GRUB or whatever
your system boot loader is.

If the upgrade is successful (the device connects back to the server) and
functions correctly, clear the "boot-old-kernel" flag in your bootloader.

As always, do a lot of testing and automate those tests. You will be surprised
how often a simple boot/reset/power cycle can fail if you repeat it over and
over again over a weekend.

~~~
cperciva
Hmm, you can set a watchdog before rebooting? I assumed that it would be
cleared automatically.

------
twblalock
Stuff like this is why I don't own any "smart" devices.

I never think about my deadbolt unless I am opening my door. I never think
about my smoke alarm unless it starts to chirp because of a low battery. I never
think about my light bulbs unless they burn out.

Any "smart" features, like notifications or a phone app, are going to make me
spend more time thinking about these products than I care to, even if they are
working perfectly. Failed OTA updates make things even worse. That door lock
is going to need to introduce an extremely compelling new feature to convince
me to spend time thinking about such a basic component of my home's
infrastructure.

~~~
notjustanymike
I think the feature is fairly compelling in this case. Smart locks allow
Airbnb hosts to grant limited entry to guests without sharing their house
keys. Any frequent Airbnb renter runs the risk of their keys being duped and
used for illicit access later on. While the implementation still needs work
(obviously), the core idea is sound.

~~~
mikeash
Worth noting that nearly every hotel has used smart locks for years. This is
essentially the same.

~~~
busterarm
Every hotel has on-prem staff for dealing with faulty locks and keys.

Many Airbnb owners do not.

------
raverbashing
I've worked with embedded device makers that had better ideas about building
redundancy and failsafe procedures into their products.

While software mistakes do happen, it seems any moron these days thinks they
can put out an IoT product because they know how to blink an LED on an Arduino,
and they make _no effort whatsoever_ to think about how to give their devices
even minimal reliability.

Have a user accessible USB port (from the inside) from which a signed payload
can be loaded for cases like these. Would have saved several days for their
customers and several shipping fees.

~~~
striking
You could also do A/B partition booting. Whatever partition is currently
running, the other one gets updated. When the update is complete, the system
reboots to the updated partition. If it doesn't survive the reboot (like if
the system never reaches the app and the hardware watchdog timer stops being
reset), then it reboots again, back to the first partition.
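
A rough sketch of that try-then-fall-back logic, simulated in plain shell. This is not any particular bootloader's API: the state file stands in for the bootloader's persistent environment, and every name in it (`STATE_FILE`, `active`, `pending`, `tries_left`, and the function names) is a hypothetical chosen for illustration. It also assumes a hardware watchdog resets the device when a boot hangs, so that `choose_slot` gets to run again.

```shell
#!/bin/sh
# Simulated A/B slot selection. STATE_FILE is a stand-in for the
# bootloader's persistent environment (hypothetical, for illustration).
STATE_FILE="${STATE_FILE:-/tmp/ab_boot_state}"

get_var() { sed -n "s/^$1=//p" "$STATE_FILE"; }

set_var() {
    # Copy every other key, then append the new value for this key.
    grep -v "^$1=" "$STATE_FILE" > "$STATE_FILE.tmp" 2>/dev/null
    echo "$1=$2" >> "$STATE_FILE.tmp"
    mv "$STATE_FILE.tmp" "$STATE_FILE"
}

other_slot() { [ "$1" = "A" ] && echo "B" || echo "A"; }

# Updater side: the new image has been written to the inactive slot,
# so ask for it to be tried a limited number of times (default 1).
stage_update() {
    set_var pending "$(other_slot "$(get_var active)")"
    set_var tries_left "${1:-1}"
}

# Bootloader side: print the slot to boot this time.
choose_slot() {
    pending=$(get_var pending)
    tries=$(get_var tries_left)
    if [ -n "$pending" ] && [ "${tries:-0}" -gt 0 ]; then
        set_var tries_left $((tries - 1))   # consume the attempt before booting
        echo "$pending"
    else
        set_var pending ""                  # attempts used up: abandon the update
        get_var active
    fi
}

# Application side: call only once the new system has proven healthy.
confirm_boot() {
    pending=$(get_var pending)
    [ -n "$pending" ] || return 0
    set_var active "$pending"
    set_var pending ""
    set_var tries_left 0
}
```

The attempt counter is decremented before the new slot boots, so a hang or crash at any later point still leaves the old slot as the next choice. Only an explicit `confirm_boot` from the running application makes the switch permanent.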

~~~
Duhck
This is the common (and correct) solution.

In this scenario, the biggest problem is updating the bootloader, which cannot
do this type of fallback.

Generally people avoid updating the bootloader (for good reason) but it is
necessary sometimes...

~~~
sliverstorm
Of course, the good thing about the bootloader is it is less likely to need
updates in the first place, and it's also likely easier to verify an update
with a high level of confidence, as its functionality is so tightly bounded.

------
OkGoDoIt
I own 2 of these smartlocks (for my theater and my speakeasy, it's nice to be
able to let performers/crew in on set schedules with SMS confirmation). They
have both been down since last week. It's been hugely annoying, especially
considering we paid over $600 per lock. They overnighted us new circuit boards
that we have to manually replace, but the timing was horrible as both my
partner and I were out of town all weekend and couldn't do the repair, so it
impacted load-in for a show on Sunday.

These locks have some nice advantages but sometimes I feel like they are way
too internet-reliant and a bit outside my control. Furthermore the company
isn't exactly the picture of professionalism (their iOS app has actually
gotten worse during the year+ I have owned the locks) and I wonder if there
will be any functionality left in my $600 locks if/when they go out of
business someday.

------
busterarm
I hate just posting because I have an axe to grind here but wifi locks for
airbnb are an incredibly stupid thing to have for all sorts of reasons.

Just get normal locks and hire a damn property manager. When I used to work
for an ISP in a resort town, our support desk would get tons of calls when
power went out (a frequent issue there) or our service went out.

It was always property owners who lived multiple states away who couldn't get
their renters access to their property. Always furious. Always without someone
local who could let someone on to the property.

~~~
sargun
What prevents someone from copying the key?

~~~
busterarm
[https://news.ycombinator.com/item?id=15012746](https://news.ycombinator.com/item?id=15012746)

------
org3432
I had a flight get delayed because a 737-800 needed an OS reinstall. The OTA
update failed and they had to send someone out to connect to the USB port to
recover the OS. The plane was essentially bricked and couldn't take off.
Fun.

~~~
raverbashing
They don't do OTA updates for planes.

~~~
jlgaddis
Seems like it would save a lot of time, energy, and money if they did
(assuming they had the procedure worked out). Push out an update while it's
sitting at the gate, powered down, instead of taking it out of service and
having to bring it into a hangar (where space is already limited).

~~~
raverbashing
This is a plane, not a consumer device.

Nothing gets changed unless it's for maintenance or airworthiness reasons (or
fuel economy).

Updates can be done during scheduled maintenance if there's a good reason for
it (see above). If it's something really urgent, it is done overnight if
possible, or the plane gets scheduled for this specific maintenance.

------
lemoncucumber
From looking at the product website, it looks like these locks have
traditional keys as well, so I would hope that nobody is locked out of their
own house while waiting for their lock to be fixed. The article didn't provide
much detail, so I was picturing people getting locked out of (or worse, locked
into) their own apartments.

What will it take to get IoT device manufacturers to adopt development
practices that prevent this kind of thing happening? It's bad enough that most
software is bug-ridden and poorly tested, but it's inexcusable for IoT
software if they ever expect IoT devices to replace traditional alternatives.

~~~
maxerickson
Probably stiff regulation.

I wonder if a certification body (like UL) could work though. Problem is, it
is hard to boil "works good" down to a set of clear requirements.

~~~
jlgaddis
Agreed. I don't see any major changes happening (WRT IoT insecurity) until
there's regulation that requires it. The industry certainly isn't trying to
solve this on their own (if they are, they're hiding it very well), they're
only worried about getting to market as quickly and cheaply as possible.

I expect this to continue getting worse. At some point, hopefully, things will
change and begin to improve. I think we'll see many more -- and worse --
incidents like this before that happens, though.

------
mi100hael
Why are they even pushing out OTA updates for a deadbolt? Who designs a
deadbolt that locks/unlocks via expiring codes and thinks "we're going to need
to release so many updates for this that we should use OTA updates instead of
the customer manually applying an occasional patch."

------
falcolas
Frankly, this is why OTA updates for critical infrastructure (locks, heating,
phones, vehicles) always scare me. It's too frequent an issue that someone's
device gets bricked as a result, leaving folks with no practical remedy in
many cases.

~~~
swiley
The users should be the ones updating it, and the ones paying the ISP bill if
it gets hacked.

------
jacquesm
This is why I always look extremely carefully at companies that operate
machinery (cars, machine tools and other connected devices) and do OTA
updates. This sometimes leads to interesting discoveries and blind spots. Who
ever thought that updating software connected to the cars' buses without first
checking that the car is stationary was a good idea?...

------
ChuckMcM
Ouch. Clearly someone missed a possible outcome on their OTA update flow.
Worse, they didn't devise a way that you could locally get the device sane
enough to do another OTA. Counts as a double fault :-(

------
post_break
Lockitron locked me out of my apartment. Danalock jammed in the middle of
unlocking and made it very difficult to get inside. You know what works 24x7?
My car's door. Why is this so hard?

~~~
busterarm
Where are we going to put a car battery and the means to charge it in your
door frame?

Are you going to operate a crank to recharge the battery? I'm not too keen on
wiring mains power up to something connected to my door handle.

~~~
PhantomGremlin
_I'm not too keen on wiring mains power up to something connected to my door
handle._

This is a solved problem:
[https://en.wikipedia.org/wiki/Isolation_transformer](https://en.wikipedia.org/wiki/Isolation_transformer)

~~~
busterarm
They can fail though (and I've seen it happen: overload, excessive moisture,
etc.). How do you know this before you touch the doorknob and get the
(possibly final) zap of your life?

If the failure mode is potentially "you die", then we're probably better off
low tech.

------
gubby
14 days for a replacement is an atrocious amount of time.

~~~
mikeash
I can't understand it. You can mail them the guts of your lock, have them fix
it, then mail it back to you, all in a week or less. Or they can just send you
a new one, and it takes at least two weeks? Do new ones come by courier snail
from China, or is this just a scheme to discourage people from opting for a
replacement, or what?

~~~
turtlebits
It says 14 days lead time. I'm guessing they don't have the parts in stock for
500+ devices affected.

------
foobaw
I've pushed OTA updates to smartphones before: there is a rigorous
certification process that the software has to go through before reaching the
customers. We've had times where we had to push emergency OTAs soon after, but
never a time where we had to do any sort of recall due to an OTA.

I'm certain that these vendors don't have nearly as many checks as
smartphones do, since they have much less overhead (like we had from carriers,
Google, SoCs), but it's still astonishing that this could happen. It's a
good lesson for them: mistakes happen, and I hope they implement a system
where this can't happen again.

------
mschuster91
Happens to Google Chromecasts kinda often, judging by a quick Google search,
and you can't reflash them via USB. Same for phones: cheap ones especially
tend to brick themselves on updates, and it's a hassle to get them reflashed.

The solution is simple, though: use U-Boot, two partitions for firmware and a
tiny script using a boot counter/flag. Only after a successful boot, set the
flag "do not boot back to old firmware". But doing that well requires recent
versions of U-Boot, and people are STILL shipping devices with 2012 or earlier
versions...

------
frgtpsswrdlame
We/the media really should have picked a word other than "smart" for these
devices.

~~~
nxsynonym
I think "connected" would be a good replacement term.

Smart implies some sort of intuitiveness or responsiveness, which is often not
the case.

~~~
duncanawoods
Not bad. Could go further and call them "dependent" to make it unambiguous
that they are not standalone devices but out of your control and potentially
worthless if the connected services close or fail.

~~~
jlgaddis
The Marketing Department would never approve using that term!

------
adrianpike
Does anyone here have experience with Mender or similar? I'm looking to do
some IoT work and over-the-air updates are still my biggest unsolved problem.
I'd love any guidance on any good tools to research & investigate.

------
sjbase
This type of headline is why large enterprises are often careful about working
with unproven 3rd parties. And consequently why the enterprise sales cycle
feels so slow and bureaucratic.

LockState makes the mistake, and AirBnB takes collateral damage.

------
stretchwithme
Tesla's doing OTA pretty robustly. Maybe there just isn't enough space in a
door lock to do it properly.

If a system can go back to the last working version if the latest fails to
work, even a bad version shouldn't brick the device.

~~~
com2kid
> Maybe there just isn't enough space in a door lock to do it properly.

Doubtful. Dead simple OTA with recovery requires a 2x allocation of space. $2
or so of parts covers it for 99% of embedded scenarios, and if the product
doesn't have enough storage to do a recovery OTA then the product wasn't
spec'd properly. More clever but just as robust systems can get away with less
(however much a recovery partition takes up, if doing WiFi that can be a lot).

> If a system can go back to the last working version if the latest fails to
> work, even a bad version shouldn't brick the device.

Knowing when the current version has failed can be hard. If the system comes
up 99% of the way but one driver fails, that can be hard to determine
programmatically. Obviously the manufacturer here didn't have tests that
robust running at startup to determine whether a rollback was needed.
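
One way to approach this is to run several independent startup checks and treat any single failure as grounds for rollback. A minimal sketch, assuming each check can be written as a shell command line that exits non-zero on failure. The runner and the example checks are hypothetical, not what any vendor in this thread actually ships.

```shell
#!/bin/sh
# Run every check passed as an argument (each one a shell command line).
# Prints one line per check and returns 0 only if all of them passed.
run_health_checks() {
    failed=0
    for check in "$@"; do
        if sh -c "$check" >/dev/null 2>&1; then
            echo "ok:   $check"
        else
            echo "FAIL: $check"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Example use after an update (the second check is a hypothetical daemon name):
#   if run_health_checks \
#        "ping -c 5 central-services.anybots.com" \
#        "pgrep -x lockd"
#   then
#       echo "healthy: keep the new version"
#   else
#       echo "unhealthy: roll back"
#   fi
```

Every check runs even after an earlier one fails, so the log shows everything that is broken, not just the first problem. The hard part remains choosing checks that cover the "99% of the way up" cases.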

To be fair, most software updates don't have that level of rigor. Pre-release
testing catches the majority of issues, as it should. Nowadays, phased
rollouts to opt-in beta testers are standard to try and catch any remaining
issues.
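
For illustration, a phased rollout can be as simple as hashing each device ID into a stable bucket and only offering the update to buckets below the current percentage. This is a sketch under assumptions: the function names and the device ID format are made up, and it uses the POSIX `cksum` CRC only because it is available everywhere, not because it is an ideal hash.

```shell
#!/bin/sh
# Map a device ID to a stable bucket in the range 0..99.
rollout_bucket() {
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $((crc % 100))
}

# Succeeds if the device is inside the first $2 percent of buckets.
in_rollout() {
    [ "$(rollout_bucket "$1")" -lt "$2" ]
}

# Example: offer the update to roughly 5% of the fleet first.
#   if in_rollout "device-0001" 5; then echo "offer update"; fi
```

Because a device's bucket never changes, raising the percentage from 5 to 25 to 100 only ever adds devices. Nobody who already got the update drops out of the rollout.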

