
Launch HN: Memfault (YC W19) – Crashlytics for Firmware - fra
Hi everyone!

We're Chris, François, and Tyler, founders of Memfault
([https://memfault.com](https://memfault.com)). Memfault helps firmware teams
find and fix issues before customers start calling (or worse, tweeting!) by
providing a small <3kB SDK to include in the firmware and a web dashboard to
manage releases, monitor devices, and view crashes. In the software world,
Crashlytics, Sentry, and other error monitoring systems have been offering
similar solutions for years. Memfault is the first such solution for firmware.

Embedded devices today are very different from ones built 10 years ago. Then,
a device would run a small piece of firmware in a while() loop, capture input,
compute some logic, write to a small 7-segment display, and that was about it.

Today, new products have a wireless connection to the internet, a bright
320x320 full color LCD, a high quality microphone and speaker for Alexa
integration, and sometimes even run machine learning or computer vision
algorithms on device! Building hardware products in 2019 is a significant
software project, and it requires software tools.

The three of us met at Pebble in 2013, where we shipped 4 watches together.
Chris and Tyler went on to work at Fitbit, while François went to Oculus. Each
time, we found ourselves building all of our tools from scratch, which slowed
us down tremendously. Imagine having to build a log collection solution every
time you want to build a new web app!

As a result of the effort required to build them, the tools available to
firmware engineers are not up to the task. For example, the state of the art
in debugging requires connecting a physical debugger to your board. To
investigate an error report from the field, customers must be contacted,
devices shipped back, and enclosures disassembled. By the time this is all
done, flash logs have rolled over, variables have reset, and developers are
left scraping together raw data from flash to debug the issue. It can take
weeks to get to the bottom of an issue that would be root-caused in minutes
with reasonable tools.

We've long wanted to show people what Memfault can do without the hurdle of
integrating our SDK into their code. Today, we are launching a zero code,
try-it-at-your-desk version of our tool available at
[https://memfault.com](https://memfault.com) (click on the "Try Memfault"
button). In about 5 minutes, you should be able to connect an ARM Cortex-M
based development board and upload an error report using a GDB script. If you
do not have a board, you'll be able to interact with an example error report.

We could go on at length about the implementation (ask us questions in the
comments!). One thing we're especially proud of is the "Globals & Statics" tab,
which lets you query the state of any static or global variable in your
system. To get this to work, we cross-compiled libdwarf to wasm via Emscripten
and used it to implement parts of an in-browser debugger which can be used to
look up values for a known symbol given an ELF file and a Memfault core file.

We'd love to hear what you think, and find out what other tools you've found
helpful in this space. Looking forward to the discussion!
======
redfern314
1\. Any concerns about privacy? Even if Memfault is one-way (as you mentioned
in a different comment), that doesn't mean that important user information is
not exposed. Battery SOC and last-seen stats aren't completely harmless.

2\. Maybe this will be clearer when you release docs on the SDK - do you
provide interfaces for normal logging in addition to just crash logging?
_Ideally_, firmware applications should never crash, but unexpected logic
states or invalid user input happen all the time.

3\. How are you expecting licensing to work? Per device? Monthly subscription
fee? Flat fee software purchase?

4\. Are your libraries ASIL or FDA certified to allow use in the automotive or
medical industries? What are the reliability/safety implications of wrapping
your main binary in Memfault's monitoring interface?

~~~
fra
> Any concerns about privacy? Even if Memfault is one-way (as you mentioned in
> a different comment), that doesn't mean that important user information is
> not exposed. Battery SOC and last-seen stats aren't completely harmless.

Yes - some of the data is sensitive. We encrypt the data, use an aggressive
expiry policy (2 weeks by default), and work with our customers to limit PII.
Memfault does not know who the end user of the device is.

> Maybe this will be clearer when you release docs on the SDK - do you provide
> interfaces for normal logging in addition to just crash logging? Ideally,
> firmware applications should never crash, but unexpected logic states or
> invalid user input happen all the time.

Currently, we provide APIs for data logging ('telemetry') and error logging.
Note that errors do not have to be crashes. You can send Memfault a trace for
user defined issues (e.g. "bluetooth failed to connect") or even no issue at
all.

> How are you expecting licensing to work? Per device? Monthly subscription
> fee? Flat fee software purchase?

It's a monthly subscription fee (not per device).

> Are your libraries ASIL or FDA certified to allow use in the automotive or
> medical industries? What are the reliability/safety implications of wrapping
> your main binary in Memfault's monitoring interface?

We are not currently certified, but this is something we know we'll have to
do. Our error reporting only runs when an error is encountered, not during
normal operation. Our telemetry collection can run on a timer, and a bug in
our code there could impact your device.

------
borgel
Hi! I write firmware professionally and this looks pretty amazing. I've
already signed up to play with the tool and submitted a demo request :)

That being said, the thing I'm most interested in here is how to integrate
Memfault with my codebase, and that's the only thing I can't figure out! Your
docs pages are quite pretty, but don't include the interesting bits! Clicking
through into the demo doesn't really help.

Any chance you'd consider publishing that to the site?

~~~
tyhoff
Thanks for the kind words, and looking forward to the demo! You are right, our
documentation leaves a lot to be desired. We are working on it!

You can find some of the more interesting bits at
[https://github.com/memfault/memfault-firmware-sdk](https://github.com/memfault/memfault-firmware-sdk),
which is our public-facing firmware SDK. This gives a rough idea of the steps
necessary to implement the coredump features of Memfault.

~~~
borgel
Great, thanks!

(Before digging into the SDK to see if this exists), is there any chance you'd
support some form of "custom transport"? In the system I'm working on the
micro is only connected to the network via a single board computer through
which I'd need to shim the Memfault connection.

~~~
fra
Yes, we can accommodate a custom transport. In a way, this is just a general
application of what we do for the BLE transport: break up data structures into
fixed-size packets, send them over the link, and reassemble on the other side.
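
A minimal sketch of that packetize-and-reassemble idea, in Python for brevity
rather than firmware C. The 6-byte header layout (sequence index, packet
count, valid byte count) and the function names are assumptions made for
illustration; they are not Memfault's wire format or SDK API.

```python
import struct

# Hypothetical packet header: sequence index, total packet count, valid body bytes.
HEADER = struct.Struct("<HHH")


def packetize(payload: bytes, packet_size: int) -> list[bytes]:
    """Split payload into fixed-size packets; the last body is zero-padded."""
    body_size = packet_size - HEADER.size
    if body_size <= 0:
        raise ValueError("packet_size must exceed the header size")
    bodies = [payload[i:i + body_size] for i in range(0, len(payload), body_size)] or [b""]
    return [
        HEADER.pack(seq, len(bodies), len(body)) + body.ljust(body_size, b"\x00")
        for seq, body in enumerate(bodies)
    ]


def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the payload from packets received in any order."""
    bodies = {}
    total = None
    for packet in packets:
        seq, total, length = HEADER.unpack_from(packet)
        # Keep only the valid bytes; padding in the last packet is discarded.
        bodies[seq] = packet[HEADER.size:HEADER.size + length]
    if total is None or len(bodies) != total:
        raise ValueError("missing packets")
    return b"".join(bodies[seq] for seq in range(total))
```

With `packet_size=20`, a 50-byte report becomes four packets, and
`reassemble` accepts them in any order. A custom transport only has to carry
those opaque packets across the link.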

------
ccamrobertson
Congratulations! We rebuilt our diagnostics and firmware updating tools
multiple times over the years for Lockitron; it was a massive pain each time.

How do you handle log caching and retrieval for offline devices (e.g.
Bluetooth)?

~~~
fra
We have SDKs for iOS, Android, and other gateway devices to push logs up to
the cloud over Bluetooth.

------
inglor
This looks great and like it solves a big real problem people have. Congrats
on doing this!

How do you deal with security? IoT devices are infamous and having one with a
debugger open to the world terrifies me.

~~~
fra
Currently, Memfault is one-way so it is not quite like having JTAG access to
your device from the cloud.

But it still needs to be secure, and we typically encrypt all data going from
the device to the cloud (some devices, sadly, do not have the ability to do
the encryption).

Edit: removed double negative.

~~~
redfern314
How can it be one-way if you can push new releases to the device?

~~~
tyhoff
Rather than "pushing" releases and data to each device, the devices query for
the URL to the latest firmware (if any).

~~~
dwndwn
This still isn't one-way. What protocol parsers are you implementing in
firmware to do this?

~~~
tyhoff
It's up to the firmware or customer architecture to decide that. Many
companies in the industry use an S3 bucket to publish firmware binaries to
their devices, and these binaries are read by hubs, mobile applications,
connected Linux boxes, and yes, sometimes firmware devices themselves.
Memfault provides a couple of layers on top of S3, allowing the customer to
group devices into cohorts and do staged roll-outs.
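
A rough sketch of what a pull-based update check with cohorts and a staged
roll-out could look like on the server side. Everything here is hypothetical
(the release table, the URLs, the function names, and the hash-bucket scheme);
it only illustrates the pattern, not how Memfault implements it.

```python
import hashlib
from typing import Optional

# Hypothetical release table: cohort -> (version, URL, percent of cohort enabled).
RELEASES = {
    "beta": ("1.3.0", "https://example.com/fw/1.3.0.bin", 100),
    "production": ("1.3.0", "https://example.com/fw/1.3.0.bin", 10),
}


def rollout_bucket(device_id: str) -> int:
    """Map a device ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(device_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def latest_release_url(device_id: str, cohort: str, current_version: str) -> Optional[str]:
    """Return the firmware URL to fetch, or None if the device is up to date
    or not yet included in the staged roll-out."""
    release = RELEASES.get(cohort)
    if release is None:
        return None
    version, url, percent = release
    if version == current_version:
        return None
    if rollout_bucket(device_id) >= percent:
        return None  # device is outside the enabled slice of its cohort
    return url
```

Because the bucket is derived from a hash of the device ID, a device stays in
the same slice as the percentage is raised, so widening a roll-out from 10 to
50 only adds devices and never removes any.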

------
tehlike
Add experimentation, canarying, rollbacks and monitoring (any metrics
relevant) and you have a winner.

~~~
fra
Hah! Thanks for the suggestions. We're determined to do all of those over
time.

~~~
tehlike
It is an area I am somewhat working on and very experienced in, and would be
happy to provide anecdotes and ideas any time.

myhnusername@gmail.com

~~~
fra
Edit: emailing you now.

That would be great! Send me an email: francois at memfault. I'd be thrilled
to chat (or grab a coffee if you're in the bay).

------
ryanworl
What data warehouse or other analytics system are you storing the data in?

~~~
fra
We took a page from Heap's book and ultimately store the data in Postgres
databases.

~~~
ryanworl
That's probably a good idea to get to a usable product. You may want to
investigate a proper data warehouse if your workload primarily consists of
large scans and aggregations, such as if you offer a user-facing dashboard
which can generate arbitrary queries.

Does your data have a fixed structure, or can customers send essentially
whatever they want and you have to deal with it by e.g. storing a JSON blob in
each event?

~~~
fra
Do you have recommendations for data warehousing? Our data does have a fixed
structure at the moment.

~~~
ryanworl
BigQuery and Snowflake are the two managed services I'd recommend today if
you'd like good performance and cost-effective storage. They both separate
compute from storage so that your cold data isn't sitting on expensive SSDs
like your Postgres instances are probably using.

They're both also significantly faster than Postgres at large scans and
aggregations.

Snowflake is the most interesting to me because they offer a semi-structured
data type called VARIANT which efficiently encodes semi-structured data in a
column-wise format while losing only a tiny bit of performance compared to a
fixed schema. This could let your customers send semi-structured or variable
size data (like arrays or maps with arbitrary keys) and still keep your
dashboards fast.

If you'd like to chat more, I just requested to connect with you on LinkedIn.

~~~
wikibob
BigQuery is /terrific/, for in-house analytics. It would very likely not be
appropriate for backing a SaaS, at $5 per terabyte scanned.

I would suggest the OP is just fine with Postgres for a while. They can shard
it when needed.

Then eventually they can either get more sophisticated with Postgres sharding,
or move to something like TiDB, ClickHouse, or another event store.

------
cushychicken
Ha, I've been waiting to see this pop up on HN after seeing all the blog posts
on /r/embedded.

Best of luck with the launch!

~~~
fra
Thanks! Hope you've enjoyed reading Interrupt, let me know if there's a topic
you'd like us to write about.

~~~
cushychicken
The bootloader and linker script pieces were both quite good, I thought.

Would love one on bootloader/firmware updates.

------
kaycebasques
One of those "obvious in retrospect" ideas. The IoT startup I worked at
probably would have used this.

------
fest
I did browse the site, but couldn't find exactly how telemetry/faults get
passed from the microcontroller to their backend.

For the last products I have touched, this would probably be the toughest
part: abstracting/reimplementing whatever mechanism the device is already
using to communicate with something that may have internet access (USB, UART,
Bluetooth, LoRa) and tying in that end (mobile/desktop/connected device).

~~~
fra
The connectivity story is indeed one of the major complications.

Here's a high level overview on how we deal with it: let's say you have a
device connected via UART to a Linux box with WiFi.

1\. When an error occurs, the Memfault library collects all the needed
information and saves a packed error_report_t in a circular buffer in non-
volatile storage (say, flash).

2\. When connectivity is available, your code calls our SDK and says "hey, can
you give us N bytes of data to send". If Memfault has data in the circular
buffer, it returns a chunk of N bytes. Otherwise it returns false.

3\. You use your transport to send the N bytes packet to the Linux system.

4\. Your code on the Linux box calls the Memfault Gateway SDK to tell it
you've received N bytes of Memfault data.

5\. The Memfault Gateway SDK recombines packets into error_report_t and HTTP
POST-s them to our backend.

Does that make sense? Happy to talk about it in more detail.
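
The flow above can be modeled in a few lines. This is a sketch in Python, not
the Memfault SDK: the class names, the 4-byte length prefix, and the bounded
queue standing in for the flash circular buffer are all assumptions made for
illustration.

```python
import struct
from collections import deque
from typing import Optional

LENGTH_PREFIX = struct.Struct("<I")  # hypothetical framing: 4-byte little-endian length


class DeviceReportStore:
    """Stands in for the circular buffer in flash (steps 1 and 2)."""

    def __init__(self, max_reports: int = 8):
        self._reports = deque(maxlen=max_reports)  # oldest report is dropped when full
        self._pending = b""  # unsent bytes of the report currently being drained

    def save(self, report: bytes) -> None:
        self._reports.append(LENGTH_PREFIX.pack(len(report)) + report)

    def get_chunk(self, n: int) -> Optional[bytes]:
        """Return up to n bytes to send, or None when nothing is queued."""
        if not self._pending:
            if not self._reports:
                return None
            self._pending = self._reports.popleft()
        chunk, self._pending = self._pending[:n], self._pending[n:]
        return chunk


class GatewayReassembler:
    """Stands in for the gateway side (steps 4 and 5)."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Accept a chunk and return any reports that are now complete."""
        self._buffer += chunk
        complete = []
        while len(self._buffer) >= LENGTH_PREFIX.size:
            (length,) = LENGTH_PREFIX.unpack_from(self._buffer)
            end = LENGTH_PREFIX.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this report
            complete.append(bytes(self._buffer[LENGTH_PREFIX.size:end]))
            del self._buffer[:end]
        return complete
```

Your transport (step 3) sits between the two: loop on `get_chunk(n)` until it
returns `None`, send each chunk over UART, and pass what arrives to `feed`. A
real gateway would then HTTP POST each completed report to the backend; that
call is left out here.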

~~~
fest
Makes perfect sense. I would assume you also have to re-implement the routines
writing the registers/memory to the non-volatile device, as you can't rely on
peripheral registers being consistent. Ideally the host should have
independent access to the said volatile memory, but that's getting close to
implementing a debugger on host which uses JTAG/SWD to inspect the state of
MCU after a crash.

------
qntty
Any plans to support x86 CPUs?

~~~
lostgame
I'm curious as well why you'd be using x86 processors in embedded devices.
Care to share?

~~~
willangley
(Also not the parent) I've seen this in robotics; a lot of code in that space
has been tested for a long time on x86 computers [^1], and only more recently
been looked at on ARM.

I think this is also common in industrial embedded systems, since I
periodically see ads to buy hardware for them [^2]. I'm not entirely sure why
:).

[^1]:
[http://wiki.ros.org/Robots/TurtleBot/Robot%20Setup](http://wiki.ros.org/Robots/TurtleBot/Robot%20Setup)
[^2]:
[https://www.logicsupply.com/computers/nuc/](https://www.logicsupply.com/computers/nuc/)

------
ggregoire
> In the software world, Crashlytics, Sentry, and other error monitoring
> systems have been offering similar solutions for years. Memfault is the
> first such solution for firmware.

Just curious, why can't Sentry be used in a firmware? (I don't do firmware
dev)

~~~
fra
There are a few concepts that do not map neatly:

1\. For firmware, each user is on their own hardware. Rather than a session
you need to track a device and the state thereof. Devices exist for a longer
period of time than sessions do, and you need to have a concept of "device
history".

2\. Sentry assumes backtraces can be generated on the client side, which is
impractical for firmware.

3\. Our focus on embedded allows us to run some more specific analysis. For
example we automatically detect if your MPU (memory protection unit) is
misconfigured on an ARM chip.
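
To illustrate point 2: the device can send raw addresses and let the server
resolve them against the build's symbols. The sketch below assumes a
hypothetical, hand-written symbol table; a real backend would extract it from
the firmware's ELF file and use the debug info for line numbers and stack
unwinding.

```python
from bisect import bisect_right

# Hypothetical symbol table: (start address, size in bytes, function name).
SYMBOLS = sorted([
    (0x08000100, 0x40, "Reset_Handler"),
    (0x08000140, 0x120, "main"),
    (0x08000260, 0x80, "ble_connect"),
    (0x080002E0, 0x30, "HardFault_Handler"),
])
STARTS = [start for start, _, _ in SYMBOLS]


def symbolicate(address: int) -> str:
    """Translate a raw program-counter value into 'function+offset'."""
    # Find the last symbol whose start address is <= address.
    index = bisect_right(STARTS, address) - 1
    if index >= 0:
        start, size, name = SYMBOLS[index]
        if address < start + size:
            return f"{name}+0x{address - start:x}"
    return f"0x{address:08x}"  # unknown address: leave it raw


def symbolicate_backtrace(addresses: list[int]) -> list[str]:
    """Symbolicate each frame of a backtrace captured on the device."""
    return [symbolicate(address) for address in addresses]
```

The firmware side stays tiny this way: it stores a handful of 32-bit
addresses, and all string handling happens off-device.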

~~~
btashton
I built a translator service between my devices and Sentry to take the memory
dump and a known symbol file to rebuild the context serverside and send it
back to Sentry. I only used this during development and early testing, but it
was super useful.

At one point I thought about building this out more, so I'm glad to see
someone is taking this on more seriously.

~~~
tyhoff
We are taking it very seriously ;)

Happy to hear that you thought to build this translation service even for
early development! It's usually an afterthought at the companies we've talked
to, and an expensive one too.

------
bibabaloo
Do you have any plans to support embedded Linux IoT devices? Or are you
strictly focussing on tiny devices running without an OS or with an RTOS?

~~~
fra
We plan to support embedded Linux in the future.

------
rkul
Any plans on adding Qualcomm (Bluetooth) chips?

~~~
fra
Do you mean the CSR family? We don't currently support the XAP architecture,
but it is technically doable and we would do it with the right partner. Get in
touch if you want to chat!

------
person_of_color
Are you hiring?

~~~
fra
We've just about hired everybody we need for the time being, but we're open to
meeting folks who would bring a lot to the team. Send me a note! francois - at
- memfault

