
Ask HN: Debugging serverless systems? - hackerews
We switched a number of core processes to AWS Lambda and my team is now spending WAY more time debugging.

Our strategy has been to add more logging and metrics to better find root causes of issues - but that's getting unruly.

How are you debugging? What dev tools are you using to understand what's happening within your distributed system? If you're logging...

- Where does your state go?

- Are you tracking startup latency?

- Are you tracking limits in memory, concurrency, execution duration?

Super interested in what others are doing for this, as well as other problems like testing, deployment, discovery.
======
laurentl
A few pointers from our own experience:

- centralized logs (we tried an ELK stack and have moved to Datadog since
they added logging). Using a correlation ID helps track the flow as it
crosses Lambda / service boundaries

- using Datadog also gives us metrics and dashboards for free, although
the data you’d want isn’t always there

- we started to experiment with X-Ray to track start-up time and how much
time is spent where. I’d definitely advise you to try it if you’re tracking
down performance issues. It’s a bit of a pain to get working though

- testing: as described in Yubl’s road to serverless (link in another
comment), we have a switch to call the code locally or remotely through
whichever service triggers the Lambda. This usually ensures that the logic is
sound before deploying and that remote bugs are mostly linked to integration
or permissions issues

- deployment: we rolled our own with Ansible and CloudFormation / SAM but if
you fit in the Serverless use cases you should probably try that first

- discovery: we use SSM Parameter Store as a distributed key/value DB and a
poor man’s discovery service: if we want to reach a given Lambda or service we
look up its name or ARN in SSM Parameter Store.
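
To make the correlation ID point from the first bullet concrete, here is a minimal sketch, not our actual code. It assumes an API Gateway style event with a `headers` dict; the header name `x-correlation-id` and the helper names are made up for illustration. The idea is to reuse the caller's ID when there is one, attach it to every log line as JSON, and pass it along to whatever gets called next.

```python
import json
import uuid

# Hypothetical header name; use whatever your services agree on.
CORRELATION_HEADER = "x-correlation-id"


def get_correlation_id(event):
    """Reuse the caller's ID if present, otherwise start a new trace."""
    headers = event.get("headers") or {}
    return headers.get(CORRELATION_HEADER) or str(uuid.uuid4())


def log_event(correlation_id, message, **fields):
    """Emit one JSON log line so a log aggregator can filter on the ID."""
    record = {"correlation_id": correlation_id, "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)  # Lambda forwards stdout to CloudWatch Logs
    return line


def handler(event, context=None):
    cid = get_correlation_id(event)
    log_event(cid, "request received", path=event.get("path"))

    # Build the payload for the next hop and carry the same ID with it.
    downstream = {
        "headers": {CORRELATION_HEADER: cid},
        "body": event.get("body"),
    }
    log_event(cid, "calling downstream")
    return downstream
```

Searching the centralized logs for one `correlation_id` value then shows every hop of a single request.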

I’m in the process of writing a post (or more likely a series) on our
experience and will post to HN when ready

Edit: also, decoupling. If your Lambdas are calling each other directly,
consider putting a queue or SNS topic in between. Makes it easier to test each
unit independently, can manage timeout / retry issues on your behalf, and
gives you a convenient observation point for inter-service traffic
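
A rough sketch of what that decoupling can look like on the consuming side, assuming the Lambda is subscribed to an SNS topic. The `Records[].Sns.Message` shape is what SNS delivers to Lambda; `process_order`, the order fields and `make_sns_event` are hypothetical names for illustration. The handler only unwraps the envelope, so the logic can be tested with plain dicts and no AWS at all.

```python
import json


def process_order(order):
    """Pure business logic (hypothetical example): no AWS types involved."""
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return {"order_id": order["order_id"], "total": total}


def handler(event, context=None):
    """Thin adapter: unwrap each SNS record and hand the payload to the logic."""
    results = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        results.append(process_order(message))
    return results


def make_sns_event(payload):
    """Test helper: wrap a payload the way SNS delivers it to Lambda."""
    return {
        "Records": [
            {"EventSource": "aws:sns", "Sns": {"Message": json.dumps(payload)}}
        ]
    }
```

Unit tests call `process_order` directly, and a handful of integration tests feed `make_sns_event(...)` to `handler` to check the wiring.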

~~~
shoo
> Edit: also, decoupling. If your Lambdas are calling each other directly,
> consider putting a queue or SNS topic in between. Makes it easier to test
> each unit independently, can manage timeout / retry issues on your behalf,
> and gives you a convenient observation point for inter-service traffic

i recall someone giving a talk (offline at a functional programming meetup?)
about something like this. it wasn't in the context of AWS services, or
serverless, and probably wasn't even in the context of a distributed system
--- but the crux of the suggestion was that they re-architected some system to
jam queues between everything, which gave them great traceability and the
ability to do things like capture messages then replay them later or what have
you. once they had the queues in place they realised it would be trivial to
add a bunch of other features leveraging the queues (such as "undo"; clearly
this would depend upon what side effects your system has -- this was a
functional programming meetup so perhaps their system didn't have many/any).
might be good for enabling recording of execution for export and playback in a
debugging environment or for construction of automated regression tests.
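
a toy sketch of the capture-then-replay idea, purely illustrative and not what the talk showed: an in-memory queue stands in for a real broker, and `RecordingQueue` / `replay` are made-up names. every published message is journaled as JSON, so the same traffic can be fed to a consumer again later, e.g. in a debugger or as a regression test. this only works cleanly if the consumer has no side effects, or if you stub them out.

```python
import json
from collections import deque


class RecordingQueue:
    """In-memory stand-in for a queue that journals everything published."""

    def __init__(self):
        self._pending = deque()
        self.journal = []  # one JSON string per published message

    def publish(self, message):
        self.journal.append(json.dumps(message, sort_keys=True))
        self._pending.append(message)

    def drain(self, consumer):
        """Deliver all pending messages to the consumer, in order."""
        results = []
        while self._pending:
            results.append(consumer(self._pending.popleft()))
        return results


def replay(journal, consumer):
    """Feed previously captured messages to a consumer again."""
    return [consumer(json.loads(line)) for line in journal]
```

dumping `journal` to a file after a bad run gives you a fixture you can replay locally against a fixed version of the consumer.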

------
jrowley
I use SAM CLI so I can run my code locally in a realistic environment. That
way you can skip the build/zip/upload/deploy process.

[https://github.com/awslabs/aws-sam-cli](https://github.com/awslabs/aws-sam-cli)

~~~
hackerews
Thanks, haven't tried this yet.

------
shoo
Yubl's road to serverless has a section on testing:

[https://hackernoon.com/yubls-road-to-serverless-part-2-testi...](https://hackernoon.com/yubls-road-to-serverless-part-2-testing-and-ci-cd-72b2e583fe64)

------
itielshwartz
Hi, feel free to check out this product (it lets you debug Lambda in
production easily and quickly):
[https://techcrunch.com/2018/06/04/rookout-releases-serveless...](https://techcrunch.com/2018/06/04/rookout-releases-serveless-debugging-tool-for-aws-lambda/)

Full disclosure: I work for Rookout. Feel free to ask questions.

------
jppope
We have a similar problem, and I'm really interested in the answer to this
question... our solutions so far are testing and monitoring (which doesn't
make it any easier)

------
shoo
Out of curiosity, what was the driver for switching to serverless in the first
place?

