
Writing a cron job microservice with Serverless and AWS Lambda - gkoberger
https://blog.readme.io/writing-a-cron-job-microservice-with-serverless-and-aws-lambda/
======
mbrumlow
I have written this reply and started over 3 times -- hopefully this one
sticks.

I really feel like the author boxed himself into a solution by faulty
reasoning.

1) Cron jobs are not hard to set up. Being able to control these things on one
or more servers is just part of proper server deployment.

2) If it is not part of the application, but deals with information about
users who are trialing the application, then maybe it should be part of the
application? If you are deploying an application and you need to know which
users' trials are about to end, I would bet that is part of the application
and should be a trigger coded into the application itself.

3) I am shocked that you did not already have a central spot for running
time-interval code and maintenance on your systems. At a minimum there should
have been backups and reports generated for your MongoDB instances. If those
functions were already built into the application, then the same generic task
system you used to run them is likely where you wanted to run this new
functionality.

tl;dr Not knowing much about the application or its architecture I feel like
the author painted themselves into a corner to justify using some AWS tools.

~~~
jasonmp85
Having just written a bunch of scripts to populate a Google Sheet with some
KPI data for the business folks, I _completely_ agree.

At first I wanted to put my scripts in Lambda, put an API Gateway in front,
and have them be callable by the Google Sheet on a schedule. This seemed
"pure" because I didn't really _need_ independent resources to do the task.
The sheet could just update itself using an on-demand server!

Then I realized there was a 30-second max execution time. My scripts, run
serially, couldn't finish within that; I'd need to rewrite them all to be
parallel. Also, they were a bunch of bash/jq/curl I'd just thrown together, so
I'd have to write a Python wrapper to shell out and handle streams (ugh).

At this point, I gave up and fell back to the advice of an (ex-Heroku)
coworker: why not just run them as a cron job, put the data in a Heroku DB,
and use Heroku Dataclips to update the Google Sheet?

Sure, I have to "have" an EC2 instance and a Heroku PG box, but they're both so
small they're either free or essentially so. After cloning my script repos,
everything worked right away.

Sometimes it's better just to do the "inelegant" thing and move on.

~~~
CWuestefeld
_Then I realized there was a 30-second max execution time._

Perhaps you're talking about a limitation on your spreadsheet, but the max
execution time for a Lambda function is configurable up to 300 seconds.

~~~
jasonmp85
I believe it may have been a limitation of the API Gateway + Lambda thing.
Just checked the docs again and it says there is an "Integration Timeout" as
follows:

> 30 seconds for all integration types, including Lambda, Lambda proxy, HTTP,
> HTTP proxy, and AWS integrations.

So not Lambda alone, but in conjunction with the API Gateway.

------
cagataygurturk
I believe Amazon added scheduled execution because of high demand from
customers who never understood why Lambda is there. I have also seen many
people complaining about Lambda's lack of SQS support. It is entirely
understandable that Lambda does not support SQS, because SQS is a pull model
while Lambda is designed around push events. Very probably AWS will add SQS
support as well, but it will mean that they preferred nonsensical user
complaints over their own design.

Lambda functions are there to process a single message and produce output. A
Lambda is a piece of code that should be invoked in response to events (a new
file on an S3 bucket, a new record in DynamoDB, an HTTP request coming from
API Gateway, etc.). If you are designing a system where you execute a Lambda
function on its own, without any meaningful event, and have it pull data from
an external source upon execution, you are designing your system wrong. Let us
assume that this "cronjob" runs every day at 10 pm, and that one day there are
more than 5,000 users whose trials end. Lambda has an execution limit of 5
minutes. In 5 minutes you are likely to fail to process 5,000 users, and the
Lambda execution will be interrupted. And what then? You are not designing a
scalable system.

The correct approach would be to book a future Lambda execution (per user)
when you register the user. A single Lambda invocation per user would then
execute exactly when that user's trial ends. This Lambda function would also
receive all the data it needs for its operation, so it would not need to
connect to MongoDB to fetch user information. This could probably be done with
SWF.
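
The per-user booking could be sketched roughly like this (my own assumption:
using one-off CloudWatch Events cron rules rather than SWF; every name and
identifier here is hypothetical, not from the article):

```python
from datetime import datetime

def one_off_cron(dt: datetime) -> str:
    """Build a CloudWatch Events cron expression that fires at one exact
    UTC minute: cron(Minutes Hours Day-of-month Month Day-of-week Year).
    The '?' satisfies the rule that day-of-month and day-of-week cannot
    both be specified."""
    return "cron({m} {h} {d} {mo} ? {y})".format(
        m=dt.minute, h=dt.hour, d=dt.day, mo=dt.month, y=dt.year
    )

# At registration time, the rule itself would be created with boto3,
# something like (requires AWS credentials, so shown here as a comment):
#   events = boto3.client("events")
#   events.put_rule(Name=f"trial-end-{user_id}",
#                   ScheduleExpression=one_off_cron(trial_end))
#   events.put_targets(Rule=f"trial-end-{user_id}",
#                      Targets=[{"Id": "1", "Arn": lambda_arn,
#                                "Input": json.dumps(user_payload)}])
```

Passing the user payload in the target `Input` is what lets the function run
without a MongoDB lookup, as the comment suggests.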

~~~
gamache
Unless you designed Lambda, this is an arrogant tone.

Sure, some of the things you mention are "better" or at least otherwise
accomplished using event-triggered Lambdas. But there are plenty of reasons
why you might want to run a job that takes under 5 minutes on a schedule.

For instance, daily or weekly "digest" emails. Or a nightly stats job, where
expensive-to-calculate yet not-immediately-critical queries are performed and
the results stored or exported. These are things where you want them to happen
at a certain time. Cron does that.

It's possible that customers didn't understand why _AWS_ thought Lambdas were
there, but that's irrelevant. Suddenly, Lambdas were there, and people thought
of more use cases (SQS triggering, connection to API Gateway, etc) than AWS
had implemented from the start. That's not the customers' fault for being
creative.

~~~
abrookewood
Yep, I agree. If you rely on cron/scheduled tasks, then you end up needing to
run a server to run them. If Lambda did scheduled tasks, I'd have 1-2 fewer
boxes in each environment ...

~~~
lostcolony
Not sure based on your wording, but you -can- execute scheduled tasks with
Lambda. It's a little tricky, as the actual configuration takes place in
CloudWatch (CloudWatch Events). You'll also want to set up a dead-letter
queue, and have it broadcast to an SNS topic that emails you, so that if the
Lambda fails, you'll be alerted.

~~~
abrookewood
I've read about external services that will hit a SQS queue for you, but it
would be a lot nicer if it was just supported natively in Lambda.

~~~
lostcolony
No, like, [http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-e...](http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html)
(and linked from that page,
[http://docs.aws.amazon.com/lambda/latest/dg/tutorial-schedul...](http://docs.aws.amazon.com/lambda/latest/dg/tutorial-scheduled-events-schedule-expressions.html) )

You can just set up "here's my cron string, run this function", and it will
run that function when applicable. No external services, no SQS. Just a
CloudWatch event that executes your Lambda on a set schedule.
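
With the Serverless framework from the article, that "cron string plus
function" pairing is a few lines of configuration (the function and handler
names here are hypothetical; the `schedule` event syntax is the one documented
by Serverless):

```yaml
functions:
  checkTrials:
    handler: handler.checkTrials
    events:
      # CloudWatch Events cron: fire every day at 10:00 UTC
      - schedule: cron(0 10 * * ? *)
```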

~~~
abrookewood
Perfect! Thanks. I'm going to retire some boxes ;)

------
pavanred
Here is a similar solution but with an AWS Lambda function in Python and using
a cloudwatch event to trigger the lambda function on a schedule. [0]

[0] [https://kerneltrick.in/aws-lambda-cronjob-schedule/](https://kerneltrick.in/aws-lambda-cronjob-schedule/)

------
scaryclam
I'm confused. Is this a microservice as a cron replacement or is this just
running a script with AWS lambda? If it's the latter, it's a bit of a stretch
to call this a microservice, or a cron job for that matter.

It seems like it might be a nice post about using serverless, but the whole
cron + microservice bit is muddying the water for me :/

Edit: spelling

------
pokeymcsnatch
Sure, cron is easy to use, but it also involves maintaining a server to some
degree. I use a Lambda "cronjob" to run health checks on various other
servers/services - I don't have to worry about healthchecking my
healthchecking machine.

------
philip1209
Cron is hard. I think it's better to just build in good locks and assume it
will fail occasionally, and on failure repeat.

This approach works on a small scale, but it seems like the logic should be at
the application layer. Secrets, API keys, ACLs, etc. all need to be duplicated
on Lambda. Developers can't easily run it locally. Testing (e.g. integration
testing on database migrations) is separate from the core app. Build promotion
/ rollbacks are different from the core application's. Error tracking /
logging may be different. Monitoring will be separate.

Seems like a cool demo, but I think that running the code will have more
overhead than writing it.
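
The "good locks, tolerate failure, repeat" idea can be sketched minimally as
follows (a hypothetical example using a local `flock` file lock; across
multiple machines a database or distributed lock would play the same role):

```python
import fcntl

def run_exclusively(lock_path, job):
    """Run `job` only if no other instance currently holds the lock.

    Returns True if the job ran, False if another instance held the lock,
    in which case the caller simply lets the next scheduled run retry.
    """
    f = open(lock_path, "w")
    try:
        # Non-blocking exclusive lock: fail fast instead of queueing up.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return False  # overlapping run detected; skip this tick
    try:
        job()
        return True
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
        f.close()
```

Pairing this with a job body that is idempotent means a repeated run after a
partial failure is harmless, which is the "on failure repeat" half of the
advice.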

~~~
wheelerwj
So, I'm not a very good engineer, so maybe this is a stupid question... but I
am going to ask it anyway.

Cron _is_ hard. Not in the sense that it's challenging to set up, but it's a
challenge to maintain and have visibility into. It seems like you need a whole
set of infrastructure just to feel confident that things are running as
intended.

This has always led me down the path of building a task-manager type of
microservice running on an independent server to manage everything. But by the
time I'm done, it feels like overkill. I just don't know what to do
differently. Am I missing something?

~~~
chrisp_dc
Sounds like you might be better off using Jenkins than rolling your own. I use
Jenkins over cron when I want easier auditing & authentication. I got the idea
to use Jenkins as a one-size-fits-all hammer from
[https://www.cloudbees.com/blog/drop-cron-use-jenkins-instead...](https://www.cloudbees.com/blog/drop-cron-use-jenkins-instead-scheduled-jobs)

~~~
mickael-kerjean
Great to see I'm not the only one using Jenkins to look after cron jobs!

However, to me it's about much more than simply auditing & auth. The main
advantages I found with Jenkins are:

\- you get a central place for all the cron jobs, which comes with many
advantages, the most important being that nobody has to remember which server
that freaking script lives on

\- updates are easy to make

\- the task can even be run by non-technical people

\- easy backup

\- when a script becomes deprecated you can easily remove it, so you're not
likely to leave it running for nothing forever

\- easy documentation for your tasks

There's probably more, but Jenkins is definitely awesome for cron jobs.

------
ak217
For anyone interested in a lightweight approach, I've built the following
package using AWS Chalice:

[https://github.com/kislyuk/domovoi](https://github.com/kislyuk/domovoi)

It specializes in plugging Chalice into all the various event sources that can
trigger Lambda, including the AWS Events API's "cron" functionality
highlighted here.

------
ufmace
Was actually just working on a Lambda function myself when I read this, though
for something a good bit more complex.

My reaction to this particular case is more like - yeah, I guess you could do
that, but it seems a little odd. I could see it making sense depending on what
your infrastructure is like. Might be handy if you're heavily into AWS and
need to coordinate AWS actions with appropriate IAM roles and VPC access and
such things. The auto CloudWatch logging can be handy if you don't have your
own logging set up already. But on the other hand, who's running something
like a business without better logging already in place, better
infrastructure for running infrequent tasks, or at least a few servers lying
around somewhere that can run some modest cron jobs in addition to whatever
they're already doing?

------
flagZ
I have been thinking about how to approach this in several projects, and while
scheduled Lambda calls do the job, besides being AWS-specific they do not
integrate well with the "old way": launching shell commands.

The beauty of shell commands to me was that you can launch them both with cron
and manually. Arguably you could do the same with Lambda, but there we go,
another service to set up (API Gateway).

I built [http://croningen.io](http://croningen.io) \- a hosted version of
cron that schedules jobs on clusters of servers, with central error reporting.
In my opinion it is as easy to use as cron, with most annoyances removed.
Early days, but feedback welcome!

------
bostand
> Cron jobs are easy to write, but difficult to setup

Please explain to me, how is cron hard to set up?

(slightly off topic: if you didn't learn cron, don't worry. It is just a
matter of time before it is replaced with some systemd service that only works
half the time)

~~~
jcrites
I'm not the author and didn't write that line, but I can speculate about the
kinds of concerns he's imagining, looking at the situation from a
DevOps/automation-focused point of view:

How do you set up cron in such a way that your cron job runs on a machine
somewhere, and will continue to do so for a long period of time?

Sure, you can log into a machine and edit the crontab manually, but what will
happen if that machine fails? Ka-blam, it suffered a hardware failure and is
gone. Do you repeat your manual edits on another machine? (If someone _else_
did this three years ago and is no longer with the organization, does anyone
know what edits to reproduce?)

OK so we need to build a mechanism that can ensure that at least one machine
is running, and has this crontab installed on it. If the currently active
machine fails, it needs to replace the machine and reinstall the crontab and
software that the crontab runs. You need to monitor to detect when this
happens, to kick that off, and you need to test the stack to be adequately
sure it's going to work when it really does happen. There's infrastructure you
can use to do that, of course, but it's all complexity to be mastered.

With the approach the author's describing, one only needs to define
configuration: what code to run on what schedule. It's declarative, and the
infrastructure handles actually executing it on a machine somewhere,
completely encapsulating the concerns related to "getting some hardware
running" and "installing the code".

~~~
bostand
> How do you set up cron in such a way that your cron job runs on a machine
> somewhere, and will continue to do so for a long period of time?

Use supervisord? Or (god forgive me) systemd services?

~~~
jcrites
Cron itself is a system service and is not something I'd hope you would need
to run under supervisord.

Some of the challenges that I mentioned, though, are not just arranging the
configuration that you want to have on the machine (whether cron or
supervisord configuration), but getting the configuration onto the machine and
ensuring it stays up and working long term. When looking at the problem with a
long time horizon and larger system context, one expects any machine to fail
and need to be replaced, so handling that is important. With this perspective,
it doesn't matter how reliable your process supervisor is: the code can be
written perfectly and the application will still fail to do its job if the
machine it's running on halts! I don't want to have to care if my machines
suffer a hardware failure; I just want a replacement to be brought online
seamlessly.

Sometimes in cases like this, one needs a whole lot of infrastructure to solve
what feels like it ought to be a simple problem. In larger system contexts it
can be quite inconvenient to invoke the full complexity needed for reliable
servers just to run a job periodically.

Once you start to tie the satisfaction of any business requirement to the
successful completion of a cron job, lots of problems crop up. Even after you
set that configuration up, there are still tricky issues to consider like:
what if the host running the cron job fails at _just_ the wrong time, like at
11:59 when the job starts at 12:00? Is it possible the job might not run at
all today even if the host is replaced promptly? Is that OK, and if not what
do we do about it?

If you have many different cron jobs with slightly different security contexts
and permissions, then do you launch a separate machine for each one? Do you set
them up to share the same server? How will you know if, a few years from now,
that server doesn't have enough power to run all the jobs fast enough? Will
you notice if it's gradually slowing down? These are some of the issues I'd
consider if I was tackling a business problem that required doing something on
a regular schedule.

Serverless-style infrastructure takes away a lot of these concerns. There's
nothing to host or maintain yourself; no servers needed at all: just the
definition of the code you want to run, how frequently you want to run it, and
what resources (if any) it needs access to. A serverless task that runs once
today will continue to run indefinitely, and it doesn't cost more to set up
than the naive crontab approach.

------
patrickg_zill
"I'm too elite to use cron and bash scripts. If it doesn't require npm to
install and an AWS account, I don't bother."

Seriously, it takes all of 10 minutes of reading the cron and related man
pages to figure this out. (And your Lambda code needed the exact same cron
syntax!)

And you already have a MongoDB server running, so it is not really any
additional server config time to do it.
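
For comparison, the entire "setup" under discussion is a single crontab line
(the script path here is hypothetical):

```cron
# m h dom mon dow: run the trial-expiry check every day at 10:00
0 10 * * * /usr/local/bin/check-expiring-trials.sh
```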

~~~
bostand
Yeah, I find that hilarious. Cron is probably one of the simplest unix-y
things you will ever encounter.

I am also getting tired of this "npm all the things" mentality. I don't want
to install npm and a ton of npm packages to do something that has nothing to
do with web development and JavaScript.

------
Splendor
This seems fine for a single job but if you have dozens to manage it would
become unwieldy. I'm excited to see the ins and outs of AWS Glue[0]. Hopefully
that will make projects like this more manageable.

[0]: [https://aws.amazon.com/glue/](https://aws.amazon.com/glue/)

~~~
kafkaesq
_This seems fine for a single job but if you have dozens to manage it would
become unwieldy._

Just like with good old... cron jobs. One wonders what's really new under the
sun, here.

We even have to put "use strict" up at the top, like in the Perl days!

------
_nato_
A slight variation I'd like to plug for _random_ scheduled jobs (3rd party):
[http://ikura.co](http://ikura.co)

------
mankash666
This may be a silly/newbie question: how is the function getting triggered at
10 AM? Who/what in AWS is triggering it?

~~~
teej
According to the documentation for Serverless
([https://serverless.com/framework/docs/providers/aws/events/s...](https://serverless.com/framework/docs/providers/aws/events/schedule/))
it is using AWS scheduled CloudWatch events
([http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/Sc...](http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html)).

------
m23khan
Why not use AWS Maintenance Windows for this?

