We're taking this serverless thing too far. We're coming up with elaborate overcomplicated schemes to accomplish simple tasks just so we can say "we're not running a server for this".
Instead of doing the simple thing and running a server process that periodically checks some queue for tasks to execute, this "scheme" involves triggering a lambda function every minute, to do some IO operations on a distributed file store where the execution date metadata is embedded in the filenames to see if there is some task to execute. If there is, some more distributed IO operations are required to move the file to another part of the distributed file storage, where some listener will notice a new file was added and trigger the actual lambda function with the task-specific code...
But hey, you're not running any servers yourself...
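For reference, the "simple thing" being contrasted here is roughly the following: one worker process, one loop, polling a task store. A minimal sketch, where the in-memory task list and every name in it are placeholders for whatever store you'd actually use:

import time
from datetime import datetime

tasks = []  # list of (run_at, callback) pairs; stands in for your real task store

def schedule(run_at, callback):
    tasks.append((run_at, callback))

def worker_loop():
    while True:
        now = datetime.utcnow()
        for entry in [t for t in tasks if t[0] <= now]:
            entry[1]()          # run the due task
            tasks.remove(entry)
        time.sleep(60)          # nothing (more) due; check again in a minute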
Don’t blame serverless; blame a poor implementation. AWS CloudWatch supports cron rules that can include the year they should fire in, and it can trigger Lambdas directly. There was no reason he had to overcomplicate this.
As for using a queue, Lambda also supports triggering off a queue.
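To make that concrete, here is a minimal sketch of a rule that fires once at a fixed date and invokes a Lambda directly, using boto3; the rule name, date, and function ARN are all placeholders:

import boto3

events = boto3.client("events")

# Schedule expression fields: cron(minutes hours day-of-month month day-of-week year).
# Pinning the year makes this fire exactly once: 09:30 UTC on 15 June 2025.
events.put_rule(
    Name="run-my-task-once",
    ScheduleExpression="cron(30 9 15 6 ? 2025)",
    State="ENABLED",
)

# Point the rule straight at the Lambda function (no S3 in the middle).
# The function also needs a resource policy allowing events.amazonaws.com to
# invoke it, e.g. via the Lambda add_permission call.
events.put_targets(
    Rule="run-my-task-once",
    Targets=[{
        "Id": "run-my-task-once-target",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:my-task",
    }],
)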
Not saying it's the right or wrong solution, but saying that the simpler option would be to take a server and put a job scheduler process on it is misleading. People shouldn't forget that it never ends with a single server and a single process; you have to manage it to make it production-ready and fault tolerant.
Sometimes doing the extra step to make it serverless compatible is worth the effort.
Azure Service Bus queues support scheduled message delivery, and, like Amazon's new SQS Lambda trigger, you can then have a serverless function triggered by new items in the queue, which gives you arbitrary scheduling of tasks.
Using their Python API it's pretty nice:
import json
from datetime import datetime, timedelta
from azure.servicebus import ServiceBusService, Message  # legacy azure-servicebus SDK

# First argument is the Service Bus namespace; key_name / key_value are the
# shared access key credentials for it.
sbs = ServiceBusService('task-queue',
                        shared_access_key_name=key_name,
                        shared_access_key_value=key_value)

# Deliver the message one minute from now.
d = datetime.utcnow() + timedelta(minutes=1)
task = {"some": "object"}

sbs.send_queue_message(
    "task-queue",
    Message(
        json.dumps(task),  # the body needs to be a string/bytes, not a dict
        broker_properties={
            # BrokerProperties travel as JSON, so send an RFC 1123 timestamp
            'ScheduledEnqueueTimeUtc': d.strftime('%a, %d %b %Y %H:%M:%S GMT')
        }
    )
)
It's worth noting that there is currently a gotcha with S3 Event Notifications: delivery is not guaranteed, so you may end up missing events.
S3 also doesn't provide a linearizable consistency model or even a vague approximation of one. You can't rely on the events you try to schedule happening in the order you try to schedule them in, or even happening at all.
This seems overcomplicated compared to using a regular timed event to trigger a lambda and having it decide what to execute conditionally.
I wouldn’t trust S3 events going straight to Lambda. Sure, Lambda supports retries and a dead letter queue, but you can’t reprocess the data.
A much more resilient approach would be:
S3 event -> SNS Topic -> SQS Queue -> lambda.
and set up a dead letter queue for the SQS queue.
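For the SQS leg, here's a minimal sketch of wiring up the dead letter queue with boto3, assuming placeholder queue names; the main queue's RedrivePolicy moves a message to the DLQ after a few failed receives:

import json
import boto3

sqs = boto3.client("sqs")

# Create the dead letter queue first so we can reference its ARN.
dlq_url = sqs.create_queue(QueueName="task-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: after 5 failed receives a message lands in the DLQ.
sqs.create_queue(
    QueueName="task-queue",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        )
    },
)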
It doesn’t help with the reliability of S3 events (and I’ve never seen that happen), but it does help if there is an error running your Lambda.
Move the S3 object after processing it. As long as you move it to a bucket in the same region, there aren’t any data transfer charges.
Then if you are really paranoid, you can have a timed lambda that checks the source S3 bucket periodically and manually sends SNS messages to the same topic to force processing.
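The paranoid sweeper could look roughly like this; the bucket name and topic ARN are placeholders, and note that a real S3 event notification has a different JSON shape than what's published here, so the consumer would need to accept both:

import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

BUCKET = "task-source-bucket"                                  # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:task-topic"    # placeholder

def handler(event, context):
    # Anything still sitting in the source bucket was never processed
    # (processed objects get moved out), so push it through the topic again.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            sns.publish(
                TopicArn=TOPIC_ARN,
                Message=json.dumps({"bucket": BUCKET, "key": obj["Key"]}),
            )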
If only; this is AWS. You pretty much need premium support just to tell them their products are broken. Before someone says "forums", the forums are a joke.
This isn’t using S3 as a scheduler. CloudWatch already supports cron and rate expressions, as the post alluded to. This is a hack to schedule something once.
All CloudWatch would have to do is implement a recurrence = 1 feature, but I’m guessing it's not a common enough use case.
You would have to delete the rule. In theory, it’s just like deleting an item from a queue once it has been processed. Another advantage is that you could look in the console to see which overdue rules haven’t been deleted.
And if you really need more than 50 rules, it’s just a soft limit. You send a request to support and they will raise it.
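The cleanup half might look something like this, assuming the rule name is handed to the function in the event (that part is entirely up to you); once the one-off task has run, the handler removes the target and deletes the rule, which is the moral equivalent of deleting a processed item from a queue:

import boto3

events = boto3.client("events")

def handler(event, context):
    rule_name = event["rule_name"]   # hypothetical: pass the rule name however you like

    # ... do the actual one-off work here ...

    # Targets have to be removed before the rule itself can be deleted.
    events.remove_targets(Rule=rule_name, Ids=["run-my-task-once-target"])
    events.delete_rule(Name=rule_name)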
CloudWatch Events already supports triggering Lambda functions on a cron schedule. You don't even need S3. This seems like a ton of work to basically invoke a function every n minutes.
If you want a recurring task then you are right, but if you want to run a one-time task at a specific time in the future, chosen by internal logic, then that won't work.
DynamoDB can also be used as a one-time scheduler, and I'd say it ends up looking a lot simpler than this example.
In DynamoDB you can set a TTL on a record. When the record expires, the deletion appears on the table's stream, which fires an event to your designated Lambda. That's it. You simply write a record, wait for it to expire, and your Lambda gets notified.
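A rough sketch of that flow, assuming a table called scheduled-tasks with TTL enabled on the run_at attribute and its stream wired to the Lambda; all names are placeholders, and keep in mind TTL deletions are not precise (they can lag the expiry time by a while), so this suits "roughly then" rather than exact-minute scheduling:

import time
import boto3

dynamodb = boto3.client("dynamodb")

# Schedule: write a record whose TTL attribute is the moment the task should run.
run_at = int(time.time()) + 15 * 60   # 15 minutes from now, as epoch seconds
dynamodb.put_item(
    TableName="scheduled-tasks",
    Item={
        "task_id": {"S": "some-task"},
        "payload": {"S": '{"some": "object"}'},
        "run_at": {"N": str(run_at)},
    },
)

# Execute: the TTL deletion shows up on the table's stream as a REMOVE record,
# which triggers this Lambda with the old item image (the stream view type has
# to include old images for that).
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "REMOVE":
            task = record["dynamodb"]["OldImage"]
            print("running task", task["task_id"]["S"])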
Correct me if I'm wrong, but S3, unlike DynamoDB, does not require a specific region, therefore it's more fault tolerant, or it requires more work on DynamoDB to achieve the same level of stability.
And as with everything overly flexible, people will abuse it and build over-engineered solutions on top of it, even when a simpler solution is already present but possibly not obvious, or poorly documented.
If I interviewed someone who explained that as a solution, it would definitely not help their case. The worst hire is someone who overcomplicates projects. It leads to “negative work”.