It's just so fragmented. Some parts are actually almost great (Lambda) while others are downright awful. Batch is the worst, and it's been like this for years. As soon as you go over a couple of hundred jobs per day, it quickly becomes unmanageable.
You still have to do some trickery with the CLI too. Say I want to get all the logs from Batch jobs that failed in the past day. This involves:
* Listing the jobs (possibly paginated)
* Parsing the log stream names out of the JSON (oh, and there are separate log streams for separate attempts)
* Iterating through the log streams and querying CloudWatch (each paginated)
* Parsing more JSON
I am sure we're all writing half-baked wrappers for our individual use cases; I am surprised no one's published something generally useful for stuff like this.
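For what it's worth, this is roughly what my half-baked wrapper ends up looking like. A minimal boto3 sketch, assuming the default `/aws/batch/job` log group; the queue name and the 24-hour cutoff are placeholders:

```python
import time
import boto3

batch = boto3.client("batch")
logs = boto3.client("logs")

JOB_QUEUE = "my-queue"          # placeholder: your Batch job queue
LOG_GROUP = "/aws/batch/job"    # default log group for Batch jobs
CUTOFF_MS = int((time.time() - 24 * 3600) * 1000)

def failed_job_ids(queue):
    """List FAILED jobs created in the last 24h (handles pagination)."""
    token = None
    while True:
        kwargs = {"jobQueue": queue, "jobStatus": "FAILED"}
        if token:
            kwargs["nextToken"] = token
        resp = batch.list_jobs(**kwargs)
        for job in resp["jobSummaryList"]:
            if job.get("createdAt", 0) >= CUTOFF_MS:
                yield job["jobId"]
        token = resp.get("nextToken")
        if not token:
            break

def log_streams(job_id):
    """Pull the log stream name from every attempt of a job."""
    job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
    for attempt in job.get("attempts", []):
        name = attempt.get("container", {}).get("logStreamName")
        if name:
            yield name

def stream_events(stream):
    """Page through get_log_events until the forward token stops advancing."""
    token = None
    while True:
        kwargs = {"logGroupName": LOG_GROUP, "logStreamName": stream,
                  "startFromHead": True}
        if token:
            kwargs["nextToken"] = token
        resp = logs.get_log_events(**kwargs)
        yield from resp["events"]
        if resp["nextForwardToken"] == token:
            break
        token = resp["nextForwardToken"]

for job_id in failed_job_ids(JOB_QUEUE):
    for stream in log_streams(job_id):
        for event in stream_events(stream):
            print(job_id, event["message"])
```

Three API calls across two services, with two layers of pagination, just to read the logs of yesterday's failures.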
Whereas with Kubernetes, all of that is a single kubectl call...
Don't get me wrong, we wouldn't be on AWS if it didn't make sense and they have been pushing development forward a lot. But it's unfortunately fragmented.
The only way to stay sane here is to use Terraform. That way you at least stay out of the console for creating and modifying resources, and you'll have an easier time should you ever want to migrate.
EDIT: Another great example from Batch: Let's say you have a job that you want to run again, either a retry or changing some parameters.
AWS Console:
* Find the job in question (annoying client-side pagination, where a refresh puts you back on page 1).
* Click Clone Job
* Make your changes. (Changing certain fields will reset the command, so make sure you stash it away before touching them.)
* Click Submit
* The job ends up in the FAILED state with an ArgumentError, because commands cannot exceed a certain length.
Turns out the UI will split arguments up, sometimes more than doubling the length of a string, and there's nothing you can do about it except resort to the CLI, or split the work into smaller jobs if you have that option.
CLI:
* Get job details
* Parse JSON and reconstruct job creation command
* Post
It baffles me how the container fields and parameters you can GET differ from what you can POST; you really have to pick the job apart and reconstruct the create-job request from scratch.
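To make that concrete, here's a minimal boto3 sketch of the describe-then-resubmit dance. The field mapping is illustrative rather than exhaustive (I'm only carrying over the overridable fields I happen to care about), and the job ID and replacement command are placeholders:

```python
import boto3

batch = boto3.client("batch")

def resubmit(job_id, new_command=None):
    """Fetch an existing job and reconstruct a submit_job request from it."""
    old = batch.describe_jobs(jobs=[job_id])["jobs"][0]
    container = old.get("container", {})

    # The ContainerDetail you GET is not the ContainerOverrides you POST,
    # so pick out the overridable fields by hand.
    overrides = {
        "command": new_command or container.get("command", []),
        "environment": container.get("environment", []),
    }
    if container.get("resourceRequirements"):
        overrides["resourceRequirements"] = container["resourceRequirements"]

    return batch.submit_job(
        jobName=old["jobName"] + "-retry",
        jobQueue=old["jobQueue"],
        jobDefinition=old["jobDefinition"],
        parameters=old.get("parameters", {}),
        containerOverrides=overrides,
    )

# e.g. resubmit("0a1b2c3d-...", new_command=["python", "task.py", "--retry"])
```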
I completely understand that it's like this when a service first launches. But it's been years now.