
I started using "The Cloud" in 2012 to build apps. At the time they called it Google App Engine. In the last 13 years a lot has changed and I've used all the main cloud offerings.

I've been a part of many outages that originated with our cloud provider. And what I've learned is that the best practices rarely save you. We pretend multi-AZ, global DBs, and failovers protect us, but they don't. They make us more scalable, but not more resilient.


It is difficult to measure

These cloud services generally have great uptime, and they have become so integral to the web that when they go down, everything goes with them.

If a random person was hosting their own server somewhere, they would probably avoid the huge outages, but likely have more frequent smaller outages and nowhere near the reliable uptime that the big providers have, right?


MSK IAM support has long mystified me. I think they only supported Java for the first 9 months or so. Even now they still don't have Go or PHP support. It's not a ton of work; they're reusing request-signer code anyway.


According to my teammate, who actually wrote the C++ code for this, there is a lack of documentation on how AWS_MSK_IAM is supposed to work. He had to check the Java/Python implementations line by line to avoid guesswork.


Well, there's precedent for that: the $(aws eks get-token) output is just a base64-encoded pre-signed GetCallerIdentity URL. I don't think that's documented anywhere either, but it can be spotted by squinting at the aws-iam-authenticator source.
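For anyone curious, the shape of that token is easy to check from Python. This is a sketch of the observed (undocumented) encoding, not a stable contract; the demo fabricates a token locally rather than calling AWS:

```python
import base64

def decode_eks_token(token: str) -> str:
    # Observed format: "k8s-aws-v1." + unpadded base64url of a
    # pre-signed sts:GetCallerIdentity URL. Undocumented; may change.
    prefix = "k8s-aws-v1."
    if not token.startswith(prefix):
        raise ValueError("not an EKS bearer token")
    payload = token[len(prefix):]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    return base64.urlsafe_b64decode(payload).decode()

# Demo with a fabricated token (no real credentials involved).
url = "https://sts.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15"
token = "k8s-aws-v1." + base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")
print(decode_eks_token(token))  # prints the pre-signed URL back
```

Running `kubectl` with such a token lets the API server replay the pre-signed URL against STS to learn who you are, which is why nothing beyond the encoding is needed client-side.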

My suspicion is that if they didn't want to bother writing a C++ client, they for sure wouldn't have the empathy(?) to document how anyone else could, either. I said empathy, but I kind of wonder if by publishing how something works they're committing to it, whereas currently they're only one commit away from changing it in their clients, without having to notify anyone.


That’s how it works when writing CDK outside of TypeScript. You have to review the TS docs to get anywhere.


Exactly. Once you figure out how it works, the implementation is quite straightforward.


I work at a popular Seattle tech company, and AI is being shoved down our throats by leadership, to the point that it was made known they're tracking how much devs use AI, and I've even been asked why I'm personally not using it more. I've long been a believer in using the right tool for the right job. Sometimes that's AI, but not super often.

I spent a lot of time trying to think about how we arrived here. Where I work there are a lot of Senior Directors and SVPs who used to write code 10+ years ago, who, if you asked them to build a little hack project, would have no idea where to start. AI has given them back something they'd lost, because they can build something simple super quickly. But they fail to see that just because it accelerates their hack project, it won't accelerate someone who's an expert. i.e. AI might help a hobbyist plant a garden, but it wouldn't help a farmer squeeze out more yield.


> just because it accelerates their hack project, it won't accelerate someone who's an expert.

I would say that this is the wrong distinction. I'm an expert who's still in the code every day, and AI still accelerates my hack projects that I do in my spare time, but only to a point. When I hit 10k lines of code then code generation with chat models becomes substantially less useful (though autocomplete/Cursor-style advanced autocomplete retains its value).

I think the distinction that matters is the type of project being worked on. Greenfield stuff—whether a hobby project or a business project—can see real benefits from AI. But eventually the process of working on the code becomes far more about understanding the complex interactions between the dozens to hundreds of components that are already written than it is about getting a fresh chunk of code onto the screen. And AI models—even embedded in fancy tools like Cursor—are still objectively terrible at understanding the kinds of complex interactions between systems and subsystems that professional developers deal with day in and day out.


My experience has gotten better by focusing on documenting the system (with AI to speed up writing Markdown). I find reasoning models quite good at understanding systems if you clearly tell them how they work. I think this creates a virtuous circle where I incrementally write much more documentation than I ever had the stomach for before. Of course this is still easier if you started greenfield, but it's allowed me to keep Claude 3.7 in the game even as the code base is now 20k+ lines.


> even as the code base is now 20k+ lines.

That's better than my past experience with hobby projects, but also nowhere near as big as the kinds of software systems I'm talking about professionally. The smallest code base I have ever worked on was >1M lines; the one I'm maintaining now is >5M.

I don't doubt that you can scale the models beyond 10K with strategies like this, but I haven't had any luck so far at the professional scales I have to deal with.


I've found claude-code good in a multi-million line project because it can navigate the filesystem like a human would.

You have to give it the right context and direction — like you would to a new junior dev — but then it can be very good. Eg.

> Implement a new API in `example/apis/new_api.rs` to do XYZ which interfaces with the system at `foo/bar/baz.proto` and use the similar APIs in `example/apis/*` as reference. Once you're done, build it by running `build new_api` and fix any type errors.

Without that context (eg. the example APIs) it would flail, but so would most human engineers.


Well, I have also worked on systems of multiple millions of lines, well pre-LLM, and I sure as hell didn't actively understand every aspect of them. I understood deeply the area I worked on, the contracts with my dependencies, and the contracts I provide. I also understood the overall architecture. We'll see how it goes if my project grows to that point, but I believe that by clearly documenting those things, and overall focusing on low coupling, I can keep the workflow I have now, but with context loading for every session. Time will tell.

In general though, it's been a lot of learning how to make LLMs work for me, and I do wonder if people simply dismiss them too quickly because they subconsciously don't want them to work. Also, "LLM" is too generic. Copilot with 4o sucks, but Claude in Cursor and Windsurf does not suck.


I’m using it to ship real projects with real customers in a real code base at 2x to 5x the rate I was last year.

It’s just like having a junior dev. Without enough guidance and guardrails, they spin their wheels and do dumb things.

Instead of focusing on lines of code, I'm now focusing on describing overall tasks, breaking them down, and guiding an LLM to a solution.


Cool anecdote, for me it has slowed me down 8x to 23x since I started using it in real projects with real customers in a real code base last year.

So 1-1 in pointless personal anecdotes. Now show us the numbers! How did you measure this? Can you show a 2x/5x increase in projects/orders/profits/stock price?


I'm not really sure I understand your counter argument. Pretty much everything about personal productivity is anecdotes because it's so uniquely tied to an individual. I showed you my numbers - I am 2x to 5x faster at delivering projects.


The point is that leadership gets to write on their own promo document / resume about how they "boosted developer productivity" by leading the charge on introducing AI dev processes to the company. Then they'll be long gone onto the next job before anybody actually knows what the result of it was, whether it actually boosted productivity or not, whether there were negative side-effects, etc.


Aye - this is a limitation of the current tech. For any project greater than 1k lines where the model was not pretrained on the code base, AI is simply not useful beyond documentation search.

It's easy to see this effect in any new project you start with AI: the first few pieces of functionality are easy to implement, and boilerplate gets written effortlessly. Then the AI can't reason about the code and makes dumb mistakes.


pdfbox is just as good for 99.9999% of documents. We used to use iText and in the last renewal, they tried to 10x our yearly license cost to the point it would have been more expensive than our AWS bill. No thanks.


Beyond the fact that it is a total scam, it also creates a lot of animosity among employees. Because there is no set limit (heck, there are rarely even guidelines), people often feel it's used unfairly in their teams and orgs.


If there aren't guidelines, that's where the problem lies; management should be all over this, with one policy for everyone. "Unlimited" is a nonsense term and shouldn't be taken literally, and abuse needs to be called out.


I completely agree. I've worked at three places with unlimited PTO, and none of them set guidelines. I think the moment you set that expectation, the illusion of it being unlimited goes away.


> but more importantly it takes just doing fantastic work when the opportunities present themselves.

It has taken me a decade to realize this is the key to success. Just do your very best work, as often as possible, and let the rest figure itself out.

I accidentally created a data warehouse that became the necessary backbone to launch a massive new org (100M+ revenue). I was building it for a relatively small near-realtime Elasticsearch cluster to generate data reports for an application. But I thought, gee, if I suck in a bunch of other data from other sources and clean it, I might find a use someday. Little did I know that someone else would piggyback on it to build a POC for a giant business expansion.


I've been stuck in the mud for a bit and had been slowly forgetting this one papercut at a time. Since this is usually not something I'm susceptible to, it has been troubling me.

Last year I pulled off a minor coup by making it cheaper and easier to run a big chunk of our code. One of the senior people was trying to get me to toot my own horn but the actual work was like one week of cleverness and five weeks of book-keeping, and so it felt too slog-like to celebrate.

But a couple weeks later when I was doing some housework it dawned on me that it was 'only' six weeks of work because of a bunch of fairly thankless things I'd invested a great deal of time in over the previous 18 months. I'd been dribbling out appetizers for a while, this was the first big dish, and I'm not sure it's the main course, so it's not like I missed my big chance, but I probably should have honored it more.

I'd had an initial 'I told you so' feeling in the heat of the moment that I pushed down as petty, and delivered instead a fairly lukewarm "this is what I am talking about." This is, in fact, one of the things I'm talking about, but I think I'm past trying to recruit people (all the fun people to recruit are gone). I'm sadly just following the Campsite Rule and splitting my priorities between things that help me do my job and things that might help me do the next one.

In that respect, I've been chasing a dragon (two, in fact) for the last couple of months instead of acknowledging that paragraphs like the one above are a pretty good indication I should be focusing all of my energy on job hunting instead. Wisdom is on a continuum and there's a lot of room still for foolishness even if you've got a lot of things figured out.


There is another dimension to this that most people haven't talked about: remote and gig work. For most small and medium businesses, it's cost-prohibitive to hire remote employees. For larger companies it's very easy. This has created a regional imbalance.

Lots of knowledge-work and call-center opportunities are moving to full-time remote: paying employees the same, with better benefits, to never leave their house.


This article amounts to someone whining because they didn't take the necessary action to prevent a dumpster fire. AWS's managed Elasticsearch has tradeoffs, and you should understand them before choosing it, but AWS is not to blame if you've under-provisioned your cluster and imbalanced your shards.


Lack of security? AWS offers very granular, per-index authorizations that are tied into IAM in the same way you would configure S3 or DynamoDB. If users are failing to implement good policies, AWS is not to blame.
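To make "per index" concrete, here's a hedged sketch of a domain access policy that grants one role read-only access to a single index pattern. The account ID, role name, and domain name are all made-up placeholders:

```python
import json

# All ARNs below are hypothetical placeholders for illustration.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/log-reader"},
            "Action": "es:ESHttpGet",
            # The trailing path scopes the grant to one index pattern.
            "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-domain/logs-*/*",
        }
    ],
}
print(json.dumps(policy, indent=2))
```

The `es:ESHttp*` actions map onto HTTP verbs, so allowing only `ESHttpGet` on a `logs-*` path is effectively read-only access to those indexes.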


When your authorization system is a usability shitfest, you're partly responsible for bad usage.


Preach on!

The only way I've really been able to understand it well is by writing code that uses boto and seeing what errors out lol.

I've found some really interesting bugs/inconsistencies too. Nothing horrible, but it's definitely unintuitive sometimes.


That's the right way of doing it IMO. I've got a PoC script which finds the minimum subset of permissions to allow some action: https://github.com/KanoComputing/aws-tools/blob/master/bin/a...

Haven't had time to productise it yet. I think doing this makes you quite a bit safer, because it means you don't end up giving up and allowing more than you need. However, you still need to understand which actions shouldn't be allowed, so it's not the whole solution.
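The linked script aside, the experiment-driven idea can be sketched in a few lines. `allows_action` here is a hypothetical stand-in for "attach this policy and retry the real API call":

```python
def minimal_permissions(candidates, allows_action):
    """Greedy sketch: try dropping each permission, and keep the drop
    if the action still succeeds with the remainder."""
    needed = list(candidates)
    for perm in list(needed):
        trial = [p for p in needed if p != perm]
        if allows_action(trial):
            needed = trial
    return needed

# Toy stand-in check: pretend the action needs exactly these two.
required = {"s3:GetObject", "kms:Decrypt"}
check = lambda perms: required <= set(perms)

minimal = minimal_permissions(
    ["s3:GetObject", "s3:ListBucket", "kms:Decrypt", "iam:PassRole"], check
)
print(minimal)  # ['s3:GetObject', 'kms:Decrypt']
```

Against real AWS the check is slow and noisy (eventual consistency on policy attachment, side effects of the probed call), which is presumably why this stays a PoC rather than a product.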


That's awesome!

That said, if a customer has to fuzz a platform's settings to discern their effect, the UX definitely needs work.


Netflix open sourced a similar tool that watches API calls for a Role and then suggests minimum privilege changes to the attached policy document: https://github.com/Netflix/repokid


That's interesting. That can only work if there's some way of introspecting permissions, which I didn't realise existed. Mine works by experiment. I wonder how fine-grained their way is.


Ooooh I gotta check that out!


If you don't have prior experience with networking or permissions, it will take some time to understand the concepts needed to properly secure a standalone server. The same applies to AWS. You are paying for the hardware, not for someone to hold your hand through the process.

And if you can't figure security out by yourself, pay someone to hold your hand.


>And if you can't figure security out by yourself, pay someone to hold your hand.

This. Security is as much a tradition as it is a set of technologies. It's better to learn from a master than from a costly mistake, and it's better to learn how to do it than to pay to have it done for you.


I feel like this attitude (which is very common) holds us back from developing more reliable systems. When something fails, we don't ask what we can do to improve the system; instead we point the blame at users. It's the easy way out: instead of designing better systems, we just tell the user to 'do better next time'.

The more difficult a system is to use properly, the more we should demand an alternative. If your users keep making the same mistake over and over again, then at some point you have to start asking yourself what you need to improve.


On the other hand, I always go back to something my grandpa (who himself was an industrial engineer) used to say: "It is impossible to make things foolproof, because fools are always so ingenious."


Unlike every other AWS managed service I had used up to that point, when I was using Amazon ES a ~year ago there was no integration with any sort of VPC offering, and there was no clear published guide on how to establish such a connection. I ended up doing so with a hacky bastion-based architecture, but most other teams I saw using ES at the time just didn't bother.


I don't think you're wrong, it's a complete offering, and yet:

- If you want to ingest data with Kinesis Firehose, you can't deploy the cluster in a VPC.

- You can enable API access for an IP whitelist, an IAM role, or an entire account. You can attach the policy to the resource or to an identity, or call from an AWS service with a service-linked role. That's all good; perhaps a little complex, but as you said, nothing too different from S3 or DynamoDB, except for the addition of IP policies. Why not security groups? Is DENY worth the added complexity?

- However, you can't authenticate to Kibana with IAM, since it's a web-based service. Recently they added Cognito support for Kibana; otherwise you have to set up a proxy service, whitelist Kibana to that proxy's IP, and then implement signed IAM requests yourself if you want index-level control. Cognito user pools can be provisioned to link to a specific role, but you can't grant multiple roles to a user pool, so you have to create a role and user pool for every permutation of index access you want to grant. You also have to delegate ES cluster access to Cognito and deploy them in the same region.

All told, even a relatively simple but proper implementation of ES+Kibana with access control to a few indexes using CloudFormation or Terraform would require at least a dozen resources, and at least a day of a competent developer's time researching, configuring, and testing the deployment. Probably more to get it right.

Ultimately there is nothing wrong with the controls AWS provides, but plenty that can go wrong with them.

For the curious:

- https://aws.amazon.com/blogs/security/how-to-control-access-...

- https://docs.aws.amazon.com/elasticsearch-service/latest/dev...


> - If you want to ingest data with Kinesis Firehose, you can't deploy the cluster in a VPC.

why not?

https://aws.amazon.com/blogs/aws/amazon-elasticsearch-servic...


Check out the second-to-last line of that post; they make the same statement in the docs. Lots of services are getting VPC endpoints so traffic never has to hit the public web, but Firehose isn't one of them (yet).


This is a non-issue. The URLs are way too unique to guess (you'd have an easier time guessing an email/password/2FA combination). And one's ability to access the URL at all is the same as their ability to access the bytes of the image. Once accessed, they could capture and share either.

This would be an issue if it were mutable data.
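A rough back-of-envelope supports the "way too unique to guess" claim. The alphabet sizes below are assumptions for illustration, not Google's actual token scheme, but the gap is the point:

```python
import math

# Assumed alphabets: a ~64-character URL token over a 64-symbol
# base64-style alphabet vs. a strong 12-character password drawn
# from ~94 printable ASCII symbols.
token_bits = 64 * math.log2(64)      # entropy of the sharing URL
password_bits = 12 * math.log2(94)   # entropy of the password
print(round(token_bits), round(password_bits))  # 384 79
```

Even with these crude estimates the URL has hundreds of bits of entropy to spare, so brute-forcing a link is far harder than brute-forcing typical credentials.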


Agreed, I was looking for something more substantial than this too. I was thinking it was some clever unpatched way to scrape semi-public Google Photos links. Turns out it's just the sharing feature working as expected.

