Hacker News | molf's comments


> I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:

> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app

> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full-freight salaries come into play, outsourcing looks a bit cheaper.

This is funny. I'd argue the exact opposite. I would self host only:

* if I were on a tight budget and trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or

* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.

I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...


> employing engineers to manage self-hosted databases is more cost effective than outsourcing

Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs always was, and remains, a lie.

PaaS platforms (Heroku, Render, Railway) can legitimately be operated by your average dev without hiring a dedicated person; they cost even more, though.

Another limitation of both the cloud and PaaS is that they are only responsible for the infrastructure/services you use; they will not touch your application at all. Can your application automatically recover from a slow/intermittent network, a DB failover (that you can't even test because your cloud providers' failover and failure modes are a black box), and so on? Otherwise you're waking up at 3am no matter what.


> Every company out there is using the cloud and yet still employs infrastructure engineers

Every company beyond a particular size, surely? For many small and medium-sized companies, hiring an infrastructure team makes just as little sense as hiring kitchen staff to make lunch.


For small companies, things like Vercel, Supabase, Firebase, ... wipe the floor with Amazon RDS.

For medium-sized companies you need "devops engineers". And in all honesty, more of them than you'd need sysadmins for the same deployment.

For large companies, AWS responsibilities get split up into entire departments of teams. (For example, all clouds have made auth so damn difficult that most large companies have not one but multiple departments just dealing with authorization, before you so much as start your first app.)


You're paying people to do the role either way; if it's not dedicated staff, then it's taking time away from your application developers so they can play the role of underqualified architects, sysadmins, and security engineers.


From experience (because I used to do this), it's a lot less time than a self-hosted solution once you factor in the multiple services that need to be maintained.


As someone who has done both, I disagree; I find self-hosting, to a degree, much easier and much less complex.

Local reproducibility is easier, and performance is often much better.


It depends entirely on your use case. If all you need is a DB and Python/PHP/Node server behind Nginx then you can get away with that for a long time. Once you throw in a task runner, emails, queue systems, blob storage, user-uploaded content, etc. you can start running beyond your own ability or time to fix the inevitable problems.

As I pointed out above, you may be better served mixing and matching so you spend your time on the critical aspects but offload those other tasks to someone else.

Of course, I’m not sitting at your computer so I can’t tell you what’s right for you.


I mean, fair, we are of course offloading some of that: email being one of those, LLMs being another.

Task runner/queue: at least for us, Postgres works for both cases.

We also self-host S3-compatible storage and allow user-uploaded content within strict limits.
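Postgres-backed queues of this kind usually lean on FOR UPDATE SKIP LOCKED, which lets many workers poll the same table without blocking one another. A minimal sketch, assuming a hypothetical `jobs` table with `id`, `state`, and `created_at` columns:

```shell
# Emit the SQL a worker would run to atomically claim one queued job.
# The `jobs` table and its columns are assumptions for illustration.
claim_job_sql() {
  cat <<'SQL'
UPDATE jobs
   SET state = 'running'
 WHERE id = (SELECT id FROM jobs
              WHERE state = 'queued'
              ORDER BY created_at
              LIMIT 1
              FOR UPDATE SKIP LOCKED)
RETURNING id;
SQL
}

# A worker loop would run something like:
#   psql "$DATABASE_URL" -Atc "$(claim_job_sql)"
```

Because SKIP LOCKED silently passes over rows another transaction holds, concurrent workers never deadlock on the same job.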


Yeah, and nobody is looking at the other side of this. There just are not a lot of good DBA/sysop types who even want to work for some non-tech SMB. So this either gets outsourced to the cloud, or some junior dev or desktop-support guy hacks it together. And then who knows if the backups are even working.

Fact is a lot of these companies are on the cloud because their internal IT was a total fail.


If they just paid half of the markup they currently pay for the cloud, I'm sure they'd be swimming in qualified candidates.


Our AWS spend is something like $160/month. Want to come build bare metal database infrastructure for us for $3/day?


When you need to scale up and don't want that $160 to increase 10x to handle the additional load, the numbers start making more sense: three months' worth of the projected increase upfront is around $4.3k, which is good money for a few days' work on the setup/migration, and remains a good deal for you since you break even after three months and keep pocketing the savings indefinitely from that point on.

Of course, my comment wasn't aimed at those who successfully keep their cloud bill in the low 3-figures, but the majority of companies with a 5-figure bill and multiple "infrastructure" people on payroll futzing around with YAML files. Even half the achieved savings should be enough incentive for those guys to learn something new.


> few days' work

But initial setup is maybe 10% of the story. The day-2 operations of monitoring, backups, scaling, and failover still need to happen, and they still require expertise.

If you bring that expertise in house, it costs much more than 10x ($3/day -> $30/day = $10,950/year).

If you get the expertise from experts who are juggling you along with a lot of other clients, you get something like PlanetScale or CrunchyData, which are also significantly more expensive.


> monitoring

Most monitoring solutions support Postgres and don't actually care where your DB is hosted. Of course this only applies if someone was actually looking at the metrics to begin with.

> backups

Plenty of options to choose from depending on your recovery time objective. From scheduled pg_dumps to WAL shipping to disk snapshots and a combination of them at any schedule you desire. Just ship them to your favorite blob storage provider and call it a day.
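As a concrete illustration of the scheduled-pg_dump end of that spectrum, here is a sketch of a nightly logical backup with local retention; the directory, database name, and 7-day window are placeholders, and the off-site copy is left as a comment since the blob-store CLI varies:

```shell
# prune_old_backups DIR DAYS: delete local dumps older than DAYS.
prune_old_backups() {
  find "$1" -name 'db_*.dump' -mtime +"$2" -delete
}

# nightly_backup DIR DBNAME: dump, ship off-site, prune.
nightly_backup() {
  dir="$1"; db="$2"
  stamp="$(date +%F_%H%M)"
  mkdir -p "$dir"
  # Custom format allows selective/parallel pg_restore later.
  pg_dump --format=custom --file="$dir/db_$stamp.dump" "$db"
  # Ship off-site with your blob store's CLI of choice, e.g.:
  #   aws s3 cp "$dir/db_$stamp.dump" s3://my-backup-bucket/
  prune_old_backups "$dir" 7
}
```

Wire `nightly_backup` into cron (e.g. a 3am entry) and you have the "scheduled pg_dumps shipped to blob storage" option in a handful of lines.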

> scaling

That's the main reason I favor bare-metal infrastructure. Nothing in the cloud (at a price you can afford) can rival the performance of even a mid-range server, to the point that scaling is effectively never an issue; if you're outgrowing that, the conversation we're having is not about getting a bigger DB but about using multiple DBs and sharding at the application layer.

> failover still needs to happen

Yes, get another server and use Patroni/etc. Or just accept the occasional downtime and up to 15 mins of data loss if the machine never comes back up. You'd be surprised how many businesses are perfectly fine with this. Case in point: two major clouds had hour-long downtimes recently and everyone basically forgot about it a week later.

> If you bring that expertise in house

Infrastructure should not require continuous upkeep/repair. You wouldn't buy a car that requires you to have a full-time mechanic in the passenger seat at all times. If your infrastructure requires this, you should ask for a refund and buy from someone who sells more reliable infra.

A server will run forever once set up, unless hardware fails (and some hardware can be redundant, with spares provisioned ahead of time to automatically take over and delay maintenance operations). You should spend a couple of hours a month max on routine maintenance, which can be outsourced and still beats the cloud price.

I think you're underestimating the amount of tech all around you that is essentially *nix machines that somehow just... work, despite having zero upkeep or maintenance. Modern hardware is surprisingly reliable, and most outages are caused by operator error when people are (potentially unnecessarily) messing with stuff, rather than by the hardware failing.


At $160/mo you are using so little you might as well host off of a Raspberry Pi on your desk with a USB 3 SSD attached. Maintenance and keeping a hot backup would take a few hours to set up, and you're more flexible too. And if you need to scale, rent a VPS or even a dedicated machine from Hetzner.

An LLM could set this up for you, it's dead simple.


I'm not going to put customer data on a USB-3 SSD sitting on my desk. Having a small database doesn't mean you can ignore physical security and regulatory compliance, particularly if you've still got reasonable cash flow. Just as one example, some of our regulatory requirements involve immutable storage - how am I supposed to make an SSD that's literally on my desk immutable in any meaningful way? S3 handles this in seconds. Same thing with geographically distributed replicas and backups.

I also disagree that the ongoing maintenance, observability, and testing of a replicated database would take a few hours to set up and then require zero maintenance and never ping me with alerts.


The lede I buried there is whether all of this theater actually gives you better security and availability than 'toy' hardware.

Looking at all the recent AWS, Azure and Cloudflare outages, I posit that it doesn't.


Nice troll. But TFA is about corporate IT so hopefully you get whatever.


For companies not heavily into tech, lots of this stuff is not that expensive. Again, how many DBAs are even looking for a 3 hr/month sidegig?


It depends very much what the company is doing.

At my last two places it very quickly got to the point where the technical complexity of deployments, managing environments, dealing with large piles of data, etc. meant that we needed to hire someone to deal with it all.

They actually preferred managing VMs and self hosting in many cases (we kept the cloud web hosting for features like deploy previews, but that’s about it) to dealing with proprietary cloud tooling and APIs. Saved a ton of money, too.

On the other hand, the place before that was simple enough to build and deploy using cloud solutions without hiring someone dedicated (up to at least some pretty substantial scale that we didn’t hit).


> Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.

This doesn’t make sense as an argument. The reason the cloud is more complex is because that complexity is available. Under a certain size, a large number of cloud products simply can’t be managed in-house (and certainly not altogether).

Also your argument is incorrect in my experience.

At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.

At larger scales, what your false dichotomy suggests also doesn’t actually happen. Where I work now, our data stores are all self-managed on top of EC2/Azure, where performance and reliability are critical. But we don’t self-host everything. For example, we use SES to send our emails and we use RDS for our app DB, because their performance profiles and uptime guarantees are more than acceptable for the price we pay. That frees up our platform engineers to spend their energy on keeping our uptime on our critical services.


>At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.

How sure are you about that one? All of my Hetzner VMs reach an uptime of 99.9-something percent.

I could see more than one small-business stack fitting onto a single one of those VMs.


100% certain because I started by self hosting before moving to AWS services for specific components and improved the uptime and reduced the time I spent keeping those services alive.


What was the work you spent configuring those services and keeping them alive? I am genuinely curious...

We have a very limited set of services, but most have been very painless to maintain.


A Django+Celery app behind Nginx back in the day. Most maintenance would be discovering a new failure mode:

- certificates not being renewed in time

- Celery eating up all RAM and having to be recycled

- RabbitMQ getting blocked requiring a forced restart

- random issues with Postgres that usually required a hard restart of PG (running low on RAM maybe?)

- configs having issues

- running out of inodes

- DNS not updating when upgrading to a new server (no CDN at the time)

- data centre going down, taking the provider’s email support with it (yes, really)

Bear in mind I’m going back a decade now, my memory is rusty. Each issue was solvable but each would happen at random and even mitigating them was time that I (a single dev) was not spending on new features or fixing bugs.


I mean, going back a decade might be part of the reason?

Configs having issues is like the number one reason I like this setup so much:

I can configure everything on my local machine and test here, and then just deploy it to a server the same way.

I do not have to build a local setup and then a remote one.


Er… what? Even in today’s world with Docker, you have differences between dev and prod. For a start, one is accessed via the internet and requires TLS configs to work correctly. The other is accessed via localhost.


Just fyi, you can put whatever you want in /etc/hosts, it gets hit before the resolver. So you can run your website on localhost with your regular host name over https.
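Concretely, that trick looks like the sketch below; the hostname is a placeholder, and the target file is parameterized because appending to /etc/hosts itself requires root:

```shell
# Defaults to the real hosts file; override HOSTS_FILE for dry runs.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# map_to_localhost HOSTNAME: point a production hostname at 127.0.0.1
# so the site can be exercised locally under its real name.
map_to_localhost() {
  printf '127.0.0.1 %s\n' "$1" >> "$HOSTS_FILE"
}

# Typical use (needs root for the real file):
#   sudo sh -c '. ./hosts.sh && map_to_localhost app.example.com'
# Then serve with a locally trusted certificate (e.g. via mkcert)
# to get HTTPS on the same hostname as production.
```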


I’m aware, I just picked one example but there are others like instead of a mail server you’re using console, or you have a CDN.


I use HTTPS for localhost; there are a ton of options for that.

But yes, the cert is created differently in prod and there are a few other differences.

But it's much closer than in the cloud.


Just because your VM is running doesn't mean the service is accessible. Whenever there's a large AWS outage it's usually not because the servers turned off. It also doesn't guarantee that your backups are working properly.


If everything is on a single server, the server being up means everything is online... There is not a lot of complexity going on inside a single-server infrastructure.

I mean just because you have backups does not mean you can restore them ;-)

We do test backup restoration automatically, and also manually on a quarterly basis; but you should do the same with AWS.

Otherwise how do you know you can restore system A without impacting its other dependencies B and C?
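An automated restore test of that kind can be as small as the sketch below, which assumes `pg_restore`/`psql` on the PATH; the `restore_test` database name and the `users` sanity table are placeholders:

```shell
# verify_checksums SUMFILE: confirm dump integrity (SUMFILE is the
# output of `sha256sum dumpfile...`) before bothering with a restore.
verify_checksums() {
  sha256sum --check --quiet "$1"
}

# restore_check DUMPFILE: restore into a throwaway DB and run a
# sanity query; fails loudly if the backup is unusable.
restore_check() {
  createdb restore_test
  pg_restore --dbname=restore_test "$1"
  test "$(psql -d restore_test -Atc 'SELECT count(*) > 0 FROM users;')" = "t"
  dropdb restore_test
}
```

Run it from cron against the newest dump and alert on a non-zero exit; that turns "we have backups" into "we have backups we know restore".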


Yes, mix-and-match is the way to go, depending on what kind of skills are available in your team. I wouldn't touch a mail server with a 10-foot pole, but I'll happily self-manage certain daemons that I'm comfortable with.

Just be careful not to accept more complexity just because it is available, which is what the AWS evangelists often try to sell. After all, we should always make an informed decision when adding a new dependency, whether in code or in infrastructure.


Of course AWS are trying to sell you everything. It’s still on you and your team to understand your product and infrastructure and decide what makes sense for you.


> still employs infrastructure engineers

> The "cloud" reducing staff costs

Both can be true at the same time.

Also:

> Otherwise you're waking up at 3am no matter what.

Do you account for frequency and variety of wakeups here?


> Do you account for frequency and variety of wakeups here?

Yes. In my career I've dealt with way more failures due to unnecessary distributed systems (that could have been one big bare-metal box) than due to hardware failures.

You can never eliminate wake-ups, but bare-metal systems have far fewer moving parts, which eliminates a whole bunch of failure scenarios, so you're only left with actual hardware failure (and hardware is pretty reliable nowadays).


If this isn't the truth. I just spent several weeks, on and off, debugging a remote hosted build system tool thingy because it was in turn made of at least 50 different microservice type systems and it was breaking in the middle of two of them.

There was, I have to admit, a log message that explained the problem... once I could find the specific log message and understand the 45 steps in the chain that got to that spot.


In-house vs. cloud provider is largely a wash in terms of cost. Regardless of the approach, you are going to need people to maintain stuff, and people cost money. Similarly, compute and storage cost money, so what you lose on the swings, you gain on the roundabouts.

In my experience you typically need fewer people if using a cloud provider than in-house (or the same number of people can handle more instances) due to increased leverage. Whether you can maximize what you get via leverage depends on how good your team is.

US companies typically like to minimize headcount (either through accounting tricks or outsourcing) so usually using a Cloud Provider wins out for this reason alone. It's not how much money you spend, it's how it looks on the balance sheet ;)


I don’t think it’s a lie, it’s just perhaps overstated. The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.

Whether or not you need that equivalence is an orthogonal question.


> The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.

There's probably a sweet spot where that is true, but because cloud providers offer more complexity (self-inflicted problems) and use PR to encourage you to use them ("best practices" and so on), in all the cloud-hosted shops I've seen over a decade of experience there have always been multiple full-time infra people busy with... something?

There was always something to do, whether to keep up with cloud provider changes/deprecations, implementing the latest "best practice", debugging distributed systems failures or self-inflicted problems and so on. I'm sure career/resume polishing incentives are at play here too - the employee wants the system to require their input otherwise their job is no longer needed.

Maybe in a perfect world you can indeed use cloud-hosted services to reduce/eliminate dedicated staff, but in practice I've never seen anything but solo founders actually achieve that.


Exactly. Companies with cloud infra often still have to hire infra people or even an infra team, but that team will be smaller than if they were self-hosting everything, in some cases radically smaller.

I love self-hosting stuff and even have a bias towards it, but the cost/time tradeoff is more complex than most people think.


Working in a university lab, self-hosting is the default for almost everything. While I would agree that costs are quite low, I sometimes would be really happy to throw money at problems to make them go away. Without having had the chance (and thus being no expert), I really see the opportunity of scaling (up and down) quickly in the cloud. We ran a Postgres database of a few hundred GB with multiple read replicas, and we managed somehow, but we really hit the limits of our expertise at some point. Eventually we stopped migrating to newer database schemas because it was just such a hassle keeping availability. If I had the money as a company, I guess I would have paid for a hosted solution.


The fact that as many engineers are on payroll doesn't mean that "cloud" is not an efficiency improvement. When things are easier and cheaper, people don't do less or buy less. They do more and buy more until they fill their capacity. The end result is the same number (or more) of engineers, but they deal with a higher level of abstraction and achieve more with the same headcount.


I can't talk about staff costs, but as someone who's self-hosted Postgres before, using RDS or Supabase saves weeks of time on upgrades, replicas, tuning, and backups (yeah, you still need independent backups, but PITRs make life easier). Databases and file storage are probably the most useful cloud functionality for small teams.

If you have the luxury of spending half a million per year on infrastructure engineers then you can of course do better, but this is by no means universal or cost-effective.


Well sure you still have 2 or 3 infra people but now you don’t need 15. Comparing to modern Hetzner is also not fair to “cloud” in the sense that click-and-get-server didn’t exist until cloud providers popped up. That was initially the whole point. If bare metal behind an API existed in 2009 the whole industry would look very different. Contingencies Rule Everything Around Me.


You are missing that most services don't have high availability needs and don't need to scale.

Most projects I have worked on in my career have never seen more than a hundred concurrent users. If something goes down on Saturday, I am going to fix it on Monday.

I have worked on internal tools where I just added a Postgres DB to the Docker setup and that was it. Five minutes of work and no issues at all. Sure, if you have something customer-facing you need to do a bit more and set up a good backup strategy, but that really isn't magic.
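That "just add Postgres to the Docker setup" step looks roughly like the Compose fragment below; the image tag, password, and volume name are placeholders:

```yaml
# Hypothetical docker-compose service for a dev/internal-tool database.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me   # placeholder; use a secret in practice
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data   # data survives container restarts
volumes:
  pgdata:
```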


> at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.

This is the crux of one of the most common fallacies in software engineering decision making today. I've participated in a bunch of architecture / vendor evaluations that concluded managed services are more cost effective almost purely because they underestimated (or even discarded entirely) the internal engineering cost of vendor management. Black-box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).

But most importantly - even if it's significantly less effort than self-hosting, it's never effectively costed when evaluating trade-offs - that's what leads to this persistent myth about the engineering cost of self-hosting. "Managing" managed services is a non-zero cost.

Add to that the ultimate trade-off of accountability vs availability (internal engineers care less about availability when it's out of their hands - but it's still a loss to your product either way).


> Black-box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).

I'm really not sure what you're talking about here. I manage many RDS clusters at work. I think in total, we've spent maybe eight hours over the last three years "tuning" the system. It runs at about 100kqps during peak load. Could it be cheaper or faster? Probably, but it's a small fraction of our total infra spend and it's not keeping me up at night.

Virtually all the effort we've ever put in here has been making the application query the appropriate indexes. But you'd do that no matter how you host your database.

Hell, even the metrics that RDS gives you for free make the thing pay for itself, IMO. The thought of setting up grafana to monitor a new database makes me sweat.


> even the metrics that RDS gives you for free make the thing pay for itself, IMO. The thought of setting up grafana to monitor a new database makes me sweat.

CloudNative PG actually gives you really nice dashboards out-of-the-box for free. see: https://github.com/cloudnative-pg/grafana-dashboards


Sure, and I can install something to do RDS performance insights without querying PG stats, and something to schedule backups to another region, and something to aggregate the logs, and then I have N more things that can break.


> Could it be cheaper or faster? Probably

Ultimately, it depends on your stack & your bottlenecks. If you can afford to run slower queries then focusing your efforts elsewhere makes sense for you. We run ~25kqps average & mostly things are fine, but when on-call pages come in, query performance is a common culprit. The time we've spent on that hasn't been significantly different from self-hosted persistence backends I've worked with (probably less time spent, but far from orders of magnitude - certainly not worthy of a bullet point in the "pros" column when costing application architectures).


> query performance is a common culprit

But that almost certainly has to do with index use and configuration, not whether you're self hosting or not. RDS gives you essentially all of the same Postgres configuration options.


It's not. I've been in a few shops that use RDS because they think their time is better spent doing other things.

Except now they are stuck trying to maintain and debug Postgres without having the same visibility and agency that they would have if they hosted it themselves. The situation isn't at all clear.


One thing unaccounted for if you've only ever used cloud-hosted DBs is just how slow they are compared to a modern server with NVMe storage.

This leads developers to do all kinds of workarounds and reach for more cloud services (and then integrate them and - often poorly - ensure consistency across them) because the cloud-hosted DB is not able to handle the load.

On bare-metal, you can go a very long way with just throwing everything at Postgres and calling it a day.


100% this. Directly connected NVMe is a massive win. Often several orders of magnitude.

You can take it even further in some contexts if you use SQLite.

I think one of the craziest ideas of the cloud decade was to move storage away from compute. It's even worse with things like AWS lambda or vercel.

Now vercel et al are charging you extra to have your data next to your compute. We're basically back to VMs at 100-1000x the cost.


Yeah our cloud DBs all have abysmal performance and high recurring cost even compared to metal we didn't even buy for hosting DBs.


This is the reason I manage SQL Server on a VM in Azure instead of their PaaS offering. The fully managed SQL has terrible performance unless you drop many thousands a month. The VM I built is closer to 700 a month.

Running on IaaS also gives you more scalability knobs to tweak: SSD Iops and b/w, multiple drives for logs/partitions, memory optimized VMs, and there's a lot of low level settings that aren't accessible in managed SQL. Licensing costs are also horrible with managed SQL Server, where it seems like you pay the Enterprise level, but running it yourself offers lower cost editions like Standard or Web.


Interesting. Is this an issue with RDS?

I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.


It's more of a general condition - it's not that RDS is somehow really faulty, it's just that when things do go wrong, it's not really anybody's job to introspect the system, because RDS is taking care of it for us.

In the limit I don't think we should need DBAs, but as long as we need to manage indices by hand, think more than 10 seconds about the hot queries, manage replication, tune the vacuumer, track updates, and all the other rot - then actually installing PG on a node of your choice is really the smallest of the problems you face.


> self hosting costs you between 30 and 120 minutes per month

Can we honestly say that cloud services taking a half hour to two hours a month of someone's time on average is completely unheard of?


I handle our company's RDS instances, and probably spend closer to 2 hours a year than 2 hours a month over the last 8 years.

It's definitely expensive, but it's not time-consuming.


Of course. But people also have high uptime servers with long-running processes they barely touch.


Very much depends on what you're doing in the cloud, how many services you are using, and how frequently those services and your app need updates.


Self hosting does not cost you that much at all. It's basically zero once you've got backups automated.


I also encourage people to just use managed databases. After all, it is easy to replace such people. Heck actually you can fire all of them and replace the demand with genAI nowadays.


The discussion isn't "what is more effective". The discussion is "who wants to be blamed in case things go south". If you push the decision to move to self-hosted and then one of the engineers fucks up the database, you have a serious problem. If same engineer fucks up cloud database, it's easier to save your own ass.


Agreed. As someone in a very tiny shop, all us devs want to do as little context switching to ops as possible. Not even half a day a month. Our hosted services are in aggregate still way cheaper than hiring another person. (We do not employ an "infrastructure engineer").


> trading an hour or two of my time

pacman -S postgresql

initdb -D /pathto/pgroot/data

grok/claude/gpt: "Write a concise Bash script for setting up an automated daily PostgreSQL database backup using pg_dump and cron on a Linux server, with error handling via logging and 7-day retention by deleting older backups."

ctrl+c / ctrl+v

Yeah that definitely took me an hour or two.


So your backups are written to the same disk?

> datacenter goes up in flames

> 3-2-1 backups: 3 copies on 2 different types of media with at least 1 copy off-site. No off-site copy.

Whoops!


What is needed to evaluate OCR for most business applications (above everything else) is accuracy.

Some results look plausible but are just plain wrong. That is worse than useless.

Example: the "Table" sample document contains chemical substances and their properties. How many numbers did the LLM output and associate correctly? That is all that matters. There is no "preference" aspect that is relevant until the data is correct. Nicely formatted incorrect data is still incorrect.

I reviewed the output from Qwen3-VL-8B on this document. It mixes up the rows, resulting in many values associated with the wrong substance. I presume using its output for any real purpose would be incredibly dangerous. This model should not be used for such a purpose. There is no winning aspect to it. Does another model produce worse results? Then both models should be avoided at all costs.

Are there models available that are accurate enough for this purpose? I don't know. It is very time-consuming to evaluate. This particular table seems pretty legible. A real production-grade OCR solution should probably need a 100% score on this example before it can be adopted. The output of such a table is not something humans are good at reviewing; it is difficult to spot errors. It either needs to be entirely correct, or the OCR has failed completely.

I am confident we'll reach a point where a mix of traditional OCR and LLM models can produce correct and usable output. I would welcome a benchmark where (objective) correctness is rated separately from the (subjective) output structure.
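Such a correctness-first benchmark is easy to sketch: score each extracted cell against ground truth, keyed by (row, column), and ignore formatting entirely. A minimal illustration in Python (the table keys and values below are made up for illustration, not the benchmark's real data):

```python
def cell_accuracy(truth: dict, predicted: dict) -> float:
    """Fraction of ground-truth cells whose value landed in the right
    (row, column) slot. truth/predicted map (row_label, column) -> value."""
    if not truth:
        return 1.0
    correct = sum(1 for key, value in truth.items() if predicted.get(key) == value)
    return correct / len(truth)

# Hypothetical single-cell example: the Gemini-style off-by-7-orders slip.
truth = {("Argon", "viscosity_at_T_max"): "1.001E-11"}
pred  = {("Argon", "viscosity_at_T_max"): "1.001E-04"}
print(cell_accuracy(truth, pred))  # 0.0 -- plausible-looking, still wrong
```

A metric like this also catches row mix-ups naturally: a value attached to the wrong substance scores zero regardless of how nicely the table is rendered.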

Edit: Just checked a few other models for errors on this example.

* GPT 5.1 is confused by the column labelled "C4" and mismatches the last 4 columns entirely. And almost all of the numbers in the last column are wrong.

* olmOCR 2 omits the single value in column "C4" from the table.

* Gemini 3 produces "1.001E-04" instead of "1.001E-11" as viscosity at T_max for Argon. Off by 7 orders of magnitude! There is zero ambiguity in the original table. On the second try it got it right. Which is interesting! I want to see this in a benchmark!

There might be more errors! I don't know, I'd like to see them!


This is why arenas are generally a bad idea for assessing correctness in visual tasks.


This is a philosophy. One which many people that write Ruby subscribe to. The fundamental idea is: create a DSL that makes it very easy to implement your application. It is what made Rails different when it was created: it is a DSL that makes expressing web applications easy.

I don't know its history well enough, but it seems to originate from Lisp. PG wrote about it before [1].

It can result in code that is extremely easy to read and reason about. It can also be incredibly messy. I have seen lots of examples of both over the years.

It is the polar opposite of Go's philosophy (be explicit & favour predictability across all codebases over expressiveness).

[1]: https://paulgraham.com/progbot.html


If there is one DSL which is a central abstraction of one’s entire app, used in 100s of places—this is fine.

If there is a DSL such as Rails’ URL routing, which will be the same in every app—this is also fine.

When one makes 100s of micro-DSLs for object creation, that are only ever used in one or two places—this is pure madness.


Good question. There's a few reasons to pick UUID over serial keys:

- Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application.

- Serial keys must be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.

- Because UUIDs can be generated anywhere, sharding is easier.

The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.

I've found that the data leaked by serial keys is problematic often enough, whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.
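The creation-time leak of UUIDv7 is easy to demonstrate: per RFC 9562, the first 48 bits are a Unix-millisecond timestamp, so anyone holding the ID can read the creation time back out. A minimal sketch of the v7 bit layout (not a production generator):

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Minimal RFC 9562 UUIDv7 sketch: 48-bit unix-ms timestamp up front,
    version/variant bits set, the remaining bits random."""
    ms = time.time_ns() // 1_000_000
    value = (ms << 80) | int.from_bytes(os.urandom(10), "big")
    value = (value & ~(0xF << 76)) | (0x7 << 76)   # version = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)   # variant = RFC 4122
    return uuid.UUID(int=value)

def v7_timestamp_ms(u: uuid.UUID) -> int:
    # Anyone holding the ID can recover this; that's the leak.
    return u.int >> 80

u = uuid7()
print(v7_timestamp_ms(u))  # unix epoch milliseconds at creation
```

Note the asymmetry: this leaks when *this* record was created, but nothing about how many other records exist, which is the information serial keys give away.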


Not only can you make a good guess at how many customers/etc exist, you can guess individual ones.

World’s easiest hack. You’re looking at /customers/3836/bills? What happens if you change that to 4000? They’re a big company. I bet that exists.

Did they put proper security checks EVERYWHERE? Easy to test.

But if you’re at /customers/{big-long-hex-string}/bill the chances of you guessing another valid ID are basically zero.

Yeah it’s security through obscurity. But it’s really good obscurity.
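"Basically zero" is quantifiable. Assuming (hypothetically) a billion valid records behind the 122 random bits of a UUIDv4, the chance that any single guess hits a real ID is on the order of 10^-28:

```python
# Odds that one random guess hits any valid record, assuming a
# hypothetical table of one billion rows keyed by UUIDv4.
records = 10**9
keyspace = 2**122          # a UUIDv4 carries 122 random bits
odds = records / keyspace
print(f"{odds:.1e} per guess")  # ~1.9e-28
```

Even at millions of guesses per second, the expected time to a single hit dwarfs the age of the universe; the "obscurity" here is really just a 122-bit key.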


This advice assumes /customers/:id/bills is public. Protected routes shouldn't expose sensitive information such as bills anyway, so this is an authorization issue (who can access which resource) more than a privacy concern. If you can access /customers/4000/bills, that's an application logic issue, not a problem with the type of ID itself.

In a well-designed application, you shouldn't be able to guess whether a record exists or not simply by accessing a protected URL. As a counter-argument: plain BIGINT or serial PKs are performant and more than enough for most applications.


You describe a world where human skill is required to prevent this class of bugs; time and time again we've proven that people are people and bugs happen.

Systems must be _structurally architected_ with security in mind.

Security is layered: using a random key with a 128-bit space makes guessing UUIDs infeasible. But _also_ you should be doing AuthZ on the records, and rate limiting on the API so IDs can't be brute-forced, either.


One of my previous orgs used incrementing numbers for GitHub profiles. While there was no privacy concern about accessing another user's GitHub profile, I don't think the org expected someone to calculate tech attrition and layoff counts using this URL pattern ;)


You normally aren't supposed to expose the PK anyway.


That advice was born primarily _because_ of the bigint/serial problem. If the PK is UUIDv4 then exposing the PK is less significant.

In some use cases it can be possible to exclude, or anonymize the PK, but in other cases a PK is necessary. Once you start building APIs to allow others to access your system, a UUIDv4 is the best ID.

There are some performance issues with very large tables though. If you have very large tables (think billions of rows) then UUIDv7 offers some performance benefits at a small security cost.

Personally I use v4 for almost all my tables because only a very small number of them will get large enough to matter. But YMMV.


It's not about table size so much as number of joins. You don't need to trade off between security and performance if you simply expose a uuid4 secondary col on a serial PK'd table.


There is a big difference though. Serial keys allow attackers to guess the rate at which data is being added.

UUIDv7 allows anyone to learn a record's creation time, but not how many records have been created (even approximately) in a particular time frame. It leaks data about the record itself, but not about other records.


ASML CEO: Mistral investment not aimed at strategic autonomy for Europe

"In the long run, all AI models will be similar. It's about how you use the models in a well-protected environment. We will never allow our data and that of our customers to leave ASML. So a partner must be willing to work with us and adapt its model to our needs. Not only did Mistral want to do that, it is also their business model."

https://fd.nl/bedrijfsleven/1569378/asml-ceo-strategische-au...

--

Full article translated:

“A good reason to collaborate.” That's how ASML's CEO described his company's remarkable €1.3 billion investment in French AI company Mistral on Wednesday. Since the investment was leaked by Reuters on Sunday, there has been much speculation about ASML's reasons for investing in the European challenger to giants such as OpenAI and Anthropic. Analysts and commentators pointed to the geopolitical implications or the strong French link between the companies. But according to ASML CEO Christophe Fouquet, the reason was purely business. “Sovereignty has never been the goal.”

Mistral AI is a start-up founded in 2023 that specializes in building large language models. The French CEO of ASML and Mistral CEO Arthur Mensch met at an AI summit in Paris earlier this year and decided to work together to use Mistral's models to further improve ASML's chip machines.

Surprising investment

Each ASML machine generates approximately 1 terabyte of data per day. “Our machines are very complex,” Fouquet explains in an interview with the FD. “We have highly advanced control systems on our machines to enable them to operate very quickly and with great accuracy. The amount of data our machines generate gives us the opportunity to use AI. With the current software and machine learning models, we are limited in what we can do with the data and how quickly we can adjust the machine,” says the CEO. “AI is the next step in making better use of all that data.”

ASML has invested in other companies in the past, such as German lens manufacturer Zeiss and Eindhoven-based photonics company Smart Photonics, but those were either suppliers or potential customers. Mistral is neither.

Running AI models in-house

According to the ASML CEO, the Dutch company's investment in Mistral stems from the conviction that both companies can create value together. If Mistral becomes more valuable as a result of the collaboration, ASML can benefit from that.

ASML is the main investor in a new €1.7 billion financing round for Mistral. This makes Mistral an important AI player in Europe, but small compared to its American rivals. OpenAI raised $40 billion in its latest round alone. Anthropic, the company behind the Claude program, which is popular among programmers, just closed a $13 billion round.

“European sovereignty was not the goal”

According to Fouquet, the reason for the collaboration lies primarily in the way Mistral develops its AI models. “In the long run, all AI models will be similar. It's about how you use the models in a well-protected environment,” says Fouquet. “We will never allow our data and that of our customers to leave ASML. So a partner must be willing to work with us and adapt its model to our needs. Not only did Mistral want to do that, it is also their business model.”

According to Fouquet, the collaboration is not motivated by a desire for greater European sovereignty. “That was not the goal. But if it contributes to that, we are happy,” says Fouquet.

ASML supports EU initiatives to strengthen the chip sector in Europe, but always maintains a politically neutral stance in the geopolitical struggle between the United States, China, and the European Union. This is understandable, as the company has major customers in all regions, such as TSMC in Taiwan, SK Hynix in South Korea, SMIC in China, and Intel in the US.

“Two birds with one stone”

Although ASML itself does not play the European card, some analysts and politicians do see such a motive for the collaboration with Mistral. “Thousands of large companies worldwide make extensive use of AI in their product development by using the services of OpenAI, Meta, Microsoft, Google, Mistral, without investing in these companies,” writes investment bank Jefferies in a commentary. “We also do not believe that ASML needed an investment in an AI company to benefit from AI models in its lithography products. In our view, the investment stems primarily from geopolitical motives to support and develop a European AI company and ecosystem,” the bank states.

Wouter Huygen, CEO of AI consultancy Rewire, also sees a clear link to European sovereignty. “ASML is known for taking internal technology development very far. It is therefore quite understandable that ASML is taking this step: access to and influence on the development of a strategic technology. Plus European sovereignty. That's two birds with one stone.”


Not only are the CEO and COO French, they recently hired Le Maire, the French ex-minister of finance, as a strategic advisor. ASML has also been rumoured to exit the Netherlands and relocate to France.

It is definitely a political move.


I doubt the relocation: they just announced a new production site with 20,000 job openings over the next 3 years around Eindhoven.

I'm sure the French would love it, though. I always thought ASML would open an R&D facility in France or so to court the French government.

Guess this is it.


I agree that a relocation might not happen, but the increasing Frenchification of ASML after their CEO became French does smell a little off.


Are Google and Microsoft “Indianified” because their CEOs are from India?


I think you misread my comment. Did the CEOs of Google or Microsoft hire former ministers from India as strategic advisors, or make unprecedented and eyebrow-raising investments in Indian startups?



There is not enough power in the region to meet ASML's demand. Rumor has it data centers will open in France instead.


> they recently hired Le Maire, French ex-minister of Finance as a strategic advisor.

Then I can say without much speculation that this will end in a disaster.

Hiring Le Maire as a strategic advisor, given his "accomplishments", should be taken as a clear sign of enshittification.


He won't be giving any advice, they are buying his contact list.


Still smells like corruption


If there were a culture of always including the original source, or if journalists strongly advocated for including it, then surely the CMS would cater to it. I think it's safe to draw the conclusion that most journalists don't care about it.


There’s a lot of rightly deserved criticism of the media, but the OP describing journalists as conspiring to keep links out for fear of being fact-checked by readers is simply false, and indicative of not having any experience at a large news organization.


But there has to be some reason for original source links never appearing in journalist articles.


It's not historically done. Printed newspapers obviously didn't have links and neither did televised news. Even when the news media started publishing online, it's not like the courts were quick to post the decisions online.

And there's also the idea that you should be able to at least somewhat trust the people reporting the news, so they don't have to provide all of their references. You can certainly argue that not all reporters can or should be trusted anymore, but convincing all journalists to change how they work because of bad ones is always going to be hard.


There is also the added pressure that some organizations quietly pile on editors to keep people from clicking out to third parties at all, where their attention may wander away. Unless of course that third party is an ad destination.

Reputable news organizations are more robust against such pressures, but plenty of people get their news from (in some cases self-described) entertainment sites masquerading as news sites.


Is it because journalists think of their special talent as talking to people to get information (which is a scarce and privileged resource), versus reading and summarizing things that we all have access to?

So they rarely are forced to do anything but state the name of who they interviewed, and that's it. And that puts them in the habit of not acknowledging what they read as a source?


I feel this is an insanely distorted take.

How about extreme and utter irrelevance (such as after building a thing nobody wants)?

Or how about this, arguably the most common: slightly successful; nobody hates it but nobody loves it either. Something people feel mildly positive about, but there is zero “hype” and also no “moat” and nobody cares enough to hate it.

