Hacker News: ryan29's comments

You can see the same thing starting to happen in the domain industry. Registries are buying pricing data rather than setting their own prices, so high-value keywords end up having the same price across TLDs that should be competing with each other.

> Wouldn't you say the same thing for most of the people? Most of the people suck at verifying truth and reasoning. Even "intelligent" people make mistakes based on their biases.

I think there's a huge difference because individuals can be reasoned with, convinced they're wrong, and have the ability to verify they're wrong and change their position. If I can convince one person they're wrong about something, they convince others. It has an exponential effect and it's a good way of eliminating common errors.

I don't understand how LLMs will do that. If everyone stops learning and starts relying on LLMs to tell them how to do everything, who will discover the mistakes?

Here's a specific example. I'll pick on LinuxServer since they're big [1], but almost every 'docker-compose.yml' stack you see online will have a database service defined like this:

    services:
      app:
        # ...
        environment:
          - 'DB_HOST=mysql:3306'
        # ...
      mariadb:
        image: linuxserver/mariadb
        container_name: mariadb
        environment:
          - PUID=1000
          - PGID=1000
          - MYSQL_ROOT_PASSWORD=ROOT_ACCESS_PASSWORD
          - TZ=Europe/London
        volumes:
          - /home/user/appdata/mariadb:/config
        ports:
          - 3306:3306
        restart: unless-stopped

Assuming the database is dedicated to that app, and it typically is, publishing port 3306 for the database isn't necessary and is a bad practice because it unnecessarily exposes it to your entire local network. You don't need to publish it because it's already accessible to other containers in the same stack.
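
As a rough illustration of how you could catch this in review, here's a minimal Python sketch that flags services publishing a database port. It assumes the compose file has already been parsed into a dict and that ports use the short string syntax; `DB_PORTS`, `container_port`, and `published_db_ports` are hypothetical names for this example, not part of Docker or Compose.

```python
# Hypothetical set of container ports that usually belong to a database.
DB_PORTS = {"3306", "5432", "27017", "6379"}


def container_port(mapping: str) -> str:
    """Return the container-side port of a short-syntax mapping.

    Short syntax is [host_ip:][host_port:]container_port[/protocol],
    so the container port is the last colon-separated field.
    """
    return mapping.split("/")[0].split(":")[-1]


def published_db_ports(compose: dict) -> dict:
    """Map service name -> published mappings that target a database port."""
    findings = {}
    for name, service in compose.get("services", {}).items():
        hits = [
            str(p)
            for p in service.get("ports", [])
            if container_port(str(p)) in DB_PORTS
        ]
        if hits:
            findings[name] = hits
    return findings


# The stack from the example above, reduced to the relevant keys.
compose = {
    "services": {
        "app": {"environment": ["DB_HOST=mysql:3306"]},
        "mariadb": {"image": "linuxserver/mariadb", "ports": ["3306:3306"]},
    }
}

print(published_db_ports(compose))
```

A hit isn't automatically a bug, since you may really want external access, but it's the kind of thing worth questioning every time.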

Another Docker-related example is a Dockerfile that uses 'apt[-get]' without the '--error-on=any' switch. Pay attention to Docker build files and you'll realize almost no one uses that switch. Without it, the 'update' command can fail silently, so a transient error that affects 'update' but not the subsequent 'install' lets you build containers with stale package versions.
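
If you want to check your own build files, something like this Python sketch would do as a first pass. It only looks at one physical line at a time and splits on '&&' and ';', so it's a heuristic rather than a real Dockerfile parser; `UPDATE_CMD` and `missing_error_on` are hypothetical names, and the sample Dockerfile is made up for the example.

```python
import re

# Matches `apt update` / `apt-get update`, allowing options before `update`.
UPDATE_CMD = re.compile(r"\bapt(?:-get)?\s+(?:-\S+\s+)*update\b")


def missing_error_on(dockerfile_text: str) -> list:
    """Return 1-based line numbers whose update command lacks --error-on=any."""
    flagged = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        # Check each chained shell command on the line separately.
        for command in re.split(r"&&|;", line):
            if UPDATE_CMD.search(command) and "--error-on=any" not in command:
                flagged.append(lineno)
                break
    return flagged


sample = """\
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y curl
RUN apt-get update --error-on=any && apt-get install -y git
"""

print(missing_error_on(sample))
```

Only the second line of the sample gets flagged; the third already passes the switch to 'update'.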

There are tons of misunderstandings like that which end up being so common that no one realizes they're doing things wrong. For people, I can do something as simple as posting on HN and others can see my suggestion, verify it's correct, and repeat the solution. Eventually, the misconception is corrected and those paying attention know to ignore the mistakes in all of the old internet posts that will never be updated.

How do you convince ChatGPT the above is correct and that it's a million posts on the internet that are wrong?

1. https://docs.linuxserver.io/general/docker-compose/#multiple...


I asked ChatGPT 4o if there's anything that can be improved in your docker-compose file. Among other (seemingly sensible) suggestions, it offered:

## Restrict Host Ports for Security

If app and mariadb are only communicating internally, you can remove 3306:3306 to avoid exposing the port to the host machine:

```yaml
ports:
  - 3306:3306 # Remove this unless external access is required.
```

So, apparently, ChatGPT doesn't need any more convincing.


Here GPT is saying the port is only exposed to the host machine (e.g.: localhost), rather than the full local network.


Wow. I can honestly say I'm surprised it makes that suggestion. That's great!

I don't understand how it gets there though. How does it "know" that's the right thing to suggest when the majority of the online documentation all gets it wrong?

I know how I do it. I read the Docker docs, I suspect that publishing that port isn't needed, I spin up a test, and I verify my theory. AFAIK, ChatGPT isn't testing to verify assumptions like that, so I wonder how it determines correct from incorrect.


I suspect there is a solid corpus of advice online that mentions the exposed-ports risk, alongside the flawed examples you mentioned. A narrow request will trigger the right response. That's why LLMs still require a basic understanding of what exactly you plan to achieve.


I think first year premium pricing makes a lot of sense. I'm not sure what the average time to sell is for a domain investor, but say it's 10 years for an easy example.

If you go from a standard registration price of $12 / year to a first year premium of $132, you double the 10 year carrying cost of a domain. That, naively, means domain investors can only speculate on half as many domains.
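
The arithmetic behind that, as a quick Python sketch. It uses the 10 year holding period and the $12 / $132 prices from above; `carrying_cost` is just a hypothetical helper name, and it assumes renewals stay at the standard price after the first year.

```python
def carrying_cost(years, renewal, first_year=None):
    """Total cost of holding a domain for `years` years.

    `first_year` is the first-year price; it defaults to the renewal price
    when there is no first-year premium.
    """
    first = renewal if first_year is None else first_year
    return first + renewal * (years - 1)


standard = carrying_cost(10, 12)       # 12 * 10
premium = carrying_cost(10, 12, 132)   # 132 + 12 * 9

print(standard, premium, premium / standard)
```

That works out to $120 versus $240, so the same budget covers half as many names.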

By having a first year premium price and then dropping domains back into the 'standard' tier, you also leave registrants with a semblance of price protections via section 2.10c of the registry agreement. As-is, premium domains have zero guarantees when it comes to premium renewal pricing.

There's a lot of room between squeezing domain investors and asking registrants to pay $100-1000+ per year for premium domains.


If memory serves me, first year premium pricing is definitely a thing for some domains on some tlds with some registrars.

Though, at least for myself, I can also definitely understand why, for example, "lawyer.lawyer" would cost $$$$ every year, too.


Domains are the ultimate identity system for building a more trustworthy internet without handing over control to some kind of verified ID scheme or being forced into publishing your personal details to gain credibility.

You can build reputation and trust using a handle, even if it's not associated with your real world identity. For example, I know that if 'ryao' replies to a question about ZFS, the response can be considered trustworthy. I don't know who that is or even what country they live in, but I know they're a contributor that isn't speculating or guessing when they reply and that's all that matters to me.

Domains can be used as verifiable, globally unique handles. That simplifies things for the average user: it's easier to help people avoid impersonation and confusion if you can point them to something simple and verifiable. For example, look at Bluesky [1].
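
For a sense of how little machinery that takes, here's a rough Python sketch of the DNS side of it. As I understand it, Bluesky looks for a TXT record at '_atproto.<handle>' containing 'did=<DID>'; the sketch skips the actual DNS lookup and just works on TXT values you've already fetched. The function names and the sample DID are hypothetical, and rejecting conflicting records is my assumption about the safe behavior.

```python
def txt_record_name(handle: str) -> str:
    """DNS name to query for a domain handle, e.g. _atproto.example.com."""
    return "_atproto." + handle.lower().rstrip(".")


def did_from_txt(values):
    """Extract the DID from TXT values, or None if absent or ambiguous."""
    prefix = "did="
    dids = {v[len(prefix):] for v in values if v.startswith("did=did:")}
    # Assumption: more than one distinct DID is treated as a failure.
    return dids.pop() if len(dids) == 1 else None


def handle_matches(values, claimed_did: str) -> bool:
    """True if the domain's TXT values point at the DID the account claims."""
    return did_from_txt(values) == claimed_did


# Made-up TXT values for an example domain.
records = ["v=spf1 -all", "did=did:plc:abc123"]

print(txt_record_name("Example.COM."))
print(handle_matches(records, "did:plc:abc123"))
```

The point is that whoever controls the domain's DNS controls the handle, and anyone can check it without asking a platform for permission.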

I've been wanting domain based namespaces and handles for a solid 5 years because it just makes sense. Here's my oldest mention of it (asking why package managers don't use domain verified namespacing) I have on HN [2]:

> It seems like a waste to me when I'm required to register a new identity for every package manager when I already have a globally unique, extremely valuable (to me), highly brandable identity that costs $8 / year to maintain.

You can tell it's old because .com domains only cost $8 back then. IMHO, domain based handles are the #1 reason to use Bluesky over X/Twitter. People used to spend $10-15k buying "noteworthiness" via fake articles, etc. to get verified on Twitter. I can't find any links because search results are saturated with talk of X wanting $1000 per month for organization validation (aka a gold check mark). Domain validation is just as good as that kind of organization validation, at least for well known individuals and organizations.

Given that, I think there would be a bigger market for domains if domain validated identities catch on. It could even spawn specialty gTLDs that do extra identity or notoriety checks (if that's allowed) or maybe attestations would become a big thing if there were an easy way to do them against a domain verified handle.

1. https://bsky.social/about/blog/3-6-2023-domain-names-as-hand...

2. https://news.ycombinator.com/item?id=24674882


> When I see a billboard or print ad with e.g. `example.travel`, I read that as a social media handle and not a website address like `example.com` would convey.

This is where I think the new gTLD registries could do better. Using your domain as a handle on Bluesky is a perfect example of something they could push for to grow the industry, but they seem to think the status quo with a sprinkle of price discrimination is the winning formula.

Most of the new gTLDs work great as domain verified social media handles, but no one is going to use them for that if all the good keywords are classified as premium with $100+ annual renewal fees. However, if you make them too cheap and they get popularized, domain investors will register everything good and try to flip them.

I think first year premium pricing strikes a good balance that doesn't limit novel, non revenue generating use cases too much. Charging $100-200 for the first year causes a very large increase in the amount of capital domain flippers need to invest to acquire a large portfolio of good names.

If Bluesky catches on I think we could hit a point where non-technical people are suddenly shocked when they see someone "using their social media handle for a website." Getting back to having people understand there's more than just Facebook and Twitter would be a step in the right direction IMO, so it would be nice to see Bluesky continue to gain popularity.


It's the registries not the registrars that classify some domains as premium. I think they're a risky product because you don't even get the limited price protections provided by section 2.10c of the registry agreement, but there seems to be a market for them [1].

1. https://domainnamewire.com/2024/08/28/radix-sets-record-for-...


> I believe you that token uploads will continue to be possible, but it seems likely that in a couple of years trusted publishing & attestations will be effectively required for all but the tiniest project.

That's what I think will happen.

> And maybe that's a good thing? I'm not against security, and supply chain attacks are real.

The problem is the attestation is only for part of the supply chain. You can say "this artifact was built with GitHub Actions" and that's it.

If I'm using Gitea and Drone or self-hosted GitLab, I'm not going to get trusted publisher attestations even though I stick to best practices everywhere.

Contrast that with someone that runs as admin on the same PC they use for pirating software, has a passwordless GPG key that signs all their commits, and pushes to GitHub (Actions) for builds and deployments. That person will have more "verified" badges than me and, because of that, would out-compete me if we had similar looking projects.

The point being that knowing how part of the supply chain works isn't sufficient. Security considerations need to start the second your finger touches the power button on your PC. The build tool at the end of the development process is the tip of the iceberg and shouldn't be relied on as a primary indicator of trust. It can definitely be part of it, but only a small part IMO.

The only way a trusted publisher (aka platform) can reliably attest to the security of the supply chain is if they have complete control over your development environment which would include a boot-locked PC without admin rights, forced MFA with a trustworthy (aka their) authenticator, and development happening 100% on their cloud platform or with tools that come off a safe-list.

Even if everyone gets onboard with that idea it's not going to stop bad actors. It'll be exactly the same as bad actors setting up companies and buying EV code signing certificates. Anyone with enough money to buy into the platform will immediately be viewed with a baseline of trust that isn't justified.


As I understand it, the point of these attestations is that you can see what goes into a build on GitHub - if you look at the recorded commit on the recorded repo, you can be confident that the packages are made from that (unless your threat model is GitHub itself doing a supply chain attack). And the flip side of that is that if attestations become the norm, it's harder to slip malicious code into a package without it being noticed.

That's not everything, but it is a pretty big step. I don't love the way it reinforces dependence on a few big platforms, but I also don't have a great alternative to suggest.


Yeah, if the commit record acts like an audit log I think there’s a lot of value. I wonder how hard it is to get the exact environment used to build an artifact.

I’m a big fan of this style [1] of building base containers and think that keeping the container where you’ve stacked 4 layers (up to resources) makes sense. Call it a build container and keep it forever.

1. https://phauer.com/2019/no-fat-jar-in-docker-image/


I don't use PyPI and only skimmed the docs. I think what you're saying here makes sense, but I also think others posting have valid concerns.

As a package consumer, I agree with what you've said. I would have a preference for packages that are built by a large, trusted provider. However, if I'm a package developer, the idea worries me a bit. I think the concerns others are raising are pragmatic because once a majority of developers start taking the easy path by choosing (ex) GitHub Actions, that becomes the de-facto standard and your options as a developer are to participate or be left out.

The problem for me is that I've seen the same scenario play out many times. No one is "forced" to use the options controlled by corporate interests, but that's where all the development effort is allocated and, as time goes on, the open source and independent options will simply disappear due to the waning popularity that's caused by being more complex than the easier, corporate backed options.

At that point, you're picking platform winners because distribution by any other means becomes untenable or, even worse, forbidden if you decide that only attested packages are trustworthy and drop support for other means of publishing. Those platforms will end up with enormous control over what type of development is allowed. We have good examples of how it's bad for both developers and consumers too. Apple's App Store is the obvious one, but uBlock Origin is even better. In my opinion, Google changed their platform (Chrome) to break ad blockers.

I worry that future maintainers aren't guaranteed to share your ideals. How open is Open Solaris these days? MySQL? OpenOffice?

I think the development community would end up in a much stronger position if all of these systems started with an option for self-hosted, domain based attestations. What's more trustworthy in your mind: 1) this package was built and published by ublockorigin.com or 2) this package was built and published by GitHub Actions?

Can an impersonator gain trust by publishing via GitHub actions? What do the uninformed masses trust more? 1) an un-attested package from gorhill/uBlock, which is a user without a verified URL, etc. or 2) an attested package from ublockofficial/ublockofficial, which could be set up as an organization with ublockofficial.com as a verified URL?

I know uBlock Origin has nothing to do with PyPI, but it's the best example to make my point. The point being that attesting to a build tool-chain that happens to be run by a non-verifying identity provider doesn't solve all the problems related to identity, impersonation, etc. At worst, it provides a false sense of trust because an attested package sounds like it's trustworthy, but it doesn't do anything to verify the trustworthiness of the source, does it?

I guess I think the term "Trusted Publisher" is wrong. Who's the publisher of uBlock Origin? Is it GitHub Actions or gorhill or Raymond Hill or ublockorigin.com? As a user, I would prefer to see an attestation from ublockorigin.com if I'm concerned about trustworthiness and only get to see one attestation. I know who that is, I trust them, and I don't care as much about the technology they're using behind the scenes to deliver the product because they have a proven track record of being trustworthy.

That said, I do agree with your point about gaining popularity and compromises that developers without an existing reputation may need to make. In those cases, I like the idea of having the option of getting a platform attestation since it adds some trustworthiness to the supply chain, but I don't think it should be labelled as more than that and I think it works better as one of many attestations where additional attestations could be used to provide better guarantees around identity.

Skimming the provenance link [1] in the docs, it says:

> It’s the verifiable information about software artifacts describing where, when and how something was produced.

Isn't who is responsible for an artifact the most important thing? Bad actors can use the same platforms and tooling as everyone else, so, while I agree that platform attestations are useful, I don't understand how they're much more than a verified (ex) "Built using GitHub" stamp.
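
To make that concrete, here's a rough Python sketch of the kind of fields a provenance statement carries. The field names follow my reading of the linked SLSA v1.0 spec; every value (builder ID, build type, repository URL, timestamp) is a made-up placeholder, and `summarize` is a hypothetical helper.

```python
# Hypothetical provenance statement; all values are placeholders.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {
            "buildType": "https://builder.example/buildtypes/workflow/v1",
            "externalParameters": {
                "repository": "https://git.example/someone/project"
            },
        },
        "runDetails": {
            "builder": {"id": "https://builder.example/hosted-ci"},
            "metadata": {"startedOn": "2024-01-01T00:00:00Z"},
        },
    },
}


def summarize(statement: dict) -> dict:
    """Pull out the where / when / how that the provenance describes."""
    predicate = statement["predicate"]
    run = predicate["runDetails"]
    return {
        "where": run["builder"]["id"],
        "when": run.get("metadata", {}).get("startedOn"),
        "how": predicate["buildDefinition"]["buildType"],
    }


summary = summarize(statement)
print(summary)
print("who" in summary)
```

Everything in that summary describes the build platform and the inputs. Nothing in it tells you whether the person behind the repository is someone you should trust.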

To be clear, I think it's useful, but I hope it doesn't get mistakenly used as a way of automatically assuming project owners are trustworthy. It's also possible I've completely misunderstood the goals since I usually do better at evaluating things if I can try them and I don't publish anything to PyPI.

1. https://slsa.dev/spec/v1.0/provenance


I think I might be the only one that prefers Docker for building Docker containers using CI.

I use Drone, but instead of using the Docker plugin I start a detached (background) Caddy server to work as a proxy to DOCKER_HOST. That lets me proxy to the local Docker socket to take advantage of caching, etc. while I'm iterating, but gives the option of spinning up docker-in-docker to get a clean environment, without any caching, and running a slower build that's virtually identical to what happens on the CI server.

I find that having the daemon available solves a ton of issues that most of the CI provided builder plugins have. For example, with the builder plugins I'd always end up with a step like build-and-tag-and-push which didn't work very well for me. Now I can run discrete build steps like build, test, tag, push and it feels far more intuitive, at least to me.


If you want a simple example of how important good regulators are, look at the NTIA / DoC's handling of the .com cooperative agreement in 2018. The US gave up control of the most important technical asset on the planet and no one even knows it happened :-(


can you expand or link to any sources?


You could verbatim google the string in the comment and the first result will almost certainly tell you everything. There aren’t two sides to this.

