nrds's comments | Hacker News

Wait until you realize that the difference between path and query string is entirely arbitrary and decided by the server. Query strings should never have existed. They are an implementation detail of CGI webservers that leaked all over everything and now smells really bad.

I dunno, it seems like the fact that we arrived at a fairly standard structure for URL paths that works pretty well is not a bad outcome.

Seems a lot better than the other potential world we could have lived in, where paths were a black box and every web server/framework invented their own structure for them.


My next website is going to have the path portion of the URL be a base64 encoded ASN.1 blob.

So long as it starts with a slash, go ahead! See how long it takes for someone to figure it out.

It’s your website. Have fun with it! Do dumb things! :-)


Make sure you use URL-safe base64, or the portions that look like a path can get mangled:

MII//epi

Is converted to MII/epi
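
A minimal Go sketch of the difference (the payload bytes are made up; the point is just that base64.StdEncoding can emit '/', while base64.URLEncoding never does):

    package main

    import (
        "encoding/base64"
        "fmt"
    )

    func main() {
        blob := []byte{0xff, 0xff, 0xfe} // arbitrary example bytes
        fmt.Println(base64.StdEncoding.EncodeToString(blob)) // "///+" -- contains '/'
        fmt.Println(base64.URLEncoding.EncodeToString(blob)) // "___-" -- path-safe
    }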


That would be broken software.

https://en.wikipedia.org/wiki///


In my current project I use URIs to refer to absolutely any entity in a git(-ish) repo. Files, branches, revisions, diffs, anything. URI turns out to be a really good addressing scheme for everything. Surprise. But the most used and abused element is always the path. Query takes a lot of that mess away. Might have been unmanageable otherwise.

https://github.com/gritzko/beagle


In fact, GitHub URIs are a good example of overusing paths: https://github.com/gritzko/beagle/blob/a7e17290a39250092055f...

  - user gritzko,
  - project beagle, 
  - view blob, 
  - commit a7e17290a39250092055fcda5ae7015868dabdb4, 
  - file path VERBS.md
... all concatenated indiscriminately.

That’s not an indiscriminate hierarchy.

Grouping data by user is common and normal in computing: /home laid precedent decades ago.

Project directories are an extremely common grouping within a user’s work sets. Yeah, some of us just dump random files in $HOME, but this is still a sensible tier two path component.

The choice to make ‘view metadata-wrapped content in browser HTML output’ the default rather than ‘view raw file contents’ the default is legitimate for their usage. One could argue that using custom http headers would be preferable to a path element (to the exclusion of JavaScript being able to access them, iirc?) or that the path element blob should be moved into the domain component or should prefix rather than suffix the operands; all valid choices, but none implicitly better or worse here.

Object hash is obviously mandatory for git permalinks, and is perhaps the only mandatory component here. (But notably, that’s not the same as a commit hash.) However, such paths could arguably be interpreted as maximally user-hostile.

File path, interestingly enough, is completely disposable if one refers to a specific result object hash within a commit, but if the prior object hash was required to be a commit, then this is a valid unique identifier for the filesystem-tree contents of that commit. You could use the object hash instead of the full path within the commit hash, but that’s a pretty user-hostile way to go about this.

So, then, which part of the ordering and path selections do you consider indiscriminate, and why?


actually, instead of the object hash, you could also use the commit-hash. then the filename would be mandatory, but the url would be more readable and usable: give me the file VERBS.md as it is at commit <hash>

That's actually what it is here, a7e17290a39250092055fcda5ae7015868dabdb4 is a commit's oid: https://github.com/gritzko/beagle/commit/a7e17290a3925009205...

yes, you are right. and it makes a lot more sense that way. see my other comment on the difference between commit blob and raw.

But the path lacks param names (or types?). E.g. who said the hex-encoded part is a commit hash? Maybe it's a tree hash, or just a weird ref.

Query strings are more verbose, as they force you to give each param a name.


Which target audience of github needs extra verbosity in the commit hash, though? Once you know it you know it; if you don’t know git you aren’t the target audience; etc. Saying /user=foo is no better than ?user=foo if your audience can work it out without confusion from your unadorned paths. We have a great deal of history with filesystems showing that people are capable of keeping up with paths that lack key names if exposed to and familiar with them, and if the filesystem isn’t being constantly randomized.

> Saying /user=foo is no better than ?user=foo

I mean /foo vs ?user=foo

I know git well enough; there's more than one type of hash -- object hashes, tree hashes.


Back in the day there was an attempt to introduce "matrix URIs" as a more structured alternative to query strings: https://www.w3.org/DesignIssues/MatrixURIs.html

Of course there's nothing to stop you using URIs like this (I think Angular does, or did at one point?) but I don't think the rules for relative matrix URIs were ever figured out and standardised, so browsers don't do anything useful with them.


what would be a better way of doing that? i am not disagreeing, but i just can't think of any way to improve on this. put everything into the query part? i prefer to use the query only for optional arguments. in this example the blob argument is the only thing that doesn't fit in my opinion.

Every object in git (commit, tree, revision of a single file) has a hash that is guaranteed unique within a repository (otherwise many more things than a web UI would break) and likely also globally. I can understand wanting to isolate repositories to prevent hash collisions from causing problems, but within a repo everything has a universally unique ID.

edit: for instance, that specific VERBS.md is represented by the blob 3b9a46854589abb305ea33360f6f6d8634649108.
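
A hypothetical sketch of looking that up (assuming a local clone and git on PATH; the abbreviated commit is the one from the thread): `git rev-parse <commit>:<path>` prints the blob oid a path has at that commit.

    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        // Resolve the blob oid of VERBS.md as it exists at commit a7e17290.
        out, err := exec.Command("git", "rev-parse", "a7e17290:VERBS.md").Output()
        if err != nil {
            panic(err)
        }
        fmt.Print(string(out)) // e.g. 3b9a46854589abb305ea33360f6f6d8634649108
    }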


that's not what i meant. i was trying to suggest that the string "blob" does not fit. why is it there? why is it needed?

    https://github.com/gritzko/beagle/a7e17290a39250092055fcda5ae7015868dabdb4/VERBS.md
this should be sufficient to represent the file.

"blob" is like a descriptor of the value that follows. it would be like doing this:

    https://github.com/user/gritzko/project/beagle/blob/a7e17290a39250092055fcda5ae7015868dabdb4/file/VERBS.md
this actually irks me every time i see it in a github url

> this should be sufficient to represent the file.

Except it's not, because the oid can be a short hash (https://github.com/gritzko/beagle/blob/a7e172/VERBS.md) and that means you're at risk of colliding with every other top-level entry in the repository, so you're restricting the naming of those toplevel entries, for no reason.

So namespacing git object lookups is perfectly sensible, and doing so with the type you're looking for (rather than e.g. `git` to indicate traversal of the git db) probably simplifies routing, and to the extent that it is any use makes the destination clearer for people reading the link.
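
A minimal Go sketch of the routing point (purely illustrative, not GitHub's actual code; the handler names are made up): the "blob"/"raw"/"commit" segment selects the handler before the ref or file path is ever parsed, let alone resolved against the git db.

    package main

    import (
        "fmt"
        "net/http"
    )

    func blobPage(w http.ResponseWriter, r *http.Request)   { fmt.Fprintln(w, "HTML-wrapped file view") }
    func rawBlob(w http.ResponseWriter, r *http.Request)    { fmt.Fprintln(w, "raw file contents") }
    func commitPage(w http.ResponseWriter, r *http.Request) { fmt.Fprintln(w, "commit view") }

    func main() {
        mux := http.NewServeMux()
        // The type segment names the lookup, so dispatch needs no git object access.
        mux.HandleFunc("/gritzko/beagle/blob/", blobPage)
        mux.HandleFunc("/gritzko/beagle/raw/", rawBlob)
        mux.HandleFunc("/gritzko/beagle/commit/", commitPage)
        http.ListenAndServe(":8080", mux)
    }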


how does adding the word blob in the url help with that?

i don't think it makes a difference here.

in fact compare these urls:

https://github.com/gritzko/beagle/blob/a7e172/VERBS.md

https://github.com/gritzko/beagle/raw/a7e172/VERBS.md

https://github.com/gritzko/beagle/commit/a7e172/VERBS.md

turns out that "blob", "raw" and "commit" have nothing to do with the hash itself, but are functions to describe how the object in question is to be presented. so what i said above about blob being redundant is false, the problem is rather that it is in a weird place. it should be at the end, like a kind of extension because it signifies the format of the output. except i think putting it at the end makes handling relative paths more difficult as it would have to be appended to every link to other files.

the roxen webserver has an interesting solution for that. they call it prestates and it's placed at the beginning of a url: https://github.com/(commit)/gritzko/beagle/a7e172/VERBS.md . it sets the format value visually apart, and you could have multiple prestate values separated by a comma. i have used that feature extensively on my own sites. i even expanded on the concept in custom modules.


> how does adding the word blob in the url help with that? i don't think it makes a difference here.

How does adding a disambiguating segment help disambiguate?

"in fact, consider these urls":

https://github.com/gritzko/beagle/issues

https://github.com/gritzko/beagle/pulse

> are functions to describe how the object in question is to be presented

So they are functions, which take parameters, which makes prefix notation reasonably natural?

> the problem is rather that it is in a weird place. it should be at the end

That's, like, your opinion man.

> except i think putting it at the end makes handling relative paths more difficult as it would have to be appended to every link to other files.

It also doesn't make sense when file paths may not be relevant at all e.g. compare

https://github.com/gritzko/beagle/commit/a7e172

and

https://github.com/gritzko/beagle/commit/a7e172/VERBS.md

As well as where https://github.com/gritzko/beagle/blob/a7e172/ ends up

> the roxen webserver has an interesting solution for that. they call it prestates and it's placed at the beginning of a url: https://github.com/(commit)/gritzko/beagle/a7e172/VERBS.md .

> When developing and debugging is a great help to be able to turn on and off specific parts of the code that generates the current page.

That doesn't have anything to do with what github does.


They are following the /key/value/key/value pattern, but the first two pairs in a GitHub URL are fixed to user and project, which lets them omit the key names. I could see them not being willing to hardcode the third pair to blob.

Back when GitHub URLs were kind of cool, github.com/user/gritzko/project/beagle would have been much less cool than just github.com/gritzko/beagle.


> They are following the /key/value/key/value pattern

They are not. There's just a routing layer below the repository.


Not entirely arbitrary - forms that use the GET method instead of POST will append form values as query params.

For sites without Javascript, it's great for things like search boxes, tables with sorting/filtering, etc. instead of POST, since it preserves your query in the URL.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
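
A minimal sketch of that point in Go (handler and field names are my own): a plain GET form needs no JavaScript, and the browser puts the field into the query string, so the result page stays linkable and the back button works.

    package main

    import (
        "fmt"
        "html"
        "net/http"
    )

    func search(w http.ResponseWriter, r *http.Request) {
        q := r.URL.Query().Get("q") // filled from <input name="q"> on a GET submit, e.g. /?q=foo
        fmt.Fprint(w, `<form method="GET"><input name="q"><button>Search</button></form>`)
        if q != "" {
            fmt.Fprintf(w, "<p>Results for %s</p>", html.EscapeString(q))
        }
    }

    func main() {
        http.HandleFunc("/", search)
        http.ListenAndServe(":8080", nil)
    }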


It has always amazed me how much trouble the SPA folks are willing to go to in order to slowly rebuild just normal boring URLs with querystrings because users demand deep linking and back buttons and the like.

Or you could accept that you're probably going to need a round trip to the server and use a normal URL and it's fine.

For all but the absolute biggest websites in the world, anyhow. At Facebook or Google scale yeah it's needed.


Nothing you said here is correct. Paths, query strings, and fragments are all well defined entities. https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

“It’s a string between ? and #” isn’t well defined. Or it is, and it says very little.

Query strings existed before CGI did, and the way they're defined to be filled in from web forms is quite useful; I wouldn't want to need Javascript to fit that into path format. There's nothing wrong about having things decided by the server; I don't get that part of your argument at all.

Maybe dumb question: how does the server “decide” anything other than what file to serve? Today we have many choices but back in the day CGI was the first standard way to do it.

So yes query parameters existed before CGI but to use them you had to hack your server to do something with them (iirc NCSA web servers had some magic hacks for queries). CGI drove standardization.


    func specialHandler(w http.ResponseWriter, r *http.Request) {
        if time.Now().Weekday() == time.Tuesday {
            http.NotFound(w, r)
            return
        }

        fmt.Fprintln(w, "server made a decision")
    }
Your server can make decisions however you program it to, you know? It's just software.

Forgive the phone-posting.


and what server software is running this code in 1995?

CL-HTTP or AOLserver

sure looks like VB there, what’s the plugin? Didn’t see anything like that before.

That's Go.

Which runs on what computer in 1995?

I'm not sure what point you're trying to make. Here it is in C, so you can run it on your computer in 1995? Because servers could make decisions in 1995.

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main() {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &(int){1}, sizeof(int));

        struct sockaddr_in addr = { AF_INET, htons(8080), .sin_addr.s_addr = INADDR_ANY };
        bind(s, (struct sockaddr*)&addr, sizeof(addr));
        listen(s, 10);
        printf("Listening on :8080\n");

        while (1) {
            int c = accept(s, NULL, NULL);

            char req[1024] = {0};
            read(c, req, sizeof(req) - 1);

            time_t now = time(NULL);
            int tuesday = localtime(&now)->tm_wday == 2;

            const char *status = tuesday ? "404 Not Found" : "200 OK";
            const char *body   = tuesday ? "Not Found (it's Tuesday)" : "Hello from 1995!";

            char resp[256];
            snprintf(resp, sizeof(resp),
                "HTTP/1.1 %s\r\n"
                "Content-Length: %zu\r\n"
                "Connection: close\r\n\r\n%s",
                status, strlen(body), body);

            write(c, resp, strlen(resp));
            close(c);
        }
    }

A post claimed CGI led to bad standards around query parameter formatting and parsing. I was merely pointing out that, prior to the advent of CGI, if you wanted to actually do anything with those parameters on the server, you had to extend whatever primitive HTTP server you were running, write some custom code and invent your own “standard”. There were no server side frameworks or standards.

TCP has been around a long time. Listen, read, send, you're good to go. It's just software so you can make it do anything.

But you're asking about the relationship between popular primarily file serving servers like Apache and their relationship to high level code to create custom responses? Yeah, CGI was the first big standard there that I remember, though it was a bit before my time. But that's only one possible architecture.

These days, most web apps have the web server built in, and so the custom code you're writing works with the full request directly. There may be a lightweight web server in front (or multiple), like nginx, to manage connections, but they will largely just proxy the whole thing through.


I was responding to:

> Query strings existed before CGI did… There's nothing wrong about having things decided by the server

Sure, but there is also no standard for how to format/parse the query string, and no server plugin frameworks. So you are inventing your own standard and extending some HTTP server for which you have source, until CGI forces a standard; bad as it might be, it's common ground.


It's arbitrary to a degree like the difference between using an attribute or child element in XML, but it's not entirely arbitrary. If you want to include data in the URL that's not part of the hierarchy of the path, query strings are good for that.

How do you figure?

Paths are hierarchical; query strings are name/value.

(Note I speak of common usage.)

You can create a different convention, but that one is pretty dang useful.


So you're telling me that cells use an attention system for DNA?


epigenetics is all you need


Instructions unclear. Created eugenics company instead.


The author appears to have a serious misconception about Lean, which is surprising since he seems to be quite knowledgeable in the area.

Specifically, the author seems to be under the impression that Lean retains proof objects and the final proof to be checked is one massive proof object, with all definitions unfolded: "these massive terms are unnecessary, but are kept anyway" (TFA). This couldn't be further from the truth. Lean implements exactly the same optimization as the author cherishes in LCF; metaphorically, that "The steps of a proof would be performed but not recorded, like a mathematics lecturer using a small blackboard who rubs out earlier parts of proofs to make space for later ones" (quoted by blog post linked from TFA). Once a `theorem` (as opposed to a `def`) is written in Lean4, then the proof object is no longer used. This is not merely an optimization but a critical part of the language: theorems are opaque. If the proof term is not discarded (and I'm not sure it isn't), then this is only for the sake of user observability in the interactive mode; the kernel does not and cannot care what the proof object was.
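
A tiny Lean 4 sketch of what "theorems are opaque" means here (my own toy example, not from the post): later code only ever relies on a theorem's statement; the kernel checks its proof term once and never needs to look inside it again.

    def two : Nat := 2

    theorem two_eq : two = 2 := rfl   -- proof term checked once by the kernel

    -- Downstream uses see only the statement of `two_eq`, never its proof term.
    example : two = 2 := two_eq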


A proof object in dependent type theory is just the term that inhabits a type. So are you saying the Lean implementation can construct proofs without constructing such a term?


No, I'm saying it is checked and then discarded. (Or at least, discarded by the kernel. Presumably it ends up somewhere in the frontend's tactic cache.) That matches perfectly the metaphor, "rubs out earlier parts of proofs to make space for later ones".

The shared misconception seems to be in believing that because _conceptually_ the theory implemented by Lean builds up a massive proof term, that _operationally_ the Lean kernel must also be doing that. This does not follow. (Even the concept is not quite right since Lean4 is not perfectly referentially transparent in the presence of quotients.)


Yeah. I guess the abstract type approach saves some memory, but it's a constant factor thing, not an asymptotic improvement. The comment that Lean wastes "tens of megabytes" seems telling: it seems like something that would be a critical advantage in the 1980s and 1990s, when Paulson first fought these battles, but maybe less important today...


To be fair, lean wastes and leaks memory like a sieve, but this is almost all in the frontend. It has nothing to do with the kernel or the theorem proving approach chosen.


It is more a conceptual thing. In LCF, proofs and terms are different things, and that is how it should be in my opinion. This Curry-Howard confusion is unnecessary, but many people don't realise that, they think it is the only way to do math on a computer. You can still store proofs in LCF if you want, and use them; just as in Lean, you might be able to optimise them away.


You have done no more to show an actual distinction in the approach than TFA and its linked blog post... It sounds like a naming thing to me. On one side we name the term/program as a term and see it as something checked by the kernel, and on the other you name the term/program as a program and see it as something executed by the runtime. What's the difference?


There is indeed no difference if your dependent-typing approach is using reflection (where the checked term is actually a program that's logically proven to result in a correct proof when executed - such as, commonly, by running a decision procedure) but that's not a common approach.


The difference is that a term is not (necessarily) a program. Also checking is not executing. Its like saying riding a horse is the same as eating a fish. Really just a naming thing, what's the difference?


You're drawing an equivalence between the wrong pair of things. I'm not saying that term=program; I'm saying that the type checker, qua `term -> context -> decision`, bound to a particular term, is a program `context -> decision`, and the other approach is also a program `context -> decision`. I guess it's defunctionalization, not "nothing", but a next-door neighbor of "nothing".

Deceitful omission from TFA: the warning lights which the firefighter admits seeing override any tower authorization. Omitting this leaves a reader with several incorrect impressions:

INCORRECT: The truck was cleared to cross the runway.

CORRECT: The truck was forbidden to cross the runway due to the active conflict lights.

INCORRECT: Airport safety systems failed.

CORRECT: The only safety system relevant to the crash is the one which would actually alert the truck not to enter the runway, not the one that would alert the controller after the incursion happened (after all, the controller realized immediately what had happened). That system was, by the firefighters' own admission, working perfectly.

INCORRECT: The fault lies mostly with the controller.

CORRECT: While the controller made a bad error which contributed significantly to the accident, nevertheless, the proximal cause of the accident was the truck driver's failure to obey the mandatory direction given by the fully functioning runway safety system.


I agree, but https://en.wikipedia.org/wiki/Swiss_cheese_model . You usually need a few bad coincidences to get an accident.

Probably the firefighter needs a monthly reminder that the lights overrule the controller's instructions.


Zig is just doing vtable-based effect programming. This is the way to go for far more than async, but it also needs aggressive compiler optimization to avoid actual runtime dispatch.


Can you monomorphize the injected effect handlers using comptime, for io and allocators (and potentially more)?


I know what a vtable is, but what is vtable-based effect programming?


Well... effect programming using vtables. I think this is an emerging paradigm, but it is very early yet so it's difficult to define precisely.

My primary inspiration for the concept is theorem proving languages like Lean in which typeclasses ("interfaces" in the OOP terminology) are implemented using structures passed down as arguments ("vtables" in the OOP terminology) separately from any receivers (values of the type implementing the interface, which doesn't actually need to exist for Lean). Typeclasses (and interfaces) are an effect, albeit a simple and limited one. Lean can't express effects in their generality due to totality requirements, but the same mechanism would work perfectly well for effects too. As for the "vtable" aspect: the primary distinction in implementing typeclasses using exposed vtable passing is that the language does not in any way limit the programmer to zero or one implementations of a typeclass per receiver type(s) (cf. orphan rules in Rust, cf. compiling effect systems to witness-passing, etc.).
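
A minimal sketch of the idea in Go (my own illustration; the names are made up, and this is neither Zig's Io interface nor Lean typeclass elaboration): the "effect" is just a struct of function values passed down explicitly as an argument, and nothing restricts you to a single implementation.

    package main

    import "fmt"

    // Logger is the explicit "vtable" for a logging effect.
    type Logger struct {
        Log func(msg string)
    }

    func doWork(log Logger) {
        log.Log("working") // dynamic dispatch through the passed-in table
    }

    func main() {
        stdout := Logger{Log: func(m string) { fmt.Println(m) }}
        silent := Logger{Log: func(string) {}}
        doWork(stdout)
        doWork(silent) // a second handler for the same effect; no orphan-rule-style restriction
    }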


> a version of the Internet that is just intermittently and somewhat mysteriously broken.

That's actually just how the Internet is. Nothing to do with the great firewall.


You have to throw the context away at that point. I've experienced the same thing and I found that even when I apparently talk Claude into the better version it will silently include as many aspects of the quick fix as it thinks it can get away with.


I've been using pi.dev since December. The only significant change to the harness in that time which affects my usage is the availability of parallel tool calls. Yet Claude models have become unusable in the past month for many of the reasons observed here. Conclusion: it's not the harness.

I tend to agree about the legacy workarounds being actively harmful though. I tried out Zed agent for a while and I was SHOCKED at how bad its edit tool is compared to the search-and-replace tool in pi. I didn't find a single frontier model capable of using it reliably. By forking, it completely decouples models' thinking from their edits and then erases the evidence from their context. Agents ended up believing that a less capable subagent was making editing mistakes.


Are you using Pi with a cloud subscription, or are you using the API?


Out of curiosity, what can parallel tool calls do that one can't do with parallel subagents and background processes?


How would you do a parallel subagent if you don't have parallel tool calls? Sub agents are tools.


you find that pay-per-use APIs degraded too?


Yes, absolutely.


The far more important question is whether it's bad for the stock price of a few pharma companies.


I've worked at both Microsoft and Google in the past 6 years and the notion that msft "Principal" is equivalent to goog L5 is crazy.


Meaning Msft Principal is below L5? I got the same feedback from one of my friends who works at Google. She said quality of former MSFT engineers now working at Google was noticeably lower.


I mean imputed prestige within the organization. Being an L5 is nothing; it's the promote-or-fire cutoff at Google AFAIK. But being a Principal is slightly more than nothing; it's two levels above the promote-or-fire cutoff.

I mean, _now_, sure, I'd assume Microsoft Principals should be hired around L4 at Google. But that's just due to a temporary imbalance in the decline of legacy organizations. Give it a few years and it will even out again and msft 64 will be in the middle of the L5 range like levels.fyi claims.


L5 hasn't been the promote or fire cutoff at Google for perhaps a decade. L4 is the new L5, mostly because Google would have to pay L5s more, and it has been terrified of personnel costs for years.

But even so, an L5 at Google is basically a nobody as far as prestige or convincing other people to adopt your plan goes. Even L6 is basically just an expert across several mostly local teams. L7 is where the prestige gets going.


I mean if you go by pay in the UK, a Microsoft Principal is equivalent to an L4 at Google, if levels.fyi is to be believed....

