Hacker News | zzo38computer's comments

That is not a good excuse for requiring overly complicated and overly specific software.

Every HN thread is full of people who think webmasters should just pay through the nose to handle bot traffic to preserve the sacred rights of turbonerds to visit their website using Lynx on their toaster.

And guess what: when these basic rights and principles are no longer considered, very bad things can happen.

Might be useful to stop LARPing as a tech bro.


I should think that there should be a better way (e.g. port knocking, instructions for manually correcting the URL that cannot easily be automated, additionally supporting alternative protocols, etc).

My opinion is that they should not use AI/LLMs to program this, but I think this is not the proper way to make a bug report (and that the use of AI/LLMs is not necessarily itself a bug, even though I do have concerns and objections about their use).

They should post the text directly rather than a picture of the text, and it (and the issue title) should describe what is not working in the correct details (in this case, they do provide a few details; it says incremental backups are not working correctly when using multiple --compare-dest= arguments, and it mentions which version does work).

If they are also opposed to using AI/LLMs to program this, then they can mention that as well, but by itself it is not a proper bug report; they have to indicate what (if anything) is wrong with it (whether or not they used AI/LLMs to program it).


I don't actually see much evidence the usage of AI here was an issue. I think you can obviously identify areas where the code isn't perfect. I'd blame this slightly on human prompting, slightly on AI.

But I'm not sure a human on their own would've done better. There aren't enough resources to make the changes required.


I think one way to do this is to reduce how many dependencies are needed.

Yes, every time I get a message that some dependency needs to be updated, I get anxious about what could go wrong, not to mention the recent situation with silent distribution of malware or robware.

It doesn’t matter whether it is WordPress, Python, Node.js, or PHP, to name a few.

I understand that updates are necessary but we need to change the way we do them.

If I had a solution, I would post it here…


I agree with most of the criticisms they make.

I agree that pointer and length is better than null-terminated strings (although it is difficult in C, and as they mention you will have to use a macro (or some additional functions) to work this in C).
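A minimal sketch of what the pointer-and-length approach with a literal macro could look like in C. The names `str_view`, `SV`, `sv_equal` and `sv_slice` are made up for illustration and are not taken from sp.h or any other library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-and-length string type. */
typedef struct {
    const char *data; /* not necessarily NUL-terminated */
    size_t len;       /* number of bytes in data */
} str_view;

/* Build a view from a string literal. sizeof counts the implicit
   trailing NUL, so subtract 1 to get the byte length. */
#define SV(lit) ((str_view){ (lit), sizeof(lit) - 1 })

/* Byte-wise equality; an embedded NUL is compared like any other byte. */
static bool sv_equal(str_view a, str_view b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}

/* Substring [start, end) in O(1): no copy and no terminator to write. */
static str_view sv_slice(str_view s, size_t start, size_t end)
{
    assert(start <= end && end <= s.len);
    return (str_view){ s.data + start, end - start };
}
```

With this, `SV("hello").len` is known at compile time, and `sv_slice(SV("hello world"), 6, 11)` refers to "world" without modifying or copying the original bytes.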

Implementing the C standard library directly against syscalls is also a good idea. In some cases an implementation might need to avoid this for some reason, but generally it is better for the standard library to be written directly against syscalls.

FILE object is sometimes useful especially if you have functions such as fopencookie and open_memstream; but it might be useful (although probably not with C) to be able to optimize parts of a program that only use a single implementation of the FILE interface (or a subset of its functions, e.g. that does not use seeking).
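As a rough illustration of why the FILE interface is useful with open_memstream (a POSIX function, so this assumes a POSIX system): the helper below writes through an ordinary `FILE *` and does not know that the output goes to memory. The names `write_greeting` and `greeting_to_string` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for open_memstream */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes through the ordinary FILE interface, so it works unchanged
   for disk files, pipes, or memory-backed streams. */
static int write_greeting(FILE *out, const char *name)
{
    return fprintf(out, "hello, %s\n", name) < 0 ? -1 : 0;
}

/* Format into a heap buffer via a memory-backed FILE.
   Returns a malloc'd NUL-terminated string (caller frees), or NULL. */
static char *greeting_to_string(const char *name, size_t *len_out)
{
    assert(name != NULL && len_out != NULL);
    char *buf = NULL;
    size_t len = 0;
    FILE *mem = open_memstream(&buf, &len);
    if (mem == NULL)
        return NULL;
    int rc = write_greeting(mem, name);
    /* buf and len are only guaranteed up to date after fflush or fclose. */
    if (fclose(mem) != 0 || rc != 0) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

Since `write_greeting` only ever appends, it is an example of code that uses a subset of the interface (no seeking), which is the kind of case where specializing on a single implementation might pay off.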


Making every C call a system call is not a good idea at all - think about malloc() etc - the OS shouldn’t care about individual allocations and only worry about providing brk() etc. otherwise, performance will die if you’re doing a thousand system calls per second!

No modern libc uses (or should use) brk() as the heap. Allocate virtual memory using mmap, VirtualAlloc, etc., and manage your set of heaps.

I believe glibc uses both mmap and brk depending on the situation.
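A minimal sketch of the "get memory from mmap, then manage it yourself" idea, assuming a POSIX system with MAP_ANONYMOUS (Linux and the BSDs have it). This is a bump allocator with made-up names (`arena`, `arena_init`, `arena_alloc`, `arena_free_all`); it is far simpler than what any real libc does, and it only shows that many allocations can be served from one system call.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical arena: one mmap'd region handed out in pieces. */
typedef struct {
    unsigned char *base; /* start of the mapping */
    size_t cap;          /* size of the mapping in bytes */
    size_t used;         /* bytes handed out so far */
} arena;

/* One system call reserves the whole region. */
static int arena_init(arena *a, size_t cap)
{
    assert(a != NULL && cap > 0);
    void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->cap = cap;
    a->used = 0;
    return 0;
}

/* No system call here: round up to 16 bytes and bump the offset. */
static void *arena_alloc(arena *a, size_t n)
{
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* One system call releases everything at once. */
static void arena_free_all(arena *a)
{
    munmap(a->base, a->cap);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```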

It is not what I meant and also seems to me not what is meant by sp.h either.

Null terminated strings have some merits but they should be a completely different data type like in Freebasic.

Are there other merits than availability of literals in C?

It seems like one of the worst data structures ever - lookup complexity of a linked list with an expansion complexity of an array list with security problems added as a bonus.


One I can think of is simplicity. No need to worry about what the type of the string length should be (size_t?) or where it should be stored. Just pass around a pointer. Pointers fit the size of a CPU register most of the time. Though in my opinion the drawbacks (O(N) performance, NUL forbidden etc.) outweigh this benefit, we are stuck with them. Many kernel interfaces like open, getdents etc. assume NUL-terminated strings, therefore any low-level language or library has to support them.

But (i32 length, byte[] data) is as complex as (byte[] data, '\0'); it's two parts anyway. Of course it allows potentially for very long strings at the cost of just a single byte spent as a terminator. Besides the rarity of such a case, the "space savings" might play a role on a PDP11, or on a Z80, but not on any of the modern architectures that need structures aligned to 32- or even 64-bit boundaries. The efficiency and security costs far outweigh any savings in space or simplicity (heh) of processing.

Null-terminated strings are the other billion-dollar mistake, along with the original NULL.


Arrays as glorified pointers were the mistake. Null terminated strings are a natural result of that design choice.

Null pointers however were not a mistake, despite how popular slandering them has become. A reasonable case can be made that any modern language should enforce null checks (and bound checks, and ...) or at the least provide them by default but that is neither here nor there as far as C is concerned.


Tony Hoare himself called NULL a mistake. But the problem is not in the ability to set a pointer to a null value, of course. The problem is that all pointers are nullable, and there's no way to statically enforce their being non-null. I wonder how feasible data flow analysis would be in 1969 though.

It's fine as a serialization/deserialization primitive for on-disk files, as long as the NUL character is invalid.

String tables in most object file formats work like that, a concatenated series of ASCIIZ strings. One byte of overhead (NUL), requires only an offset into one to address a string and you can share strings with common suffixes. It's a very compact layout.
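A small sketch of that layout, with a made-up table (`strtab`) and a hypothetical bounds-checked lookup (`strtab_get`). The leading NUL follows the ELF convention that offset 0 names the empty string; the symbol names are only examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offsets:  0 -> ""   1 -> "printf"   8 -> "unreachable"
   Suffixes come for free: 10 -> "reachable", 15 -> "able". */
static const char strtab[] = "\0printf\0unreachable";

/* Return the string at byte offset off, or NULL if the offset is out of
   range or no terminator exists before the end of the table. */
static const char *strtab_get(const char *tab, size_t size, size_t off)
{
    assert(tab != NULL);
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;
    return tab + off;
}
```

Three names ("unreachable", "reachable", "able") cost 12 bytes in total here, and each reference is a single offset.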


Nothing prevents you from using a shared pool of strings that don't have null terminator. It can even be more efficient, since you don't have the null byte to handle at string end. Depending on the maximum string length you want to support, it doesn't even have to take more space.

How do you represent that pool of strings on-disk?

If we concatenate the raw strings together without the null terminator, either all string references will require a length on top of the offset (25% size penalty for an Elf32_Sym), or we'll need a separate descriptor table that stores string offsets and lengths to index into.

If we prepend strings with a length (let's say LEB128), we'll be at best tied with null-terminated strings because we'd have a byte for the length vs. a byte for the terminator. At worst, we'll have a longer string table because we'd need more than one byte to encode a long string length and we would lose the ability to share string suffixes.
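To make the byte counts concrete, here is a sketch of unsigned LEB128 encoding (7 payload bits per byte, high bit set on every byte except the last). The function name `uleb128_encode` and the constant are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit value needs at most ceil(64 / 7) = 10 bytes. */
#define ULEB128_MAX_BYTES 10

/* Encode value into out (at least ULEB128_MAX_BYTES long) and return
   the number of bytes written. */
static size_t uleb128_encode(uint64_t value, unsigned char *out)
{
    assert(out != NULL);
    size_t n = 0;
    do {
        unsigned char byte = value & 0x7f; /* low 7 bits */
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                  /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}
```

Lengths up to 127 take one byte, the same cost as a NUL terminator; a 128-byte string already needs a two-byte prefix.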

Out of all the jank from a.out and COFF that was eliminated with ELF, that representation for the string table was kept (in fact, the only change was mandating a null byte at the beginning to have the offset 0 indicate a null string). It has worked fine since the 1970s and doesn't cause undue problems, as nothing prevents a parser from spitting out std::string_view instead of const char* for the application code.


For short strings (probably most of them) - use a byte for the length (at the string/symbol definition site, alongside the offset) (adds 1 byte * symbols, use high bit if necessary to add bytes for longer strings). You need the offset into the table anyway. It isn't strictly better, but it isn't strictly worse, and it gives you the option to reuse sub-strings.

When using null terminated strings, parsing can be branchless because you don't need bounds checks and can use a jump table indexed by the byte.

Hearing someone mention FreeBASIC really brings me back. It was the first language I ever used pointers in.

> I've gone so far as having a Gemini instance at gemini://g.wiki.roshangeorge.dev which no one has accessed.

If nobody else knows about it then they might not access it, but I looked; at least some of the parts look interesting to me.

> The protocols in use here are quite nice and there's always Gemini if you want a protocol that is pure document oriented.

As well as others, depending on what you want to do; it is not quite as simple as "pure document oriented" (e.g. Gemini does have inputs (1x status code) and TLS as well, including authentication with client certificates).
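For illustration, a sketch of parsing a Gemini response header, assuming the `<STATUS><SPACE><META><CR><LF>` layout with a two-digit status whose first digit (1 to 6) gives the category, with 1x meaning the server requests input. The type and function names are hypothetical, and the TLS connection itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int status;       /* two-digit status code, e.g. 10 or 20 */
    const char *meta; /* points into the caller's buffer, not NUL-terminated */
    size_t meta_len;  /* length of meta in bytes */
} gemini_header;

/* Parse one header line of len bytes, including the trailing CRLF.
   Returns 0 on success, -1 if the line is malformed. */
static int gemini_parse_header(const char *line, size_t len, gemini_header *out)
{
    assert(line != NULL && out != NULL);
    if (len < 5) /* two digits, a space, CR and LF */
        return -1;
    if (line[0] < '1' || line[0] > '6' || line[1] < '0' || line[1] > '9')
        return -1;
    if (line[2] != ' ')
        return -1;
    if (line[len - 2] != '\r' || line[len - 1] != '\n')
        return -1;
    out->status = (line[0] - '0') * 10 + (line[1] - '0');
    out->meta = line + 3;
    out->meta_len = len - 5;
    return 0;
}

/* 1x: the server asks the client to resubmit the request with user input;
   META carries the prompt text. */
static int gemini_wants_input(const gemini_header *h)
{
    return h->status / 10 == 1;
}
```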

> Perhaps a HTTP browser that only `Accept`s `text/markdown`

It might also be made to be modular so that the file formats and other features can be added separately (including HTTP, HTML, Unicode, etc also would not be forcibly built-in, and the different protocols, file formats, character sets, and other features can be done by adding them on (which can be static or dynamic; static might allow some possible optimizations but would require recompiling and/or relinking it when you want to change it)).


Gopher and Gemini can both work on many kinds of devices; having a monochrome display, or the differences in input (e.g. having numbered lines works OK, especially since both require links to be on a line by themselves, unlike HTML), etc, without the author of the document needing to worry about such things like that. In both cases text entry might sometimes be needed so is not ideal but still it is possible.

What does "TLC" mean here? Furthermore, for the purpose of keeping out commercial entities, it would be necessary to have the details of what is intended to be avoided and in what contexts, as well as how to avoid certain things; I think simply "keeping out commercial entities" won't do (except perhaps for such things like e.g. indexing services, which can choose not to link to them).


TLC (tender loving care) means a better method to avoid spam. That can be a problem with USENET.

There are also the other "small web" protocols (and some other stuff). Either way, it is still the same internet and not a new one, and still uses TCP/IP and DNS (although not HTTP/HTML). (That does not mean that it is not worth anything, though.)

The robots.txt file should be used to restrict (and, in some cases, slow down) crawling at the time it is being crawled, not for SEO or for restricting access to mirrors or for any other purpose. It should never apply retroactively. (Unfortunately it is sometimes used badly despite this.)

They mention a compiler having access to a file called BILL for storing billing information and if you specify that it is the file for debugging then it is overwritten by the debugging information. While an appropriate kind of capability system (such as proxy capabilities, or object-capabilities described in that article which is very similar) can help, locking the file might also help (if it is locked for billing first before any files specified by the user are locked); then the compiler will complain that the file specified as the debugging output file cannot be written because it is locked (even though the compiler is the one that locked that file). A capability system is better, although it would be possible to do both, since locks (and transactions as well) are also helpful for other purposes.

This kind of proxy capabilities has other benefits as well, e.g. you can implement a disk quota, or transparent compression, or logging, or ask the user (if you have a capability which can do that), or provide access to a part of the file as though it is the entire file, etc.
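A rough sketch of the disk quota case, with a "capability" modelled as nothing more than a pointer to an object with a write operation. All names here (`writer`, `mem_writer`, `quota_writer`) are made up for illustration; a real capability system would enforce this in the operating system rather than by convention in one process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The only thing a program holding the capability can do: write. */
typedef struct writer writer;
struct writer {
    /* Returns the number of bytes written, or -1 on refusal. */
    long (*write)(writer *self, const void *buf, size_t len);
};

/* Underlying resource: a small in-memory "file". */
typedef struct {
    writer base; /* must be first so writer* converts to mem_writer* */
    unsigned char data[64];
    size_t used;
} mem_writer;

static long mem_write(writer *self, const void *buf, size_t len)
{
    mem_writer *m = (mem_writer *)self;
    if (len > sizeof m->data - m->used)
        return -1;
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    return (long)len;
}

static void mem_writer_init(mem_writer *m)
{
    m->base.write = mem_write;
    m->used = 0;
}

/* Proxy capability: same interface, forwards to inner, enforces a quota. */
typedef struct {
    writer base;
    writer *inner;
    size_t remaining; /* bytes still allowed */
} quota_writer;

static long quota_write(writer *self, const void *buf, size_t len)
{
    quota_writer *q = (quota_writer *)self;
    if (len > q->remaining)
        return -1; /* over quota: inner is never touched */
    long n = q->inner->write(q->inner, buf, len);
    if (n > 0)
        q->remaining -= (size_t)n;
    return n;
}

static void quota_writer_init(quota_writer *q, writer *inner, size_t quota)
{
    assert(inner != NULL);
    q->base.write = quota_write;
    q->inner = inner;
    q->remaining = quota;
}
```

A program that is handed `&q.base` cannot tell whether it holds the file itself or a proxy, and it has no way to reach `inner` directly. Logging, compression, or a window onto part of the file would be other proxies with the same shape.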

Or, if a program requests access to a camera, you can provide a capability with a still picture, a video file, a filter (e.g. that resizes the picture or modifies the colour) from some source (including, but not limited to, a camera), etc; this can be helpful in case e.g. you do not have a camera on your computer, or for testing.

(Other people have similar ideas, sometimes independently of me.)

There is also a way to transmit capabilities across a network; I had thought of how a protocol would be made to do such a thing.

