clnhlzmn's comments

You might also be interested in metalang99 by the same author.


This is totally true and the bash apologists are delusional.


So are some Python advocates. The thing that's worse than a bash script made of Perlish line noise is a piece of "clean code", dead-simple Python that's 80% __language__.boilerplate and 20% logic, smeared over 10x the lines because small functions calling small functions are cool. No one has enough working memory to keep track of what's going on there. Instead, your eyes glaze over it and you convince yourself you understand what's going on.

Also, Python build scripts can be a living hell too, full of dancing devils that could be introducing backdoors left and right - just look at your average Conan recipe, particularly for larger/more sensitive libraries like OpenSSL or libcurl.


FWIW, my comment wasn't meant to single out Python as particularly good. I think the comparison I drew between its inscrutability and that of shell/bash would apply to nearly all other languages as well.


That's a lot of emotive language. Can you link to some actual examples?


I can understand the frustration when those small functions are not well-named, so you have to inspect what they do.


You already have to inspect everything if you want to review/audit a build script. Small functions - and I specifically mean functions being written small because of misguided ideas of "clean code", as opposed to e.g. useful abstraction or reusability - become especially painful there, as you have that much more code to read, and things that go together logically (or execution-wise) are now smeared around the file.

And you can't really name such small functions well anyway, not when they're broken down for the sake of being small. Case in point: some build script I saw this week had a function like `rename_foo_dll_unit_tests` calling `rename_foo_dll_in_folder` calling `rename_foo_dll` calling `rename_dlls`, a distinct call chain of four non-reused functions that should've been at most two.

Are all Python build scripts like that? Not really. It's just a style I've seen repeatedly. The same is the case with inscrutable Bash scripts. I think it speaks more about common practices than the language itself (notwithstanding Bash not really being meant for writing longer programs).


Sounds like DRY run amok indeed. Maybe a compiler or linter could detect these cases and complain "this function is only called from one place" :)
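
A toy version of that lint fits in a few lines of Python with the ast module - this is just a sketch that flags top-level functions called exactly once within the same file (with plenty of false positives, obviously):

```python
import ast
import collections
import sys

# Toy lint: flag functions that are defined here and called from exactly one
# place in the same file. Usage: python single_callers.py some_script.py
source = open(sys.argv[1]).read()
tree = ast.parse(source)

defs = {node.name: node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)}
calls = collections.Counter(
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name))

for name, lineno in sorted(defs.items(), key=lambda item: item[1]):
    if calls[name] == 1:
        print(f"{sys.argv[1]}:{lineno}: '{name}' is only called from one place")
```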


I'm glad I'm not the only one with this particular frustration.


I mentioned Conan recipes, didn't I? :). Those are my most recent sources of frustration.


I've never heard of a Conan language, and a couple of URLs to some bad recipes would not go amiss.


Conan is a package manager for C/C++, written in Python. See: https://conan.io/.

The way it works is that you provide "recipes", which are Python scripts that automate the process of collecting source code (usually from a remote Git repository or a remote source tarball), patching it, making its dependencies and transitive dependencies available, building for a specific platform and architecture (via any number of build systems), and then packaging up and serving binaries. There's a lot of complexity involved.
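
To give a flavour of what these scripts contain, here's a heavily stripped-down sketch of the usual shape of a recipe. The package name, version, and the string being patched out are all made up, and the exact helper names differ between Conan 1.x and 2.x, so treat it as illustrative only:

```python
import os

from conan import ConanFile
from conan.tools.cmake import CMake, CMakeToolchain
from conan.tools.files import apply_conandata_patches, get, replace_in_file


class SomelibConan(ConanFile):
    name = "somelib"          # hypothetical library
    version = "1.2.3"
    settings = "os", "arch", "compiler", "build_type"

    def source(self):
        # Fetch the upstream source tarball from a URL listed in conandata.yml...
        get(self, self.conan_data["sources"][self.version]["url"], strip_root=True)
        # ...apply the recipe's own .patch files...
        apply_conandata_patches(self)
        # ...and/or just search-and-replace directly in the upstream build files.
        replace_in_file(self, os.path.join(self.source_folder, "CMakeLists.txt"),
                        "add_subdirectory(tests)", "")

    def generate(self):
        CMakeToolchain(self).generate()

    def build(self):
        cmake = CMake(self)
        cmake.configure()
        cmake.build()

    def package(self):
        CMake(self).install()
```

Real conan-center recipes layer options, dependency requirements, per-platform workarounds, and packaging logic on top of this, which is where most of the complexity (and the attack surface) lives.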

Here are the two recipes I mentioned:

libcurl: https://github.com/conan-io/conan-center-index/blob/master/r...

OpenSSL v3: https://github.com/conan-io/conan-center-index/blob/master/r...

Now, for the sake of this thread I want to highlight three things here:

- Conan recipes are usually made by people unaffiliated with the libraries they're packaging;

- The recipes are fully Turing-complete, do a lot of work, have their own bugs - therefore they should really be treated as software components themselves for the purpose of OSS clearing/supply chain verification, except that, as far as I know, nobody does it;

- The recipes can, and do, patch source code and build scripts. There's supporting infrastructure for this built into Conan, and of course one can also do it by brute-force search and replace. See e.g. the zlib recipe, which does both at the same time:

https://github.com/conan-io/conan-center-index/blob/7b0ac710... -- `_patch_sources` does both direct search-and-replace in source files, and applies the patches from https://github.com/conan-io/conan-center-index/tree/master/r....

Good luck keeping track of what exact code goes into your program, when using Turing-complete "recipe" programs fetched from the Internet, which fetch your libraries from somewhere else on the Internet.


That was a really, really good answer, thanks.


It depends on the use case. Bash code can be elegant, and Python code can be ugly. I'm not saying those are the average cases, but complex code, regardless of the language, is often ugly even with effort to make it more readable.


Oh I get it. Sometimes bash is the right tool for the job. I think that’s just mostly an unfortunate historical artifact though. It’s hard to argue it’s intuitive or “clean” or ${POSITIVE_ADJECTIVE} in the average case.


I'm a bash apologist and I totally stand by your words. It's delusional. Bash totally has to go.


This is not Lasse Collin's responsibility. What is a burnt-out committer supposed to do? Absolutely nothing would be fine. Doing exactly what Lasse Collin did and turning over partial control of the project to an apparently helpful contributor with apparent community support is also perfectly reasonable.


This is really not the responsibility of unpaid developers.


Big vendors should pay to get to know them, because they're the ones making the money off of the developers' work, but "I don't want to meet anybody and want to just manage the project" is the FOSS version of "just trust me bro".


I think the C++ version could be more understandable, but it's as if the authors intentionally made it as obtuse as possible.


The authors are required to make it obtuse. They're required to use warts on all of the names because most of the code is in the header files and is generic (template) code compiled by users of the library rather than the vendor. In order to avoid naming conflicts they can only use obscured names in their implementation of anything but the defined API (e.g. naming any internal functions, macros, or variables with leading underscores).

So, the authors did intentionally make it as obtuse as possible for your benefit. It's written to be used, not studied, by all kinds of developers in all kinds of circumstances.


They could supply a "pretty" version for people who want to review it. Every time I have to step through code (and accidentally step into STL code) it looks sloppy and gross, like a swamp. No comments or organization. I would expect something neatly formatted, and comments saying "This is overload-4 of std::copy()..." etc.


Professional software developers have a lot to do just to get their job done on time and within budget. Having to duplicate all their code just so that people who contribute nothing to the end product can have an easy time understanding it is just never going to be a priority worth addressing.

The problem here is not really the code, it's the reader.


Who says they have to duplicate it? Just write the original version cleanly, clearly, and concisely. Then run it through a mangler to rename variables to avoid collisions.

The STL is maintained by volunteers; it's a FOSS project. So your appeal to Serious Business doesn't hold.


Would it be reasonable to expect that this MR comes along with a test that shows it does the thing it's claiming to do? I'm not sure how that would work in this case... have a test that runs on a system known to have landlock and does something to ensure that it's enabled? Even that could be subverted, but it seems like demanding that kind of thing before merging "features" is a good step.


I like the idea of testing build-system behaviors like this, and I don't think it's ever really done in practice. Scriptable build systems, for lack of a better name for them, sit at a bad intersection: Turing-complete, hard to test across different cases, hard to reason about, hard to read, and treated by most of us as "ugh, I hope all this stuff works" and, if it does, "thank god I get to ignore all this stuff for another 6 months".


If you mean testing the "disable Landlock if the headers and syscalls are out of sync" functionality then I agree, workarounds for such corner cases are often not fully tested.

But it would have been enough here to have a test just to see that Landlock works in general. That test would have broken with this commit, because that's what the commit actually does - break all Landlock support.

Based on that it sounds like there wasn't a test for Landlock integration, if I've understood things correctly.


Create a tiny fake version of landlock with just the features you're testing for. Since it's only checking for 4 #defines in 3 header files, that's easy.

Then compile your test program against your fake header files (with -Imy-fake-includes). It should compile without errors even if landlock is missing from your actual system.

Then build your test program a second time, this time against the real system headers, to test whether landlock is supported on your system.
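
A rough sketch of that harness in Python (assuming you've put stub linux/landlock.h and sys/syscall.h files containing just the needed #defines under ./fake-includes/ - the SYS_* names below are the real landlock syscalls, but the probe program itself is made up for illustration):

```python
import os
import subprocess
import tempfile
import textwrap

# Minimal probe in the spirit of the configure check being discussed:
# it only compiles if the landlock header and SYS_* syscall numbers exist.
PROBE_C = textwrap.dedent("""\
    #include <linux/landlock.h>
    #include <sys/syscall.h>

    int main(void) {
        /* Referencing the constants forces them to be defined. */
        return (int)(SYS_landlock_create_ruleset
                     + SYS_landlock_add_rule
                     + SYS_landlock_restrict_self);
    }
""")

def compiles(extra_flags=()):
    """Return True if the probe program compiles with the given extra flags."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        with open(src, "w") as f:
            f.write(PROBE_C)
        cmd = ["cc", *extra_flags, "-o", os.path.join(tmp, "probe"), src]
        return subprocess.run(cmd, capture_output=True).returncode == 0

if __name__ == "__main__":
    # 1. Against the stub headers: should always pass, proving the check itself isn't broken.
    print("check itself works:", compiles(["-Ifake-includes"]))
    # 2. Against the real system headers: passes only where landlock is detectable.
    print("landlock detectable:", compiles())
```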


I'd say this MR is a bad approach in general. The headers say what interfaces are known, not what features are available. You should be able to compile with landlock support on a system which doesn't enable it. Same situation as seccomp and others. Your build machine doesn't have to match the capabilities of the target runner.

But yeah, to test it, you can have a mock version of landlock which responds with the error/success as you want, regardless of what the system would normally do. It relies on the test not being sabotaged too though...


Read the code of the check again. It mostly checks that the required SYS_* constants are defined to be able to use the syscalls. You can compile this on a system that does not have landlock enabled in the running kernel, but the libc (which imports the kernel system call interface) has to provide the syscall numbers.


You're right. I didn't see the SYS_* symbols actually being used, but they are: https://git.tukaani.org/?p=xz.git;a=blob;f=src/xz/sandbox.c;...

This doesn't change my opinion in general - that version should be exposed through a library call, and knowing about the specific syscalls shouldn't be needed in xz.


I see your point, but suggesting adding an additional library dependency while we're discussing a supply chain attack is quite ironic.


Should've said function call, not library call. My bad. Basically, if you already have linux/landlock.h, it should provide everything you need without explicit references to SYS_*.


Now we are running in circles. As you can see in the git commit, the compile check was added because the existence of linux/landlock.h alone was not enough to check that the feature can be used.

This header defines the data types for the Linux kernel interface, but not how the syscall landlock_create_ruleset(2) will be issued. That is provided by libc, either as a separate wrapper function (which does not exist in glibc) or by the implementation calling syscall(SYS_landlock_create_ruleset, ...), with the constant also being provided by libc. That is how it works for all syscalls, and you won't be able to change this.
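
For what it's worth, you can see this from userspace too: since glibc has no wrapper, the only way to call it is to issue the raw syscall yourself. Here's a quick Linux-only probe via ctypes, with the syscall number hard-coded purely for illustration - that per-architecture number is exactly the detail <sys/syscall.h> normally provides:

```python
import ctypes
import errno

# landlock_create_ruleset is syscall 444 on current mainstream architectures;
# normally this number comes from <sys/syscall.h> rather than being hard-coded.
SYS_landlock_create_ruleset = 444
LANDLOCK_CREATE_RULESET_VERSION = 1  # flag meaning "just report the supported ABI version"

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long

abi = libc.syscall(ctypes.c_long(SYS_landlock_create_ruleset),
                   None,                 # struct landlock_ruleset_attr *attr
                   ctypes.c_size_t(0),   # size
                   ctypes.c_uint32(LANDLOCK_CREATE_RULESET_VERSION))
if abi < 0:
    err = ctypes.get_errno()
    # ENOSYS: kernel too old; EOPNOTSUPP: built in but disabled at boot.
    print("landlock unavailable:", errno.errorcode.get(err, err))
else:
    print("landlock ABI version:", abi)
```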


The only source of the claim that the existence of linux/landlock.h is insufficient is (AFAICT) the malicious git commit. Why trust the comment, written by the attacker, to explain away a malicious change?


I already explained above why the existence of linux/landlock.h is not sufficient. Why do you still question it? If you know a bit about system programming and how configure checks work, the change in itself is totally reasonable.


Yeah I find the rolling method is more work than it’s worth when the “grab the corners and shake vigorously” method works just fine.


Shaking is far more exhausting, dusty, and not foolproof. The roll method is a no-brainer, especially on large sheets.


TextAdept is a cross-platform, minimal, fast, and extensible text editor that might be a good candidate.


Thanks! I’ll check it out.


> I mean, you can still use it, but you have to do your own type checking/coercion in code

Can you explain why you wouldn't have to check what you're putting into the database? If you're just stuffing values of unknown type into a statically typed DB, you're going to get errors up front, and if you stuff values of unknown type into SQLite, you might get errors down the road. Either way you probably should know what you're putting into the DB in the first place.


It's not so much that one is stuffing random/unknown types into the database without type-checking. Programming language types do not map exactly to database types even in statically typed DBs. ORMs manage this translation for you (imperfectly at times), but those of us who work directly with SQL do typecheck; otherwise the INSERTs will fail.

It's more that static database types provide a standard contractual interface that is enforced (agnostic of application and programming language) and has reproducible behavior upon retrieval.

The advantage of static types is the guarantee that if a data point is successfully INSERTed, it can be successfully retrieved in the future.

In a dynamically typed DB you have a problem of standardization across codebases -- every new program/microservice that is written will have to use the exact same typechecking code, or else risk future retrieval issues. Those that do type coercion on the other end are essentially guessing and hoping that the original type was successfully reproduced upon retrieval. Plus, if you work with different programming languages, that same code has to be ported to all of them. You also find yourself having to reinvent the wheel a lot -- for instance, the SQL DECIMAL type. In SQLite you can store it as either an INTEGER or a REAL, and then either store metadata in another field or create a specific function to retrieve the INTEGER and recreate the specific DECIMAL type with the right number of digits (say DECIMAL(18,0)).

On the other hand you can rely on SQL types and get all this for free and have the assurance that it will work impeccably.
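
A quick sqlite3 illustration of the difference (the table here is just a throwaway example): with a NUMERIC column, SQLite coerces what it can and silently keeps the rest as-is, so the reader has to guess what it's getting back.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (amount NUMERIC)")

con.execute("INSERT INTO t VALUES (1)")         # stored as INTEGER 1
con.execute("INSERT INTO t VALUES (1.5)")       # stored as REAL 1.5
con.execute("INSERT INTO t VALUES ('1.0000')")  # coerced to INTEGER 1 -- the text form is gone
con.execute("INSERT INTO t VALUES ('abc')")     # kept as TEXT, no error raised

for value, storage in con.execute("SELECT amount, typeof(amount) FROM t"):
    print(value, storage)
# -> 1 integer / 1.5 real / 1 integer / abc text
```

A statically typed column would have rejected 'abc' up front instead of silently re-typing things on a value-by-value basis.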


SQLite doesn't fit perfectly cleanly into this, but, for me, the big distinction is between schema-on-read and schema-on-write. This captures the fact that you always need to have some sort of schema constraint when you're working with data. The question is, do you apply those constraints when you're writing the data, or when you're reading it?

Schema-on-write is useful when you know exactly what the schema should be ahead of time, and it's static. That happens quite often in OLTP and business intelligence applications, but that's not always what you're doing.

Schema-on-read is very useful in those situations where the schema constraints you need can't be known ahead of time, or might vary from situation to situation. At that point you're kind of stuck delaying the schema bits (including making sure everything is an appropriate type) until the last minute. This comes up in, for example, big data applications such as data lakes.
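
A minimal sketch of that contrast, again with sqlite3 (the table and field names are invented): schema-on-write rejects bad data at INSERT time, while schema-on-read stores an opaque payload and leaves every type decision to whoever reads it later.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

# Schema-on-write: the constraint is enforced when the row goes in.
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)")
try:
    con.execute("INSERT INTO orders (id, total) VALUES (1, NULL)")
except sqlite3.IntegrityError as exc:
    print("rejected at write time:", exc)

# Schema-on-read: store whatever arrives, interpret and validate it later.
con.execute("CREATE TABLE events (payload TEXT)")
con.execute("INSERT INTO events VALUES (?)",
            (json.dumps({"id": 1, "total": "19.99"}),))
for (payload,) in con.execute("SELECT payload FROM events"):
    event = json.loads(payload)
    total = float(event["total"])  # the type decision is deferred to read time
    print("interpreted at read time:", event["id"], total)
```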


Schema-on-read vs Schema-on-write are definitely relevant concepts.

I think SQLite sort of straddles the two -- it's a loose schema-on-write (with type affinity to its base types) and schema-on-read (with respect to more complex types, which are inferred from what is stored in the base types).


Agreed. Which kind of makes it even more complicated to talk about. Whether its approach offers the best of both worlds, or the worst of both worlds, depends on your application, and the dividing line doesn't cleanly follow any of the distinctions we normally make among types of application.


> Either way you probably should know what you're putting into the DB in the first place.

Yeah, well, one good way of finding that out in practice is that reasonable RDBMS systems tell you that when you try to stuff the stuff in.


If you're storing the text "1.0000" in a NUMERIC column maybe you're the dangerous one.

