And what's more, why does it matter? I assume they mean something like Cucumber, which I'm OK with, but it doesn't really add any benefit over well-organized tests written in the language your code is in.
Because it allows you to keep the domain logic somewhere. These days people don't write documentation, or the documentation is always outdated. Having high-level tests in a DSL does more than just test: it tells you how your application behaves from the user's perspective, in a more general sense. So you will have fewer issues when new people join the project, or when the project owner changes. And it forces you to focus on the goal during feature implementation. In my experience, new features often have behavior that is not obvious, and that sometimes conflicts with the logic of other application features. These tests let you see those conflicts before implementation.
I disagree, but I get where you're coming from - which is exactly the problem with this list. The Joel Test was great because everything on it was universally accepted as something every developer would want to happen where they worked. With this, different developers are going to have different opinions about many things on the list, making the "score" lose its usefulness, since now every time a company has less than a perfect score, I need to figure out why (is it because they don't have a library? I don't care so much about that. Is it because they don't have CI? I care a lot about that.)
I'm not a fan of Cucumber, and you can achieve the same results by just reusing code. A big problem I have is that it ends up introducing pointless indirection. Defining steps in different files from the feature being tested, even if they are only used in that single feature file, is unnecessary abstraction. Having to grep for step syntax in a project to try to find the code that implements a step is insane.
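A minimal sketch of the "just reuse code" approach, assuming Python with pytest-style test functions. The `Cart` class, the `HALF` coupon rule, and the helper names are all made up for illustration; the only point is that the "steps" are ordinary functions sitting next to the test that uses them.

```python
# Hypothetical domain object, only here to make the example runnable.
class Cart:
    def __init__(self):
        self.items = []
        self.discount = 0.0

    def add(self, name, price):
        self.items.append((name, price))

    def apply_coupon(self, code):
        # Assumed rule for illustration: "HALF" takes 50% off.
        if code == "HALF":
            self.discount = 0.5

    def total(self):
        subtotal = sum(price for _, price in self.items)
        return subtotal * (1 - self.discount)


# Reusable "steps" are plain functions defined in the same file as the test.
def given_a_cart_with(*items):
    cart = Cart()
    for name, price in items:
        cart.add(name, price)
    return cart


def when_coupon_applied(cart, code):
    cart.apply_coupon(code)
    return cart


def test_half_off_coupon_halves_the_total():
    cart = given_a_cart_with(("book", 20.0), ("pen", 4.0))
    when_coupon_applied(cart, "HALF")
    assert cart.total() == 12.0
```

The test still reads roughly like given/when/then, but an editor's "go to definition" lands on the helper directly, so there is no step syntax to grep for.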