I'm a native English speaker. Your understanding and usage of the word "reliability" is correct, and that's the exact word I'd use in this conversation. The GP is playing a pointless semantics game.
It's not semantics, if the definition is "it does what it’s supposed to do" then probably all of the currently deployed LLMs are reliable according to that definition.
That's the crux of the problem. Many proponents of LLMs over promise the capabilities, and then deny the underperformance through semantics. LLMs are "reliable" only if you're talking about the algorithms behind the scene and you ignore the marketing. Going off the marketing they are unreliable, incorrect, and do not do what they're "supposed to do".
But maybe we don't have to stoop down to the lowest level of conversation about LLMs, the "marketing", and instead do what most of us here do best, focus on the technical aspects, how things work, and how we can make them do our bidding in various ways, you know like the OG hacker.
FWIW, I agree LLMs are massively over-sold for the average person, but for someone who can dig into the tech, use it effectively and for what it works for, I feel like there is more interesting stuff we could focus on instead of just a blanket "No and I won't even think about it".