Translating it to idiomatic natural language makes it less clear, because the natural language statement "all marbles in the bag are black" implies that there are marbles in the bag. The question is, why do we interpret it differently in a logical/mathematical context?
As with anything mathematical, we picked the way that yields useful and elegant mathematics. The notion of "all" would be awkward to work with if it did not apply to the empty set (bags without marbles), so we extended it in a way that is consistent with what it means over non-empty sets (bags with marbles).
One very nice thing about this definition of "all" is that it gives this simple relationship between logical "all" and logical "none":
"for all x, P(x) is true" is equivalent to "there is no x for which P(x) is false"
I.e., if there is no marble in the bag that is not black, then all marbles in the bag are black... so to speak.
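As a small sketch of that equivalence, here is how it looks with Python's built-in `all` and `any`. The names `is_black`, `all_black`, and `no_non_black` are made up for illustration; the only real API used is the two built-ins.

```python
from typing import List


def is_black(marble: str) -> bool:
    # Hypothetical predicate P(x): "this marble is black".
    return marble == "black"


def all_black(bag: List[str]) -> bool:
    # "for all x, P(x) is true"
    return all(is_black(m) for m in bag)


def no_non_black(bag: List[str]) -> bool:
    # "there is no x for which P(x) is false"
    return not any(not is_black(m) for m in bag)


# The two formulations agree on every bag, including the empty one.
for bag in (["black", "black"], ["black", "red"], []):
    assert all_black(bag) == no_non_black(bag)
```

If `all` returned `False` for the empty bag, the two functions would disagree on exactly that case, and the relationship above would need a special exception.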
This yields elegant and useful mathematics, and (as you would expect) that tends to mean elegant and useful code as well.
For example, suppose you had an enormous dataset of records that included zip codes and wanted to know: "Are all the zip codes in this dataset in the continental United States?" One approach would be to split the data into smaller shards, answer the question for each shard, and "and" all the answers together. What if you chose a bad way of sharding the data and one of the shards was empty? What result would you want returned for that shard?
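The sharding scenario can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `in_continental_us` is a toy stand-in backed by a tiny hard-coded sample (a real check would need a proper ZIP code lookup), and `all_in_shard` and `all_in_dataset` are hypothetical helper names.

```python
from typing import List

# Toy stand-in for a real lookup: a tiny, hypothetical sample of ZIP codes
# treated as "continental" for this example only.
CONTINENTAL_SAMPLE = {"10001", "60601", "94105"}


def in_continental_us(zip_code: str) -> bool:
    # Hypothetical predicate; not a real geographic check.
    return zip_code in CONTINENTAL_SAMPLE


def all_in_shard(shard: List[str]) -> bool:
    # Answer the question for one shard. For an empty shard this is True.
    return all(in_continental_us(z) for z in shard)


def all_in_dataset(shards: List[List[str]]) -> bool:
    # "and" the per-shard answers together.
    return all(all_in_shard(s) for s in shards)


# A bad sharding that happens to leave the middle shard empty.
shards = [["10001", "60601"], [], ["94105"]]
flat = [z for shard in shards for z in shard]

# The sharded answer matches the answer computed over the whole dataset.
assert all_in_dataset(shards) == all(in_continental_us(z) for z in flat)
```

Because the empty shard reports `True`, it has no effect on the combined result, which is what you want: how the data happened to be split should not change the answer. If an empty shard reported `False`, the combined answer would be `False` even though every ZIP code in the dataset passed the check.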