Union, intersection, difference, and more are coming to JavaScript Sets (sonarsource.com)
115 points by thunderbong 9 months ago | 80 comments



This article seems to be largely fluff. Here's a link to the proposal with the list of methods being added: https://github.com/tc39/proposal-set-methods?tab=readme-ov-f...


It seems to be written by someone with an incomplete understanding of the analogies they're using:

    A difference is like performing a LEFT JOIN.
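For what it's worth, a set difference behaves more like an anti-join than a LEFT JOIN: a LEFT JOIN keeps every left-hand row, while a difference drops the ones that also appear on the right. A rough sketch with hand-written helpers (the names `difference` and `leftJoin` are made up for illustration and are not the proposal's API):

```javascript
// Hypothetical helpers for illustration only; not the proposal's API.

// Elements of `a` that are NOT in `b` (what a set difference means).
function difference(a, b) {
  return new Set([...a].filter((x) => !b.has(x)));
}

// A LEFT JOIN keeps every left element, paired with its match or null.
function leftJoin(a, b) {
  return [...a].map((x) => [x, b.has(x) ? x : null]);
}

const left = new Set([1, 2, 3]);
const right = new Set([2, 3, 4]);

console.log([...difference(left, right)]); // [ 1 ]
console.log(leftJoin(left, right)); // [ [ 1, null ], [ 2, 2 ], [ 3, 3 ] ]
```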


Has anything changed recently? The proposal has been in stage 3 since November 2022 [1]

[1] https://github.com/tc39/notes/blob/HEAD/meetings/2022-11/nov...


I believe that they're just waiting for Firefox to ship it, at which point it will be upgraded to Stage 4 and included in ES2024.


Having read the article... it doesn't? It gives a nicely elaborate overview of the original Set and the new functions, in a way that lets folks who've never used sets (except to hide "filter for uniqueness" functionality) know that it might actually be useful to them.


Javascript Set is the least useful Set there is across languages I have used.

Why won't it work on objects, when the language is very good at creating objects without any ceremony?

EDIT:

By "won't work" I mean you are left with reference equality checks, which I have never found useful.

Talking about other languages I have worked with: in Java and C# you can override the equals method, in which you specify which property to check equality on (even in Python you can do that with __eq__, I think).


It works fine on objects.

  const set = new Set();
  const x = { id: 'some object' };
  set.add(x);
  set.has(x);
  > true
I'm guessing you're taking issue with the fact that it uses reference equality instead of structural equality, which can definitely be a pain point. There's a proposal to improve the situation with "Records and Tuples" [0], but it's been stuck in committee hell for years.

[0] https://github.com/tc39/proposal-record-tuple


(Not directed at you, but the JS implementation)

I don't understand the point of this. What good is an object in a Set if it can't be deduped?

With this and so much else in the core language, I wish they just incorporated lodash into Ecmascript and called it a day. https://lodash.com/docs/4.17.15#uniqWith (the example is specifically for comparing objects by value)

> but it's been stuck in committee hell for years.

It's been like this for more than a decade, too. I've never seen a popular language evolve SO slowly, whether it's with Sets, or things like the Temporal API or TypeScript or JSX or bundlers. Almost all the innovation in JS seems to come from third parties forced to hack them in, whether through vendor prefixes, jQuery, or later ecosystems like npm libs and React. In that same time period, PHP improved by leaps and bounds, entire languages like Rust came out of nowhere and gained a foothold, WASM was developed, etc. And lodash is STILL around and still useful.

It's so frustrating to watch.


JS sets are plenty useful. Sometimes you want reference equality for objects (though letting programmers choose would be ideal).

As far as the pace, I think they do it about right. JS has hard requirements to be backwards compatible and has several canonical implementations, so adding new features should only be done after careful consideration. Letting users add features themselves and ensuring they have staying power and wide adoption is a good test.


> JS sets are plenty useful. Sometimes you want reference equality for objects

An example of where I've found this useful is object cycle detection.
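A minimal sketch of that use, assuming plain nested objects and arrays (the function name `hasCycle` is just for illustration). It relies on the Set comparing objects by identity, which is exactly what you want here:

```javascript
// Sketch: detect a reference cycle by tracking the objects on the
// current path in a Set, which compares them by identity.
function hasCycle(value, path = new Set()) {
  if (value === null || typeof value !== "object") return false;
  if (path.has(value)) return true; // same reference seen again on this path
  path.add(value);
  for (const child of Object.values(value)) {
    if (hasCycle(child, path)) return true;
  }
  path.delete(value); // backtrack: shared (non-cyclic) references are fine
  return false;
}

const tree = { name: "root", child: { name: "leaf" } };
const loop = { name: "a" };
loop.self = loop;

console.log(hasCycle(tree)); // false
console.log(hasCycle(loop)); // true
```

With structural equality, two distinct but identical-looking nodes would be reported as a cycle, so reference equality is the right tool for this job.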


That's the curse of a language that has to be supported by several competing implementations, and therefore all have to agree on any additions. Also, it's important not to "break the web".


Would it really break the web to have a basic, standard implementation of "compare object by value"? Or a first-party type system?

I think the inverse can also be true, where the core language / TC39 is over-cautious with feature adoption, leading to extreme ecosystem fragmentation and eighteen vendors reinventing the same three wheels every year.


My understanding of the problem with records-and-tuples (which is what "objects comparable by value" would look like) that's kept it stalled out is that it breaks all the engines' internal assumptions about how things work, so they'd have to do a large amount of refactoring, and they'd have to do it in a way that doesn't regress performance for code that doesn't use the new features. Given how complicated modern JS engines are, this is just a lot of work and a big ask. I do hope that it happens eventually.

As for a first-party type system, would you want it to basically be TypeScript, or something else? If the former, then we'd lose a lot of features and such, because the core language, not being a single vendor, can't move as fast as TypeScript does. If the latter, then it would be necessary to define what, and in all likelihood different people would want different and incompatible things.


It seems like they're worried that interning of objects/records would be too expensive to do generally. It's hard to predict though: that overhead would only apply to new code using R&T and has to be weighed against the elimination of recursion for deep comparisons and freezing, fewer re-renderings when value-equal but not reference-equal objects are encountered, the greater possibility of memory reuse across deep clones, and other performance optimizations that would be unlocked by true immutability.

Aside from performance, true native immutability would bring huge improvements to how JS programmers can reason about their code. Not having to worry about mutation makes a whole class of possible bugs disappear. Having to rely on third party libraries (or deep freezing manually) for immutability is really holding back the language.


I think "basic implementation of compare by object" is actually not that simple. Either you need to be able to provide an equality function and hashcode function (for efficiency) or some generic comparison which is useful only for a subset of cases, potentially very slow and brittle.


It's probably about the right speed for a language with an implementation as complicated as this that's unfortunately stumbled backwards into enabling all web page dynamic behavior on the planet.

JavaScript is a secure language and therefore doesn't have the luxury of a language like C++ where you can mash two features in and declare their interaction "undefined behavior." Every new feature must eventually be considered in the context of every other feature.


> I'm guessing you're taking issue with the fact that it uses reference equality instead of structural equality

That and other programming languages provide a mechanism to specify a property to check for value equality.


While that could help, I don't see why records and tuples are necessary. We added symbols precisely to deal with this issue in a backward compatible way. We could add symbols for set/map protocols using `equals` and `getHashCode`, which would enable any object to get set/map deduplication functionality. Those implementations could then be the mechanism by which it's implemented for records and tuples.

If this is a bad way to do it, then why isn't TC39 working on a better way to implement protocols / traits in JS?


relevant issue, which is at the crux of this problem: https://github.com/tc39/proposal-record-tuple/issues/387 and which shows that a symbol based protocol was the kind of approach that could've worked from the start.

See `Symbol.keyEquality` - although that still shows what I feel is a misunderstanding about the direction in which the protocol should work. I don't want to be creating new types of maps and sets, I want existing ones to have controllable key equality. If it was for new types of maps and sets, I'd just implement a new Set class independent of JS builtins and be done with it. (It's not like the built-in Set offers any rich features that I'd have a hard time replicating anyway.)

Protocols (traits) should really be the cornerstone of TC39 work, IMO. They'll help with JavaScript's serious ecosystem compatibility issues.


Instead of, or in addition to, records and tuples, I want Symbol.equals or some other interface for equality.


You'd also need Symbol.hashCode, since you wouldn't be able to implement a fast lookup with just an equals method.
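To make the equals-plus-hashCode point concrete, here is a rough user-land sketch. The `equals` and `hashCode` symbols and the `HashSet` and `Point` classes are all invented for illustration; nothing like `Symbol.equals` or `Symbol.hashCode` exists in the language today. The hash picks a bucket cheaply, and equals only has to run against the few items in that bucket:

```javascript
// User-land symbols standing in for the hypothetical Symbol.equals /
// Symbol.hashCode discussed above; neither exists in the language today.
const equals = Symbol("equals");
const hashCode = Symbol("hashCode");

class HashSet {
  #buckets = new Map(); // hash -> array of items sharing that hash
  #size = 0;

  add(item) {
    const hash = item[hashCode]();
    const bucket = this.#buckets.get(hash) ?? [];
    if (bucket.some((existing) => existing[equals](item))) return this;
    bucket.push(item);
    this.#buckets.set(hash, bucket);
    this.#size++;
    return this;
  }

  has(item) {
    const bucket = this.#buckets.get(item[hashCode]()) ?? [];
    return bucket.some((existing) => existing[equals](item));
  }

  get size() {
    return this.#size;
  }
}

class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  [equals](other) {
    return other instanceof Point && other.x === this.x && other.y === this.y;
  }
  [hashCode]() {
    return this.x * 31 + this.y; // collisions are fine; equals breaks ties
  }
}

const points = new HashSet().add(new Point(1, 2)).add(new Point(1, 2));
console.log(points.size); // 1
console.log(points.has(new Point(1, 2))); // true
```

Without the hash, `has` would have to call equals against every stored item, which is the slow lookup the comment above is warning about.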


It does work for objects, but I imagine the problem you're describing is that JavaScript objects are equal if and only if they're the same reference, so you get this:

    const set = new Set();
    set.add({});
    set.add({});
    // Set has 2 items [{}, {}]
    const a = {};
    set.add(a);
    set.add(a);
    // Set has only 3 items [{}, {}, {}]
The actual problem here is that JavaScript doesn't have any way to override `equals` and `hashCode` like Java does, so there's no way to change this reference-equality behavior.


> The actual problem here is that JavaScript doesn't have any way to override `equals` and `hashCode` like Java does

But you can use Symbols for that, no?


Theoretically, if they were to add the ability to override equals they'd do it through symbols, but they haven't yet:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Couldn’t you polyfill your own Set implementation using Map<Symbol, Object> where the symbol is constructed how you need it from the object?


Yes. Symbols are the obvious ones. Symbols and sets both came with ES6, I believe. I wonder why they haven't implemented something like that.


This has nothing to do with the language and is just a huge flaw in the design of the standard library: inability to supply a comparator function that is used for data structure types like sets, etc.


I think this is the feature I miss the most in TS: ability to implement equals, compare and hash for my types and have it working with standard Map and Set


The language is good at creating literal objects of no particular type/class/prototype. The concept of a Set is closer to an Array of unique Numbers than one of literal objects:

    const s = new Set([1,2,3]);
The resulting instance of Set can be queried for whether something exists in the set, have items added to it, and have items removed from it. This doesn’t handle literal objects (e.g. a record) well, which isn’t idiosyncratic to JS, as most languages treat objects as reference-ish. The solution would be to use a literal String or Number identifier that can be looked up.

I’m genuinely curious, what features from other languages do you want to see in this implementation?


> I’m genuinely curious, what features from other languages do you want to see in this implementation?

In Java and C# you can override the equals method, in which you specify which property to check equality on (even in Python you can do that with __eq__, I think). In JavaScript you are left with either primitives or objects with reference equality checks.


That’s fair, I would actually love to have a ‘Symbol.compare’ to implement so instances can be sorted and checked for equality.


An annoyance I have is "add" does not return a boolean indicating if the set already contains the entry.

Most runtimes do this to prevent the need for a double lookup cost by doing "has" then "add."

It's a common pattern for when the set is used to either know when to or when not to do expensive work.
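One workaround, sketched here with a made-up helper name (`addIfAbsent`), is to compare `size` before and after the `add`, which avoids the separate `has` call:

```javascript
// Hypothetical helper: returns true if `value` was newly added.
// Comparing size before and after avoids a separate has() call.
function addIfAbsent(set, value) {
  const before = set.size;
  set.add(value);
  return set.size > before;
}

const seen = new Set();
for (const url of ["/a", "/b", "/a"]) {
  if (addIfAbsent(seen, url)) {
    console.log("processing", url); // runs for "/a" and "/b" only
  }
}
```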


> even python you can do that with __eq__ i think

In Python __hash__() needs to be implemented in order to put something in a set.

You cannot override that in Javascript, but you can roll your own set with custom hash function by wrapping a Map.
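Roughly what wrapping a Map could look like. This is only a sketch: `KeyedSet` and its `keyFn` parameter are invented names, and the key function plays the role of both equals and hash, so it has to return a primitive (or something else you are happy to compare by identity):

```javascript
// Hypothetical KeyedSet: a set-like wrapper around Map where a
// caller-supplied keyFn stands in for equals/hashCode.
class KeyedSet {
  #items = new Map();
  #keyFn;

  constructor(keyFn, iterable = []) {
    this.#keyFn = keyFn;
    for (const item of iterable) this.add(item);
  }

  add(item) {
    const key = this.#keyFn(item);
    if (!this.#items.has(key)) this.#items.set(key, item); // first one wins
    return this;
  }

  has(item) {
    return this.#items.has(this.#keyFn(item));
  }

  delete(item) {
    return this.#items.delete(this.#keyFn(item));
  }

  get size() {
    return this.#items.size;
  }

  [Symbol.iterator]() {
    return this.#items.values();
  }
}

const users = new KeyedSet((u) => u.id, [
  { id: 1, name: "Ann" },
  { id: 1, name: "Ann (duplicate)" },
  { id: 2, name: "Bob" },
]);

console.log(users.size); // 2
console.log(users.has({ id: 2 })); // true
```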


> You cannot override that in Javascript

True. But they could use a well-known Symbol to implement this feature.


Implementing this on mutable objects is a dangerous game. That would be quite the footgun (and JS has plenty of them already).

IMHO it's better to be explicit in such a case. It's not complicated to implement your own set-like class that does exactly what you want.


> That would be quite the footgun

Could you explain why that would be a footgun? JavaScript already has known Symbols like hasInstance which is kind of similar to this.


It's very important to ensure that the hash value for a given entity doesn't change if it is to be relied upon, otherwise things like Map or Set will not function properly. Therefore, one should be very careful in implementing it oneself, and ideally one should use immutable hashable structures provided by the language/stdlib (like tuples or frozensets in Python). Until JavaScript has such structures, which should be enough for the majority of use cases (there is a record and tuple language proposal, but it's in limbo), I think it's unwise to provide a way to sloppily emulate it in user code.
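A small sketch of the hazard, using a Map keyed by a value computed at insert time as a stand-in for a user-defined hash (the `hashOf` function is hypothetical):

```javascript
// Sketch of the hazard: the key is computed once, at insert time,
// the way a user-defined hash would be.
const hashOf = (point) => `${point.x},${point.y}`; // hypothetical hash function

const byHash = new Map();
const p = { x: 1, y: 2 };
byHash.set(hashOf(p), p); // stored under "1,2"

p.x = 99; // mutate after insertion; hashOf(p) is now "99,2"

console.log(byHash.has(hashOf(p))); // false: the entry is still under "1,2"
console.log(byHash.size); // 1: the object is in there but can no longer be found
```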


What do you mean by "why won't it work on objects"? Objects work fine as Set members.

    const a = {}
    const b = {}
    const set = new Set([a])
    set.has(a)
    // true
    set.has(b)
    // false


AFAIK there is no mechanism to do the following

    const a = { value: 1 }
    const b = { value: 1 }

    const set = new Set([a, b])

set.size will be 2, as JavaScript checks for reference equality and has no mechanism to check equality on a specific property.
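The usual workaround is to deduplicate by an explicit key outside of Set. A rough sketch, where `uniqueBy` is a made-up helper name rather than anything built in:

```javascript
const a = { value: 1 };
const b = { value: 1 };
console.log(new Set([a, b]).size); // 2, since a !== b

// Hypothetical helper: keep the first item seen for each key.
function uniqueBy(items, keyFn) {
  const byKey = new Map();
  for (const item of items) {
    const key = keyFn(item);
    if (!byKey.has(key)) byKey.set(key, item);
  }
  return [...byKey.values()];
}

console.log(uniqueBy([a, b], (o) => o.value).length); // 1
```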


That's not specifically related to Set, though? That's "how the language works"?


Some really low quality comments and discussion on this thread.


Yes, primarily driven by people that seem to have only ever used the JS set and are either confused as to why people are expecting more or surprised to learn what devs in other languages have taken for granted for decades.


I don't often define a set type in my language, but when I do I like to include the basic set of set operators...


The way to do it in ECMAScript 6+ is to create your own class MySet which extends the built-in Set. In it you then add or override any methods you want.

You cannot override the equality operator, but you can add a simple, short-named method "eq(anotherSet)" and perhaps variations of that.

It's not too much effort to create your own custom Set subclass, because it only needs to do what you need it to do that the built-in Set does not do. You can also perhaps find such an implementation on npm etc.

A perfect built-in Set would be great, but being able to subclass the existing Set class helps a lot already.
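Something along these lines, as a sketch. The `MySet` and `eq` names come from the comment above; the body is just one possible way to write it, and members are still compared the way the built-in Set compares them:

```javascript
// Hypothetical subclass; the names MySet and eq() follow the comment above.
class MySet extends Set {
  // True when both sets hold exactly the same members
  // (membership is still decided by the built-in Set's own comparison).
  eq(other) {
    if (this.size !== other.size) return false;
    for (const item of this) {
      if (!other.has(item)) return false;
    }
    return true;
  }
}

const s1 = new MySet([1, 2, 3]);
const s2 = new MySet([3, 2, 1]);
console.log(s1.eq(s2)); // true
console.log(s1.eq(new Set([1, 2]))); // false
```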


The number of existing actual set operations available on std Set is so low that that’s basically the same thing as creating your own custom Set over the std Set via plain, old composition.


That is why the new Set proposal is welcome.

But my point is when you code an application you are not coding a library, but an app, so you only need to add the methods your app needs. You probably don't need to create an all-encompassing new Set-implementation that wins the library-of-the-year award. :-)

Maybe you just need to add a method named 'eq()'.

An added benefit of creating your own Set-subclass is you can put debugger-statements in its code to observe who uses it and how. You can put assertions in it to ensure only correct type of elements can be added to it etc.


I reduced my use of sets.

I often used sets in Java as return-types or parameters to specify collections of unique items.

Profiling code over the years, I noticed that a lot of the time the reason for a performance bottleneck was some hashcode calculation for a Set or Map.

So now I often use plain ol' ArrayLists.


whatever man, having a Set type has still been useful in JS even if it didn't have all the algebra hooked up for a few years


Hate to be that guy, but let’s leave memes and tropes out of here.

https://news.ycombinator.com/newsguidelines.html


Without record types, JavaScript sets are not too useful anyway.


I was looking recently and found there doesn't seem to be any way to do lexicographic ordering of lists in JavaScript, unlike (almost) every other language I've ever used, where lists are ordered lexicographically by default.

Is there any magic quick way of doing this in modern JavaScript, or do I have to keep cutting+pasting a comparator I wrote everywhere?



Hm I don't understand what you mean. What's wrong with `myList.sort()`?


Sorry, I mean the things I'm sorting are themselves lists. Imagine for example:

[ [1,23], [11,1], [11,44], [22,4] ]

I'd consider this list "sorted", as the inner lists are sorted lexicographically. Except, if I gave a list like this to JavaScript and asked it to sort it, it would cast each of the inner lists to strings, which (for example) leads to [11,1] < [2,1], as "11,1" < "2,1".
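As far as I know there is no built-in for this, so a comparator is still needed. A minimal sketch (the name `compareLex` is made up), assuming the inner lists hold numbers or strings that compare sensibly with `<`:

```javascript
// Hypothetical comparator: orders arrays element by element, with a
// shorter array sorting before any longer array it is a prefix of.
function compareLex(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

const lists = [[11, 44], [2, 1], [1, 23], [11, 1]];

console.log(JSON.stringify([...lists].sort()));
// [[1,23],[11,1],[11,44],[2,1]]  (default sort compares as strings)
console.log(JSON.stringify([...lists].sort(compareLex)));
// [[1,23],[2,1],[11,1],[11,44]]
```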


A while back I wrote some RxJS operators to do these operations around the primitives (https://rxjs-ninja.tane.dev/modules/array.html) but glad to see they are finally adding them to the standard library.


> union, intersection, difference,

More leftie ideas infecting the language. It's political correctness gone mad.


As a lefty, I’ll admit I snickered at this.


You should, since it's not actually mocking lefties.


Yeah sorry guys, I couldn't resist that one, throwing away a bit of karma on a "humour punishment" seemed worth it. :)


I don’t care about the karma, but “humor punishment” seems an apt term and it’s kinda disappointing.


Of all the HN rules (written and informal) I find the general air of humourlessness most difficult to vibe with. I do understand the reasons. We'd like to keep discussion focused and jokes sometimes derail a sincere thread. Also, humour can be misinterpreted easily and cause offence. But that comes in a context where sarcasm, dooming and flippancy are rife and acceptable, and sometimes it really does feel like someone just needs to lighten the mood a little. There's also that humour can be a wonderful seed to pose a question or provocative challenge that kicks off a thread of intellectual curiosity and would not work otherwise. Anyway I am glad that this is neither Slashdot nor Reddit but has its own atmosphere.


You’ve captured my disappointment perfectly FWIW, so triple thanks (for the laugh, the vocabulary of “humor punishment”, and this articulation). I'm also glad this isn’t Slashdot or Reddit, but sometimes it’s nice to get a reminder that it can also be pleasant here in its own right.


As someone who does not work with Javascript, I'm shocked that any language would have sets and not support these three operations.


C++ sets didn't even support 'contains' in the past


C++ 'sets' are ordered mutable binary search trees in most implementations.

Not the tries and hashtables of other languages.

With strict weak ordering, you really only have the ordering of two elements.

Ada took care of that polysemy between ordered and hash sets a long time ago.

But ya, it took them too long to implement 'contains', but JavaScript requiring ordered sets while allowing multiple implementation details complicated it IMHO. Had they chosen a structure vs a time complexity it would have been easier to extend.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Thankfully unordered_set is also available


Yes, that was one of the big reasons to move to C++11 where I was at the time.


That's just objectively untrue. C++ sets have always supported membership testing.

    bool set_contains_x = set.find(x) != set.end();


I know that, but that's a rather elaborate way of doing the main thing sets do, and you have to repeat the name of the set twice. Fortunately C++20 added set.contains(x). Of course set.count(x) also already existed but some would prefer set.find(x) != set.end() anyway for <reasons>, and it's the fault of the language's design to cause such opinions in the first place imho.


I hate to be pedantic, but since you started it:

That is not a contains() function. That is a search followed by a test for not not-found.


That's a valid distinction for a vector or a list. For a set, contains and search are one and the same.


No, they are not the same thing. They are logically equivalent, which is not the same as being the same thing.

1+3 is logically equivalent to 4, but they are not the same. As just one example, the former contains an operation.


Yeah, it seems like something you can code up in an afternoon.


I would like tuples in js, first class, not the crappy [i,j] which could be the head of an array with no sensible equality definition. Pretty please.


You mean like this stage 2 proposal? https://github.com/tc39/proposal-record-tuple


checks the date

Is it any use to add them now? Are there people who use base JavaScript with absolutely no library?


Libraries have to use bare JS at some point. It is a good idea to have such functions implemented natively, both for performance and consistency.


Yes?

And that's the nice thing about JS: because it keeps improving, many things that used to require a third party library no longer do.


Well I would certainly like to.


I use jquery from time to time, but most of the time I use bare JS



