I've greatly enjoyed this series. I feel like these topics are quite hard to find in learning materials (how to use a profiler, and how to tell what is useful in a profiler's results).
I really hope after this series you consider consolidating the posts and making a book out of it, I would definitely buy it as I'm sure others would as well.
Thanks for the kind words! I felt the same way and I'm hoping to fill that gap with this series. I've always been someone who loved to peek behind the curtain to appreciate how things are done behind the scenes.
> I really hope after this series you consider consolidating the posts and making a book out of it, I would definitely buy it as I'm sure others would as well.
I'm really happy to hear this! With the blog posts I wanted to test the waters to see if there is enough interest in such a topic. It helps me practice my writing and storytelling too. I'd really love to write a proper book one day!
ESLint is so slow. I think it takes longer to lint our codebase than it does to run our hundreds and hundreds of tests over it. So beyond the technical deep-dive, it's really exciting that attention is being given to performance.
I found a big improvement in ESLint by disabling specific rules that I don't need. The worst offender was the whitespace/indentation check. Turning that off made a huge difference in overall build time.
That's a great tip! I noticed a similar speedup when removing other formatting-related plugins when I did the research for this article. Originally, I had planned to look into that too, but the article was already getting pretty long.
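For anyone who wants to try the same thing, here is a minimal sketch of what that looks like, assuming an `.eslintrc.js`-style config and that the check in question is ESLint's core `indent` rule (your setup may use a plugin's rule instead, in which case the rule name will differ):

```javascript
// .eslintrc.js (hypothetical minimal config; adapt it to your own setup)
module.exports = {
  rules: {
    // Core ESLint indentation rule, switched off here on the assumption
    // that a formatter (or nothing at all) handles whitespace instead.
    indent: "off",
  },
};
```

To find out which rules are worth turning off in the first place, running ESLint with the `TIMING=1` environment variable set (for example `TIMING=1 npx eslint .`) prints a table of the slowest rules.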
> we can confirm that this function is only called with strings with characters in the ASCII range. That makes it a little easier to rewrite getPath to only ever do one iteration without any allocations at all
Wait, so in this part the author implies that you can only safely index into a JavaScript string one character at a time if it's ASCII, not Unicode?
That's true in languages like C, and even Rust, where an index is an exact byte offset. But I'm 95% sure JavaScript smooths over this and will treat each index as one Unicode character (with an exception for multi-character sequences like those built from combining diacritics). Did the author get this wrong?
Kind of, sort of, not really. What they imply (by using the term "ASCII" here) is not correct, and I'm not sure how the assurance that the string does not contain astral characters helps them split a string by the `.` character. But JavaScript doesn't exactly "smooth over this" in a very useful way, either.
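To make the `.` point concrete, here is a rough sketch. The `splitOnDots` helper is made up for illustration and is not the article's `getPath`. It scans the string one sixteen-bit unit at a time, and that stays correct even when the string contains an astral character such as U+1F600, because neither half of such a character can ever be equal to the unit for `.`:

```javascript
// Hypothetical helper for illustration only; not the article's getPath.
function splitOnDots(str) {
  const parts = [];
  let start = 0;
  for (let i = 0; i < str.length; i++) {
    // 46 (0x2E) is the UTF-16 unit for "."; the two halves of an astral
    // character always fall in 0xD800-0xDFFF, so they can never match it.
    if (str.charCodeAt(i) === 46) {
      parts.push(str.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(str.slice(start));
  return parts;
}

// "\u{1F600}" is an emoji outside the basic plane.
const dotParts = splitOnDots("foo.\u{1F600}bar.baz");
console.log(dotParts.length); // 3
console.log(dotParts[1] === "\u{1F600}bar"); // true
```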
For legacy reasons, JavaScript's "character unit", the basic component of a string, is a "UTF-16 character", that is, sixteen bits that are interpreted as being UTF-16-encoded. That said, sixteen bits are not enough to represent all valid Unicode characters in the UTF-16 encoding. Instead, characters in the supplemental planes are represented in UTF-16 using two sixteen-bit "non-characters", which do not individually map to any Unicode codepoint in any plane, but in combination reference a Unicode codepoint in one of the supplemental planes.
JavaScript's internal representation of strings, as well as the APIs it exposes for dealing with strings, such as index accessing and string length, treat each of the sixteen-bit "halves" of the UTF-16 representation of a supplemental plane codepoint as individual characters.
This means that, when you index a string, you might get a UTF-16 character that represents a Unicode codepoint in the basic plane, or a UTF-16 "non-character" that, along with its other half, would represent a Unicode codepoint in one of the supplemental planes.
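A quick way to see this for yourself (a small sketch; the variable name `sample` and the choice of U+1F600 as the supplemental-plane character are just for illustration):

```javascript
// "a", then U+1F600 (an emoji outside the basic plane), then "b".
const sample = "a\u{1F600}b";

console.log(sample.length);                     // 4, not 3: the emoji takes two 16-bit units
console.log(sample[1] === "\uD83D");            // true: the first half
console.log(sample[2] === "\uDE00");            // true: the second half
console.log(sample.charCodeAt(1).toString(16)); // "d83d"

// Neither half is a usable character on its own, but together
// they form the original emoji again.
console.log(sample[1] + sample[2] === "\u{1F600}"); // true
```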
That's great feedback! After reading your comment and re-reading the section in the article it does indeed sound wrong. Decided to remove that paragraph. Your explanation of the string representation is really good. Thanks for sharing!
I'm glad it helped! Now that I'm actually looking at the different ways to manipulate strings in JavaScript and not going from memory, the traditional JavaScript "except when it doesn't" caveat applies.
It seems like _some_ string operations treat each surrogate (that's the fancy name for the half-characters) as its own character, while others (correctly) treat the surrogate pair as a single character.
This might explain how ensuring that the function name does not contain astral characters would make it easier to use different string functions together without accidentally introducing bugs.
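Here is a short sketch of the split I mean, using a single astral character (U+1F600, picked arbitrarily) and only built-in string operations:

```javascript
const emoji = "\u{1F600}"; // one astral character

// These work on 16-bit units, so they see two "characters":
console.log(emoji.length);              // 2
console.log(emoji.split("").length);    // 2
console.log(emoji.charAt(0) === emoji); // false: charAt returns only the first surrogate

// These work on whole code points, so they see one character:
console.log([...emoji].length);                 // 1
console.log(Array.from(emoji).length);          // 1
console.log(emoji.codePointAt(0).toString(16)); // "1f600"
```

Mixing the two groups on the same string is where the off-by-one style bugs would creep in, for example using `length` to bound a loop that consumes the string with `for...of`.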
> I really hope after this series you consider consolidating the posts and making a book out of it, I would definitely buy it as I'm sure others would as well.
This content is great.