A lighter V8 (v8.dev)
471 points by tosh 88 days ago | 188 comments



One of the interesting aspects of benchmarks is that they are usually designed and intended to run in isolation. E.g. if you benchmark a database system, you expect that to be the sole system running on your server machine, in control of all resources.

That's not true of software running on desktop systems or mobile phones - desktops usually run many concurrent tasks, so do phones to some degree, and then there's also the question of battery use.

That can create skewed incentives if the benchmark isn't carefully designed. E.g. you can usually make space/time tradeoffs regarding performance, so if your benchmark is solely measuring CPU time, it pays off to gobble up all possible RAM for even minor benefits. If your benchmark is only measuring wallclock time, it pays off to gobble up all the CPUs, even if the actual speedup from that is minor.

This can lead to software "winning" the benchmark with improvements that are actually detrimental to performance on end users' systems.
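
As a contrived Python sketch of that space/time tradeoff (my own illustration, nothing from a real benchmark suite): an unbounded cache "wins" a wallclock benchmark while quietly gobbling RAM.

  import functools, time

  @functools.lru_cache(maxsize=None)   # no eviction: every distinct input stays resident
  def render(page_id):
      time.sleep(0.01)                 # stand-in for real work
      return "x" * 100_000             # ~100 KB retained per distinct input

  start = time.perf_counter()
  for i in range(100):
      render(i % 10)                   # the benchmark only measures elapsed time...
  print(time.perf_counter() - start)   # ...so the unbounded cache looks like a pure win

The memory cost never shows up in the score unless the benchmark measures it.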


Aside: Chromium includes regression tests for battery usage.

I saw a fix get reverted because the code change caused a 5% step increase in battery usage.

They have some very good testing infrastructure.


How do they test this?


this commit[0] might be what's referenced, and this[1] looks to be the test that Google runs (not sure if this is run in CI, or only run ad hoc when performance tests[2] show issues).

0: https://github.com/chromium/chromium/commit/208f274e4bcb5174...

1: https://chromium.googlesource.com/chromiumos/third_party/aut...

2: https://chromeperf.appspot.com/


Never heard of [1] before. Impressive stuff.


Here is an example bug that had an unobvious cause (they just backed the diff out in the end!):

https://bugs.chromium.org/p/chromium/issues/detail?id=520952

  Bisect job status: Completed
  Bisect job ran on: android_nexus9_perf_bisect

  ===== BISECT JOB RESULTS =====
  Status: Positive: Reproduced a change.

  Test Command: tools/perf/run_benchmark -v --browser=android-chromium --output-format=buildbot --also-run-disabled-tests power.android_acceptance
  Test Metric: energy_consumption_mwh/energy_consumption_mwh
  Relative Change: 150.95% (+/-1.88%)
  Estimated Confidence: 99.90%
  Retested CL with revert: No

Pretty graphs now deleted.


Is this responding to something in the post? They mention sampling across different pages and using "in-the-field telemetry", and the sibling post mentions battery testing, so...


The post mentions that they can drastically reduce memory usage at some CPU time cost in the lite version. V8 is presumably doing the right thing for the end user here, given the in-the-field telemetry etc. It's an instance of carefully weighing different resource consumptions though, where simple benchmarks might drive you to prioritize CPU time at the cost of overall system responsiveness due to memory consumption.


So, unit testing instead of end-user testing at a system level.


When I saw the title of this article, I got really excited because I thought they were referring to a lighter "build".

IMHO, one of the biggest problems facing v8 right now is the build process. You need to download something like 30 gigs of artifacts, and building on Windows is difficult, to say the least.

It's bad enough that the trusted postgres extension plv8 is considering changing its name to pljs and switching engines to something like QuickJS. [0]

One of the driving factors is that building and distributing v8 as a shared lib as part of a distro is incredibly difficult, and increasing numbers of distros are dropping it. This has downstream effects for anyone (like plv8) linking to it.[1]

Also, embedding it is super complex. Referenced in the above conversation is a discussion of how NodeJS had to create their own build process for v8. At this point, it's easier to use the NodeJS build process and the Node v8 API than it is to use v8 directly.

At the beginning of the article, they are talking about building a "v8 light" for embedded application purposes, which was pretty exciting to me, but then they diverged and focused on memory optimization that's useful for all v8. This is great work, no doubt, but as the most popular and well tested JavaScript engine, I'd love to see a focus on ease of building and embedding.

0: https://github.com/plv8/plv8/issues/364

1: https://github.com/plv8/plv8/issues/308#issuecomment-4347400...


I completely agree that the most difficult part of using V8 is the build process. In node we have three layers (!!!) of build tooling stacked together to insulate ourselves from it (gyp, ninja, and some extra python scripts), and it still requires constant effort to keep working. Deno just gave up and uses GN, but that requires some stupidly complex source layouts and yet more python scripts. Unfortunately, Google just doesn't care about making V8 work with external tooling, it's all about their chromium build process. And this is a real shame, because V8 has the best embedding API of any js engine I know of, it really is a joy to use.


Meanwhile integrating Lua is as simple as just dropping in a few c files...


As is duktape, if for some reason you need it to be JavaScript.


I thought the word "duktape" was typo+snark. It's not. 100% legit, thanks!

https://github.com/svaarala/duktape


Sorry, I probably should have linked to it.

The embedded API is really easy to make use of, and I'd say it's reached production-level stability now.

I've only had to use it on tiny hardware though, so your experience may differ.


duktape has been great for stealing the occasional library to another language when I couldn’t be bothered to properly port it


Lua is awful. Even JavaScript is better.


After getting frustrated with v8's build process, I tried dropping all the sources (+ pregen files I had generated on osx) into a new visual studio project...

Compiled with no problems at all!


> At the beginning of the article, they are talking about building a "v8 light" for embedded application purposes, which was pretty exciting to me, but then they diverged and focused on memory optimization that's useful for all v8. This is great work, no doubt, but as the most popular and well tested JavaScript engine, I'd love to see a focus on ease of building and embedding.

Would you ever consider an engine designed for that, like XS?[1] Or is it V8-or-nothing as far as you're concerned?

[1] https://news.ycombinator.com/item?id=16883567


I don't remember needing to retrieve or build all of chromium in order to work on V8, but maybe I never needed something that you needed.


The most recent version of chrome has made having a breakpoint a painful affair.

* The browser becomes completely unresponsive for 3-5 seconds each time you hit a breakpoint.

* It also takes Chrome much longer to show you the source maps for a page.

* If you refresh while on a breakpoint it will remain unresponsive for nearly 10 seconds.

Is this related to these new changes? Is there a way to revert the trade off?


> * If you refresh while on a breakpoint it will remain unresponsive for nearly 10 seconds.

Glad to hear I'm not alone in experiencing this. For a period, I thought I had written bad code that caused Chrome to stumble. Guess it's just Chrome.


Can you please file a bug report on crbug.com?


How are Chrome devs themselves not experiencing this? I've experienced it too...


It's easy to imagine a configuration or use case that's common among the people reporting a bug but isn't exercised by the developers or their QA process. It's not uncommon.


It seems like such a common action, I'm surprised a non-Chromium developer would have to report it. I considered reporting it, but too much friction. I need to use a Google Account to report a bug? Why? Eh.


Thanks for sharing this. Same problem, I just assumed it was just my shit tier laptop!


For anyone who's affected by this, I just filed a bug via crbug.com, fingers crossed for it to be fixed ASAP.

In the meantime, one workaround is not to hit refresh; instead hit F8 and let it exit the breakpoint normally.


I think Firefox went through something similar many years ago: a project called MemShrink started after many users complained about Firefox getting slower and more bloated, around Firefox 3-4 if I remember correctly.

It was memory optimisation in every part of Firefox, and mostly in SpiderMonkey.


I think Firefox needs another MemShrink initiative. After Electrolysis / e10s - at least in my experience - the browser uses more memory over time than Chrome.


And the "Fission" project (i.e. running each browsing origin in its own process) won't help in that regard.

To be fair, they are working intensively on reducing the overhead of having loads of separate content processes, but the target goal I have heard about is still "only" < 10 MB overhead per content process. That still translates into 1 GB (!) of additional RAM usage for the benchmark browsing session with 100 separate origins - which is perhaps slightly more than the average user uses, but not that unlikely for power users to hit. And it's not just tabs that need to be counted: every iframe that loads some third-party content needs to run in a separate process, too.


I tolerate (apparently) a 6G memory leak with 50+ tabs and Tree Style Tabs. It was much faster when Mozilla accidentally remotely disabled extensions for a weekend.


How does current SpiderMonkey heap usage compare vs V8?


Great engineering stuff. I am consistently amazed by the work of V8 team.

I hope V8 v7.8 makes it into Node v12 before its LTS release this coming October.


Somewhat off topic but this Lite version made me think of something.

Is there any engine/browser mode/something that lets go of legacy JS/CSS/HTML in exchange for better performance/weight/memory consumption?

A browser/JS engine with removed legacy support would in principle be much faster, no?


That's literally how Flutter started. "What if we don't have to be backwards compatible? Let's delete the HTML parsing quirks... and expensive CSS selectors... and... all of the DOM... and JavaScript... and why do we have markup even..."


How would you say that has worked out?


They've made a lot of progress, check out https://flutter.dev/

They're expanding from mobile (iOS and Android), working on Flutter for desktop and web.

They moved to the Dart language early on in their experiment.

Dart has an online playground, https://dartpad.dev, which has an experimental Flutter mode.

Found this flutter hello world! example, don't know how long the link will work. https://dartpad.dev/experimental/embed-new-flutter.html?id=2...


Sciter is maybe close:

https://sciter.com/

It allows you to make desktop apps with HTML/CSS, but it has its own custom engine that doesn't support all the quirks the web does.


This looks interesting but the code samples look like they include non-standard features (if I'm not mistaken).

Is this compatible out of the box with (most) modern web apps? Because that's the real value of Electron.


> Is this compatible out of the box with (most) modern web apps? Because that's the real value of Electron.

No it's not. It uses its own JS-like scripting language, and custom CSS 3 support.

A couple of months ago the author suggested on Reddit he might start working on an Electron alternative that used Node + Sciter.

https://www.reddit.com/r/programming/comments/a8vkzm/scitern...


Interesting. What's the name of the engine, and is it open source?


5MB!

Anyone know how the performance is?


  > Script
  >        (tbd)


It has scripting available in a JavaScript-like language. I'm not able to tell yet how mature it is as I've only done a brief test.


What is “legacy” JS? with()/eval()?

What is “legacy” CSS? Box model?

What is “legacy” HTML? <br>?


HTML has something of a legacy mode, https://en.m.wikipedia.org/wiki/Quirks_mode

Roughly anything before HTML5/4.01 and XHTML renders in quirks mode.

AFAIK there are no legacy CSS properties. Only nonstandard (i.e. prefixed or experimental and never published in finished specs) properties have been deprecated.

Browsers already have some leeway with DOM APIs. https://developers.google.com/web/updates/2017/01/scrolling-...


Quirks mode can largely be described with a quirks-mode stylesheet that applies several custom styles to all elements unless otherwise overridden.


Remove javascript, instant "lite" version. You can even disable it with a CSP header.


What do you consider legacy JS/CSS/HTML? Do you mean HTML features like contentEditable? Do you mean JS eval()?


Off the top of my head: CSS floats, contentEditable, execCommand.


Document.write()


Don't you get the same effective perf benefit by marking your script async defer?


How are you going to define legacy? I'm not sure how you could whittle down the web platform in an agreed way.


It is rather interesting to me to see that V8 has accumulated so many micro-optimizations and special techniques over the years that it has now become feasible to start cutting back on the optimizations to get performance gains.

What all this enables, besides more sensible defaults, is the ability for developers who use V8 with Electron or NW.js to tweak the default behavior of the engine, catering to their application's needs. That is always good.


That's not what I took away from this at all. They specifically say that Lite mode started out as a way to reduce memory consumption at the cost of performance. Execution time jumped 120%!

Then they figured out how to get the memory improvements without the performance hit. The only place where they actually removed optimizations was in generating stack traces, and that wasn't a gain in performance, it was just considered acceptable for that to get slower.


Someone needs to add some heavy complexity to ACID4 so there's a solid baseline for tests.

They should have ACID sub-tests for images, videos, etc so these statistics would actually provide something long term and important.


Is there any reason there couldn't be a python interpreter as awesome as V8 is for JavaScript?


I did some work on a Python JIT in the past. The two biggest challenges were:

- Python is much, much more dynamic than Javascript. You can override just about anything in Python, including the meaning of accessing a property. You have overloaded operators (with pretty complex resolution rules), metaclasses, and more. And they're all used extensively. There are some Javascript equivalents to those things, but either there are fewer deoptimization cases or they're features that aren't commonly used in practice (e.g. Proxy objects); see the sketch after this list.

- Python has a ton of important libraries implemented as C extensions. These libraries tend to depend on undefined behavior of the CPython interpreter (e.g. destruction order which is more deterministic with ref counting) or do things that happen to work but are clearly not supposed to be done (e.g. defining a full Python object as a static variable).
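
To make the first point concrete, here's a small illustration of my own (not from the JIT work mentioned above) of the kind of runtime redefinition a Python JIT has to guard against:

  class Num:
      def __init__(self, v):
          self.v = v
      def __add__(self, other):            # overloaded operator
          return Num(self.v + other.v)

  a, b = Num(1), Num(2)
  print((a + b).v)                         # 3

  # The meaning of '+' can change at runtime, invalidating specialized code:
  Num.__add__ = lambda self, other: Num(self.v * other.v)
  print((a + b).v)                         # 2 - compiled code needs a guard here

  # Even attribute access itself is overridable:
  class Flaky:
      def __getattr__(self, name):         # called when normal lookup fails
          return "computed:" + name

  print(Flaky().x)                         # computed:x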

I guess there are also economic incentives: there hasn't been a reason for anybody to staff a 50-person project to build a Python JIT, given that it's cheaper to rewrite some or all of the application in C/C++/Rust/Go, whereas that's not an option in Javascriptland.


I completely agree with you and I'd argue that #2 and #3 are the two biggest reasons.

It is easy to forget the colossal amount of engineering resources that browser vendors have spent creating and rewriting their Javascript engines. And due to the nature of how their JITs work, all that work is tied down to the specific Javascript environment they were written for. (For example, you can't really reuse the v8 codebase to create a Python JIT)

And in Python's case a lot of the appeal of the language rests on the extensive library ecosystem, which has a significant number of extensions written in C. Generally speaking JIT compilers aren't very good at optimizing code that spends a lot of time inside or interacting with C extensions, even if we ignore the significant issues you mentioned regarding undefined behavior.


Why is JavaScript slower than Java, despite both being garbage collected and V8 having more engineers than e.g. OpenJDK?


A couple of obvious reasons would be compiled vs interpreted, and static vs dynamic, both of which carry some inherent runtime performance cost.


Compiled vs interpreted is not a useful distinction. Until fairly recently, when Ignition was introduced, V8 had no interpreter; all the JS was compiled by the first tier, “full codegen”.

The bigger difference is that the JVM is heavily optimized for performance after a long warmup and V8 needs to produce relatively fast code early during page loading.

Java being much more static certainly helps warmup time but ultimately doesn’t really affect final performance. LuaJIT can beat C in some cases once it has time to compile all traces needed.


> V8 needs to produce relatively fast code early during page loading

So for e.g. backend JS programs, is it possible to ask V8 to take more time to optimize?


Why would you want it to take longer to optimize?


To give it more time to generate better code, and thus a faster program (but a slower "launch time").


Most VMs will recompile long-running hot code; I’d assume V8 does this as well.


Modern JS engines only interpret cold functions, and static vs dynamic doesn't matter once type information has been collected.

The real reason is that HotSpot has 10+ more years of work put into it than V8.


Static vs dynamic still matters if the program is doing "dynamic stuff", where "dynamic stuff" means any thing that the JIT compiler is currently not able to optimize.


That's fair, but that's also stuff that you just can't really do in Java in general¹, so it's not useful for a comparison. The fact is that the vast majority of JavaScript code is pretty static and there's nothing preventing it from running as fast as Java other than man-decades of compiler engineering.

¹ Possibly excluding reflection. It's been a long time since I used the Java reflection APIs and I have no idea if you can do things like add class fields named after arbitrary strings at runtime. Even if you could, presumably this bails out of jitted code so the situation is basically the same as in JS.


Dynamic types are probably the first-order reason, with some unfortunate language choices being the second-order reason.


Dynamic typing means you pay the cost of trace recording or profiling to collect the type info. The actual code performance should ultimately be the same. JIT can remove the overhead of dynamic dispatch and replace it with a fixed call and a guard, for example. This isn’t possible with dynamically loading C libraries.


> JIT can remove the overhead of dynamic dispatch and replace it with a fixed call and a guard, for example.

Only when the guard isn’t triggered constantly. With an actual type system you can remove many of these guards altogether instead of having them everywhere and falling back to the slow case when you get something unexpected.


There are actually two problems here: How to handle things being re-defined and how to handle an unexpected type after speculation.

In real high performance VMs the guard for redefinition is effectively a single instruction which is a CPU can easily branch predict and handle with out of order execution: https://chrisseaton.com/truffleruby/low-overhead-polling/

With unexpected types we can use LuaJIT as an example: The type speculation guard will be turned into a conditional branch to a side trace. The slow path quickly becomes another fast path.
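
As a rough Python sketch of that idea (my own illustration; real engines do this in generated machine code): the fast path speculates on a type, the guard is one cheap check, and a failed guard branches to a slower generic path - the side exit.

  def generic_add(a, b):
      # Stand-in for the fully generic slow path (dynamic dispatch, coercion, ...).
      return a + b

  side_exits = 0

  def specialized_add(a, b):
      # The 'compiled' fast path, speculating that both operands are ints.
      global side_exits
      if type(a) is int and type(b) is int:   # the speculation guard
          return a + b                         # specialized fast path
      side_exits += 1                          # guard failed: take the side exit
      return generic_add(a, b)                 # in LuaJIT this would grow a side trace

  print(specialized_add(1, 2))        # fast path
  print(specialized_add("a", "b"))    # side exit taken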


> For example, you can't really reuse the v8 codebase to create a Python JIT

Actually, come to think of it... V8 also runs WASM right? Right now I think WASM is missing a few features (like garbage collection) which Python would need to be efficiently compiled to WASM, but once those are solved...


At a conceptual level that would be no different than having a Python implementation targeting the JVM or CLR runtimes (aka Jython and IronPython).

The usual situation for these alternate implementations is that they make it easier to interact with other code that targets those runtimes, but that they do not speed up the average speed of the interpreter. The previously-mentioned compatibility and performance issues for C extensions also remain.


There was brief point in time where IronPython was faster than CPython for several major benchmarks, precisely because of the better CLR JIT and GC, plus some smart tricks implemented in the "DLR". (It's almost sad, IronPython and the "DLR" have been left to grow so many weeds since that time.)


I definitely should have been clearer in my other comment. Alternative interpreters such as IronPython can be faster but most of the time the speed stays in the same "order of magnitude" as the original C interpreter. On the other hand, good JIT can deliver a 10x or more speedup in the best case scenarios where it manages to get rid of the dynamic typing overhead. (For subtle technical reasons, running the language interpreter on top a jitting VM like the CLR is not enough. The underlying JIT has a hard time looking further than the IronPython interpreter itself and making optimizations at the "Python level")


That's where some of the most "DLR" magic filled in (cached) and optimized a lot of the "Python level" in a way that the CLR JIT could take advantage. The DLR briefly was a huge bundle of hope for some really interesting business logic caching. In a past life I did some really wild stuff with DLR caching for a complicated business workflow tool. It's dark matter in an enterprise that I'm sure all of it is still running, but I'm not sure if the performance has kept up over time (and have no way to ask, and probably don't care) as the CLR declared "mission complete" on the DLR and maybe hasn't kept it quite as optimized since the IronPython heyday.


In Ruby land, JRuby is significantly faster than CRuby since 9.0. It has a proper IR for optimization and can inline etc.


Couldn't you compile the C extensions to WASM too? Then it'd just be WASM code interacting with WASM code.


I don't think it's possible to efficiently compile a dynamically-typed language like Python to statically-typed WASM.


Thought experiment: what about transpiling Python to JS? https://github.com/QQuick/Transcrypt looks like a nice implementation, but their readme just talks about deployment to browsers — I'm curious about outside of the browser whether Transcrypt + Node might be more efficient than CPython.

(Not even just CPython, but really any dynamic language implementation.)

And then of course wasm could still be used for C extensions.


Not arbitrary code, but code specifically written with type checking should be doable. There was the asm.js subset, for example.


Calling asm.js a subset of Javascript is a bit of a stretch. It looks nothing like idiomatic Javascript, and was more of a low-level statically-typed language dressed up in Javascript clothing.

At some point they realized that representing this low-level code as Javascript text instead of as a specially-designed bytecode added a significant amount of parsing and compilation overhead, which was one of the initial motivations for the creation of WebAssembly. If I had to sum up WASM in one sentence, it is that it is kind of like the JVM, except that its instruction set was designed specifically for running programs downloaded from the web. Special attention was paid to security and startup latency.


Erm... the JVM was also designed specifically for running programs downloaded from the web. Special attention was paid to security and startup latency. Javascript got its name because it was the only other language designed to be downloaded and executed in a browser!


rpython might be a better example


You’d have to compile the CPython interpreter to WASM.


> And they're all used extensively.

That's the key. MicroPython is significantly less dynamic than full Python, and would be much easier (but still not easy) to write a fast JIT for. Unfortunately such a JIT wouldn't be very useful - MicroPython won't run much code that hasn't been written specifically for it. Without the dynamic features, MicroPython is essentially a different language from regular Python.


> I guess also economic incentives, there hasn't been an incentive for anybody to staff a 50 person project to build a Python JIT given that it's cheaper to rewrite some or all of the application in C/C++/Rust/Go whereas that's not an option in Javascriptland.

Ask Dropbox https://github.com/dropbox/pyston


Right, let's ask Dropbox:

https://blog.pyston.org/2017/01/31/pyston-0-6-1-released-and...

(And after that blog post, there are no commits in the github repo you linked to).


Yeah, I was excited when they announced this effort. Sad they 86ed it.


I think you have the first three points — dynamism, C extensions, and economics — right on the money. The fourth point I would add is that Python has a huge standard library. That's a very large surface area, all of which ends up needing optimization effort to get good performance across a wide variety of programs.


> You can override just about anything in Python, including the meaning of accessing a property

JavaScript has getters that do the same [0]. You can even redefine 'undefined' depending on version and mode [1].

Is operator overloading really that much 'worse' in Python?

[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

[1] https://stackoverflow.com/a/8783528/


> You can override just about anything in Python, including the meaning of accessing a property.

Can't you do this in JS as well?


As far as I know, JS does not have the equivalent of Python's descriptor protocol[0]. This is notably different from setter/getter methods.

[0]: https://docs.python.org/3/howto/descriptor.html


Besides the existence of __getattr__, __add__, etc. that other people mentioned, there's also:

- A Python runtime has to support threads + shared memory, while a JS one doesn't. JS programs are single-threaded (w/ workers). So in this sense writing a fast Python interpreter is harder.

- The Python/C API heavily constrains what a Python interpreter can do. There are several orders of magnitude more programs that use it than use v8's C++ API. For example, reference counts are exposed with Py_INCREF/DECREF. That means it's much harder to use a different reclamation scheme like tracing garbage collection. There are thousands of methods in the API that expose all sorts of implementation details about CPython.

Of course PyPy doesn't support all of the API, but that's a major reason why it isn't as widely adopted as CPython.

- Python has multiple inheritance; JS doesn't

- In Python you can inherit from builtin types like list and dict (as of Python 2.2). In JS you can't.

- Python's dynamic type system is richer. typeof(x) in JS gives you a string. type(x) in Python gives you a type object which you can do more with. And common programs/frameworks make use of this introspection.

- Python has generators, Python 2 coroutines (send, yield from), and Python 3 coroutines (async/await).

In summary, it's a significantly bigger language with a bigger API surface area, and that makes it hard to implement and hard to optimize. As I learn more about CPython internals, I realize what an amazing project PyPy is. They are really fighting an uphill battle.
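
For illustration, a couple of those points are directly observable from pure Python (a quick sketch, nowhere near exhaustive):

  import sys

  x = []
  print(sys.getrefcount(x))    # CPython's refcounting leaks into the language itself

  class MyList(list):          # inheriting from a builtin type
      pass

  print(type(MyList()))        # type() returns a full type object...
  print(MyList.__mro__)        # ...with rich introspection: (MyList, list, object)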


> Python has multiple inheritance; JS doesn't

All these specific examples are true statements, yet isn't Common Lisp even more dynamic, and doesn't it often have even better optimizing compilers?

Lisp has multiple inheritance and also multiple dispatch, and SBCL beats CPython by a country mile in every performance comparison I’ve seen.


> SBCL beats CPython

I always find it ironic that the CMUCL lisp compiler (upon which SBCL was based) was called 'the Python compiler', had machine-code generation in 1992, and that CMUCL sports native multithreading that is largely lock free.

https://www.researchgate.net/publication/221252239_Python_co...


>SBCL beats CPython by a country mile in every performance comparison I’ve seen.

This. Lisp is at least 10x faster in the worst case; it can be made to run even faster...


Hm that's a good question, not sure. Do Common Lisp implementations have a "core" that multiple inheritance and multiple dispatch can be desugared to? Or are those features "axiomatic" in the language?

If it's the former, I would say that optimizing a small core is easier than optimizing a big language. Python's core is 200-400K lines of C and there are a lot of nontrivial corners to get right.

I was surprised when looking at Racket's implementation that it's written much like CPython. IIRC it was more than 200K lines of C code. Some of that was libraries but it's still quite big IMO. I would have thought that Racket, as a Scheme dialect, would have a smaller core.

AFAIK Racket is not significantly faster than Python; it's probably slower in many areas. Maybe it's just that SBCL put a focus on performance from the beginning?

(I looked at Racket since I heard they are moving to Chez Scheme, which also has a focus on performance.)


I expect Racket to be faster than CPython:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

Note that the old C runtime of Racket has been rewritten to use Chez Scheme. The work is not completely done - but it is getting close.

Talk by Matthew Flatt on the rewrite (almost a year old): https://www.youtube.com/watch?v=t09AJUK6IiM


>Do Common Lisp implementations have a "core" that multiple inheritance and multiple dispatch can be desugared to?

Yes, that's the Meta Object Protocol. It isn't in the standard, yet most Lisp implementations have it, and now you can use it in a portable way as well.

>If it's the former, I would say that optimizing a small core is easier than optimizing a big language. Python's core is 200-400K lines of C

Common Lisp's "core" (that means, not including "batteries") is considerably more involved and complex than Python's. Creating a new CL implementation is a big deal.


> (I looked at Racket since I heard they are moving to Chez Scheme, which also has a focus on performance.)

to wit, Chez Scheme was several generations into improving its native code compilation abilities before Python even existed:

history of chez scheme (2006):

https://www.cs.indiana.edu/~dyb/pubs/hocs.pdf


A while ago someone made this Reddit post:

https://www.reddit.com/r/learnlisp/comments/bskmcg/speed_com...

The user was comparing some Ackermann computations using Python and GNU Common Lisp (GCL), finding the performance about the same.

But he wasn't compiling the Lisp! So this compared GCL's Lisp raw AST interpreter to Python byte-code.


Any future "super speed" Python efforts would probably do well to build on the amazing work that PyPy has done in teasing apart an optimization-friendly subset of the language in the form of RPython, and building the rest of it in that language.

Like, focus on further optimizing the RPython runtime rather than starting from scratch.


That doesn't really make sense -- there is no "RPython runtime". There is a PyPy runtime written in RPython.

RPython isn't something that's exposed to PyPy users. It's meant for writing interpreters that are then "meta-traced". It's not for writing applications.

It's also not a very well-defined language AFAIK. It used to change a lot and only existed within PyPy.

I'm pretty sure the PyPy developers said that RPython is a fairly unpleasant language to write programs in. It's meant to be meta-traceable and fast, not convenient. It's verbose, like writing C with Python syntax.


Why does RPython exist? It seems to be a subset of Python that can be optimized.

Why not use a faster language to write the interpreter, like C?

This is not meant to be a hostile question, I am just confused as to why PyPy exists


The PyPy interpreter is written in RPython, but is a full Python interpreter with a JIT. When you compile PyPy, it generates C files from RPython sources, which are then compiled with a normal C compiler into a standalone binary.

RPython is both a language (a very ill-defined subset of Python... pretty much defined as "the subset of Python accepted by the RPython compiler"), and a tool chain for building interpreters. One benefit of writing an interpreter in RPython is that, with a few hints about the interpreter loop, it can automatically generate a JIT.
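
For a flavor of what that looks like, here is roughly the canonical toy example from the RPython docs (paraphrased from memory, so treat the details as approximate); it only compiles under the RPython toolchain, not plain CPython:

  from rpython.rlib.jit import JitDriver

  # greens: values that identify a position in the *interpreted* program;
  # reds: the mutable interpreter state at that position.
  jitdriver = JitDriver(greens=['pc', 'program'], reds=['acc'])

  def interpret(program):
      pc = 0
      acc = 0
      while pc < len(program):
          # The hint from which the toolchain derives a tracing JIT
          # for the interpreted language:
          jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
          op = program[pc]
          if op == '+':
              acc += 1
          elif op == '-':
              acc -= 1
          pc += 1
      return acc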


Basically because it can be "meta-traced", and C can't (at least not easily).

The whole point of the PyPy project is to write a more "abstract" Python interpreter in Python.

VMs written in C force you to commit to a lot of implementation details, while PyPy is more abstract and flexible. There's another layer of indirection between the interpreter source and the actual interpreter/JIT compiler you run.

See “PyPy's approach to virtual machine construction”:

https://scholar.google.com/scholar?cluster=36453268015981472...

This sentence explains it best:

Building implementations of general programming languages, in particular highly dynamic ones, using a classic direct coding approach, is typically a long-winded effort and produces a result that is tailored to a specific platform and where architectural decisions (e.g. about GC) are spread across the code in a pervasive and invasive way.

Normal Python and PyPy users should probably pretend that RPython doesn't exist. It's an implementation detail of PyPy. (It has been used by other experimental VMs, but it's not super popular.)


> optimization-friendly subset of the language in the form of RPython, and building the rest of it in that language.

I'm assuming 99% of normal python users are not using anything outside of the RPython subset, correct?


RPython's restrictions [1] are quite strict - I'd say it's more likely that 99% of normal Python uses features that aren't supported by RPython.

[1] https://rpython.readthedocs.io/en/latest/rpython.html


The libraries they depend on probably are, though.


>Python has multiple inheritance; JS doesn't

I don't think multiple inheritance is a performance issue. A class's resolution order is resolved when it's defined (using C3: https://en.wikipedia.org/wiki/C3_linearization), and after that it's only a matter of following it, like Javascript's prototype chain.
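
For example (a quick sketch, runs in any Python 3):

  class A: pass
  class B: pass
  class C(A, B): pass

  # Computed once, via C3 linearization, when C is defined:
  print(C.__mro__)   # (C, A, B, object)

After definition, method lookup just walks that precomputed tuple.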


No, basically everything is dynamic in Python. Both objects and types are mutable after definition:

https://github.com/oilshell/blog-code/blob/master/python-is-...

    m1 Sub
    m2 C
    ---
    Changed type of object:
    m1 C
    m2 C
    ---
    m1 Sub
    m2 C
    ---
    Changed superclass of type:
    m1 Sub
    m2 unrelated


Objects and types are mutable after definition, but that's no more severe than what you can do in Javascript. Assigning to .__class__ is like assigning to .__proto__, and assigning to a class's .__bases__ is more or less like assigning to a prototype's .__proto__.

The resolution order is calculated when it's defined. It's calculated again whenever you assign to __bases__ (or a superclass's __bases__). But it's not calculated every time it's used, which means there's no significant performance penalty to multiple inheritance unless you're changing a class's bases very often.

Metaclasses can override the MRO calculation, which we can abuse to track when it's recalculated: https://pastebin.com/NdiA12Ce

  Defining Baz
  ! Computing MRO
  Instantiating Baz
  Accessing attribute
  Changing Baz's bases
  ! Computing MRO
  Accessing attribute
Doing ordinary things with the class or its instances doesn't trigger any calculation related to multiple inheritance. You only pay for that during definition or redefinition. So there's no performance problem there compared to Javascript.

I do agree that basically everything is dynamic in Python. But some things are more dynamic than others.


Hm yeah I see what you mean. I don't know the details of how v8 deals with __proto__, but I can see in theory they are similar.

Though I think the general point that Python is a very large language does have a lot to do with its speed / optimizability. v8 is just a really huge codebase relative to the size of the language, and doing the same for Python would be a correspondingly larger amount of effort.

I don't know the details but v8 looks like it has several interpreters and compilers within it, and is around 1M lines of non-test code by my count!

v8 was written in the ES3 era. And I knew ES3 pretty well and Python 2.5-2.7 very well, which was contemporary. I'd guess at a minimum Python back then was a 2x bigger language, could be even 4x or more.


I agree, I was just nitpicking one detail. I made a similar comment: https://news.ycombinator.com/item?id=20953496


> reference counts are exposed

IIUC this is also true for PHP but HHVM has/had some interesting techniques to deal with it, like pairing up and cancelling out reference count operations, and bulk changing the reference count before taking a side exit or calling a C function.


you can definitely inherit from Array and Map in javascript.


V8 has the full resources of Google, not to mention Microsoft, Node.js, and every community that uses the V8 engine.

Basically V8 has some of the best engineers in the world being paid to work full time on it, and have resources from dozens of other high profile companies.

Not knocking python in any way - V8 essentially just has more resources available. I highly recommend trying pypy if you're looking for a performance benefit and your code works with it.


Many of the techniques used in modern JS VMs like v8 or JavaScriptCore could totally be applied to a Python interpreter, and it wouldn't take 50 people. Someone just needs to invest the effort. The core techniques are a fast-start interpreter, a templatized baseline JIT, polymorphic inline caches, and runtime type information combined with higher JIT tiers that speculatively optimize for observed types, which allows many checks throughout the generated code to be replaced by side exits. (Also a good garbage collector, and escape analysis to avoid allocating temporaries).

I believe most of these could be applicable to Python. JavaScript has crazy levels of dynamism too, and the above methods are the short version of how you deal with it.

It seems like no one in the Python community has had the knowledge + motivation to try these approaches.
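
To illustrate just one of those techniques, here's a toy Python sketch of a monomorphic inline cache for method lookup (my own illustration - real engines do this per call site in generated machine code, handle the polymorphic case, and invalidate caches when classes change; this also ignores instance-attribute shadowing):

  class CallSiteCache:
      # One cache per call site: remember the last receiver type and the
      # method it resolved to; a cheap type guard validates the cache.
      def __init__(self):
          self.cached_type = None
          self.cached_method = None

      def lookup(self, obj, name):
          if type(obj) is self.cached_type:     # the guard
              return self.cached_method         # fast path: no dict walks
          method = getattr(type(obj), name)     # slow path: full lookup
          self.cached_type, self.cached_method = type(obj), method
          return method

  class Dog:
      def speak(self):
          return "woof"

  site = CallSiteCache()
  d = Dog()
  print(site.lookup(d, 'speak')(d))   # miss: fills the cache
  print(site.lookup(d, 'speak')(d))   # hit: guard passes, cached method reused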


Evan Phoenix (Rubinius) talked about applying the Self techniques of collecting type info using inline caches for CRuby in 2015.

I'm using part of this idea in my prototype tracing JIT for CRuby. With a tracing JIT it's much, much easier to implement basic escape analysis because the control flow is linear. One basically gets Partial Escape Analysis (the big deal from Graal) for free.

So far it's proving unreasonably effective to re-use the method lookup info from the inline caches and use the same invalidation mechanism.

The CRuby 2.6 JIT can't use this approach because the compilation pipeline is too slow to invalidate it with the method cache, but I'm using the CraneLift compiler from Mozilla.

Post Haswell, there's not a ton of value to baseline JIT, especially for dynamic languages. Mike Pall talked at length about how the highly optimized bytecode VM of LuaJIT 2.x was sometimes faster than the baseline-ish method JIT from LuaJIT 1.x, and that was before Haswell.

I think we'll see all the major JS engines remove baseline JIT over the next few years in favour of even more optimized bytecode interpreters.


> I'm using part of this idea in my prototype tracing JIT for CRuby. With a tracing JIT it's much, much easier to implement basic escape analysis because the control flow is linear. One basically gets Partial Escape Analysis (the big deal from Graal) for free.

Maybe Ruby is different. But with JavaScript, tracing JIT turned out to not be a winning strategy. Every engine that tried it eventually moved to a more traditional multi-tiered JIT (with OSR entry/exit).

> Post Haswell, there's not a ton of value to baseline JIT, especially for dynamic languages. Mike Pall talked at length about how the highly optimized bytecode VM of LuaJIT 2.x was sometimes faster than the baseline-ish method JIT from LuaJIT 1.x, and that was before Haswell.

I can't say definitively for other languages, but for the JavaScriptCore implementation of JavaScript, we are very aware of the performance value of all our JIT tiers, and the baseline JIT makes a significant difference. And yes, our interpreter is very optimized. We essentially have a CPU-specific interpreter loop using assembly code generated from a meta-language. Baseline JIT on top of that is still a big perf win (as are our two higher JIT tiers, DFG and FTL/B3).


In an ideal world we would build a multi-tier method JIT for CRuby just like JSC. Unfortunately nobody is willing to invest the resources to do that.

There have been numerous failed attempts to build baseline JITs for CRuby so I'm trying tracing.


Could you expand on exactly what Haswell did to render baseline pointless?


Haswell can branch predict the indirect branch at the end of each bytecode instruction to dispatch the next bytecode instruction much better than previous generations.

In highly dynamic languages like JS, Ruby and Python it’s not even this which is the main source of branching anyway. It’s branching on the types for each opcode to handle all valid types.
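
A toy Python sketch of that structure (my own illustration): each loop iteration ends in one indirect dispatch, which is what newer branch predictors handle so much better, while the type branching lives inside each handler.

  def op_push(vm, arg):
      vm.stack.append(arg)

  def op_add(vm, _):
      b, a = vm.stack.pop(), vm.stack.pop()
      # In a dynamic language the real branching is here:
      # checking operand types for each opcode.
      vm.stack.append(a + b)

  HANDLERS = {'PUSH': op_push, 'ADD': op_add}

  class VM:
      def __init__(self):
          self.stack = []

      def run(self, code):
          for op, arg in code:
              HANDLERS[op](self, arg)   # the indirect dispatch point
          return self.stack[-1]

  print(VM().run([('PUSH', 1), ('PUSH', 2), ('ADD', None)]))  # 3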


There's a bunch of things: Python allows metaprogramming in a way that JS doesn't, which means you end up needing more guards (or conflating more guards); the Python ecosystem fairly heavily relies on CPython extension modules, and if you wish to remain compatible with them you're constrained in some ways, especially if you care about performance of calling into/from them.


And of course money, lots of it. The amount of money invested in optimizing v8 is staggering -- Google brought Lars Bak out of retirement[1] to start v8, and that guy is no joke.

[1] https://www.ft.com/content/03775904-177c-11de-8c9d-0000779fd...


Paywalled article




> Python allows metaprogramming in a way that JS doesn't, which means you end up needing more guards (or conflating more guards)

JS allows you to dynamically modify some of the scopes that names refer to, as well as changing the actual prototype chain itself. I'm not sure you can do such crazy things with Python classes/metaclasses.

Of course, for v8 in particular, doing any of this crazy manipulation tends to set off alarm klaxons that kick your code off every optimization path, but the language still permits it.

> the Python ecosystem fairly heavily relies on CPython extension modules, and if you wish to remain compatible with them you're constrained in some ways, especially if you care about performance of calling into/from them

And for JS, very low overhead of calling into the DOM APIs (written in C++) is a necessary feature for having competitive performance. Arguably more so than in Python, since the overhead of the FFI trampoline itself here is considered a bottleneck.


> dynamically modify some of the scopes that names refer to

You can do some fairly disgusting things to name resolution in class bodies, but names within functions are resolved statically nowadays.

> as well as changing the actual prototype chain itself

You can change a class's MRO, if that's the closest analogue.

  class Foo:
      x = 'foo'
  
  class Bar:
      x = 'bar'
  
  class Baz(Foo):
      pass
  
  print(Baz.x)
  Baz.__bases__ = (Bar,)
  print(Baz.x)
In Python you can also hook your own entire custom import system into importlib, or just arbitrarily change the meaning of the `import` statement by replacing builtins.__import__:

  >>> import builtins
  >>> builtins.__import__ = lambda *a: "Too bad"
  >>> import foo
  >>> foo
  'Too bad'
You can use sys._getframe() to look in the current call stack and poke at variables:

  import sys
  
  def f():
      x = 3
      return g()
  
  def g():
      return sys._getframe().f_back.f_locals['x']
  
  print(f())  # 3
You can make your own class that inherits from types.ModuleType and use it to replace an existing module's class and add interesting new behaviors to its object:

  # foo.py
  import sys, types
  
  class MyMod(types.ModuleType):
      def __call__(self):
          return "Hello!"
  
  sys.modules['foo'].__class__ = MyMod


  >>> import foo
  >>> foo()
  'Hello!'
You can replace sys.ps1 by an object with a __str__ implementation to make a dynamic prompt in the REPL:

  import datetime, sys
  >>> class Prompt: __str__ = lambda self: str(datetime.datetime.now()) + ' >>>'
  >>> sys.ps1 = Prompt()
  2019-09-12 18:33:48.303692 >>>
Python has a lot of exposed detail.


> JS allows you to dynamically modify some of the scopes that names refer to, as well as changing the actual prototype chain itself. I'm not sure you can do such crazy things with Python classes/metaclasses.

> Of course, for v8 in particular, doing any of this crazy manipulation tends to set off alarm klaxons that kick your code off every optimization path, but the language still permits it.

Most of the real badness in JS (direct eval and the with statement stand out above everything else here) can be statically detected; the fact in Python that you can fundamentally change operation of things already on the call stack through prodding at things via the `sys` module makes this an order of magnitude worse (and yes, guards and OSR in principle can be used here, but it's very easy to end up with a _lot_ of guards).

> And for JS, very low overhead of calling into the DOM APIs (written in C++) is a necessary feature for having competitive performance. Arguably more so than in Python, since the overhead of the FFI trampoline itself here is considered a bottleneck.

Oh yes, it's absolutely essential, but the definition is on a very different level: we might have an interface defined in WebIDL that must be exposed to JS in a certain way, but how that's implemented is an implementation detail (and there's nothing in the public API stopping a browser from changing how their JS VM represents strings, for example; the JS VMs themselves don't really have totally stable APIs). Whereas in Python, the C API is public and includes implementation details like refcounting, string representation, etc.


You can't change the inheritance as far as I know after creation without some hacks, but you can change the class that an instance refers to which can kinda sorta achieve the same thing. You can't necessarily add properties to a base class and have all of those reflect immediately unless you use some hackery with class properties.

Example: https://github.com/dabeaz/python-cookbook/blob/master/src/8/...


> You can't necessarily add properties to a base class and have all of those reflect immediately

Are you talking about JS? You definitely can:

  class A {}
  class B extends A {}
  const b = new B();

  A.prototype.test = () => 'test';
  b.test();
  // => 'test'


Would there be a performance penalty (I’m guessing in cache coherency) in having an interpreter that’s really two interpreters in the same process, where modules that use the “strict subset” of the language (the part that doesn’t require the more advanced object-model, or any FFI preemption safety) run their code through a more minimal interpreter, and then whenever your code jumps into a module that requires those things, the interpreter itself jumps into a more-complete “fallback” interpreter? Sort of doing what profile-guided JIT optimization does, but without the need for JITing (and before JITing would even kick in), just instead using a little bit of static analysis during the interpreter’s source-parsing step.

I ask because I know that this is something hardware “interpreters” (CISC CPU microcode decoders) do, by detecting whether the stream of CISC opcodes in the decode pipeline entirely consist of some particular uarch, and then shunting decode to an optimized decode circuit for that uarch that doesn’t need to consider cases the uarch can’t encode. But, of course, unlike hardware, software interpreters have to try to fit in a CPU’s cache lines and stay branch-predicted, so there might not be a similar win.

(Tangent: I once considered writing a compiler that takes Ruby code, rewrites the modules using only a “strict subset” of it to another language, and then either has that language’s runtime host a Ruby interpreter for the fallback, or has the Ruby runtime call the optimized modules through its FFI. I never got far enough into this to determine the performance implications; the plan was actually to enable better concurrency by transpiling Rails web-apps into Phoenix ones, switching out the stack entirely at the framework level and keeping only the “app” code, so single-request performance wasn’t actually the top-level goal.)


Oracle Labs is working on a Python3 implementation for the GraalVM: https://github.com/graalvm/graalpython


Have you tried pypy? I got about 2x of C performance for a simulated annealing problem I was working on recently. Ultimately what I realized was that the clever python structures that made prototyping fast were inherently slow (dicts with tuple keys, etc). Once I ported it to C, then went back to python and used the same simple data structures, pypy was practically just as fast as C.


Could you elaborate on the data structures you ended up using? Looks interesting.


So you mean 1/2 as fast as the C?


History suggests that, for a good approximation of the truth, the resources put into a language implementation (and, in particular, into JIT compiling VMs) strongly correlate with its performance. So V8 and HotSpot, for example, both have great performance -- and both have had large teams working on them for many years.

Interestingly, PyPy has pretty decent performance despite having had a much smaller team working on it, mostly part-time. An interesting thought experiment is whether similar resources put into PyPy -- or a PyPy-like system -- would achieve similar results. My best guess is "yes".


I believe that's the purpose of this: https://github.com/graalvm/graalpython


https://github.com/iodide-project/pyodide seems like the round-about path.


Pyodide is the CPython interpreter compiled for WASM rather than e.g. x86. So it's not going to be faster than any other CPython build.


Isn't Pyodide just an interpreter that runs in the browser? So it could never be significantly faster than the regular Python interpreter?


Web browsers don't rely on python to run, and the Web is important, so Javascript engines get prioritized.


This is some cool shit!

I don’t know how V8 and JSC compare on memory but I’m happy for this to become a battleground. Nothing but goodness for users if that happens.

(Worth noting that JSC has had a “mini mode” for a while, but it’s focused on API clients. And I don’t know if it’s as aggressive as what V8 did.)


I can't speak to JSC, but at least SpiderMonkey has had memory optimizations that are equivalent to the ones described here (e.g. discarding cold-function bytecode) for a while... I agree that it would be interesting to have more competition, including more measurement, in this space.


I would love a memory usage benchmark battle. :-)

JS is way faster than it was before the JS perf wars. Let's do it again. Then let's fight over power!


Old Carakan (Opera Presto) would win hands down. That browser was using ~1GB with 500 heavy active tabs loaded.


This exact thing is why browser engine diversity is so important, and why even though I never used Edge, it's very disappointing to me that MS is killing it.


Somewhat off-topic, but is there an RSS feed available for the V8 blog? Their posts are always interesting to me, but I've searched a few times with no luck.


Here you go: https://v8.dev/blog.atom

(found through the <link> tag on the blog)


I wonder if something like this could be used to build an electron alternative (or modify the existing electron backend to use this engine), since memory usage is a major complaint for these applications


Well... Electron is powered by Chromium, but uses NodeJS, which is based on V8; V8 is also the default JS engine for Chromium. So in theory, this would benefit Electron directly. Not sure why anybody would waste the engineering effort to recreate Electron if these changes will find their way into Electron anyway.

Edit: Didn't mean to make it sound personal on my second paragraph, but the rest of what I wrote still applies with the context of the article and the comment made.


Please edit personal swipes out of your HN comments.

Your comment broke the site guidelines and provoked an off-topic spat. Would you please review them and stick to them? They're all there for good reason, otherwise we'd have taken them out. Note that they include Assume good faith. That's the opposite of "I can't tell if you're trolling hard".

https://news.ycombinator.com/newsguidelines.html

(Edit: thanks for the edit above! I'll mark this comment off topic and collapse it.)


Dear lord people are getting pointlessly defensive and aggressive. Obviously this article is not referencing the full V8 engine, but a light version that may or may not make it into Chromium

You might want to re-read the HN guidelines

> When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."


Ok, but please don't respond to a bad comment by breaking the guidelines yourself, as in the first sentence above. I realize it's hard to do when someone has replied to you with a provocative swipe, but it's even more necessary in such cases.


> Obviously this article is not referencing the full V8 engine, but a light version that may or may not make it into Chromium

Obviously? Right from the second sentence, the article says:

> Initially this project was envisioned as a separate Lite mode of V8 specifically aimed at low-memory mobile devices or embedder use-cases that care more about reduced memory usage than throughput execution speed. However, in the process of this work, we realized that many of the memory optimizations we had made for this Lite mode could be brought over to regular V8 thereby benefiting all users of V8.


The article is about how many of the lite mode optimizations were added to the last 7 v8 releases, resulting in an 18% reduction vs lite mode's 22%. The people responding to you are making the case that the 18% reduction is basically as good.


You missed these guideline:

Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

So, the article mentions that.


> However, in the process of this work, we realized that many of the memory optimizations we had made for this Lite mode could be brought over to regular V8 thereby benefiting all users of V8.


This is mentioned in the lead paragraph:

> Lite mode of V8 specifically aimed at [...] embedder use-cases that care more about reduced memory usage than throughput execution speed.


Ah, I mistook that as embedded (i.e. usage on microcontrollers / etc.)


JerryScript and Espruino aren't that good yet, but embedded JS is an option.

You'd probably just be better off with embedded Python or Lua in most cases (MicroPython / eLua). Sadly, eLua hasn't looked very active since 2016-ish.


The memory issue with Electron is the DOM, not the Javascript. Reducing V8's memory isn't going to help much.


What is it about the JS DOM that makes a DOM of N elements modelling a given app’s view, have a higher memory footprint than the “DOM” (view state) of a native graphics toolkit modelling an equivalent app’s view?


>What is it about the JS DOM that makes a DOM of N elements modelling a given app’s view, have a higher memory footprint than the “DOM” (view state) of a native graphics toolkit modelling an equivalent app’s view?

For starters, because the DOM is a very inefficient way to represent an app's view: it was primarily designed for text and simple forms, with all kinds of extra crap bolted on. Until CSS Grid, there wasn't even a proper layout engine available, and people used styling primitives meant for floating text to do UI design...

A native UI engine can implement drawing a window with a button (the raw widgets, design wise) with a few lines of code to draw, two rectangles, some edge shading, and some text.

The DOM has thousands of lines for all kinds of contingencies for the same thing...


Part of it is just how feature complete browser compositing is. The other part is bloat due to how it was all implemented. HTML and CSS were never optimal representation for complex documents, and a ton of features have been added on top.


Amazing, very first comment is about getting rid of Electron. I love you guys.


You misread my comment. I also suggested replacing the internal engine it uses with this lighter version to make it performant. Modifying to be more performant != getting rid of.


> If you prefer watching a presentation over reading articles, then enjoy the video below! If not, skip the video and read on.

This is OT, but I think we can design a format with the best of both worlds. We can have the personal and narrated quality of videos/voiceovers, along with the skimmable/scannable/interactive quality of web content.


I agree, having a transcript of a video is very useful. I've done that with some of my own and other people's videos.

It takes a lot less time to skim over an illustrated transcript than to watch a video, and it lets the readers decide if they're interested enough in actually taking the time to watch the video. Plus it's search engine friendly, and lets you add more links and additional material.

I loved the body language in this classic Steve Jobs video so much that I was compelled to write a transcript with screen snapshots focusing on and transcribing all of his gestures (in parens). After reading the transcript, it's still interesting to watch the video, after you know what body language to look for!

“Focusing is about saying no.” -Steve Jobs, WWDC ‘97

As sad as it was, Steve Jobs was right to “put a bullet in OpenDoc’s head”. Jobs explained (and performed) his side of the story in this fascinating and classic WWDC ‘97 video: “Focusing is about saying no.”

https://medium.com/@donhopkins/focusing-is-about-saying-no-s...


If only HTML could have hyperlinks, embedded multimedia and textual content in a single file...

(In all seriousness, it's sad that YouTube has such a narrow focus — video files only — which helped it spread to many different devices, from small phones to TVs, but hinders interactivity)


If I can't watch it while I'm doing the dishes, it doesn't get the best of the video presentation world.


Also OT, but the audio on that video is extremely quiet, is there a way to boost volume beyond 100%?


This is a little off-topic, but I find the terminology used in software these days to be a little perplexing. Is "small", a perfectly adequate word to describe less memory usage, not buzzwordy/trendy enough?

"light" or "heavy" just reminds me of that classic story about the weight of software.


It’s been common for a long time to refer to software as “lightweight”, though maybe not specifically “light”.

If the title said “A smaller V8”, I’d probably assume it referred to the size of the binary on disk.


Why not allow the runtime to call the garbage collector (GC)? The GC is very lazy by default; it should be possible to make it collect all garbage. Currently v8 will just let the garbage grow because the GC is so lazy.


Probably should be re-titled to match the post as people are getting confused by the reference to V8 lite, which this post isn't directly about.

> However, in the process of this work, we realized that many of the memory optimizations we had made for this Lite mode could be brought over to regular V8 thereby benefiting all users of V8.

> ...we could achieve most of the memory savings of Lite mode with none of the performance impact by making V8 lazier.


Yes, the submitted title ("V8 lite (22% memory savings)") broke the site guideline which asks: "Please use the original title, unless it is misleading or linkbait; don't editorialize."

Doing this tends to skew discussions enormously, so please follow the guidelines!



This is great, but because you have achieved memory reduction by trading off speed, it would be nice to see charts that show processing time increases too.


> Lite mode launched in V8 version 7.3 and provides a 22% reduction in typical web page heap size compared to V8 version 7.1 by disabling code optimization, not allocating feedback vectors and performed aging of seldom executed bytecode (described below). This is a nice result for those applications that explicitly want to trade off performance for better memory usage. However in the process of doing this work we realized that we could achieve most of the memory savings of Lite mode with none of the performance impact by making V8 lazier.

Copied from article


I don’t know about “none”. Updating the age of the compiled representation of a function on every function entrance doesn’t strike me as cost-free. At a minimum that’s a store.


Bytecode aging actually existed already for other reasons (expiring code caches).



