I wish new platforms would embrace 64 bit as the default. C and C++ on all common platforms use 32 bit literals and prefer 32 bit operations (char * short => int). Rust made a similar mistake. Java arrays use 32 bit subscripts, etc... I don't have a single computer (including my phone) using 32 bit pointers, but integers are stuck in the late 80s. (We already had 64 bit DEC Alphas in the early 90s.)
For a web page, 4 gig seems like a courtesy limitation to users, but that should be enforced by the browser refusing to suck up the entire computer rather than the language being unable to. I routinely write (non-web) applications which use arrays larger than 4 gigasamples.
I, on the other hand, wish everyone would embrace 32 bit by default. Doubling the size of every index, integer, etc comes at significant memory cost for next to no benefit. Increasing the amount of memory used directly impacts performance considering how slow memory is and how important it is to keep commonly used items in cache.
If you're seriously using more than 4gb in a web page... I wouldn't mind a non-default wasm64 format. You should also be considering shipping a native app at that point, wasm is not free in terms of performance.
On some platforms like ARM, the transition from 32-bit to 64-bit (AArch64) and the improvements to the ISA allowed some software to run much quicker on the same hardware. While the increased memory usage was detrimental if you were running into memory pressure, the speed improvements were well worth the change for most tasks.
When the Pi 3 launched it had a 32-bit OS, but switching to 64-bit improved speed by 15 to 30%, at the cost of roughly 1.5x the memory usage.[1]
There doesn't have to be a choice between a fast ISA, and even 64 bit registers, and using 32 bit pointers/indices. We could theoretically do both - it's just extremely rare in practice.
And it's also common to truncate 64 bit pointers down to 48 bits to pack other info alongside them, since those are the only bits that actually get used on most architectures.
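A minimal sketch of that trick (my own illustration, not any particular runtime's layout): stash a tag in the high 16 bits that current hardware ignores for user-space addresses, and mask it back off before dereferencing.

// Hypothetical pointer tagging, assuming only the low 48 bits of the address are meaningful.
const ADDR_MASK: u64 = (1 << 48) - 1; // low 48 bits hold the actual address
const TAG_SHIFT: u32 = 48;            // high 16 bits hold our tag

fn pack(ptr: *const u8, tag: u16) -> u64 {
    (ptr as u64 & ADDR_MASK) | ((tag as u64) << TAG_SHIFT)
}

fn unpack(packed: u64) -> (*const u8, u16) {
    ((packed & ADDR_MASK) as *const u8, (packed >> TAG_SHIFT) as u16)
}

fn main() {
    let x = 42u8;
    let packed = pack(&x, 7);
    let (ptr, tag) = unpack(packed);
    assert_eq!(tag, 7);
    assert_eq!(unsafe { *ptr }, 42); // fine here: `x` is still alive and its address fits in 48 bits
}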
You are probably right about the heap, but on the stack there are no big data structures. Moving all data types to 64 bits would increase the stack size of processes and threads.
Bigger data structures also mean more cache misses, so systems would also be slower, since bringing data from memory to the CPU is very slow compared with accessing L1 or L2 cache.
So you have a point, but it does not apply equally to all data.
Let's take an example in your favor and say you've got 16 integer local variables per stack frame and you're recursing 1000 (!!!) times over some really unbalanced tree or such. You saved 64 kilobytes...
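(That is: 16 locals per frame × 1000 frames × 4 bytes saved per local when going from 64 to 32 bits = 64,000 bytes, roughly 64 kB.)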
Ok, now let's say 32 bit integers are the standard: You either wrote a program that starts to fail in mysterious ways when someone runs it on a 2 GB file, or you complicated the hell out of your data structures by needing arrays of arrays (i.e. https://www.nayuki.io/page/large-arrays-proposal-for-java).
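For what it's worth, that workaround tends to look something like this (a hypothetical sketch, not the linked Java proposal): split one logical array into chunks so each individual index stays comfortably within 32 bits.

// Hypothetical chunked array: two small indices instead of one big one.
const CHUNK: u64 = 1 << 20; // assumed chunk size: ~1 million elements per inner array

struct BigArray {
    chunks: Vec<Vec<u8>>,
}

impl BigArray {
    fn get(&self, i: u64) -> u8 {
        self.chunks[(i / CHUNK) as usize][(i % CHUNK) as usize]
    }
}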
The overwhelming majority of what wasm will be used for probably does not include images, textures, sounds, videos or 3d models. The overwhelming majority of the memory in most wasm programs will probably not be large arrays of text.
The webpage as a whole is a GUI program, but wasm is primarily a place to run computationally intensive code. The web browser deals with things like rendering images natively, without storing the image in wasm memory.
When it is text, it's usually more small bits of text than big bits. If you're just copying around giant strings you're unlikely to be using webasm. More likely you're copying around tons of small strings as you format them into a giant string if that's the end goal. Or you're copying around tons of small strings as you use those small strings to make api calls to the dom (indirectly through js). Or you're just plain running something like a physics simulation and don't even have text in the first place.
Honestly, I don't like the trend of web apps for everything, and I'd be ok if web pages couldn't really be more than toy programs. However, at that point, why even bother with WASM? I mean, what is that even for if you're not doing real computation on real data sets? Real data sets are sometimes bigger than 2 billion elements...
If you've got 2GB of data and you're doing it in WASM... you're probably doing something wrong, for the simple reason that that means you probably loaded 2GB of data over the internet into the browser (and will probably need to redo that every time you revisit the page). That's enough data that you want to save it to disk, which means (probably) not using a browser.
As for "what is the point if not doing computation on "real" data sets"... letting you do computation on reasonably sized data sets. Consider that the size of a gaint codebase like linux is still only a few hundred megabytes (uncompressed). That's more than enough data to be doing interesting work. Or look at the size of a game like quake3.
You mean file:// urls? If you have access to do that... why not just install a real executable as well.
(Can't say I'm terribly up to date on the different ways to store data as a website... but if there's a way to persist multiple gb I would be very surprised)
I really don't like being in the position of defending web apps - that is really not my preferred way to do most things I would write. However, taking a devil's advocate position:
> You mean file:// urls? If you have access to do that...
That depends on who your audience is. It's certainly easier to tell some users "go hit this web page and click ok when it asks to open your file". And there are some web apps which I find preferable to downloaded apps. For instance, I really like draw.io, and it works seamlessly on Windows, MacOS, and Linux.
> but if there's a way to persist multiple gb I would be very surprised
The only limitations I'm aware of are tied to this nonsense about 32 bit integers :-)
Rust literals will infer their type from context if possible. Otherwise I believe the default is i32. There are also literal suffixes, so you can write 999u64 or 999.02f64 if you want.
#![feature(type_name_of_val)] // type_name_of_val was nightly-only here
fn main() {
    let v = vec![1, 2, 3];
    let x = 1; // inferred as usize, because it's used to index `v` below
    let y = 1; // nothing constrains this, so it falls back to the default: i32
    v[x];
    dbg!(std::any::type_name_of_val(&x)); // prints "usize"
    dbg!(std::any::type_name_of_val(&y)); // prints "i32"
}
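For completeness, the suffixes mentioned above look like this (a trivial made-up example):

fn main() {
    let a = 999u64;    // explicitly u64, no inference involved
    let b = 999.02f64; // explicitly f64
    println!("{} {}", a, b);
}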
Type inference works only when the types are unambiguous - Vec and arrays can only be indexed by usize, nothing else. With strict typing and very few implicit conversions in the language it works very well without surprises. You can still declare types ahead of time if you want to.
> Type inference works only when the types are unambiguous
It looks to me like when the types are ambiguous (because nothing placed a restriction on them), integer literals become 32 bit and floating literals become 64 bit. Then quietly, later when you add a restriction, the code above (which you've already worked through one line at a time with debug statements and all) changes from underneath you.
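For example (a made-up two-liner, not from the thread), this is the kind of change-at-a-distance I mean:

fn main() {
    let x = 1; // today nothing constrains x, so it defaults to i32
    // uncomment this later and x quietly becomes u64 instead:
    // let y: u64 = x;
    let _ = x;
}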
> You can still declare types ahead of time if you want to.
When the implicit stuff is surprising, I'd rather it wasn't ever implicit.
Tangentially, Rust goes well out of its way to avoid "type inference at a distance". For example, unlike Haskell or most MLs, type inference will not work across a function boundary.
#[derive(Debug)]
struct Foo<T> { pub bar: T }

fn hmm<T>(x: T) -> Foo<T> {
    Foo { bar: x }
}

fn main() {
    // 1.23 is inferred as f32 from the annotation on `f`, not from the literal itself
    let f: Foo<f32> = hmm(1.23);
    print!("got: {:?}\n", f);
}
So maybe that's not crossing function boundaries, but the type for the generic "hmm" function as the rvalue is being inferred from the explicitly declared type of the lvalue. I know it's pointless to try and change anyone's mind about this, but I personally find it unsettling :-)
pornel is right, but to be extra clear: if that happened and the types were mis-matched, it’s not very quiet: you’d get a compiler error. We don’t implicitly coerce numerics.
Personally I would prefer i64, but the following are defensible:
- Treat it as a special arbitrary precision type, similar to Go
- Choose isize because it's the native size for the platform
- Make it a compile error when the type can't be inferred
The rationale for i32 in RFC 0212 is basically, "well C compilers usually do this", and some hand waving about cache (why not i16 or i8 then?!?). Then they chose f64 instead of f32 for floating literals, which undermines the cache argument. So really, Rust did it because C compilers do it, and C compilers do it to simplify porting old software, which doesn't apply to Rust.
Yeah, I realize none of this can change. However, if there is ever a breaking version 2.0 on the roadmap, I'll tell you the features which keep me from actually using Rust. The i32 is pretty insignificant. :-)
JavaScript is already an old language from 1995 and quirky in that it doesn't have 64 bit integers. WebAssembly inherits that, so you can't really say it is a new platform.
I don't see Rust making what you call a mistake. Rust has 64 bit and even 128 bit integers.
64 bit integers and pointers have a trade-off we need to consider: they need more memory. There are still embedded systems out there, perhaps running on a tiny battery with only a 16 bit address space, and it is wasteful to force 32 bit or even 64 bit data on them.
So currently we have at least three different directly addressable memory sizes: 16 bit (up to 64 KiB directly addressable memory), 32 bit (up to 4 GiB directly addressable memory) and 64 bit (up to 16 EiB directly addressable memory, which is 4 billion times 4 GiB; however, current platforms don't use all the bits and don't offer that much addressability, for details see: https://en.wikipedia.org/wiki/64-bit_computing#Limits_of_pro...).
Also V8 underpins Node, which has many more legitimate use-cases for that much memory.
That said I think we need some sort of user-facing permissions constraints for memory usage and/or webassembly execution. I fear that in practice this new capacity will mostly get used by unscrupulous crypto-miners.
Why prefer 64bit pointers over multiple memory address spaces (https://github.com/WebAssembly/multi-memory)?
Your CPU has to do a lot of work to keep pretending its memory is flat. If we give up that pretense you can optimize cache utilization and multiplex multiple WASM threads onto a single OS thread without allowing Spectre.
Because the CPU is going to do the work to make that illusion happen regardless of whether you participate in it or not - so why constrain yourself to anything less than 64-bit pointers if you're not going to get anything in return?
What cache optimization are you referring to? Just that 32-bit pointers are a form of pseudo-compression and can be packed more densely? Because there aren't really any other cache benefits in play here, and you don't need 32-bit compressed pointers to get optimized cache utilization - value types & data-oriented design are far more effective for improving cache utilization.
Immutable data structures tend to use a lot of pointers (see how Haskell executes), and even in other paradigms halving pointer size can have a significant impact on alignment (tags, enums, etc).
V8 sandboxes WASM by placing it in a dedicated OS process; this buys 64 bit pointers, but comes at a cost. We can choose to have a 64 bit processor buy us in-process memory isolation by masking pointers, or we can have 64 bit pointers.
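A minimal sketch of the masking option (my own illustration, not how V8 actually implements it): mask the 32 bit index before adding it to the instance's base, so a load can never escape the instance's 4 GiB region.

// Hypothetical in-process isolation via index masking.
struct LinearMemory {
    base: *mut u8, // start of a reserved region at least 4 GiB large
}

impl LinearMemory {
    unsafe fn load_u8(&self, index: u64) -> u8 {
        let masked = index & 0xFFFF_FFFF; // keep only the low 32 bits
        *self.base.add(masked as usize)   // stays within base .. base + 4 GiB
    }
}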
Go ahead and use smaller types in your arrays of structs, but 32 bit loop variables are quietly asking for weird failures when your users run real data.
Just reading the part about the ensuing JavaScript implications gave me a stress-headache. Bless the people who are willing to work through these kinds of issues so the rest of us get to use the result.
Does the 4GB limit only apply per WebAssembly instance or does it apply to the V8 runtime as a whole? Can a single page run multiple separate 4GB WASM instances?
The title refers to the WebAssembly memory limit. However, it's mentioned that JavaScript now supports up to 16 million terabytes of memory (equivalent to 4 JIRA tabs).
Taken seriously: RAM often does get offloaded to disk as needed, so realistically this changes nothing in terms of specs needed to handle what's in memory.
Taken less seriously: Do you not already need this just to run JIRA on Chrome? Or apps on Electron, generally?
I've found it's faster and more reliable to reload JIRA tabs by simply creating a VM, running JIRA in a browser in there, and pausing the VM when you don't need it. When you need it, or after a restart, simply load up your VM and resume the operating system. This usually takes only a second or two, much faster than reloading JIRA the normal way.
If anyone from Atlassian is here, please help me create an official JIRA tab VM image! Maintaining my JIRA OS negates probably half the time I save loading JIRA this way!
You can do that in CSS now. I've noticed surprisingly many sites adjusting when I started using a Windows machine and set my system-wide preference to dark mode.
It would be "2^32", not "2^32+1". 2^32 isn't representable in a 32-bit integer, the same way 256 isn't representable in an 8-bit integer--the maximum is 2^8-1, or 255.
But the argument that they need to represent 2^32 and therefore that's why they used 64-bit integers is dubious. No 32-bit WebAssembly program could ever operate on and pass an object as large as 2^32 bytes because the code and data structures for the smallest possible program would already take up more than 1 byte in that 32-bit address space.
That won't happen on a 64 bit machine. By the time we need that, the code will be so vastly different that trying to prepare for it now would be counterproductive.
I agree it's stupid. I think worrying about representing the length of 1 object filling the entire address space is stupid, whether the address space is 2^32 or 2^64.
> representing the length of 1 object filling the entire address space
Isn't this about encoding the WASM address space itself, inside the outer computer's 64 bit address space?
It's only a slight annoyance if it doesn't get to be quite the normal limit, but since they're changing some of this code anyway, and they're going to need even bigger numbers in the future, it makes sense to do a proper adaptation.