In mobile, 62.7% of computation energy cost is spent on data movement [pdf] (ethz.ch)
200 points by bshanks on Nov 27, 2019 | 157 comments

One way in which the brain differs from conventional computers is that neurons appear to serve as both CPU and memory storage. Rather than having a few fast CPUs and a large bank of dedicated memory with a few busses (leading to "the von Neumann bottleneck" of data transfer between memory and CPU), the brain has many slow neurons each of which has many connections with other neurons.

So, in-memory computing may be more brain-like.

The brain appears to be very good at massive concurrency with low energy consumption, at the expense of slow serial computation and a high error rate.

The reason brains work the way they do is because evolution hill climbed to a solution, not because that solution is globally optimal. Compared to silicon, neurons are absurdly slow. Axons transmit information at 100 meters per second. Electronics transmit information 3,000,000 times faster. When your switching speed is 100Hz, the only way to get anything done in a reasonable amount of time is to be massively parallel. That's why brains are the way they are.

Other points of comparison: Animal wings (birds, insects, bats) combine both lift and thrust, but we don't fly around in ornithopters. And while animal legs combine support and power for locomotion, we don't drive around in vehicles with legs. Wheels are far more efficient, but they're not something that evolution can discover easily.

> Wheels are far more efficient

Wheels are far more efficient if you have a very specific type of terrain to deal with. It's not immediately obvious to me that a wheel is more efficient overall on naturally occurring terrain. For example, wheels don't work well on steep and rough terrain, or in water (as opposed to under water). The trade-off might be better stated like this: wheels are very efficient for very select terrain types, but that efficiency tapers off fairly sharply for some terrains, reaching 0% in some cases, while legs are less efficient overall but achieve some level of usefulness on almost all terrains (making them a better choice for organisms that have disparate terrain types to deal with).

Similarly, a computer may be much faster in some types of operations, but we can't even get it to do some stuff that's fairly trivial for many animal brains, so that may also be a trade-off in efficiency somewhere (possibly in an area we don't sufficiently understand yet).

Possibly walking recaptures energy that a simple wheel mechanism wouldn't? I vaguely think I read something about how humans are fairly efficient at recycling the energy produced as you stride and gravity pulls you down.

Yes, tendons, ligaments and even muscles are used to recapture energy as animals locomote. About 30% for walking is what I seem to remember from my grad school days.

ggreer above shows some real hubris and a deep ignorance of the ingeniousness of biological systems compared to human-created ones. A brain (and nervous system) is miraculous in its ability to gather, assess, store and discard ambiguous and contradictory information at astonishing rates.

> fairly efficient at recycling the energy produced as you stride and gravity pulls you down

Ever walk on dry loose sand at the beach? Plod plod plod.

So are you saying that's the exception that highlights the rule, or are you saying humans evolved to walk on beaches?

Cars that aren't normally driven on sand have issues too.

I'm saying one doesn't notice until it doesn't work, then you notice.

I would take exception to the idea that evolution can't/didn't discover wheels.


Everything from E. coli, to yeast, to plants, to humans, is based on rotating molecular motors.

Now, it is true that we don't see macroscopic wheels on living organisms. But a person or animal is composed of trillions of cells. By analogy, we don't see the millions of vehicles in a nation composing an entity that has wheels made up of vehicles.

"About half of all known bacteria have at least one flagellum, indicating that rotation may in fact be the most common form of locomotion in living systems"


There are a few animals that have evolved wheels. It's much harder to evolve an axle.

From a mathematical view on complexity, I don't see exactly the difference between memory and computation. I wonder what would be suitable definitions for the two concepts.

This is actually reflected in modern FPGA devices. To your point, there's no material difference between "computation" and "retrieval" (they're just two different ways of mapping a set of inputs onto a set of outputs). FPGAs [1] are basically giant arrays of truth tables. These are implemented as look-up tables -- RAM -- paired with some dedicated, optimized, high-complexity logic elements like multipliers.

Could you implement a 16x16-bit multiplier (yielding a 32-bit result) as a RAM or ROM? I mean, totally. It would be a 32-bit address space where each entry is also 32 bits. That would take 16GB. You'd have the fastest multiplier in the West, though, net of propagation delay.

[1] From a combinational logic perspective.
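The LUT-as-computation point is easy to demonstrate at a smaller scale. Here's a hedged sketch in Python, using an 8x8-bit multiplier instead of 16x16 so the table fits comfortably in RAM:

```python
# "Computation vs. retrieval": the same function implemented as logic
# and as a precomputed lookup table (the FPGA-LUT/ROM idea, scaled down).

# "Computation": the usual operator.
def mul_logic(a, b):
    return a * b

# "Retrieval": a 16-bit address space (a << 8 | b), each entry a 16-bit result.
MUL_ROM = [(addr >> 8) * (addr & 0xFF) for addr in range(1 << 16)]

def mul_rom(a, b):
    # A single constant-time "memory access" instead of a multiply.
    return MUL_ROM[(a << 8) | b]
```

The two functions are extensionally identical; the only difference is whether you pay in gates and cycles or in storage, which is exactly the trade the 16x16 version makes at 16GB.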

Memory is the current state, computation is the permutation of state (your step function that takes you from one state to the next).

This is assuming you have an automaton, I would guess? But what if you approach it via something like entropy?

Edit: Doesn't that step function encode all of memory implicitly?

I have no clue what you mean by approaching computation from the point of view of entropy. Entropy is a measure of the degree of disorder in a system; how is that a method of performing computation?

The step function does not encode memory. The step function is a function that takes memory state 1 as input, and outputs memory state 2 as output. The step function itself doesn't change, it stores no information.

In physical terms, the step function is the CPU, minus the caches, registers and other modifiable state. Memory is the disk and ram and all the CPUs caches and registers. You can't tell me anything about what the input to the CPU is, or what it is actually doing (other than that it is capable of running arbitrary x64 assembly), without looking at the state.

A function does store information. The common definition of a function is the set (R,(X,Y)) where R is the graph of the function (consisting of the pairs) and X and Y are domain and codomain, respectively.

Computation can be seen as some act of carrying out logic. And in turn, logic can be approached via entropy. There are a lot of people working in QM, computation and entropy. But the simpler definition is just via partitions (and you especially don't have to insist on it being in the world of QM either). [1]

[1] http://www.ellerman.org/intro-logical-entropy/

The selection of the function to use contains information. Once a function is given (e.g. the function specifying the x64 ISA), the function stores no information, it is immutable.

(Sorry about replying so late, I know it's unlikely you will see this, oh well)

Kolmogorov complexity converts the combination of program and data to entropy, FWIW.

If computation is the process of spending effort to receive information, then memory is partially paid-down computation. There are lots of ways to blur these lines.

You can make sense of the differences abstractly by considering the tape, the state register and the instruction table of a Turing machine.

A Turing machine is just one of many ways in which you could make a computing device. It is one of the easiest to mechanize and program for, which is why we use it, but there is absolutely no reason to believe it is the only way of doing things. GP was getting at a more basic form of computation than the specific one performed by Turing machines. From a mathematical perspective the tape, the registers and instruction tables are all implementation details.

> absolutely no reason to believe it is the only way of doing things

Well, I did not suggest that.

> From a mathematical perspective the tape, the registers and instruction tables are all implementation details.

No, they are used to define concepts such as space complexity for Turing machines and there is something to learn from Turing machines, that's why people are studying them.

You can think of the Church Turing thesis as providing a formalism for how the tape, registers and instruction tables are all implementation details.

So the tape and state register are memory? And the instruction table is computation?

The instruction table is a function. Computation is the act of applying that function to the state.
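A minimal sketch of that distinction, with a made-up single-rule machine (a unary incrementer): the instruction table is immutable data, and "computation" is nothing more than repeatedly applying `step` to the mutable state.

```python
# The table never changes: (state, symbol) -> (write, move, new_state).
# This hypothetical machine walks right over 1s and appends one more.
TABLE = {
    ("scan", "1"): ("1", +1, "scan"),   # keep moving right over 1s
    ("scan", "_"): ("1", 0, "halt"),    # blank found: write a 1 and halt
}

def step(table, tape, head, state):
    # One computation step: apply the fixed function to the current state.
    symbol = tape.get(head, "_")        # "_" is the blank symbol
    write, move, new_state = table[(state, symbol)]
    tape[head] = write
    return tape, head + move, new_state

def run(tape):
    head, state = 0, "scan"
    while state != "halt":
        tape, head, state = step(TABLE, tape, head, state)
    return tape
```

All the information that changes over a run lives in `tape`, `head` and `state`; `TABLE` stores nothing about any particular input.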


Grammatically, "computation" is "the act of computing", where "computing" is a verb.

"Memory" is a noun, and not one about an act. It's more like saying "clothing".

This kind of touches on my original point.

A function can be seen both as the set of all pairs of the function, or as an oracle that you feed input to and receive output. The first is to me the full memory space, the second is maybe computation.

> neurons appear to serve as both CPU and memory storage

I'm not sure this is really certain. But that uncertainty is more because it's hard to prove that there isn't another means of recording memory than something we know.

There has been speculation that there might be some genetic or biopolymer based system in the hippocampus, but I'm not familiar with anything more than speculation on that topic.

While there might be other mechanisms at play, plasticity is a form of memory: If each synapse can store weights, read them, and change them, it could be seen as a memory.

For me, an analogy that is quite adequate in computer science would be LUTs in an FPGA: they are memory+compute units, and can be used as both.

To go a bit further, any memory could indeed be considered a computation unit, or vice versa: consider a results cache, for instance. The difference I see between memory and computation is that memory accesses are, if not instantaneous, at least constant-time. If you want more precision, this can be computed: the necessary memory footprint is extended through time as well as space (if you only use a results LUT, you'd have to make it bigger to gain precision).
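That precision/footprint trade-off can be sketched with a results LUT for sin(x). The table size `N` here is an arbitrary choice for illustration; doubling it roughly halves the worst-case error.

```python
import math

N = 1024  # table size: the "memory footprint" knob
SIN_LUT = [math.sin(2 * math.pi * i / N) for i in range(N)]

def sin_lookup(x):
    # Constant-time "memory access" answer, quantized to the table grid.
    # The error is bounded by roughly the table step, 2*pi/N.
    i = int((x % (2 * math.pi)) / (2 * math.pi) * N) % N
    return SIN_LUT[i]
```

To get more precision you either grow `N` (more memory) or interpolate between entries (more computation), which is the trade the comment describes.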

Can it be that the neural structure itself is the memory? The weak/strong relationships maybe form the memories themselves, reconstructing a vivid image like seeds for procedural algorithms.

This still doesn't explain our sense of time in memories. We always remember the when as well as the what, even if some knowledge has no time positioning.

It seems unlikely to me that biological systems wouldn't exploit digital (e.g. genomic/DNA/RNA) kinds of memory storage, given that the materials are there and the energy costs of maintaining and accessing that kind of storage can be fantastically low. Keeping memory alive in neural circuits would seem much more difficult. Surely that's happening to a large degree though. I'm just highlighting the possibility of an alternative basis for certain kinds of memory in the brain.

That method of storage doesn't even need to be in the brain - could be anywhere!

High error rate?! I just said that by precisely moving several complex muscles to vocalize a concept, which requires simulating not only how I perceive it but also how both the intended receiver and other listeners will, all while focusing my eyes rapidly on an area the size of a quarter.

Try doing that a couple times in a row, recording the audio and video. Can you do it the same way twice, with a measure of "same-ness" that can be trivially computed? No, humans cannot do that.

Error rate in the context of computing does not need to mean invalid, just different. Our astonishingly high abilities to compensate for variations in input data do not result from an ability to produce output data repeatably.

I don't think directly comparing humans and computers is that useful, as they are "designed" for different tasks.

Could it be feasible to use the "brainlike" model for practical hardware and reap benefits from avoiding the vN bottleneck?

My intuition is that such a system would be much harder to reason about -- and therefore harder for compilers to emit efficient machine code for -- but I'm assuming someone here knows the topic pretty well?

(Before you give me "the lecture", yes, I'm aware that in general it's not a good idea to simply mimic the kludges that evolution came up with.)

Unless we made some big advances I'm not aware of, as far as I know we really don't know much about how the brain works. I doubt anyone can answer your question.

My question doesn't depend on knowing exactly how the brain works; it's a question about mixing memory and CPU, which happens to be similar to (how we believe) the brain works.

> which happens to be similar to (how we believe) the brain works.

Can you link sources supporting that ?

The claim you quoted is actually a null hypothesis—the negation (that the brain separates memory and processing) would require us to posit new types of as-yet-undiscovered neurochemical interactions (which may certainly exist, but we can't say!), and thus the burden of evidence would fall on proving that claim. Or, un-negated, for this claim, the burden of evidence is on disproving it.

If there is nothing supporting the claim there is nothing to disprove though. I'm genuinely curious about sources because I've never heard of that.

As far as I know any comparison between a computer and a brain is flawed from the get go.

edit: and to be clear my initial comment was just hinting that we can't develop a "brainlike" computer architecture because we simply don't know how the brain works at all.

I don't think you're wrong per se. But people aren't "claiming" the brain works this way. People are simply stating that maybe the brain functions like this and we could adopt similar designs based on our perceived impressions of how the brain may function.

We could totally botch it though and end up with an even worse computer and turns out it works nothing like the brain. Who knows. But right now many people "believe" the brain may function in the manner described.

To be clear, what I'm saying here is that this claim maps into our model of neurology as the equivalent of a claim like "the earth isn't a triangle." Whether or not it's supported by any evidence, what it actually is is a positively-phrased restatement of part of our set of axiomatic priors.

We know that the Earth can only be one shape; we know there are an infinite variety of shapes that something can be; and so our priors contain a set of all the claims like "the Earth is potato-shaped" and "the Earth is a doughnut" all with extremely low probability, before we encounter any evidence at all, just because the probability-mass has to get spread out among all those infinitely-many claims.

Assuming continued lack of evidence either way, a claim like "the Earth is not [one particular shape]", then, doesn't require argumentative support to be taken as a default assumption (as you might in e.g. the opening of a journal paper.) The probability of it being any particular shape started very low, and we've never encountered any evidence to raise that probability, so it's stayed very low.

(Yes, that even applies to the specific claim that "the Earth is not an oblate spheroid." If we never encountered any evidence to suggest that claim, then it'd have just as low a default confidence as any of the other claims it competes with.)

For claims with no evidence either for or against them, the analytical priors derived from the facts about the classes of claims to which the claim belongs determine where the burden of proof lies. Low-confidence priors? Burden to prove. High-confidence priors? Burden to disprove.

In this case, we already know that neurons can do several things, and AFAIK we've never encountered any evidence of neurons having specialized functionality, or any evidence that neurons don't have specialized functionality. Our tools just aren't up to telling us whether they do or not. But, because one claim actually factors out to several claims (neuron specialization → lots of different ways neurons could specialize) while the other claim doesn't (neuron generality → just neuron generality), the probability-mass ends up on the neuron-generality side. (This is another way to state Occam's Razor.)

Mind you, this might be entirely down to our inability to study neuronal dynamics in vivo in fine-enough detail. In this case, our lack of evidence doesn't imply a lack of facts to be found, because we have no evidence for or against this hypothesis. Instead, it just determines what our model should be in the absence of such evidence, until such time as we can gather evidence that does directly prove or disprove the specific hypothesis.

Or, to put that another way: if humans only ever studied bees from a distance, the default hypothesis should be that all bees do all bee jobs. The burden of proof is on the claim that bees specialize. Later, when we get up-close to a beehive, we'd learn that bees do specialize. But that doesn't mean that we were incorrect to believe the opposite before. Both our belief before the evidence, and our belief after the evidence, were the "correct" belief given our knowledge.

I was just going off the claims at the top of the thread:


I have no special expertise otherwise; if you want more substantiation or wish to dispute that point, you could post a reply where the claim was originally introduced.

Yes but in a computer every processing unit can be used for widely different tasks. In contrast, the neurons in a brain are used for the same task over and over again.

But you could just as well say that the same part of the computer's memory where the executable binary is stored is used for the same thing again and again.

I don't think these parallels between the computer and the brain are valid though. The brain is a highly parallel but slow machine, we have very fast and not-so-parallel computers (just a few cores). I think a lot of what the brain does can be expressed on computers without changing the architecture much, i.e. you don't need to compare the architectures, just the outcomes of computation.

But surely we can improve modern computing devices architecture by taking a look at the computing advantages the brain takes in order to achieve its desired tasks.

Don't forget humans are one of the most expensive resources, and nothing can really replace them for even the vast majority of non-physical jobs. So if we could build computing devices that potentially solve some of these problems, needless to say, the productivity boosts would be plentiful.

Secondly, we aren't even close to being able to model anything relatively close to the computational capabilities of the human brain because we don't even understand the human brain. So your comments on highly parallel but slow and fast but no so parallel don't make a lot of sense.

For example, take MapBox's new vision SDK. It's able to perform semi-decent feature extraction on the road, via a camera, while people drive. Well, guess what: I would absolutely stomp the vision SDK on accuracy for every feature it thinks it identified. Not only that, I am capable of identifying an order of magnitude more features than it can, and I'm able to identify new features on the fly and even guess with greater accuracy what they are.

So yeah, there are a plethora of functional yields that we have yet to achieve with some of the most powerful computers in the world that are achieved by the human brain every day. Which could be indicative of maybe both a resource, but also an architecture problem.

> I would absolutely stomp the vision SDK on accuracy

I suspect our recognition abilities may in fact be worse than those of a good SDK, but we have the advantage of general real-world experience. Bit of a simplified example: in order to recognize a dog you need to have seen a lot of various animals, and have a basic knowledge of anatomy, animal behaviour etc. A lot of the time recognizing the context helps too: e.g. something on a lead ahead of a walking human in the street is likely a dog - you need only a very quick confirmation to say it's definitely a dog, etc.

I also think the brain does a lot of tree searches with optimizations (which are never perfect), and it's where the brain's parallel architecture proves to be beneficial.

Intuitively though, modern computers are still not powerful enough to perform the same tasks albeit mostly sequentially. I believe we'll get there and I think AGI in a simplified virtual/gaming environment is the best place to test our approaches.

> the neurons in a brain are used for the same task over and over again

How does this square with the fact that people can get large portions of their brain scooped out and still retain many of their previous abilities?

This is a naive assumption, but if there were purely a 1 to 1 correspondence between a neuron and any given task, I’d assume you wouldn’t be able to recover from massive brain damage. But it seems like people can retrain the functional parts of their brain after damage to other parts to pick up the slack somehow.

Super fascinating, any recommendations on where I can do further reading on this?

Low energy consumption checks out - a brain uses only 12 watts.

You have to feed and train it for the first 15+ years before it's any good though.

Brutal truth, that's also assuming 100% success rate. Lots of brains turn out to be shitty brains after 15+ years too...

"Insanity -- a perfectly rational adjustment to an insane world." — R. D. Laing

Your brain uses 12 watts at rest and 120 watts under stress.

There is no CPU in the brain; some analogies don't go very far.

A big part of the data movement can be eliminated by introducing persistent RAM, i.e. the memristor that unfortunately never happened. When the missing component is finally invented, I think we'll have an opportunity to revisit some of the core OS concepts, such as the file. Just imagine for a second you have a persistent random-access memory device that doesn't require you to serialize/deserialize things, or move and resolve binary modules. Your binary module/app would be executed right where it's stored, provided that it was resolved and prepared when it was first copied to your file system. Similarly, a JSON file would be stored as a memory structure rather than as serialized JSON (converted to text only as necessary, e.g. when viewed in a text editor). Etc. etc.

There are some really interesting changes that can potentially happen when the memristor type of memory becomes available, possibly with its own problems too, but with the huge benefit of moving less data.

> Similarly, a JSON file will be stored as a memory structure rather than serialized JSON

You can do this in C++ right now: mmap a file to a memory region, and just create structs in that region.

This has two problems:

- normally you want to save at controlled points in time, otherwise you have to worry about recovering from states where some function updated part of the data and then crashed (phone ran out of power etc)

- just writing a bunch of internal data structures into a file used to be moderately popular and has great performance, but it's a major headache when you ever update them. You end up implementing a versioning scheme and importers for migrating old files to your new application version. At that point a file format that is designed for data exchange is less headache.

In general mmap already offers you a way to treat your disk like memory with decent performance (thanks to caching), and the number of good use cases turned out to be somewhat limited. I doubt just making that faster with new technology will change much.
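For what it's worth, the mmap-a-struct idea can be sketched even from Python rather than C++ (the file name and record layout here are made up for illustration). The point is that the on-disk bytes are the in-memory representation, with no parse step:

```python
import mmap
import struct

REC = struct.Struct("<ii8s")  # hypothetical record: two ints plus an 8-byte tag

# Create a file big enough for 4 fixed-size records.
with open("records.bin", "wb") as f:
    f.write(b"\x00" * (REC.size * 4))

with open("records.bin", "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)
    # Write record #2 in place: the struct bytes ARE the file bytes.
    REC.pack_into(m, REC.size * 2, 7, 42, b"dog")
    # Read it straight back out of the mapping, with no deserialization.
    a, b, tag = REC.unpack_from(m, REC.size * 2)
    m.close()
```

This also makes wongarsu's second problem concrete: change `REC`'s layout and every existing file silently becomes garbage, which is why such formats end up needing version fields and migration code.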

There is a huge difference between what the GP proposes and your interpretation: the GP talks about a structure that is not just a memory-backed copy of the data in the file (which still requires an in-memory structure pointing to the bits and pieces to make sense of it, or a painful parse step for every access and an impossibility to write back to it).

So the whole structure in its native form rather than the contents of the json file, for instance a graph would reside in memory and could be operated on directly.

Hence the 'serialized' in the part that you quoted. Once you serialize it the whole thing becomes hamburger and needs to be parsed again before you can operate on it.

wongarsu is talking about actually mmapping the C structs in memory out to disk. There's no serialization involved: the format on disk is exactly the same as the bytes in RAM, and you rely on the OS to page them in on demand (notably, parts of the file that are never touched by the CPU are never paged in). You get around the invalidation of pointers by never using them: all data is stored as flattened arrays, and if you need to reference another object, you store an array index instead of a pointer.

This is a more common technique than most people suspect. It's taught in most operating system courses [1]. It's the basis for how SSTables (the primary read-only file format at Google, and the basis for BigTable/LevelDB) work, as well as for indexing shards. It was how the original version of MS Word's .doc files worked, and was also why it was so difficult to write a .doc file parser until Microsoft switched to a versioned serialized file format sometime in the 90s. I think it's how Postgres pages work (the DB allocates a disk page at a time and then overlays a C structure on top of it to structure the bytes), but I'm not familiar enough with that codebase to know for sure. It's how zero-copy serialization formats like Cap'n Proto & FlatBuffers work, except they've been specifically engineered to handle the backwards-compatibility aspects transparently.

It has all the problems that wongarsu mentions, but also huge advantages in speed and simplicity: you basically let the compiler and the OS do all the work and frequently don't need to touch disk blocks at all.

[1] https://www-users.cs.umn.edu/~kauffman/4061/lab06.html

I think we already covered that:


Couldn't you just mmap a binary file with the same in-memory structure rather than a text file?

Pointers in your mmapped file would be invalid after the second load, so it can't be the exact same structure that you would normally use in memory.

You could do it with relative offsets if you wanted, that's getting pretty close to pre-heating a cache from a snapshot. That way the file would not contain pointers but you could still traverse it relatively quickly by adding the offsets to the base address of the whole file.
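A sketch of that relative-offset idea (plain bytes here rather than a real mmap, and the node layout is invented for illustration): nodes reference each other by offset from the start of the buffer, so the structure stays valid no matter what base address it's mapped at.

```python
import struct

NODE = struct.Struct("<ii")  # (value, next_offset); -1 terminates the list

def write_list(values):
    # Lay a linked list out flat: each node's "pointer" is an offset
    # from the start of the buffer, not an absolute address.
    buf = bytearray(NODE.size * len(values))
    for i, v in enumerate(values):
        nxt = NODE.size * (i + 1) if i + 1 < len(values) else -1
        NODE.pack_into(buf, NODE.size * i, v, nxt)
    return bytes(buf)

def read_list(buf):
    # Traverse by adding offsets to the base, as the comment describes.
    out, off = [], 0
    while off != -1:
        v, off = NODE.unpack_from(buf, off)
        out.append(v)
    return out
```

The round trip needs no relocation step, which is exactly why base+offset addressing is the usual fix for the invalid-pointer problem above.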

Pointers could be valid if each file in this new OS resided at a specific location say in a 64-bit address space. You could allocate for example 4GB of virtual memory to each file and guarantee that a file will always be found at the same location with all the pointers intact and valid.

That sounds pretty hacky but it could work. Better to do it right and make the whole thing relocatable and the software addressing it aware of that. If you don't do it that way you may end up with some very interesting bugs, such as when your code also gets loaded into a 4GB segment (not all machines are 64 bits), and now all those pointers are valid throughout your memory image. That's bound to lead to confusion. Most CPUs can do base+offset with very little overhead anyway so there would be no or very little gain.

I don't think relocations are necessary at all. Within one system each file has a fixed place. Relocation and resolving happen only when files are copied across systems, where serialization should happen anyway. So e.g. once you receive a binary from the network, you resolve it and place it somewhere in the 64-bit space and then just execute it from there every time. The inode number becomes the file's physical address, essentially.

Base+offset segmentation does have an overhead since you'd need some extra CPU registers for that if I understand it correctly.

Using `blitting` from and to disk for performance is how we got the DOC & RTF split - with the former as the "Work-In-Progress" format that was fast to save on a floppy, and the latter intended as the interchange format.

History turned out otherwise, unfortunately...

I recently realized that mmap was how good editors are able to open huge files without stalling like bad editors do. There are use cases for it but yes, you need to know what you are doing.

The sublime text/merge team has some interesting discussion about using it that way: https://www.sublimetext.com/blog/articles/use-mmap-with-care

I think it is this very post that made me realize how sharp but also how double-edged of a blade mmap was.

My pet concern is that "have you tried turning it off and on again?" will no longer work. Just like apps have a robust database and a frontend that needs restarting after getting into a bad state, programs for the persistent-RAM generation may need a robust state management system so that frontend state can be rebuilt.

This could be solved with transactions: If all processing takes place within transactions, then you can always return to a clean state after a crash.
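As a sketch of that transaction idea with today's tools (file names here are arbitrary): mutate a copy, then commit atomically, so a crash mid-update never leaves the durable state half-written.

```python
import json
import os

def commit(state, path="state.json"):
    # Write the new state to a side file first...
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are durable before commit
    # ...then commit in one atomic step: readers see either the old
    # state or the new one, never a mix.
    os.replace(tmp, path)
```

Persistent RAM would want the same discipline at memory granularity; without a commit point, a crash can persist exactly the half-updated structures the parent comment worries about.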

We have persistent RAM technology. Intel and Micron have made SSD form factor and also DIMM form factor Optane devices, based on 3D XPoint. Putting it on an SoC is possible but would be very expensive at this time.

There's already an experimental OS that is built on idea of persistent program memory: http://phantomos.org/

That's very intriguing, thanks!

The memristor is a real thing, just hasn't been implemented yet right?

Basically, yes: some prototypes have been made in labs, but haven't really seen the light of day. Wikipedia has an article about it here: https://en.wikipedia.org/wiki/Memristor

As noted in the article, HP says that memristor memory may be commercially available by 2018, so I guess we'll see then....

Including a copy of Maslow's hierarchy of needs and some green-field vs factory slides seems to be... overselling it a bit?

Fortunately it pulls back towards a "what are the easy wins" approach: in-memory initialising (rather than streaming zeroes over the bus, ask the memory to zero itself) and in-memory copy.

The three real problems are:

- this is only useful if the whole stack "knows" about it, or can be transparently optimised to use it. Otherwise the person running a Javascript for loop to set values to zero will ruin it.

- this relies on the data being in DRAM-row-size chunks, and moving it around within the same physical chip. That may also require re-architecting to make that common enough to be useful. Locality issues are the downside to distributed systems; the whole CPU architecture is oriented towards continuing to pretend that everything happens in a defined order in a single place.

- possible security issues (rowhammer?): may seem far off, but if you don't think about it upfront it gets very expensive later.

(Worth recapping: the silicon is fundamentally differently-processed for DRAM, such that implementing complex logic is slow and expensive.)

It's not just a problem with data movement inside the machine. Data movement between processes and machines is also horribly wasteful.

Text encoding/decoding wastes massive amounts of energy, which is why I've been developing a format over the past 2 years with the benefits of binary (smaller, more efficient) as well as text (human readability/editability): https://github.com/kstenerud/concise-encoding

> Data movement between processes and machines is also horribly wasteful

I agree, and in .NET that’s a solved problem: http://const.me/articles/net-tcp/ (2010).

Their text format is XML. Their binary version takes a small fraction of the bandwidth, especially when using a custom pre-shared XML dictionary. Binary is much cheaper to produce or parse, and the built-in DataContractSerializer understands both formats. It's supported by all modern editions of .NET, including .NET Core, and is compatible across them; e.g. I've been recently using it for the network protocol of a Linux ARM device.

Waste can be very subjective.

Exchanging text makes the format easier to debug and accessible to humans, which can be a purpose in itself.

100% agreed, which is why Concise Encoding has 1:1 type-compatible binary [1] and text [2] formats, which can be automatically converted when needed with no data loss.

Human accessibility is an absolute must for any modern format.

[1] https://github.com/kstenerud/concise-encoding/blob/master/cb...

[2] https://github.com/kstenerud/concise-encoding/blob/master/ct...

It should only be easy to debug and accessible to humans when needed IMO. I really think it should be possible to load a "production message", log event or whatever into a debug parser which only then spits out human readable output.

Even switching between modes may introduce new kinds of undefined behavior, which may not be visible on the text-only side

Every extra step in the process adds another potential failure point. But we're fast approaching a data and energy crunch that will push the industry towards binary formats once more. This is my attempt to keep that shift sane, and avoid the mess of the 80s and 90s.

The implementations are almost done now, and my first tool will be a command line utility that reads one format and spits out the other, so that you can take a binary dump from your production system using tcpdump or wireshark or whatever, and then convert it to a human readable format to see what's going on. I'll probably even put in a hex-reader so that you can log the raw message and then read it back:

    2019-10-02:15:00:32: Received message [01 76 85 6e 75 6b 65 73 88 6c 61 75 6e 63 68 65 64]

    $ ceconv --hex 01 76 85 6e 75 6b 65 73 88 6c 61 75 6e 63 68 65 64
        nukes = launched

I doubt there is any significant crunch coming from just encoding/decoding the stream. Most of the time still comes from waiting for network resources and evaling megabytes of add-in JS. The same problem applies to JS, and the counter-arguments of openness and usability are still the same. Compressed transmission and binary parallel transmission in HTTP/2 are also helping with the comms size.

The project still seems cool, I'll have to have a deeper look into it soon

This is pretty cool! I really like that dates (and timezone info) are first class types. Often neglected, time ser/de has been a monkeywrench in many a project.
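
As a small illustration of why it matters (Python stdlib only, nothing format-specific): JSON has no datetime type at all, so every project invents its own convention, and a naive one silently drops the time zone.

```python
import json
from datetime import datetime, timezone, timedelta

ts = datetime(2019, 11, 27, 15, 0, tzinfo=timezone(timedelta(hours=-8)))

# json.dumps(ts) raises TypeError: datetime is not a first-class type.
try:
    json.dumps(ts)
except TypeError:
    pass

# The usual workaround is ISO 8601 strings, which at least carry the
# UTC offset through a round-trip:
wire = json.dumps(ts.isoformat())
back = datetime.fromisoformat(json.loads(wire))
assert back == ts and back.utcoffset() == timedelta(hours=-8)
```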

Protobufs are neat but `protoc` is an absolute nightmare. I would warn against going down the path of transpiling to lang-specific message files if at all possible (I don't think you are taking that approach, but github freezes my phone browser, I'll look more on my dev machine)

Great stuff! I would love to see this get more traction. Interchange formats are a classic victim of critical mass necessity.

How does this compare to MessagePack? (https://github.com/msgpack/msgpack)


* Doesn't have a sister text format for human editing.

* Uses raw uncompressed types for integers and floats, which wastes space

* Doesn't support arbitrary sized/precision numeric types

* Timestamp format uses seconds instead of Gregorian fields (which means that your implementation must keep a leap second database, and time conversions are complicated)

* No time zones

* Doesn't have metadata, comments, or references

* Container types have an unnecessary length field

* Stores data in big endian byte order (an unnecessary expense since all popular modern chips are little endian)

* Wastes an extra bit for negative integers

* 32-bit limitations on length (so max 4gb payload sizes)

* No versioning of documents
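
To make the "raw uncompressed types" point concrete, here's a generic ULEB128 varint in Python (a standard technique, not necessarily Concise Encoding's actual wire format): small values, which dominate real data, shrink to a byte or two instead of a fixed 8.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative int as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

# Small integers spend 1 byte where a raw uint64 always spends 8;
# the worst case is only slightly bigger than raw.
assert len(uleb128(7)) == 1
assert len(uleb128(300)) == 2
assert len(uleb128(2**56)) == 9
```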

That's actually a pretty convincing list! You might want to consider adding a feature comparison matrix for a few of the more popular formats in a prominent location. It also probably wouldn't hurt to lead the page with a small example of encoded text, the way MessagePack does (it's what got me to give their website a closer look, which led to GitHub, and then I started using it).

Regarding the point about having a sister text format, I suppose I'd tend to view JSON as filling that role. Not to bother you for a second comparison, but if you happen to have it handy seeing how CTE compares to JSON in terms of encoding efficiency would be interesting to me.

Edit: I found a list of MessagePack <--> JSON limitations that I hadn't really thought about! (https://github.com/ludocode/msgpack-tools#differences-betwee...)

Thanks for the feedback! I'm not exactly the greatest at marketing, so any help in this regard is most appreciated :)

I've worked with lots of binary formats, and they are equally "human-readable" (and far less ambiguous in a lot of cases) once you get used to them. As a coworker once said, "everything is human-readable if you have a hex editor."
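
And the hex editor itself is trivial to reproduce; this ten-line Python sketch is all "human-readable" means here (the sample bytes are made up):

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes the way a hex editor would: offset, hex, ASCII."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        asciipart = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart:<{width * 3}} {asciipart}")
    return "\n".join(lines)

print(hexdump(b"\x01\x85nukes\x88launched"))
```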

It seems your format is yet another attempt at reinventing ASN.1.

But the point is that binary formats shouldn't require looking at hex data just to make sense of them. Especially not nowadays. Once you're stuck parsing binary data by hand (which is tedious), you've lost the amazing benefit that text formats gave us. And editing binary formats is even worse.

That's why I split it into compatible text and binary formats. You transmit/store in binary, and read/edit in text, and the computer deals with the conversion automatically.

I think this is the direction things are heading. Currently, I use yaml for tons of configs and the like because it's super easy to modify. I use jq to process because jq is the bee's knees. There is a yq out there which can read most config formats, but it doesn't support the full jq syntax, so instead I pipe everything through `yq read -j -` to convert to json and then `| jq 'query'`. works a treat.

I can see with a few more small parts an application which persists as bson, streams / tempfile / and/or mmaps to yaml/hcl/whatever you like, opens that in your editor of choice, and translates back to bson.

Seems like a good effort. I couldn’t easily find an example of the text encoding looking at your readme though.

The entirety of the acceleration hardware industry is based around this fact. Basically every single accelerator tries to move more memory towards significantly simpler (and wider) compute. At the extreme of this is compute-on-DRAM, not a new idea certainly, but one that is yet to materialize. Systolic architectures are also very efficient. GPUs are far less so; their main advantage is relatively good tooling and extensive programmability, not energy efficiency per se. And CPUs utterly and completely suck at high throughput workloads, power efficiency wise. They do often have enough compute to do e.g. lightweight deep learning though.

Years ago it cost 8pJ/mm to move a byte on-chip. It's probably closer to 5pJ/mm now, but ALUs consume a fraction of this energy to do something with that byte. And once you leave the chip and hit the memory bus, things get _really_ slow and expensive.
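
Rough back-of-envelope with those figures (the ALU number is an assumption, in the ballpark of published ~45nm estimates of ~0.1pJ for a 32-bit add):

```python
# Moving a 64-byte cache line 10mm across a die vs. one 32-bit add.
# All figures are rough assumptions, not measurements.
PJ_PER_BYTE_MM = 5.0   # on-chip wire energy, per the figure above
ADD_32BIT_PJ   = 0.1   # ballpark ~45nm figure for a 32-bit integer add

line_bytes, distance_mm = 64, 10
move_pj = line_bytes * PJ_PER_BYTE_MM * distance_mm   # 3200 pJ
adds_per_move = move_pj / ADD_32BIT_PJ                # ~32,000 adds

print(f"moving one cache line 10mm ~ {move_pj:.0f} pJ ~ {adds_per_move:,.0f} adds")
```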

I think this headline is very misleading. This is about computational energy cost and does not represent what the headline implies (that it's 62.7% of total energy consumption)

Yeah on mobile screen and radio dominates the compute part by quite a bit. Not that it's not still worth optimizing.

Yes, it's just a lower priority.

Of the total mobile phone energy usage during internet browsing, 50% goes to the display, and 30% goes to radio transmission and 5G encode/decode. The SoC + NAND + memory together only use 20%. I just think the title is a little misleading.

Not to mention the software we are running could be anywhere between 2 to 10x from maximum efficiency. It's just that the human cost involved in getting there might not be worth it.

There is also Moore's Law: while we might be near or already at the end of it, we still have a few more nodes to go, which could reduce the energy usage of SoC + NAND + RAM by 30% to 50%.

Whereas for the display and radio it isn't so clear how much further we could reduce them.

Screen and radio are both data movement tools, so I don't understand the OP's classification process.

The next step after in-memory computing: in-screen computing!

Maybe not computing, but cathode ray tubes were used as RAM in the forties: https://www.radiomuseum.org/forum/williams_kilburn_williams_...

We're already moving towards self refreshing panels and the like. And it's not a stretch to imagine in-screen cursor movement.

I agree, sorry. The original paper used the phrase "total energy" but although the paper text does mention powering the screen, their figures just talk about computational energy, so that is likely what they meant. It's too late for me to edit the title, but it should be "62.7% of computational energy".

Ok, we've clarified that in the title above.

The number one cost of any distributed system is data movement.

Mobile phones are essentially leaf nodes in a huge distributed system, where they interact with "downstream dependencies", ie. the servers that provide all the functionality of their apps.

So it makes sense that moving data would use the most power.

A small part of the solution: Let's block ads! Proof: https://webtest.app/?url=https://www.wowhead.com

Wow, this website is cool and makes for a fun and very interesting (and telling) comparison between old reddit and the redesign:




An order of magnitude increase in page size and energy consumed, and it takes 5 times as long to load, with 5 times the ad requests.

Given the current environmental state, there should be a big push to shame these websites for wasting energy. Perhaps if this happened on a larger scale we'd see better designs produced that don't rely on highly wasteful and inefficient virtual machines that are mistaken for web browsers.

Good idea. Where do we start this shaming? I don't have the resources.

I knew that the new reddit site was terrible (and keeps getting worse), but just under 15mb for a website is insane.

That link was shockingly fast. I was quite surprised by how quickly it loaded - you don't notice any loading time unless you specifically look for it. If only everything would be like that...

Disable JavaScript. Everything becomes fast (or doesn’t work at all).

I'm tired of this comment.

1. Pretty much every website needs JS to function as intended.

2. Delivering fully marked up HTML to the client is a waste of server resources and an under-utilization of 4 GHz client resources.

I'm not going to jump on the JS hate-train. I develop for the web, I use JS every day. I like JS, I think it's reasonable for websites to use JS. I think most of the people who hate JS are either misinformed or just angry that the web has made programming accessible to a new generation of non-programmers. Fight me.

But all that said, seriously, a lot of current websites don't require JS to function, and if you use an extension like uMatrix, it's trivial to re-enable it on the sites that do.

I have Javascript disabled by default on my personal computer at home. It's not, like, trivial to use -- you will find a lot of broken websites. I wouldn't turn off JS for my parents. But if you're technically inclined, turning JS off is really, honestly not a problem. You just enable it whenever a page doesn't load.

Most news sites I visit work without JS, it will speed up a number of sites dramatically, and (when combined with a few other settings) it can be a huge privacy increase too. It's worth considering, particularly for portable devices like laptops.

Yeah, I use uBlock along with a pi-hole-style dnsmasq blacklist. When I have to use crude unfiltered internet somewhere I'm rudely shocked by how slow it is to render--and horrible, once it does.

> 2. Delivering fully marked up HTML to the client is a waste of server resources and an under-utilization of 4 GHz client resources.

And this kind of entitlement is why everybody hates the web nowadays and is running adblock on everything.

What executes on MY resources is up to ME--not somebody else.

If what you're doing isn't directly helping ME, burn your own resources.

I don't think it's entitlement. Many free/ad-supported websites are running on very thin margins and sharing the computational burden with the client is not an unreasonable ask.

It is indeed unreasonable to expect me to waste my battery life to run your tracking scripts so you can stalk me against my wishes and without my consent.

"... as intended."

There's the catch. The web developer's intent does not necessarily align with the user's. The developer wants to use Javascript. What does the user want?

The simple example is a website that just delivers information.

The user just wants the information as quickly and easily as possible. She does not care whether Javascript is used to deliver it.

She does care however if the delivery is more resource-intensive and slower.

But if the website requires JS to even load the content, then it doesn't matter what the user's intent is. And it's easy to say "just close the tab" until you really need that content.

It’s also easy to enable it for that one interaction.

This sounds great in theory, but I tried this for a while and got really frustrated at just how often I had to enable JS for a specific site/page because it was broken, sometimes in non-obvious ways (e.g. some interactions work but others rely on JS). For me at least, it wasn't worth it, but YMMV depending on how you use the web and your tolerance for this kind of thing.

Mostly agree with #1, but I don't understand #2. You're saying delivering plain, pre-rendered HTML is too fast for the client, so we shouldn't use it (especially in a thread about energy use on mobile)?

Using JS for partial page updates (whether that's a full single-page app or something smaller) has the potential to consume fewer network resources across multiple interactions.

Has the potential to ... but in practice, no. Websites which do this overwhelmingly use "best practices" frameworks, libraries, packers, and minifiers, and end up with an extremely dense 1 MiB javascript bundle, while the meaningful html/text of the page is less than 10 KiB. Images are bigger, but the need to load / ability to cache is the same in either situation. And be careful that your api responses are not huge json blobs with more information than the page needs ...

Modern webpages use all the latest best techniques, and lose the performance race by miles, over and over again. Nobody is interested or enabled to clean up the mess, just to try to add more big-bang optimizations on top, after we get a few more features which users hate crammed on top ...

The truth is we had the technology for extremely efficient computation 10 years ago. New technology is nice, I like it, but we don't need it. But also, it doesn't matter. Simple gluttony and sloth will overwhelm any efficiency improvements.

> Pretty much every website needs JS to function as intended.

True, but I prefer them to function better than intended.

1. Actually no, most websites work just fine. I have YouTube, Twitter, and Facebook (I know) whitelisted. Pretty much everything else I use regularly is fantastic.

2. The server doesn’t have to run on a battery small enough to put in my pocket.

> Pretty much every website needs JS to function as intended.

Demonstrably untrue.

Wow, 97% of salon.com is ads!

Page size: 1.86 MB vs 0.05 MB

Energy used: 0.109 Wh vs 0.003 Wh
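
Interestingly, both figures fit a simple linear bytes-to-energy model. Back-solving gives roughly 0.059 Wh per MB, an inferred factor, not one the site documents here:

```python
# Back-solve webtest.app's apparent energy model from its own output.
pairs = [(1.86, 0.109), (0.05, 0.003)]   # (page size MB, energy Wh)

wh_per_mb = pairs[0][1] / pairs[0][0]    # ~0.059 Wh/MB
for mb, wh in pairs:
    # Both data points fit the same linear bytes->energy factor.
    assert abs(mb * wh_per_mb - wh) < 0.001
```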


Just tried this using YouTube as the test site...w/ uBlock, makes more requests, larger page size, etc...hmm

Just a guess, but from the preview image the site shows, youtube w/o ublock loads 2 large banners, and one row of videos, vs 3 rows of videos w/ ublock. The larger number of video thumbnails shown may explain the larger size

That's because there is more room for thumbnails in the right column because there is no ad on top. Weird right :). We have to block more!

That's a pretty extreme example.

Here are the results for NY times: https://webtest.app/?url=https://www.nytimes.com/

Barely drops page size or power usage.

> How is energy consumption measured?

> For now this test only displays the estimated Watt Hour for transferring the bytes of the web page (source). Data transfer is not all it takes to run ads, there is also data crunching happening on servers and rendering of the ads on clients. This means the estimated Wh could actually be higher, but I need more sources for that. Please email me![0]

Given that the page takes an extra 3 seconds to load with ads, power usage is probably higher in reality. On the data side, if you look at the screenshot for both pages, you'll also notice that the blocker-free Chrome version has a giant white space at the top where a banner should be.

My guess is that the banner hasn't loaded yet, but webtest didn't know to wait for it.

In my own (very, very unscientific) test, I loaded up NYT in Firefox (which has my standard extensions installed) and in a fresh Chrome install. The Chrome install downloaded 2.9 MB for ~170 requests, and my Firefox install downloaded ~240 KB for 17 requests.

I left the both browsers open for about a minute, and an extra .3 MB got downloaded on the Chrome side as part of a tracking ping, so it's not just page load either that's a concern here -- every minute you stay on the page you'll leak more data.

To be clear, the NYT is great at optimizing page load with ads. If anything, I would consider them to be a positive anomaly. But even so, I'm seeing ~80% bandwidth savings over a fresh Chrome install. I suspect my extensions are more aggressive than uBlock Origin is by itself. But the page still loads and works fine for me in Firefox, nothing appears to be broken.

[0]: https://webtest.app/?url=https://www.nytimes.com/#energy

That actually might be the largest part of the solution.

Cool, let's do this!

Cerebras's "wafer scale engine" [1] takes some of these ideas and applies them narrowly to deep learning training: 400,000 cores, 18GB of on-chip memory, and 9.6 petabytes per second of memory bandwidth, all from 1.2 trillion transistors on a gigantic 46,225mm^2 die.

[1] https://www.cerebras.net/cerebras-wafer-scale-engine-why-we-...

"We observe that data movement between the main memory and conventional computation units is a major contributor to the total system energy consumption in consumer devices. On average, data movement accounts for 62.7% of the total energy consumed by Google consumer workloads." [0]

[0] https://people.inf.ethz.ch/omutlu/pub/Google-consumer-worklo...

A major improvement to the energy cost of data movement will come from closer memories, like HBM and on-die stacked DRAM.

What do you mean by "closer memories"?

Putting memory closer to the CPU, through dense and short-distance interconnects, allowing for much more efficient communication. Eg. https://www.youtube.com/watch?v=-besHp8HLxo.

Slide 3. Paper https://people.inf.ethz.ch/omutlu/pub/Google-consumer-worklo...

Still, this doesn't sound right. Screen should be dominant.

The paper is focused on "computation", that seems to be defined as what happens on the CPU and the things directly connected to it, and ignores everything else.

I clicked expecting a paper about 4G networking.

I don't think they are talking about the phone's total energy; the charts in that excellent paper you linked to have costs for CPU, L1, L2, DRAM etc but no screen, GPU, radio etc.

A long time ago I was a technical product manager for, among other things, this stuff. At the time, the screen completely dominated and the GPU was a distant second, unless you were playing a 3D game, in which case it used almost as much as the backlight. CPU and radio were a long way behind them.

My intuition is that this hasn't changed, as I can still get dramatically better battery life on my new iPhone in iBooks by playing with the brightness slider.

Per Carroll's results in "An Analysis of Power Consumption in a Smartphone": the backlight is either the lowest power local component (10%, 40mW) or highest (50%, 400mW) depending on how bright it is turned up.

The GSM radio is the highest power component (about 400mw), when it is working continuously (phone call).

Related: a big contribution of smartphone OS optimizations and stuff like wifi 6 is about being smarter about turning off the radio for micro time slices when not needed.

Laptops are similar, but it depends a lot on whether you are using a 15W CPU Ultrabook with no GPU, or 65W mobile workstation CPU plus a GPU.

Yes, the title of this thread is not clear.

I expect baseband hardware to be dominant as well.

With all of that ad overhead it's reached a point where the actual costs incurred by the user rival the ad revenue gained by the publisher.

Most of the data is useless anyway. The bloat factor over mostly-text content plus an adequate-quality picture is huge, probably 10x.

ethz going _strong_ these days on HN

TL;DR: let's put compute inside the memory. I'm strongly leaning towards calling this bullshit.

Computing closer to memory is what we have been doing for the last 10 years; that's why we have ever-increasing hierarchical caches.

Yes, the ALUs are a tiny part of the power budget, because moving/syncing data is the hard problem. If you want massive parallelism, use a GPU.

The in-memory clone and zero could make some sense, but generally you want to start writing to that memory soon afterwards, meaning you still need to pull it into your cache and the benefit is negated.

PIM is already a reality; it's just a matter of time until it sees broad adoption. PIM does not suffer from the crippling limitations of GPUs, which only perform well on arithmetic-bound problems, cannot run conventional non-SIMD code efficiently, and need a relatively large batch size.

For instance, try deserializing a million JSON strings with a GPU. The end result is a graph-like memory structure, which GPUs usually struggle with. GPUs cannot take advantage of sharing the instruction stream here, because the parser will hit branches for almost every single character and diverge quickly. And if you only want to parse one JSON object, the GPU is worthless.

A PIM based solution would not struggle with heterogeneous non-batched workloads with arbitrary layout at all but still offer the same performance advantages that GPUs enjoy compared to CPUs and reduce energy usage at the same time.

> If you want massive parallelism, use a GPU.

The GPU doesn't have all of your data stored in it. The clone-and-zero part has very weak reasoning, focusing on targets that look better than the important ones; but take a look at the next section, where they do a simple database search.
