So, in-memory computing may be more brain-like.
The brain appears to be very good at massive concurrency with low energy consumption, at the expense of slow serial computation and a high error rate.
Other points of comparison: Animal wings (birds, insects, bats) combine both lift and thrust, but we don't fly around in ornithopters. And while animal legs combine support and power for locomotion, we don't drive around in vehicles with legs. Wheels are far more efficient, but they're not something that evolution can discover easily.
Wheels are far more efficient if you have a very specific type of terrain to deal with. It's not immediately obvious to me that a wheel is more efficient overall on naturally occurring terrain. For example, wheels don't work well on steep and rough terrain, or in water (as opposed to under water). The trade-off might be better stated as: wheels are very efficient for very select terrain types, but that efficiency tapers off fairly sharply for some terrains, reaching zero in some cases; legs are less efficient overall, but achieve some level of usefulness on almost all terrains, making them a better choice for organisms that have disparate terrain types to deal with.
Similarly, a computer may be much faster in some types of operations, but we can't even get it to do some stuff that's fairly trivial for many animal brains, so that may also be a trade-off in efficiency somewhere (possibly in an area we don't sufficiently understand yet).
ggreer above shows some real hubris and a deep ignorance of the ingeniousness of biological systems compared to human-created ones. A brain (and nervous system) is miraculous in its ability to gather, assess, store and discard ambiguous and contradictory information at astonishing rates.
Ever walk on dry loose sand at the beach? Plod plod plod.
Cars that aren't normally driven on sand have issues too.
Everything from E. coli, to yeast, to plants, to humans, is based on rotating molecular motors.
Now, it is true that we don't see macroscopic wheels on living organisms. But a person or animal is composed of trillions of cells. By analogy, we don't see the millions of vehicles in a nation composing an entity that has wheels made up of vehicles.
Could you implement a 16x16-bit multiplier (yielding a 32-bit result) as a RAM or ROM? I mean, totally. It would be a 32-bit address space where each entry is also 32 bits. That would take 16 GB. You'd have the fastest multiplier in the West, though, net of propagation delay.
 From a combinational logic perspective.
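To make the multiplier-as-ROM idea concrete, here's a rough sketch in C++, scaled down to 8x8 bits so the table fits in a toy program (the 16x16 case is exactly the same thing with 2^32 entries of 4 bytes each, i.e. 16 GB):

    // Sketch of a multiplier implemented as a lookup table, scaled down to
    // 8x8 bits -> 16-bit results so it fits in 128 KB instead of 16 GB.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main() {
        // Build the "ROM": the concatenated operands form the address.
        std::vector<uint16_t> rom(1u << 16);
        for (uint32_t a = 0; a < 256; ++a)
            for (uint32_t b = 0; b < 256; ++b)
                rom[(a << 8) | b] = static_cast<uint16_t>(a * b);

        // "Multiplying" is now a single memory read.
        uint32_t a = 57, b = 43;
        std::cout << rom[(a << 8) | b] << "\n";  // prints 2451
    }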
Edit: Doesn't that step function encode all of memory implicitly?
The step function does not encode memory. The step function is a function that takes memory state 1 as input, and outputs memory state 2 as output. The step function itself doesn't change, it stores no information.
In physical terms, the step function is the CPU, minus the caches, registers and other modifiable state. Memory is the disk and ram and all the CPUs caches and registers. You can't tell me anything about what the input to the CPU is, or what it is actually doing (other than that it is capable of running arbitrary x64 assembly), without looking at the state.
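A toy sketch of that distinction (purely illustrative, not any real ISA): the step function is a pure, fixed mapping from state to state, and all the information lives in the state it's handed:

    // The "step function" stores nothing itself: same input state in, same
    // output state out. All information is in the State it is given.
    #include <array>
    #include <cstdint>

    struct State {                       // hypothetical machine: registers + RAM
        std::array<uint32_t, 4> regs{};
        std::array<uint8_t, 64> ram{};
        uint32_t pc = 0;
    };

    State step(State s) {
        uint8_t opcode = s.ram[s.pc % s.ram.size()];
        switch (opcode) {
            case 0x01: s.regs[0] += 1; break;  // toy "increment" instruction
            case 0x02: s.regs[0] = 0;  break;  // toy "clear" instruction
            default:   break;                  // everything else is a no-op
        }
        s.pc += 1;
        return s;
    }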
Computation can be seen as some act of carrying out logic. And in turn, logic can be approached via entropy. There are a lot of people working in QM, computation and entropy. But the simpler definition is just via partitions (and you especially don't have to insist on it being in the world of QM either). 
(Sorry about replying so late, I know it's unlikely you will see this, oh well)
Well, I did not suggest that.
> From a mathematical perspective the tape, the registers and instruction tables are all implementation details.
No, they are used to define concepts such as space complexity for Turing machines, and there is something to learn from Turing machines; that's why people study them.
Grammatically "computation" is "The act of computing" where computing is a verb.
"Memory" is a noun, and not one about an act. It's more like saying "clothing".
A function can be seen both as the set of all pairs of the function, or as an oracle that you feed input to and receive output. The first is to me the full memory space, the second is maybe computation.
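A small illustration of the two views (the names are just made up for the example): the same function once as an explicit table of pairs, and once as a rule you evaluate on demand:

    // The same function two ways: as a stored set of (input, output) pairs
    // ("memory"), and as a rule evaluated on demand ("computation").
    #include <map>

    int square_rule(int x) { return x * x; }            // the oracle view

    std::map<int, int> square_pairs() {                 // the set-of-pairs view
        std::map<int, int> m;
        for (int x = -10; x <= 10; ++x) m[x] = x * x;   // a finite slice of the graph
        return m;
    }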
I'm not sure this is really certain. But that uncertainty is more because it's hard to prove that there isn't another means of recording memory beyond the ones we know.
There has been speculation that there might be some genetic or biopolymer based system in the hippocampus, but I'm not familiar with anything more than speculation on that topic.
For me, an analogy that is quite adequate in computer science would be LUTs in an FPGA: they are memory+compute units, and can be used as both.
To go a bit further, any memory could indeed be considered a computation unit, or vice versa: consider a results cache, for instance. The difference I see between memory and computation is that memory accesses are, if not instantaneous, at least constant-time. If you want more precision, it can be computed instead: the necessary footprint is then extended through time as well as space (whereas if you only use a results LUT, you'd have to make it bigger to gain precision).
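A rough sketch of that trade-off (names made up for the example): a sine LUT where precision is bought with memory rather than with compute time:

    // Results LUT for sin(x): more entries -> more memory -> more precision,
    // while each "computation" stays a constant-time table read.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    class SineLUT {
    public:
        explicit SineLUT(std::size_t entries) : table_(entries) {
            for (std::size_t i = 0; i < entries; ++i)
                table_[i] = std::sin(kTwoPi * i / entries);
        }
        double operator()(double x) const {
            double t = x / kTwoPi;
            t -= std::floor(t);  // wrap the argument into [0, 1)
            return table_[static_cast<std::size_t>(t * table_.size()) % table_.size()];
        }
    private:
        static constexpr double kTwoPi = 6.283185307179586;
        std::vector<double> table_;
    };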
Error rate in the context of computing does not need to mean invalid, just different. Our astonishingly high abilities to compensate for variations in input data do not result from an ability to produce output data repeatably.
My intuition is that such a system would be much harder to reason about -- and therefore harder for compilers to emit efficient machine code for -- but I'm assuming someone here knows the topic pretty well?
(Before you give me "the lecture", yes, I'm aware that in general it's not a good idea to simply mimic the kludges that evolution came up with.)
Can you link sources supporting that?
As far as I know any comparison between a computer and a brain is flawed from the get go.
edit: and to be clear my initial comment was just hinting that we can't develop a "brainlike" computer architecture because we simply don't know how the brain works at all.
We could totally botch it though and end up with an even worse computer and turns out it works nothing like the brain. Who knows. But right now many people "believe" the brain may function in the manner described.
We know that the Earth can only be one shape; we know there are an infinite variety of shapes that something can be; and so our priors contain a set of all the claims like "the Earth is potato-shaped" and "the Earth is a doughnut" all with extremely low probability, before we encounter any evidence at all, just because the probability-mass has to get spread out among all those infinitely-many claims.
Assuming continued lack of evidence either way, a claim like "the Earth is not [one particular shape]", then, doesn't require argumentative support to be taken as a default assumption (as you might in e.g. the opening of a journal paper.) The probability of it being any particular shape started very low, and we've never encountered any evidence to raise that probability, so it's stayed very low.
(Yes, that even applies to the specific claim that "the Earth is not an oblate spheroid." If we never encountered any evidence to suggest that claim, then it'd have just as low a default confidence as any of the other claims it competes with.)
For claims with no evidence either for or against them, the analytical priors derived from the facts about the classes of claims to which the claim belongs determine where the burden of proof lies. Low-confidence priors? Burden to prove. High-confidence priors? Burden to disprove.
In this case, we already know that neurons can do several things, and AFAIK we've never encountered any evidence of neurons having specialized functionality, or any evidence that neurons don't have specialized functionality. Our tools just aren't up to telling us whether they do or not. But, because one claim actually factors out to several claims (neuron specialization → lots of different ways neurons could specialize) while the other claim doesn't (neuron generality → just neuron generality), the probability-mass ends up on the neuron-generality side. (This is another way to state Occam's Razor.)
Mind you, this might be entirely down to our inability to study neuronal dynamics in vivo in fine-enough detail. In this case, our lack of evidence doesn't imply a lack of facts to be found, because we have no evidence for or against this hypothesis. Instead, it just determines what our model should be in the absence of such evidence, until such time as we can gather evidence that does directly prove or disprove the specific hypothesis.
Or, to put that another way: if humans only ever studied bees from a distance, the default hypothesis should be that all bees do all bee jobs. The burden of proof is on the claim that bees specialize. Later, when we get up-close to a beehive, we'd learn that bees do specialize. But that doesn't mean that we were incorrect to believe the opposite before. Both our belief before the evidence, and our belief after the evidence, were the "correct" belief given our knowledge.
I have no special expertise otherwise; if you want more substantiation or wish to dispute that point, you could post a reply where the claim was originally introduced.
I don't think these parallels between the computer and the brain are valid though. The brain is a highly parallel but slow machine, we have very fast and not-so-parallel computers (just a few cores). I think a lot of what the brain does can be expressed on computers without changing the architecture much, i.e. you don't need to compare the architectures, just the outcomes of computation.
Don't forget humans are one of the most expensive resources, and nothing can really replace them for even the vast majority of non-physical jobs. So if we could build computing devices that potentially solve some of these problems, needless to say, the productivity boosts would be plentiful.
Secondly, we aren't even close to being able to model anything resembling the computational capabilities of the human brain, because we don't even understand the human brain. So your comments on highly parallel but slow and fast but not so parallel don't make a lot of sense.
For example, take MapBox's new vision SDK. It's able to perform semi-decent feature extraction on the road while people drive, via a camera. Well, guess what: I would absolutely stomp the vision SDK on accuracy for every feature it thinks it identified; not only that, I am capable of identifying an order of magnitude more features than it can; not only that, but I'm able to identify new features on the fly and even guess with greater accuracy what they are.
So yeah, there is a plethora of functional yields that we have yet to achieve with some of the most powerful computers in the world, yet that are achieved by the human brain every day. Which could be indicative of both a resource problem and an architecture problem.
I suspect our recognition abilities may in fact be worse than those of a good SDK, but we have the advantage of general real-world experience. Bit of a simplified example: in order to recognize a dog you should have seen a lot of various animals, have a basic knowledge of anatomy, animal behaviour, etc. A lot of the time recognizing the context helps too: e.g. something on a lead ahead of a walking human in the street is likely a dog, so you need only a very quick confirmation to say it's definitely a dog, etc.
I also think the brain does a lot of tree searches with optimizations (which are never perfect), and it's where the brain's parallel architecture proves to be beneficial.
Intuitively though, modern computers are still not powerful enough to perform the same tasks albeit mostly sequentially. I believe we'll get there and I think AGI in a simplified virtual/gaming environment is the best place to test our approaches.
How does this square with the fact that people can get large portions of their brain scooped out and still retain many of their previous abilities?
This is a naive assumption, but if there were purely a 1 to 1 correspondence between a neuron and any given task, I’d assume you wouldn’t be able to recover from massive brain damage. But it seems like people can retrain the functional parts of their brain after damage to other parts to pick up the slack somehow.
There are some really interesting changes that can potentially happen when the memristor type of memory becomes available, possibly with its own problems too, but with the huge benefit of moving less data.
You can do this in C++ right now: mmap a file to a memory region, and just create structs in that region.
This has two problems:
- normally you want to save at controlled points in time, otherwise you have to worry about recovering from states where some function updated part of the data and then crashed (phone ran out of power etc)
- just writing a bunch of internal data structures into a file used to be moderately popular and has great performance, but it's a major headache when you ever update them. You end up implementing a versioning scheme and importers for migrating old files to your new application version. At that point a file format that is designed for data exchange is less headache.
In general mmap already offers you a way to treat your disk like memory with decent performance (thanks to caching), and the number of good use cases turned out to be somewhat limited. I doubt just making that faster with new technology will change much.
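For anyone who hasn't tried it, a minimal POSIX sketch of that approach (error handling omitted, and the struct layout is just made up for the example):

    // Overlay a struct directly on a memory-mapped file and mutate it in place.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdint>

    struct SaveState {
        uint32_t version;     // you'll want this the first time the layout changes
        uint32_t counter;
        double   position[3];
    };

    int main() {
        int fd = open("state.bin", O_RDWR | O_CREAT, 0644);
        ftruncate(fd, sizeof(SaveState));     // make sure the file is big enough

        auto* s = static_cast<SaveState*>(mmap(nullptr, sizeof(SaveState),
                          PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

        s->counter += 1;                      // just mutate the struct in place
        msync(s, sizeof(SaveState), MS_SYNC); // an explicit "save point"

        munmap(s, sizeof(SaveState));
        close(fd);
    }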
So the whole structure would reside in memory in its native form rather than as the contents of the JSON file; for instance, a graph would reside in memory and could be operated on directly.
Hence the 'serialized' in the part that you quoted. Once you serialize it the whole thing becomes hamburger and needs to be parsed again before you can operate on it.
This is a more common technique than most people suspect. It's taught in most operating system courses . It's the basis for how SSTables (the primary read-only file format at Google, and the basis for BigTable/LevelDB) work, as well as for indexing shards. It was how the original version of MS Word's .doc files worked, and was also why it was so difficult to write a .doc file parser until Microsoft switched to a versioned serialized file format sometime in the 90s. I think it's how Postgres pages work (the DB allocates a disk page at a time and then overlays a C structure on top of it to structure the bytes), but I'm not familiar enough with that codebase to know for sure. It's how zero-copy serialization formats like Cap'n Proto & FlatBuffers work, except they've been specifically engineered to handle the backwards-compatibility aspects transparently.
It has all the problems that wongarsu mentions, but also huge advantages in speed and simplicity: you basically let the compiler and the OS do all the work and frequently don't need to touch disk blocks at all.
You could do it with relative offsets if you wanted, that's getting pretty close to pre-heating a cache from a snapshot. That way the file would not contain pointers but you could still traverse it relatively quickly by adding the offsets to the base address of the whole file.
Base+offset segmentation does have an overhead since you'd need some extra CPU registers for that if I understand it correctly.
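Roughly what that looks like (hypothetical node layout, not any particular format): store offsets from the file's base address instead of raw pointers, so the file stays valid wherever it happens to be mapped:

    // Traverse a node chain laid out inside a mapped file using base + offset
    // instead of absolute pointers.
    #include <cstdint>

    struct Node {
        int32_t  value;
        uint64_t next_offset;  // 0 means "no next node"; otherwise offset from base
    };

    inline int64_t sum_chain(const char* base, uint64_t first_offset) {
        int64_t total = 0;
        for (uint64_t off = first_offset; off != 0; ) {
            auto* n = reinterpret_cast<const Node*>(base + off);
            total += n->value;
            off = n->next_offset;  // the "pointer chase" is just base + offset
        }
        return total;
    }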
History turned otherwise, unfortunately...
As noted in the article, HP says that memristor memory may be commercially available by 2018, so I guess we'll see then....
Fortunately it pulls back towards a "what are the easy wins" approach: in-memory initialising (rather than stream zeroes over the bus, ask the memory to zero itself) and in-memory copy.
The three real problems are:
- this relies on the data being in DRAM-row-size chunks, and moving it around within the same physical chip. That may also require re-architecting to make that common enough to be useful. Locality issues are the downside to distributed systems; the whole CPU architecture is oriented towards continuing to pretend that everything happens in a defined order in a single place.
- possible security issues (rowhammer?): may seem far off, but if you don't think about it upfront it gets very expensive later.
- (worth recapping) the silicon is fundamentally differently-processed for DRAM, such that implementing complex logic is slow and expensive.
Text encoding/decoding wastes massive amounts of energy, which is why I've been developing a format over the past 2 years with the benefits of binary (smaller, more efficient) as well as text (human readability/editability): https://github.com/kstenerud/concise-encoding
I agree, and in .NET that’s a solved problem: http://const.me/articles/net-tcp/ (2010).
Their text format is XML. Their binary version takes a small fraction of the bandwidth, especially when using a custom pre-shared XML dictionary. Binary is much cheaper to produce or parse, and the built-in DataContractSerializer understands both formats. It's supported by all modern editions of .NET including .NET Core and is compatible across them, e.g. I've been recently using it for a network protocol of a Linux ARM device.
Exchanging text makes the format easier to debug and accessible to humans, which can be a purpose in itself.
Human accessibility is an absolute must for any modern format.
The implementations are almost done now, and my first tool will be a command line utility that reads one format and spits out the other, so that you can take a binary dump from your production system using tcpdump or wireshark or whatever, and then convert it to a human readable format to see what's going on. I'll probably even put in a hex-reader so that you can log the raw message and then read it back:
2019-10-02:15:00:32: Received message [01 76 85 6e 75 6b 65 73 88 6c 61 75 6e 63 68 65 64 79]
$ ceconv --hex 01 76 85 6e 75 6b 65 73 88 6c 61 75 6e 63 68 65 64 79
nukes = launched
The project still seems cool, I'll have to have a deeper look into it soon
Protobufs are neat but `protoc` is an absolute nightmare. I would warn against going down the path of transpiling to lang-specific message files if at all possible (I don't think you are taking that approach, but github freezes my phone browser, I'll look more on my dev machine)
Great stuff! I would love to see this get more traction. Interchange formats are a classic victim of critical mass necessity.
* Doesn't have a sister text format for human editing.
* Uses raw uncompressed types for integers and floats, which wastes space
* Doesn't support arbitrary sized/precision numeric types
* Timestamp format uses seconds instead of Gregorian fields (which means that your implementation must keep a leap second database, and time conversions are complicated)
* No time zones
* Doesn't have metadata, comments, or references
* Container types have an unnecessary length field
* Stores data in big endian byte order (an unnecessary expense since all popular modern chips are little endian)
* Wastes an extra bit for negative integers
* 32-bit limitations on length (so max 4gb payload sizes)
* No versioning of documents
Regarding the point about having a sister text format, I suppose I'd tend to view JSON as filling that role. Not to bother you for a second comparison, but if you happen to have it handy seeing how CTE compares to JSON in terms of encoding efficiency would be interesting to me.
Edit: I found a list of MessagePack <--> JSON limitations that I hadn't really thought about! (https://github.com/ludocode/msgpack-tools#differences-betwee...)
It seems your format is yet another attempt at reinventing ASN.1.
That's why I split it into compatible text and binary formats. You transmit/store in binary, and read/edit in text, and the computer deals with the conversion automatically.
I can see, with a few more small parts, an application which persists as BSON, streams/tempfiles/mmaps to YAML/HCL/whatever you like, opens that in your editor of choice, and translates back to BSON.
Years ago it cost 8pJ/mm to move a byte on-chip. It's probably closer to 5pJ/mm now, but ALUs consume a fraction of this energy to do something with that byte. And once you leave the chip and hit the memory bus, things get _really_ slow and expensive.
Of the total mobile phone energy usage during internet browsing, 50% goes to the display, and 30% goes to radio transmission and 5G encode/decode. Your three components (SoC + NAND + memory) only use 20%. I just think the title is a little misleading.
Not to mention the software we are running could be anywhere between 2x and 10x away from maximum efficiency. It's just that the human cost involved in getting to that point might not be worth it.
There is also Moore's Law; while we might be near or already at the end of it, we still have a few more nodes to go, which could reduce the energy usage of the SoC + NAND + RAM by 30% to 50%.
Whereas for the display and radio, it isn't so clear how much further we can reduce their usage.
Mobile phones are essentially leaf nodes in a huge distributed system, where they interact with "downstream dependencies", ie. the servers that provide all the functionality of their apps.
So it makes sense that moving data would use the most power.
An order of magnitude increase in page size and energy consumed, and it takes 5 times as long to load, with 5 times the ad requests.
1. Pretty much every website needs JS to function as intended.
2. Delivering fully marked up HTML to the client is a waste of server resources and an under-utilization of 4 GHz client resources.
But all that said, seriously, a lot of current websites don't require JS to function, and if you use an extension like uMatrix, it's trivial to re-enable it on the sites that do.
Most news sites I visit work without JS, it will speed up a number of sites dramatically, and (when combined with a few other settings) it can be a huge privacy increase too. It's worth considering, particularly for portable devices like laptops.
And this kind of entitlement is why everybody hates the web nowadays and is running adblock on everything.
What executes on MY resources is up to ME--not somebody else.
If what you're doing isn't directly helping ME, burn your own resources.
The simple example is a website that just delivers information.
She does care however if the delivery is more resource-intensive and slower.
Modern webpages use all the latest and best techniques, and lose the performance race by miles, over and over again. Nobody is interested in, or enabled to, clean up the mess; we just try to add more big-bang optimizations on top, after cramming on a few more features which users hate...
The truth is we had the technology for extremely efficient computation 10 years ago. New technology is nice, I like it, but we don't need it. But also, it doesn't matter. Simple gluttony and sloth will overwhelm any efficiency improvements.
True, but I prefer them to function better than intended.
2. The server doesn’t have to run on a battery small enough to put in my pocket.
Page size: 1.86 MB vs 0.05 MB
Energy used: 0.109 Wh vs 0.003 Wh
Here are the results for NY times:
Barely drops page size or power usage.
> For now this test only displays the estimated Watt Hour for transferring the bytes of the web page (source). Data transfer is not all it takes to run ads, there is also data crunching happening on servers and rendering of the ads on clients. This means the estimated Wh could actually be higher, but I need more sources for that. Please email me!
Given that the page takes an extra 3 seconds to load with ads, power usage is probably higher in reality. On the data side, if you look at the screenshot for both pages, you'll also notice that the blocker-free Chrome version has a giant white space at the top where a banner should be.
My guess is that the banner hasn't loaded yet, but webtest didn't know to wait for it.
In my own (very, very unscientific) test, I loaded up NYT in Firefox (which has my standard extensions installed) and in a fresh Chrome install. The Chrome install downloaded 2.9 MB for ~170 requests, and my Firefox install downloaded ~240 KB for 17 requests.
I left both browsers open for about a minute, and an extra 0.3 MB got downloaded on the Chrome side as part of a tracking ping, so it's not just page load that's a concern here -- every minute you stay on the page you'll leak more data.
To be clear, the NYT is great at optimizing page load with ads. If anything, I would consider them to be a positive anomaly. But even so, I'm seeing ~80% bandwidth savings over a fresh Chrome install. I suspect my extensions are more aggressive than uBlock Origin is by itself. But the page still loads and works fine for me in Firefox, nothing appears to be broken.
Still, this doesn't sound right. Screen should be dominant.
I clicked expecting a paper about 4G networking.
A long time ago I was a technical product manager for, among other things, this stuff. At the time, the screen completely dominated and the GPU was a distant second, unless you were playing a 3D game, in which case it used almost as much as the backlight. CPU and radio were a long way behind them.
My intuition is that this hasn't changed, as I can still get dramatically better battery life on my new iPhone in iBooks by playing with the brightness slider.
The GSM radio is the highest-power component (about 400 mW) when it is working continuously (phone call).
Related: a big contribution of smartphone OS optimizations and stuff like wifi 6 is about being smarter about turning off the radio for micro time slices when not needed.
Laptops are similar, but it depends a lot on whether you are using a 15W CPU Ultrabook with no GPU, or 65W mobile workstation CPU plus a GPU.
Computing closer to memory is what we have been doing for the last 10 years, and that's why we have ever-increasing hierarchical caches.
Yes the ALUs are a tiny part of the power budget because moving / syncing data is the hard problem.
If you want massive parallelism, use a GPU.
The in memory clone and zero could make some sense, but generally you want to start writing to that memory soon after that, meaning you still need to pull it into your cache and the benefit is negated.
For instance, try deserializing a million JSON strings with a GPU. The end result is a graph-like memory structure, which GPUs usually struggle with. GPUs cannot take advantage of sharing the instruction stream here because the parser will hit branches for almost every single character and diverge quickly. And finally, if you only want to parse one JSON object, then the GPU is worthless.
A PIM based solution would not struggle with heterogeneous non-batched workloads with arbitrary layout at all but still offer the same performance advantages that GPUs enjoy compared to CPUs and reduce energy usage at the same time.
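To illustrate the per-character branching mentioned above: even a toy tokenizer loop (sketch only, no escapes or signed numbers) takes a data-dependent branch on nearly every byte, which is exactly what makes SIMT lanes diverge:

    // Toy tokenizer: which branch is taken depends on each input character, so
    // GPU threads parsing different documents diverge almost immediately.
    #include <cstddef>
    #include <string>
    #include <vector>

    enum class Tok { ObjStart, ObjEnd, String, Number, Other };

    std::vector<Tok> tokenize(const std::string& json) {
        std::vector<Tok> toks;
        for (std::size_t i = 0; i < json.size(); ++i) {
            char c = json[i];
            if (c == '{')                  toks.push_back(Tok::ObjStart);
            else if (c == '}')             toks.push_back(Tok::ObjEnd);
            else if (c == '"') {           // skip to the closing quote
                while (i + 1 < json.size() && json[++i] != '"') {}
                toks.push_back(Tok::String);
            }
            else if (c >= '0' && c <= '9') toks.push_back(Tok::Number);
            else if (c != ' ' && c != ',' && c != ':') toks.push_back(Tok::Other);
        }
        return toks;
    }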
The GPU doesn't have all of your data stored in it. The clone-and-zero part has very weak reasoning, focusing on bad targets because they look better than the important ones; but take a look at the next section, where they do a simple database search.