
JSON is usually used for frontend-backend communication or public API endpoints; in the backend, protobuf/Thrift/Avro are commonly used for internal services (those controlled by a single organization), for very good reasons. Same for HTML -- you have HTML to thank for being able to read the Hacker News front page on a 10-year-old Kindle with a barely usable browser. I suggest you look all of these up before complaining about nothing.


I don’t think the reasons are very good. We could definitely have binary-first serialization protocols and good tooling built into the browser. But no, we encode everything as text, even binary stuff as base64, and whack that into strings.
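
To make the overhead concrete, a minimal Python sketch (illustrative only) of what pushing binary data through JSON costs:

    import base64
    import json
    import os

    blob = os.urandom(1024)  # any binary payload: an image, a tensor, ...

    # JSON has no bytes type, so binary data takes a detour through base64.
    doc = json.dumps({"data": base64.b64encode(blob).decode("ascii")})

    print(len(blob), len(doc))  # 1024 vs ~1380: base64 alone adds ~33%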


There was nothing preventing anyone from building a whole new set of infrastructure for such "binary first serialization" 20 years ago, 10 years ago, or today. We don't even need to do that much: instead of "text/html" or "application/json", just use some binary format in the request headers everywhere and make both the client and the server support it. Why hasn't that happened?

It's for the same set of reasons, and people aren't dumb.
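
For what it's worth, the plumbing already exists: HTTP content negotiation. A minimal Python sketch (the endpoint is hypothetical, and the server has to opt in):

    import requests

    # Ask for a binary encoding via the Accept header.
    resp = requests.get(
        "https://api.example.com/v1/items",  # hypothetical endpoint
        headers={"Accept": "application/x-protobuf"},
    )

    if resp.headers.get("Content-Type", "").startswith("application/x-protobuf"):
        payload = resp.content  # raw bytes for the generated protobuf stubs
    else:
        payload = resp.json()   # the server only speaks JSON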


All of those are slower than JSON in many contexts. JSON parsing and serialization are very fast!


In what context is json slower than protobuf?


Did you mean to ask the reverse question (in what context is protobuf slower than json)? Because that's definitely the question on my mind, since GP's assertion runs counter to my expectations and experience.

JSON is a heavy-weight format that requires significantly more memory for both serialization and deserialization, and the representations of values in JSON are optimized for human consumption rather than machine consumption.

Just listing a few examples:

- Strings in JSON require you to scan for the terminating quotation mark that isn't escaped. Meanwhile, in protobuf, the length of the string is given to you; you can just grab the bytes directly.

- Parsing an integer in JSON requires a multiplication by 10 and an addition/subtraction for each digit. Meanwhile, in protobuf, fixed64 and related types are either in host order or a byte swap (ntohl and friends) away; int64 and other varint types only require bit twiddling (masks, shifts, etc.). Do you think it's easier to parse "4294967296" from a string, or 5 bytes along the lines of {0x80, 0x80, 0x80, 0x80, 0x10}? (See the sketch after this list.)

- Having a schema agreed upon ahead of time (protobuf) means your keys are just field numbers; with JSON, every key is another string to parse.
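
A minimal Python sketch of the wire-level difference described above (a hand-rolled decoder for illustration, not the real protobuf runtime):

    def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
        # Low 7 bits of each byte carry data; the high bit says
        # whether another byte follows.
        result = shift = 0
        while True:
            b = buf[pos]
            pos += 1
            result |= (b & 0x7F) << shift
            if not (b & 0x80):  # continuation bit clear: done
                return result, pos
            shift += 7

    # 2**32 is five bytes on the wire; no base-10 arithmetic involved.
    value, _ = read_varint(bytes([0x80, 0x80, 0x80, 0x80, 0x10]), 0)
    assert value == 2**32

    # Length-prefixed string: read the length, then slice. No quote scan.
    buf = bytes([5]) + b"hello"
    length, start = read_varint(buf, 0)
    assert buf[start:start + length] == b"hello"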


The benchmarks available for protobuf generally have it parsing something like 5x slower than JSON (and I suspect the payloads are smaller, but not 5x smaller). I don't think the code generators shipped with protobuf produce parsers of comparable quality to simdjson, so the comparison is a bit unfair in that sense.


Can you point to some of these benchmarks? https://news.ycombinator.com/item?id=26934854 suggests that in at least one synthetic benchmark (with a 7.5KB protobuf message which expands to a 17KB JSON payload), protobuf parsing at 2GB/s would be comparable to JSON parsing at 5GB/s.

Meanwhile, simdjson's numbers (https://github.com/simdjson/simdjson/blob/master/doc/gbps.pn...) show a peak parsing speed of about 3GB/s depending on the workload. Of course, it's not clear you can compare these directly, since they were probably not run on systems with comparable specs. But it's not clear to me that there's a 5x difference.

Perhaps my experience differs because I'm used to seeing very large messages being passed around, but I'd be happy to reconsider. (Or maybe I should go all-in on Cap'n Proto.)


> - Parsing an integer in JSON requires a multiplication by 10 and an addition/subtraction for each digit. Meanwhile, in protobuf, fixed64 and related types are either in host order or a byte swap (ntohl and friends) away; int64 and other varint types only require bit twiddling (masks, shifts, etc.). Do you think it's easier to parse "4294967296" from a string, or 5 bytes along the lines of {0x80, 0x80, 0x80, 0x80, 0x10}?

For this one, the varint may actually be harder, because you have to parse it before you know which byte the next value starts on, though recently there has been some progress in the area of fast varint parsers. For parsing decimal numbers, a good start is http://0x80.pl/articles/simd-parsing-int-sequences.html. Some users at https://highload.fun/tasks/1/leaderboard are computing the sum of parsed base-10 integers at about the speed of reading the string from RAM. That task is subtly easier than parsing each number individually, which may only be doable at half or a quarter of that speed, and then you'd pay a bit more to also write the parsed values out to another buffer.
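
As a taste of the technique: the well-known SWAR trick for turning eight ASCII digits into an integer with three multiply/mask/shift rounds (the idea appears in the 0x80.pl material and in simdjson's digit-parsing code), transcribed to Python with explicit 64-bit masking. A sketch, not production code:

    M64 = (1 << 64) - 1  # emulate uint64 wraparound

    def parse_eight_digits_swar(s: bytes) -> int:
        # Three multiply/mask/shift rounds instead of a per-digit loop.
        v = int.from_bytes(s, "little")
        v = ((v & 0x0F0F0F0F0F0F0F0F) * 2561 & M64) >> 8              # pairs: 12, 34, 56, 78
        v = ((v & 0x00FF00FF00FF00FF) * 6553601 & M64) >> 16          # quads: 1234, 5678
        v = ((v & 0x0000FFFF0000FFFF) * 42949672960001 & M64) >> 32   # all eight digits
        return v & 0xFFFFFFFF

    assert parse_eight_digits_swar(b"12345678") == 12345678

The point is the absence of per-digit branches: the cost is a fixed chain of arithmetic no matter what the digits are.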


From the intro of your first link:

> While conversion from a string into an integer value is feasible with SIMD instructions, this application is unpractical. For typical cases, when a single value is parsed, scalar procedures — like the standard atoi or strtol — are faster than any fancy SSE code.

> However, SIMD procedures can be really fast and convert in parallel several numbers. There is only one "but": the input data has to be regular and valid, i.e. the input string must contain only ASCII digits.

There definitely are some benefits and speedups available with SIMD, but that intro doesn't inspire a whole lot of confidence in its relevance to JSON parsing: the only place you might get that kind of regularity is when you know you have an array of integers. (And JSON strings are not restricted to ASCII; they can and do include Unicode.)


I think you'd have to pay for some additional copies to perform batch processing of integers in JSON documents in the general case. Last I checked, simdjson included the typical scalar code for parsing base-10 integers and a fairly expensive procedure for parsing base-10 doubles, where most of the runtime goes to getting the final bit of the mantissa right -- not worthwhile for our use case, but reasonable for a general-purpose library.

That said, it's not clear to me that the scalar integer-parsing code should win even when you're parsing integers individually. For inputs where the lengths of the numbers vary unpredictably, it pays a significant cost in branch misses, while the vector code replaces that with a data dependency.

Edit: After writing the above, I realized that most documents probably have a regular pattern of number lengths. I don't know how well branch predictors cope when the pattern of branches is long (in terms of the sum of the lengths), but the branches probably cost ~nothing for a lot of real-world inputs.


Run benchmarks. It doesn’t matter what I think is easier to parse.

Protobuf is slower than JSON.parse in Node by 4-8x for my data sets (large reference data).

I was only measuring decode time, since I can recompute and re-encode my data well in advance.
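
For the curious, a minimal sketch of that kind of decode-only measurement, written in Python rather than Node (the payload_pb2 module, the Payload message, and the file names are hypothetical placeholders):

    import json
    import timeit

    from payload_pb2 import Payload  # hypothetical module generated by protoc

    json_bytes = open("data.json", "rb").read()  # same data, two encodings
    pb_bytes = open("data.pb", "rb").read()

    # Decode only: encoding happens ahead of time and isn't measured.
    t_json = timeit.timeit(lambda: json.loads(json_bytes), number=100)
    t_pb = timeit.timeit(lambda: Payload.FromString(pb_bytes), number=100)

    print(f"json: {t_json:.3f}s  proto: {t_pb:.3f}s")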


In browser JavaScript, because there you have to load the decoder (whose source alone can take longer to parse than the JSON itself), then run it in the JS VM to do the actual decoding, whereas JSON is built in and has a native, highly optimized parser.


For a script, you run it from start to end.

For a notebook/REPL environment, you can create any number of intermediate steps, rerun the previous step with minor modifications, check whether the results are better, and rinse and repeat. For Jupyter notebooks specifically, you can visualize data and add markdown inline, which is very useful.

You won't understand it unless you are already familiar with the workflow.


Sometimes I need to run my code in small pieces for testing and evaluation; I do this with multiple smaller scripts. Data can be saved to files, which accomplishes nearly the same thing as a notebook without needing the notebook environment.
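
A minimal sketch of that scripts-plus-files workflow (the expensive step is a stand-in):

    import pickle
    from pathlib import Path

    CACHE = Path("stage1.pkl")  # intermediate result shared between scripts

    def expensive_step():
        # stand-in for the slow load/clean stage of a real pipeline
        return [i * i for i in range(1_000_000)]

    if CACHE.exists():
        data = pickle.loads(CACHE.read_bytes())  # later runs reload quickly
    else:
        data = expensive_step()                  # first run pays the cost once
        CACHE.write_bytes(pickle.dumps(data))

    print(len(data))  # downstream steps iterate against the cached data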


Sure, if you don't mind the overhead (both development and processing time) of loading/saving state to disk -- about 10-30 minutes for a lot of my data. In notebooks you don't have to think about it, since it's just the objects in memory (and indeed they don't make it easy to think about, which is a reasonable criticism, but I don't currently know of a framework or system that gives you the advantages of both).


Whenever I work with large datasets, I use a small subset of the overall data for testing while I build the pipeline. This avoids long run times and allows for quick iteration while I get things set up to run against the full dataset.
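
Roughly, that pattern looks like this (a Python sketch; the input file and the transformation are hypothetical stand-ins):

    import pandas as pd

    df = pd.read_csv("huge_dataset.csv")        # hypothetical full dataset
    dev = df.sample(frac=0.01, random_state=0)  # small, reproducible slice

    def pipeline(frame: pd.DataFrame) -> pd.DataFrame:
        # stand-in for the real transformations
        return frame.dropna()

    print(pipeline(dev).describe())  # fast iteration while building
    # once it looks right: result = pipeline(df)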


If there are complaints about code quality in a Jupyter notebook, chances are that someone is doing it wrong -- either the author is using it for demonstration purposes or in a production or near-production environment, or the person looking at others' notebooks is taking them too seriously.

Most code in Jupyter notebooks is bad, but that's fine, because nobody should waste time polishing it unless the results must be reproducible or the notebook is being shared with others.

A notable example of a high-quality notebook: https://youtu.be/zduSFxRajkE


I thought Meta Horizon and the Vision Pro sales numbers[0][1] already proved your thesis wrong. Even Zuckerberg stopped talking about it.

[0] https://www.macrumors.com/2024/04/23/apple-cuts-vision-pro-s... [1] https://www.macrumors.com/2024/04/22/apple-vision-pro-custom...


I am referring to the future (the actual future, not simulated ones), so it is not possible to know if I am wrong.

I predict this is yet another domain rich with opportunity for AI.


I think the point of the original comment is "take off". A non-proprietary ChromeOS (browser-first, simple to use for non-techy people, easy to manage) sounds like a neat idea to me, but the names you mentioned achieved nowhere near the reputation/popularity of Ubuntu.

Maybe it is more about the business model. ChromeOS runs efficiently on low-cost hardware for a small license fee (?), and gets almost guaranteed updates. In exchange, customers sell their souls (actually, their data) to Google. That won't happen with Linux. But the model undeniably sells, and it works well enough for ChromeOS.


...if you are willing to sacrifice the "phone" part of it.


Calls and SMS work fine on pinephone.

Battery life not so much.


It works fine for me.


Fine if you don't need to use it for online banking on your phone; that's my main blocker.


My bank sends an SMS for OTP.

I know it isn't very secure, but it's certainly better than having a seed stored locally.


I'd rather have a seed on a device I own than be vulnerable to smishing.


If I can't see the seed and have no way of exporting it, what do I have?


If I can't see the person in the telco being paid off to have my IMEI reattached arbitrarily, what do I have? A person who can MITM the SMS codes.


I wouldn't call an Android device "a device you own".


Why don't I own it? Explain.


I choose banks that do not require mobile banking, including for virtual credit cards. You can also try Waydroid to run Android apps; I heard it has worked for some banks.


You can run full VSCode on Crostini -- the "real" Electron app -- but not on Termux. There are ways around it, like using code-server in a Linux environment, but it's far from the same thing.


... which confirms that this has more to do with the vendor rather than hardware/OS?


Did you read this comment twice yourself to see the problem?


Personally I care about this much more than whatever AI stuff happens at WWDC.

This is the real "developer" issue we are talking about.

