
meh. memory address is the ID

Airline booking numbers used to just be the sector number of your booking record on the mainframe's HDD.

That’s why they were constantly recycled?

This is such a simple scheme.

I wonder how they dealt with common storage issues like backups and disks having bad sectors.


They likely use record-based formatting rather than file-based. At the high level the code is just asking for a record number from a data set. The data set, including redundancy/ECC, is managed by the hardware of that storage device.
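To make the "record number is the ID" idea concrete, here is a minimal sketch. It assumes fixed-size records in a flat in-memory buffer; `RecordStore`, `RECORD_SIZE`, and the method names are made up for illustration and do not describe any real reservation system. It also shows why such IDs would get recycled: a cancelled slot is simply handed out again.

```python
RECORD_SIZE = 32  # hypothetical fixed record size in bytes


class RecordStore:
    """Toy fixed-size record store: the slot number is the only ID."""

    def __init__(self, capacity: int) -> None:
        self.data = bytearray(capacity * RECORD_SIZE)
        # Stack of free slots, arranged so the lowest slot is handed out first.
        self.free = list(range(capacity - 1, -1, -1))

    def book(self, payload: bytes) -> int:
        """Write the payload into a free slot and return the slot number as the ID."""
        if len(payload) > RECORD_SIZE:
            raise ValueError("payload too large for one record")
        slot = self.free.pop()  # raises IndexError when the store is full
        start = slot * RECORD_SIZE
        self.data[start:start + RECORD_SIZE] = payload.ljust(RECORD_SIZE, b"\x00")
        return slot

    def read(self, slot: int) -> bytes:
        """Look a record up directly by its position; no index is needed."""
        start = slot * RECORD_SIZE
        return bytes(self.data[start:start + RECORD_SIZE]).rstrip(b"\x00")

    def cancel(self, slot: int) -> None:
        """Release the slot so a later booking can reuse the same ID."""
        self.free.append(slot)


store = RecordStore(capacity=4)
first = store.book(b"alice")
second = store.book(b"bob")
store.cancel(first)
third = store.book(b"carol")  # reuses the slot that "alice" held
print(first, second, third)
```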

My jaw just hit the floor. What a fascinating fact!

source?

Thomas' is not grammatically correct in any version of English that I know. It's not plural. There is no special rule for that. Both the street signs you mentioned are at least grammatically coherent


When a word ends with an s, the use of an apostrophe without another s is valid English.

Thomas’ and Thomas’s are the same thing.


I actually had to look this up: it depends on whether the possessive form is actually said with one or two Ss. For instance, it's Jones's and Bridges'.

https://grammar.collinsdictionary.com/easy-learning/what-are... https://www.sussex.ac.uk/informatics/punctuation/apostrophe/...


lol no. arguably the word ain't is more proper English than Chris' or boss'


It is trivial to find more than sufficiently authoritative sources that cover the rules that make "Chris'" and "boss'" perfectly valid contracted possessives in English. [1]

However, it's English: there isn't just one rule, another rule can also be valid and might be the one you're familiar with on a day to day basis. That doesn't mean any other way to say or write the same thing is wrong, it's just a pattern you never saw. Like someone going "lol snuck isn't a real word, it's sneaked!" and then you hand them a dictionary and they learn something new about their own language.

[1] https://www.ox.ac.uk/sites/files/oxford/media_wysiwyg/Univer...


Page not found while accessing your link.


You pronounce both of your examples with two S phonemes at the end. Putting that in the written form is absolutely Ok.


> You pronounce both of your examples with two S phonemes at the end.

Um... what? Pronouncing a possessive suffix with /səs/ isn't valid anywhere. The only possibility is /səz/. Same goes for the plural suffix.


Being confident doesn't change the fact that you're misinformed about English grammar.


Lol yes though?


Confidently wrong.

> Some writers and editors add ’s to every proper noun, be it Hastings’s or Jones’s. There also are a few who add only an apostrophe to all nouns ending in s.

> ..One method, common in newspapers and magazines, is to add an apostrophe plus s (’s) to common nouns ending in s, but only a stand-alone apostrophe to proper nouns ending in s.

> Examples: the class’s hours; Mr. Jones’ golf clubs; The canvas’s size; Texas’ weather


"Some writers" say that the French Foreign Legion has been deployed to the front lines in Ukraine too. We must take this very seriously, then!

Some writers write things like Four Fat Harvard Girls Lose Book Bag too. They use sentence fragments. They try to save ink by doing weird shit. Professional Buzzfeed writers write AF (yes, in caps) to mean as fuck. The Atlantic used the words electroöptical and rôles in 1940. Just because some minimum-wage burnout or penny-pinching editor breaks a rule doesn't mean that the rule doesn't exist.

If you go around the office saying that someone drank out of the boss' mug, they'll think you're fresh off the boat. Not only is it wrong in written English, it's not even accepted colloquially in spoken English, anywhere. And so it makes perfect sense that the written form would reflect the pronunciation.

Saying Texas' weather out loud just confuses people into thinking you're using it as an adjective when what you're really doing is trying to sound smart when you're actually sounding dumb. If you point at a book and say, that's Chris', you sound like you have brain damage. How is the book Chris'? Chris is a person, not a book! The only reason that people don't correct you is that they're being polite. And people misspell words all the time and the world doesn't cave in. That doesn't imply any particular thing about English grammar.

Another commenter found that you can say Jeff Bridges' because this is an irregular case to avoid saying the same sound twice—an exception which proves the rule (and also, I don't think it's irrelevant at all to point out the fact that Bridges is literally a plural noun made into a name). But Thomas is decidedly not in this narrow category. His source even uses Thomas' as an example of what not to do, lol. Normally I wouldn't dumpster someone this hard but hn rate limits so I may as well lengthen my response. Nothing personal.


Rules, especially in English, are not grounded in any agreed-upon authority and never have been, tracing all the way back to Chaucer more-or-less codifying the written form of the language by simply writing one story that then became the most popular English printed work for a generation.

Try not to lean too hard on how other people use the language just because it ain't how you use it. Makes you look outta touch with the way folks are playin' around with one of our shared human comms protocols, neh?

> but hn rate limits

Not in general. Only if you've proven yourself to be a poster for whom the mods think that rate-limiting you improves the health of the discourse 'round these parts.

Being someone who is also rate-limited. ;)


> it's not even accepted colloquially in spoken English

How the hell would you hear the apostrophe in "spoken English"?


That's exactly his point. You don't hear the apostrophe but you do hear the "s," meaning that Thomas and Thomas' cannot be distinguished. And so Thomas's must be used instead.


The spoken and written language are not the same thing. Even if you say "Thomas's," sources disagree on whether you write "Thomas's" or "Thomas'", because the latter is more consistent with the rules for other ends-in-s words and, therefore, easier to remember.

(My personal prediction: give it 100 to 200 years and we're going to drop the trailing 's' in all these cases. "Cat'" will just be pronounced "cats" and understood to mean "an adjective indicating the noun is owned by the cat").


As the OP said, "Thomas'" is pronounced "Thomases".

"Thomas's" means "belongs to Thomas". Pronounced the same, but spelled differently, because it is a different word.


Thomas’ is pronounced the same as Thomas’s.


Now you've provided an explanation I can see you're right (but downvoted). We would say Thomas-es in the possessive so it's written Thomas's.


It is easy to observe that this "rule" is false. Even though everyone pronounces it "thomases", some spell that "Thomas's", others spell it "Thomas'". It is a purely stylistic spelling difference, and both forms are in common use, in literate environments. So, there is no one rule about how this word is spelled. And since neither form reflects the pronunciation, both are purely conventional, they don't have a much deeper meaning to lean on.


You’re wrong. It’s definitely correct, but even native English speakers get the Rules of Apostrophe wrong all the time.


It's an older rule that's falling out of style, but it is real. Until 2017 Thomas' was correct by the Associated Press stylebook.


I do like Thomas'® English Muffins though.


Associating my use of Apple products with Apple Computer overlooks the diverse and global nature of computing communities


Australian submarines don't need better propulsion because demographics of ukraine, yes


If you throw the same question at it 15 different ways, it can eventually give you ideas for optimizations that you probably wouldn't have thought of otherwise. It knows parts of APIs that I've never used. ByteBuffer#getLong, ByteBuffer#duplicate, StringBuilder#deleteCharAt, RandomGenerator#nextLong(long)


LLM can't find a missing ampersand? Sad!


In 1993 or so g++ couldn't do it, but I suspect that all c++ compilers today would. Then why would I need anything else? However the point is that when dealing with proprietary hardware devices you occasionally get a situation where the incantations as "documented" should work, but they don't, and the usual software process diagnostics are silent about it. Some domain specific experienced creativity is required to coax a response to begin finding the illness. Yes you can pay for support and escalate but small shop management sometimes is a bit hesitant to pay for that.

I am very curious how an LLM is supposed to be trained on situations whose context does not exist on the open internet.


Read this about ten years ago. Couldn't find it on HN, though.


They shouldn't be non-existent. Zip-then-encrypt is not secure due to information leakage.

EDIT: also, it's not safe—message length is dependent on the values of the plaintext bytes, period. i'm not saying don't live dangerously, i'm just saying live dangerously knowing


The information leakage problem occurs when compression is done in the TLS layer, because then the compression context includes both headers (with cookies) and bodies (containing potentially attacker-controlled data). But if you do compression at the HTTP layer using its Transfer-Encoding then the compression context only covers the body, which is safe.


It can still leak data if attackers can get their input reflected. I.e., I send you a word, and then I get to observe a compressed and encrypted message including my word and sensitive data. If my word matches the sensitive data, the ciphertext will be smaller. Hence I can learn things about the plaintext. That is no longer good encryption.


What you are talking about is generally referred to as the "BREACH" attack. While there may theoretically be scenarios where it is relevant, in practice it almost never is, so the industry has largely decided to ignore it. (It's important to distinguish this from the CRIME attack, which targets HTTP headers instead of the response body and has a much higher likelihood of being exploitable while still being hard.)

The reason it's usually safe is that to exploit it you need:

- a secret inside the html file

- the secret has to stay constant and cannot change (since it is an adaptive attack. CSRF tokens and similar things usually change on every request so cannot be attacked)

- the attacker has to have a method to inject something into the html file and repeat it for different payloads

- the attacker has to be able to see how many bytes the response is (or some other side channel)

- the attacker is not one of the ends of the communication (no point to attack yourself)

Having all these requirements met is very unlikely.


Do you often send raw bitmaps for the same reason?


Do you often get completely pwned and have your encrypted calls transcribed by people eating doughnuts because you thought it was safe to compress sensitive data before encrypting? https://web.archive.org/web/20080901185111/https://technolog...


zip-then-encrypt leaks information about the plaintext. if it's life or death, better not to compress at all


Only when the attacker can choose part of the plaintext and do the same thing over and over again with different chosen plaintexts to compare results.

Yes, there are scenarios where that matters. However, the vast majority of use cases of UTF-8 don't fit that or even use encryption at all.


That is not the only way. There are other ways of knowing partial contents of files and changes to files, depending on the situation. If the document is a known form in which one of five boxes is checked by the sender, it's probably not hard to rule out certain selections based on the ciphertext length, if not pin down the contents exactly.


I'm not sure I entirely understand your example (if there are 5 checkboxes and 1 is checked, presumably the length would be the same regardless of which one is checked). However, to your broader point, I agree there exist scenarios along those lines (e.g. fingerprinting known communication based on length); most of them apply even better when not using compression.


The checkbox example is completely plausible. There is no guarantee that all checkboxes lead to the same number of bytes changed in the file when checked. What if the format makes a note of the page number wherever a checkbox is checked? 1X could be two bytes and 15X would be three.

And even if the format only stored the checkbox states as a single bit each (unlikely), compression algorithms don't care. They will behave differently on different byte sequences, which can easily lead to a difference in output length.

Also, it's already been done with voice calls with no attacker-controlled data: https://web.archive.org/web/20080901185111/https://technolog...


The attack you're referring to is not specific to compression. It's the same class of attack that can reveal keystrokes over older versions of ssh based on packet size and timing, even on uncompressed connections. Conversely, fixed-bitrate voice streams don't have the same vulnerability as variable-bitrate encodings even though they're still compressed.

The version of your checkbox example which is vulnerable without any formal data compression is when the checkbox is encoded in a field that is only included or changes in length if the value isn't the default, common in uncompressed variable-length encodings like JSON.
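A minimal sketch of that uncompressed case, assuming a made-up form encoder (`encode_form` and its field names are hypothetical): the field is only serialized when it differs from the default, so the JSON length alone narrows down the value, and a length-preserving cipher would carry that difference into the ciphertext.

```python
import json


def encode_form(choice: str, default: str = "none") -> bytes:
    """Serialize a form, omitting the field when it holds the default value."""
    form = {"form_id": 17}
    if choice != default:
        form["choice"] = choice
    return json.dumps(form, separators=(",", ":")).encode("utf-8")


# No compression anywhere, yet each choice yields a different length.
for choice in ("none", "a", "option-b"):
    print(choice, len(encode_form(choice)))
```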


I'm sure that the people getting hacked care deeply about whether the attack they suffered was sui generis.

Also, zip/deflate etc was not designed to eliminate side channel leakage. Some compression schemes obviously (with padding) can mitigate leaks, but it has to be done deliberately


Any of it has to be done deliberately. The length of the data reveals something about its contents whether it's compressed or not.

The special concern with compression is when attacker-controlled data is compressed against secret data because then the attacker can measure the length multiple times and deduce the secret based not just on the length but on how the length changes when the secret is constant and the attacker-controlled data varies. This can be mitigated with random padding (makes the attack take many times more iterations because it now requires statistical sampling) or prevented by compressing the sensitive data and attacker-controlled data separately.
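A rough sketch of the "compress separately" mitigation, using Python's `zlib`. The secrets, guess, and filler strings are invented for illustration, and the encryption step is left out on the assumption that it preserves length. With a shared compression context, how much a guess saves depends on the secret; with separate contexts, the saving is the same whatever the secret is.

```python
import zlib


def joint_length(secret: bytes, attacker_text: bytes) -> int:
    """One compression context shared by the secret and the attacker's text."""
    return len(zlib.compress(secret + attacker_text))


def separate_length(secret: bytes, attacker_text: bytes) -> int:
    """Two independent compression contexts, one per input."""
    return len(zlib.compress(secret)) + len(zlib.compress(attacker_text))


# Hypothetical values, chosen only for illustration.
SECRET_A = b"session_token=9f8a7b6c5d4e3f2a1b0c;"
SECRET_B = b"session_token=00112233445566778899;"
GUESS = b"session_token=9f8a7b6c5d4e3f2a1b0c;"   # equals SECRET_A
FILLER = b"QWERTYUIOPASDFGHJKLZXCVBNM)(*&^%$#@"  # shares no characters with either secret


def gap(length_fn, secret: bytes) -> int:
    """How many bytes shorter the output is for GUESS than for FILLER."""
    return length_fn(secret, FILLER) - length_fn(secret, GUESS)


for name, secret in (("A", SECRET_A), ("B", SECRET_B)):
    print(name,
          "joint gap:", gap(joint_length, secret),
          "separate gap:", gap(separate_length, secret))
```

The joint gap changes with the secret, which is the signal the attacker measures. The separate gap depends only on the attacker's own text.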


If your example needs additional assumptions to be a relevant example then you should probably state them when you bring up the example.


like what lol


Encryption is completely unrelated to the task at hand, which is text encoding and compressing, and text encoding is not encryption.


Huh, never heard that before. Does it leak more information than just encrypting without zipping? Struggling to imagine how this attack works.


It's an extension of the chosen-plaintext attack, and so requires the attacker to be able to send custom text that they know is in the encrypted payload. If the unencrypted payload is "our-secret-data :::: some user specified text", then the attacker can eventually determine the contents of our-secret-data by observing how the size of the encrypted response changes as they change the text when the compression step matches up with a part of the secret data. It can be defeated by adding random-length padding after compression and before the encryption step, though.


Essentially if you zip something, repeated text will be deduplicated.

For example "FooFoo" will be smaller than "FooBar" since there is a repeated pattern in the first one.

The attacker can look at the file size and make guesses about how repetitive the text is if they know what the uncompressed or normal size is.

This gets more powerful if the attacker can insert some of their own plaintext.

For example if the plaintext is "Foo" and the attacker inserts "Fo" (giving "FooFo") the result will be smaller than if they inserted zq where there is no pattern. By making lots of guesses the attacker can figure out the secret part of the text a little bit at a time just by observing the size of the ciphertext after inserting different guesses.
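The effect described above can be seen with a short sketch using Python's `zlib`. The secret and the two attacker strings are invented, and the encryption step is omitted on the assumption that the cipher preserves length, so the compressed length is what an eavesdropper would observe.

```python
import zlib

SECRET = b"card=4929-1234-5678-9012;"  # hypothetical secret, for illustration only


def observed_length(attacker_text: bytes) -> int:
    """Compressed size of secret + attacker text, i.e. what the ciphertext length would reveal."""
    return len(zlib.compress(SECRET + attacker_text))


# Two attacker inputs of equal length: one repeats the secret, one shares nothing with it.
matching = observed_length(b"card=4929-1234-5678-9012;")
non_matching = observed_length(b"ZQXJ!KWVY?GHBM@UPLT#DFRN%")

print("matching guess:    ", matching)
print("non-matching guess:", non_matching)
```

In a real attack the guesses are extended a little at a time, and the per-character differences are much smaller and noisier than this whole-string comparison suggests.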


Encrypting without zipping doesn't leak any information about the content. You can't rule out certain byte sequences (other than by total length) just by looking at the ciphertext length.

If "oui" compresses to two bytes and "non" compresses to one byte, and then you go over them with a stream cipher, which is which:

A: ;

B: *&
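A toy illustration of why the lengths give the answer away. It assumes a made-up codebook standing in for the compressor (the comment's "oui" to two bytes, "non" to one byte) and a deliberately simple XOR stream cipher built from SHAKE-256. `toy_stream_encrypt` is not a real cipher API and should not be used for anything real; it is only here because stream ciphers keep the ciphertext the same length as the plaintext.

```python
import hashlib


def toy_stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with a SHAKE-256 keystream. Toy code, not for real use."""
    keystream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


# Hypothetical "compressed" encodings matching the example above.
CODEBOOK = {"oui": b"\x01\x02", "non": b"\x03"}

KEY = b"demo-key"
for word in ("oui", "non"):
    raw = toy_stream_encrypt(KEY, b"nonce-raw-" + word.encode(), word.encode())
    packed = toy_stream_encrypt(KEY, b"nonce-zip-" + word.encode(), CODEBOOK[word])
    # Uncompressed: both ciphertexts are 3 bytes. Compressed: 2 bytes vs 1 byte.
    print(word, "uncompressed:", len(raw), "compressed:", len(packed))
```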


This has nothing to do with compression. If you use "yes" and "no" instead of "oui" and "non" (which just happen to be three characters each) and you compress "yes" to "T" and "no" to "F" then the uncompressed text will be the leaky one.


It’s an example meant to prove the idea.


Yes, and my example was an example meant to prove the opposite idea. The point is that it is irrelevant whether you compress or not. You can leak information either way.


I leak the length of my phone call and you leak:

1. the length of your phone call; and

2. what language you were speaking; oh and

3. half the words you said

(i.e. pwned)

https://web.archive.org/web/20080901185111/https://technolog...


> you leak [a bunch of stuff]

How? Remember, the uncompressed text gets encrypted too.


It's in the article if you would bother to read it LOL. "simply measuring the size of packets without decoding them can identify whole words and phrases with a high rate of accuracy . . . [the researchers] can search for chosen phrases within the encrypted data"


Ah.

That article is about voice calls. Totally different topic. Nothing to do with UTF-8.


Cryptography noob here: I'm confused by "Encrypting without zipping doesn't leak any information about the content." Logically speaking, if we compress first and therefore "the content" will now refer to "the zipped content", doesn't this mean we still can't get any useful information?


Not OP, but 'zipping and encrypting' one thing (a file for example) does not leak information by itself. The problem comes when an adversary is able to see the length of your encrypted data, and then can see how that length changes over time - especially if the attacker can control part of the input fed to the compressor.

So if you compressed the string "Bob likes yams" and I could convince you to append a string to it and compress again, then I could see how much the compressed length changed.

If the string I gave you was something already in your data then the string would compress more than it would if the string I gave you was not already in your data - "Bob likes yams and potatoes" will be larger than "Bob likes yams likes Bob".

If the only thing I can see about your data is the length and how it changes under compression - and I can get you to compress that along with data that I hand to you - then eventually I can learn the secret parts of your data.


Encryption generally leaks the size of the plaintext.

This is true in both the compressed and non-compressed case. However with compression the size of the plaintext depends on the contents, so the leak of the size can matter more than when not using compression.

Even without compression this can matter sometimes. Imagine encrypting "yes" vs "no".


> Encryption generally leaks the size of the plaintext.

Ah, I see. Naïvely, this seems like a really bad thing for an encryption algorithm to do—is there no way around it? Like, why is encryption different from hashing in this regard?


There are methods, but they are very inefficient bandwidth-wise in the general case. The general approach is to add extra text (padding) so that all messages are a fixed size (or e.g. some power of 2). The higher the fixed size is, the less information is leaked and the less efficient it is. E.g. if you pad to 64 MB but need to transmit a 1 MB message, that is 63 MB of extra data to transmit.

Part of the problem (afaik) is we lack good math tools to analyze the trade-offs of different padding sizes vs how much extra privacy they provide. This makes it hard to reason about how much padding is "enough".

Another approach is adding a random amount of padding. This can be defeated if you can force the victim to resend messages (which you then average out the size of).

Hashing is different because you don't have to reconstruct the message from the hash. With encryption the recipient needs to decrypt the message eventually and get the original back. However, there is no way to transmit a (maximally compressed) message in less space than it takes up.

There are special cases where this doesn't apply, e.g. if you have a fixed transmission schedule where you send a specific number of bytes on a specific agreed-upon schedule.
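A minimal sketch of the power-of-two padding mentioned above, applied before encryption. The 4-byte length prefix, the zero fill, and the 16-byte minimum bucket are assumptions made for this example, not a standard scheme.

```python
def padded_length(n: int, minimum: int = 16) -> int:
    """Smallest bucket >= n, doubling up from `minimum` (assumed to be a power of two)."""
    size = minimum
    while size < n:
        size *= 2
    return size


def pad(message: bytes, minimum: int = 16) -> bytes:
    """Prefix the true length, then zero-fill up to the bucket size."""
    body = len(message).to_bytes(4, "big") + message
    return body.ljust(padded_length(len(body), minimum), b"\x00")


def unpad(padded: bytes) -> bytes:
    """Recover the original message using the length prefix."""
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]


# "yes" and "no" now occupy the same bucket, so their lengths no longer differ.
print(len(pad(b"yes")), len(pad(b"no")))
# The cost: a message just over a bucket boundary is padded up to the next one.
print(len(pad(b"x" * 1000)), len(pad(b"x" * 1021)))
```

Larger minimum buckets hide more and waste more, which is the bandwidth trade-off described above.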


Yes, of course it leaks more information than encryption without compression, because that’s just encryption which doesn’t leak anything.

In an enormous number of real world cases adversaries can end up including attacker-controlled input alongside secret data. In that case you can guess at secret data and if you guess correctly, you get smaller compressed output. But even without that, imagine the worst case: a 1TB file that compresses to a handful of bytes. Pretty clearly the overwhelming majority of the text is just duplicate bytes. That’s information which is leaked.


Go is not object-oriented.

Especially since Go does not have “private state”.

Guess what does? C++ and Java. Specifically private member variables, and they were inspired by Poops and Butts.

Does that make C++ and Java “Object Oriented” in the sense the creator of the term intended? Not by itself, but it does make them considerably more “Object Oriented” than Go.

>Let us also ignore the fact that in the JVM you can not even implement Singleton correctly because by design the JVM can not guarantee that one and only one instance of a class is ever created.

It's called enum, t4rd. Released nineteen and a half years ago. Check it out. And let us rather ignore you while you do so.


I once had an epic discussion with Brian Goetz regarding the use of Enums as an implementation of the Singleton pattern.

Lesson learned, and the Twitter thread is now history: https://twitter.com/brunoborges/status/1068297236071747585

Goetz's reply: https://twitter.com/BrianGoetz/status/1068196389858017282

