Hacker News | gildas's comments

No T-spins [1], impressive!

[1] https://harddrop.com/wiki/T-Spin_Guide


No T-spins, disappointed.

It looks like the bot beats humans just because it reacts faster.


Yes, but this aligns with the initial statement in the post that a top 1 doesn't happen as often and a top 15 is more common.

I must admit I'm a little disappointed that the bot isn't able to do this. It would be much more efficient.

A <noscript> script would be even more suitable, but I agree with the principle. I added a link to view the demo without downloading the file, see https://gildas-lormeau.github.io/Polyglot-HTML-ZIP-PNG/demo.... (it was not working previously because GitHub serves pages in UTF-8).


Thanks, it was an error on my part, now corrected.


Indeed. For example, the HTML of the files used for the presentation slides [1] uses <noframe> tags to keep the HTML well-formed. This point is addressed in the conclusion of the presentation.

[1] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG/raw/...


Here is the demo file (cf. the first paragraph and the end of the article): https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG/raw/...


This opens a download dialog for me rather than rendering the HTML (in Firefox on Android).


This is done on purpose, so you can rename the file to make sure it's polyglot.


Thanks, on an actual computer it's easy to check :)


For the record, I've just added a link to view the file without downloading it.


A screenshot would help


A screenshot of what? It just looks like a normal web page.


Note that if you're on iOS, it's possible that the HTML page doesn't work at all because when it's opened from the filesystem, it's displayed by a viewer which doesn't support JS instead of Safari.


You're right, SingleFile (which is capable of saving pages in this format) does a little better than the demo, but it can also be optimized. In fact, I chose the JSON format to keep things as simple and didactic as possible for the presentation. I think I need to use your suggestions to optimize this structure in SingleFile ;)


Note that you can also take advantage of the fact that a ZIP can be password-protected and make your web page secret! For example https://gildas-lormeau.github.io/private/ (password: "thisisapage").


If you are loading external libraries, like in this example, your encrypted data is at risk. It would be better to include the decryption code directly in the JS, or to embed a JS zlib implementation.


It's possible to define the Content Security Policy with a <META> tag in the "bootstrap page" and prevent this kind of security issue, e.g. <META http-equiv="content-security-policy" content="connect-src 'self' data: blob:;">


I don't think that will prevent data exfiltration. Malicious javascript could create e.g. an img element with the data to exfiltrate stored in a query parameter of the image URL.


The request will be blocked by the CSP.


Why? The CSP policy isn't setting default-src or img-src. So image loads are allowed from everywhere.


That was just an example of the syntax. Nothing prevents you from blocking more resources and sandboxing the page.


If we make it strict enough to block exfiltration, it'll block the external libraries from loading. So that means we have to load our scripts from the same origin instead of external origins (as jclarkcom suggested).

But the whole reason for CSP was to allow us to use external libraries without exfiltration risk. If we stop using external libraries, then our motivation for using CSP is gone. So CSP is useless for the purpose of this conversation.


I think there's been a misunderstanding: there was an error in the article suggesting that zip.min.js is not inlined in the page. This error has since been corrected. I'm sorry for that. The goal is obviously to create pages that work offline, as shown in the demo.


Subresource Integrity is probably the more applicable feature for GP’s concerns.


You can also use the SubtleCrypto API


SingleFile respects your privacy, you can find the privacy policy here [1].

[1] https://github.com/gildas-lormeau/SingleFile/blob/master/pri...


I took another approach in SingleFile by offering a way to save pages as self-extracting pages (i.e. ZIP/HTML polyglot files), see [1] for more info.

[1] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG


Really cool stuff there, how did you read the bytes back? Normally you get a CORS error if you try to use a network request to read back yourself.


The saved page is encoded in windows-1252. It includes "consolidation data" to read the ZIP data as text from the DOM and recover the replacements of \r and \r\n occurrences (this is the only data loss and it represents approx. 1% of ZIP data), see the links below for more info.

https://gildas-lormeau.github.io/Polyglot-HTML-ZIP-PNG/en-EN...

https://gildas-lormeau.github.io/Polyglot-HTML-ZIP-PNG/en-EN...

https://gildas-lormeau.github.io/Polyglot-HTML-ZIP-PNG/en-EN...


If "CR" is the only bad byte, that means that 255/256 of the symbols are okay to use. That beats UTF-16 embedded in a string, where only 63481/65536 of the symbols are okay to use.

My approach was to use very large integers. You can split the input file into blocks of X bits, then represent that block as X+1 bits. The output is bigger because it can't have any forbidden bytes in there.

For the case of 255 of 256 symbols, packing 1415 bits of data into 1416 bits of space is the most efficient block size (before reaching a ridiculously large size) at 0.0706215% expansion. (For an infinite block size, you'd have an expansion of 1 - (log base 256 of 255), or 0.070582%)
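The block-size figures above can be checked with exact integer arithmetic. This is only a sketch in Python (chosen for its built-in big integers, not the language used in the thread); the helper names `data_bits` and `overhead` are made up for illustration, and the search stops at an arbitrary limit of 1000 bytes.

```python
from fractions import Fraction


def data_bits(n_bytes: int, alphabet: int = 255) -> int:
    """Largest b such that 2**b <= alphabet**n_bytes."""
    return (alphabet ** n_bytes).bit_length() - 1


def overhead(n_bytes: int) -> Fraction:
    """Fraction of the encoded block that carries no data."""
    return 1 - Fraction(data_bits(n_bytes), 8 * n_bytes)


# Smallest block size (in bytes) with the lowest overhead, up to 1000 bytes.
best = min(range(1, 1001), key=overhead)

print(best, data_bits(best), float(overhead(best)))
```

With this definition the overhead is measured against the encoded size (1 bit out of 1416), which is how the 0.0706215% figure comes out.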

Encoding: Turn 1415 bits of data into a very large number. Repeatedly divide and modulo by 255, giving 177 digits, each in the range 0-254. Then add 1 to every byte equal to "CR" (0x0D) or larger. Now you have 1416 bits of encoded data, none of which can be "CR".

Decoding: Read a byte, decode back to 0-254 by subtracting 1 if it's greater than "CR". Multiply by 255 and add to your big number. At the end, you'll have a really big number that holds 1415 bits of data. This would be 177 big multiplies, and 177 big adds.
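A minimal sketch of the encoding and decoding steps described above, written in Python so that its native big integers stand in for the "very large number". The function names and the byte order (least significant base-255 digit first) are assumptions made for the example, not details taken from the original implementation.

```python
CR = 0x0D           # the one forbidden byte value
BLOCK_BITS = 1415   # data bits per block
BLOCK_BYTES = 177   # encoded bytes per block (1416 bits)


def encode_block(value: int) -> bytes:
    """Encode an integer below 2**1415 as 177 bytes that never contain CR."""
    assert 0 <= value < 1 << BLOCK_BITS
    out = bytearray()
    for _ in range(BLOCK_BYTES):
        value, digit = divmod(value, 255)   # digit is in 0..254
        # Shift digits at or above CR up by one so that 0x0D is skipped.
        out.append(digit + 1 if digit >= CR else digit)
    return bytes(out)


def decode_block(block: bytes) -> int:
    """Inverse of encode_block: one multiply and one add per byte."""
    value = 0
    for byte in reversed(block):            # most significant digit first
        digit = byte - 1 if byte > CR else byte
        value = value * 255 + digit
    return value
```

For example, `decode_block(encode_block(12345))` gives back `12345`, and the encoded bytes never include `0x0D`.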

Decoding (the faster way):

JavaScript numbers are double-precision floats, but you can treat them as 48-bit integers (they represent integers exactly up to 53 bits). Just watch out for the bitwise operators, which truncate results down to 32 bits. That means using actual multiplication and division instead of bit shifting.

6 bytes at a time: 48 bits can hold 6 bytes. With normal floating point math, you can multiply each byte by 255^0, 255^1, 255^2, 255^3, 255^4, 255^5, and sum them together. Then you multiply-and-add these 6-byte chunks to a big int. The operations after that use big ints. The first 6 bytes get multiplied by 255^0, the next 6 bytes by 255^6, then 255^12, 255^18, etc. The whole thing is summed together. This cuts it down to 30 bigint multiply-and-adds (30 multiplies and 30 adds).
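The 6-byte chunking can be sketched as follows. This is Python rather than JavaScript, with Python floats (also IEEE doubles) playing the role of JS numbers and Python ints playing the role of the big int; the function names are hypothetical. It works because 255^6 is below 2^48, so every per-chunk sum is exact in a double.

```python
CR = 0x0D
CHUNK = 6
# 255**0 .. 255**5 as doubles; all are exactly representable.
POWERS = [255.0 ** i for i in range(CHUNK)]


def chunk_value(chunk: bytes) -> float:
    """Combine up to 6 encoded bytes into one exact double below 255**6."""
    total = 0.0
    for byte, power in zip(chunk, POWERS):
        digit = byte - 1 if byte > CR else byte   # undo the CR skip
        total += digit * power
    return total


def decode_block_chunked(block: bytes) -> int:
    """Decode a block using one big multiply-and-add per 6-byte chunk."""
    value = 0
    scale = 1                                     # 255**0, 255**6, 255**12, ...
    for start in range(0, len(block), CHUNK):
        value += int(chunk_value(block[start:start + CHUNK])) * scale
        scale *= 255 ** CHUNK
    return value
```

For a 177-byte block the loop runs 30 times (29 full chunks plus one 3-byte chunk), matching the count of multiply-and-adds given above.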

Homemade bigint: It's an array of doubles, but used as 48-bit integers. Compared to the actual BigInt, it removes all allocations, and you can access the bits inside directly, speeding up the part where you extract bits from the number. Only mathematical operation required for decoding is the "multiply and accumulate" operation. Using the homemade bigint sped things up dramatically.

---

So then, that's a lot of math just to avoid escaping (or fixing up) your bytes, but I think that would get close to the minimum possible expansion.

