nick238's comments | Hacker News

I'm actually kinda curious what SG-1's production company setup was for moving around the gate to all of British Columbia's quarries and forests.

The ice clearly moves; Europa's surface isn't just the default for an airless, solid ball of rock (craters), but has all sorts of features that reshape it, so given enough time it'll equilibrate. (What 'enough' means is left as an exercise to the reader.) There are papers[1] that discuss the ice properties, but it's hard to get a specific answer out of them. There have to be tons of research papers out there about the design criteria for melt-drill probes like this, for Europa, Enceladus, and others.

[1]: https://websites.pmc.ucsc.edu/~fnimmo/website/draft5.pdf


Wire-guided torpedoes[1] and missiles[2] are fairly common, and the wire pays out from the projectile side, so the projectile isn't dragging progressively more of it. The more recent DM2A4 Seehecht torpedo[3] has a fiber-optic link, probably to reduce EM emissions or the detectability of a kilometers-long wire/antenna, even underwater.

[1]: https://en.wikipedia.org/wiki/Mark_48_torpedo

[2]: https://en.wikipedia.org/wiki/Wire-guided_missile

[3]: https://en.wikipedia.org/wiki/DM2A4


A mini nuclear-reactor-as-a-heat-source might be appropriate for a melt-drill. RTGs are a bit unfortunate: their output decays exponentially from the time of manufacture, and you need to both 1. deliver high enough power at Europa, and 2. radiate away that much power (and a bit more) while you're flying there.
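
Back-of-envelope on that decay, assuming a Pu-238 heat source (87.7-year half-life); the 2,000 W starting figure is just an illustrative MMRTG-class thermal output, and the real electrical output falls faster still because the thermocouples degrade:

    # Pu-238 half-life is 87.7 years; thermal output decays as 0.5**(t / half_life).
    HALF_LIFE_YEARS = 87.7
    P0_WATTS_THERMAL = 2000.0   # illustrative MMRTG-class beginning-of-life value
    for years in (0, 7, 15, 30):   # e.g. launch, arrival at Jupiter, extended mission
        p = P0_WATTS_THERMAL * 0.5 ** (years / HALF_LIFE_YEARS)
        print(f"t = {years:2d} y: {p:6.0f} W thermal ({p / P0_WATTS_THERMAL:.1%} of initial)")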

A nuclear reactor could produce basically no heat while offline, then be switched on and suddenly provide 100s of kW when it gets to wherever it's going. The hard part in space is radiating away the heat, but if you're on an ice world, that's orders of magnitude easier.

The hardest part I'd see would just be getting into the ice; there's not really any "melting" in vacuum. The constant boiling away of the water would keep insulating your heater from the ice. Meters 1 to 20,000 are probably pretty easy.


I thought reading closed-source code was a very bad idea: if you wind up writing something similar, you're possibly guilty of copyright infringement. If you have one team look at the design, catalog the features, and describe how it works in detail, then you can have another team implement it, and you're on much better legal ground (patents aside):

https://en.wikipedia.org/wiki/Clean-room_design


> I thought reading closed-source code was a very bad idea: if you wind up writing something similar, you're possibly guilty of copyright infringement.

The idea is to avoid accidentally copying code, though TBH I think this is something of a legend from decades ago rather than a practical necessity (nor, as the article mentions, is it required by law). Writing something similar on its own won't make you guilty of anything; otherwise programmers would never be able to work at a different company in the same or a similar field again.


The page is timing out for me, but is it the inverse of the problem where Steve Mould/Matt Parker measured the 'unknown' quantity π while already assuming a size for the molecules? Presumably Lord Rayleigh already had at least a good order-of-magnitude approximation of pi...

https://www.youtube.com/watch?v=lmgCgzjlWO4


By 1870 pi was known to several hundred decimal digits; for a calculation like this, where you have other large sources of error, Archimedes' approximation from two millennia earlier (<1% error) would probably be fine.

https://en.m.wikipedia.org/wiki/Chronology_of_computation_of...


Note that pi to 40 digits is sufficient to calculate the circumference of the observable universe to subatomic precision.
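
Rough sanity check on that, taking an observable-universe diameter of ~8.8e26 m and an atomic diameter of ~1e-10 m (both order-of-magnitude figures):

    # Relative error from truncating pi after ~40 significant digits is ~1e-40.
    diameter_m = 8.8e26          # observable universe, roughly
    rel_error = 1e-40
    error_m = diameter_m * rel_error
    print(f"circumference error ~ {error_m:.1e} m")   # ~9e-14 m, about 1/1000 of an atom's width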


While biological structures are ultimately 'DNA-controlled', the translation of the ACTGs to codons to proteins to structures explodes in complexity. Even within that simple molecular (i.e. no microstructures, let alone macrostructures like the whole cell) realm, everything is influencing everything else all at once. Proteins will suppress or enhance transcription of others, alter how other proteins are synthesized, and so on.

Once you have some molecules/proteins, they'll assemble into microstructures controlled by other proteins and small molecules that can push/pull on 'things', and they'll be influenced by ambient conditions (temperature, pH, mineral content) that they're not fully in control of. But they're the progeny of ancestors that have been doing this for millions of years in the same habitat, trained and optimized to handle those varying conditions, so they get it right most of the time.

The biology then interacts with physics and chemistry: kinetically grow faster here to push something outwards, slower there to let it curve inwards, and you can start forming shapes.


Double your compression ratio for the low, low price of 100,000x slower decompression (zstd: 215GB, 2.2 ns/byte vs. nncp: 106GB, 230 µs/byte)!
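
Scaling those per-byte figures to a hypothetical 1 GB of decompressed output, just to make the gap concrete:

    n_bytes = 1_000_000_000            # 1 GB of decompressed output
    zstd_s = n_bytes * 2.2e-9          # ~2.2 s at 2.2 ns/byte
    nncp_s = n_bytes * 230e-6          # ~230,000 s at 230 µs/byte, i.e. roughly 2.7 days
    print(zstd_s, nncp_s, nncp_s / zstd_s)   # ratio is ~1e5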

The neural network architectures are technically impressive, but unless there's some standard compression dictionary that works for everything (so the training/compression costs amortize down to nil), and silicon architecture changes dramatically toward compute-in-memory, I don't know if this would ever take off. Lossy compression would probably provide huge advantages, but then you have to be domain-specific and can't just slap it on anything.


The point of the contest was not compression tech for communication and storage. It was a belief that minimizing log loss on large corpora (i.e. message length) was key to AI progress. That should remind you of some pretty hot more-recent developments.

Here was someone else with a similar pov in the 2000s: https://danburfoot.net/research.html


The issue with this paradigm (see also: Hutter Prize) was its insistence on lossless compression.

Intelligence is just as much about knowing what to throw away as what to keep. Any nontrivial cognitive system operating in a nontrivial physical environment will necessarily have a lossy model of the world it's embedded in.

Also of interest, compressionism, a theory of mind based on data compression:

https://ceur-ws.org/Vol-1419/paper0045.pdf


You can turn any lossy compression scheme into a lossless scheme by encoding the error... The lossy scheme should be aiming for low error, making the error encoding progressively smaller as the lossy scheme improves.
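
A toy sketch of that construction, with crude quantization standing in for the lossy stage and zlib on the residual (both stand-ins are just for illustration):

    import zlib
    import numpy as np

    def lossy_quantize(x, step=16):
        # Stand-in lossy stage: keep only the coarse part of each 8-bit sample.
        return (x // step).astype(np.uint8)

    def lossless_wrap(x, step=16):
        q = lossy_quantize(x, step)
        recon = (q.astype(np.int32) * step).astype(np.uint8)   # what the lossy decoder yields
        residual = (x - recon).astype(np.uint8)                # the error, always in [0, step)
        return zlib.compress(q.tobytes()), zlib.compress(residual.tobytes())

    # The decoder recovers x exactly as recon + residual; a better lossy stage shrinks the residual stream.
    t = np.arange(10_000)
    x = (128 + 100 * np.sin(t / 50)).astype(np.uint8)
    coarse_stream, residual_stream = lossless_wrap(x)
    print(len(coarse_stream), len(residual_stream))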

What's harder to deal with from a measurement perspective is semantic equivalence: calling some kinds of errors zero-cost, but not having a great way to characterize what, exactly, is lost. But it's kinda what you want for really extreme compression: the content is equivalent at a high level, but may be a very different byte stream.


Group theory is the abstraction you want


Can you explain? I'm struggling to see how group theory is relevant here


This part

> What's harder to deal with from a measurement perspective is semantic equivalence: calling some kinds of errors zero-cost, but not having a great way to characterize what, exactly, is lost. But it's kinda what you want for really extreme compression: the content is equivalent at a high level, but may be a very different byte stream.

Is basically saying

> What's harder is defining a reconstruction process in terms of a "semantic group", i.e. an output encoding and associated group actions under which the loss is invariant, with the group actions expressing the concept of two non-identical outputs being "equivalent for the purposes of downstream processing".

Taco Cohen is one of the pioneers of this line of research, and invariant, equivariant, and approximately invariant/equivariant architectures are a big thing in scientific and small-data ML.


"an output encoding and associated group actions under which the loss is invariant"

What loss is that, exactly?

One of the difficulties in speech processing is that we generally don't have a great model for speech quality or equivalence. The human hearing system is a finicky and particular beast, and it varies from beast to beast, depending on both physiology and culture. Good measures (e.g., ViSQOL, a learned speech quality metric) tend to be helpful for measuring progress when iterating on a single system, but can give strange results when comparing different systems.

So it's easy to imagine (say) pushing the generated speech into some representation space and measuring nearness in that space (either absolute or modulo a group action), but that begs the question of whether nearness in that space really represents semantic equivalence, and of how to go about constructing it in the first place.

Let alone why one would bother allowing some group symmetries into the representation when we plan to define a loss invariant under those symmetries... As someone who has worked a lot with group theory and speech compression, throwing group theory at speech representations feels like a solution in search of a problem.


Group theory is just a way to think about what properties a function needs to have under specific actions on the data. If you want to train a speech network that should be peak-amplitude invariant (to make a random example), you can normalise the amplitudes, modify your loss to be invariant to amplitude, or modify the network output to be invariant to it. These have different numerical tradeoffs (e.g. one of the reasons people use equivariant architectures with a final invariant aggregator is that it allows each layer to use and propagate more information, and one of the reasons graph neural networks are a thing is that we don't always have a canonical labeling).
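
A made-up numerical illustration of that peak-amplitude example (nothing speech-specific, and the loss itself is arbitrary):

    import numpy as np

    def peak_normalize(x):
        return x / np.max(np.abs(x))

    def invariant_loss(pred, target):
        # Invariant to positive rescaling of either argument: the scaling "group action" is quotiented out.
        return np.mean((peak_normalize(pred) - peak_normalize(target)) ** 2)

    rng = np.random.default_rng(0)
    target = rng.standard_normal(1000)
    pred = target + 0.1 * rng.standard_normal(1000)
    print(invariant_loss(pred, target))
    print(invariant_loss(3.7 * pred, 0.2 * target))   # same value (up to float rounding)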

All the stuff you mentioned is true, but thinking about it in an abstract sense, that means there's a set of universal symmetries and a set of highly context-dependent symmetries, and group theory is AFAIK our best method of thinking about them rigorously; as I say in the child comment, not the end point, but our current best starting point (in my opinion).


Thanks, I need to read more. I get how a semantic group could occur but it doesn't seem obvious to me that groups and group actions are necessarily the correct abstraction for representing semantic similarity.


It's not the right one on its own, but it's the closest we have right now to talk about things like "high-level equivalence" rigorously. Like, one way to talk about logic is that it's invariant under e.g. double negation, so any compression algorithm that does "semantically lossless compression" can exploit that property. Deep learning then becomes an algorithm to find features in the data which remain invariant under irrelevant transformations, and representations useful for semantic similarity could then be formulated as a very big and complex composite group, potentially conditional... there's probably a new theory of hierarchical similarity and irreducible collisions/errors required, but as of now, group theory, approximate equivariance, and category theory are the bleeding edge of thinking about this rigorously while still having SOTA benchmarks/real-world applicability, to my knowledge.

Happy to discuss more via email as well


I tend to agree, but is that really a fatal flaw? A lossy compression scheme that gets it 90% right only has to encode the delta for the remaining 10%. Which is a big cost, sure, but the alternative is evaluating how important the thrown-away stuff is, and that evaluation is rife with subjective value judgements. There are no right answers, only differing flavors of wrong ones. It's the question of what is important to generalize over, and nobody is ever going to agree on that.


Your issue applies equally to GPT-3 (and afaik its successors): the log likelihood loss which the GPT base model was trained on is exactly the "lossless compression" length of the training set (up to trivial arithmetic-coding overhead).

The question I'd raise is why the route that got traction used more ad-hoc regularization schemes instead of the (decompressor + compressed) length from this contest and the book I linked.


> Intelligence is just as much about knowing what to throw away as what to keep.

Sure, but you can see it as two steps. Decide what doesn't matter at all, and throw it away. Then compress the rest losslessly (according to how surprising it is, which is basically the only way).

I think a theory of mind based on data compression is backwards, for this reason. When you have "data", you've already decided what's important. Every time you combine "A" and "B" into a set, you have already decided

1. That they are similar in some way you care about

2. That they're distinct in some way you care about (otherwise, it would be the same element!)

... and you do this every time you even add two numbers, or do anything remotely more interesting with "data". There is no "data" without making a-scientific statements about what's important, what's interesting, what's meaningful, what matters etc.


You could throw away 95% of enwiki without losing anything of value. That sounds like a joke, but it would make the result worth reading, which makes it sound like a worthwhile exercise.


> unless there's some standard compression dictionary that works for everything

There is, at least for anything worth compressing. It’s called the halting sequence. And while it exists, it’s also uncomputable unfortunately haha.


Apparently there are ways of getting the first few binary digits, but it's useless at that length.


Not useless. That’s most of mathematics right there.


What precisely are you lot referring to? I'm familiar with the Halting Problem, and glancingly familiar with Ω / Chaitin's constant(s). But I'm not familiar with any kind of uncomputable sequence which we know how to get the first few binary digits out of that has profound things to say about math and is somehow related to the Halting problem. And trying to Google it doesn't help as I just get search result noise about the Halting problem itself with no reference to any special kinds of sequence.


It's Ω / Chaitin's constant for a particular encoding. It's uncomputable as a whole, but maybe we will get a couple more bits.


Alright, then how does that help anyone with text compression, and/or how would computing its digits reveal any deep secrets of mathematics?


See my other reply. It’s not Chaitin’s constant but the mutual information between ZFC and the halting sequence, which is precisely I(ZFC : H) bits.

If you had H, you could optimally compress any non-random string of data of length n in O(n) time. The rough sketch of how you would do this is to create brute-force search programs and then instead of actually running them just look up whether they halt or not via the halting sequence. By chaining together a few of these μ-operator functions, you can build up the whole compressed string 1 bit at a time.

Since we only have a few bits of H, we can’t compress at that level of magic, but what we do have still represents everything computable that ZFC can describe, which for the most part includes every algorithm we use in daily life.


> What precisely are you lot referring to? [...] glancingly familiar with Ω / Chaitin's constant

Huh? I'm not referring to Chaitin's constant; the first few digits of that aren't going to help with much. I'm referring specifically to the halting sequence H as I mentioned in my post. See Vitányi's "Kolmogorov's Structure Functions and Model Selection" or any of Levin's papers on algorithmic complexity. (Vitányi uses a definition for H where a program's input is the empty tape rather than itself, but this is mainly a difference in convention).

You can take a formal logical theory F (e.g., Q, PA, ZFC, ...) and since the proofs are recursively enumerable, iterate over all proofs equivalent to the statement "p_i halts" or "p_i never halts" in the language of F for program p_i (as indexed lexicographically on a reference universal Turing machine).

As an example, for any halting program p, an unbounded search through PA will eventually find a proof that p halts (assuming soundness of PA). But there are certain non-halting programs for which PA cannot prove the non-halting behavior whereas ZFC can. So in terms of mutual algorithmic information I(a : b) = K(a) + K(b) - K(a, b), PA's inability to prove the non-halting behavior of p implies I(p : PA) = 0 whereas I(p : ZFC) > 0 since ZFC can prove this. More specifically K(p) - K(p | ZFC) > 0, given that the description of ZFC is minimal.

The above implies that I(ZFC : H) > I(PA : H), where H is the halting sequence, so we can say that ZFC is "more powerful" than PA in its ability to prove non-halting behavior of Turing machines. But a minimal description of ZFC and PA is still a finite amount of information, and in fact Chaitin's Incompleteness theorem states that for any formal logical system F, there is a constant C_F beyond which F cannot prove any string has higher Kolmogorov complexity than C_F. I suspect C_F is pretty close to K(F), but I haven't seen a proof of that fact anywhere. This is essentially another way to state Gödel's first incompleteness theorem, with additional detail about the limit of what formal system F can or cannot do.

So when I was referring to the few binary digits that correspond to "most of mathematics", I'm talking about the I(ZFC : H) bits of mutual algorithmic information between ZFC and H. We know there is a 745 state binary Turing machine that encompasses all of ZFC’s behavior (see Scott Aaronson's posts about this), but many people think the minimal size of a machine that computes all that ZFC can prove about the behavior of Turing machines is far smaller than that.


This is a good point, and yet -- as high as position 24 (out of 208) in the list, we have "bsc-m03 0.4.0" at only ~160ns per byte compressed (compared with most above it being two or three orders of magnitude slower). Below position 24, there's one in the 20s of ns per byte. Looks like the cost was in the size of the compression program. Not sure if there was any "cheating" going on, hunting for benchmark scores with these two famous test bodies of text.

By comparison, the top contender clocks in at ~240,000 ns per byte (making your point).


We use bsc and nanozip for archiving DB backups (usually 1-100GB files) and have not found anything remotely close to these as CPU usage vs compression ratio.

    c:> bsc.exe e "%~1" "%~1.bsc" -b1000 -m4e1t -M4H20
This uses 3GB of RAM and compresses w/ about 300MB/s but is slow on decompression while

    c:> nz.exe a -cd -p2 -m1024m "%~1.nz" "%~1"
uses about 1GB of RAM and compresses w/ about 250MB/s to worse compression ratios but is much faster than bsc on decompression.

Both of these smash zstd and 7-zip on compression and speed -- something like 2x better and 10x faster.


One interesting application could be instant messaging over extremely bandwidth constrained paths. I wouldn’t be surprised if Apple were doing something like this for their satellite-based iMessage implementation.

Of course it could also just be a very large “classical” shared dictionary (zstd and brotli can work in that mode, for example).
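
Roughly what that classical mode looks like with the `zstandard` Python bindings (if I'm remembering the API right); the "shared" strings here are made up, and in practice you'd train a real dictionary from message logs with `zstd.train_dictionary`:

    import zstandard as zstd

    # Pretend both ends agreed on this content out of band; real systems would train a proper dictionary.
    shared = zstd.ZstdCompressionDict(
        b"on my way | running late | see you at | call me when you land | ok sounds good"
    )
    cctx = zstd.ZstdCompressor(level=19, dict_data=shared)
    dctx = zstd.ZstdDecompressor(dict_data=shared)
    msg = b"running late, see you at 7, call me when you land"
    packet = cctx.compress(msg)
    assert dctx.decompress(packet) == msg
    print(len(msg), "->", len(packet))   # for truly tiny messages you'd also want to strip zstd's frame header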


Unfortunately, while you want encryption on your messages, the required authentication signature eclipses the message size.

Without signing the messages (and just using a stream cipher), you could do it, but an adversary can send garbage pretending to be you, which I don't think is acceptable in the modern world.

The message headers (message length, from, to) also start to dominate, and compressing them is hard (usually it depends on various hardware tricks like differing CDMA keys).


For such low-bandwidth applications, you'll usually want to do the key exchange while you have a faster connection, as well as build a dictionary of expected recipients etc.

Once you have a pre-exchanged symmetric key and IVs etc., encryption can be done with zero overhead, and you can choose your own trade-off between authentication security and message size. Even 4 bytes go a long way (an attacker would need to send on the order of 2^31 messages over a very bandwidth-constrained channel to impersonate you), and 8 bytes make it safe enough for practically all applications.

That way, you can keep the authentication overhead very small (on the order of a few bytes), similar to how it's done for e.g. SRTP.
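
A toy sketch of that shape, with a pre-shared key, standard-library primitives, and a tag truncated to 4 bytes; the keystream construction and single shared key are deliberately simplified (a real design would use ChaCha20/AES-CTR and derive separate encryption/MAC keys):

    import hmac, hashlib, os

    TAG_BYTES = 4   # forging a tag takes on the order of 2^31 tries over the constrained link

    def seal(key: bytes, counter: int, plaintext: bytes) -> bytes:
        nonce = counter.to_bytes(8, "big")
        keystream = hashlib.shake_256(key + nonce).digest(len(plaintext))   # toy stream cipher
        ct = bytes(a ^ b for a, b in zip(plaintext, keystream))
        tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()[:TAG_BYTES]
        return ct + tag

    key = os.urandom(32)          # exchanged earlier, while a fast link was available
    msg = b"omw, eta 20 min"
    packet = seal(key, 1, msg)
    print(len(msg), len(packet))  # overhead is just the 4-byte tag (the counter is implicit/synced)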


Depends on the application! For long term storage of rarely accessed text it might make sense



It does seem as though the training cost of these models ought to be included in their times, because the compressors are useless for any purpose other than compressing English Wikipedia.


The training cost is included because these NN compressors are running the training while they are compressing. They are general compressors, although the specific variants submitted here may be tuned a little to Wikipedia (e.g. the pre-processors).
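
A rough sketch of what "training while compressing" means: encoder and decoder run the identical model-update loop on the data seen so far, so no weights ever need to be transmitted. Here the "model" is just an adaptive order-0 byte-frequency table, and output is counted as ideal arithmetic-coded bits rather than actually emitted; nncp swaps the table for a neural net but keeps the same symmetric protocol:

    import math
    from collections import Counter

    def ideal_compressed_bits(data: bytes) -> float:
        counts = Counter({b: 1 for b in range(256)})   # uniform prior over byte values
        total, bits = 256, 0.0
        for b in data:
            bits += -math.log2(counts[b] / total)      # code length under the current model
            counts[b] += 1                             # "training" happens during compression,
            total += 1                                 # and the decoder repeats the exact same updates
        return bits

    text = b"the quick brown fox jumps over the lazy dog. " * 2000
    print(ideal_compressed_bits(text) / 8 / len(text))   # compressed bytes per input byte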


In these cases, the models are so small, and have to run on very limited CPU, that the training time is going to be fairly negligible. The top-listed program, nncp, is just 0.06b parameters! https://bellard.org/nncp/nncp_v2.pdf#page=4


Imagine if we didn't have dedicated hardware for, say, caching: every CPU instruction or bit of data would have to be read from SSD on every use.

We'd see a 100,000x slowdown I expect.

But dedicated hardware (RAM, various cache levels) solved this.

We could do the same for neural net compression if we needed to. But the question is do enough people care enough about a 2x data size reduction?


Dedicated hardware isn't magic because there's an intrinsic cost that cannot be reduced. In the case of neural nets you have to store the weights somewhere and you have to perform the multiplications.


I get that all dicts are now effectively a `collections.OrderedDict`, but I've never seen practical code that uses the insertion order. You can't do much with that info (no positional `.pop()`, can't sort a dict without recreating it) beyond maybe helping readability when you print or serialize it.


Readability, reproducibility, and simple LRU.

Reproducibility matters for tests, they become simpler. Some other algorithms become simpler as well.

An LRU is just an order-preserving dict: on access, delete and insert again.
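
A minimal sketch of that, leaning only on the guaranteed insertion order of a plain dict (the class and names are mine):

    class LRUCache:
        """LRU built on a plain dict's guaranteed insertion order (Python 3.7+)."""
        def __init__(self, capacity: int):
            self.capacity = capacity
            self._d = {}
        def get(self, key, default=None):
            if key not in self._d:
                return default
            value = self._d.pop(key)   # delete and re-insert to mark as most recently used
            self._d[key] = value
            return value
        def put(self, key, value):
            if key in self._d:
                del self._d[key]
            elif len(self._d) >= self.capacity:
                del self._d[next(iter(self._d))]   # first key == least recently used
            self._d[key] = value
    # usage:
    cache = LRUCache(2)
    cache.put("a", 1); cache.put("b", 2)
    cache.get("a")                      # touching "a" makes "b" the eviction candidate
    cache.put("c", 3)
    print(list(cache._d))               # ['a', 'c']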


Diamonds are forever, but they fare poorly when they meet a hydraulic press (source: Hydraulic Press Channel) or a chemist who wants "real diamond water" (source: NileRed).

