Hacker News | zwaps's comments

Depends on country. In the north, no way.

What a terrible article

That's not a lot compared to 8xA100

A100s are definitely still the best, but IBM's offering isn't a joke. Depending on what you're trying to do, it might be a good way to go.

We'd probably need to see pricing for IBM's offering, because it's possible it'll be eye-wateringly high compared to buying even A100s.

TBH, I don't really think they compete in anything like similar markets.

You buy a DGX A100, or a cluster of them, for training and running large deep learning models (or for doing "traditional" HPC).

IBM's solution is more a small inference engine that is part of the CPU, so you don't need to move your data off-chip when doing a little bit of inferencing as part of some other workflow. I don't work with mainframes so I could be talking out of my behind, but maybe something like DL-assisted fraud detection as part of processing bank transactions?

Also all Star Trek Federation ships

It really isn’t

I bought a washing machine from them and it had continuous problems for several months, until I got a refund.

The technician advised me to buy from Miele, Siemens or Bosch, as Samsung apparently has lots of issues.

Printers. Dishwashers. Washing machines.

The unholy trinity of appliance hell. Every brand that makes these has issues. If you get 3-5 years of use out of any of them (post ~2005), you're lucky.

I'm firmly convinced that every washing machine or dishwasher brand just wants to steal from you

Not my personal experience but it helps to talk to repair guys.

I learned that a lot of machines break down because of the combination of low-temperature washing and the types of soap/detergent people use. It clogs up the machines, and without regular maintenance, the occasional hot wash, or better soap, it destroys components.

Would love to know which soaps are good and which are bad, and also why hot temperatures help - I would have assumed that high temperatures stress the components more than low temperatures.

English is not my first language so this is going to be hard to get across.

Apparently low temperatures do not fully dissolve modern soap (which is thick and heavily perfumed), leaving behind a lot of residue. An occasional hot wash clears it out.

I don't remember the name of the better soap, but basically what he says is that for clothes that are just a bit smelly but not really dirty/stained, modern soap is massive overkill. It also doesn't need all this perfume. Your clothes are fine smelling neutral, they don't have to smell like a day in the Alps.

I'd really advise talking to a local maintenance guy; they can probably explain it much better.

As a layman, my understanding is that soap (or other residue) builds up due to low temperature washing. High temperature washes break down the build up.

I believe most front loaders these days have both a self clean cycle (you're supposed to run it every month or two, it's basically an extremely long hot water rinse+spin that you don't add soap or put clothes in for), and a drain filter that should be accessible near the bottom front (expect black slime if you haven't cleaned the filter recently).


Most economists (who write these sorts of textbooks) have some sort of math background. The push to find the most general "math" setting has been an ongoing project since the '50s, so you can probably find what you are looking for. It's not part of undergraduate textbooks, since adding generality gives better proofs but often adds "not that much" insight. Nevertheless, the standard micro/macro models are just applications of optimization theory (typically lattice theory for micro, dynamical systems for macro). Game theory (especially mechanism design) is a bit of a different topic, but I suppose that's not what you are looking for.

E.g., micro models are just constrained optimization based on the idea of representing preference relations over abstract sets with continuous functions. So obviously, the math is then very simple. This is considered a feature. You can also use more complex math, which helps with certain proofs (especially existence and representation).
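To make the "constrained optimization" point concrete, here is a toy sketch (all numbers hypothetical, not from any textbook): Cobb-Douglas utility maximization along a budget line, solved by brute-force grid search and checked against the closed-form Marshallian demand.

```python
import numpy as np

# Toy consumer problem: maximize U(x, y) = x^a * y^(1-a) subject to
# the budget px*x + py*y = m (the optimum exhausts the budget, so we
# can search along the budget line itself).
a, px, py, m = 0.3, 2.0, 5.0, 100.0

# Parametrize the budget line by x and search over a fine grid.
x = np.linspace(1e-6, m / px - 1e-6, 200001)
y = (m - px * x) / py
u = x**a * y**(1 - a)
x_star, y_star = x[np.argmax(u)], y[np.argmax(u)]

# Closed-form Marshallian demand for Cobb-Douglas preferences:
# x* = a*m/px = 15, y* = (1-a)*m/py = 14
print(round(x_star, 2), round(y_star, 2))  # prints approximately: 15.0 14.0
```

The grid search is deliberately naive; the point is only that the whole "consumer problem" is nothing more than an optimization with a linear constraint.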

You could grab some higher level math for econ textbooks, which typically include the models as examples, where you skip over the math.

For example, for micro, you can get the following: https://press.princeton.edu/books/hardcover/9780691118673/an... I think it treats the typical micro model (up to oligopoly models) via the first 50 or so pages while explaining set theory, lattices, monotone comparative statics with Tarski/Topkis etc.

I think everyone who comes from a different literature where academic "rigor" is higher and similar results already exist (in the author's case, he is aware of kernel results) is infuriated by ML papers like "Attention Is All You Need".

They are, in fact, not really good academic papers. Finding a clever name and then choosing the most obtuse engineering-cosplay terms does not make a good paper; it just makes it difficult to read. And so, next, many well-known results get rediscovered to much acclaim in ML and head-scratching elsewhere.

For example, yes, they are kernel matrices. Indeed, the connection between reproducing kernel Hilbert spaces and attention matrices has been exploited to create approximating architectures that are linear (not quadratic) in memory requirements for attention.
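For readers curious what that kernel trick looks like, here is a minimal NumPy sketch of linear attention in the spirit of the "transformers are RNNs" line of work; the elu+1 feature map and the shapes are illustrative assumptions, not any exact published architecture.

```python
import numpy as np

def phi(x):
    # A simple positive feature map (ELU + 1), a common choice in
    # linear-attention papers; it stands in for the softmax kernel.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Kernelized attention: softmax(Q K^T) V is approximated by
    # phi(Q) @ (phi(K)^T V) / (phi(Q) @ sum_k phi(K)),
    # so the (n, n) score matrix is never formed; memory is O(n*d + d^2).
    Qf, Kf = phi(Q), phi(K)           # (n, d) each
    kv = Kf.T @ V                     # (d, d) summary, independent of n
    z = Qf @ Kf.sum(axis=0)           # (n,) normalizer
    return (Qf @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Because `kv` is a fixed-size summary, the same recurrence can be computed left to right, which is exactly why these models can be read as RNNs.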

Or, as the author of the article also recognizes, the fact that attention matrices are also adjacency matrices of a directed graph can be used to show that attention models are equivariant (or unidentified, as the author says) and are therefore excellent tools to model graphs (see: the entire literature on geometric deep learning) and rather bad tools to model sequences of text.

LLMs may or may not collapse to a single centroid if the amount of text data and parameters and whatever else are not in some intricate balance that nobody understands, and so they are inherently unstable tools.

All of this is true.

But then, here is the infuriating thing: all this matters very little in practice. LLMs work, and on top of that, they work for stupid reasons!

The problem of "identification" was quickly solved by another engineering feat, which was to slap on "positional embeddings". As usual, this too didn't happen because there was a deep mathematical understanding. Rather, it was attempted and it worked.

Or, take the "efficient transformers" that "solve" the issue of quadratic memory growth by using kernel methods. Turns out, in practice, it just doesn't matter. OpenAI, or Anthropic, or Meta simply do not care about slapping on another thousand GPUs. They care about throughput. The only efficiency innovation that really established itself was fusing kernels (GPU kernels, that is) in a clever way to make it go brrrrr. And as clever as that is, there's little deep math behind it.

Results are speculation and empirics. The proof is in the pudding, which is excellent.

> The proof is in the pudding, which is excellent.

Not for long. Steam engines existed long before statistical mechanics, but we don't get to modernity without the latter.

Yet we have many medicines that we have empirically shown to work without a deep understanding of the mechanics behind them and we’re unlikely to understand many drugs, especially in psychiatry, any time soon.

Trial and error makes the universe go round.


> The problem of "identification" was quickly solved by another engineering feat, which was to slap on "positional embeddings". As usual, this too didn't happen because there was a deep mathematical understanding. Rather, it was attempted and it worked.

Wasn't that tried, because of robotics?

It's a commonly solved issue that the hand of a robot must know each joint's orientation in space. Typically, each joint (a degree of freedom) has a rotary encoder built in. There is more than one type, but the "absolute" version fits the one used in positional embeddings:


(full article: https://www.akm.com/global/en/products/rotation-angle-sensor... )

I find that parallel very fitting, since a positional embedding uses a sequence of sinusoidal shapes of increasing frequency. In the "learned positional embedding" GPTs (such as GPT-2), where the network is free to use anything it would like, it seems that it actually learns the same pattern as the predefined one (albeit a little bit more wonky).
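For reference, a sketch of the predefined sinusoidal embedding from the original transformer paper; the dimensions here are arbitrary, and the interleaved sin/cos layout is one common convention.

```python
import numpy as np

def sinusoidal_positions(n_pos, d_model):
    # Sinusoids whose wavelength grows geometrically across the embedding
    # dimension, so each position gets a unique "multi-scale" code --
    # much like the tracks of an absolute rotary encoder.
    pos = np.arange(n_pos)[:, None]                 # (n_pos, 1)
    i = np.arange(d_model // 2)[None, :]            # (1, d_model/2)
    freq = 1.0 / (10000 ** (2 * i / d_model))
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(pos * freq)
    pe[:, 1::2] = np.cos(pos * freq)
    return pe

pe = sinusoidal_positions(64, 16)
print(pe.shape)  # (64, 16)
```

These vectors are simply added to the token embeddings, which is the "slap it on" engineering fix discussed above.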

Transformers don't need quadratic memory for attention unless you scale the head dimension proportional to the sequence length. And even that can be tamed.

The arithmetic intensity of unfused attention is too low on usual GPUs; it's even more a memory bandwidth issue than a memory capacity issue. Just see how much faster FlashAttention is.
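A toy illustration of why the full score matrix never needs to exist at once; this is not FlashAttention itself (which also tiles over keys with an online softmax, fused into a single GPU kernel), just the blocking idea in NumPy:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def full_attention(Q, K, V):
    # Materializes the entire (n, n) score matrix.
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def blocked_attention(Q, K, V, block=32):
    # Process queries in blocks: peak score-matrix memory is (block, n)
    # instead of (n, n), with bitwise-identical results.
    chunks = []
    for i in range(0, Q.shape[0], block):
        s = softmax(Q[i:i + block] @ K.T / np.sqrt(K.shape[-1]))
        chunks.append(s @ V)
    return np.vstack(chunks)

rng = np.random.default_rng(2)
Q, K, V = rng.normal(size=(3, 100, 16))
print(np.allclose(full_attention(Q, K, V), blocked_attention(Q, K, V)))  # True
```

The speedups from the real fused kernels come from keeping these tiles in on-chip SRAM rather than round-tripping to HBM, which is the bandwidth point above.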

Thank you for this clarification. What do you think of geometric deep learning? What other more formal mathematical approaches/research are you aware of?

And on top of that, the nomenclature is really confusing.

+1 to that. It is as if the ML people went out of their way to co-opt existing statistical terminology with a slightly different spin, completely muddying the waters.

It's just because they did not study statistics, so they were unaware of it.


At my department, instructors were well aware of statistics. It was a prerequisite course on the AI path. Some early-day software (WEKA) used statistical nomenclature extensively.

The best part of DNNs, I think, is the brute-force backprop over essentially randomized feature generation (convolutions)… Statisticians would never do that.

My consulting company bills me out at a much higher multiple of my gross salary than that.

You should start working for yourself. If you want advice/help on how to start, my contact info is in my bio.

I used to be in that boat, now I'm in this boat.

I'm a part time freelance musician, and oddly enough there's a similar situation in the music business. Bandleaders have to pay the musicians a reasonable cut of the proceeds from gigs, or the musicians will split off and form their own band.

When I work for myself, the number of billable hours is cut in half due to the overhead of finding and negotiating new projects. I am in that boat and I have trouble keeping it afloat. I'd rather take a cut and work with an agency or a partner that finds the projects for me.

I've been working on streamlining these things for myself, given the exact same pain points and limitations, and would be happy to share what I've learned. My contact info is in my bio.

I'd like to hear more. It would be nice of you to write it in a comment to share with others that are interested instead of requesting DMs :)

I'm not requesting DMs.

I'm not comfortable sharing many specific details about my business publicly, in this forum. But I am comfortable sharing them with people who are where I was and want to get where I am. I'm happy to. I'm sure you understand.

In service of answering your question anyway: I'm a hardware product design engineer. I can take ownership of an entire complex piece of hardware (medical device, IT product, etc.) and architect it, design it mechanically and electrically, and do the systems engineering. I'm able to deliver entire complex hardware products that work well (more than well), can be mass manufactured, and meet cost targets. I've designed surgical robotics systems, artificial hearts and other class III implanted devices, and stuff in the Disney parks, Times Square, and even the Smithsonian. I have a website about myself with more information at www.iancollmceachern.com

Another alternative is to hire someone to manage the overhead and/or work on finding & negotiating new projects.

How does one get started with this? An agency already has a track record, and they also have multiple people they can draw on, so when they land a project they can manage it with low risk.

For an individual partner, we need to get to know each other first, and then that person needs to be willing to get paid only when a project is successful, because otherwise I'd face having to pay a full salary from the start, before we even have the first project.

How does one find such a person?

Salespeople you can pay on a commission basis, which separates the wheat from the chaff pretty quickly. There are companies like Sales Focus that specialize in it.

What is a "LangChain-like vector DB"? LangChain is not a vector DB.
