ilove196884's comments

I hate how paper titles are worded like SEO techniques.

Turn something into a metric and it will be misused. Ever always was

Attention is all you need!

138,000 papers are titled "X is all you need"; now it's 138,001.

https://scholar.google.com/scholar?hl=ro&as_sdt=0%2C5&q=%22i...


640 if you use intitle, exclude citations, and filter to 2017 onwards

The attention economy and its consequences.

Unexpected Kaczynski.

> Ever always was

Always has been.

silent gunshot


This is a riff on the original "Attention Is All You Need" paper; there have been a few of these lately.

A few? A multitude.

This one might be right if they have in fact unified multiple attention approaches into a single framework.

see Section 3.4


Having a catchy title is great for shorthand. If they didn't have such catchy names, I probably wouldn't remember Flush+Reload, Spectre, or even Attention Is All You Need.

On the one hand, sure, it's dumb.

But, on the other hand, it's hard to get researchers to read your paper, esp. in fast-moving areas. Every little thing might be the difference between reading the abstract or not. Reading the abstract might lead to reading the intro. And so on.

So, for better or worse, the competition for human eyeballs is real.

Ironically, in this case, "attention" is all that the authors want.


I know this might sound scripted or cliché, but what is the use case for DBOS?

The main use case is building reliable programs: for example, orchestrating long-running workflows, running cron jobs, and coordinating AI agents with a human in the loop.

DBOS makes external asynchronous API calls reliable and crash-proof, without relying on an external orchestration service.
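To make that concrete, here is a minimal sketch of the durable-workflow pattern described above. The decorator names (@DBOS.workflow, @DBOS.step) are taken from what I believe is the DBOS Transact Python SDK; treat the setup and configuration details as approximate and check the docs.

```python
# Sketch of a durable workflow; assumes the DBOS Transact Python SDK
# and its default configuration (Postgres connection, etc.).
from dbos import DBOS

DBOS()  # initialize with default configuration

@DBOS.step()
def charge_card(order_id: str) -> str:
    # External API call. Once this step completes, its result is checkpointed,
    # so a crash afterwards will not re-run the charge on recovery.
    return f"receipt-for-{order_id}"

@DBOS.step()
def send_confirmation(receipt: str) -> None:
    print(f"sent confirmation for {receipt}")

@DBOS.workflow()
def checkout(order_id: str) -> None:
    # Each step's output is recorded in the database; if the process dies
    # between steps, the workflow resumes from the last completed step.
    receipt = charge_card(order_id)
    send_confirmation(receipt)

if __name__ == "__main__":
    DBOS.launch()
    checkout("order-42")
```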


See, I absolutely dislike this notion that hyperscalers can easily beat Nvidia. It is not their domain of expertise. TPUs are nowhere near GPUs in performance. People really underestimate Nvidia's expertise and strengths.


They don't need to beat them on performance, though. If you get half the performance at a third of the price, you can just make more chips and be fine. It's not like Google is gonna run out of datacenter space.
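A back-of-the-envelope check of that ratio, with hypothetical numbers normalized to the GPU, just restating the comment's premise:

```python
# "Half the performance at a third of the price" in performance per dollar.
gpu_perf_per_dollar = 1.0 / 1.0          # baseline: 1 unit of perf per unit price
tpu_perf_per_dollar = 0.5 / (1.0 / 3.0)  # half the perf, a third of the price

print(gpu_perf_per_dollar)  # 1.0
print(tpu_perf_per_dollar)  # 1.5 -> ~50% more throughput per dollar, if you can scale out
```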


TPU support is very minimal. Also XLA just does not compare to CUDA yet.


All of this may very well be true — but it doesn't matter. Google is getting results very similar to the rest of the pack, probably at a fraction of the cost. The implementation doesn't matter so long as you get the results.


And yet Google seems to be doing just fine while spending less on GPUs than other big tech, so reality doesn't seem to align with your comment.



Mostly for external cloud customers, so it doesn't change my point: TPUs have been more than enough for Google to lead the pack.


Also, TPU v1, v2, and v3 were ASICs, but since v4 they have added new features, so performance per watt has dropped and power draw is now quite near Nvidia's. I think Hopper is at 700 W and TPUs are around 600 W.


Power draw doesn't matter in the cloud. All that matters is performance per price for the task at hand.


In the cloud, opex == energy.


Which is why you don't send 4 TPUs to do 1 GPU's job.


If the TPUs cost less, that implies they draw less power. If they cost more, then nobody will use them.


They are overhyped and not as performant as Nvidia regardless of marketing.


Meta is an Nvidia shop for training and an AMD shop for inference. Strict separation between vendors.


Meta is still all GPUs. Amazon's Trainium 1 was a failure, and they are trying again with Trainium 2. Google is a TPU shop but still buys GPUs for cloud.


With what hardware? These PQC standards are still slow and definitely not ready for deployment.


I would honestly love to see full documentation and a history of MySQL use at Meta, considering it is the largest MySQL shop in the world. It would give so many small tricks and ideas.


Damn. They really implemented an MHD generator. That's like the holy grail in hypersonics; active flow control is highly sought after there.

