Hacker News | dudus's comments

There's a lot. But you don't need it all. Just choose well and keep it simple.

These days I find it very easy to recommend Astro, for instance. It is batteries-included enough that you really don't need anything else.

Bun, Deno, and Hono are other projects that simplify a lot of the tooling by including most of what you need out of the box.

I'd argue that if you choose any of the options above web dev is easier than ever.


Seriously? Bun has over 500 open issues with the label "crash". https://github.com/oven-sh/bun/issues?q=is%3Aissue%20state%3...

The socket implementation is broken too https://github.com/oven-sh/bun/issues/5627


I'd suggest Tailscale might be what you are looking for to share a VLAN with family.


Adding on to this: you can host your own Tailscale-compatible server with Headscale, if you want to be totally independent.


I’m going to dig into Headscale with Authelia for OIDC; that’s pretty close to what I was imagining. Found this tutorial [0] (wow, posted just 4 days ago). Thanks, y'all.

[0] https://www.reddit.com/r/selfhosted/comments/1ic1w4q/headsca...


Congress has "supposedly" seen the evidence, but it's confidential. More than that, they seem to be acting on something that could happen rather than on something that has already happened.


They had the option to divest to an American entity, but failed or didn't want to do it.

You have the freedom of speech to manipulate and be anti-democratic as long as you are the US government or bound by its control.


Actually, the option to divest is to escape control by the Chinese government, not to enter control by the US government.


You can buy SaaS kits that include a frontend with pricing pages, backend and all code necessary to wrap any API and resell at a profit.


When you do that you are just putting the whole book into the context for OpenAI to reason about. That works if the work is smaller than the context window.

For longer documents or for groups of documents you need a kind of search to extract the most relevant passages to throw in the context.

That is RAG: that search is the "retrieval" part.

It's usually a semantic search over embeddings created by a specialized embedding model, plus a chunking algorithm that splits the document into smaller pieces from which meaning can be derived.

So you have multiple pieces involved in the job: a chunker, an embedder model, a vector database, etc.
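
A minimal sketch of how those pieces fit together, assuming sentence-transformers as the embedder and FAISS as the vector store (both are illustrative picks, as are the model name, chunk size, and file name; any equivalents work):

    # Minimal RAG pipeline sketch. Assumes: pip install sentence-transformers faiss-cpu
    import faiss
    from sentence_transformers import SentenceTransformer

    def chunk(text, size=500, overlap=50):
        # Naive fixed-size chunker; real systems often split on sentences or sections.
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")   # the embedder model

    book = open("book.txt").read()
    chunks = chunk(book)
    vectors = embedder.encode(chunks, normalize_embeddings=True)

    index = faiss.IndexFlatIP(vectors.shape[1])          # the "vector database"
    index.add(vectors)

    # Retrieval: embed the question, pull the top-k most similar chunks,
    # and put only those into the LLM prompt instead of the whole book.
    question = "What happens to the protagonist in chapter 3?"
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(q, 5)
    context = "\n\n".join(chunks[i] for i in ids[0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"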


I have a question about RAG in general (I am quite ignorant regarding LLMs and am trying to reason about possible hurdles before starting to experiment).

I would like to "train" an LLM with a few thousand analysis documents that detail the various enhancements applied to an in-house app over the last twenty years.

My question is: some of the modules that are part of my app have been totally revamped, sometimes more than once. So while the general requirements for module Foo are more or less consistent, documents talking about it from 2005 to, say, 2018 describe either bug fixes or small enhancements. In 2019 our main Foo provider completely changed their product and therefore the interface, so the 2 docs talking about Foo in 2019 are "more authoritative" than anything before that date... but then COVID happened, so we now have Foo 3.0, which was implemented in late 2022 and is now being idly maintained with, again, small enhancements and fixes.

Documents have IDs which include an always-increasing number (they start their life as Jira issues), so just saying "newer = more accurate/valid/authoritative" could help, but I hope we do not need to rank/tag/grade every single document manually in order to assess how much weight it has on any specific topic.
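
What I vaguely had in mind is something like the sketch below (field names and the 0.2 weight are made up): keep the Jira-style ID as metadata on each chunk and blend it into the retrieval score, so newer documents about the same topic outrank older ones.

    # Hypothetical recency-aware reranking: blend semantic similarity with a
    # recency boost derived from the always-increasing document ID.
    def rerank(hits, newest_id, recency_weight=0.2):
        # hits = [{"text": ..., "doc_id": 4812, "similarity": 0.83}, ...]
        ranked = []
        for h in hits:
            recency = h["doc_id"] / newest_id        # 0..1, newer -> closer to 1
            score = (1 - recency_weight) * h["similarity"] + recency_weight * recency
            ranked.append((score, h))
        return [h for _, h in sorted(ranked, key=lambda t: t[0], reverse=True)]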

Is this something that needs special treatment or will it just "work"?


I fully agree and have said it for years.

Microsoft is the main culprit of DNT failures.


DNT is a failure since it relies on advertiser self-regulation. We shouldn't ask them not to track us, we should make it very hard for them to do it.


It's a failure because lawmakers haven't made it clear that ignoring it is illegal.

You probably can build a case around DNT clearly communicating that a user doesn't want to be tracked, and as such it should be treated as if the user had manually opted out of all tracking.

But as long as lawmakers or courts don't pin it down legally, it remains some gray-area thing with a lot of wiggle room instead of a clear-cut case.

There is very little you can do against modern tracking tech without crippling browser functionality, so solutions have to be law-based first and foremost, supplemented with technology and with genuinely painful penalties if companies try to sneak by.


> It's a failure because lawmakers haven't made it clear that ignoring it is illegal.

How can I, as an end user, check if a site is ignoring it and therefore report it to the proper authorities?


We should make it both technically hard and illegal for the surveillance industry to track us. Corporations continue to reinvent de facto government from the bottom up, and if most Americans weren't too distracted freebasing the fallacy that corporations and government are opposing forces we might be able to preserve individual liberty.


Advertisers are the cause of DNT's failure, not Microsoft.


It's always both, the people willing to pay someone to make things worse, and the people willing to take the money to do it.


I mean I agree, I'm against advertisers.

But advertisers exist and will continue to exist, and have no incentive to follow this. I don't think either are at fault necessarily; I think it was a weak attempt all around.

The only thing that will get companies to comply are a/ laws (and so far all laws have done is annoy end users) b/ browsers doing more to block tracking (which is almost impossible; this will forever be a game of cat-and-mouse).


DNT was always doomed to fail. MS just forced the issue.


The longer the context the more backtracking it needs to do. It gets exponentially more expensive. You can increase it a little, but not enough to solve the problem.

Instead you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context.
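
For what it's worth, a rough sketch of that retrieval step with nothing but numpy, assuming the chunks are already embedded (embed() here is a stand-in for whatever embedding model you use):

    # Toy semantic search over pre-embedded chunks. embed() is a placeholder
    # for your embedding model/API; chunks and chunk_vecs are assumed to exist.
    import numpy as np

    def top_k_chunks(question, chunks, chunk_vecs, embed, k=4):
        # Cosine similarity between the question and every chunk vector.
        q = embed(question)
        q = q / np.linalg.norm(q)
        mat = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
        scores = mat @ q
        best = np.argsort(-scores)[:k]
        return [chunks[i] for i in best]

    # The prompt then contains only the relevant bits, not the whole corpus:
    # prompt = "Context:\n" + "\n\n".join(top_k_chunks(question, chunks, vecs, embed))
    #          + "\n\nQuestion: " + question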

The LLM is a cool tool, but you need to build around it. OpenAI should start shipping these other components so people can build their own solutions, and make its money selling shovels.

Instead they want end users to pay them to use the LLM without any custom tooling around it. I don't think that's a winning strategy.


This isn't true.

Transformer architectures generally take quadratic time wrt sequence length, not exponential. Kernel-level optimizations like FlashAttention also mitigate this somewhat.

Backtracking isn't involved, transformers are feedforward.

Google advertises support for 128k tokens, with 2M-token sequences available to folks who pay the big bucks: https://blog.google/technology/ai/google-gemini-next-generat...


During inference time, yes, but training time does scale exponentially as backpropagation still has to happen.

You can’t use fancy flash attention tricks either.


No, additional context does not cause exponential slowdowns, and you absolutely can use FlashAttention tricks during training; I'm doing it right now. Transformers are not RNNs: they are not unrolled across timesteps, so the backpropagation path for a 1,000,000-token-context LLM is not any longer than that of a 100-token-context LLM of the same size. The only thing that is larger is the self-attention calculation, which is quadratic wrt compute and linear wrt memory if you use FlashAttention or similar fused self-attention calculations. These calculations can be further parallelized using tricks like ring attention to distribute very large attention calculations over many nodes. This is how Google trained their 10M-context version of Gemini.
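
Back-of-the-envelope numbers for the scaling claim (dimensions are illustrative, constants dropped, attention only):

    # Rough scaling of self-attention alone: compute grows quadratically with
    # context, KV-cache memory grows linearly. d_model/n_layers are illustrative.
    def attn_cost(context, d_model=4096, n_layers=32):
        flops = n_layers * 2 * context * context * d_model   # QK^T and attn@V matmuls
        kv_bytes = n_layers * 2 * context * d_model * 2      # K and V in bf16, no GQA
        return flops, kv_bytes

    for ctx in (1_000, 10_000, 100_000, 1_000_000):
        f, m = attn_cost(ctx)
        print(f"{ctx:>9} tokens: ~{f:.1e} attn FLOPs, ~{m / 1e9:.1f} GB KV cache")
    # 10x the context -> ~100x attention FLOPs (quadratic) and ~10x KV memory
    # (linear); nothing here is exponential, and the backprop graph depth is
    # unchanged because the model is not unrolled over timesteps.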


So why are the context windows so "small", then? It would seem that if the cost was not so great, then having a larger context window would give an advantage over the competition.


The cost for both training and inference is vaguely quadratic while, for the vast majority of users, the marginal utility of additional context is sharply diminishing. For 99% of ChatGPT users, something like 8192 tokens, or about 20 pages of context, would be plenty. Companies have to balance the cost of training and serving models. Google did train an uber-long-context version of Gemini, but since Gemini itself was not fundamentally better than GPT-4 or Claude, this didn't really matter much: so few people actually benefited from such a niche advantage that it didn't shift the playing field in their favor.


Marginal utility only drops because effective context is really bad, i.e. most models still vastly prefer the first things they see and those "needle in a haystack" tests are misleading in that they convince people that LLMs do a good job of handling their whole context when they just don't.

If we have the effective context window equal to the claimed context window, well, I'd start worrying a bit about most of the risks that AI doomers talk about...


There has been a huge increase in context windows recently.

I think the larger problem is "effective context" and training data.

Being technically able to use a large context window doesn't mean a model can actually remember or attend to that larger context well. In my experience, the kinds of synthetic "needle in haystack" tasks that AI companies use to show how large of a context their model can handle don't translate very well to more complicated use cases.

You can create data with large context for training by synthetically adding in random stuff, but there's not a ton of organic training data where something meaningfully depends on something 100,000 tokens back.

Also, even if it's not scaling exponentially, it's still scaling: at what point is RAG going to be more effective than just having a large context?


Great point about the meaningful datasets, this makes perfect sense. Esp. in regards to SFT and RLHF. Although I suppose it would be somewhat easier to do pretraining on really long context (books, I assume?)


Because at that point you have to do inference distributed across multiple nodes: for prefill, because prefill is actually quadratic, but also for memory reasons. The KV cache for 405B at 10M context length would take more than 5 terabytes (at bf16). That's 36 H200s just for the KV cache, but you would need roughly 48 GPUs to serve the bf16 version of the model. Generation speed at that setup would be roughly 30 tokens per second, 100k tokens per hour, and you can serve only a single user because batching doesn't make sense at these kinds of context lengths. If you pay 3 dollars per hour per GPU, that's $1440 per million tokens. For the fp8 version the numbers are a bit better: you need only 24 GPUs and generation speed stays roughly the same, so it's only about 700 dollars per million tokens. There are architectural modifications that will bring that down significantly, but nonetheless it's still really, really expensive, and also quite hard to get to work.
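
(For reference, the KV-cache arithmetic behind that 5 TB figure, using the published Llama 3.1 405B shape of 126 layers and 8 KV heads of dim 128; the rest is back-of-the-envelope.)

    # Rough KV-cache size for a 405B-class model at 10M context, bf16.
    n_layers, n_kv_heads, head_dim = 126, 8, 128
    bytes_per_value = 2                      # bf16
    context = 10_000_000

    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value   # K and V
    total = per_token * context
    print(per_token / 1024, "KiB per token")                 # ~504 KiB
    print(total / 1e12, "TB for the full cache")             # ~5.2 TB
    print(total / 141e9, "H200s (141 GB each) just for KV")  # ~37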


Another factor in context window is effective recall. If the model can't actually use a fact 1m tokens earlier, accurately and precisely, then there's no benefit and it's harmful to the user experience to allow the use of a poorly functioning feature. Part of what Google have done with Gemini's 1-2m token context window is demonstrate that the model will actually recall and use that data. Disclosure, I do work at Google but not on this, I don't have any inside info on the model.


Memory. I don't know the equation, but it's very easy to see when you load a 128k-context model at 8K vs 80K. The quant I am running would double VRAM requirements when loading 80K.


This was my understanding too. Would love more people to chime in on the limits and costs of larger contexts.


> The only thing which is larger is the self attention calculation which is quadratic wrt compute and linear wrt memory if you use FlashAttention or similar fused self attention calculations.

FFWD input is self-attention output. And since the output of self-attention layer is [context, d_model], FFWD layer input will grow as well. Consequently, FFWD layer compute cost will grow as well, no?

The cost of FFWD layer according to my calculations is ~(4+2 * true(w3)) * d_model * dff * n_layers * context_size so the FFWD cost grows linearly wrt the context size.

So, unless I misunderstood the transformer architecture, larger the context the larger the compute of both self-attention and FFWD is?
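
Plugging illustrative numbers into that, as shown below, the FFWD term does grow with context, just linearly, while the attention term grows quadratically:

    # Compare how FFWD and self-attention FLOPs grow with context length.
    # d_model, d_ff, and n_layers are illustrative; only the growth rates matter.
    def layer_flops(context, d_model=4096, d_ff=14336, n_layers=32, gated=True):
        ffwd = (4 + 2 * gated) * d_model * d_ff * n_layers * context   # linear in context
        attn = 4 * context * context * d_model * n_layers              # quadratic in context
        return ffwd, attn

    for ctx in (1_000, 10_000, 100_000):
        ffwd, attn = layer_flops(ctx)
        print(f"{ctx:>7}: FFWD ~{ffwd:.1e} FLOPs, attention ~{attn:.1e} FLOPs")
    # 10x the context -> 10x FFWD FLOPs but 100x attention FLOPs.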


The FFWD layer is independent of context size; each processed token passes through the same weights.


So you're saying that if I have a sentence of 10 words, and I want the LLM to predict the 11th word, FFWD compute is going to be independent of the context size?

I don't understand how, since that very context is what determines whether the output of the next prediction is any good or not?

More specifically, FFWD layer is essentially self attention output [context, d_model] matrix matmul'd with W1, W2 and W3 weights?


I may be missing something, but I thought that each context token would result in 3 additional parameters for self-attention to build its map, since each attention step must calculate a value considering all existing context


I’m confused. Backprop scales linearly w


> you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context

Be aware that this tends to give bad results. Once RAG is involved you essentially do only slightly better than a traditional search; a lot of nuance gets lost.


This depends on the amount of context you provide, and the quality of your retrieval step.


> Instead you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context.

Isn't that kind of what Anthropic is offering with projects? Where you can upload information and PDF files and stuff which are then always available in the chat?


They put the whole project in the context, which works much better than RAG when it fits: 200k context for their Pro plan, and 500k for Enterprise.


I don't know whether you're using "exponential" in the general English-language sense of the word, but it does not get exponentially more expensive.


Seems like a good candidate for a "dumb" AI you can run locally to grab the data you need and filter it down before giving it to OpenAI.


Very nice effect. I love these more creative realistic elements.

This one reminds me of this cool card effect

https://poke-holo.simey.me/


Funny enough I also implemented the 3D cards in Forza Motorsport 7 :p


Clicking around, I found this breakdown [1]. Expertly crafted, and really cool how the optical illusion background does a lot of the heavy lifting!

---

1: https://www.joshdance.com/100/day50/


Python's strength is not the syntax but the standard and 3rd-party libraries.


3rd-party libraries don't spring from nowhere, and no language starts out having them in abundance. People have to be motivated enough to write all those libraries in the first place, and a lot of them are written just to use Python syntax over C code.

I'm not saying it's the only reason to choose Python now, but it's definitely among the biggest reasons.


Not necessarily. Python had the best C-API; that was the main reason. If Nim or Lisp copied that C-API, people might move.

It is safe to say in 2024 that people do not want FFIs.


Yes but we're still getting to the same point.

Why not just call the C code you've already written in C? Because they would rather use Python (or Python-like) syntax.

I don't think we actually disagree here. Even your point about the better C-API doesn't indicate that syntax wasn't a deciding factor, just that one of several options had better compatibility.

