Hacker News | boywitharupee's comments

and is Griffin a state space model?


No, it's a combination of RNN and Transformer.


I mean, SSMs are in fact RNNs under the hood.


At the end of the day, either you carry around a hidden state, or you have a fixed window for autoregression.

You can call hidden states "RNN-like" and autoregressive windows "transformer-like", but apart from those two core paradigms I don't know of other ways to do sequence modelling.

Mamba/RWKV/Griffin are somewhere between those two extremes.
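
To make the two paradigms concrete, here's a minimal sketch in plain Python. The function names `rnn_like` and `window_like` and the toy arithmetic inside them are my own assumptions for illustration, not how Mamba/RWKV/Griffin actually compute anything: one function carries a single hidden state forward, the other only ever sees the last k inputs.

```python
from collections import deque

def rnn_like(xs, decay=0.5):
    """Carry one hidden state across the whole sequence (toy update rule)."""
    h = 0.0
    outputs = []
    for x in xs:
        # The state is a running summary of every input seen so far.
        h = decay * h + (1.0 - decay) * x
        outputs.append(h)
    return outputs

def window_like(xs, k=2):
    """Compute each output from a fixed window of the last k inputs only."""
    window = deque(maxlen=k)  # older inputs fall out automatically
    outputs = []
    for x in xs:
        window.append(x)
        outputs.append(sum(window) / len(window))
    return outputs

xs = [1.0, 0.0, 0.0, 0.0]
print(rnn_like(xs))     # the first input still influences every later output
print(window_like(xs))  # the first input vanishes once it leaves the window
```

The trade-off shows up even in this toy: the hidden state never forgets completely but has to squeeze everything into a fixed size, while the window is exact inside its span and blind outside it.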


i wonder if we can train a foundation model on this data which would eventually allow us to semantically search the codebase?


what are the memory and compute requirements for this?


but which model to tokenize with? is there a leaderboard for models that are good for RAG?


“For RAG” is ambiguous.

First there is a leaderboard for embeddings. [1]

Even then, it depends on how you use them. Some embeddings pack the highest signal at the beginning so you can truncate the vector, while most cannot. You might want that truncated version for a fast, dirty index. The same goes for using multiple models of differing vector sizes for the same content.
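
A rough sketch of the truncation idea, in plain Python. The 8-dim vectors below are made up for illustration (real ones come from an embedding model), and `truncate` and `cosine` are hypothetical helper names. It only makes sense for embeddings that were trained to front-load their signal; with most models the truncated scores would be unreliable.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def truncate(v, dims):
    """Keep the first `dims` components, then re-normalise."""
    return normalize(v[:dims])

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Hypothetical embeddings, not produced by any real model.
doc = [0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
query = [0.8, 0.4, 0.1, 0.2, 0.01, 0.03, 0.02, 0.0]

full = cosine(doc, query)                              # full-size comparison
coarse = cosine(truncate(doc, 4), truncate(query, 4))  # cheap first pass
print(full, coarse)
```

One way to use this is a two-stage lookup: shortlist candidates with the short vectors, then re-rank the shortlist with the full ones.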

Do you preprocess your text? There will be a model there. Likely the same model you would use to process the query.

There is a model for asking questions from context. Sometimes that is a different model. [2]


is this known as procedural generation?


Yes, that is probably its main use.

The author of the GitHub repo is a pretty successful game author and has used it many times for his games.

Here’s a talk by him about procedural generation using wave function collapse: https://m.youtube.com/watch?v=0bcZb-SsnrA


care to explain why attention has precision issues with fp8?


Oh so float8's L2 norm error relative to float32 is around 1e-4, I think, whilst float16's is around 1e-6. Sadly attention is quite sensitive. There are some hybrid methods which, just before the attention kernel (which is done in fp8), upcast the Q and K coming out of the RoPE kernel to float16 and leave V in float8. The rest is done in fp8 on the fly, and the output is fp8. This brings the errors down to around 1e-6.


Yes, but it's a bit more complicated. There are 2 FP8 formats: E5M2 and E4M3.

E5M2 follows IEEE 754 conventions. But to compensate for its smaller exponent, "E4M3’s dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs".

Some people reported E4M3 is better for the forward pass (small range, more precision) and E5M2 is better for the backward pass (bigger range, less precision). And most implementations have some sort of scaling or other math tricks to shrink the error.

[0] FP8 Formats for Deep Learning (Nvidia/ARM/Intel) https://arxiv.org/abs/2209.05433
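
For a feel of what that range extension buys, here's a small sketch that works out the largest finite value of each format from its bit layout. The helper `fp8_max` and its `ieee_specials` flag are names I made up for illustration; the bias of 2^(e-1) - 1 and the handling of special values follow the description in the paper above.

```python
def fp8_max(exp_bits, man_bits, ieee_specials):
    """Largest finite value of a small float format (hypothetical helper)."""
    bias = 2 ** (exp_bits - 1) - 1
    top_exp = 2 ** exp_bits - 1
    if ieee_specials:
        # IEEE-style: the all-ones exponent is reserved for inf and NaN,
        # so the largest finite value uses the exponent just below it.
        exp_field = top_exp - 1
        frac = (2 ** man_bits - 1) / 2 ** man_bits
    else:
        # E4M3-style: no infinities, and only the all-ones mantissa at the
        # top exponent is NaN, so the top exponent is usable up to that code.
        exp_field = top_exp
        frac = (2 ** man_bits - 2) / 2 ** man_bits
    return 2.0 ** (exp_field - bias) * (1.0 + frac)

e5m2_max = fp8_max(5, 2, ieee_specials=True)
e4m3_max = fp8_max(4, 3, ieee_specials=False)
e4m3_ieee_like = fp8_max(4, 3, ieee_specials=True)  # what strict IEEE rules would give

print(e5m2_max, e4m3_max, e4m3_ieee_like)
```

So giving up infinities and most NaN encodings nearly doubles E4M3's top end, while E5M2 keeps the far wider range at the cost of one mantissa bit.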


Fair points! Yeah, PyTorch's experimental fp8 support does scaling of the gradients. Interesting point on a smaller range for the forward pass, and a larger range for the gradients! I did not know that - so learnt something today!! Thanks! I'll definitely read that paper!


the title seems like a misnomer.

shouldn't this be "python 3.13 gets a new jit compiler"? python already has a jit.


I think they mean CPython, yes.


JAX is a wrapper on top of XLA. Instead of writing pure python, you're writing JAX abstractions.

for example, a simple loop in JAX:

  import jax
  def solve(i, v): return i + v  # body: (loop index, carried value) -> new value
  x = jax.lax.fori_loop(0, 5, solve, 10)  # 10 + (0+1+2+3+4) = 20
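
For anyone unfamiliar with `fori_loop`, its behaviour is roughly that of the plain-Python loop below. This is a reference sketch of the semantics only: `fori_loop_reference` is a name I made up, and the real JAX version traces the body so it can be compiled by XLA instead of running step by step in Python.

```python
def fori_loop_reference(lower, upper, body_fun, init_val):
    """Plain-Python stand-in for the semantics of jax.lax.fori_loop."""
    val = init_val
    for i in range(lower, upper):
        # body_fun takes the loop index and the carried value,
        # and returns the new carried value.
        val = body_fun(i, val)
    return val

def solve(i, v):
    return i + v

x = fori_loop_reference(0, 5, solve, 10)
print(x)  # 20
```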


> what the Kolmogorov complexity of "NSFW GLSL Content" is

can you explain what you mean by the above?


The smallest GLSL program that generates content generally regarded as NSFW.

It's among the more interesting of concepts in computer science: https://en.wikipedia.org/wiki/Kolmogorov_complexity

Though it probably greatly benefits from a good lecturer if you do not have the background... https://people.csail.mit.edu/rrw/6.045-2020/
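
Kolmogorov complexity itself is uncomputable, but a compressor gives a crude upper bound (ignoring the fixed size of the decompressor), which is enough to get the intuition. A minimal sketch using Python's `zlib`; the helper name `compressed_size` and the two test strings are my own choices for illustration.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a rough upper-bound proxy."""
    return len(zlib.compress(data, 9))

# Highly regular: a tiny program ("print 'ab' 5000 times") describes it.
regular = b"ab" * 5000

# Pseudo-random bytes: zlib finds no structure to exploit here, even though
# the seeded generator itself is technically a short description.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

print(len(regular), compressed_size(regular))
print(len(noisy), compressed_size(noisy))
```

Both inputs are 10,000 bytes, but the regular one shrinks to a tiny fraction of that while the noisy one barely shrinks at all. The noisy case also shows why this is only an upper bound: its true shortest description (seed plus generator) is much smaller than anything zlib can find.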


currently, on apple silicon, "GPU" and "Metal" are effectively synonymous.

yes, there are other apis (opengl,opencl) to access the gpu but they're all deprecated.

technically, yes, this is using Metal.

