
"humans are terrible at building butterflies"

also, the Microsoft Word vs. genome size chart, <https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto...>


There are some existing attempts such as RASP, iirc.


Hypertokens define a spreadsheet-esque, label-based referential coding environment. HTs can be used in any LLM context to create arbitrary operations, as will be shown in the paper. Current expected release is later this month.


one place such a beast might exist is at the holographic bulk/boundary limit, esp. if the message is adiabatically describable in some quantum system


absolutely awesome -- huge need


incredibly succinct way of describing it -- any official sources, arXiv or otherwise, that articulate it similarly, re: mix, sample, optimize, regularize?


I believe he's describing Orthogonal Matching Pursuit. It's a dictionary learning algorithm that can be used to recover sparse dictionaries using L1 regularization.


Not quite, though very related and I believe both should end up with essentially the same result.

Matching pursuit is essentially a greedy algorithm, if I recall correctly (please do correct me if I am wrong): conceptually, you find the component that explains the most data at each iteration, remove its contribution, and then repeat the process on the residual, roughly as in the sketch below. Pardon if that isn't quite the right explanation, but it's what my intuition is recalling right now…
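
For concreteness, a minimal sketch of that greedy loop, assuming a DCT dictionary and a fully observed toy 3-sparse signal; every name here (D, coeffs, the indices 3/17/42) is illustrative rather than anything from the thread:

    # Plain matching pursuit: pick the atom most correlated with the residual,
    # record its coefficient, subtract its contribution, and repeat.
    import numpy as np
    from scipy.fft import idct

    N = 256
    D = idct(np.eye(N), norm="ortho", axis=0)   # columns = unit-norm DCT atoms

    # Toy 3-sparse signal built from three atoms.
    signal = 1.0 * D[:, 3] - 0.5 * D[:, 17] + 0.8 * D[:, 42]

    residual = signal.copy()
    coeffs = np.zeros(N)
    for _ in range(3):                          # iteration count: illustrative
        corr = D.T @ residual                   # correlation with every atom
        k = np.argmax(np.abs(corr))             # atom explaining the most residual
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]           # peel it off and continue

    print(np.nonzero(coeffs)[0])                # expect indices 3, 17, 42

With an orthonormal dictionary and a fully observed signal, the greedy route and the L1-regularized route below agree up to L1 shrinkage; the differences mostly matter for overcomplete or correlated dictionaries and for partially observed data.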

What I was describing was a simpler algorithm that can be done with gradient descent or any other vanilla optimizer.

Your model parameters are the coefficients a_i over all basis functions in the frequency-domain representation. Run them through the synthesis function to get a time-domain signal, and select the values of the time-domain signal where your target data is known.

Compute the squared error at each of those locations, and take the mean. This is your reconstruction error, and it should be trivially differentiable with respect to the coefficients a_i.

Compute an additional error term which is a standard L1 regularization, i.e. sum(|a_i|), which can be added to the reconstruction error term with some weight λ (λ=1 is even fine here, at least for simple problems), and which is also trivially differentiable (provided you haven't initialized any of the coefficients to 0). As with any L1 regularization term, the resulting solution should be sparse in the L1-regularized parameters (look up visualizations of problems with only 2 model parameters to see how this emerges from the L1 contour lines of equal loss forming "diamonds" with their points on the axes).
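
To make that concrete, here is a minimal sketch under stated assumptions: a DCT basis as the frequency-domain representation, a toy 3-sparse spectrum observed at 64 known time locations, and plain (sub)gradient descent. Every name (S, known_idx, lam, lr, the indices 3/17/42) is illustrative; lam is set far below the λ=1 mentioned above only because of how this toy problem is scaled, so treat both lam and lr as knobs:

    # Sketch of the described procedure: coefficients a_i over a DCT basis,
    # reconstruction error at the known sample locations, an L1 penalty,
    # all minimized with plain gradient descent.
    import numpy as np
    from scipy.fft import idct

    N = 256
    rng = np.random.default_rng(0)

    # Synthesis matrix: column k is the time-domain waveform of coefficient k,
    # so the synthesis function is just S @ a.
    S = idct(np.eye(N), norm="ortho", axis=0)

    # Toy target: a 3-sparse spectrum, observed only at 64 known time locations.
    a_true = np.zeros(N)
    a_true[[3, 17, 42]] = [1.0, -0.5, 0.8]
    known_idx = rng.choice(N, size=64, replace=False)
    y = (S @ a_true)[known_idx]

    lam, lr = 1e-3, 1.0                  # L1 weight and step size (illustrative)
    a = rng.normal(scale=0.01, size=N)   # small random init, nothing exactly 0

    for _ in range(20000):
        resid = (S @ a)[known_idx] - y   # residuals at the known locations
        grad_recon = (2.0 / len(known_idx)) * S[known_idx].T @ resid
        grad_l1 = lam * np.sign(a)       # subgradient of lam * sum(|a_i|)
        a -= lr * (grad_recon + grad_l1)

    # The three largest-magnitude coefficients should sit at indices 3, 17, 42.
    print(np.sort(np.argsort(-np.abs(a))[:3]))

Any autodiff optimizer would do the same job; the only non-smooth piece is the |a_i| term, handled here with its sign as a subgradient.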


ah got it -- thank you!


Not quite - see my response to GP for what I was describing.


ah, interesting

so residual error derivatives in a sense?

the diamond construct also feels evocative of dimer and/or branched-manifold / lattice methods, be that Viterbi or otherwise; 2.2 in the OP's post is reminiscent of that, e.g., if we view the DCT reconstruction as an implicit matched filter

yes, in theory both should converge on a similar result; it may quickly get into alternating conic optimization, especially depending on how the signal pair is constructed, e.g., if one signal is an ECC and/or if the L1 regularization is operating as an alternating error-squashing op

definitely good stuff


> so residual error derivatives in a sense?

Yep, loss = mean(residuals^2) + lambda*sum(|a_i|), and we take the gradient of that with respect to the a_i to guide our gradient descent steps.
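
Written out under the same illustrative notation as the sketch above (M known sample locations, S the synthesis matrix restricted to those rows), the loss and its gradient are roughly:

    \[
    L(a) = \frac{1}{M} \sum_{k=1}^{M} \bigl((S a)_k - y_k\bigr)^2 + \lambda \sum_i |a_i|,
    \qquad
    \frac{\partial L}{\partial a_j} = \frac{2}{M} \sum_{k=1}^{M} S_{kj} \bigl((S a)_k - y_k\bigr) + \lambda\, \mathrm{sign}(a_j),
    \]

with sign(a_j) read as a subgradient wherever a coefficient is exactly 0.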


finalizing arXiv paper on why hypertokens eliminate AI hallucinations


thank you -- such a useful / insightful set of thoughts!


great need; mulling over; shows up all the time in AI paradigms


glad to have helped you :)


just realized Siri typo'd that -- meant to say great read


interesting; the π of Gödel numbering, if one knew the primes in advance

