matrix2596's comments

Wouldn't presenting numbers in reverse order, with the least significant digit on the left and the most significant on the right, help with the reasoning?
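
To make the idea concrete, here's a rough sketch of why that ordering might help (my own illustration, not code from the paper; `addReversed` is a made-up name). With the least significant digit first, each output digit depends only on digits already seen plus a carry, so the sum can be emitted left to right:

```javascript
// Add two non-negative integers given as digit strings,
// least significant digit first (e.g. 478 is written "874").
function addReversed(a, b) {
  let carry = 0;
  let out = "";
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // Missing digits of the shorter number count as 0.
    const sum = Number(a[i] || "0") + Number(b[i] || "0") + carry;
    out += String(sum % 10);        // this digit is final as soon as it is written
    carry = Math.floor(sum / 10);   // the carry flows to the next (right) position
  }
  if (carry > 0) out += String(carry);
  return out;
}

const reverse = (s) => s.split("").reverse().join("");

// 478 + 64 = 542, so the reversed result is "245".
console.log(addReversed(reverse("478"), reverse("64")));
```

In the usual most-significant-first order, the first output digit can depend on a carry from the far end of the number, which is the part that seems hard to get right token by token.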

They do that in the paper

I also wondered the same and checked the model configs. They are using a bigger vocab size, and the intermediate size of the fully connected layer seems to be bigger.


archive link plz


Just prepend archive.is/ to the url. Chances are, it is already archived. https://archive.is/https://www.scientificamerican.com/articl...


Or save a bookmark in your browser and edit its destination to be this JavaScript bookmarklet, which loads the archive.is version of whatever URL you're currently on without needing to remember the domain or type anything:

  javascript:void(location.href='https://archive.is/?run=1&url='+encodeURIComponent(location.href))
Or a version for IA's Wayback Machine instead:

  javascript:void(window.open('https://web.archive.org/web/*/'+location.href))
(The archive.is one takes you to it in the same tab, while the wayback machine one opens a new one - because personally I use the former when I can't load a page, so don't need that tab kept open, and use the W.M. for comparing current to old versions of the page. But it should be fairly self-explanatory how to swap one URL with the other if you prefer it differently.)

Or this more complicated version of the Wayback Machine one, which if you click while on an empty tab will instead give you an alert with a text field in which to type or paste whatever URL you want to look up:

  javascript:(function()%7Bif(location.href.indexOf('http')!=0)%7Bvar input=prompt('URL:','https://');if(input!=null)%7Blocation.href='https://web.archive.org/web/*/'+input%7D%7Delse%7Blocation.href='https://web.archive.org/web/*/'+location.href;%7D%7D)();
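
For anyone who finds the percent-encoded one-liner hard to read, here is roughly the same branching written out as a plain function. This is only a readable sketch: `waybackTarget` and `askForUrl` are names I made up for illustration, and the browser wiring is left in comments so the snippet also runs outside a browser.

```javascript
// Decide which Wayback Machine URL to open.
// currentHref: the URL of the current tab (location.href in a browser).
// askForUrl:   a callback that asks the user for a URL and returns it,
//              or null if the user cancels (prompt() in a browser).
function waybackTarget(currentHref, askForUrl) {
  const prefix = "https://web.archive.org/web/*/";
  if (currentHref.indexOf("http") !== 0) {
    // Empty tab or other non-http page: ask which URL to look up.
    const input = askForUrl();
    return input == null ? null : prefix + input;
  }
  // Normal page: look up the page we are on.
  return prefix + currentHref;
}

// In a browser, the bookmarklet body would amount to:
//   const target = waybackTarget(location.href, () => prompt("URL:", "https://"));
//   if (target) location.href = target;
console.log(waybackTarget("https://example.com/page", () => null));
```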


Thank you, that's so convenient!


You're welcome :)


I'm building upon insights from this paper (https://arxiv.org/pdf/2403.03950.pdf) and believe that classification can sometimes outperform regression, even when dealing with continuous output values. This is particularly true in scenarios where the output is noisy and may take several distinct values (multimodal). By treating the problem as classification over discrete bins, we can obtain an approximate distribution over these bins, rather than settling for a single, averaged value as regression would yield. This approach not only facilitates sampling but may also lead to more favorable loss landscapes. The linked paper provides more details on this idea.
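
A toy illustration of the multimodal point, under loud assumptions: the target values, the bin range, and the bin count below are all made up, and a plain histogram stands in for what a classifier over bins would ideally learn (no model is trained here). The mean plays the role of what squared-error regression would converge to for a single input.

```javascript
// Hypothetical noisy targets for one input, clustered around -3 and +3.
const targets = [-3.2, -3.0, -2.9, -3.1, 2.9, 3.0, 3.1, 3.2];

// Discretize the output range [lo, hi) into equal-width bins (assumed values).
const lo = -4, hi = 4, nBins = 4;
const width = (hi - lo) / nBins;

function binOf(y) {
  const i = Math.floor((y - lo) / width);
  return Math.min(nBins - 1, Math.max(0, i)); // clamp values outside the range
}

// Regression-style answer: a single averaged value.
const mean = targets.reduce((s, y) => s + y, 0) / targets.length;

// Classification-style answer: a probability for each bin.
const counts = new Array(nBins).fill(0);
for (const y of targets) counts[binOf(y)] += 1;
const probs = counts.map((c) => c / targets.length);

console.log("mean (point estimate):", mean);
console.log("bin probabilities:", probs);
console.log("probability of the bin containing the mean:", probs[binOf(mean)]);
```

The averaged value lands near 0, in a bin where no target was ever observed, while the binned distribution keeps both modes and can be sampled from.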


Isn't it a given that classification would "outperform" regression, assuming n_classes < n_possible_continuous_labels? Turning a regression problem into a classification problem bins the data, offers more examples per label, simplifying the problem, with a tradeoff in what granularity you can predict.

(It depends on what you mean by "outperform" since metrics for classification and regression aren't always comparable, but I think I'm following the meaning of your comment overall)


I would like these APIs to have prepaid options. Then you can control your max budget. Even OpenAI doesn't have that option.


It's funny, but large language models can be seen as billions of if statements learned from data.


Not really, since they do a mathematical function over blocks and don't need a single if statement. They map learned data + input -> output as a pure function


ReLU is if-like, since it passes positive inputs through unchanged and maps everything else to 0
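
A quick sketch of what I mean (function names are just for illustration): the same function can be written with an explicit branch or as a max, which is why both readings in this thread hold up.

```javascript
// ReLU written as an explicit if statement.
function reluIf(x) {
  if (x > 0) {
    return x;
  }
  return 0;
}

// The same function written without a visible branch.
const reluMax = (x) => Math.max(0, x);

const inputs = [-2, -0.5, 0, 0.5, 2];
console.log(inputs.map(reluIf));   // negative inputs become 0, positive inputs pass through
console.log(inputs.map(reluMax));  // same values as above
```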


Is ML different than if statements? Yes.

But is it reallllly different? No not so much.


That's great news. I was using arxiv vanity to read on mobile phones. I'm not seeing it on all articles; is it only for new papers?


That actually makes sense. Thanks.


The name says 8x7B, so it's a 56B model. What are the GPU memory requirements to run it at a 512 context size? Are there any feasible quantized versions of it available? I want to know if my 16GB VRAM GPU can run this model. Thanks


According to https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF :

18.14GB in 2bit, which is still too high for your GPU, and most likely borders on unusable in terms of quality. You could probably split it between CPU and GPU, if you don't mind the slowdown.
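
For a back-of-the-envelope check, weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below uses the 56B figure from the question as an assumption (the real parameter count may differ) and ignores the KV cache and activations, so treat the numbers as rough lower bounds. `weightGigabytes` is a name made up for this example.

```javascript
// Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).
function weightGigabytes(nParams, bitsPerWeight) {
  return (nParams * bitsPerWeight) / 8 / 1e9;
}

const nParams = 56e9; // assumed from the "8x7B = 56B" reading in the question
const vramGB = 16;    // the GPU mentioned in the question

for (const bits of [16, 8, 4, 2]) {
  const gb = weightGigabytes(nParams, bits);
  const verdict = gb <= vramGB ? "fits" : "does not fit";
  console.log(bits + "-bit: ~" + gb + " GB, " + verdict + " in " + vramGB + " GB");
}
```

The listed 2bit file is bigger than this naive 2-bit estimate, presumably because not every tensor in the file is stored at exactly 2 bits, so go by the actual file size rather than the formula.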


Funny, in my language (Telugu) inko means "another", so it's yet another programming language.


Inko is Japanese for 'parrot'. Although it can also mean 'obscenity' which came to mind...

