
Open training dataset + open steps sufficient to train exactly the same model.



This isn't what Meta releases with their models, though I would like to see more public training data. However, I still don't think that would qualify as "open source". Something isn't open source just because it's reproducible out of composable parts. If one very critical, system-defining part is a binary (or similar) without publicly available source code, then I don't think it can be said to be "open source". That would be like saying that Windows 11 is open source because Windows Calculator is open source and it's a component of Windows.


Here’s one list of what is needed to be actually open source:

https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...


That's what I meant by "open steps". I guess I wasn't clear enough.


Is that what you meant? I read you as saying that releasing the sequence of steps required to produce the model satisfies "open source", and I don't think it does, because there is still no source code for the model.


They can't release the training dataset if it was illegally scraped from all over the web without permission :) (taps head)



