Falcon LLM: The Most Powerful Open Source LLM Released to Date (thesequence.substack.com)
15 points by rmaiti on June 16, 2023 | 5 comments



Could we link a non-paywalled source? Presumably this article is just rephrasing the announcement, and they have a picture of an eagle at the top. I'm not exactly a biology expert, but c'mon!

For example:

- https://huggingface.co/tiiuae

- https://falconllm.tii.ae/

- https://www.tii.ae/news/falcon-40b-worlds-top-ai-model-rewar...

- or hell just the leaderboard https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...

P.S. The UAE makes LLMs now? That's... a tad unexpected. I must've been out of the loop. Good for them - political grievances aside, you've gotta admit this is pretty damn impressive.

EDIT: Also it looks like the news here is that they released an `instruct` version of the pre-existing `falcon-40b`? Might also be worth highlighting in the title.


(didn't read because paywall)

From experimenting with Falcon-40b-instruct and the base model, the outputs are acceptable for very simple tasks, though they're still dwarfed by closed-source LLMs in usefulness.

The thing stopping me from experimenting further is how expensive inference is: the 40b takes 8-10 minutes per 200-token generation, loaded in 4-bit with transformers on a Colab A100. The model weighs in at around 27GB of VRAM with these settings.
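
For reference, a minimal sketch of that setup: loading falcon-40b-instruct in 4-bit via transformers + bitsandbytes. The prompt, compute dtype, and generation settings here are my own assumptions, not necessarily what was used above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b-instruct"

# 4-bit quantization config; bf16 compute on an A100 is an assumption
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",        # place/shard layers automatically
    trust_remote_code=True,   # Falcon shipped custom modeling code at release
)

inputs = tokenizer("Write a haiku about falcons.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```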

Has anyone gotten faster inference with the 40b model?


1 to 2 tokens per second on CPU, depending on the CPU/RAM, thanks to GGML.

See discussions: https://github.com/ggerganov/ggml/pull/231

And the code: https://github.com/jploski/ggml/tree/falcon40b
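
That 1-2 tokens/sec figure lines up with a back-of-envelope estimate, if you assume token-by-token decoding is memory-bandwidth-bound, i.e. each generated token streams the full quantized weights from RAM once. The bandwidth number below is an assumption for a typical desktop, and the model size is the ~27GB figure from upthread:

```python
# Rough estimate, assuming decoding is memory-bandwidth-bound:
# every generated token reads the full quantized weights from RAM once.
model_bytes = 27e9       # ~27 GB of quantized 40b weights (figure from this thread)
ram_bandwidth = 40e9     # ~40 GB/s, typical dual-channel DDR4 (assumption)
tokens_per_sec = ram_bandwidth / model_bytes
print(f"{tokens_per_sec:.1f} tokens/sec")  # ~1.5, inside the reported 1-2 range
```

The same arithmetic is also why the Apple-silicon question below is interesting: unified memory on the M2 Max/Ultra has several times the bandwidth of typical desktop DDR4.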


Anyone tried Falcon 40b on an Apple M2 Max or M2 Ultra? How many tokens per second?


Why is the inference so sluggish?



