Falcon LLM: The Most Powerful Open Source LLM Released to Date (thesequence.substack.com)
15 points by rmaiti on June 16, 2023 | 5 comments



Could we link a non-paywalled source? Presumably this article is just rephrasing the announcement, and they have a picture of an eagle at the top. I'm not exactly a biology expert, but c'mon!

For example:

- https://huggingface.co/tiiuae

- https://falconllm.tii.ae/

- https://www.tii.ae/news/falcon-40b-worlds-top-ai-model-rewar...

- or hell just the leaderboard https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...

P.S. The UAE makes LLMs now? That's... a tad unexpected. I must've been out of the loop. Good for them - political grievances aside, you've gotta admit this is pretty damn impressive.

EDIT: Also it looks like the news here is that they released an `instruct` version of the pre-existing `falcon-40b`? Might also be worth highlighting in the title.


(didn't read because paywall)

From experimenting with Falcon-40b-instruct and the base model, the outputs are acceptable for very simple tasks, though they're still dwarfed by closed-source LLMs in usefulness.

The thing stopping me from experimenting further is how expensive inference is: the 40b takes 8-10 minutes per 200-token generation, loaded in 4-bit with transformers on a Colab A100. The model weighs in at around 27GB of VRAM with these settings.
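
For reference, a minimal sketch of that setup: loading falcon-40b-instruct in 4-bit via transformers + bitsandbytes. The prompt, compute dtype, and generation settings here are my own assumptions, not necessarily what was used above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b-instruct"

# 4-bit quantization config; bf16 compute on an A100 is an assumption
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",        # place/shard layers automatically
    trust_remote_code=True,   # Falcon shipped custom modeling code at release
)

inputs = tokenizer("Write a haiku about falcons.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```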

Has anyone gotten faster inference with the 40b model?


1 to 2 tokens per second on CPU, depending on the CPU/RAM, thanks to GGML.

See discussions: https://github.com/ggerganov/ggml/pull/231

And the code: https://github.com/jploski/ggml/tree/falcon40b
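
That 1-2 tokens/sec figure lines up with a back-of-envelope estimate, if you assume token-by-token decoding is memory-bandwidth-bound, i.e. each generated token streams the full quantized weights from RAM once. The bandwidth number below is an assumption for a typical desktop, and the model size is the ~27GB figure from upthread:

```python
# Rough estimate, assuming decoding is memory-bandwidth-bound:
# every generated token reads the full quantized weights from RAM once.
model_bytes = 27e9       # ~27 GB of quantized 40b weights (figure from this thread)
ram_bandwidth = 40e9     # ~40 GB/s, typical dual-channel DDR4 (assumption)
tokens_per_sec = ram_bandwidth / model_bytes
print(f"{tokens_per_sec:.1f} tokens/sec")  # ~1.5, inside the reported 1-2 range
```

The same arithmetic is also why the Apple-silicon question below is interesting: unified memory on the M2 Max/Ultra has several times the bandwidth of typical desktop DDR4.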


Anyone tried Falcon 40b on an Apple M2 Max or M2 Ultra? How many tokens per second?


Why is the inference so sluggish?



