I saw this a couple of weeks back on crates.io because I wanted a taskwarrior alternative on Windows. Haven't tried it yet, but it seems interesting. The NLP features are something more task apps should implement.
Isn't this kinda pigeonholing yourself to one neural network architecture? Are we sure that transformers will take us to the promised land? Chip design is a pretty expensive and time consuming process, so if a new architecture comes out that is sufficiently different from the current transformer model wouldn't they have to design a completely new chip? The compute unit design is probably similar from architecture to architecture, so maybe I am misunderstanding...
It's a bet. Probably a good one to make. The upside of being the ones who have a true AI chip (not a graphics chip larping as an AI chip) is huge. It will run faster and more cheaply. You get to step all over OpenAI, or land a multi-billion-dollar deal to supply Microsoft data centres. Or these ship on every new laptop, etc. You get to be the next unicorn ($1tn company). So that is a decent bet for investors, assuming the team can deliver. Yes, the danger is that a new architecture shows up that runs on a CPU and, for practical purposes, whoops Attention's ass. In which case investors can throw some money at ASICifying that.
Yep, transformers showed up in 2017, nearly 7 years ago, and they still wear the crown. Maybe some new architecture will come to dominate eventually, but I would love a low cost PCIe board that could run 80B transformer models today.
Well, GPT-4 runs on a transformer architecture, and even if, for whatever reason, GPT-4 turns out to be the upper limit of what transformer models can achieve, hardware specialized to run the architecture extremely fast would still be very useful for many tasks (at least the tasks GPT-4 can already handle).
This was my first thought too. Even if transformers turn out to be the holy grail for LLMs, people are still interested in diffusion models for image generation.
I think we’re about to see a lot of interesting specialized silicon for neural nets in the coming years, but locking yourself into a specific kind of model seems a little too specialized right now.
Diffusion models could actually be implemented with transformers, hypothetically. The training and inference procedure is what makes diffusion models unique, not the model architecture.
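For what it's worth, here's a minimal sketch of why the backbone is interchangeable: a DDPM-style training step only needs some model(x_t, t) that predicts the injected noise, and that model can be a U-Net, an MLP, or a transformer (as in DiT). The ToyDenoiser class, the schedule constants, and the function names below are illustrative stand-ins, not any particular library's API.

```python
# Minimal sketch (assumed DDPM-style epsilon-prediction objective):
# the "diffusion" part is the noise schedule and the loss; the
# denoising network is an interchangeable module.
import torch
import torch.nn as nn

T = 1000  # number of diffusion steps (typical DDPM value, assumed)
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, 0)  # cumulative alpha-bar_t

def diffusion_loss(model, x0):
    """One training step: predict the noise added at a random timestep."""
    t = torch.randint(0, T, (x0.shape[0],))               # random timesteps
    noise = torch.randn_like(x0)                          # eps ~ N(0, I)
    abar = alphas_cumprod[t].view(-1, *[1] * (x0.dim() - 1))
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * noise    # forward noising
    return nn.functional.mse_loss(model(x_t, t), noise)   # eps-prediction

# Any architecture with this (x_t, t) -> noise signature works;
# a toy stand-in here, but a transformer would slot in identically.
class ToyDenoiser(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(),
                                 nn.Linear(128, dim))
    def forward(self, x_t, t):
        # Condition on the timestep by appending it as a scalar feature.
        return self.net(torch.cat([x_t, t.float().unsqueeze(1) / T], dim=1))

model = ToyDenoiser(dim=8)
loss = diffusion_loss(model, torch.randn(4, 8))  # same call for any backbone
```

Nothing in diffusion_loss knows or cares what model is internally, which is the point the comment above is making.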
You have to look at the pre-trained models and not the "tuned" models. It's number one in that category (which is what I think they are referring to, given the benchmarks are against Mistral and Llama).
I really like this. Where I work, my boss just picks a place on Airbnb near the beach, and we stay there for the weekend. Will recommend this to my boss.