samatdav · 2025-02-01T09:14:27 1738401267

Cool! So the user does not need to use any additional tools?

banddk · 2025-02-01T09:27:51 1738402071

Yes, everything works within the website!

samatdav · 2025-01-03T05:10:26 1735881026

Haha, yes, it is a pattern. However, our claim is that "our tiny model beats the best model" applies to highly specific tasks.

samatdav · 2025-01-03T05:08:32 1735880912

Yes, you can download and host a fine-tuned open-source model such as Llama. Fine-tuning itself is easy once you have the data, but gathering and cleaning the data is challenging. There are also optimizations like upsampling and distillation that could improve the quality of the resulting model. We had 40 engineers at the Asana AI org and never did any fine-tuning because it is not easy.
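To make the upsampling idea concrete, here is a minimal sketch, not our actual pipeline. It assumes each training example is a dict with a hypothetical `kind` field naming its category, and it duplicates examples from under-represented categories until every category is as large as the biggest one.

```python
import random
from collections import Counter


def upsample(examples, key, seed=0):
    """Return a shuffled copy of `examples` in which every category
    (as computed by `key`) has as many items as the largest category.
    Smaller categories are padded by sampling with replacement."""
    rng = random.Random(seed)

    # Group examples by category.
    groups = {}
    for ex in examples:
        groups.setdefault(key(ex), []).append(ex)

    target = max(len(group) for group in groups.values())

    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Pad with random duplicates; k is 0 for the largest category.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical data: three "css" change requests and one "sql" request.
    data = [{"kind": "css"}] * 3 + [{"kind": "sql"}]
    balanced = upsample(data, key=lambda ex: ex["kind"])
    print(Counter(ex["kind"] for ex in balanced))
```

Plain duplication like this is the simplest form of upsampling. It can lead to overfitting on the repeated examples, so in practice it is usually combined with more data collection for the rare categories.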

samatdav · 2025-01-03T05:06:09 1735880769

Thank you, we will! :) This was a quick landing page for us to start the conversation and gather feedback. We are trying to make sure we are not building something that nobody needs.

samatdav · 2025-01-03T05:05:15 1735880715

We used a single file for the context. You are right, it is a cherry-picked example. I wanted to demonstrate a simple visual change that our model got right and Sonnet-3.5 did not. Since we are just getting started, we don't yet have many features, such as making changes across multiple files in the code editor, so those would be harder to demo. Our premise is that a smaller fine-tuned model works better than a large, general-purpose SOTA model. We plan to share more metrics and data in the future.

samatdav · 2025-01-03T04:58:54 1735880334

Good point, I agree, we haven't shared enough details. Since we are very early, we only have high-level results, and we want feedback on which direction would be most applicable and useful. We plan to add more metrics and data to the website in the future and also want to publicly host a fine-tuned model for anyone to try.

samatdav · 2025-01-03T04:56:27 1735880187

I agree. Our early local results were promising: a higher percentage of code change requests produced functionally correct output. We will post more metrics and data in the future.

samatdav · 2025-01-03T04:55:23 1735880123

Not yet, but we plan to publicly host a fine-tuned model so anyone can try.

samatdav · 2025-01-03T04:50:55 1735879855

We ran a set of change requests against the Discourse repo. Good point, we plan to publish more detailed testing benchmarks and metrics on the website.

samatdav · 2025-01-03T04:48:19 1735879699

Good point, we plan to publish more benchmarks and also publicly host a model for anyone to try. We think Llama is a good option, but as we progress we will test other open-source models too, such as DeepSeek.