Is this a good test case for all of these competing open-source (and even closed-source) LLMs:
feed each model a list of YouTube/SoundCloud-quality “artist + song title” strings and ask it to clean them up, figure out how to split/parse them into CSV or JSON, and then identify each song’s genre (a hypothetical before/after is sketched below)
I want to make sure I’m not being too harsh when I criticize these as useless if they can’t do this “basic” task, because I’m pretty sure I got GPT-3.5 to do it reasonably well for about $0.50 with no token-cost optimization.
I’m just curious why people are so infatuated with, and putting so much effort into, all of these other open-source models if they can’t complete this basic task.
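For concreteness, here’s a hypothetical before/after for the task; the raw strings and genre labels below are invented for illustration:

    # Hypothetical raw strings of the kind you'd scrape from YouTube/SoundCloud.
    raw = [
        "Daft Punk - One More Time (Official Video) [HD]",
        "tame impala the less i know the better lyrics",
    ]

    # The cleaned, structured records you'd want the model to emit.
    expected = [
        {"artist": "Daft Punk", "title": "One More Time", "genre": "house"},
        {"artist": "Tame Impala", "title": "The Less I Know the Better",
         "genre": "psychedelic pop"},
    ]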
If you fine-tune them to be task-specific, they'll perform well. In my experience, that fine-tuning control loop is a better investment than "prompt engineering". (When I say task-specific, I mean very task-specific.)
GPT-4 over the API is too fine-tuned, which constrains its behavior: it fails to capture nuance in instructions. When you have the actual bag of weights, you can genuinely control your model. Having that control, and understanding the infrastructure it runs on, helps you meet actual SLAs.
And it's cheaper, if you're not backed by infinite venture money.
Could you explain what you mean by fine-tune? For example, I don't have ground-truth answers for what the parsed songs + identified genres should look like in JSON. Are you saying I'd have to train the model on known answers, and then maybe it could predict with some accuracy going forward?
I don't see how this warrants all the extra excitement around LLAMA2, etc.
I still haven't found a "good enough" test case for my own personal niche.
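To make the "known answers" idea concrete: here is a minimal sketch of what a supervised fine-tuning dataset could look like, where the examples, labels, and file name are all hypothetical. The idea is that you hand-label (or have GPT-3.5 label) a batch of input/output pairs and train on those:

    import json

    # Hypothetical labeled pairs: messy input string -> the JSON you want back.
    # A few hundred of these is a typical starting point for task-specific tuning.
    examples = [
        {
            "prompt": "Parse: Daft Punk - One More Time (Official Video) [HD]",
            "completion": json.dumps(
                {"artist": "Daft Punk", "title": "One More Time", "genre": "house"}
            ),
        },
    ]

    # One JSON object per line ("JSONL") is the common format for tuning pipelines.
    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")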
I think whether it's worth investing in OSS/DIY depends a lot on the scale of what you're trying to do. If you're one person looking to do a one-off task like organizing some of your own music, then you're correct that it's probably not worth the time and effort to get an open-source model to do it for you. Just pay $0.50 and be done with it! But if you want to build an app that does that for people, and you want to host it for free/cheap, the costs could add up quickly. And especially if you're a company with a language task that will have lots of users, the up-front R&D cost can definitely be worth it to save on usage costs.
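A rough back-of-envelope on that scaling point, with entirely made-up numbers:

    # Hypothetical costs: hosted API vs. renting a GPU for an open-source model.
    api_cost_per_request = 0.002    # USD per request, invented figure
    requests_per_month = 1_000_000  # invented traffic for a popular app
    gpu_rental_per_month = 600.0    # USD per month, invented figure

    api_bill = api_cost_per_request * requests_per_month  # $2,000/mo at these numbers
    print(f"API: ${api_bill:,.0f}/mo vs. self-hosted: ${gpu_rental_per_month:,.0f}/mo")

The crossover point obviously depends on real traffic and real prices, but the shape of the argument is that per-request API costs grow linearly with usage while self-hosting is roughly flat.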
I was trying to argue “the open source models can’t do it in their current state”
I’m curious what a single person like myself could do, time-investment-wise, to “tweak” LLAMA2 into handling a task it can’t do by default (something like the LoRA sketch below).
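The usual single-person-scale answer is parameter-efficient fine-tuning. Here is a minimal sketch using the Hugging Face transformers and peft libraries; the model ID is the real Llama-2 checkpoint (which requires access approval from Meta), but the hyperparameters and the train.jsonl hand-off are illustrative assumptions, not a recommendation:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    # Assumes you've been granted access to the Llama-2 weights on the Hub.
    base = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # LoRA trains small adapter matrices instead of all 7B weights, so the
    # job can fit on a single GPU. r and alpha are common defaults, not tuned.
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the weights

    # From here you'd run a standard supervised training loop (e.g. the
    # transformers Trainer) over prompt/completion pairs like train.jsonl above.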