The purpose of this research is to compare large vision-language models whose vision component is pre-trained with different techniques, namely supervised image classification versus contrastive image-text pre-training (as in OpenAI's CLIP). PaLI-3 also isn't an instruction-tuned model, so comparing it to LLaVA would be a little apples-to-oranges.
The value prop is to hire fewer junior devs or even replace them. The intent isn't to help junior devs.
Also, I'm not sure you'd enjoy writing code for that kind of "grunt work". I'd love PRs whose correctness I can easily check and that get some small job done.
His point wasn’t about whether the “grunt work” is enjoyable or not, but that it is necessary work for juniors to do in order to gain experience.
I’m not sure. If these AI tools become sophisticated enough, it might be better experience to learn how to use them instead of doing the underlying work yourself. Career-wise, anyway.
It's necessary for sure, but we want to let junior devs choose to do the more interesting work.
We're also trying to make Sweep easy to use. One outcome is an entirely simulated teammate, which is part of why we let you review Sweep's PRs.
If Vision Pro wins, it's likely because of the higher resolution (so virtual monitors become usable) and the ecosystem (any iOS app + projecting other Apple devices).
The 'trust' factor may play into it too. For years, we've seen Meta and Apple bicker about privacy. I don't think this was just for fun; I think this was for the upcoming headset war we're going to witness.
Do you trust Meta or Apple with eye tracking data?
Apple pushes an update so that FB has to ask your permission to track you. You think that's just being noble? It's because they want to become an ad company too[1], at the cost of destroying countless SMBs in the meantime.
I'm not sure how long this trust is going to last.
That's the thing: I don't trust Facebook at all. After many years, I switched to Apple (Mac) and I don't fight my machine as much as before. Heck, I'm now learning Swift/SwiftUI just so that by the time we get version 3, we can program something similar to what Tony Stark has at home (fingers crossed).
Now, FB is still "sharing" with the community (React, PyTorch, Prophet, etc.), so at least on that front they're winning.
The problem is that the traffic issue is constant and affects everything else, so you have to fix it. In real life, at least, the traffic resolves itself overnight.
I'm not sure how that can be a problem. If traffic couldn’t have any effect on your city, then why bother simulating it? The whole point is to make you think ahead and plan the capacity of your road network so that traffic jams are less likely and fixing them is easier.
It's not hard to embed links in the generated answer, as demonstrated by Bing Chat. Under the hood, it still uses Bing Search as a first-pass filter, so you still very much want to rank high in search results. SEO will not change much in that sense.
It's pretty straightforward with LangChain and GPT-Index. There are a lot of tutorials on the Internet for this, like this one: https://youtu.be/9TxEQQyv9cE
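For reference, the early GPT-Index API was only a few lines. The project has since been renamed to LlamaIndex and the class names have changed across versions, so treat this as a rough sketch; the "docs" folder and the query string are just placeholders:

    # pip install gpt_index  -- assumes OPENAI_API_KEY is set in the environment
    from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

    documents = SimpleDirectoryReader("docs").load_data()  # load the scraped pages
    index = GPTSimpleVectorIndex(documents)                # chunks + embeds them
    print(index.query("What does the site say about pricing?"))

It handles the chunking, embedding, and retrieval for you behind that one query call.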
I don't think chunking + embedding-based retrieval is good enough. It's a good first draft of a solution, but the chunks are taken out of context, so the LLM could combine them in unintended ways.
Better to question each document separately and then combine the answers in one last LLM round. Even so, there might be inter-document context that is lost - for example, one document that depends on details in another. Large collections of documents should be loaded in multiple passes, since the interpretation of a document can change when you encounter information in another document. Adding a single piece of information to a collection of documents could slightly change the interpretation of everything; that's how humans incorporate new information.
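A minimal sketch of the "question each document separately, then combine" idea, using the pre-1.0 openai package (pin openai<1.0 to run it); the model name and prompt wording are just placeholders:

    import os
    import openai  # pre-1.0 API style

    openai.api_key = os.environ["OPENAI_API_KEY"]
    MODEL = "gpt-3.5-turbo"  # assumption; any chat model works

    def ask(prompt):
        resp = openai.ChatCompletion.create(
            model=MODEL, messages=[{"role": "user", "content": prompt}]
        )
        return resp["choices"][0]["message"]["content"]

    def answer_over_documents(question, documents):
        # "Map" step: question each document separately, in its own context.
        per_doc = [
            ask(f"Document:\n{doc}\n\nQuestion: {question}\n"
                "Answer using only this document; say 'no relevant info' if it has none.")
            for doc in documents
        ]
        # "Reduce" step: one last LLM round to combine the per-document answers.
        combined = "\n\n".join(
            f"Answer from document {i + 1}: {a}" for i, a in enumerate(per_doc)
        )
        return ask(f"Question: {question}\n\n{combined}\n\n"
                   "Combine these into one final answer, noting any contradictions.")

The final "combine" prompt is also where you'd ask it to flag inconsistencies between documents.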
One interesting application of document-collection based chat bots is finding inconsistencies and contradictions in the source text. If they can do that, they can correctly incorporate new information.
I guess not. Probably an offline process where they scrape the websites into chunks and build embeddings. At query time, first search for the relevant chunks, then put those chunks into the prompt?
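Something like this, roughly (again the pre-1.0 openai package; the model names, chunk size, and top-k are arbitrary placeholders):

    import os
    import numpy as np
    import openai  # pre-1.0 API style; pin openai<1.0

    openai.api_key = os.environ["OPENAI_API_KEY"]
    EMBED_MODEL = "text-embedding-ada-002"  # assumption

    def embed(texts):
        resp = openai.Embedding.create(model=EMBED_MODEL, input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    # Offline: split the scraped pages into fixed-size chunks and embed them once.
    def build_index(pages, chunk_size=800):
        chunks = [p[i:i + chunk_size] for p in pages
                  for i in range(0, len(p), chunk_size)]
        return chunks, embed(chunks)  # batch the embed calls for a big site

    # Query time: cosine similarity against the stored vectors, stuff top chunks into the prompt.
    def retrieve(question, chunks, vectors, k=3):
        q = embed([question])[0]
        sims = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
        top = np.argsort(sims)[::-1][:k]
        return [chunks[i] for i in top]

    def answer(question, chunks, vectors):
        context = "\n---\n".join(retrieve(question, chunks, vectors))
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
        )
        return resp["choices"][0]["message"]["content"]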
Yes, you're right. It's not possible to put the entire content in a prompt. A user's site can have a lot of pages, and each page can potentially be super long.