Hacker News: tracyhenry's comments

I thought everyone was using flexboxes. In what cases would a grid be preferable?


Flexbox for one-dimensional layouts, grid for two-dimensional ones.


Maybe someone more informed can help me understand why they didn't compare to Llava (https://llava-vl.github.io/)?


The purpose of this research is to compare large vision-language models where the vision component is pre-trained using different techniques, namely on image classification versus unsupervised contrastive pre-training (see OpenAI's CLIP). PaLI-3 also isn't an instruction-tuned model, so comparing it to Llava would be a little apples-to-oranges.


Maybe they just didn’t know about llava while conducting their research. It can take days to train a model sometimes.


Weeks to months at larger scales even.


The value prop is to hire fewer junior devs or even replace them. They don't mean to help junior devs.

Also, I'm not sure you'd enjoy doing that kind of "grunt work". I'd love PRs whose correctness I can easily check and that get some small job done.


His point wasn’t about whether the “grunt work” is enjoyable or not, but that it is necessary work for juniors to do in order to gain experience.

I’m not sure. If these AI tools become sophisticated enough it might be better experience to learn how to use them instead of doing the underlying work. Career-wise anyway.


It's necessary for sure, but we want to let junior devs choose to do the more interesting work.

We're also trying to make Sweep easy to use. One outcome is an entirely simulated teammate, which is part of what we're doing by letting you review Sweep's PRs.


Sweep is targeted towards senior devs who can do two things: 1. review code quickly, and 2. articulate requirements well.

Also, here's another example of "grunt work". Sweep added a banner to our landing page, and I didn't touch my IDE at all. https://github.com/sweepai/landing-page/pull/226


I would honestly just ignore that feedback. It's needlessly reductive and oxymoronic (coding is fun! But give juniors boring grunt work)


If Vision Pro wins, it's likely because of the higher resolution (so virtual monitors become usable) and the ecosystem (any iOS app, plus projecting from other Apple devices).


The 'trust' factor may play into it too. For years, we've seen Meta and Apple bicker about privacy. I don't think this was just for fun; I think this was for the upcoming headset war we're going to witness.

Do you trust Meta or Apple with eye tracking data?


Apple pushed an update so that FB needs to ask for your permission to track you. You think that's just being noble? It's because they want to become an ad company too[1], at the cost of destroying countless SMBs in the meantime.

I'm not sure how long this trust is going to last.

[1] https://proton.me/blog/apple-ad-company


That's the thing. I don't trust Facebook at all. After many years, I switched to Apple (Mac) and I don't fight as much as before. Heck, I'm now learning Swift/SwiftUI just because, by the time we get version 3, we could program something similar to what Tony Stark has at home (fingers crossed).

Now, FB is still "sharing" with the community (React, PyTorch, Prophet, etc.), so at least on that front they're winning.


The problem is that in the game the traffic is nonstop and affects everything, so you have to fix it. In reality, at least, traffic resolves itself overnight.


There are mods that can do that (literally wipe all simulated traffic off the map), but you can handle traffic pretty easily if you plan ahead.

My last few cities really had no traffic problems at all except for one old main road that I really didn't want to redevelop for some reason.


I'm not sure how that can be a problem. If traffic couldn’t have any effect on your city, then why bother simulating it? The whole point is to make you think ahead and plan the capacity of your road network so that traffic jams are less likely and fixing them is easier.


It's not hard to embed links in the generated answer, as demonstrated by Bing Chat. Under the hood, it still uses Bing Search as a first-step filter, so you still very much want to rank high in search results. SEO will not change much in that sense.


How do you decide what content on the page to index, and how do you split it to fit the context window?

Amazing concept btw - would love to see more examples (like a chatbot for a more well-known site).


It's pretty straightforward with LangChain and GPT-Index. There are lots of tutorials on the Internet, like this one: https://youtu.be/9TxEQQyv9cE
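If you want to see the chunking step without pulling in a framework, here is a minimal sketch. It approximates tokens with whitespace-separated words (real splitters count model tokens), and all the names and numbers are illustrative:

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split text into overlapping word-window chunks.

    The overlap gives each chunk a little shared context with its
    neighbors, which helps retrieval at chunk boundaries.
    """
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

page = ("word " * 450).strip()  # stand-in for a ~450-word page
chunks = chunk_text(page, max_words=200, overlap=20)
print(len(chunks))  # 3 chunks: words 0-199, 180-379, 360-449
```

Each chunk is then embedded and stored; LangChain's text splitters follow the same idea but also respect sentence and paragraph boundaries.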


I don't think chunking + embedding based retrieval is good enough. It's a good first draft for a solution, but the chunks are out of context, so the LLM could combine them in an unintended way.

Better to question each document separately and then combine the answers in one last LLM round. Even so, there might be inter-document context that is lost, for example when one document depends on details in another. Large collections of documents should be loaded in multiple passes, since the interpretation of a document can change when encountering information in another document. Adding a single piece of information to a collection of documents can slightly change the interpretation of everything; that's how humans incorporate new information.

One interesting application of document-collection based chat bots is finding inconsistencies and contradictions in the source text. If they can do that, they can correctly incorporate new information.
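The per-document map-and-combine approach can be sketched like this; `llm` is a placeholder for a real model call, and the prompts are illustrative:

```python
def llm(prompt):
    # Placeholder: a real implementation would call an LLM API here.
    return f"[answer based on: {prompt[:40]}...]"

def map_reduce_qa(question, documents):
    # Map: each document is questioned in its own context window,
    # so no chunk is taken out of its document's context.
    partial_answers = [
        llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in documents
    ]
    # Reduce: a final call reconciles the partial answers; this is
    # where cross-document contradictions can surface.
    combined = "\n".join(partial_answers)
    return llm(f"Combine these partial answers:\n{combined}\n\nQuestion: {question}")

docs = [
    "Doc A says the API limit is 100 rps.",
    "Doc B says the limit was raised to 200 rps.",
]
print(map_reduce_qa("What is the API rate limit?", docs))
```

The reduce prompt is the natural place to ask the model to flag disagreements between the partial answers, which is exactly the inconsistency-finding application mentioned above.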


I index everything. I don't pick and choose. Like I said, I do pre-processing to scrape the entire website content.

When the user asks, I try to get the relevant bits and answer the question based on that.


I guess not. Probably an offline process where they scrape the websites into chunks and build embeddings. At query time first search for the relevant chunks and then put those chunks into the prompt?

Would love more details though from the author!
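The guessed two-phase pipeline can be sketched end to end with a toy similarity measure; the bag-of-words vectors stand in for a real embedding model, and the example chunks are made up:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector. A real system
    # would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_index(chunks):
    # Offline pass: embed every scraped chunk once and store the vectors.
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, query, k=2):
    # Query time: rank stored chunks by similarity to the query,
    # then stuff the top-k into the prompt.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

index = build_index([
    "Shipping takes 3-5 business days.",
    "Returns are accepted within 30 days.",
    "Our headquarters are in Berlin.",
])
print(retrieve(index, "how long does shipping take", k=1))
```

Splitting the work this way keeps the expensive embedding step offline, so each query only pays for one query embedding plus a similarity scan.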


Yes, you are right. It's not possible to give the entire content in a prompt. A user's site can have a lot of pages, and each page can potentially be super long.


Are you talking about local storage? If so the limit is 5 MB: https://developer.chrome.com/docs/extensions/reference/stora...


Reminds me of Visual ChatGPT (https://github.com/microsoft/visual-chatgpt), which also uses an LLM to decide which vision models to run.
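That routing pattern can be sketched in a few lines; the tool registry and the keyword-based `choose_tool` below are stand-ins for real vision models and a real LLM call:

```python
# Hypothetical tool registry: each entry would wrap a real vision model.
TOOLS = {
    "caption": lambda img: f"caption of {img}",
    "detect_objects": lambda img: f"objects in {img}",
    "ocr": lambda img: f"text read from {img}",
}

def choose_tool(request):
    # Placeholder for the LLM call: prompt the model with the tool list
    # and the user request, then parse the tool name it returns. Here
    # simple keyword rules stand in for that decision.
    if "read" in request or "text" in request:
        return "ocr"
    if "what objects" in request or "find" in request:
        return "detect_objects"
    return "caption"

def handle(request, image):
    tool = choose_tool(request)
    return TOOLS[tool](image)

print(handle("read the text in this sign", "sign.png"))  # routed to the ocr tool
```

The interesting part is that adding a new capability only means registering another tool and describing it in the routing prompt; the dispatch logic itself doesn't change.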

