Before I began the test, I thought the agents would be much better at this task than most humans -- after all they should have better, more stateful memory than us. The results are intriguing.
Here are the scores from 10 attempts:
OpenAI operator: 5, 5, 6, 5, 5, 4, 6, 5, 5, 5
Anthropic computer use agent: 7, 9, 6 (rate limited), 12, 9, 7, 9, 11, 12, 6 (rate limited)
hey! thanks so much for the feedback. I'd actually love to keep updating / iterating these cartoons so they are more approachable. If you have time, I'd love to hear more on which pages are confusing & how I could have explained it better!
I _tried_ to give a definition to embeddings on page 11, but maybe that's not the most intuitive? Lmk! feel free to DM
The use of the word "relevant" such as in the phrase "are the most relevant" is problematic.
Relevancy is dependent on and relative to a specific task or interest. "Related to" or "associated" might be a better choice, since parts of a text can be statistically associated with each other.
While I feel that you are accurately conveying the terminology in the field I personally feel that terminology overstates things. For example, the notion that words closer in meaning are generally closer in latent space could be more accurately stated as "words closer in latent space are often used in similar ways."
I think this comes down to meaning being largely determined by a person's individual interpretation at a given point in time, similar to the arbitrariness of relevancy.
Hi HN! Recently we made a starter kit for better learnings on how to get started building AI apps with multi modal models. It's been fun to build, but we also discovered there are many things we needed to take care of: caching, long video processing pipelines, model evaluation, etc
> each step is a code-level transaction backed by its own job in the queue. If the step fails, it retries automatically. Any data returned from the step is automatically captured into the function run's state and injected on each step call.
This is one thing I've seen so many companies spending tons of time implementing themselves, and happens _everywhere_ -- no code apps, finance software, hospitals, anything that deals with ordering system...the list goes on.
Also is there a way to add styles / instruct on colors?