Thanks and great question! There are tons of eval tools out there, but only a few actually focus on evals. The quality of LLM evaluation depends on the quality of the dataset and the quality of the metrics, so tools that are more focused on the platform side of things (observability/tracing) tend to fall short on accurate and reliable benchmarking. What tends to happen is that users adopt those tools for one-off debugging, but when errors only happen 1% of the time, there's no capability for regression testing.
Since we own the metrics and the algorithms that we've spent the last year iterating on with our users, we balance giving engineers the ability to customize our metric algorithms and evaluation techniques with offering the ability to bring it to the cloud for their organization when they're ready.
This brings me to the tools that do have their own metrics and evals. Including us, there are only three companies out there that do this to a good extent (excuse me for this one), and we're the only one with a self-serve platform, so any open-source user can get the benefit of Confident AI as well.
That's not the whole difference, because if you were to compare DeepEval's metrics on the more nuanced details (which I think are very important), we provide the most customizable metrics out there. This includes the research-backed SOTA LLM-as-a-judge G-Eval for any criteria, and the recently released DAG metric, a decision-based metric that is virtually deterministic despite being LLM-evaluated. This means that as users' use cases get more and more specific, they can stick with our metrics and benefit from DeepEval's ecosystem as well (metric caching, cost tracking, parallelization, Pytest integration for CI/CD, Confident AI, etc.)
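To make the "decision-based, virtually deterministic" idea concrete, here is a minimal sketch (not DeepEval's actual API; the node questions, scores, and stub judge are all made up for illustration): a tree of narrow yes/no questions where each answer could come from an LLM judge, and the final score is fully determined by the path taken.

```python
# Illustrative sketch of a decision-based metric: each node asks one narrow
# yes/no question and routes to a child; leaves hold fixed scores. In a real
# setup each question would be answered by an LLM judge, but because the
# answers are binary and the routing is fixed, the final score is
# deterministic given those judgments. All names here are hypothetical.

from dataclasses import dataclass
from typing import Callable, Union

Judge = Callable[[str], bool]  # answers a yes/no question about the output

@dataclass
class Leaf:
    score: float

@dataclass
class Node:
    question: str                 # narrow yes/no question for the judge
    if_yes: Union["Node", Leaf]
    if_no: Union["Node", Leaf]

def evaluate(node: Union[Node, Leaf], output: str, judge: Judge) -> float:
    """Walk the tree; the path (and score) is fixed by the binary answers."""
    while isinstance(node, Node):
        prompt = f"{node.question}\n---\n{output}"
        node = node.if_yes if judge(prompt) else node.if_no
    return node.score

# Toy tree for a summarization task (hypothetical criteria):
tree = Node(
    "Does the summary mention the main conclusion?",
    if_yes=Node("Is it under 50 words?", if_yes=Leaf(1.0), if_no=Leaf(0.7)),
    if_no=Leaf(0.0),
)

# Stub judge: keyword checks stand in for an LLM call.
def stub_judge(prompt: str) -> bool:
    question, _, output = prompt.partition("\n---\n")
    if "main conclusion" in question:
        return "conclusion" in output
    return len(output.split()) < 50

print(evaluate(tree, "The conclusion: tests passed.", stub_judge))  # 1.0
```

The point of the structure is that subjectivity is confined to each narrow yes/no judgment; the aggregation into a score carries no judgment of its own, which is what makes the overall metric reproducible.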
There's so much more, such as generating synthetic data to get started with testing even if you don't have a prepared test set, red-teaming for safety testing (so not just testing for functionality), but I'm going to stop here for now.
I think one rule around Show HN is that you allow people to see content without signing up, let alone paying for it. So this is a violation of that rule.
Edit: actually search is not behind the paywall (although that's not very obvious)
Also, there's no way to delete your account, or remove your email. Which is not just frustrating -- it's flatly against GDPR, CCPA, etc etc.
I was looking for a new job relatively recently, so I was curious to see how well my own search process compared to this. Then I was mildly annoyed to discover that they require email signup to actually view any details about the company or the job. So I gave them my email. And then I was even more disappointed to discover that they require you to pay money to even see the link to a single actual job posting. Sorry, not gonna do that if I'm just trying to scratch the curiosity itch. So then I go to delete my account, and... nothing. No can do.
Honestly one of the quickest turnarounds from "oh, neat" to "jeez, what a disappointment" that I've had in recent times.
I made a point to collect almost no information (only email) from the user (as opposed to LinkedIn, which asks for all sorts of data to sell), but I'm happy to delete your account if you just email support. I also dislike having my data used or sold. That is not the purpose here; the feature to delete your account is just not there yet, apologies.
Couldn't take this article seriously considering it doesn't mention Meta's Ray-Ban smart glasses, which largely do what they want already: a pair of glasses without visuals but with AI in your ears.
Yeah at this point it's almost jumping on a new hype bandwagon to "come up" with the idea of ambient audio based AI.
The kicker here though is since it's all driven by a phone in your pocket (a) it will either kill your battery or not be allowed at all by the platform, and (b) it has no camera, so it has no idea what you are actually seeing or looking at so it will be a second class citizen to all the versions of this that are camera enabled (such as, as you mention, the RayBans).
I did buy a Vision Pro, but it's a nearly unusable device, and outside of fora I've never met anyone who's had a positive experience, so I suspect that even among Vision Pro users it's a minority opinion.
Hand tracking is not a feasible input method for routine computing.
This has been a great idea for decades. I want Haystack to be successful just like many other attempts. The early execution seems promising. And I suspect there will be many challenges (e.g. when it's hard to figure out caller/callee, inconsistent UX preferences across developers, etc.). Kudos for taking this on!
Btw, I've always thought this would be even more powerful when screen real estate is effectively infinite instead of a 2D screen (like in a VR headset).
I love the idea of a Haystack VR world! It's a shame that VR software is in a tenuous state due to the biological factors, but I believe it's the future "one day".
Doesn't matter. As long as you use VR to display a virtual 3D environment and you move within it, your inner ear will fight with your visual system over whether you're moving or not. If the visual system and the accelerometer don't agree, the positioning system throws an exception.
And, for whatever reason, the human exception handler for that problem is firmly linked to the barf() subroutine ;)
Like I said, as long as you're not flying around (moving within it), your inner ear doesn't care. Turning your head doesn't count. I don't see a need in a code-in-VR system to move like that. And most VR games solve this by having you teleport instead of translate.
I think the barf routine is because when your brain senses your vestibular system not working it thinks "oops I must be poisoned" and tries to make you throw up.
The fact that the Vision Pro is passthrough VR instead of an AR screen on glass (as in: when the battery is dead you see black, not the room without AR) says that it's far away.
That's why I think the newspapers will manage to win against the LLM companies. They won against Google despite having no real argument why they should get paid to get more traffic. The search engine tax is even a shakier concept than the LLM tax would be.
Newspapers are very powerful and they own the platform to push their opinion. I'm not about to forget the EU debates where they all (or close to all) lied about how meta tags really work to push it their way; they've done it before and they will do it again.
That doesn’t mean that it wasn’t theft of their content. The internet would be a very different place if creator compensation and low friction micropayments were some of the first principles. Instead we’re left with ads as the only viable monetization model and clickbait/misinformation as a side effect.
I don't quite get it. If listing your link is considered theft, then HN is a thief of content too. If you don't want your content stolen, just tell Google not to index your website?
I guess it's more constructive to propose alternatives than to just bash the status quo. What's your creator compensation model for a search engine? I believe whatever is being proposed trades off something significant for being more ethical.