
This looks great. I would love to know more about what makes Confident AI/DeepEval special compared to the tons of other LLM eval tools out there.

Thanks and great question! There's a ton of eval tools out there, but only a few actually focus on evals. The quality of LLM evaluation depends on the quality of the dataset and the quality of the metrics, so tools that are more focused on the platform side of things (observability/tracing) tend to fall short on accurate and reliable benchmarking. What tends to happen with those tools is that users use them for one-off debugging, but when errors only happen 1% of the time, there's no capability for regression testing.

Since we own the metrics and the algorithms that we've spent the last year iterating on with our users, we can balance giving engineers the ability to customize our metric algorithms and evaluation techniques with letting them bring everything to the cloud for their organization when they're ready.

This brings me to the tools that do have their own metrics and evals. Including us, there are only three companies out there that do this to a good extent (excuse me for this one), and we're the only one with a self-serve platform, so any open-source user can get the benefit of Confident AI as well.

That's not the only difference, because if you were to compare DeepEval's metrics on the more nuanced details (which I think is very important), we provide the most customizable metrics out there. This includes the research-backed SOTA LLM-as-a-judge G-Eval for any criteria, and the recently released DAG metric, which is decision-based and virtually deterministic despite being LLM-evaluated. This means that as users' use cases get more and more specific, they can stick with our metrics and benefit from DeepEval's ecosystem as well (metric caching, cost tracking, parallelization, Pytest integration for CI/CD, Confident AI, etc.).
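
To give a rough idea of what the Pytest integration looks like in practice, here's a minimal sketch (my_llm_app is a placeholder for your own application call, and exact parameter names may differ slightly from the current docs):

    from deepeval import assert_test
    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCase, LLMTestCaseParams

    # Placeholder for your own LLM application (hypothetical)
    def my_llm_app(prompt: str) -> str:
        return "You can request a refund within 30 days of purchase."

    def test_answer_correctness():
        # G-Eval metric defined with a custom natural-language criteria
        correctness = GEval(
            name="Correctness",
            criteria="Determine whether the actual output is factually consistent with the expected output.",
            evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
            threshold=0.7,
        )
        test_case = LLMTestCase(
            input="What is the refund window?",
            actual_output=my_llm_app("What is the refund window?"),
            expected_output="Customers can request a refund within 30 days of purchase.",
        )
        # Fails the test (and the CI run) if the G-Eval score falls below the threshold
        assert_test(test_case, [correctness])

Running the file with "deepeval test run" is what hooks into the caching and parallelization bits for CI/CD.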

There's so much more, such as generating synthetic data so you can get started with testing even if you don't have a prepared test set, and red-teaming for safety testing (so not just testing for functionality), but I'm going to stop here for now.


+1

I think one rule around Show HN is that you allow people to see content without signing up, let alone paying for it. So this is a violation of that rule.

Edit: actually search is not behind the paywall (although that's not very obvious)


You can currently search the job listings without having to sign up; it only hides the company names and the links to the job listings.


That’s definitely violating the spirit of the rule


Right, the data is worth nothing without additional info.


Also, there's no way to delete your account, or remove your email. Which is not just frustrating -- it's flatly against GDPR, CCPA, etc etc.

I relatively recently was looking for a new job, so I was curious to see how well my own search process compared to this. Then I was mildly annoyed to discover that they require email signup to actually view any details about the company or the job. So I give them my email. And then I was even more disappointed to discover that they require you to pay money to even see the link to a single actual job posting. Sorry, not gonna do that if I'm just trying to scratch the curiosity itch. So then I go to delete my account, and... nothing. No can do.

Honestly one of the quickest turnarounds from "oh, neat" to "jeez, what a disappointment" that I've had in recent times.


I made a point to collect almost no information (only email) from the user (as opposed to LinkedIn, which asks for all sorts of data to sell), but I'm happy to delete your account if you just email support. I also dislike having my data used or sold. That is not the purpose here; the feature to delete your account is just not there yet, apologies


UPDATE: You can now delete your account (under the 'My Account' tab within the /settings page)


Couldn't take this article seriously considering it doesn't mention Meta's Ray-Ban smart glasses, which largely do what they want already: a pair of glasses without visuals but with AI in your ears


Yeah, at this point it's almost jumping on a new hype bandwagon to "come up" with the idea of ambient-audio-based AI.

The kicker here, though, is that since it's all driven by a phone in your pocket, (a) it will either kill your battery or not be allowed at all by the platform, and (b) it has no camera, so it has no idea what you are actually seeing or looking at, and it will be a second-class citizen to all the camera-enabled versions of this (such as, as you mention, the Ray-Bans).


This is a pretty common view among Vision Pro users. Hand tracking is great for these things. Can't imagine having to use a controller.

Of course, Vision Pro users are an extreme minority, so you are not wrong. But I highly encourage you to try out the Vision Pro if you haven't.


I did buy a Vision Pro, but it's a nearly unusable device, and outside of fora I've never met anyone who's had a positive experience, so I suspect that even among Vision Pro users it's a minority opinion.

Hand tracking is not a feasible input method for routine computing.


This has been a great idea for decades. I want Haystack to be successful, just like many other attempts. The early execution seems promising, and I suspect there will be many challenges (e.g. when it's hard to figure out caller/callee, inconsistent UX preferences across developers, etc.). Kudos for taking this on!

Btw, I've always thought that this is even more powerful when the screen real estate is less limited than a 2D screen (like in a VR headset).


I love the idea of a Haystack VR world! It's a shame that VR software is in a tenuous state due to the biological factors, but I believe it's the future "one day".


"In a tenuous state due to the biological factors" is easily the funniest SV euphemism for "can't be used by humans."


It's ok; after they deliver the MVP there will be a wetware update.


Who knows, one day it may be possible (hopefully without any dystopian updates to human biology)!


At this point, that seems dubious - your inner ear is going to go all inner ear on you, no matter what. Unless you get to turn that off, VR is not it.


I hope our code-editor-in-VR wouldn't involve flying around like hilarious depictions of "The Gibson" in bad sci-fi movies.


Doesn't matter. As long as you use VR to display a virtual 3D environment and you move within it, your inner ear will fight with your vision system over whether you're moving or not. If the visual system and the accelerometer don't agree, the positioning system throws an exception.

And, for whatever reason, the human exception handler for that problem is firmly linked to the barf() subroutine ;)


Like I said, as long as you're not flying around (moving within it), your inner ear doesn't care. Turning your head doesn't count. I don't see a need in a code-in-VR system to move like that. And most VR games solve this by having you teleport instead of translate.

I think the barf routine is because when your brain senses your vestibular system not working, it thinks "oops, I must be poisoned" and tries to make you throw up.


"Fixing our eyeballs is just an engineering problem."


Check out SoftSpace https://soft.space … not an IDE but similar idea for knowledge mapping


This is actually really cool!


please post a photo!


I was out last night, but here you go: https://ibb.co/68hPjRT


Thank you, this looks amazing


That’s deep red! Quite a different sight!


Llama 3 is the open-source model, while this is a product offering competing with ChatGPT, Claude, and Gemini. Not a dupe imo.


not sure what you're talking about, this is just the playground/demo site for the new Llama 3. The discussion is over there.


+1, really well made. I wonder if there's a framework for making this kind of website?


found in the sources that babylon.js (https://www.babylonjs.com/) is in use.


BabylonJS indeed. We had a tutorial up on our previous YouTube channel; if there's interest, we'll reupload it.


I’d love to see it


We'll upload it here today or tomorrow: https://www.youtube.com/channel/UCXF--ktsN0t97W9R0GN6_3Q


looking forward to it!



Awesome, thank you very much for the reupload!


thank you so much


My hot take:

ALL AI wearable companies should return their money to investors, and wait for the AR glasses by Apple.

AR glasses are the ultimate form factor. And as much as I hate monopolies, Apple has the right app/dev ecosystem and will make Siri work.


Vision Pro isn't really supposed to be used outside.


I'm talking about AR glasses, not Vision Pro.


Do we know an ETA on those?


The fact that the Vision Pro is passthrough VR instead of an AR screen on glass (as in, when the battery is dead you see black, not the room without AR) says that it's far away.


Or while moving.


Absolutely correct.

The AR glasses race is between Apple and Meta Platforms.

This 'LLM-first phone' looks like a solution in search of a problem. But we'll see.


> the first being at the birth of modern search engines.

Why do you say that? Search engines would at least direct the viewer to the source. NYT gets 35%+ of its traffic from Google: https://www.similarweb.com/website/nytimes.com/#traffic-sour...


Just because they asked for forgiveness instead of asking for permission first, its original sins will not be erased :-)

"Google Agrees to Pay Canadian Media for Using Their Content" - https://www.nytimes.com/2023/11/29/world/americas/google-can...


That's why I think the newspapers will manage to win against the LLM companies. They won against Google despite having no real argument for why they should get paid to receive more traffic. The search engine tax is an even shakier concept than the LLM tax would be.

Newspapers are very powerful, and they own the platforms to push their opinions. I'm not about to forget the EU debates where they all (or close to all) lied about how meta tags really work to push things their way; they've done it and they will do it again.


That doesn't mean that it wasn't theft of their content. The internet would be a very different place if creator compensation and low-friction micropayments had been among its first principles. Instead we're left with ads as the only viable monetization model and clickbait/misinformation as a side effect.


I don't quite get it. If listing your link is considered theft, then HN is a thief of content too. If you don't want your content stolen, just tell Google not to index your website?

I guess it's more constructive to propose alternatives than to just bash the status quo. What's your creator compensation model for a search engine? I believe whatever is proposed would trade off something significant for being more ethical.

