Thanks and great question! There are tons of eval tools out there, but only a few actually focus on evals. The quality of LLM evaluation depends on the quality of the dataset and the quality of the metrics, so tools that are more focused on the platform side of things (observability/tracing) tend to fall short on accurate and reliable benchmarking. What tends to happen is that users adopt those tools for one-off debugging, but when errors only happen 1% of the time, there's no capability for regression testing.
Since we own the metrics and the algorithms that we've spent the last year iterating on with our users, we balance giving engineers the ability to customize our metric algorithms and evaluation techniques with offering the ability to bring it to the cloud for their organization when they're ready.
This brings me to the tools that do have their own metrics and evals. Including us, there are only three companies out there that do this to a good extent (excuse me for this one), and we're the only one with a self-serve platform, so any open-source user can get the benefit of Confident AI as well.
That's not the whole difference, because if you were to compare DeepEval's metrics on the more nuanced details (which I think are very important), we provide the most customizable metrics out there. This includes the research-backed SOTA LLM-as-a-judge G-Eval for any criteria, and the recently released DAG metric, a decision-based metric that is virtually deterministic despite being LLM-evaluated. This means that as users' use cases get more and more specific, they can stick with our metrics and benefit from DeepEval's ecosystem as well (metric caching, cost tracking, parallelization, Pytest integration for CI/CD, Confident AI, etc.)
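To make the "decision-based, virtually deterministic" idea concrete, here is a minimal sketch (not DeepEval's actual API; the node questions, scores, and stub judge are all made up for illustration): a tree of narrow yes/no questions where each answer could come from an LLM judge, and the final score is fully determined by the path taken.

```python
# Illustrative sketch of a decision-based metric: each node asks one narrow
# yes/no question and routes to a child; leaves hold fixed scores. In a real
# setup each question would be answered by an LLM judge, but because the
# answers are binary and the routing is fixed, the final score is
# deterministic given those judgments. All names here are hypothetical.

from dataclasses import dataclass
from typing import Callable, Union

Judge = Callable[[str], bool]  # answers a yes/no question about the output

@dataclass
class Leaf:
    score: float

@dataclass
class Node:
    question: str                 # narrow yes/no question for the judge
    if_yes: Union["Node", Leaf]
    if_no: Union["Node", Leaf]

def evaluate(node: Union[Node, Leaf], output: str, judge: Judge) -> float:
    """Walk the tree; the path (and score) is fixed by the binary answers."""
    while isinstance(node, Node):
        prompt = f"{node.question}\n---\n{output}"
        node = node.if_yes if judge(prompt) else node.if_no
    return node.score

# Toy tree for a summarization task (hypothetical criteria):
tree = Node(
    "Does the summary mention the main conclusion?",
    if_yes=Node("Is it under 50 words?", if_yes=Leaf(1.0), if_no=Leaf(0.7)),
    if_no=Leaf(0.0),
)

# Stub judge: keyword checks stand in for an LLM call.
def stub_judge(prompt: str) -> bool:
    question, _, output = prompt.partition("\n---\n")
    if "main conclusion" in question:
        return "conclusion" in output
    return len(output.split()) < 50

print(evaluate(tree, "The conclusion: tests passed.", stub_judge))  # 1.0
```

The point of the structure is that subjectivity is confined to each narrow yes/no judgment; the aggregation into a score carries no judgment of its own, which is what makes the overall metric reproducible.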
There's so much more, such as generating synthetic data to get started with testing even if you don't have a prepared test set, red-teaming for safety testing (so not just testing for functionality), but I'm going to stop here for now.
I think one rule around Show HN is that you allow people to see content without signing up, let alone paying for it. So this is a violation of that rule.
Edit: actually search is not behind the paywall (although that's not very obvious)
Also, there's no way to delete your account, or remove your email. Which is not just frustrating -- it's flatly against GDPR, CCPA, etc etc.
I was looking for a new job relatively recently, so I was curious to see how well my own search process compared to this. Then I was mildly annoyed to discover that they require email signup to actually view any details about the company or the job. So I gave them my email. And then I was even more disappointed to discover that they require you to pay money to even see the link to a single actual job posting. Sorry, not gonna do that if I'm just trying to scratch the curiosity itch. So then I go to delete my account, and... nothing. No can do.
Honestly one of the quickest turnarounds from "oh, neat" to "jeez, what a disappointment" that I've had in recent times.
I made a point to collect almost no information (only email) from the user (as opposed to LinkedIn, which asks for all sorts of data to sell), but I'm happy to delete your account if you just email support. I also dislike having my data used or sold. That is not the purpose here; the feature to delete your account is just not there yet, apologies.
Couldn't take this article seriously considering it doesn't mention Meta's Ray-Ban smart glasses, which largely do what they want already: a pair of glasses without visuals but with AI in your ears.
Yeah at this point it's almost jumping on a new hype bandwagon to "come up" with the idea of ambient audio based AI.
The kicker here though is since it's all driven by a phone in your pocket (a) it will either kill your battery or not be allowed at all by the platform, and (b) it has no camera, so it has no idea what you are actually seeing or looking at so it will be a second class citizen to all the versions of this that are camera enabled (such as, as you mention, the RayBans).
I did buy a Vision Pro, but it's a nearly unusable device, and outside of fora I've never met anyone who's had a positive experience, so I suspect that even among Vision Pro users it's a minority opinion.
Hand tracking is not a feasible input method for routine computing.
This has been a great idea for decades. I want Haystack to be successful just like many other attempts. The early execution seems promising. And I suspect there will be many challenges (e.g. when it's hard to figure out caller/callee, inconsistent UX preferences across developers, etc.). Kudos for taking this on!
Btw, I've always thought this would be even more powerful when screen real estate is effectively infinite instead of a 2D screen (like in a VR headset).
I love the idea of a Haystack VR world! It's a shame that VR software is in a tenuous state due to the biological factors, but I believe it's the future "one day".
Doesn't matter. As long as you use VR to display a virtual 3D environment and you move within it, your inner ear will fight with your visual system over whether you're moving or not. If the visual system and the accelerometer don't agree, the positioning system throws an exception.
And, for whatever reason, the human exception handler for that problem is firmly linked to the barf() subroutine ;)
Like I said, as long as you're not flying around (moving within it), your inner ear doesn't care. Turning your head doesn't count. I don't see a need in a code-in-VR system to move like that. And most VR games solve this by having you teleport instead of translate.
I think the barf routine is because when your brain senses your vestibular system not working it thinks "oops I must be poisoned" and tries to make you throw up.
The fact that the Vision Pro is passthrough VR instead of an AR screen on glass (as in: when the battery is dead you see black, not the room without AR) says that it's far away.
That's why I think the newspapers will manage to win against the LLM companies. They won against Google despite having no real argument why they should get paid to get more traffic. The search engine tax is even a shakier concept than the LLM tax would be.
Newspapers are very powerful and they own the platform to push their opinion. I'm not about to forget the EU debates where they all (or close to all) lied about how meta tags really work to push it their way; they've done it before and they will do it again.
That doesn’t mean that it wasn’t theft of their content. The internet would be a very different place if creator compensation and low friction micropayments were some of the first principles. Instead we’re left with ads as the only viable monetization model and clickbait/misinformation as a side effect.
I don't quite get it. If listing your link is considered theft, then HN is a thief of content too. If you don't want your content stolen, just tell Google not to index your website?
I guess it's more constructive to propose alternatives than to just bash the status quo. What's your creator compensation model for a search engine? I believe whatever is being proposed trades off something significant for being more ethical.