mayank's comments

This seems odd. If your scribe can lie in complex and sometimes hard to detect ways, how do you not see some form of risk? What happens when (not if) your scribe misses something and real world damages ensue as a result? Are you expecting your users to cross check every report? And if so, what’s the benefit of your product?

We rely on multimodal input: the voiceover from the superintendent, as well as the video input. The two essentially cross check one another, so we think the likelihood of lies or hallucinations is incredibly low.

Superintendents usually still check and, if needed, edit/enrich Fresco’s notes. Editing is way faster/easier than generating notes net new, so even in the extreme scenario where a supe needs to edit every single note, they’re still saving ~90% of the time it’d otherwise have taken to generate those notes and compile them into the right format.


Even just audio transcription can hallucinate in bizarre ways. https://arstechnica.com/ai/2024/10/hospitals-adopt-error-pro...

> They need to reword this. Whoever wrote it is a liability

Sounds like it’s been written specifically to avoid liability.


I'm sure it was lawyers. It's always lawyers.


Yes, lawyers do tend to have a part to play in writing things that represent a legally binding commitment made by an organisation. Developers really can’t throw stones from their glass houses here. How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.


> How many of you have a pre-canned spiel explaining why the complexities of whichever codebase you spend your days on are ACTUALLY necessary, and are certainly NOT the result of over-engineering? Thought so.

Hm, now you mention it, I don't think I've ever seen this specific example.

Not that we don't have jargon that's bordering on cant, leading to our words being easily mis-comprehended by outsiders: https://i.imgur.com/SL88Z6g.jpeg

Canned clichés are also the only thing I get whenever I try to find out why anyone likes the VIPER design pattern, and that's despite being totally convinced that (one of) the people I was talking to had genuinely and sincerely considered my confusion and had actually experimented with a different approach to see if my point was valid.


Sure lawyers wrote it but I'd bet a lot there's a product or business person standing behind the lawyer saying - "we want to do this but don't be so obvious about it because we don't want to scare users away". And so lawyers would love to be very upfront about what is happening because that's the best way to avoid liability. However, that conflicts with what the business wants, and because the lawyer will still refuse to write anything that's patently inaccurate, you end up with a weasel word salad that is ambiguous and unhelpful.


Very interesting! I wonder to what extent this assumption is true in tying completions to traditional code autocomplete.

> One of the biggest constraints on the retrieval implementation is latency

If I’m getting a multi line block of code written automagically for me based on comments and the like, I’d personally value quality over latency and be more than happy to wait on a spinner. And I’d also be happy to map separate shortcuts for when I’m prepared to do so (avoiding the need to detect my intent).


This is great feedback and something we are looking at with regard to Cody. We value developer choice, and at the moment for Chat, developers can choose between various LLMs (Claude 3 Opus, GPT-4 Turbo, Mixtral 8x7B) that offer different benefits.

For autocomplete, at the moment we only support StarCoder because it has given us the best return on latency + quality, but we'd definitely love to give users the choice to set an LLM of their preference, so if they'd rather wait longer for higher-quality results, they can.

You can do that with our local Ollama support, but that's still experimental and YMMV. Here's how to set it up: https://sourcegraph.com/blog/local-code-completion-with-olla...


> We value developer choice, and at the moment for Chat, developers can choose between various LLMs (Claude 3 Opus, GPT-4 Turbo, Mixtral 8x7B) that offer different benefits.

I wish y'all would put a little more effort into user experience. When you go to subscribe it says:

> Claude Instant 1.2, Claude 2, ChatGPT 3.5 Turbo, ChatGPT 4 Turbo Preview

Trying to figure out what's supported was tedious enough[0] that I just ended up renewing my Copilot subscription instead.

[0] Your contact page for "information about products and purchasing" talks about scheduling a meeting. One of your welcome emails points us to your Discord but then your Discord points us to your forum.


Thank you for the feedback. I totally agree there and we'll address this.


> aws s3 bucket needs to match the domain for website hosting.

This is outdated information, and not required anymore when using CloudFront.

And even in the past, you could use the S3 API to implement a reverse proxy without matching bucket and domain names.
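
To illustrate, here's a rough boto3 sketch of a distribution config where the alias (your domain) and the S3 origin bucket name have nothing in common. The bucket and domain names are made up, and a real setup would also need an ACM certificate for the alias (omitted here):

    # Sketch only: the CloudFront alias (your domain) and the S3 origin bucket
    # name are independent; only DNS and the alias have to line up.
    import boto3

    distribution_config = {
        "CallerReference": "example-site-2024",   # any unique string
        "Comment": "Serve www.example.com from an arbitrarily named bucket",
        "Enabled": True,
        "Aliases": {"Quantity": 1, "Items": ["www.example.com"]},
        "Origins": {
            "Quantity": 1,
            "Items": [{
                "Id": "s3-website-origin",
                # Bucket name does NOT need to match the domain.
                "DomainName": "my-randomly-named-bucket.s3-website-us-east-1.amazonaws.com",
                "CustomOriginConfig": {
                    "HTTPPort": 80,
                    "HTTPSPort": 443,
                    "OriginProtocolPolicy": "http-only",  # S3 website endpoints are HTTP-only
                },
            }],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-website-origin",
            "ViewerProtocolPolicy": "redirect-to-https",
            # Managed "CachingOptimized" policy ID; substitute your own if needed.
            "CachePolicyId": "658327ea-f89d-4fab-98a4-c3967cdd158b",
        },
    }

    # cloudfront = boto3.client("cloudfront")
    # cloudfront.create_distribution(DistributionConfig=distribution_config)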


This is a pretty standard measure called the Trimmed Mean: https://statisticsbyjim.com/basics/trimmed-mean/
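
For anyone who wants to see it in action, here's a minimal Python sketch using SciPy (the sample values are made up); the by-hand version below it does the same thing:

    # 10% trimmed mean: drop the lowest and highest 10% of observations,
    # then average what's left.
    import numpy as np
    from scipy import stats

    samples = np.array([12.1, 11.9, 12.3, 12.0, 11.8, 12.2, 30.5, 12.1, 12.0, 5.2])

    print(stats.trim_mean(samples, proportiontocut=0.10))

    # Equivalent by hand: sort, cut 10% from each tail, average the rest.
    k = int(len(samples) * 0.10)
    trimmed = np.sort(samples)[k:len(samples) - k]
    print(trimmed.mean())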


Variability in software runtime arises mostly from other software running on the same system.

If you are looking for a real-world, whole-system benchmark (like a database or app server), then taking the average makes sense.

If you are benchmarking an individual algorithm or program and its optimisations, then taking the fastest run makes sense - that was the run with least external interference. The only exception might be if you want to benchmark with cold caches, but then you need to reset these carefully between runs as well.
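
As a rough illustration, here's what the "take the fastest run" approach looks like with Python's stdlib timeit (the function being timed is just a stand-in):

    # Repeat the measurement several times; each element of `runs` is the
    # total time for `number` calls of the workload.
    import statistics
    import timeit

    def work():
        return sum(i * i for i in range(10_000))

    runs = timeit.repeat(work, repeat=10, number=100)

    print("fastest run:", min(runs))               # least external interference
    print("mean of runs:", statistics.mean(runs))  # includes scheduler noise etc.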


For performance benchmarking, the minimum runtime is typically the best estimator if the computations are identical, because it measures performance without interruptions from the rest of the system.

If the language is garbage collected, or if the test is randomized you obviously don't want to look at the minimum.


> the minimal runtime is typically the best estimator

Depends what you’re estimating. The minimum is usually not representative of “real world” performance, which is why we use measures of central tendency over many runs for performance benchmarks.
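
To make that concrete, here's a small sketch showing how the usual estimators diverge on the same set of runs (the timings are made-up placeholders with a slow tail):

    # Different estimators over the same end-to-end timings answer different
    # questions about "real world" performance.
    import statistics

    timings = [102.0, 98.5, 101.2, 140.3, 99.8, 97.6, 250.1, 100.4, 103.3, 99.1]

    print("min   :", min(timings))                  # best case, little interference
    print("median:", statistics.median(timings))    # typical run
    print("mean  :", statistics.mean(timings))      # pulled up by the slow tail
    print("p95   :", statistics.quantiles(timings, n=20)[-1])  # tail latency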


Very debatable, if both are done right.


It is in industry, but you may be shocked if you read “research code”


> You don't actually write code with e2b. You write technical specs and then collaborate with an AI agent.

If I want to change 1 character of a generated source file, can I just go do that or will I have to figure out how to prompt the change in natural language?


I'm sure there would be a way to edit the artifacts... otherwise this would be a constant exercise in frustration!


I feel like this would be like trying to edit a confluence page.

It's in the frustration valley between WYSIWYG and just writing code. The worst of both worlds.


So, not too different from typical real world coding tasks today.


> Because there are many services (each with their own readiness criteria), a cold boot takes a long time until all services stabilize (worst case I've seen was over 30 minutes). With hibernation they can resume where they started off within a couple of minutes.

This is exactly what we use hibernation for in conjunction with EC2 Warm Pools -- fast autoscaling of services that have long boot times. There's an argument to be made that fixing slow boots should be the "correct" solution, but in large enough organizations, hibernated instances are a convenient workaround to buy you some time to navigate the organizational dynamics (and technical debt) that lead to the slow boot times in the first place.
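
For reference, attaching a hibernated warm pool to an existing Auto Scaling group is a single call; here's a rough boto3 sketch (the group name is hypothetical, and the launch template behind it needs hibernation enabled plus an encrypted root volume):

    # Keep a couple of pre-initialized, hibernated instances ready so that
    # scale-out skips the slow boot path.
    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.put_warm_pool(
        AutoScalingGroupName="my-slow-booting-service-asg",
        PoolState="Hibernated",   # hibernate warm instances instead of stopping them
        MinSize=2,                # always keep at least 2 instances warm
        InstanceReusePolicy={"ReuseOnScaleIn": True},
    )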


This is a wonderful article, architecture, and project. Can anyone from Clickhouse comment on any non-technical factors that allowed such a rapid pace of development, e.g. team size, structure, etc.?


A couple of things:

1. Hiring the best engineers globally - this was huge, as we are a distributed company and can hire the best talent anywhere in the world.

2. A flat team structure, especially in the first year, with a concept of verticals (technical areas like autoscaling, security, the proxy layer, operations, etc.) and one engineer owning and driving each vertical.

3. Lots of cross-team collaboration and communication (across product, engineering, business, and sales).

4. Lastly, as mentioned in the blog post, it was very important for us to stick to milestone-based sprints for faster development, and the product team helped a lot to prioritize the right features so that engineering could deliver.


Passion and experience

