Hacker News | mcv's comments

I've also noticed them going off the rails. At the start of a session, they're pretty sharp and focused, but the longer the session lasts, the more confused they get. At some point they start hallucinating bullshit that they wouldn't have earlier in the session.

It's a vital skill to recognise when that happens and start a new session.


I've noticed. I'm already through 48% of my quota for this month.

As a tool to help developers I think it's really useful. It's great at stuff people are bad at, and bad at stuff people are good at. Use it as a tool, not a replacement.

Opus 4.5 ate through my Copilot quota last month, and it's already halfway through it for this month. I've used it a lot, for really complex code.

And my conclusion is: it's still not as smart as a good human programmer. It frequently got stuck, went down wrong paths, ignored what I told it to do in favour of doing something wrong, or even repeated a previous mistake I had already corrected.

Yet in other ways, it's unbelievably good. I can give it a directory full of code to analyze, and it can tell me it's an implementation of Kozo Sugiyama's dagre graph layout algorithm, and immediately identify the file with the error. That's unbelievably impressive. Unfortunately it can't fix the error, which was one of the many it had itself introduced in previous sessions.

So my verdict is that it's great for code analysis, and it's fantastic for injecting some book knowledge on complex topics into your programming, but it can't tackle those complex problems by itself.

Yesterday and today I was upgrading a bunch of unit tests because of a dependency upgrade, and while it was occasionally very helpful, it also regularly got stuck. I got a lot more done than usual in the same time, but I do wonder if it wasn't too much. Wasn't there an easier way to do this? I didn't look for one, because every step of the way, Opus's solution seemed obvious and easy, and I had no idea how deep a pit it was getting me into. I should have been more critical of the direction it was pointing me in.


Copilot and many coding agents truncate the context window and use dynamic summarization to keep their own costs low. That's how they are able to offer flat-fee plans.

You can see some of the context limits here:

https://models.dev/

If you want the full capability, use the API and use something like opencode. You will find that a single PR can easily rack up 3 digits of consumption costs.


Getting off of their plans and prompts is so worth it; I know from experience. I'm paying less and getting more so far, paying by token as a heavy gemini-3-flash user. It's a really good model, and this is the future (distillations into fast models that are good enough for 90% of tasks), not mega models like Claude. Those will still be created for distillation and for the harder problems.

Maybe not, then. I'm afraid I have no idea what those numbers mean, but it looks like Gemini and ChatGPT 4 can handle a much larger context than Opus, and Opus 4.5 is cheaper than older versions. Is that correct? Because I could be misinterpreting that table.

I don't know about GPT-4, but the latest one (GPT 5.2) has a 200k context window while Gemini has 1M, five times larger. You'll want to stay within the first 100k on all of them to avoid hitting quotas very quickly, though (either start a new task or compact when you reach that point), so in practice there's no difference.

I've been cycling between a couple of $20 accounts to avoid running out of quota and the latest of all of them are great. I'd give GPT 5.2 codex the slight edge but not by a lot.

The latest Claude is about the same too but the limits on the $20 plan are too low for me to bother with.

The last week has made me realize how close these are to being commodities already. Even the CLI agents are nearly the same, bar some minor quirks (although I've hit more bugs in Gemini CLI, but each time I can just save a checkpoint and restart).

The real differentiating factor right now is quota and cost.


> You'll be wanting to stay within the first 100k on all of them

I must admit I have no idea how to do that or what that even means. I get that bigger context window is better, but what does it mean exactly? How do you stay within that first 100k? 100k what exactly?


Okay, here's the tl;dr:

Attention-based neural network architectures (on which the majority of LLMs are built) have a unit economic cost that scales roughly as n^2, i.e. quadratically, in both memory and compute: doubling the context length roughly quadruples the attention cost. In other words, the longer the context window, the more expensive it is for the upstream provider. That's one cost.

The second cost is that you have to resend the entire context every time you send a new message. So the context is basically (where a, b, and c are messages): first context: a; second context: a -> b; third context: a -> b -> c. From the developer's point of view it's a mostly stateless process (there are some short-term caching mechanisms, YMMV based on provider; it's why "cached" messages, especially system prompts, are cheaper). The state, i.e. the context window string, is managed by the end-user application (in other words, the coding agent, the IDE, the ChatGPT UI client, etc.).
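Roughly what that looks like from the application side; this is a hedged sketch, and call_model() is a stand-in for whichever provider SDK you actually use, not a real function:

    # Hypothetical chat loop: the whole history is resent on every turn.
    history = [{"role": "system", "content": "You are a coding assistant."}]

    def send(user_message):
        history.append({"role": "user", "content": user_message})
        # The provider sees the *entire* history again on each call; nothing is
        # kept server-side (prompt caching aside), so billed input tokens grow
        # with every turn even if your new message is tiny.
        reply = call_model(messages=history)   # placeholder for the real API call
        history.append({"role": "assistant", "content": reply})
        return reply

    send("a")  # context sent: a
    send("b")  # context sent: a -> b (plus the reply to a)
    send("c")  # context sent: a -> b -> c (plus both replies)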

The per-token price is an amortized (averaged) cost of memory + compute; the actual cost is mostly quadratic with respect to each marginal token. The longer the context window, the more expensive things are. Because of the above, AI agent providers (especially those that charge flat-fee subscription plans) are incentivized to keep costs low by limiting the maximum context window size.

(And if you think about it carefully, your AI API costs are a quadratic cost curve projected onto a linear price: a flat fee per token. So the model hosting provider may in some cases make more profit if users send in shorter contexts than if they constantly saturate the window. YMMV of course, but it's a race to the bottom right now for LLM unit economics.)

They do this by interrupting a task halfway through and generating a "summary" of the task progress, then prompting the LLM again with a fresh prompt plus the "summary" so far, so that the LLM restarts the task from where it left off. Of course text is a poor representation of the LLM's internal state, but it's the best option so far for AI applications to keep costs low.
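As a rough sketch of what that compaction step looks like (count_tokens() and call_model() are placeholders here, and real agents use far more elaborate summary prompts):

    # Hypothetical compaction: once the history gets too long, replace most of it
    # with a model-written summary and continue from that lossy restart point.
    MAX_TOKENS = 100_000

    def maybe_compact(history):
        if count_tokens(history) < MAX_TOKENS:      # placeholder token counter
            return history
        summary = call_model(messages=history + [{
            "role": "user",
            "content": "Summarize the task, the decisions made, and the remaining work.",
        }])
        # Keep the system prompt, drop everything else, resume from the summary.
        return [history[0],
                {"role": "user", "content": "Progress so far: " + summary}]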

Another thing to keep in mind is that LLMs perform worse the larger the input size. This is due to a variety of factors (mostly, I think, because there isn't enough training data to saturate the massive context window sizes).

The general graph for LLM context performance looks something like this: https://cobusgreyling.medium.com/llm-context-rot-28a6d039965... https://research.trychroma.com/context-rot

There are a bunch of tests and benchmarks (commonly referred to as "needle in a haystack") that measure LLM performance at large context window sizes, and improving that performance is still an open area of research.

https://cloud.google.com/blog/products/ai-machine-learning/t...

The thing is, generally speaking, you will get slightly better performance if you can squeeze all your code and the problem into the context window, because the LLM gets a "whole picture" view of your codebase/problem, instead of a bunch of broken-telephone summaries every few tens of thousands of tokens. Take this with a grain of salt, as the field is changing rapidly, so it might not be valid in a month or two.

Keep in mind that if the problem you are solving requires you to saturate the entire context window of the LLM, a single request can cost you dollars. And if you are using 1M+ context window model like gemini, you can rack up costs fairly rapidly.


Using Opus 4.5, I have noticed that in long sessions about a complex topic, there often comes a point when Opus starts spouting utter gibberish. One or two questions earlier it was making total sense, and suddenly it seems to have forgotten everything and responds in a way that barely relates to the question I asked, and certainly not to the "conversation" we were having.

Is that a sign of having surpassed that context window size? I guess to keep it sharp, I should start a new session often and early.

From what I understand, a token is a chunk of text somewhere between a character and a word (in English, roughly three-quarters of a word on average), so 100k tokens is on the order of 75k words before I start running into limits. But I've got the feeling that the complexity of the problem itself also matters.
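Apparently you can get a concrete count with a tokenizer library, e.g. OpenAI's tiktoken; Anthropic and Google use their own tokenizers, so treat this as a rough estimate rather than an exact quota number (the file name is just an example):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")       # an OpenAI tokenizer
    text = open("some_source_file.py").read()        # example input
    tokens = enc.encode(text)
    print(len(tokens), "tokens for roughly", len(text.split()), "words")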


It could have exceeded either its real context window size or the artificially truncated one, and the dynamic summarization step failed to capture the important bits of information you wanted. Alternatively, the information might be stored in places in the context window where the model performs poorly at needle-in-a-haystack retrieval.

This is part of the reason why people use external data stores (e.g. vector databases, graph tools like Beads, etc.) in the hope of supplementing the agent's native context window and task management tools.

https://github.com/steveyegge/beads
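A minimal sketch of the retrieval idea, assuming you already have an embed() function from some embedding model; everything else is just cosine similarity over stored chunks:

    # Hypothetical retrieval step: instead of stuffing everything into the context
    # window, store chunks externally and pull back only the most relevant ones.
    import numpy as np

    chunks = ["chunk one of docs/code", "chunk two", "chunk three"]
    vectors = np.array([embed(c) for c in chunks])   # embed() is a placeholder

    def retrieve(query, k=2):
        q = np.array(embed(query))
        scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
        top = np.argsort(scores)[::-1][:k]
        return [chunks[i] for i in top]

    # The retrieved chunks get prepended to the next model call's prompt.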

The whole field is still in its infancy. Who knows, maybe in another update or two the problem might just be solved. It's not like needle in the haystack problems aren't differentiable (mathematically speaking).


You need to find where context breaks down. Claude was better at this even when Gemini had 5x more on paper, but both have improved with the latest releases.

People are completely missing the point about agentic development. The model is obviously a huge factor in the quality of the output, but the real magic lies in how the tools manage and inject context into it. I switched from Copilot to Cursor at the end of 2025, and it was absolute night and day in terms of how the agents behaved.

Interesting you have this opinion yet you're using Cursor instead of Claude Code. By the same logic, you should get even better results directly using Anthropic's wrapper for their own model.

My employer doesn't allow Claude Code yet. I'm fully aware, from speaking to peers, that they are getting even better performance out of Claude Code.

In my experience GPT-5 is also much more effective in the Cursor context than the Codex context. Cursor deserves props for doing something right under the hood.

Yes, just using AI for code analysis is way underappreciated, I think. Even people most sceptical about using it for coding should try it out as a tool for Q&A-style code interrogation, as well as for generating documentation. I would say it zero-shots documentation generation better than most human efforts, to the point that it raises the question of whether it's worth having the documentation in the first place. Obviously it can make mistakes, but I would say they are below the threshold of human mistakes from what I've seen.

(I haven't used AI much, so feel free to ignore me.)

This is one thing I've tried using it for, and I've found this to be very, very tricky. At first glance, it seems unbelievably good. The comments read well, they seem correct, and they even include some very non-obvious information.

But almost every time I sit down and really think about a comment that includes any of that more complex analysis, I end up discarding it. Often, it's right but it's missing the point, in a way that will lead a reader astray. It's subtle and I really ought to dig up an example, but I'm unable to find the session I'm thinking about.

This was with ChatGPT 5, fwiw. It's totally possible that other models do better. (Or even newer ChatGPT; this was very early on in 5.)

Code review is similar. It comes up with clever chains of reasoning for why something is problematic, and initially convinces me. But when I dig into it, the review comment ends up not applying.

It could also be the specific codebase I'm using this on? (It's the SpiderMonkey source.)


My main experience is with anthropic models.

I've had some encounters with inaccuracies, but my general experience has been amazing. I've cloned completely foreign git repos, cranked up the tool and just said "I'm having this bug, give me an overview of how X and Y work", and it will create great high-level conceptual outlines that mean I can dive straight in, where without it I would spend a long time just flailing around.

I do think an essential skill is developing just the right level of scepticism. It's not really different from working with a human, though. If a human tells me X or Y works in a certain way, I always allow a small margin of possibility that they are wrong.


But have you actually thoroughly checked the documentation it generated? My experience suggests it can often be subtly wrong.

They do have a knack for missing the point. Even Opus 4.5 can laser focus on the wrong thing. It does take skill and experience to interpret them correctly and set them straight when they go wrong.

Even so, for understanding what happens in a big chunk of code, they're pretty great.


>So my verdict is that it's great for code analysis, and it's fantastic for injecting some book knowledge on complex topics into your programming, but it can't tackle those complex problems by itself.

I don't think you've seen the full potential. I'm currently #1 on 5 different very complex computer engineering problems, and I can't even write a "hello world" in rust or cpp. You no longer need to know how to write code, you just need to understand the task at a high level and nudge the agents in the right direction. The game has changed.

- https://highload.fun/tasks/3/leaderboard

- https://highload.fun/tasks/12/leaderboard

- https://highload.fun/tasks/15/leaderboard

- https://highload.fun/tasks/18/leaderboard

- https://highload.fun/tasks/24/leaderboard


All the naysayers here clearly have no idea. Your large matrix multiplication implementation is quite impressive! I have set up a benchmark loop and let GPT-5.1-Codex-Max experiment for a bit (not 5.2/Opus/Gemini, because they are broken in Copilot), but it seems to be missing something crucial. With a bit of encouragement, it has implemented:

    - padding from 2000 to 2048 for easier power-of-two splitting
    - two-level Winograd matrix multiplication with tiled matmul for last level
    - unrolled AVX2 kernel for 64x64 submatrices
    - 64 byte aligned memory
    - restrict keyword for pointers
    - better compiler flags (clang -Ofast -march=native -funroll-loops -std=c++17)
But yours is still easily 25 % faster. Would you be willing to write a bit about how you set up your evaluation and which tricks Claude used to solve it?
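For anyone curious, the loop doesn't need to be fancy; a minimal sketch of the shape (file names and harness details here are illustrative, not an exact setup):

    # Compile the candidate, time it over a few runs, feed the number back to the agent.
    import subprocess, time

    def benchmark(source="matmul.cpp", runs=5):
        subprocess.run(["clang++", "-Ofast", "-march=native", "-funroll-loops",
                        "-std=c++17", source, "-o", "matmul"], check=True)
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(["./matmul", "input.bin"], check=True)   # placeholder input
            times.append(time.perf_counter() - start)
        return min(times)

    print(f"best of 5: {benchmark():.3f}s")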

Thank you. Yeah, I'm doing all of those, which do get you close to the top. The rest of what I'm doing is mostly micro-optimizations, such as finding a way to avoid the AVX→SSE transition penalty (1-2% improvement).

But I don't want to spoil the fun. The agents are really good at searching the web now, so posting the tricks here is basically breaking the challenge.

For example, ChatGPT was able to find Matt's blog post regarding Task 1, and that's what gave me the largest jump: https://blog.mattstuchlik.com/2024/07/12/summing-integers-fa...

Interestingly, it seems that Matt's post is not in the training data of any of the major LLMs.


How are you qualified to judge its performance on real code if you don't know how to write a hello world?

Yes, LLMs are very good at writing code, they are so good at writing code that they often generate reams of unmaintainable spaghetti.

When you submit to an informatics contest you don't have paying customers who depend on your code working every day. You can just throw away yesterday's code and start afresh.

Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.


I know what it's like to run a business and build complex systems. That's not the point.

I used highload as an example because it seems like an objective rebuttal to the claim that "but it can't tackle those complex problems by itself."

And regarding this:

"Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash"

Again, a combination of LLM/agents with some guidance (from someone with no prior experience in this type of high performing architecture) was able to beat all human software developers that have taken these challenges.


> Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.

The skill of "a human software developer" is in fact a very wide distribution, and your statement is true for a ever shrinking tail end of that


> How are you qualified to judge its performance on real code if you don't know how to write a hello world?

The ultimate test of all software is "run it and see if it's useful for you." You do not need to be a programmer at all to be qualified to test this.


What I think people get wrong (especially non-coders) is believing that the limitation of LLMs is building a complex algorithm. In reality, that was solved a long time ago. The real issue is building a product. Think about microservices in different projects, using APIs that are not perfectly documented or whose documentation is massive, etc.

Honestly I don't know what commenters on Hacker News are building, but a few months back I was hoping to use AI to build the interaction layer with Stripe to handle multiple products and delayed cancellations via subscription schedules. Everything is documented; the documentation is a bit scattered across pages, but the information is out there. At the time there was Opus 4.1, so I used that. It wrote 1000 lines of non-functional code with zero reusability after several prompts. I then asked ChatGPT whether it was possible without using schedules; it told me yes (even though it isn't), and when I told Claude to recode it accordingly, it started coding random stuff that doesn't exist. I built everything to be functional and reusable myself, in approximately 300 lines of code.

The above is a software engineering problem. Reimplementing a JSON parser using Opus is neither fun nor useful, so that should not be used as a metric.


> The above is a software engineering problem. Reimplementing a JSON parser using Opus is not fun nor useful, so that should not be used as a metric.

I've also built a BitTorrent implementation from the specs in rust where I'm keeping the binary under 1MB. It supports all active and accepted BEPs: https://www.bittorrent.org/beps/bep_0000.html

Again, I literally don't know how to write a hello world in rust.

I also vibe coded a trading system that is connected to 6 trading venues. This was a fun weekend project but it ended up making +20k of pure arbitrage with just 10k of working capital. I'm not sure this proves my point, because while I don't consider myself a programmer, I did use Python, a language that I'm somewhat familiar with.

So yeah, I get what you are saying, but I don't agree. I used highload as an example, because it is an objective way of showing that a combination of LLM/agents with some guidance (from someone with no prior experience in this type of high performing architecture) was able to beat all human software developers that have taken these challenges.


This hits the nail on the head. There's a marked difference between a JSON parser and a real world feature in a product. Real world features are complex because they have opaque dependencies, or ones that are unknown altogether. Creating a good solution requires building a mental model of the actual complex system you're working with, which an LLM can't do. A JSON parser is effectively a book problem with no dependencies.

You are looking at this wrong. Creating a json parser is trivial. The thing is that my one-shot attempt was 10x slower than my final solution.

Creating a parser for this challenge that is 10x more efficient than a simple approach does require a deep understanding of what you are doing. It requires optimizing the hot loop (among other things) in ways that 90-95% of software developers wouldn't know how to do. It requires a deep understanding of AVX2.

Here you can read more about these challenges: https://blog.mattstuchlik.com/2024/07/12/summing-integers-fa...


You need to give it search and tool calls and the ability to test its own code and iterate. I too could not oneshot an interaction layer with Stripe without tools. It also helps to make it research a plan beforehand.

If that is true, then all the commentary around software people still having jobs due to "taste" and other nice words is just that: commentary. In the end, the higher-level stuff still needs someone to learn it (e.g. learning the AVX2 architecture, knowing what tech to work with), but it requires IMO significantly less practice than coding, which in itself was a gate. The skill morphs more into being a tech expert rather than a coding expert.

I'm not sure what this means for the future of SWEs yet, though. I don't see higher levels of staff in large businesses bothering to do this, and at some scale I don't see founders wanting to manage all of these agents and processes (they've got better things to do at higher levels). But I do see the barrier of learning to code as gone, meaning it probably becomes just like any other job.


None of the problems you've shown there are anything close to "very complex computer engineering problems", they're more like "toy problems with widely-known solutions given to students to help them practice for when they encounter actually complex problems".

>I'm currently #1 on 5 different very complex computer engineering problems

Ah yes, well known very complex computer engineering problems such as:

* Parsing JSON objects, summing a single field

* Matrix multiplication

* Parsing and evaluating integer basic arithmetic expressions

And you're telling me all you needed to do to get the best solution in the world to these problems was talk to an LLM?


Lol, the problem is not finding a solution, the problem is solving it in the most efficient way.

If you think you can beat an LLM, the leaderboard is right there.


It acts differently when you use it through a third-party tool.

Try it again using Claude Code and a subscription to Claude. It can run as a chat window in VS Code and Cursor too.


My employer gets me a Copilot subscription with access to Claude, not a subscription to Claude Code, unfortunately.

at this point I would suggest getting a $20 subscription to start, seeing if you can expense it

the tooling is almost as important as the model


Security and approval are considered more important here. Just getting approval for neo4j, on the clearest-ever use case for it, took a year. I'm not going to spend my energy on getting approval for Claude Code.

Get it for yourself on your personal computer

Point it at your unfinished side projects if any and describe what the project was supposed to do

You need to be able to perceive how far behind you’re falling while simping for corporate policies


> Opus 4.5 ate through my Copilot quota last month

Sure, Copilot charges 3x tokens for using Opus 4.5, but, how were you still able to use up half the allocated tokens not even one week into January?

I thought using up 50% was mad for me (inline completions + opencode), that's even worse


I have no idea. Careless use, I guess. I was fixing a bunch of mocks in some once-great but now poorly maintained code, and I wasn't really feeling it so I just fed everything to Claude. Opus, unfortunately. I could easily have downgraded a bit.

If it can consistently verify whether the error persists after a fix, you can run (okay, maybe you can't budget-wise, but theoretically) 10,000 parallel instances of fixer agents and then verify afterwards (this is in line with how the IMO/IOI models work, according to rumors).

I don't want to take my boss's place, though. I don't want to manage people, I want to address problems on a broader scope than I'm currently in a position to do.

I've been using it very recently, not for mathematics but for programming. And while Claude Opus is much more likely to admit a mistake ("You're absolutely correct!" instead of "That's fine") when I correct it, it does require correcting, and it has been incapable of grasping complex problems. I can't trust it to produce correct code for complex problems, because when I did, the solution turned out to be wrong. Plausible-looking, and certainly producing some results, but not the correct ones.

It has still been useful, because parts of the problem have been solved before, and in some cases it can accurately reproduce those parts, which has been massively helpful. And it can help me understand some aspects of the problem. But it can't solve it for me. It has shown that that's simply beyond its capabilities.


Out with the old, in with the new doesn't have to be bad, but it depends on what your old and new are. I'd be a lot less skeptical about migrating OS-level stuff from C to Rust than from C to React.

If the motivation is "Because I refuse to learn C", then both approaches will be bad. You can't avoid understanding what you're migrating, but seemingly Microsoft thinks they're above that. Fits with the average mindset of developers within the Windows ecosystem, at least from my experience.

Totally agreed. I have learned a lot of technologies to understand legacy systems, either to run them or to migrate away from them. If you do not learn and respect the legacy system, your migration is bound to fail.

I'm not buying vinyl records, but I still have a ton of them, and my ancient record player broke down ages ago. Similarly, I've got tons of CDs I'm not using anymore. The fate of old media, I guess. But I do miss selecting a specific album to listen to. Spotify is not the same.

I guess I'm in the market for a new record player. Is that market picking up again?


Does AMD not support DisplayPort? I'm not an expert on this, but that sounds to me like the superior technology.

TVs don't support displayport, so it makes Linux PCs like the Steam Machine inferior console replacements if you want high refresh rates. A lot of TVs now support 4K/120hz with VRR, the PS5 and Xbox Series X also support those modes.

(Some games support 120, but it's also used to present a 40hz image in a 120hz container to improve input latency for games that can't hit 60 at high graphics quality.)


Why don't TVs support displayport? If HDMI 2.1 support is limited, a TV with displayport sounds like an obvious choice.

I thought audio might be the reason, but as far as I can tell, DisplayPort supports that too.


Legacy is a bitch.

It took a long time to move from the old component input over to HDMI. The main thing that drove it was the SD-to-HD change. You needed HDMI to do 1080p (I believe; IDK whether component ever supported that high a resolution).

Moving from HDMI to DisplayPort is going to be the same issue. People already have all their favorite HDMI devices plugged in and set up for their TVs.

You need a feature that people want which HDMI isn't or can't provide in order to incentivize a switch.

For example, perhaps display port could offer something like power delivery. That could allow things like media sticks to be solely powered by the TV eliminating some cable management.


The legacy issue is even worse than that. I have a very new Onkyo RZ30 receiver and it is all HDMI with no DisplayPort to be seen. So it is the whole ecosystem including the TV that would need to switch to DP support.

> For example, perhaps display port could offer something like power delivery.

It already does. A guaranteed minimum of 1.65W at 3.3V is to be provided. Until very recently, HDMI only provided a guaranteed minimum of something like 0.25W at 5V.


It's not nothing, but it's also very little to play with.

5W is what I'd think is about the minimum for doing something useful. 25W would actually be usable by a large swath of devices. The raspberry pi 4, for example, has a 10W requirement. Amazon's fire stick has ~5W requirement.


> It's not nothing, but it's also very little to play with.

Sure. But it's ~6.6x more than what HDMI has historically guaranteed. It's pretty obvious to anyone with two neurons to spark together that the problem here isn't "amount of power you can suck out of the display port". If it were, DP would have swept away HDMI ages ago.


> It's pretty obvious to anyone with two neurons to spark together that the problem here isn't "amount of power you can suck out of the display port".

Nobody said it was.

I gave that out as an example of a feature that DP might adopt in order to sway TV manufacturers and media device manufacturers to adopt it.

But not for nothing, 0.25W and 1.67W are virtually the same thing in terms of application. Just because it's "6.6x more" doesn't mean that it's usable. 0.25W is 25x more than 0.01W, that doesn't make it practically usable for anything related to media.


> But not for nothing, 0.25W and 1.67W are virtually the same thing in terms of application.

You really can't power an HDMI (or DisplayPort) active cable on 0.25W. You can on 1.67W. This is why in mid-June 2025 the HDMI consortium increased the guaranteed power to 1.5W at 5V. [0] It looks pretty bad when active DP cables (and fiber-optic DP cables) never require external power to function, but (depending on what you plug it into) the HDMI version of the same thing does.

> Nobody said it was.

You implied that it was in a bit of sophistry that's the same class as the US Federal Government saying "Of course States' compliance with this new Federal regulation is completely voluntary: we cannot legally require them to comply. However, we will be withholding vital Federal funds from those States that refuse to comply. As anyone can plainly see, their compliance is completely voluntary!".

DP 1.4 could have offered 4kW over its connector and TVs would still be using HDMI. Just as Intel and Microsoft ensured the decades-long reign of Wintel prebuilt machines [1], it's the consortium that controls the HDMI standard that's actively standing in the way of DP deploying in the "home theater".

[0] "HDMI 2.1b, Amendment 1 adds a new feature: HDMI Cable Power. With this feature, active HDMI® Cables can now be powered directly from the HDMI Connector, without attaching a separate power cable." from: <https://web.archive.org/web/20250625155950/https://www.hdmi....>

[1] The Intel part is the truly loathsome part. I care a fair bit less about Microsoft's dirty dealings here.


> You implied that it was in a bit of sophistry that's the same class as the US Federal Government saying "Of course States' compliance with this new Federal regulation is completely voluntary

This is a very bad faith interpretation of my comment. I did not imply it and I'm not trying to use CIA tricks to make people implement it as a feature.

Are you upset that I gave an example?


Sophistry might have been considered a CIA-grade trick ~2,500 years ago, but it's pretty well known by now.

I think it's not really an issue for the 95-99% of users who use devices with non-open-source drivers, so there is no incentive for manufacturers to add it?

Tell Valve that it isn't an issue. They have built in hardware support for HDMI 2.1 on the new Steam Machine but can't support it in software.

> Why don't TVs support displayport?

For the same sorts of reasons that made it so for decades nearly every prebuilt PC shipped with an Intel CPU and Windows preinstalled: dirty backroom dealings. But in this case, the consortium that controls HDMI are the ones doing the dealings, rather than Intel and Microsoft.

"But Displayport doesn't implement the TV-control protocols that I use!", you say. That's totally correct, but DisplayPort has the out-of-band control channel needed to implement that stuff. If there had been any real chance of getting DisplayPort on mainstream TVs, then you'd see those protocols in the DisplayPort standard, too. As it stands now, why bother supporting something that will never, ever get used?

Also, DP -> HDMI active adapters exist. HDR is said to work all the time, and VRR often works, but it depends on the specifics of the display.


Correction: you can get 4K@120Hz with HDMI 2.0, but you won't get full 4:4:4 chroma; 4:2:0 will be forced instead.

In my case I have an HTPC running Linux and a Radeon 6600 connected via HDMI to a 4K@120Hz-capable TV, and honestly, at that sitting distance/TV size and using 2x DPI scaling, you just can't tell any chroma subsampling is happening. It is of course a ginormous problem in a desktop setting, and even worse if you try using 1x DPI scaling.

What you will lose, however, are the newer forms of VRR, and it may be unstable, with lots of dropouts.


Do consoles support anything above 60 FPS?

My PS5 can do 4k/120 hz with VRR support, not sure about the others.

I'm a bit puzzled. Isn't VRR more for low-powered hardware to consume less battery (handhelds like the Steam Deck)? How does it fit hardware that is constantly connected to power?

(I assume VRR = Variable Refresh Rate)


Variable refresh rate is nice when your frame rate doesn't match your display's refresh rate, especially when you're getting into higher refresh rates. If your display is running at 120Hz but you're only outputting 100fps, you cannot fit 100 frames evenly into 120 refreshes: 1/6 of the displayed frames have to be repeats of other frames, and at an inconsistent cadence. This is usually called judder.
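If you want to see where that 1/6 comes from, a quick back-of-the-envelope sketch (pure arithmetic, nothing display-specific):

    # 100 fps on a 120 Hz panel: each frame occupies either 1 or 2 refreshes.
    import math
    from collections import Counter

    refresh_hz, fps = 120, 100
    durations = []
    for i in range(fps):
        first = math.ceil(i * refresh_hz / fps)       # first refresh showing frame i
        nxt = math.ceil((i + 1) * refresh_hz / fps)   # first refresh showing frame i+1
        durations.append(nxt - first)

    print(Counter(durations))  # Counter({1: 80, 2: 20}): 20 of 120 refreshes are repeats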

Most TVs will not let you set the refresh rate to 100Hz. Even if my computer could run a game at 100fps, without VRR my choices are either lots of judder or lowering it to 60Hz. That's a wide range of possible frame rates you're missing out on.

V-sync and console games will do this too at 60Hz: if you can't reach 60fps, cap the game at 30fps to prevent the judder that would come from anything in between 31-59. The Steam Deck actually does not support VRR; instead, its display supports any refresh rate from 40-60Hz.

This is also sometimes an issue with movies filmed at 24hz on 60hz displays too: https://www.rtings.com/tv/tests/motion/24p


It reduces screen tearing without adding all the latency that vsync introduces.

VRR is necessary to avoid tearing or FPS caps (V-sync) when your hardware cannot consistently output a frame rate matching the screen's refresh rate.

Are there games running at 4k 120hz?

Call of Duty and Battlefield both run at 4K@120 with dynamic resolution scaling, PSSR or FSR.

Most single-player games (Spider-Man, God of War, Assassin's Creed, etc.) will offer a balanced graphics/performance mode which runs at 40fps within a 120Hz refresh.


Full 4k - very few, but lots are running adaptive resolutions at > 2k and at 120hz

Touryst renders the game at 4K120 or 8k60. In the latter case, the image is subsampled to 4K output.

Or just stick to regular SIMs. They're very reliable; eSIMs are not.

I don't really see the case for eSIMs. In theory they could save a bit of time, activating immediately when you order your subscription online instead of waiting days for delivery, but my new telco still sends you a plastic card with a QR code to scan before you can download it, completely nullifying that advantage. Besides, when you want to keep your number, you can rarely activate the new subscription right away anyway. On top of that, there's too much that can go wrong, and recovering from problems is harder than with a regular SIM, which just works.

