Hacker News | Juminuvi's comments

I know you say you don't use the paid APIs, but renting a GPU is something I've been thinking about, and I'd be really interested in knowing how this compares with paying by the token. I think gpt-oss-120b is $0.10/M input and $0.60/M output tokens on Azure. In my head that could go a long way, but I haven't used gpt-oss agentically long enough to really understand usage. Just wondering if you know, or would be willing to share, your typical usage/token spend on that dedicated hardware?


For comparison, here's my own usage with various cloud models for development:

  * Claude in December: 91 million tokens in, 750k out
  * Codex in December: 43 million tokens in, 351k out
  * Cerebras in December: 41 million tokens in, 301k out
  * (obviously the December figures above are only month-to-date)
  * Claude in November: 196 million tokens in, 1.8 million out
  * Codex in November: 214 million tokens in, 4 million out
  * Cerebras in November: 131 million tokens in, 1.6 million out
  * Claude in October: 5 million tokens in, 79k out
  * Codex in October: 119 million tokens in, 3.1 million out
As for Cerebras in October, I don't have the data because they don't show the Qwen3 Coder model that was deprecated, but it was way more: https://blog.kronis.dev/blog/i-blew-through-24-million-token...

In general, I'd say that for the stuff I do my workloads are extremely read heavy (referencing existing code, patterns, tests, build and check script output, implementation plans, docs etc.), but it goes about like this:

  * most fixed cloud subscriptions will run out really quickly and will be insufficient (Cerebras being an exception)
  * if paying per token, you *really* want the provider to support proper prompt caching, otherwise you'll go broke
  * local hardware is great if you have it, but it will *never* compete with the cloud models; your best bet is to run something good enough to cover all of your autocomplete needs, and with tools like KiloCode an advanced cloud model can do the planning, a simpler local model the implementation, and the cloud model can then validate the output
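To put the per-token pricing from the question into perspective, here's a rough back-of-envelope calculation. This is a sketch only: the rates are the Azure gpt-oss-120b prices quoted in the question ($0.10/$0.60 per million input/output tokens), the token counts are my November figures from the list above, and cache discounts (which matter a lot in practice) are ignored:

```python
# Rough monthly cost estimate at pay-per-token rates, no cache discount.
RATE_IN = 0.10 / 1_000_000   # dollars per input token
RATE_OUT = 0.60 / 1_000_000  # dollars per output token

november_usage = {            # (input tokens, output tokens)
    "Claude":   (196_000_000, 1_800_000),
    "Codex":    (214_000_000, 4_000_000),
    "Cerebras": (131_000_000, 1_600_000),
}

total = 0.0
for name, (tok_in, tok_out) in november_usage.items():
    cost = tok_in * RATE_IN + tok_out * RATE_OUT
    total += cost
    print(f"{name}: ${cost:.2f}")
print(f"total: ${total:.2f}")  # prints "total: $58.54"
```

At those hypothetical rates, even a heavy month lands well under a $2/hr GPU rented around the clock (~$1,440/month), but note how input-dominated the bill is: that's exactly why prompt caching makes or breaks pay-per-token pricing for read-heavy agentic workloads.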


This is the perfect use case for local models. It's why we set out to create cortex.build! A local LLM


Sorry, I don't really track or keep up with those specifics, other than knowing I'm not spending much per week. My typical scenario is to spin up an instance that costs less than $2/hr for 2-4 hours. It's all just exploratory work, really. Sometimes I'm running a script that makes calls to the LLM server API, other times I'm just noodling around in the web chat interface.


I noticed something similar. I asked it to extract a GUID from an image and it wrote a Python script to run OCR against it... and got it wrong. Prompting a bit more seemed to finally trigger its native image analysis, but I'm not sure what the trick was.


Probably just ask it to use native image analysis instead of writing code. I've done this before when extracting usernames from screenshots.


I've run into this with uploading audio and text files; I have to yell at it not to write any code and to use its native abilities to do the job.


I always assumed the folks who intentionally do this either work for the company, are associated with the company, or are part of some QA/pilot user group.


Neat project! I use a tool called taskfile.dev for managing tasks like this within projects, but I've always thought it would be nice to have a more global tool like this.

Right now what I do is write functions and add them to my shell profile, but this leads to adding helper code to do stuff like handle errors, handle vars, print out an index of commands with documentation, etc. That makes my profile messy, so I think a tool like this could come in handy.


Thanks :)

taskfile.dev looks quite useful for project specific stuff.


> Sure, they could try. And the Supreme Court would strike down the law they pass.

Isn't the balance to this that Congress and the executive branch can restructure the courts? E.g., add justices, set term limits, etc.


I think the article just cited an anecdotal example from one user who had Firefox installed. Maybe others with Chrome see Chrome instead? Or maybe MS is trying to play nice with Google to make Edge's Chromium development easier.


Don't they? Don't they push out new apps that get installed and pinned to the Mac dock after almost every other update, without asking first? I could be mistaken; I don't use Macs every day.


I think pretty much. If a PC has ever had any form of activated Windows on it in the past, it should activate a Windows 10 install. So this would probably only exclude people building on new hardware, who are considered OEM anyway.


Interesting! I knew that med school applicants were accepted through some sort of ranking and matching process, but didn't realize there was a real algorithm behind it. This one, or one similar, seems to be what's behind it.


There's no matching process at the beginning of medical school, but there is for assigning med school graduates to residencies. This variant of the problem is NP-hard, so there's no known efficient exact algorithm, but matching still works pretty well in practice. https://web.stanford.edu/~alroth/papers/rothperansonaer.PDF has all the details.


When you refer to this "variant", do you mean maximal matching, or bipartite matching when the numbers of people and schools aren't equal?


Not the parent, but I think they're referring to the fact that with admissions, matches are permanent (once a student accepts a school's offer, said offer can't be rescinded), while Gale-Shapley requires provisional pairings until all the rounds have run and a final result is available.
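For illustration, here's a minimal deferred-acceptance (Gale-Shapley) sketch in Python. It shows the provisional aspect: a hospital's tentative match can be displaced in a later round when an applicant it prefers proposes. The names and preference lists are made up, and real residency matching (e.g. the Roth-Peranson algorithm from the paper above) handles couples and other constraints this doesn't:

```python
from collections import deque

def gale_shapley(applicant_prefs, hospital_prefs):
    """Deferred acceptance: applicants propose; hospitals hold
    provisional matches that can be displaced by better proposers."""
    # rank[h][a] = how much hospital h prefers applicant a (lower = better)
    rank = {h: {a: i for i, a in enumerate(prefs)}
            for h, prefs in hospital_prefs.items()}
    free = deque(applicant_prefs)            # applicants not yet matched
    next_choice = {a: 0 for a in applicant_prefs}
    match = {}                               # hospital -> applicant (provisional)

    while free:
        a = free.popleft()
        h = applicant_prefs[a][next_choice[a]]  # best hospital not yet tried
        next_choice[a] += 1
        if h not in match:
            match[h] = a                     # provisionally accept
        elif rank[h][a] < rank[h][match[h]]:
            free.append(match[h])            # displace current provisional match
            match[h] = a
        else:
            free.append(a)                   # rejected; will try next choice
    return match

# Hypothetical toy instance: both applicants prefer mgh, but mgh prefers bob,
# so alice is provisionally matched to mgh and later displaced.
applicants = {"alice": ["mgh", "mayo"], "bob": ["mgh", "mayo"]}
hospitals  = {"mgh": ["bob", "alice"], "mayo": ["alice", "bob"]}
print(gale_shapley(applicants, hospitals))  # {'mgh': 'bob', 'mayo': 'alice'}
```

With complete preference lists on both sides, this always terminates with a stable matching; the key point for the admissions comparison is that no pairing is final until the loop ends.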


The issue is that couples want to be placed together. That makes everything trickier than if everyone were placed independently.

