Hacker News | outofpaper's comments

It's just hard for us to see its grand plans.

If anything, CLU, for Command Line Utility, would be the best name for the small programs that can both be run as one-offs from the command line and have their output piped to other command line utilities... I don't know why the term isn't used more. CLU keeps it simple. CLIs are a catch-all from every single CLU up to Midnight Commander, Zork, Mosh, and OpenCode.
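What makes a CLU compose with pipes is just the filter convention: read stdin, write stdout, nothing else. A minimal sketch in Python (the script name `shout.py` is illustrative):

```python
#!/usr/bin/env python3
# shout.py -- a minimal "CLU": reads lines from stdin and writes
# uppercased lines to stdout, so it composes with other utilities,
# e.g.:  cat notes.txt | python3 shout.py | sort
import sys


def shout(lines):
    """Uppercase each line; kept as a pure function so it is easy to test."""
    return [line.upper() for line in lines]


# Only consume stdin when input is actually piped in, so importing or
# running this interactively doesn't hang waiting for input.
if __name__ == "__main__" and not sys.stdin.isatty():
    sys.stdout.writelines(shout(sys.stdin.readlines()))
```

The pure-function core plus thin stdin/stdout shell is the usual idiom: the filter behavior stays pipeable while the logic stays testable.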


> CLIs are a catch all from every single CLU up to MidnightCommander

No they're not, for the reason just explained. CLI means "command-line interface". Midnight Commander runs in a terminal emulator, but it's not a CLI.


To the average person a public key is about as comprehensible as JSON.


Why would anyone test in production???!!!


Selecting the wrong environment in your test setup by mistake?

I refuse to believe that someone on the security team intentionally tested random user scripts in production on purpose.


Once you get big enough… there comes a point where you need to run some code and learn what 100 million people hitting it at once looks like. At that scale, "1 in a million" class bugs and race conditions literally happen every day. You can't do that on every PR, so you ship it and prepare to roll back if anything even starts to look fishy. Maybe even just roll it out gradually.

At least, that’s how it worked at literally every big company I’ve worked at so far. The only reason to hold a change back is during testing/review. Once enough humans have looked at it, you release and watch metrics like a hawk.

And yeah, many features were released this way, often gated behind feature flags to control the rollout. When I refactored our email system, which sent over a billion notifications a month, it was nerve-wracking. You can’t unsend an email, and we would likely have sent hundreds of millions before noticing a problem at scale.


Yes this is a common release practice.

However, this is a different situation, as we’re talking about running arbitrary third-party scripts found in the wild. I can’t imagine that was ever intended to be done in production.

Fun story, when I worked at Facebook in the earlier days someone accidentally made a change that effectively set the release flags for every single feature to be live on production. That was a day… we had to completely wipe out memcached to stop the broken features and then the database was hammered to all hell.


I would say you can get to this point far below 100 million people, especially on the web. Some people are truly special and have some kind of setup you just can't easily reproduce. But I agree, you do really have to be confident in your ability to control rollout/blast radius, monitor, and revert if needed.


> I refuse to believe that someone on the security team intentionally tested random user scripts in production on purpose.

Do I have a bridge to sell you, oh boy


I have never heard of this kind of insane behaviour before.


There are plenty of ways to safely test in production. For one thing, you need to limit the scope of your changes.


"Everyone has a test environment. Some are lucky enough to have a separate production environment."


It's so close to being a good, sweet article. Really though, it should be used as a draft for something that wasn't just vibed. E.g. fix the mistakes and lose the sparks from cutting wood.


A harness is a collection of stubs and drivers configured to assist with automation or testing. It's a standard term often used in QA, where people had been automating things for ages before GenAI came onto the scene.
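In that QA sense the two pieces are: a stub stands in for a real dependency, and a driver feeds inputs to the unit under test and collects what happened. A minimal sketch, with all names illustrative:

```python
# Minimal test harness in the QA sense: stub + driver around a unit
# under test. All names here are illustrative.


def send_alert(message, mailer):
    """Unit under test: formats a message and hands it to a mailer."""
    mailer.send("ops@example.com", message.upper())


class StubMailer:
    """Stub: records calls instead of talking to a real mail server."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


def run_harness():
    """Driver: exercises the unit against the stub and returns results."""
    mailer = StubMailer()
    send_alert("disk full", mailer)
    return mailer.sent
```

The point of the harness is that `send_alert` runs exactly as it would in production, while everything around it is scaffolding you control.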


Yes, it is also a device used to control the movement of work animals, which farmers had been using for ages before QA came onto the scene.


It's technically an IDE, but "harness" makes it sound new and fancy.


??? I'm pretty sure you know what the differences are. Go touch grass and tell me it's the same as looking at a plant on a screen.

Dealing with organic and natural systems will, most of the time, have a variable reward. The real issue comes from systems and services designed to only be accessible through intermittent variable rewards.

Oh, and don't confuse Claude's artifacts working most of the time with them actually being optimized to be that way. They're optimized to drive token usage, i.e. LLMs have been fine-tuned to default to verbose responses. Verbose responses impress less experienced developers, make certain types of errors easier to spot (e.g. improper typing), and make you use more tokens.


So gambling is fine as long as I'm doing it outside. Poker in a casino? Bad. Poker in a foresty meadow, good. Got it.


Basically true tbqh. Poker is maybe the one exception, but you're almost always better off gambling "in the wild" e.g. poker night with your buds instead of playing slots or anything else where "the house" is always winning in the long run. Are your losses still circulating in your local community, or have they been siphoned off by shareholders on the other side of the world? Gambling with friends is just swapping money back and forth, but going to a casino might as well be lighting the money on fire.


Exactly! The whole point of personal agents is that the data is yours and it's where you want it, not in someone's cloud. What harness you use to work with it should be a matter of preference, not one of lock-in.


The future will be ownership of our memories and data. AI companies will fight tooth and nail to keep that data walled in and impossible to export.


I agree. This is why I think Google has the long term advantage. They already have so much data. I can ask Gemini a question and it'll reference an email I sent a month ago.


It's an edge but I think it's going to become hard to gate data as they do. Soon our AI assistants will see and hear everything we see and hear in real-time. All of that will be ingested somewhere. Google can't prevent us from recording the things we see and hear.

Perhaps the competitive moat of the future will be time critical access to data. Google likely gets new data faster than everyone else, and they could use this time arbitrage in products like news, finance, research, etc.


If regulators force the capability of exporting to exist, what ya gonna do?

I continue to find it amusing that people really think corporations are the ones holding power. No: they hold power granted to them by the government of the state.

Remind me why Zuck et al had to kiss the ring.


Very often, the regulators don't. Here in the US, half the country would refinance their mortgage for iMessage interoperability... if it were possible. Any time regulators reach for the "stop monopoly" button, Tim Cook screeches like a rhesus monkey and drops a press release about how many terrorists Apple stops.

If lobbying was illegal then you might have a point here, but alas.


Since it's already not walled-in in most cases I don't see this happening very effectively.

Using openrouter+kilocode I can simply switch between different providers' models and not miss out on anything.


Lol, it was an admirable attempt at something new. I loved the interesting blend of messaging and document creation. The code still lives on as an archived open project btw.

https://github.com/apache/incubator-retired-wave


It is misleading. A lot of their marketing, or at least the ClawBros' pitch, presents it as running locally on your Mac Mini.


To be fair, you do keep significantly more control of your own data from a data portability perspective! A MEMORY.md file presents almost zero lock-in compared to some SaaS offering.

Privacy-wise, of course, the inference provider sees everything.


To be clear: keeping a local copy of some data provides no control over how the remote system treats that data once it’s sent.


Which is what I said in my second sentence.


It’s worse than “[they] can see everything.” They can share it.


Is it not a given that anyone that gets access to a piece of information is also capable of sharing it?

