
I get the compliance perspective but it feels stupid to bring it up now, especially since iTerm2 has already had integrated network features for a long time.

Agreed on the "do not access the network" feature flag though, every program should have that. Or really it should just be a toggle in the OS on a per-app basis.


I dunno about "do not access the network" — sounds like the wrong granularity. I want an app like Evernote or Calendly to sync to its own cloud backend (or better, to a server I configure); I just don't want it sending my data off anywhere else.

Annoyingly though, in that scenario, the desire to not have my data processed by third-party vendor APIs would need to apply both to the client (which I can control through technical measures, e.g. Little Snitch) and to the cloud backend it talks to (which I fundamentally cannot control.) So such a config flag can't be purely a technical measure; it also has to be something communicated to the backend, a la "Do Not Track." And unlike HTTP, most of the other application-layer protocols we use today don't have anything like a standardized way to communicate "user-imposed constraints on how they want you to process their request, while still giving the same result".
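To make that concrete: a rough sketch of what such a constraint could look like if it were carried over HTTP, in the spirit of "Do Not Track". The header name and its semantics here are entirely hypothetical; the point is only that the client asserts the constraint and the backend is expected to honor it.

    # Hypothetical "no third-party ML" request constraint over HTTP.
    # The header name "X-No-Third-Party-ML" is made up for illustration.
    import urllib.request

    req = urllib.request.Request(
        "https://api.example-vendor.com/notes/sync",
        data=b'{"note": "my private document"}',
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Client asserts: do not forward this payload to third-party
            # ML APIs, neither while resolving this request nor later.
            "X-No-Third-Party-ML": "1",
        },
    )
    # The backend has to honor this server-side; the client cannot verify it.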


> And unlike HTTP, most of the other application-layer protocols we use today don't have anything like a standardized way to communicate "user-imposed constraints on how they want you to process their request, while still giving the same result".

This is the wrong approach; what you really want is for LLMs to instead have access to a palette of pre-vetted, bug-tested commands and options implemented by other applications.

I.e., think of those Python embeds in OpenAI… but instead of the LLM banging together shell code directly, it should be building an Ansible playbook or a macOS Shortcut that does the task.

Things like file access or web requests etc. are just more primitives in this model. Don't want it to call out to the web? Don't give it access to the web request primitives. This has literally been a solved problem for 30 years: macOS App Intents and Windows COM interfaces allow applications to expose these capabilities in a way that other code can programmatically interface with to build up scripts.

https://developer.apple.com/documentation/appintents

https://learn.microsoft.com/en-us/windows/win32/com/the-comp...

This is HyperCard-era stuff; Unix just won so thoroughly that people don't consider these capabilities, and everyone assumes everything has to be a single giant shapeless pipeline of CLI garbage.

The Unix mindset is not the right one for LLMs working at a token level. The idiom you need to mimic is macOS App Intents or Ansible actions… or at least PowerShell cmdlets. The amount of unconstrained shell-scripting involved needs to be rigorously minimized. Every time you allow it, it's a risk, so why make the LLM write anything more complex than glue code or composable YAML commands?
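To sketch the shape of that idea, here's a toy palette in Python. All action names are hypothetical; this is not any real Intents or COM API, just the registration-and-dispatch pattern.

    # Toy palette of pre-vetted actions an LLM is allowed to invoke.
    from typing import Callable

    PALETTE: dict[str, Callable[..., str]] = {}

    def action(name: str):
        """Register a vetted, bug-tested action under a stable name."""
        def register(fn: Callable[..., str]) -> Callable[..., str]:
            PALETTE[name] = fn
            return fn
        return register

    @action("file.read")
    def read_file(path: str) -> str:
        with open(path, encoding="utf-8") as f:
            return f.read()

    # No "web.request" action is registered, so the model simply has no
    # web primitive to reach for; there is no shell access at all.
    def run_llm_step(name: str, **kwargs) -> str:
        if name not in PALETTE:
            return f"error: action {name!r} is not in the vetted palette"
        return PALETTE[name](**kwargs)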


You're misunderstanding what I was trying to describe.

I wasn't talking about the direct use-case at hand (iTerm), because iTerm communicates directly with ML APIs, and such a "request processing constraint" would not be for use in communicating with ML APIs.

If the client application is directly communicating with an ML API, then the technical measure — a group-policy flag that tells the app to just not do anything that would communicate with those APIs; or further, which alters the app's sandbox to forcibly disable access to any/all known ML API servers on the OS-network-stack level — is enough to prevent that.
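As a process-local analogue of that idea (a toy sketch only: the blocklist contents are examples, and a real implementation would sit in the OS network stack or sandbox, not in the app's own process where the app could simply undo it):

    # Toy, process-local analogue of OS-level "no ML API" blocking.
    import socket

    ML_API_BLOCKLIST = {"api.openai.com", "api.anthropic.com"}  # examples

    _orig_create_connection = socket.create_connection

    def guarded_create_connection(address, *args, **kwargs):
        host, _port = address
        if host in ML_API_BLOCKLIST:
            raise PermissionError(f"connections to {host} are disabled by policy")
        return _orig_create_connection(address, *args, **kwargs)

    socket.create_connection = guarded_create_connection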

Rather, the "request processing constraint" I was referring to is for the other case — the case that can't be addressed through purely technical measures. That's the case where a client application is making a request or syncing its data to a regular, non-ML business-layer server, where that backend might then decide to make requests to third-party ML APIs using the client's submitted data.

Think: the specific REST or gRPC API that most proprietary "service client" apps are paired with; or an arbitrary SMTP server for your email client; or the hard-coded AMQP broker for an IoT device that relies on async data push to a remote message queue; or the arbitrary S3-compatible object-store endpoint for your `rclone` or `pgbackrest` invocation.

It is for these systems that the client would need a way to tell the backend it's communicating with that the client is operating under a regime where third-party ML is not to be used.

The spirit of such a "request processing constraint", communicated to such a backend system, would be: "don't allow any of the data carried by this request, once it arrives at your [regular non-ML business-layer] backend system, to then be sent to any third-party ML APIs for further processing. Not in the process of resolving the request, and not later on for any kind of asynchronous benefit, either. Skip all those code-paths; and tag this data as tainted when storing it, so all such code-paths will be skipped for this data in the future. Just do what processing you can do, locally to your own backend, and give me the results of that."
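As a toy sketch of a backend honoring that constraint (the header name, taint flag, and both processing functions are hypothetical stand-ins):

    # Hypothetical backend honoring a "no third-party ML" constraint.
    def local_keyword_extraction(text: str) -> list[str]:
        # Naive backend-local fallback: no data leaves this process.
        return sorted({w.lower() for w in text.split() if len(w) > 4})

    def third_party_ml_keywords(text: str) -> list[str]:
        # Placeholder for a call out to an external ML vendor's API.
        raise NotImplementedError("would POST the text to a third party")

    def handle_note_upload(headers: dict, note: dict, db: dict) -> list[str]:
        no_ml = headers.get("X-No-Third-Party-ML") == "1"
        # Persist the taint flag with the data, so every future code path
        # (async indexing, re-processing, ...) also skips third-party ML.
        db[note["id"]] = {"note": note, "no_third_party_ml": no_ml}
        if no_ml:
            return local_keyword_extraction(note["text"])
        return third_party_ml_keywords(note["text"])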

(Ideally, there would also be some level of legal regulation, requiring vendors to at least put their "best effort attempt" into creating and maintaining a fallback backend-local [if not client-side!] implementation, for people who opt out of third-party ML processing — rather than just saying "the best we can do is nothing; your data goes unprocessed, and so the app just does nothing at all for you, sorry.")

---

You're also misunderstanding the misapprehensions that people in the comments section of this post have about data being passed to ML models.

Most of the people here who are having a visceral reaction to iTerm including ML capabilities, aren't thinking through the security concerns of this specific case of iTerm generating a shell script. I do understand that that's what you were trying to address here by talking about high-level capability-based APIs; but it wasn't what I was trying to suggest fixing, because "iTerm could generate a dangerous shell script" isn't what anyone was really complaining about here.

Rather, what people here were doing was echoing a cached-thought response, one they had long since decided on, to the use or capture of their data by third-party ML-model service APIs generally.

The things that make people reject the idea of applications integrating with third-party ML APIs generally are:

1. that the model would "do its job", but produce biased or censored results, creating unexpected "memory holes" in the resulting data (for recognition / transformer models), or tainting their processed data with arbitrary Intellectual Property captured from others (for generative models); or

2. that the ML company being sent your data would have Terms of Service (that the vendor of the product or service you're using agreed to, but which you never did) saying that any data submitted to them in a request can be used for model-training purposes. (And where, legally, due to the Terms of Service you consented to with the vendor, whatever minimal right they have over "your data" makes it "their data" just enough for them to be able to consent to the use of that data with third parties, without any additional consent on your part.)

Regarding problem 1 — imagine, as I said above, the Evernote use-case. You add a bunch of notes and web clips to an Evernote notebook through their client app. This app syncs with their cloud backend; and then this cloud backend sends each note and clip in turn through several phases of third-party ML-assisted processing during the indexing process — "AI OCR", "AI fulltext keyword extraction", etc. But maybe the models don't know some words they weren't trained on; or worse yet, maybe the creators of those models intentionally trained the AI to not emit those words, but didn't document this anywhere for the vendor to see. Now some of your documents are un-searchable — lost in the depths of your notebook. And you don't even realize it's happening, because the ML-assisted indexing isn't "flaky", it's rock-solid at 100% recognition for all your other documents... it just entirely ignores the documents with certain specific words in them.

Regarding problem 2 — think about ML built into a professional creative tool. A tool like Adobe Illustrator (where it's probably called client-side) — or even Adobe Bridge (where it might just be happening in their "creative cloud", analyzing your Bridge library to suggest stock image assets that might "go well with" your projects.) What if your own Illustrator illustrations, sitting in Bridge, are being fed by Adobe to some third-party image2text model... owned by a company that also does generative image models? In other words, what if your own Illustrator illustrations — done as a work-for-hire for a company that is paying you to create them a unique and cutting-edge brand appearance — are being used to train a generative image model to ape that novel style you created for this client; and that version of the model trained on your images gets published for use; and people end up discovering and using your new style through it, putting out works on the cheap that look just like the thing you were going to charge your client for, before you even present it to them?

A "third-party ML processing opt-out" request-constraint, would be something intended to circumvent these two problems, by disallowing forwarding of "your" data to third-party ML APIs for processing altogether.

This constraint would bar vendors from relying on third-party models that come with opaque problems of bias or censorship that the vendors were never informed of and so can't work around. (Such vendors, to keep their backend functional with this constraint in play, would likely fall back either to non-ML heuristics, or to a "worse" but "more transparent" first-party-trained ML model — one trained specifically on and for the dataset of their own vertical, and so not suffering from unknown, generalized biases, but rather from domain-specific or training-set-specific biases that can be found and corrected and/or hacked around in response to customer complaints.)

And by barring vendors from submitting data to third-party model companies at all, this constraint would of course prevent anyone (other than the vendor themselves) from training a model on the submitted data. That would in turn mean that any applicable "data use" by the vendor could be completely encapsulated by a contract (i.e. the Terms of Use of the software and accompanying service) that the client can see and consent to, rather than being covered in part by agreements the vendor makes with third parties for use of "the vendor's" data that currently just happens to contain — but isn't legally considered to be — "the user's" data.


Given that the "Do Not Track" header is almost universally disregarded, I'm not sure what value we'd gain from implementing more instances of disregarded user preferences.


Technical measures are the wrong lever for this problem. I can always send your precious data to my backend and proxy it to whatever third-party vendor I like from there, and there’s nothing you can do to prevent that.

Instead, a legal solution like the GDPR offers better means of protection. The way the fines are structured, vendors have a clear incentive to not exfiltrate your data in the first place.


> Instead, a legal solution like the GDPR offers better means of protection.

I mean, yes, that was my point — that there'd need to be some legal thing like GDPR. But that thing would very likely need some kind of explicit user-driven policy choice (a la how websites are now forced to ask for a user-driven cookie-handling policy.)

To comply with such a law, every application-layer protocol that could in theory involve a backend relying on third-party ML vendors would likely have to be modified to somehow carry that policy choice along with requests. It'd be a huge boondoggle.


> Agreed on the "do not access the network" feature flag though, every program should have that.

I would claim that is not the responsibility of the app. That should be the sandbox/OS responsibility, to make sure it's actually true, rather than an app providing a checkbox that potentially does nothing.


I use Little Snitch to get this as a system-wide feature, and IMHO it’s absolutely worth the money for the insight into how your apps communicate.


I don't think we are ignoring any evidence here; there is just no evidence (or at least I haven't seen any). It seems these spirits are allergic to video or something; I have seen more video evidence of UFOs than spirits.

Of course, personal experience is also valid evidence, and I have seen none of that either.


You just saw a first-hand account of personal experience as evidence, in text, from me, yet here you are explicitly denying that fact. Nothing short of literal personal experience is going to convince you of something everyone has known for thousands of years, because you make no room in your head to unpack the thought. It's like there's a hidden filter in your mind that automatically associates spirits with nonsense, and it never reaches your consciousness.


Well, that's the thing, right? First-hand accounts alone are not worth much in my mind, especially since there are so many other first-hand accounts from different religions. Who am I to believe?

Generally, I follow the "don't trust, verify" approach for first-hand accounts. I don't believe something is true even if 1000 people tell me the same thing. I think this is a reasonable approach, especially in today's age of misinformation: 1000 people can repeat the same false rumor as long as the rumor seems reasonable.


> not worth much

Interesting change of tone. Now it's already worth something, just not much. But previously you wrote:

> there is just no evidence (or at least I haven't seen any) ... personal experience is also valid evidence, and I have seen none of that either.

You went from total denial to already assigning worth.

This isn't "today's age of misinformation" stuff, by the way. These are literally thousands-of-years-old historical records of eyewitness accounts. It is in fact "the human mind is just a meat computer" that is the modern-day misinformation. It's leading you further away from the soul. So that demons can take over.


Yeah, sorry for the inconsistency there. I didn't consider that personal anecdotes and hearsay are technically evidence, as evidence is literally anything that supports a conclusion.

It is, however, a good indicator of how little I value those two forms of evidence.

My point with "today's age of misinformation" is not really that there is more misinformation these days. That may be true, but it could also just be that we have access to a higher volume of information. It's more that we are more aware of misinformation, and can develop habits and tools to deal with it.


Any evidence here would be unsound if you try to apply natural science’s requirements to it.

The scientific method is about making observable predictions; i.e., it ultimately hinges on the experience of the observer and the existence of the observer’s mind. When you try to apply it to the theory of mind itself, you short-circuit that logic. There is pretty much no useful (falsifiable or provable) claim or conclusion to be made, and all evidence is immediately tainted as it gets deconstructed into arbitrary categories in vogue today, goes through the meatgrinder of lossy verbal descriptions, and ultimately gets subjectively interpreted by your own mind.

In other words, it is not the problem of the evidence—this is among the best evidence you can get—it is the problem of the framework you are interpreting it in.


In many scenarios, the observer is a machine or tool, not a human mind. And of course there's that whole aspect of replication along with that "scientific method" thing. If science were simply the act of humans making observable predictions and telling them to others, then there would be no difference between "science" and "personal anecdote".

I also don't understand why the mind is relevant. We are trying to prove something that exists outside the mind, right? However, even if these phenomena were something that only humans could observe, they would still be testable with science. Science makes observations about human behavior all the time.

Ok, all that said, almost none of this is relevant because my proof standards are not as rigorous as scientific standards. I just want to see some videos of the beings; I'm not asking someone to perform a study here.


> In many scenarios, the observer is a machine or tool, not a human mind.

An unconscious, non-experiencing mechanism is not an observer in the way the term “empirical”[0] is meant—to observe is to experience.

> I also don't understand why the mind is relevant.

See above.

(I do not think I really understood the rest of your comment.)

[0] https://en.wikipedia.org/wiki/Empirical_evidence


You can observe the state of a machine that is expected to derive its state from an event or the state of another object. For example, a video camera derives its state from the light rays entering the lens.

I'm asking for some video evidence of religion. So really I am asking for an opportunity to observe a state of a machine, albeit a very specific state. I suppose you could argue this is just a very roundabout way to indirectly experience religion.


> You can observe the state of a machine

Yes, but you still observe it, right? That’s how evidence is created.

Religion operates at a level closer to philosophy. You can interrogate theories of mind logically, but when you try to apply the scientific method it breaks down—there’s no hard evidence you can obtain to prove or disprove your hypothesis. Similar is true of the claims made by a religion, though its obvious weak point is it’s more axiomatic and less logically rigorous (which is why I am not a proponent).


But why is religion special in this regard? Why does religion necessarily operate at a level closer to philosophy when other things don't?


What other things do you mean, and why do you think it’s special?

“At this level”, in the context of this discussion, simply means matters outside the scope of the natural sciences. Both philosophy (e.g., of mind) and religion make claims that are non-provable and non-falsifiable using the scientific method. They are orthogonal to it.


> that whole aspect of replication along with that "scientific method" thing

So uh, religion has been replicated quite a lot. We have historical records of it. We've seen an unprecedented revolution from religion, including science. And we've seen our pinnacle of civilization beginning to collapse since most people abandoned God. How much more proof/evidence/anecdata do you need? We still track time in years since Jesus was born. That was 2024 years ago.

> even if these phenomena were something that only humans could observe, they would still be testable with science.

This is a belief. The belief that there exists nothing in the universe that cannot be tested by science. But science is filled with untestable things. Mind-numbingly humongous leaps of pure speculation about something that makes no sense and cannot be measured. Like dark matter, spacetime singularities, or "the big bang".

Science can't even measure consciousness! Or do you take IQ tests as gospel?


I have not heard of this replication before, so I would be glad to see some examples of it! I mean, I'm fairly convinced Jesus did exist; I'm just not convinced that they had any of their spiritual powers.

I have definitely not seen our civilization start to collapse though. I'm not even sure what that would look like (maybe a transition to a low-trust society or something?).

Of course, I do not believe everything can be tested by science; I believe religion specifically can be tested because religion describes the most powerful forces in the universe. And not only that, humans can interact with these forces! So we should be able to detect these forces by observing how humans behave when they interact with them.


A similar app that's on Android: https://play.google.com/store/apps/details?id=com.urbandroid...

Unsure if it actually works though; my personal test results are mixed.


I'm replying just to +1 this as well. This has saved me multiple times, not from phishing attacks but from simple mistakes such as entering a password into an HTTP page when the website supports HTTPS.

Often I'm just browsing along and try to get KeepassXC to autofill a password, only to be frustrated when it refuses to work. Then the frustration turns to relief when I go into KeepassXC and see that I've entered "https://website.com" into KeepassXC's URL box, causing KeepassXC to only autofill the password on the HTTPS variant of the website, while I was on the HTTP variant of the page.

Obviously it's best for the website to just set up HSTS, but I can't fix that for them.
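For illustration, the behavior I'm relying on amounts to scheme-exact URL matching, something like this (a sketch, not KeepassXC's actual matching code):

    # Scheme-exact URL matching: an entry saved as https:// never
    # autofills on a downgraded http:// version of the same site.
    from urllib.parse import urlsplit

    def should_autofill(entry_url: str, page_url: str) -> bool:
        entry, page = urlsplit(entry_url), urlsplit(page_url)
        return entry.scheme == page.scheme and entry.hostname == page.hostname

    assert should_autofill("https://website.com", "https://website.com/login")
    assert not should_autofill("https://website.com", "http://website.com/login")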

I previously used auto-type and always thought browser extensions were insecure until I realized this.


It's not like the clipboard is secure either. Any arbitrary app can listen to the clipboard in X11, and while it seems harder in Wayland, I'm not sure if I've ever seen a clipboard permission dialog (my Wayland experience is limited though).

Turning off the browser integration means that the user may accidentally auto-type into the wrong website. Turning off auto-type means that external applications can see the password.


With Wayland, the compositor gets to decide which clients to send the "clipboard data available to paste from this file descriptor" event to (wl_data_offer). For example, the compositor might only send it to the client whose window is currently focused. So clients that don't receive this event would not have the fd to be able to read from it. Clients that do receive the event can read that data without any restrictions.

That said, this also ends up making things like clipboard managers or wl-paste not work, so there is a wlroots protocol (wlr_data_control) that lets a client know about all data offers. How a malicious process is prevented from being a client of this interface (or whether a process even should be prevented...) depends on the compositor.
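A toy model of that gating logic (plain Python, not real Wayland client code; the class and field names are made up):

    # Compositor only hands the clipboard fd to the focused client,
    # plus any client bound to the wlr_data_control interface.
    class Client:
        def __init__(self, name):
            self.name = name
            self.offer_fd = None  # fd from a wl_data_offer, if received

    class Compositor:
        def __init__(self, clients, focused):
            self.clients = clients
            self.focused = focused
            self.data_control_clients = set()  # e.g. clipboard managers

        def new_selection(self, clipboard_fd):
            for c in self.clients:
                if c is self.focused or c in self.data_control_clients:
                    c.offer_fd = clipboard_fd  # may read without restriction
                else:
                    c.offer_fd = None  # never told about the selection

    a, b = Client("focused app"), Client("background app")
    Compositor([a, b], focused=a).new_selection(clipboard_fd=7)
    assert a.offer_fd == 7 and b.offer_fd is None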


> features a large majority of people don't use.

I would be shocked if this was true. I have recommended KeepassXC to many of my friends and family, and they all use at least one of the features on the removed list. To be fair, none of them use Debian, but some of them do use other Linux distros.

I don't use Debian on my main desktop machine anymore, but I did use BunsenLabs for a long time, and even back then KeepassX (and afterwards KeepassXC) auto-type was a critical feature I used literally every day.


I never knew KeepPassXC could talk with my browser or interact with ssh until I saw this thread, and I've been using it for almost 10 years now.


Hmm, excluding the comment about the missing "-full" version, I don't see how it is not comparable. There is nothing that makes the networking code in KeepassXC more "risky" compared to the networking code in Linux.


> as intended and announced upstream

Meanwhile, from a maintainer of the KeepassXC repo:

> Good luck to you. Really bad decision. We will be sure to let everyone know.

https://github.com/keepassxreboot/keepassxc/issues/10725#iss...


They removed other non-networking features too. E.g. even autotype was removed. At that point why not just store your passwords in an encrypted notes app?


I use KeePassXC, but not any network features or autotyping, because I like the password generation and because the interface is nice. I previously used Vim's old encryption feature (since removed I think?) and I think KeePassXC as I use it is a good upgrade from that.



But that's not what they disagree with. They are saying you shouldn't have a package called "keepassxc" if it is missing a ton of features from upstream KeepassXC. You should name it something different instead.

So you shouldn't have "keepassxc" and "keepassxc-full". Instead you should have "keepassxc" and "keepassxc-minimal".


But it’s a valid build configuration option provided by upstream. Not sure I follow this line of reasoning.


Sure, I think that's a reasonable stance. If upstream agrees that such a configuration is a valid distribution of KeepassXC and can be branded KeepassXC, then that's up to them. I would probably disagree with upstream in that scenario just from a UX perspective, but I would understand both sides.

But in this case, upstream has responded and clearly indicated that they do not want the minimal distribution of KeepassXC to be branded as the main "keepassxc" package: https://github.com/keepassxreboot/keepassxc/issues/10725#iss...


To be fair, the logo is like 2px tall on Firefox Android...


That is my bad – someone flagged this before and I forgot to fix it. A quick fix is deploying now!

