
Have legal send Google a C&D and shoot an email to the FTC about anticompetitive behavior. That's how you get a human involved.



Even if this works, it represents a failure in the system that needs to be fixed.

(I assume you're just trying to help the parent solve their problem so I'm not trying to be dismissive of your comment)


I am not sure this aphorism is helping in any way though? Of course the system needs to be fixed; there are about 829 Google/big-tech things that need to be fixed, but of course they won't be. The only course of action in the vast majority of cases like this is legal action.


California’s labor board could state that anyone impacted by algorithmic decisions has the right to review the algorithm used, and that all algorithms used must be deterministic and diagrammable. If clearly stated, “flip a coin” or “choose one at random” is fine, but “trained AI network” is not.

This would shine light on algorithms used at Uber, DoorDash, Amazon, Microsoft, Workday (based in Oakland). Anyone with a worker in California whose work is subject to algorithmic intervention would have the right to request the source code to all algorithms impacting their gig, temporary, or permanent employment.

I cannot imagine a more frightening regulatory path for California tech. They would spend a billion dollars trying to stop it.


  > all algorithms used must be deterministic
Be careful with your choice of wording here. There are many non-{AI,ML} algorithms that are not deterministic. Hell, we don't even have to go to Turing or talk about Busy Beaver. What about encryption? We want to inject noise here and want that noise to be as random and indeterminable as possible.

There are also many optimization algorithms that require random processes. This can even include things like finding the area under a curve, because it may be faster to use Monte Carlo integration, and in some cases you might not be able to do it any other way.
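To make the point concrete, here is a minimal sketch of Monte Carlo integration of the kind the paragraph describes. The function names and the test integrand are my own; the randomness is essential to the method, which is exactly why "must be deterministic" is tricky wording.

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=0):
    """Estimate the integral of f over [a, b] by averaging f at
    uniformly random sample points, then scaling by the interval width."""
    rng = random.Random(seed)
    total = sum(f(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n

# Area under x^2 on [0, 1] is exactly 1/3; the randomized estimate
# converges to it as n grows, despite no single run being "deterministic"
# in the colloquial sense (only the seed makes this one reproducible).
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0)
```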

  > and diagrammable
An ANN is certainly diagrammable.

I understand the intent of your words and even agree with it. I think openness and transparency are critical. But because I care and agree, I want to make sure we recognize how difficult the wording is. It is often easy to implement a solution that creates a bigger problem than the one we sought to solve.

Personally, I'd love to see things become "Software Available." I mean, if it were a requirement for everyone, then it would be much easier to "prove" when code is cloned. Of course, this is easier said than done since there are "many ways to skin a cat," but in essence this is not too dissimilar from physical manufacturing. It's really hard to keep secrets in hardware. Plus, there are benefits: you can fix your fucking tractor when it breaks down, or fix a car even if it is half a century old. I do expect that if this became reality it'd need a lot more nuance, and my own critique applies, but I just wanted to put it out there (in part, to get that critique).


Obviously a diagram of an ANN is ‘possible’ and just as obviously it’s not in compliance relative to an algorithm with a runbook as most governments recognize and use. I’m not writing a forum comment with the law or rule as I would craft it to ensure that a judge can reasonably find against such examples. HN is not a useful place to workshop legalese :)

No law or rule will be able to, in ‘legal’ code terms, fully exclude attempts to slip through loopholes in the proposed restriction. That doesn’t at all invalidate the threat of it; that’s just the cost of doing business with any legal code — which, itself, cannot be interpreted fully deterministically at all.

You’re welcome to propose better wording, of course; and: I also recommend writing a letter to an elected representative or state board if you do! I think they would jump at the chance to even the odds without being seen as disadvantaging their sponsors.


Fair enough. Though I suspect that it will be quite difficult to find the right words, even with substantial legalese. But I did want to make the note of caution. Especially as this even permeates into the public language, which in turn ends up being what politicians use because they just care about signaling instead of solving the actual problems...


“Decision processes can be analyzed and reproduced by a typical citizen without burdensome preconditions” is a nice simple way to put it. Neural network training is not accessible to a typical citizen (one that you might find on a jury) without burdensome effort involving terabytes of input data and hundreds of thousands of dollars, and a black-box pre-trained network does not satisfy replicability, as it cannot be interpreted by analysis. Techies will object that ‘burdensome’ is poorly defined, but it serves to concentrate the subjective judgement into a measurable test that can be evaluated and justified by the judiciary; I expect that a judge would not find “download and execute an AI” to pass that test, but you could always explicitly require analysis to be possible in a reasonable length of time without a computer. Similarly, language regarding ‘typical citizens’ is already well-known and understood in the field.

This is all moot if no one asks for it, though :) The exact wording of the deck chairs has no bearing on the course of the ship and all.


I'd settle for "interpretable".

Because frankly, I don't think the average citizen can understand a lot of even basic algorithms used in data analysis. I teach undergrads, and I can say with high certainty that even many senior CS students have difficulties understanding them. There are PhDs in other STEM fields who have issues (you can have a PhD in a STEM field without having a strong math background, and "strong" means very different things to different people: for some, calculus means strong, while for others PDEs aren't sufficient).

The reason I'd settle for interpretable is that someone could understand and explain it. There needs to be some way we can verify bias, and not just by means of black-box probing. While black-box probing allows us to detect bias, it only does so by sampling, and requires us to sample just right. Black-box probing makes it impossible to show non-bias.
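A toy illustration of why black-box probing can miss bias (the scorer and numbers here are hypothetical, invented for the example): if the disadvantaged slice is rare, even a fairly large random audit will usually never touch it.

```python
def blackbox_score(applicant_id):
    """A hypothetical opaque scorer, biased only against a rare slice:
    roughly 0.1% of applicant ids (multiples of 1000) get zeroed out."""
    return 0.0 if applicant_id % 1000 == 0 else 1.0

# If the biased slice covers 0.1% of inputs, the chance that 500 uniform
# random probes ALL miss it is 0.999^500 -- so more often than not a
# 500-sample black-box audit would conclude "no bias found".
p_miss_all = 0.999 ** 500
```

The point is not the specific numbers but the asymmetry: sampling can only ever fail to find bias, it cannot demonstrate its absence, whereas a causal explanation of the scoring rule can.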

What I want is __causal__ explanations. And I think that's what everyone wants. And causal explanations can be given even with non-deterministic algorithms.


For the most sensitive algorithms, if the algorithm cannot be explained in easily understandable language (as determined by a jury), then it's illegal.

(It's how we use paper ballots rather than machines, and especially not everyday computers for voting : the extra risk just isn't worth it.)


  >  if the algorithm cannot be explained in easily understandable language (as determined by a jury), then it's illegal.
You've just made

  - A*
  - MCMC Sampling
  - Runge–Kutta
  - PDE solvers
  - Integrals
  - Monte Carlo Integration 
and so much more illegal. You've probably made all of computing illegal, because good luck explaining systemd, let alone kernels, to a jury.


You’ve just rejected an idea because your interpretation of it takes a severe path where use of randomness is incompatible with documenting an algorithm. Flowchart the process “flip a coin” to demonstrate why this world-ending interpretation isn’t so.


You clearly didn't read the comment that they were replying to which starts with

  >>> Be careful with your choice of wording here.


Come on, where did systemd come into the picture? Don't be obtuse.

No one is saying that the algorithm should be three if statements that the jury can understand in 30 seconds.

As an example, I'd say that doing maximum bipartite matching using factors like proximity, rating, etc, for drivers/gigs is a reasonable thing that you can explain to the jury. It's not that they have to understand the proof for the algorithm itself.
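To sketch what "explainable to a jury" might look like in practice (all names and the scoring rule here are hypothetical, and brute force is used purely so the whole procedure fits on one screen): the thing a jury reviews is the transparent scoring criteria, not the search machinery.

```python
from itertools import permutations

def match_drivers_to_gigs(score, drivers, gigs):
    """Exhaustively find the driver-to-gig assignment maximizing total score.

    `score(driver, gig)` is any auditable rule (e.g. a weighted sum of
    proximity and rating) that can be written down and explained.
    Exhaustive search is only viable for tiny instances; real systems
    would use a proper matching algorithm with the same criteria."""
    best, best_total = None, float("-inf")
    for perm in permutations(gigs, len(drivers)):
        total = sum(score(d, g) for d, g in zip(drivers, perm))
        if total > best_total:
            best, best_total = list(zip(drivers, perm)), total
    return best

# Hypothetical data: higher rating and shorter distance both score better.
drivers = [{"name": "A", "rating": 4.9, "km": {"g1": 1, "g2": 5}},
           {"name": "B", "rating": 4.2, "km": {"g1": 2, "g2": 1}}]
score = lambda d, g: d["rating"] - d["km"][g]
assignment = match_drivers_to_gigs(score, drivers, ["g1", "g2"])
```

The explainable part is the one-line `score`: "we prefer nearby, highly rated drivers." Whether the optimum is found by brute force or the Hungarian algorithm is irrelevant to the fairness question.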

The most, if not only, important thing is that you should be able to convince the jury that you're not including criteria in the matching process that are actively or accidentally harmful to the gig workers.

The problem with high dimensional LP solvers, optimisation problems, PID controllers, or other systems with a feedback loop is that it's very tempting to include revenue (or a confounding factor thereof) into your objective. This can, as you might imagine, lead to something that harms the workers.

On the other hand, worker satisfaction is much much harder to quantify and is not included in the objectives at all usually. Number of active work-hours and simple things like that are not typically a good signal because of the fundamental nature of most gig work in this context -- they are doing it out of necessity, and taking a risk losing out on employee protections.


  > where did systemd come into the picture? Don't be obtuse.
  >>> if the algorithm cannot be explained in easily understandable language (as determined by a jury), then it's illegal.
It came in because just the other day there was a big thread on systemd where plenty of people were complaining about how difficult it is[0]. I'm really not convinced that the average person could understand systemd.

The thing is that the simpler you explain it, the less informed the jury is. So the question is whether you can find an explanation that is sufficient to make informed decisions. The reality is that this then becomes more dependent on who's the smoother talker: who can convince the jury that they understand, and understand in the way they want them to. There's a big bias in the balance of power here. Details are hard to explain, and if anything relies on something subtle, well, you're fucked. So all you need to do to abuse employees is make the algorithm sufficiently complex and get a smooth-talking lawyer.

  > The problem with high dimensional LP solvers, optimisation problems, PID controllers, or other systems with a feedback loop is that it's very tempting to include revenue
Some of these are fully interpretable, others are not. But most people have a really hard time understanding high dimensions: where a straight line isn't "straight" (geodesic), where the average can be a meaningless value, or where the sphere nestled among the corner spheres of a cube can, in high enough dimension, poke outside the cube. The problem with high dimensions is that they are really hard to explain without math, because you can no longer rely on any visual reference.
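One related high-dimensional counterintuition can at least be computed, if not visualized (the function name is mine; the formula is the standard volume of a d-ball of radius 1/2):

```python
from math import gamma, pi

def ball_fraction_of_cube(d):
    """Fraction of the unit cube's volume taken up by its inscribed ball
    (radius 1/2): pi^(d/2) / Gamma(d/2 + 1) * (1/2)^d."""
    return pi ** (d / 2) / gamma(d / 2 + 1) * 0.5 ** d

f2 = ball_fraction_of_cube(2)    # ~0.785: the disc fills most of the square
f10 = ball_fraction_of_cube(10)  # ~0.0025: nearly all volume is in the corners
```

In 2-D the inscribed disc covers about 78% of the square; by 10-D the inscribed ball covers well under 1%, which is the kind of fact that is easy to state, easy to verify with math, and nearly impossible to convey by picture alone.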

[0] https://news.ycombinator.com/item?id=42749402


> So all you need to do to abuse employees is make the algorithm sufficiently complex and get a smooth talking lawyer.

Yeah that is true. I was speaking more to the importance of interpretability in these things, and how the lack of it leads to bad incentives very quickly.

> Problem with high dimensions is that it is really hard to explain without math because you no longer can rely on any visualization reference.

Yep, and the co-dependencies also tend to become very hard to reason about.


Which is the point : "computing" shouldn't be involved in these decisions.

I disagree, but mainly because we're here on HN. We're in a place where many engineers and engineering managers of these things exist. Many of these problems can be solved WITHOUT legal action. In fact, this is arguably MUCH cheaper.

For developers:

This means to stop rushing and to do things the "right" way. To not just write code to pass the unit tests, but to write code that is more robust than that. To write code that is modifiable and modular (that whole monad thing that the PL people keep yelling at us about). To not just be someone who glues code together (be that from Stack Overflow or GPT), but who _writes_ code. To actually know the entire codebase, not just the part you're in charge of. To push back against your managers and write better code. To fix problems without being asked. To fix issues __before__ they're reported. Sure, writing fast and dirty code will get you done quicker, but it is just putting off more work until later. Because why do today what will be twice as much work tomorrow?

For managers:

Recognize that good and efficient code is important and makes your business better and more profitable. Stop this "don't let perfection get in the way of good" nonsense, because perfect code doesn't exist. If a developer is writing "perfect code" then either there's a miscommunication between dev and manager about what is "good enough" or the dev is dumb (junior) and thinks perfect code exists. Give time for devs to go back and clean up the messes. Realize it is _cheaper_ and easier to clean a mess today than tomorrow, since messes compound. Don't wait for something to break to fix it; fix it before it breaks. Maintenance is FAR cheaper than replacement. Be careful how you evaluate your devs, because things like lines of code written, number of commits, or tickets resolved are all extremely noisy measurements[2]. Each can be as indicative of a bad developer as of a good one, because when you have a true 10x developer, 10x fewer tickets get created in the first place. It's very hard to evaluate a future that didn't happen. Look for the developers who are foreseeing problems. Hire a few "grumpy devs", people who are pointing out problems AND trying to solve them. A good developer is good at recognizing and finding problems (see the example below: stop testing whether your devs can fizzbuzz or leetcode and instead see if they can anticipate problems and think about solving them. The more you believe LLMs will do the coding in the future, the more important this skill is!).

(I've often been told that a difference between "academic code" and "business code" is that the business cares about if something "works". That what matters is the product in the customer's hands. My experience has been that business do not in fact care. This experience involves working in production and even demonstrating how to fix problems that more than double the performance of the product. Not like "made a 10ms process a 5ms process" but "customers can make 1 widget per hour, now customers can make 2 widgets per hour at half the cost")

For both:

To recognize that things compound and thus the little things add up. To stop being dismissive of small improvements, especially if they are quick to resolve. Small issues compound, but so do small improvements. Stick your neck out a little. If there's never enough time to do it right but there's always enough time to do it twice, then something is wrong (there is time to do it right).

Here's a simple example of how a small thing can compound while requiring almost no extra dev time (5 minutes? 30 max?):

  If you have a form that people need to fill out and it has values you can know or reasonably guess (e.g. country and timezone can be reasonably guessed even if the user is not logged in), provide those as defaults.
  Better, add a copy -- don't remove from the alphabetical list! -- of the most frequent/likely to the top. 
  (Seriously, as a US person why am I always scrolling to the bottom of a list to enter country of origin on a page that is expecting me to be a US citizen... We're writing software. Software automates. Fucking automate this shit for me).
  
Sure, this saves the user 1-30 seconds[0], but that too scales. The magic of software is scale! Even half a second for a user is extremely valuable as that's almost 3 months if you have a million people doing this each year (or one person doing it a million times ;). Especially since this is very little in dev work, you can find stack overflow posts for the javascript for this quite easily.
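The "copy the frequent choices to the top" idea is a few lines in any language; a sketch in Python (the real thing would be the page's JavaScript, and the helper name and country list here are made up):

```python
def with_frequent_first(options, frequent):
    """Copy the most likely choices to the top of a select list WITHOUT
    removing them from the alphabetical list below, so users who scroll
    to the usual alphabetical spot still find their entry."""
    return [o for o in frequent if o in options] + list(options)

# Hypothetical country dropdown for a page that mostly serves US users.
countries = ["Australia", "Canada", "South Korea", "United States"]
choices = with_frequent_first(countries, ["United States", "Canada"])
```

Note the original list is left untouched; duplicating rather than reordering is what keeps both habits (pick from the top, or scroll alphabetically) working.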

I'm not just speaking out of my ass here. I write tons of small scripts and programs to take care of little things for me. They add up in surprising ways. Honestly, I'd get much more utility if other developers made this process more accessible to me. But the reason to write accessible code is not for others, but for yourself[1].

[0] Might seem like a nothing burger, but this can take a surprising amount of time. My partner is Korean, so she never knows if her country is "Republic of Korea", "ROK", "Korea", "South Korea", or even some others. She has to guess and check, and it isn't like these entries are near one another.

[1] I write a lot of research code. Many of my peers write just quick and dirty code. This is fine, I do this on my first pass too. But once something gets going it is incredibly important to have flexible code, even at the cost of optimization! (obviously depends on need and stage of code) because you have to constantly modify things. They may often be faster to first result but I'm faster to "completion" and often can be more thorough. If things are hardcoded then it is hard to modify and easy to make mistakes (did you check all the places?). If things are hard to change, then you're discouraged to answer questions as they arise. But I also worked as a mechanical engineer and an experimental physicist previously doing R&D. So I saw that development has various stages and it requires rebuilds along the way with the new build being aimed at the stage of development not the final goal. A beautiful thing about software is that the performance costs of "repairability" or "modifiability" are extremely low and often non-existent. So it is almost always advantageous to write that way. First dirty, then modifiable, then optimize what and only what needs to be optimized.

[2] There's an important concept here: randomness/noise is the measurement of uncertainty. You can't know something to infinite precision, so you have to include some uncertainty. Meaning, if you want to be more accurate, you have to account for "randomness" or "noise". If you just take numbers at face value, then you are evaluating incorrectly. Numbers aren't enough, in any situation. Ignoring this will make you less precise and will often bite you in the ass. It always does so at the worst possible time and, worse, it is often hard to recognize where the ass-biting came from.


Nice blogpost (genuinely, not ironic), but entirely unrelated to the issue being discussed.


  > Nice blogpost (genuinely, not ironic)
It's a point I've been meaning to turn into an actual blog post lol

  > but entirely unrelated to the issue being discussed.
It was an answer to the parent's comment about the aphorism not being helpful. What I'm trying to say is that there are many things that can be done besides regulation. I really want to stress these points because they are actionable things that you, an everyday "cog in the wheel" person, can do to make meaningful progress towards fixing the problems. If I've learned anything, it is that little things add up. It is easy to see how little things add up in a destructive way, as I don't think it is a reach to say that "everything working normally" is an unstable equilibrium, but we need to recognize that this also means the little things we do matter, either in continuing the equilibrium or even pushing to make it more stable.

Side note: Always happy to see physicists in the ML space. Biased as an ex-physicist working in ML myself lol. I'm sure you can see how physics influenced my outlook haha



