The FTC’s new enforcement weapon: “Algorithmic destruction” (protocol.com)
109 points by simonebrunozzi on March 17, 2022 | 78 comments



Good, I hope they tear through this data-mining-gold-rush and set us back 1000 years, and I hope all the wantrepreneurs trying to exploit all of us with this data bullshit end up homeless and poor.

I am not a data point, I do not want to be a data point, and I do not want products and services to interact with me as if I was. We humans are superior at curation, discovery, and anticipating our needs. Not AI.


> I am not a data point, I do not want to be a data point, ...

<insert obligatory The Prisoner reference>

  I am not a number! I am a free man!
  I will not make any deals with you.
  I've resigned.
  I will not be pushed, filed, stamped, indexed,
  briefed, debriefed, or numbered!
  My life is my own!


> We humans are superior at curation, discovery, and anticipating our needs

You speak as though advertising was for you.

The data being collected about you is used to sell you to advertisers, and to sell your data to third parties.

You see advertisements not because you are interested in a product, but because the seller of the product is interested in one of your characteristics (like your income, sex, weight, religious affiliation, price insensitivity, poor decision-making skills, etc.).


The value-adds I wrote about above are how this data economy is being sold to us. Where are the consumer-visible benefits of AI? Your "curated" "exclusive" (gag) recommendations "just for you" on Spotify or Netflix. "Products you might like" on Amazon. Etc etc etc


> We humans are superior at curation, discovery, and anticipating our needs. Not AI.

1. I think this is far from established, and if anything the accumulating evidence shows the opposite (though I do hate the term AI when used like this).

2. The issue (for me, at least) is not "AI" replacing human equivalents (though in some parts of life, this is a huge issue), but rather the ease with which the AI's actual purpose and success metrics can differ from the claims made about it by its owner-operators.

It's not that this can't happen with human beings too - financial advisers who line their own pockets rather than giving you the best advice, for example. But it is much easier to hide nefarious intent inside an algorithm, and harder to discover it without access to the source code (and non-trivial even then).


> We humans are superior at curation, discovery, and anticipating our needs.

Depends. I would prefer recommendation algorithms to be open source so I can plug in different ones.

Trying to make sense of the mass of data without a personalized recommendation engine is a fool's errand.


All we need is grep for audio and video and I'm good.


Related: there are a few channels on youtube that I archive locally, and I always grab the subtitles when I do. I throw them all into a database that I can run queries against if I'm trying to remember which video something was said in. It's not a direct grep of the video, but pretty close!
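Roughly the shape of it (a minimal sketch, not my exact setup; assumes yt-dlp-style .vtt subtitle files and SQLite full-text search):

  # Minimal sketch: index .vtt subtitle files, e.g. pulled with
  # "yt-dlp --write-auto-subs", into a SQLite full-text table,
  # then search them like a grep over what was said.
  import glob, sqlite3

  db = sqlite3.connect("subs.db")
  db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS subs USING fts5(video, line)")

  for path in glob.glob("archive/*.vtt"):
      with open(path, encoding="utf-8") as f:
          for line in f:
              line = line.strip()
              # keep spoken text, skip cue timestamps and the header
              if line and "-->" not in line and not line.startswith("WEBVTT"):
                  db.execute("INSERT INTO subs VALUES (?, ?)", (path, line))
  db.commit()

  # "grep" for a phrase across every archived video
  for video, line in db.execute(
          "SELECT video, line FROM subs WHERE subs MATCH ?", ("algorithm",)):
      print(video, line)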


We basically need a curation protocol.


This is the field of library science


If AI is inferior, there is no need to ban it. Pick a side!


That's the same argument as:

If nuclear weapons are superior to all weapons, no need to ban them!!


It's literally the opposite argument


Should we not have counted Covid deaths or cases because those individuals also didn't want to be a data point?


Who's making money directly from those figures?


I was recently looking for dimmable string lights, and everything I found online required installing an app to work the lights. In what universe is installing a shitty app more convenient for the user than turning a dial? It seems like more and more products are being converted into data harvesting channels, regardless of their intended purpose.


Because the app has a shitload more functions than a dial. I can program my strips to do all kinds of cool stuff.

I mean, what data is going to get extracted? Your favorite color? The app that came with mine is great and I can work the controls remotely.

Might be the wrong forum to be complaining that an app is too hard for you.


> Might be the wrong forum to be complaining that an app is too hard for you.

You're misconstruing the objection here. No one thinks an app is too hard. The problem is I don't want my email to end up in some database because I wanted to dim my lights. I imagine it would also be able to collect info on what bluetooth devices I have in my kitchen, etc.


Have you checked the trackers and data loggers built into that app that directly or indirectly call home when you interact with your lights?

I have an aquarium and I use a water quality tester when I suspect something is wrong. Before, it was a colour strip you had to compare with a reference printed on the package. Now it uses a camera and matches the colour for a more precise analysis. However, that app calls home directly to JBL, so they are building a profile on how I abuse my fish, because every time it logs, it logs a bad situation. It also leaks the usage to jquery, google and crashlytics without notifying me or asking my consent.


> It also leaks the usage to jquery, …

This is the first time I’ve read someone refer to loading jquery as a data leak.

Ideally they shouldn’t be using jquery, or just shipping it with the app/self-hosting. But for the general case, are you saying your ideal would be an app prompting you for permission before it loaded any external resource from any url? Even one as ubiquitous as jquery?

How is the fact that your IP address requested a copy of jquery, one of the most downloaded JavaScript libraries of all time, any kind of meaningful signal?

I’d be more worried about it as an attack vector than a privacy infringement.


I would like an app that handles the functioning of my paid product locally, without internet access. The product was sold with strings attached that were not mentioned.

Loading any framework script not hosted on the site the app originates from is the same kind of leak as the Facebook pixel. jquery gets my IP, access time and referrer. They can cross-site track me over all pages that load their scripts.

I also use framework scripts, only I host them locally to counter this kind of leakage.


You're a drop of water in the ocean. Protect the shit that matters and stop worrying about apps "stealing" your contacts or whatever.


That's the thing, people actually think their data matters. It doesn't. You're a point in a multi-billion-point, multi-million-dimensional cloud. Your data doesn't matter. Yeah, data farms can glean trends from populations, but there are infinite ways to do this and most aren't actually concerned about who "you" are, but what "people" are doing. In short, you're not that important, you're not being individually spied on, get over yourself. You're uninteresting as fuck.


It's a matter of perspective. My particular data doesn't matter to a company in the grand scheme of things; they will be fine without it. But it matters to me. I have to deal with the spam. I have to answer the robocalls. I have to worry about my personal details being leaked. I have to deal with identity theft targeting me. There's only one person who's concerned about me - me.

I want to avoid the hassle because it affects me and my life. I might not be important to others, but I'm important to myself. Don't tell me to get over myself.


> I mean, what data is going to get extracted? Your favorite color?

For instance, a log of every time I turn my lights on and off? Coupled with, of course, all kinds of fingerprinting info about my phone, which is also tied to all my accounts. Others mentioned it could also scan the local network. Creepy.

So sick of apps being required for basic things. My ISP is trying to require one to admin my router...


Sometimes all you want or need is a dial. I don't need my dimmable lights to have a rave function, or work as a visualizer for the music I'm listening to.

A dial on the wall also doesn't potentially open an IoT shaped hole in my network security.


So buy one with those features; otherwise you're just an old man yelling at clouds.


> I mean, what data is going to get extracted? Your favorite color?

Well probably that as well as information about every device on the same wifi network and a list of nearby bluetooth devices.


Your browser can do this too. So what?


Sure, the browser can do that as well, but your question was what data can be extracted, and it's more than just your favorite color.


And then of course you can't turn them off when AWS is down


Can we please, please, please stop calling statistical regressions "algorithms"? This is getting out of hand.


Statistical regression is an algorithm. I think I fail to get the sentiment of this comment though (as someone not in a data-oriented profession).


I think it's too late, much like cryptographers lamenting the word "crypto", but I see where they're coming from. I have thought it is unfortunate that the popular conception of "the algorithm" is actually a remarkably bad example of an algorithm. I mean sure it's an algorithm - a computer is executing it - but when I think of an algorithm I think of a well defined sequence of steps solving a discrete problem. Whereas "the algorithm" / machine learning tends to attack a different kind of problem - "squishy" problems like recommendation systems where we don't really know how to explain how we come up with answers as a discrete list of instructions.


Ha! And can we please stop calling curve fitting a "statistical regression"?


A trained model is a set of rules to go from, for example, an image to a tag identifying the object it shows.

That's an algorithm. It is not "statistical regression", even if that is the method that was used in its creation.

The whole it's-just-statistics shtick is what's getting out of hand. People are just mindlessly repeating the point because it seemed smart when it was first made. Hint: it no longer does, especially if you use it wrong.


By that logic, most of mathematics is "algorithms" because it maps one space to another. That is absolute nonsense. An algorithm is a sequence of steps, an implementation. A trained model is only an algorithm in the most generous interpretation of the term. You cannot discount the underlying mathematics of it, no matter how inscrutable it may be. A distribution implemented in Python doesn't magically become any less statistical just because it is executed on a CPU.


1. Multiply your input number by 7

2. Add three

3. Square the result

4. Take this result as your final number

There's an algorithm for f(x) = (7x + 3)^2. Similarly, any map from one space to another is an algorithm if it's describable on paper. Every mathematical proof is analogous to an algorithm in important ways. Math is not "equal to" algorithms, in the sense that two different algorithms can describe the same underlying math, but every description of any math essentially has to be an algorithm.
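The same four steps written out as code, just to make the point concrete:

  def f(x):
      y = x * 7     # 1. multiply the input by 7
      y = y + 3     # 2. add three
      y = y * y     # 3. square the result
      return y      # 4. take this result as the final number

  assert f(2) == (7 * 2 + 3) ** 2   # 289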

The only parts of math that would not fit would be anything non-constructible, but we know we are limited to exploring and describing only 0% of that world since our descriptions (algorithms) are countably infinite, and there are constructivists who believe non-constructible things don't exist in any meaningful sense. So, yes, most (or all) of the practice of mathematics is algorithms.

Why are we trying to gatekeep this word?


In addition, natural numbers are algorithms, in that numbers are functions of no inputs, i.e. 3 is f()=3, similar to random()=[0,1]. And functions are a subset of algorithms.

My word for algorithms that aren't closed-form functions, my life's work, is "repetigrams" because they involve repetition in the form of iteration or recursion.


Math consists of proofs. Every proof corresponds to a computer program, by Curry-Howard Correspondence. Every program is an implementation of an algorithm.

Hence, math consists of things that correspond to implementations of algorithms.
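A tiny sketch of the correspondence (my own toy example, in Lean 4): a proof term is literally a program, with the same shape as ordinary function application.

  -- Modus ponens written as a proof term (Lean 4 syntax)
  theorem mp {A B : Prop} (a : A) (h : A → B) : B := h a

  -- The same shape as ordinary function application on data
  def applyFn {α β : Type} (a : α) (f : α → β) : β := f a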


Algorithms Considered Harmful


Confusing headline, it seems to imply that all algorithms are going to die, like we'll just stop using algorithms. Obviously that doesn't make sense, but that is what it sounds like.

I also question whether we are talking about algorithms, or the data set they are working with or the model created from that. I can forgive confusing an algorithm with its implementation (e.g. the source code), but this goes beyond that.


I believe the answer to your second line is "yes." This FTC enforcement action appears to address all of the above: any data that is collected in an illegal manner (in this case, which violates COPPA) and any machine learning models that are created through the use of this data (in their entirety, since you can't un-train a piece of information from a model).

Machine learning is somewhat unique in the software world in that the actual useful artifacts are not necessarily strongly tied to the source code itself. You can have the identical source code running at two different companies, but by supplying them with two different training sets, you'll end up with very different outputs. That's what algorithmic destruction is targeted at--not even necessarily the source code or algorithm in the technical sense (you can't destroy "KMeans" or "convolution" as a concept, obviously), but both the data and the model weights that are produced through the use of that data and that are used to perform a business action. Those weights are typically stored separately from the source code, and can be extremely expensive to re-create from scratch.
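A toy sketch of that separation (hypothetical data and file names; scikit-learn only as a stand-in for whatever pipeline a company actually runs):

  # Identical code, two training sets, two different artifacts.
  # The estimator is generic; the dumped weights file is what encodes
  # the data, and it is what a disgorgement order would target.
  import numpy as np
  import joblib
  from sklearn.linear_model import LogisticRegression

  def train_and_save(X, y, path):
      model = LogisticRegression().fit(X, y)
      joblib.dump(model, path)           # weights live apart from the code
      return model.coef_

  rng = np.random.default_rng(0)
  X_a, y_a = rng.normal(size=(100, 3)), rng.integers(0, 2, 100)
  X_b, y_b = rng.normal(size=(100, 3)), rng.integers(0, 2, 100)
  print(train_and_save(X_a, y_a, "company_a.joblib"))   # same code...
  print(train_and_save(X_b, y_b, "company_b.joblib"))   # ...different weights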


The article's use of “algorithm” is incredibly bad.

> The FTC’s new enforcement weapon spells death for algorithms

makes it sound like running quick-sort is a federal crime


A trained model, which is obviously the subject here, is an algorithm.


I generally wouldn't define algorithm as such, although I can see where the lines between them can become blurry.


> I generally wouldn't define algorithm as such, although I can see where the lines between them can become blurry.

It's pretty clear cut tbh. An algorithm is a set of steps to follow to produce some output. A trained model is, 'hey, do these matrix multiplications with these coefficients to get an output'. The fact that the exact coefficients were arrived at via backprop doesn't make it not an algorithm.
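A minimal sketch of that view (made-up coefficients): the "model" is just a fixed sequence of multiplications.

  import numpy as np

  W1 = np.array([[0.2, -1.3], [0.7, 0.4]])   # coefficients found by training
  W2 = np.array([0.5, -0.8])

  def model(x):
      h = np.maximum(W1 @ x, 0.0)   # step 1: multiply, then clamp at zero
      return W2 @ h                 # step 2: multiply again -> the output

  print(model(np.array([1.0, 2.0])))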


Not necessarily, as the trained model can just be the matrix. It's more like data, as presumably the same algorithm with different weights, trained on different data, would be permissible.


ML training is an algorithm that produces an algorithm as output


It doesn't really, though. The prediction algorithm already existed before training; what training produces is just a particular collection of parameters. You wouldn't say changing the interest rate changes the algorithm for calculating interest payments. The weights can be viewed as another input to the algorithm. You aren't going to outlaw linear regression, but a particular set of coefficients.
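The same point as a sketch (illustrative numbers only): the procedure is fixed, the coefficients are data you pass in.

  def interest_payment(principal, rate):
      return principal * rate            # same algorithm for any rate

  def linear_predict(x, coefficients, intercept):
      return sum(c * xi for c, xi in zip(coefficients, x)) + intercept

  print(interest_payment(1000, 0.05))
  print(linear_predict([1.0, 2.0], [0.3, -0.7], 0.1))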


Oh yes, so Chromium and Minecraft are algorithms now. What a useful definition we have here. Obviously, colloquially, when people say algorithm, it is an implicature that they mean a hard-computing (not soft) series of steps that achieves a specific goal. In other words, a statistical algorithm does not point to the same category as a deterministic algorithm, and "algorithm" refers to the latter class by default.


No reasonable person would define Chromium or Minecraft as algorithms; that's not what the poster was saying.


Oh yes, Chromium is not a "set of steps to follow to produce some output" then? If the author didn't mean what he said, what is the actual criterion he meant?


> What a useful definition we have here..

Indeed it is - for one thing, it allows us to see that various useful theorems and results about algorithms and computability apply as much to large programs as to small ones, such as the fact that there's no fundamental impediment to porting them between computers with different instruction sets, or running them in virtual machines.

What's not so well or usefully defined here is your distinction between hard and soft computing.

> In other words, a statistical algorithm does not point to the same category as a deterministic algorithm and an algorithm refer to the later class by default.

You appear to be under the misapprehension that the set of statistical algorithms is disjoint from that of deterministic algorithms. I strongly suspect that all the algorithms covered by the article are both statistical in terms of what they compute and deterministic in terms of how they do it.


How could the FTC require algorithmic destruction? How do you prove you destroyed something? Would you have to lobotomize everyone who had worked on building it?


You don’t. Upon receiving a destruction order, a company can delete the trained model or not. If they do delete it, it’s all good. If they don’t delete it, they (hopefully) can’t get caught using it again without company-destroying fines. They can probably do some kind of recovery in most cases, but I doubt a judge would be very favorable to them if they’re caught disobeying such an unambiguous order.


How does the FTC prove they’ve used a destroyed algorithm versus a reasonable facsimile? I feel like the lawyers are going to love the litigation that will come of this


So, essentially, they require the company to pretend that they destroyed the "algorithm" (I think they actually mean machine learning model). If they don't, or they do but kept a copy, or they use what they learned about which of several general-purpose algorithms worked best at approximating it, there's essentially no way for an outside enforcement agency to know, and even with a whistleblower there's a real problem trying to enforce it. How do you prove, even if you have the database seized, which data the machine learning model came from?

It's the same problem as saying "destroy that data", which is also difficult to enforce (and probably isn't), except that it adds an additional level of difficulty to enforcement.

I'm not saying they don't have a point, I'm just saying I don't see how they will be able to enforce this, even with whistleblowers.

Whistleblower: "that ML model came from illegal data" Company: "No, it didn't." ...


I think the trained model. I don't think people will remember all the coefficients.


Right, but if the government says I have to turn over some nuts and bolts that I got through some illegal action I can just give them the location of my warehouse. Then the rightful owner can come pick them up with a government agent inspecting the process.

If I have a bunch of drugs or chemicals I shouldn't have, I can give them the addresses and they can have government agents watch the destruction.

If I have an algorithm (that is just a bunch of computer files) how do I prove I destroyed it? Do they literally just watch someone run "rm banned_model.bin" and then decide it's gone? It's basically impossible to prove you don't have a copy of something. There could always be a backup at another location. There could always be an encrypted copy stored somewhere that is impossible to detect.

Any moderately well-run company has a service contract with a backup provider that acts as an option of last resort for restoring lost data. How do you get that provider to destroy their copy?


In general, law is ok with good-faith mistakes if you obey the spirit of the law. If you forgot about an off-site backup but then deleted it when you found out about it, I doubt you'd get in much trouble. If you deliberately used a technological mechanism to evade a court order, you'd probably be in for a world of hurt if you got caught, but that's fundamentally no different than committing any other crime and hoping you don't get caught.


You're right; there's no way to prove it's gone completely. But in the real world, these hypotheticals can be handled through increasing fines and possible jail time for contempt of court.


You can randomly check if the company is using the model down the line as well as rely on whistleblowers. If they don't use it there's little point for them to risk a fine by keeping it.


Unless there's problematic material within the algorithms/models -- and there could be, in some cases -- I get the feeling that, longer-term, algorithmic transparency would be far more effective.

It would help demonstrate what biases were introduced, the systems and processes that permitted/encouraged those to exist, and it would help prevent repeats of similar mistakes in future.

(it seems fair for the FTC to also be able to order companies to stop using a particular category of algorithms; that doesn't require or imply deletion, though)


It would be really nice to have companies destroy algorithms based on knowledge obtained illegally (pretty much all of Facebook and Google). I just don't see it happening anytime soon.


As the article notes, the policy has been used, including against Facebook's partner Cambridge Analytica. So I cannot say that it will be "happening soon", but it is certain to "have happened already".


What evidence do you have for these claims? What knowledge do you believe was obtained illegally?


Facebook was forced to pay $650 million in 2021 for violating Illinois' Biometric Information Privacy Act[1]. That's not necessarily grounds for a federal regulatory action, but if something like this happens in the future that does violate federal law (in the article, this was caused by not paying sufficient attention to COPPA), it could mean Facebook has to delete/re-do quite a bit of work in addition to purely monetary damages.

[1] https://techcrunch.com/2021/03/01/facebook-illinois-class-ac...


That's a great example. I can say that for a very long time (at least 5 years, probably more but I wasn't there to know for sure) Google has had explicit policies and training for all employees about not sharing data across projects without explicit user consent for this and many other reasons.


The fine (around 1 million) is a total joke though.

These companies make billions off of harvesting data. They need to be shuttered entirely if they are found to be violating privacy rights.

The data they have gathered, and possibly even shared with allies, also needs to be traced and deleted before I'd believe this isn't just fluffy optics.

These companies also regularly risk breaches of that data, which can be very, very harmful when it lands in the wrong hands... It's long past time when serious moves should have been taken to eliminate this kind of behavior, and frankly I can't see any future where we can trust any company with any accurate information about us.

I've even got into the habit of filling inaccurate info into (non vital) data forms for companies that don't need to know my personal information.

We also have the power to fight the private data gorge they drive...

By creating incorrect location tags, using pseudonyms and improper spellings on our names, by opting out of tracking and using additional personal privacy tools, not buying devices that serve tracking, and most of all, by not enrolling with or supporting companies that harvest personally identifiable information.

Facebook IS NOT a government entity, they never had the authority to make everyone switch over to their real names...

In order to protect our personal privacy, we need to wake up and start asserting our rights on our own, because consumer protection has been sleeping at the wheel, and even coddling these corrupt and deeply irresponsible private companies & industry control freak CEOs for way too long. Hold these companies accountable and legislate the end of this cycle of personal data abuse or it will get far far worse with every update.


This headline is, as they say, greatly exaggerated.

Algorithms have an age old anti-enforcement weapon, called the white shoe law firm. https://www.debevoise.com/insights/publications/2020/10/thir...


I don't understand your point here. Are you simply saying that companies can prevent this by...hiring an expensive lawyer? That seems like sort of a strawman argument; hiring a "white shoe law firm" doesn't make an FTC action go away. Companies could hire one of those anyway, but now if the firms fail to win their case (which does in fact happen) the FTC has much sharper teeth with which to move forward. The risk profile for companies engaging in data malpractice goes significantly up.

Outside of that, the headline seems very accurate--algorithmic disgorgement is a very new approach (the Everalbum ruling was in early 2021), and, with the Kurbo ruling, one that appears likely to become much more common moving forward. The whole area of regulating algorithmic use is pretty novel--the field is basically so new that agencies are still figuring out how to go about it.


No way -- the FTC self-polices because of fear of losing a case. And precedent deters the FTC -- one borderline enforcement action that gets overturned either by a lower court or an appeal court spells a complete disaster for their leadership. That's why a single tech company with an effective legal strategy and deep pockets can upend whatever enhanced enforcement capacity this new "weapon" poses.


> Algorithmic destruction

I don't understand this label. Isn't this more about discovering how data is (ab)used by companies (what watchdogs are supposed to do), and less about destroying things with algorithms?


It explains in the first few paragraphs of the linked article what they're trying to convey.

"...destroy the algorithms or AI models it built using personal information collected through its Kurbo healthy eating app from kids as young as 8 without parental permission."

and

"..forcing them to delete algorithmic systems built with ill-gotten data could become a more routine approach, one that modernizes FTC enforcement to directly affect how companies do business."

Those quotes are from the first 3 paragraphs.


Ok. I read "algorithmic" as "done using an algorithm". So (imho) a better label would be "algorithm destruction". Still not a great label though.


Agree, your phrase is an improvement on theirs.


What I fear is law-enforced backdoors that could literally destroy everything - programs, code, data... Just imagining this dystopian concept brings me shivers. You didn't pay taxes? Data deletion.



