If you can recreate a file so its hash matches known CP then that file is CP my dude. The probability of just two hashes accidentally colliding is approximately 4.3*10^-60.
Even if you do a content aware hash where you break the file into chunks and hash each chunk, you still wouldn’t be able to magically recreate the hash of a CP file without also producing part of the CP.
The Twitter thread this whole HN thread is about shows just how to make collisions on that hash. So any image can be manipulated to trigger a match, even if that image isn’t CP.
It's the weights from the middle of a neural network that they're calling a "hash" because it encodes and generates an image it has classified as bad. Experts have trouble reasoning about what weights mean in a neural network. This is going to end badly.
If this were a hash then it would be as the parent describes; this is at best a very fuzzy match on an image to take into account blurring/flipping/colour shifting.
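To make the distinction concrete, here is a rough Python sketch contrasting a cryptographic hash with a simple perceptual one. It uses a plain "average hash" over a tiny grayscale grid as a stand-in; this is not the neural-network hash being discussed, and `average_hash`, `hamming`, and the sample images are made up for illustration.

```python
import hashlib

def average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Count the positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

# Hypothetical 8x8 grayscale gradient, and a slightly brightened copy of it.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[min(255, p + 3) for p in row] for row in original]

# The cryptographic hashes are unrelated; the perceptual hashes stay close.
print(sha256_of(original) == sha256_of(brighter))
print(hamming(average_hash(original), average_hash(brighter)))
```

The fuzzy hash tolerates small edits by design, which is exactly why it can also be pushed toward a match on an unrelated image, while an exact cryptographic hash cannot.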
It's vastly more likely that innocent people will be implicated for fuzzy matches on innocuous photos of their own children in shorts/swimming clothes than it is to catch abusers.
The other thing is, when you have nothing to hide you won't take efforts to hide it - meaning you'll upload all of your (completely normal) photos to iCloud without thinking about it again.
The monsters making these images know what they're doing is wrong, so they'll likely take efforts to scramble or further encrypt the data before uploading.
tl;dr: it's far likelier that this dragnet will only ever apply to innocent people than that it will catch predators.
All this said, I'm still in support of Apple taking steps in this direction, but it needs far more protections put in place to prevent false positives than this solution allows. A single false accusation by this system, even if retracted later and rectified, would destroy an entire family's lives (and could well cause suicides).
Look what happened in the Post Office case in the UK as an example of how these things can go wrong - scores of people went to prison for years for crimes they didn't commit because of a simple software bug.
> The monsters making these images know what they're doing is wrong, so they'll likely take efforts to scramble or further encrypt the data before uploading.
The ones that make national news from big busts do, because the ones that don't get caught much sooner and only make local news, because Google and other parties already have automatic CSAM identification online (server side, not client side, AFAIK), and are sending hits to Homeland Security.
If your goal is validation (i.e. this is a JPG/PNG) and stripping of EXIF data it is entirely possible to write your own parser in a managed and safe language in less than 500 lines of code without sacrificing any performance.
Don’t load them into memory, parse them as a stream byte-by-byte in accordance with the standard for the codec, check every offset before seeking, and reject images that don’t conform to the standard.
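As a rough illustration of that approach, here is a minimal Python sketch for PNG only. It walks the chunk structure from a stream, bounds every length before reading, and rejects anything that doesn't conform, without ever decoding pixels. `validate_png`, `read_exact`, and `MAX_CHUNK_LEN` are names invented for this example, the 16 MiB cap is an assumption rather than something from the spec, and a production validator would check more (chunk ordering, IHDR field values, and so on).

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_CHUNK_LEN = 16 * 1024 * 1024  # assumed cap for this sketch, not a spec value

def read_exact(stream, n):
    """Read exactly n bytes or raise ValueError if the input is truncated."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated input")
    return data

def validate_png(stream):
    """Walk PNG chunks from a binary stream; True if the structure is sound."""
    try:
        if read_exact(stream, 8) != PNG_SIGNATURE:
            return False
        first = True
        while True:
            # Each chunk: 4-byte big-endian length, 4-byte type, data, CRC.
            length, ctype = struct.unpack(">I4s", read_exact(stream, 8))
            if length > MAX_CHUNK_LEN:
                return False
            if first and (ctype != b"IHDR" or length != 13):
                return False
            first = False
            # The CRC covers type + data; consume the data in bounded pieces.
            crc = zlib.crc32(ctype)
            remaining = length
            while remaining:
                piece = read_exact(stream, min(remaining, 65536))
                crc = zlib.crc32(piece, crc)
                remaining -= len(piece)
            (expected,) = struct.unpack(">I", read_exact(stream, 4))
            if crc != expected:
                return False
            if ctype == b"IEND":
                # IEND carries no data and nothing may follow it.
                return length == 0 and stream.read(1) == b""
    except ValueError:
        return False
```

Usage would be something like `validate_png(open(path, "rb"))`; memory use stays bounded by the 64 KiB read size no matter how large the file claims to be.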
The overhead of stream abstractions is negligible if your goal is security when processing arbitrary input files provided from a zero-trust environment.
In environments where you’re prioritizing performance I’d still argue streams are likely your best bet when the size of the file to be parsed is not a constant. You wouldn’t want to load 50 large files into ram on a server environment let alone a phone.
If your input buffer is a bunch of tiny 10 KB files and you trust them? Sure, load them into memory and access their indices on the stack. Make sure you reuse the buffer to avoid unnecessary allocations.
If you want parallel processing with zero-allocations then streams with an array pool for their backing buffer are the best bet.
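A small sketch of the pooled-buffer idea, in Python for illustration. `BufferPool` and `count_bytes` are hypothetical names, and Python still allocates small objects internally, so this only shows the pattern of renting and returning a backing buffer, not a true zero-allocation path.

```python
import io
import queue

class BufferPool:
    """Hypothetical fixed-size pool of reusable bytearrays."""

    def __init__(self, count, size):
        self._free = queue.Queue()
        for _ in range(count):
            self._free.put(bytearray(size))

    def rent(self):
        # Blocks until a buffer is free, which also caps concurrent readers.
        return self._free.get()

    def give_back(self, buf):
        self._free.put(buf)

def count_bytes(stream, pool):
    """Consume a stream through a rented buffer instead of a fresh one per read."""
    buf = pool.rent()
    try:
        view = memoryview(buf)
        total = 0
        while True:
            n = stream.readinto(view)  # fills the rented buffer in place
            if not n:
                break
            total += n
        return total
    finally:
        pool.give_back(buf)
```

Because `queue.Queue` is thread-safe, several worker threads can share one pool, and the pool size puts a hard ceiling on how much buffer memory is in flight.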
Not loading arbitrary files into memory will always be safer than doing so.
As for decoding - I believe the functions for validating if an array of bytes is an image should be far removed from the decoding and presentation of those bytes to the frame buffer. You don’t need to decode a JPG to validate that a file is a JPG. It either conforms to the standard or it doesn’t; the pixel data is irrelevant.
E.g. for a Web browser like Firefox the priority has to be to be as fast or faster than the competition, THEN be secure. That's just the reality of what users care about. If the goal was just security we'd all have been using HotJava for the last 24 years.
The goal for Rust was performance plus safety. That's pretty hard to pull off.
> You wouldn’t want to load 50 large files into ram on a server environment let alone a phone.
mmap() works pretty well here.
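For what it's worth, a minimal sketch of the mmap route in Python. The OS pages the file in on demand, so only the bytes actually touched need to be resident. `starts_with_png_signature` is a made-up helper for illustration.

```python
import mmap
import os
import tempfile

def starts_with_png_signature(path):
    """Check the first 8 bytes of a file through a read-only memory map."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return False  # mmap refuses to map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing touches only the first page, not the whole file.
            return m[:8] == b"\x89PNG\r\n\x1a\n"
```

Every index into the map still has to be bounds-checked by the parser, so this addresses the memory concern rather than the safety one.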
> As for decoding - I believe the functions for validating if an array of bytes is an image should be far removed from the decoding and presentation of those bytes to the frame buffer. You don’t need to decode a JPG to validate that a file is a JPG. It either conforms to the standard or it doesn’t; the pixel data is irrelevant.
Yeah but in a browser for example you never want to just "validate" an image file, you want to decode it, and separating validation from decoding is just asking for trouble. That is the meaning of "parse, don't validate".
Huh, the debug symbols are in separate .pdb files, aren't they?
38MB is a pretty typical size for a real-world, self-contained, assembly-trimmed package, and the size comes from the runtime itself, plus the core assemblies that weren't trimmed out.
Actually, in my tests with the AoT compiler I had to strip the binary to get a significantly smaller one despite there being a .pdb file. Your mileage may vary as the build flags can be a bit fiddly.
We're unleashing the future of computing, collaboration, productivity, and development by using real-time video streaming to change the way that desktop software is built and distributed.
The political speech some wish to see silenced revolves around the racial and socioeconomic inequality they benefit from each day. Turning a blind eye to systemic racism does not mean it does not exist, and inherently saying Black lives matter is not a political statement. The ideological aggression against it however is.
> I rarely see talented developers doing this because they're too busy working.
I love the privilege on display here. You can ignore racial inequality issues because "they are too distracting," meanwhile I have to wake up each morning, read yet another one of my brothers and sisters has been killed by police, and I still have to run my business. Your mediocrity exists in your complacency toward the status quo. On a daily basis we innovate, build, ship, and push tech more than you could ever hope -- and we don't need to ban "political discussions" at work.
If a business chooses to condemn something without backing up that solidarity with meaningful resources and capital, they should be criticized.
> meanwhile I have to wake up each morning, read yet another one of my brothers and sisters has been killed…
Unless they have the same parents, they are not your brothers and sisters more than OP is. Imagining an us-vs-them based on skin color is extremely racist and should be criticized.
Roughly 3/4 of people killed by police in the US are not one of your "brothers and sisters" if by "brothers and sisters" you are referring to humans who share your melanin content and location of ancestral origin (of course all humans are descended from African humans, but to suit the purposes of demagogues, we can ignore that and just stop at the arbitrary point in lineage where we can maximize division to reap power).
As you read this, it is extremely likely that you can't name a single one of the non-black people killed by police last year, can you? Of course not. Or even in the last 5 years, can you? There are 3 times as many of these people as the group you are ethnocentrically focused on.
Let's also talk about the "sisters" aspect. Statistically, women killed by police are minimal. The referenced website, run by activists, removed the filter for women, but you can download the data yourself to see that it's a minimal amount.
Based on the logic of this movement created by people who don't understand basic stats, the discrepancy between men and women must mean that police are systemically sexist against men, right? Because only the outcome matters, and all explanations except bigotry must be ignored.
A big difference these days is that when a non-black person is wrongfully shot by police, the complete lack of national media coverage allows the police to get off scot free.
See Daniel Shaver, who was recorded on camera being executed, unarmed in his underwear, begging for his life, by a psycho cop.
Unlike Chauvin, said psycho cop was exonerated and dismissed with a full pension, as if he had honorably retired. Where were you and BLM? Nowhere, because it's a bigoted, ethnocentric movement designed to divide rather than unite.
You should ask yourself why you used the phrase "brothers and sisters" about people who share your ethnicity, and why you don't use it for all humans, like I do. Ask yourself if you've been sucked into the precursor to pure racism, ethnocentrism. Ask yourself if you've allowed yourself to be misinformed by media designed to monetize your confirmation bias, rather than inform you.
Please don't lie and spread hateful rhetoric like the idea that BLM is "bigoted" (against whom, the police?). BLM activists actually were involved in a lot of the Daniel Shaver activism. It is a travesty that the cop that killed him wasn't held accountable, but trying to blame the BLM movement for that when then, and now, they're actively trying to make changes to that system, in ways that are helpful to people of all races, makes me think you're being driven by an irrational hate and fear of the BLM movement.
> ethnocentrism
This word doesn't mean what you seem to think it means. "Ethnocentrism" is an antonym of cultural relativism, which I guess could be seen as a sort of supremacy ideology, but not in the way you seem to be using it.
> Based on the logic of this movement created by people who don't understand basic stats, the discrepancy between men and women must mean that police are systemically sexist against men, right? Because only the outcome matters, and all explanations except bigotry must be ignored.
Certainly there are ways that society fails men, yes. It frames them as more violent and dangerous, and because of this I would indeed expect (and the stats back this up) that men are killed by police disproportionately compared to women, basically no matter how you measure.
> See Daniel Shaver, who was recorded on camera being executed, unarmed in his underwear, begging for his life, by a psycho cop.
Literally the only substantial voice that stood up and went to bat for Daniel Shaver was BLM. You're right, police violence is far from isolated to Black people. Most BLM activists would agree that stopping police violence has multi-racial benefits!
First off it's been over seven years, it's not hard to understand that black lives matter is a decentralized movement that is not centrally represented.
The website is not the movement.
Second, both of those "opinion pieces" have references to high profile black lives matter activists (deray, shaun king, etc) speaking out about Daniel Shaver, so that's about as official as you're going to get.
Third, that "facebook post" documents an actual event organized by the family of Daniel Shaver, thanking the local blm chapter for their support.
If you have a more substantial voice that has done more than blm in bringing attention to this issue in a productive way, feel free to bring that forward.
This is a very naive view. Black people are responsible for a higher share of murders, for example, than their share of the population. We should look at the frequency of interactions with the police, not at the population shares.
I don’t have data to link to but I’m almost certain that if men and women are 50/50 then crime, or police interactions, are more like 80/20. Add onto that math that males have a higher level of aggression and physical ability to pose a threat and I feel the majority of your delta is explained.
Just knowing the percentage of men owning guns is way higher than women owning guns would change your perceived threat level based on sex.
He seems like a good kid, and very smart. Homeless as a teen, white mom kicked out as a teen due to dating a black man, clearly unstable upbringing which is brutal for any kid.
Taught himself how to code at the library.
He was a recipient of the Thiel Fellowship. You get $100,000 for 2 years of living expenses and you get to work on whatever.
Unfortunately he doesn't seem to blog about that and instead focuses on the time a cop forced his head into a steering wheel because he reached in his pocket.
The human mind gravitates towards the negative, probably due to evolutionary pressures.
The US is such a racist place that a white dude who is a well-known Republican gave him $100,000 to do whatever for 2 years but he ignores that and focuses on the negative experiences in his life with the minority of whites who are bigots, ignoring the kindness he clearly must have come across.
I can relate because I had a similar background to him in the sense where I had a very poor and also drug addicted mother who was in and out of homeless shelters. Eventually when my father was able to locate me I got to have a somewhat normal life compared to that, although we were technically below poverty line income wise.
I don't think he ever got that lucky and I can't imagine what he's been through. He seems to have done great things already. I wish him well, and hope he learns to recognize how fortunate he has been in other ways. If being given 100 grand to do whatever for 2 years isn't privilege, then I don't know what is.
Edit:
Importantly, I forgot to add that he CLEARLY capitalized on that fellowship. He's created a pretty badass startup that leverages video/data streaming to allow playing of console/pc games on iOS/Android/etc. Personally of note for me, his startup created a grpc-like serialization format (bebop) that I looked at months ago for a project I was working on. The lesson/reminder for me is that behind every post on the internet is a human being who I probably have a ton in common with. Plus, anyone who buys an old Camaro and fixes it up themselves is automatically relegated to a higher status in my embarrassingly country boy worldview, lol.
Just because you're passive aggressive doesn't mean I am. I meant what I said and there was no insinuation at all. I don't compliment people's software lightly. I would advise you to take your negative filter off.
>Just because you're passive aggressive doesn't mean I am.
No, it's what you wrote that means you're passive aggressive.
>Unfortunately he doesn't seem to blog about that and instead focuses on the time a cop forced his head into a steering wheel because he reached in his pocket.
>The human mind gravitates towards the negative, probably due to evolutionary pressures.
>The US is such a racist place that a white dude who is a well-known Republican gave him $100,000 to do whatever for 2 years but he ignores that and focuses on the negative experiences in his life
>I wish him well, and hope he learns to recognize how fortunate he has been in other ways. If being given 100 grand to do whatever for 2 years isn't privilege, then I don't know what is.
This is the most condescending thing I've read in a while, and clearly a backhanded compliment.
I read Mr. Sampson's blog, and I literally got tears in my eyes. I regretted engaging with him in the way I did, but HN doesn't let me delete posts. It was also too late to edit them.
He and I had something in common, and it's just extremely rare to encounter fellow coders who spent some part of their childhood in "the system" (shelters, foster care, etc) like I did. I've also seen hardcore racists who disown their children for dating a person of color. I even tried to dissuade a man (he was a boss on a construction site I worked at) one time from doing this to his daughter. The hatred in his heart was so deep that it was like talking to a wall.
I saw a commenter continuing to engage, and I wanted to express my emotions I felt after reading his blog. I understand the anger too. I remembered, especially in my mid to late 20s, having the suppressed memories/emotions come back and fuck up my head, and often consume me. It was an attempt to relate and process, and yes, a reminder that these negative memories cause us to forget about being grateful for the good things and good people we encounter on a daily basis. The dehumanization of people who disagree with us, is bad, and I'm not pretending I'm not part of the problem, including some asshole things I said in this thread, which I regret.
Based on your outright hostility and strong political beliefs (you clearly have politics incorporated into your identity), I doubt you'll believe my explanation, but I'm writing this in the off-chance that Mr. Sampson sees this. He's embarked on a difficult path with his startup, and I sincerely wish him well. That's it.
This is better written and less condescending than your original write-up, except for your last childish insult here which you just couldn't resist making. Whatever, I'll take it. Cheers.
- The benchmark code is present in the laboratory directory of the repository.
- We don’t compare to Cap’n Proto because it does not have a stable web-based implementation, at least not one that has the features that make it so fast natively, so there is nothing to compare.
- Flatbuffers are fast but have a notoriously awful API to work with while also creating their own non-standard data structures in languages like C++. Bebop generates standard type-safe code.
- Bebop doesn’t try to compress data other than strings. This is because we don’t want to be responsible for compressing trailing zeroes when faster compression algorithms exist that can be done after encoding. Also most data is tiny.
- Bebop supports discriminated unions and has a much more robust type system than Flatbuffers.
- We’re not convincing anyone to use our stuff. It was made for us and open sourced because it was useful; we don’t need people ripping out their current serializers if there’s no pressure to do so.
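To illustrate the compress-after-encoding point from the list above, a quick Python sketch. It uses a fixed-width encoding and then hands the bytes to a general-purpose compressor. This is not Bebop's actual wire format; `encode_fixed` and `decode_fixed` are made-up helpers, and zlib is just a readily available stand-in for whichever compressor you'd really pick.

```python
import struct
import zlib

def encode_fixed(values):
    """Encode ints as fixed-width little-endian uint32, with no varint packing."""
    return struct.pack(f"<{len(values)}I", *values)

def decode_fixed(blob):
    """Inverse of encode_fixed."""
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))

# Small numbers leave three zero bytes in every 4-byte slot.
values = [1, 2, 3, 4] * 256
raw = encode_fixed(values)

# The encoder stays simple; the separate compression pass removes the padding.
packed = zlib.compress(raw)
print(len(raw), len(packed))
```

The encoder and decoder stay trivial and fast, and compression becomes an opt-in step applied only where the payload size justifies it.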
Even with that, I think FlatBuffers is still the main competition for something like this. Not including it might make this look more impressive, but including it and mentioning that Bebop has some advantages over FlatBuffers (although with less language support) would be more fair.
They are impossible to benchmark against each other without making an assumption about how often you want to access the data, and which parts of it you want to access.
But this means Bebop and FlatBuffers can exist side-by-side and solve different problems: what FlatBuffers does makes sense if you want to access only parts of your objects in limited specific ways; what we do is better if you're always interested in the whole packet.
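A toy Python sketch of the two access patterns, to show why the benchmark depends on what you read. The record layout, `decode_whole`, and `read_flags_only` are all hypothetical; neither function reflects the real wire format or API of Bebop or FlatBuffers.

```python
import struct

# Hypothetical fixed layout: uint32 id, float64 score, uint32 flags (16 bytes).
RECORD = struct.Struct("<IdI")

def decode_whole(buf):
    """Eagerly decode every field into a plain dict (whole-packet style)."""
    rec_id, score, flags = RECORD.unpack_from(buf, 0)
    return {"id": rec_id, "score": score, "flags": flags}

def read_flags_only(buf):
    """Read a single field in place at its known offset (partial-access style)."""
    (flags,) = struct.unpack_from("<I", buf, 12)
    return flags

buf = RECORD.pack(7, 0.5, 3)
print(decode_whole(buf))
print(read_flags_only(buf))
```

If you only ever call something like `read_flags_only`, in-place access wins. If you always end up touching every field, paying once for `decode_whole` and working with ordinary objects is the simpler trade.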