Accused murderer wins right to check source code of DNA testing kit (theregister.com)
981 points by anfilt 69 days ago | 496 comments



> The co-founder of the company, Mark Perlin, is said to have argued against source code analysis by claiming that the program, consisting of 170,000 lines of MATLAB code, is so dense it would take eight and a half years to review at a rate of ten lines an hour.

First, the defence doesn't necessarily have to evaluate all 170,000 lines. They just need to find one buggy line which could potentially overturn the result.

Second, even if it did take a full 8 years, is that a good reason to deny the defendant due process?


Lol. If it would take 8.5 yrs to review, it's probably god awful, and should never ever ever be used to convict someone of such a crime.

My prediction: this firm will probably try to get removed from the case, rather than open source their shitty code.

Source: I've worked on MATLAB codebases for various genomics research projects in the past.


170,000 lines of Matlab code for a project is not a good sign. Unless they’re also including the source of various Matlab toolboxes which are already tested by the Mathworks.

It’s such a high-level language it’s hard to imagine what the hell they’re doing with all that code. It’s probably mostly useless cruft from GUIDE.


My guess is a bit of each: the company is high-balling the LoC estimate to try to impress/scare the judge, but prooobably also has a truly terrible codebase.


How that number was probably arrived at...

PHB: Hey, how many lines of code do we have?
CodeMonkey: You want a high estimate or a low estimate?
PHB: High.
CodeMonkey: Well, including unit tests, comments, whitespace, build scripts, integration test harness... 170k.

At least I hope they have enough testing code to be significant...


Having seen Matlab code that's been exported to C, it's mostly bloated by static arrays, and there's quite a lot of redundant functions. So maybe the LOC count is from a C export?


I’ve made 4 decent-sized GUI-based tools before. I would never use GUIDE. At minimum it uses eval.

If you want a terrible GUI-based GUI creation interface: LabVIEW awaits.


I compiled eval once for fun.


A friend of mine did that. They were able to run arbitrary .m files from an executable. As far as I know, this doesn't violate any terms of use. I wonder what keeps someone from just downloading the freely available MATLAB runtime, running a compiled eval wrapper, and suddenly having a freely available version of MATLAB.

Also just because eval can be compiled doesn't mean it should be. It will forever be a security risk and I will not write code that uses it in good conscience. Fortunately, Mathworks has provided good alternatives. My personal favorite has been variable field names. It really opens up a lot of elegant coding.

https://www.mathworks.com/help/matlab/matlab_prog/string-eva...
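For anyone who hasn't used them: dynamic/variable field names let you address struct fields whose names are only known at runtime, which covers a lot of what people otherwise reach for eval for. A minimal sketch (the names here are made up, not from any real codebase):

    % Instead of something like eval(['params.' name ' = 0.05;'])
    params = struct();
    name = 'threshold';        % field name decided at runtime
    params.(name) = 0.05;      % dynamic field assignment
    value = params.(name);     % dynamic field access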


>My prediction: this firm will probably try to get removed from the case, rather than open source their shitty code.

That isn't necessarily their choice. The prosecutors will make the decision about whether to withdraw the DNA evidence. They probably won't, given that they would need to give the defendant a new trial, which could lead to an accused murderer getting off. A bad look for any prosecutor.

More to the point, if the firm withdraws from any case where their credibility is questioned, what does that say to law enforcement agencies who are thinking about using their software?


My understanding is that (some) law enforcement agencies have been more than happy to drop cases rather than subject investigative tools to proper scrutiny[0]. They have no qualms resorting to "parallel construction"[1], and simply using the inadmissible (sometimes illegal) evidence to find admissible evidence.

[0] https://arstechnica.com/tech-policy/2015/04/fbi-would-rather...

[1] https://en.wikipedia.org/wiki/Parallel_construction


Stingrays are more useful as an investigative tool than an evidentiary tool. DNA is the other way around.


That would imply that the prosecutor would rather take the life of an innocent person than have it hurt his career, making the prosecutor kind of a criminal.


>making the prosecutor kind of a criminal.

Never met a lawyer before huh?

Jokes aside, prosecutors pushing through cases they know to be unsound isn't exactly uncommon. Many prosecutors are more concerned with their conviction rates than they are in justice, because that's what they are measured and rewarded by.


I often hear this. Who is rewarding them for a high conviction rate?


Voters, because when it comes to issues of criminal justice, crowds are rarely paragons of sober temperance and restraint.


I think you are wrong and that most prosecutors want to do the right thing, like most working people


"Right" and "wrong" are dependent upon the system and how it rewards you.I would agree that most prosecutors what to serve justice for malfeasance that has been committed. That's different than whether a case is the "right" or "wrong" one to take.

If a case seems unclear, and you could spend years working on a conviction that will ultimately fall through, that hurts your ability to do justice for more readily winnable cases. You have to spend the time building a case, do all the paperwork, go to trial, etc. That's opportunity cost. So spending that on a case you have a 10% chance of winning just isn't a good use of time. Add to that the fact that conviction rate is a metric used to quantify skill: you're rewarded for serving justice successfully. And that then dictates how much money you can get, which can help fund enforcing justice.

I believe you're looking at the moral right/wrong, and I don't believe that is the same right/wrong being discussed in terms of how lawyers often choose cases. At the end of the day, lawyers need work and they get that mostly through word of mouth and reputation. You don't really get either of those when you lose cases.


Your version of the right thing and the prosecutor's version might not align.

The right thing for them is to put as many criminals behind bars as possible. They review cases and pick ones they can win. They will attack and find unrelated weak points in your character to win. They believe they are doing the right thing and will use whatever they can legally against you. You being innocent and going to court means someone made a mistake. To confess to a mistake loses you credibility; to confess to an ongoing process mistake could open up other cases where dangerous people could be set free.

Is that your version of the right thing?


It is trivial, if wrong, to convince one's self that the accused are probably guilty and that actions that convict them are moral even when the proof is insufficient or weak or the procedure flawed.

Most people want to do the right thing, wherein the right thing is almost entirely defined by the norms and customs of their environment. If the norms and expectations are high ethical and correct standards, people will follow them to the degree they are able.

To what degree are such standards broken or defective in America though?

Lest we forget, the head lawyer of Texas, a state home to approximately 27 million people or around 8% of the nation, is a man whose own prosecution has for years only been stymied by the difficulty of prosecuting the man at the head of the state's justice department. Either 8 or 9 (I've lost track) of the people directly beneath him have resigned and accused him of corruption.

This isn't even an isolated instance; corruption is in fact found all over the United States.

Even when in theory we would like to do the right thing, we have a hard time establishing what standards are even real. For proof of that, look no further than the "science" of hair analysis, which the FBI spent decades using to convict the accused before we realized that they were incapable of differentiating dog hair from human hair.

Think of people going in to work, producing work product about imaginary science they were pretending to do competently, and sending people to death row in part because of their fake work product.

https://www.washingtonpost.com/local/crime/fbi-overstated-fo...

The justice system in America is a bad joke that is primarily differentiated from say Cuba in that bribes are paid to your lawyer instead of directly to government officials.


Prosecutors are shaped by an environment that equates “the right thing” to “punishing the guilty.” It’s like any profession... a surgeon will think you need surgery and a prosecutor will think the guy in handcuffs needs to go to jail.


Sadly, evidence contradicts that thought. It shouldn’t but it does.


Everyone wants to do the right thing.

It is just that some think the right thing for themselves is to maximize their career progress.

And I would not know in general about state prosecutors, but what I know anecdotally second-hand does not sound good.


I believe that's true as well, and I never said otherwise.


The prosecutor doesn't see it that way. They see it as just "knowing" the guy is "definitely guilty". It's just like, a feeling you know? And a win will look great when they go for re-election (why is that even a thing?).

Presuming rational actors in this case is missing the general problem with the system: people very easily convince themselves they know the truth despite how the validity of the evidence changes. Whatever it said initially, that must be right - it's misinformation 101. Once a belief is established it is much harder to change.


> And a win will look great when they go for re-election (why is that even a thing?).

You would prefer that they not be elected? That they would be appointed by some politician, with the public having no recourse?

The fact is that the public like prosecutors who convict people. That's deeply unfair. But it's also deeply democratic.


You already elect politicians. If that system is producing people you don't trust to manage the affairs of state, why would electing prosecutors lead to different results?


Hard to say, really. It's one of those compromises, quis custodiet and all that.

I very much agree with you: a government has a monopoly on violence and ultimately we all end up trusting it. Too many checks and balances lead to gridlock. Too few lead to oppression. Much of it ends up being decided on inertia. We do it both ways in different jurisdictions, with successes and failures in both.


That's not how prosecutors work in the US. Their goal is to win the case, not make the "right" decision. They'll spin evidence as hard as they can against the accused.


>taking the life of an innocent

The prosecutor isn't unilaterally deciding whether the DNA evidence is valid. There will be a public hearing where both the prosecution and defense show evidence about the validity of the DNA evidence, and a court will rule based on that evidence.


You should read up on the rates of plea bargaining, as well as the methods prosecutors use to push defendants to do so, which include:

- Not revealing all information they are required to.

- Parallel construction (see above)

- Overcharging, with the goal of making the plea more palatable than the cost/risk of defending multiple absurd charges.

- Lying to you while getting to throw you in jail if you lie to them.

As a result, only 5% of federal cases go to trial.

None of these behaviors are rare. If your understanding of the legal system is based on popular culture, as most people's is, it is basically law enforcement propaganda that has little relationship to reality.


Believe it or not, I was already aware of all of those things, having followed a number of criminal defense blogs.

If you read the article and appellate decision which is linked, it says what I just said:

>On Wednesday, the appellate court sided with the defense [PDF] and sent the case back to a lower court directing the judge to compel Cybergenetics to make the TrueAllele code available to the defense team.


Apologies, I thought you were stating general facts about DNA evidence in general, not about this specific case.


Yeah, the system is in a pretty horrific state when you have to count on prosecutors' restraint for anything. Granted, we are in such a state, but it's beneficial not to just accept that as the status quo.


It would also give every person convicted using their software an incentive to open an appeal.


I like how this is considered a bad thing. Like we can’t let this guy point out that he’s being convicted by an unauditable black box that suddenly isn’t worth using if it has to stand up to scrutiny because then everyone would want to. The horror.

Like I’m actually kinda shocked this is the reality. I would have assumed that DNA evidence would have some blessed methodologies and tools/algorithms, with a strict definition of what constitutes a match or partial match specifically so this wouldn’t happen.


Here in Sweden, there is a legal practice that you can't find someone guilty based on DNA evidence alone. Probabilistic evidence is nice to point law enforcement in a direction, but there is always a risk of false positives.

In this case we are also dealing with probabilistic genotyping involving DNA mixtures with DNA from several individual contributors, and most likely degraded DNA. It is the tool the police can use when other, more traditional methods are not possible because of the mixture. That should mean the qualitative value of the DNA evidence is lower, requiring even stronger additional evidence from other sources.


In the U.S.A., a man can be convicted upon the word of a single witness, even if the defence poked significant holes into the reliability of said witness.

What can happen in the U.S.A. is that one lone man says “I saw the defendant do it.”; the defence attorney can point out that the witness was drunk at the time, that he has motive to lie, that he initially reported another story to the police and only later settled on this story, and what ever else to render him completely unreliable.

The jury can nevertheless return a verdict of guilty, and there are no grounds for appeal then, as it is the power of the jury to decide who is “reliable”, and it is not required to explain its thought process at all.

What a shocking development that such would result into a criminal justice system where a defendant's race and gender plays such a factor.


> What a shocking development that such would result into a criminal justice system where a defendant's race and gender plays such a factor.

It takes only one person in the jury to hang the jury. It's not a majority vote it's a unanimous vote.


Bench trials are also required to be unanimous.

Methinks the U.S.A.-man often thinks that bench trials in other countries are done by a single juror; they are not and can range from three to twelve in how many professional jurors are required to reach a unanimous conclusion.

But this is not so much about lay fact finding vis-à-vis trained fact-finding, but the rules of evidence.

Scotland also has jury trials, but does not permit that a man be convicted upon the word of a single witness; there must be further independent, corroborating evidence.

There are many other differences with, for instance, the Dutch system that guarantee a fairer trial. One very big one is that in the Netherlands both the defence and prosecution have one groundless appeal; either side, if it does not agree with the verdict, can demand a fresh new trial with different jurors once — this obviously reduces flukes of justice.

The other is far stronger rules of evidence and more consistent rulings. Juries are very fickle and legal experts rarely know what verdict they will return based on the evidence they saw before them; whereas with trained jurors, their verdict is often similar with the same evidence given to them.

Indeed, one might argue that the practice of plea bargains, which would be considered inconceivably unethical in most jurisdictions, is actually the saving grace, as it permits stability in this otherwise fickle system: the negotiations between both parties are more reproducible, given the same evidence, than fickle juries.


Interesting. What does Swedish law consider non-probabilistic evidence? Even something like eye-witness testimony I would consider to be probabilistic, given how easy it is to manipulate memories, even unintentionally.


Clear videographic evidence, use of a PIN that only the accused had access to, etc.


How do you prove that only the accused had access to a PIN? Surely that's probabilistic as well?


Isn't all evidence probabilistic? What's an example of something that isn't?


This is one of these scary areas where reality matches my teenaged experiences playing Shadowrun. I used to hope that the brutal dystopia we played through was just fun. Now I’m seeing that the present needs a word even more brutal than dystopia. :(


Kafkatopia


You nailed it! Thanks friend, that’s some wicked writing and thinking.


I do not find this reality worse at all than people being convicted upon the black box testimony of blood splatter analysts, which is simply an expert testifying that in his conclusion the blood indicated such-and-that.

Or of course, that the U.S.A. permits conviction based on the sworn testimony of a single eye witness, which is notably unreliable.

All of these are black boxes that are routinely meant to convict. — it would not surprise me if such software were far more reliable than human eye witness accounts, but if there's one thing I noticed, it's that a man is seldom afraid of bad matters, he is only afraid of bad matters produced by new technology; far worse matters can stay, so long as they be ancient enough.


> If it would take 8.5 yrs to review, it's probably god awful, and should never ever ever be used to convict someone of such a crime.

It's not like you review all scientific evidence and re-do the experiments that lead up to the discovery of <insert some evidence method> in the first place. Validating all that would also take years and much of it can be established as generally accepted by all parties. Similarly, there will be some trust involved with this source code as well. Getting the opportunity to look for bugs is essential in my opinion, but it needn't take multiple years. Focus on the parts you doubt, similar to what you'd do if you were reviewing the scientific method used in analog evidence.

Of course, the two aren't identical. Validating scientific methods and validating a program is different in that the program is proprietary and the science (usually) merely behind a paywall. The latter can then be replicated by others and becomes established. The former will only ever be seen by that company and doesn't become established. So scrutiny is necessary, but after a couple cases that used an identical version, requiring access without articulating particular doubts would unduly delay the case. It doesn't seem unreasonable to start trusting the program after a bunch of defendants had experts look at it and found no way to cast doubt on its result. If you don't think software of 180k lines can be used in court under such circumstances because it would take too long to review, we should throw out pretty much all software anywhere in the judicial system. (That's not what you said, but some of the replies including yours hint at that.)


> It's not like you review all scientific evidence and re-do the experiments that lead up to the discovery of <insert some evidence method> in the first place.

Actually, it is. That's how science works and that's how convictions often get overturned.

> Validating all that would also take years

Are you suggesting that unvalidated data is being used to prosecute crimes?

> and much of it can be established as generally accepted by all parties.

The point here is that it isn't established as generally accepted by all parties.

> Similarly, there will be some trust involved with this source code as well.

"Trust but verify"

> If you don't think software of 180k lines can be used in court under such circumstances because it would take too long to review, we should throw out pretty much all software anywhere in the judicial system.

I firmly believe that if the source code isn't available to review by all parties, including the public, then it shouldn't be used in a criminal court.


> Are you suggesting that unvalidated data is being used to prosecute crimes?

Yes. Pseudoscience is the bread and butter of criminal forensics.


I think they meant "should be used to prosecute crimes."


> It's not like you review all scientific evidence and re-do the experiments that lead up to the discovery of <insert some evidence method> in the first place. Validating all that would also take years and much of it can be established as generally accepted by all parties. Similarly, there will be some trust involved with this source code as well

There are a few important differences between a generally accepted method, and some Matlab black-box that you feed an input into, and it prints out 'guilty' or 'not guilty'.

1. The former is based on centuries of peer review, where the best ideas eventually get selected for. The latter is an externally un-reviewed application, which encapsulates the best of whatever we could ship by Thursday.

2. You can call an expert witness to the stand, and ask them questions about the state of the art of <some evidence based method>. You can ask them why. You can ask them about how certain one should be about their statements. You can't cross-examine a black box.

The actual solution to your quandary is to require that forensic analysis services must pass an annual, independent, double-blind analysis of the accuracy of their methods, before they are used in a courtroom - and that the results of those audits are made available to the defense.

It's one thing for a man in a lab coat to take the microphone and say that their methods are accurate 'to within one in a million'. It's quite another to see an audit, where 100 samples were sent in for analysis over six weeks, and only 92 of them were analysed correctly.

A jury might still convict on the basis of that 92% accuracy, but only if other meaningful evidence points against the defendant.

Unfortunately, the reality of forensic science in 2021 is that most of it is sloppy bunk, with no assurances of accuracy.


>The actual solution to your quandary is to require that forensic analysis services must pass an annual, independent, double-blind analysis of the accuracy of their methods, before they are used in a courtroom - and that the results of those audits are made available to the defense.

Agreed! But if that's the standard, it still doesn't involve letting the defendant see the source code.


In the USA, "accurate to 1 in a million" means you can convict 300 innocent people for every guilty one.
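Back-of-the-envelope, and purely illustrative (the population and error-rate figures below are assumptions, and the arithmetic only applies when a profile is effectively compared against a huge pool):

    population = 330e6;   % roughly the US population
    error_rate = 1e-6;    % the claimed "1 in a million" false match rate
    expected_false_matches = population * error_rate   % ~330 innocent matches
    % So a single true match can be accompanied by hundreds of false ones.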

Bad stats, especially around DNA, have convicted many innocent people.

BTW, Law and Order did an episode on bad DNA science convicting someone.


As an innocent person, I'd much rather take a 1 in a million chance at a wrongful conviction, than risk a jury trial.


The point he is making is that "1 in a million" was an outright lie used by prosecutors to secure convictions of the innocent, while the real criminals are still out and about.


> Validating scientific methods and validating a program is different in that the program is proprietary and the science (usually) merely behind a paywall.

Or completely fictitious.

Have you heard the story about the FBI crime lab and the “science” of fiber analysis that they developed, and not only used in federal criminal trials but also provided as a service for state and local agencies for decades?


Or the Shirley McKie debacle in Scotland in 2005, where it turned out that finger print detection was more of an 'art' than a science. The ball got rolling once they started convicting police officers (so at least the analysis was double-blind?)

Or the phantom of Heilbronn, where dozens of crimes were linked to a single woman. Who turned out to be the lab technician that assembled the kits. Doubts started once they discovered the caucasian female DNA in cells of the charred remains of a black male.

I often wonder how prosecutors defend against the use of these cases to create doubt.

https://en.wikipedia.org/wiki/Shirley_McKie

https://en.wikipedia.org/wiki/Phantom_of_Heilbronn


It indeed doesn't seem fair to them that they are forced to share their trade secrets simply because the prosecutors elected to use their software.


it doesn't seem fair to sentence someone (to death?) based on an unaudited secret sauce tool.

the company can always retract their product if they want to keep it a trade secret.


You assume it to be one or the other.

The real solution would obviously be that the prosecutors cannot enter into evidence the conclusions of any closed-source software.

Of course, this only displaces the issue, what of the black box c.p.u.'s whereupon this software ran?


If their argument is that their codebase is basically unreadable, then I see why they're scared someone might find some bugs here and there.


I'd get an expert witness in there to testify basically that.

I also don't think you should code anything mission critical like this in Matlab. It's a decent language for prototyping, not for production.


The numerics in Matlab are far better than pretty much any developer can produce in production. This is why Matlab is used in production - it's vastly more reliable than people rebuilding the things it is good at by hand for bespoke solutions.


"I could whip this up in JavaScript in about NaN or [object Object] hours over a weekend!"


True. But I believe this is a case where correctness and clarity are the paramount concerns.

There should be a public reference implementation of these methods if they are going to be used in court.


Seems like mathematical clarity is what Matlab is good at, at least compared to e.g. python and C.


Most industry Matlab I've seen is similar to numpy code, heavily vectorized to make it work fast, somewhat inscrutable for everything that's not linear algebra, and a lot of assumptions about perfect floating point precision. Couple that with a unit-testing unfriendly culture and you have a code disaster. Especially on 170k lines.


Most industry C/C++/Java/Python/XXX code I have seen in production is a numerical disaster. I've been working in all these codebases for decades.

There's nothing you just wrote that is any better in any other language, except that Matlab provides a huge suite of state of the art numeric routines that almost no everyday developer could come close to making as solid.

Writing a nicely illustrated manual on brain surgery with nice fonts and proper grammar based on 11th century medicine is of little use for doing actual brain surgery.

Writing clean code based on bad numerics is also of little use for producing good results. Especially if you then have to defend that codebase in court.

Bad developers will make bad decisions in any language. At least using solid numerics underlying the code provides a huge benefit: the entire codebase is built on them instead of on crap numerics. Every nice clean codebase I have been part of has still had crap numerics. Good numerics is nearly completely orthogonal to clean code, and it's a highly technical skill set that almost no developer has even an inkling of how to do well, no matter how pretty their formatting and documentation.

I have never in 30+ years of working on highly technical teams worked with someone who really gets the nuances and details of how to do solid numerical code. I routinely get codebases and developers that do the absolute worst things numerically. I have only met really good people in conferences on such topics, or online from similar filtering. These people are extremely rare in software development, to the point I don't think I've ever met one on an actual project (and the numerics when needed have always fallen to me, and I've often been selected for technical projects because such people are terribly hard to find when needed).
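To make the point concrete, here is a classic example of the kind of numerical trap being described; it is illustrative only and has nothing to do with the codebase in the article:

    % Naive one-pass variance vs. a numerically sound two-pass version.
    x = 1e8 + randn(1e6, 1);           % large offset, small spread
    naive  = mean(x.^2) - mean(x)^2;   % catastrophic cancellation
    stable = mean((x - mean(x)).^2);   % subtract the mean first
    % 'naive' can come out wildly wrong (even zero or negative);
    % 'stable' stays close to the true value of ~1.

Clean formatting and documentation won't save the first version; only knowing the numerics will.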


> I'd get an expert witness in there to testify basically that.

Sounds really expensive.


Not necessarily. I'd happily do it for a reasonable hourly rate + transportation. It would end up costing not more than a few thousand dollars, which is very reasonable given our legal system.

Hell, if it seemed outrageous enough I'd probably do it for transportation costs alone.

I'm sure I'm not the only one with this outlook.


It might not be as much as you think. I know professionals who’ve been paid a few hundred dollars to be an expert witness more than once, but usually in medicine. It’s easy money for someone if a lawyer often does certain cases that require an expert witness.


Laughs in autocoded control systems


>Second, even if it did take a full 8 years, is that a good reason to deny the defendant due process?

"It's just gonna take so long, plus the code is a bit messy. We're gonna be doing all that work just because the rest of someone's life teeters on the results of the inquiry? Maan, that's a bummer."


> Second, even if it did take a full 8 years, is that a good reason to deny the defendant due process?

No, but the person that wants to have it analyzed will have to either spend the time themselves, or pay the expert witness for their time; it could be a costly affair.

But I think it's warranted. An independent software review, and a double-blind assessment with the exact version of the software used in the conviction, to test the accuracy and reliability of the application.


Any software used to convict people, especially of such a serious crime, should be audited like the Fed is. Twice yearly, once by a public firm and another by the government itself. It should have to pass both of these audits to be used.


It should be audited by science.


Yes, there should be teams comprised of people whose expertise is in biology (specifically whatever field is responsible for DNA matching) and other people who are programmers that review the code and make sure there are no mistakes. That would do it and I’m surprised this isn’t already a thing.


It should totally be available to be "red teamed", like the researchers who exploit fingerprint readers with gummy bear moulds made with laser prints of fingerprints, or who hold up photographs to face recognition systems.

The government should send a half a dozen to DefCon/CCC and let attendees loose trying to fool them.


Hardware and business process would also need review. It's no good having perfect code if you can insert the wrong sample. It's also no good having perfect code if a well-timed EMI burst or power level shift games the result.


Definitely. There was even a documentary on Netflix about how crime lab technicians were pushed for results rather than accuracy. A couple of major scandals involved intentional faking of tests, resulting in countless people going to prison, with all the judicial consequences that caused as well as the friction involved in seeking justice. Like, oh no, all these people convicted for life on flimsy grounds is such a burden of our time.


Eh, that’s overkill. If it can be proven and guaranteed that it isn’t bugging out, we don’t need to run through every building block this way.


I would have said that too before starting a hardware company. Now, I'm not so sure.


> but the person that wants to have it analyzed will have to either spend the time themselves, or pay the expert witness for their time; it could be a costly affair.

Sure.

And the prosecution using the company claiming to have "totally reliable DNA evidence" should be totally on the hook for those costs (plus damages) when that analysis or expert witnesses show up "reasonable doubt" flaws in the software or the processes in which that software is used, including then risking retrials or mistrials of all other cases in which it was used.

If the prosecutors want to play high stakes games with defendants lives and liberty using "evidence" from proprietary software or devices, they need to be held to the consequences of losing their stakes.

[Edit: I wonder what the legal system would think of a CyberGenetics competitor funding the expert witness analysis of their software on behalf of the defence???]


Wait until you hear how long it would take at a rate of 2 lines per hour, as long as we're just throwing out random numbers.


A lot of bashing over a vague third-hand quote without a source.

In the very next paragraph they say:

> The company offered the defense access under tightly controlled conditions outlined in a non-disclosure agreement, which included accepting a $1m liability fine in the event code details leaked. But the defense team objected to the conditions, which they argued would hinder their evaluation and would deter any expert witness from participating.

So it's a concern about IP protection for them.


Maybe, or maybe that's just the smokescreen they put up to prevent anyone looking at it.

I notice that they didn't say "here are the results of our last independent audit and verification of correctness", which I think would be a fantastic counter-argument... if they had one.


There is no legitimate need for IP protection against destroying innocent lives.


Indeed. In fact, if there were a globally agreeable case for open science and open source, to which all governments could contribute, DNA analysis for criminal attribution would have to be it, no? A side benefit would be that sending evidence to numerous international labs would greatly frustrate attempts at domestic law enforcement / lab corruption.


“If the criminals know how our system works, they can work around it!” is probably what they’re thinking. i.e. security through obscurity.


> So it's a concern about IP protection for them.

For that kind of product, source code is not actually that valuable in itself; it's the standards compliance, reliability and trustworthiness. Most charitable explanation is that the vendor is clueless about what their value really is, least charitable is that they know exactly how fucked up their code is.


Arguably the public analysis of the correctness of their methods ought to be done once in a fashion that is usable by all former and subsequent defendants. This is supported by their own claims that such an analysis will be extremely onerous.

If it's a multi-million-dollar affair like they claim, it's virtually impossible that every defendant will be able to fund such an affair.

In fact, in the case that a disastrous flaw is found, it may be advantageous to simply drop the case and hope that past and future defendants won't each be able to afford to press the point.


It shouldn't take 10 lines an hour should it? I don't have experience reviewing professional code of this size, so please correct me if my assumption is wrong, but that number doesn't seem right.


As part of a quality control team, I personally went through over 1.2 million lines of working code (i.e. not including comments) over the span of about 8 months, M-F, 9am-5pm.

So, yeah - this number is bunk.


It really isn't. Most of the code is probably going to be uninteresting and you can do 10 lines a minute or more. Some of the code will be more relevant and might take a day for 10 lines. This would just be checking for accuracy though so you could probably just ignore a huge chunk of it.


I'd say more like ten lines a second, if you're skimming it to look for something. And of course, the work can be parallelized.


Is this satire?


I'm not sure what you mean. What numbers would you call realistic?


"With enough eyeballs, all bugs are shallow^h^h^h^h^h^h buried in the critical open source dependancy underpinning the entire internet maintained by that one guy who's holding down a day job and doing it in his spare time." -- with apologies to Linus

(As always, XKCD beat me to this gag: https://xkcd.com/2347/ )


10 lines per hour? doesn't that seem painfully slow?


It depends on the level of scrutiny. It doesn't seem unreasonable. We review a lot more code per hour (usually C-like code though) but then we're not supposed to lock someone up for murder, we just find basic things like memory corruption. Don't even need to get into the business logic to find bugs that totally break the application, let alone all of it.

When writing Python (I don't have stats about reading), a 1.0 version of a small project took me 1.5 hours and consisted of 183 lines of code, so 2.2 lines per minute. That's much faster than this, but 183 lines is also a ton less complex than understanding the entirety of 180k lines and properly assessing whether it does exactly and only what it's supposed to.

10 lines per hour is probably taken as a lower bound to prove a point, especially because they argue about checking the whole thing (large parts can probably be skipped), but as a standalone statistic I would say it's probably within an order of magnitude from the true value. And for software time estimates that would be an amazing feat :p


If it means what I think it means - understanding the code - sometimes it takes days to understand just one line of code. Document digging, googling, asking around, fiddling with test cases, reading production log etc.


have you ever seen MATLAB code?


That’s actually a pretty good argument for banning such code from the criminal justice system. The idea that unreadable code is deciding who gets locked up is really worrying.


This thread makes me sad because when I was taught and used matlab we had strong pressures to properly comment and document our code to make it legible (if only to our own future selves). It feels almost criminal to not do that in these circumstances.


and depending on which MATLAB version they run it in, it might have completely different results if they haven't tested carefully...


This is not my experience. Could you share an example?


well, this script, for example, gave me all sorts of fits trying to run on the then-new version of MATLAB I tried it on: http://cpc.cs.qub.ac.uk/summaries/AEJS_v1_0.html


Ah okay. I thought you meant numerical changes, which would be reason to not trust a language.

Mathworks has broken some legacy support in the past, but they have slowed down on that practice. They used to threaten that dll loading would go away “in a future version of MATLAB” but have since backpedaled on that. My biggest issue is writing code that leverages cool new features (especially timetables) but some people I work with never update their IDE.


Yes. You also don't read programs like a book; you generally follow the methods being used. Reading line by line would be like reading a book with all its pages rearranged.


Both 0.1 lines an hour and 1000 lines per hour would be equally wrong. That isn't how people would review that sort of code. They would test it and then thoroughly examine any areas of concern that crop up.

I've run into 300-line programs that have taken me a month to figure out because the math was hard and I've run into 100,000 line programs that have taken me a few hours to tear apart.


10 lines an hour seems quite low.


8 years is a long time. What if he then wanted to code review Matlab, and then the compiler that Matlab used, and then do some silicon verification...

Six to nine months seems like enough to do a very good code review with some testing. There's a good chance that 75% of that Matlab code doesn't execute for his test.


This is this persons life we’re talking about. No amount of time is too much and it isn’t hurting anyone to let him do a code review.


No. You give a reasonable amount of time. What if he asked for 100 years?


The person's guilt must be proven beyond all reasonable doubt. If you are accusing him there is a range of possible proof. If all you have is based on an unreadable codebase, that takes 100 years to read, that's your (the prosecution's) fuckup. As a jury I would not convict.

I don't want prosecutors sleeping on the job, bringing in fraudsters laymen and psychics to accuse people, etc.

The prosecutor should use a company that can present independent proof that their system actually works.


So what if I say I need to review camera footage of me doing a crime and it will take me 100 years to validate it?

You need to establish reasonable timelines for this or any guilty person will claim any technology used will take 100 years to verify.

There is no reason you can’t analyze a DNA analysis codebase in 6 months. Unless you also need to verify the science.


I am not clear what you are arguing - is there a real problem where people invent fantastical amounts of time to review simple evidence?

The claim about the amount of time was not made by the defendant, it was from the company that produced the code.


Thanks for the clarification -- I actually misread it and thought the defendant was making the claim. With it being the company, then I agree that the company has an obligation to make review simple enough that it can be done in a reasonable amount of time (from someone knowledgeable in the field -- I still don't think we need to allow someone 10 years to understand statistics and programming before even beginning the code review) or recuse the tool from use in a trial.


We should flip the responsibility. How does the company providing the software prove it is not flawed?


As long as he’s actually reading the code, and still in jail unable to hurt anyone, let him have it. He’ll likely pass before that time is reached and if he’s just wasting time he’ll get bored having to spend hours a day looking at code that he doesn’t care to look at. I get what you’re saying but my point is him wasting time reading code isn’t hurting anyone and is a better use of an inmates time than sitting around.


If he’s in jail doing it then that’s fine.


For practical purposes it's not in the interest of justice to punish people before we have found them guilty as a matter of law and justice. If it really requires 8 years of work to prove whether the tools they are using work or not, then they haven't independently validated their work in the first place, ergo we have no reason to suppose it works.

We should either pay for multiple people to work on it so we can have the answer in less than 8 years or we shouldn't use it at all.


The burden of proof does not suddenly fall on the defendant just because one for-profit conviction company can’t back up their product.


Who said they can’t back it up? The defendant claimed he needed to review it. The company presumably has done its own validation.


There is no reason to presume they did their own validation correctly absent proof of same.


Hopefully they have to have demonstrated some level of quality to be used as legal evidence in convicting someone. Although, I would hope "code analysis" is an infinitesimal part of the validation with the majority being real world end to end tests. (e.g. we can take 10,000 samples, divide them in two, mix them, then use our tool to pair up the samples with 100% accuracy).


I think this is the crux of the problem. If the defendant gets to decide when validation is sufficient, then would they ever say it is sufficient?


When the jury agrees the evidence is there beyond a reasonable doubt. No one should trust any black box when it comes to criminal prosecution.


So the prosecution can present its case and say they gave the defendant the code for 9 months and here are five other independent reviews. The defense can argue they needed more time. The jury then decides if there is reasonable doubt.


If the company can't show that the process has been independently validated it probably should be tossed before the jury hears it.


There are papers that do validation of TrueAllele. I don't know the product well enough to know if it is the same one used in the article, but there isn't info in the article to know if independent validation was done or not.


>presumably

>their own validation

That is not how things are proven.

“You don’t need to know how we came to this scientific conclusion.” An appeal to authority doesn’t fly in science and it certainly doesn’t in law.


Such code should have tests, analysis, and documentation that ought to go a long way towards proving it correct in the time required to transmit data. If they can't produce THAT then they shouldn't use it in a court of law at all.


This is similar to a case a few years back where someone asked to see the source code of a breathalyser that had found them guilty.

AFAIR, the breathalyser was incorrectly averaging the readings, giving disproportional weight to the first reading.

I don't know if it was enough to rule in their favour, but I'm sure it called the data into question.

Edit: Looks like it was a Draeger breathalyser https://www.schneier.com/blog/archives/2009/05/software_prob...


A tool designed to find people guilty is biased to find people guilty.

As far as I know it is fairly easy to take a generic DNA sequencer meant for healthcare diagnostics and repurpose it for STR analysis. The only major difference between the healthcare versions and the forensic versions is the software I/O.


> A tool designed to find people guilty is biased to find people guilty.

I don't see those particular issues make it biased, just inaccurate - it could go either way.


And yet somehow whenever you take a closer look at mislabeled product prices, the average is always in favor of the store. And that's far from the only industry.

Complex tools are the product of many thousands of individual decisions taken by humans, humans aware of who's the paying client.


> And yet somehow whenever you take a closer look at mislabeled product prices, the average is always in favor of the store.

This could just as easily be selection bias: the errors in favour of the customer are less likely to get reported by customers.


> And yet somehow whenever you take a closer look at mislabeled product prices, the average is always in favor of the store.

What is this based on?


I had a 'fun' experience along these lines with health insurance and medical bills a couple years ago. I can confirm that in our case at least, /every/ error we found was not in our favor, and took usually about an hour on the phone to get fixed.

The somewhat-less-malicious interpretation is that the companies have a strong incentive to detect + fix errors that cost them money. Meanwhile, consumers are a) non-centralized, uncoordinated, and often unaware of errors, and b) have no way to fix systemic issues that impact them. And the companies therefore have no /real/ incentive to fix systemic problems. It is literally more profitable to fix the bills of the few people who complain, as they still make money on the remainder who don't notice the errors in the first place.

(on edit; exactly what the other comment one subthread over said. :P )


decades of data?


I still believe Hanlon's razor applies. I've seen products that have serious performance-affecting bugs caused by similar mistakes.


I still think—even when applying Hanlon's razor—there's an imbalance in incentives that leads to a weight in favor of the interests of the party paying for the test.

Take the store pricing example. Suppose the store's pricing & labeling process produce an equal number of bugs at checkout in favor of the store and in opposition to the store.

The store is heavily incentivized to detect the errors that are opposed to them. They are much less likely to detect the errors in their favor. Consider the manager that looks at the cash at the end of the day and notices they are $500 short. They likely dig hard to find the root cause of the issue, detect the pricing disparity and correct it. Now consider the manager that is $500 over at the end of the day. They are much more likely to say: "that's weird", shrug their shoulders and move on.

The same applies to forensic tools. Even if they originally produced bugs in both directions, their own internal QA and the market of police officers are likely to work hard to detect the bugs that make the tool less likely to let them make an arrest.

The net result is that the tools end up with a bias in one direction, even if the original developers made an equal number of mistakes in both directions.


Most store managers get as grumpy about overages as undercounts. They mean that some customer got shortchanged. For $500, it probably means a lot of customers got shortchanged, or something even worse is going on. That makes customers grumpy, and it affects your future.

There are plenty of lazy managers who would sweep it under the rug once. But if it happens more than once, it can become their job on the line. They start looking for who's counting wrong. And if they can't figure that out, they get really worried.

I have no idea about police officers and prosecutors. But store managers care about accuracy of counts, not just profits.


When running an experiment and following poor practices (i.e. p-hacking), results that fit the hypothesis will be accepted more readily and negative results will be debugged or re-ran more often.

i.e. The initial error may be randomly distributed. But the follow-up on the error will have a lot of bias.


This is also similar to how Toyota killed people with control software that would cause cars to accelerate randomly. The software audit team concluded that they could not find the bug, but the code was totally unreadable and terrible. They settled.

https://www.nytimes.com/2013/10/26/business/toyota-agrees-to....

Also let's remember that a company in UK was selling fake bomb detectors to Israeli and other militaries, and it took them more than 10 years to notice!

https://www.bbc.co.uk/news/uk-29459896

There needs to be proper scrutiny into these things, I could start some random 'deep learning to find criminals' company tomorrow, and have less regulation than a car mechanic


Every attempt to replicate this has failed, and other vehicle manufacturers have just as many 'sudden acceleration events'.

https://en.wikipedia.org/wiki/Sudden_unintended_acceleration

Occam's razor points to people just hitting the wrong pedal or people's floor mats getting stuck.

The reason Toyota ate it in the press for this was competitive.


> The reason Toyota ate it in the press for this was competitive.

can you explain what you mean? I don't understand this sentence.


I think they mean this: Toyota “ate it” (an idiom meaning to take losses) in the press by having bad stories written about them. These stories were written for a reason. This reason was not that Toyota deserving those stories, but that Toyota’s competition encouraged those stories.


From what I've seen of the Toyota case, the thing that Toyota actually got slammed on had nothing to do with control software, but was about the mechanical design of the pedals and the floor mats.


I wouldn't be surprised if "incorrectly averaging" and similar are very common software errors.

The reasons are manifold, including:

- Normalized values need to be averaged differently than absolute values (see the sketch at the end of this comment).

- Floating point has limited precision; even just correctly summing/multiplying numbers needs special care if you care about correctness. Results can, in the worst case, be off by a massive amount.

Often you don't need to care about it, so it's not uncommon for junior programmers especially to be not so aware of it.

I mean in the last 3 years of working as a professional software engineer/developer I didn't need any of this at all, but once I do I know what to look out for.
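A tiny sketch of the first point (averaging normalized values without weighting them), with made-up numbers:

    hits   = [ 9   90];      % positive results per batch
    totals = [10 1000];      % batch sizes
    rates  = hits ./ totals;             % per-batch rates: 0.90 and 0.09
    wrong  = mean(rates)                 % 0.495  (unweighted average of averages)
    right  = sum(hits) / sum(totals)     % ~0.098 (properly pooled)

Both versions "average" the same data, but only one of them means what the analyst probably intended.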


Draeger requires calibration prior each use for temperature and humidity. It is easy to get it thrown out as evidence.


When the officer shows up and swears under oath that he "calibrated it that morning" it's not so easy. :shrug:


Logs of calibration procedures along with calibration results should be stored on the device and auditable along with test results.


Temps vary throughout the day.


A rural small town judge probably doesn't have the best insight into that, just that Corporal Bubba said he calibrated it so it must've been right.

I know, this is exceedingly cynical.


Your "Corporal Bubba" isn't just cynical, it leans heavily upon a harmful stereotype that folks in small towns are uneducated simpletons. Stop the polarization, please


Why wouldn't they be? I went to university with people from all over Russia. They don't tend to return to their home towns after getting educated. Does this work differently in America?


> Does this work differently in America?

No, not really.


Not all poor people are stupid.


[flagged]


Is hatefulness an end in itself for you?


Maybe. Thanks, I could probably use some self-reflection.

(The anti-stress effect of covid vaccination seems to be much more immediate than I expected. This is the second time today I find myself saying things highly unusual for people on the Internet in general and my yesterday's self in particular, and the first time was literally a couple of minutes after the procedure.)


I'm glad that you're feeling better :)


I'd love it if more cops had college degrees in the US, but they don't. Ignoring the problem doesn't make it go away.


Sadly, when you consider the available evidence, Corporal Bubba isn’t too far from the truth. I say that as someone whose Dad is a retired police officer and who spent most of his youth in a very small town with sub-1200 people.


Rural areas are subject to brain drain where the best and the brightest disproportionately leave for more urban settings where more money is available to be earned. Living in a rural setting is something one chooses not an immutable fact like skin color and by and large the harmful stereotype is spot on.


I have worked with MATLAB code with 20,000 lines of code. Only over the past few years have OOP and unit-testing become properly available and usable. My guess is that these 170,000 lines are written in the old procedural way (also for performance reasons) and are full of bugs, also thanks to the lack of supporting tools.

Most likely, this grew out of a research prototype that just worked too well to be reimplemented in a proper production environment.


I wonder if bug tracker and other reports would be part of discovery in this case.


Equally interesting, in my opinion, is who should do the review. Mathworks' own consulting service is probably best placed to do so, but I wonder if they would objectively work against one of their own customers.


Yeah, any technical expert in a trial concerns me. I was an alternate juror (meaning I had to sit through the trial but was not allowed to take part in any deliberations) in a trial that involved the testimony of a computer "expert". The expert's testimony was 100% true and appeared to definitely prove X to someone who knew nothing about the subject matter. It was things analogous to saying the system was secure because it had a security chip.

There were 1,000,000 questions I wished had been asked.


Matlab has had classes (both types!) for ages. The unit-testing stuff dates back to at least 2013, and there were toolboxes to do similar things even before that.

The language certainly has some warts, but IMO, the bigger problem is that it's usually learned/used in contexts that don't focus on code quality: the goal is the resulting number or plot rather than the software that generates them.


That's why I wrote "available and usable". What Mathworks called unit-testing back then was laughable. It only got interesting in the past two-three years. Same with OOP and the features added in the past years (e.g. type hinting etc.) You can see that Mathworks themselves preferred not to use OOP in their own toolboxes. Parts of the Financial toolbox use it (e.g. the SDE stuff) or the datafeed toolbox, but much stuff is still written the old way. Another example would be Appdesigner as the new preferred way of writing GUIs with OOP. It is still much slower than the old GUIDE functions.


If ever there was something that should be fully transparent it is the mechanisms by which a person might be found guilty of a crime. The defendant shouldn't even have had to fight for this. It should be a fundamental cornerstone of criminal prosecutions.


Nothing is going to change until this software convicts a 10M net worth dude for something he didn't do.


This is pretty depressing unless you can elaborate on your take.

Yes, the system is stacked against the poor, but there are people fighting that. If they are fighting and failing then we need to know why.

If they aren’t fighting at all then (in part, but it’s still a significant part I’m afraid) it’s because of attitudes like this.


I think the point is that they are fighting but they are losing. Some 1st world countries are really bad at seemingly basic human rights for the poor. Doesn't mean they shouldn't fight, just an observation.


These cases also often come up with drug and alcohol detection tests, and as John Oliver points out in https://www.youtube.com/watch?v=1f2iawp0y5Y, software used to select jurors.

All of these companies claim that their source code is valuable intellectual property and that disclosing it can hurt their business. Even if this were true, when you're providing something that can be a significant factor in someone being imprisoned or executed, when creating the business you should accept that you're providing a public service that needs to be publicly accountable.

If it's not open source, at the very least there should be a requirement that software code and hardware designs must be provided on-demand to experts in court cases (with a non-disclosure clause to mitigate leaks and corporate espionage etc.).


Without jumping on the conspiracy bandwagon, I'd also like to see this applied to voting software. I know it's a hot topic, and I'm honestly not trying to get political.

Software that is critical to our fundamental human rights, and is being used by our government should be open source, or at least audited by a group of people who sign Non-competes/NDA and can't go work for competitors, or with some other mechanism to protect IP that I can't think of.


The beauty of voting software is that you don't have to verify the code if you hold the vote correctly. If the software provides a voter verifiable paper trail, the voter can verify their vote before turning it in.

The county can then verify the software by manually counting a random selection of paper votes to see if they match the software. If they do, then the software is correct, otherwise it is not. You then have a full by-hand recount and tell the vendor to fix their software.
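
A minimal sketch of that verification step, purely illustrative and not a real risk-limiting audit (the ballot data, the 2% sample rate, and the function names are all made up):

    import random

    # Compare a random hand-counted sample of paper ballots against the
    # machine's record for those same ballots; escalate on any mismatch.
    def audit(machine_record, paper_ballots, sample_fraction=0.02, seed=42):
        random.seed(seed)
        sample = random.sample(list(machine_record),
                               max(1, int(len(machine_record) * sample_fraction)))
        mismatches = [b for b in sample if machine_record[b] != paper_ballots[b]]
        if mismatches:
            return f"full hand recount needed ({len(mismatches)} mismatches in sample)"
        return f"sample of {len(sample)} ballots matches the machine record"

    # machine_record / paper_ballots map ballot id -> recorded choice
    machine_record = {i: ("A" if i % 3 else "B") for i in range(10_000)}
    paper_ballots = dict(machine_record)  # in a clean election these agree
    print(audit(machine_record, paper_ballots))

(Real audits scale the sample size to the reported margin; the point here is only that the paper, not the code, is what gets verified.)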


I feel very strongly that all votes in all important elections should be counted by hand, and be open for anyone to observe the process (within reasonable limits on disruptive behavior).

Not because of the possibility of voting machines being hacked, but because it is important for the public to have trust in the system. It is difficult to trust a system you do not understand, and only a very small minority is ever going to be able to audit voting software.

(I'm not American, so this is in no way a comment on your current predicament.)


I agree, for my own peace of mind. But I am also certain that it would have made no difference in our current predicament, with a third of the country thinking the election was stolen.

It has been shown to us time and time again that no actual evidence is required to get people to believe what they want to believe.

And the more technical the evidence (i.e. source code), the less helpful.


>But I am also certain that it would have made no difference in our current predicament with a third of the country thinking the election was stolen.

It would have changed some people's minds; I don't know if the change would have been a few thousand or tens of millions. I can't say whether it would have made a dent in that one-third of people. It would have helped me with my own peace of mind, and frankly I think it's overall the right thing for us to do.

>And the more technical the evidence (i.e. source code), the less helpful.

Disinformation is powerful; I'm not suggesting this alone would fix that. I disagree that more technical evidence is harmful. Global warming is benefiting from transparency and evidence. It takes generations to change political will, not years. The evidence there has shifted our whole economy, just maybe not fast enough.

There will always, always be deniers. Global warming, flat earth, vaccinations, etc. Evidence _helps_ battle deniers in these areas, but it takes generations for these ideas to become mainstream and the deniers to go from 99% of people to 2% of people.

Also, 2% of people think the earth is flat? Holy crap. https://www.sciencealert.com/one-third-millennials-believe-f...


Why would people who weren't convinced by reputable evidence in the first place be convinced by slightly better evidence that is only better in a technical, hard-to-express-and-prove fashion? This is especially true when the people doubting are the least educated and least intelligent.

It's like saying that better proof of evolution would convince some portion of creationists. That's just not how misinformation works.

Misinformation works by targeting vulnerable parties with misinformation that aligns with their existing vulnerabilities and beliefs in order to power relevant action with long stored and fruitful sources of hate, bias, and scorn in a fashion that bypasses the brain and goes right for the gut.

Like 30% in America believe in a young earth that is thousands not billions of years old.

If Bob is a scientist of some sort, presenting interesting scientific work to the community and incidentally advising the government on environmental policy that will harm some business, and you want to crush support for this by playing on existing biases, you advertise to the young-earth crowd about how Bob is anti-God and see if you can tie Bob to as many negative things they already dislike as you can.

You aren't fighting an intellectual battle to change their ideas about Bob, let alone deeper ideas; you are fighting an emotional battle to galvanize existing deeply held beliefs into useful action, like calling up and yelling at their congressman, or voting.

In that context asking Bob to present a better case is laughable. The relevant parties never engaged their brain in the first place.


Seems like both you and your parent comment are talking about software audits & auditors. I don't know if that exists in this form, but it seems reasonable that if you can get a security audit you should be able to get a correctness audit. And of course those auditors would be under some heavy-duty NDAs, given the nature of the work.


I completely agree, but you can have an auditable voting record in an election without relying on software integrity. That's not really the case, from the defendant's point of view, in a trial that relies on probabilities implemented in the software. In the case of voting, it is software-assisted; in a case like this article, it is software-driven.


It shouldn't be considered conspiracy theory that, technically speaking, many things in our nation are an insecure joke, including our system of voting.


...disclosing it can hurt their business...

And there's an entire body of law based around IP which they can use to protect their business, just like everybody else.


Their business interests are also of minuscule importance compared to the impact to society these tools have.


What is unfortunate is that it took going to appeal to force the judge to allow the code review at all.

Without, at minimum, an independent review (and preferably open source code) the software and lab processes being used constitute an inscrutable "black box" process within which any judgment can be made, for any conceivable reason, with life-changing effects for the defendant (and for the victims of a crime if, for example, a rapist or murderer is set free by a non-match decision).

One could even say that unreviewable code here falls under the umbrella of "secret evidence", which much of the world already knows can be easily misused and/or misapplied at the whim of the court.


People sometimes ask me what my “number” is, like how much net worth or “money” I want, what would I do with it

I say “I want to be able to afford appeals court where my rights matter”

Infinite appeals court!

Most people plead out, can't make bail, don't have counsel buddy-buddy enough with the judge to get them bail, and lose the ability to keep good counsel for more and more motions and appeals.

I want that; there is almost no pride in American rights if you can't afford them. People tie their whole identity to a system they aren't even part of.


Or just make the entire system based on a sort of "public defender" model. As it stands, a person accused of a crime and then found innocent has still been punished, without even being found guilty, due to enormous legal bills. It is a highly asymmetric power structure for anyone who isn't wealthy: the prosecutors have massively more resources than the average person to call upon. Alternatively, when prosecuting the wealthy, that asymmetry is reversed, which might be equally problematic.


I've occasionally mused that funding for a legal case should go into a pool, which is divided equally between both sides. That way, any money thrown at a case is, at least in theory, aligned with the incentive of "getting at the truth" rather than overwhelming someone with a valid case, but lesser resources. I don't mind someone raising the profile of a case by adding funding, especially if the stakes are high for one party, but it shouldn't be at the detriment of justice.

It's kind of a half-baked idea, and I'm sure it's not totally watertight but the existing problems you've mentioned really bother me.


I have the same line of reasoning when people talk about having enough to feel secure. Even simple civil legal matters cost in the tens of thousands of dollars easy.

And the system works so that you’re either rich enough to be able to defend yourself and the money spent doesn’t affect you, you’re poor enough that you have nothing to lose, or you’re in the middle, busy trying to get from poor to rich, but you are vulnerable to losing it all because you don’t have enough to protect it, but you have enough that it’s worth for someone else to try and take it.


Yes, that middle zone (that most of us probably live in) is terrifying. And it’s why we get conned into so many different types of insurance.

“One disaster and all that progress is gone.”


"Yes, that middle zone (that most of us probably live in) is terrifying"

More terrifying than the bottom, where you've got nothing to lose? I doubt it. Otherwise, why be afraid of it?


I would say it's just because of the energy and sacrifices already expended.

At the bottom you don't have to pretend that the circumstances will improve, and there is some freedom associated with some approaches to that. Careers don't need to have continuity: I know many people in hospitality and the service industry whose vacation policy is saving up, quitting one restaurant, travelling, and getting another job at a different restaurant when they get back. Sure, other approaches involve lots of energy spent on finding food and shelter that day, and service and hospitality work is not necessarily at the bottom; my post isn't about those approaches and dilemmas.

People in the distinct category of "professional" careers (not my term) don't feel they have that freedom to take any time gaps and are resigned to earning small periods of time off, and oftentimes that is true.


"At the bottom you don't have to pretend that the circumstances will improve"

Well, sorry, but I would also say, you don't know what you are talking about.

First of all, there is no bottom at the bottom - you can always fall deeper, until there is no more escape than suicide. I know people who did.

What you maybe mean are people who don't care about materialism and live with little to no money by their own choice. I lived with those people for quite some time and it was fun.

When you are young and healthy and on your own, you don't really have to worry about a lot of things. I worried about my backpack with my laptop and that was it. I slept in a tent or under the stars or wherever. When the money was gone, there were always places or ways to get food. Work a little, travel a little. Easygoing.

But now I have a family. Now I cannot not have money.


You couldn't possibly be more wrong. Have you ever not been able to afford medication that you know you needed to breathe, gone to sleep to have a nightmare about being attacked and suffocated, and woken up to find it was real, save for the fact that it was your own body?

Ever wondered whether you could keep a pet from dying because you couldn't afford the care?

Ever wondered if losing your home was going to stress your marriage so much that it might splinter?

The only people who think the bottom is less stressful have never been there.


To be clear, I don’t think the bottom is any less terrifying. It’s just the anxiety of being poor and powerless is now replaced by reality.


Appeals? Only ~3% of people charged even go to trial, the rest plead out.

Just giving everyone a substantive right to trial would amount to a revolution.


I've wondered what would happen if jail culture expected people to go to trial.

So similar to how snitches are targeted, if criminals in jail start violently targeting people that didn't go to trial they might be able to tear down the system...maybe?

And to be clear, this is a loose idea, as I don't really know the system, but it seems courts would be flooded if everyone took this route. Prosecutors would have to stop with these ridiculous threats of trial jail time vs. plea deals, as jails would become too full. And authorities would be forced to stop charging people for smaller crimes, as they simply couldn't handle the caseload in courts.

Even getting juries might be tough and start the rest of society pushing back if people were regularly being called for jury duty and disrupting their own lives.

...or something else but this would be an interesting 'fight back' by criminals.


George Bush introduced a PREA/Safe Prisons Program that has radically changed prison culture. Prisons are far, far safer than ever before and getting even safer as time goes on. (Since I know about Texas in particular), Texas has spent millions per prison to install hundreds of HD cameras covering nearly every square inch of ground (outside of individual cells and showers). When someone commits serious violence, they are segregated for at least 5-10 years, depending on the severity.

There are still some gang-controlled areas, but they are an exception now rather than the rule. The nanny state is firmly in control of most of the prisons.


Yeah. But just giving myself that right first, forever.


In Canada you can't even get the breathalyzer maintenance records:

https://www.canadianlawyermag.com/news/general/maintenance-r...


Guess you should just start doing these crucial DNA tests against some sort of panel of tests instead of just one lab. It would be a shame for the quality of the code in your one test to convict an innocent or free the guilty.


> The co-founder of the company, Mark Perlin, is said to have argued against source code analysis by claiming that the program, consisting of 170,000 lines of MATLAB code, is so dense it would take eight and a half years to review at a rate of ten lines an hour.

This is hilarious. As if you need to read every damn line and you can’t skip blank lines? You can skip whole files that aren’t relevant. Weak excuse


"It could take too long :(" is an interesting excuse not to examine the evidence that could put someone behind bars for the rest of their life.


The statement is odd; at the same time, it's not outrageous for him to make in the sense that, lines of code notwithstanding, the underlying science, i.e. the application of the product, is the thing in question.

It's a pretty interesting case.

At least the core nature of the algorithm should be made public if we're going to use it for public inquisition.


This is a stupid "argument"; I've personally participated in an audit of a program that has 5-6x as many lines of code.


And of course, as soon as you find an issue (which you probably will given that it's 170k lines of MATLAB), you can stop then and there.


Do forensic labs often get blind tested? If there is a bias toward guilty findings, it should show up in those blind tests. Source code is a red herring here; there should be independent evaluations of forensic laboratories/methodologies/etc., regardless of software source code availability. Maybe these checks are already in place, I genuinely don't know.


I don't know the answer to your question, but blind testing is complementary to (not a substitute for) source code review.

It's very common for software to work correctly a high percentage of the time, but fail on rare input data. If, say, the software works correctly 999,999 times out of a million, you're going to be very unlikely to discover that error by throwing random samples at it, especially if you need a physical process (ie, drawing blood) in order to generate a test case.

On the other hand, once you have a known failing case (as you would if the defendant knows the result must be in error because he didn't commit the crime), it's often fairly straightforward to identify the error by reviewing the source and/or using a debugger to examine the progress of the algorithm.
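
A rough back-of-the-envelope illustration of that point, using the one-in-a-million figure from above (nothing here is specific to this software):

    # Chance of hitting a 1-in-a-million failure at least once in N blind tests
    failure_rate = 1e-6

    for n_tests in (100, 10_000, 1_000_000):
        p_caught = 1 - (1 - failure_rate) ** n_tests
        print(f"{n_tests:>9} blind tests -> {p_caught:.2%} chance of ever seeing the failure")

Even a million blind tests only give you roughly a 63% chance of tripping over such a bug once, and each test here requires physical sample handling.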


Agreed, blind testing is important for statistical correctness and code review is important to avoid adversarial backdoors like dieselgate.


>Source code is a red herring here

If there were a way to ensure that the test suite applied to these forensic labs was all-encompassing w.r.t. the genetic variables at play, then maybe. But that sounds impossible. What if there's a coding error that causes the software to operate differently/incorrectly only for people with a certain (rare) genetic abnormality?

For what it's worth, I'm totally unversed in genetics, though I have a great deal of experience writing software tests (and seeing them come up short in adequately modelling real-world data).


Not in my experience. Most labs of various types are supposed to get certification, but these certifications are primarily about chain of custody, operation protocols, record keeping, and such. They have little to do with the veracity of their conclusions.


If an accused person has the right to see the source code that produced evidence against them, is it a violation of their rights for the source code to be obfuscated, or even just so spaghettified that not even an expert can understand it?

I kinda think that should be a violation. But deciding whether a particular piece of code is that bad is so subjective that I'm not sure how you'd make a legal standard out of it. Maybe start with "the linter found a ratio of warnings to lines > X%" or some such.
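
A hypothetical sketch of that threshold, assuming a Python codebase and pyflakes on the PATH; the cut-off and names are invented, and this says nothing about what a sensible legal standard would actually look like:

    import subprocess

    # Hypothetical: ratio of linter warnings to total source lines across a set of files.
    def warning_ratio(paths):
        result = subprocess.run(["pyflakes", *paths], capture_output=True, text=True)
        warnings = len(result.stdout.splitlines())  # pyflakes prints one warning per line
        lines = 0
        for p in paths:
            with open(p) as f:
                lines += sum(1 for _ in f)
        return warnings / max(lines, 1)

    # e.g. flag the codebase for closer scrutiny if warning_ratio(source_files) > 0.05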

Having a legal standard of code coherence/incoherence might help filter pull requests. "This PR cannot be merged to this project because it is configured to reject legally incoherent code."

As code becomes more complex it may become more meaningful to have access to the test suite, and to challenge the evidence if the tests are inadequate to demonstrate the correct code behavior.


You don't need a legal standard. You just need to put doubt into the minds of a jury. You can get an expert to stand up and say "I'm an expert in computers, and I couldn't understand how this DNA test works. I think it's likely there are mistakes in it that neither I nor the people that made it have discovered".


“This system is an absolute mess that is impossible to audit, so there is no way the company using it could find bugs in it either.”


There was a court case against Toyota, where one of their cars would randomly accelerate and cause crashes. The software audit said the code was unreadable and nothing could be proven. They settled.

https://www.nytimes.com/2013/10/26/business/toyota-agrees-to....


In the case where code is used to convict or acquit someone, I think it should be a well-tested and established program: generally something with the software quality of Linux, or in this case, whatever DNA testing kit is being used by scientists in top-ranking universities.

We could also use formal verification based on well-established axioms. For example, maybe we could "prove" that the DNA kit reports accurate results as long as the samples it's given are processed correctly.


During discovery, the opposing lawyer can raise an alarm to the court about such obfuscation. There have been many cases where such behavior cost the offending party the case.


These companies are disgusting. They peddle black box "models," that essentially ride the good reputation of DNA as infallible (which it is most certainly not) to get convictions on dubious or no evidence.

The way it works is that if there is a sample from a crime scene, they send it to these guys and they analyze it with their software to detect "statistical" DNA from the sample. These samples are the ones that are too crappy to actually make a definitive match -- they are a statistical match. So you say "I think Jim, Bob, and Alice were on scene," and it says "10% likelihood Jim DNA, 5% likelihood Bob DNA, 45% Alice DNA." Do you think it ever says "99% no DNA" in the sample?

It's basically Theranos, except instead of wasting $50 on a shitty blood test you get life in prison.

Ostensibly, it searches the entire DNA database for matches, and only returns a positive result if there's a positive match.

But it's a statistical model, using inputs that are crappy at best (because if it were an actual DNA match, they would send it off to in-house forensics, who would be able to do PCR...) and which includes inputs from circumstantial evidence as priors. Like: we believe Alice was at the scene, therefore if you find any statistical likelihood that this is Alice's DNA, boost it.

They often run the model multiple times in a row, and use the result that the DA likes the most to enter into evidence. This is because the models return different results each time -- of course they'd say, iTs StAtiStIcaL, so they can do that...

And the source code is completely impenetrable. They argue that it's a "trade secret" that jeopardizes their ability to make future profits, so it cannot be open-sourced. These guys could have a model that just says "what percentage should the thing read, Señor D.A.?" The entire product is a sham. And because it's 170k LOC, no one has the time or the qualifications (Judges/Attorneys reading source code? Yeah right!) to review it, even if it were open source.

Pure quackery, and often times, decades-long sentences or life in prison for the defendant. These companies are pure filth worthy of the lowest revulsion. It's a wonder any convictions happen at all because of this stuff, but jurors have very inaccurate conceptions of forensic science, thanks to shit like CSI, Law and Order, etc. These companies happily play into that image and people really believe this stuff works.


Issues with source code access aside, your description is mostly wrong. These programs take a DNA profile as input- it's just that the DNA profile is mixed (i.e. from multiple people). It reporting no DNA would be nonsensical. Figuring out exactly how many people are in a mixture isn't quite nailed down statistically (last I knew of), but it's usually pretty clear for up to 4 or so people.

Yes, you could run different models and get different probabilities. For example, the likelihood that the sample is a mixture of the suspect, the victim, and some unknown person vs victim and two unknown people compared to saying the victim isn't in the sample. However, the specification of those models is part of the trial process.

And the reported figures (at least when being used to determine guilt) are usually overwhelming: likelihood ratios corresponding to far more certainty than 90% or even 99.99%.
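
For what it's worth, the basic likelihood-ratio arithmetic is easy to sketch. A toy single-source, single-locus example under Hardy-Weinberg, with invented allele frequencies (real probabilistic genotyping software models far more: peak heights, stutter, dropout, multiple contributors):

    # LR = P(evidence | suspect is the source) / P(evidence | unknown person is the source)
    # For a clean single-source stain with heterozygous genotype a/b and a matching suspect:
    #   numerator   = 1
    #   denominator = 2 * p_a * p_b   (Hardy-Weinberg heterozygote frequency)
    def single_locus_lr(p_a, p_b):
        return 1.0 / (2.0 * p_a * p_b)

    # Independent loci multiply (the "product rule"); frequencies are made up.
    loci = [(0.12, 0.08), (0.05, 0.21), (0.30, 0.10)]
    lr = 1.0
    for p_a, p_b in loci:
        lr *= single_locus_lr(p_a, p_b)
    print(f"combined likelihood ratio across {len(loci)} loci: {lr:,.0f}")

With a full panel of loci the product gets astronomically large, which is why the reported numbers dwarf everyday probabilities.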

My point is that the science behind these calculations is well developed- validation studies get published all the time. Whether or not the specific software has errors (or isn't coded exactly as modeled) is an entirely different matter, but it still isn't all that likely. All of these cases rely on expert witnesses anyway- it's not the prosecutor pressing some buttons and printing a report.

There is far more concerning quackery that gets used in forensics- bite marks, hair matching, etc.


Here in Germany we have somewhat similar cases, but where the accusation is way less damaging than the case of this article, in which a false positive would have the drastic result of being labeled a murderer.

The cases are related to new speeding cameras which work with laser, where the defendants are complaining that these new devices are black boxes, and that they demand access to the raw data which these devices process. The problem is that these devices discard the raw data after having processed it and come to a conclusion that the driver was or was not speeding.

The devices in question are Traffistar S350 from Jenoptik and PoliScan SM1 from Vitronic.

There were discussions about a required software update which retains all this data, but apparently the devices lack the storage capability to do so. The National Metrology Institute of Germany (Physikalisch-Technische Bundesanstalt (PTB)) responded to this, that they would not re-certify these devices with updated software because from their point of view they work "as specified".


Now all you need is the proper amount of collusion/corruption between the certifying agency and the manufacturer to have a magic box that does whatever the one paying the bills wants. It might seem far-fetched in a developed country, until you read about what happened with Boeing and the FAA.


As far as I could find the courts basically decided that there has to be a way to examine how a result was reached if there was any doubt about it. That the PTB still allowed their use didn't change that and if a case got to court the results could get thrown out. The PTB probably doesn't care because only a small percentage of speeding cases end up in court.


Surely there aren't so many speeders that storing a few seconds per violation would add up to a significant amount.


There is so much junk science going on in forensics that it would be great to require everything to be open sourced. Same for voting machines and anything police in general is using (predictive policing is pretty scary). There is way too much stuff hidden and can be challenged only if you have very deep pockets.


Well, DNA matching is a crapshoot and hazardous to the innocent. We can still find unrelated folks with a partial match by virtue of segmentation.

https://dna-explained.com/2017/01/19/concepts-segment-size-l...


I'm curious. I've always been frustrated by this "closed" business model in the legal system. I feel like the entire process and its details should be laid out in the open (code, methodology, controls, etc). Of course the counterpoint is that it makes it easier to copy this business and undercut all the time and energy spent on building it (copying is easier). Is that the only reason? I feel like operating in the open is a critically important concept for anything related to the legal system, because any perversion removes its legitimacy. If it really is that prohibitive to run a profitable business in this space, are there open standards that could be enforced (e.g. "this is the core algorithm that is approved", businesses must get regular audits to continue to be used, and any failed audit triggers a re-examination of any court cases they were involved in over the past year)? That's less ideal, because then who audits the auditors, but maybe at least it's an acceptable middle ground from where we are.

In general I've been extremely frustrated how regularly & consistently this entire industry keeps everything secretive & trust-based despite consistent examples of how insufficient trust is for this field & how devastating the results are when that trust is violated.


This isn't a DNA testing kit as one would normally think -

"TrueAllele uses a hierarchical Bayesian probability model that adds genotype alleles, accounts for artifacts, and determines variance to explain STR data and derive parameter values and their uncertainty. The computer employs Markov chain Monte Carlo (MCMC) statistical sampling to solve the Bayesian equations. The resulting joint posterior probability provides marginal distributions for contributor genotypes, mixture weights, and other explanatory variables."

https://onlinelibrary.wiley.com/doi/full/10.1111/1556-4029.1...
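
To make the quoted description a bit more concrete, here is a deliberately tiny Metropolis sampler. It is not TrueAllele's model in any sense: it just infers a single "mixture weight" from a binomial observation (k of n reads carrying a marker) under a flat prior, whereas the real system models peak heights, artifacts, and multiple contributors:

    import math
    import random

    def log_likelihood(w, k, n):
        # Binomial log-likelihood of seeing k "marker" reads out of n given weight w
        if not 0.0 < w < 1.0:
            return float("-inf")
        return k * math.log(w) + (n - k) * math.log(1.0 - w)

    def metropolis(k, n, steps=50_000, step_size=0.05, seed=1):
        random.seed(seed)
        w, samples = 0.5, []
        for _ in range(steps):
            proposal = w + random.gauss(0.0, step_size)
            # Accept with probability min(1, likelihood ratio); the flat prior cancels out
            if math.log(random.random()) < log_likelihood(proposal, k, n) - log_likelihood(w, k, n):
                w = proposal
            samples.append(w)
        return samples[steps // 5:]  # discard the first 20% as burn-in

    draws = metropolis(k=37, n=120)
    print(f"posterior mean mixture weight ~ {sum(draws) / len(draws):.3f}")

The point is only that MCMC gives you a posterior distribution rather than a single deterministic answer, which is why reruns can produce slightly different numbers.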


How is that different from a DNA testing kit "as one would normally think"?


> Mark Perlin, is said to have argued against source code analysis by claiming that the program, consisting of 170,000 lines of MATLAB code, is so dense it would take eight and a half years to review at a rate of ten lines an hour.

So it’s definitely riddled with bugs. And I can’t imagine that much matlab code following rigorous software engineering practices.


I have done a lot of source code review in my time, for security assessments. Our general rule of thumb was about 10k lines per week that we could really get deep on. Ten lines an hour would only be for the most dense code and critical-path stuff. They will need a reviewer who knows the domain (DNA), but it's perfectly reasonable to review that code on a weeks-to-months time scale, definitely not years.
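
The arithmetic behind those two framings, using only the numbers from this comment and the article:

    total_lines = 170_000

    # Vendor framing: 10 lines/hour, full time (~2,000 review-hours per year)
    hours = total_lines / 10                 # 17,000 hours
    years_full_time = hours / 2_000          # 8.5 years

    # Security-review rule of thumb from above: ~10k lines per week
    weeks = total_lines / 10_000             # 17 weeks, roughly four months

    print(years_full_time, weeks)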


My exact thoughts. This sounds like a classic example of launching a prototype created by domain specialists (biostatisticians and bioinformaticians) as production software and skimping on the expensive stuff, like sound development practices.


There is so little emphasis on production-quality software development in bioinformatics and biostatistics. Despite a lot of groups open sourcing their code, it is nearly unusable and not reproducible due to hard-coding, ignored edge cases, and dumping the majority of the code into a single giant R or Python function.


It's a real problem, and I've been struggling with it for two decades, but even so I am legitimately impressed (and not in a good way) if they have 170,000 lines of Matlab code in their production software. That takes a really special combination of productivity and cluelessness, even for academic specialists. Regardless of the facts of this particular case, it should be absolutely horrifying that anyone's freedom is left up to a gigantic pile of unaudited Matlab code. (That said I am almost certain he added some zeros to the number, I have a hard time imagining what they're doing that could be that complex.)


Choosing MATLAB as a language for software that could potentially lead to people dying (in areas where they still have the death penalty) is a gigantic red-flag

I don't know how many job postings ask for a software engineer who knows MATLAB, but I can't recall any


That's my experience as well. My master's is in bioinformatics and I worked for several years in biotech.

I got frustrated because my concerns, that my team's development practices were regularly causing issues, were ignored. I was continuously able to predict what issues we would run into, but no one seemed to care. I even had a manager tell me that it was good that our software was buggy, since the client would continue paying us to fix it.

I've since left the biotech industry. There's a limit to how many times I want to run my head against that particular wall


It also sounds exactly like Ferguson's Imperial College epidemiology model that apparently compelled politicians into imposing hard lockdowns (and was likely wrong by at least an order of magnitude):

- "a single 15k line C file that had been worked on for a decade" [0]

- code review of the model: [1]

- corresponding HN discussion: [2] (including sad appeals to authority: you're not an epidemiologist)

- other HN discussion [3] (including ridiculously blaming programmers for making C++ available to non-programmers)

[0] https://twitter.com/ID_AA_Carmack/status/1254872369556074496

[1] https://lockdownsceptics.org/code-review-of-fergusons-model/

[2] https://news.ycombinator.com/item?id=23093944

[3] https://news.ycombinator.com/item?id=23222338


For reference, the whole of Tesla Autopilot is a 'few hundred thousand' LOC.

https://youtu.be/YAtLTLiqNwg?t=947


>170,000 lines of MATLAB code

This is a deep problem. Many scientists don't understand software engineering and more and more need to write bigger and bigger programs. And most of the time they don't open source their code.

Open source science.


That's like an accountant, accused of embezzling, refusing to hand over the ledgers because "there are just too many records to go through". Yeah-no, that's kind of the whole point. We want to find out what you've been hiding in that wall of paper.


I also can't imagine anyone thinking that would be a winning legal argument... "This software is too complex to look at, so just trust us." Really, that's what they went with?


Yeah, that's one of those "defenses" that feels like a confession.


Unfortunately, the technical illiteracy of much of the US judiciary has made this effective.


and surprisingly all of the code is of equal importance so you really need to review each line sequentially! Instead of finding stuff that you think is most likely to relate to what you're trying to figure out and debug from there. Wow I would like to see this marvel of engineering myself!


Kind of makes me wonder: if the argument is that the code is too complex to review and understand, does that mean the company is not doing code reviews themselves?


This has actually only happened to me a couple of times, but it has happened: someone tells me, "Bryan, go look at the code X did in Y, figure out if we should refactor." X would then tell me that the code is really complicated and full of algorithms! I go and look at the code and realize that, for what it is trying to do, it can be cut down from 10 pages of printed code to less than 1, and what actually needed to be done was incredibly simple.

In short, when someone tells me the stuff is too complicated because it's too clever and advanced, I tend to disbelieve them.

That said, I have of course written my own too-complicated stuff lots of times, but if asked I don't say it was because I'm clever.

names anonymized so as to not accidentally hurt anyone's feelings.

On edit: actually, one time the code was clever but not especially difficult; they just used the "algorithms" line because they didn't want anyone messing with their stuff.


I think there's a bias towards judging things to be "clever" if they're hard to understand

It's a cliche to have a "what idiot wrote this" outburst, then realise it's your own code, because most of us have written our fair share of "clever" code

My boss explicitly stated that he doesn't want to see any "clever" or "smart" code in our product - write code based on simple fundamentals, benchmark before deciding to optimise, and be respectful in your reviews

I like my boss a lot


If the code was developed in less than 8.5 years, it seems like the answer must be yes based on their previous claims.


Well I mean if something requires 8 man years of work you can do it in a year with 8 people. Not always, but usually.


I would claim that if something requires 8 man-years, it will most definitely take more than a year to develop with 8 people.

Communication takes time, coordination takes time, and there is an incremental cost to each new person added to a team. From experience, with perhaps 2-3 people who happen to gel well together you may get close to proportional scaling of output, but with 8 it's really unlikely in the real world.


On the contrary, such linear scaling would be quite exceptional. I'm speaking from experience but you don't need to trust me; I invite you read any book on software engineering management, starting from The Mythical Man-Month by Brooks.
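
Brooks's intercommunication argument is roughly that pairwise channels grow as n(n-1)/2, so coordination overhead grows much faster than headcount. A trivial illustration:

    # Pairwise communication channels for various team sizes
    for n in (1, 2, 3, 8, 20):
        channels = n * (n - 1) // 2
        print(f"{n:>2} people -> {channels:>3} communication channels")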


The Mythical Man-Month was first published in 1975. I think the typical applications programmers work on today have changed significantly since then and encompass many different disciplines (to be considered professional), so many disciplines that one developer is unlikely to be the master of them all. It is true that there is a communication overhead to adding more people, so it will not scale linearly, but if a single developer has taken 8 years to build something, in our era it seems likely that having 8 people might get it done in, say, one and a half to two years.


I would find it extremely unlikely to work that way.


You can do code review on every single commit while not being able to make an overall analysis of the massive codebase.


I completely agree, but if the reviewer isn't able to (with some amount of accuracy) predict the impact the committed code will have on overall behaviour, then there's very limited value in doing the review in the first place.


In any larger project, the reviewers are not able to predict impact just from reading a commit.

More importantly, the typical reviewer has only a small partial area where he has a good idea of which commits are a bad idea. He does not, however, understand the whole codebase.

Knowing what the whole does and knowing what my module does are two different things.


I again completely agree with you

Looking back at my reply, I think I should have added a bit of background to clarify my comment

My master's degree is in bioinformatics and I worked in the biotech industry until about a year ago. I mainly worked as a consultant for top 20 pharma companies, but also did work on different in-house projects and in academia

From my experience in the industry, I find it very unlikely that the software mentioned in the article is structured in a modular way. I've yet to see good software practices outside one or two academic projects. Most pharma companies still use copying and renaming folders as version control. Naturally I'm sceptical of any code coming from the biotech industry

On top of that, it's written in MATLAB. I have only ever seen this used by statisticians and university researchers, never by software engineers

I'm therefore willing to bet that when the reviewers open the source code, they'll find an unstructured mess of spaghetti code that has never been refactored, reviewed or tested.

So yes - I agree in all your points, but I find it unlikely that they're being applied to this particular project

