This bill would set the standard for the entire country, so the testimony is actually pretty interesting and worth watching, especially the opening speech by councilman Vacca.
I also testify at 56:00, though I hadn't prepared remarks in advance, so I wasn't as polished as some of the other folks.
That said, other C# devs and I have already found numerous signs that at least parts of the codebase were written by someone clueless or inexperienced with C#. And there are no tests.
If someone spends the time to dig into the codebase, I'd put the odds at essentially 100% that parts of this code will be shown to produce inaccurate or completely wrong results. Innocent people will probably go free because of this.
Disclaimer - I mean no disrespect to Nail Technicians.
    // if we don't have frequencies, use the default (this should never happen, but here it is)
    if (tblFreq == null || tblFreq.Rows.Count == 0)
    a = (float)0.02;
    b = (float)0.02;
Bonus points for not using the shorthand float literal syntax.
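For reference, a minimal sketch of what that would look like with the shorthand literal (or, since there's no real upside to float here, plain doubles); the names tblFreq, a, and b come from the snippet above, and braces are added so both assignments are clearly inside the if:

    // use the default frequencies if the table is missing or empty
    if (tblFreq == null || tblFreq.Rows.Count == 0)
    {
        a = 0.02f;   // shorthand float literal instead of (float)0.02
        b = 0.02f;
    }
    // or just declare a and b as double and write a = 0.02; b = 0.02;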
I would expect to see this many goofs only from someone who's been using C# for a month or so, and I only used it for a couple of years.
I'm making assumptions based on experience with C# codebases and I think my argument on double vs float is solid. I don't think I've ever seen floats used in a production C# codebase because there's no point.
Overboard maybe :) but I still think whoever wrote that is very inexperienced.
1. Knows how to use "using" but doesn't use it for myConnection.
2. myConnection.CreateCommand would be more idiomatic than new SqlCommand ... cmd.Connection = myConnection;
3. Doesn't know how to use "using"! using (DataTable myDt ... then goes on to "return myDT"
4. Also, I just noticed the oddity of myAdapter.SelectCommand = cmd, even though cmd is disposed immediately after that and Fill is called later.
That's one function, although the rest seem to repeat similar sins (Ctrl+C, Ctrl+V); a cleaner version is sketched below.
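For contrast, here is a rough sketch of how that one function could look with everything disposed via using and pulled into a single reusable helper (the method name, parameters, and query are made up, not the repo's actual code):

    // Hypothetical helper: runs a query and returns the results as a DataTable,
    // disposing the connection, command, and adapter deterministically.
    // Requires System.Data and System.Data.SqlClient.
    private static DataTable ExecuteToTable(string connectionString, string sql)
    {
        using (var myConnection = new SqlConnection(connectionString))
        using (var cmd = myConnection.CreateCommand())
        {
            cmd.CommandText = sql;
            using (var myAdapter = new SqlDataAdapter(cmd))
            {
                var myDt = new DataTable();
                myAdapter.Fill(myDt);  // Fill opens and closes the connection itself
                return myDt;           // the DataTable happily outlives the disposed adapter
            }
        }
    }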
They use out parameters, which are highly discouraged. C# lets you do terrible things (including goto), but the community is pretty aligned that they're bad practice. In my years of doing C# I've never once needed one.
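For what it's worth, the usual alternative is to just return the values; in newer C# a named tuple does it (the method and column names here are invented for illustration):

    // Instead of: void GetFrequencies(DataTable tblFreq, out float a, out float b)
    // return both values directly as a named tuple (C# 7+).
    private static (double a, double b) GetFrequencies(DataTable tblFreq)
    {
        if (tblFreq == null || tblFreq.Rows.Count == 0)
            return (0.02, 0.02);   // defaults when no frequency table is available

        var row = tblFreq.Rows[0];
        return (Convert.ToDouble(row["A"]), Convert.ToDouble(row["B"]));
    }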
Not using var anywhere. This is C# type inference 101.
Throwing "Exception" in catch with only a message. This is horrific because it throws away the original context and any stack trace. You can just say "throw" (just the one word) or rethrow the exception with an innerexception with the original info trivially. This is unforgivable, an experienced C# developer would never do this.
Extraneous uses of "this" when it's implicit.
Initializing variables to null.
Not using shorthand object initializers (though those are only in newer versions of C#).
Using public class variables (fields) instead of properties.
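Put together, the idiomatic version of those last few points looks something like this (the type and property names are invented for illustration):

    // Auto-properties instead of public fields:
    public class LocusFrequency
    {
        public string Locus { get; set; }
        public double Value { get; set; }
    }

    // ...and at the call site, var plus an object initializer:
    var freq = new LocusFrequency { Locus = "A", Value = 0.02 };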
Using Parse instead of TryParse... Like in Java, exception handling is an expensive stack unwind on failure; it's not supposed to be used for control flow.
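In other words, something like this (the input string is hypothetical, just to show the shape):

    // Exception-as-control-flow: every bad string costs a throw and a stack unwind.
    // int count = int.Parse(text);

    // TryParse: failure is just a false return value, no exception thrown.
    if (int.TryParse(text, out var count))
    {
        // use count
    }
    else
    {
        // handle the malformed value explicitly
    }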
The worst, though, is that they didn't make a method to reuse the SQL connection and data table code, and instead copy-pasted it around 20 times.
Especially if all they're going to do is throw.
    compareMethodIDSerializationBackupDoNotMessWithThisVariableBecauseItsAnUglyHackAroundACrappyLimitiationOfTheFCLsCrappyXMLSerializer = comparisonID;
A cult favorite of devs at my previous employer was to (ab)use the power of LINQ to try to create the longest single-line methods possible. I held the unofficial record for a while with a LINQ statement over 1200 characters. Innocent enough, since ReSharper can unwrap them into sane code, but lots of fun.
A concerning thing is that a dev database username and password were committed to multiple files, like this: https://github.com/propublica/nyc-dna-software/blob/master/F....
The actual codebase itself may be pretty old so WebForms isn't a surprise - the SLN file indicates it was saved with Visual Studio 2010 (https://github.com/propublica/nyc-dna-software/blob/338a1b86...).
WebForms is likely a client request as well, but also keep in mind that as shitty as WebForms is, it lets you throw functionality together faster than almost anything else. Again, probably contractors.
The funny comments others mentioned are another clue that this was not developed internally. I miss that about working at a consulting company... The client is usually hiring you because they have no idea what they're doing, so rest assured nobody there will ever look at the code.
For a very high-priority sample: DNA extraction can take a couple of days; sequencing prep a few hours to a couple of days; sequencing a couple of hours to a couple of days, depending on how much data you need; quality control a few hours to a few days; and first-stage analysis software (just calling what DNA exists and giving it quality scores) a few hours to a couple of days, again depending on how much data you're analyzing. So from the start of extraction until you have DNA data you can analyze for a specific product is about a week if you babysit it, and up to four weeks if you care more about cost efficiency. After that it depends on what type of customer-end analysis you want to do and what software you use. Industry-standard, university-level software can take anywhere from a few hours to a few months, again depending on how much data it's looking at.
Typically the turnaround time (from when we receive DNA to when we've published results to the customer) is around 6 to 8 weeks for most of the analysis products my company sells.
> You didn't give a single hard number.
Considering end-to-end could take a few days (super high priority sample being baby-sat at every step of the way by experienced technicians) to a few months (statistical analysis on populations)... it's pretty hard to give a single number. It really depends on what analysis you want done.
There's a lot of room for improvement in this field of software!
That's what I was thinking. If it takes a lot of CPU time, it seems that there is likely a huge amount of optimization that would be useful to the people using it.
> it's pretty hard to give a single number.
I wasn't looking for a single number in the sense that it encapsulates everything; I was looking for at least one number to see whether the run time of these processes is a hindrance. What you have here is very interesting, thanks!
For the 6 to 8 week analysis, how many cores is that typically running on?
Think of it like an assembly line. How many robot arms does the assembly line need to produce something in 6 to 8 weeks? It depends on the type of object being produced; items with more intricate details, larger sizes, etc. might need to go through more robots, more arms. An assembly line for a complex item, such as a vehicle, doesn't put all of the materials in front of a single robot and give the robot more arms. It instead goes through many different robots with specialized purposes.
And if your factory produces many different things, some of those specialized robots can be generalized for multiple products and shared among the things being produced: you might have one robot that spends 20% of its time producing a thousand units of product A, 75% of its time producing a hundred units of product B, and the other 5% of its time in maintenance.
It's the same for DNA analysis; it goes through many different software steps, each with varying requirements of I/O throughput, CPU speeds, memory size, etc. Each individual step could be (and generally is) run on different machines with different resources suited to a particular step. Sometimes those steps can be shared among different products. But different products will take different amounts of time to work through the analysis.
The machines running our largest analysis steps have 88 cores and 512GiB of RAM. But number of cores is not indicative of why it takes 6 to 8 weeks to process.
There is a lot of anonymized genotype data in the 1000 Genomes Project (http://www.internationalgenome.org/data), but the devil is really in the genotyping details with this kind of thing. It looks like in the validation project (https://www.documentcloud.org/documents/4113877-1-19-17-Exhi...), they went from biological samples to sequence data to analysis results, which is the right approach.
With this kind of low-throughput genotyping, so much depends on the accuracy of the genotyping method, which in turn depends on the lab, the protocol, the technicians, and sometimes even the weather. The software is not where I would start when worrying about forensic evidence based on this method, though it definitely could be a source of errors.
It's considered a critical QC step.
Throughput doesn't matter if you have false-positive or false-negative errors that cause erroneous medical decisions.