
"Processing documents with transducers" is probably one of the more interesting, yet still simple enough to be understandable, examples of transducers: (http://blog.juxt.pro/posts/xpath-in-transducers.html)
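For a taste of the mechanics outside Clojure, here's a minimal transducer sketch in Python (the names `mapping`, `filtering`, and `transduce` are illustrative, not taken from the linked post):

```python
from functools import reduce

# A transducer transforms a reducing function: it takes a step
# function (acc, x) -> acc and returns a new step function.

def mapping(f):
    """Transducer that applies f to each input before the step."""
    return lambda step: lambda acc, x: step(acc, f(x))

def filtering(pred):
    """Transducer that passes only inputs satisfying pred to the step."""
    return lambda step: lambda acc, x: step(acc, x) if pred(x) else acc

def compose(*xforms):
    """Compose transducers so the leftmost transforms input first."""
    return lambda step: reduce(lambda s, xf: xf(s), reversed(xforms), step)

def transduce(xform, step, init, coll):
    return reduce(xform(step), coll, init)

# Keep even numbers, then square them -- one pass, no intermediate lists.
xf = compose(filtering(lambda n: n % 2 == 0), mapping(lambda n: n * n))
result = transduce(xf, lambda acc, x: acc + [x], [], range(6))
# result == [0, 4, 16]
```

The same `xf` could be reused with a different step function (summing, writing to a file) without touching the transformation logic, which is the whole appeal.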

Awesome, this helped quite a lot. Thank you!

I'd be willing to place a bet on what sort of database technology they weren't using. Hint: it has three letters, starts with an S, and ends with an L. ;-)

Unfortunately just using SQL doesn't mean your system is free of race conditions nor that every atomic domain operation is implemented atomically.

Of course, it's a great start.

SQL won't force you to respect flag columns in your application layer (unless your entire application is in SQL?).

The way a student is taught to model something like this is to have the 'seed' entry refer to the 'account' entry (or, better yet, the 'game' entry). This is just how one models a one:many relationship. It is then trivially true that a given seed will only be used in one game at a time. As long as one never updates the reference on an existing seed, a seed can't be re-used in this way.
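A sketch of that modeling using SQLite from Python (table and column names here are hypothetical, just to show the shape): the reference lives on the seed row, and a uniqueness constraint on the seed value makes reuse a hard error rather than an application-layer convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE game (id INTEGER PRIMARY KEY)")
# Each seed points at exactly one game (the one:many side lives here),
# and UNIQUE forbids the same seed value from ever appearing twice.
conn.execute("""
    CREATE TABLE seed (
        id      INTEGER PRIMARY KEY,
        value   TEXT NOT NULL UNIQUE,
        game_id INTEGER NOT NULL REFERENCES game(id)
    )
""")
conn.executemany("INSERT INTO game (id) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO seed (value, game_id) VALUES ('abc123', 1)")
try:
    # Handing the same seed value to a second game fails outright.
    conn.execute("INSERT INTO seed (value, game_id) VALUES ('abc123', 2)")
except sqlite3.IntegrityError:
    print("seed reuse rejected")
```

No application code has to remember to check anything; the database simply refuses the second insert.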

Of course, relational data models only encourage this type of design: they do not require it, nor are they required for it.

Also, this is by no means the only possible bug of this nature. For example, the seed generation might be based on the current wall-clock time. I'd hope someone trying to run a casino would know better than that, but hopes of that nature are frequently unfulfilled.

From the article it sounds like constraints could have been used to prevent having multiple seeds linked to the same account?

Yeah, I guess it depends on how good your constraints engine is versus what your constraints actually are. If your actual constraints can't be modeled, or if you choose not to model them at the database layer (e.g. for performance reasons) then you have to implement the constraints somewhere else.

I worked on a database that allowed (almost) every table to mark a row as deleted with a boolean-type column. This caused problems when you wanted to create a new row that had the same values for the table's key-columns as a deleted row. You can't just add the deleted column to the key and have all the functionality you want (multiple deleted rows with the same values in the table's key-columns). Each table could use (or not) the flag in different ways and there was no simple way to enforce consistent behavior across all tables. The constraint has to be placed somewhere else, either in explicit code, or by creating new abstractions inside the database that allow you to represent the actual constraints of your application.

There are benefits to marking a row as deleted with a boolean column, though. Undoing deletes would be easy to implement, and retaining references to deleted records would be simple as well.

The solution to the problem you described would be to use a surrogate key column (typically a UUID or auto-increment integer) rather than natural keys.

There are reasons for everything in the database. Undo and historical records were the primary use case.

All tables had a unique, integer primary key.

However, if you want to enforce a uniqueness constraint across your data [e.g. UNIQUE(name, location)], the constraint breaks when you introduce the boolean deleted column [and UNIQUE(name, location, deleted) does not provide the appropriate semantics]. The application semantics must be provided at some level other than SQL column constraints.
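One way to recover those semantics in engines that support partial indexes (PostgreSQL and SQLite both do) is to enforce uniqueness only over the live rows. A sketch, with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE place (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        location TEXT NOT NULL,
        deleted  INTEGER NOT NULL DEFAULT 0
    )
""")
# Uniqueness is enforced only among live rows, so any number of
# soft-deleted rows may share the same (name, location).
conn.execute("""
    CREATE UNIQUE INDEX live_place
    ON place (name, location) WHERE deleted = 0
""")
conn.execute("INSERT INTO place (name, location) VALUES ('cafe', 'main st')")
conn.execute("UPDATE place SET deleted = 1")  # soft delete
conn.execute("INSERT INTO place (name, location) VALUES ('cafe', 'main st')")  # ok
try:
    conn.execute("INSERT INTO place (name, location) VALUES ('cafe', 'main st')")
except sqlite3.IntegrityError:
    print("duplicate live row rejected")
```

This keeps the "at most one live row per key" rule in the database rather than scattered across application code, though it only helps if your engine supports partial indexes in the first place.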

Ah, another gamble. How nice.

So, let's see if I get this. Three letters, that means 1/26th chance of getting the correct number. And.. wait, what do I do now?

Now you submit all twenty-six guesses via proxy accounts, allowing for random delays between each guess to appear less suspicious.

The issue is that rocket physics (essentially Newton's laws of motion) is, for the most part, a matter of first- and second-order, linear, ordinary differential equations with exact solutions. The plumbing, on the other hand, is governed by fluid dynamics, heat transfer, and the like. These are systems of complicated partial differential equations without exact solutions, requiring numerical methods.

The issue is there is "failed" and then there is "failed". Yes, many times you have to repeat an experiment because of bad reagents, broken machines, little tweaks are needed to some obscure parameter, or someone left the lab door open...

However, if your experiment is well controlled, then the controls will reveal this sort of "failure". When I was still running gels, we'd often run only controls the first time we tried a new setup. If your experiment fails because your controls failed, then that's just science.

But I've also seen the other kind of "failure". The kind where the controls came out perfectly, but the effect or the association or the expression profile that the researcher was hoping for didn't show up. When these sorts of failures are ignored or discarded, then we do science a huge disservice.

I am encouraged, though, that there recently seems to be a movement toward, if not outright publishing such negative results, then at least archiving them and sharing them with others in the field. After all, without Michelson and Morley's "failure" we might not have special relativity.

>But I've also seen the other kind of "failure". The kind where the controls came out perfectly, but the effect or the association or the expression profile that the researcher was hoping for didn't show up. When these sorts of failures are ignored or discarded, then we do science a huge disservice.

Why does this happen? Clearly this is what the article insinuates. Is publish-or-perish that strong? Every honest experiment with honest results benefits society. Not every prediction-and-result combination yields a prize in your lifetime, but that in no way should diminish someone's value as a scientist. The science may later be used for something we had not intended (could I offer you the hope of posthumous recognition?). Finding a way something does not work may save someone else some time. This benefits the scientific community.

Not everyone gets to walk on another planet, some people have to build the ship.

For better or worse, most scientific journals still operate on a business model dependent on physical subscriptions. Since this sets something of a limit on how much can be published, and since scientists tend to prefer paying for positive results vs negative, there has been a strong cultural bias toward favoring positive results.

The good news is that this is gradually changing. As scientists begin to understand that online distribution models don't have the same sorts of limitations, and that search can be a powerful tool, there has been a move toward at least collecting negative results. Of course, they still don't benefit the scientists in the "publish-or-perish" world, but even that may be changing...maybe...


On the "very-opinionated" point, I'm particularly fond of Alan Kay's quote:

> I don't know how many of you have ever met Dijkstra, but you probably know that arrogance in computer science is measured in nano-Dijkstras.

Ha! Yeah.

The older I get the more immune I feel to arrogance; to me it's a symptom of frustration coming from high skill colliding with the rest of the world's lack thereof.

I've always found that quote to be particularly douchey. Then again, I do think that Dijkstra's contributions to CS were more important than Kay's, too, but I suspect that I might not be in the majority there...

I think they are sufficiently different that it is not possible to compare importance on an objective basis.

Dijkstra worked in Computer Science. And the relevant quote there is: "Computer Science is no more about computers than Astronomy is about telescopes."

Dr. Kay did phenomenally important work with computers and programming. It’s important in its own right. Either of us could say that we prefer to dwell on the ramifications of one or the other.

I think it’s safe to say that if you took either of their contributions away outright, our world in 2015 would be much poorer for it.

Now if you want a good Dr. Kay quote, “Programming is a Pop Culture” has to be high on the list. I think he’s deeply right about that, but being right doesn’t in any way tarnish the nano-Dijkstras associated with damning the entire field.

The thing is, Dijkstra also did a lot of practical work (the first Algol compiler, an OS, structured programming), so I wouldn't discount him as just a theoretician, either.

I agree that they both did very important contributions, and that there's probably no way to compare their magnitude, so it comes down to personal interpretation, there's no argument there. Still, I believe the arrogance comment by Kay was douchey :)

Perhaps a “Kay” should be a measure worth approximately one deci-Dijkstra.

Wikipedia's article on Dijkstra says:

"He was the first to make the claim that programming is so inherently complex that (...)"

Meanwhile Alan Kay had 12 year old kids making computer games (videos on Web Archive).

Dijkstra contributed algorithms. In that sense he contributed as a mathematician. That doesn't diminish what he did. But it does not compare with what Alan Kay did for the sum of these two parts: computers, and people.

Programming is not all of CS, and the fact that everyone and their mom are doing OO does not take away from anything of what Dijkstra said. On a more "people" level, I think that arguing for more rigor and formal approaches to programming is truly more important than creating the next generation of code monkeys. I agree that Kay's work has had much more practical influence, but I don't think it has been necessarily any better because of that. But as I said in the original comment, I'm probably in the minority on this.

Kay : Dijkstra :: Ruby : Haskell

Dijkstra was correct in saying that you don't even need a computer to do "Computer Science," all you need is a pencil, paper, and a mental model of what you're trying to do given computer constraints.

Dijkstra more theoretical; Kay more hands-on.

I think BOTH are necessary, or at least important.

I think that's a pretty good analogy, though maybe you should s/ruby/objective-c/ to bring it closer to smalltalk :)

There is theoretical, and then there is theoretical: one is privately fueled and produces real-world industry-strong gems like Haskell; the other is publicly fueled and produces concepts such as self-stabilization and superstabilization.

If Dijkstra is so awesome someone should write a Wikipedia article with more important achievements and less medals.

I don't really understand what you mean by privately or publicly-fueled theory, but anyway...

Instead of waiting for someone to write on the wiki, you could go out there and google a bit.

Very briefly: he made major contributions to compiler and OS design (including the first Algol compiler and an entire OS, the THE multiprogramming system), devised two fundamental graph algorithms (shortest path and spanning tree), and spearheaded making programming into a serious discipline rooted in maths (along the way pretty much giving birth to structured programming).

He also wrote (longhand!) over a thousand essays on CS and related topics, which are (as far as I've read, and I've read quite a few) all a joy to read.

You can start here: http://amturing.acm.org/award_winners/dijkstra_1053701.cfm and here: http://www.cs.utexas.edu/users/EWD/

I'd argue that the discovery of Denisovans might be bigger than Neanderthal genes, but they're in the same ballpark. Modern sequencing technology has led to an explosion in discoveries about early human (and other organisms') evolution.

Actually, I'd just put "rapid sequencing" right near the top of the list. When I started grad school (in 2003 if you must know), there was a feeling that it'd be very hard to get any better/faster than Sanger sequencing (and modifications of the same). Now there's a plethora of technologies that are both faster and more accurate.


> Now there's a plethora of technologies that are both faster and more accurate.

Sorry to nitpick, but the inherent error rate of high-throughput sequencing platforms is still higher than capillary sequencing's. This is more than compensated for by the massive throughput advantage, though.


Right, thanks for clarifying that bit. An individual run still has a much higher error rate, but the ability to do exponentially more runs results in a higher accuracy per "experiment". This is not that dissimilar from the direction that other technologies have taken when facing physical measurement limits (cryo-electron tomography comes to mind, for example).
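A back-of-the-envelope illustration of that point (the numbers are made up, and it assumes errors are independent and that erring reads all agree, so it upper-bounds the real consensus error): with a 1% per-read error rate, a simple majority vote over 30 reads covering the same base is almost never wrong.

```python
from math import comb

def majority_error(p, n):
    """P(a majority of n independent reads is wrong), per-read error p."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(need, n + 1))

per_read = 0.01                      # 1% chance any single read errs
consensus = majority_error(per_read, 30)
print(f"{consensus:.2e}")            # vanishingly small next to the 1% per read
```

This is the same statistical trick behind averaging repeated noisy measurements in other fields: the per-measurement error stays fixed, but the aggregate error collapses with depth.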


Yes, though there's the added complication of time to answer.

You can turn a Sanger run around relatively quickly (I'd guess hours). High throughput runs sequence many millions of fragments of DNA in parallel very slowly. The highest throughput runs take several days to complete.

This means there are still (rapidly diminishing) scenarios where Sanger sequencing makes sense: you don't need the throughput, and you want an answer quickly.


How fast can we sequence now? What are the error rates?


About 6 Gigabases per hour, somewhere in that ballpark. Probably something like the equivalent of the human genome project (which took >10 years and 3 billion USD in the 90s) every few days.

Error rates are in the region of 1 to 0.1% generally.

I've never really worked directly with capillary data, but I understand its error rate in early bases is limited by the amplification step, which would be ~0.01% (1 in 10,000).

That said there might be a coarse filter (based on signal intensity etc) that would identify a subpopulation of high-throughput reads which have a <0.1% error rate. But I don't believe I've seen that reported. I'd be interested in hearing from anyone who has thoughts on that though.


The absolute error-rate is not really a problem if the errors are random and you have the budget to sequence enough independent fragments. Sequencing becomes tricky (both Sanger and next-gen) because of non-random recurrent artifacts associated with the sequencing method or molecular manipulations done to the DNA or RNA.


The advantage of the ISO layout is the "~" on the bottom row, instead of waaaay up to the left of the "1".


I don't need ~ that often, and I never feel that it's far away. I need the shift key all the time though - so I prefer not to travel for it :)


There's a radical, yet likely very workable solution to this catch-22... A couple of years back, when there was talk of introducing congestion pricing in Manhattan, there was a bridge toll increase around the same time. Someone figured out that if, instead of the usual $0.50 or $1 increase, the bridge tolls in NYC were raised to $20-25, then you could make all of the MTA lines (subway, LIRR, Metro-North) free of charge.

Of course, as you might expect, it never got anywhere...


There is a bridge over to Staten Island (from the South, I think?) that costs $14. It's free in the reverse direction, AFAIK. Drove it the other day as a tourist in the area and thought "This would suck if I had to commute this every day!"

Presumably the cost is there to encourage everyone to take public transport instead.


Concept and implementation are not the same thing. The concept of LISP hasn't changed in 50+ years, but the implementations have been constantly improving the whole time.


I completely agree. Some languages are written to adhere to an ideal (everything is lists), and make sacrifices in order to stick to that ideal. Some are written to be close to the metal, or to be as abstract and flowery as possible, and make different sacrifices. Every language has some primary goal, and all the rest must bend to accommodate that goal.

Go's primary goal is: excellent tooling.


except for, apparently, poor code coverage tools?


The coverage tool is fine. He didn't really explain the problem well. Coverage is per package, and only counts coverage from tests in that package. This makes sense because packages are independent and you can't count on another package to test yours.


