Hacker News | fdr's comments

The process that works for my team (Department of Data, Heroku) is simple:

If one expects a review with any timeliness, "@mention" the person(s) you think are qualified to review, and include the urgency of integrating the patch. The default assumption is that urgency is low.

The reviewer is free to punt (if they're especially busy, or it's a case of mistaken identity as to the correct expert), but they are expected to fill-or-kill their intent to review within a day or so.

Some advantages:

* Neither overbearing nor complicated

* Leaves room for non-urgent work

* Targeted (avoiding tragedy of the commons)

* Dynamic and nuanced (not slavish to a strict taxonomy of assignment and priority)
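The fill-or-kill rule above can be sketched as a toy state machine; the one-day SLA constant and the state names here are hypothetical illustrations, not anything my team actually runs:

```python
from datetime import datetime, timedelta

# Toy sketch of the fill-or-kill expectation (REVIEW_SLA is an
# assumed value; "a day or so" in practice).
REVIEW_SLA = timedelta(days=1)

def review_state(mentioned_at, responded, now):
    """Classify a review request: the @mentioned reviewer should
    either accept or explicitly punt within about a day."""
    if responded:
        return "handled"   # reviewer accepted or punted in time
    if now - mentioned_at > REVIEW_SLA:
        return "overdue"   # time to re-route to another reviewer
    return "pending"
```

The point of the deadline is that a punt is as useful as an acceptance: either way, the requester knows to act rather than wait.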


+1 for the @mention idea. The biggest obstacle I usually see at work is no one taking ownership of an issue. If my boss says "someone needs to do this before Friday," then I know it will never happen, because no one is responsible for it.


Yes: it is the relatively new approach by Cahill et al. (2008), known as SSI: https://courses.cs.washington.edu/courses/cse444/08au/544M/R...

Which, as far as widely deployed production database implementations go, is lightspeed for adopting new academic work (9.1, the first version with the feature, was released in 2011).


You are absolutely right. That's because when a parameterized query arrives, the parameters are unbound and the planner cannot take advantage of fit-to-purpose selectivity estimation; it must instead estimate generically.

Newish versions of Postgres (9.2+, I believe) try to paper over this surprising effect by re-planning queries a few times to check for cost stability before committing to a saved generic plan. It has proved very practical.

See the notes section of http://www.postgresql.org/docs/9.2/static/sql-prepare.html, reproduced here:

    If a prepared statement is executed enough times, the server may
    eventually decide to save and re-use a generic plan rather than
    re-planning each time. This will occur immediately if the prepared
    statement has no parameters; otherwise it occurs only if the generic
    plan appears to be not much more expensive than a plan that depends
    on specific parameter values. Typically, a generic plan will be
    selected only if the query's performance is estimated to be fairly
    insensitive to the specific parameter values supplied.
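The decision the docs describe can be sketched roughly as follows; the five-execution threshold and the 1.1 fudge factor are my illustrative assumptions, not Postgres's actual constants:

```python
# Illustrative sketch of the plan-cache heuristic: re-plan with
# bound parameters for the first few executions, then switch to
# the cached generic plan only if it looks competitive.
def choose_plan(custom_plan_costs, generic_plan_cost):
    """custom_plan_costs: estimated costs of the parameter-specific
    plans seen so far; returns which kind of plan to use next."""
    if len(custom_plan_costs) < 5:
        return "custom"            # still gathering cost evidence
    avg_custom = sum(custom_plan_costs) / len(custom_plan_costs)
    # Accept the generic plan if it is "not much more expensive".
    return "generic" if generic_plan_cost <= avg_custom * 1.1 else "custom"
```

So a query whose cost is stable across parameter values converges to the cheap cached plan, while a parameter-sensitive query keeps being re-planned.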


I'd say the way failover/HA is done at Heroku is straightforward. Will (the designer) or I plan to write about it some day, laziness permitting.

It took some time to figure out because it required breaking some orthodoxy, but I'm happy with the result.

The thresholds are documented at https://devcenter.heroku.com/articles/heroku-postgres-ha#fai....

The promotion is done by rebinding the URL of the database and restarting the app. This neatly shares a mechanism with password changes, which is one reason we decided it was worth throwing out network-transparency orthodoxy when it came to HA: the clients must be controlled anyway to deal with security considerations.
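A hypothetical sketch of that promote-by-rebinding idea; the config dict and restart flag stand in for Heroku's actual (non-public) mechanisms:

```python
# Hypothetical model: promotion reuses the same path as a
# credential rotation — change DATABASE_URL, then restart so
# every client reconnects to the new primary.
def promote(app_config, follower_url):
    """Repoint the app at the promoted follower and flag a restart."""
    old_url = app_config.get("DATABASE_URL")
    app_config["DATABASE_URL"] = follower_url
    app_config["restart_pending"] = True   # clients reconnect on restart
    return old_url
```

Because the clients are already forced through this rebinding path for security reasons, failover gets it nearly for free.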


One approach I used when we migrated data centres, to avoid having to manage the timing of IP address changes in our apps, was to use haproxy.

We configured slaves in the new data centre, and set up a haproxy there that used the old data centre databases as the main backend and the new data centre databases as the backup. We changed the apps to point at haproxy, shut down the masters, let haproxy shift all the traffic, and promoted the slaves once we were certain they'd caught up. We had a period of seconds where the sites were effectively read-only, but that was it.
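The property being relied on is haproxy's "backup" server semantics: traffic goes to backup servers only when every primary server is down. A minimal model of that routing decision (pool names are illustrative, not our actual config):

```python
# Mimic haproxy's backup-server behavior for one routing decision:
# prefer the primary pool; fall back only on total primary outage.
def pick_backend(old_dc_up, new_dc_up):
    """Each argument is a list of per-server health flags."""
    if any(old_dc_up):
        return "old-dc"       # primary pool still has a live server
    if any(new_dc_up):
        return "new-dc"       # all primaries down: use the backup pool
    return None               # total outage: nothing to route to
```

Shutting down the old masters thus shifts traffic automatically, with no app reconfiguration at cutover time.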

We're planning to roll out haproxy with keepalived or ucarp to mediate between a lot more of our backend services.


> I'd say the way failover/HA is done at Heroku is straightforward.

> It took some time to figure out because it required breaking some orthodoxy

If it's hard to figure out, it doesn't sound too straightforward.


Only because the idea of manipulating clients of the database to do HA is somewhat heretical.


Just wanted to put in a vote for a writeup. Mind sharing a few more details here in the meantime?


I think you may be interested in the "h-index" and related metrics.





The h-index seems like a much weaker metric than PageRank. For instance, consider 10 professors who decide to game it: they create 100 fake papers and cite each other in all of them. Suddenly they would each have an h-index of 100!

PageRank is a recursive metric over the graph. In the case above, if those 10 professors have no other links pointing to them, they don't get ranked higher. PageRank was designed to eliminate scenarios like this.

It's amusing that Google Scholar doesn't use PageRank prominently instead of these weaker metrics.
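The recursion can be seen in a minimal power-iteration sketch over a toy citation graph; the 0.85 damping factor and iteration count are conventional textbook choices, not anything Google Scholar publishes:

```python
# Power-iteration PageRank over a toy graph. `links` maps each
# node to the nodes it cites; dangling nodes spread rank evenly.
def pagerank(links, n, d=0.85, iters=100):
    pr = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n          # teleportation mass
        for u in range(n):
            outs = links.get(u, [])
            if outs:
                share = d * pr[u] / len(outs)
                for v in outs:
                    nxt[v] += share        # rank flows along citations
            else:
                for v in range(n):         # dangling node
                    nxt[v] += d * pr[u] / n
        pr = nxt
    return pr
```

A node's rank depends on the rank of its citers, recursively, which is the property a raw citation count (and hence the h-index) lacks.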


Aren't there areas of scientific research where people doing genuinely valuable research don't get cited at all except within a closed community? I'd imagine e.g. that a handful of researchers all studying some obscure animal species would have a high h-index but a very low PageRank.


An experiment that did just that a few years ago:

"Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting" by Delgado Lopez-Cozar, et al.




"Weaker" metrics can be useful because you can understand them better. If I tell you an author's PageRank is 3.452, that means nothing. If I tell you somebody's h-index is 18, that means they published 18 papers that got at least 18 citations each. The h-index is useful for comparing people in the same field. There is also a similar metric for journals, called the impact factor.
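That definition translates directly into code: the h-index is the largest h such that h of the author's papers have at least h citations each.

```python
# Compute the h-index from a list of per-paper citation counts.
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:      # the i-th best paper still has >= i citations
            h = i
        else:
            break
    return h
```

For example, an author with papers cited [30, 20, 18, 18, 2] times has an h-index of 4: four papers with at least four citations each, but not five with at least five.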



You can even do something similar at the city level and see which city is the most influential in Science (or physics in this case): http://www.nature.com/srep/2013/130410/srep01640/full/srep01...


I wrote a quick script to generate a list similar to (but not as deep or well-annotated as) Prof Palsberg's.


It's a BFS over Google Scholar, seeded with Herbert Simon.


That's very humbling to hear (I've spent, cumulatively, a lot of time maintaining WAL-E), yet log shipping is a feature of many other database systems.

That said, I do write and maintain WAL-E for reasons that go beyond mere vanity or NIH, and not every database system has accompanying tools at even WAL-E's level of investment (kudos to Heroku), given the use case.

I'm glad you like it so much as to cite it as an advantage.


> I'm glad you like it so much as to cite it as an advantage.

It was a big motivation for me moving away from a document DB. As a "weekend dev", I don't have time to configure complex, redundant backup systems. But my customers will be paying to display data on my site, and therefore I need to ensure I'm backing it up as often as possible.

WAL-E to S3 was simple to configure, is cheap to run, and (most importantly!) easy to restore from. I rm -rf'ed my Postgres data folder after letting WAL-E run for two weeks on my dev box. I restored, restarted in recovery mode, and was back in action without any problems.

(thanks for all your hard work!)


> From a very quick glance at the small end the Amazon options are a little more than half the price (Ireland single AZ instance pricing and even less reserved) but multi-AZ options are probably very similar.

Did you add disk? People seldom do; I can attest it's a big part of the bill...

> I don't actually know whether Heroku can failover across an AZ failure.

"Followers" have preferred another AZ for quite some time -- almost since the beginning -- and Heroku Postgres HA is based on similar technology. So, yes.


No, I didn't take storage or IOPS into account, but you do get more memory, and flexibility if the Heroku options don't fit you.

So pricing-wise, both double if you go multi-AZ.

I didn't claim to have done a detailed price comparison, just a quick look at the small sizes, which are currently the ones relevant to me.


> No, I didn't take storage or IOPS into account, but you do get more memory, and flexibility if the Heroku options don't fit you.

No doubt about that. To date, Heroku's product model has preferred fewer, common choices over more flexibility. Heroku's staff sees fit to give in sometimes, but coarsely speaking, that's pretty much how it goes.


Thanks for the idea. A modified variant of it became https://github.com/heroku/toolbelt/pull/74


You are right: in the fast case of "NULL" the amount of physical work amounts to diddling catalogs around rather than copying potentially gigabytes of data.

So a lock is taken in any case, but I've never seen anyone get too bent out of shape over that momentary mutual exclusion to change the table's type (exception: transactional DDL where cheap and expensive steps are intermingled).
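The cost difference can be modeled crudely; this is an illustrative sketch of the distinction, not how Postgres accounts for it internally:

```python
# Illustrative model: adding a column with a NULL default is a
# catalog-only change, while a non-NULL default (in the Postgres
# versions of that era) forces every row to be rewritten.
def rows_rewritten(n_rows, new_column_default):
    """Rough cost of ALTER TABLE ... ADD COLUMN, in rows touched."""
    if new_column_default is None:
        return 0          # just diddling catalogs: no data copied
    return n_rows         # full table rewrite
```

Either way a brief exclusive lock is needed to change the table's definition; only the rewrite case holds it for long.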


I'll forward your remarks to the devcenter staff and the steward of the changes; I'm sure both parties will appreciate it a lot.


