
Not exactly. The relevant lines in the BigQuery query are:

   [by] author, score, rank() over (partition by [by] order by score desc) item_rank
   [...]
   where score >= item_rank
The query uses a window function. What's happening is that it counts the number of submissions whose score is greater than or equal to the submission's rank among all submissions by that user, ordered by score descending.

For example, a user has four submissions with scores of 1, 2, 10, 3. Their h-index would be 2. (10 >= 1 true, 3 >= 2 true, 2 >= 3 false, 1 >= 4 false).

This is an odd way to calculate h-index (as a measure of quality); however, the implementation appears to pass the test cases on Wikipedia.
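
For anyone who wants to poke at the rank comparison outside BigQuery, here is a rough Python sketch of the same idea. The function name `h_index_by_rank` is made up for illustration, and it assumes the elided part of the query simply counts the rows that survive the `where` clause. It uses the 1-based position in the sorted list as the rank, which matches `rank()` only when a user has no tied scores; SQL `rank()` gives tied scores the same rank, so the two can disagree on ties.

```python
def h_index_by_rank(scores):
    """Count scores that are >= their 1-based position when sorted descending."""
    ordered = sorted(scores, reverse=True)
    # The position from enumerate stands in for item_rank. Tied scores get
    # distinct positions here, unlike SQL rank().
    return sum(
        1
        for position, score in enumerate(ordered, start=1)
        if score >= position
    )


# The two examples from this thread.
print(h_index_by_rank([1, 2, 10, 3]))
print(h_index_by_rank([25, 8, 5, 3, 3]))
```

Both examples come out the same as the hand calculation above (2) and the Wikipedia case mentioned in this thread (3).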



Isn't that what I described? That user has 2 submissions with at least 2 points each. I also think that's a fairly SQL-esque way to calculate h-index. I can't think of how else I would do it.


Your explanation (user has n submissions with a score of at least n) fails to pass one of the Wikipedia examples. ([25, 8, 5, 3, 3] should result in an h-index of 3)


There's some confusion here. I do believe my explanation means [25, 8, 5, 3, 3] has an h-index of 3.


There are 5 scores with at least 3 points. :P


You seem to misunderstand h-index. :P

It is the highest number h for which at least h articles have >= h citations.
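
To make the "highest number h" part concrete, here is a small Python sketch that applies that definition directly instead of going through ranks. The name `h_index_by_definition` is hypothetical, and the loop is deliberately naive (it recounts for every candidate h) so that it reads like the definition.

```python
def h_index_by_definition(scores):
    """Return the largest h such that at least h scores are >= h."""
    best = 0
    # h can never exceed the number of submissions.
    for h in range(1, len(scores) + 1):
        at_least_h = sum(1 for score in scores if score >= h)
        if at_least_h >= h:
            best = h
    return best


print(h_index_by_definition([25, 8, 5, 3, 3]))
```

For [25, 8, 5, 3, 3] the property holds for h = 1, 2 and 3 (all five scores are at least 3), but fails for h = 4 because only three scores are at least 4, so the result is 3.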


Yes, if there are 5 scores with at least 3 points, then there are 3 scores with at least 3 points. Of course he left out the fact that the h-index is the largest number for which the property he mentioned holds true, but that was obvious.


> So, ColinWright has 143 submissions with a score of 143 or more.

I think the way OP wrote it was slightly unclear. This is clearer to me:

So, ColinWright has 143 submissions with a score of at least 143 each.



