

Google Scholar Metrics for Publications - gsivil
http://googlescholar.blogspot.com/2012/04/google-scholar-metrics-for-publications.html

======
noelwelsh
It's interesting to note that the 5th highest ranked publication is arXiv. For
those who aren't familiar with it, arxiv.org is an open-access repository of
academic papers, mostly in quantitative science. In my field (computer
science) it is standard practice to deposit a copy of one's papers in arxiv
before submitting them for publication, and arxiv is the place to find the
latest research.

There is currently a lot of hand-wringing in academia about open access
publications. Everyone wants it, and it is trivial to switch a field to it
(machine learning has done so, for the most part), but it requires the leaders
in the field to lead the change and they are normally too invested in the
status quo. What the high ranking of arxiv suggests to me is that while people
pay lip service to the idea that the (mostly closed) publications are
important and maintain the definitive version of a paper, the reality is that
no one gives a damn and just goes to arxiv when they want to read something.

~~~
yungchin
It's still surprising that arxiv is that high up. You'd think most people
would cite the definitive publications, not the preprints at arxiv?

Anyway, given that in some fields _everything_ that's written goes to arxiv,
arxiv will [by definition][1] have a very high h-index. The thing is, the
h-index was conceived to compare individuals, not journals.

[1]: <http://arxiv.org/abs/physics/0508025>
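
For reference, the h-index is just the largest h such that at least h items
have at least h citations each. A minimal Python sketch, with made-up citation
counts:

    def h_index(citations):
        # Rank items by citations, descending; h is the last rank at which
        # the item still has at least that many citations.
        cites = sorted(citations, reverse=True)
        h = 0
        for rank, c in enumerate(cites, start=1):
            if c >= rank:
                h = rank
            else:
                break
        return h

    # Five papers cited 10, 8, 5, 4 and 3 times -> h-index of 4
    print(h_index([10, 8, 5, 4, 3]))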

~~~
streptomycin
Google Scholar also indexes papers that are only published on arxiv, which
likely often include citations to other things on arxiv.

------
lyso
So it looks like with this method, if a journal publishes more papers, this
will give it more of a chance to boost its h5-index? This probably accounts
for the high ranking of arXiv, and for PLoS One beating out PLoS Biol.

One problem with impact factors is the way that a few articles can account for
the majority of citations. For instance, a bioinformatics method that is
widely used could attract thousands of citations, boosting the impact factor
of the journal by a few points. The h5-index doesn't solve this, as it
expressly focuses on the top n articles and ignores the impact of the
remainder. For instance, PLoS One's score of 100 means that its top 100
articles each got at least 100 citations - it says nothing about the
distribution of the rest.
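
To make that concrete (the citation counts here are invented; only the
h5 = 100 figure is real), two hypothetical journals can share the same
h5-index with completely different tails:

    def h_index(citations):
        # Largest rank h such that the h-th most-cited item has >= h citations.
        cites = sorted(citations, reverse=True)
        return max((rank for rank, c in enumerate(cites, 1) if c >= rank),
                   default=0)

    journal_a = [100] * 100                 # 100 articles, 100 citations each
    journal_b = [100] * 100 + [1] * 10000   # same, plus 10,000 barely-cited articles
    print(h_index(journal_a), h_index(journal_b))  # 100 100 -- the tail is invisible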

~~~
mjn
That does seem to be the case. I'd be interested in some kind of median-
citedness measure as well, to distinguish a venue that publishes 100 high-
impact articles a year, from a venue that publishes 10,000 articles a year, of
which 100 are high-impact.

In particular, it's not robust to one factor often mentioned in the
bibliometrics literature, trivial changes in agglomeration size. Say a set of
200 articles are published by either: 1) a single journal; or 2) two journals,
which publish 100 of them each. In each of the hypotheticals, individual
articles have the same citation counts. Under this metric, #1 gets a higher
ranking, meaning that you can raise rankings without increasing paper quality
by just agglomerating journals. (You can even run the two former journals
separately inside the new journal if you want, with a two-track review
structure, as long as there's only one title on the front page.)
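
A quick sketch of that scenario with invented citation counts (every article
keeps its individual count; only the grouping changes):

    def h_index(citations):
        # Largest rank h such that the h-th most-cited item has >= h citations.
        cites = sorted(citations, reverse=True)
        return max((rank for rank, c in enumerate(cites, 1) if c >= rank),
                   default=0)

    journal_1 = list(range(1, 101))   # 100 articles cited 1..100 times
    journal_2 = list(range(1, 101))   # another 100 articles, same counts

    print(h_index(journal_1))              # 50 for each journal on its own
    print(h_index(journal_1 + journal_2))  # 67 for the merged journal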

------
lawlesst
It's nice to see that Google is adding features to Scholar. There's concern in
the library community that it will go away since it's not a revenue-producing
service.

Incidentally, Microsoft Academic Search is pretty impressive so far. They've
added many features. They also have an API that is pretty easy to use, which
Scholar doesn't.

<http://academic.research.microsoft.com/>

~~~
synparb
I just looked at Microsoft's offering, and while it has a lot of nice features,
it seems like the citation counts are wildly wrong when compared to the
numbers produced by Web of Science and Google Scholar. Perhaps they have not
completed their indexing yet. A few other concerns I have besides the
inaccuracy: (1) the interface is significantly more cluttered and confusing,
and (2) by deciding to auto-generate profiles rather than wait for authors to
create their own, they have produced a glut of profiles that are likewise
incorrect in the papers attributed to them. The chances that a significant
number of people will go in and curate their profiles seem small, so instead
of having a limited but accurate collection of profiles, you end up with a
majority of incorrect ones.

------
molbioguy
There are definitely things that skew the index that might not necessarily
reflect the quality of the journal. For example, the 20th ranked journal by
h5-index is Nucleic Acids Research (NAR). However, when you look at the
articles that make up NAR's h5-index, you see that they are dominated by
articles announcing or simply cataloging an important database. These get
cited very extensively, because anytime you use a database you need to cite
it, but they aren't what I would call high-impact research articles. NAR just
happens to be
a journal that has a special annual Database issue where bioinformaticists can
drop an article describing their useful database.

EDIT: It would be fair to say that since a database is so widely cited it is
important. So maybe the index is more robust than I originally considered. But
something still seems skewed here.

------
danialtz
Rob J Hyndman has a very nice review of Google Scholar Metrics [1]. Here is
his closing quote:

    
    
      In summary, the h5-index is simple to understand, hard to
      manipulate, and provides a reasonable if crude measure of
      the respect accorded to a journal by scholars within its
      field.

      While journal metrics are no guarantee of the quality of a
      journal, if they are going to be used we should use the
      best available, and Google’s h5-index is a big improvement
      on the ISI impact factor.

[1] <http://robjhyndman.com/researchtips/google-scholar-metrics/>

------
throwaway1979
The only CS conference/journal I saw on the list was "IEEE Conference on
Computer Vision and Pattern Recognition, CVPR". That's not the top CS venue I
know of.

~~~
streptomycin
That conference probably has a lot of biology stuff going on and, if you
noticed the list of journals, they're almost all biology related. CS is a
relatively small field, in comparison.

~~~
fatjokes
Nope. CVPR doesn't have too much biology in it (it does have a little). It
tries to do a bit of cogsci, but it's mostly ML applied to vision. It
generally doesn't cover applications in the medical field either (there are
bigger conferences for medical imaging).

------
aaronsw
Is there any reason to believe this h-index method of ranking is a good idea?
Why not use PageRank?
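
For concreteness, a rough sketch of what a PageRank-style ranking over a tiny,
made-up journal citation graph could look like (purely illustrative; the graph
and its edges are invented):

    def pagerank(graph, damping=0.85, iterations=50):
        # graph: journal -> list of journals it cites. Every node here has
        # out-links, so dangling-node handling is omitted in this toy version.
        nodes = list(graph)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iterations):
            new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
            for src, targets in graph.items():
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            rank = new_rank
        return rank

    toy_graph = {
        "Nature": ["Cell", "PLoS One"],
        "Cell": ["Nature"],
        "PLoS One": ["Nature", "Cell"],
    }
    print(pagerank(toy_graph))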

------
fatjokes
Interesting... how is CVPR the only computer science-related publication in
the top 100?

