More tips to enlarge your H-index

In an earlier post I started talking about bibliometrics, some of its flaws, and what I think has become one facet of an ideology, something Jerry Z. Muller called “The tyranny of metrics”. Here I’m going to add two ways to enlarge your H-index that are, at the same time, totally legitimate actions in the overall context of a line of research.

First of all, I don’t think I need to stress the importance of publicly available datasets in the most diverse areas of research. In particular, I think that (together with the growth of computational power, especially the diffusion of GPUs) the collective creation of datasets such as ImageNet was instrumental in the growth of deep learning techniques. It is interesting to note that ImageNet is:

  1. a collective effort that would not have been practically possible without the Internet;
  2. organized according to WordNet, a lexical resource, not exactly an ontology (or knowledge graph), but certainly something closer to symbolic AI or GOFAI, if you prefer.

Creating and sharing a dataset is therefore an undoubtedly good action, worthy of acknowledgement by the research community. Moreover, it often makes a lot of sense to describe the dataset in a paper that does not necessarily present any particular advancement in the state of the art, or any additional content besides the dataset per se, its structure and maybe the details of how the data was gathered. Sometimes gathering new data involves scientific advancement; sometimes it doesn’t and it’s just drudgery (and in fact sometimes it is paid, or underpaid, by means of tools like Amazon’s Mechanical Turk). The point is that anyone using the dataset is gently asked to cite the paper, and this is also fine with me from an ethical perspective. Once again, the problem is the interpretation of those citations, which end up inflating metrics such as the H-index of the authors of the paper describing the dataset.

Some people, especially some colleagues in Italy who strenuously defend bibliometrics as a way to evaluate research quality, even that of individuals, might say “well, let’s discount these citations, or let’s evaluate dataset papers in a different way”. (In a past research evaluation campaign in Italy, self-declared review papers were evaluated differently because of their obvious tendency to gather a higher number of citations, but it’s more complicated to discount citations from structured indicators like the H-index.) It’s not that simple, though, since sometimes the dataset is described in a paper showing its first usage for innovative research, improving our knowledge of some subject. Ultimately, papers should be read and evaluated according to their content, not the venue of publication or the simple number of citations.
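To make the point about discounting a bit more concrete, here is a minimal Python sketch. It uses the standard definition of the H-index (the largest h such that h papers have at least h citations each). Everything else is an assumption made for illustration: the `Paper` class, the `kind` label, the `h_index_excluding` helper and all the citation counts are made up, and no real evaluation campaign is being reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    citations: int
    kind: str  # hypothetical label: "research", "dataset" or "challenge"


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h


def h_index_excluding(papers, excluded_kinds):
    """Recompute the index after dropping every paper of the given kinds."""
    kept = [p.citations for p in papers if p.kind not in excluded_kinds]
    return h_index(kept)


# Made-up publication list, for illustration only.
papers = [
    Paper("Dataset description", 900, "dataset"),
    Paper("Challenge overview", 300, "challenge"),
    Paper("Method A", 12, "research"),
    Paper("Method B", 7, "research"),
    Paper("Method C", 4, "research"),
    Paper("Method D", 3, "research"),
    Paper("Method E", 1, "research"),
]

full = h_index([p.citations for p in papers])
discounted = h_index_excluding(papers, {"dataset", "challenge"})
print(full, discounted)  # 4 3
```

The index is computed from a ranking of papers, so it cannot be corrected by subtracting some citations from a total: someone has to label every paper and recompute the ranking. In this sketch a dataset paper that also reports innovative research would be dropped entirely, which is exactly the case where a single label is not enough.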

Another kind of paper that has appeared in recent years, and that raises additional issues about the indiscriminate usage of citation counts as an objective indicator of research quality, is the challenge paper. Challenges are totally positive research activities: they are a very reasonable way to promote reproducibility of results, as well as a way to involve members of a research community in a generally healthy form of competition. Moreover, organizing a challenge is a complicated matter. Very often technical issues must be faced and solved to create shared resources that give participants a common basic environment, and ways to produce results that comply with the same rules and provide the same measurements. It’s a hell of a lot of work, and it deserves to be acknowledged, not unlike datasets. Again, the problem is the indiscriminate usage of the citations to these papers as research quality indicators.

I want to stress that, so far, I have not even mentioned malicious tricks devised to distort bibliometric indicators, only completely legitimate and even virtuous actions by researchers. Of course, one could think that sometimes the choice of taking these actions is influenced by the foreseen reward, but that would be yet another application of a saying often used by an old Italian politician: “thinking ill of others is a sin, but you often guess right” (“a pensare male degli altri si fa peccato ma spesso ci si indovina” - Giulio Andreotti).