Title: SCImago Country Rank Database
Producer: SCImago Research Group
There is a great variety of statistical databases that provide quantitative information about the nations of the world. They are published by international agencies with worldwide coverage, such as the United Nations and its many agencies (UNESCO, UNIDO, UNDP, WHO), the World Bank, the International Monetary Fund and the OECD, and by agencies with a regional focus, such as Eurostat. Some of them are freely accessible, some are partially open access, and others are available only to subscribers on the Web, on CD-ROM or in print format. They focus on the financial, economic, demographic, educational and health indicators of countries and regions. Some include indicators about the input to and output from the science and technology research and development activities of nations from primarily demographic, educational and economic perspectives, such as the percentage of the population that received tertiary education.
Only a very few have some information about the most quantifiable scholarly output indicators of countries: their scientific publications in well-known international and national journals and conference proceedings. The databases that have fairly comprehensive time series on scientific publications by nations and regions are few and far between, and they rely – directly or indirectly – on the subscription-based citation indexes and the National Science Indicators database of Thomson-Reuters.
The most recent edition of the latter covers 27 years of the quintessential data from 1981 to 2007 about the scholarly publishing productivity and impact of scientists and researchers of 180 countries, based on the number of their papers published and citations received, along with some other derivative indicators. This subscription-based database is by far the most comprehensive, going back to 1981. That is one of the reasons that most international and regional agencies that deal with the scientific productivity, visibility and impact of nations use data from the National Science Indicators database.
The National Science Foundation (NSF) is the most prominent direct user of Thomson-Reuters data. The open access Science and Engineering Indicators report of NSF is created jointly with ipIQ, Inc. The report has been published every second year, presenting – among many other science and engineering-related statistics – scholarly publishing indicators with superb interpretations and graphic representations for the scientifically most important countries in the world. There are options to download (and further process) the data in Excel and PDF formats for individual tables, graphs and narratives that provide compact but lucid documentation for the information presented.
Some of the largest agencies use these data when they present information related to the scientific performance of countries. The World Development Indicators (WDI) database of the World Bank is one of them, although the sources are not identified as they should be (and as they are, systematically, in the NSF statistics). Many of the tables in the Science, Technology and Industry Scoreboard of OECD are from the NSF report, although not necessarily from the most current edition. It was recently announced that OECD is to license data from Scopus to publish more indicators. OECD must realize that a very significant part of the Scopus records (about 36%) have no information about the country affiliation of authors, and citedness data are available only from 1996 onward (a much shorter time span than typically used by OECD for other indicators in its database). This will be discussed below, as it also affects the Scimago database.
It should be mentioned that the Scimago Group unexpectedly updated the database in late October, 2008. Those who downloaded data from the first release of Scimago Country Rank 1996-2007 in July, 2008 must have been taken as much aback as I was by seeing their collected and analyzed data changed. This unannounced move should at least have been accompanied by keeping an archive version of the first release of 2008. Luckily, I did save the first release of 2008 as an Excel file (which is available by the click of a button – very generous of the Scimago Group), but most of my links and screenshots of the highly appealing interface are from the November release and thus may show some discrepancies. This insensitive move has not changed my high opinion of the database, but it should be a warning for those who assume that the database is updated once a year – as it was planned. I provide links to the site whenever possible to lure you to this gold-mine of a database, but in case of another update before the release of the 2009 edition, the numbers quoted in this review may change.
The current release of the Scimago database covers 12 years of data from 1996 to 2007, based on records extracted from the Scopus database by agreement with Elsevier, the producer of Scopus. The first edition of Scimago was published in 2007, covering the time period from 1996 to 2006. Although the time span is shorter than the usual time series of science and technology indicators in other databases, which go back 25-30 years for a wider perspective, it is still a huge data set. As can be seen from the broadest worldwide report summary page, records for nearly 16.5 million documents and 122 million cited references are used to paint a very interesting picture about the scholarly publication activity of 233 countries and territories. It is to be noted that more than 7 million papers (43%) out of the nearly 16.5 million have not been cited (yet). This is very worthy data to know, as the ratio of citedness in and by itself is an easy-to-understand and telling indicator. It is another issue that this is usually examined with a 3-year delay, to provide a level playing field for papers published in the most recent 3 years before publishing the statistics. When applying this rule, the uncitedness rate goes down to 33% for the 1996-2004 publications.
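The 3-year delay rule can be sketched in a few lines of code; the function below and all the figures in it are merely illustrative (invented counts, not Scimago's actual data):

```python
# Sketch of the "level playing field" rule: when computing the share of
# never-cited papers, publication years less than `delay` years older than
# the statistics year are excluded, since recent papers have had little
# time to be cited. All counts below are invented for illustration.

def uncitedness_rate(papers_by_year, uncited_by_year, stats_year, delay=3):
    """Fraction of never-cited papers, restricted to publication years
    at least `delay` years before `stats_year`."""
    cutoff = stats_year - delay
    total = sum(n for y, n in papers_by_year.items() if y <= cutoff)
    uncited = sum(n for y, n in uncited_by_year.items() if y <= cutoff)
    return uncited / total

papers = {y: 100 for y in range(1996, 2008)}   # 100 papers per year, invented
# Older years settle at 25 uncited papers; the newest years are mostly uncited.
uncited = {y: (25 if y <= 2004 else 85) for y in range(1996, 2008)}

print(f"{uncitedness_rate(papers, uncited, 2008, delay=0):.1%}")  # all years
print(f"{uncitedness_rate(papers, uncited, 2008, delay=3):.1%}")  # 1996-2005 only
```

With no delay the barely-cited recent years inflate the rate; with the 3-year embargo the rate drops, which is the same direction as the 43% versus 33% figures quoted above.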
The number of countries and territories is almost 30% higher than in the National Science Indicators database of Thomson-Reuters, but this breadth of coverage is less impressive when one realizes that authors from more than 50 countries/territories, such as the Cayman Islands, Djibouti, Turkmenistan and Suriname, have published fewer than 10 scholarly papers per year during 1996-2007, and some others (Niue, Nauru, South Georgia, Saint Helena) only 1 in 12 years, which is practically irrelevant even from the perspective of those countries/territories and much more so regionally.
Some of these are not independent countries in the eyes of international law, but self-governing territories or dependencies of other countries, such as the Cook Islands, Guam or Mayotte. For practical purposes, I will use the term country liberally, without any legal implications or intention to incite secession movements.
The coverage of Scimago goes back only to 1996, i.e. 15 years less than the National Science Indicators of Thomson-Reuters. Although publication metadata are available in Scopus from 1823, cited references have been added only to records created for primary documents since 1996 (except for records about 7,000 documents published before 1996). Publishing statistics could have been created in Scimago with as much retrospectivity as Scopus has master records for, but citation statistics could be produced only from 1996 for the above reason. Even the publishing statistics should be taken with a grain (or rather a pound) of salt, as in the entire Scopus database nearly 13 million records have no information about the country affiliation of the author(s).
For evaluating and comparing scholarly publication and impact at the country level, 12 years is too short a timeframe (and for papers published in 2006 and 2007 the period is particularly short, considering that in many disciplines there are no citations, or only very few, until after the third year of publishing a paper).
The source data are collected from nearly 16,000 journals (although not all of them are covered from 1996 onward). Scopus has kept widening its source base, especially for the past 4-5 years, but in the late 1990s the source base was closer to 10,000 serial publications. On the other hand, in 2008 alone Scopus has filled many of the gaps in its source coverage. (It is quite telling that the first edition of Scimago in July, 2008 included records for papers "only" from 12,751 journals, while the updated version in late October, 2008 was based on 15,922 journals.)
Much more importantly, in the Scopus database – as mentioned earlier – there are no country affiliation data for close to 13 million records (37.7%). This has a significant implication for Scimago, even if for the 1996-2007 time period the ratio is slightly lower. For this period the number of records without any country affiliation field is still up to 3.5 million, i.e. about 21% of the 16.6 million master records for 1996-2007 publications.
As the common practice in productivity statistics is to credit the affiliation country of each author with 1 point, the credit points actually lost by a country when a paper has more than one author are obviously higher. The ratio of multi-authored papers keeps increasing, so this is an important issue from the perspective of drawing the scientific profiles of countries. Thomson-Reuters also has incompleteness in this regard, but at a smaller rate: only 13% of its 16 million master records for 1996-2007 publications have no country affiliation data. This is a set of nearly 2 million records, but its impact on assessing the productivity and impact of countries in terms of scientific publications is much lower than in Scopus and hence in Scimago.
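To make the full-count convention concrete, here is a minimal sketch (the papers and their country sets are hypothetical): each country appearing among a paper's affiliations receives one full point, and a record with no country affiliation contributes nothing to any country.

```python
from collections import Counter

def full_count(papers):
    """papers: one set of affiliation countries per paper;
    an empty set models a record without country affiliation data."""
    credits = Counter()
    for countries in papers:
        credits.update(countries)  # one point per country per paper
    return credits

# Hypothetical records, not actual Scopus data:
papers = [
    {"Chile", "USA"},               # internationally co-authored paper
    {"Chile"},
    {"Argentina", "Chile", "USA"},
    set(),                          # missing affiliation: all credit is lost
]
print(dict(full_count(papers)))
```

The fourth record illustrates why the millions of affiliation-less Scopus records depress every country's counts.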
Bearing in mind the above-mentioned shortcomings of the records extracted from Scopus, the lists of countries by a number of variables are a very useful feature of Scimago, even for those who have a subscription to the Scopus database. The designers thought of all, or almost all, the angles from which researchers may be interested in looking at the profiles of countries and regions.
For each country, the number of documents, citable documents, citations received, self-citations received, the average citations per document and the h-index of the country are listed. The most comprehensive list is produced when no filtering is used; this yields a list of 233 countries and territories.
Optionally, the set can be limited by 27 broad subject areas, from Arts and Humanities to Veterinary Sciences, and/or by 298 subcategories. Even though there are some overlaps and arguably chosen subcategories (such as Decision Sciences – miscellaneous, when there is a Decision Sciences broad subject area), this schema provides a good way to focus the topic of the search. Users should be allowed to choose two or more subject areas or subject categories at once, such as both Library and Information Sciences and Information Systems, as it is reasonable to expect to learn the bibliometric and scientometric data of countries for these two categories combined. True, the subsets can be downloaded into two spreadsheets and the values then combined, but this may yield double counting, as opposed to a Boolean OR operation between two or more subject areas or subject categories.
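The difference between adding two downloaded lists and a true Boolean OR can be shown with a toy example (the document IDs are made up): a paper indexed under both categories is counted twice by addition but only once by set union.

```python
# Hypothetical document IDs indexed under each subject category:
lis = {"d1", "d2", "d3"}         # Library and Information Sciences
info_sys = {"d3", "d4"}          # Information Systems

print(len(lis) + len(info_sys))  # adding the two lists: 5 (d3 double-counted)
print(len(lis | info_sys))       # Boolean OR (set union): 4 distinct documents
```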
Lists can also be produced by regions, and regions can be combined with broad subject areas and subject categories – but always only one at a time for each component. There are 10 pre-defined regions, but regions cannot be separated into smaller units (such as Latin America into Central America and South America) or combined into larger ones, such as the Asiatic and Pacific ones – though using a separate function a maximum of 4 countries and/or regions can be compared, if not aggregated.
The lists can be sorted by any of the data elements – except for country names. This would be a natural function for finding out, for example, how Mongolia is doing in veterinary sciences among the countries of the world. To find that it is ranked 132nd requires quite some scrolling, with the lingering feeling that you may have missed the entry when eyeballing the list and are already past the 75th, then the 100th country.
The country profile pages provide more details and excellent visualization for all the countries and territories, plus the whole world as a unit. This universal, worldwide census is important because countries' bibliometric and scientometric performance ranks are often expressed relative to the world average. Surprisingly, country level details can't be looked up directly by the name of the country, but only through regions. This is good for improving the disappointingly low rate of geographic literacy of students and young adults in otherwise leading countries, but sometimes it may be considered user-hostile by the very same users who feel they are being tested. Even for geographically fairly literate users, it may be a guessing game to find Egypt. Is it under the geographic category of Northern Africa or the Middle East? It is under the latter, more for political than geographic reasons. The same is true for Cyprus, which is much more Eastern European geographically than, say, Hungary; still, the former is listed under Western Europe and the latter under Eastern Europe. Interestingly, there is no Northern or Southern Europe in Scimago as a region. The region labeled Latin America is not perfect either. It should be called Latin America and the Caribbean to justify the inclusion of the Virgin Islands (both the British and the U.S.), or Bermuda and the Bahamas – which are not part of Latin America by any stretch of the imagination. Hovering over the world map with the mouse helps in locating a country, as its name flashes up.
The extra information available under the country profiles but not in the worldwide or regional country lists includes the average number of self-citations per document, the number of cited versus uncited documents, the percentage of collaboration (through international co-authorship), and the ratio of the country's contribution to the publications in the region and in the world, as shown for illustration here for Japan. The table of the distribution of publications by the 21 major subject areas provides a quick sense of the disciplinary strengths and weaknesses of the country.
The graphics for the number of citations versus self-citations and for the relative production of the country in the context of the region and the world also provide compact and informative visualization. This set of excellent indicators begs for the addition of the relative citedness of the country within the region and the world. After all, the combination of productivity and impact (expressed by the number of publications and citations, respectively) determines the rank of a country in scientific publishing, visibility and influence. Of course, the relatively new and widely popular single indicator, the h-index, fills this gap to a large extent, and Scimago reports this indicator as well – an excellent idea.
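For readers unfamiliar with it, the h-index is the largest h such that h of the papers have each received at least h citations. A minimal sketch, with invented citation counts:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; a paper at rank r contributes
    # to the h-index as long as it has at least r citations.
    for rank, cited in enumerate(sorted(citations, reverse=True), start=1):
        if cited >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3, 0, 0]))  # -> 4
```

The single number conveniently combines productivity (number of papers) and impact (citations), which is why it fills the gap noted above.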
The rich and easily re-sortable set of indicator lists is an excellent tool for getting information about the rank of countries and regions by several criteria. Of course, the list for a particular region, such as Latin America, can be quite lengthy, and the countries of particular interest to the user may be scattered far apart, such as Argentina and Uruguay for someone interested in the scholarly productivity and impact of countries in South America. The country/region comparison module is the solution for this – albeit with some limitations. Four regions/countries may be picked, and in this module there is also direct access to the list of 233 countries through a browsable list. The data set can be limited to one of the 27 broad subject areas and one of the 298 specific subject categories.
Excellent graphs show a variety of indicators for the selected countries. The indicators are listed on top of the graphs, and the appropriate indicator can be selected by the click of a button. If needed, the exact values of the selected variables can be displayed, as shown in the comparison of the number of documents published by authors in Argentina, Brazil, Chile and Venezuela. The indicators include, for each year, the total number of documents, citable documents, citations received with and without self-citations, the number of citations per document with and without self-citations, the h-index, the percentage of documents cited and the extent of international collaboration, i.e. the ratio of papers authored by researchers from different countries.
It is as good as it gets. At a glance one can see that in spite of the absolute dominance of Brazil in terms of papers published and citations received, papers (co-)authored by researchers in Chile and Argentina have a slightly higher rate of citedness. Clicking on the collaboration indicator button sheds some additional light on this, showing that Chile has the highest level of international collaboration (mostly because of its much coveted and overbooked telescope facilities). It adds to this advantage that citations per document are much higher in astronomy than in, say, animal husbandry research, and this explains why papers partially originating from Chile have the highest citedness rate of the four countries. It is to be remembered that if there are authors from different countries, each country will get one credit in the full-count system.
There is not much more that could be asked from the software. It does an almost perfect job in every regard, and only some minor enhancements could be added. As mentioned before, it would be useful to be able to select more than one broad subject area and/or subject category, to provide aggregate data for closely related research topics. The country names should be listed in all the modules as they are in the Country Comparison module. I would go even further by offering a search option by any word in the country name.
Finding a country is usually not a problem, except for those that add good-sounding qualifiers to their names (but not to their practice in running a state) to make them more appealing. The Republic of Moldova is as much at the bottom of any European country list as Moldova used to be, and the enhanced name makes it more difficult to find in the long list of Eastern European countries. The pride of being a republic may be politically important, but if France can live without rubbing it in by using the qualifier, then Moldova could have done so, too, focusing more on the substance of being a republic. Macedonia had its share of problems with its name when it became independent, and its official name, the former Yugoslav Republic of Macedonia, is certainly not a good choice, because it must be abbreviated, and this can be done in a dozen different formats – a sure way to lose points in every country-related ranking system, because simple searching misses certain variants.
It is interesting to see that the more qualifiers are added to the name of a country, the less it matches up to the features implied by the enhanced name. The United Republic of Tanzania does not seem to be much united. South Korea is indeed a republic, so the qualifier is justified to distinguish it from North Korea, whose official name is the Democratic People's Republic of Korea. One may wonder how democratic that country is, how much it is a republic of the people, and whether it is really a republic in the original sense of the term, i.e. res publica (roughly meaning that state matters are public matters). The same is true for the possibly most corrupt and most aggressive country in the world, which pompously calls itself the Democratic Republic of the Congo. Searching for the real key elements of these countries' names would make it easier to find them. True, a search for Congo would bring up both the normal Congo and the embellished one, but a pop-up result list of the matching names with check-boxes would make the selection of the relevant country easier. This kind of search-and-pick function would be needed anyhow, because a search for Guinea would also include Guinea-Bissau, Papua New Guinea and Equatorial Guinea, making Guinea look much more productive and cited in the hands of a somewhat careless searcher than it really is.
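The need for a pick-list is easy to see with a plain substring search over the country names (the list below is abridged to the examples mentioned):

```python
# Abridged country list, limited to the examples discussed in the review.
countries = ["Guinea", "Guinea-Bissau", "Papua New Guinea",
             "Equatorial Guinea", "Congo", "Democratic Republic of the Congo"]

def search(term, names):
    """Case-insensitive substring match over country names."""
    return [n for n in names if term.lower() in n.lower()]

print(search("guinea", countries))  # four matches; only one is Guinea proper
print(search("congo", countries))   # both Congos match
```

Without a result list with check-boxes, a careless searcher would aggregate all four "Guinea" matches into one inflated figure.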
The color graphs are excellent and very well done, but printing them on a monochrome printer makes them indistinguishable. It would be useful to offer a black-and-white alternative with appropriately different symbols. Using black and white patterns for the column charts when printing the distribution percentage of publications of a country by the major subject areas would also help in distinguishing the components which are not easy to tell apart even in color.
The tables are also very good – for European readers, who are accustomed to the notation system that uses the period as the thousands separator, i.e. 3.652.547 meaning three million six hundred fifty-two thousand five hundred forty-seven for the number of citable documents by U.S. authors/co-authors, and the comma as the decimal separator, i.e. 14,51 meaning the average of fourteen and a half citations received by these documents. This is strange for some of the U.S. readership even when just glancing at the content-rich and highly informative tables (and requires some extra steps with the downloaded tables). This excellent system was developed by excellent Spanish researchers, and most of the most productive and most cited countries do use the European notation. I am not insensitive; I grew up using that system, and it took me some time to feel comfortable with the U.S. custom of using commas and decimal points with numbers. I merely suggest offering an alternative to let users choose their preferred notation system.
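The extra step the downloaded tables require can be automated with a tiny helper; this is just a sketch of the two conventions discussed above (no locale support assumed, numbers taken as strings):

```python
# Convert a number string from the European convention (period as thousands
# separator, comma as decimal separator) to the U.S. convention by swapping
# the two symbols via a placeholder character.
def european_to_us(s):
    return s.replace(".", "\x00").replace(",", ".").replace("\x00", ",")

print(european_to_us("3.652.547"))  # -> 3,652,547
print(european_to_us("14,51"))      # -> 14.51
```

A spreadsheet's locale settings can achieve the same, but a converter like this is handy when post-processing many downloaded tables at once.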
The most useful enhancement for me would be an option to create the user's own country group with a preferred name – Ibero-America, for example, to include the two European countries and the 10 South American countries where Spanish and Portuguese are the official languages, which happen to be the subjects of my scientometric comparisons and evaluations. Similarly, the user may wish to have predefined categories for the OECD countries, the European Union countries, the ASEAN countries, etc. It would be icing on the cake to have an expandable list of countries to include more than four countries on the template in the very fine Country Comparison module.
Scimago is a top-notch free system with extensive and very important bibliometric statistics about nearly 16,000 serial publications (which I did not review here) and more than 230 countries and territories. From the perspective of the latter, the high percentage of records lacking country affiliation even in the 1996-2007 subset is a significant hindrance that should always be prominently mentioned when reporting bibliometric and scientometric indicators for countries and regions. Still, it is an excellent model that might justify the expense of completing the records based on the names of the authors' affiliation organizations, which are absent from "only" 2.1 million records (12.7%) in the 1996-2007 subset. Most of the rest could be identified quickly and on a large scale, as the name of the organization practically defines the country for millions of records – Sorbonne, Harvard, University of Illinois and thousands of other institutions with very high publication productivity, currently without any country affiliation data. With this warning, the free Scimago Country Database (which is far more than a Country Rank list) is an excellent ready-reference source.