Title: Web of Science
Publisher: Thomson Reuters
Cost: to be negotiated
Web of Science (WoS) remains by far the largest citation database. By my estimate, nearly 73% (33 million) of its 42.1 million unique records are enhanced by more than 720 million cited references. This is the most important measure when comparing WoS with its only competitor, Scopus, which I reviewed last month. I pointed out that, with its appealing and smart software, Scopus offers a similarly large collection of 38.1 million bibliographic records from 1850 onward, but only 15 million (39.4%) of them are enhanced by about 330 million cited references. The reason for the huge difference in the total number of references is that Scopus has added cited references to bibliographic records only from 1996 onward, whereas WoS has consistently included the cited references with the bibliographic records since 1900.
Scopus, which comes in a single edition, has a much broader source base than WoS, but the depth of coverage of the sources in Scopus is lower than in WoS for many of the common (overlapping) primary sources, especially because of gaps in the coverage by Scopus. WoS is not perfect, but it is much more consistent in its coverage of the sources, except for the ones that are dropped because they no longer meet the selection criteria. WoS records also have a higher level of completeness of standard bibliographic metadata (document type and language, authors' country and institutional affiliation, subject area designations).
WoS has fewer and less flexible search and output features than Scopus, except for its superior query-set management options and for its instantly created, informative at-a-glance citation report, with compact graphs and a table of key bibliometric and scientometric indicators.
There are only two databases in the multidisciplinary league of cited-reference-enhanced, super-mega databases with around 40 million records: WoS and Scopus. I published detailed reviews of WoS in early January 2007 and in 2004, and in-depth reviews of Scopus in 2009, in 2007, in 2006, and on its debut in 2004.
Theoretically, Google Scholar should also qualify for the league of super-mega databases enhanced with cited references, and it is an excellent source for resource discovery (and often for free versions of articles). However, its software is woefully inadequate for bibliometric/scientometric/informetric purposes, as I have discussed and demonstrated repeatedly from the beginning, to warn users against relying on Google Scholar hit counts and citedness counts for evaluating research by the numbers.
There are many other substantial reviews of these three databases that are worth exploring, and for that purpose Google Scholar is the best resource because it is free and it searches millions of full-text articles.
An increasing number of database and journal publishers, digital facilitators, associations, and societies are engaged in enhancing their databases with cited references and with appropriate software for citation-based searching and, in some cases, for citation-based research evaluation. I discussed them briefly in the context section of my Scopus review last month in this column.
I ran my tests for this review in early May and at the end of June; where relevant, the date is noted. Both Scopus and WoS are updated intensively and extensively on a daily basis, so the same search run a day later may yield a higher number of hits.
WoS is a genuinely multidisciplinary database, covering all the disciplines substantially, though to different extents and for different time spans. Papers assigned to the natural and applied sciences are covered from 1900, those assigned to the social sciences from 1956, and those assigned to the arts and humanities from 1975. The version or edition of WoS that one library licenses can be very different from the one licensed by another, because the licensing institutions can choose which components they want to license and from what year. For example, my university licensed a version that has all three components, each licensed from 1980. Three decades is quite a sufficient time frame for most users' purposes, and is very close to my ideal 35-year time frame when it comes to evaluating the lifetime productivity and impact of senior research and teaching faculty. Obviously, the bibliographic and bibliometric data will be very different if the database mix chosen is very different, e.g. when I use the WoS version at the university versus the full superset that I have temporary access to for researching a topic that requires a longer time span, such as the h-index of senior professors in information and computer science.
Actually, WoS is a superset of three databases: one for Science (34,567,213 records), another for Social Sciences (5,955,239 records), and a third for Arts & Humanities (3,785,087 records), for a total of 44,307,539 records at the end of June. This is about 5% higher than the figure I cited in the introductory notes.
The reason for this is that there are records that are assigned to two, and in some cases even to three, of the component databases. When two or three databases are selected for searching, the software automatically eliminates the duplicates and triplicates, so the number of unique records in WoS, as of the last week of June, was 42,109,526.
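As a sanity check, the size of the overlap eliminated by the deduplication can be derived from the component counts quoted above. A minimal sketch (the record totals are the ones cited in this review; the subtraction only illustrates the arithmetic, not the actual matching algorithm of WoS):

```python
# Component record counts quoted above (end of June)
science = 34_567_213
social_sciences = 5_955_239
arts_humanities = 3_785_087

total_with_overlap = science + social_sciences + arts_humanities
unique_records = 42_109_526  # after automatic deduplication

# Records assigned to two or three component databases account for the gap
duplicate_assignments = total_with_overlap - unique_records

print(total_with_overlap)     # 44307539
print(duplicate_assignments)  # 2198013
```

The roughly 2.2 million duplicate and triplicate assignments explain why the sum of the three components exceeds the unique-record count by about 5%.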
On the same day, the total number of records in Scopus was 38,438,157. The composition of WoS and Scopus cannot be readily compared because the latter is a single database, and the subject delineation must rely on its 27 assigned subject categories (plus the special Unassigned category name, which is a sincere, informative, and welcome gesture from Scopus).
WoS has about 165 subject categories, and each journal (hence each paper that appeared in that journal) is assigned to at least one subject area (such as Information and Library Science, or Emergency Medicine, or Criminology & Penology).
The case of Arts and Humanities is simple, as Scopus has a subject category and WoS has a separate database for it (which is very useful for colleges specializing in the arts and humanities that may wish to license that component alone).
The difference is stunning in this regard, as WoS has 3,785,087 records for the Arts and Humanities subject areas, while Scopus has merely 330,000 records for this subject category.
This is less than what WoS has for the Arts & Humanities - Other Topics subcategory alone (421,000 records), in addition to Literature (1 million records), History (664,000+ records), Music (279,110), Art (248,000+ records), Religion (nearly 223,500), Philosophy (180,000), Linguistics (152,000), Architecture (110,000), Film, Radio, TV (108,000), Classics (79,000+), Theater (78,000), and Asian Studies (nearly 71,000). It is another question that the 421,000 records assigned only to the Arts & Humanities - Other Topics subject category illustrate that everything is miscellaneous, to borrow the title (if not the essence) of Weinberger's bestselling book.
In Scopus, there are no searchable names or codes for the above-mentioned specific disciplines (for a good reason). This does not mean that there are no records for papers about Asian Studies or Criminology, but there are likely too few of them to warrant a separate primary subject area name. I trust that this tirade of facts will get the message through, and that more comparative analysis will be done on the coverage of journals in Scopus and WoS. I will rub in this important issue in the source coverage section, as it is one of the most widely touted, most misunderstood, most misrepresented, and most discombobulating issues in comparing WoS and Scopus.
The comparison of WoS and Scopus for the social sciences is not as simple as it seems. WoS has close to 6 million records in its Social Sciences component database (going back to 1956). Scopus has a broad category of Social Sciences with nearly 1.3 million records. However, it also has four additional, distinctive primary subject area names that are assigned to journals and thus to records for papers in them. These are 1) Psychology; 2) Business, Management & Accounting; 3) Economics, Econometrics, and Finance; and 4) Decision Sciences. Combining the records for these four categories and those of the Social Sciences category with the Boolean OR operator yields nearly 2.5 million records for the five disciplinary subject areas of the social sciences.
The combined hit count means that WoS has nearly 2.5 times as many records for social science articles as Scopus. Once again, if a smaller subset of WoS is licensed, such as 1980-2009, the difference in coverage of the social science disciplines between WoS and Scopus also becomes smaller.
The largest component in WoS is the Science database, and for a long time it was the only cited reference enhanced database. At the end of June, it had 34.6 million records for all disciplines of applied and natural sciences and technology. Scopus is larger in this arena with about 36.2 million records.
The coverage of specific disciplinary areas cannot be compared appropriately, let alone conveniently. WoS has a much more granular subject classification scheme than Scopus, and only a few subject area names are exact matches in the two systems, so I looked up the hit counts for Mathematics (WoS: 0.96 million records, Scopus: 1.15 million), Veterinary Sciences (WoS: 453,000 records, Scopus: 336,000), and Dentistry (WoS: nearly 287,000, Scopus: nearly 225,000 records), using the complete databases in both cases.
WoS has records for papers in about 10,000 serial publications, while Scopus, which at its debut covered 13,000 journals, now covers, according to its most current PR statement < http://info.scopus.com/overview/what/ >, "more than 16,500 peer-reviewed journals". But here comes the rub.
A much broader journal base does not guarantee functionally better coverage. Take the example of Arts & Humanities. WoS has created close to 3.8 million records for papers published in about 1,450 A&H journals since 1975. Scopus has created 330,000 records for papers published in 1,600 A&H journals since 1846. The depth-of-coverage-per-journal indicator is above 2,600 records in WoS, and it is slightly above 200 (yes, two hundred) in Scopus.
As for the spread of coverage across the pertinent years, it is 108,145 records per year for A&H papers in WoS for the past 35 years (that is a comma in the number, not a decimal point!), whereas in Scopus this indicator is less than 3,000 records per year over 115 years of A&H papers.
These data illustrate what I said above about the depth-of-coverage differences between WoS and Scopus. They also make it understandable why I am skeptical about the announcement that records for articles from 1,600 additional A&H journals would be added in April 2009. As they were still not available at the end of June, I cannot comment on this, but the past rate of less than 3,000 A&H records per year is not encouraging.
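The two indicators used in this comparison can be reproduced from the quoted figures. A minimal sketch (the record, journal, and year counts are the ones given above; the function names are my own):

```python
def depth_per_journal(records: int, journals: int) -> float:
    """Average number of records created per covered journal."""
    return records / journals

def spread_per_year(records: int, years: int) -> float:
    """Average number of records created per covered year."""
    return records / years

# Arts & Humanities figures quoted above
wos_depth = depth_per_journal(3_785_087, 1_450)   # ~2,610 records per journal
scopus_depth = depth_per_journal(330_000, 1_600)  # ~206 records per journal

wos_spread = spread_per_year(3_785_087, 35)       # ~108,145 records per year
scopus_spread = spread_per_year(330_000, 115)     # ~2,870 records per year
```

Both indicators are crude averages, of course, but even so the order-of-magnitude gap between the two databases in A&H coverage is unmistakable.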
I wonder what the members of the Content Selection and Advisory Board (CSAB) of Scopus advised in this regard. It costs a lot of money to subscribe to 1,600 more A&H journals just to produce so few records per A&H journal.
As for the CSAB, a statement made in a 2006 Scopus interview with one of its prominent members may shed some light on this issue. In summarizing "a number of misconceptions about Scopus content", she said that one of the misconceptions held by librarians was that "Scopus does not contain as much biomedical content as MEDLINE". The summative response from the CSAB member was: "Not true, as Scopus covers the same source titles as MEDLINE plus EMBASE."
The number of sources covered by Scopus may not be sufficient to answer the question (or dispel the misconception). Without an indication of the depth of coverage, it has little meaning. It is akin to my claiming that I have been to Taiwan a dozen times. Technically it is true, but, until recently, I had been a dozen times only to the airport of Taiwan, waiting for a connecting flight.
The comment regarding the second misconception, that "There is not much social sciences content in Scopus", followed the same line of argument, asserting that "Actually, Scopus includes all of the social sciences titles in Thomson Scientific Social Sciences Citation Index, as well as an additional few hundred titles". It may be more relevant to know that WoS has created nearly 6 million records from about 2,500 social science journals (from 1956 onward), while Scopus has produced close to 2.5 million records from more than 5,300 social science journals (the number may have been somewhat below 5,000 in 2006) from 1910 onward (the earliest year of coverage of social science journals in Scopus).
Interestingly, the link to this interview with the CSAB member is still on the information page of Scopus, but now it is a dead link. I trust that the CSAB member asked for the removal of that interview after experiencing the breadth of coverage of Scopus in the social sciences and in the arts and humanities.
I have regularly complained about the shallowness of coverage of some very important LIS journals in Scopus vis-a-vis WoS, and I did see and welcome the improvement after the impressive, unprecedentedly large-scale retrospective "fill in the pre-1996 gaps" project, when 7 million records were added to Scopus. However, many journals remained shallowly and/or gappily covered in my own sphere of interest. These include the Journal of Scholarly Publishing (which changed its title from the plain Scholarly Publishing) in WoS versus Scopus, and the quintessential Annual Review of Information Science & Technology, which is still missing records for many precious chapters in Scopus as opposed to WoS - and I am not referring to a seemingly missing volume, which was a publishing delay at the changing of the guard of the editors.
These omissions also have an effect on the increasingly important h-index. This can be seen very well in the example of the Journal of Documentation, with an h-index of 50 in WoS and an h-index of 36 in Scopus.
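For readers who want the mechanics behind these numbers: an h-index of n means that n of the entity's papers have each received at least n citations. A minimal sketch (the citation counts in the example are made-up illustrative values, not data from WoS or Scopus):

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for seven papers of one journal
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3 (three papers cited at least 3 times)
```

Because the indicator depends on the citation count of every paper, missing records and missing cited references in a database directly depress the h-index it reports, which is exactly the WoS-versus-Scopus gap shown above.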
I am not entirely happy with the source coverage of WoS, either. There were, and still are, journals covered in WoS that crowd out journals that would be much more deserving of coverage. There is no formal quota by discipline but, for example, in the Information and Library Science discipline, about 10% of the set of 55-60 journals should not have been covered in WoS (let alone in the Journal Citation Reports).
I have regularly nagged about the inclusion/coverage of the two series of the Russian journal Nauchno-Teknicheskaya Informatsiya, because there were few novel and/or important papers in them, and even one of the series would have been one too many for me, and probably for the many other researchers interested in LIS but not reading Russian.
I have also voiced my unhappiness earlier with the coverage by the Journal Citation Reports of the Zeitschrift für Bibliothekswesen und Bibliographie (originally from East Germany) for more than a decade, even though I do read German, because ZBB barely has a pulse in terms of papers published per year, and it has been clinically dead in terms of citedness for years. I have the same opinion of the journal Library and Information Science published in Japan. I have spent only a few, but very intense, days with librarians and library science students in Japan, and this journal just does not represent the much respected scholarly productivity and clout of Japanese researchers.
WoS stopped covering the two Russian journals long ago but the other two are still covered and even included in the most recent edition of JCR, released when I was working on this paper.
There are several journals in the ILS category alone that would be much more deserving of coverage, such as the ones I mentioned in my June review of Scopus, which does cover most of them. Unfortunately, both databases neglect journals of school librarianship, in spite of its importance and the high-quality research papers in the field.
WoS has the elementary metadata of publication year, document type, subject areas, and language for all of its records. Scopus lacks the document type for more than 3 million records, which is important when the searcher needs to distinguish document types (as in calculating productivity, impact factors, or the h-index). It has about 630,000 records without subject areas, and 884,000 without language.
Of course, there are certain data elements that are not expected to be present in all the records, such as Digital Object Identifiers. That is also the case with author names, which are legitimately absent, for example, from many editorials, book and other media reviews, obituaries, and news items. There are tens of thousands of such records for items in BMJ, Lancet, and Nature alone.
In WoS, when this data element is absent, the string [ANON] is entered. This is common practice, but in searching it coincides with Anon as a real author surname, because the [ and ] characters are ignored, as shown in records where the author is JB Anon. There are about 1 million records without an author name in WoS; in Scopus there are 1.4 million such records.
In Scopus, ANON and Anon have been used in only about 67,000 cases, while in 1,340,000 cases the field is simply left empty. To the credit of Scopus, this shows up in the result summary matrix as an entry with the string "unidentified" to warn the user. This applies also to document type and subject area, but not to language. This is odd, because in 884,000 records no language is indicated, nor is its absence, as you can see for one of the periodicals where no language is assigned for 40% of the records. Those who limit their searches by language should be advised of this problem.
Obviously, there are no country, city, and institutional affiliation data when the author is anonymous or simply absent from the record, but these data are missing from millions of other records as well, especially when the source documents do not carry them in the byline. It is nice of WoS that its indexers often assign the country name when it is obvious from the rest of the byline that the author is, say, from Paris, France and not Paris, Texas.
The difference lies in the ratio and yearly distribution of records without author affiliation information in the two databases. In WoS, slightly more than 31 million of the 42.1 million unique records have the country and institutional affiliation. In Scopus, 25.3 million of the 38.1 million total records have information in the country affiliation data field, and 29 million records have content in the institutional affiliation field.
Quite importantly, WoS has a much higher presence rate for the authors' country and institutional affiliation from 1975 to 2009 (85% for both data elements) than Scopus, where the presence rate of country information for the past 35 years is 71%, and the presence rate of the organizational affiliation is 78%.
WoS pulls a total blank for both country and institutional affiliation in 2.5 million records between 1945 and 1965 - exactly the time span when non-military research started to surge at the end of the war. Scopus does the same for a longer, but far less critical, period between 1850 and 1900, for which it has fewer than 235,000 records, so it is a much lesser problem.
WoS has far fewer abstracts than Scopus. I could only check the ratio for the time frame of 1975-2009 at the end of June, and found that while WoS had abstracts for 14.3 million (41%) of the 35 million records for that time period, Scopus had 25.6 million abstracts (nearly 76%) for its 33.7 million records for 1975-2009.
This is the most critical aspect of the citation databases, as such records provide the foundation for citation-based searching and citation-based bibliometrics. These metadata elements are the ultimate in value-added information, and the cost of their production makes both databases much more expensive than other super-mega databases.
It is useful to know the total size of the database, but it is more critical to know what percentage of the records have been enhanced by cited references. Of course, it is not expected that all records are so enhanced, because there are many papers that have no cited references. So it should not be a surprise that, in early May, WoS had about 27.3 million records enhanced by cited references among the 34.8 million master records for the 1975-2009 period, yielding a rate of 78%, while Scopus had nearly 15 million such records in the 33.5 million record subset of its 38.1 million record collection, yielding a rate of 47%.
Extrapolating the data from the 1975-2009 subset to the entire WoS database of 42.1 million records yields a set of 32.8 million records enhanced by cited references. For Scopus, the cited-reference-enhanced subset does not change at all, because there are no records with cited references prior to 1996 - except for 7,000 records.
The ultimate issue is the total number of references in the databases. This can be estimated only roughly, because the average number of references varies widely among disciplines and even within major subject areas. In addition, the size of the entire collection of references depends on the composition of the databases, the ratio of records by document type, and the growth in the number of cited references across the years. If 22 references per record is assumed to be the average, then WoS has about 721.6 million cited references, and Scopus has about 330 million.
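The estimate can be reproduced in one line each. A minimal sketch (the 22 references-per-record average and the counts of cited-reference-enhanced records are the assumptions stated above, not official figures):

```python
AVG_REFS_PER_RECORD = 22  # assumed discipline-blended average, as discussed above

def estimated_refs_millions(enhanced_records_millions: float) -> float:
    """Rough total of cited references, in millions of references."""
    return enhanced_records_millions * AVG_REFS_PER_RECORD

wos_refs = estimated_refs_millions(32.8)   # ~721.6 million for WoS
scopus_refs = estimated_refs_millions(15)  # ~330 million for Scopus
```

Since the multiplier is the same for both databases, the estimate's sensitivity to the assumed average cancels out of the WoS-to-Scopus ratio, which is driven by the enhanced-record counts alone.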
Of course, this does not mean as many unique references, and it does not reflect the huge differences between WoS and Scopus in the proportion of records for papers in the Science, Social Sciences, and Arts & Humanities component databases.
Interestingly, Thomson Reuters has publicly referred to this figure (the first time that I saw it do so), stating that WoS has 716,000,000+ cited references from 1900 to 2008. I have been estimating the number of cited references in WoS since 2007, and presented my guesstimate summary in my keynote speech at the INFORUM 2007 conference, with an update in a research paper in late 2008. It was a gratifying moment to see that my estimates have been quite close to the official announcement. Mine is somewhat higher, as it includes the references in the records for publications in 2009. It also reconfirmed my long-time impression that Thomson Reuters does not play fast and loose with the numbers in its traditionally rather low-key PR announcements, and that its claims can be taken to the bank.
On the other hand, some of the PR claims from Scopus, such as the tagline that Scopus is the largest abstract and citation database, are not accurate, and make an unnecessary brag line, given the many really unique and splendidly implemented software features of Scopus that are worth bragging about, such as displaying the citedness count of the items in the reference list, and the option of re-sorting them by that criterion. In light of reality, this tagline sounds like the rhetoric of the Iranian president, whose latest gem after his re-(s)election was that Iran is the most stable country in the world.
Some other PR claims, such as the one promising the "broadest coverage available of Scientific, Technical, Medical and Social Sciences literature including Arts & Humanities", are very misleading in the Social Sciences and Arts & Humanities respects.
I wonder what will be the real experience of the anonymously quoted librarian at Duquesne University the morning after the doubling of A&H journals really happens. In anticipation of the event she said that “the addition of the Arts and Humanities content was a central reason why we decided to purchase Scopus. It is absolutely crucial to our university that the Arts and Humanities are covered in Scopus”. I am eager to see how it will improve the disappointing breadth of coverage of A&H journals that I discussed above, once it is delivered.
I have been using the three citation databases of Thomson Reuters (earlier, of the Institute for Scientific Information) since the mid-1970s, on the DIALOG system and later on CD-ROM. Using them was never a lunch-break pastime, because of the complexity of the software. The release of the Web version of the database brought a huge change in accessibility, with a much more user-friendly interface and regularly introduced new features.
Notwithstanding, I still have a mental list of other changes that should be made. I bring up some of these along with the best features, noting that my emphasis is on issues important for intense professional users, librarians, and other information specialists, not those of casual users (whom I do respect and have served, too).
Browsing the document types, languages, authors, group authors, and journal titles on or through the basic search template is fine, but the author affiliation field should also be made browsable, to facilitate the spotting of variant names and abbreviations of the same institutions across the years in the master records. Many of these variants arise because an institution modified its name, as Texas A&M University did from Texas A&I University; others are due to data entry inconsistencies. At the least, this data element should be made searchable from the Basic Search template rather than relegated to the Advanced Search template, as it is now.
Browsing and direct searching of the subject areas would be useful to help in exploring the many topical categories used in WoS. One or more of these have been consistently assigned to all 42.1 million records. True, the well-implemented, appealing cluster sidebar helps with this by showing the top 5 subject areas and their hit counts for items matching the query criteria, along with the top 5 document types, authors, source titles, publication years, conference titles, languages, and institutional and country affiliations. It even offers top 100 lists of the above, but for unleashed roaming and browsing, a resizable, scrollable sub-window would be a useful alternative.
The saving, naming, renaming, describing, and re-running of query series (with optional ad-hoc changes), instead of just saving, editing, and re-running individual queries as Scopus does, is excellent in WoS, and a great time saver for research projects.
Now comes the nagging. There is a 10-year limit for using publication year ranges as a search filter. I often need a filter for the past 10, 15, or 30 years, which requires doing these searches repeatedly with varying time spans and then combining them, just to find the publishing and citation statistics of a researcher who has been publishing for more than 10 years.
My thirst for dealing with large sets is insatiable, simply because I prefer to analyze the full population instead of samples. Some of the limits are still a constraint for me when I need a complete publishing and citation profile for certain countries, journals, and institutions.
I complained about these limits in my earlier reviews (when they were far less generous), and I was delighted when the search set limit was increased to 100,000, and so was the limit for the ranking/sorting of the Analyze function to enhance the clustering.
However, I still run into the set limits when searching by country, organization, and even some journal names, so I would like to see that limit removed. It may increase the response time (although it was not my experience when the limit was very significantly extended), but I would be patient for the great convenience.
The same applies to the 10,000-record limit set for the Citation Report, which is my favorite tool. Although I miss having the same report for net citations, i.e. the statistics for the subset excluding self-citations, which Scopus offers, it is an excellent feature in WoS.
It provides a perfect combination of compact but very telling charts and well-presented tabular data. It facilitates a superb once-over of the key publishing productivity and impact data, far better and far more informatively than mothers can manage with a once-over of their sons' new date. (Yes, fathers also do that with their daughters' new date - unless they are watching ESPN and/or have had their six-pack.)
Of course, I have some other wishes for the Citation Report feature, but I do not want to appear too pushy. Suffice it to say here that removing the limits would eliminate the self-imposed handicapping of WoS, and would show off the most important asset of WoS: being indeed the largest citation database.
Where I must be pushy is the Cited Reference Search template, for which the similar, ill-named but appealing "More" feature of Scopus provides a good model. The WoS limit of accruing a maximum of 500 references is very inconvenient, because our ability as authors (and I self-flagellate by using the word "our") to misspell and mis-cite the names of authors, journals, publication years, volumes, issues, and page numbers is limitless, even though the accuracy of these data elements is critical for matching the citing references to the source document and giving it credit in its citation count.
Quite often, the total number of such "stray reference variants" is well above 500, not only for cited journals but also for productive, influential, widely cited authors. No wonder that most scientometric evaluations simply disregard the stray references and the orphan references (the ones, such as references to books, that have no master record in WoS to hang onto).
Beyond the removal of this restrictive limit, it would also improve the look-up and accrual process if the list of cited references could be sorted by user-selected criteria, such as year, volume, page number, or citing article count, and then exported as a tab-delimited file (just as in normal search mode, without the unrelated 500-item limit applied there).
The impact of making this process more convenient could be very significant for calculating the citedness indicator, the impact factor, and the h-index of researchers and journals. I have gone through the cumbersome process of cited reference searching many times for a compelling reason.
I was invited to write a paper for the Festschrift edition of Library Trends celebrating the 75th birthday of Wilfrid Lancaster (F.W. Lancaster for bibliographic and citation searching!). I had the freedom to choose a topic, and decided to write about the research needed to determine his plausible h-index < http://www.jacso.info/PDFs/jacso-lancaster.pdf >. The result was worth it. While Scopus produced an h-index of 3 through the very appealing author profile page, and an h-index of 6 (by now 7) through the search-sort-scroll process, WoS reported an h-index of 13 for him - quite good, but not high enough for his stature.
My searching and wading through the Cited Reference Search template and process to bypass the limits was followed by manually pairing up the stray references and creating pseudo-master records for the huge number of orphan references to his many books; I finally came up with an h-index of 26. In the ILS field, only Eugene Garfield is ahead of him, with an h-index of 30. That is a value I consider plausible for Wilfrid Lancaster in the LIS field, where the highest values for the top researchers hover around an h-index of 17 - with a very few outliers.
In spite of some content deficiencies and software limitations, WoS is a top-notch resource, by virtue of being the largest bibliographic and citation database, for citation-based topical searching, for evaluating publishing productivity, and for measuring research impact through citations received. It is the only choice when the subjects of the research performance evaluations are senior faculty members and other senior researchers, traditional journals, institutions, or countries, and decades of research activity must be analyzed.
The number of abstracts, which are important for finding and identifying the potentially most pertinent primary documents, could be fairly easily doubled by importing abstracts from MEDLINE for records where WoS has not included, let alone created, the abstract as a matter of policy. WoS was never meant to be one of the multidisciplinary indexing/abstracting databases; it was born to be an innovative, unique citation index.
The source base could be increased by adding to the present journal set the couple of most influential, most cited journals per subject area that are not currently covered by WoS. The software development work for easing the cited reference searching process, for matching the stray references to their master records (I almost wrote mother records), and for making better use of the currently orphan references would be well worth it. It would help bring out the best of the quintessential asset of WoS: the 721.6 million cited references gathered from the most consistently and most extensively processed, most influential 10-11,000 scholarly journals.