Title: Web of Science Citation Indexes
Publisher: Thomson ISI
Cost: Depends on database combinations and time span
Tested: July 12-20, 2004
The Web of Science (WoS) service is not merely a part of Thomson ISI's Web of Knowledge (WoK) platform; it is its backbone, and the platform itself has been significantly enhanced for its latest release. In this review I focus only on the core WoS components of WoK, which include the three traditional ISI Citation Indexes: Arts & Humanities Citation Index (A&HCI), Social Sciences Citation Index (SSCI) and Science Citation Index Expanded (SCIE). These have been hosted for a couple of years on the Web by their creator, the Institute for Scientific Information (ISI), and for decades on a few commercial online services. Recently, ISI introduced three other databases: Index Chemicus, Current Chemical Reactions and ISI Proceedings. (I have not tested the chemical databases. For the three citation databases, I used a version going back to 1975; the new ISI Proceedings database goes back to 1990.)
There are many more databases on the Web of Knowledge platform. In addition to other ISI databases, such as Current Contents Connect or the Journal Citation Reports, WoK has been significantly enhanced by hosting third-party content, such as INSPEC, CAB Abstracts, FSTA, PsycINFO and BIOSIS Previews. (The latter is no longer a third-party database, since Thomson acquired it earlier this year.) The starting page of the new WoK platform illustrates the variety of databases available — depending on which resources a library subscribes to.
This still may not impress some users; after all, there are much larger aggregators. What makes WoK stand out from the crowd is the synergy created through the most powerful system of cited references that links millions of scholarly papers published in the past 30-60 years (more about the time span later) and processed by ISI. Add to (or rather integrate with) these the most widely used commercial indexing/abstracting databases and you have a powerhouse of scholarly information.
As for integration, it kicks in most obviously when the Cross Search feature is used. This extends the reach of WoK to huge public-domain collections of full-text scholarly and professional publications and to their indexing/abstracting surrogates, such as the NASA Astrophysics Data System and the physics arXiv for the former, or PubMed, Agricola, ERIC and Popline for the latter. Your only disappointment may be that some outstanding open access sources, like PubMed Central/BioMed Central and TRIS, are not included — yet.
The only comparable system of interlinked, citation-enhanced indexing, abstracting and full-text databases of fee-based and open access scholarly articles is Scopus, whose preliminary beta version I also tested in July. It is to be launched in fall 2004.
Understandably, ISI started the most recent and most significant redesign with the WoS component, which is why I'm focusing on it. By late fall, when the entire WoK system redesign will have been completed, I will review some of its other components, such as the Journal Citation Reports and the hosted databases.
WoS in and of itself is a comprehensive and highly interdisciplinary information resource, as it currently has about 28 million records for 1975-2004, from more than 9,000 journals. If you add all of the titles that have ever been covered by ISI, but ceased publication or were dropped, this number goes up to nearly 15,000 journals. In the full version, the coverage of the journals in SCIE goes back to 1945, in SSCI to 1956, and in A&HCI to 1975. Most of the titles are the leading academic and professional journals in their respective disciplines.
The crown jewels of WoS are the cited references. Although there is no quotable source for the total number of cited references for the 1975-2004 period, my conservative estimate is that the total number is around 500 million. These are not all unique references, of course, because many cited articles, conference papers, books, dissertations and reports appear in hundreds, sometimes in thousands and even in tens of thousands of source documents.
The reason for this gigantic number of total references is that ISI has enhanced all of its records (where applicable) since day one with cited references — to the tune of about 20 million citations per year on average. There were fewer updates in the early years and more in the past decade. This is especially true with the explosion of information and the increasingly excessive citing behavior in some fields, such as psychology, as discussed in the study by Adair and Vohra. To put this number in perspective, PsycINFO, which recently started to enhance its records with cited references, has 9.8 million cited references for 227,000 records (as of March 2004), averaging 43 cited references per item. It must be noted that the average number of cited references in PsycINFO is more than twice that in WoS because the former extensively covers books as source documents, and the number of references in psychology books often goes up to the mid and high hundreds, as shown by the first few records added to PsycINFO in 2004. The search result list of my test search in PsycINFO shows books with 1,400, 1,276, 798 and 738 references.
Inclusion of cited references and their indexing in bibliographic and full-text databases (especially in publishers' archives) has become one of the hottest features of the past few years. Nearly 50 years ago, Eugene Garfield best explained why cited references are crucial for efficient information retrieval, but beyond the ISI citation databases, citation indexing has only lately received the full attention it deserves.
Suffice it to say that tracing the cited references of potentially relevant articles is a very empowering tool for efficient information retrieval, as most of the cited references are likely to be closely related to the topic at hand. Unlike subject searching through the descriptors of controlled vocabularies, it does not require the user to be familiar with, among other things, the preferred subject terminology of each database. The collection of cited references at the end of articles may be more relevant than the results of an hour-long search in several abstracting/indexing databases, which requires search expertise, familiarity with the index structures of the target databases and the search syntax of the host systems. Searching for cited and citing references, however, may be too involved for many potential users, depending on software issues. The enhanced software of WoS shows that the process can be made easy and instantly rewarding.
Others can only dream about having such smooth facelifts as we see in the new WoS interface. The overall design, with its clean and intuitive layout and navigation, reminded me of Mac applications.
For example, in the earlier version you always had to get out of the query templates and track back to a set-up page if you wanted to change the databases and/or the time frame of a search. No one liked this, and casual users never knew if the query would be retained when they returned to the query form after changing the time frame or the databases to be searched. In the new version, these parameters are part of the query template and you can easily hide that section. The preferred setting now can be made the default, i.e. it remains valid across search sessions.
It was irritating in the previous version that if you wanted to look up, for example, which papers cited Garfield's publications in the past 10 years (in journals processed for ISI), you had to list each year separated by commas. Now you can specify them as a cited year range, in this case 1995-2004. Such searches can be time-consuming, so it would be nice to see some gizmo indicating that the search is in progress — the rotating browser logo is not sufficient. I also wish that there were an option to re-sort the result list by year of publication instead of by the cited source's title.
As for cited publications' names, they are labeled on the template as Cited Work. I have never agreed with this, because it is true only for cited monographs. In the case of journals, the cited work, in my opinion, is the journal article, not the journal's title (name). The label would be more informative if it were called Cited Source Title. Not accidentally, in the general search template of WoS the journal name cell is labeled Source Title.
The good news is that the index of cited sources (cited works in ISI's parlance) and cited authors can be browsed. Instead of cutting and pasting the variants and jockeying back and forth between the index and the search template, they can be added to a transit cell and transferred to the template in one fell swoop. On the general search template, the author and source title indexes can be browsed, and search terms can be picked in a similar fashion. This also applies to the new group author index that was introduced for indexing organizations or institutions that are credited with authorship.
Unfortunately, we still have to live with the baggage of an old ISI decision to use only the last name and initial(s) for authors. This is fine for rather unique last names, but becomes a real problem with common names and surnames. Zeroing in on Carol Tenopir as the cited author is no problem, but how do you distinguish Herbert White from Howard White, both of whom are professors of library and information science? If you know that the former has the middle initial 'S' and the latter the middle initial 'D', you would be OK, but even so there are a chemist and a neurologist with the same middle initial under the single entry "White HS." From the abbreviated journal or conference proceedings names on the result list you can often guess which is the correct paper of the information scientist Herbert S. White, but sometimes you might just be lost, as authors may publish in journals that are not associated with their fields of specialty. I, for example, had an invited paper about journal impact factors published in a special issue of Cortex, and I know nothing about neuroscience. When only the title of the journal and not the title of the article is displayed next to my name, one might think that it is another Jacso P.
I am glad to see that the total number of hits is now displayed both at the top and the bottom of the result list. In the previous version, it appeared only at the bottom, and users had to scroll down to get a feel for the size of the list. The history page has a better layout and makes combining previous sets a clickable option. The global look and feel of the interface makes WoS much more inviting, and encourages the exploration and use of the new and enhanced browsing, searching and output functions.
The most apparent functional enhancement is that up to 100,000 records can be retrieved — a big improvement over the previous limit of 500 records. It's not as if anyone wants to look up 100,000 records, but in step-wise query construction and refinement, large initial sets are created so that they can be whittled down with Boolean and/or proximity operators.
Sorting by date is also now possible for up to 100,000 records (and it is the default). The same generous limit applies to sorting by relevance, and it is surprisingly fast. Sorting by source title, first author or citedness is also possible. The latter is a particularly valuable feature for selecting the most cited, and therefore probably most respected, articles. It could be made even better if the citedness score were displayed with the bibliographic citation in the result list, as is done so well in CSA and, especially, in Scopus, where it is a prominently displayed and sortable data element in the result list.
Notice, however, that only a relatively small result set can be sorted by these three sort keys. If you exceed the limit, an error message is displayed, but it does not specify the limit. Consulting the help file will tell you that for these sort criteria the maximum number of records is 300, which can be too limiting for some bibliometric/scientometric searches. The limit for marking records from one or more search sets is 500. It would be reasonable to consolidate these limits and set a common size across the board — e.g. 2,000 records, which is the limit applied to the best of the new features: the Analyze function.
Analyze is similar to DIALOG's powerful RANK command, but it is implemented in a much more intuitive style and format in WoS. You may choose one of the following data elements to rank the set: author, institution name, document type, language, publication year, source title and subject category. You may also choose to rank the first 500 records or all of the records (up to 2,000) in the result set; the number of records displayed; and whether a minimum threshold should be applied to eliminate certain results, such as singletons. The rank list may be presented in rank order or in alphabetical order, which is good for seeing, for example, how a topic evolved and became increasingly written about in the past 20-25 years.
Take Islam, for example. It can be found in the title, abstract or author keyword fields of 1,800 records for the timeframe of 1975-2004. Its first occurrence is from 1979, with a modest seven records. After a surge in 1980, it averaged 38 records per year, then began to rise, with occasional dips in 1995 and 1997. It rose again until 2001 and shows a surge in 2002 and 2003, with 157 and 174 records respectively — for obvious reasons. Finding the most productive scholarly authors on the topic is just two or three clicks away: changing the rank criterion, the order of display and, optionally, the minimum threshold. Re-ranking the set by source journal is just a click away, as is ranking by author affiliation.
This is a very powerful automatic tool, even if it may need to be checked and consolidated by the searcher if there are variations or spelling errors in the author and/or affiliation names.
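The ranking logic behind Analyze can be sketched in a few lines of code. The record structure, field names and sample data below are my own hypothetical stand-ins, not WoS data; the sketch only illustrates ranking a result set by one data element with a minimum-record threshold and an optional alphabetical order:

```python
from collections import Counter

def analyze(records, field, threshold=2, top=10, alphabetical=False):
    """Rank a result set by one data element, in the spirit of WoS Analyze.

    records   -- list of dicts (hypothetical record structure)
    field     -- element to rank by, e.g. 'author' or 'publication_year'
    threshold -- minimum record count; filters out singletons when > 1
    """
    counts = Counter(r[field] for r in records if field in r)
    ranked = [(value, n) for value, n in counts.most_common() if n >= threshold]
    if alphabetical:
        ranked.sort(key=lambda pair: str(pair[0]))
    return ranked[:top]

records = [
    {"author": "Smith J", "publication_year": 2002},
    {"author": "Smith J", "publication_year": 2003},
    {"author": "Lee K",  "publication_year": 2003},
    {"author": "Wong A", "publication_year": 2003},
]
print(analyze(records, "author", threshold=2))   # only Smith J clears the threshold
print(analyze(records, "publication_year", threshold=1, alphabetical=True))
```

The alphabetical option corresponds to the display choice mentioned above: with years as the rank key, it shows at a glance how output on a topic grew over time.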
Nearly equally useful is the enhancement of the Related Items function. In the earlier version, clicking on the Related Items button brought up a list of records related to the parent record from which the function was launched. The relationship was determined by the fact that these records had one or more of the same cited references as the parent record. Obviously, the more references the parent records included, the higher the chance to find related records.
To offset this bias (even in the earlier version of the software), the user can choose which of the cited references of the parent record to use in searching for related records. This allows users to easily eliminate too-often-cited books on, say, statistical methodology and style standards. The list of related items is ordered by decreasing number of shared references, so that the most related items appear at the top of the list.
The extra beauty of the enhancement of this high-brow feature is that now there are two columns displayed next to the Related Items list that show the total number of citations and the number of shared citations directly on the result list.
It is a significant bonus that clicking on the value in the shared references cell in the result lists will bring up only those cited references. This feature is a mighty time saver for making educated choices when selecting related items, especially when the number of related items is more than a dozen. In the test search about Islam, one of the articles in the result list had 4,009 related items. Not even the most devout bibliometrician would engage in identifying the shared references manually, but the ISI software rendered this service in a second. The most related two papers had five items in common with the parent record, and those are displayed with the click of a button.
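The relatedness measure underlying this feature, counting the cited references a candidate record shares with the parent record, can be sketched as follows. The record identifiers and reference keys are hypothetical sample data, not ISI's internal representation:

```python
def related_items(parent_refs, candidates):
    """Rank candidate records by the number of cited references they share
    with a parent record, the way WoS orders its Related Items list.

    parent_refs -- set of cited-reference keys of the parent record
    candidates  -- dict mapping a record id to its set of cited-reference keys
    """
    scored = []
    for rec_id, refs in candidates.items():
        shared = parent_refs & refs                # the shared citations
        if shared:
            scored.append((rec_id, len(refs), len(shared), sorted(shared)))
    # most related first: highest number of shared references on top
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored

parent = {"Garfield 1955", "Adair 2003", "Small 1973"}
candidates = {
    "rec1": {"Garfield 1955", "Small 1973", "Price 1965"},
    "rec2": {"Adair 2003"},
    "rec3": {"Price 1965"},                        # no overlap: not related
}
for rec_id, total, shared, which in related_items(parent, candidates):
    print(rec_id, total, shared, which)
```

Each result row carries both counts that the new interface displays side by side: the total number of cited references and the number shared with the parent record, plus the shared references themselves, the list that one click now reveals.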
There are other enhancements in the new version, such as the reminder service when an article is cited, but the ones mentioned above are the best. Of course, I would like to see the Analyze feature in the Cited Search mode as well in order to easily check, for example, what the self-citation rate by authors is in a given topic or in some journals, or what the distribution of the citations received by a specific article is in terms of publication years.
More importantly, I would like to see a new feature that would substantially enhance the power and appeal of WoS: the inclusion of the title of the cited articles or conference papers in an alternate result list format. Of course, I don't mean a posteriori manual addition of the titles of hundreds of millions of cited papers.
Currently, the majority of cited references are listed with the author's surname and initial(s), the abbreviated title of the cited journal, and the volume, starting page number and year of publication of the cited item. Not all records have all these data elements, of course, because the cited documents themselves don't have those attributes. Conference proceedings typically don't have a volume number, and articles in Web-born journals, like D-Lib Magazine in our example, have no page numbers.
Although the new version is visually much more pleasing than the earlier format, it is still too cryptic for users, unless they are so familiar with the literature that they can tell the article title from the data elements displayed. This format was justified in the print world, where adding even partial titles would have increased the page count of the citation indexes by 25-30%. Inclusion of the titles of the cited articles and conference papers would also have meant a considerable increase in data entry expenses. However, in the Web version there is a possibility for enhancing the list of cited references — in my guesstimate, for more than half of the cited items. A title-enhanced format (which I mocked up for illustration) would be far more informative, even if only the partial titles of the cited articles, dissertations or conference papers were included, and even if "only" a few hundred million cited references were enhanced at the beginning.
Many of the cited items are from journals that are covered by ISI; in other words, ones that are ISI source journals. For articles processed from these journals, ISI creates a bibliographic record with the traditional data elements, including the title of the paper. These are the records to which there are links from the list of cited references. The user needs to click like crazy and jump back and forth between the list of cited references and the source counterpart entry of items to see the important, typically content-rich, informative title.
The title fields could be extracted programmatically from these counterpart source records by using some of the data elements that now appear in the result list, such as a search key consisting of the journal name, volume and starting page number. For journals not covered as source publications by WoS, there are two new ISI sources that can be used to extract the title: the BIOSIS database and the ISI Proceedings database. In addition, there are also the open access mega-databases (PubMed, ERIC, AGRICOLA, etc.).
By combining the names of the primary authors, the abbreviated journal titles, and the volume and starting page numbers (or some subset of these), a program could identify the matching source records in the databases mentioned above. It could then extract the titles and add them to the cited references in a batch process without human intervention. Ambiguous matches could be flagged and held for additional checking. The impressive large-scale experiments of autonomous citation indexing projects, such as CiteSeer, CiteBase and ParaCite, have clearly proven the viability of this approach. Adding even partial titles (say, the first 20 characters) would let users know at first glance what the topic of the cited paper is. Omitting the volume, starting page number and article ID number is not a problem, because they are not important at this stage of results scanning. Replacing the unnecessarily long string "View Record" with an eyeglass symbol would release enough space to accommodate at least a significant portion of the title.
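The batch enrichment just described could work roughly along these lines. The composite key fields, the normalization and the sample records are my own assumptions for illustration, not ISI's actual implementation:

```python
def match_key(author, journal, volume, page):
    """Build a composite lookup key from data elements that already appear
    in a cited reference (a hypothetical normalization)."""
    return (author.upper(), journal.upper(), str(volume), str(page))

def add_titles(cited_refs, source_records):
    """Enrich cited references with partial titles pulled from matching
    source records; unmatched or ambiguous references are flagged."""
    index = {}
    for rec in source_records:
        key = match_key(rec["author"], rec["journal"], rec["volume"], rec["page"])
        index.setdefault(key, []).append(rec["title"])
    enriched, flagged = [], []
    for ref in cited_refs:
        key = match_key(ref["author"], ref["journal"], ref["volume"], ref["page"])
        titles = index.get(key, [])
        if len(titles) == 1:                               # unambiguous match
            enriched.append({**ref, "title": titles[0][:20]})  # partial title
        else:                                              # no match, or several
            flagged.append(ref)                            # hold for human checking
    return enriched, flagged

sources = [{"author": "Doe J", "journal": "J EXAMPLE", "volume": 14,
            "page": 195, "title": "A sample article on citation matching"}]
cited = [{"author": "Doe J", "journal": "J EXAMPLE", "volume": 14, "page": 195},
         {"author": "Roe R", "journal": "OTHER J", "volume": 7, "page": 1}]
enriched, flagged = add_titles(cited, sources)
print(enriched[0]["title"])   # first 20 characters of the matched title
print(len(flagged))           # one reference held for checking
```

A dictionary keyed on the composite elements makes each lookup a single operation, which is what would make enriching hundreds of millions of references feasible as a batch job.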
The combination of unparalleled content and enhanced software keeps WoS a very valuable and unique source, but ISI can't rest on its laurels — the largest scholarly publisher, Elsevier, will launch its very comprehensive Scopus system later this fall, which will challenge some of the thus-far unmatched features and functionality of WoS. I plan to review Scopus in the September issue of this column.