Publisher: Reed Elsevier
Cost: To be negotiated
Tested: July 25 - Aug. 28, 2004
When Isaac Elsevier engraved, in the late 16th century, the logo still in use today, he could never have dreamt that his motto (appearing on a small banner), Non solus (Not alone), would prove so true centuries later, although in a somewhat different sense. With the launch of Scopus in November, the publishing conglomerate Elsevier will take to the max the maxim that one is not alone when doing research, as one learns from others' published research and from the earlier publications it cites.
Scopus and ISI Web of Science (along with a few other databases on a much smaller scale) bring the best out of the network of cited references, adding the convenience of digital links to the power of intellectual links. In most other databases, bibliographic records are merely aggregated and remain alone, waiting for users to connect them using traditional search-by-subject-words tactics.
The question was not whether Elsevier would bring out the synergy of its variety of information products and services, but when and how. Based on a preview two or three months prior to the official launch of Scopus, this column tries to illustrate the "how" aspect of the question.
Usually the reviews in this column appear at the launch of a database or service, or later, but Scopus is not just a database or service. It will have a significant impact on the knowledge industry (a puffy term I rarely use), just as the ISI Web of Knowledge service had when it deployed the powerful but easy-to-understand and easy-to-use Web of Science (WoS) service. This brought Eugene Garfield's ideas from the mid-1950s of citation-based searching of scholarly information to the practical level for users who don't settle for just "good enough" searches.
Although Scopus and WoS are said not to be in direct competition, they certainly have the same target audience and the same exquisite search strategy. But which has more power? In some regards WoS, in others Scopus. This preview aims to point out the most important differences between the two. The August installment of this column has a detailed review of the new WoS system that was released in July 2004.
Elsevier requires no introduction, and not just because it is the largest publisher of scholarly journals and books. It has more assets and wears more hats in the information industry than almost any other company, rivaled only by Thomson.
If you are in a college or research library, you likely have print and/or digital access to many of Elsevier's more than 1,700 scholarly journals through ScienceDirect. Chances are also good that you use indexing/abstracting databases owned by Elsevier, such as EMBASE, GEOBASE, FLUIDEX, Compendex, World Textiles, the Voyager library system, the EnCompass metasearch engine and/or the LinkFinder Plus link resolver software.
If you don't have access to any of these databases, digital archives and applications, you can still get a feel for the digital prowess of Elsevier by logging in to its open access Scirus system, which has tens of millions of bibliographic records with abstracts from its own journals and from PubMed and BioMed Central, along with two million full-text patents, articles, conference papers and research reports from scholarly preprint archives.
It also has a special collection of Web pages gathered from educational sites, not necessarily with scientific content. (I must emphasize that I still disagree with Scirus's motto — "For scientific information only" — for reasons amply illustrated in my first review of Scirus.) Searching the huge archive of "educational" Web pages collected by the crawlers of Scirus still shows more than 100,000 hits each for two of the most common four-letter words. This indicates how much material there is in this special collection of Scirus that disgraces the worthy content and is anything but scientific. (Dick Cheney's recent use of one of the words in the Senate has not lent it scholarly status.)
Scopus includes Scirus, whose results are displayed under a tab labeled Web. The features in this review do not apply to the Scirus component, so my references to Scopus are meant to be understood within that, well, scope. Scopus records are combined from Elsevier's own abstracting/indexing databases, PubMed and publisher-submitted records. Given the overlap among these databases, this could mean zillions of duplicates, but Elsevier retains one record and fuses information elements from the duplicates and triplicates. This is a very difficult task to do correctly, yet I found few duplicates, and those were caused by non-matching page numbers or volume numbers, which fooled the otherwise impressively smart deduplication program. I was told that the program will be refined to catch such duplicates.
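Scopus's actual deduplication logic is proprietary, but a minimal sketch shows why a non-matching page number can fool a matcher that fuses records from several sources. All field names and records below are invented for illustration:

```python
# Illustrative sketch only -- not Elsevier's actual deduplication algorithm.
# Records merged from several databases slip past an exact-key matcher
# when one source reports a different page or volume number.

def dedup_key(rec):
    """Build a match key from normalized citation metadata (hypothetical fields)."""
    return (
        rec["first_author"].lower().strip(),
        rec["title"].lower().strip(),
        rec["year"],
        rec["volume"],
        rec["start_page"],
    )

def merge_records(records):
    """Keep one record per key, fusing information elements from later duplicates."""
    merged = {}
    for rec in records:
        key = dedup_key(rec)
        if key in merged:
            # Fuse: fill in elements missing from the retained record.
            for field, value in rec.items():
                merged[key].setdefault(field, value)
        else:
            merged[key] = dict(rec)
    return list(merged.values())

embase_rec = {"first_author": "Smith, J", "title": "A study", "year": 1999,
              "volume": "12", "start_page": "101", "abstract": "..."}
# The second source reports a different start page for the same article,
# so the keys no longer match and a duplicate survives.
other_rec = {"first_author": "Smith, J", "title": "A study", "year": 1999,
             "volume": "12", "start_page": "102"}

print(len(merge_records([embase_rec, other_rec])))  # 2: the matcher is fooled
```

A smarter matcher would compare keys fuzzily (for instance, tolerating a small page-number difference), which is presumably the kind of refinement promised for the production system.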
Size and Composition
Scopus currently has 26 million records, close to the nearly 28 million records in the 30-year (1975-2004) citation database segment of the Web of Science. At launch, Scopus is to have 27 million records. By then WoS will also have grown, and when reading the PR claim that Scopus is the largest scientific indexing/abstracting database, it must be kept in mind that the Science Citation Index component of the full WoS system covers a 60-year period, with an estimated total of about 40 million records for scholarly articles.
There is also a fairly new product, ISI Proceedings, with 3.5 million records for conference papers; it is not part of WoS, but is part of the Web of Knowledge (WoK) platform. Records for conference papers are part of Scopus and amount to approximately 2.3 million. Scopus is meant to be licensed in its entirety, whereas WoS can be licensed for different time frames, like 1990-2005 versus 1945-2005, and its database components can also be sliced and diced to the customers' requirements.
The inclusion/exclusion of document types has implications: in some disciplines, such as computer science, conference papers are important citing sources, so citedness scores cannot be directly compared. This is worth mentioning because there will be a flurry of comparative evaluations of the two systems, including bibliometric, scientometric and informetric studies, which should bear these differences in mind.
Scopus also includes books, which often have several hundred cited references and therefore are particularly useful for reference-based searching. Having said that, WoS, which does not process books as source documents, has many of the same publications, such as the Annual Reviews series, correctly labeled as serials, while Scopus labels them as books. As for document type, my test search is only an approximation, as I don't know whether, in creating the Scopus records, the document types sorely missing from nearly 1.5 million Compendex records have been added. There will also be records for dissertations, but these were not available during the test.
More than 75% of the source documents are, not surprisingly, in English. However, with such a large database, even 4.5% German-language and 3.3% French-language materials mean a huge subset. The ratio of Spanish- and Italian-language materials is almost identical (1.14% and 1.15%), still yielding close to 300,000 records for each of these languages.
The main scope of Scopus is science and engineering. Within that, health represents 34.6% of the data, followed by life sciences at 27%. The absolute numbers of results retrieved by subject-area-code searching could be misleading, as health alone yielded 14,236,000 records and life sciences alone 11,027,451 items. That would leave little for anything else in a database of 26 million records. Obviously, the majority of records are assigned to multiple subject areas.
Indeed, more than nine million items have both health and life sciences assigned to them, so 16,255,454 records represent the correct 62% proportion of health and life sciences materials. This is not surprising considering that EMBASE and BIOBASE are among Elsevier's largest indexing/abstracting databases and that PubMed records were also added to fill in when EMBASE or BIOBASE had no records.
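The subject-area arithmetic above is simple inclusion-exclusion; a few lines using the figures quoted in this column confirm it:

```python
# Inclusion-exclusion check on the subject-area counts quoted above.
health        = 14_236_000   # records assigned to health
life_sciences = 11_027_451   # records assigned to life sciences
union_count   = 16_255_454   # records assigned to either category
total         = 26_000_000   # approximate size of Scopus at test time

# |A and B| = |A| + |B| - |A or B|
overlap = health + life_sciences - union_count
print(overlap)                            # just over nine million dual-tagged items
print(f"{union_count / total:.1%}")       # about 62.5% of the database
```

This is why summing per-category hit counts in any multi-category database overstates coverage: the dual-tagged records must be subtracted once.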
Engineering is the third largest sub-domain with almost eight million records, followed by agriculture and biology materials (more than three million records); then chemistry, earth and environment sciences (each with more than one million records); and finally physics and math.
There are more than 450,000 records assigned to the social sciences category and nearly 292,000 to psychology. Once again, close to 133,000 records are assigned to both categories, so the total set of psychology and social science records is about 610,000. For comparison, ISI added more than 135,000 records about documents published in 2002 alone to the Social Science Citation Index database.
Scopus coverage goes back to the early 1960s (with 265 records before 1960), but it really kicks in with a substantial number of records from 1965 and has risen steadily over the past 40 years, with the exception of dips between 1986 and 1988 and again between 1991 and 1992. For the past few years it has added about 1.2 million records per year on average, a volume almost identical to what ISI has added to its three citation databases over the same period (counting only once the records that appear in more than one citation database).
One significant difference between the WoS and Scopus records is that the latter has abstracts for a larger percentage of its records than the former. The reason is that ISI does not create abstracts; it only includes them if they are present in the original documents. The Elsevier abstracting/indexing database records, on the other hand, are enhanced by abstracts for many (but not all) of the articles that do not have abstracts in the source journal.
The presence of abstracts may significantly increase recall when searching by word, and it also helps users select the most pertinent items from the results list. Quality is a separate question: during my testing of a small subset, I found that the abstracts and the indexing terms assigned by Compendex left much to be desired.
Cited references are the value-added information element that makes both WoS and Scopus stand out from the database crowd and provides the prerequisite for finding related items on the same topic. The ISI citation indexes were created from the start with this idea in mind. Scopus retroactively adds cited references to the records imported from its own set of abstracting/indexing databases, and presumably extracts cited references from the well-structured digital archive of more than six million articles from the journals of Elsevier and its imprints, as well as from the digital records of its partner publishers. This is a massive undertaking. Currently the citation enhancement project goes back to 1996, which by my estimate implies the enhancement of a 10-million-record subset. To put things into perspective, about 130,000 records had been enhanced with cited references when the enhanced version of PsycINFO was launched in 2002; as of March 2004, 227,000 PsycINFO records had been enhanced with 9.8 million citations.
WoS has cited references for the entire database (when the source records included them) and has a far larger collection of cited references than Scopus. Being second to WoS is still a pretty big deal and the Scopus software brings a lot out of the cited references.
The most important advantage of Scopus in terms of the inclusion of cited references is that it includes and makes searchable the title of the cited journal articles, while WoS does not. This small excerpt from the list of the cited references for the same paper illustrates the differences between the content of the cited references in the two systems.
WoS lists the first author, the title of the journal (or conference proceedings) and the start page of the cited paper. Scopus additionally includes all of the authors (within some limits), the title of the journal article or the conference paper, and the closing page (which helps in gauging how long the paper is).
Inclusion of the article title in the list of cited references right on the results list is very useful, as titles usually orient the user about the content of the cited papers. In the case of cited books, both systems include the book title in the results list, but Scopus also adds the subtitle and the name of the publisher in the cited reference list, both of which can be informative. WoS does not cover books as source items, so there can't be links to book master records. The list of cited references in WoS is much tighter, but Scopus could also rearrange its layout into the grid format implemented so well in its search results list (as we shall see later). This could be especially rewarding when the cited references are enhanced by information on how many times the items in the reference list were cited by documents covered in Scopus. WoS also has the citedness information, but only in the full record and only for journal articles and chapters of serial publications, such as the Annual Review series, not for books and reports.
A very important and visible difference between the two systems is that Scopus includes the URL for open access articles and reports not only in their master records, but also when they are cited references. This excerpt shows an example for the excellent set of enhancements in the cited reference list of Scopus. Most of these enhancements also empower the search process.
Traditionally, the findings of my review of the software capabilities are presented in three groups of features for browsing, searching and results displays, in this sequence. This time I am starting with the last group of features, as it is the most logical transition from the discussion of the content and is the best part of the Scopus software.
Scopus presents the search results in a way that I have always advocated for the sake of quick scanning. The results list the most important bibliographic elements — publication year, article title, author(s), source title and citedness count — in a grid layout. The items on the list are clearly separated by alternating white and shaded backgrounds, which is conducive to scanning and scrolling quickly through the results list. In cases of long article titles, long journal titles or several authors, the grid may become less efficient, though still useful.
It could be improved if, with due respect to cultural diversity, the original titles of non-English-language articles were not squeezed into the title cell after the English translation; a three-character language code could stand in for them. Listing only the first three authors and using better journal abbreviations would also help. As a compromise, limiting the length of each entry on the results list would be acceptable if hovering over a cell displayed its full content. What would be worth squeezing into the grid lines is the number of cited references, because users may prefer to choose items from the results that have more cited references.
The results list is displayed in reverse chronological order by default. Alternatively, it may be sorted by author name or journal title, and re-sorted by publication year after using any of the other sort options, as well as by citedness and relevance.
Sorting by citedness is a very appealing feature, because one may wish to select items from the results list by their perceived clout (implied by the citedness score). The reasons for inclusion of some cited references are not always purely academic, but irrespective of the citing motivation, this sort option is perfectly implemented, very fast and transparent. WoS also offers sorting of the results list by citedness, but it does not display the citedness scores directly.
Sorting by relevance is not transparent. Still, it is of interest and is potentially powerful. Having looked at several results lists sorted by relevance, my impression is that the cited articles that have all of the search word(s) in the title lend the highest relevancy to the source items which cited them. It may be a good relevance ranking algorithm, but users deserve some explanation about the logic behind it.
If Scopus listed the cited references in the same grid-like format as the results list, it would significantly increase the efficiency of scanning them. It is excellent that the citedness score appears in the entries for the cited references — similar to the smart and elegant solution of CSA. The grid presentation of the cited references in WoS could be a good model, with the modifications I mocked up for that review, also adding the citedness score of the references. (It is somewhat disheartening that I keep preaching for this feature and it seems to fall on deaf ears, even though the less-structured grid format I recommended for CSA, showing the result with the alphabetically listed cited references re-sorted by citedness scores, should be a walk in the park for the talented programmers at CSA to implement.)
There is another appealing feature in Scopus: the automatically (and instantly) generated summary matrix of the results. It shows the distribution of a maximum of 1,000 records in the results list by journal name, author, publication year, document type and subject category. It is a gold mine for scientometrists, and it is also informative for the average user to see which are the most productive journals, authors and publication years for a given topic. The subject category could show the interdisciplinary nature of the topic if there were more specific (sub)categories, as there are in WoS. The result matrix can be expanded to show more details — a really smart solution.
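The kind of summary matrix described is, at heart, a set of facet counts over the results list. A minimal sketch (the records, field names and figures are all invented for illustration) shows the idea:

```python
from collections import Counter

# Hypothetical results list: one dict per retrieved record.
results = [
    {"journal": "J. Doc.", "year": 2003, "type": "article"},
    {"journal": "J. Doc.", "year": 2004, "type": "article"},
    {"journal": "JASIST",  "year": 2004, "type": "article"},
    {"journal": "Online",  "year": 2004, "type": "review"},
]

# One distribution per facet, most productive value first -- the same
# journal/year/document-type breakdown a summary matrix presents.
for facet in ("journal", "year", "type"):
    print(facet, Counter(r[facet] for r in results).most_common())
```

Computing such distributions on the fly is cheap, which is presumably why Scopus can generate the matrix instantly for result sets of up to 1,000 records.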
WoS previously introduced a similar result matrix (the ANALYZE feature), which is one of the highlights of the new version, even though it must be initiated by the user and is limited to a set of 500 records. Suffice it to say that Bradford and Lotka would highly appreciate this tool, which saves the drudgery of manually analyzing the concentration/scatter of authors and journals. They would see that their oft-cited principles (developed when the abacus was high technology) still play out in the scientific literature of the new millennium. A few quick-and-easy tests using Scopus and/or WoS will show that Bradford's and Lotka's laws still prevail in most results of well-formulated searches yielding several hundred items.
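Bradford's law of scattering predicts that when journals are ranked by productivity for a topic, roughly equal thirds of the articles come from a small core of journals, a larger middle band, and a long tail. A quick sketch on invented per-journal article counts shows how such a zone analysis works:

```python
# Bradford-style zone analysis on hypothetical per-journal article counts
# from a search results list (all figures are invented for illustration).
counts = sorted([120, 95, 80, 40, 30, 25, 20, 15, 12, 10, 8, 7, 6, 5, 5,
                 4, 4, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1], reverse=True)

total = sum(counts)
target = total / 3          # each Bradford zone holds roughly 1/3 of the articles
zones, zone, running = [], [], 0
for c in counts:            # walk the ranked list, cutting a zone at each third
    zone.append(c)
    running += c
    if running >= target and len(zones) < 2:
        zones.append(zone)
        zone, running = [], 0
zones.append(zone)

# The "core" zone needs far fewer journals than the middle and outer zones.
print([len(z) for z in zones])
```

With data like this, the journal counts per zone grow sharply from core to tail, which is the concentration/scatter pattern the ANALYZE-style tools make visible without manual drudgery.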
Linking to open access sources is not only much more common in Scopus than in most other commercial databases, but explicit (not hidden) links often appear in cited references, a precious asset. It would be useful to have an option to download records in a tab-delimited or comma-and-quote-delimited format with data elements selected by the user (including the Cited By score, even though it changes dynamically). WoS offers a perfect model for this, and anyone who would like to upload results lists into a spreadsheet for bibliometric analysis and other post-processing purposes would appreciate it.
Browsing and Searching
It is not a good sign that I have collapsed these two criteria groups into one. Indeed, I can't tell much about browsing in Scopus, because at this stage the only browsable element is the list of source journals. It is not merely an alphabetic list; it also indicates the number of items included for each year. However, as of early September it is not complete. I will revisit this issue in another review to see whether the journals that currently don't appear in the list (even though they have been processed), or appear with a much smaller yearly intake than they actually have in the database, will have been updated and completed by the official launch.
I hope that other indexes, such as author and journal names, will also be made browsable, both as citing and cited authors and journals. WoS makes browsable the author and group author indexes; the source journal list; the abbreviation list of author affiliations from the General Search template; and the cited author and cited work indexes from the Cited Reference Search template. These browse options are badly needed for comprehensive searches given the volume of variations and oddities in cited references (in addition to blatant errors).
Scopus has three search modes: Quick, Basic and Advanced. The first is a single cell displayed on every page and is very handy for launching a new search in the title, abstract, keywords and author indexes. The Basic Search is a well-designed template with pull-down menus to specify individual or combined indexes for searching, and date ranges, document types and main subject areas for limits. The limits use checkboxes, except for the document type field. By default, the Basic Search searches the same indexes as the Quick Search, except for the author index. You may choose the All Fields option for a sweep-through search, which includes the cited references fields as well.
The Advanced Search template is meant for searching by indexes or attributes that are not offered on the Basic Search template, such as language, DOI number or book (which does not appear on the pull-down menu on the Basic Search template). You must use a command language on this template.
Although there is a small help panel showing the field qualifier codes and some examples, it is intimidating for the average user, partly because of the unnecessarily long field codes, such as AUTHOR-NAME instead of the common AU, and the pre-combined indexes that must be spelled out, such as TITLE-ABS-KEY-AUTH, which will please only those who read German and have become accustomed to mile-long agglutinated words such as Fussballweltmeisterschaftsqualifikationsspiel (their soccer aficionados' way of referring to the qualifying matches for the World Cup).
There are many more indexes than meet the eye on the Advanced Search template. These can be looked up only by opening a long help file, which is not a good idea. The template should offer pull-down lists to pick the index from, instead of requiring the user to type in a label such as REFSRCTITLE when searching by the title of the referenced source document.
It is very useful to have subfield-specific indexes for the cited author, cited year, cited source and cited pages, so that Fortune as a cited journal name can be distinguished from the word fortune in a title, but it does not have to be done in such a user-hostile format. Moreover, there is no adjacency operator in Scopus, so a query like REFAUTH("garfield, e") AND REFPUBYEAR IS 1955 does not guarantee that only articles citing Garfield's seminal 1955 article will be retrieved. Indeed, the search retrieved an article citing the 1955 Yearbook of Anthropology and one of Garfield's articles published in 1984. True, I was fishing for the example, but it is easy to find cases where the Boolean AND is just not precise enough, unlike the SAME operator in WoS, or the s ca=garfield e (s) cy=1955 command in Dialog's command-driven version of the ISI citation indexes, which specifies that the two search criteria must appear in the same occurrence of the repeatable cited reference field. To be fair, very few systems can handle this correctly, and not even budding doctoral students doing citation analysis would pull this syntax off the top of their heads.
There are no arithmetic operators to search for date ranges, such as PY=1993-1996. Ranges must be spelled out in a rather cumbersome way, as PUBYEAR AFT 1992 AND PUBYEAR BEF 1997. Not even this option is available for searching by a cited publication year range, which I needed in some of my testing.
I'll reserve my opinion about the Related Items feature, which is meant to find articles that share cited references with a record the user chooses from the results list. It is perfectly implemented in the new edition of WoS, and Scopus representatives promised that their implementation of this worthy feature would be up and running by launch time.
Scopus handles truncation automatically, i.e. "search," "searched," "searches" and "searching" all retrieve the same records, although "searcher" does not. For such cases, explicit truncation symbols for single and unlimited characters are available.
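The behavior described, automatic stemming that conflates the common inflected forms but not a derived form like "searcher", together with explicit wildcards for the remaining cases, can be mimicked with a toy suffix-stripper and a regular expression (a sketch of the general technique, not Scopus's actual algorithm):

```python
import re

def naive_stem(word):
    """Toy stemmer: strip common inflectional suffixes, but not derivational
    ones such as -er, so 'searcher' stays distinct (as in Scopus)."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

terms = ["search", "searched", "searches", "searching", "searcher"]
stems = {t: naive_stem(t) for t in terms}
print(stems)  # the first four collapse to 'search'; 'searcher' does not

# An explicit unlimited-truncation symbol (here, 'search*' as a regex)
# sweeps in 'searcher' as well.
pattern = re.compile(r"^search.*$")
print([t for t in terms if pattern.match(t)])  # all five terms match
```

The design trade-off is the usual one: automatic stemming boosts recall without user effort, while explicit truncation symbols leave the searcher in control when the stemmer's conflation is too narrow (or too broad).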
All in all, Scopus is impressive and offers another excellent tool for searching by cited references in a huge collection of scholarly sources. Its very smart features for presenting results will help spread the gospel of citation searching, which may delight Eugene Garfield, President Emeritus of ISI, even though Scopus undoubtedly enters a territory that until recently was the exclusive turf of ISI. As for Isaac Elsevier the printer and Lowys Elsevier the founder, they would see Scopus as flying high that original banner with the motto Non solus, by not merely aggregating but smartly integrating various Elsevier assets. We shall see what this success for Elsevier, and this power and convenience for users, will mean financially to libraries, librarians and their patrons.