Title: Scitation (beta version)
Publisher: American Institute of Physics
URL: http://scitation.aip.org/labs/features/searchbeta/search.jsp
Cost: Partially open access
Tested: September 20-30, 2009
The beta version offers some useful additional options, such as result clustering and quick filtering by descriptors, classification codes, journal names, authors, publication year, etc. But the search is still limited to bibliographic metadata, excluding the text of the papers, whereas open access full-text searching is now the norm in almost all societies' archives. The problem is aggravated by the fact that Scitation is also the host and digital facilitator of journals and conference proceedings of other societies and associations.
For many years I have structured my reviews in this column in three main sections (context, content, and software) to organize my pro and con arguments about the databases. I found the context section especially necessary, because a database can be judged objectively only if the reviewer knows the other related databases and/or the implementations of the same database on different software platforms, such as the various implementations of the PsycINFO database through ProQuest, CSA, OCLC, Ovid (and for a time SilverPlatter), or, most recently, on the American Psychological Association's site. Another reason I feel the need to put a database in context is that its content or software may have been novel or even unique several years ago, but by now it may not even keep up with similar databases in its peer group. This is the case with Scitation.
I have reviewed the Scitation database twice before in the Peter's Picks and Pans column in Online. After its debut (under the name OJPS, for Online Journal Publishing Service), it was my pick in 2003 because it was among the first to offer open access searching of bibliographic data (metadata), at a time when that was a novelty.
In 2008 I panned it, because it had not kept up with the trend of offering open access full-text searching, which has become the norm among the best scholarly publishers, most notably in physics and its sub-disciplines, such as astrophysics and high-energy physics.
The PROLA database of the American Physical Society, the journal archive of the Institute of Physics, and the Astrophysics Data System (ADS) of NASA hosted by Harvard University run circles around Scitation of the American Institute of Physics (AIP) by virtue of offering full-text searching free of charge. The open access ADS (which, in spite of its name, covers all sub-disciplines of physics and mathematics) stands out with nearly 8 million bibliographic records, most of them with abstracts.
Many of these services offer the option of viewing, printing or saving some full-text documents free of charge directly from their own journals, and more than half a million papers through the arXiv preprint archive.
The note to the beta version of Scitation (which was released eight months ago) promised that the search engine would undergo continuous changes and improvements, and that the changes would be recorded in a change tracking log. The log has remained completely empty.
The AIP claims that Scitation has more than one million records, but my test showed that the beta version has more than two million records, and the official version has 1.92 million records. Of course, two million is more than one million, but it would be in the best interest of AIP (as a journal publisher, and as a digital facilitator of other scholarly publishers) to update the information about the size of the Scitation database. Two million records about scholarly articles and conference papers in physics is an impressive number.
A little over a quarter of the two million records are for journal, magazine and conference papers published by AIP. The rest are from journals and conference proceedings of other publishers whose digital libraries are hosted on the Scitation platform.
There are 12 AIP journals, two AIP magazines and the AIP Conference Proceedings (which include hundreds of proceedings from various series, such as those on Astronomy and Astrophysics, Plasma Physics, Nuclear and High Energy Physics, etc.). This matters all the more because some AIP journals have been top-ranked in their physics subcategories in the Journal Citation Reports, both by absolute citation counts and by the recently introduced Eigenfactor score.
The only comparable publisher of journals and other periodicals in physics is the American Physical Society, with nearly half a million papers (all of them full-text searchable on the native APS site, but not through Scitation, in which APS is a partner). For comparison, the respected Institute of Physics Electronic Journal archive has 150,000 papers, all of them full-text searchable.
The other half of the records are for papers published by many respected publishers beyond AIP and APS, which makes for quite good company of partners and clients. The list of AIP partners/clients includes the Acoustical Society of America (ASA), the American Astronomical Society (AAS), the American Society of Civil Engineers (ASCE), the American Society of Mechanical Engineers (ASME), the Electrochemical Society (ECS), the Institution of Engineering and Technology (IET), the Society for Industrial and Applied Mathematics (SIAM), the Society of Exploration Geophysicists (SEG) and SPIE (which is not an acronym, but is the name of the society for optics and photonics), to mention only the most widely known societies that have a substantial percentage of their journal and conference paper records available through Scitation. An odd client in the group, given Scitation's emphasis on science and technology, is the American Accounting Association, with 15 journals. Quite tellingly, while it appears (under "the") in the alphabetical list and the publisher list of Scitation's component databases, there is no entry for it in the category list, and it cannot be chosen from the pull-down category menu, which otherwise indicates the scope of Scitation's coverage well.
No matter how good and wide Scitation's source base is, its depth of indexing is inadequate because of the lack of full-text indexing. The advantage that the bibliographic records of several physics publishers' digital archives can be searched in one fell swoop is no longer unique.
The free Scirus system from Elsevier offers cross-archive searching of many of the same publishers' digital archives plus others, such as those of Elsevier, the IoP, SAGE and Royal Society Publishing, as well as 20 other preferred Web sources (renowned institutional repositories and disciplinary preprint archives), all of them with the option to extend the search to the full text, to the tune of 56.5 million records. This figure does not include the nearly 287 million Web pages indexed by Scirus, as these represent a medley of documents of widely varying quality (but they can easily be excluded from the search). Then there are the new free, science-oriented federated search engines from the government and from private entrepreneurs, such as WorldWideScience and Scitopia, which partially overlap Scitation and which I will review soon.
The software in the beta version has not been extended to the full text of the documents. The broadest search is still on the full bibliographic record. This is the biggest disappointment. AIP does a disservice not only to its own journals and conference proceedings but also to its partners/clients. The difference is easy to see when searching a partner's digital archive through Scitation versus through the partner's own native search engine. My example is PROLA of the American Physical Society, but it could be any of the other partner/client databases of Scitation.
The search for "cryptography AND quantum dots" finds 175 hits through the native search engine of PROLA, using the full-text index and limited to the Physical Review journal family. It finds only 5 hits when searching the same journal family through Scitation using the broadest index available there, the full bibliographic record index. When the search is limited to the title or abstract index, both versions bring up a very similar number of hits, clearly indicating that Scitation covers the entire PROLA database except for the two Physical Review Special Topics series. The gap in the first test therefore comes from the missing full-text index, not from missing records.
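To make the mechanism behind this gap concrete, here is a toy sketch in Python. It is not how PROLA or Scitation work internally; the three records, the field names and the plain substring matching (real engines tokenize and rank) are all hypothetical assumptions. The same Boolean AND query is run once against the metadata fields only and once with the full text included.

```python
# Toy illustration only: hypothetical records, not real PROLA or Scitation data.
RECORDS = [
    {
        "id": "A",
        "title": "Quantum dots as single-photon sources",
        "abstract": "We characterize emission from self-assembled quantum dots.",
        "full_text": "Applications include quantum cryptography and sensing.",
    },
    {
        "id": "B",
        "title": "Quantum cryptography with entangled photons",
        "abstract": "A key distribution protocol is demonstrated.",
        "full_text": "Sources based on quantum dots could replace the laser.",
    },
    {
        "id": "C",
        "title": "Quantum dots for cryptography",
        "abstract": "Single quantum dots enable secure key distribution.",
        "full_text": "We discuss the cryptography protocol in detail.",
    },
]

# Assumed field sets: a metadata-only index vs. one that adds the full text.
METADATA_FIELDS = ("title", "abstract")
FULL_TEXT_FIELDS = METADATA_FIELDS + ("full_text",)


def matches_all(record, terms, fields):
    """Boolean AND: every term (or phrase) must occur in the searched fields."""
    text = " ".join(record[field] for field in fields).lower()
    return all(term.lower() in text for term in terms)


def search(terms, fields):
    """Return the ids of records that match the query in the given fields."""
    return [rec["id"] for rec in RECORDS if matches_all(rec, terms, fields)]


query = ["cryptography", "quantum dots"]
print("metadata only: ", search(query, METADATA_FIELDS))
print("with full text:", search(query, FULL_TEXT_FIELDS))
```

Only record C matches on title and abstract, while A, B and C all match once the full text is searched: papers that mention one of the terms only in the body are invisible to a metadata-only index, which is the pattern the PROLA comparison shows at scale.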
On the positive side, the beta version of Scitation has been enhanced by comprehensive clustering. In the sidebar of the screen, the software shows the top five keywords, publication years, PACS (Physics and Astronomy Classification Scheme) codes, authors, journals and article types found in the result set produced by the search.
It instantly shows the most commonly co-occurring metadata elements, which in turn can be used to filter the search by limiting it to one or two of the searcher's favorite journals or authors, to a document type, or to a particular aspect of the topic, such as radiation pressure or atom-photon collisions in laser cooling. The clustering is made even more useful by the display of the hit count for each cluster element within the result set, which predicts how much the result will shrink if that element is used for filtering.
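The sidebar behavior described above amounts to faceted counting over the result set. The Python sketch below is a hedged illustration, not a description of how Scitation computes its clusters: the record layout, the field names and the placeholder journal and code values are all hypothetical. For one metadata field it counts how many hits carry each value, keeps the top five, and checks that the count shown next to a cluster element equals the size of the result set after filtering on that element.

```python
from collections import Counter

# Hypothetical result set; journal names and code labels are placeholders,
# not real Scitation records or PACS codes.
RESULTS = [
    {"journal": "Journal A", "year": 2008, "pacs": ["P1", "P2"]},
    {"journal": "Journal A", "year": 2009, "pacs": ["P1"]},
    {"journal": "Journal B", "year": 2009, "pacs": ["P2", "P3"]},
    {"journal": "Journal C", "year": 2007, "pacs": ["P1", "P3"]},
    {"journal": "Journal A", "year": 2007, "pacs": ["P3"]},
]


def field_values(hit, field):
    """Return a field's values as a list (some fields hold several codes)."""
    value = hit[field]
    return value if isinstance(value, list) else [value]


def top_facets(results, field, k=5):
    """Count the hits carrying each value of `field`; return the k most common."""
    counts = Counter()
    for hit in results:
        # dict.fromkeys drops duplicate values, so each hit counts once per value.
        for value in dict.fromkeys(field_values(hit, field)):
            counts[value] += 1
    return counts.most_common(k)


def filter_by(results, field, value):
    """Drill down: keep only the hits whose `field` contains `value`."""
    return [hit for hit in results if value in field_values(hit, field)]


for value, count in top_facets(RESULTS, "journal"):
    # The displayed count equals the size of the result set after filtering.
    print(f"{value} ({count}) -> {len(filter_by(RESULTS, 'journal', value))} hits")
```

Because each count is computed over the current result set, it tells the searcher in advance exactly how many items will remain after clicking that cluster element.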
This clustering/filtering option could be improved only by letting users, such as graduate students not yet familiar with the sophisticated, structured PACS codes, display the text equivalents of the codes in the cluster. Allowing the display of the next five most commonly associated keywords (and other elements) for the query would also be a great help in optimizing the results.
The result list is very clearly presented, with an almost optimal font type, size and color. I would choose blue for the title (as that takes the user to the more detailed record with the abstract), and use bold for the publication year rather than the volume number, as the former is far more informative to the vast majority of users when they ponder whether a paper is current enough.
True, the results can be sorted by three criteria (relevance, most recent, oldest), and choosing relevance is intuitively the best compromise among them. Still, it would be very useful to show the citedness count of the items and to offer a sort option by it. This is an increasingly popular criterion for selecting items from a long list of results, and it is implemented in more and more databases, such as ADS, PROLA and the SPIRES High Energy Physics databases. An option to mark preferred items is also much needed, so that searchers can retain on the list only the items that are pertinent to them.
Although I like this format, an option to save the result list in one or more of the standard bibliographic citation formats would be much appreciated by students and young researchers who are not yet familiar with the vagaries of the excessive number of citation formats.
The lack of full-text searching in the two million record collection hosted on the Scitation platform is hard to understand. The expensive Verity software could certainly handle that with aplomb. The much larger storage area needed for full-text indexing is not an issue when 1 terabyte disk drives are available for about $100. It seems to be simply a bad management decision, depriving users of the chance to discover items whose search terms may not appear in the title, abstract or descriptor fields of the bibliographic record, but are used frequently by the author in the full text and are obviously relevant to the topic.