Reference Reviews

Péter's Digital Reference Shelf

June 2008

Title: Scirus
Publisher: Elsevier
URL: http://www.scirus.com
Cost: free
Tested: June 23 – July 9, 2008

The Context

Scirus stands out among the few multidisciplinary science-oriented databases that are freely available on the Web by virtue of its capable and intelligent software (see the exception later), and its deep indexing and searching of more than 53.5 million scholarly and/or professional documents. Windows Live Academic could never come close to it, with its very modest collection of a few million academic documents and its highly unintelligent software, and it was shut down a month ago, getting the badly needed coup de grâce to put it out of its misery after two years of disgrace. Google Scholar has significant content and is excellent for discovering a few good articles (and sometimes even for identifying a free version of some of those), but its software does a brutally primitive parsing of the documents made available by most of the scientific publishers, now including Elsevier. This has serious implications for known-item searching, and for taking its reported hit counts and citation counts seriously. Unfortunately, the developers of Google Scholar did not jump on the opportunity of free access offered to them by Elsevier. They could have replaced the records for Elsevier papers that they had gathered from indexing/abstracting databases and digital repositories, or scraped from the citations given to papers in Elsevier journals by papers in the journals of other publishers that had offered Google Scholar full-text access to their collections (for indexing and for showing bibliographic metadata with abstracts free to the users).

Only a fraction of the more than seven million records for papers published in Elsevier journals are from the ScienceDirect database (all of which are fully searchable through Scirus, showing directly and free much more information than Google Scholar’s existing full records and skeletal citation records).

I recently revisited Google Scholar for an overall evaluation and for an analysis from the perspective of using Google Scholar for calculating the increasingly popular h-index, so I am not going into further details here about its serious problems of illiteracy and innumeracy.

Another alternative, the Scitation database of the American Institute of Physics, is very small. It offers fewer than 1.6 million records for all the items in the journals and some other serials of AIP and its hosted clients. More importantly, it does not index the full text but only the metadata of the collection hosted on the Scitation platform. The same limitations apply to the relatively new Scitopia database, a joint project led by AIP and IEEE, but at least it offers information about somewhat more than 3 million records. See more details about Scitopia in my most recent review.

I have not finished the testing of the WorldWideScience database and federated search engine, so I hold my opinion for a forthcoming column. The same is true for the more promising use of the same software by its developer (Deep Web Technologies) in the ScienceResearch portal.

Some of the largest scholarly publishers, such as Springer or Taylor & Francis, are multidisciplinary and do full-text indexing of the papers published in their journals, but they are not in the same category as Scirus even when we consider only their directly comparable journal subsets. For the first time, I do not mention Blackwell’s Synergy database among the best publishers’ digital collections, because John Wiley & Sons went ahead with its plan after its acquisition of Blackwell, and discontinued the Synergy service on July 1. Wiley InterScience has been an inferior site from day one. I have criticized it long and harshly. I liked Blackwell’s Synergy, and I was dreading this moment, as you can read in the July-August 2008 issue of Online, for which I submitted my piece in February.

Suffice it to say here that as I am working on this column on the first day after the long weekend of Independence Day, my e-mail is full of complaints from the SERIALST listserv about the royally botched transition which made hundreds of Blackwell titles inaccessible for subscribers.

I hope that Wouter Gerritsma, a librarian who runs the informative Wouter on the Web (WoW) blog (mostly about scholarly and pseudo-scholarly databases), is not the only one who read my early warnings about the likely consequences of the merger for the Blackwell content under Wiley InterScience software, and that many did something for disaster preparedness. Assigning the transition to the team that had shown its competence and care for years was as smart as the U.S. sending “heck-of-a-job-Brownie” to organize the assistance provided to the victims of the tropical cyclone in Burma.

The Content

Scirus has three major components. The component names are not perfect, and may even confuse users, but I stick to them for reference purposes. The first component is called Journal Sources, the second one is Preferred Web Sources and the third one is Other Web Sources. Of course, all the journal sources also are Web sources, and they are certainly preferred Web resources for the majority of searchers.

Most of the sources are publishers’ full digital journal collections, except for two databases: PubMed (which is an indexing/abstracting database), and the fast-growing, splendid PubMed Central, which is a full-text repository. On top of it all, the content of PubMed Central is open access, although a small part is delayed open access.

Some are delayed for three years, but it is still a treasure, rivaled only by the collection of HighWire Press, the most sophisticated digital facilitator, which has 4.8 million articles from many of the best known scholarly publishers, including Oxford University Press. With nearly 2 million of those articles available free, it is ahead of PubMed Central, which focuses on life sciences, and in that disciplinary area it is the best.

Journal Sources

The first component is the most valuable as it includes copies of published papers that are reviewed by peers and/or the journal editors. (Much emphasis is given to peer-reviewed versus editorially reviewed papers in academia. I am not that convinced about the flawlessness of even double-blind peer-reviewed papers. On the other hand, the comments, corrections and suggestions made by the editor of Online magazine, Marydee Ojala, with her in-depth and very current knowledge about the professional searchers’ arena, have been at least as good, useful and face-saving as those I get or give for peer-reviewed manuscripts. At the same time, I do read peer-reviewed papers that have important factual errors.)

The Journal Sources component includes information about items from the digital collections of some of the most important publishers. Using the new software feature of Scirus, one can get a quick feel for the publishers covered, and for the scope of the worthy science coverage by Scirus. This subset has information about nearly 27.8 million journal/proceedings papers.

By far the largest is Elsevier’s ScienceDirect database, with more than 7.6 million items as of early July 2008. More importantly, coming through the backdoor of Scirus, the full text of all these articles can be searched free, whereas on the native ScienceDirect site this option is available only to subscribers; guests can only search by author, journal name and words in the title and abstract.

The implications of this are huge. Searching for the phrase “anterior spinal fusion” yields 26 hits in the native ScienceDirect search for guest users, while through Scirus (where everyone is a guest user) there are 147 hits matching the query. These extra 121 hits come from the full text of the journal articles in ScienceDirect. True, not all of them are about anterior spinal fusion, and some may mention this technique only in passing or in the cited references, but certainly most of them have some relevant information about the procedure.

The same advantage applies to the AIP collection when it is compared with searching through AIP’s own Scitation software. Oddly, Scitation does not allow full-text searching even for subscribers of the AIP serial publications, but Scirus does offer this option. Again, the implications are significant, and in this case there is no other alternative. The search term “cold fusion” finds 27 hits from AIP journals and proceedings through Scitation, while Scirus finds 47 hits from the same sources.

It is to be noted that Scirus covers only the AIP publications within the Scitation database (which is obvious from the fact that it has 431,857 items from that source), while Scitation includes the digital journal archives of several other associations hosted by AIP, to the tune of nearly 1.6 million records. In this case I restricted the search to AIP’s own publications by using the Digital Object Identifier of the American Institute of Physics.

Scirus has information about nearly 426,000 papers published by The American Physical Society. It further strengthens the physics emphasis of Scirus that it has information about nearly 254,000 papers from sources published by the Institute of Physics.

Information about 422,000 items in journals published by the Nature Publishing Group contributes to the well-roundedness and multidisciplinary nature of Scirus. With that said, you must realize that only a small portion of these items are genuine research articles by genuine researchers; the rest are short news items by roving reporters of Nature who are filing news about avian flu on one day, and skeletons in the medical cupboards the next day, often with headlines that would make reporters of the National Enquirer salivate. If there is no important scientific event to report about, the reporters of Nature can fall back on writing something about Google, which is a perennial universal filler, just as Paris Hilton is for the gossip rags. It is quite telling about this filler invasion that the search for Google in the Journal Sources subset finds 36,129 hits, and 32,119 are from sources published by the Nature Publishing Group (and syndicated by a zillion other sources because of the Nature name).

The rest of the journal sources are from smaller but well-respected traditional and new publishers, usually specializing in a specific discipline, such as Project Euclid in mathematics, Maney Publishing in arts and humanities, and BioMed Central in life sciences.

Royal Society Publishing is one of the oldest journal publishers, and in spite of its description in Scirus, a significant part of its digital paper collection is open access, and the research papers (as opposed to editorials) are open access without any delay.

The size of the PubMed Central collection is suspiciously small in Scirus. One reason may be that open access papers available through Royal Society Publishing, BioMed Central and IoP are not counted in PMC to avoid double counting, but not even this would justify a nearly 600,000 item difference. Delay in indexing materials added to PubMed Central may be another factor, because it grows at the pace of a teenager, and will soon reach the 1.6 million mark when searched directly.

Preferred Web Sources

This component has more than 25 million items. With one exception, these come from a variety of digital repositories at global, national, institutional and topical levels: pre-print and re-print services for scientific articles, theses and dissertations. The exception is the huge, 22.6 million item patent information subset from LexisNexis, which is part of the Reed Elsevier emporium. It is a perfect example of the pros and cons of federated searching.

Scirus indexes and searches across the files of the largest patent offices of the world, and spares the user the trouble of systematically jumping to each one of them to run the same query. On the other hand, only the same generic bibliographic fields can be used for each resource, even though patent records have the richest set of metadata elements, exactly for the purpose of restricting the search to, say, specific dates of filing, pre-granting, granting and issuing of patents. There is not much to complain about, however, because users are taken to the original databases, where they can further refine their initial generic query to their heart’s content without the tedium of searching multiple databases one by one.

These are very useful resources, and there are more of them than meet the eye, as the Digital Archives segment itself includes institutional repositories of more than 1.5 million documents from the repositories of more than 100 universities and research institutes.

Other Web Sources

This is by far the largest component with nearly 417 million items out of the entire volume of 470 million items in Scirus. I still have some reservations about one subcategory of this component, but I am not nearly as concerned as I was seven years ago when Scirus debuted, and had a much larger proportion of information about trash Web sites than now.

Filtering out such sites and pages is difficult, because the seven words that the late comedian George Carlin identified as the ones you can never say on TV pose a difficult problem. There are many scholarly articles that include one or more of these and other four-letter and longer words as part of transcripts of patient interviews or subjects’ search logs. The abstract of a recently published scholarly article about disinhibition in Web-based chat systems shows a good (and safe to click) example of this. Some of the seven words may even happen to be the names of authors in languages other than English. Limiting the indexing to Web pages on sites within the edu domain does not help, because there are students who are incapable of normal communication in a face-to-face situation, and can hardly wait to let their frustration and expletives loose on their Web sites ending with the edu suffix. There is now enough information in Scirus about worthy papers and Web sites that it more than compensates for the inevitable sicko sites.

The Software

Earlier this year Scirus enhanced its already good software by adding a sidebar that instantly shows the distribution of search results among the top three categories and among the uniquely identified sources within those categories. For example, searching the full documents about the h-index, the new, combined measure of scholarly publishing output and impact of teaching and research faculty, developed by Professor Jorge Hirsch, shows the distribution of the more than 45,000 hits.

This is suspiciously high, even if this measure has become very popular since its introduction in 2005. The search can be easily limited to the year range of 2005-2009, and as opposed to Google Scholar, the limit does work in Scirus as expected, reducing the result set to 38,227 hits. It is more telling that the matching number of articles in the Journal Sources segment went down from 729 to 216, and in the Preferred Web Sources segment from 1,102 to 344.

The result is still too large in these two components, probably because the term h-index is used for other measures as well. Theoretically we could limit the search to one of the subject areas, such as computer science, but it would backfire, simply because the h-index is used to measure scholarly publishing productivity and impact in many fields, including economics, medicine and physics, and if computer science were not also assigned to such papers, they would not be retrieved.

A logical next move could be to include the last name of the developer in the full-text query as his name is certain to be present in most of the papers, if nowhere else than in the cited references of the academic papers.

However, here comes an unexpected and unacceptable twist and blow. Adding the name of Hirsch produces 62,057 hits, more than the initial search for “h-index” did. Scirus ignores the explicit Boolean AND operator, which should not even be needed because a space between words implies a logical AND.

This is like using Google Scholar, and if Scirus stoops to that level of software capabilities, it loses its forte. Librarians would also be unhappy, as there goes their effort to explain the implications of the Boolean AND, OR and NOT operators time and again.

The hyphen may have discombobulated Scirus, but it should not have. Actually, the query h-index with and without quotation marks (to indicate an exact phrase) retrieves the same number of hits, so that is not an explanation. I had never seen this in Scirus, so I tested whether this nonsense happened in plain word queries. The search phrase “impact factor” finds 193,000 hits. Adding the word journal (outside of the phrase) reduces it to 115,409, as it should. So hopefully, this is only a temporary mental block triggered by something unusual about the hyphenated word with a single character in front of the hyphen.
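Why a growing hit count signals a fault can be shown with a toy inverted index. The sketch is purely illustrative: the document ids and the helper names search_and and search_or are made up, and nothing here reflects how Scirus is actually implemented. It shows only that an AND can never return more hits than either term alone, while an OR can.

```python
# Toy inverted index: term -> set of document ids (all data here is made up).
index = {
    "h-index": {1, 2, 3, 4, 5},
    "hirsch": {2, 3, 6, 7, 8, 9},
}


def search_and(terms, index):
    """Return ids of documents that contain every term (Boolean AND)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def search_or(terms, index):
    """Return ids of documents that contain at least one term (Boolean OR)."""
    return set().union(*(index.get(term, set()) for term in terms))


base = search_and(["h-index"], index)
narrowed = search_and(["h-index", "hirsch"], index)
widened = search_or(["h-index", "hirsch"], index)

# AND can only shrink (or keep) the result set; OR can only grow (or keep) it.
print(len(base), len(narrowed), len(widened))
```

A count that rises after a term is added is what an OR would produce. Whether Scirus really fell back to OR for this query is only a guess on the basis of the numbers, not something the service documents.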

Scirus kept its earlier good feature (just moved it under the results clusters by category) of showing the most common keywords and phrases co-occurring in documents that have the term h-index, in order to refine the query. These clearly indicate that the majority of results are relevant. These terms may be picked from the list and are then added to the query. However, if more than one is picked, each will be AND-ed to the previous one. What users would need is a check-box to mark the candidate terms, and then a way to indicate whether they are to be added as an OR-group.

This is the most likely scenario, as in this case one would add in an OR relationship some of the terms shown, such as bibliometrics OR citation analysis, and connect them in an AND relationship to the previous search term, rather than adding bibliometrics AND citation analysis, which is what the current term-picking method accommodates.
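The difference between the two ways of adding picked terms can be sketched as a small query builder. This is a hypothetical helper, not a Scirus feature: the function names quote and refine are made up, and the use of parentheses for grouping is an assumption about the query syntax.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as a phrase."""
    return f'"{term}"' if " " in term else term


def refine(base_query, picked_terms, mode="OR"):
    """AND a group of picked terms to the base query.

    mode="OR" joins the picked terms as alternatives (the wished-for behavior);
    mode="AND" requires all of them (what one-by-one picking amounts to).
    """
    if not picked_terms:
        return base_query
    group = f" {mode} ".join(quote(term) for term in picked_terms)
    if len(picked_terms) > 1:
        group = f"({group})"
    return f"{base_query} AND {group}"


picked = ["bibliometrics", "citation analysis"]
or_query = refine("h-index", picked, mode="OR")
and_query = refine("h-index", picked, mode="AND")
print(or_query)
print(and_query)
```

The OR form keeps documents that mention either refining term, while the AND form drops every document that lacks one of them.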

There is an excellent software feature in Scirus (not available in Google Scholar) that makes refining your search in a different way very efficient. The problem is that Scirus still hides it from the users. It is clear from the pull-down menu that you can limit the search to the title of the article, the journal, the author’s name, affiliation, keywords, ISSN and part of the URL, all of which are very useful and quite unique in free services.

The hidden feature is that it is also possible to limit the query to the abstract field, which is an ideal way to narrow a search that produced too many records. If the term appears in the abstract, chances are much higher that the topic identified by the term is the focus of the article. In our last case the word journal AND the phrase “impact factor” yielded 115,409 hits; limiting the search to the abstract field by using the abs: prefix yields an order of magnitude fewer but more relevant hits: 13,436. The result can be narrowed further by putting all three words in a phrase, i.e. “journal impact factor”, which yields 118 hits. If you wish to increase the result just a bit, use the truncation symbol after the word factor; Scirus is intelligent enough to accept it within a phrase. Try any of the above moves in Google Scholar and you’ll understand why it is far from being the sharpest knife in the drawer when it comes to advanced searching.

There is one area where Google Scholar does something better than Scirus. It collapses different instances of the same record from different sources (indexing database, pre-print server, publisher archive) into one. Google Scholar does not always get this right or do it consistently, but Scirus never does it, and there are many duplicates from different sources, like for this journal article. Such duplicates inflate the hit counts, and make the result list redundant and longer than necessary.
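How such collapsing might work can be sketched in a few lines. This is a minimal illustration, assuming that duplicates can be matched on a normalized title plus the publication year; the sample records and the function names normalize and collapse are made up, and neither Google Scholar nor Scirus is known to work this way.

```python
import re


def normalize(title):
    """Lowercase the title, replace punctuation with spaces, squeeze whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def collapse(records):
    """Merge records that share a normalized title and year, keeping all sources."""
    merged = {}
    for rec in records:
        key = (normalize(rec["title"]), rec["year"])
        entry = merged.setdefault(
            key, {"title": rec["title"], "year": rec["year"], "sources": []}
        )
        entry["sources"].append(rec["source"])
    return list(merged.values())


# Made-up records: the first two describe the same paper from different sources.
records = [
    {"title": "An Example Paper: Part I", "year": 2007, "source": "publisher archive"},
    {"title": "An example paper - Part I", "year": 2007, "source": "pre-print server"},
    {"title": "A Different Paper", "year": 2008, "source": "indexing database"},
]

unique = collapse(records)
print(len(records), "records collapse into", len(unique))
```

Matching on title and year alone is crude and would wrongly merge distinct papers with identical titles, which hints at why a real service does not always get it right.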

Then again, Scirus is much better at providing the abstracts, or a substantial part of them, than Google Scholar. This is true not only for articles published in Elsevier journals but also for papers extracted from many other publishers’ sites that clearly identify the abstract with a metadata label. Google Scholar ignores such labels, and picks irrelevant words from the menu options on the screen, like a cheap screen scraper program. It is a constant irritation when I see hits from the excellent ACM Digital Library through a Google Scholar search, where the standard menu texts take up much of the space set aside by the Google Scholar developers (in theory) for the summary.

Items can be selected from the result list conveniently in Scirus. If there are many records on a page that you want to e-mail, save, or export to bibliography management software, you may choose to select all of them through one click, and then deselect the ones that you don’t need. This is a much better approach than clicking on 15 of the 20 items displayed on a page. The number of items per page (10, 20, 50 or 100) can be specified in the preferences. Strangely, for the save operation there is a 25-item limit, but this can be overcome by e-mailing the selected results to yourself, or displaying them in the browser. Beyond the standard plain ASCII text format, the results may also be exported in RIS format, one of the standards for bibliographic data.
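For readers who have not met the RIS format, the following sketch writes one record as tagged lines. It is only an illustration: the record is a made-up placeholder, the function name to_ris is hypothetical, and the selection of tags is an assumption, since I did not check which tags the Scirus export actually emits.

```python
def to_ris(record):
    """Format one journal article as RIS: a two-letter tag, two spaces, a hyphen, a space."""
    lines = ["TY  - JOUR"]  # record type: journal article
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    lines.append(f"TI  - {record['title']}")
    lines.append(f"JO  - {record['journal']}")
    lines.append(f"PY  - {record['year']}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


# Placeholder record, not a real citation.
sample = {
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "An Example Paper",
    "journal": "Journal of Examples",
    "year": 2008,
}

ris_text = to_ris(sample)
print(ris_text)
```

Because every field sits on its own tagged line, bibliography management software can import such a file without guessing where the title ends and the journal name begins.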

The clustering of results mentioned earlier is also very useful when going for the jugular, when you must have the full text without wondering if your library has a subscription to the digital version for the particular volume of the particular journal. Practically all the items in the Preferred Web Sources category are open access, and this is true also for 95% of the papers in PubMed Central and BioMed Central in the Journal Sources category.

Scirus has been progressing and expanding steadily since its debut in 2001. With the significant additions of tens of millions of high quality scholarly papers, it is an outstanding source even for users whose libraries don’t have subscriptions to many digital journal packages. It is a very good free federated search engine for those who are looking for scholarly publications, and can live with finding a few far less than scholarly pages in the process.
