Péter's Digital Reference Shelf

May 2008

Title: Scitopia
Publisher: Alliance of Scientific Societies
URL: http://www.scitopia.org
Cost: free (for indexing/abstracting records)
Tested: May 15-31, 2008

The Context

It was about a year ago when Scitopia was announced (reported by Barbara Quint) and went into a public beta test. I planned a review in October, but a—not surprisingly—informative review by Joe Murphy was published in the Charleston Advisor, so I postponed my review to see what would happen in the next few months. I refer readers to Joe’s review as I focus on issues that are new, still problematic and/or have been corrected.

A decade ago when the ready reference tools in most libraries consisted of subscription-based indexing and abstracting databases, many of us would have been very happy with a free service like this. By mid-2008, however, our expectations have much increased. In many universities, you will find at least one of the largest scientific full-text databases, such as Academic OneFile from Gale, part of Cengage Learning (27 million records), Research Library from ProQuest (8.5 million records) and/or Academic Search Complete from EBSCO (16.1 million records). The title of the first clearly and rightly alludes to the convenience of one-stop searching, but we must realize that no single-source searching can be complete as the name of the EBSCO database suggests, which is, by the way, barely more complete than its more modestly named (and priced) Academic Search Premier database (15.7 million records).

These days, free searching of the full text of the digital journal archives of the largest publishers, and of the preprint and reprint repositories, is taken for granted. The problem is that unless the library has a good federated search engine such as Serials Solutions 360 Search, MetaLib or Muse Global, covering all or most of the library’s digital resources, users have to keep hopping like kangaroos from database to database, and they just don’t do that. Even the most caring librarians stop after a few hops, each of which also requires entering the query in compliance with different syntax rules. I just tried this after the natural disaster in Burma, looking for information about the mitigation of the effects of cyclones. Even a modest query such as “tropical cyclone*” AND (predict* OR forecast* OR alert* OR monitor*) took many hours to run in more than 30 of the most promising digital journal archives, reprint and preprint repositories, and indexing/abstracting databases. Then I had to check and cherry-pick the result list, then correct and re-run the searches if the host system’s software did not accept truncation within a phrase, or expected unusual symbols for unlimited truncation. This is where free metasearch services such as Scitopia can help (and could help much more without significant programming efforts, as I will discuss later).

Of course, there is always a compromise with metasearch engines, as they can’t be expected to speak fluently and without an accent the dozens of dialects of the various host systems, and to include the best, but unique features that work only in one or two of the host programs, such as the frequency search criteria in Ovid (FREQ) and Lexis-Nexis (ATLEAST), which often work out far better than any of the secret relevance ranking algorithms. When searching for full text, instead of the yes or no simplicity (or rigidity) of the Boolean AND operation, I find it rewarding to specify how many times my search terms must appear at least in the documents to qualify as a hit.
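To make the frequency idea concrete, here is a minimal sketch in Python. It is not the syntax or the implementation of Ovid's FREQ or Lexis-Nexis' ATLEAST; the function names and the two sample documents are made up for illustration only.

```python
import re

def term_count(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def at_least(docs: dict[str, str], term: str, minimum: int) -> list[str]:
    """Return ids of documents in which term occurs at least `minimum` times."""
    return [doc_id for doc_id, text in docs.items()
            if term_count(text, term) >= minimum]

# Hypothetical sample documents, not taken from any real database.
docs = {
    "a": "Cyclone tracks and cyclone forecasts: a cyclone warning study.",
    "b": "Coastal erosion survey; one cyclone is mentioned in passing.",
}

print(at_least(docs, "cyclone", 1))  # ['a', 'b']  (what a plain Boolean match gives)
print(at_least(docs, "cyclone", 3))  # ['a']       (passing mentions are dropped)
```

A threshold of 1 behaves like the ordinary yes-or-no match; raising it weeds out documents that mention the term only in passing.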

The question in native searching is usually what else search engines offer to help in selecting the most promising documents on the topic searched. (I use exceptionally large numbers of links in this section, in order to take you directly to the search page of the partially or entirely open access service providers, to run some tests and get a feel for the huge differences between the best and worst alternatives). In federated searching, the major issue is what are the possible advanced mode options that all or most of the target systems can understand without excessive, behind-the-scenes tinkering of the search to tailor it for not only different hosts, but also individual databases. I like to limit my search to those items that have cited references, but in metasearching including such criteria would exclude too many otherwise relevant hits, even if the differences between the software platforms to specify such criteria are taken care of by the federated search engine.

The software differences become obvious when using the digital document collections of the digitally most advanced journal publishers, such as Elsevier (through its Scirus service you have more options than as a guest of ScienceDirect), Springer, Oxford University Press, the American Physical Society, ACM and practically any client of the best digital facilitators: HighWire Press, MetaPress of EBSCO and Atypon (which does not have a central search facility, so I link to its clients list, which takes you to the search page of most of its partners).

In these services, anyone can search the bibliographic data and the full text; display, print and/or save the results lists free. Subscription (or ad-hoc payment) is needed only to view the source documents. It is to be noted that most of the publishers (directly or through their digital facilitators) offer some free articles to any searchers (including the two to be mentioned in the next paragraph). In case of HighWire Press and its clients, “some” is a gross understatement: the number of free papers is huge, more than 1.9 million as of early June 2008. This is close to 40% of the nearly 5 million item digital collection of many of the most renowned scientific journals.

There is a reason that my list above did not include two other digital facilitators: IngentaConnect and Scitation of the American Institute of Physics. They still do not offer full-text searching, let alone clustering of the results list by publication year, journal titles, author names, or sorting by citedness to make the search more effective. Both of these two digital facilitators could do it, and two years ago a representative of IngentaConnect told me that full-text searching is coming. It still has not arrived. As for the Scitation hosting platform, its software seems to have been designed from the beginning to limit the searching to metadata, and this has a huge impact on Scitopia for reasons to be discussed.

There are several outstanding open access database producers and digital projects that show the best software features for discovering and ranking content, and often also lead the users to free full-text versions, such as the Astrophysics Data System (ADS), PubMed Central (both the US and UK “editions”) and SPIRES (which was renamed INSPIRE at the end of May). Scirus also deserves credit for making searchable the full text not only of the more than 7 million articles from journals of Elsevier and its imprints, but also of journals of several other large publishers, institutional and subject-oriented repositories and governmental depositories. (I will have a detailed review of it this summer.)

Unfortunately, there are a few publishers, including one of the largest, IEEE, that allow non-subscribers to search only the metadata of its 1.8 million documents through its Xplore service. This is odd because the full text is searchable through some third parties, such as CrossRef (which is powered by Google and gave them the idea to create Google Scholar).

Then there are a few publishers who claim that they have full-text access to documents, but their pathetic search engine freezes when asked to do a full-text search. This has been the case with the sorry Haworth Press, which has been so bad with timely delivery of print journals that the permanent failure of its full-text search function was barely noticed. (If for some masochistic reasons you really want to experience the very poor design and search inabilities of the Haworth Press digital collection, click on this link as even the search cell is hard to find. If you can’t access it, here is a link to my recent review from Online magazine’s Peter’s Database Picks and Pans for some illustrations and comments.)

The good news is that the content assets of Haworth Press were acquired by Taylor & Francis (T&F) earlier this year, finally putting Haworth Press out of its misery and giving hope to libraries (and authors) that the mostly good content will be accessible digitally. The bad news is that the acquisition included Haworth Press’ management and staff and the digital content would not be available for about another year through the much better digital library of T&F.

The Content

Scitopia was born more than a year ago from the cooperation of 12 scientific societies, three patent offices and the Department of Energy. It has three major components: the excellent Information Bridge database of the DOE, patent records of the European, Japanese and U.S. patent offices, and the digital collections of records of scientific societies.

The patent component is important for a minority of users and appropriate only for casual searches to get a feel about patents on a certain topic. Real patent searches require much more sophisticated features, using class codes, a variety of dates (application, issue, re-issue), names (inventor, assignee, agent), that are not available through the Scitopia software.

Information Bridge is a good medley of more than 173,000 full-text documents created by employees and consultants of U.S. government agencies. These include books, software manuals, conference papers, theses, dissertations and, primarily, technical reports on energy-related issues from a variety of agencies, not just the DOE. This and the patent components of Scitopia are the only ones where the full text of the documents, not just the metadata, is searched.

The flagship part of Scitopia is the set of records for journal articles and conference papers published by scientific societies, associations and institutes. Originally, Scitopia had 12 society partners. Although the announcement in April 2007 included the names of 13 societies, there were only 12. The American Institute of Aeronautics and Astronautics was listed on the homepage, and is still listed in the FAQ file, which now identifies 18 partners, but actually there are 17, as the excellent AIAA Digital Library is still not searched by Scitopia. It may not hit every user hard, but Senior Systems Architect and blogger Paul Parkinson will certainly be disappointed when searching Scitopia, as in his blog he identified AIAA as one of the sources particularly important to him. Although, not surprisingly, Scitopia still does not have a single record on aeronautics (or anything else) from AIAA sources, there are 828 hits from 14 other associations, one from the patent office and more than 100 from Information Bridge. The number of hits is limited to 100 per source, which is a reasonable limit, and motivates users to use more than a single word in their queries. It would be appropriate to clearly indicate in the FAQ list that AIAA is not yet available (after all, we are not in GoogleLand, where hit counts and citedness counts are like bluffs at poker tables). It would be good to update the number of societies given on the search template from 17 to 15. While they're at it, they should also upgrade the claim to more than 4 million records, which seems to be the right number, not just a number from thin air.

My estimate of more than 4 million records, based on test queries in almost all of the native versions of the databases, does not even include the patent records. The list of partner associations on the advanced template is not only correct but impressive and functional, allowing the users to limit the search to the associations which are most relevant for them. It is a nice touch that this can be done very easily both prior to and after the search.

The number of records is even more impressive (it would be useful to include it after the names of associations on the advanced template, as it gives a clue to the users). IEEE is by far the largest contributor with 1.8 million records, followed by AIP (456,000+), APS (428,000+), IoP (330,000), SPIE (260,000+), ASA (118,000+), AGU (90,300), SAE (87,000), the Royal Society (63,000), ASCE (51,000+) and ECS (50,000), assuming that all the records accessible on the native sites of the associations are also accessed by Scitopia. SIAM, ASME and the Institution of Mechanical Engineers contribute nearly 90,000 records. I could not test the size of the American Vacuum Society and the Optical Society of America collections through their native search engines. Once again, there is no way to test the size of the component files in Scitopia directly because 100 items are retrieved at most from each source.

Of course, it is as important which society sources are not covered by Scitopia as which are covered. It should be obvious that the absence of for-profit publishers’ collections is a significant limitation, but Scitopia makes it clear that its scope is the collections of science and technology societies. However, within that scope and focus, I would like to see as partners those that are mentioned by Joe Murphy: the American Chemical Society, the American Mathematical Society, the Royal Society of Chemistry (which is promised to be available shortly) and then some, such as the American Society for Information Science & Technology and the Association for Computing Machinery (which is one of the few that offers free viewing of cited references, typically a feature reserved for subscribers), to name a few from my sphere of interest, information and computer science. What I would find especially useful is some of the best science and technology pre-print and re-print depositories, such as the Astrophysics Data System (ADS), including the arXiv repository, which covers most areas of physics beyond astrophysics and regales you with links to an increasing number of free full-text papers among the 7.7 million items for which it has clean bibliographic records. (It also has highly intelligent software, but that would not matter much in federated searching, where the focus is on plain vanilla queries.)

Although it is not spelled out explicitly, the leaders behind Scitopia are apparently IEEE and AIP. IEEE is not merely the largest contributor, but also had the Scitopia name, URL and IP-address registered. As for AIP, many of the databases of the Scitopia partners are hosted on the Scitation platform of the American Institute of Physics. This leadership has its imprint on the software functionality of Scitopia.

The Software

As I mentioned before, IEEE is one of the few science-technology publishers that allows only metadata searching of its huge collection, reserving the full-text search options on IEEE Xplore for subscribers. AIP is one of the few digital facilitators that does not offer full-text searching of journals and conference proceedings in the digital collections of its clients. This has set the philosophy for Scitopia, and it is not a good one.

It does not serve the members of this alliance, who would like to expose their rich content to as many (potential) customers as possible, both to subscribers and to ad-hoc, pay-as-you-go customers. IEEE at least allows full-text search for Xplore subscribers, but AIP’s Scitation clients (and their users) don’t have this option, even though they subscribe to journals and/or conference proceedings of AIP, ASCE and many of the other associations who do have a digital library with the full text in HTML and/or PDF format.

The Advanced search mode of IEEE’s native Xplore search (available only for subscribers), quickly finds 113 papers that include the term “beach erosion” in its huge collection. Certainly, not all of them are ABOUT beach erosion but just mention it in passing in a paragraph or in a cited reference. This comes with the full-text turf, and is expected.

The guest users of Xplore are allowed to search only the metadata, and that brings up 14 hits. Scitopia brings up the same 14 hits because it, too, is allowed to search only the metadata, just as guest users are. But then, the search in regular Google brought up 122 hits, more than IEEE has, because Google treats the bibliographic record, the table of contents entry and the full-text record as three hits. Google will be Google, will inflate the hits, will double count and triple count items as tricky waiters/waitresses do, but publishers (and users) still treat Google as a deity and their best friend, until they get burned in real research projects by taking its numbers at face value.

IEEE Xplore does not use this cheap trick of triple counting, so for bean counters, Google seems to bring the most out of IEEE’s collection. To its credit, Google Scholar behaves nicely in this particular case and does not out-explore Xplore: it actually brings up 104 hits, i.e. 9 items fewer than Xplore and 18 items fewer than regular Google, with none of the duplicates and triplicates that it otherwise regularly includes.

Throughout my test searches, Google Scholar served up its usual oddities and Alzheimerish difficulties in correctly identifying authors. There are two entries in the main Google Scholar result list for the same paper, but I try to celebrate Google Scholar’s effort to identify and tuck under one entry the extra quintuplets. As a seasoned and skeptical Google Scholar searcher, I am not surprised that two versions have only a single author, S Winds, and the rest have shared authorship by S Winds, T Forecasts, and M Hurricanes. I understand. These are possibly descriptors or identifiers such as sea winds, tropical forecasts and maritime hurricanes, and the parsing software of Google Scholar fancied them to be author names. I clicked on a few links to the source records to verify this assumption, and when most of them failed, I gave up, having got the message (indirectly) that this is a dog.

As for AIP, the metadata-only search policy also backfires. The search for nanofluids in the full records reports 39 hits. Through Google Scholar the number of hits for the query nanofluids site:aip.org is 133. True, only about 100 are from AIP journals; the rest are from journals of other publishers hosted on the AIP Scitation platform, such as SPIE and ASME. Still, the odd trait is there: AIP allows full-text search through Google Scholar and regular Google, but not through its very own platform, and implicitly, not through Scitopia, due to its apparent credo of “don’t put your best foot forward”, but let others do it who are not in the Scitopia partnership, like Google and Scirus. Someone should explain to me the logic behind this policy, slowly and tenderly.

More importantly, the software has three significant limitations that can have a crippling effect on the search results. One is that Scitopia still can’t do Boolean OR correctly. Joe Murphy brought this up in his review, having found it in the patents and government documents segments. I have bad news: it also affects the societies component. When you run the example used in the Help file, the query frogs finds 470 hits, toads finds 184, and frogs OR toads finds 463, fewer than frogs alone. Scitopia “compensates” for this shortchanging behavior when the search is limited to the title. The query frogs finds 123 hits, toads finds 34 and frogs OR toads finds 231 in the society journals segment, more than the two combined. You don’t need a triple PhD to realize that this is also absurd, and that it puts us in GoogleLand, which is not a reinforcing move for building trust. It is one thing that Scitopia has trusted sources in the database (and in its tagline), but the software should also be trustworthy. It has an appealing look, and its e-mailing of result lists is one of the best I have seen, but that is not enough.
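The arithmetic behind this complaint can be sketched in a few lines of Python. For any two result sets, the OR count can be no smaller than the larger of the two counts and no larger than their sum. The function name below is mine, chosen for illustration; the numbers are the hit counts reported above.

```python
def or_count_is_plausible(count_a: int, count_b: int, count_or: int) -> bool:
    """Check max(|A|, |B|) <= |A OR B| <= |A| + |B| for reported hit counts."""
    return max(count_a, count_b) <= count_or <= count_a + count_b

# All-fields search: frogs = 470, toads = 184, frogs OR toads = 463
print(or_count_is_plausible(470, 184, 463))  # False: 463 is below 470

# Title-only search: frogs = 123, toads = 34, frogs OR toads = 231
print(or_count_is_plausible(123, 34, 231))   # False: 231 is above 123 + 34 = 157
```

Any OR count between 470 and 654 in the first case, or between 123 and 157 in the second, would at least have been possible.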

It is bothersome that in reporting the hits for exact phrase searches, Scitopia also counts as hits the records where the terms in the query between quotes are not next to each other in the given sequence, i.e. they are not in unidirectional adjacency, or they are not even present. True, you can get a sense that something is fishy when the 1-5 star(s) indicating relevance in front of the records suddenly disappear from the result list, suggesting that those records are irrelevant, but that does not justify this approach.

I could not fathom how the search in IoP through Scitopia could yield 80 hits when my native search in the IoP system found only 34 hits for the query “cold fusion”. Scrolling down the list, I found that the star(s) disappeared from the 32nd item on, clearly indicating that beyond the first 31 items the remaining 49 “hits” are irrelevant. They may have cold and they may have fusion in the record but not as a phrase, as I found out when checking the entire document for the exact phrase. I got “cold feet” as Big did before the wedding.
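What an exact phrase match should mean can be sketched as follows. This is an illustration in Python of the adjacency test, not a description of how Scitopia or the IoP system works internally; the function name and the sample titles are made up.

```python
import re

def contains_phrase(text: str, phrase: str) -> bool:
    """True only if the words of phrase appear adjacent and in the given order."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Hypothetical record titles for illustration.
titles = [
    "Cold fusion revisited",
    "Fusion of cold atoms in optical traps",
    "Cold nuclear fusion claims",
]

print([t for t in titles if contains_phrase(t, "cold fusion")])
# ['Cold fusion revisited']
```

The second and third titles contain both words, which is enough for a Boolean AND, but not for a phrase search.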

The third big concern in the software relates to the author name search. The advanced template clearly indicates the choice to use the last name only or the last name followed by the first name initial, separated by a comma. Last name alone is certainly asking for trouble in almost all of the cases. First initial is usually not sufficient to distinguish John Doe and Jane Doe, but I was compliant and entered hawking, s. The 174 hits seemed encouraging and right in the ball park, although I was surprised to see in the pull-down menu that he also published in journals of the Audio Engineering Society and IEEE. I am not oh-so knowledgeable about Hawking, I just spent many hours computing and analyzing his h-index in a number of cited-reference-enhanced databases, so while I don’t know, let alone understand, his writings, I happen to be familiar with the numbers, types and venues of his publications from WoS, Scopus, Google Scholar, ADS, PROLA, AIP, IoP and SPIRES.

Soon after scrolling down beyond the first third of the result list, names like Hawking, J. A and Hawk, S.R. started to appear, followed by items with author names entirely different from Hawking. Nearly two thirds of the items were not authored/co-authored by Hawking, let alone by Stephen W. Hawking. There were 100 items from journals published by the Audio Engineering Society, none of them by S.W. Hawking. (It’s good that there is the 100-item-per-source limit in Scitopia). Putting the query between quotation marks does not make any difference. Using Hawking, S.W. reduced the “hits” to 157, but retained all the nonsense items, and removed those where he appears only as S Hawking or Stephen Hawking.
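The kind of matching one would expect from a "last name, initial" query can be sketched like this. It is a simplified illustration in Python that assumes author names are stored as "Last, Initials"; the function names are mine, and real records (with spelled-out first names or inverted order) would need more care.

```python
def split_name(name: str) -> tuple[str, str]:
    """Split 'Last, F.M.' into a lowercased last name and its initials."""
    last, _, rest = name.partition(",")
    initials = "".join(ch for ch in rest if ch.isalpha()).lower()
    return last.strip().lower(), initials

def matches_author(query: str, name: str) -> bool:
    """Last names must be identical; the query's initials must lead the name's."""
    q_last, q_initials = split_name(query)
    n_last, n_initials = split_name(name)
    return q_last == n_last and n_initials.startswith(q_initials)

names = ["Hawking, S.W.", "Hawking, S.", "Hawking, J. A", "Hawk, S.R."]
print([n for n in names if matches_author("hawking, s", n)])
# ['Hawking, S.W.', 'Hawking, S.']
```

Under this rule neither Hawking, J. A nor Hawk, S.R. would qualify for the query hawking, s.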

It is ironic that the software stops in the middle of the search (often much sooner) to ask you if you want to continue, in order to save bandwidth and finish the process more quickly. At the same time, it keeps fetching obviously irrelevant items by ignoring the query specifications. Once again, this is not GoogleLand, where, due to the strongly flawed parsing software of Google Scholar and the decision of its developers to ignore the clearly tagged metadata made available in the records, there are zillions of authors with names like Introduction, Preface, Foreword, Conclusions and Password, as I discussed and illustrated in my recent review in the January issue of Online Information Review. Scitopia shouldn’t follow the practice of Google Scholar in so widely lowering the bar for query-matching (and later hopefully also for citation matching) as this service is meant for academic purposes.

In spite of its current shortcomings, I think Scitopia can be a useful source as a free federated indexing/abstracting search engine, if it does not sacrifice basic tenets of searching to make the hit counts look higher at any cost. It should be enhanced to extend the searches to the full-text documents of the partner societies, and additional partners should be recruited. It should be included in the repertoire of college libraries in order to be invoked through proxy servers. That way, when linking to primary sources to read the source documents, the host system would be able to recognize the remote users as authorized patrons of the given library that may have a digital subscription to many of the journals and conference proceedings covered by Scitopia.
