
Title: PubSCIENCE
URL: http://pubsci.osti.gov
Publisher: Department of Energy (DOE)
Cost: Free
Tested: August 20-26, 2002
PubSCIENCE has been the talk of the town among information professionals since the announcement made in early August that the Department of Energy (DOE) proposes to discontinue the PubSCIENCE service. This public suicide notice generated a lot of postings on Web forums and a flood of messages in my e-mail box.
I am not at all concerned by the proposal because this is a poorly implemented database considering the whopping $500,000 a year it gets from Congress. That money should be used for much better projects and services of the DOE, such as the excellent InfoBridge page image document delivery service or, for that matter, the far better cousin of PubSCIENCE that does not get enough press coverage: the DOE Energy Citations Database (read the companion review).
I am presenting this review a little differently than usual, focusing on the most annoying deficiencies, absurd claims and sheer misrepresentations. The high-falutin' claims and vision statement by the publisher only add to the anguish of information professionals, who should worry about losing money for a free service and not about losing this database. The few good links that PubSCIENCE has should be transferred to the much better databases of the DOE, and the budget earmarked for PubSCIENCE should be spent on enhancing those.
Unfortunately, the Office of Scientific and Technical Information of the DOE never made it clear exactly what types of records are included in its version of the Energy Science & Technology (ES&T) database, which has been available for many years commercially at $70 an hour and $1.55 per record displayed/printed. The DOE refers to this database as EDB (Energy Database), but I avoid using that acronym as it may lead to confusion.
Suffice it to say, PubSCIENCE is a subset of approximately 1.3 million records from the ES&T database, which had about 4.2 million records at the end of August. It is the smallest member of the ES&T subset family. Estimating the size correctly is particularly difficult because of the excessive number of duplicates and triplicates in the database. Other databases have duplicates, but not nearly as many as PubSCIENCE. Spotting them is made even more difficult by the fact that they are not next to each other, or at least congregated decently like passengers at a London bus stop, but rather scattered in the results list as numbers 2 and 6, 3 and 5, or 4 and 8.
PubSCIENCE was launched in late 1999 and touted as the science and technology counterpart of PubMed. The publicity blurb made it very clear: "PubSCIENCE is completing the circle in providing electronic access to the physical sciences as PubMed has done for the medical sciences. PubSCIENCE is addressing the information needs of researchers, students, and the public for information in the physical sciences and other energy-related disciplines." No, it isn't, and no, it doesn't. To paraphrase an old-fashioned, no-nonsense senator: PubSCI, you are no PubMed.
Clearly, PubSCIENCE was riding on the waves of PubMed, but it is a far cry from it in terms of quality, concept and implementation, as I will demonstrate. PubMed's specialists create or update the records received from publishers by assigning MeSH terms, other check tags and additional value-added information for feature articles (except for the 6-8 percent of the records submitted by publishers that are found ephemeral for PubMed's scope). The links that take you to the article, not just the publisher's home page, are just one more, undoubtedly useful, feature in the informative, content-rich records. PubSCIENCE's policy is sharply different: it downloads the batch of records from the publishers' sites and stores them on an "as is" basis -- very often without abstracts and without descriptors -- and it certainly never assigns terms from its Energy Thesaurus, let alone creates abstracts. The implication of this is clear for any information professional.
The official blurb says that "in essence, PubSCIENCE is a modernization of Nuclear Science Abstracts and the Energy Science and Technology Database." It then refines that by adding that "PubSCIENCE is a World Wide Web service developed by the Department of Energy's (DOE) Office of Scientific and Technical Information (OSTI) to facilitate searching and accessing peer reviewed journal literature in the physical sciences and other energy-related disciplines."
In reality, PubSCIENCE is very oddly composed, and that has a serious impact on searching. A large part of the database consists of records providing information about journal articles published since 1990. This is referred to as the ALL section of the database. The help information above the search template claims that the ALL section covers information "from the past 10 years to the present." This is probably government speak for "information published in the past 10 years." This may have been true when the database was launched in 1999, but now it means a nearly 13-year time span. Not a big issue, except for the fact that ALL is certainly not all of the database. Actually, the Archive set of the database, with its approximately 550,000 records, makes up 43 percent of PubSCIENCE. (The Archive filter option is at the very bottom of the pull-down menu of publishers' names. Mixing time period categories and publisher names in a single menu of nearly 50 items is a bad idea when a simple check box would do.) The ALL section, which has about 730,000 records, could have been called CURRENT, or just 1990 onward. It doesn't help that too often the results from the presumably current ALL segment cite publications from the mid-1980s and even earlier. There is no reason for this weird splitting of the database, as publication year range is among the few filters that you can use to narrow your search.
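The proportions quoted above can be checked with simple arithmetic. The short sketch below uses only the approximate record counts given in this review (about 730,000 for ALL and about 550,000 for Archive); the variable names are illustrative, and the figures are estimates rather than official DOE counts.

```python
# Approximate record counts quoted in this review (estimates, not official DOE figures).
all_section = 730_000      # "ALL" section: articles nominally from 1990 onward
archive_section = 550_000  # "Archive" section: older material

total = all_section + archive_section
archive_share = archive_section / total

print(f"Total records: {total:,}")
print(f"Archive share of PubSCIENCE: {archive_share:.0%}")
```

The two sections add up to roughly 1.3 million records, so a section labeled ALL in fact leaves out well over a third of the database.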
Strangely, the ALL section lists as a filtering option the publishers that cooperate with the DOE and provide bibliographic information and, occasionally, links back to their own databases for the abstract and/or the full text of the article.
Presumably, selecting a publisher's name should retrieve only articles from that publisher. Unfortunately, this does not work all the time. For example, when you choose Ziff-Davis, the publisher of many widely read -- though not scholarly, but definitely practical -- magazines, like PC Magazine, you will be surprised to see a record for a paper published in Astronomy and Astrophysics. One ill-assigned record is not an issue, of course, but it was good enough to whet my appetite to look further. A search for Ziff-Davis as the publisher and Astronomy in the bibliographic citation brings up 23 "matching" records. This would still not be the end of the world, but it is a tad excessive that out of 124 records attributed to ZDNet, the online arm of the publisher, 80 percent are actually from other publishers.
Even 124 Ziff-Davis articles from its best journals would be rather puny, but almost all of the genuine ZD records are for the rather fluffy online column AnchorDesk. Even worse, the links are to the Computer Magazine Archive hub (CMA), which has been renamed, removed and retooled many times. It is now called BizTech Library, and the user may feel like a tourist dumped somewhere in Los Angeles looking for a hotel's address remembered from a Raymond Chandler novel. This is a large-scale problem, seen in tens of thousands of records, such as most, if not all, of those from the New England Journal of Medicine and the AAPRG Bulletin. It adds insult to injury when the publisher's site has an inferior search engine, like that of Nuclear Technology Publishing.
Clicking on another publisher in the pull-down menu will also disappoint you. Marcel Dekker checked in with three records -- yes, three -- in the entire database. At least it is present, unlike some of the other mouth-watering publishers who appear only in the publicity materials as forthcoming, like Oxford University Press and the American Society of Civil Engineers. The American Society of Mechanical Engineers has not shown up yet, and probably will not. Charter member Ziff-Davis seems to be content in allowing links to a few dozen puny records on one of its Web sites, but not the online versions of its mainstream journals that are proudly listed on another publicity page. The list is worth looking at if you want to see some interesting developments in journal acquisition. I know firsthand the fervor of buying and selling journals. Still, I was incredulous to see that the American Meteorological Society so vehemently acquired such a variety of journals from a variety of disciplines as AMBIO: A Journal of the Human Environment, American Museum Novitiates, Biology of Reproduction, Copeia, Environmental Entomology, Invitro Animal Cellular & Developmental Biology, Journal of Paleontology or the Journal of Parasitology -- if you believe the Web page of PubSCIENCE about its partners.
These links, whose presence is the most touted feature of PubSCIENCE, are still better, of course, than having no links at all from the records. PubSCIENCE claims that "once the user has found an interesting abstract, a hyperlink provides access to the publisher's server to obtain the full-text article. The article will come up immediately if the user or his/her organization has a subscription to the journal." Oh boy. No other ifs, ands, or buts.
I can't tell you exactly what percentage of the records have any kind of link (good, bad, ugly or useless), but in the many hours I spent using the database, far fewer than a quarter of the records had links, and those links varied greatly in quality. Yes, some of them are good and work as they should, so let me start with an example to set the scene and level of expectation.
Links to the American Physical Society are typically good. They take you to the top of the article with the free abstract -- and other possible goodies, such as free, full-text articles in HTML and PDF formats, articles citing back to the one being viewed, links to PubMed records, etc. -- in sight. This is often the case with the journals hosted by HighWire Press.
It is because of these superb extra goodies that I find it incomprehensible that there are no links to the journals known to have free abstracts and bibliographic citations, and even full-text articles (maybe after a moratorium of 6 months or longer, as is the case with the Proceedings of the National Academy of Sciences (PNAS), where everything older than 6 months is free, and beautifully presented). But this is the PubSCIENCE record: no abstract and no link to the article, so you are left on your own to go and find the article.
Then there are the useless links from the record to the site that do not add anything. It is quite predictable, for example, that for book reviews there will be no abstract, only the bibliographic citation data already available in the PubSCIENCE record. Such trips to the publisher's archive are fruitless. And there are many variants of this.
Just click on the illustrations as you read this passage to feel my pain as I tried to benefit from the links in a real search about recent articles on energy information sources. My search yielded eight results; three of them had no links at all and one had a link to an abstract that was already available in the PubSCIENCE record. Another record linked to one with no additional information. One PubSCIENCE record specifically promised an abstract, but did not deliver anything extra. Yet another also promised an abstract and could not even locate the item at the destination site. Another promised an abstract, but just teased me with a PDF, not the abstract. The last one simply linked to the wrong article. This is not necessarily PubSCIENCE's fault, and may well be that of Blackwell -- but both of them should learn that linking based on volume, issue and page number does not suffice and may confuse things when two or more articles start on the same page. That is exactly why the Digital Object Identifier (DOI) was devised: to avoid such ambiguity.
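To see why a link built from volume, issue and page number can land on the wrong article, consider the minimal sketch below. Everything in it is hypothetical: the journal name, the titles, the DOI-style strings and the helper functions are invented for illustration, and it does not reflect how PubSCIENCE or any publisher actually builds its links. It only shows that two short items starting on the same page share one citation key, while a per-article identifier keeps them apart.

```python
from collections import defaultdict

# Hypothetical records: two short items begin on the same page of the same issue.
articles = [
    {"doi": "10.0000/example.001", "journal": "Example Journal",
     "volume": 12, "issue": 3, "start_page": 45, "title": "Letter to the editor"},
    {"doi": "10.0000/example.002", "journal": "Example Journal",
     "volume": 12, "issue": 3, "start_page": 45, "title": "Book review"},
    {"doi": "10.0000/example.003", "journal": "Example Journal",
     "volume": 12, "issue": 3, "start_page": 47, "title": "Research article"},
]


def citation_key(record):
    """Link key built only from journal, volume, issue and starting page."""
    return (record["journal"], record["volume"], record["issue"], record["start_page"])


def index_by(records, key_func):
    """Group article titles under whatever key the linking scheme uses."""
    index = defaultdict(list)
    for record in records:
        index[key_func(record)].append(record["title"])
    return index


by_citation = index_by(articles, citation_key)
by_doi = index_by(articles, lambda record: record["doi"])

# A key is ambiguous when more than one article is filed under it.
ambiguous_citations = {k: v for k, v in by_citation.items() if len(v) > 1}
ambiguous_dois = {k: v for k, v in by_doi.items() if len(v) > 1}

print("Ambiguous citation keys:", ambiguous_citations)
print("Ambiguous DOI keys:", ambiguous_dois)
```

In this toy data the citation key for page 45 resolves to two different items, so a link built from it can only guess, whereas no DOI-style key is shared.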
This whole PubSCIENCE project was much myth-making for good money. The majority of the data was already available in the Energy Science & Technology Database. The only pluses were the added links in some of the records to some of the publishers' archives, some of the time. But slapping poorly designed software on the data file -- software that cannot even correctly echo back the queries you enter, omitting, for example, the field qualifier (such as title or author) and leaving you in limbo -- is just a waste of time and money.
The irony of this whole project is that it got all of the press attention in information professionals' circles, while two excellent DOE sources hardly got any publicity. I reviewed one of them, the Energy Citations Database, so that you can see that it is a much better idea to enhance this database with the good links from PubSCIENCE and to let the rest of it rest in peace.