Title: PsycINFO
Publisher: American Psychological Association (APA); available from CSA-IDS, OCLC, EBSCO, DIALOG, DATASTAR and Ovid
Cost: Based on number of users
Tested: Sept. - Dec. 2003
This classic reference database, richly enhanced by the inclusion of nearly 10-million cited references in more than 212,000 records and of tens of thousands of links to publisher and journal sites, could have been a contender for a best enhanced reference database award had it not been marred by:
In spite of all these problems (most of which can be fixed by switching from MARC format to XML format, testing links with URL validation programs, and then applying global changes to at least the syntactically wrong URLs), PsycINFO has great potential. But, as of the end of December, the above flaws spoil the database.
Citation searching has been the exclusive feature of the citation indexes of the Institute for Scientific Information (ISI) for decades. Although CINAHL, the Current Index to Nursing and Allied Health Literature, has offered this feature since 1995, it has not made big waves.
In 1999 the e-psyche database burst onto the scene with promises of including cited references in the bibliographic records created for articles published in more than 4,000 journals. Although this didn't turn out to be the case (see my companion review about e-psyche), e-psyche had one advantage: it prodded APA to enhance its databases with cited references.
Inclusion of cited references in bibliographic records is a mighty empowering value-added component, but it is a time-consuming and expensive process. It allows users to track down earlier and (in the digital implementation) later works related (or likely related) to the item being consulted. It is an ideal reference tool for quickly and easily finding documents discussing the same topic, and it complements very well the other types of searches based on combinations of controlled vocabulary descriptors and free-text keywords.
PsycINFO has been reviewed often, so I will focus this review on the most promising new features: the inclusion of cited references and links to publisher and journal sites. The software implementations of the enhanced PsycINFO vary widely. By far, the best is the one by CSA Internet Database Services (IDS). It deserves an entire article about its very intelligent features that multiply the value of the cited references. (I have covered this software in the Cheers and Jeers for 2003 column on my Web site.)
To put PsycINFO into perspective, suffice it to say that it has about 2 million records (75% of which are journal articles) and a very good mix of source documents, and that it covers about 1,900 journals (including all of the most prestigious ones in mental health, psychology and psychiatry). It also includes records for more than 75,000 books, about 133,000 book chapters, 232,000 dissertations and 48,000 conference papers, which are particularly important as these are not covered by the traditional citation databases of ISI. Nearly 90% of PsycINFO records are in English and 90% of the records have abstracts.
On the downside, the publisher field shows massive inconsistencies in punctuation and abbreviations, as well as typos and a great number of syntactically incorrect links (to be discussed later).
The biggest improvement in PsycINFO is the inclusion of cited references. This very important enhancement has two dimensions: 1) the type and number of sources that have cited references (the width of enhancement); and 2) the type and number of cited references that are included for the source documents (the depth of enhancement).
APA provides a credible, quarterly updated running tab on its site about the number of records and the number of cited references added to the database on a year-by-year basis. It clearly states that cited references have been added comprehensively since 2001 and selectively since 1988. Selectivity in PsycINFO means that only some records have been enhanced with citations, not that capriciously selected citations have been added for some records, some of the time (as is the case in e-psyche).
While most of the claims of PsycINFO can be taken at face value, the one about the comprehensiveness from 2001 onward is not entirely accurate. Out of the 196,798 records with a publication year of 2001 or newer, 158,154 records (80%) have cited references. This number could be OK because there are records in PsycINFO for items that usually don't include cited references, such as calls for papers and guidelines for submission. However, in my opinion, records should not have been included for these items.
There are also records for more than 13,000 dissertations from 2001 and none of these have cited references in the records as a matter of policy, even though the source documents profusely provide cited references.
So where is the problem with comprehensiveness? Even a casual scanning of the list of items without cited references (excluding the ephemeral materials and dissertations) would show journal articles that are very likely to cite references even if their PsycINFO record shows none. As for the items shown in the previous link, I could check the print version of the first, fourth and fifth articles, and they had 43, 19 and 24 cited references, respectively. The omissions are obvious for the scholarly books and substantial scholarly articles. In light of the above considerations, non-inclusion of cited references from 2001 onward is likely to occur in about 2% of the records. Priority should be given to enhancing these records.
The enhancement of records by cited references has another dimension, too: the depth of enhancement, i.e., the extent to which the references in a given source document are added. PsycINFO is good at this (although there are some glaring omissions), but, for reasons to be discussed below, that depth does not come through in any of its online versions except OCLC's.
Of course, there is a very wide range in the number of items cited. Of the citation-enhanced PsycINFO records, about 10% have between one and nine citations, 14% between 10 and 19. On the other end of the spectrum, there are records for books with several thousand citations, such as the book on abnormal psychology that has 3,690 cited references. However, the hasty implementation by APA prevents the display of several hundred thousand cited references (at least temporarily) and that's a big problem.
The number of records with more than 250 cited references represents a relatively small portion of the database, but several hundred thousand cited references become invisible for all users (except for OCLC users). Why? Because APA has been distributing the records in MARC Communications format (also known as ISO 2709 format) to online services and released the XML format 18 months after the launch of enhanced PsycINFO.
The MARC Communications format was a brilliant brainchild of Henriette Avram in the late 1960s and has superbly served the library automation community, but recently it has started to reach its limits.
It has been enhanced continuously, but there is one area where it cannot be enhanced and APA should have realized this before it launched its enhancement project. The MARC format limits the length of the record to 99,999 characters simply because the record length must be defined in a five-digit segment of the fixed length MARC Leader field.
A PsycINFO record can reach that limit with 200 or 400 or 600 citations, depending, among other things, on the types of cited references (multiple authorship with a large number of authors), the type of documents (conference proceedings can have very long names, dates and locations) and even such bibliographic data elements as excessively long titles (common in scientific papers). The example shown here illustrates that a record conks out at the 228th citation exactly because of the many long citations.
On the other hand, all 683 citations in a book, mostly with short legal cited references, could be accommodated within the MARC record limit. (I know that short legal anything is an oxymoron, but there are some exceptions.)
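The arithmetic behind this limit can be sketched roughly as follows. The 99,999-character ceiling and the 12-character directory entry per field come from the MARC/ISO 2709 record structure. Everything else here is an assumption made for illustration: that each cited reference travels as its own field, the 5,000 bytes reserved for the rest of the record, and the made-up citation lengths. It is not a description of how APA actually packs its records.

```python
MARC_MAX_RECORD_LENGTH = 99_999  # largest value a five-digit length can hold

# Assumed cost of carrying one citation as a separate field:
# 12-byte directory entry + 2 indicators + 2-byte subfield code + 1 field terminator.
PER_FIELD_OVERHEAD = 12 + 2 + 2 + 1


def citations_that_fit(citations, base_record_bytes=5_000):
    """Return how many citations fit before the record would exceed the limit.

    base_record_bytes is a hypothetical allowance for the leader, the
    bibliographic fields, the abstract and the descriptors.
    """
    used = base_record_bytes
    for count, text in enumerate(citations):
        needed = len(text.encode("utf-8")) + PER_FIELD_OVERHEAD
        if used + needed > MARC_MAX_RECORD_LENGTH:
            return count  # this citation and all later ones are lost
        used += needed
    return len(citations)


# Hypothetical inputs: many long citations versus many short ones.
long_refs = ["x" * 400] * 1000
short_refs = ["y" * 120] * 683
print(citations_that_fit(long_refs), citations_that_fit(short_refs))
```

Under these assumptions a few hundred long citations exhaust the record while several hundred short ones still fit, which is the pattern described above. The exact cut-off in a real record depends on its actual field layout.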
Although APA warns on its Web site that "all published references are included in the PsycINFO record, except those not referring to publications, in non-Roman alphabets, or those that cause the character limit of PsycINFO records to be exceeded," very few, if any, end users of PsycINFO know about it. Users would be befuddled as to why dozens, hundreds or even thousands of citations are missing from the records of many articles and books that refer to published materials in Roman alphabets.
At least CSA-IDS clearly warns users how many cited references are present in the source documents and how many are displayed. The difference is either because of the record length limitation or because references were omitted, even if they would have qualified under PsycINFO's inclusion policy. These two values appear in the MARC record as two subfields. Still, they are not used by all online services, or are used incorrectly or inconsistently.
In DIALOG, for example, only the first subfield is displayed, leaving users in the dark until they realize that the alphabetic citations stop a tad prematurely at Bremner. Ovid displays both values correctly in some records, but not all of them. For example, for the article by Pinquart, the Ovid record claims that there are 355 citations present and 355 displayed, but as you scroll down the record you realize that the list of cited references ends at 69. Again, CSA-IDS tells it as it is and OCLC has the right numbers, but mixed up the labeling. This record is one of the examples where PsycINFO omits a great number of cited references. Together the record length limitations and (to a much lesser extent) the inexplicable omissions affect too many records.
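An online service could catch such mismatches with a simple consistency check before showing a record. The sketch below is only an illustration: the function name and its three inputs (the two declared counts and the list of citations actually carried by the record) are hypothetical, and how each service reads the two subfields is not something I can vouch for.

```python
def citation_status(present, displayed, citations):
    """Compare the two declared counts with the citations actually in the record.

    present   -- declared number of references in the source document
    displayed -- declared number of references included in the record
    citations -- the cited references the record really carries
    Returns a list of human-readable problems (empty if all is consistent).
    """
    actual = len(citations)
    problems = []
    if displayed != actual:
        problems.append(f"record claims {displayed} displayed but carries {actual}")
    if actual < present:
        problems.append(f"{present - actual} of {present} cited references are missing")
    return problems


# The figures quoted above for the Pinquart record in Ovid: 355 claimed, 69 shown.
for problem in citation_status(355, 355, ["ref"] * 69):
    print(problem)
```

A service that ran a check of this kind could at least warn users honestly, as CSA-IDS does, instead of repeating a count that the record does not live up to.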
It is also a sign of hasty implementation that PsycINFO did not heed the advice of the APA Publication Manual about the importance of minding the syntax of URLs, let alone the one about checking their validity. The manual is right in saying that "the URL is the most critical element: If it doesn't work, readers won't be able to find the cited material, and the credibility of your paper or argument will suffer. The most common reason URLs fail is that they are transcribed or typed incorrectly."
APA might as well have added that the credibility of your database will suffer if it has tens of thousands of wrong URLs. It may not have been easy to type the forward double slashes in the URL as backward double slashes (my word processor automatically corrects them, for example), but PsycINFO managed to do it and did it relentlessly in 2002 and 2003.
To its credit, not all the URLs were messed up all the time this way, just 11,852 in the publisher name field alone, counting only those cases where the URLs have backward double slashes instead of forward double slashes. These links are cold in all implementations except CSA-IDS, which corrected them. OCLC must have been so frustrated by the volume of incorrect URLs that it decided not to display the publisher field, although you may search and browse that field.
APA has not discriminated against certain publishers. You can find this URL problem in records for articles published by Elsevier, Taylor & Francis, Kluwer, Cambridge University Press, Sage and Wiley, to name a few. APA did not spare its German partner Hogrefe either, and in a truly democratic way, APA used the syntactically incorrect URL for its own Web address in the publisher field 746 times.
To break the monotony of the backslashed URLs, there are other syntactically wrong URLs in great numbers. PsycINFO graced the Haworth Press with a secure and non-secure prefix that, however, does not act like a belt. Then, as if to compensate for the excess, you will also find URLs without the protocol prefix, colon and slashes, which is at least a better fault as many browsers can handle such minimalist URLs.
However, Ovid's programmers in charge of PsycINFO may have looked only at such prefix-less records in studying the record structure when they decided to add the "http://" prefix to many URLs, making even more dysfunctional URLs from the original ones.
PsycINFO indexers sometimes may have had second thoughts about the publisher URLs, as they stopped typing mid-stream, leaving it as a puzzle. Ovid dutifully added the "http://" prefix, but the URL still won't take you anywhere when it has nothing but "www".
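The syntactic faults described so far are the kind that a global change could repair. The following is a minimal sketch under my own assumptions, not the routine any vendor uses: the function name is hypothetical, the sample addresses use the reserved example.org domain, and the choice to keep the last of two stacked prefixes is arbitrary.

```python
import re


def repair_url(raw):
    """Return (url, ok): a syntactically repaired URL and whether its host looks usable."""
    # Turn backward slashes into forward slashes.
    url = raw.strip().replace("\\", "/")

    # Collapse stacked prefixes such as "https://http://" (keeping the last one),
    # and add "http://" only when no prefix is present at all.
    match = re.match(r"^((?:https?://)+)(.*)$", url, flags=re.IGNORECASE)
    if match:
        prefixes = re.findall(r"https?://", match.group(1), flags=re.IGNORECASE)
        url = prefixes[-1].lower() + match.group(2)
    else:
        url = "http://" + url

    # A usable host needs at least two dot-separated, non-empty labels,
    # so a URL abandoned at "www" is flagged rather than passed along.
    host = url.split("://", 1)[1].split("/", 1)[0]
    ok = bool(re.fullmatch(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(:\d+)?", host))
    return url, ok


for sample in ["http:\\\\www.example.org", "https://http://www.example.org",
               "www.example.org/journals", "www"]:
    print(repair_url(sample))
```

A check like this only deals with syntax. A host name that is well formed but misspelled passes it, so catching those still requires testing the links against the live sites.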
There are also plenty of wrong URLs that may look good and indicate an actionable link, but you will get red in the face when you click on them and get an error message because "academicpress" is misspelled as "acedemicpress" in 356 records. More difficult or cryptic URLs are not that easy to decipher for misspellings.
Publisher URLs were not the only data elements in PsycINFO that suffered from URL syntax disorder. Journal URLs also shared this pain and quite often none of the URLs were correct in the original record. Some online services tried hard but could not help.
Quite tellingly, among the many implementations I looked at, only CSA corrected the wrong URLs systematically and comprehensively, as illustrated by its implementation of the record shown before from Ovid and in hundreds of my other test records. CSA has done a big favor to APA and to the users who can happily click in CSA-IDS on actionable links that are dead in many other implementations.
Some online services apparently were more distressed than others by the hasty implementation of the enhancements by PsycINFO. DIALOG may have been the most stressed out, as witnessed by the tens of thousands of wrong descriptors that were erroneously extracted from the often changing format and tagging of PsycINFO records. Just look at some of the entries in the basic index that start with double-d, such as "ddistress." All of them are descriptors, the sacred cows in every decent database. Practically all of them are from records added after the new tagging and distribution mess of the enhanced records started.
They are easy to spot anywhere in the index in pretty large numbers, such as the 901 dpsychosocial and 441 dpsychopathology, although looking at them is not the best psychotherapeutic treatment for those who got psyched out when missing half of the records while searching by the root word "psychotherap."
Dealing with PsycINFO must have also taken its toll on Ovid, which usually has smart solutions in processing databases. But Ovid is one of the few implementations of PsycINFO that omitted the Digital Object Identifiers (DOIs) from the records, the key for licensing libraries to link simply to their full-text digital versions of journals from PsycINFO records. To my delight (as I figured out from the CSA-IDS version of the database), there are more than 208,000 records in PsycINFO that include DOIs. Of course, that only increased my regret that I could not find any in Ovid.
On the other hand, I did find many duplicate records in Ovid's version of PsycINFO and alerted the company by showing them a small gallery of these duplicates. They are the original, erroneous records and their correction records. I was told that these are not true duplicates (which is true from a lawyer's perspective), but correction records (which is also correct).
But I was also told that "we believe that it is in the user's best interest to get the correct information even if it creates the impression of a duplicate record." Now, this is not correct in my book, or in the book of any of the other online information services that replace the erroneous records. Of course, after reading the PR hype about the e-psyche database, I may have turned just too skeptical about sentences that too often include the word "believe" and mean just the opposite.
This policy will not do Ovid (or their customers) any good. Correction records in an update are meant to replace erroneous records, not to add them, in order to make sure that the users get the correct record and only the correct record. I believe (ahem) that is what best serves the interests of the users. After all, replacement is one of the many advantages of being digital. It allows you to make mistakes disappear as if they had never existed (unless you digitally capture them for posterity).
When looking at the duplicates, users have no idea which one is correct and which one is not, and the records do not carry such distinctive labels. If the erroneous one shows up first, the user may not look at the corrected record. Beyond the annoyance of handling these duplicates, the number of duplicates may also distort the result of bibliometric studies that use Ovid's version of PsycINFO. As my test search at the end of 2003 showed, the policy has not changed — the duplicates are still there.
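The replacement policy argued for here takes very little code. The sketch below assumes, purely for illustration, that a correction record carries the same identifier as the erroneous record it corrects; the "id" and "title" keys and the sample values are hypothetical and say nothing about Ovid's actual loading process.

```python
def apply_update(database, update_records):
    """Merge an update batch so that a correction replaces its original record.

    database       -- dict mapping a record identifier to the record
    update_records -- new and correction records from the latest update
    """
    for record in update_records:
        # Same identifier: the old version is overwritten, not kept alongside.
        database[record["id"]] = record
    return database


db = {"rec-1": {"id": "rec-1", "title": "Psychotherpy outcomes"}}
update = [
    {"id": "rec-1", "title": "Psychotherapy outcomes"},  # correction record
    {"id": "rec-2", "title": "A new article"},           # genuinely new record
]
apply_update(db, update)
print(len(db), db["rec-1"]["title"])
```

With replacement, the user sees one record per document, and counts drawn from the database for bibliometric purposes are not inflated by the leftovers of earlier mistakes.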
PsycINFO is far from perfect and its errors of commission and omission are due to sloppiness and harried implementation. Its claims are realistic (if not always accurate) and are not knowingly misleading.
APA should focus on enhancing the current segment of the database (say from 1990 onward) with cited references, rather than continually adding skeletal records for articles published since Gutenberg invented the printing press. APA should also get rid of the Mental Health Abstracts (MHA) database (which they acquired without releasing any PR statement, understandably) and should stop using it to "enhance" PsycINFO with sorry records for articles published in the mid-1900s. MHA is a very poor and dirty database. As the proverb says, "he that lies down with dogs, shall get up with fleas." APA can't afford to let PsycINFO get that dirty. On the contrary, it must stay squeaky clean, no matter how dirty the tricks its former and phantom competitors have been playing.