Title: jake
URL: http://jake-db.org
Publisher: Yale University Cushing/Whitney Medical Library and others
Cost: Free
Tested: October 27-29, 2002
One of the most common questions in the rapidly developing world of online bibliographic and full-text databases and digital archives is this: which databases cover journals A, B and C since 1994 in full text? It has not been an easy question to answer. If you are familiar with the largest online information services, you may go to their sites, one after the other, and check their increasingly user-friendly lists of titles covered in their various databases. Unfortunately, it may take an hour or more just to check the title lists of a few of the largest online services for a couple of journals. Aren't the serials directories, like Ulrich's and Ebsco, supposed to provide such information? Theoretically they do, but in practice their data are neither accurate nor complete. Nor do they systematically provide information about the time period of coverage.
Enter jake, the free service developed at the Cushing/Whitney Medical Library by talented librarians in 1999. The name is a rather clumsy acronym for "jointly administered knowledge environment," but the service is excellent. It provides an immensely useful, free tool to determine instantly which databases cover which journal for what time period and in what format. Just type in the journal's name or ISSN, and you get a matrix with much information at a glance.
The basis of jake is a database compiled from data provided by a variety of players in the online information industry. Who are they? Database producers like the American Psychological Association, online aggregators like Ovid, and companies that both produce databases and aggregate them with content from third parties, like Gale, ProQuest and Ebsco. Then there are the many journal publishers who set up their own archives directly, like Elsevier and its Science Direct archive of more than 1,200 journals. And there are the outstanding facilitators like HighWire Press, Catchword and Ingenta (which bought Catchword), who organized the digitized archives of many publishers.
The whole process of maintaining jake is digitized. The developers of jake apparently get the digital title lists from their partners. These include the names of the journals and their International Standard Serial Number (ISSN), along with the coverage of these journals in the various databases. The coverage is typically broken down by major digital formats (citations, citations with abstract, ASCII text format, HTML format, page image format, text plus image). For each format, the starting and ending period (if applicable) is indicated.
The exact content of these individual files varies. Here is a sample from Gale's title list for the Computer Database, and Ebsco's list for its Academic Premier database.
Even a cursory look at the content would suggest that consolidating hundreds, if not thousands, of lists is not for the faint of heart. Even if the labels of the variables were identical (and they are not) across aggregators, the syntax of indicating the date coverage varies enormously. While for a human, a start period that appears as 1994-02 could be the same as 1994/2, for the computer it is very different. Of course even a human may wonder if 02 is the month or the issue number within 1994.
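To make the date problem concrete, here is a minimal sketch of how two spellings of the same start period could be folded into one form. It is not jake's actual code; the function name, the pattern and the choice of accepted separators are assumptions made for illustration. Note that the second number is kept uninterpreted, because the string alone does not say whether it is a month or an issue number.

```python
import re

# Assumed pattern: a four-digit year, one separator, then one or two digits.
_PERIOD = re.compile(r"^(\d{4})[-/.](\d{1,2})$")


def normalize_period(raw):
    """Return (year, part) for strings such as '1994-02' or '1994/2'.

    'part' is left uninterpreted on purpose: it may be a month or an
    issue number, and only the supplier of the list can say which.
    """
    match = _PERIOD.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized period syntax: {raw!r}")
    return int(match.group(1)), int(match.group(2))


# The two spellings from the text collapse to the same value.
same = normalize_period("1994-02") == normalize_period("1994/2")
```

Anything the pattern does not recognize raises an error instead of being guessed at, which seems the safer default when hundreds of suppliers each use their own syntax.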
Furthermore, the journal names are often inconsistent even within a single database, but at least the ISSN is so simple that it is a perfect primary key for merging the data coming from a variety of sources. It is another matter that trouble follows when the original content provider messes up the ISSN and assigns the same number to different journals, or just misspells it. I did not have to look hard to find a wrong ISSN in the UnCover database, which I have never considered to be a leader in quality control.
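A misspelled ISSN can often be caught mechanically, because the eighth character is a check digit computed from the first seven (weights 8 down to 2, modulo 11, with X standing for 10). The sketch below shows that arithmetic; the function name is hypothetical, and the sample number is used only because its arithmetic works out. A check like this catches typing errors, but not the case where a perfectly valid ISSN is attached to the wrong journal.

```python
def issn_is_well_formed(issn):
    """Return True if the string has a valid ISSN check digit."""
    compact = issn.replace("-", "").strip().upper()
    if len(compact) != 8 or not compact[:7].isdigit():
        return False
    # Weight the first seven digits 8, 7, 6, 5, 4, 3, 2 and sum them.
    total = sum(int(d) * w for d, w in zip(compact[:7], range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    expected = "X" if remainder == 10 else str(remainder)
    return compact[7] == expected


ok = issn_is_well_formed("0378-5955")       # check digit matches
typo = issn_is_well_formed("0378-5595")     # two digits transposed
```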
At first glance you may wonder why there are two entries for Online Review. Well, one has an ISSN assigned, the other does not. This is very common in the entire UnCover list. To aggravate the situation, UnCover assigned the same ISSN to Online & CD-ROM Review that Online Review had. Other document delivery and abstracting services may make the same mistake, and even publishers may forget to apply for and use a new ISSN for a while after a significant change in the title, but they usually correct the error quite soon. In UnCover, errors of commission and errors of omission in bibliographic data are pervasive, so whoever is or will be assigned to incorporate their title list into jake will have my sympathy.
Luckily, jake's developers were smart enough to not rely on ISSN alone. There is a jake id assigned to every journal that helps in rectifying problems. Still, consolidating tens of thousands of bibliographic records from hundreds of suppliers is like herding cats.
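The two UnCover problems described above, records without an ISSN and one ISSN shared by different titles, are the kind of thing a simple pass over a supplier's list can flag before merging. The sketch below is an illustration only, not jake's method; the record layout and function name are assumptions, and the ISSN in the sample is a placeholder, not the real number of either journal.

```python
from collections import defaultdict


def find_issn_problems(records):
    """Return (ISSNs shared by several titles, titles lacking an ISSN)."""
    titles_by_issn = defaultdict(set)
    missing = []
    for rec in records:
        issn = rec.get("issn")
        if not issn:
            missing.append(rec["title"])
        else:
            # Compare titles case-insensitively so trivial variants merge.
            titles_by_issn[issn].add(rec["title"].strip().lower())
    shared = {
        issn: sorted(titles)
        for issn, titles in titles_by_issn.items()
        if len(titles) > 1
    }
    return shared, missing


# Placeholder data modeled on the situation described in the text.
sample = [
    {"title": "Online Review", "issn": "0000-0000"},
    {"title": "Online Review", "issn": None},
    {"title": "Online & CD-ROM Review", "issn": "0000-0000"},
]
shared, missing = find_issn_problems(sample)
```

Records flagged this way still need a human decision, which is presumably where an identifier independent of the ISSN earns its keep.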
Sure, producers of serials directories have to go through somewhat similar processes in collecting data from far more publishers, often dealing with mom-and-pop outlets, but they are charging big bucks while jake is free. Furthermore, in the case of jake, ISSNs have a literally pivotal role, and with many database producers reporting the coverage of many of the same journals, inconsistencies are visible from a mile away and need to be addressed.
Having dealt with automating union serials catalog projects, I can appreciate what jake has achieved. While the result may not be perfect yet, it has definitely been worth the blood, sweat and tears.
In jake's database, there is information about the database coverage of more than 30,000 journals. If you think that this is dwarfed by the close to a quarter-million records in large serials directories, think again. Those include information about tens of thousands of serials that are of extremely limited interest to 99.9% of the users. The journals and newspapers in jake represent the serials of widest interest. After all, journals covered by 50 or 60 databases are certainly far more valuable than, say, the Boy Scouts' Chronicle from a tiny village published every third year.
The data in the original jake database, which is available at http://www.jake-db.org, is a tad outdated, but still has potential and the concept is very appealing. Adding current data is child's play compared to the conceptual design and implementation.
Actually, jake garnered a lot of support from fellow librarians, who in turn developed additional freeware versions with added functionality and/or a different look and feel, or data subset. One of the best is sfu-jake from Simon Fraser University, and I am using that version to discuss some software issues and to provide some humble suggestions for enhancing this excellent service.
The simple interface offers a number of features. You may search by title, ISSN, jake-id or subject. I find the last of these particularly useful when you are exploring a field you are not familiar with and want to see which journals are covered by the most databases.
It is highly informative to see a list of 64 journals with the subject scope of energy and the list of databases. The first two titles listed for energy, for example, make it clear that none of the databases have full-text coverage, while the 3rd entry, the Annual Review, is covered in full text by three databases. This is telling, especially in comparison with other journals' coverage.
Nothing proves this better than when you scroll down the screen and see that the Energy Journal is covered by 28 databases in full text. Obviously, it would be lovely to be able to sort the result list by decreasing full-text or indexing coverage. As there is a check box for sorting by title (which is the default), further options will probably be introduced (beyond the expected ISSN and jake-id).
Clicking on an individual journal displays the very informative matrix. On top of the matrix you will see the LC subject headings and LC and DC codes. If you choose, you may display the name variations under which your journals may appear. This is a laudable idea, alerting people to think about variant spellings.
The matrix is generated in a few seconds, and its content speaks for itself. Its effectiveness could be furthered by allowing users to sort the results by the providers' name and, more importantly, by date of start of coverage, in ascending order.
This latter sorting option is also important to make sure that the user does not stop scrolling after the first 4-5 entries, when even meatier coverage may pop up. This was the case with the Energy Journal shown in the previous figure. It is covered by many databases listed alphabetically, starting with ABI/INFORM. However, the meatiest coverage is provided by General Reference Center Gold, General Reference Center International and InfoTrac OneFile, which appear at the very bottom of the second screen of the alphabetical list. Some sort options, such as sorting by earliest coverage in full text, by longest abstracting/indexing coverage or by content providers, would be very useful. There is an analogy for this in the listings provided by the best airline reservation systems, which offer radio buttons to display the result list -- among others -- by departure time, flight duration or airline, in addition to the typical default sort by price.
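The suggested sort by earliest full-text coverage is simple to express. The sketch below assumes each row of the matrix carries a start year for full text, or nothing when the database offers only citations; the field names, database names and years are invented for illustration and do not describe any real database.

```python
def sort_by_fulltext_start(rows):
    """Order rows by earliest full-text start; rows without full text go last."""
    return sorted(
        rows,
        key=lambda r: (r["fulltext_start"] is None, r["fulltext_start"] or 0),
    )


# Invented rows, listed alphabetically as the matrix shows them today.
rows = [
    {"database": "Database A", "fulltext_start": 1997},
    {"database": "Database B", "fulltext_start": None},
    {"database": "Database C", "fulltext_start": 1983},
]
ordered = [r["database"] for r in sort_by_fulltext_start(rows)]
```

With this ordering the database with the deepest full-text archive rises to the top instead of hiding on the second screen.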
Truncation is turned on automatically, but can be turned off, which is quite useful when you want to find the coverage data for TIME magazine without being buried among the various Times listed at the top of the list.
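The difference the truncation switch makes can be sketched in a few lines. This assumes truncation means a case-insensitive match on the beginning of the title, which is my reading of the behavior rather than documented fact; the function name and the short title list are hypothetical.

```python
def search_titles(titles, query, truncate=True):
    """Return titles matching the query, by prefix or by exact title."""
    wanted = query.strip().lower()
    if truncate:
        # Truncation on: anything that begins with the query matches.
        return [t for t in titles if t.lower().startswith(wanted)]
    # Truncation off: only the exact title matches.
    return [t for t in titles if t.lower() == wanted]


# Hypothetical title list for illustration.
titles = ["Time", "Times", "Timescale"]
with_truncation = search_titles(titles, "TIME")
without_truncation = search_titles(titles, "TIME", truncate=False)
```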
Actually, if you turn off truncation, the matrix pops up immediately, showing not only the coverage data, but also the LC and Dewey class numbers. One advantage of the SFU version of jake is that it identifies not only the database, but also the online service, which is especially useful when five listings of a database would not let users know exactly which version of MEDLINE the data applies to.
A database of such excellent content and smart software is exactly the kind of project that should get massive support from the government, and acknowledgment from the profession.