
Title: CSA Illustrata
Publisher: CSA
URL: http://info.csa.com/csaillustrata/
Cost: Price to be negotiated
Tested: December 25, 2006 – February 4, 2007
There have been relatively few innovations in indexing/abstracting (I/A) databases in the 40 years since this database category started to become an important and efficient digital resource for finding scholarly and other publications. CSA Illustrata is a major innovation. Beyond relatively minor enhancements, such as adding the authors' e-mail addresses and the publishers' and journals' URLs, I recall only four major innovations by content producers. (Structured abstracts and digital object identifiers are useful, but they originated with the scholarly publishers and have been adopted by very few I/A database producers. The same is true for OpenURL linking, which came from computer scientists, especially from the SPARCling (er, sparkling) Herbert Van de Sompel.)
One of the four major innovations by I/A content producers was the launch of the Citation Indexes by Eugene Garfield, which added cited references to traditional bibliographic indexing records about scholarly articles. These databases had no peers, even in a discipline-specific field, until 2000, when APA launched its project to enhance records in PsycINFO with cited references. (This move was triggered by the very short-lived e-psyche database, whose creators promised a rose garden, but what they delivered after constantly cutting corners wilted faster than unwatered flowers.)
A little later, the producers of CINAHL (acquired since by EBSCO), and the CSA Technology Research Database started to add cited references in significant volume, followed by enhancements of the Sociological Abstracts and WorldWide Political Abstracts databases. Simultaneously, a few of the host systems added a very important feature, displaying the number of times the item has been cited. Some of the journal publishers or their digital facilitators also jumped on this bandwagon, but few can do it well.
The other important innovation in the I/A database arena was the A Matter of Fact database of Pierian Press (hosted as FactSearch on OCLC, but not updated since mid-2006, and it will not be).
It has been covering articles, government agency reports, transcripts of congressional hearings and press releases which have statistical and other factual information. These factual sentences are extracted and fused to create an "abstract" in the non-standard sense. The page numbers of the cited stats are identified to make it a perfectly quotable ready reference source. It was a great idea, and the database was one of the reasons I started my Picks and Pans column, but it has barely received the recognition it deserves.
The third significant innovation was the TableBase database developed by Dick Harris of Responsive Database Services (RDS) about 15 years ago, available from RDS, the content producer, as well as from Dialog and Thomson Gale.
It provides in tabular format the content of charts, graphs, bars and other graphical information embedded in articles and other documents that survey, evaluate and compare companies, products and a variety of business services. The search template offers many limiting options by a variety of criteria. Searching for the market share, market size and other market aspects of drugs for treatment of multiple sclerosis shows the articles which have factual information. The record for the nearly 1,600-word article is very simple. It gives the facts in a simple table format converted from the pie chart in the original article. It was a very efficient format and solution when illustrations were not used in online databases simply because of the low bandwidth.
The fourth innovation is CSA Illustrata, especially appealing to the visual-learner types of any age, and to any of the millennials with an interest in scholarly literature. All four innovations have the common trait of focusing on the delivery of informative summaries, the essence of the works abstracted, with the Just The Facts, Ma'am attitude, but CSA goes a giant step forward by including the illustrations of the scholarly articles in different sizes for the different stages of the search process.
Yes, there are huge image databases, and you can easily find in the Google Image database nearly half a million pictures of Paris Hilton in less than a second if you really need to, but for the query looking for the "Hilton in Paris," Google Image seems to be stumped, to say the least, as you can see in this (non-X-rated) screenshot.
While the most apparent content novelty is represented by the illustrations, the traditional content elements still deserve a few paragraphs.
In its debut edition, focusing on natural sciences, CSA Illustrata offers close to 1 million richly indexed illustrations extracted from about 165,000 articles published in about 900 journals from Blackwell Publishing, BioOne (the digital facilitator for many smaller publishers), Oxford University Press, Allen Press, BioMed Central, and the National Research Council of Canada (NRC) Press. I/A records and illustrations for articles in journals of Taylor & Francis, IOS Press, EDP Sciences, IOS are to be added in the coming weeks. More than 60% of the article records, and 59% of the images, come from journals of Blackwell Publishing, whose digital collection I reviewed just last month.
Suffice it to say here that although Blackwell has a worthy collection with good software, it has no options to search in the captions of illustrations, or even to limit the search to articles with illustrations, let alone to search illustrations by object-specific descriptors, taxonomic terms or geographic subject terms, all of which can be done with the 100,000 articles and more than half a million illustration records from Blackwell-Synergy currently in CSA Illustrata.
The journal source list does not include some of the journals from Elsevier, Springer and John Wiley, but I did find a few thousand article records with illustrations from these three of the largest scholarly publishers as well.
Importantly, there are several thousand illustration records from the highest impact factor multidisciplinary journals and other serial publications, such as Science, Nature and the Proceedings of the National Academy of Sciences.
It is apparent at a glance that on average there are more than six illustrations per article in the database. Obviously, there are articles with a single illustration, and there are ones with more than a dozen illustrations. A quick visual scan of the results list for any query will show the typical spread in terms of number of illustrations.
At its launch, CSA Illustrata covers the past ten years, with the bulk of the articles (88%) from 2003 to 2006. This is made clear in the PR material, which explains in the date of coverage section that "the majority of objects (and implicitly, the articles) date from literature published from 2000 forward". Importantly, the Natural Science module of CSA Illustrata itself is planned to be updated with about 150,000 images each month, so it is to double its current size by the middle of 2007, by both broadening the journal sources and deepening the coverage for the 1997-2002 period. Ten years seems to be sufficient, as in many of the disciplines covered by the Natural Science module of Illustrata, the average cited half-life is less than 6 years.
The composition and sources of CSA Illustrata at the document type level are simple. Practically all the records are for English-language articles. The four French-language and 27 German-language articles probably were included only to test the diacritical characters, or to demonstrate Illustrata to French- and German-speaking researchers. There is one article record for each article, and as many illustration records as there are figures and tables in the article.
There are digital object identifiers (DOIs) for 86% of the articles, an unusually high proportion, even considering that 90% of the articles are less than 4 years old. This is important because article DOIs are the most reliable, unambiguous data element for looking up the digital holdings of a library, to find out whether the journal issue is available to the library's patrons in the publishers' digital collections to which the library subscribes, such as Blackwell-Synergy, the Oxford University Press Journals, etc. Abstracts are available for 94% of the articles.
As for special terms, there are taxonomic and geographic terms assigned at the article level to 27% and 11% of the records, respectively. That seems to be a reasonable ratio, as only some of the science articles can be assigned taxonomic terms, and the geographic location of the study is relevant only in a small proportion of the research papers.
The fact that 98% of the records for the articles have author names, 100% of them have journal names and publication year, and 99.7% have ISSN is very important from the perspective of algorithmically constructing OpenURLs to look up the availability of the complete documents in one of the full-text databases subscribed to by the library, such as Thomson Gale's Academic ASAP, ProQuest Research Libraries, H.W. Wilson General Science, or EBSCO Academic Search Premier. Even if you don't have digital access to a publisher's journal collection, these aggregator databases may deliver the full-text articles chosen from an Illustrata search.
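To illustrate why those data elements matter, here is a minimal Python sketch of how a link might be assembled from an article record in the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for journals. It is not how CSA does it; the resolver address, the record field names and the sample values are assumptions made up for the example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical link resolver address; a library would substitute its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Illustrative record field names (assumed) mapped to OpenURL journal-format keys.
FIELD_TO_KEY = [
    ("author_last", "rft.aulast"),
    ("article_title", "rft.atitle"),
    ("journal", "rft.jtitle"),
    ("issn", "rft.issn"),
    ("year", "rft.date"),
]


def build_openurl(record, base=RESOLVER_BASE):
    """Return an OpenURL for a journal article, skipping missing fields."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for field, key in FIELD_TO_KEY:
        value = record.get(field)
        if value:  # only send the data elements the record actually has
            params.append((key, value))
    if record.get("doi"):
        # A DOI, when present, is the least ambiguous identifier.
        params.append(("rft_id", "info:doi/" + record["doi"]))
    return base + "?" + urlencode(params)


# Invented sample record, for demonstration only.
sample = {
    "author_last": "Example",
    "journal": "Journal of Examples",
    "issn": "1234-5679",
    "year": "2005",
}
print(build_openurl(sample))
```

The more of these fields a record carries, the better the chance that the resolver can match the citation, which is why the near-complete coverage of journal name, year and ISSN is worth noting.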
And here comes one of the reasons to sing the glory of CSA Illustrata. Many full-text databases omit the illustrations and provide only the text portion of the document. In the best case scenario, they may reproduce in plain text the simpler tables, include the captions for the more complex tables, charts, photographs and the diagrams, or at least warn the users that figures are omitted. Some databases just skip the illustration as a matter of policy and remain silent about it. Very often the article cannot be understood or is almost useless without the illustrations.
There are 912,970 illustrations in the debut edition. To my surprise, 100% of them have their own digital object identifier. An article with four illustrations would have a DOI for the article itself, and four DOIs, one for each illustration. Interestingly, there are a few records which have DOIs for the illustrations but not for the article itself, such as those from some NRC Press journals.
All of the illustration records carry the title of the article they were extracted from, its year of publication, the name of the publisher (99.8%) and the journal as well as its ISSN (99.7%). This may seem to be redundant, but it is not. If you pick an illustration to be incorporated into a research report or a PowerPoint presentation, the availability of these data right in the illustration record comes in very handy for appropriately crediting the source.
Almost all the illustrations have one or more object identifier(s), object descriptor(s), a caption, and one or more illustration category types, to describe the type of illustration. These are of key importance for high precision searching for the right illustration in terms of topicality and illustration type.
Taxonomic terms are available for 16.5% of the objects, half of what I found at the article level. This is understandable, as the illustration itself may not be of or about, say, a species even if the article is. The same is true for geographic terms: 4.6% of the illustrations have them versus 16.2% of the articles.
There are two top categories of illustrations (tables and figures). There are four major category terms (graphs, maps, photos, and {other} illustrations). The latter without the adjective may be confusing, which is why I added the qualifier in braces.
Illustration is the broadest term, and the source of the database's name, which was well chosen as it can be fairly easily pronounced and even comprehended without a Ph.D. in linguistics or being a polyglot.
There are more than 50 illustration category types. Luckily, the category type index can be browsed, and the term can be picked. Even more luckily, the category types are word indexed, so chart is sufficient and would retrieve area charts, bar charts, pie charts, and flow charts, even though the first three are under graphs, and the last under the illustrations category. The same is true for diagrams: cluster diagrams, rose diagrams and vector/raster diagrams belong to the graphs category, but Venn diagrams are in the illustrations major category. In the print brochures this is clear, but in the index the types are listed alphabetically, not classified under the four major categories.
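What word indexing buys the searcher can be shown with a small Python sketch. It is only an illustration of the general technique, not the Illumina implementation, and the short list of category types is a sample taken from the names mentioned above.

```python
from collections import defaultdict

# A small sample of category types named in this review, not the full list.
CATEGORY_TYPES = [
    "area chart",
    "bar chart",
    "pie chart",
    "flow chart",
    "cluster diagram",
    "rose diagram",
    "line graph",
    "scatter plot",
]


def build_word_index(terms):
    """Map every single word to the set of full terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for word in term.lower().split():
            index[word].add(term)
    return index


def lookup(index, word):
    """Return all terms containing the word, in alphabetical order."""
    return sorted(index.get(word.lower(), set()))


word_index = build_word_index(CATEGORY_TYPES)
print(lookup(word_index, "chart"))
# ['area chart', 'bar chart', 'flow chart', 'pie chart']
```

A phrase index, by contrast, would require the searcher to know and type each full category name, which is exactly what the word index spares the user.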
Line graphs, bar charts/histograms, time series plots, scatter plots and photomicrographs are the most commonly used specific illustration categories. Within the maps major category bathymetric, topographic and geological maps are in the minority compared to study site maps, of which there are more than 5,000, almost an order of magnitude larger subset than the other three map types together. That's good news because bathymetric, topographic and geological maps of an area are easier to find on the Web but not the study site maps which may not even have a name. The nearly 16,000 transmission/emission images may represent less than 2% of all the illustrations but they are important for medical research as they are mostly X-rays and MRI images. Equally important are the 6,700 gene and protein maps and sequences.
As for the quality of the illustrations, it is not so much in the eye of the beholder as in the capabilities of his/her monitor, printer, paper and the one-too-many-times-shaken printer cartridge (when you imitate your favorite bartender to squeeze the last drop of ink from an overused cartridge).
Even if the original illustrations in the glossy print edition of the journal would make Annie Leibovitz salivate, and Illustrata reproduced them in the same color depth and resolution, your $299 monitor and $99 color inkjet printer, or $79 monochrome laser printer, cannot do justice to them. Based on my samplings, it is safe to say that the majority of the illustrations are monochrome, anyhow.
In most of my tests the illustrations were as good as those displayed or printed straight from the publishers' digital archive, better than the average images from databases with page images, and much better than the copies I received through the ever more sorry, sloppy and expensive document delivery services of the British Library or Infotrieve (which work hand in hand, so the alternative is often Hobson's choice).
Illustrata is built on the CSA-Illumina platform which offers very good features for searching and for most of the output functions. The browsing capabilities should be extended to additional data elements and also enhanced.
Currently there are four browsable indexes: by author, journal name, object category name and object descriptor. All of the indexes should show the postings information (i.e., indicate in how many records the index term appears). This can orient the users in formulating the query.
Browsing the index entries also helps in spotting variant formats in many data elements: such as author names with different spacing of first and middle initials, journal names including or excluding the subtitles, using 'and' or '&', including or omitting the French version of the titles of Canadian journals, assigning object descriptors both in singular and plural formats such as sandy bottom and sandy bottoms.
In some cases it may significantly broaden the result set if all the variants of a journal name are picked for inclusion in the query. In other cases the contrary is true. In my tests the spelled-out versions of some journal names did not find any matching entries (such as BJOG: The International Journal of Obstetrics and Gynaecology, the latest name format), while the query BJOG, with and without the truncation symbol, retrieved 890 records. It is another matter that the index entry British Journal of Obstetrics and Gynaecology picked up 23 additional items from 2003, because the old name used until 2001 was entered without the acronym, even though it had been superseded by BJOG: British Journal of Obstetrics and Gynaecology, in which the adjective British was then replaced by International. (I know that it is hard to be a serials cataloger or indexer. I stopped dealing with serials automation in the 1980s when I felt that handling serials is like trying to nail jelly to the wall: they keep changing their title, subtitle, parallel title, place and frequency of publication, publisher, ISSN and every data element you can think of.)
The object category names are consistent, accurate and available for all but 300 items; they would just need the posting information for orienting the searcher. For example, the vertical section photograph is used for only 24 articles, so seeing the posting information in the index, the user would not be surprised to find no hits if the subject search is restricted to items with such an illustration type.
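Some of the simpler variants, such as 'and' versus '&' or differences in spacing and punctuation, could in principle be grouped automatically. The Python sketch below shows one possible normalization; it is my illustration under that assumption, not a description of CSA's processing, and the journal names in it are invented. It would not catch genuine title changes of the BJOG kind, which need a cataloger's knowledge.

```python
import re
from collections import defaultdict


def normalize_journal_name(name):
    """Reduce a journal name to a comparison key.

    Lowercases, treats '&' as 'and', and collapses punctuation and
    extra spaces. Title changes and dropped subtitles are NOT handled.
    """
    key = name.lower().replace("&", " and ")
    key = re.sub(r"[^a-z0-9 ]+", " ", key)
    return " ".join(key.split())


def group_variants(index_entries):
    """Group index entries that share the same normalized key."""
    groups = defaultdict(list)
    for entry in index_entries:
        groups[normalize_journal_name(entry)].append(entry)
    return dict(groups)


# Invented index entries, for demonstration only.
entries = [
    "Journal of Rivers & Lakes",
    "Journal of Rivers and Lakes",
    "Journal of  Rivers and Lakes.",
    "Annals of Examples",
]
for key, variants in group_variants(entries).items():
    print(key, "->", variants)
```

Showing the grouped variants under one heading in the browsable index would let the searcher pick them all with a single click.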
Making browsable the indexes of several other data elements, such as the statistical terms, taxonomic and geographic terms would help the users to spot query terms important for focusing the search, for example, to papers which use specific statistical methods, and/or show the findings in cluster diagrams or histograms or as time series.
The search options are excellent, from the intuitive, matrix format query template to the variety of searchable fields, including the ones related to the illustrations. Some of the professional online information services allow the users to limit the search to items which have illustrations, such as Gale's Infotrac. H.W. Wilson is one of the very few systems which allow limiting the search by types of illustrations. In Ebsco's Image Collection you may search in substantial captions of photos of people, places and historic events, but these are standalone images, and in the natural science categories the photos I have seen are more like nature photos. Only CSA offers searching by a variety of illustration-specific indexing terms for illustrations which appeared in scholarly articles and remain linked to them.
The strong point of Illumina in the output options has been the customization of the output content, and the exporting of the results to RefWorks, which was born as a Web tool and offers the most sophisticated importing and post-processing features. In CSA Illustrata the handling of illustrations steals the show.
In the short result list there are pinky-nail images (to borrow the illustrative term for tiny thumbnails from Diane J Hoffman, Senior Director at CSA, and the editor of this unique database). These, along with a partial list of the object descriptors on the right margin, are meant to give a hint about how many and what kind of illustrations there are in the articles.
Then the thumbnail images of the figures and tables are displayed, along with the various descriptors and the traditional bibliographic data.
Hovering above the thumbnails will show a metadata cloud or bubble: the caption, the illustration types and the subject terms assigned to the object. Clicking on the thumbnail image will display a larger version to see the details of the chart.
While most of the illustrations at that level can be easily consulted, sometimes the legends, the axis and other labels are blurred because of size reduction. This is the case with this highly informative vertical section illustration. There is yet another layer which is displayed when clicking on the "show original" link. This is most often needed with long tables, which are otherwise not legible because of the reduction. The original image shows a somewhat larger version by itself (i.e. without metadata), which is typically legible.
I did not often need the largest version, and I don't have 20/20 vision. In a few cases, when the table is very long and/or wide, as is the case with the research ranking of UK universities with 97 rows and 20 columns, even the original imposes some strain and requires both horizontal and vertical scrolling. The monster tables can be a headache when a column value wraps around to a new row and the positions of the rest of the values get shifted and unaligned.
It could help in this scenario if CSA allocated a larger part of the screen to displaying the images, by narrowing the margins and thus increasing the frame within which the illustrations are displayed, no matter what Web designers would say. For large illustrations, the size of the frame within CSA could be larger and the capture could be optimized for this larger frame.
Also, for wide but short illustrations, rotating the image for capturing and when displaying may be an alternative. It is worth trying on a sample even if looking at rotated images may make you look as if you had spent too much time reading the New York Times on microfiche or watched a sob story on one of the talk shows with your head tilted in the I-feel-your-pain sympathy pose.
Studies since the 1970s have shown how much the comprehension of concepts and processes can be improved by good illustrations. With today's prevailing high-speed information transfer technology, and much increased feasibility of and interest in hyper-intense visualization, the timing of the introduction of an indexing/abstracting scholarly database enhanced by illustrations seems to be perfect. If a contemporary remake of Dragnet were released, Sergeant Friday's line certainly would be changed to: Just the Facts, Ma'am, and the charts, the graphs, the tables and the photos. And that's what CSA Illustrata delivers.
— Péter Jacsó