Title: SportDiscus
Publisher: Sport Information Resource Centre (SIRC)
URL: http://www.sirc.ca/
Cost: To be negotiated
Tested: July and October 2004
While SportDiscus at first looks like a robust database, the reality is that it is riddled with duplicates and with inaccurately and inconsistently described records, the result of bingeing on a variety of acquired and partner databases. This is true even for the "normal" versions of the database hosted by Ovid, SilverPlatter and EBSCO.
The Dialog version makes the condition of SportDiscus even worse by supersizing it from a 730,000-record database to a monster of nearly 1.47 million records, apparently by double-loading the records. What are annoying duplicates in the "normal" versions become aggravating and vexing quadruplicates in Dialog, wasting users' money and time as they try to clean up the results and make sense of this nonsense. In addition to the absurdly uncontrolled record intake, the Dialog version also shows pervasive symptoms of Alzheimer's disease by not recognizing valid terms when browsing the thesaurus. The PR materials should make room for a label with a paraphrase of the Surgeon General's warning: this database can be dangerous to your mental health and reputation.
I am accustomed to high-quality ready-reference products originating from Canada, even when they are free, such as the Canadian Encyclopedia Online and the Encyclopedia of Music in Canada. SportDiscus (which recently changed the spelling of its name from SPORTDiscus) is presented on the homepage of SIRC almost as the best thing since sliced bread, but it is hardly edible.
There are a number of fee-based sport databases available. There is the relatively small AUSPORT, maintained by the National Sport Information Centre of Australia (NSIC), which has an Australian focus, and there is the indexing-only Physical Education Index of CSA-IDS. The latter covers about half the journals purportedly covered by SportDiscus and has about a quarter-million records, i.e. about a third of what the "normal" versions of SportDiscus contain.
There are also free sport-related citation databases that are much better (although much smaller), such as the French Heracles database with more than 100,000 records, the SportScan database of NSIC with about 20,000 records, and the Amateur Athletic Foundation's (AAF) database, which has not only indexing/abstracting records but also 40,000 full-text searchable documents.
Then there are the open-access government databases, such as PubMed and ERIC, that have many sports- and fitness-related records. The digital facilitators, like MetaPress, Ingenta and Highwire Press, make freely available the bibliographic citations and abstracts for millions of articles, including a significant number of scholarly articles dealing with sport medicine and psychology.
The knee-jerk reaction to all this would be that SportDiscus streamlines and integrates the records available in the free databases into one source and applies a controlled vocabulary to facilitate easier access at a reasonable cost. Hold your knee: this is not the reality.
SportDiscus is undoubtedly the largest database dedicated to sport, fitness and physical education. The recently updated About page claims that the database has more than 700,000 qualified references. Indeed, a test in the Ovid version corroborates this claim, showing nearly 730,000 records. However, bigger is not necessarily better, as we shall see. As for the qualified nature of the references, I can't endorse that claim, because samples have shown that the database has many duplicates, and very often one or both records of a duplicate pair are inaccurate and/or incomplete.
This large size could explain the sample results on the new homepage that show the superiority of SportDiscus over Medline, PsycINFO and ERIC in finding records on topics such as "nutrition and active youth" and "nutrition and active older adults." I would not have doubted this claim (after all, these topics are not closely related to education and psychology), but some simple searches revealed that all three of the other databases returned more results than SportDiscus, or at least an equal number of hits. Unfortunately, SIRC does not indicate how it arrived at its published results.
So, I ran a simple query as the average user would by just AND-ing the component words together: "nutrition AND active AND older AND adults." The SIRC brochure shows the following hits: ERIC (30), Medline (254), PsycINFO (25) and SportDiscus (314). My simple search resulted in ERIC (9), Medline (14), PsycINFO (25) and SportDiscus (6).
I am not an inductee in the Super Searchers Hall of Fame, so I won't claim that my search was exemplary. However, in an effort to step up to the level of the avid users of SportDiscus, I added synonyms and truncation for the older adults component (elderly, senior$, mature and aged, the last being the preferred politically correct descriptor of SportDiscus) and dropped the word adults. Medline and PsycINFO still produced more results than SportDiscus. Of course, quantity does not make quality, but quantity was the measure of choice for SIRC. (I did not use synonyms for the nutrition and active components of the query.)
I used a similar strategy for the other sample topic chosen by SIRC. Once again, all three of the other databases yielded more results than SportDiscus, and with the synonym-enhanced version, two of them still had more than SportDiscus. I really would like to see the exact search strategy that demonstrates the overwhelming superiority of SportDiscus.
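To make the difference between the two strategies concrete, here is a minimal sketch that runs both queries against a handful of titles. The titles are invented for illustration, the `$` truncation is merely emulated with a regular expression, and real hosts such as Ovid apply their own field, stemming and truncation rules, so this only shows the Boolean logic: AND across concepts, OR within a group of synonyms.

```python
import re

# Hypothetical record titles, invented for illustration only.
TITLES = [
    "Nutrition and active lifestyles in older adults",
    "Nutrition for active seniors: a review",
    "Active aging: nutrition in the elderly",
    "Sport nutrition for active youth",
]

def term_pattern(term: str) -> re.Pattern:
    """Compile a whole-word pattern; a trailing '$' emulates Ovid-style truncation."""
    if term.endswith("$"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)

def matches(title: str, groups: list[list[str]]) -> bool:
    """AND across concept groups, OR within each group of synonyms."""
    return all(any(term_pattern(t).search(title) for t in group) for group in groups)

# The plain query: every word AND-ed together.
simple = [["nutrition"], ["active"], ["older"], ["adults"]]
# The expanded query: synonyms and truncation for the older adults concept, "adults" dropped.
expanded = [["nutrition"], ["active"], ["older", "elderly", "senior$", "mature", "aged"]]

simple_hits = [t for t in TITLES if matches(t, simple)]
expanded_hits = [t for t in TITLES if matches(t, expanded)]
print(len(simple_hits), len(expanded_hits))
```

On these toy titles the plain AND query retrieves only the record that happens to use the exact words "older adults," while the synonym group also picks up "seniors" and "elderly."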
Composition
SportDiscus includes records created by its staff, along with records from Heracles, AUSPORT and the AAF databases mentioned earlier, as well as from a few other databases. It also includes records created by sports information centers around the world.
Coordinating such a variety of inputs can be a daunting task. You can get some insight into the process by reading the open access paper by Jean-Michael Johnson, the director of indexing services at SIRC, that was presented at a conference in Lausanne, Switzerland. He felt he did not need to explain to the audience why rigorous standards enhance the value of information, because he assumed that the regular users of the SportDiscus database "probably understand full well that each and every record represents a rigorous editorial process that evolved over the more than 28 years the database has been in production." Nevertheless, he assured those who are less familiar with this rigorously edited database that they "may expect that all databases contain quality material organized in a logical fashion."
I could feel his passion, as in my salad days I was also in charge of developing and maintaining an abstracting/indexing database and struggled to "ensure consistency in indexing so that the same topic is uniformly described over time and across records."
Nevertheless, I was somewhat skeptical about indexing consistency, because I did not find it in the few hundred duplicate records (the best litmus test for checking consistency) that I ran into earlier this year when teaching a database-searching course using Ovid and Dialog. In fact, the record pairs showed more than the usual level of inconsistency, sometimes not only in index terms but also in author names.
With subject index terms it is always possible to argue that all of the terms are appropriate. But there is no chance that two different spellings of a name are both correct. The authors' names are either Hervey and Knibbs or Heruey and Knibles. One cannot always count on finding the source to decide which is the correct name, although in this case I was lucky. In this very typical SportDiscus record pair, a total of 21 subject index terms were assigned (9 and 12, respectively), but only five were common to both, yielding an unusually low level of inter-indexer consistency. More about duplicate and quadruplicate records later.
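The review does not say which consistency measure it has in mind. As an assumption, the sketch below applies two measures that are common in the indexing literature: Hooper's C / (A + B - C) and Rolling's 2C / (A + B), where A and B are the numbers of terms each indexer assigned and C is the number they share. For the pair described above (A = 9, B = 12, C = 5) they give 5/16, or about 31%, and 10/21, or about 48%.

```python
from fractions import Fraction

def hooper(a: int, b: int, common: int) -> Fraction:
    """Hooper's measure: shared terms divided by all distinct terms used by either indexer."""
    return Fraction(common, a + b - common)

def rolling(a: int, b: int, common: int) -> Fraction:
    """Rolling's measure: twice the shared terms divided by the total number of terms assigned."""
    return Fraction(2 * common, a + b)

# The record pair described in the review: 9 and 12 terms, 5 of them shared.
h = hooper(9, 12, 5)
r = rolling(9, 12, 5)
print(f"Hooper: {float(h):.1%}  Rolling: {float(r):.1%}")
```

Exact fractions are used so that the ratios are not blurred by floating-point rounding before they are printed.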
SportDiscus has truly international coverage, although few users may get excited to learn about the availability of articles in Azerbaijani, Bengali, Hungarian, Icelandic, Latvian, Marathi, Burmese, Malay, Macedonian, Moldavian or Swahili. Furthermore, many of these languages are represented by a single record, so even those who read them well may not benefit from their talents. Some of the records that have no language code may refer to articles in one of these singleton languages, but more than 80% of the documents are in English, and French and German account for another 15%.
The database has a good mix of document types, and the dominance of journal articles is clear. SportDiscus (which is primarily recommended for academic and medical libraries and researchers) covers many of the scholarly journals, but the proportion of articles from general-interest magazines is excessively large. By far the largest contributor is Sports Illustrated, with 17,580 records. Users who pay by connect time/resource use and by records displayed are advised to exclude Sports Illustrated from their searches, because for most of its articles, at least for recent years, they can find not only the citation but also the full text and images on the publisher's Web site.
The database claims to go back to 1830, but it might as well have claimed 1573 as the starting year, because there is one record from the cinquecento. True, there are four records from 1830, but after that, years with a single record abound for the next 100+ years.
Documents are assigned a code to describe their intellectual level: basic, intermediate or advanced. This is a good idea, but the code assignments are not completely reliable, even in obvious cases. Fifty-seven percent of the source documents belong to the basic category (too large a proportion), 15% to the intermediate level and 28% to the advanced level. Duplicates in the database reveal contradictory level assignments. For example, one pair of records for the same article lists it once as basic and once as advanced.
Size
The more than 700,000 records in the database are impressive at first, but after even casual use it becomes clear that there are many duplicates in the database — with erroneous, incomplete and conflicting information.
Users who access SportDiscus through Dialog may feel that they got lucky by having twice as many records as the users of the Ovid, EBSCO and SilverPlatter versions. Their joy will be short-lived when they realize that Dialog managed to load almost every record twice. A search of all records returns a count of 1,468,329, about twice as many records as you would find in the "normal" versions of the database.
Sampling any result set after sorting the records by title will bring up the 700,000+ extra duplicates. These are in addition to the duplicates already present in the "normal" implementations, as shown by these records from the search on steroids. If a record has a duplicate in the other implementations, it will have a quadruplicate in the Dialog version, as is the case for an earlier test record. In a number of searches I even found records that appeared in the short format to be quintuplicates, but which turned out to be a duplicate and a triplicate of two different documents: an article and a conference paper by the same author with the same title and abstract.
Duplicate detection is not and cannot be perfect. There are false duplicates, such as regular columns with the same title by the same author, so the genuine duplicates in SportDiscus may be somewhat fewer than a de-duplication run suggests. Then again, many genuine duplicates are not identified as such, because one record has the author's name misspelled, or because some other data element used for duplicate detection differs between the two records, for example in punctuation. This is true for every database.
The "normal" implementation by Ovid shows that SportDiscus has duplicates for more of the records with steroid and its variants in the title and/or abstract than ERIC or PsycINFO. This duplication ratio is dwarfed by the Dialog implementation, where the duplicates make the result set almost twice as large, suggesting that almost every record has at least one duplicate.
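Both failure modes can be seen in a minimal match-key sketch. It assumes a hypothetical key built from the normalized title, the first author's surname and the year; this is not the algorithm used by SIRC or by any of the hosts, and the records are invented. Case and punctuation differences are absorbed, a misspelled surname slips through, and a recurring column with the same title and author is flagged as a false duplicate.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def match_key(record: dict) -> tuple[str, str, int]:
    """Hypothetical key: normalized title, first author's surname, publication year."""
    surname = normalize(record["authors"][0].split(",")[0])
    return normalize(record["title"]), surname, record["year"]

def find_duplicate_groups(records: list[dict]) -> list[list[int]]:
    """Group record indexes that share a match key; keep only groups of two or more."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[match_key(rec)].append(idx)
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records, invented for illustration.
RECORDS = [
    {"title": "Anabolic steroids: a review", "authors": ["Smith, J."], "year": 1999},
    {"title": "ANABOLIC STEROIDS - A REVIEW.", "authors": ["SMITH, J"], "year": 1999},
    {"title": "Strength training in youth", "authors": ["Miller, A."], "year": 2001},
    {"title": "Strength training in youth", "authors": ["Millar, A."], "year": 2001},
    {"title": "Coach's corner", "authors": ["Jones, P."], "year": 2002},
    {"title": "Coach's corner", "authors": ["Jones, P."], "year": 2002},
]
print(find_duplicate_groups(RECORDS))
```

The first group is a genuine duplicate caught despite different case and punctuation; the Miller/Millar pair is a genuine duplicate that is missed; the "Coach's corner" pair is a false duplicate that a title-author-year key cannot tell apart.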
Record Content
About 30% of the records have abstracts, which is in sharp contrast with ERIC, PubMed and the scholarly publishers' archives, where 80-85% of the records have abstracts (and are free).
Unfortunately, many of the records appear in all uppercase, which is known to make scanning and reading the results more difficult. More importantly, many of the records have inaccurate and incomplete information in the bibliographic description. This becomes obvious when duplicate and triplicate records appear next to each other in the result list.
The journal names, their punctuation and abbreviations, and their chronological-numerical designations are more inconsistent and erroneous than usual. Even a relatively simple journal name shows the typical range of variants that you must anticipate when searching by journal name. Luckily, the Ovid version allows searching not only by exact journal names, but also by words in the journal name.
The number of duplicates is all the more disappointing because the director of indexing services, in his paper, emphasizes the practice of verifying content "to ensure that no records in the new submission are already in the database." According to the director, "great diligence is crucial in order to uphold the reputation of the database as a consistently dependable source for high quality, authoritative information." Indeed, great diligence is crucial, but it seems to be lacking in SportDiscus.
The new pamphlet uses a new buzzword: robust references. I don't think that robustness is the word that comes to mind when looking at records in SportDiscus. I could not decide which record of this pair was more robust until I scrolled up and found a third version with the author names in all uppercase, and I thought that this might be what is meant by robust. Then again, the mixed-case version seemed more robust to me, simply because it did not misspell the name of one of the authors. While case makes no difference to most search software, hyphenation and abbreviations do matter and are expected to be consistent. I became hesitant again about the robustness issue when I encountered this pair of duplicates. I felt that accurate references would be better than robust ones, whatever the word means.
Somehow, consulting the result lists in SportDiscus always reminded me of players on a hastily recruited team who lack uniforms and don't march to the same drummer — in spite of all the PR talk that conjured up images of synchronized swimmers in matching outfits with perfectly matching gestures and smiles.
Thesaurus terms
The biggest problem with the record content, however, is the inconsistent indexing. This is not surprising, as the thesaurus is less than perfect. And if the indexers use the thesaurus in the Dialog implementation, then they are doomed to failure, as will be discussed in the software section.
While I agree with what the director of indexing services said about controlled vocabulary, a thesaurus is much more complex than a flat, non-hierarchical list of the terms approved for indexing. The SportDiscus thesaurus is quite large, because it includes not only topical descriptors (subject headings in SIRC's parlance), but also thousands of personal, geographic, corporate and other entity names. Many of the descriptors also appear in French, but you can't use the two versions interchangeably, because the number of postings differs considerably for most of them. To add further confusion, many of the French descriptors have zero postings.
Ever since I saw in Ulrich's International Periodicals Directory that the Bowling Magazines' members-only publication, WB...For the Woman Who Bowls had a circulation figure of four million, I had new respect for bowling. (Although after I voiced my impression of the figure, Bowker deleted the circulation number from the entry.)
With that said, I find it odd that in the sports thesaurus there is a term for "woman bowler" but none for "bowler," let alone "man bowler" or "male bowler." (If you're wondering, the SportDiscus thesaurus prefers singular terms.) Then again, there are descriptors for the various bowling associations for women, but not for men, so I'd better study this issue more closely before passing judgement.
However, I am less hesitant to voice my opinion that it is not appropriate to create a thesaurus term like "women's bicycling voice," not only because bicycling men should also have a voice, but because this term is the name of a position on the board of the Canadian Velo NB, alongside more traditional positions such as chair, treasurer and secretary. The scope note did not help. Google finds only one hit for this term, and that should tell you something.
There are hundreds of odd terms in the thesaurus, and even more in the records as descriptors. Apparently in SportDiscus, presence of a term in the thesaurus is not a prerequisite for using it in records. Even though the thesaurus does not include a term for kiteboarding, the term has been used since 2001 in several records. You can also see from the previous sample that there are records that have no subject heading (descriptor) assigned.
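The kind of authority check that would catch both problems is simple to express. The sketch below is hypothetical (the thesaurus and records are invented, and only "kiteboarding" echoes the review): it counts descriptors used in records that are missing from the thesaurus and lists records with no descriptor at all.

```python
from collections import Counter

# Hypothetical thesaurus and records; only "kiteboarding" echoes the review.
THESAURUS = {"sailing", "surfing", "windsurfing", "strategy"}

RECORDS = [
    {"id": 1, "descriptors": ["kiteboarding", "surfing"]},
    {"id": 2, "descriptors": ["windsurfing"]},
    {"id": 3, "descriptors": ["kiteboarding"]},
    {"id": 4, "descriptors": []},  # no subject heading assigned at all
]

def audit(records, thesaurus):
    """Return counts of descriptors absent from the thesaurus, and IDs of unindexed records."""
    unauthorized = Counter(
        d for rec in records for d in rec["descriptors"] if d not in thesaurus
    )
    unindexed = [rec["id"] for rec in records if not rec["descriptors"]]
    return unauthorized, unindexed

unauthorized, unindexed = audit(RECORDS, THESAURUS)
print(dict(unauthorized), unindexed)
```

Run against a real load file, a check like this would flag every unapproved descriptor before the records reach users.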
I was also surprised to see many personal names in the thesaurus. I am not against using Pierre de Coubertin as a descriptor, but most of the names have only one or two records associated with them. Including misspelled versions of personal names in the thesaurus is not much help to users, either.
I am perplexed by some of the geographic descriptors, their position in the thesaurus hierarchy and by the absence of some. While SIRC's internationalism may be appealing in this eu(ro)phoric "we-are-the-world" atmosphere, when it comes to implementation, SIRC shows very strange geographic sensitivity and little knowledge.
For example, there are 23 records with the descriptor Ivory Coast. Fourteen of them are in French (understandably, as it is a Francophone country) and nine are in English. Most of them don't even have an English translation of the title. The thesaurus shows these 23 records under Ivory Coast but no records for Cote d'Ivoire, which suggests that Cote d'Ivoire is an approved descriptor that has not been assigned to any record. (Actually, one record had it, but with a different spelling, and another had it as a misspelled subheading.) This is mightily confusing for the user and calls into question the fairy-tale claims of rigorous quality control and of streamlining records imported from external databases.
There are serious errors of omission and errors of commission among the narrower terms of South East Asia. The SIRC thesaurus lists Pakistan, Nepal, Bangladesh and Sri Lanka as narrower terms under this region, although none of them is in South East Asia, but it omits Brunei, Burma (Myanmar) and East Timor, even though there are records for all of them and Burma appears as a descriptor. It does not list Ceylon, the former name of Sri Lanka, even though Ceylon also appears in records as a descriptor. Former names of countries really should be listed among the geographic subject terms (if current names are included), although not in the thesaurus. After all, this database goes back to 1830, and Ceylon changed its name only in the early 1970s. With a little serendipity you may find former country names, some of the time.
Thesaurus hierarchy
Many of the narrower and broader term relationships defy logic and common sense. Rugby has team sport as a broader term, but handball does not. Basketball is a narrower term under contact sport, along with wrestling and a few others. Some over-enthusiastic NBA players may have tried to turn the game into a WWE wrestling show, but that does not make it a contact sport. For karate, contact sport is not listed as a broader term, although its French equivalent, sport de contact, is, and that French term has no postings.
Kickboxing is not a preferred term, but at least it points to the preferred term: full-contact karate. Maybe that's why contact sport is not listed as a broader term for karate. Maybe semi-contact could be assigned in the next update of the thesaurus, along with the French demi-karate.
Tae kwon do is the preferred spelling of the popular martial art in the thesaurus, but there is no cross-reference from the common one-word spelling, taekwondo. Ironically, the one-word form appears in records as a descriptor, and in the title and abstract fields it appears 315 times as one word and only 173 times as three words. In yet another twist, tae kwon do has hwarangdo as both a narrower and a related term, to discombobulate the users. The icing on the cake is that strategy is designated as a narrower term for tae kwon do, as well as for more than 40 other descriptors, from aikido to lawn bowling and from lutte to yachting. This has a devastating effect on a popular search technique, exploding a descriptor, which is available on all the software platforms that host SportDiscus.
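What a missing cross-reference costs can be sketched in a few lines. The entry-term table below is hypothetical: it maps a non-preferred form to the preferred descriptor (the taekwondo and kickboxing mappings come from the review; the code does not reproduce any host's lookup), and a free-text pattern treats the solid, hyphenated and three-word spellings as one.

```python
import re

# Hypothetical USE references: non-preferred entry term -> preferred descriptor.
USE_REFERENCES = {
    "taekwondo": "tae kwon do",
    "kickboxing": "full-contact karate",
}

def preferred_term(query: str) -> str:
    """Map a user's term to the preferred descriptor when a cross-reference exists."""
    term = " ".join(query.lower().split())
    return USE_REFERENCES.get(term, term)

# Free-text pattern matching "taekwondo", "tae kwon do" and "tae-kwon-do" alike.
TKD = re.compile(r"\btae[\s-]?kwon[\s-]?do\b", re.IGNORECASE)

# Hypothetical titles for illustration.
TEXTS = ["Taekwondo injuries", "Tae Kwon Do training", "Tae-kwon-do kicks", "Karate kata"]
variant_hits = sum(bool(TKD.search(t)) for t in TEXTS)
print(preferred_term("Taekwondo"), variant_hits)
```

With such a table in place, a searcher who types the more common one-word form would be steered to the descriptor instead of drawing a blank.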
There are five online services currently hosting SportDiscus: Ovid, SilverPlatter, EBSCO, DataStar and Dialog. None of them can alleviate the pain caused by the incomplete, inaccurate records, the absurd thesaurus design and descriptor assignment practice. Ovid goes the farthest in trying to hide the warts of the database with its intuitive, informative and visually appealing thesaurus navigation, but it's like getting alternatively-dressed teenagers into haute couture tuxedos on their prom night.
None of the software can cope with the consequences of assigning generic terms as narrower terms of specific ones. Tae kwon do is assigned to 627 records. Its "narrower" term strategy (according to the thesaurus) is assigned to 6,919 records, and strategy in turn has several narrower terms, such as defence and offence, which are assigned to a few thousand other records.
If you check the explode box for tae kwon do, the software will run a search OR-ing together all of its narrower terms, including strategy and all of strategy's narrower terms. This is the universal interpretation of the explode function, which is a smart function as long as the terms listed as narrower really are narrower. The result is 12,470 "hits." The vast majority of these have absolutely nothing to do with tae kwon do. Pity the users, and the librarians who need to explain what happened.
Dialog adds thick layers of confusion to the database, not only by doubling the pervasive duplicates and turning them into quadruplicates, but also by showing signs of Alzheimer's disease when it fails to recognize valid thesaurus terms.
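The mechanics are easy to reproduce in miniature. The sketch below models explode as a recursive walk down the narrower-term links, OR-ing the postings of every term it reaches. The hierarchy fragment mirrors the relationships described above, but the record IDs are hypothetical, so only the shape of the result, not the numbers, reflects SportDiscus.

```python
# Fragment of the hierarchy described in the review (term -> narrower terms).
NARROWER = {
    "tae kwon do": ["hwarangdo", "strategy"],
    "strategy": ["defence", "offence"],
}

# Hypothetical postings: descriptor -> set of record IDs.
POSTINGS = {
    "tae kwon do": {1, 2, 3},
    "hwarangdo": {3, 4},
    "strategy": {10, 11, 12, 13, 14},
    "defence": {20, 21, 22},
    "offence": {30, 31},
}

def explode_terms(term, seen=None):
    """Collect a term and, recursively, all of its narrower terms."""
    seen = set() if seen is None else seen
    if term in seen:  # guard against cycles in a messy hierarchy
        return seen
    seen.add(term)
    for child in NARROWER.get(term, []):
        explode_terms(child, seen)
    return seen

def search(term, explode=False):
    """OR together the postings of the term, plus all narrower terms if exploded."""
    terms = explode_terms(term) if explode else {term}
    hits = set()
    for t in terms:
        hits |= POSTINGS.get(t, set())
    return hits

plain = search("tae kwon do")
exploded = search("tae kwon do", explode=True)
print(len(plain), len(exploded))
```

Three genuine records become fourteen once the generic strategy branch is pulled in, which is the same inflation, on a toy scale, as the jump from 627 records to 12,470 hits.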
Lesbianism is a thesaurus term in Ovid's implementation of the thesaurus, but Dialog draws a blank and shows the beginning of the thesaurus instead, which gives a taste of its typos in simple terms and corporate names alike, such as the A.C. Nielsen Group spelled as A.C. Nielson. The nearly half-million postings for the descriptor "A" are particularly puzzling. I know that the letter "A" can be part of a compound descriptor, such as Vitamin A, but this is not the case here.
And while Ovid displays both the broader (Eastern Europe) and the narrower (Budapest) terms for Hungary, Dialog does not recognize the country as a thesaurus term, let alone its capital, which Ovid finds. Interestingly, Dialog does find Eastern Europe as a thesaurus term and lists Hungary as a narrower term, but when you click on the country name it still rejects it as a thesaurus term.
Likewise, religion is a valid term in Ovid's version, but Dialog does not recognize it. Nor does it recognize Buddhism and Judaism. Surprisingly, Dialog recognizes Christianity, a narrower term of religion, but clicking on its broader term still does not show religion as a valid thesaurus term. It does recognize the narrower term Muscular Christianity and fetches 77 records, showing pseudo-muscularity through the duplicates. Ovid has 38 records for this term. For good measure, Dialog recognizes Islam, but has more than twice as many "hits" as Ovid, due to double-loading the records.
This database reminds me of the Information Science Abstracts database, which showed the same signs of lethal deficiency. It also imported records from other databases, including some English-language Russian databases. It had duplicates and triplicates (I estimated the number of duplicate pairs to be 12,000) and an embarrassingly inappropriate controlled vocabulary.
SportDiscus may face the same fate, especially its implementation by Dialog, which charges $70 per connect hour and $1.90 per record. I don't know who is responsible for the double loading of the records, which is the equivalent of administering twice the prescribed dose of medication. Nor do I know why the sorry thesaurus looks worse in Dialog than in the other software implementations. But I do know that it is unacceptable to pass on to users the burden of wading through duplicates, triplicates and quadruplicates of records and looking up variants to figure out what gives. These activities incur out-of-pocket connect-time charges while the meter is ticking, as well as per-record fees that keep increasing, not to mention the frustration and the time spent by end-users, librarians and other information professionals. It adds insult to injury that users are kept in the dark about valid thesaurus terms. This database requires a series of emergency operations, or it will soon end up in the database morgue.
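To put the per-record fee in perspective, here is a back-of-the-envelope sketch that uses the two Dialog rates quoted above. The session (50 distinct articles, 20 minutes online) is hypothetical, and it generously assumes that paging through the extra copies costs no extra connect time, so it understates the real overhead.

```python
from decimal import Decimal

CONNECT_RATE_PER_HOUR = Decimal("70.00")  # Dialog connect rate quoted in the review
PER_RECORD_FEE = Decimal("1.90")          # Dialog per-record fee quoted in the review

def session_cost(unique_records: int, copies_per_record: int, minutes: int) -> Decimal:
    """Cost of displaying every copy of each record plus connect time, rounded to cents."""
    displayed = unique_records * copies_per_record
    connect = CONNECT_RATE_PER_HOUR * minutes / 60
    return (displayed * PER_RECORD_FEE + connect).quantize(Decimal("0.01"))

# Hypothetical session: 50 distinct articles, 20 minutes online.
clean = session_cost(50, 1, 20)       # each article loaded once
doubled = session_cost(50, 2, 20)     # each article loaded twice
quadrupled = session_cost(50, 4, 20)  # a duplicate pair, double-loaded
print(clean, doubled, quadrupled)
```

Decimal arithmetic is used so that the cents come out exactly rather than as binary floating-point approximations.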