Péter's Digital Reference Shelf
May 2006


Title: Windows Live Academic
Publisher: Microsoft
URL: http://academic.live.com
Tested: April 14-18, 2006
Cost: Free

The Context

Indexing/abstracting (I/A) print publications and databases has been the staple of every library in order to provide information about the literature. Their sizes range from 15,000 to 17,000,000 records, with prices from $500 to $5,000 per year. Seemingly, it is good news that there is a new I/A database, Windows Live Academic (WLA), with purportedly six million records about journal articles and conference papers primarily in physics, electrical engineering, computer science and related fields, such as information science. The service is free and often links to the full text of articles and conference papers (mostly accessible to members of subscribing libraries). In reality, this is not a big deal in 2006, especially from Microsoft, for several reasons.

There are many large, open access I/A databases by the government in several disciplines (medicine, agriculture, education, criminal justice and transportation). Some of these have better software and many have better content, such as large integrated or linked open access full-text collections, than their subscription-based counterparts.

More importantly from the perspective of WLA, there are huge open access databases in physics, computer science, economics, library and information science, and, to a lesser extent, in engineering, often with substantial open access full-text collections. In addition, Elsevier's free Scirus database has for years been offering a multidisciplinary I/A service for journal archives, repositories and (less successfully) for purportedly scientific information on individuals' Web sites. It is far larger and far better at finding open access bibliographic records (with a higher ratio of open access abstracts) than WLA on every subject. In Scirus, there are more records in the ScienceDirect subset alone than in all of WLA, and the total number of records for journal articles is 25 million.

The Content

As for the content of WLA, I looked at the subject scope, size of the database, the record content provided and the publishers that offered access to their archives, as well as the number and type of journals and conference proceedings covered.

Subject Scope

Microsoft deserves credit for admitting that in the initial beta version of WLA only the fields of physics, electrical engineering and computer science are covered. Actually, the scope of coverage is broader. You will find records for publications in the fields of medicine, nursing, life sciences, psychology, sociology, economics, women's studies and in a variety of fields within the arts and humanities. This is obvious even if you only look up the list of journals covered, whether you glance at the beginning, the middle or the end of the journal list.

Sample searches confirm the multidisciplinary coverage. You can find more than 500 records for articles and conference papers about toxoplasmosis, more than 7,000 about multiple sclerosis, nearly 40,000 for the word psychology, 300,000 for the word education, and more than 10,000 for the word honesty. Some of these may be in the context of physics, engineering and computer science, but the vast majority are not from those fields.

Database Size

WLA claims to have about six million records. There is no foolproof way to determine the exact number of records (as is possible in most of the professional databases), but it seems that Microsoft increased the size by 50% in its announcement.

My test searches, using the most common words in the full text of the WLA records, clearly indicate that the actual number of records is more likely to be below four million. If I extrapolate from the number of duplicate records I found in my first tests, the total number of unique records is even lower.
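The arithmetic behind the 50% figure can be sketched as follows, taking four million as the upper bound of the actual size. The duplicate rate used here is a hypothetical placeholder for illustration only; it is not a value measured in these tests.

```python
def inflation_ratio(claimed: int, estimated: int) -> float:
    """Return by what fraction the claimed size exceeds the estimated size."""
    return claimed / estimated - 1.0


def unique_records(total: int, duplicate_rate: float) -> int:
    """Estimate unique records, given the fraction of records that are duplicates."""
    return round(total * (1.0 - duplicate_rate))


CLAIMED = 6_000_000    # size announced by Microsoft
ESTIMATED = 4_000_000  # upper bound suggested by the test searches

# Hypothetical duplicate rate, chosen only to show the calculation.
HYPOTHETICAL_DUPLICATE_RATE = 0.05

print(f"Overstatement: {inflation_ratio(CLAIMED, ESTIMATED):.0%}")
print(f"Unique records: {unique_records(ESTIMATED, HYPOTHETICAL_DUPLICATE_RATE):,}")
```

Any real estimate would need a duplicate rate taken from a proper sample of the result lists.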

There are many duplicates (as there are in Scirus and Google Scholar), but they are not easy to spot because they are scattered in the result list, which is strange for allegedly relevance-ranked records. They may not obviously appear as duplicates because of incomplete and incorrect data in the record-pair, such as the omission of one author in the second record and the wrong publication year in the first record in this pair. They appear juxtaposed only because I did an exact known item search.

Looking at the source information of these two records, one can't help but get concerned. How could the software extract the wrong year for the first record, and in the other record ignore the second author and extract only his surname? These represent a problem in Scirus and Google Scholar, but not in the CiteSeer database, whose crawlers do the best job in every regard.

Record Content

The records include the usual bibliographic information, chronological-numerical designations of the source documents and the Digital Object Identifier (DOI) of the articles and conference papers when available.

It is deeply disappointing that the indexing software apparently can't reliably determine the availability of abstracts. The frequent false claim that the "abstract is not available" is alarming, given, for example, that the majority of articles in the Journal of the American Society for Information Science & Technology do have abstracts, clearly labeled as such in the source. Once again, as you can see in the previous screenshot of the CiteSeer database, CiteSeer correctly recognizes, collects and identifies the abstracts.

Source Coverage

Microsoft claims to have collected "more than 6 million records from approximately 4300 journals and 2000 conferences" in the fields of computer science, electrical engineering and physics.

Microsoft does not specifically mention the number of publishers whose archives it crawled in collecting data, but the page about publishers, journals and conferences at http://academic.live.com/journals has about 120 entries in the publisher section. However, these include a lot of weird combinations of publisher names. True, there are publications which are jointly published by two or more publishers, such as the volumes of the Joint Conference on Digital Libraries, a cooperation between ACM and IEEE, or by a commercial publisher on behalf of a scholarly society, such as the Journal of Digital Information, which originated from the British Computer Society and Oxford University Press.

True, there are journals which were published first, say, by Elsevier, then sold to Kluwer, such as Scientometrics. For articles in such journals and conference proceedings the joint listing would be understandable, but none of the above-mentioned sources are covered by WLA.

What I am referring to is the nonsense pairing of publishers, such as the one for Science magazine. In more than 10,225 records, the publisher field includes the Nature Publishing Group (NPG) and the American Association for the Advancement of Science (AAAS). Science is published by AAAS; NPG has nothing to do with it. These two archrival publishers of the most cited journals (Nature and Science, respectively), form a really odd couple as presented by WLA.

In the source list, there are more than 4,300 journals identified, but this number is also grossly exaggerated. There are hundreds of identical journals appearing twice with slightly different spelling, such as Scandinavian Journal of Medicine and Science in Sports and Scandinavian Journal of Medicine & Science in Sports and Planning Theory and Practice versus Planning Theory & Practice. There are also some British versus American spelling differences, such as Paediatric Anaesthesia and Pediatric Anesthesia, which are not that apparent as duplicates. There are duplicates for reasons of typographical errors, such as this journal whose misspelled variant is automatically corrected by Word. The combination of the above errors and inconsistencies makes some journals appear three times, four times or even five times in the journal list. The variety is well-demonstrated by these nicely juxtaposed journal names. There are also journals listed which are not covered at all.
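One way to surface such duplicate listings is to reduce each journal name to a normalized key and group the names that share a key. The sketch below is only an illustration and says nothing about how WLA builds its list; the spelling table is a tiny hypothetical sample, and typographical errors would need fuzzy matching that this code does not attempt.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately tiny table of British-to-American spellings.
SPELLING_VARIANTS = {"paediatric": "pediatric", "anaesthesia": "anesthesia"}


def normalize_title(title: str) -> str:
    """Build a comparison key: lowercase, '&' as 'and', no punctuation or accents."""
    text = unicodedata.normalize("NFKD", title).casefold()
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.replace("&", " and ")
    words = re.findall(r"[a-z0-9]+", text)
    return " ".join(SPELLING_VARIANTS.get(word, word) for word in words)


def group_variants(titles):
    """Map each normalized key to the list of spellings that share it."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return dict(groups)


titles = [
    "Scandinavian Journal of Medicine and Science in Sports",
    "Scandinavian Journal of Medicine & Science in Sports",
    "Planning Theory and Practice",
    "Planning Theory & Practice",
    "Paediatric Anaesthesia",
    "Pediatric Anesthesia",
]

for key, spellings in group_variants(titles).items():
    print(f"{key}: {len(spellings)} listing(s)")
```

With the six names quoted above, the grouping collapses the list to three distinct journals.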

It is also interesting, and discouraging, to see how many influential journals with high impact factors are not included in WLA. An obvious example: the key research journals of IBM are entirely ignored. It adds insult to injury that all of the articles in the 45 volumes of the IBM Systems Journal and in the 50 volumes of the IBM Journal of Research & Development are offered by IBM in full-text format.

As for the conference proceedings, indeed there are more than 2,000 listed. Once again, the numbers can be easily misunderstood. Counting each yearly occurrence of a conference separately inflates the number of sources and is akin to counting the volumes of journals.
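The difference between counting yearly occurrences and counting conference series can be sketched as follows. The listing names are invented placeholders rather than entries from the WLA source list, and the assumption that a four-digit year is the only thing separating occurrences of one series is a simplification.

```python
import re

# Matches a standalone four-digit year such as 1998 or 2005.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def series_name(listing: str) -> str:
    """Strip the year from a conference listing to get the series name."""
    without_year = YEAR.sub(" ", listing)
    return " ".join(without_year.split()).strip(" ,:-")


def count_series(listings) -> int:
    """Count distinct conference series, ignoring yearly occurrences."""
    return len({series_name(item).casefold() for item in listings})


# Hypothetical listings, for illustration only.
listings = [
    "Example Conference on Widgets 2003",
    "Example Conference on Widgets 2004",
    "2005 Example Conference on Widgets",
    "Another Symposium, 2001",
]

print(f"{len(listings)} listings, {count_series(listings)} series")
```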

The Software

This is home turf for Microsoft, but they do not deliver with WLA. Essential search features are missing from the software: I could not find a truncation operator, nor did there appear to be a way to refine a search by limiting it to a publication year or year range.

There is no good way to search by journal name, for two reasons. One is that there is no option to search for the word(s) or exact name of a journal title. You may use the journal name as a search criterion, but the result will include every item that matches the search term(s) anywhere in the record. This is especially frustrating for journals whose name is a single word, like Science, or is not distinctive enough, such as Evidence-based Cardiovascular Medicine.

The other reason is that there is no way to specify that you are searching for a journal as the source journal, so you get many hits where your journal is merely the cited journal. For example, searching for items published in the Annual Review of Information Science and Technology (ARIST), the current no. 1 periodical in the Library and Information Science field, you get a list of 235 hits. However, about 200 of them aren't records for chapters published in ARIST, but for articles citing a chapter published in ARIST. You have to scroll through 80 hits before the first record appears for an ARIST chapter.
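The distinction that is missing can be sketched with a toy record model. The field names `source` and `cited_sources`, and the sample records, are hypothetical and do not describe WLA's internal data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    source: str  # journal in which the item was published
    cited_sources: list[str] = field(default_factory=list)  # journals it cites


def search_anywhere(records, journal):
    """Match the journal name in any field, as an unfielded search would."""
    key = journal.casefold()
    return [
        r for r in records
        if r.source.casefold() == key
        or any(c.casefold() == key for c in r.cited_sources)
    ]


def search_source(records, journal):
    """Match only items actually published in the journal."""
    key = journal.casefold()
    return [r for r in records if r.source.casefold() == key]


# Hypothetical sample records, for illustration only.
records = [
    Record("Chapter A", "ARIST"),
    Record("Article B", "Some Other Journal", ["ARIST"]),
    Record("Article C", "Some Other Journal", ["ARIST", "Science"]),
]

print(len(search_anywhere(records, "ARIST")), len(search_source(records, "ARIST")))
```

A fielded search of this kind returns only the items published in the journal, while the unfielded one also returns every item that cites it.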

If you believe that this can be improved by sorting the result list by journal name, give up your hopes. The sort puts at the top the records for articles published in the Journal of the American Society for Information Science and Technology, an obviously incorrect sort procedure.

In addition, it should be noted how few ARIST chapters there are records for in WLA. For perspective, Web of Science has 484 records for chapters published in ARIST. The reason for this enormous difference is that the publisher has the digital versions of only 10% of this high-impact periodical that could be collected and indexed by Microsoft, whereas ISI has been indexing all 40 volumes of ARIST.

There is no citedness score listed yet for the items retrieved, but Microsoft promises to work on it. It is good that the results can be downloaded.

The output is a continuous stream, which appeals to some reviewers. I find it distracting, as after a while you have no idea where you are in the result list. More frustrating, if you click on an item to get to the source document and then return to the result list, you are positioned at the top of the list, not where you left it. Finding your jumping-off point time and time again is annoying and time-consuming.

You may choose from three different output formats (short, medium and full) using a slider. This is a gimmick; three clickable icons would have been just as good. There is a side panel to show the complete bibliographic record parallel to the list. Its most serious deficiency is that it does not show the abstract. Even if there is an abstract, it claims that it is not available. Someone should have caught this glitch, but apparently everyone was working on the gimmicky gizmos, and no one paid enough attention to what is displayed.

You may limit your search to the title field, but it doesn't work consistently. For example, searching for "odd couple" with the title restriction finds no matching record. Searching for the same term without the title field restriction retrieves 29 records, and the first dozen items have the exact term "odd couple" in the title field. Similarly, the query intitle:"medical informatics" finds only nine records, and none of them has the phrase in the title. Four of them have the database name BioMed Central instead of the actual title of the article.

The much-touted side panel also offers options to display the record in BibTeX and EndNote format. The latter is an insult. When the software can identify and retrieve the abstract, it includes the first 50 or so characters of it. It does not include the title, the journal name, the authors or the DOI. This is the record in the default mode and this is in the EndNote format. This is a useless feature that Microsoft should hide instead of bragging about it.

A good feature is the use of the Digital Object Identifier which links the user to the most authentic version, the one posted by the publisher. The full text is available if it is an open access document, or if your library subscribes to the journal and qualifies for access to the specific issue.

Conclusion

Windows Live Academic is a deeply disappointing product, even for a beta release. There may have been more time and effort spent on PR materials than on testing functionality. The sloppiness and incompetence of the programming work are appalling and undermine the reputation of good Microsoft products. The propaganda material is as accurate as statements from spokespersons of malfunctioning government agencies. If this is what Microsoft is capable of doing in 2006, the company is in big trouble.
