Reference Reviews

Péter's Digital Reference Shelf

July 2008

Title: Cuil
Publisher: Cuil, Inc.
Cost: free
URL: http://www.cuil.com
Reviewed: August 4–10, 2008

THE CONTEXT

The search engine market has lured many software developers, individuals, and small and large companies with its potential for instant fame and, in the long run, big money through advertising. The reality is that “many are called, but few are chosen.” These days Google and Yahoo have by far the largest share among search engines, while Microsoft’s Live Search keeps trying to close the gap. There is not much room left for the others, not even for the best ones such as GigaBlast, ExaLead, or Clusty, even though they may have unique and useful features, such as clustering results, offering proximity operators and/or suggesting filtering options for the search.

None of these services received a zillionth of the pre-release propaganda that Cuil enjoyed. Its founders and staff may have worked on that propaganda so hard that there was not enough time left to realize most of the claims that they dispensed about Cuil.

Debut fiascos

On the first day of the launch, the enormous propaganda attracted so many users that the system kept crashing and crashing. From blog reports, it was clear that the underestimated traffic and the consequent inaccessibility were a huge annoyance for users. Shortly after the launch, Google reported 144,000 hits for the query cuil fails. As Cuil is a free service, this fiasco was not as outrageous as the official opening of Terminal 5 at Heathrow four months before, where tens of thousands of pieces of luggage were irretrievable and many flights were canceled, leaving thousands of passengers stranded and British Airways embarrassed.

Then again, Cuil’s failure was more surprising, as it has an undoubtedly qualified trio of founders. However, Cuil, Inc. also has a whopping deficit of realism: even a posteriori it does not recognize how premature this launch was. The founders should have asked independent experts who do not faint in awe when they get an invitation to preview search software, such as Gary Price, Greg Notess and Nick O’Leary, who eat search engines for breakfast, or John Dvorak. The long-time PC Magazine columnist, who has seen and tested a few hundred search engines, could have given the developers a long list to work on before launching, had he been invited for a badly needed reality check. The title of his column, The New Search Engine Cuil Sucks, minces no words, and neither does its description of the major deficiencies.

Microsoft had the same delusion about the capabilities of its inferior Live Academic service, both at its debut and two years later when introducing the citation frequency count feature, but it finally withdrew this embarrassing search engine in May 2008.

Then again, on the citation frequency count feature, Google Scholar remains massively misleading because of the stunning illiteracy and innumeracy of its sorry citation-matching software module, which purportedly knows about the future, as seers in ancient Greece or palm readers in contemporary India do. It knows not only about articles and conference papers to be published after 2008 but also about their citedness by papers published in the 1900s.

When I revisited Google Scholar a few months ago, the phantom records and citations were alive and kicking, but Google Scholar can get away with these because it excels in finding scholarly papers—while Cuil is often unable to find any article on a given topic, let alone scholarly ones, in spite of the impressive academic background and heritage of its founders.

Myth and reality

Cuil, Inc.'s founders must have spent too much time in the unreal world of the Googleplex, which, along with the endless adulation by the media, makes many Google “associates” believe that they can walk on water, only to drown when encountering real waves in the real world.

The choice of the search engine's name seems to be a typical symptom of the naïve, navel-gazing, philosophizing attitude, with mandatory ethereal new age music playing gently from above the couch to inspire deep thoughts. I am the greatest fan of Gaelic music and performers, but choosing a word that, according to the long explanation in the FAQ section of the Cuil site, is the Gaelic term for both knowledge and hazel and should be pronounced as cool apparently isn’t cool. According to the research done by Nancy Gohring, that explanation is new to Foras na Gaeilge, the group that is essentially the official keeper of the Irish language, responsible for promoting use of the language as well as developing dictionaries and new terminologies. “I am unaware myself of the meaning ‘knowledge’ being with the word ‘cuil’ in Irish,” said Stiofán Ó Deoráin, an official on Foras na Gaeilge's terminology committee. I bring this up only because the same inaccuracy and wishful thinking apply to many of the other claims about the content and software of Cuil.

THE CONTENT

The most prominent claim of Cuil is that it “has indexed 120 billion Web pages, three times as many as any other search engines”, and that it is the biggest search engine. Actually, the number has increased by 1.6 billion pages since the July 28 announcement, as is proudly shown on the otherwise really ascetic search page. More importantly, the larger size does not mean much when users are flooded by unfathomable results, hits from spam sites, etc. Danny Sullivan, the founder of the superb Search Engine Watch site, went unusually ballistic three years ago on witnessing the revival of the “who’s bigger” size war. The war then stopped, until Cuil warmed it up, asking for trouble and resentment.

Size could matter

Undoubtedly, it was the ideal claim for the media people in the middle of the summer lull. They fell for it hook, line and sinker, as you can see from the Google search result, which shows that there are 101,000 pages for the query cuil “biggest search engine”. Yahoo reports 433,000 hits for the same query. Cuil itself reports only 39 hits, a good week after the launch. Is this modesty? No, it is one of the signs of shallow coverage in Cuil in spite of the purported size of the search engine’s database.

The flip side of this bragging (as is often the case with such “running your mouth” type of claims) is that there was also huge coverage of the huge flop on the Web, both by the official media and by the frustrated and disappointed pedestrian users through the blog universe. The query “cuil fails” produced 156,000 hits in Google, an incredible 4,830,000 on Yahoo and 128 on Cuil.

If you do the math, it was as smart for Cuil to make its search engine look three times as large as (and better than) Google as it was for Spike Lee to make his brutally untrue, unfair and unsubstantiated statement about Clint Eastwood, just to get some badly needed press attention. Doesn’t this huge difference suggest that there is something wrong with the claim of Cuil?

Test queries and results

It seemed so to me after I made dozens of test queries. There was not a single search across Yahoo, Google and Cuil to prove that Cuil covers the most Web pages, except for the one about my name. I rush to add that the hit counts of all the search engines are grossly inflated and there is no way to verify them.

Out of my dozens of test queries, I show here the results of what are arguably the most objective ones: queries in Yahoo, Google and Cuil about Yahoo, Google and Cuil, plus three other queries of interest to me about a company name, a personal name and a topic that I worked on.

Exceptionally, I include (not just link to) two tables to drive my point home even if you don’t click (which hurts me, as I create many screenshots for my reviews to illustrate and clarify many of the pros and cons of the databases reviewed).

As some of the hit counts are so huge, it may be easier to look at the hit rate of each query in Google and Yahoo against Cuil, whose hit count was considered to be one for this purpose.
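The normalization described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit counts are the ones quoted in the running text above for two queries (not the six queries of the tables), and the names `hit_counts` and `hit_rates` are hypothetical helpers invented for the example.

```python
# Hit counts as quoted in the text of this review (not the table data).
hit_counts = {
    'cuil "biggest search engine"': {"Google": 101_000, "Yahoo": 433_000, "Cuil": 39},
    '"cuil fails"': {"Google": 156_000, "Yahoo": 4_830_000, "Cuil": 128},
}


def hit_rates(counts, baseline="Cuil"):
    """Divide every engine's hit count by the baseline engine's count,
    so that the baseline engine always comes out as 1."""
    base = counts[baseline]
    if base == 0:
        raise ValueError("baseline hit count must be non-zero")
    return {engine: count / base for engine, count in counts.items()}


for query, counts in hit_counts.items():
    rates = hit_rates(counts)
    # Round only for display; the underlying ratios are kept unrounded.
    print(query, {engine: round(rate, 1) for engine, rate in rates.items()})
```

With Cuil set to one, even these two queries put Google and Yahoo three to four orders of magnitude ahead, which is the opposite of what a database three times the size would suggest.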

The rates of difference are surprising for four of the six queries, and for Cengage and Cuil itself they are monstrous. For the query “cuil,” the very high hit numbers in Yahoo and Google are realistic, because cuil has many meanings (but apparently not knowledge, the one that the founders so desperately wanted). One of the meanings is teaspoon, not in Gaelic but in contemporary French: it is the abbreviation of the French word for teaspoon (cuillère), which appears on millions of pages of recipes. If it appears so often in Yahoo and especially in Google, then it should appear about three times as often in Cuil, not nearly 20,000 times less often.

But isn’t it possible that Cuil does not inflate the hit counts as the other search engines do? No, it isn’t. On the contrary, Cuil does this inflating even when its absurdity is right in your face. For example, when searching for “jacso scopus” (space means AND as in most other systems), Cuil promises three hits, brings up one. For comparison, Google promises 1,860 hits for the same query, but delivers only 730—of course, that “only” is not much to complain about in contrast to the one hit in Cuil.

In some cases, the hit count in and of itself makes it incredible that Cuil has three times as many pages as Google. Cuil reports 151 hits for toxoplasmosis, the parasitic disease, which is widely written about. To wit: in Google, the hit count is 1,350,000—a much more likely count.

THE SOFTWARE

This is where the developer really gets carried away verbally in touting the features of Cuil. This is where some editing by a copy editor who grew up in the Midwest or on the East Coast would have come in handy.

The essence of the software is wrapped in a message. It assures you that “rather than rely on superficial popularity metrics, Cuil searches for and ranks pages based on their content and relevance. When we find a page with your keywords, we stay on that page and analyze the rest of its content, its concepts, their inter-relationships and the page’s coherency.”

If so, then why is it that the duplicates, triplicates, quadruplicates, quintuplicates of the same page don’t line up cheek to cheek in the result list? Why is it in the first place that so many of these clutter the results instead of getting hidden as is done in Google or Yahoo? Take as an example the result for the search about using Scopus for calculating the h-index. The query for it was just Scopus calculating h-index.

Nonsense results

To see the problems, go to http://www.jacso.info/cuil/ where I reproduced the three pages Cuil uses to present its findings. I did that for three reasons. One is that if you run the above search you may get a very different result and can’t follow my references. Surprisingly, it often happens that a search that is re-run even a few minutes later will produce far fewer results than the original run. Try to explain that to your users and students. The second reason is that I used color to highlight the problem. The third reason is that I numbered the snippets for easy reference. This is the time to open another tab and go to the http://www.jacso.info/cuil/ page.

Cuil promises 35 hits for the query. It delivers 18. These are presented in two or three columns, which some journalists liked because they remind users of a magazine. Maybe, but it would have been nice to offer the traditional layout for the old-fashioned users who don’t believe that result lists are to be treated as magazine pages. I am personally obsessed with presenting everything in matrix format. I think I wrote even my love letters to my wife in matrix format, but I don’t like this layout and the many more clicks needed to go through the first hundred hits. I know that most users look only at the first 20 or 30 hits, but there are many who like to go further, much further, in the result list.

The artsy result presentation could have tiled the result, in this case in a 2*9 or 3*6 format. Instead, page one has eight snippets, page two has six, and page three has four. This layout does not guide the eyes well, especially when there is no title, as for item two, or item eight on the first page.

Much more importantly, there are only seven unique items out of the 35 promised and the 18 delivered. This is an annoyingly inferior ratio. Google promises 86 hits but delivers 150 if you ask it nicely. When you arrive at the 86th item, it informs you that these were the most relevant results (even though it uses popularity, without a heavy-breathing explanation about relevance). It also informs you that it would be happy to repeat the search with the omitted results included. You’d better do that, because there are many highly relevant gems in the remaining 64 hits. The duplicates and irrelevant records amount to less than 10%, a very good deal without breaking a sweat.

Two hits in the Cuil results (#1 and #7) have no duplicates. Item #3 has five duplicates, item #6 has three duplicates and the rest have one each. Item #6 is of particular interest, as all four hits allegedly come from the same site, whose URL includes this character string “the first-lady-of-the-united-states-enc”.
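The duplicate counts above can be checked against the 18 delivered hits with a small tally. This is a sketch, assuming that “the rest” means items #2, #4 and #5 (an inference from the seven unique items); the variable names are illustrative only.

```python
# Number of duplicates for each unique item, as described above.
# Assumption: "the rest" are items 2, 4 and 5, with one duplicate each.
duplicates = {1: 0, 2: 1, 3: 5, 4: 1, 5: 1, 6: 3, 7: 0}
promised = 35  # hit count announced by Cuil for the query

unique = len(duplicates)
# Every unique item contributes itself plus its duplicates.
delivered = sum(1 + extra for extra in duplicates.values())

print("delivered:", delivered)
print("unique share of delivered:", round(unique / delivered, 2))
print("unique share of promised:", round(unique / promised, 2))
```

The tally reproduces the 18 snippets that Cuil actually showed, so only about two in five of the delivered hits, and one in five of the promised ones, are distinct pages.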

Dearth of search options

As for other typical search features, the situation is not better. There is no phrase searching, no truncation, no automatic stemming, no OR or NOT operation. Neither is there an option to limit the search to a general domain, or to a particular Web site. Neither can the search be limited by language, file type, harvesting year, country of origin, numeric range. Most painfully the search cannot be restricted to terms extracted from the title field, a feature that is often needed to focus a search.

Much is made of enhancing the snippets with thumbnail images to guide users. However, these are not extracted from the Web sites but rather are assigned from unknown, unidentified, uncredited sources by the Cuil software. These range from cheap clip-art pieces, to company logos and even X-rated pics. A funny, G-rated sample of the massively mismatched biographies and thumbnails can be found here.

I was surprised when I found the Web of Science (WoS) logo next to my article about Amazon. I do write about WoS, but in that review I mentioned WoS once or twice and Amazon appeared almost a hundred times. Cuil picked a WoS thumbnail rather than one about Amazon, the best online bookstore. Next time it will be perhaps a thumbnail about the river or the rainforest as the smart software fancies it.

Looking up my paper about the ISI database, I found a snippet from Gary Price’s ResourceShelf which mentions that review. Cuil’s entry has a heading about ISI, the notorious Inter-Services Intelligence agency of Pakistan, purportedly from the www.bestshopingstreet.info/isi.htm site (with the misspelling). Gary doesn’t associate me with the Pakistani agency, nor does he endorse his review being published on these spam sites. What was Cuil’s software thinking?

Not all the good services had a perfect start, but Cuil had, and still has, unprecedented deficiencies. In Texas, they say and sing “Big hat, no cattle” or “All hat, no cattle”. The special message from Cuil, Inc. “explains” the technical fiasco in a strained cuil style, but the founders still do not seem to recognize how crippled their software is, and how absurd their claim about the 120 billion page coverage is in light of the abysmal results that Cuil serves up.
