Title: World Development Indicators 2009 (subset)
Publisher: World Bank
Cost: The subset is free; the price of the full version is to be negotiated.
Tested: April 27-30, 2009
There are many sources that provide demographic, economic, social and a variety of other indicators for the nations of the world. One of the best is the constantly updated CIA World Factbook, which has become an excellent source since its Web debut long ago. It covers the largest number of countries/economies; however, it does not have information about the U.S.A., simply because it was set up to monitor developments in foreign countries.
Various United Nations agencies regularly publish and update statistical data for the 193 UN member states. Disappointingly, there is still no comprehensive one-stop shop for the many worthy UN digital resources, although the UNData system may eventually offer a federated search engine covering many more than its current 22 databases, including ones from UNIDO and UNCTAD.
Several other digital statistical collections focus, by design, on a region, such as EuroStat or the much lesser-known but outstanding bilingual RICYT system. RICYT covers 26 Ibero-American countries, including not only Latin America but also Spain and Portugal, as well as the U.S. and Canada, with a strong emphasis on science (which, in my eyes, is one of the weaknesses of the free WDI subset).
OECD, the Organization for Economic Co-operation and Development, is limited not regionally but functionally, covering its 30 member states and the candidate countries from around the world. The free subset created from the databases in the SourceOECD family, available through the Statistics Portal, offers the most systematically organized and smartly arranged fusion of statistical tables, spreadsheets and compact but lucid narrative summaries.
The World Economic Outlook (WEO) database of the International Monetary Fund is in the same league as the World Development Indicators database but, understandably, it provides predominantly monetary data series, with impressive breadth and depth, right down to the price of, say, safflower oil or sawnwood if that is what you or your users want to know.
The splendid Human Development Reports databases of UNDP, among the few that do not look at every trait through the prism of money, put the emphasis on social indicators (in the context of essential economic indicators), including some innovative and enlightening composite indices of poverty, gender (in)equity and human progress. The various yearly or biannual editions are not yet aggregated into a single database that would show time series at a glance for the best perspective.
Then there is Wikipedia. It has quite a number of entries with country-oriented indicators, presented in visually pleasing formats (with tiny flags of the countries) and with features that other service providers should adopt, such as the very smart, instant, dynamic re-sorting of multiple parallel lists by rank position or country name.
I like the cute and pretty (inter)faces of these indicator lists as much as the teenager next door does, but I must also look at what is inside the lists, and there I am often less happy. I know that this will trigger quite a few hate mails and very few hail mails, but the undocumented decision of the page creator or updater about which countries to include from the original, official source documents is not ethical. This is not a theoretical issue but a very practical one, and it is worth a paragraph or two before discussing WDI.
As an example, I use one of the most essential indicators in many country ranking lists: the Gross Domestic Product (GDP) of the countries, which is the basis of many other indicators, such as the Gross Expenditure on Research & Development in absolute or per capita figures. This indicator can be accepted as a compromise and a necessity, but it should not be drummed up into a measure of virtue.
Sometimes it represents not so much the virtue of a nation, the hard work of its blue-collar and white-collar citizens, as its fortunate geographic, political or religious position. This is the case for Luxembourg, permanently top-ranked in the entire world by GDP/capita (and #1 in alcohol consumption per capita among the OECD countries; worldwide it is just behind Uganda and way ahead of the other countries traditionally perceived as heavy drinkers that are not OECD members). Luxembourg is in the lucky position of hosting the headquarters of several international organizations in the center of Europe (symbolically rather than geographically). Switzerland is in a similar geographical and political position, but its GDP/capita is about half that of Luxembourg. Does this mean that Luxembourg is twice as good as, or worthier than, Switzerland? Through the prevalent GDP/capita prism, Switzerland is ranked considerably lower than Luxembourg (9 positions lower in the CIA rank list), but it is way ahead of Luxembourg in many regards (not gender equity, for sure), such as scientific research.
I am equally skeptical about the top ranking by GDP/capita of some of the richest OPEC member states, which just happen to be sitting pretty on huge oil fields (and/or religiously sacred land) and can get guest workers to do the dirty work very cheaply (while abusing them socially).
The creator of the visually appealing GDP/capita lists may also have had some reservations about the fairness of this indicator. However, she or he decided to leave out most of the countries of the Middle East and many Muslim countries in other parts of the world without any note about this decision, which is highly objectionable in an encyclopedia.
Surprisingly, none of the usually hyperactive social-computing Wikipedians (who spot and elaborate on the misuse of a semicolon versus a comma) felt the urge to correct this error, which distorts the list and the ranking of countries in this import from the previous edition of WDI.
Among others, Qatar and Brunei, which are top-ranked in terms of GDP/capita and appear in the original WDI list, were left out of the Wikipedia list attributed to the World Bank. Re-sorting the list by country name makes this clear: Brunei should be between Brazil and Bulgaria alphabetically.
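The same check can be automated when both lists are available as plain text. The following is a minimal Python sketch, not a tool offered by WDI or Wikipedia; the function name `find_omissions` and the two short name lists are hypothetical, illustrative excerpts rather than the real lists. It reports every name present in the source list but absent from the derived list, together with the neighbors it would sit between alphabetically.

```python
from bisect import bisect_left


def find_omissions(source_names, derived_names):
    """Hypothetical helper: return (missing_name, name_before, name_after) for every
    name in the source list that does not appear in the derived list."""
    derived_sorted = sorted(derived_names)
    report = []
    for name in sorted(set(source_names) - set(derived_names)):
        pos = bisect_left(derived_sorted, name)  # where the name would be inserted
        before = derived_sorted[pos - 1] if pos > 0 else None
        after = derived_sorted[pos] if pos < len(derived_sorted) else None
        report.append((name, before, after))
    return report


# Hypothetical excerpts, for illustration only (not the real WDI or Wikipedia lists).
wdi_names = ["Brazil", "Brunei Darussalam", "Bulgaria", "Portugal", "Qatar", "Romania"]
wiki_names = ["Brazil", "Bulgaria", "Portugal", "Romania"]

for name, before, after in find_omissions(wdi_names, wiki_names):
    print(f"{name} is missing (would fall between {before} and {after})")
```

Working on names works here only because both lists come from the same source and use the same spelling; across sources, matching on country codes is far more reliable.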
Population and GDP data are among the most consistently available data for all the countries and economies in WDI, so most of the omissions reflect the personal preferences of the page creator. This is certainly not obvious to casual users, and there are no footnotes about the significant omissions. The omissions caught my eye simply because I spend quite some time profiling countries (primarily from a scientometric perspective).
My emphasis here is on the hidden bias and the manipulation of information extracted from a traditional ready-reference source that has its own content (and software) deficiencies, even in the subscription-based version. WDI is credited as the source, but the Wikipedia list misrepresents the real WDI list of the World Bank. As a lawyer by primary education, I am sensitive to such misrepresentations.
In light of the wide variety of sources, it was a wise decision of the World Bank to obtain data series from other sources and mesh them with its own data. It has about 30 partners that provide input. The subscription-based 2009 version of World Development Indicators (WDI) was released together with the free subset in the last week of April.
The difference between the two editions is in the number of indicators for which data are provided. It is huge: the subscription-based version has about 950 indicators (depending on how they are counted), while the free version offers only 54.
Somewhat surprisingly (in a positive way), human-related indicators make up a third of the set (under the category name Social Indicators), including education, health, population, poverty and income data. I wish the Human Development Index were included so that the most essential composite indicators could be seen at a glance, although a side trip to the Human Development Reports is always a pleasure (and a new release may be coming soon).
The second-largest group, with 12 indicators, is the National Accounts Indicators category, including the shares of industry, agriculture and services, as well as the shares of imports and exports of goods and services in the GDP. As there are absolutely no indicators about scientific activity in the countries, at least the gross expenditure on research and development as a percentage of the GDP should be provided in this category. It is available in the subscription-based version.
There are seven indicators related to the environment, such as CO2 emissions, energy use (in oil equivalent) and electric power consumption per capita. The share of agricultural land and the size of the forest area are useful, but the surface area of the nation is not worth an entry, as it is available in any reference source and has barely changed since 1960, with very few exceptions such as some of the island nations of the Pacific, like Kiribati and Tonga. Even these changes are not reflected in the data series: the two countries appear with 180 and 750 square kilometers, consistently for each year from 1961 to 2007, so this variable is much ado about nothing and a waste of a slot in the limited 54-indicator set of the free version.
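A frozen series like this is easy to spot mechanically once the data are downloaded. The sketch below is a minimal illustration, assuming a hypothetical layout in which each indicator maps to a {year: value} dictionary; the function name `constant_indicators` and the second, varying series are invented for the example. Only the constant 750 square kilometers comes from the review.

```python
def constant_indicators(table):
    """Return the indicators whose non-missing values never change.

    table maps an indicator name to a {year: value} dict. Series with a single
    distinct value (or with no values at all) are flagged.
    """
    flagged = []
    for indicator, series in table.items():
        values = {value for value in series.values() if value is not None}
        if len(values) <= 1:
            flagged.append(indicator)
    return flagged


# Hypothetical excerpt: the constant surface area quoted above and an invented varying series.
table = {
    "Surface area (sq. km)": {year: 750 for year in range(1961, 2008)},
    "Illustrative varying indicator": {2005: 1.0, 2006: 1.2, 2007: 1.1},
}
print(constant_indicators(table))
```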
There are five indicators in the Development Framework category: the percentage of roads paved, high-technology exports (as a share of all manufactured exports), the number of Internet users and of mobile phone subscribers per 100 inhabitants, and the number of days required to start a business. The Internet and mobile phone indicators are useful and often asked for these days. The number of days required to start a business is illuminating and, in some cases, very encouraging or discouraging for an entrepreneur. Purportedly, in Suriname it takes 694 days, making bureaucrats in Russia look like super sprinters.
The other indicators report on foreign direct investment, external debt and other, somewhat esoteric measures of financial vitality. Sure, they are not esoteric for the talented, diligent, honest and competent Wall Street bankers and financiers, but those probably have parallel access, on multiple monitors, to all the data of the World Bank, the IMF and other similar institutions.
These 54 indicators are not all distinct; some are different representations of the same data, such as Gross National Income (GNI) in current international $ and in current US $, or derivatives such as GNI per capita in US $ and in international $. Given the small size of the set, the per capita values would suffice, as those who need the data at the country level can download them to a spreadsheet and do the math in one fell swoop.
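The math in question is a single multiplication per year. This is a minimal sketch, assuming a population series is downloaded alongside the per capita series. The function name `totals_from_per_capita` and all figures are hypothetical. The rebuilt totals may not match published totals exactly, because of rounding and the World Bank's own conversion methods, which this sketch does not reproduce.

```python
def totals_from_per_capita(per_capita, population):
    """Rebuild country-level totals as per-capita value times population,
    for the years where both series have a value."""
    return {
        year: value * population[year]
        for year, value in per_capita.items()
        if value is not None and population.get(year) is not None
    }


# Hypothetical figures, for illustration only.
gni_per_capita = {2006: 2_000.0, 2007: 2_100.0, 2008: None}
population = {2006: 500_000.0, 2007: 510_000.0}
print(totals_from_per_capita(gni_per_capita, population))
```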
WDI does not provide data up to the end of the previous year. It would not be realistic to expect that, but then the description of WDI as a 1960-2008 time series is equally unrealistic and misleading.
To its credit, WDI offers a tool to check the availability of data for every year for all economies. You are doing well if you find indicator values for 2007, as illustrated by the snapshot that I took using that excellent software tool.
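A similar availability check can be run on downloaded data. The sketch below is a minimal illustration, not the WDI tool itself. It assumes a hypothetical {economy: {year: value}} layout, and the function names and sample data are invented. It reports, for each economy, the most recent year that actually has a value, and the share of economies reporting in a given year.

```python
def latest_year_with_data(series_by_economy):
    """Map each economy to the most recent year with a non-missing value
    (None if the economy has no data at all)."""
    latest = {}
    for economy, series in series_by_economy.items():
        years = [year for year, value in series.items() if value is not None]
        latest[economy] = max(years) if years else None
    return latest


def coverage(series_by_economy, year):
    """Share of economies that report a value for the given year."""
    if not series_by_economy:
        return 0.0
    reporting = sum(
        1 for series in series_by_economy.values() if series.get(year) is not None
    )
    return reporting / len(series_by_economy)


# Hypothetical data for three unnamed economies.
data = {
    "Economy A": {2006: 1.0, 2007: 1.1, 2008: None},
    "Economy B": {2006: 2.0, 2007: None},
    "Economy C": {},
}
print(latest_year_with_data(data))
print(coverage(data, 2007), coverage(data, 2008))
```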
Of course, there are certain variables for which a census is made only once in a while, such as the percentage of births attended by skilled health staff. Of course, there are indicators that became important with the advances in technology in the past decade, such as the number of Internet users per 100 people. Of course, there are countries that could not care less about providing statistics for one reason or another. Of course, there are many countries where mere survival of the population is the issue, not gathering, processing, verifying and delivering statistical information with any regularity at all.
In many of the now-sovereign countries there have barely been any censuses since they gained independence, such as in Angola, where a population census was last held nearly 40 years ago, or in Zimbabwe, which has been turned into a pariah with a world-record inflation rate by its despot and his cohorts, who have abused the people of the country for the past 30 years in a way that makes Cecil Rhodes roll in his grave.
The geographic coverage is very comprehensive, at least in having an entry for each country that has an anthem and for some others, but with strange inconsistencies. Theoretically, there is some information about 209 countries/economies and 18 aggregated entities in the database. The term "economy" is used to include entities that are not independent countries, such as Hong Kong or Mayotte.
This is a useful and politically soothing solution, but it makes it even more inappropriate that Taiwan has all of a sudden been excluded from the World Bank's list. Apparently, the copywriter of the classification note is not aware of this, claiming that “Taiwan, China also is included in high income”. No, it is not, in any of the categories.
It was covered in previous years, and just because the UN has a one-China policy, it does not mean that the World Bank could not or should not include information about this country, whose growth and performance improvements in many economic, political, social and technological regards have been very remarkable. The omission stands out all the more considering that there is information about 209 economies, many of which barely bear any resemblance to a normal country, such as Somalia, the ultimate example of the failed state for the past 20 years.
The geographic and economic classification of countries is confusing even with the explanation. It remains an ill-conceived idea that “classifications and data reported for geographic regions are for low-income and middle-income economies only”, forming a cumbersome mix, as shown in the spreadsheet offered for guidance. It is a hurdle for users, as I explain in the software section. It seems to have been created for the print edition of WDI, and for most young and not-so-young Web searchers it is like the pre-coordinated subject headings of the card catalog.
All this may backfire on the work of the compilers of WDI, leading to inconsistencies and plain errors. For example, the Isle of Man is included but Guernsey and Jersey are not, although all three are Crown dependencies with a similar legal status under British law. I know about the Isle of Man only because one of the very productive scientists, Quentin Burrell (not the American football player, but the scientometrician), resides there.
I would not grieve about the handling of these islands if the compilers had not included the Channel Islands, which is the name used for the bailiwicks of Jersey and Guernsey (and the tiny inhabited and uninhabited isles of the latter). It should have been a warning to the WDI compilers that ISO assigns a two-character country code to Guernsey and to Jersey but not to the Channel Islands. The Channel Islands is one of the few inhabited entities on earth that has no two-character ISO code (the UN Statistics Division did assign a three-character code to the Channel Islands but will probably regret it).
I still would not grieve if this farce had not screwed up a huge file that I was creating by meshing time series from several sources (including the subscription-based WDI) for a research project. Luckily, I spotted the nonsense values for the Channel Islands before printing an expensive, tabloid-size color output, but apparently the WDI staff did not do a simple visual check using common sense. They attributed an absurdly high number of published scientific articles to the Channel Islands.
This high number would be simply impossible for the indicated population, even for a highly developed country with billions in research support. After some sniffing around, I figured out that the World Bank had entered the data for Switzerland under the Channel Islands for some of the variables, some of the time. It apparently imported the data from the National Science Foundation file, which in turn uses data from a subset of the Journal Performance Indicators of Thomson Reuters. Neither of them uses this “country” name, so this is clearly a mess made at the World Bank, and it still exists in the brand new complete 2009 edition.
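Data copied from one economy to another can sometimes be caught by looking for pairs of economies whose values for the same indicator coincide year after year. The sketch below is a minimal illustration of that screening idea, not the procedure I followed. The function name `suspicious_pairs`, the `min_matches` threshold and all the numbers are hypothetical toy values. Coincidences of zeros or other common round values can produce false positives, so flagged pairs still need a human look.

```python
from itertools import combinations


def suspicious_pairs(series_by_economy, min_matches=2):
    """Return (economy_a, economy_b, matches) for every pair of economies whose
    values for the same indicator coincide in at least min_matches years."""
    pairs = []
    items = sorted(series_by_economy.items())
    for (name_a, series_a), (name_b, series_b) in combinations(items, 2):
        matches = sum(
            1
            for year, value in series_a.items()
            if value is not None and series_b.get(year) == value
        )
        if matches >= min_matches:
            pairs.append((name_a, name_b, matches))
    return pairs


# Hypothetical toy article counts (invented numbers, not WDI data).
articles = {
    "Switzerland": {2003: 100, 2004: 110, 2005: 120},
    "Channel Islands": {2003: 100, 2004: None, 2005: 120},
    "Luxembourg": {2003: 5, 2004: 6, 2005: 7},
}
print(suspicious_pairs(articles))
```

Counting matching years, rather than demanding identical full series, is meant to catch data entered "for some of the variables, some of the time".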
The reason is more enigmatic for an absurd number that is visible only in the subscription-based version but should give users of the free version cold feet as well. It is the alleged number of researchers per million people in Tonga that caught my eye. WDI claims it to be 45,454 per million people. The entire population of Tonga was 97,414 in the year reported for the number of researchers. You don't need a calculator to sense the absurdity of this figure, which again is still not corrected in the latest edition of WDI, although WDI issues errata notes for each yearly volume. Those corrections, however, are not as critical as the ones that I spotted.
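For those who want the calculator anyway, converting the rate back into a head count takes one line. This sketch uses only the two figures quoted above; the function name `implied_count` is illustrative. The result works out to roughly one researcher for every 22 inhabitants of Tonga, infants included.

```python
def implied_count(rate_per_million, population):
    """Convert a per-million rate back into an absolute head count."""
    return rate_per_million * population / 1_000_000


# Figures quoted in the review for Tonga.
rate = 45_454        # researchers per million people, as reported by WDI
population = 97_414  # total population in the same year

researchers = implied_count(rate, population)
share = researchers / population
print(f"{researchers:,.0f} researchers, i.e. {share:.1%} of the entire population")
```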
I also found grossly suspicious data in the free subset. The extreme wait time (694 days) for starting a business in Suriname was one example. Some of the other data for the country also look very unlikely, or simply impossible, such as the remittances sent home by citizens working abroad. Between 2002 and 2006 this amount fluctuated from 15 to 23 million dollars and then down to 9, 4 and 2 million dollars. That is implausible, and its skyrocketing to 140 million dollars is just impossible. True, nearly half of the citizens of this former Dutch colony live and work abroad, but they have not migrated all of a sudden (actually, only 16,000 Surinamese left the country in 2005), so there is no reason for such extreme fluctuation, not even if some Surinamese soccer players in the excellent Dutch league got a huge bonus in 2007 that they sent home.
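Swings of this kind can be flagged automatically before anyone prints or relies on the data. The sketch below is a minimal illustration, not a World Bank procedure. The function name `flag_jumps` and the fivefold `max_ratio` threshold are arbitrary assumptions. The sample values only loosely follow the pattern described above and are not the actual WDI figures.

```python
def flag_jumps(series, max_ratio=5.0):
    """Flag consecutive reported years whose values differ by more than
    max_ratio-fold in either direction. Zero, negative and missing values are
    skipped because a ratio is not meaningful for them."""
    years = sorted(year for year, value in series.items() if value is not None and value > 0)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        ratio = series[curr] / series[prev]
        if ratio > max_ratio or ratio < 1 / max_ratio:
            flagged.append((prev, curr, round(ratio, 2)))
    return flagged


# Illustrative values in million US$, loosely shaped like the pattern described
# above; not the actual WDI figures.
remittances = {2002: 15.0, 2003: 23.0, 2004: 9.0, 2005: 4.0, 2006: 2.0, 2007: 140.0}
print(flag_jumps(remittances))
```

A tighter threshold, such as `max_ratio=2.0`, would also flag the steep declines between 2003 and 2005.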
My concern is that if such obvious absurdities pass through the editing phase of the compilations, one wonders how many other grossly inaccurate data may be overlooked. This is a question not only from a ready-reference perspective but also from the perspective of the financial aid distributions of the many agencies that rely on World Bank data.
The software is definitely Web 1.0, which will disappoint Generation 2.0 users. There is no way to save preferences for countries, variables and time periods. Admittedly, this would require registration and log-ins, but some users would be happy to register for the great convenience of saving preferences instead of selecting them over and over again.
This is particularly troubling because users are kicked out after about 10 minutes of inactivity. When they simply try to reload the page, an odd message appears: “You are not an authorized user or the query ID is invalid. Please contact system administrator”.
Users may wonder why they would need authorization for an open-access resource that requires no registration, and why the query ID has become invalid. Most users don't have a system administrator to contact. They just give up and, understandably, go to Wikipedia for a warm hug, only to face additional problems in getting reliable information for the reasons explained at the end of the CONTEXT section of this review.
There is no sort option for displaying the results. Sorting is very important in using such statistical tables, and it is done so smartly and elegantly in Wikipedia. There are several options to display the results in graphic formats, on charts or maps, which is welcome, but the quality of the charts and maps is not comparable to that of state-of-the-art visualization systems, nor even to the good statistical maps in Wikipedia.
The incomplete geographic classification discussed earlier also backfires in searching. To get a list of indicator values for some of the 92 highly developed countries, there is no way to display them and pick the ones of interest directly in the search. There is an option to choose such countries in the Aggregates category, but it provides only a single aggregate value for all of those countries, not a country-by-country listing. The user has to keep clicking through all the geographic regions and pick the countries individually, an irritating process, especially as the choices can't be saved for a later session.
It is good that the scale of the values can be customized, i.e. you can request, say, data in millions or thousands with two decimals. It works when you want indicators of the same magnitude, such as Gross National Income (GNI) figures in constant, current and international US $ values (the latter to compare the countries at purchasing power parity, which is somewhat similar to the handicapping system in horse racing). It does not work, however, when the result list is also to display values of different magnitudes, such as GNI per capita or GDP growth, as these would show up as 0s or 0.00s when millions or billions are chosen for the scale. Leaving the indicators displayed in their own natural units makes the display very wide and hard to read. It would be nice to have an option to specify the scale at the level of the individual variable, especially if these settings could be saved.
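Per-variable scaling is straightforward to apply after export. The sketch below is a minimal illustration of the idea, not a feature of the WDI interface. The function name `format_with_scales`, the column labels and the figures are hypothetical. Each column gets its own divisor and number of decimals, so a GNI total in billions and a growth rate in percent can sit side by side without collapsing to 0.00.

```python
def format_with_scales(rows, scales, default=(1, 2)):
    """Format each column with its own (divisor, decimals) pair so that large
    totals and small rates can share one readable table."""
    formatted = []
    for row in rows:
        out = {}
        for column, value in row.items():
            divisor, decimals = scales.get(column, default)
            out[column] = "" if value is None else f"{value / divisor:,.{decimals}f}"
        formatted.append(out)
    return formatted


# Hypothetical row and per-column scales, for illustration only.
rows = [{"GNI (current US$)": 1_234_567_890_000.0,
         "GNI per capita (current US$)": 4_560.0,
         "GDP growth (annual %)": 3.2}]
scales = {"GNI (current US$)": (1e9, 1),            # billions, one decimal
          "GNI per capita (current US$)": (1, 0),   # natural unit, no decimals
          "GDP growth (annual %)": (1, 1)}          # percent, one decimal
print(format_with_scales(rows, scales))
```

By contrast, a single table-wide scale in millions would render the growth rate as `f"{3.2 / 1e6:,.2f}"`, that is, 0.00.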
The alternative is to save the result list in Excel or CSV format, which is potentially of great help and is not available in all systems, surprisingly not even in Wikipedia. However, saving the results brings up a problem if the purpose is not merely post-processing the data but mashing them with data extracted from other sources.
One of the hindrances in this process is matching country names, which are used differently in almost every system. For example, the country commonly referred to as North Korea also appears as Democratic People's Republic of Korea, or Korea, Democratic People's Republic of, or Dem. P. Rep of Korea, or DPRK, and in a dozen other formats with differently abbreviated words and different punctuation or no punctuation at all. For the human eye they are OK, but for software matching they are a nightmare.
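When codes are not available, a common workaround is to normalize the names and look them up in a hand-maintained alias table. The sketch below is a minimal illustration of that approach. The `ALIASES` table, which covers only the North Korea variants quoted above, and the functions `normalize` and `to_code` are hypothetical. The target code PRK is the ISO 3166-1 alpha-3 code for North Korea. Normalization only removes differences in case, accents and punctuation; genuinely different wordings still need their own alias entries.

```python
import re
import unicodedata

# Hypothetical alias table: normalized name variant -> ISO 3166-1 alpha-3 code.
ALIASES = {
    "north korea": "PRK",
    "democratic peoples republic of korea": "PRK",
    "korea democratic peoples republic of": "PRK",
    "dem p rep of korea": "PRK",
    "dprk": "PRK",
}


def normalize(name):
    """Strip accents, apostrophes and punctuation; lowercase; collapse spaces."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = re.sub("['\u2019]", "", text)     # drop straight and curly apostrophes
    text = re.sub(r"[^a-z0-9]+", " ", text)  # any other punctuation becomes a space
    return " ".join(text.split())


def to_code(name):
    """Return the code for a known variant, or None if the name is not recognized."""
    return ALIASES.get(normalize(name))


for variant in ["Korea, Democratic People's Republic of", "Dem. P. Rep of Korea", "DPRK"]:
    print(variant, "->", to_code(variant))
```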
The only sane way to combine indicators from different sources is to use the 2-character or 3-character standard country codes (ISO 3166-1). Unfortunately, relatively few statistical sources use these; some use the former, and more use the latter. You may not see the codes when displaying records, but they may be included in the downloaded files.
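Once both files carry codes, the join itself is simple. The most useful part is to list the codes that fail to match instead of dropping them silently. The sketch below is a minimal illustration. The function name `merge_by_code` and the two small inputs are hypothetical. The mismatch shown, with an outdated code on one side and the current ISO code on the other, is the typical symptom of inconsistent coding.

```python
def merge_by_code(left, right):
    """Join two {code: value} dicts on their country codes.

    Returns the matched rows plus the codes found on only one side, so that
    mismatches are reported instead of silently dropped.
    """
    shared = left.keys() & right.keys()
    merged = {code: (left[code], right[code]) for code in sorted(shared)}
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return merged, only_left, only_right


# Hypothetical inputs: two sources keyed by 3-character codes.
source_a = {"BRA": 1.0, "ROM": 2.0}
source_b = {"BRA": 10.0, "ROU": 20.0}
merged, only_a, only_b = merge_by_code(source_a, source_b)
print(merged, only_a, only_b)
```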
The World Bank deserves credit for including the 3-character code in the exported output. However, it deserves criticism for using some obsolete codes (and names), which make matching and meshing data tedious. For example, it still uses ROM for Romania instead of ROU, and WBG (for West Bank and Gaza) instead of PSE, which the ISO standard assigns under the much debated name Palestinian Territory, Occupied.
In some cases, the World Bank's prolonged amnesia about well-known country name changes is stunning. The country known for well over a decade as the Democratic Republic of Congo, with the code COD, went through frequent name changes in its history (and, like most countries that insist on including the adjective, it is anything but Democratic).
The World Bank does not seem to have fully registered the change and still uses the code ZAR instead of the current official code. Well, it should know that we are not in Zaire any more and that Muhammad Ali will not return for another rumble in the jungle with George Foreman. But this ZAR choice may floor efforts to mesh data.
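Until the codes are updated, users who mesh World Bank exports with other sources have to translate the legacy codes themselves before joining. The sketch below is a minimal illustration. The table holds only the three outdated codes named in this review and is not a complete crosswalk. The function names `to_iso3` and `remap_keys` are hypothetical. It raises an error rather than silently overwriting data when an old and a new code for the same country both appear in one file.

```python
# Outdated codes named in this review, mapped to current ISO 3166-1 alpha-3 codes.
LEGACY_TO_ISO3 = {"ROM": "ROU", "WBG": "PSE", "ZAR": "COD"}


def to_iso3(code):
    """Translate a legacy code; codes not in the table pass through unchanged."""
    return LEGACY_TO_ISO3.get(code, code)


def remap_keys(data):
    """Re-key a {code: value} dict to current codes, refusing silent collisions."""
    remapped = {}
    for code, value in data.items():
        new_code = to_iso3(code)
        if new_code in remapped:
            raise ValueError(f"both {code} and another entry map to {new_code}")
        remapped[new_code] = value
    return remapped


print(remap_keys({"ROM": 1.0, "ZAR": 2.0, "BRA": 3.0}))
```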
Even if the subset of WDI is a freebie, one would expect more from the World Bank. Buyers of the book edition ($75) and subscribers to the CD-ROM ($275 for single users) and online version (price to be negotiated) of the full WDI may be especially disappointed, because World Bank data are widely used in negotiating a large number of billion-dollar questions and in making decisions. They are also used in many other statistical databases. Use the data with reservation and common sense.