Recently there has been controversy about how robust data collection practices are in India, especially for some important national-level surveys. An article by Shamika Ravi, a member of the Economic Advisory Council to the Prime Minister, published in a leading national daily on July 7, 2023, raised doubts about the robustness of the data collection procedures of surveys such as the National Sample Survey (NSS), the National Family Health Survey (NFHS) and the Periodic Labour Force Survey (PLFS). The author suggested a ‘shift to a larger sample size’ so that the survey estimates reflect the ground reality in the country. In support of her argument, the author states that of the 11 surveys listed in her article, from 2011-12 to 2019-21, ‘every survey (except NFHS-4 of 2015-16) overestimates or significantly underestimates the proportion of urban population.’ As a result, she argues, the estimates of these surveys ‘systematically underestimate reforms across the country.’
After the publication of this article, articles countering her point of view began to appear.
It is important to re-emphasize that all the surveys listed follow scientific sampling designs, which are also widely accepted internationally. That said, there is always room for improvement in a sampling design. In fact, the sampling designs of the NSS have been revised from time to time after due deliberation in NSS round-specific working group meetings, with the revisions given final approval by the National Statistical Commission (and earlier by the Governing Council of the National Sample Survey Office). These committees and bodies have been chaired by, or have had as members, some of the most eminent economists, statisticians and demographers that India has ever had.
Bias in population estimation
On the issue of surveys underestimating the proportion of urban population or overestimating the proportion of rural population (as the author points out in her article of July 7), it is important to remember that the sampling design of the NSS or PLFS is not intended to estimate the number of households or the population. Instead, these surveys aim primarily to estimate key socioeconomic indicators related to the subjects of interest. An estimate of the number of households or of the population is supplementary information. Data users adjust survey-based estimates separately for rural and urban areas as appropriate, using population figures projected from the census. At the risk of repetition, it must be mentioned that the rates and proportions relating to key survey characteristics for rural or urban areas based on the NSS broadly reflect the ground reality. This fact has been accepted by the National Statistical Commission (Rangarajan, 2001).
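The adjustment described above can be illustrated with a small sketch. It assumes the simplest possible case: a rate estimated separately for rural and urban areas is combined using an urban population share taken from outside the survey. All the numbers and the function name `combined_rate` are hypothetical and chosen only for illustration; they are not taken from any NSS or PLFS round.

```python
def combined_rate(rural_rate, urban_rate, urban_share):
    """Combine sector-level rates into an all-India rate.

    urban_share is the proportion of the population living in urban
    areas; the rural share is taken as its complement.
    """
    if not 0.0 <= urban_share <= 1.0:
        raise ValueError("urban_share must lie between 0 and 1")
    return (1.0 - urban_share) * rural_rate + urban_share * urban_rate


# Hypothetical sector-level estimates of some indicator (illustrative only).
rural_rate = 0.40
urban_rate = 0.20

# Hypothetical urban shares: one implied by the survey's own population
# estimate, the other projected from the census.
survey_urban_share = 0.28
census_urban_share = 0.35

# Combined rate if the survey's own (understated) urban share is used.
unadjusted = combined_rate(rural_rate, urban_rate, survey_urban_share)

# Combined rate after re-weighting the two sectors with the census share.
adjusted = combined_rate(rural_rate, urban_rate, census_urban_share)

print(f"unadjusted: {unadjusted:.3f}")
print(f"adjusted:   {adjusted:.3f}")
```

In this sketch the sector-level rates are left untouched; only the weights given to the rural and urban sectors change. That is why an understated urban share affects the combined figure without invalidating the separate rural and urban estimates.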
Nevertheless, underestimation of the population has been a perennial problem in the NSS. What is worrying is that the extent of underestimation, especially for urban areas, is significant enough to require corrective measures. In this context, it is noteworthy that, unlike the estimated population, the number of households estimated from the NSS matches closely with the census-based number of households. As a result, even if no adjustments are made, the combined rural and urban averages based on these surveys should be fairly reliable as far as household-level indicators are concerned.
The author’s allegation that the samples in these surveys are not representative because older sampling frames are used also loses much of its relevance, for two reasons. First, these surveys rely primarily on population census lists (available only once in 10 years) of villages and towns/urban blocks for sampling purposes, which are, in any case, exhaustive in coverage. Second, for sampling urban blocks, the NSS and PLFS use the latest list of urban frame survey (UFS) blocks (the equivalent of the list of urban census enumeration blocks) covering all towns in the country; this partially updates the frame for post-census urbanization declared through State government notifications. On the issue of rural-urban classification of geographical areas, all these surveys consider census towns as part of the urban sampling frame.
Systematic bias in response rate
The author rightly points out that refusal to provide information is ‘never random’ and that response rates fall as the income level of households rises. The same problem is encountered in similar surveys internationally. However, the survey method provides for such households to be replaced by similar households to the extent possible. Here, of course, the possibility that the replacement households have relatively lower incomes cannot be ruled out, leading to some bias in those overall estimates that are closely linked to income levels. However, given that most of the government’s welfare programs are targeted at low-income households, and that non-response rates in these surveys are very low, this problem of non-response is unlikely to have a serious impact on the overall household-level indicators of interest estimated through these surveys.
Room for improvement
Sampling design and data quality are two different components of a survey, and both are important. On sampling design, great care is generally taken in these surveys to adopt a scientific design. However, on the issue of the sampling frame, given apprehensions over insufficient representation of wealthy households, it may be worth exploring whether a list of such households could be developed by tapping alternative sources, and a representative sample of them covered alongside a conventional survey of the rest of the population.
Furthermore, given the extent of underestimation of the urban population, it may be worthwhile to examine the coverage of the UFS frame. Establishing a methodological study unit to conduct this and similar studies oriented towards improving the survey design could also be a step in the right direction.
Field personnel training, field inspection, concurrent data verification and dissemination measures can be strengthened to improve the quality of primary data, which matters most in any survey.
Finally, while there is always room for improvement in survey results, criticizing all large-scale official surveys on the grounds that they do not adequately capture improvements amounts to throwing out the baby with the bathwater.
GC Manna is Professor at the Institute of Human Development (IHD), New Delhi. He was previously the Director General of the Central Statistics Office and the National Sample Survey Office. He was also a member of the National Statistical Commission.