Available technologies to identify data rigging are still insufficient to address all possible situations
An editorial in Nature Genetics in January, 'A very Mendelian year', reminded us of the 200th birth anniversary of Gregor Mendel, the 'Father of Modern Genetics', on 20 July 2022. Mendel's legacy is intriguing. Mendel conducted controlled crossing experiments with garden peas, on about 29,000 plants, between 1856 and 1863. He recorded many observable characteristics, such as seed shape and color and flower color, and formulated two laws of heredity. His seminal paper, 'Experiments on Plant Hybridization', was published in 1866 in the Proceedings of the Brünn Society for Natural Sciences. However, he received recognition only posthumously, after the British biologist William Bateson unearthed Mendel's paper in 1900.
Fraud case
Importantly, in 1936, Sir Ronald Fisher, the eminent British statistician and geneticist, published a paper titled 'Has Mendel's work been rediscovered?'. Reconstructing Mendel's experiments, Fisher found that the ratios of dominant to recessive phenotypes were improbably close to the expected 3:1. He claimed that Mendel's data agreed with his theory better than could be expected under natural fluctuations. He concluded that the data of "most, if not all, of the experiments have been falsified so as to agree closely with Mendel's expectations." Fisher's critique began to attract widespread attention around 1964, about the time of the centenary of Mendel's lecture. Several articles on the Mendel-Fisher controversy were published later. In the 2008 book Ending the Mendel-Fisher Controversy, Allan Franklin and others held that "the issue of the 'too good to be true' aspect of Mendel's data as found by Fisher still persists." Fisher, of course, attributed the falsification to an unknown assistant of Mendel. Modern researchers also give Mendel the benefit of the doubt.
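For readers who want to see the arithmetic behind 'too good to be true', here is a minimal sketch in Python. It is not Fisher's actual analysis, which pooled chi-square values across all of Mendel's experiments; it simply computes a chi-square statistic for hypothetical counts against a 3:1 expectation. One small value proves nothing, but a long series of experiments that all fit this tightly is what Fisher found suspicious.

```python
# Minimal sketch: chi-square goodness of fit against a 3:1 expectation.
# The counts below are hypothetical, chosen to lie very close to 3:1.

def chi_square_3_to_1(n_dominant, n_recessive):
    """Chi-square statistic for observed counts against a 3:1 ratio."""
    n = n_dominant + n_recessive
    expected = (0.75 * n, 0.25 * n)
    observed = (n_dominant, n_recessive)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chi_square_3_to_1(752, 248)  # hypothetical counts, almost exactly 3:1
print(f"chi-square = {stat:.4f}")   # with 1 degree of freedom, values this
                                    # small arise only rarely by chance
```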
Actually, the 1982 book Betrayers of the Truth: Fraud and Deceit in the Halls of Science, by William Broad and Nicholas Wade, is a compendium of case histories of misconduct in scientific research. Data manipulation, in science and in the social sphere alike, is more likely in today's data-driven and data-obsessed world, and the data and the resulting findings, in many cases, lose their credibility. Data is expanding; so is bogus data.
In a paper published in 2016 in the Statistical Journal of the IAOS, two researchers illustrated that one in five surveys may contain fraudulent data. They presented a statistical test to detect fabricated data in survey responses and applied it to over 1,000 public data sets from international surveys to arrive at this worrying picture.
Furthermore, Benford's law states that in many real-life numerical data sets, the frequency with which each leading digit appears follows a fixed logarithmic pattern: small digits such as 1 lead far more often than large ones such as 9. A data set that does not conform to Benford's law is an indicator that something may be wrong. The US Internal Revenue Service uses it to sniff out tax cheats, or at least to narrow the field so as to channel resources better.
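To make the idea concrete, here is a minimal Python sketch of a first-digit screen against Benford's law. The data and function names are hypothetical; a real forensic application would need far larger samples, data spanning several orders of magnitude, and a formal goodness-of-fit test.

```python
import math
from collections import Counter

# Benford's expected frequency of each leading digit d: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First significant digit of a non-zero number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while 0 < x < 1:
        x *= 10
    return int(x)

def first_digit_profile(values):
    """Observed vs. Benford-expected leading-digit frequencies."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: (counts.get(d, 0) / n, BENFORD[d]) for d in range(1, 10)}

# Hypothetical usage on a small list of reported amounts:
amounts = [1234.5, 87.2, 190.0, 23.4, 1580.0, 310.0, 47.9, 112.0, 96.0, 140.0]
for d, (obs, exp) in first_digit_profile(amounts).items():
    print(f"digit {d}: observed {obs:.2f}, Benford expects {exp:.2f}")
```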
However, detecting fraud is not easy. Available technologies to identify data rigging are still insufficient to address all possible situations. Several procedures exist for testing the randomness of data, but at best they can only cast doubt on the data; in most cases it is difficult to conclude fraud. The data may, of course, be non-random for innocent reasons, such as overly restrictive inclusion criteria or insufficient data cleaning. And remember that a real data set is only a 'simulation by nature': it can take any pattern, however small the probability.
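As one example of the randomness checks mentioned above, here is a minimal Python sketch of the Wald-Wolfowitz runs test on a binary sequence (for instance, observations coded as above or below the median). The function name and example data are hypothetical; as noted above, a striking statistic can only cast doubt, not establish fraud.

```python
import math

def runs_test_z(sequence):
    """Approximate z-statistic of the runs test for a sequence of 0s and 1s."""
    n1 = sum(1 for s in sequence if s == 1)
    n0 = len(sequence) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("sequence must contain both 0s and 1s")
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    n = n0 + n1
    expected = 2 * n0 * n1 / n + 1
    variance = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n ** 2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)

# Hypothetical usage: a perfectly alternating pattern is suspiciously regular.
z = runs_test_z([0, 1] * 20)
print(f"z = {z:.2f}")  # a large |z| flags non-randomness, nothing more
```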
Nevertheless, a skilled statistician may be able to identify discrepancies within the data, because nature induces a degree of inherent randomness that fabricated data usually fails to mimic. If the raw data are not reported and only a few brief summary results are given, however, it is very difficult to identify data rigging. Even then, if the same data are used to calculate a variety of summary measures and some of those measures are rigged, it is often possible to detect discrepancies among them. There is no such thing as a 'correct' way to misappropriate data.
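To illustrate how rigged summary measures can betray one another, here is a minimal, hypothetical Python sketch in the spirit of the GRIM test from the research-integrity literature: when responses are integers (say, Likert-scale scores), a reported mean must be reproducible as some integer total divided by the sample size. The function and example values are assumptions for illustration, not anyone's published method.

```python
def mean_is_possible(reported_mean, n, decimals=2):
    """Can `reported_mean` (given to `decimals` places) arise from n integers?"""
    tolerance = 0.5 / 10 ** decimals   # half-width of the rounding interval
    total = round(reported_mean * n)   # nearest achievable integer total
    return abs(total / n - reported_mean) <= tolerance

# Hypothetical reported summaries:
print(mean_is_possible(3.47, 20))  # False: no 20 integers can average to ~3.47
print(mean_is_possible(3.45, 20))  # True: 69 / 20 = 3.45
```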
Back to the Mendel-Fisher controversy. In her 1984 review of Betrayers of the Truth, Patricia Woolf notes that Ptolemy, Hipparchus, Galileo, Newton, Bernoulli, Dalton, Darwin and Mendel have all been accused of violating the standards of good research practice. "[T]here is little acceptance that scientific parameters have changed over the two-thousand-year period from 200 BC to the present," Woolf wrote. The importance of natural fluctuations in data, for example, was probably not as clear in Mendel's era as it is today. Thus, placing these pioneers under scrutiny shaped by current ethical standards is probably unfair.
Judging fraud is an ongoing process, empowered by new technologies, scientific explanations and ethical standards. Even if your conclusion is correct today, future generations will continue to judge you.
Atanu Biswas is Professor of Statistics at the Indian Statistical Institute, Kolkata.