Excerpt
Only in passing does he mention one of the main defects of such studies, namely inadequate sample size in association studies. The adequate sample size for studies on DNA polymorphisms and disease depends on many components that the investigator defines, for better or worse. In a case–control design investigating only one factor [e.g. differences in the allele frequencies of a single nucleotide polymorphism (SNP) between ‘diseased’ and ‘healthy’ subjects], at least 239 cases and 239 controls are needed for a study with a statistical power of 80% and a significance level of 5% [2]. This comparatively low sample size rests on two assumptions: that the average minor allele frequency of SNPs is approximately 7% [3], and that the proportion of patients carrying at least one copy of a susceptibility SNP allele is most likely under 30% and more typically closer to 15% [4] (i.e. the value used in this calculation).
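To make the dependence on these components concrete, the following is a minimal sketch of the textbook sample-size formula for comparing two proportions with equal group sizes. It is not necessarily the method used in [2]. The function names and the odds ratios of 1.5, 2.0 and 3.0 are hypothetical choices for illustration; only the 15% carrier proportion, the 5% significance level and the 80% power come from the text above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def carrier_fraction_in_cases(p_controls: float, odds_ratio: float) -> float:
    """Carrier proportion among cases implied by the control proportion and an odds ratio."""
    odds = odds_ratio * p_controls / (1.0 - p_controls)
    return odds / (1.0 + odds)


def n_per_group(p_cases: float, p_controls: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects needed in each group for a two-sided test of two proportions.

    Standard normal-approximation formula with equal numbers of cases and controls.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_cases + p_controls) / 2.0
    numerator = (
        z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * sqrt(p_cases * (1.0 - p_cases) + p_controls * (1.0 - p_controls))
    ) ** 2
    return ceil(numerator / (p_cases - p_controls) ** 2)


# Assumption: 15% of controls carry the allele (the value quoted in the text).
# The odds ratios below are hypothetical effect sizes, not taken from the source.
P_CONTROLS = 0.15
for odds_ratio in (1.5, 2.0, 3.0):
    p_cases = carrier_fraction_in_cases(P_CONTROLS, odds_ratio)
    print(f"OR {odds_ratio}: {n_per_group(p_cases, P_CONTROLS)} per group")
```

The required number grows with the inverse square of the difference between the two proportions, which is why a modest change in the assumed effect size changes the answer so strongly.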
In reality, these values may be distinctly different for a given SNP, and the sample size must be adjusted accordingly. In the case of the angiotensinogen M235T SNP, which is at present a particularly popular SNP in nephrological research, 5377 cases and 5377 controls would be needed to detect a difference at a significance level of 5% with a power of 80%. This number is based on the reported average minor allele frequency of 51.2% for T and 48.5% for M [5]. The largest studies on this SNP published to date had 9184 samples in a cross-sectional setting [6], and 1204 cases (with 647 controls) [7] or 720 controls (with 638 cases) [8] in a case–control setting.
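The gap between the required and the published sample sizes can be expressed as statistical power. The sketch below uses the normal approximation for a two-sided test of two proportions. It assumes, for illustration only, that the two quoted frequencies (51.2% and 48.5%) are the two proportions being compared and that each subject counts as one independent observation; the text does not state how the figure of 5377 was derived, and the function name is hypothetical. Under these assumptions, 5377 subjects per group correspond to roughly 80% power, whereas the published case–control sample sizes give far less.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n1: int, n2: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test for a difference in two proportions.

    Normal approximation with unpooled variance; the negligible contribution
    of the opposite tail is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return NormalDist().cdf(abs(p1 - p2) / standard_error - z_alpha)


# Assumption: the two frequencies quoted in the text are the proportions compared.
P_A, P_B = 0.512, 0.485

# (cases, controls): the target size and the two case-control studies cited.
designs = [(5377, 5377), (1204, 647), (638, 720)]
for cases, controls in designs:
    achieved = power_two_proportions(P_A, P_B, cases, controls)
    print(f"{cases} cases, {controls} controls: power {achieved:.2f}")
```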
The report [5] used to calculate the necessary sample size of 5377 for meaningful analysis of the AGT M235T SNP aggregates a total of 6916 controls and 6308 cases, mostly ‘whites’, but also ‘blacks’, ‘Asians’ and ‘mixed’ populations. It finds race- and age-dependent variations in allele frequency. This indicates that, even at the level of a very naive meta-analysis, the combined cases and controls for a given subpopulation may not be numerous enough to answer the question of whether this particular SNP is associated with hypertension at all.
One way to ensure better study design might be to require that all such studies be registered in a public-access electronic database, similar to the procedures required for clinical trials [9]. Such databases could, and should, be created and maintained by the relevant learned societies, rather than by interested individuals. One thing remains clear: only editorial pressure, in the form of excluding inappropriately designed studies from the review process, will succeed [10]. After all, everybody wants to publish, and not perish.