10 Things to Know About Working With a Statistician
Involving an expert in statistical methods in research can improve study design and results.
Published March 15, 2022 under Around DGHI
On the occasion of the DGHI Research Design and Analysis Core’s 10th birthday, the RDAC team has put together this list of tips for working with quantitative collaborators. This includes statisticians, epidemiologists and others whose specialty is working with and helping make sense of data. We use “statistician” as a catch-all term for such quantitative collaborators. We also highlight the separate yet closely aligned roles of statisticians and data managers/architects.
You can also check out RDAC's helpful website for more information on managing data issues in research.
1. Involve a statistician as early as possible.
The best time to contact a statistician is last week! The second best time is when you are still formulating your research question. Ideally, researchers should reach out several months before a grant is submitted or the expected start of work. Giving your statistical team adequate lead time allows us to provide the most thoughtful and thorough contribution to the design or analysis of a study and makes for a better result in the end. Just as you don’t want a “rush job” on a home renovation, you don’t want a “rush job” on your study design.
We strongly recommend you submit a request for statistical support at least two months prior to the grant submission deadline. Review the RDAC guidelines for faculty collaborations for more details.
2. A statistician knows about more than just data analysis.
Good data analysis goes hand-in-hand with good study design. Not even the most sophisticated statistical fireworks can overcome the limitations of a study that was not designed properly to answer a particular research question. Although statisticians are often thought of as “number crunchers,” we also have expertise in study planning and design, as well as in data collection and data management. A statistician’s eyes on your project at the outset will help ensure that your study is designed in the best and most efficient way to answer the scientific questions of interest.
3. Good study design and analysis is a team effort.
Statisticians have many different areas of expertise. A complex project might require researchers with specialties in data management and survey design, analysis of complex study designs (such as cluster randomized trials), epidemiology and psychometrics, among many other areas. Budgeting for appropriate study design, data management, and analysis expertise throughout the project will save money and headaches later on.
Generally, statistics effort will be lower in the early stages of the project during data collection, and higher near the end of the project during data analysis, whereas data manager/architect effort will be higher at the beginning of a project when survey and data collection instrument design is needed. The RDAC guidelines for faculty collaborations have helpful suggestions on how much faculty and statistical effort may be needed on different types of grants, although each project will have its own unique needs.
4. Budgeting properly for data management can save you time, money and headaches.
With new requirements for data sharing going into effect at the National Institutes of Health, proper planning of all phases of data provenance (from survey design to collection to processing to preservation) is more important than ever. Investing in a data manager/architect can help ensure a statistician spends less time fixing errors and more time on analysis and reporting.
Although many statisticians can do well at data management, data managers are specialized experts who can make sure survey design and data collection go according to plan. Data managers and statisticians should work together closely during the initial stages of the study to make sure not only that all relevant data are collected, but also that study data can validly answer the intended research questions.
In addition, some data collection instruments, such as REDCap, may require investment in a data collection system programmer. A good programmer is essential in coding complex survey logic such as skip patterns, and thus ensuring the integrity of your data.
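To make the idea of skip-pattern integrity concrete, here is a minimal sketch of a check a team might run on exported survey data. It does not use REDCap or any REDCap API; the field names (`ever_smoked`, `cigarettes_per_day`), the rule itself and the example records are all hypothetical, chosen only for illustration.

```python
# Hypothetical skip rule (an assumption for illustration):
# "cigarettes_per_day" is asked only when "ever_smoked" is "yes".
def find_skip_violations(records):
    """Return (record_id, problem) pairs for rows that break the skip rule."""
    problems = []
    for row in records:
        gate = row.get("ever_smoked")
        follow_up = row.get("cigarettes_per_day")
        if gate == "yes" and follow_up is None:
            # The follow-up question should have been answered but was not.
            problems.append((row["record_id"], "follow-up missing"))
        elif gate != "yes" and follow_up is not None:
            # The follow-up question should have been skipped but has a value.
            problems.append((row["record_id"], "follow-up should be blank"))
    return problems


# Made-up example records, not real study data.
records = [
    {"record_id": 1, "ever_smoked": "yes", "cigarettes_per_day": 10},
    {"record_id": 2, "ever_smoked": "no", "cigarettes_per_day": None},
    {"record_id": 3, "ever_smoked": "no", "cigarettes_per_day": 5},
    {"record_id": 4, "ever_smoked": "yes", "cigarettes_per_day": None},
]

violations = find_skip_violations(records)
for record_id, problem in violations:
    print(f"record {record_id}: {problem}")
```

Enforcing the logic inside the data collection instrument is preferable to catching violations afterward; a check like this is only a safety net for data that has already been exported.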
5. A predefined analysis plan will ensure analysis goes smoothly.
A statistical analysis plan (SAP) is a formal agreement between a statistician and collaborating investigators defining the hypotheses and primary and secondary outcomes, how the analysis will be conducted, and how it will be reported. Having a well-defined SAP can save a lot of time in data analysis and also can be used to draft the methods section of your manuscript.
The SAP includes crucial information defining the scope and purpose of an analysis and the details needed to run the analysis from beginning to end. It also defines aspects of data provenance (that is, how the data moved from collection to storage to processing and analysis) necessary to ensure the scientific integrity of the process. The SAP will range from a few pages for small observational studies to dozens if not hundreds of pages for large trials. Some trials even publish their statistical analysis plan, and most top journals require an SAP to be submitted with the outcomes paper. Principal investigators may create the first draft of an SAP, but a statistician should have final edits and approval.
6. Allow for adequate time to process data and conduct analysis.
Data processing and analysis are never as simple as plugging data into code and outputting numbers. Although budgeting for a data architect/manager at the beginning of the project will significantly reduce the amount of time needed for data cleaning, data will still need some amount of processing to be ready for analysis. And even with a clearly defined analysis plan, unanticipated challenges often arise. Writing and checking analysis code, making modifications after checking statistical assumptions, and producing publication-quality statistical tables and figures all take time and should not be rushed.
7. Think about clinical significance, not just statistical significance.
Have you considered trying to write your next Discussion section without using the phrase “statistically significant”? The research community is trending away from an undue reliance on statistical significance of effect estimates (usually defined by the strict threshold of p-value less than 0.05) to a more holistic approach considering clinical significance or public health impact in the broader research context, along with a measure of uncertainty, such as a confidence interval.
You should also consider clinical significance during the grant planning process. In the case of an intervention trial, what is the minimum change that would make you consider your intervention a success? For an observational study, what difference would be considered substantial from a policymaking standpoint? Adopting this approach can improve the validity of your scientific inference and the chances of other scientists replicating your findings in the future.
So ditch statistical significance as much as possible and instead craft your narrative around the rigor of your study, any potential sources of bias, and the significance of your research finding in the context of the broader field of research.
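As one way to picture this shift in reporting, the sketch below reports an effect estimate with a confidence interval and sets it beside a minimum clinically important difference, rather than reporting a p-value alone. The outcome values and the threshold `mcid` are made up for illustration, and the interval uses a simple normal approximation; your statistician may well choose a different method (for example, a t-based interval or a model-based estimate) for a real study.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(treated, control, level=0.95):
    """Difference in means with a normal-approximation confidence interval.

    Uses an unpooled standard error. With small samples a t-based
    interval would be somewhat wider than this approximation.
    """
    diff = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Made-up outcome scores, for illustration only.
treated = [62.0, 58.5, 66.0, 61.0, 64.5, 59.0, 63.5, 60.0]
control = [55.0, 57.5, 53.0, 58.0, 54.5, 56.0, 52.5, 57.0]

# Hypothetical minimum clinically important difference, which would be
# agreed on with clinical collaborators during study planning.
mcid = 5.0

diff, low, high = mean_difference_ci(treated, control)
print(f"Difference in means: {diff:.1f} (95% CI {low:.1f} to {high:.1f})")
print(f"Minimum clinically important difference: {mcid}")
if low >= mcid:
    print("The whole interval lies at or above the clinically important threshold.")
elif high < mcid:
    print("The whole interval lies below the clinically important threshold.")
else:
    print("The interval includes the threshold; clinical importance is uncertain.")
```

Framing the result this way keeps attention on the size of the effect and its uncertainty relative to a threshold that matters clinically.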
8. Interpreting and reporting results requires everyone’s input.
A statistician is not just useful for drafting the methods and results sections of a manuscript. We can also review and edit conclusions, for example to check that they do not go beyond what the results suggest or to help you get rid of that pesky “statistically significant” phrase (see above).
A statistician will know many of the pitfalls of interpretation of study results, such as how odds ratios are commonly misinterpreted, when measurement error may be a bigger problem than confounding, and how conclusions often go beyond what the study design and statistical results allow. Statisticians also know how to interpret the results of complex analyses, such as analysis of longitudinal data. But, of course, a statistician will not necessarily know how the results should be interpreted in a clinical or public health context.
Properly reporting and interpreting statistical results is trickier than you think, and easy to do wrong. Only by communicating clearly about results with the whole team can you arrive at the most complete interpretation of your data.
9. A statistician can help you pay attention to reporting standards.
If someone had only the information reported in your paper, could they design and implement a study in the same way? If someone had your study dataset and the manuscript, could they reproduce the analysis and obtain the same numerical results? Working with a statistician can help to make sure the answer to these questions is “yes.”
Even before your study starts, pay attention to reporting standards for your particular study design. In order for your study to be as replicable and reproducible as possible, manuscripts should adhere to recommended reporting practices such as CONSORT (for trials) and STROBE (for observational studies), with which statisticians should be intimately familiar. Planning ahead of time will ensure that numbers and information needed for flow charts are collected and updated throughout the study, and that thorough information on data collection practices and randomization procedures is available.
10. Know what you don’t know.
A statistician is not an expert in your field, although we will glean knowledge over time as we work with you on projects. In the same way, you are not an expert in statistics (and even if you are, you can still benefit from an independent statistician’s eye to ensure the scientific integrity of the analysis). Listen to your statistician’s advice, and do not be afraid to tell us when you do not understand something. The methods used to analyze your data and the interpretation of your results should not seem like a black box, nor should the clinical questions and context be a mystery to the statistician. Create room for statisticians to ask questions about your research. This kind of open, mutually supportive collaboration will be most effective and most rewarding.