This guideline is intended to include elements necessary for a successful report applying statistical methods to answer research questions using real data. It cannot, however, list every element required to make a well-written, convincing, and comprehensive report.
Production
Reproducible report |
Report is produced using R Markdown, Quarto, knitr, or similar tools that allow text and results to be generated from the same file |
Report is produced in Word or Google Docs, or any tool that requires manually pasting in figures and results |
Reproducibility |
Results, tables, and figures are generated using code embedded in the document |
Some results, tables, or figures are created manually and pasted into the document |
Format |
Report is a PDF with standard-size pages (8.5×11” or A4) |
Report is not a PDF or has unusually sized pages |
Executive Summary
Answers questions |
Addresses all substantive questions |
Ignores some questions |
Summarizes methods |
Summarizes methods used and their limitations |
Fails to note important caveats of the methods used |
Suited for audience |
Written in terms understandable to a subject-area audience, rather than being written for statisticians |
Summary is excessively mathematical or uses statistical jargon, making it hard for a non-statistician to understand |
Stand-alone |
Can be read so that the core idea can be understood without reading the rest of the report |
Core idea cannot be understood without reading the rest of the report |
Introduction
Motivates the problem |
Introduction sets out substantive questions to be answered and their importance |
Unclear what questions the report will answer, or no context is given to show why the questions are important |
Gives theories or goals |
Introduction states theories or alternative explanations to be tested using the data, or outcome to be predicted |
Unclear what theories will be tested using the data, or what outcome will be predicted and why |
Describes data source |
Introduction describes the source of the data and summarizes its relevance to the problem |
Unclear what the source of the data was or how it was collected; unclear how the data is relevant to the research questions at hand |
Written for right audience |
Introduction is readable by an expert in the data’s subject area, rather than being written for statisticians |
Introduction contains excessive statistical detail or would be hard for a non-statistician to understand |
Exploratory Data Analysis & Data Summary
Data is clear |
Meaning of all relevant variables given (with units, when known); size of dataset is given |
Many relevant variables not explained or have no units; unclear what rows represent |
Data explored |
Distributions of variables checked; notable outliers explained |
Important aspects of variables not checked |
EDA is connected to modeling |
EDA supports the modeling by exploring relationships and variables that will be useful for model |
EDA includes many numbers and graphics that are not useful, such as summary statistics for every variable; many figures show nothing relevant to the modeling or results, or the text or caption does not say what is interesting about them |
Missingness handled (if present) |
Missing data noted and strategy to account for it explained |
Missing data is ignored or inappropriate methods used to account for its effects |
Limitations |
Limitations of data for the research questions noted, including problems with generalizability or omitted variables |
Limitations of the data for the research questions are not discussed |
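For the missingness criterion, a natural first step is to count the missing values in each variable so the report can state them. The sketch below is a hypothetical illustration, not a required procedure: it assumes each row is a Python dict and that `None` marks a missing value, and the function name `missing_summary` and the toy rows are invented for the example. Choosing a strategy to account for the missingness (for example, multiple imputation) is a separate decision that the sketch does not cover.

```python
from typing import Any


def missing_summary(rows: list[dict[str, Any]]) -> dict[str, tuple[int, float]]:
    """Map each variable to (number missing, percent missing); None marks missing."""
    if not rows:
        raise ValueError("need at least one row")
    n = len(rows)
    variables = sorted({key for row in rows for key in row})
    summary = {}
    for var in variables:
        # A key absent from a row counts as missing, just like an explicit None.
        count = sum(1 for row in rows if row.get(var) is None)
        summary[var] = (count, 100.0 * count / n)
    return summary


if __name__ == "__main__":
    # Hypothetical toy rows, for illustration only.
    rows = [
        {"age": 34, "income": 52000},
        {"age": None, "income": 61000},
        {"age": 45, "income": None},
        {"age": 29, "income": None},
    ]
    for var, (count, pct) in missing_summary(rows).items():
        # Prints "age: 1 of 4 missing (25%)" then "income: 2 of 4 missing (50%)".
        print(f"{var}: {count} of {len(rows)} missing ({pct:.0f}%)")
```

A count like this belongs in the text or a small table, not as raw printed output, in line with the "No output in report" criterion.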
Methods
These criteria apply to the statistical analysis used to answer each substantive question addressed in the report.
Connected to substance |
Chosen analysis is clearly connected to substantive question |
Analysis does not or cannot answer the substantive questions |
Connection explained |
Chosen analysis is explained in terms of the substantive question, not in purely technical terms |
Text discusses models and code but does not explain how these connect to the substantive question |
Models explained |
Report explains choices made in model building, such as how variables are coded or transformed, or which variables are included |
Unclear how or why the models were built, or why variables were chosen or transformed |
Appropriateness verified |
Analysis is supported by diagnostics or proper metrics that assess its appropriateness for the data |
Unclear if model assumptions are checked or if the model is appropriate for the given data |
Choices justified |
Text clearly explains why this analysis was chosen over alternatives |
Unclear why modeling choices were made, whether chosen model is better, or if alternatives were considered at all |
Problems noted |
Caveats and problems are noted and their potential effect on results explained |
Problems are hidden and their potential effect on results ignored; or problems are mentioned but their effect on conclusions is not described |
Results
Results answer questions |
Statistical results answer the questions asked |
Statistical results do not answer the substantive questions |
Statistical results clearly presented |
Tests and estimates are presented clearly and accurately |
Explanation misstates what the results show or misinterprets the statistics |
Statistical results correct |
Statistical methods correctly implemented |
Errors in code or math mean the results are incorrect |
Hypotheses and distributions clearly stated for tests |
When hypothesis tests are used, the null hypothesis being tested is clearly stated, as is the null distribution of the statistic (e.g., χ², F) and its degrees of freedom (or, if the bootstrap is used, the bootstrap method), ideally in APA format |
Unclear what is being tested or how the p-values are calculated |
Effect sizes given when appropriate |
Whenever possible, sizes of effects are given, not just their significance, and interpreted in substantive terms |
Results are presented as significant or insignificant without effect sizes |
Sensitivity analysis and comparison to alternatives |
Sensitivity analysis indicates how far the results of the proposed method can be trusted; comparison with alternatives (including a baseline) supports the choice of method |
Only results from the proposed method are given, with no empirical comparison or stress test |
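For the hypothesis-test and effect-size criteria, small formatting helpers can keep reported numbers consistent. This sketch assumes common APA 7 conventions (statistics to two decimals, p-values to three decimals without a leading zero, and p < .001 as the floor) and omits the italics APA uses for statistical symbols. The function names are hypothetical, and the test statistic and p-value are assumed to come from whatever software ran the test. Cohen's d here uses the pooled standard deviation of two independent samples.

```python
from math import sqrt
from statistics import mean, variance


def apa_p(p: float) -> str:
    """APA-style p-value: three decimals, no leading zero, floor at .001."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_test(stat: str, df: tuple, value: float, p: float) -> str:
    """Format a test result such as F(2, 27) = 4.56, p = .017 (no italics)."""
    df_text = ", ".join(str(d) for d in df)
    return f"{stat}({df_text}) = {value:.2f}, {apa_p(p)}"


def cohens_d(x: list, y: list) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


if __name__ == "__main__":
    print(apa_test("F", (2, 27), 4.561, 0.0172))  # F(2, 27) = 4.56, p = .017
    print(apa_test("χ²", (3,), 12.8, 0.0004))  # χ²(3) = 12.80, p < .001
    print(f"d = {cohens_d([2, 4, 6], [1, 3, 5]):.2f}")  # d = 0.50
```

For χ² tests APA style usually also reports the sample size, e.g. χ²(3, N = 120); since `df` is formatted item by item, a string such as `"N = 120"` can be passed as an extra element. The effect size still needs a sentence interpreting it in substantive terms.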
Conclusion
Conclusion presents results |
Summarizes conclusions presented in report |
Presents conclusions not justified or described in the body of the report, or states no conclusions |
Conclusion notes limitations |
Conclusion notes any limitations of results and what could be done to address these |
Conclusion ignores limitations of results or is too confident |
Writing and Style
Grammar and style |
Grammar is correct and style is appropriate to audience |
Grammatical or spelling errors make text hard to read |
Formatting |
Formatting is clear and legible |
Text is poorly typeset or in unusual sizes; figures are in inconvenient places; math is hard to read |
Logic and flow |
All text works to support the conclusions; there is a clear logical flow between sections |
Purpose of some text is unclear; sections redundant or not clearly split |
Precision |
Descriptions of methods and results are clear and unambiguous |
Unclear what some results are or how the analysis was conducted |
Uncertainty always given |
Wherever results appear in the report, they are presented with clear measures of uncertainty, such as confidence intervals or standard errors, in APA format |
Many results not in APA format or given without any uncertainty |
No code in report |
No code is presented in the report, only relevant results |
Chunks of code are shown directly in the report |
No output in report |
No output is presented in the report; results are formatted as tables or plots, or given in the text |
Raw output or warning messages are included directly in the report |
No math |
Report explains results in words, without mathematical formulae or derivations (except definitions of models, as required) |
Report contains detailed mathematical derivations |
Sourcing (if applicable) |
References are clearly given to information from outside sources; quotations clearly mark any verbatim text from sources |
Some references not given; text or figures used without attribution.
Warning: using text or figures from other sources without clear attribution is an academic integrity violation. |
Citation style (if applicable) |
Citations formatted in a common style used in statistics journals |
Citations poorly formatted or incomplete |
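For the "Uncertainty always given" criterion, the sketch below computes a percentile bootstrap confidence interval for a mean and formats it in an APA-style string such as `M = 5.50, 95% CI [3.70, 7.30]` (the numbers there only illustrate the format). The percentile bootstrap, 2,000 resamples, the fixed seed, and the names `bootstrap_ci` and `apa_ci` are illustrative assumptions; as the hypothesis-test criterion notes, the report should state which bootstrap method was used.

```python
import random
from statistics import mean


def bootstrap_ci(data, level=0.95, n_boot=2000, seed=1):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    n = len(data)
    boot_means = sorted(mean(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1.0 - level
    lower = boot_means[round(alpha / 2 * n_boot)]
    upper = boot_means[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def apa_ci(label, estimate, lower, upper, level=0.95):
    """Format an estimate and its interval, e.g. 'M = 5.50, 95% CI [3.70, 7.30]'."""
    return f"{label} = {estimate:.2f}, {level:.0%} CI [{lower:.2f}, {upper:.2f}]"


if __name__ == "__main__":
    data = [4.1, 5.3, 6.0, 3.8, 5.9, 7.2, 4.4, 6.6]  # hypothetical measurements
    low, high = bootstrap_ci(data)
    print(apa_ci("M", mean(data), low, high))
```

The same `apa_ci` formatter works for intervals from any other method, such as t-based intervals for regression coefficients.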