SURE 2023: Report Writing Guideline

This guideline¹ lists elements necessary for a successful report that applies statistical methods to answer research questions using real data. It cannot, however, list every element required for a well-written, convincing, and comprehensive report.

Production

Reproducible report
Excellent: Report is produced using R Markdown, Quarto, knitr, or a similar tool that generates text and results from the same file.
Needs revision: Report is produced in Word, Google Docs, or any tool that requires manually pasting in figures and results.

Reproducibility
Excellent: Results, tables, and figures are generated by code embedded in the document.
Needs revision: Some results, tables, or figures were created manually and pasted into the document.

Format
Excellent: Report is a PDF with standard-size pages (8.5 × 11 in. or A4).
Needs revision: Report is not a PDF, or has unusually sized pages.

Executive Summary

Answers questions
Excellent: Addresses all substantive questions.
Needs revision: Ignores some questions.

Summarizes methods
Excellent: Summarizes the methods used and their limitations.
Needs revision: Fails to note important caveats of the methods used.

Suited for audience
Excellent: Written in terms a subject-area audience can understand, rather than for statisticians.
Needs revision: Summary is excessively mathematical or uses statistical jargon; it would be hard for a non-statistician to understand.

Stand-alone
Excellent: Can be read on its own; the core idea is clear without reading the rest of the report.

Introduction

Motivates the problem
Excellent: Introduction sets out the substantive questions to be answered and explains their importance.
Needs revision: Unclear what questions the report will answer, or no context is given to show why the questions matter.

Gives theories or goals
Excellent: Introduction states the theories or alternative explanations to be tested using the data, or the outcome to be predicted.
Needs revision: Unclear what theories will be tested using the data, or what outcome will be predicted and why.

Describes data source
Excellent: Introduction describes the source of the data and summarizes its relevance to the problem.
Needs revision: Unclear what the source of the data was or how it was collected; unclear how the data is relevant to the research questions at hand.

Written for right audience
Excellent: Introduction is readable by an expert in the data's subject area, rather than being written for statisticians.
Needs revision: Introduction contains many statistical details or would be hard for a non-statistician to understand.

Exploratory Data Analysis & Data Summary

Data is clear
Excellent: The meaning of all relevant variables is given (with units, when known); the size of the dataset is given.
Needs revision: Many relevant variables are not explained or have no units; unclear what rows represent.

Data explored
Excellent: Distributions of variables are checked; notable outliers are explained.
Needs revision: Important aspects of variables are not checked.

EDA is connected to modeling
Excellent: EDA supports the modeling by exploring the relationships and variables that will be useful for the model.
Needs revision: EDA includes many numbers and graphics that are not useful, such as summary statistics for every single variable; many figures show nothing relevant to the modeling or results, or the text and captions do not say what is interesting about them.

Missingness handled (if present)
Excellent: Missing data is noted and the strategy to account for it is explained.
Needs revision: Missing data is ignored, or inappropriate methods are used to account for its effects.

Limitations
Excellent: Limitations of the data for the research questions are noted, including problems with generalizability or omitted variables.
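The "Data explored" and "Missingness handled" checks can be scripted so they rerun whenever the data change. The following is a minimal Python sketch under two assumptions. First, missing values are coded as None or NaN. Second, outliers are screened with Tukey's 1.5 × IQR rule, which is one common screen but not the only one. The function name `summarize_variable` and the income values are hypothetical. Flagged points still need a substantive explanation in the text. Under "No code in report", only the resulting counts or plots belong in the PDF.

```python
import math
import statistics


def summarize_variable(values):
    """Count missing entries and flag outliers using Tukey's 1.5 x IQR rule."""
    # Assumption: missing values are coded as None or float NaN.
    observed = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    n_missing = len(values) - len(observed)
    # Quartiles of the observed values; other quantile definitions differ slightly.
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in observed if v < lower or v > upper]
    return {"n": len(values), "n_missing": n_missing, "outliers": outliers}


# Hypothetical household incomes (thousands of dollars) with two missing entries.
incomes = [32.0, 41.5, None, 38.2, 36.9, float("nan"), 40.1, 35.4, 120.0]
result = summarize_variable(incomes)
print(result)
```

Here the income of 120 is flagged. The report would then need to say whether it is a data-entry error or a genuine observation, and how it was handled.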

Methods

These criteria apply to the statistical analysis used to answer each substantive question addressed in the report.

Connected to substance
Excellent: The chosen analysis is clearly connected to the substantive question.
Needs revision: The analysis does not or cannot answer the substantive questions.

Connection explained
Excellent: The chosen analysis is explained in terms of the substantive question, not in purely technical terms.
Needs revision: Text discusses models and code but does not explain how they connect to the substantive question.

Models explained
Excellent: Report explains choices made in model building, such as how variables are coded or transformed, or which variables are included.
Needs revision: Unclear how or why the models were built, or why variables were chosen or transformed.

Appropriateness verified
Excellent: Analysis is supported by diagnostics or appropriate metrics that assess its suitability for the data.
Needs revision: Unclear whether model assumptions were checked or whether the model is appropriate for the data.

Choices justified
Excellent: Text clearly explains why this analysis was chosen over alternatives.
Needs revision: Unclear why modeling choices were made, whether the chosen model is better, or whether alternatives were considered at all.

Problems noted
Excellent: Caveats and problems are noted, and their potential effect on results is explained.
Needs revision: Problems are hidden and their potential effect on results ignored; or problems are mentioned but their effect on the conclusions is not described.

Results

Results answer questions
Excellent: Statistical results answer the questions asked.
Needs revision: Statistical results do not answer the substantive questions.

Statistical results clearly presented
Excellent: Tests and estimates are presented clearly and accurately.
Needs revision: Explanation misstates what the results show or misinterprets the statistics.

Statistical results correct
Excellent: Statistical methods are correctly implemented.
Needs revision: Errors in code or math make the results incorrect.

Hypotheses and distributions clearly stated for tests
Excellent: When hypothesis tests are used, the null hypothesis being tested is clearly stated, as is the null distribution of the statistic (e.g., χ², F) and its degrees of freedom (or, if the bootstrap is used, which bootstrap method was used), ideally in APA format.
Needs revision: Unclear what is being tested or how the p-values are calculated.

Effect sizes given when appropriate
Excellent: Whenever possible, effect sizes are given, not just their significance, and interpreted in substantive terms.
Needs revision: Results are presented only as significant or not significant, without effect sizes.

Sensitivity analysis and comparison to alternatives
Excellent: Sensitivity analysis shows how far the results of the proposed method can be trusted. Comparison with alternatives (including a baseline) strengthens the case for the chosen method.
Needs revision: Presents only the results of the proposed method, without empirical comparison or any stress test.

Conclusion

Conclusion presents results
Excellent: Summarizes the conclusions presented in the report.
Needs revision: Describes conclusions not justified or described in the body of the report, or does not state conclusions.

Conclusion notes limitations
Excellent: Notes any limitations of the results and what could be done to address them.
Needs revision: Ignores limitations of the results or is overconfident.

Figures and Tables

Function and purpose
Excellent: The connection between figures and narrative is clear from the text and captions.
Needs revision: Not clear what point the figures make or how they support the argument; no captions given.

Legibility and design
Excellent: Figures are clear, simple, legible, and attractive.
Needs revision: Figures are hard to read; fonts are often too small; some graphics are blurry or squished.

Labeling
Excellent: All legends and axes are clearly labeled (including units, when known).
Needs revision: Labels are missing or use raw variable names instead of descriptive text.

Choice of figures
Excellent: Figure types are well chosen and illustrate the intended points clearly.
Needs revision: Figures do not make the intended points and should be replaced.

Used when needed
Excellent: Figures are used whenever the text needs them.
Needs revision: Many points should be illustrated with figures but are not.

Numbering
Excellent: Figures and tables are numbered, and the text refers to them by number.
Needs revision: Text refers to “the figure below”, or it is unclear which figures correspond to which results.

Writing and Style

Grammar and style
Excellent: Grammar is correct and style is appropriate to the audience.
Needs revision: Grammatical or spelling errors make the text hard to read.

Formatting
Excellent: Formatting is clear and legible.
Needs revision: Text is poorly typeset or in unusual sizes; figures are in inconvenient places; math is hard to read.

Logic and flow
Excellent: All text works to support the conclusions; there is a clear logical flow between sections.
Needs revision: Purpose of some text is unclear; sections are redundant or not clearly separated.

Precision
Excellent: Descriptions of methods and results are clear and unambiguous.
Needs revision: Unclear what some results are or how the analysis was conducted.

Uncertainty always given
Excellent: Wherever results appear in the report, they are presented with clear measures of uncertainty, such as confidence intervals or standard errors, in APA format.
Needs revision: Many results are not in APA format or are given without any uncertainty.

No code in report
Excellent: No code is presented in the report, only relevant results.
Needs revision: Chunks of code are shown directly in the report.

No output in report
Excellent: No raw output is presented in the report; results are formatted as tables or plots, or given in the text.
Needs revision: Raw output or warning messages are included directly in the report.

No math
Excellent: Report explains results in words, without mathematical formulae or derivations (except definitions of models, as required).
Needs revision: Report contains detailed mathematical derivations.

Sourcing (if applicable)
Excellent: References are clearly given for information from outside sources; quotation marks clearly identify any verbatim text from sources.
Needs revision: Some references are not given; text or figures are used without attribution.

Warning: using text or figures from other sources without clear attribution is an academic integrity violation.

Citation style (if applicable)
Excellent: Citations are formatted in a common style used in statistics journals.
Needs revision: Citations are poorly formatted or incomplete.
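Generating APA-style strings from code keeps the numbers quoted in the text in sync with the analysis, while the code itself stays out of the report. The minimal sketch below makes three assumptions:

- It uses a normal-approximation 95% interval. A t-based interval is usually preferable for small samples.
- It follows the APA conventions of omitting the leading zero on p-values and writing "p < .001" for very small ones.
- APA italics for F and p are left to the document's typesetting.

The helper names and numbers are hypothetical.

```python
from statistics import NormalDist


def fmt_p(p):
    """APA-style p-value: no leading zero, three decimals, 'p < .001' when tiny."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def fmt_f_test(f_stat, df1, df2, p):
    """F statistic with both degrees of freedom, e.g. 'F(2, 27) = 4.56, p = .018'."""
    return f"F({df1}, {df2}) = {f_stat:.2f}, {fmt_p(p)}"


def normal_ci(estimate, se, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


def fmt_estimate(name, estimate, lo, hi, level=0.95):
    """Estimate with its confidence interval, e.g. 'b = 2.40, 95% CI [1.42, 3.38]'."""
    return f"{name} = {estimate:.2f}, {level:.0%} CI [{lo:.2f}, {hi:.2f}]"


# Hypothetical numbers, purely for illustration.
lo, hi = normal_ci(2.4, 0.5)
print(fmt_estimate("b", 2.4, lo, hi))
print(fmt_f_test(4.56, 2, 27, 0.018))
print(fmt_p(0.0004))
```

In an R Markdown or Quarto report, the equivalent helpers would be called inline, so the formatted string, not the call, appears in the rendered PDF.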

Footnotes

  1. We thank Prof. Alex Reinhart for sharing these guidelines with us. They were originally developed by the teaching staff for the Undergraduate Advanced Data Analysis course (36-402), as taught at CMU.