In any study, observations are made on each individual. Because these observations vary both between and within individuals, they are referred to as “variables.” We may summarise the data collected in a study either numerically, in the form of summary statistics, or in tabular or graphical form. The advantage of summary statistics is that individual values (such as means or proportions) summarise the data simply; a table or figure, on the other hand, can present all, or most, of the data. The appropriate summary method (as well as the statistical analysis) depends on the type of variable and its measurement scale. For example, the mean should be used to summarise the data only if the distribution is approximately normal (symmetrical and bell-shaped).
There are two main types of measurement scale: categorical and numerical (table 1). Categorical variables have a set of labels for category membership (eg, diabetic, non-diabetic); numerical variables are either a count (eg, number of GP visits), a measurement made with a particular instrument (eg, blood pressure), or a summary score (eg, SF-36 score).
Tabular and graphical presentation
Tables and graphs can present a distribution simply (table 1). For a single categorical variable, the frequency of observations in each category can be tabulated; the graphical equivalent is a bar chart. For a numerical variable, a histogram is the simplest way of presenting the data. To present a numerical variable in a table, the values need to be grouped into intervals unless the scale is very narrow. The frequency table then shows the number of observations within each interval, from which both the relative frequency (the percentage of observations in each interval) and the cumulative relative frequency (the percentage of observations in that interval or below it) can be calculated.
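The grouping and the two percentages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frequency_table`, the equal-width intervals, and the blood pressure readings are all invented for the example and do not come from the article's tables.

```python
from collections import Counter


def frequency_table(values, width):
    """Group numerical values into equal-width intervals and tabulate them.

    Hypothetical helper: each interval runs from its lower bound up to,
    but not including, lower bound + width. Empty intervals are omitted.
    """
    # Map each value to the lower bound of the interval it falls in.
    counts = Counter((v // width) * width for v in values)
    total = len(values)
    rows = []
    cumulative = 0
    for lower in sorted(counts):
        n = counts[lower]
        cumulative += n
        rows.append({
            "interval": (lower, lower + width),
            "frequency": n,
            # Percentage of all observations that fall in this interval.
            "relative_pct": 100 * n / total,
            # Percentage of all observations in this interval or below it.
            "cumulative_pct": 100 * cumulative / total,
        })
    return rows


# Invented systolic blood pressure readings (mmHg), for illustration only.
readings = [112, 118, 121, 125, 127, 130, 134, 138, 142, 155]
table = frequency_table(readings, width=10)
for row in table:
    print(row)
```

The cumulative relative frequency in the last row is always 100%, which is a quick check that no observation has been lost in the grouping. The choice of interval width is a judgment call: too wide hides the shape of the distribution, too narrow leaves many intervals with very few observations.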
Both categorical and numerical data can be summarised using summary statistics (table 2). Appropriate summary statistics for categorical data are the number of observations, and their proportion or percentage, in each category. Numerical data are summarised using an “average” value, such as the mean or median, together with a measure of the spread of the observations around this value, such as the range or standard deviation. The mode is only rarely used. The mean and the standard deviation are the most informative measures since they use all the data in their calculation. They should, however, only be used for normally distributed numerical variables, since any skewness in the data will distort the values of both the mean and the standard deviation. Non-normally distributed variables should be summarised using the median and either the range or interquartile range.
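As a rough illustration of these summary statistics, the sketch below uses Python's standard `statistics` module. The function names `summarise_categorical` and `summarise_numerical` and both data sets are hypothetical, invented for the example. Note also that several conventions exist for calculating quartiles, so the interquartile range reported by other software may differ slightly from the one returned here.

```python
import statistics
from collections import Counter


def summarise_categorical(labels):
    """Return the number and percentage of observations in each category."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


def summarise_numerical(values):
    """Return the common 'average' and spread measures for a numerical variable."""
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # its default method is one of several accepted quartile conventions.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (min(values), max(values)),
    }


# Invented data, for illustration only.
status = ["diabetic", "non-diabetic", "non-diabetic", "non-diabetic"]
stay_days = [1, 2, 2, 3, 3, 4, 5, 7, 30]  # skewed by one very long stay

print(summarise_categorical(status))
summary = summarise_numerical(stay_days)
print(summary)
```

In the invented length-of-stay data, a single stay of 30 days pulls the mean well above the median, which is the kind of distortion that makes the median and interquartile range the safer summary for skewed variables.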