#3. Tabulating numeric data in statistics and frequency tables

MERLIN provides two ways of tabulating a numeric variable on the vertical axis of a table – known as statistics and frequency tables. Users are sometimes unclear about the difference between the two, so we will illustrate it using a variable called $NUMBER. This statement…

T#1=$NUMBER * $CLASS,

…produces a statistics table like this (the statistics shown depend on FORMAT settings)…

Often this type of table is joined onto the end of another table (maybe showing ranges of values) – or several statistics tables may be joined together to produce a statistical summary from different variables.

This statement…

T#2=$NUMBER(15) * $CLASS,

…produces a frequency table showing the frequency distribution of the values, as well as the statistics…

The syntax ‘(15)’ tells MERLIN to create a table with 15 rows (plus total, undefined, and the statistics rows) – in other words, we are allowing for up to 15 different values being found in the variable $NUMBER. This is an estimate, and our table shows that only 12 different values were found, so MERLIN automatically suppressed the unused rows.

You may be tempted to specify a high number like ‘(9999)’ but this is very inefficient since MERLIN will unnecessarily create a huge table – which is likely to slow the run down and (if there are many such tables), could cause limits to be exceeded. Since there are only 20 respondents in the sample, there cannot possibly be more than 20 different values, so you should specify 20 or less. If the number you specify is too low, MERLIN will report this and abort the run so you can increase it.

 

Format TNP

If your script contains many frequency tables, all containing the syntax ‘(100)’ for example, it is more efficient to use format TNP. If you set this to 100, it means that all table specifications from that point onwards which have a numeric variable on the vertical axis but no number in brackets, will generate frequency tables with up to 100 different values. If you want to use a different number (higher or lower), you can specify this in the usual way and it will take precedence over the TNP setting. If you want to produce a statistics table instead of a frequency table, you can specify ‘(0)’. For example…

SELECT BANNER $CLASS,

F=TNP100, !allocate 100 rows from now on
T#1=$NUM1 *, !frequency table with 100 rows
T#2=$SVAR *, !not a numeric variable so TNP ignored
T#3=$NUM2 *, !frequency table with 100 rows
T#4=$NUM3(200) *, !frequency table with 200 rows
T#5=$NUM4(0) *, !statistics table
T#6=$NUM5 *, !frequency table with 100 rows

 

Statistics requiring frequency distribution

The following statistics can only be produced in frequency tables, since the calculation requires a full frequency distribution:

ILE quantiles
ILH highest value
ILL lowest value
MDE mode
MED median

 

If you do not wish to show the frequency distribution, you can suppress it by setting format DIS false, e.g.

T#7(F=MED/NDIS)=$NUM6(150)*,    !frequency table with median but no distribution rows

 

Character tables

Finally, although not strictly part of our topic, we will mention that the tabulation of character data (e.g. postcodes) works in a similar way to numeric data. You can either show summary rows only (answered, not answered, undefined) – or you can show the full distribution by specifying ‘(n)’ after the character variable. Format TNP may be used as for numeric data.

Any questions? Email support@merlinco.co.uk.