#8. Introducing weighting in MERLIN

Weighting is a vast subject, and MERLIN can handle anything from the simple application of existing factors to the calculation of complex multi-stage weighting.

In this blog we will start from scratch by defining the terms used in weighting and giving some simple examples, then expand on this in future articles.

Respondent and quantity weighting factors

A weighting factor can be a wvar or an ivar (a variable containing a number with or without a decimal point) and is used when incrementing tables, marginal counts, and frequency counts (but we will confine our discussion to tables). So, if a respondent has a weighting factor of 3, they will be counted three times instead of once when incrementing a table.

MERLIN distinguishes between respondent weighting (WR) and quantity weighting (WQ) and, to illustrate the difference, we will take the example of an ivar called $CARS containing the number of cars each respondent owns. If we use this statement…

SELECT WR $CARS,

… all tables following will be weighted by $CARS until we specify SELECT WR with another variable, or SELECT WR OFF to stop weighting. By default, MERLIN will show an unweighted total row (the number of respondents) and a weighted total row (the number of cars), and all other numbers and percentages in the table will relate to the weighted data. Whenever WR is used, MERLIN creates two internal tables, one for unweighted and one for weighted data, so although unweighted figures are usually shown only in the total row, they can be shown anywhere if appropriate formats are set. If, however, we specify…

SELECT WQ $CARS,

… only the unweighted total row will be shown, but it will contain the number of cars – in other words, the table is incremented entirely in terms of cars rather than respondents (so there is only one internal table). Some may not view this as weighting in the truest sense, but simply as a way of counting the data. When doing this, it is sometimes necessary to count different parts of a table using different factors so, to save repeatedly specifying SELECT WQ, MERLIN allows you to add the relevant variable to the end of the table statement, e.g.

T#1 = $SIDE1 * $TOP + $FACTOR1,       !apply quantity weight $FACTOR1
+T#1A = $SIDE2 * $TOP + $FACTOR2,  !apply quantity weight $FACTOR2
+T#1B = $SIDE3 * $TOP + $FACTOR3,  !apply quantity weight $FACTOR3

A factor specified in this way will temporarily replace any factor specified with SELECT WQ.

SELECT WR and SELECT WQ may both be applied to a table, and will generate an unweighted total representing the number of cars (in our example) and a weighted total representing the number of cars weighted by some additional factor.
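To make the difference between counting respondents and counting cars concrete, here is a minimal Python sketch (not MERLIN code). The respondent data and the count_table helper are invented for illustration; the sketch only mimics the idea of incrementing a table by 1 per respondent versus by the value of $CARS.

```python
# Hypothetical respondents: (gender, number of cars owned).
respondents = [("M", 2), ("F", 1), ("M", 0), ("F", 3)]

def count_table(rows, weight=lambda row: 1):
    """Build a one-dimensional table by gender, adding weight(row) per respondent."""
    table = {}
    for row in rows:
        gender = row[0]
        table[gender] = table.get(gender, 0) + weight(row)
    return table

# Incrementing by 1 counts respondents (the unweighted figures).
by_respondent = count_table(respondents)

# Incrementing by the number of cars counts cars instead.
by_cars = count_table(respondents, weight=lambda row: row[1])

respondent_total = sum(by_respondent.values())  # number of respondents
cars_total = sum(by_cars.values())              # number of cars
```

In terms of the article, respondent weighting by $CARS would report both totals (respondents as the unweighted row, cars as the weighted row), whereas quantity weighting would report only the cars total, as the unweighted row.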

The usual understanding of the term “weighting” is respondent weighting (WR) and, unlike quantity weighting, it is unusual for different factors to be required within a single table, or even within an entire run. Respondent weighting is usually done to correct imbalances in the sample, e.g. we have failed to interview enough females, so we give more weight to their answers.

From now on, we will assume that we are using respondent weighting, and that it is being used for its usual purpose of correcting an imbalanced sample.

The weighting matrix

We first need to identify the groups to which different weighting factors will be applied, and will use the simple example of male and female. Together, these groups constitute a weighting matrix and each group is known as a cell. It is important that every respondent falls into one cell and one only, so they each receive one factor.

We don’t usually know the factors when we start the analysis, but should know the targets, i.e. the ideal number of respondents in each cell (also known as universe or population figures) – so we call this target weighting. Targets may be expressed in various ways such as percentages or population estimates, but it doesn’t really matter since they are essentially ratios, showing the target proportion in each cell. Let’s assume we have them as percentages, as shown in column (a) below, and the actual number of respondents is shown in column (b). The factor for each cell is the target divided by the actual sample, so that gives us the factors in column (c) and, if we apply them, we will arrive at the weighted sample in column (d).

Our main aim has now been achieved in that the cells are in the correct proportion but, as it stands, the total weighted base will be 100, which probably isn’t what we want – maybe we want a population estimate in thousands, or we want it to be the same as the unweighted. Either way, we can simply apply an overall “grossing-up factor” of the figure required divided by the current weighted total, e.g. 56100 / 100. In other words, we don’t need to convert all our target percentages into numbers.
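The arithmetic of target weighting and grossing up can be sketched in a few lines of Python. This is an illustration only, not the PTARG.INC method: the target percentages and sample counts below are made-up figures (they are not the ones from the article's table), and only the grossed-up total of 56100 is taken from the text.

```python
# Hypothetical figures, for illustration only.
targets_pct = {"male": 49.0, "female": 51.0}  # target percentage per cell
actual = {"male": 400, "female": 600}         # respondents interviewed per cell

# Factor for each cell = target / actual sample.
factors = {cell: targets_pct[cell] / actual[cell] for cell in targets_pct}

# Applying the factors gives the weighted sample; it sums to 100
# because the targets were expressed as percentages.
weighted = {cell: factors[cell] * actual[cell] for cell in actual}
weighted_total = sum(weighted.values())

# Grossing-up factor = required total / current weighted total.
required_total = 56100
gross_up = required_total / weighted_total

# Final factors and the resulting weighted counts per cell.
final_factors = {cell: f * gross_up for cell, f in factors.items()}
final_weighted = {cell: final_factors[cell] * actual[cell] for cell in actual}
```

Because the targets act only as ratios, scaling them all by the same amount before the calculation would leave the final factors unchanged.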

The good news is that everything can be done in a single MERLIN run which increments the number in each cell, calculates the factor, then applies it to produce weighted tables. New users are sometimes surprised that MERLIN has no specific statements or functions for doing this – but that is because the MANIP stage already provides a powerful tool which enables us to treat a MERLIN table like a spreadsheet where we can reproduce the above table then apply the factors calculated. Item 11.2 of the MERLIN Tips and Examples library shows an insert file (PTARG.INC) which has been developed for this purpose, and can be used in any script where target weighting is required – such as example item 11.4.

Interlaced matrices and rims

Let us now suppose the weighting relates to more than one variable so, as well as the 2 gender groups, we also have 4 age groups and 3 social class groups. How we proceed depends on whether we have interlaced targets (i.e. 2 * 4 * 3 = 24 figures) or only the totals for each variable (i.e. 2 + 4 + 3 = 9 figures).

In the first case, we can create a single variable which interlaces gender, age and class, so it is another example of the target weighting discussed above. The interlaced variable is easily created with this MERLIN statement…

DS $MATRIX = $CLASS.BY.$AGE.BY.$GENDER,

… in which the first variable is the ‘outer loop’, i.e. the cells will be in this order…

$CLASS/1, $AGE/1, $GENDER/1,
$CLASS/1, $AGE/1, $GENDER/2,
$CLASS/1, $AGE/2, $GENDER/1,
… and so on.
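The cell ordering can be checked with a short Python sketch (again, not MERLIN code). It assumes the 3 class, 4 age and 2 gender groups used in this example; the cell_index function is a hypothetical helper showing where a given combination falls among the 24 cells when class is the outer loop and gender the innermost.

```python
from itertools import product

n_class, n_age, n_gender = 3, 4, 2

# product() varies the last argument fastest, so class is the outer loop
# and gender the inner loop, matching the order listed above.
cells = list(product(range(1, n_class + 1),
                     range(1, n_age + 1),
                     range(1, n_gender + 1)))

def cell_index(cls, age, gender):
    """1-based position of the (class, age, gender) cell in the interlaced matrix."""
    return ((cls - 1) * n_age + (age - 1)) * n_gender + gender

first_three = cells[:3]  # (1, 1, 1), (1, 1, 2), (1, 2, 1)
```

Targets for an interlaced matrix need to be supplied in this same order, so that each target lines up with the right cell.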

The second case described above is called rim weighting because we only know the rims (i.e. the totals), and we will discuss this in a future blog.

Applying calculated factors

Once we know the factors to be applied, we can use a ‘data lookup’ statement to specify the factor for each cell, maybe gross it up, then apply it, e.g.

DW $FACTOR = $GENDER (0.0996,1.1059),
DW $FACTOR = $$ * 56100 / 100, !gross total up to 56100
SELECT WR $FACTOR,

The number of factors in brackets must equal the number of items in the matrix variable.
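As a rough Python analogue of the lookup above (not MERLIN code), the sketch below picks a factor by position and grosses it up. It assumes $GENDER is coded 1 and 2 and that the factors in brackets are matched to items in that order; lookup_factor is a hypothetical helper name.

```python
# Factors from the example above, one per item of $GENDER.
FACTORS = (0.0996, 1.1059)
N_GENDER_ITEMS = 2

# Grossing-up factor from the example: required total / current weighted total.
GROSS_UP = 56100 / 100

def lookup_factor(gender_code, factors=FACTORS, n_items=N_GENDER_ITEMS):
    """Return the grossed-up factor for a 1-based gender code."""
    # Mirrors the rule that the number of factors must equal the number of items.
    if len(factors) != n_items:
        raise ValueError("number of factors must equal number of items")
    if not 1 <= gender_code <= n_items:
        raise ValueError("respondent does not fall into any cell")
    return factors[gender_code - 1] * GROSS_UP

factor_item_1 = lookup_factor(1)
factor_item_2 = lookup_factor(2)
```

The second check reflects the earlier point that every respondent must fall into exactly one cell, so that each receives exactly one factor.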

Since MERLIN runs so fast, users often allow it to re-calculate the factors in every run but, if you are doing many runs using the same weighting factors, it makes sense to replace the code that calculates them with the code above.

Any questions? Email support@merlinco.co.uk.