The MERLIN Technical blog

We have introduced a MERLIN Technical category on our BLOG page – primarily written for MERLIN users, and intended to inform or remind them about MERLIN features. Because MERLIN is such a comprehensive survey processing language, even experienced users are often unaware of everything it can do.

The MERLIN Technical blog is also available to non-users, as prospective purchasers of MERLIN may find it a useful guide to some of MERLIN’s many features.

Any questions? Email support@merlinco.co.uk.

Using Excel and CSV data in MERLIN

When computers were first used to analyse market research data the most common means of storing data was column binary (card image) format but, with the advent of telephone and web interviewing packages, ASCII data became the favourite. Now, however, more and more data are stored in Excel or CSV files – and there are numerous advantages to this approach including:

  • Easy to view, edit, sort, etc in Excel.
  • Data fields may be referenced by names in the header row rather than absolute locations. So if Q3 is in the 11th column of one data file, but the 13th column of another, this doesn’t matter so long as you refer to it by the name Q3.
  • Variable names can be inferred from the header row names, or vice versa.
  • More efficient storage (no empty spaces in the data file).

MERLIN can read and write data in Excel or CSV files, and the principles described below apply to both. We refer to data fields / cells in such files as freefields (free format fields) to distinguish them from fixed location fields in ASCII and column binary files, which we have traditionally called fields.

Assuming that the first row in each data file is a header containing the names of the variables (which is nearly always the case) you will need to specify one of these RCPs (Run Control Parameters):

  • IGHEAD means the header row is to be ignored
  • USEHEAD means the header row is to be used, and enables you to refer to freefields by these names, rather than absolute locations.

The RECORD LENGTH statement needs to specify the number of freefields in your data file(s), rather than the number of characters, e.g. RECORD LENGTH 57 FIELDS. There could, in fact, be more freefields, but this would mean you are only analysing the first 57.

To refer to a freefield, you can use one of three approaches (which can be mixed freely within the same script). In all cases, the reference starts with $* (as opposed to $ on its own, used to refer to ASCII and column binary data locations):

  • By number, e.g. $*10 means the 10th freefield in the data record.
  • By letter (as used in Excel), e.g. $*AC means freefield AC, i.e. the 29th
  • By name, e.g. $*’Q25′, e.g. the freefield that has the name Q25 in the header row of the current data file. Freefield names are not case sensitive, so you could type $*’Q25′ or $*’q25′. If the freefield name is the same as the variable name, you can omit it in a definition, e.g. DS $Q25=$*/1-5 means define a single-coded variable called Q25 from the freefield of the same name (with a value range of 1-5). Names may only be used (or implied) if you have specified RCP USEHEAD.

The last approach is the most flexible, since Q25 could be in any location, and its position may vary from file to file (which is often the case when you are analysing different waves of the same survey together).

Another advantage of Excel or CSV files is that multi-coded variables can be stored in a single freefield, as opposed to the traditional approach of entering code 0 or 1 in a series of locations (known as a ‘bitstring’). The bitstring approach may still be used with freefield data but the former is easier to work with, and more flexible if the number of valid codes changes. A single freefield could therefore contain “1,7,10” to signify these 3 values (the quotation marks are optional in Excel, but compulsory in a CSV file). To indicate that a freefield may contain more than one value, an extra asterisk must be added after the freefield reference, e.g.

  • DM $Q30=$*15*/1-20, means freefield 15 may contain 1 or more values between 1-20
  • DM $Q30=$*’Q30’*/1-20, freefield named Q30 ditto
  • DM $Q30=$**/1-20, ditto omitting freefield name (same as the variable name)

If your data file is not in Excel or CSV format, you can use MERLIN Toolkit to convert it, e.g. you can import from SPSS, and you can automatically generate a MERLIN script defining the variables.

Any questions? Email support@merlinco.co.uk.

MERLIN 10 released!

Merlin 10

MERLIN 10 has been released, and many of our clients are already enjoying the benefits of new features and increased capacity. If you would like to try it, request a FREE TRIAL at the foot of this page.

 

MERLIN ‘far better’

Automation

We have just received this emailed comment from a client who started using MERLIN this year, after previously using STATA and SPSS…

“MERLIN has been a fantastic addition to our data processing toolset, far better than what I was using previously.”

Automation

Automation

Merlinco software was mentioned a number of times during the ASC one-day conference on Automation, attended by Merlinco directors John Tebboth and Keith Hughes, as well as a number of our clients. The ability to import from, and export to, the Triple-S standard makes it easy to incorporate MERLIN runs into a series of streamlined tasks, as emphasised in the papers by James Eldridge and Steve Taylor.

 

MERLIN processes 130 million data records

130,000,000

Following an enquiry as to whether MERLIN could process 120 million data records, we mentioned this to a client who decided to try 130 million!

He constructed a dummy ASCII data file 1.6Gb in size, containing a serial number and two variables, then wrote a MERLIN set-up to generate 4 tables from it. The job ran without any problems, and took just 2 hours 5 minutes. We would be interested to know if anyone ever analyses more data records!