Content analysis


This is a method of summarising a large body of fairly short statements into a small statistical table in a report. The method described here presupposes a spreadsheet, although specialised computer programs for doing the same thing are also available.


This method imposes a discipline on the process of summarising user comments that will produce a more objective result than simply picking out 'representative statements.'


The method of content analysis (CA) described here is applicable to analysing a body or corpus of many short statements or records. Each statement should be no more than approximately 20 words long and there should be at least 12 such statements in the corpus. The statements may come from different persons, from one person, or be notes on the behaviour of one or more persons. The starting point in any case is a series of short records which can be represented in written form.

First ensure that each record represents one and only one theme. Look carefully at statements with connectives such as 'and' and 'but' in them. Such statements may need to be broken down into simpler units. Enter the statements into the second column of a spreadsheet, one statement per cell. You may if you wish add any identifying information about each statement in the cell to its right.
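The check for compound statements can be partly automated before the records go into the spreadsheet. The following is a minimal Python sketch, not part of the method as originally described: it only flags statements containing 'and' or 'but' so that a person can decide whether to split them. The function name flag_compound and the sample statements are hypothetical.

```python
import re

# Connectives named in the text; extending this list is an assumption.
CONNECTIVES = re.compile(r"\b(and|but)\b", re.IGNORECASE)

def flag_compound(statements):
    """Return the statements that contain a connective and may hold two themes."""
    return [s for s in statements if CONNECTIVES.search(s)]

# Hypothetical sample records, for illustration only.
statements = [
    "The menu font is too small",
    "The icons are clear but the help text is confusing",
    "Printing is slow",
]

for s in flag_compound(statements):
    print("Review:", s)
```

A flagged statement is only a candidate for splitting. Whether it really carries two themes remains a judgement for the analyst.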

In the cell to the left of each statement (the first column), insert a numeric code corresponding to the theme of the statement. For instance, any mention of the legibility of the screen fonts may be coded as a 20. It is best to code these in tens (10, 20, 30 etc.) at first, which leaves room for finer codes later.

Once you have coded most of the items, sort the rows by the leftmost (code) column. Cells which are as yet uncategorised will appear together. You will then be able to update your categorisation scheme and add new levels of 'delicacy' to your analysis by using the units digit and, if necessary, decimal points. Thus legibility and colour may be coded as 22, legibility and size as 24, and so on. References to legibility pure and simple will still remain as 20s.
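For readers who prefer a script to a spreadsheet, the sort-and-refine step might look like the following Python sketch. It is an illustration under stated assumptions: the rows, the codes, and the helper names sort_by_code and broad_category are all hypothetical, and uncoded rows are placed at the end, which is only one of the ways a spreadsheet might order blank cells.

```python
# Each row mirrors the spreadsheet: (code, statement). None means "not yet coded".
rows = [
    (20, "The text is hard to read"),
    (None, "It crashed when I saved"),
    (22, "Grey text on blue is hard to read"),
    (10, "Menus are laid out logically"),
    (None, "I lost my work twice"),
    (24, "The font is too small"),
]

def sort_by_code(rows):
    """Sort by code, keeping all uncoded rows together at the end."""
    return sorted(rows, key=lambda r: (r[0] is None, r[0] if r[0] is not None else 0))

def broad_category(code):
    """Map a refined code such as 22 or 24.5 back to its tens-level category (20)."""
    return int(code // 10) * 10

sorted_rows = sort_by_code(rows)
for code, text in sorted_rows:
    print(code, text)
```

Because refined codes such as 22 and 24 share their tens digit with 20, they sort next to the broad category, and broad_category lets you collapse them again when a coarser summary is wanted.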

Iteratively refine and sort until you have categorised the items to your satisfaction. It is common to have a number of 'miscellaneous' items. At this stage you may add up the number of instances of each category, and rank order the record types by frequency of occurrence in the corpus. This enables you to make statements such as "the most frequent cause of complaint about this software is the legibility of the menu wordings."

In academic research, CA is usually also subjected to a process of verification. In such a process a second rater applies the categories the first rater generated to each of the statements 'blind', i.e. without knowing how the statements were categorised by rater one. A common criterion is that 80% agreement between the two raters shows the categories to be generally replicable. More stringent methods and criteria can be employed, but they are not usually applicable to usability testing.
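The 80% criterion can be checked with a simple percentage of matching codes. The Python sketch below assumes both raters coded the same statements in the same order; the function name percent_agreement and the two lists of codes are hypothetical.

```python
def percent_agreement(rater_one, rater_two):
    """Percentage of statements given the same code by both raters."""
    if len(rater_one) != len(rater_two) or not rater_one:
        raise ValueError("Both raters must code the same non-empty set of statements")
    matches = sum(a == b for a, b in zip(rater_one, rater_two))
    return 100.0 * matches / len(rater_one)

# Hypothetical codes assigned to ten statements by each rater.
rater_one = [20, 20, 22, 10, 30, 24, 20, 10, 30, 22]
rater_two = [20, 20, 22, 10, 30, 20, 20, 10, 30, 24]

agreement = percent_agreement(rater_one, rater_two)
print(f"Agreement: {agreement:.0f}%")
print("Meets 80% criterion:", agreement >= 80)
```

Simple percent agreement does not correct for agreement that would occur by chance, which is one reason stricter measures exist.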

More Information

Two classic texts on CA are:

Holsti, O.R. 1969. Content Analysis for the Social Sciences and the Humanities. Reading, MA: Addison-Wesley.

Krippendorff, K. 1980. Content Analysis: An Introduction to its Methodology. Beverly Hills, CA: Sage Publications.

See also the discussion in:

Kirakowski, J. and Corbett, M. 1990. Effective Methodology for the Study of Human-Computer Interaction. North Holland/Elsevier.

Computer programs to help with content analysis can be found at:

NUDIST (the industry leader)

Alternative Methods

Card sorting is a way of carrying out CA that emphasises the subjective nature of the sorting activity and keeps the evaluator 'at a distance' from making decisions about the data. Concept Walls and Affinity Diagrams are ways of eliciting the latent structure in the records being sorted, usually expressed as hierarchies.

Next Steps

CA is usually carried out as part of an analysis of a large corpus of data, for instance, audience reactions or user suggestions. It is not uncommon for 12 person-hours of work to be summarised in one small table in a report. The table is usually 2 columns by n rows, where in each row the first column is a brief summary of the category, and the second column is the percentage frequency of occurrence of that category in the corpus.
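The two-column table described above can be produced from the coded statements with a short tally. This Python sketch is illustrative only: the category labels, the list of codes, and the function name frequency_table are hypothetical.

```python
from collections import Counter

# Hypothetical category labels and one code per statement in the corpus.
labels = {10: "Menu layout", 20: "Legibility", 30: "Speed"}
codes = [20, 20, 10, 30, 20, 10, 20, 30]

def frequency_table(codes, labels):
    """Return (category summary, percentage) rows, most frequent first."""
    counts = Counter(codes)
    total = len(codes)
    return [(labels[code], 100.0 * n / total) for code, n in counts.most_common()]

table = frequency_table(codes, labels)
for category, pct in table:
    print(f"{category:<15}{pct:5.1f}%")
```

Each row pairs a brief category summary with its percentage of the corpus, ranked by frequency as the method suggests.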

Case studies

West, Mark D., ed. Theory, Method, and Practice in Computer Content Analysis. Westport, CT: Ablex, 2001.

Background Reading

You can find a page of resources on content analysis at:

©UsabilityNet 2006. Reproduction permitted provided the source is acknowledged.