Video and Analysis

It is not uncommon for a researcher using film in data collection to run into people concerned with the validity of the method.  Sometimes the concerns revolve around whether film and video are art or science.  Because of its interpretive, creative, impressionistic, and emotional attributes, art is sometimes assumed to be in direct conflict with an objective, value-free “science”—apparently creating an unavoidable conflict between the goals of film as art and user research as science. Consequently, people—academics and professionals alike—often assume limited possibilities for film.  The status of film as a serious analytical resource has remained fairly marginal.

Film is sometimes seen as a humanistic pastime, not significant scientific work. It is meant to appeal to the audience’s emotional pliability. Ultimately, the producer of the final visual document is seen as selectively building subjectively constituted data and constructing a piece that reflects his/her interpretation rather than “the facts”. However, the same can be said for any written document, particularly when behavioral research methods are applied to data collection for a specific task or client need. A logo-centric culture prevents researchers from benefiting from the full breadth of insight and information available, treating video as if it has less validity than the written word. Yet written reports often have pictures, and films often use written narratives, subtitles or intertitles. Films, too, always have accompanying written material. The reality is that while the film-focused researcher does indeed run the risk of compromising the complex realities of a particular behavior or series of behaviors, the risk is no greater than that of the researcher relying primarily on the written word.

Typically, film is accepted most openly when it is considered to fit the documentary archetype. This stems from the widely held belief that film is a mirror for the world. The argument is that the camera is a device for scientifically recording data about human behavior that is more objective than other types of information because of the mechanical nature of the collection device. While this may be true, it probably is not. However, given the context of the work (time limitations and constraints imposed by the nature of contractual research), the footage supplied by the camera may be as close as we can get to a check of objectivity. The reality of research purchased by a company is such that it assumes, even demands, a final product that is easily used, applies to a wide range of internal needs, and can be readily disseminated.

For some, manipulation of the footage (editing it into a film, altering, etc.) destroys its “scientific value.” The model is that teams go into the field to film material, the scientist studies the footage, and the filmmaker transforms it into art. In actuality, this fantasy is never realized. The footage is indeed dissected and analyzed by the researcher, and typically transformed into a product the client will readily consume, but by its very nature qualitative research always has a degree of subjectivity. In fact, any and all research, be it in the field and interpretive or in the laboratory and highly controlled, involves degrees of subjectivity and personal bias. This hardly invalidates the work or the means by which data are captured and displayed. Validity and reliability are not necessarily one and the same.

If researchers are supposed to make films intelligible to client audiences, they must learn what common sense, such as it is, dictates as constituting a good documentary film; that is, they should emulate the aesthetic conventions of documentary realism. Pieces of the puzzle are, of course, missing from any documentary film, but the most important themes and primary informational pieces remain for consumption by a wide range of viewers. The pieces selected for a final edit do indeed play to the emotions of the client, but without that emotional impact clients are likely to forgo the deeper issues entirely, unwilling or unable to sift through the informational tome so often presented by researchers. By communicating customer needs, reactions, behaviors, etc., film spurs viewers to delve deeper into the research findings and examine the totality of the research in greater detail. Film can be used to access a level of emotional response and personal identification or conflict which is difficult to reach within the lexical constraints of writing. Through a series of movements in a sequence, films can communicate in concrete and specific terms what in written words would be abstract expressions.

Another argument against video documentation as a primary means of disseminating findings is that because prior consent is always sought, there is always some degree of engagement by the participant with the camera and therefore the findings are inaccurate. However, the very fact that participants are recruited for any study by definition means that there is some degree of awareness and engagement.

Consequently, whether the awareness and engagement take place with the researcher exclusively or with the researcher and camera together, the authenticity of an activity, context, or behavior should not be in dispute.  After all, typically, the camera is soon forgotten, but the person asking questions and watching over the shoulder remains.

The Case of Cell Phones, Youth, and Japan

We applied video field data gathering and mini-documentary reporting during a recent study for a wireless communications provider who wanted to understand mobile phone usage in Japan.

A company had contracted with the consulting firm for whom I was working at the time in the hope of gaining a better understanding of how portable information devices (such as PDAs) and internet-ready cellular phones were used in the context of daily life. They were interested in uncovering what characteristics other than image quality, sound quality, and functionality were determining factors in the decision to purchase a PDA or cellular phone in urban centers of Japan, and why those “peripheral” issues were important. “Peripheral” is the term executives used to describe how they viewed the work: they were skeptical of the notion that culture impacts perceptions and uses of technology. So, while the team was assured of work, there was little guarantee that the findings would be implemented. In addition, the researchers were given half the time they had originally requested to conduct the research. Gaining the attention and interest of primary decision makers, in order to conduct further, more in-depth research at a later date, became almost as important as the findings. Without continued research, the researchers feared that the company would act without consideration of the needs and cultural patterns of the population.

The team was asked to identify some of the behavioral and cultural motivators in the purchasing decisions of young (16 to 30 years old) Japanese from middle-income homes. The research took place in several locations in Japan to provide a range of cultural practices. However, because the researchers (two ethnographers and one social psychologist) were out of touch with one another most of the time but needed at the end of the project to build a single, cohesive series of conclusions, they needed to capture the participant observation sessions on video for later shared analysis and review. Added to this was the fact that only two of the researchers spoke Japanese well enough to communicate effectively. The other had to rely on interpreters or the language skills of the informant. The researchers decided it was imperative to capture on video exactly what was said for later analysis and translation.

Because of time constraints and the limited language skills of the researchers, the goal of the research centered largely on material culture, display, and overt patterns of interaction. Consequently, activities, objects, spaces, and moments of interaction needed to be captured on video so that the researchers could return to the tapes later to catalogue patterns. Without the video footage, much of the information would have been overlooked or misinterpreted – video allowed the team to accurately assess their assumptions, catalogue use patterns and artifacts, and check for validity.

By returning to the video over a two-week period, the researchers were able to determine with some accuracy what designs were preferred and why, what levels of functionality were important, what was most significant in terms of brand and image, and what patterns of interaction were taking place.  It also allowed them to demonstrate what they did not know and thus get buy-in to conduct more extensive research.  The final video presented to the company ensured that business planners and designers would be sensitive to cultural aspects of products to be used in Japan.

Sampling: Why Individuals Don’t Matter

We spend a lot of time talking about samples when talking with our clients. Samples are constructed differently in ethnography than for focus groups or surveys. Ethnographers sample settings and interactions as much as individual people. The individual is rarely the unit of analysis. The sample is defined by the social interaction and the contexts in which activities occur. Asking a person if, say, they have specific impressions of a brand of beer will no doubt yield information. Unfortunately, it is nothing more than a first-hand account of the kind you get from a survey. Interacting with a group of people as they move from bar, to dinner, to party will yield significantly more information about brands and the contexts of selection and use.

All too frequently, the discussion of sampling devolves into a discussion of validity, with a portion of the clients pointing fingers and declaring the shortcomings of the methods in question because, quite frankly, they know precious little about statistics and the epistemological constructs around them. But reliability and validity are by no means symmetrical. It is possible to obtain perfect reliability with no validity, but perfect validity would assure perfect reliability because every observation would yield the complete and exact truth. This notion is what leads to an obsession with “typing” individuals and limits our ability to uncover new, meaningful insights.

Loosely speaking, “reliability” is the extent to which a measurement procedure yields the same answer however and whenever it is carried out; “validity” is the extent to which it gives the correct answer. As an example, imagine 100 people participate in a survey on grocery shopping to determine the optimal placement of goods on the shelves, thereby increasing how quickly people get in and out of a store. Regardless of the specific questions, the point is that the survey will produce statistically reliable data about individual units of analysis (or what I like to call “people”). The questions it does not address are what people really do when they shop and why they do it. Understanding THAT requires thinking about the sample in terms of context, not individuals.
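
For readers who like to see the distinction in numbers, here is a minimal sketch in Python. Everything in it is made up for illustration: the "true" time shoppers spend in the store, the survey figures, and the field figures are hypothetical, and the two helper functions are simply my own rough proxies (spread of repeated measurements for reliability, distance of the average from the truth for validity), not a formal statistical treatment.

```python
from statistics import mean, pstdev

def spread(measurements):
    """Rough reliability proxy: standard deviation of repeated measurements.
    A smaller spread means the procedure gives more consistent answers."""
    return pstdev(measurements)

def bias(measurements, true_value):
    """Rough validity proxy: how far the average measurement is from the truth.
    A bias near zero means the procedure is, on average, giving the right answer."""
    return mean(measurements) - true_value

# Hypothetical "truth": shoppers actually spend 30 minutes in the store.
TRUE_MINUTES = 30.0

# Hypothetical survey waves: respondents give the same answer every time,
# but consistently understate how long they shop.
survey_waves = [20.0, 20.0, 20.0, 20.0]

# Hypothetical field observations: noisier, but centered on what people really do.
field_sessions = [26.0, 34.0, 29.0, 31.0]

if __name__ == "__main__":
    for name, data in [("survey", survey_waves), ("field", field_sessions)]:
        print(name, "spread:", round(spread(data), 2),
              "bias:", round(bias(data, TRUE_MINUTES), 2))
```

In this toy case the survey has no spread at all (perfectly reliable) yet misses the truth by ten minutes, while the field sessions vary from visit to visit but average out to the right answer.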

The number of individual participants involved depends on the relevant diversity of the target population. A skilled ethnographer may use multiple methods in the recruiting process and not rely only on professional recruiters. This different approach to sampling also means that sampling is often built as part of fieldwork, and refined once a team is on the ground and collecting data. This can, and often does, scare a client absolutely shitless. After all, they are working from psychographic models and segmentation schemes, all the while worrying about which of the other stakeholders will call the work out as a way of currying favor or establishing greater power in the boardroom. But the fact is that while statistical work is valid in many respects, it is only one way of envisioning the world. When faced with the complexities of human interaction, these schemas break down. In practice, the power of an ethnographic process lies in uncovering unexpected patterns, not in reifying the segmentation work that has already been done. While an ethnographer will no doubt have specific sampling parameters from a client, they should also be able to articulate why sampling may change once the research begins. If they can’t, then you’ve wasted your money.

Ethnography, Usability and Field Testing

There are significant methodological and philosophical differences between ethnography and laboratory-based processes in the product development cycle.  These differences set users of these data collection methods at odds with one another. Frequently, these debates occur less within the user research community and more among the people using or responding to the findings and solutions presented. Whenever these arguments come up, the naysayers endlessly debate methodological purity, ownership and expertise. One side fears a lack of scientific rigor, and the other worries about a contextually detached environment yielding irrelevant results. Both sides make valid points, but the debate draws attention away from the fundamental question of product design: Does the product work in the broadest sense of the term?  Can the people for whom the product is designed use it in the correct contexts? To defuse the debate and get back to this primary question requires an approach that blends the rigor of laboratory-based processes with the contextual richness of ethnography.

In the iterative product design process, what typically shapes the design are findings from in-lab usability testing. However, while the data are reliable in a controlled situation, they may not be valid in a real-world context. It is possible to obtain perfect reliability with no validity when testing. But perfect validity would assure perfect reliability because every test observation would yield the complete truth. Unfortunately, perfection does not exist in the real world, so the reliable data recorded during laboratory testing must be supported with valid data, which are best found through field research.

Consider RCA’s release of the eBook in 2000. The product tested very well, but no one asked where, when and how people read. Consequently, the UI did not match users’ real-world needs. Had it been tested in context, the company might have avoided millions of dollars in losses.

To ensure validity, an anthropologist or ethnographer can spend time with potential users to understand how environment and culture shape what they do.  When these observations inform the design process, the result is product innovation and improved design.

At this point, however, the field expert is frequently removed, and the product moves forward with little cross-functional interaction. The UI designers and usability researchers take responsibility for ensuring that the product meets predetermined standards of usability. While scientific rigor is a noble goal, the history of science includes countless examples of hypothesis testing and discovery that would fail to satisfy modern rules of scientific method, including James Lind’s discovery of the cure for scurvy and Henri Becquerel’s discovery of radioactivity. Arguably, both scientists conducted bad science from the standpoint of sample size and environmental control, but that doesn’t negate the value to the millions of people who have benefited from these discoveries. Similarly, by allowing more testing in the field, we can gain insights about a product’s usability that might go undiscovered in a strictly controlled environment.

If we fail to account for the context in which the product will be used, we may overlook the real problem. A product may conform to every aspect of anthropometrics, ergonomics, and established principles of interface design. It may meet every requirement and have every feature potential users asked for. It may also have improved participants’ response time by a second or two in a lab study. But what if someone using the product is chest deep in mud while bullets fly overhead? Suddenly, something that was well designed and tested becomes useless because no one accounted for shaking hands, awkward positions, and the decrease in computational skills under physical and psychological stress. Admittedly, some conditions can be simulated in a lab. However, it would not be cost effective or ethical to create the heat, dirt, fear and general discomfort described in the example above. Furthermore, users in their natural environment have a reduced need to provide answers that would placate the researcher. Context, and how it impacts performance, is of supreme importance, and knowing the right question to ask and the right action to measure become central to accurately assessing usability.

So what should be done?  Designers should detach themselves from controlled environments and the belief, often held by people outside the user research and design departments, that the job is to yield the same sort of material that would be used in designing, say, the structural integrity of the Space Shuttle.  The reality is that most of what we design is more dependent on context than it is on being able to increase efficiency by one percent.

Consequently, for field usability to work, the first step is being honest about what we can do and being able to articulate this to the other groups within the business. A willingness and ability to adapt to new methodologies is one of the principal requirements for testing in the field, and is one of the primary considerations that should be taken into account when determining which team members should be directly involved. I point to a colleague who works at the Jet Propulsion Laboratory. While he is a brilliant engineer and designer, field testing is simply too uncomfortable for him, though he recognizes its value.

The process begins with identifying the various contexts in which a product or UI will be put to use.  This may involve taking the product into a participant’s home and having both the participant and other members of the social network use it with all the external stresses going on around them.  It may mean performing tasks as bullets fly overhead and sleep deprivation sets in.  The point is to define the settings where use will take place, catalog stresses and distractions, and then learn how these factors impact cognition and performance.

For example, if you’re testing an electronic reading device, such as the Kindle, it would make sense to test it on the subway or when people are lying in bed, because those are the situations in which most people read. Does the position in bed influence necessary lumens or button size? Do people physically shrink in on themselves when using public transportation, and how does this impact use?

A product or UI design’s usability evaluation is only relevant when taken outside the lab into the real-world context where it will be used.  Some of what occurs in the real world can be replicated in the lab, but in the end it is still a staged environment, devoid of the complexities of real contexts. Social interactions and cultural practices are often lost. Rather than separating exploratory and testing processes into two discrete activities that have minimal influence on each other, efforts can be maximized by employing a mixed field method that bridges the gap between ethnographic and laboratory approaches.  Innovation and great design will follow.

By Gavin