5.10 Advanced - The 'more' of video, or the 'multimodal'

In the core section we took a first look at the recording, pairing the participants' actions with the categories that are relevant to, and generated in, the setting. In this advanced section we move on to examining a more complicated video fragment, while also bringing in how gestures, objects, movement and the environment are involved in the visual organisation of the action.




What more, then, does video give us if we look at what else is happening in conjunction with the audible speech?

Firstly, it gives us the eyes of others: the eyes of the participants. It is not quite right to say 'eyes', because in examining the recording we want to concentrate on what those eyes are doing: gazing, glancing, staring, winking, widening, welling, frowning, twinkling and more. In producing these actions the eyes are of course not alone; they are seen as part of the gestalt of the face (as a whole expression of, for example, disgust, sympathy or recognition). Depending on our purposes we can pay less attention to those details, focusing, for instance, just on who a speaker alights on with their gaze, our purpose there being to examine how gaze selects a person as the next one to speak. Depending on how the camera (or cameras) was set up during recording, we may also be able to follow what happens next for the person selected by the speaker's gaze: whether they return and hold that gaze or, perhaps, cast their gaze down, thereby attempting to evade speaking next, or to indicate their reluctance to do so.

Secondly, video offers us access to participants' other gestures beyond the movement of their eyes. This may be as simple as fingers pointing at objects while making a request. Gestures can also accomplish actions in place of, or in support of, gaze: participants may direct the palms of their hands toward the current recipient of their talk, or use them to select the next speaker. While listening to a current speaker, a participant may nod or shake their head in relation to the various statements being made.

While we might begin to think we could study gestures by themselves, as if they were a sign language, gestures remain parasitic upon talk. For our studies this means we need to keep track of what is being said too. This will help us differentiate between the different things the same gesture achieves in different courses of action: for instance, a nod can be a listener showing agreement, or a speaker pointing something out by nodding toward it while speaking (an example of which we will see later).

Thirdly, video provides access to the accompanying objects, environment and movement in relation to one another. Just as ignoring the talk quickly makes many gestures unintelligible, so removing the objects and environment that go with them deprives us of their sense. Charles Goodwin's research provides two helpful examples of this:


Fig 1 - Gestures without objects



In figure 1 we cannot make sense of what the person is complaining about from either their speech or their gestures. It is only once the object is placed back into their hands, in figure 2, that we can understand the source of their troubles and the reason for their complaint.

Fig 2. Gestures with objects


(Figs 1 & 2 from Charles Goodwin (2002) Multi-Modal Gesture, paper presented at the First Congress of the International Society for Gesture Studies.)



In the second example, again borrowing from Goodwin's work, we see how quite what has happened is impossible to appreciate without the details of the space it is happening in. In the first image we see a rugby player on top of a ball, but it is only when we can also see the larger environment of the pitch that we can recognise what the player is doing.


Fig 3 Rugby player with ball



Fig 4 Rugby player, ball & pitch with markings
(Image courtesy of Craig Marren, Flickr)

These additional elements of the setting captured by video recordings are referred to by researchers with a background in linguistics as 'multimodal', or as instances of multimodality. The term helps underline how linguistic communication happens through more than the single 'mode' of communication that is audible speech.