De Corporis Voce

In what was once CRM (Customer Relationship Management) and later evolved into CX (Customer Experience), one of the key issues has always been that of communication channels, that is, the ways in which a customer (I am using the term in the broadest possible sense) can get in touch with an institution or company, and vice versa, to get the answers they are looking for or to have their needs met.

The need to place virtually no limit on these channels has led, over time, to their progressive integration: from simply existing and being managed side by side to a full blend (often referred to in the literature as Channel Blending or, in some cases, Organic Channels). The underlying assumption is that when we communicate or, more specifically, manage a request that unfolds over time, we use different channels interchangeably, depending on the moment and its circumstances. We do so in an unplanned or unplannable way and sometimes even in parallel, for example talking to an operator on the phone while browsing their website to do something we need help with.

What I find interesting, and probably a source of further challenges, is that alongside this heterogeneity of communication channels, already difficult to manage in itself, there is another one, related to our own channels of communication: verbal, paraverbal and non-verbal. These are well known and well studied, and ideally they should be managed with the same care as the others.

Beyond Albert Mehrabian's famous rule (footnote [A]), which assigns different importance and expressiveness to these channels, what is important to note is that this second dimension of communication is orthogonal to the first: it can occur, with the appropriate characterizations, on practically every channel managed within the scope mentioned above. It is easy to think of these components during face-to-face (de visu) communication, for example when one goes to an office or counter to talk to someone in person. It is less obvious for channels where, at first glance, they would seem difficult to apply, if not meaningless. Yet these concerns are contradicted by various experiments, which have confirmed, for example, that body posture (a non-verbal component) influences the verbal and paraverbal components of what is said during a telephone conversation (1), or that the paraverbal component has its equivalent in a written message (for example, the use of upper and lower case letters, emoticons and abbreviations).

In essence, then, we have a communicative space with two dimensions: the first (technological) consists of the channels managed by the systems that govern this type of interaction; the second (semantic) is what we express on each of these channels. For communication to be effective, both dimensions must be managed: the first is needed to guarantee the customer maximum freedom of connection, the second to understand what is really being said, beyond the words used to say it.
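To make the two dimensions concrete, here is a minimal sketch of how an interaction could be located on both of them. The names (`Channel`, `Component`, `Interaction`, `missing_components`) and the list of channels are my own illustrative assumptions and do not come from any specific CX product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Channel(Enum):
    # First (technological) dimension: illustrative channel names only.
    IN_PERSON = "in_person"
    PHONE = "phone"
    EMAIL = "email"
    CHAT = "chat"


class Component(Enum):
    # Second (semantic) dimension: the three components named in the text.
    VERBAL = "verbal"
    PARAVERBAL = "paraverbal"
    NON_VERBAL = "non_verbal"


@dataclass
class Interaction:
    """One customer message, located on both dimensions."""
    channel: Channel
    signals: dict[Component, list[str]] = field(default_factory=dict)

    def missing_components(self) -> list[Component]:
        # Components for which nothing was captured on this channel.
        return [c for c in Component if not self.signals.get(c)]


# Hypothetical example: an email whose paraverbal cues are typographic.
email = Interaction(
    channel=Channel.EMAIL,
    signals={
        Component.VERBAL: ["Where is my refund?"],
        Component.PARAVERBAL: ["all upper case", "repeated question marks"],
    },
)
print([c.value for c in email.missing_components()])  # ['non_verbal']
```

Keeping the channel and the components as separate fields reflects the orthogonality discussed above: any channel can, in principle, carry any of the three components.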

The second dimension, the one concerning how we communicate, is much more important than the first, since it conveys the content of the communication rather than the medium through which it occurs. The medium is, however, partly indicative of the speaker's emotional state: we all know from experience that there are things, and tones, that we sometimes prefer to express verbally and sometimes in writing, sometimes through channels that demand an immediate response (a phone call) and sometimes through ones that do not (an email). In other words, even the first dimension carries indications of what happens in the second.

This second dimension, moreover, can in turn be broken down into two sub-dimensions. As confirmed by more than thirty years of studies by Paul Ekman, facial expressions are symptomatic of the onset of emotions, whether fleeting or lasting, while body movements represent (though not exclusively) the way we react to them. For example, the onset of contempt, one of the seven universal emotions, is signaled by our face (FACS Action Units U12A + U14A) and can then be followed by a closed, withdrawing posture, such as crossed arms, wandering eye contact and the body turned away from the source of the emotion.

We should therefore ask ourselves whether the next step for such systems is not to increase the number of managed channels, but rather to refine the way in which they are managed, investigating how technology can help us analyze all the components of human communication. The aim would be to automate the understanding of the paraverbal and non-verbal components, so as to give the listener as much information as possible about what the customer is actually asking or communicating.

This need is, in my view, fully justified by the definition of Customer Experience itself, which, with some variation from source to source, basically tells us that CX is "the overall experience that customers have throughout their relationship with the company", and we cannot talk about experience without talking about emotions.

Fortunately, alongside general studies on human-machine communication (1) (2), there is growing interest in studies specifically aimed at providing automatic support for the recognition of paraverbal elements (few studies) and non-verbal elements (considerably more). They rely on the automatic analysis of speech and video to capture what needs to be captured, in search of the deeper meaning of what is being said (references 4 to 11).

The greatest benefit in this sense is, of course, in face-to-face communication, especially when it takes place virtually, where these systems can analyze facial expressions in real time in search of the emotions experienced or repressed (it does not seem reasonable to analyze body movements, since these conversations usually frame only the face), enriching what is said with elements that allow it to be read correctly (11).

But even when the visual component is absent, for example during a telephone conversation or in a voice message, it would still be possible to analyze the paraverbal component (tone, rhythm, volume, speed, ...) and arrive at a similar enrichment of information. This would benefit whoever then has to follow up on what the customer has said or asked for, and who could thus read, with reasonable confidence, even between the lines, as is often said.
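As a rough illustration of what analyzing the paraverbal component could mean in practice, the sketch below computes three simple indicators from an audio signal and its transcript. The function names, the silence threshold and the choice of indicators are my own assumptions; real speech emotion recognition systems, such as those in the references, use far richer features and learned models.

```python
import math


def rms_volume(samples: list[float]) -> float:
    """Root-mean-square amplitude, a rough proxy for volume."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_rate(word_count: int, duration_seconds: float) -> float:
    """Words per minute, a rough proxy for speaking speed."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


def pause_ratio(samples: list[float], silence_threshold: float) -> float:
    """Share of samples below the threshold, a rough proxy for rhythm."""
    if not samples:
        return 0.0
    quiet = sum(1 for s in samples if abs(s) < silence_threshold)
    return quiet / len(samples)


# Hypothetical values: 30 words spoken in 12 seconds.
print(speech_rate(30, 12.0))  # 150.0
```

Indicators like these say nothing on their own about emotions; they would only become useful as inputs to a model trained to relate them to what the customer is actually communicating.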

In conclusion, technology is progressing quickly, and it seems reasonable to believe that some of its specific areas, first and foremost Machine Learning, can lead, and in part already have led, to the development of increasingly sophisticated predictive models able to capture what is conveyed by facial expressions, body movements and paraverbal elements. Such models can then be integrated into all those solutions that, in one way or another, must manage the interaction with the customer.

Andrea Zinno - De Corporis Voce

Footnotes

[A] - The rule, proposed by Albert Mehrabian in 1967, assigns to the non-verbal, paraverbal and verbal components 55%, 38% and 7%, respectively, of the role in understanding and interpreting what is being said. It is very often applied too generally, outside the context for which it was proposed, namely situations in which the listener is forming an opinion about the speaker.

Bibliographic references

1. Kasia Wezowski, Patryk Wezowski - “Without Saying a Word: Master the Science of Body Language and Maximize Your Success” - 2018
2. Frederic Landragin - “Man-Machine Dialogue: Design and Challenges” - 2013
3. Nikolaos Mavridis - “A review of verbal and non-verbal human-robot interactive communication” - 2015
4. Kaustubh Kulkarni et al. - “Automatic Recognition of Facial Displays of Unfelt Emotions” - 2017
5. Landowska, Brodny and Wrobel - “Limitations of Emotion Recognition from Facial Expressions in e-Learning Context” - 2017
6. Mehta, Faridul Haque Siddiqui, Javaid - “Facial Emotion Recognition: A Survey and Real-World User Experiences in Mixed Reality” - 2018
7. Byoung Chul Ko - “A Brief Review of Facial Emotion Recognition Based on Visual Information” - 2018
8. Social Media Week - “4 Emotion Detection API’s You Need to Try Out” - 2017
9. Bill Doerrfeld - “20+ Emotion Recognition APIs That Will Leave You Impressed, and Concerned” - 2015
10. Carnegie Mellon University - “Computer Reads Body Language” - 2017
11. Yuanyuan Zhang, Jun Du, Zirui Wang, Jianshu Zhang - “Attention Based Fully Convolutional Network for Speech Emotion Recognition” - 2018
12. Paul Ekman - “Emotions Revealed” - 2007

For a two-dimensional multi-channel communication