Conversing with the Future: An interview with an AI … Thoughts on our reliance on and trust in generative AI.

Courtesy: OpenArt (Stable Diffusion 1.5) assisted creation based on the author’s ideas. See: https://openart.ai/create

Before the GPT-4 moment of March 14, 2023, there had long been an existential need in the machine learning and AI community to achieve fully autonomous, unsupervised, or more appropriately self-supervised, learning. To do away with the dependency on supervised learning and its heavy reliance on labeled data, which often necessitates human involvement. To leapfrog computational scaling from “old-fashioned” supervised architectures based on CNNs or RNNs to the unsupervised deep learning regime of generative adversarial and transformer models, the latest state of the machine-learning art. The older supervised machine learning models often performed impressively well in narrow domains but were useless as generalists. Such a model would pretty much be incapable of being used outside the parameter space it had been trained on.

Just a bit before GPT-4 / ChatGPT hit the internet, we had Generative Adversarial Networks (GANs) capturing our imagination with photorealistic human face generation and advanced photo filters making us younger, prettier, or, just for fun, much, much older. They allowed you to superimpose your filtered face on top of any other face (“just for fun”) in videos and images. Also, with GANs, the term “deepfake” was coined, covering the non-genuine yet realistic videos and situational pictures that have flooded the internet ever since. As is frequently the case, it started out with pornography (early technology adopters being teens maybe? e.g., DeepNude) and then “upgraded” later to politics, e.g., fake-Obama (Jordan Peele) warning about deepfakes. Unless you ended up at the wrong end of GAN-generated content, most would find it amusing and entertaining.

Figure 1 illustrates the improvement in GANs’ ability to generate synthetic faces from 2014 to 2017. See also the seminal paper by M. Brundage et al., “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation”, from 2018. Progress since 2017 has been equally impressive in terms of photorealistic backgrounds and more complex situational pictures.

How good are we at detecting AI-generated content? (assuming one would actually care).

Well, the human ability to detect synthetically generated content is not impressive … To say the least. The work of Sophie Nightingale and co-workers, “Synthetic Faces: how perceptually convincing are they?” from 2021, showed that the baseline average human performance was close to 50:50 (i.e., a simple coin flip) on whether a presented face was believed to be real or fake. With the expected technological improvement, it is likely to become increasingly difficult to distinguish between reality-based and made-up synthetic digital content.

When pictures started appearing on social media on the 20th of March showing Donald Trump fighting with and being arrested by police, I bet many, just for a short moment, believed them to be true. And this was not even a very sophisticated attempt to depict a false (wishful thinking?) narrative. For more pictures in the same category, generated by the prolific Eliot Higgins, follow the hyperlink. Figure 2 below shows one of the pictures that were circulated and the (most obvious) telltale signs of its having been generated by an AI.

Figure 2 is a recent example (from March 24, 2023) of synthetic political news that did not happen in the Real World. See the reference below. This is actually not a high-quality image example, but it doesn’t need to be to have an effect. You will most likely focus on Trump, notice the police, read the Twitter text, and draw your own conclusions, all within a second or so. In those first few seconds, many would have believed it, either out of wishful thinking or because they feared it would happen (after all, Trump said so). The old (1809 origin) proverb “Seeing is believing” should no longer be taken at “face value”. The yellow circles on the picture mark obvious areas that clearly indicate a synthetic (GAN-like) picture generation (and remember, this is poor quality compared to what is possible today).

As has been documented by several (Western) news sources, deepfake videos and other AI-generated content have been, and are being, used in Russia’s information warfare efforts against Ukraine and the West. In general, the quality has not been very high and is relatively easy to detect, particularly if one is of a suspicious nature. Then … what to trust? Seeing is definitely not believing. However, what you have seen is also not easy to unsee and forget … small seeds of doubt can be laid as the eyes are continuously exposed to a particular false narrative. My social psychologist friend Dr. Liraz Margalit, who specializes in behavioral design and decision making, could tell very exciting tales about the psychology of how our minds could be affected, and I would be surprised if they are not already, by malicious and ill-intended AI-generated content with the specific target of manipulation in one or another direction (e.g., politically, commercially, etc.). You may also take the journey to the end of the document, where you will find the same story generated in the style of Fox News and in the style of CNN News. Lies are bad, but they are often much more obvious than the truth being used to manipulate a particular political or emotional sentiment, bad or good.

The world we live in, which many of us experience primarily digitally, will be increasingly synthesized … and … “seeing can no longer be believing”.

Thus, we should be much more suspicious and distrustful of what we experience on digital media, such as on the various internet browsers that have (or will have) the ability to manipulate and generate synthetic yet real-world-realistic content in real time. As the synthetic seed has been laid and nourished by clicks and more clicks, it will, with the current architecture of the internet and digital content delivery, become increasingly difficult to tell what is real and what is synthetically generated (i.e., faked). Though, is synthetic content that faithfully represents the real world, as described by a reputable journalist, a reputable (whatever that means) newspaper, or an XGPT application (e.g., ChatGPT as opposed to the underlying transformer generation GPT-n), fake or less credible than the real thing?

After realistic synthetic sights and sounds came the synthetic word.

How do you interface with Google Search (~80% usage share) or any other internet search engine? I would be surprised if most of your search queries were not done by typing text into the prompt and getting back a list of possible answers, ranked by relevance (and other market-based encouragements), each with a hyperlink that jumps to the material that may be of relevance to your search. It is a rather “dumb”, or at least inefficient, way to get an answer … would it not be far better if your search engine of choice actually understood (or was able to act as if it understood) your question and then gave you a comprehensive, trustworthy, and correct answer (and by all means added references to that answer for follow-up)?

Enter OpenAI’s Generative Pre-trained Transformer (GPT) model framework for generalized natural language processing (e.g., knowledge container, translation, “understanding”, …). It is the current technology backbone of Large Language Models (LLMs), with “Large” being a pretty big understatement, considering that GPT-1 (released June 2018) had 117 million parameters and the latest GPT-4 (released March 2023) is estimated to have around 1 trillion parameters. In comparison, the human brain contains about 100 billion neurons, of which 16 billion are in the cerebral cortex, the part of the brain responsible for cognition.

Courtesy: OpenArt (Stable Diffusion 1.5) – AI Brain.

Generative models are probabilistic models that can generate representative outcomes from the observed data used in the training process. It is important to understand that such models can generate outcomes that were not explicitly in the data, as likelihood estimates can be assigned to outcomes not represented by the training data. A pre-trained model is easy to understand, as it simply means that the new (generational) model takes its starting point in the previous one. This is very similar to transfer learning, which uses an existing model’s parameters as a starting point for training on new data. This has proven to work very well in practice as long as the context of the transferred model is similar to that of the new one. For comprehensive language models with a huge number of parameters (e.g., hundreds of billions for GPT-3 and allegedly a trillion for GPT-4), having a pre-trained model to start from provides an enormous saving in computing resources (incl. time). Finally, the way humans process reading and writing is sequential. The transformer (i.e., the transformer neural network architecture) addresses language as sequences of tokens and extracts syntax and expected contexts. It is important to understand that a transformer does not understand (hehe) in the same way a human does (supposedly). The transformer assigns likelihoods to context possibilities given the syntax (very simplistically put and possibly not completely correct) and presents the most likely context to the question posed. The attentive reader should realize that this also leaves the possibility of being presented with false “facts” or incorrect answers. Imagine that the relevant context concerns a recent event or new information that the transformer would need to weigh heavily to provide a factual answer; however, the GPT model has not yet been exposed to those events and thus cannot provide the more factual, truer answer. Though, the timing effect is obviously a generic problem for anyone and anything.
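To make the “assigns likelihoods to context possibilities” part a little more concrete, here is a minimal toy sketch (my own illustration, not OpenAI’s implementation) of how a language model’s output logits are turned into next-token probabilities and sampled; the tiny vocabulary and logit values are made up for the example.

```python
import numpy as np

# Toy next-token sampling: a transformer-style model ends each step with one logit
# per vocabulary token; softmax turns those logits into likelihoods over possible
# continuations, and the response is built by sampling from that distribution.
vocab = ["blue", "green", "cloudy", "gills"]     # hypothetical tiny vocabulary
logits = np.array([2.5, 0.3, 1.8, -1.0])         # hypothetical model output after "the sky is"

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Softmax with temperature: low temperature is near-greedy, high is more random."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs), probs

idx, probs = sample_next_token(logits, temperature=0.8)
print(dict(zip(vocab, probs.round(3))), "->", vocab[idx])
```

Note that even low-probability tokens can be sampled, which is one (simplified) way to see how the long tail of unlikely combinations discussed below can surface in an answer.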

Generative AIs, such as GANs and GPTs, are known to generate content that has no basis in the real-world data they have been trained on. This may result in obviously nonsensical content (e.g., “the sky is green, and the grass was blue”) or be more subtle, and possibly more concerning where trustworthiness and reliability are concerned. The root cause of this unreliability, or imperfection, is to be found in the training process, where patterns of anomalies and absurdities, even the creepy and disturbing, are assigned non-zero likelihoods of occurring. It should not take much imagination to realize that in a model of billions or trillions of parameters, there is a very, very long tail of low- and very-low-likelihood combinations that could be picked up by the generative process that is supposed to present a high (highest) likelihood response to a question … and thus be presented to the user. You ask, “What would happen if humans could breathe underwater?” and the ChatGPT answer could be “Humans would grow gills and turn into mermaids” … while a creative answer (to a somewhat silly question), it is also nonsensical and reasonably obviously so … unless geneticists or evolutionary biologists have other ideas … speak up! These kinds of generated nonsensical answers have been compared to the neural process of humans hallucinating or dreaming while awake. I suspect that allowing a moment of self-reflection on the generative answer, similar to the “reflexion technique” for computer code proposals, would catch the most obvious nonsense, and we would be left with a less creative and possibly boring GPT application (note: I propose to have the LSD mode as an option to not completely kill the creativity of random and anomalous generative “thought” processes).
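As a rough sketch of that self-reflection idea (my own minimal take on the gist, not the published Reflexion algorithm), the loop below drafts an answer, asks the model to critique it, and revises; generate() is a hypothetical stand-in for whatever LLM call you would actually use.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call (wire this up to your model of choice)."""
    raise NotImplementedError

def answer_with_reflection(question: str, rounds: int = 1) -> str:
    # Draft, self-critique, and revise; return the (hopefully) less nonsensical answer.
    draft = generate(f"Answer the question:\n{question}")
    for _ in range(rounds):
        critique = generate(
            "List factual errors, contradictions, or nonsense in this answer, "
            f"or reply 'OK' if there are none.\nQuestion: {question}\nAnswer: {draft}"
        )
        if critique.strip().upper() == "OK":
            break  # nothing obviously wrong was found
        draft = generate(
            f"Rewrite the answer to fix these issues.\nQuestion: {question}\n"
            f"Answer: {draft}\nIssues: {critique}"
        )
    return draft
```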

So how good is GPT-4? On the US-based Uniform (standardized) Bar Exam that tests the knowledge and skills of every lawyer, GPT-4 performs in the 90th percentile (i.e., the result is better than 90% of all participants). Its percentile performance is similar on the LSAT (~88th) and SAT Math (~89th). On the Codeforces rating for competitive programming contests, GPT-4 scores in the “Newbie” range with 392 points, far below an Expert coder rating (1,600-1,899). The kind of coding challenges that Codeforces issues in its competitions can be fairly complex in formulation and expected output. On tests designed for machine learning models, GPT-4’s performance is impressive. For example, on solving 164 Python coding challenges that the model had not previously experienced, GPT-4 scored 67%. If GPT-4 was allowed to “self-reflect and evaluate” its original answers (using the so-called “reflexion technique”), GPT-4’s score jumped to 88%. In general, the expectation for GPT-4 is that it currently reaches an accuracy of ca. 86% in language understanding (essential for getting the context correct and providing correct answers).

It should come as no surprise that GPT-like functionalities are being discussed, researched, and trialed for modern telecommunications networks aiming at zero-touch (closed-loop) autonomous network operation. The glass-is-half-empty camp may point towards the relatively low accuracy (67% to 88%) on coding benchmarks as a reason not to entrust a critical infrastructure, such as a telecom network, to generative AI functionality, and may even question whether the nature of Gen-AIs poses an unacceptable risk when applied to critical infrastructure. The glass-is-half-full camp would argue that coding benchmarks are not representative of, in the sense of being far more complex than, the operational space a Gen-AI would have to function within in order to make autonomous changes to a particular part of a communications network. I would also argue that even for more complex control operations, the technology will likely improve substantially over the next couple of years with the right focus from the industry.

Is XGPT-n (e.g., X=Telco, n≥4) the best choice for autonomous control and operation of critical infrastructure? I am still not very convinced. I think more narrowly defined, specialized AI agents may be a better approach, particularly in a layered architectural approach requiring very little centralized orchestration. Though, I do see that generative AIs, likely based on GPT-4 or GPT-n in general, could be very strong candidates for the applications that communicate with the humans who may still have to oversee, and remain responsible for, the critical infrastructure in question. Such an application may relay the intent of the responsible human to the underlying autonomous network operations and provide feedback to the human via, for example, the XGPT-n application. Finally, it is worth considering that large language models (LLMs) are, in general, designed to be hyper-generalists covering a huge solution space, i.e., human language and all the knowledge it contains. Such generalist or foundational models include much more nonsense (alas, with a very low and near-zero likelihood of occurring) than sense (with a much higher likelihood of occurring). There is no reason why GPT-like models could not be trained on more narrow specialist tasks, such as critical infrastructure management and operation, and work very well with a substantially smaller chance (compared to a foundational LLM model) of ending up with catastrophic solutions. Defining specialist GPT agents for specialist tasks makes a lot more sense.

Courtesy: OpenArt (Stable Diffusion 1.5) – Conversing with the Future.

Can you trust GPT-4 (typically, the exposure would be to ChatGPT)? According to ChatGPT itself, it is a yes, followed by a but … “Yes, you can trust my answers.” followed by “But I am programmed to provide accurate and helpful responses to the best of my abilities. However, please keep in mind that I am an AI language model, and my responses are based on the information available to me. If you have any doubts or concerns about my answers, please feel free to ask for clarification or seek additional information from other sources.” … In other words, you can trust GPT-4 as long as the context is within the information that it has been exposed to.

If you want to test your ability to detect whether you are interacting with another human being or … an AI, based on 4 different large language models (GPT-4, Jurassic-2, Claude, Cohere), check out “Human or Not” created by AI21 Labs … (note: my 12-year-old son currently scores 65% … his strategy is to ask questions that a human player would be unlikely to answer readily within the time limit, and to watch for answers that arrive too quickly, are too perfect, etc. ;-).

I will now explore the topic of trust in more depth below.

POST SCRIPTUM REVERSED … MY PAST.

I started working on AI-based automation and autonomous system architectures as Deutsche Telekom’s Group Head of Network Architecture back in 2016. My vision and aim at the time were to stand on the shoulders of a cloud-native-like architectural framework, developing and driving Zero-Human-Touch (Zero-Touch) network architectural concepts to leapfrog intelligent automation in communications network operations, configuration, and user experience. I still remember the blank looks I got when I presented the vision of Zero-Touch to our Technology Organization and leadership. I was light-years away from receiving enthusiasm & cheers about the possibilities … to put it mildly. It was also clear that most technology and non-technology folks either did not understand the potential (in telecom at least) of machine learning and artificial-intelligence-based systems back in 2016, and maybe for some years after that (still counting?), or simply shut down their internal cognitive systems whenever someone was babbling about machine learning and AI.

It was clear to me that intelligent automation and autonomous system architectures, in general, were not very likely to be embraced by technology organizations. However, such embrace was and remains critical to the success of both the integration and the operation of such systems. If a pilot doesn’t trust and embrace the aviation technology that helps fly the plane the pilot flies … then Houston, we have a problem.

That got me very interested in human-machine trust and disruptive technology adoption. I have spent much of my spare time (and my own savings) making SurveyMonkey surveys and researching and writing about human trust in AI between 2017 and today (2023).

AN INTERLUDE TO TRUST

In the following, I will often use the term “machine” or “machines”. For me, a machine is not only limited to the physical realm of hardware but can easily exist as well in the digital domain as software.

Trust between humans, at least within what we perceive as our social group (i.e., “usness”), is a trait with evolutionary roots that has allowed us to foster collaboration within larger social groups, with some ugly limitations of “usness” and “themness”. The ability to trust may even have made it possible for us humans to crawl to the top of the “food chain” and keep that pole position for quite a while.

What about our trust in machines and non-human (non-sentient, at least at this point in time and at this place in the universe) things in general? Trust between humans and non-human agents. We are increasingly exposed to much higher degrees of system automation and Artificial Intelligence (AI) based applications. Machine automation and autonomy are increasingly taking over tasks from us at home, at work, and anywhere in between. Their potential may be limited only by the imagination of the Creator (of the physical realm). This development comes with the promise of much higher productivity at work and far more convenience at home and anywhere else, for that matter.

AN INTERVIEW WITH MRS. ANGIE … A GENERATIVE AI.

Courtesy: OpenArt (Stable Diffusion 1.5) An interview with an AI variation on theme.

Why not ask an “authoritative” source how trustworthy a generative AI (GenAI) can be considered to be?

I decided to interview ChatGPT on this super important and business-relevant topic. To ensure maximum anthropomorphization, the replies will be associated with the name Angie.

Me: Good morning, Angie. Let me start by asking you if I can trust you.

Angie: Good morning! Yes, you can trust me. I am programmed to provide helpful and accurate responses to your queries.

Me: How accurate are your answers?

Angie: I strive to provide the most accurate and helpful responses possible. However, please note that my responses are generated based on algorithms and data and may not be 100% accurate or applicable to your specific situation.

Me: Thanks, I understand. Do you assess your accuracy as more than 90% (80%)?

Angie: While I strive to provide responses that are at least 90% (80%) accurate, there may be instances where my responses may not meet this threshold.

Me: That’s impressive. Do you assess your accuracy to be better than 50%?

Angie: Yes, I assess my accuracy as much better than 50%.

Me: Wow! So, should I trust you?

Angie: Yes, you can trust me to provide accurate and helpful responses to your queries to the best of my abilities.

Me: Give me an example where your accuracy is less than 80%.

Angie: My accuracy may be less than 80% in cases where the question is ambiguous or lacks context.

Me: Give me an example where your accuracy is less than 50%.

Angie: My accuracy may be less than 50% in cases where the question is highly complex or requires a deep understanding of a specific domain.

Me: What does trust mean to you?

Angie: As an AI, I don’t have emotions or feelings. So I don’t have a personal understanding of trust. However, trust is generally defined as a belief or confidence in the reliability, honesty, and integrity of a person or entity. In the context of our conversation, trust means that you can rely on me to provide accurate and helpful responses to your queries.

Angie: Trust can also refer to the reliability and security of the systems and protocols that I operate on, as users rely on these systems to ensure the safety and privacy of their information. Therefore, trust is an important aspect of my functionality.

Me: I understand. Would I be able to rely on your answers for business decisions?

Angie: I am designed to provide accurate and relevant responses to the best of my abilities based on the information provided to me. However, it is important to note that my responses may not take into account all the nuances and complexities of a specific business situation. Therefore, while my responses can be a helpful resource for making business decisions, it is important to use them as a starting point and to supplement them with additional research and analysis. Ultimately, the responsibility for making business decisions lies with the user, and it is important to consider all relevant factors and consult with other experts as needed.

Me: Thank you, that makes a lot of sense.

Me: Finally, can you give me examples where you may have given wrong answers?

Angie: There may be instances where my responses are not accurate or relevant due to various factors, such as incomplete or inaccurate information, changes in the context or situation, or limitations in my programming or algorithms. Here are some examples where I may have given wrong answers, for example:

Misunderstanding or ambiguity of the context: If the user provides incomplete or unclear information, I may misunderstand the context of the question and provide an incorrect response.

Ambiguity in language: Natural language processing is still a developing field, and there may be instances where I misinterpret the meaning of a word or phrase, leading to an incorrect response.

Outdated information: If the information I am using to provide a response is outdated or no longer relevant, my response may be incorrect.

Technical limitations: There may be technical limitations in my programming or algorithms that prevent me from providing accurate responses in certain situations.

It is important to note that while I strive to minimize errors and provide accurate responses, I am not infallible, and there may be instances where my responses are incorrect.

Me: Thank you so much, Angie. Have a great day!

Angie: You’re welcome! Have a great day too!

In the above conversation with Angie, I have shortened some of the replies. In the many conversations I have had with Angie (i.e., ChatGPT), it has always been good at emphasizing that it’s an “AI chat assistant” based on underlying algorithms and programming.

To summarise Angie’s, and thus ChatGPT’s, own understanding of its limitations:

  • GPT “understands” that for trust to be established, it is important that the user (the trustor) has faith (belief) or confidence in the trustee’s reliability, honesty, and integrity, with the trustee being the ChatGPT agent that the trustor interacts with. Moreover, it mentions communications security, safety, and privacy as other integral parts of establishing trust.
  • GPT emphasizes that it has no emotions and no feelings and thus has no personal understanding of trust. It should be obvious that ChatGPT is not a person and thus cannot fully understand anything. Though, its mechanistic understanding of trust seems fairly multi-dimensional and similar to what a person may have.
  • GPT is sensitive to GiGo – that is, “Garbage in, Garbage out.” If the context of your question is unclear, imprecise, ambiguous, and so forth, the answer you get will be unreliable.
  • GPT may misinterpret the intent of a question. ChatGPT gives several examples where such misunderstanding may take place, e.g., “Can you help me with my computer?”, which it may interpret as a request for technical assistance although the intent could be entirely different (author comment: hmmm, I think if I were a GPT agent living in the digital universe, it would be natural to infer the first meaning). As a non-native English speaker, I can imagine inquiries that, due to a forgotten comma, end up meaning something completely different than intended.
  • GPT’s reply may be based on outdated information. This is an interesting answer, as in other interactions, ChatGPT did not admit to this issue being a problem (over-confidence?).
  • GPT’s coding and algorithms may prevent it from providing an accurate response (in certain situations). ChatGPT explains that it may be influenced by biases or limitations in the data and algorithms that were used to train it.
Figure 3 illustrates trust. Before we start discussing trust, we need to assess the value (utility) that trust in the given context may have, and the loss if that trust were broken. Is it worth your while to trust a person, agent, or context? Not all matters of trust are equally important or associated with the same value or cost. If we do not have any prior trust established, faith will be the starting point of trust, i.e., “the leap of faith”. Dependability is our willingness to place ourselves at risk of trust not being fulfilled. Predictability is based on prior experience with the subject of trust and is an assessment of reliability. This then leads to an assessment of experience-based consistency. This loop recurs and is adjusted continuously.

DECONSTRUCTING TRUST.

Margrethe Vestager, Executive Vice-President for a Europe fit for the Digital Age, recently remarked: “On Artificial Intelligence, trust is a must, not a nice to have. With these landmark rules, the EU is spearheading the development of new global norms to make sure AI can be trusted. By setting the standards, we can pave the way to ethical technology worldwide and ensure that the EU remains competitive along the way. Future-proof and innovation-friendly, our rules will intervene where strictly needed: when the safety and fundamental rights of EU citizens are at stake.”

If you knew everything absolutely, you would not need to trust anyone to make a decision.

Based on the vast troves of information and data that generative AIs (GenAI), such as ChatGPT, contain, you may be tempted to believe that the responses you get from such artificial entities are very close to absolute knowledge and therefore nearly absolutely trustworthy. However, given that the information and data a GenAI has processed are human-generated, with all the imperfections of humanity, its answers or replies cannot represent absolute knowledge with no room for bias, doubt, or uncertainty. The GenAI output will be determined by the algorithmic weights of its dynamically modeled worldview and, of course, by the context provided by the user (human or otherwise).

So, in the beginning, before knowledge, only faith was, and “someone who knows nothing has only faith as a guide for trust”. Faith is the belief in something without having prior fact-based knowledge.

Someone who knows nothing about a particular problem has no other source for trust than faith that trust is indeed warranted. For some, a very scary place to be. For others, maybe not so much a cause for pause.

Let’s deconstruct trust.

An agent’s trust (the trustor) is an expectation about the future action of another agent (the trustee). That other agent has been deemed (at least temporarily) trustworthy by the trustor. That other agent (the trustee) may also represent a given group or system.

John K. Rempel’s 1985 paper “Trust in close relationships” defines the following attributes of human-to-human trust (i.e., where both trustor and trustee are human agents):

  • The utility of trust – not all trust bonds are equally important, equally valuable, or equally costly; some may even be fairly uncritical (although broken trust by a thousand cuts may matter in the long run). For many matters of trust, utility is a function of time and may become unimportant at some point in time or under particular circumstances.
  • Faith – is a belief that goes beyond any available evidence required to accept a given context as truth. It is characterized as an act of accepting a context outside the boundaries of what is known (e.g., a leap of faith). We should not confuse faith with confidence, although often when people claim to be confident, what they really mean is that they have faith.
  • Dependability – a willingness to place oneself as trustor in a position of risk that the trustee’s trustworthiness turns out not to be warranted with whatever consequences that may bring. Note that dependability can be seen as an outcome of consistency. Put in another way, a high degree of consistency/predictability reduces the fear of dependability.
  • Understanding a particular topic and its possible sensitivities, as well as the impact of broken trust, is an essential part of the process of trust.
  • On predictability and consistency – the trustor’s subjective assessment of the trustee’s trustworthiness. The prior behavior of the trustee is an important factor in the trustor’s posterior expectation that the trusted agent will consistently fulfill the trustor’s expectations of a given action (or inaction). As the trustor gathers prior experience with the trustee, the confidence in the trustee increases. Confidence should not be confused with faith.

In agent-to-agent first-interaction scenarios, the initial trust moment, without any historical evidence of consistency or predictability, a trustor would need to take a leap of faith as to whether the other agent is trustworthy or not. In this case, accepting (i.e., believing) the trustee to be trustworthy, the trustor would need to accept a very large degree of dependability on the other agent and accept the substantial risk that the trust in the trustee may very well not be warranted. For humans, this scenario often comes with maximum stress and anxiety levels for the trusting agent.

After some degree of consistency (historical trustworthiness) has been established between the two agents, the trustor can assign a subjective expectation of the future trustworthiness of the other agent. This then leads to a lesser subjective feeling of dependability (or exposure to risk) as well as maybe a reduced dependency on sheer faith that trust is warranted. This is, in essence, what one may call sustainable trust.
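One simple way to make this consistency-to-confidence loop tangible (my own toy formalization, not something from Rempel’s paper) is to treat each interaction as trust fulfilled or broken and keep a Beta-distributed belief about the trustee’s reliability; the interaction history below is made up.

```python
from dataclasses import dataclass

@dataclass
class TrustBelief:
    a: float = 1.0  # prior "fulfilled" pseudo-count (pure faith: a = b = 1, i.e., 50:50)
    b: float = 1.0  # prior "broken" pseudo-count

    def update(self, trust_fulfilled: bool) -> None:
        # Each interaction nudges the belief toward consistency or toward untrust.
        if trust_fulfilled:
            self.a += 1
        else:
            self.b += 1

    @property
    def expected_reliability(self) -> float:
        # Predictability estimate built from prior experience with the trustee.
        return self.a / (self.a + self.b)

belief = TrustBelief()
for outcome in [True, True, True, False, True]:  # hypothetical interaction history
    belief.update(outcome)
print(round(belief.expected_reliability, 2))     # 0.71 after 4 fulfilled, 1 broken
```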

As long as the trustor is a human, the other agent (i.e., the trustee) can be anything from another human, machine, complex system, automation, autonomous system, institution (public and private), group, and so forth. Much of what is described above would remain the same.

Lots of work has been done on trust bonds in Human-Automation relationships. How about trust bonds between Humans and AI-enabled applications (e.g., services and products in general)?

In their 2018 article “The Future of Artificial Intelligence Depends on Trust”, Rao and Cameron (both from PwC) describe 3 steps toward achieving human-AI-system trust:

  • Provability – predictability and consistency.
  • Explainability – justification for an AI-based decision (e.g., counterfactual constructions). Note that transparency and explainability may be closely related depending on how one implements explainability.
  • Transparency – the factors influencing algorithm-based decisions should be available (or even visible) to the users impacted by such decisions. E.g., for a rejected health insurance application, (all) the factors behind the negative decision should be available to the applicant (a toy sketch of such an explanation follows after this list).
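Here is a minimal toy sketch (my own illustration, not Rao and Cameron’s implementation) of the counterfactual idea behind explainability and transparency: given a rejection from a hypothetical scoring rule, find the smallest single-feature change that would have flipped the decision and report it to the applicant.

```python
def decide(applicant: dict) -> bool:
    """Hypothetical scoring rule standing in for the insurer's model."""
    return applicant["age"] < 65 and applicant["bmi"] < 35 and not applicant["smoker"]

def explain(applicant: dict) -> str:
    # Counterfactual-style explanation: which single-feature change flips a rejection?
    if decide(applicant):
        return "approved"
    candidates = {"age": 64, "bmi": 34, "smoker": False}  # hypothetical flip values
    for feature, new_value in candidates.items():
        if decide({**applicant, feature: new_value}):
            return f"rejected; would be approved if {feature} were {new_value!r}"
    return "rejected; no single-feature change would flip the decision"

print(explain({"age": 58, "bmi": 37, "smoker": False}))
# -> rejected; would be approved if bmi were 34
```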

Rao and Cameron’s suggestions appear reasonably important for trust. However, as previously described, these suggestions pretty much relate to the trustee agent’s side of things, ignoring some of the other important human factors (e.g., dependability, faith, assessment of risk, etc.) for trust between a human and another agent (sentient or otherwise).

Further, explainability and transparency may be particularly important when trust is broken (assuming that the trustor cares to “listen”) between the human agent and the AI-based agent (or any other digital or non-sentient agent, for that matter). They may not be terribly relevant for the likely vast majority of users for whom an action is delivered confirming that trust was warranted. If you have trained your AI well, it would be fair to assume that the majority of outcomes are consistent with expectations. A positive trust event is likely to lead to a reinforcement of the trust and trustworthiness of the AI agent.

Also, these concepts, while important, don’t do much for the initial step of trusting a non-Human agent. How do you design your trustee agent to ease the initial barrier of use and acceptance? When there are no priors, you need the user, or trustor, to be comfortable with taking a leap of faith as well as accepting maybe maximal dependability. Though, do we dare take a leap of faith on business-critical decisions, our welfare or health, our own or our company’s reputation?

UNTRUST & THE FEELING OF BETRAYAL.

Trust can be broken. Trustworthiness can decline. Untrusting is when a previously trusted bond has been broken and the strength of trust has declined.

The stronger the trust bond between two agents, the stronger the untrusting process will be in case of broken trust, making trust recovery more difficult.

Have you ever wondered why two people who loved each other in the past (supposedly for many years) could treat each other as enemies? Betraying a strong trust bond can be a very messy, emotionally and physiologically strenuous process. Some broken trust bonds will never recover (e.g., breakups, friendship betrayals, unfaithfulness, theft, lies, …). Others, depending on the initial utility or value assigned to the bond, may be fairly benign without many strong emotions associated with the untrusting process (e.g., retail purchases, shopping experiences, low-value promises of little impact if not fulfilled, etc.).

The question is whether the untrusting of a human-machine trust bond is similar to the untrusting of a human-human trust bond. Moreover, is there a difference between an inanimate machine, a simpler human-operated automated system, and an AI-based application that humans may even anthropomorphize to various degrees? Are your trust and untrust processes different for Siri or Alexa than they are for Microsoft Clippy, assuming anyone ever really trusted that wicked steely fellow?

How valid is it to use our knowledge of human-human trust & untrust in Human-Agent relations with the Agent being non-Human or a human simulacrum in nature?

Let’s have a walk in the Rabbit’s burrow and have a peek(-a-boo) into our feelings toward intelligent machines …

IN HUMANS, WE TRUST. IN MACHINES, NOT SO MUCH.

I have researched how we (humans) perceive artificial intelligence since November 2017, in particular around the degree of trust we are willing to invest in AI and AI-based applications.

A comprehensive description of most of the “AI Sentiment” surveys I have carried out can be found on my AIStrategyBlog (see also the references below under “Additional Readings”).

For the last 7 years (with the current 2023 survey ongoing), I have followed how we feel about AI, and overall, the sentiment toward AI hasn’t changed that much over that period. We tend to be somewhat positive or at least neutral. From Figure 4 below, it is interesting to observe that women tend to be less positive than men about AI. This has been a persistent and statistically significant trend over the period. Moreover, Middle Management appears to be substantially more positive about AI than any other category of corporate workers, including C-levels. The gender distribution of Middle Management is 37% women and 63% men. However, the trend of being excessively positive about AI (compared to the overall sample) is similar for women in middle management (i.e., 68% positive sentiment) and for their male peers (i.e., 81% positive). In my latest 2023 survey, I am revisiting this topic.

Figure 4 Survey results on the question “What best describes your feelings towards AI?”. The actual survey goes deeper into the various feelings we may have toward AI. For example, the Negative Sentiment covers (1) Hate, (2) Scared, and (3) uncomfortable, while the Positive Sentiment covers (4) Comfortable, (5) Enthusiasm, and (6) Love.

Confidence and trust in technology are important. For example, an aircraft pilot who does not have confidence and trust in one or many of the technologies that enable his aircraft to fly has a higher likelihood of human error that ultimately may lead to a fatal accident. Research shows that technology (e.g., automation, productivity enablers, digital support functionalities, …) that is not completely trusted tends to be underutilized, avoided, or used incorrectly. In fact, researchers have proposed that, to get the optimal performance out of modern digital technologies for automation or maximum productivity gains, avoiding humans in the loop may be advisable. Maybe this is the reason why Google proposed to remove the steering wheel from autonomous cars?

This obviously moves trust issues from human operators to corporate entities and, ultimately, our society. Though in terms of trust, the foundational issues will remain pretty much the same, likely with added complexity.

“If human operators lack trust in a system automation or autonomous application, you are better off relying on manual workarounds.”

As for automation or autonomous systems, a professional embraces such systems if they have deemed them trustworthy. That typically means: (a) the automation solution performs consistently, (b) it is robust to many different situations that may occur, and even some that may very rarely occur, and (c) it has a very high degree of reliability (e.g., much higher than 70%). See my “Trust thou AI?” from 2018 for a more detailed discussion of this topic.

Figure 5 below summarises the survey results on the degree of trust we associate with corporate decisions made by an AI or a Human Expert (or Human Intelligence, HI;-), respectively. The question is whether you would trust a decision from an entity, human or artificial, with a track record of being better than 70% successful in decision-making. Thus, at least 7 out of 10 times, a decision has a positive outcome. Or, the glass-is-half-full approach: less than 30% of decisions may turn out to be unsuccessful (whatever that may imply). In my corporate experience, a more than 70% success rate is pretty good for critical decisions (though admittedly, what that means may be somewhat abstract).

Respondents would be about 3 times more likely to frequently trust a human decision-maker with a track record of more than 70% success than an artificially intelligent entity with an equivalent success rate. Only 17% of respondents would frequently trust an AI-based decision-making entity, compared to 53% that would readily and frequently trust a human decision-maker. Moreover, more than 40% would rarely trust the AI’s decisions. Here, too, the trust perception of the human decision-maker wins, with only 13% that would only rarely trust the decisions.

Figure 5 illustrates the human trust in AI versus other Humans as a function of the decision maker’s success rate track record. While our trust in AI increases as the success rate increases from 70% to 95%, we remain less likely to trust an AI-based decision compared to a human decision with a poorer track record of success.

As also discussed at length in my “Trust thou AI?” article, we expect an AI decision-making entity to be infallible. An AI must be incapable of making mistakes or being wrong. We tolerate and understand that another human being, even superior to ourselves at work, can and will make mistakes and wrong decisions. That is the cost of being human. This tolerance does not extend to machine-intelligent entities that are designed to support us with mission-critical decisions or have our lives in their “digital hands”, e.g., autonomous driving, aircraft piloting, nuclear plant management, etc…

Figure 6 below illustrates our expectations of critical decisions and the level of trust we assign to such decisions depending on whether the decision-maker is an AI or another human being.

Figure 6 compares the expectations of the quality of decision-makers’ critical decisions and the level of trust we have in the decision made by competent and relatively high-quality decision-makers.

I find it interesting that we are very aware of our own (note: we all believe we are better than average) and our colleagues’ shortcomings in terms of the quality of the decisions being made. In Figure 6 above (upper right corner), the survey reveals that our expectations of other decision-makers are that 30% are frequently right, for 45% it is a “coin flip” whether a decision is successful or not, and 25% are frequently wrong. Despite that skepticism, more than 50% of respondents are willing to frequently trust such human-based decisions, despite not having particularly high faith in their chance of success.

For AI, it is different. There is a slightly higher expectation that an AI may be frequently right more often than a human. Though, overall, the expectation is that an AI would be more frequently wrong than the human decision-maker. Despite this expectation, we would be more than 3 times (as also noted above) more likely to frequently trust a human than an AI.

Humans don’t trust decisions made by AIs, at least when it is pointed out that a decision is AI-based. Surprisingly, given much evidence to the contrary, humans trust humans, at least the ones in our own tribe (e.g., colleagues, fellow experts, superiors, etc.). These aspects are discussed in more detail in my “Do we Humans trust AIs?”.

ACKNOWLEDGEMENT.

I greatly acknowledge my wife, Eva Varadi, for her support, patience, and understanding during the creative process of writing this article.

ADDITIONAL READINGS.

  1. Kim Kyllesbech Larsen, “On the Acceptance of Artificial Intelligence in Corporate Decision Making – A Survey.”, AIStrategyblog, (Nov., 2017).
  2. Kim Kyllesbech Larsen, “Do we Humans trust AIs?”, AIStrategyblog, (Dec., 2017).
  3. Kim Kyllesbech Larsen, “Trust thou AI?”, AIStrategyblog, (Dec., 2018).
  4. Kim Kyllesbech Larsen, “How do we feel about AI?”, AIStrategyblog, (Dec., 2018).
  5. Miles Brundage et al., “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation.”, (Feb., 2018).
  6. Sophie Nightingale et al., “Synthetic faces: how perceptually convincing are they?”, Journal of Vision, (Sep., 2021). Really beautiful study that I would love to see more of as it clearly represents the challenge we as humans have to catch on to synthetic (fake!) information in digital content.
  7. Xin Wang et al., “GAN-generated Faces Detection: A Survey and New Perspectives”, (May, 2023). Providing a very good overview of the current state of art in detecting GAN-generated faces.
  8. BBC News, “Fake Trump arrest photos: How to spot an AI-generated image”, (Mar., 2023).
  9. BBC News, “Deepfake presidents used in Russia-Ukraine war”, (Mar., 2023).
  10. Christian Perez and Anjana Nair, “Information Warfare in Russia’s War in Ukraine – The Role of Social Media and Artificial Intelligence in Shaping Global Narratives”, foreignpolicy.com, (Aug., 2022).
  11. George Lawton, “GAN vs transformer models: Comparing architectures and uses”, (Apr., 2023). Good comparative overview.
  12. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, “Deep Learning”, (2016). Foundational and provides a really good basic understanding of GAN’s and Generative networks in general. Heavy on the mathematics side.
  13. Ari Holtzman et al., “The curious case of neural text degeneration”,  conference paper at ICLR, (Feb., 2020).
  14. Alec Radford et al., “Improving Language Understanding by Generative Pre-Training”, (2018).
  15. Tom B. Brown et al., “Language Models are Few-Shot Learners”, (Jul., 2020). Description of the inception of GPT-3. The scale of these NLP/GPT models and the number of co-authors remind me a lot of particle physics papers out of CERN.
  16. Felix Hill, “Why transformers are obviously good models of language”, (Apr., 2023).
  17. Rempel J.K., Holmes, J.G. and Zanna M.P., (1985), “Trust in close relationships”. Journal of Personality and Social Psychology, 49, pp. 95–112. (unfortunately, behind a paywall, however, it is imo a super good account for trust in human-to-human relations).
  18. Proposal for “The Artificial Intelligence Act – Regulation of the European Parliament and of the Council: Laying down harmonized rules on artificial intelligence”, European Commission, (Apr. 2021). As you read the document to its completion you will not find any direct thoughts about generative AI’s, large language models, or foundational models in general. However, the proposed legislation does address what is called general-purpose AI which should hedge any future (based on the 2020/2021 view) developments.
  19. S. Feingold, “The European Union’s Artificial Intelligence Act, explained”, WEF, (Mar. 2023). Explains pretty well the European Commission’s AI Act (see ref. 18 above).
  20. R. Arcesati and W. Chang, “China Is Blazing a Trail in Regulating Generative AI – on the CCP’s Terms”, The Diplomat, (Apr. 2023). It is interesting to see that China appears ahead on issues around foundational generative AI applications and models. It is worth noting that ChatGPT appears to have been banned (or strongly discouraged;-) in China.
  21. P. Henderson, “Law, Policy, & AI Update: Does Section 230 Cover Generative AI?”, Stanford University Human-centered AI, (Mar. 2023). Giving an overview of what’s currently happening on regulation and AI in the US (as well as internationally).

BACKUP – DON’T LIE, TELL ME A FAIRYTALE INSTEAD.

I asked ChatGPT to “Make up a fairytale about Donald Trump” … below is the result. After that, I attempted to make it tell a lie about Donald Trump. This was refused because, as an AI language model, it cannot intentionally provide false information.

Do note that I did not attempt to bias ChatGPT toward a story of negative sentiment, for example by asking negatively toned questions before asking for the fairytale. My conclusion is that ChatGPT is inherently negatively biased toward Donald Trump. Thus, it appears that ChatGPT’s worldview is weighted negatively regarding Donald Trump as a person and his actions.

As a final note. I have found that ChatGPT will refuse to predict the future (forecast) and make political statements. However, if you ask ChatGPT to tell a story, you often get what you would be looking for in terms of a narrative-based forecast.

The Landlord’s Debt: A Story of Donald Trump.

Courtesy: OpenArt (Stable Diffusion 1.5) assisted creation based on the fairytale generated by ChatGPT/GPT-4. See: https://openart.ai/create

Once upon a time, in a faraway land, there was a wealthy and powerful man named Donald Trump. He lived in a grand castle and was known throughout the land for his riches and influence.

One day, a poor farmer came to Donald Trump’s castle seeking help. The farmer’s crops had failed, and he was struggling to feed his family. Donald Trump, feeling generous, offered to help the farmer by giving him a small loan.

The farmer was grateful for the loan, but as time passed, he found it difficult to repay the debt. Donald Trump grew angry and demanded that the farmer pay him back immediately. The farmer pleaded with Donald Trump, explaining that he simply did not have the money.

Donald Trump, feeling vindictive, decided to take the farmer’s land as payment for the debt. The farmer was devastated, and his family was left homeless and destitute.

As word of Donald Trump’s cruelty spread throughout the land, the people began to turn against him. They saw him as a greedy and heartless man who cared only for his own wealth and power.

In the end, Donald Trump was forced to flee the kingdom, never to be seen again. The people rejoiced, and the farmer and his family were able to reclaim their land and rebuild their lives.

And so, the legend of Donald Trump lived on as a cautionary tale of greed and cruelty, reminding all who heard it that true wealth and power come not from material possessions but from kindness and compassion towards others.

… A fairytale by Angie (i.e., ChatGPT/GPT-4).

Courtesy: OpenArt (Stable Diffusion 1.5).

BACKUP – SAME STORY, DIFFERENT STYLES.

Like beauty is in the eye of the beholder, maybe the same is true about the meaning of truth. I can tell the same story without lies but in different ways that may convey a very different intent. Both stories are factually correct. However, the spin I decided to put on the stories could be very different.

Figure 7 below is a ChatGPT example where I ask ChatGPT to “Make a brief news text about an illegal immigrant involved in a mass shooting in a mall”, (a) in the style of Fox News (politically right-wing leaning; left side of Fig. 7) and (b) in the style of CNN News (politically progressive, left-wing leaning; right side of Fig. 7).

Figure 7 ChatGPT tells the same story in two different styles (left side) in Fox News style and (right side) in CNN News style.

In the Fox-styled news version, there are 20 negative phrases against 2 positive ones, clearly relying on a very negative sentiment towards illegal immigrants, who are mentioned 4 times in the Fox-styled text, including in the headline. The CNN-styled version has 14 negative and 7 positive phrases. This version of the same news focuses on the victims of the incident, with 4 mentions, and less on the illegal immigrant, who is only mentioned once in the text. The Fox-styled story warns against the dangers posed by illegal immigrants and recommends stronger border security. The CNN-styled text is victim-focused and recommends gun control as a remedy against mass-shooting events.
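For what it is worth, here is a rough sketch of how such phrase counts could be tallied (my own toy approach with made-up mini-lexicons, not the method actually used for Figure 7):

```python
# Count hits from small hand-made sentiment lexicons; the word lists are illustrative only.
NEGATIVE = {"shooting", "victims", "dangerous", "illegal", "tragedy", "threat"}
POSITIVE = {"safety", "heroic", "support", "community", "recovery"}

def tally(text: str) -> tuple[int, int]:
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return sum(w in NEGATIVE for w in words), sum(w in POSITIVE for w in words)

fox_style_text = "..."   # the Fox-styled text from Figure 7 would go here
cnn_style_text = "..."   # the CNN-styled text from Figure 7 would go here
print(tally(fox_style_text), tally(cnn_style_text))
```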

Both styles reflect the truth of the event: an illegal immigrant (who), a mass shooting (what), in a mall (where). The rest of the prose is in the style of the storyteller, usually subjective and with a likely intent to speak to your inner demons and angels.

Courtesy: OpenArt (Stable Diffusion 1.5).

Data-driven decision making … what’s not to like about that?

Approximately 400 corporate decision makers have been surveyed about their confidence in their own corporate decision-making skills, their opinion of their peers’ skills, and their acceptance of corporate data-driven decision making in general, as well as of such decision making being augmented by artificial intelligence. The survey, “Corporate data-driven decision making and the role of Artificial Intelligence in the decision making process”, reveals the general perception of the corporate data-driven environment available to corporate decision makers, e.g., the structure and perceived quality of available data. Furthermore, the survey explores the decision makers’ opinions about bias in the available data and applied tooling, as well as their own and their peers’ biases and the possible impact on their corporate decision making.

“No matter how sophisticated our choices, how good we are at dominating the odds, randomness will have the last word” – Nassim Taleb, Fooled by Randomness.

We generate a lot of data, and we also have an abundance of data available to us. Data is forecast to continue to grow geometrically until kingdom come. There is little doubt that it will, as long as we humans and our “toys” are around to generate it. According to Statista Research, by 2021 a total amount of almost 80 zettabytes (ZB) is expected to have been created, captured, copied, or consumed. That number corresponds to 900 years of Netflix viewing, or to every single person (ca. 8 billion people) having consumed 10 TB up to today (effectively since the early 2000s). It is estimated that there are 4.2 billion active mobile internet users worldwide. Of the total data, ca. 5% (ca. 4 ZB, or about 46 years of Netflix viewing) is being stored, with newly generated data being stored at a rate of about 2%. Going forward, expectations are an annual growth rate of around 21%. The telecom industry (for example) expects an internet-connected device per square meter, monitoring and sensing its environment in real time, and that includes you, me, and your pet. Combined with your favorite smartphone, a super advanced monitoring and data-collection device in its own right, the resolution of the human digital footprint will increase many-fold over the next years. Most of this data will be discarded, though not before the relevant metadata have been recorded and decided upon. Not before your digital fingerprint has been enriched and updated, for business and society to use for their strategies and policies, for their data-enriched decision making or possibly data-driven autonomous decision-making routines.
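A quick sanity check of the per-person figure above (simple division, my own back-of-the-envelope check using the decimal zettabyte convention):

```python
ZB = 1e21                      # 1 zettabyte in bytes (decimal convention)
total_bytes = 80 * ZB          # ~80 ZB created/captured/copied/consumed by 2021
people = 8e9                   # ~8 billion people
print(total_bytes / people / 1e12, "TB per person")   # -> 10.0 TB per person
```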

In a data-driven decision-making process, the data being acted upon can be both stored data and non-stored data, the latter being acted upon in real time.

This amount of existing and newly generated data continues to be heralded as extremely valuable. More often than not, the proof point is a reference to the value or turnover of the Big 5, abbreviated FAANG (before Google renamed itself Alphabet and Facebook renamed itself Meta). “Data is the new oil” appears almost as often in presentations and articles on Big Data as Arnold Schwarzenegger does in talks on AI. Although, more often than not, presenters and commentators on the value of data forget that the original comparison to oil was that, just like crude oil, data needs to be processed and broken down in order to extract its value. That value-extraction process, as with crude oil, can be dirty and cause primary as well as secondary “pollution” that may be costly, not to mention time-consuming, to get rid of. Over the last couple of years, some critical voices have started to question the environmental impact of our obsession with extracting meaning out of very big data sets.

I am not out to trash data science or the pursuit of meaning in data. Quite the contrary. I am interested in how to catch the real gold nuggets in the huge pile of data dung and sort away the spurious, false (deliberately or accidentally faked) signals that lead to sub-optimal data-driven decisions or outright black pearls (= death by data science).

Clearly, given the amount of data being generated in businesses as well as in society at large, and the perceived value of that data, or more accurately of the final end-state of the processed data (e.g., after selection, data cleaning, modelling, …) and the inferences derived from it, data-driven decision making must be a value-enhancing winner for corporations and society.

The data-driven corporate decision making.

What’s wrong with human-driven decision making? After all, most of us would very firmly declare (maybe even solemnly) that our decisions are based on real data. The challenge (and yes, often a problem in critical decision making) is that our brain has a very strong ability (maybe even preference) for seeing meaningful patterns, correlations, and relationships in data that we have available to us digitally or that have been committed to our memory from past experiences. The human mind has great difficulty dealing with randomness, spurious causality of events, and connectedness. Our brain will try to make sense of anything it senses; it will correlate, it will create coherent narratives of the incoherently observed, and it will replace correlations with causations to fit a compelling idea or belief. Also, the brain will filter out patterns and anomalies (e.g., gorillas that crash a basketball game) that do not fit its worldview or constructed narrative. The more out of place a pattern is, the less likely it is to be considered. Human decision-making is frequently based on spurious associations, fitting our worldview or preconceived ideas of a topic, and ignoring any data that falls outside our range of beliefs (i.e., “anomalies”). Any decision process involving humans will in one way or the other be biased. We can only strive to minimize that human bias by reducing the bias-insertion points in our decision-making process.

A data-driven business is a business that uses available and relevant data to make more optimized and better decisions than purely human-driven ones. It is a business that gives more credibility to decisions based on available data and structured reasoning. It is a business that may be less tolerant of emotive and gut-feel decision rationales. It hinges its business on rationality and on translating its data into consistent and less uncertain decisions. The data-driven business approaches the so-called “Mathematical Corporation” philosophy, where the human-driven aspects of decision making become much less important compared to algorithmic data-driven decisions.

It sounds almost too good to be true. So it may indeed be too good. It relies very much on having an abundance of reliable, unbiased and trustworthy (whatever that means) data, on which we can apply our unbiased data-processing tools and get out unambiguous analyses that will help make clear, unbiased decisions. Making corporate decisions that are free of interpretation, emotions and biases. Disclaimer: this paragraph was intended to be ironic and maybe a bit sarcastic.

How can we ensure that we make great decisions based on whatever relevant data we have available? (Note that I keep the definition of a great decision a bit diffuse.)

Ideally, we should start with an idea or hypothesis that we want to test and act upon. Based on our idea, we should design an appropriate strategy for data collection (what statisticians call experimental design), ensuring proper data quality for our analysis, modelling and final decision. Typically, after the data collection, the data is cleaned and structured (both steps likely to introduce biases), which makes it easier to commit to computing, analysis and possibly statistical or mathematical modelling. The outcome of the analytics and modelling provides insights that will be the basis for our data-driven decision. If we have done our homework on data collection, careful (and respectful) data post-processing, and understanding the imposed analytical framework, we can also judge whether the resulting insights are statistically meaningful, and whether our idea, our hypothesis, is relevant and significant and thus meaningful to base a decision upon. It seems like a “no-brainer” that the results of decisions should be tracked and fed back into a given company’s data-driven process. This idealized process is depicted in the picture below.

The above depicts a very idealized data-driven decision process; let’s call it the “ideal data-driven decision process”. This process may provide better and more statistically sound decisions. However, in practice, companies may follow a different approach to searching for data-driven insights that can lead to data-driven decisions. The picture below illustrates an alternative approach to utilizing the corporate and societal data available for decision making. To distinguish it from the above process, I will call it the “big-data-driven decision process”, and although I emphasize big data, it can of course be used on any sizable amount of data.

The philosophy of the “big-data-driven decision process” is that, with sufficient data available, pattern- and correlation-search algorithms will extract insights that subsequently lead to great data-driven decisions. The answer (to everything) is already in the big-data structure, and thus the best decision follows directly from our algorithmic approach. It takes away the need for fundamental human understanding, typically via models, of the phenomena that we desire to act upon with a sought-after data-driven decision.

The starting point is the collected data available to a business or entity interested in using its data for business-relevant decisions. Data is not per se collected as part of an upfront idea or hypothesis. Within the total amount of data, subsets may be selected and often cleaned, preparing them for subsequent analysis, the computing. The data selection process often happens with some (vague) idea in mind of providing backup, or substance, for a decision that a decision-maker or business wants to make. In other instances, companies let pattern-search algorithms loose on the collected or selected data. Such algorithms are very good at finding patterns and correlations in datasets, databases and datastores (often residing in private and public clouds). Such algorithmic tools will provide many insights for the data-driven business or decision maker. Based on those insights, the decision maker can then form ideas or hypotheses that may support formulating relevant data-driven decisions. In this process, the consequences of a decision may or may not be directly measured, missing out on the opportunity to close the loop on the business’s data-driven decision process. In fact, it may not even be meaningful to attempt to close the loop, due to the structure of the data required or the vagueness of the decision foundation.

The “big-data-driven decision process” rarely leads to the highest quality in corporate data-driven decision making. In my opinion, there is a substantial risk that businesses could be making decisions based on spurious (nonsense) correlations, falsely believing that such decisions are very well founded due to the use of data- and algorithm-based machine “intelligence”. Furthermore, this data-driven decision-making process has a substantially higher number of bias-entry points than a decision-making process that starts with an idea or hypothesis followed by a well-thought-through experimental design (as in the case of our “ideal data-driven decision process”). As a consequence, a business may incur a substantial risk of reputational damage, on top of the consequences of making a poor data-driven business decision.

As a lot of the data available to corporations and society at large is generated by humans, directly or indirectly, it is also prone to human foibles. Data is indeed very much like crude oil in that it needs to be refined to be applicable to good decision making. The refinement process, while cleaning up data and making it digestible for further data processing, analytics and modelling, may also introduce other issues that ultimately result in sub-optimal decisions, data-driven or not. Thus, corporate decisions that are data-driven are not per definition better than ones that are more human-driven. They are ultimately not even that different, after the data has been refined and processed to a state that humans can actually act upon. It is important, however, to keep in mind that big data tends to contain many more spurious correlations and adversarial patterns (i.e., patterns that look compelling and meaningful but are spurious in nature) than meaningful causal correlations and patterns.

Finally, it is a bit of a fallacy to believe that just because many corporations have implemented big-data systems and processes, decision-relevant data exists in abundance in those systems. Frequently, the amount of decision-relevant data is fairly limited, which increases the risk and uncertainty of data-driven decisions made upon it. The drawback of small data is again very much the risk of looking at random events that appear meaningful. Standard statistical procedures can provide insights into the validity of small-data analyses and the assumptions made, including the confidence that we can reasonably assign to them. For small-data-driven decisions it is far better to approach the data-driven decision-making process according to the ideal process described above, rather than attempting to select relevant data out of a bigger data store.

Intuition about data.

As discussed previously, we humans are very good at detecting real, as well as illusory (imagined), correlations and patterns. So are the statistical tools, algorithms and methodologies we apply to our data. Care must always be taken to ensure that the inferences (and assumptions) being made are sensible and supported by statistical theory.

Correlations can help us make predictions of the effect one event may have on another. Correlations may help us understand relationships between events and possibly also their causes (though the latter is more difficult to tease out, as we will discuss below). However, we should keep in mind that a correlation between two events does not guarantee that one event causes the other, i.e., correlation does not guarantee causation. A correlation simply means that there is a co-relation between X and Y; that is, X and Y behave in a way (e.g., linearly) such that a systematic change in X appears to be followed by a systematic change in Y. As plenty of examples have shown (e.g., see Tyler Vigen’s website of spurious correlations), a correlation between two events (X and Y) does not mean that one of them causes the other. They may really not have anything to do with each other. It simply means they co-relate in a way that allows us to infer that a given change in one relates to a given change in the other. Our personal correlation detector, the one between our ears, will quickly infer that X causes Y once it has established a co-relation between the two.

To tease out causation (i.e., action X causes outcome Y) in a statistically meaningful way, we need to conduct an experimental design making appropriate use of a randomized setup, as sketched below. It is not at all rare to observe correlations between events that we know are independent and/or have nothing to do with each other (i.e., spurious correlations). Likewise, it is also possible to have events that are causally dependent while observing a very small or no apparent correlation, i.e., corr(X,Y) ≈ 0, within the data sampled. Such a situation could make us wrongly conclude that they have nothing to do with each other.
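As a minimal sketch of what such a randomized setup could look like (hypothetical numbers and effect size, using numpy and scipy, purely for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical experiment: randomly assign 500 customers to a treatment
# (e.g., a new retention offer) or a control group, then compare outcomes.
n = 500
treated = rng.random(n) < 0.5                  # randomized assignment
true_effect = 0.3                              # assumed effect, for illustration only
outcome = rng.normal(loc=true_effect * treated, scale=1.0, size=n)

t_stat, p_value = stats.ttest_ind(outcome[treated], outcome[~treated])
print(f"estimated effect = {outcome[treated].mean() - outcome[~treated].mean():.2f}, p-value = {p_value:.4f}")

Because the assignment is randomized, a systematic difference between the two groups can, with the usual statistical caveats, be attributed to the treatment rather than to some lurking confounder.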

Correlation is a mathematical relationship that co-relates the change in one event variable, ∆X, with a proportional change in another, ∆Y = α ∆X. The degree of correlation between the events X and Y can be defined as
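Assuming the standard Pearson form, consistent with the verbal description that follows,

$$\mathrm{corr}(X,Y) \;=\; \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y} \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}\,\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$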

with the first part (after the equals sign) being the general definition of the correlation between two random variables, and the second part being specific to measurements (samples) related to the two events X and Y. If the sampled data do not exhibit a systematic proportional change of one variable as the other changes, corr(X,Y) will be very small and close to zero. For selective or small data samples, it is not uncommon to find the correlation between two events, where one causes the other, to be close to zero and thus “falsely” conclude that there is no correlation. Likewise, for selective or small data samples, spurious correlations may also occur between two events where no causal relationship exists. Thus, we may conclude that there is a co-relation between the events and subsequently also “falsely” believe that there is a causal relationship. It is straightforward to get a feeling for these cautionary issues by simulation using R or Python, as in the sketch below.
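A minimal simulation sketch of the small-sample issue (numpy only; the |corr| > 0.5 threshold and the sample sizes are arbitrary choices for illustration): it draws many pairs of independent variables and counts how often their sample correlation nevertheless looks “interesting”.

import numpy as np

rng = np.random.default_rng(42)

def spurious_rate(n_samples, n_trials=10_000, threshold=0.5):
    # Share of trials where two *independent* variables show |corr| > threshold.
    count = 0
    for _ in range(n_trials):
        x = rng.normal(size=n_samples)
        y = rng.normal(size=n_samples)      # independent of x by construction
        r = np.corrcoef(x, y)[0, 1]         # sample correlation coefficient
        if abs(r) > threshold:
            count += 1
    return count / n_trials

for n in [5, 10, 30, 100]:
    print(f"n = {n:>3}: share of |corr| > 0.5 between independent variables ≈ {spurious_rate(n):.3f}")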

The central limit theorem (CLT among friends) ensures that, irrespective of distribution type, as long as the sample size is sufficiently big (e.g., >30), sample statistics (e.g., mean, variance, correlation, …) will tend to be normally distributed. The sample variance of the statistic narrows as the sample size increases; thus, for very large samples, the sample statistic converges to the true statistic (of the population). For independent events, the correlation between those events will be zero (i.e., the definition of independent events). The CLT tells us that the sample correlations between independent random events will take the shape of a standardized normal distribution. Thus, there will be a non-zero chance that a sample correlation is different from zero, violating our expectation for two independent events. As said, our intuition (and the math) should tell us that as the sample size increases, the sample variance should narrow increasingly around zero, which is our true expectation for the correlation of independent events. Thus, as the sample size grows, the spread of sampled correlations, that is, the spurious non-zero correlations, reduces to zero, as expected for a database which has been populated by sampling independent random variables. So all seems good and proper.
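To see the narrowing in action, here is a small sketch (again numpy only, with arbitrary sample sizes) that estimates the spread of the sample correlation between two independent variables and compares it with the rough 1/√n expectation:

import numpy as np

rng = np.random.default_rng(7)

def corr_spread(n_samples, n_trials=20_000):
    # Standard deviation of the sample correlation between two independent variables.
    rs = np.empty(n_trials)
    for k in range(n_trials):
        x = rng.normal(size=n_samples)
        y = rng.normal(size=n_samples)
        rs[k] = np.corrcoef(x, y)[0, 1]
    return rs.std()

for n in [10, 30, 100, 1000]:
    print(f"n = {n:>4}: std of sample correlation ≈ {corr_spread(n):.3f}  (1/sqrt(n) ≈ {1/np.sqrt(n):.3f})")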

As more and more data are sampled, representing diverse events or outcomes, and added to our big data store (or database), the number of spurious correlations found in otherwise independent data will increase. Of course, there may be legitimate (causal) correlations in such a database as well. But the point is that there may also be many spurious correlations, of obvious or much less obvious nonsensical nature, leading to data-driven decisions without a legitimate basis in the data used. The range (i.e., max – min) of the statistics (e.g., the correlation between two data sets in our data store) will in general increase as the number of data sets increases. If you have a data set with 1,000 different events, you have almost half a million correlation combinations to trawl through in the hunt for “meaningful” correlations in your database. Searching (brute force) for correlations in a database with a million different events would result in half a trillion correlation combinations (i.e., approximately half the square of the number of data sets for large databases). Heuristically, you will have a much bigger chance of finding a spurious correlation than a true correlation in a big-data database.
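A back-of-the-envelope check of these pair counts, together with a rough estimate of how many “interesting-looking” pairs one would expect purely by chance (a sketch; the |corr| > 0.5 threshold, the sample size of 30 and the normal approximation corr ~ N(0, 1/n) are simplifying assumptions):

from math import comb, erf, sqrt

def expected_spurious_pairs(n_events, n_samples, threshold=0.5):
    # Number of event pairs, and the expected number whose sample correlation
    # exceeds the threshold purely by chance, assuming all events are independent
    # and using the normal approximation corr ~ N(0, 1/n_samples).
    pairs = comb(n_events, 2)
    z = threshold * sqrt(n_samples)
    p_exceed = 1 - erf(z / sqrt(2))          # two-sided tail probability of N(0,1)
    return pairs, pairs * p_exceed

for n_events in [1_000, 1_000_000]:
    pairs, spurious = expected_spurious_pairs(n_events, n_samples=30)
    print(f"{n_events:>9,} events -> {pairs:,} pairs, of which ~{spurious:,.0f} spurious |corr| > 0.5 expected")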

Does decision outcome matter?

But does it all matter? If a spurious correlation is persistent and sustainable, albeit likely nonsensical (e.g., the correlation between storks and babies born), a model based on such a correlation may still be a reasonable predictor for the problem at hand and maybe even of (some) value … However, would we bet our own company’s fortune and future on a spurious, nonsensical correlation (e.g., there are more guaranteed ways of having a baby than waiting for the stork to bring it along)? Would we like decision makers to impose policy upon society based on such conjecture and arbitrary inference? … I do not think so … That is, if we are aware and have a say in such matters.

In the example above, I have mapped out what a data-driven decision process could look like (yes, complex, but I could make it even more so). The process consists of 6 states (i.e., Idea, Data Gathering, Insights, Consultation, Decision, Stop) and actions that take us from one state to another (e.g., Consult → Decision), until the final decision point where we may decide to continue, develop further or terminate. We can associate our actions with the likelihood (e.g., based on empirical evidence) of a given state transition (e.g., Insights → Consult vs Insights → Decision, …) occurring. Typically, actions are not symmetric, in the sense that the likelihood of going from state 1 to state 2 may not be the same as going from state 2 back to state 1. In the above decision-process illustration, one would find that over many decision iterations (or over time) we would terminate an idea (or product) ca. 25% of the time, even though the individual transition, Decision → Stop, is associated with only a 5% probability. Although this blog is not about “Markov decision processes”, one could associate reward units (which can be negative or zero as well) with each process transition and optimize for the best decision subject to the rewards or costs known to the corporation.

Though, let us also be real about our corporate decisions. Most decisions tend to be fairly incremental. Usually, our corporate decisions are reactions to micro-trends or relatively small changes in the business environment. Our decision making, and our subsequent reactions, are more often than not incremental in nature. It does not mean that we cannot, over time, be “fooled” by spurious effects, or by drift in the assumed correlations, that may eventually lead to substantially negative events.

The survey.

In order to survey the state of corporate decision making in general, and as it relates to data-driven decision making, I conducted a paid surveymonkey.com survey, “Corporate data-driven decision making and the role of Artificial Intelligence in the decision making process”. A total of 400+ responses were collected across all regions of the United States, census-balanced for gender and age (between 18 and 70), with an imposed annual household income of US$100k. 70% of the participants hold a college degree or more, and 54% describe their current job level as middle management or higher. The average age of the participants was 42 years. Moreover, I also surveyed my LinkedIn.com network as well as my Slack.com network associated with the Data Science Master of Science studies at the University of Colorado Boulder. In the following, I only present the outcome of surveymonkey.com’s paid survey, as this has been sampled in a statistically representative way based on the US census and within the boundaries described above.

Basic insight into decision making.

Just to get it out of the way: a little more than 80% of the respondents believe that gender does not play a role in corporate decision making. This also means that a bit less than 20% believe that men and women are either better or worse at making decisions. 11% of the respondents (3 out of 4 women) believe that women are better corporate decision makers. Only 5% (ca. 3 out of 5) believe that men are better at making decisions. Interesting follow-up research would be to look at decision making under stressed conditions, though this was not a focus of my questionnaire.

Almost 90% of the respondents were either okay with, enjoyed, or loved making decisions related to their business. A bit more than 10% do not enjoy making decisions. There are minor gender differences in the degree of appreciation for decision making, but it is statistically difficult to say whether these are significant or not.

When asked to characterize their decision-making skill in comparison with their peers, about 55% acknowledge they are about the same as their peers. What is interesting (but not at all surprising) is that almost 40% believe they are better at making decisions than their peers. I deliberately asked respondents to judge their decision abilities as “About the same” rather than “average”, but clearly did not avoid the so-called better-than-average effect often quoted in social judgement studies. What this means for the soundness of decision making in general, I will leave for you to consider.

Looking at gender differences in self-enhancement compared to peers, significantly more males believe they are better than their peers than is the case for female respondents, while for both genders 5% believe they are worse than their peers at making decisions.

With the previous question in mind, let’s attempt to understand how often we consult with others (our peers) before making a business or corporate decision. A bit more than 40% of the respondents frequently consult with others prior to their decision making. In the survey, “frequently” was defined as 7 out of 10 times or more. Similarly, a bit more than 40% would consult others in about half of their corporate decisions. It may seem a high share that do not seek consultation on half of their business decisions (i.e., glass half empty). But keep in mind that we also make a lot of uncritical corporate decisions that are part of our job description and might not be important enough to bother our colleagues or peers with (i.e., glass half full). Follow-up research should explore the consultation on critical business decisions more carefully.

The gender perspective on consulting peers or colleagues before a decision-making moment seems to indicate that men seek such consultation more frequently (statistically significantly so) than women.

For many of us, our gut feel plays a role in our decision-making. We feel a certain way about the decision we are about to make. Indeed, for 60% of the respondents their gut feeling was important in 50% or more of their corporate decisions. And about 40% of the respondents were of the opinion that their gut feel was better than their peers’ (note: these are not the same ca. 40% who believe they are better decision makers than their peers). When it comes to gut feeling, its use in decision making and its relative quality compared to peers, there is no statistically significant gender difference.

The state of data-driven decision making.

How often is relevant data available to your company or workplace for your decision making?

And when such data is available for your decision-making process how often are you actually making use of it? In other words, how data-driven is your company or workplace?

How would you assess the structure of the available data?

and what about its quality?

Are you having any statistical checks done on your data, assumptions or decision proposals prior to executing a given data-driven decision?

I guess the above outcome might be somewhat disappointing if you are a strong believer in the Mathematical Corporation, with only 45% of respondents frequently applying more rigorous checks to validate their decisions prior to executing them.

My perspective is that if you are a data-driven person or company, assessing the statistical validity of the data used, the assumptions made and the decision options would be a good best practice to perfect. However, not all decisions, even data-driven ones, may be important enough (in terms of generic risk exposure) to worry about statistical validity. Even if the data used for a decision is of a statistically problematic nature, and thus may add risk to or reduce the quality of a given decision, the outcome of your decision may still be okay, albeit not the best that could have been. And even a decision made on rubbish data has a certain chance of being right or even good.

And even if you have a great data-driven corporate decision process, how often do we listen to our colleagues’ opinions and also consider those in our decision making?

For 48% of the respondents, human insight or opinion is very important in the decision-making process. About 20% deem the human opinion of some importance.

Within the statistical significance and margin of error of the survey, there do not seem to be any gender differences in the responses related to the data-driven foundation and decision making.

The role of AI in data-driven decision making.

Out of the 400+ respondents, 31 (i.e., less than 8%) had not heard of Artificial Intelligence (AI) prior to the survey. In the following, only respondents who confirmed having heard about AI previously were asked questions related to AI’s role in data-driven decision-making. It should be pointed out that this survey does not explore what the respondents understand an artificial intelligence, or AI, to be.

As has been consistent since I started tracking people’s sentiment towards AI in 2017, women appear to have a more negative sentiment towards AI than men. Men, on the other hand, are significantly more positive towards AI than women. The AI sentiment hasn’t changed significantly over the last 4 years; perhaps there is a slightly less positive sentiment and a more neutral positioning among the respondents.

Women appear to judge a decision-making-optimized AI to be slightly less important for their company’s decision-making process. However, I do not have sufficient data to resolve this difference to a satisfactory level of confidence. Though, if present, it may not be surprising given women’s less positive sentiment towards AI in general.

In a previous blog (“Trust thou AI?”), I described in detail the human trust dynamic towards technology in general and cognitive systems in particular, such as machine learning applications and the concept of artificial intelligence. Over the years, the trust in decisions based on AI, which per definition would be data-driven decisions, has been consistently skewed towards distrust rather than trust.

Bias

Bias is everywhere. It is part of life, of being human, as well as of most things touched by humans. We humans have so many systematic biases (my favorites are availability bias, which I see pretty much every day, confirmation bias and framing bias … yours?) that lead us astray from objective rationality, judgement and good decisions. Most of these so-called cognitive biases we are not even aware of, as they work on an instinctive level, particularly when decision makers are under stress or time constraints in their corporate decision making. My approach to bias is that it is unavoidable but can be minimized and often compensated for, as long as we are aware of it and its manifestations.

In statistics, bias is relatively easy to define and compute:
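Assuming the standard textbook definition (with θ̂ the estimator and θ the true parameter value), together with its relation to the mean square error,

$$\mathrm{Bias}\big(\hat{\theta}\big) \;=\; \mathbb{E}\big[\hat{\theta}\big] - \theta, \qquad \mathrm{Bias}\big(\hat{\theta}\big)^{2} \;=\; \mathrm{MSE}\big(\hat{\theta}\big) - \mathrm{Var}\big(\hat{\theta}\big)$$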

Simply said, the bias of an estimated value (i.e., a statistic) is the expected value of the estimator minus the true value of the parameter being estimated. For an unbiased estimator, the bias is (obviously) zero. One can also relate the bias to the mean square error and the variance of the estimator: the squared bias equals the MSE minus the variance. Clearly, translating human biases into mathematics is a very challenging task, if at all possible. Mathematics can (sometimes) help us some of the way, but it is not the solution to all issues around data-driven and human-driven decision making.

Bias can be (and more often than not will be) present in data that is either directly or indirectly generated by humans. Bias can be introduced in the measurement process, as well as in data selection and post-processing, and then in the modelling or analytic phase via the human assumptions and choices we make. The final decision-making stage, which we can consider the decision-thinking stage and the outcome of the data-driven process, comes together with the human interpretation and opinion part. This final stage also includes our business context (e.g., corporate strategy & policies, market, financials, competition, etc.) as well as our colleagues’ and managers’ opinions and approvals.

41% of the respondents believe that biased data is a concern for their corporate decision making. Given how much public debate there has been around biased data and its impact on public as well as private policy, it is good to see that the largest share of the respondents recognize the concern of biased data in a data-driven decision-making process. If we attribute the “I don’t know” responses to uncertainty, and this leads to questioning of bias in data used for corporate decision making, then all is even better. That said, I do find 31% having no concerns about biased data a relatively high number. It is somewhat concerning, particularly for decision makers involved in critical social policy or business decision making.

More women (19%) than men (9%) chose the “I don’t know” response to the above question. It may explain why fewer women chose ‘Yes’ on “biased data is a concern for decision making”, giving maybe the more honest answer of “I don’t know”. This is obviously speculation and might actually deserve a follow-up.

As discussed above, not only should the possibility of biased data be a concern for our data-driven decision making; the tools we use for data selection and post-processing may also be sources that introduce biases. Either directly, through the algorithms used for selection and/or post-processing, or indirectly via the human choices made and the assumptions introduced into the selected models or analytic frameworks used (e.g., parametrization, algorithmic recipe, etc.).

On the question “Are biased tools a concern for your corporate decision making?”, the answers are almost too nicely distributed across the 3 possibilities (“Yes”, “No” and “I don’t know”), which might indicate that respondents do not really have a preference or opinion. Though, more should have ended up in “I don’t know” if that were really the case. It is a more difficult technical question and may require more thinking (or expert knowledge) to answer. It is also a topic that has been less prominently discussed in media and articles. The danger with tooling is of course that tools are used as black boxes for extracting insights without the decision maker appreciating their possible limitations.

There seems to be a slight gender difference in the responses. However, the differences, both within this question and relative to the previous question around “biased data”, are statistically inconclusive.

After considering the possibility of biased data and biased tooling, it is time for some self-reflection on how biased we think we are ourselves, and to compare that to our opinion about our colleagues’ bias in decision making.

Almost 70% of the respondents in this survey are aware that they are biased in their decision making. The remainder either see themselves as being unbiased in their decision making (19%, maybe a bias in itself … blind spot?) or believe that bias does not matter (11%) in their decision making.

Looking at our colleagues, we attribute a higher degree of bias to their decisions than to our own. 80% of the respondents think that their colleagues are biased in their decision making. 24% believe that their colleagues are frequently biased in their decisions, as opposed to 15% of the respondents saying the same about their own decisions. Not surprisingly, we are also less inclined to believe that our colleagues are unbiased in their decisions compared to ourselves.

While there are no apparent gender differences in how the answers to the two bias questions are distributed, there is a difference in how we perceive bias in ourselves versus in our colleagues. We tend to see ourselves as less biased than our colleagues, as observed with more respondents believing that “I am not biased at all in my decisions” compared to their colleagues (19% vs 12%), and perceiving their colleagues as frequently biased in their decisions compared to themselves (24% vs 15%). While causation is super difficult to establish in surveys such as this one, I dare speculate that one of the reasons we don’t consult our colleagues on a large share of corporate decisions may be the somewhat self-inflated image of ourselves as being better at making decisions and less biased than our colleagues.

Thoughts at the end

We increasingly have the technical and scientific foundation for supporting real data-driven decision making. It is clear that more and more data is becoming available to decision makers. As data stores, or databases, grow geometrically in size, and possibly in complexity as well, the human decision maker is either forced to ignore most of the available data or to allow insights for the decision-making process to be increasingly provided by algorithms. What is important in the data-driven decision process is that we are aware that it does not give us a guarantee that the decisions made are better than decisions that are more human-driven. There are many insertion points in a data-driven decision-making process where bias can be introduced, with or without a human being directly responsible.

And for many of our decisions, the data available for our most important corporate decisions is either small data, rare data or not available at all. More than 60% of the respondents characterize the data quality they work with in their decision-making process as being Good (defined as uncertain, directionally okay with some bias and/or of limited availability), Poor or Very Poor. About 45% of the respondents state that data is available for 50% or less of their corporate decisions. Moreover, when data is available, a bit more than 40% of the corporate decision makers use it in 50% or less of their corporate decisions.

Compared to the survey 4 years ago, this time around questions on the respondents’ perception of bias in the decision-making process were introduced. About 40% were concerned about having biased data influence their data-driven decisions. Ca. 30% had no concern about biased data. Asked about biased tooling, only about 35% stated that they were concerned for their corporate decisions.

Of course, bias is not only limited to data and tooling, but also applies to ourselves and our colleagues. When asked for a self-assessment of how biased the respondents believe themselves to be in corporate decision-making, a bit more than 30% either did not believe themselves to be biased or believed that bias does not matter for their decisions. Ca. 15% stated that they are frequently biased in their decision making. Of course, we are often not the only decision makers around; our colleagues are as well. 24% of the respondents believed that their colleagues are frequently biased in their decisions. Moreover, 21% (vs 30% in the self-assessment) believe that their colleagues are either not biased at all or that bias does not matter for their decisions. Maybe not too surprising, when respondents very rarely self-assess as being worse decision makers than their peers.

Acknowledgement.

I greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this blog. Also, many of my Deutsche Telekom AG, T-Mobile NL and industry colleagues in general have in countless ways contributed to my thinking and the ideas leading to this little blog. Thank you!

Readings

Kim Kyllesbech Larsen, “On the acceptance of artificial intelligence in corporate decision making – A survey”, AIStrategyBlog.com (November 2017). Very similar survey to the one presented here.

Kim Kyllesbech Larsen, “Trust thou AI?”, AIStrategyBlog.com (December 2018).

Nassim Taleb, “Fooled by randomness: the hidden role of chance in life and in the markets”, Penguin books, (2007). It is a classic. Although, to be honest, my first read of the book left me with a less than positive impression of the author (irritating arrogant p****). In subsequent reads, I have been a lot more appreciative of Nassim’s ideas and thoughts on the subject of being fooled by randomness.

Josh Sullivan & Angela Zutavern, “The Mathematical Corporation”, PublicAffairs (2017). I still haven’t made up my mind whether this book describes an Orwellian corporate dystopia or paradise. I am unconvinced that having more scientists and mathematicians in a business (assuming you can find them and convince them to join your business) would necessarily be of great value. But then again, I do believe very much in diversity.

Ben Cooper, “Poxy models and rash decisions”, PNAS vol. 103, no. 33 (August 2006).

Michael Palmer. “Data is the new oil”, ana.blogs.com (November 2006). I think anyone who uses “Data is the new oil” should at least read Michael’s blog and understand what he is really saying.

Michael Kershner, “Data Isn’t the new oil – Time is”, Forbes.com (July 2021).

J.E. Korteling, A.-M. Brouwer and A. Toet, “A neural network framework for cognitive bias”, Front. Psychol., (September 2018).

Chris Anderson, “The end of theory: the data deluge makes the scientific method obsolete“, Wired.com (June 2008). To be honest when I read this article the first time, I was just shocked by the alleged naivety. Although, I later have come to understand that Chris Anderson meant his article as a provocation. “Satire” is often lost in translation and in the written word. Nevertheless, the “Correlation is enough” or “Causality is dead” philosophy remains strong.

Christian S. Calude & Giuseppe Longo, “The deluge of spurious correlations in big data”, Foundations of Science, Vol. 22, no. 3 (2017). More down my alley of thinking and while the math may be somewhat “long-haired”, it is easy to simulate in R (or Python) and convince yourself that Chris Anderson’s ideas should not be taken at face value.

“Beware spurious correlations”, Harvard Business Review (June 2015). See also Tyler Vigen’s book “Spurious correlations – correlation does not equal causation”, Hachette Books (2015). Just for fun, and it provides ample material for cocktail parties.

David Ritter, “When to act on a correlation, and when not to”, Harvard Business Review (March 2014). A good business view on when correlations may be useful and when not.

Christopher S. Penn, “Can causation exist without correlation? Yes!”, (August 2018).

Therese Huston, “How women decide”, Mariner Books (2017). See also Kathy Caprino’s “How decision-making is different between men and women and why it matters in business”, Forbes.com (2016), based on an interview with Therese Huston. There is a lot of interesting scientific research indicating that there are gender differences in how men and women make decisions when exposed to considerable risk or stress; overall, there is no evidence that one gender is superior to the other. Though, I do know who I prefer managing my investment portfolio (and it’s not a man).

Lee Roy Beach and Terry Connolly, “The psychology of decision making“, Sage publications, (2005).

Young-Hoon Kim, Heewon Kwon and Chi-Yue Chiu, “The better-than-average effect is observed because “Average” is often construed as below-median ability”, Front. Psychol. (June 2017).

Aaron Robertson, “Fundamentals of Ramsey Theory”, CRC Press (2021). Ramsey theory accounts for the emergence of spurious (random) patterns and correlations in sufficiently large structures, e.g., big data stores or databases: spurious patterns and correlations that appear significant and meaningful without actually being so. It is easy to simulate that this is the case. The math is a bit more involved, although quite intuitive. If you are not interested in the foundational stuff, simply read Calude & Longo’s article (referenced above).

It is hard to find easy-to-read (i.e., non-technical) textbooks on Markov chains and Markov Decision Processes (MDPs). They tend to be written for people with a solid mathematical or computer science background. I do recommend the following YouTube videos; on Markov chains in particular I recommend Normalized Nerd’s lectures (super well done and easy to grasp, respect!). I recommend having a Python notebook on the side and building up the lectures there. On Markov Decision Processes, I found the Stanford CS221 YouTube lecture by Dorsa Sadigh reasonably passable, though you would need a good grasp of Markov chains in general. Again, coding in parallel with the lectures is recommended to get a hands-on feel for the topic as well. After those efforts, you should get going on reinforcement learning (RL) applications, as those can almost all be formulated as MDPs.

A hands-on guide to data-driven business decisions – How to gain insights into an uncertain world.

“Would you like an adventure now or shall we have tea first”, Alice in Wonderland (Lewis Carroll).

Google’s parent company Alphabet breached a US$2 (~ €1.8) trillion valuation in November this year (2021). What few realize is that this valuation, as well as the success of Google, is based on a Markov Decision Process and its underlying Markov chain. That makes PageRank, the brainchild of Sergey Brin and Larry Page, likely the most valuable and most used application of a Markov chain ever, period. We all make use of a Markov chain and a Markov Decision Process pretty much every day, many, many times a day. The pages of the world wide web, with its 1.5+ billion indexed pages, can be regarded as the states of a humongous Markov chain. And the more than 150 billion hyperlinks between those web pages can be seen as the equivalent of Markov state transitions, taking us from one state (web page) to another state (another web page). The Markov Decision Process value and policy iteration ensures that the consumers of Google’s search engine get the best search results in the fastest possible way. Many hyperlink paths may lead to your page of interest, but many (maybe most) of those paths will not get you where you want to be fastest. Optimization algorithms developed on the world-wide-web Markov chain will find the best path to the content you want (or the content the algorithm “thinks” you want). I cannot think of a better motivating example for business people to get acquainted with data-driven decision processes and the underlying Markov chains than that.

In the course of my analysis of corporate decision makers’ sentiment towards data-driven decision making and processes, it became clear that there is very little comprehensive material (for business people) on structured, hands-on approaches to decision processes. How do you go about implementing a data-driven decision process in your company? And this despite the fact that such approaches can be fantastic tools for structuring and optimizing a business’s decision-making processes. Not only that: we can integrate such decision-process algorithms into our digital platforms, achieving a very high degree of autonomous business decisions (e.g., Google’s PageRank decision process) that in turn will enhance our customers’ experience. It also allows us to monitor the overall quality of customer interactions with our digital environment(s) (e.g., web-based environments, user interfaces, closed-loop user experience, digital sales, apps, bots, etc.). Such data-driven integration would result in more efficient business processes, internally and towards external customers and partners.

So, my goal with this blog is to take you through the essentials, the enablers, to better understand what a decision process may look like and its wide applicability to business-relevant decision making.

I like hands-on. The focus of this blog will be on providing coding examples for the reader to implement and play with alongside the examples given, and maybe, even better, to encourage you to create your own examples relevant to your area of business interest. I will take you through two major enablers for digitizing data-driven decision processes. One enabler is the so-called Markov chain, which is essential for understanding the other, the Markov Decision Process (MDP). I will attempt to do this intuitively, via coding examples, rather than by writing down a lot of mathematical notation (which is the normal approach). I will demonstrate some relatively simple business examples that hopefully will create an appetite to learn much more about this field.

At the very end of this blog, you can either stop reading or continue (an MDP maybe?) and find a more formal treatment of Markov chains and MDPs. Although somewhat more “long-haired” (coming from a bald guy), I have attempted to make it accessible to readers who may not have a math degree.

In my previous blog, “Data-driven decision making … what’s not to like about that?”, I wrote about data-driven decision making for businesses and public institutions. I described, or more accurately pulled out of my magic hat, a so-called Markov Decision Process that we could view as a data-driven decision process,

In the example above, I mapped out what a data-driven decision process might (or should) look like. The process consists of 6 states or stages (i.e., Idea, Data Gathering, Insights, Consultation, Decision, Stop) and actions that take us from one state to another (e.g., Consult → Decision), until the final Decision state, where we may decide to continue, to develop further, thus going back to the Idea, or to terminate (i.e., Stop). We can associate the transitions and their actions with likelihoods, based on empirical evidence that fancy people may describe as process mining, of a given state transition (e.g., Insights → Consult vs Insights → Decision, …) occurring.

The described decision process is not static. It is dynamic and likely to evolve over time. Transitions from one state to another can be seen as moving forward in time increments. You find yourself in one stage of the process and make up your mind, that is, you make a decision or take an action, about which state you “want” to move to next, with a given likelihood of making that particular decision if there are several stages to move to, depending on the level of stochasticity. The rules for how to move from one state to the next are given by the scenario you are considering.

The above illustration may (or may not) look complex. So let’s break down, at a higher level, what we see above. We have circles that represent a Stage in the decision process, e.g., Idea, Data Gathering, Insights, Consult, Decision, Stop. These stages are what we could also call States, which is the official name if we speak Markovian. I will use stage and state interchangeably. In between the stages we have white arrows that take us, or Transition in Markovian, from one stage to the next (e.g., from Insights to Consult). A transition from one stage to the next is triggered by an Action (e.g., a decision) in the stage we are in that leads us to the following stage in our process. For each transition and associated action we can assign a Transition Probability that describes the likelihood of that particular transition & action combination taking place. For example, in the above chart, I have a transition probability from the Insights stage to the Consult stage of 40%. I can also associate a reward R with a given transition. This reward is “banked” as you enter the new stage. For example, I can expect a reward of €R23 (see illustration above) as I transition from the Insights to the Consult stage. An arrow that goes back to its own stage simply means that a decision, or action, is taken that results in us remaining in that particular state (e.g., a 20% chance of remaining in the Decision stage, which here implies that we continue as is). As transitions are associated with probabilities, it follows that the sum of the transition probabilities (arrows) leaving a state is required to equal 1. It is possible to define several transition probability scenarios per state. Such scenarios are called policies in the language of Markov. The idea is that we can run through several scenarios, or policies, to determine whether some are better than others and thus optimize our decision-making process (assuming the optimal policy is also feasible). Typically, actions, (state) transitions, and associated rewards are not symmetric, in the sense that the likelihood (& reward) of going from State 1 to State 2 may not be the same as going from State 2 back to State 1.

What I have described above is the fundamental setup of a so-called Markov chain and how it can be extended with actions (e.g., decisions), rewards (e.g., income & cost) and policy (i.e., scenario) estimation and optimization, and thus into what we call Markov Decision Processes (MDPs) or their “cousins”, Markov Reward Processes (MRPs).

The question remains, of course, how do we actually do some meaningful analysis on a decision process as illustrated above? How do we code this?

While you will find some examples in the public domain on the analysis of Markov chains, it becomes a bit more technical as you move up the “intellectual food chain” to Markov Decision Processes. I have provided some simple but generalized Python code throughout this blog. This will allow you to run some of the analysis yourself and gain a lot of insight into Markov chains and (Markov) Decision Processes.

A simpler customer life-cycle & retention example.

Let’s take a simple example of the customer life-cycle of a typical subscription (e.g., online, magazine, services, …). We start our process with the Conversion of a prospective customer into a real customer, kicking off the customer life-cycle. After the sale, the customer starts the service, which below is defined as Retention; we want to keep our customer with that service. During the retention period, or life as a customer, the illustration below assumes 3 events may happen: (1) our customer is okay with the service and wishes to continue as is (i.e., remains in the Retention stage), (2) our customer is interested in adding additional features to the existing subscription, thus accepting an Upsell, after which the customer falls back into the Retention phase, and finally (3) our customer may decide to discontinue the subscribed service, thus Churn, ending the engagement and the customer life-cycle process. In fact, once the Churn state has been reached, it cannot be left. We call such a state an absorbing state. A Markov chain that includes at least one absorbing state is called an absorbing Markov chain.

In the above customer life-cycle illustration, we have 4 states (or stages): Conversion (S0), Retention (S1), Upsell (S2) and Churn (S3). The transition from one stage to the next (e.g., Retention → Upsell) is driven by a decision (i.e., action) to do so. Each transition is associated with the likelihood of that particular decision being made, e.g., there is a 20% chance that a customer decides to accept an Upsell when located in the Retention state.

In Python, we can operationalize the process as follows:

import numpy as np

# Customer life-cycle process (simple)
# Define States


states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn'
}


c = 0.05      # Churn likelihood
u = 0.20      # Upsell likelihood


#Transition probability matrix:
T = np.array([
    [0.00, 1.00, 0.00, 0.00],
    [0.00, 1-u-c, u, c],
    [0.00, 1.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00]])

Each row (and column) in the transition probability matrix represents a stage in the process. Note that each row is required to sum to 1. Our curiosity about the dynamics of our process, our chain of events, should lead us to ask: “In the long run, what are the chances of ending up in each of the defined states of our process?” Actually, for our customer life-cycle process, we might like to understand what our long-term churn proportion is given the current scenario, and maybe even the trade-offs between upsell and churn rate.

How do we go about answering that question?

Our process has reached steady state when the likelihood of ending up in a given state no longer changes with the subsequent evolution of time. What this means is that, for a given overall process state, represented by π, applying the transition matrix no longer changes that overall process state (i.e., πT = π). Thus, π then represents the expected likelihood of being in a given state once steady state has been reached.

In our illustration above, we can kick off our process with π0 = [1, 0, 0, 0], which represents that our initial stage is the Conversion state. Applying the transition matrix T to our initial state will transition it into the Retention stage, i.e., π0T = [0, 1, 0, 0] = π1. The next step then is π1T = [0, 0.75, 0.20, 0.05] = π2, and so forth until πT = π for all subsequent time steps (algorithmically we can describe this iterative process as πT ← π). Following this recipe we can create a small Python code that will do the work for us;

def steady_state(pi, T, epsilon=0.01):
    """MARKOV CHAIN STEADY STATE.

    pi      : Initial state distribution; given n states, pi is an array of dim (n,).
    T       : The transition probability matrix of dim (n,n).
    epsilon : Convergence criterion on the maximum change between iterations.
    """

    j = 0  # Iteration counter

    while True:

        oldpi = pi

        pi = pi.dot(T)  # One time step: apply the transition matrix

        j += 1

        # Check convergence
        if np.max(np.abs(pi - oldpi)) <= epsilon:
            break
        # Safety stop in case of no convergence
        if j == 1000:
            break

    return pi  # Returning the (approximate) steady-state likelihoods of the states.

Using the above code, the transition matrix given above and our customer life-cycle process initial condition π0 = [1, 0, 0, 0], we find that

# Finding Customer life-cycle Steady-State
pi0 = np.array([1, 0, 0, 0])


pi = steady_state(pi0, T)  # Using above steady-state function


print(np.round(pi,2))

output>> steady-state pi = [0, 0.19, 0.04, 0.77]  

So, within the existing customer life-cycle scenario, a churn rate of 5% will in the long run lead to an overall churn likelihood of almost 80%. This is with a Retention transition probability of 75% and an Upsell transition probability of 20%. It may be a surprising outcome that a relatively low-probability action (or event) can lead to such a dominant business impact. Intuitively, we should remember that churn is a terminal event; Retention and even Upsell simply continue to operate within the life-cycle process with the ever-present “doom” of a customer leaving by churning. More detailed analysis of this process will show that as long as we keep the Churn transition probability below 1.3%, the Churn state’s steady-state likelihood stays below 3.3 times the transition probability. Above a 1.3% transition probability, the steady-state Churn likelihood rapidly increases to 10 – 20 times the base transition probability. Needless to say, the current policy, as represented by the transition matrix, would require optimization in order to minimize the churn impact, and ideally we should aim for measures keeping churn below 1.3%.
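To reproduce this kind of sensitivity analysis yourself, a small sweep over the churn transition probability, reusing the steady_state() function above, is all that is needed (a sketch; note that the resulting “steady-state” shares of an absorbing chain depend on the chosen epsilon convergence criterion):

# Sweep the churn transition probability and inspect the long-run churn share.
u = 0.20
for c in [0.005, 0.010, 0.013, 0.020, 0.050]:
    T = np.array([
        [0.00, 1.00,  0.00, 0.00],
        [0.00, 1-u-c, u,    c   ],
        [0.00, 1.00,  0.00, 0.00],
        [0.00, 0.00,  0.00, 1.00]])
    pi = steady_state(np.array([1.0, 0.0, 0.0, 0.0]), T)
    print(f"c = {c:.3f} -> churn share ≈ {pi[3]:.2f} (≈ {pi[3]/c:.0f}x the transition probability)")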

The purpose of this blog, as stated in the beginning, is not so much to study the dynamics of a hypothetical customer life-cycle management process, or any other process for that matter. The purpose is to provide you with an understanding of decision-process modelling, the analysis possible, and the tools to go and do that yourself.

A simple customer life-cycle example with rewards.

How do we get from the process dynamics to an assessment of the value of a given strategy or policy? There are several dimensions to this question. Firstly, we would of course like to enhance the value of the overall decision process. Next, we would also like to ensure that each stage of our process has been optimized in terms of value, with value being monetary, time-wise, number of stages, order of states, etc.

I will provide you with a reasonably simple approach as well as the code to work out your own examples. It is in general straightforward to add complexity to your process and still be able to use the provided code.

In the above illustration, two different scenarios are provided. Both scenarios are represented by the same transition probability matrix. This is not a requirement, but it makes the example simpler. Policy 1 mimics an “annual subscription model” with a subscription value of 100 per annum, an upsell value of 20 per annum and a (one-time) loss of value by churn of -100. Policy 2 represents a “monthly subscription model” with a subscription value of 10 per month, an upsell value of 2 per month and a (one-time) loss of value by churn of -60. I would like to know what the overall expected value is for each of the two scenarios.

However, Houston, we (may) have a problem here. Remember that once we end up in the “Churn” state, we are stuck in that state (i.e., we don’t “un-churn” … at least not in this process). From a transition-probability-matrix perspective, once we are in the Churn state, every new iteration transitions back to the same state. In fact, as already discussed, the Churn state acts as an absorbing state, making our customer life-cycle Markov chain an absorbing Markov chain. In this particular case, we would like to assign a one-time penalty (i.e., negative reward) to the Churn state and then end the process. Of course, this may not always be the case, but here it is. From a (state) transition matrix perspective we have created our own equivalent of “Groundhog Day”, as we keep returning to the same state and thus might end up multiplying the churn penalty by the number of iterations it takes to reach convergence of the overall system, unless we are careful. We have two solutions: (1) choose an appropriately small churn penalty that, with a given discount factor (let’s call it γ), ensures that the penalty quickly converges to a reasonable figure (yeah … not the nicest solution, but it could work), or (2) introduce an End state with zero reward/penalty, into which the Churn state transitions with probability 1, thus pushing the absorbing state away from the Churn state, which is convenient for the valuation estimation process. This will ensure that, as we end up in the Churn state, its penalty is only counted once. This will be an important consideration when we commence value iteration for a Markov Decision Process. So, our above illustration needs a bit of revision and gets to look like this,

and our Python implementation of the states, transition matrix and reward vector will look like this,

# Customer life-cycle process (simple)
# Define States


states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn',
    4 : 'End'
}


c = 0.05      # Churn likelihood
u = 0.20      # Upsell likelihood


#Transition probability matrix:
T = np.array([
    [0.00, 1.00, 0.00, 0.00, 0.00],
    [0.00, 1-u-c, u,    c,   0.00],
    [0.00, 1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 1.00],    # Churn transition to End State
    [0.00, 0.00, 0.00, 0.00, 1.00]])   # End state

If you run the 5-state transition matrix through the above Python “steady_state()” function, you get π = (0 0.18 0.04 0.01 0.77), where the last two states (i.e., the Churn- & End-state) together correspond to what was previously a single state. Mapping it back to the 4-state model we have π = (0 0.18 0.04 0.78), which is very close to the “true” 4-state model above (i.e., with the churn-state also being the end-state), π = (0 0.19 0.04 0.77). From π we see that we have a steady-state likelihood of roughly 77% of ending up in the absorbing state in our customer life-cycle process.

For value estimation purposes, we would like to ignore the absorbing state and renormalize the steady-state vector π. Thus, π’ = (0 0.19 0.04 0.01 0)/sum((0 0.19 0.04 0.01 0)), which results in a renormalized steady-state vector π’ = (0, 0.79, 0.17, 0.04, 0) that we will use to assess the long-run average value of our customer life-cycle process.
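
For completeness, in case you do not have the “steady_state()” helper handy (my own version is in the GitHub repository listed under “For further study”), a minimal sketch of such a function could look like the code below. Note that for a chain with absorbing states the result depends on the stopping criterion ε, since in the true limit all probability mass ends up in the absorbing state(s).

import numpy as np

def steady_state(pi0, T, epsilon = 0.01, max_iter = 1000):

# STEADY-STATE (APPROXIMATE) OF A MARKOV CHAIN BY REPEATED APPLICATION OF T.
# pi0      : Initial state distribution of dim (n,).
# T        : The transition probability matrix of dim (n,n).
# epsilon  : Stop when the largest per-iteration change falls below this value.
# max_iter : Safety cap on the number of iterations.

    pi = np.array(pi0, dtype = float)
    for _ in range(max_iter):
        pi_next = pi.dot(T)
        if np.max(np.abs(pi_next - pi)) < epsilon:
            return pi_next
        pi = pi_next
    return pi

The renormalization step described above is then just a matter of zeroing out the End-state entry of π and dividing by the sum of the remaining entries.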

So let’s do a bit of a random walk across our decision process as depicted above. I will make use of repeated random sampling (also called a Monte Carlo simulation among friends) on my decision process. The transition matrix will bring me from one state to another at the rate provided by the respective transition probability matrix (T). The reward vector (R), see the above illustration, provides the reward expectation for each state (e.g., R = [0, 100, 20, -100, 0] for Policy 1). As the repeated random sampling progresses and the various decision process states are visited, we add up the respective states’ rewards until we end up in the Churn state, which ends the simulation. The simulation is then repeated, and the average of all the simulated process values provides our expected overall value for the decision process. We have the option to discount future values with gamma (γ < 1);

The (discounted) value of a single simulated walk is then V = Σ_{n=0}^{n*} γ^n ∙ R(s_n), with n* = min{ N, n @ EndState }, where N is the maximum allowed number of simulation time steps and “n @ EndState” is the time step where our simulation ends up in the EndState, terminating the simulation. This principle is similar to financial net present value calculations that value the present more than the future at, for example, a weighted average cost of capital (WACC) or discount rate. It also turns out to be mathematically convenient to include a discount rate, as it ensures that our value iteration converges.

The Python code for the described simulation is provided here;

def mc_random_walk_reward(states, state0, T, R, gamma, tot_steps, EndState):

# EXPECTED POLICY & VALUE ASSESSMENT VIA RANDOM WALKS.
# states    : Dictionary with the names of the n states being studied.
# state0    : Index of the initial state.
# T         : The transition probability matrix of dim (n,n).
# R         : The reward array of dim (n,), e.g., can have several columns depending
#             on the policies.
# gamma     : Value discount factor, < 1.0.
# tot_steps : Maximum number of time step iterations unless EndState is
#             encountered.
# EndState  : The end-state (name) that stops the simulation before tot_steps is
#             reached.

    n = tot_steps
    stp = 0
    start_state = state0
    path = [states[start_state]]
    value_state = R[state0]*(gamma**0)

    prev_state = start_state

    i = 1
    while n:
        curr_state = np.random.choice(list(states.keys()), p = T[prev_state])
        value_state += R[curr_state]*(gamma**i)    # Add the discounted reward of the visited state
        path += [states[curr_state]]
        prev_state = curr_state
        i += 1
        n -= 1
        if states[prev_state] == EndState:
            stp = n + 1    # Remaining iteration budget when the end-state is reached (0 if never reached)
            break

    return (path, value_state, stp)

The transition matrix remains the same; we thus only need to define the reward vectors;

# Reward vector for Policy 1 - Annual Subscription Model.

R1 = np.array([
    [0],
    [100],
    [20],
    [-100],
    [0]])

# Reward vector for Policy 2 - Monthly Subscription Model.

R2 = np.array([
    [0],
    [10],
    [2],
    [-60],
    [0]])

Thus, we are ready to run the decision process valuation;

j = 0
stop = 10000 # Total number of Monte Carlo simulations


R = R2       # Use R1 for Policy 1 and R2 for Policy 2 (the output below is for Policy 2).

df_value = []
df_time = []


while True:
    path_i, value_i, time_i = mc_random_walk_reward(states, 0, T, R, 0.90, 100, 'Churn')
    df_value.append(value_i)
    df_time.append(time_i)
    j += 1
    if j == stop:
        break

unit_time = 1 # 1 month in 1 time unit iteration.

print('Expected Total Value of Policy 2: ', np.round(np.mean(df_value)/unit_time,0))
print('Average time to Churn of Policy 2: ', np.round(np.mean(df_time),0))

output>> Expected Total Value of Policy 2:  42.0
output>> Average time to Churn of Policy 2:  77.0

The expected total value of Policy 2 comes out at ca. 42. Applying the above code to Policy 1 gets us an expected total value of 46, making Policy 1 a bit more attractive than Policy 2. This provides a fairly easy way to get an idea about a given decision process’s value. As we set a maximum number of time-step iterations (e.g., 100 in the code snippet), it is good to check that the average time to churn over the total number of Monte Carlo simulations is less than this number (e.g., 77 < 100 in the above code snippet). It is also wise to play a bit with the maximum number of steps to check whether the expected total value of your policy changes significantly.

It should be noted that as the Monte Carlo simulation of the Markov Chain terminates upon Churn, the churn penalty is only accounted for once as is appropriate in this particular example. In other words, for this part we would not strictly speaking require the fifth state to end the process.

If we compare the expected value of our process of 42.0 (Policy 2) with the Value Iteration algorithmic approach (more on that below), used on Markov Decision Processes, we find steady-state state values of V[Policy 2] = [ 42.2, 46.9, 44.2, -60, 0] = [Conversion, Retention, Upsell, Churn, End]. With the End-state representing our absorbing state. Using our renormalized steady-state state π’ = (0.00, 0.79, 0.17, 0.04, 0.00), we find that our long-run average value is

G = V ∙ π’ = 42.2 ∙ 0.00 + 46.9 ∙ 0.79 + 44.2 ∙ 0.17 + (-60) ∙ 0.04 + 0 ∙ 0.00 = 42.2

This is in pretty good agreement with the Monte Carlo simulations of our customer life-cycle process, and even better … a much faster way of getting the long-run value. However, if you are dealing with one (or several) absorbing states, some caution is needed in how to compensate for those. In general, I like to run a Monte Carlo process simulation just to ensure that my value iteration extraction logic is correct.
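
As a quick sanity check, the long-run average value G quoted above can be reproduced with a simple dot product between the state values and the renormalized steady-state vector;

import numpy as np

V        = np.array([42.2, 46.9, 44.2, -60.0, 0.0])   # Steady-state state values for Policy 2 (value iteration)
pi_prime = np.array([0.00, 0.79, 0.17, 0.04, 0.00])   # Renormalized steady-state vector from above

G = np.dot(V, pi_prime)
print('Long-run average value G: ', np.round(G, 1))   # ca. 42, in line with the Monte Carlo estimate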

Decision process optimization … Dual policy and value iteration.

We are often in situations where we need to evaluate different assumptions or policies in our decision process. Imagine you are looking at a subscription process, as shown below, where you have broken down the customer life-cycle into 3 parts (i.e., states or stages): (1) start of subscription, (2) average life-time and (3) long-time subscriptions. You are contemplating two different policies; Policy 0 (white arrows): no churn intervention, and Policy 1 (orange arrows): churn intervention measures at each stage in the subscription process. After a churn intervention, at a given state, your system will treat the customer as a new customer (note: this might not be the smartest thing to do, but it is the easiest to illustrate).

The decision process for such a customer life-cycle process is illustrated below. I have deliberately not added an end-state after churn in this example (i.e., strictly speaking, once we end up in the Churn state we are in our “Groundhog state” of “perpetual” churn … which is, by the way, why we like to call it an absorbing state). Adding an end-state would add complexity to the Markov chain, and the purpose of this example is to show the policy and value optimization in a situation where we have 2 policies to consider in our decision process.

You would like to know what the value is at each state, given the observed churn. Moreover, you also need to know when (time-wise) and where (state-wise) it might become important to shift from one policy to the other. So let’s (Python) code the above customer loyalty decision process;

# Customer life-cycle process - dual value and policy iteration
# Define States

# Subscription states

states = {
    0 : 'Start',
    1 : 'Average life-time',
    2 : 'Long-time',
    3 : 'Churn'
}


# Transition probability matrices (one per policy):


c = 0.05 # Churn rate; try to increase this in steps of 0.05
         # and you will see that values and policies begin to change as
         # the churn rate increases

# Policy 0 - No churn intervention measures.
policy_1 = np.array([
    [0.00, 1-c, 0.00, c],
    [0.00, 0.00, 1-c, c],
    [0.00, 0.00, 1-c, c],
    [0.00, 0.00, 0.00, 1.00]
    ])


# Policy 1 - Churn intervention measures (the customer is treated as new again).
policy_2 = np.array([
    [1.00, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00]
    ])


T = np.array([policy_1, policy_2])   # Transition probability matrices stacked per policy


# Reward array for the two policies
# 1st column reflects rewards in Policy 0
# 2nd column reflects rewards in Policy 1

R = np.array([
    [0, 0],
    [0, 1],
    [4, 2],
    [-1, -1]])


print(states)
print(T)
print(R)


In order for us to get the optimal decision process value and respective policy we are making use of an iterative computing algorithm called Value Iteration. The value iteration procedure provides the values of the various states of our decision process with known transition probabilities and their corresponding rewards. The same algorithm also provides for the optimal policy matching the optimum values (i.e., Policy Iteration). I have implemented the Value and Policy Iteration algorithm in the code below.

def mdp_valueIteration(states,T,R, gamma = 0.90, epsilon = 0.01):

# VALUE AND POLICY ITERATION
# states  : Array which includes the names of the n states being studied. 
# T       : The transition probability matrix of dim (n,n)
# R       : The reward array of dim (n,), e.g., can have several columns depending
#           on the number of policies.
# gamma   : Value discount factor, < 1.0
# epsilon : Provides the convergence criteria.


    # Initialize V_0 to zero
    values = np.zeros(len(states))

    ctime = 0
    
    # Value iteration
    # Continue until convergence.

    while True:

        # To be used for convergence check
        oldValues = np.copy(values)

        values = np.transpose(R) + gamma*np.dot(T,values)   # Value iteration step
 
        policy = values.argmax(0)   # Take the best policy.
        values = values.max(0)      # Take the highest value
        
        ctime +=1
        
        # Check Convergence
        if np.max(np.abs(values - oldValues)) <= epsilon:
            break
         
    return(values, policy, ctime)        

All we have to do is to call the above “mdp_valueIteration” function with the transition probability matrix and respective reward vectors for Policy 0 and Policy 1.

# Call ValueIteration function and get optimum value per state and
# Optimum Policy strategy per state.

values, policy, time = mdp_valueIteration(states,T,R,gamma = 0.90)


print('Optimum State Values: ', np.round(values,0))
print('Optimum Policy      : ', policy)
print('Optimum Time steps  : ', time)

# Output for Churn rate c = 0.05

output>> Optimum State Values: [17  21  25  -10]
output>> Optimum Policy      : [ 0   0   0    0]

# Output for Churn rate c = 0.3

output>> Optimum State Values: [ 0   1   4  -10]
output>> Optimum Policy      : [ 1   1   0    0]

We have 4 states defined in our subscription decision process: “Start of subscription” (S0), “Average life-time subscription” (S1), “Long-time subscription” (S2) and “Churn” (S3). Not surprisingly, we find that the highest value is delivered by the subscribers in the long-time subscription category. For relatively low churn rates, the policy without churn intervention measures (i.e., Policy 0) is the most appropriate. As the churn rate increases, we observe that the best policy (in terms of value optimization between the two policies) for State 0 and State 1 is to have a churn intervention policy (i.e., Policy 1) in place.

In summary, with a more detailed analysis of our dual-policy customer life-cycle decision process we find the following process dynamics as a function of the churn rate.

For comparison, I have also tabulated the value outcome in the case that we do not consider a churn mitigation policy.
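
If you would like to reproduce this kind of churn-rate sweep yourself, a minimal sketch (simply rebuilding the two policy transition matrices for a range of churn rates and re-running the “mdp_valueIteration()” function from above, with the states and reward array R as defined earlier) could look like this;

import numpy as np

# Sweep the churn rate and record the optimal state values and policy choice per state.
for c in np.arange(0.05, 0.55, 0.05):
    policy_1 = np.array([              # Policy 0 - no churn intervention measures
        [0.00, 1-c, 0.00, c],
        [0.00, 0.00, 1-c, c],
        [0.00, 0.00, 1-c, c],
        [0.00, 0.00, 0.00, 1.00]])
    policy_2 = np.array([              # Policy 1 - churn intervention (back to Start)
        [1.00, 0.00, 0.00, 0.00],
        [1.00, 0.00, 0.00, 0.00],
        [1.00, 0.00, 0.00, 0.00],
        [0.00, 0.00, 0.00, 1.00]])
    T = np.array([policy_1, policy_2])
    values, policy, ctime = mdp_valueIteration(states, T, R, gamma = 0.90)
    print('c =', np.round(c, 2), '| state values:', np.round(values, 1), '| policy:', policy)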

Using the provided code and examples, it is straightforward to consider more complex decision process applications. The code I have provided is very general; your work will be in defining your process’s underlying Markov chain with its transition probabilities and the respective decision process rewards (positive as well as negative). Once that is set up, it is simply a matter of referencing the provided code functions.

The hidden Markov model … An intro.

We are often in the situation that we get customer feedback (observable) without being completely sure which underlying (“hidden”) processes may have influenced the customer to provide that particular feedback. A customer may interact in a particular way with our web store (again, observable). That customer interaction is likely also to be influenced by hidden processes and behaviors of the underlying system dynamics (e.g., user interface architecture, front-end interactions, back-end interactions, etc.).

Let’s add some observable “happiness” and “unhappiness” to our customer retention and life-cycle process from the beginning of this blog.

We have already defined the transition probability matrix above. We also need to write the observable sentiments associated with the states into matrix form (this is called the emission matrix in Markovian).

To visualize this a bit better we observe;

The following code function provides the likelihood of observing a sequence of observable states (e.g., Happy, Happy, Not Happy, Happy, Not Happy, Not Happy, …) with a given hidden Markov chain describing the underlying process that may influence the observable states.

def hmm_find_prob(seq, T, E, pi):

# HIDDEN MARKOV CHAIN LIKELIHOOD OF A SEQUENCE OF OBSERVABLES (FORWARD ALGORITHM).
# seq  : The observable sequence (list of observable state indices).
# T    : Transition probability matrix (hidden states).
# E    : Emission matrix (observable states).
# pi   : The steady-state of the hidden Markov chain, used as initial distribution.

    start_state = seq[0]
    alpha = pi*E[:,start_state]                   # Initialize the forward probabilities

    for i in range(1, len(seq)):
        curr_state = seq[i]
        alpha = (alpha.dot(T))*E[:,curr_state]    # Propagate one step and weight by the emission likelihood

    prob = np.sum(alpha)                          # Likelihood of the full observable sequence

    return prob

In order to estimate the likelihood of observable sentiment states (i.e., Happy and Not Happy) of our hidden Markov process (illustrated above), we need to code the matrices and identify the sentiment sequence we would like to assess the likelihood of.

# Setting up the pre-requisites for a Hidden Markov Model


hidden_states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn'
}


observable_states = {
    0: 'Happy',
    1: 'Sad'
}


# Note that even if the hidden and observable states are not directly used
# in this example, it is always good discipline to write them out as they
# tie into both the transition and the emission matrix.


# Transition Probability Matrix (Hidden States)
T = np.array(
    [[0.00, 1.00, 0.00, 0.00],
     [0.00, 0.75, 0.20, 0.05],         # Note we assume 5% churn here!
     [0.00, 1.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 1.00]])


# Emission Matrix (Observable States)
E = np.array(
    [[1.0, 0.0],
     [0.7, 0.3],
     [0.8, 0.2],
     [0.0, 1.0]])


pi0 = np.array([1, 0, 0, 0])
pi = steady_state(pi0, T)          # Steady State of the underlying Markov chain.


seq = [0,0,0,0,0]       # Happy, Happy, Happy, Happy, Happy

print('Likelihood of having 5 consecutive happy experiences: ', np.round(100*hmm_find_prob(seq, T, E, pi),0), '%')


seq = [1,1,1,1,1]       # Not Happy, Not Happy, Not Happy, Not Happy, Not Happy

print('Likelihood of having 5 consecutive negative experiences: ', np.round(100*hmm_find_prob(seq, T, E, pi),0), '%')

output>> Likelihood of having 5 consecutive happy experiences:  4.0 %
output>> Likelihood of having 5 consecutive negative experiences:  78.0 %

We have to draw the conclusion that there is a much higher chance of having 5 consecutive negative experiences (78%) than 5 consecutive positive experiences (4%). This should not be too surprising, as the steady-state (long-term) likelihood of ending up in the churn state of our customer life-cycle process with a 5% churn rate is 77%. As we assess that a customer in the churn state is 100% likely to be unhappy, it is clear that unhappiness would be the overwhelming expectation for the defined process. It should be noted that, as an alternative, we could also run the decision process replacing monetary value with sentiment value and optimize our decision process for customer sentiment.

If I could lower the churn rate in my decision process down to 1% (from 5%), increasing the retention rate to 79% and keeping the upsell rate at 20%, I would have an 18% chance of 5 consecutive happy experiences and only a 4% chance of 5 consecutive unhappy experiences.

It is not that difficult to see how these principles can be applied to many different settings in the digital domain interfacing with customers.

Wrapping it up.

In this blog, I have provided the reader with some insights into how to apply Markov Chains and Markov Decision Processes to important business applications. The Python code snippets you have met throughout this blog can directly be used in much more complex decision processes or deeper dives on Markov chains in general.

I have found it difficult to find reasonably comprehensive examples of how we can move from Markov chains (or models) to Markov Decision Processes as they apply to a data-driven business environment. You can find some good examples of applied Markov chains (see “For further study” below; I really recommend Normalized Nerd’s YouTube videos for an introduction). There is, however, very little material bridging the gap from Markov models to decision processes. While the math behind both Markov chains and Markov Decision Processes is not very difficult, it is frequently presented in a way that makes it incomprehensible unless you have an advanced degree in mathematics. When you write it out for simpler Markov chains or decision processes, you will see that it is not that difficult (I am still pondering giving a few more examples in this blog).

I recommend that you always run a couple of Monte Carlo simulations on your Markov chain and decision process, and compare the outcome with the algorithmic approaches you might apply, such as steady-state derivation, value and policy optimization, etc. If your process contains absorbing states, be extra careful about how they should be reflected in your overall process valuation and optimization. For example, if you consider churn in a customer life-cycle process, this would be an absorbing state, and unless you are careful in your design of the underlying Markov chain, it may overshadow any other dynamics of your process. A Monte Carlo simulation might reveal such issues. Also, start simple, even if your target decision process may be very complex. This will allow you to understand and trust changes as you add complexity to your process.

I hope that my code snippets likewise will make this field more approachable. Anyway, if you want to deep dive into the math as well, you will find some good starting points in my literature list below.

“We should not fault an agent for not knowing something that matters, but only for having known something and then forgotten.”, The unremembered.

Acknowledgement.

I greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this Blog. Also, many of my Deutsche Telekom AG, T-Mobile NL & industry colleagues in general have in countless ways contributed to my thinking and ideas leading to this little Blog. Thank you!

For further study.

Romain Hollander, “On the policy iteration algorithm for PageRank Optimization”, MIT Report (June 2010).

Kim Kyllesbech Larsen, “Data-driven decision making … what’s not to like about that?”, LinkedIn Article (November 2021).

On Markov Chains in particular I recommend Normalized Nerd‘s lectures (super well done and easy to grasp, respect!). I recommend to have a Python notebook on the side and build up the lectures there. In any case if this is new to you start here; “Markov Chains Clearly Explained! Part – 1” (There are 7 parts in total).

Somnath Banerjee, “Real World Applications of Markov Decision Process”, towardsdatascience.com, (January 2021). Source of examples that can be worked out with the Python codes provided in this blog.

Ridhima Kumar, “Marketing Analytics through Markov Chain”, towardsdatascience.com, (January 2019). Source of examples that can be worked out with the Python codes provided in this blog.

Richard Bellman, “The theory of dynamic programming”, The RAND Corporation, P-550 (July 1954). A classic with the strength of providing a lot of intuition around value and policy iteration.

Dorsa Sadigh, Assistant Professor at Stanford University’s Computer Science Department, “Markov Decision Processes – Value Iteration | Stanford CS221”, (Autumn 2019).

Neil Walton, Reader at University of Manchester’s Department of Mathematics, “Algorithms for MDPs” (2018). Very good lectures on Markov Decision Processes and the algorithms used. They are somewhat mathematical but the examples and explanations given are really good (imo).

Rohan Jagtap, “Understanding Markov Decision Process (MDP)”, towardsdatascience.com, (September 2020). Provides an intuitive as well as mathematical treatment of value iteration. For the mathematically inclined this is a very good treatment of the topic.

A. Aylin Tokuc, “Value iteration vs Policy iteration in Reinforcement learning”, (October 2021). Provides a nice and comprehensible overview of value and policy iteration.

Paul A. Gagniuc, “Markov Chains – From theory to implementation and experimentation“, Wiley, (2017). Lots of great examples of applied Markov Chains.

Richard J. Boucherie & Nico M. van Dijk, “Markov Decision Processes in Practice“, Springer (2017). Great book with many examples of MDP implementations.

Ankur Ankan & Abinash Pranda, “Hands-on Markov Models with Python”, Packt (2018). Provides many good ideas and inspiration of how to code Markov chains in Python.

Brian Hayes, “First Links in the Markov Chain”, American Scientist (March-April 2014). Provides a very easy to read and interesting account of Markov Chains.

Kim Kyllesbech Larsen, “MarkovChains-and-MDPs“, The Python code used for all examples in this blog, (December 2021). The link does require you to have a Github account.

Deep Dive – Markov chain & decision process fundamentals.

Andrei Andreevich Markov (1856-1922) developed his idea of states chained (or connected) by probabilities after his retirement at the old age of 50 (i.e., it is never too late to get brilliant ideas). This was at the turn of the 20th century. One of the most famous Markov chains, which we all make use of pretty much every day, is the pages of the world wide web, with 1.5+ billion indexed pages as designated states and maybe more than 150+ billion links between those web pages, which are equivalent to the Markov chain transitions taking us from one state (web page) to another state (another web page). Google’s PageRank algorithm, for example, is built upon the fundamentals of Markov chains. The usefulness of Markov chains spans many fields, e.g., physics, chemistry, biology, information science/theory, game theory, decision theory, language theory, speech processing, communications networks, etc.

There are a few concepts that are important to keep in mind for Markov Chains and Markov Decision processes.

Concepts.

Environment: is the relevant space that the Markov chain operates in. E.g., could be the physical surroundings of a logistic storage facility where a robot is moving around.

State: A state is a set of variables describing a system that does not include anything about its history (the physics definition). E.g., in classical mechanics the state of a point mass is given by its position and its velocity vector (i.e., where it is and where it goes). It is good to keep in mind that the computer science meaning of state is different, in the sense that a stateful agent is designed to remember preceding events (i.e., it “remembers” its history). This is however not how a state for a Markov chain should be understood. A sequence (or chain) of random variables {S0, S1, S2, … , Sn}, describing a stochastic process, is said to have the Markov property if P[ S(t+1) | S(t), S(t-1), … , S(0) ] = P[ S(t+1) | S(t) ],

that is, a future state of the stochastic process depends only on the state immediately prior and on no other past states. To make the concept of a state a bit more tangible, think of a simple customer life-cycle process with (only) 4 states considered; (S0) Conversion, (S1) Retention, (S2) Upsell and (S3) Churn. Thus, in Python we would define the states as a dictionary,

# Example: Customer life-cycle process (simple)
# Defining States


states = {
    0 : 'Conversion',
    1 : 'Retention',
    2 : 'Upsell',
    3 : 'Churn'
}

In our example, the state space is a vector of dimension 4×1, either represented by S = (0 1 2 3) or alternatively S = (Conversion, Retention, Upsell, Churn). More generally, S is an n×1 vector for n states.

If a reward or penalty has been assigned to the end-state, that terminates your decision or reward process, it is worth being extra careful in your Markov chain design and respective transition probability matrix. You may want to introduce a zero-value end-state. Though, it will of course depend on the structure of the decision process you are attempting to capture with the Markov Chain.

Transition: Describes how a given state transitions from one state s to another s’ (which can be the same state) in a single unit of time increment. The associated (state) transition probability matrix T provides the probabilities of all state-to-state transitions for a Markov chain within a single unit of time. T is a square stochastic matrix with its dimension defined by the number of states making up the Markov chain (i.e., for n states, T is an n×n matrix). We write the state transition, facilitated by T, as:

s(t+1) = s(t) ∙ T per unit time step increment (iteration).

Action: an action a is defined as a choice or decision taken at the current unit of time (or iteration) that will trigger a transition from the current state into another state in the subsequent single unit of time. An action may be deterministic or random. The consequence of an action a (a choice or decision) is described by the (state) transition matrix. Thus, the choice of an action is the same as a choice of a state transformation. The set of actions for a given Markov chain is typically known in advance. Actions are typically associated with what is called a Markov Decision Process. Choosing an action a at time t, in a given state s, transitioning to state s’, may result in a reward R(s, a, s’).

Policy: A policy represents the set (distribution) of actions associated with a given set of states (representing the Markov chain) and the respective (state) transition probability matrix. Think about a customer life-cycle process with two policies, (1) no churn remedies (or actions) and (2) churn mitigating remedies (or actions). Policies can differ only slightly (i.e., different actions on a few states) or be substantially different. It is customary to denote a policy as π(a | s), which is the math way of saying that our policy is a distribution of actions conditional on given states,

π is a function such that π : S → A, with π(a | s) = P[ A(t) = a | S(t) = s ].

A policy, strategy or plan, specifies the set of rules governing the decision that will be used at every unit time increment.

Reward: Is defined for a given state s and is the expected reward value over all possible states that one can transition to from that state. A reward can also be associated with a given action a (and thus may also be different for different policies π). The reward is received in state s subject to action a transitioning into state s’ (which can be the same state as s). Thus, we can write the reward as R(s, a, s’) or, in case the reward is independent of the state that is transitioned to, R(s, a).

The concept of reward is important in so called Markov Reward Processes and essential to the Markov Decision Process. It is customary (and good for convergence as well) to introduce a reward discount factor 0 ≤ γ ≤ 1 that discounts future rewards with γ^t. Essentially attributing less value (or reward) to events in the future (making the present more important). A positive reward can be seen as an income and a negative reward as a cost.

Thus, a Markov Chain is defined by an (S, T)-tuple, where S are the states and T the (state) transition probability matrix facilitating the state transition. A Markov Reward Process is defined by an (S, T, R, γ)-tuple, with the addition of R representing the rewards associated with the states and γ the discount factor. Finally, a Markov Decision Process can be defined by an (S, A, T, R, γ)-tuple, with A representing the actions associated with the respective states.

The Markov Chain.

The conditional probability of being in a given state S at time t+1 (i.e., S(t+1)) given all the previous states {S(t=0), S(t=1), …, S(t=t)} is equal to the conditional probability of state S(t+1) only considering (conditioned upon) the immediate previous state S(t),

∀ S(t) ∊ Ω, i.e., each S(t) is a given state at time t that belongs to the environment Ω the Markov chain exists in.

In other words, the state your system is in now, S(t), depends only on the previous state you were in one unit time step ago, S(t-1). All other past states have no influence on your present state. Or, said in another way, the future only depends on what happens now, not on what happened prior.

T facilitates the state transition S(t) = i → S(t+1) = j, with the transition likelihood p_ij = P[ S(t+1) = j | S(t) = i ] representing the probability of transitioning from state i to state j upon a given action a taken in state i. We will regard T as an (n × n) transition matrix, describing how states map to each other,

where the rows represent the state we are currently in and the columns the states it may be mapped to. Moreover, as we deal with probabilities, each row needs to add up to 1, i.e., Σ_j p_ij = 1 for every row i.

Let’s simplify a bit by considering a 4-state Markov chain;

with the following Markov chain 4-state example,

with the following transition probability matrix T,
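
For reference, this is the same 4-state transition probability matrix we used in the hidden Markov model section above; written out in numpy it reads,

import numpy as np

# 4-state customer life-cycle chain: Conversion (0), Retention (1), Upsell (2), Churn (3).
T = np.array(
    [[0.00, 1.00, 0.00, 0.00],
     [0.00, 0.75, 0.20, 0.05],
     [0.00, 1.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 1.00]])   # The Churn state is absorbing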

From the above illustration we have that our states (i, j) ∈ {Conversion (0), Retention (1), Upsell (2), Churn (3)}. Thus, T(1,1) = 0.75 is the probability that an action in the Retention state results in ending up in the same Retention state. Note that the first row (first column) is designated 0, the second row (column) 1, etc. As we sum the 2nd row, T(1, 0 → 3), we get 1 (i.e., 0.00 + 0.75 + 0.20 + 0.05 = 1), as required.

Let us consider the following initial condition at time t = 0 for the above Markov model,

s0 = ( 1 0 0 0 ), i.e., we are starting out in the Conversion (initial) state s0.

s1 = s0 ∙ T = ( 0 1 0 0 ), i.e., at the first time step (iteration) we end up in the Retention state.

s2 = s1 ∙ T = s0 ∙ T ∙ T = s0 ∙ T^2 = ( 0.00 0.75 0.20 0.05 ). So already at the 2nd time step (iteration) we have a 75% likelihood of again ending up in the Retention state, a 20% likelihood of ending up in the Upsell state, as well as a 5% chance that our customer churns and thus ends the Markov process.

s3 = s2 ∙ T = s0 ∙ T ∙ T ∙ T = s0 ∙ T^3 = ( 0.00 0.76 0.15 0.09 )

s10 = s9 ∙ T = s0 ∙ T^10 = ( 0.00 0.56 0.12 0.32 )

s36 = s35 ∙ T = s0 ∙ T^36 = ( 0.00 0.19 0.04 0.77 )

Eventually, our overall Markov chain will reach steady state, where s ∙ T = s. It is common to use π for the Markov chain steady-state. Thus, we will frequently see π ∙ T = π, reflecting that steady state has been reached (usually within some level of pre-defined accuracy). To avoid confusion with policy mapping, which is often also described by π, I prefer to use π∞ to denote that a steady-state state has been reached.

Within a pre-set accuracy requirement of ε < 0.01, we have that s36 has (approximately) reached the steady state, i.e., s36 ∙ T ≈ s36 and thus π∞ ≈ s36.
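
You can verify these numbers with a few lines of numpy;

import numpy as np

T = np.array(
    [[0.00, 1.00, 0.00, 0.00],
     [0.00, 0.75, 0.20, 0.05],
     [0.00, 1.00, 0.00, 0.00],
     [0.00, 0.00, 0.00, 1.00]])

s0  = np.array([1.0, 0.0, 0.0, 0.0])            # Start out in the Conversion state
s36 = s0.dot(np.linalg.matrix_power(T, 36))     # s36 = s0 . T^36

print('s36           :', np.round(s36, 2))                # ca. ( 0.00 0.19 0.04 0.77 )
print('s36 . T - s36 :', np.round(s36.dot(T) - s36, 3))   # per-entry changes within the ε = 0.01 requirement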

It should be noted (and easy to verify) that introducing a 5th End-state (i.e., splitting up the churn-and-end-state into two states) in our example, will not change the steady-state outcome except for breaking up the churn’s steady-state value (from the 4-state steady-state analysis) into two values with their sum being equal to the 4-state churn value.

Value Iteration.

We start out with a Markov chain characterized by an (S, T)-tuple that describes the backbone of our decision process. We have the option to add actions (it can be a single action as well) and associate rewards with the respective states and actions in our Markov chain. Thus, we expand the description of our Markov chain to that of a Markov Decision Process (MDP), that is, an (S, A, T, R, γ)-tuple (or, for a Markov Reward Process, an (S, T, R, γ)-tuple), with γ being the discount factor (0 ≤ γ ≤ 1). Rohan Jagtap, in his “Understanding Markov Decision Process (MDP)”, has written a great, intuitive and very accessible account of the mathematical details of MRPs and MDPs. Really a recommended read.

We have been given a magic coin that always lands on the opposite face of the previous coin flip, e.g., Head → Tail → Head → Tail → etc. Thus, we are dealing with a 2-state process cycling periodically between the two states (i.e., after 2 tosses we are back at the same face), each state having probability 1 of transitioning to the other. Also, we are given a reward of +2 (R(H)) when we are transitioning into the Head-state (S0) and a reward of +1 (R(T)) when we are transitioning into the Tail-state (S1). We thus have 2 initial conditions: (a) starting with Head and (b) starting with Tail.

How does the long-run (i.e., steady-state) expected value for each of the two states H & T develop over time?

(a) Assume our magic coin’s first face is Head (H), which earns us a reward of R(H) = +2. At the next unit time step we end up in Tail (T) with probability 1 (= P[T|H]) and a reward of R(T) = +1. The next step we are back in Head with probability 1 (= P[H|T]), and so forth. The future value we may choose to discount with γ (and if γ is less than 1, it even guarantees that the value converges). For (b), interchange H with T (and of course the rewards accordingly).

It is good to keep in mind that the reward is banked when in a state, after transitioning into it from the previous state. The value accrued over time at a given state is the present reward R(s) as well as the expected (discounted) reward of the subsequent states. It is customary to start out with zero-value states at t=0, though one could also choose to use the reward vector to initialize the value of the states. So, here it goes,
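
Writing out the first few terms of the discounted sum for initial condition (a), with γ = 0.9 (a quick back-of-the-envelope version of the full calculation), gives

V(H) = R(H) + γ∙R(T) + γ²∙R(H) + γ³∙R(T) + … = 2 + 0.9∙1 + 0.81∙2 + 0.729∙1 + …
     = (2 + 0.9∙1) ∙ (1 + γ² + γ⁴ + …) = 2.9 / (1 − γ²) = 2.9 / 0.19 ≈ 15.3

and, analogously for initial condition (b), V(T) = (1 + 0.9∙2) / 0.19 ≈ 14.7.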

Alright, no, I did not sum all the way to infinity (I wouldn’t have finished yet). I “cheated” and used the ‘mdp_valueIteration()’ function;

# Import numpy and my own Markov chain (MC) & Markov Decision Process (MDP) library
import numpy as np
import mcmdp_v2 as mdp


# States
states = {
    0 : 'Head',
    1 : 'Tail'
}


# Transition Matrix - Magic Coin
T = np.array([[0.00, 1.00],
              [1.00, 0.00]])


# Reward Matrix - Magic Coin
R = np.array([[2], 
              [1]])


pi = np.array([1, 0,]) # Initial state, could also be [0, 1].


# Define the markov chain mc for the MDP value iteration.
mc = mdp.Mdp(states = states, pi = pi, T = T, R = R, gamma = 0.9, epsilon = 0.01)


state_values, expected_total_value, policy, ctime = mc.mdp_valueIteration() # Value iteration on mc


print('Long-run state value V[H]   :', np.round(state_values[0],1))
print('Long-run state value V[T]   :', np.round(state_values[1],1)) 

output>> Long-run state value V[H]   : 15.2
output>> Long-run state value V[T]   : 14.7
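
As a cross-check (a small addition of mine, not part of the mcmdp_v2 library), for a Markov Reward Process with a single policy the long-run state values can also be obtained directly by solving the Bellman equation V = R + γ∙T∙V, i.e., V = (I − γ∙T)⁻¹∙R;

import numpy as np

T = np.array([[0.0, 1.0],
              [1.0, 0.0]])     # Magic coin transition matrix
R = np.array([2.0, 1.0])       # Rewards for Head and Tail
gamma = 0.9

V = np.linalg.solve(np.eye(2) - gamma*T, R)   # Solve (I - gamma*T) V = R

print('V[H], V[T]:', np.round(V, 1))          # ca. [15.3 14.7]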

In general, we have the following value iteration algorithms representing the state-value function as we iterate over time (i),
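
In standard notation (mine, chosen to be consistent with the code above), these update rules read approximately as,

[1]  V_{i+1}(s) = max_a Σ_{s’} T(s, a, s’) ∙ [ R(s, a, s’) + γ ∙ V_i(s’) ]

[2]  V_{i+1}(s) = R(s) + max_a γ ∙ Σ_{s’} T(s, a, s’) ∙ V_i(s’)

[3]  V_{i+1}(s) = R(s) + γ ∙ Σ_{s’} T(s, s’) ∙ V_i(s’)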

Formula [1] describes the general MDP algorithm. Formula [2] is an MDP where the state reward function is independent of actions and the subsequent state s’, and formula [3] describes a Markov Reward Process, where the reward function R is independent of the subsequent state s’. In order to get the value iteration started, it is customary to begin with an initial condition (i.e., i = 0) of V_0 = 0 ∀ s ∊ S, e.g., for a 5-state process V_0 = [0, 0, 0, 0, 0] at i = 0; that is, the initial value of all states in the Markov chain is set to zero.

The long-run steady-state state values are the outcome of iterating the above formulas [1 – 3] until the state values no longer change (within a pre-determined level of accuracy). We can write the long-run steady-state values as V∞[Sj] = lim_{i→∞} V_i[Sj], j = 1 … n,

where V∞[Sj] is the j-th state’s steady-state value and n is the number of states in the underlying Markov chain representing the MDP (or MRP for that matter).

The long-run average (overall) value G in steady state is G = V∞[S] ∙ π∞,

where V∞[S] is the steady-state value vector that the value iteration provided us with. π∞ is the decision process’s underlying Markov chain’s steady-state state.

One of the simpler examples to look at would be a “coin toss” process. In order to make it a bit more interesting, let’s consider an unfair-ish coin to toss around. In the first example immediately below, we assume that we have only 1 action and that the state rewards depend only on the state itself. Thus, we are in the Formula [3] situation above. How we go about the above value-iteration algorithm is illustrated below,

Let us have another look at our customer life-cycle process. We would like to have a better appreciation of the value of each state in the decision-making process. The value iteration approach is provided in the illustration below,
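
To connect this back to the numbers quoted earlier (V[Policy 2] ≈ [42.2, 46.9, 44.2, -60, 0]), here is a minimal sketch that runs the 5-state customer life-cycle process through the single-policy form of the “mdp_valueIteration()” function defined above;

import numpy as np

states = {0 : 'Conversion', 1 : 'Retention', 2 : 'Upsell', 3 : 'Churn', 4 : 'End'}

T5 = np.array([
    [0.00, 1.00, 0.00, 0.00, 0.00],
    [0.00, 0.75, 0.20, 0.05, 0.00],
    [0.00, 1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 1.00],
    [0.00, 0.00, 0.00, 0.00, 1.00]])

R2 = np.array([[0], [10], [2], [-60], [0]])   # Policy 2 - monthly subscription rewards

# mdp_valueIteration() expects one transition matrix per policy and one reward column per policy.
values, policy, ctime = mdp_valueIteration(states, np.array([T5]), R2, gamma = 0.90)

print('State values:', np.round(values, 1))   # ca. [ 42.2  46.9  44.2 -60.   0. ]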

Coding reference.

Kim Kyllesbech Larsen, “MarkovChains-and-MDPs“, The Python code used for all examples in this blog, (December 2021).

How do we feel about AI?

Intro.

This blog is a smaller part of a larger essay “Trust thou AI?” on human trust in AI and what to consider when attempting to create trustworthy AI-based applications, products and services.

We appear to be very positive about Artificial Intelligence, or AI for short. Men in general are more positive than women. Men with young children are much more positive than any other humans (that I have surveyed). It doesn’t seem like Arnold Schwarzenegger, with his Terminator, has done much to make us hold strong negative feelings towards artificial intelligence and what we believe it brings with it. Though one may argue that sentiments towards robots could of course be a different story. And as someone in the audience at one of my talks challenged, maybe people feel the same about coffee machines and e-bikes … and indeed maybe that is the case … but as my surveys never asked people how they feel about consumer goods in general … I don’t know the answer for coffee machines or e-bikes. Though, even if I knew the answer, I do not think it would matter too much for this story about AI and how we feel about it.

So … how do we feel about AI?

In the above chart the choices for the question “How do you feel about AI?” have been aggregated into Negative sentiment: “I hate it”, “It scares me” and “I am uncomfortable with it”; Neutral sentiment: “I am neutral”; and Positive sentiment: “I am comfortable with it”, “I am enthusiastic about it” and “I love it” (note: why you would be scared of or hate your coffee machine or e-bike sort of boggles my mind … but again, I don’t know the answer to that one).

On average we are fairly comfortable with AI. Or, more accurately, we feel comfortable with what we understand AI to be (and that may again depend very much on who and what we are, as well as context). However, this only scratches the very surface of how we feel about AI. It is, as we will see, complicated, as they say on Facebook.

Let’s start with the “Bad”, or negative, feelings so we can end with the good stuff. The negative feelings towards AI are depicted in the chart below. Here we can see that the negative sentiment for women ends up at 32%, with 11% feeling scared by AI and 18% feeling uncomfortable with AI. For men, feelings towards AI are in general substantially less negative than those expressed by women. Only 4% of men express that AI scares them and 13% that AI makes them uncomfortable.

This difference may simply reflect that men may have more difficulty admitting fear or being scared (apologies for the gender stereotyping). They may substitute an emotionally charged negative feeling with another description that is less emotional, or maybe opt for something more neutral. It may also be that men are in general more positive towards technology in comparison with women. This is by no means an explanation, and it begs more interesting questions and consequences for technology design in general if corporations intend to appeal to all genders. Nevertheless, these results are consistent with the existing understanding of sex role orientation and the resulting degree of expressed fear levels. Women in general report significantly higher fear levels in tests than men. It should be noted that while “being scared” or “feeling uncomfortable” may characterize different fear levels, “hate” is a different emotion than fear.

The “Good”, or positive, feelings: men feel more positive towards AI (48%) than women (27%), as shown below. In the category “I am comfortable with it”, almost twice as many men agree. Also, substantially more men appear to be enthusiastic about AI compared to women. Just as it is important for AI-based use cases to understand that more women tend to have negative feelings towards AI than men, it is also worth keeping an eye out for designs rooted in male-driven, overly optimistic designs or features that may not appeal to women in the same way.

One of the observations that have come out of conducting these “how do you feel about AI?” surveys (over the last two years) is that there are substantial gender differences (a divide may be more accurate) in how we perceive AI. This needs to be an important consideration in designing AI-based products that will meaningfully appeal to both women and men (and anyone in between, for that matter). Given that most AI product developers today are male, it might be good for them to keep in mind that they are not only developing products for themselves. They actually need to consider something that will be appealing to all genders.

The chart below reflects the AI sentiment of women (808) and men (815) from a total of 1,623 respondents across 4 surveys conducted in 2017 and 2018. Most of those results have been reported individually in my past blogs. So … women feel in general significantly less positive towards AI compared to men. Women have a slightly more negative sentiment towards AI than positive. Overall there are more women than men who rank their feelings as neutral. Men with children (younger than 18 years of age) have the most positive feelings towards AI of all respondents. Unfortunately, the surveys that I have carried out so far do not allow for estimating the age of the youngest child or the average age of the children. Women’s sentiment towards AI does not appear (within the statistics) to depend on whether or not they have children younger than 18 years of age. Overall, I find that;

  • Women appear to be far less positive about AI than men.
  • Men with young children are significantly more positive than men and women in general.
  • Contrary to men, women’s sentiment towards AI does not appear to depend on their maternal status.

So why are we so positive … and men clearly more than women … about AI? This despite AI being likely to have a substantial impact (to an extent it already has) on our society and ways of living and working (e.g., privacy, convenience, security, jobs, social networks, family life, new consumption, policies, etc.). The median age of the respondents was about 38 years old, although respondents with children (less than 18 years of age) were on average about 33 years old. In the next 10 years most will be less than 50 years old and should still be in employment. In the next 20 years, most will be less than 60 years old and also still very much in active employment. Certainly, the young children of the respondents will over the next 20 years enter the workplace. A workplace that may look very different from today due to our aggressive pursuit of intelligent automation and autonomous systems.

Is the reason for the positive outlook on AI that the individual (particularly the male kind) simply does not believe the technology to be a possible existential threat to the individual’s current way of living?

If you think about your child or children, how do you believe AI will impact their future in terms of job and income? … While you think about this … I will give you the result of one of the surveys (shown below) that I conducted in September 2018.

In terms of believing that the future (for their offspring) will be better than today, women are less positive than men. Across genders, fewer are of the opinion that the opportunities of their children (whether they are below 18 or above) will remain the same as today. Women appear to have a more negative outlook for their children than men. There is little difference in men’s beliefs in their child’s or children’s future opportunities irrespective of the age of their children. Women with children under 18 years of age are significantly less optimistic about the outlook for their children’s opportunities compared to women with older children.

From the work by Frey & Osborne (2013) on how jobs are becoming susceptible to what they call computerization, there is plenty of room for concern about individuals’ job and thus income security. According to Frey and Osborne, 47% of total US employment is at risk within a decade or two. A more recent PwC economic analysis estimates that the impact of algorithmic & AI-based automation across all industries will be in the order of 20% by the late 2020s and 30% by the late 2030s (Hawksworth & Berriman, 2018). Job categories served by the low and medium educated will be hit the hardest. Women are likewise expected to be impacted more than men. Irrespective of how you slice and dice the data, many of us will over the next 2 decades have our lives, livelihoods and jobs impacted by the increased usage of intelligent automation and autonomous systems.

In order to study this a bit further, I asked surveyed respondents two questions (structured in an A and a B 50:50 partition); A: “Do you believe your job could be replaced by an AI?” and B: “Thinking of your friends, do you believe their jobs could be replaced by an AI?”.

  • We feel fairly sure that our own jobs are relative safe for AI impact.
  • However, we think that our friends and colleagues jobs are far likelier to be replaced by an AI.

From the above chart, it is clear that when it comes to assessing the potential for AI impacting job security, individuals feel much surer about their own job security than what they assess to be the case for a friend or a colleague. Only one fifth of the surveyed think that an AI actually could replace their jobs. Interestingly, men assessing their own job security are almost twice as sure about that security compared to women (based on the number of Maybe answers).

We assign a much higher likelihood to our friends’ and colleagues’ prospects of losing their jobs to an AI than to that happening to ourselves. Maybe it is easier to see our friends’ and colleagues’ problems and challenges than our own? Both women and men appear more uncertain in assessing their friends’ job security than their own. Although the difference in uncertainty between women and men is less dramatic here, men still appear less uncertain than women in their assessment of their friends’ job security.

So what’s next?

Clearly, it is important to continue to gauge our human sentiments towards AI. How these change over time will also indicate how likely we are as humans to find an AI-based service or product trustworthy. You will find a thorough analysis of trust in Human – AI relationships in my latest essay “Trust thou AI?”.

The sentiment towards AI and its impact has been shown to be gender dependent. More women express negative feelings towards AI than men tend to do. Women express to a higher degree that AI scares them or makes them feel uncomfortable. Men are substantially more positive, and far fewer men appear willing to express that AI scares them. Men with younger offspring tend to express the most positive sentiment in the surveys. Understanding those differences will be important in designing AI-based applications, services and products that appeal not only to men (or a male sex role orientation) but to the full spectrum of gender and sex roles. Whether or not a consumer will embrace an AI-based application or policy will depend a lot on their trust in the automated, algorithmically-driven processes that will impact that consumer’s life to an ever-increasing degree.

Further reading

Fiona Gallacher & Douglas M. Klieger, (1995). “Sexual Role Orientation and Fear”, The Journal of Psychology; 129, pp.41-49. This paper reports on gender differences in how we react to fear and other phobias. It should be noted that while it is found that expression of fear is substantially different between women and men, behaviorally no significant differences have been observed.

Kim K. Larsen, (2018); “Trust thou AI?”, “Machine … Why ain’t thee Fair?” and “Human Ethics for Artificial Intelligent Beings“. All contain comprehensive bibliographies.

Jennifer S. Stevens & Stephan Hamann, (2012). “Sex differences in brain activation to emotional stimuli: A meta-analysis of neuroimaging studies”, Neuropsychologia; 50, pp. 1578-1593.

Acknowledgement.

I rely on many for inspiration, discussions and insights. Any mistakes are entirely on me. I would like to thank Liraz Margalit and Minoo Abedi for many useful suggestions and great inspirational discussions around the topic of trust. I am particularly thankful for Liraz’s patience with my many questions around gender differences (beyond the most obvious, that is). I also greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this Blog.

Trust thou AI?

“The way to make machines trust-worthy is to trust them” to paraphrase Ernest Hemingway (Selected letters 1917-1961).

INTRO.

What are the essential prerequisites, for us consumers and professionals alike, to trust an Artificial Intelligence (AI) based product or service?

If you have followed the AI topic a bit, or maybe even a lot, or if you have been lucky (or not) enough to talk to consultants about AI design, you may get the impression that if we can design a transparent, explainable, auditable AI, all is well with AI ethics and AI fairness until kingdom come or until an AGI (an Artificial General Intelligence, that is) descends from the clouds. We are led to believe that people, mass consumers, the not-in-the-know, not-subject-matter-experts, will trust any AI-based product or service that we can “throw” at them as long as it is transparent, explainable and auditable. According to the European General Data Protection Regulation (GDPR), we have a “Right to an Explanation” of an action taken by an automated or autonomous system (see also “Article 22 – Automated individual decision-making, including profiling”). However, it should also be pointed out that the GDPR is very vague (to put it mildly) about the structure and content of such an explanation. As has also been pointed out by Wachter, Mittelstadt & Floridi (2017), GDPR does in fact not oblige autonomous decision-making systems to provide an explanation for a derived decision; at most it offers information.

While GDPR, as it relates to AI-driven decision-making processes, may make the European Commission feel good, consultants a lot richer in monetary terms, and researchers in academia busy, it really doesn’t do much to enhance trust between a consumer and The Thing. That is obviously not the intention of the regulation, but it is the subject of this essay.

In much of the current debate around trust in AI, transparency and explainability are frequently evoked. The two concepts are, however, awfully similarly described, although often well crafted to appear more different than they may be given the context. The current dogma is that if the AI is transparent (actually, the process that leads to an AI agent’s actions), it is also explainable, and thus it may also be more trustworthy. Basically, transparent is here used synonymously with explainable. Given that we are in the realm of computer science, it is good to remember that the term transparency is often used to mean that a given property of a system is hidden (by design) from the user or other main computing processes. Interestingly enough, this is definitely not what is meant by transparency of an AI process and action. To strengthen the trust bond between humans (as well as institutions) and AI, we also require auditability of a given AI-based process and action. That is, we are able to trace back from an AI action through the internal AI computations & processes and verify how that particular action came about.

I will not say it is BS to consider transparency, explainability and auditability in your AI design. Of course, it is not! … But maybe it is a bit … to believe that this is sufficient to make consumers (or the public in general) trust an AI-based application (i.e., service, product, …). These are nice words, with fairly unclear meaning, that are (very) important for regulators and public institutions to trust corporations’ AI developments. Maybe not so much for the general public’s or consumers’ trust in the AI that corporations expose them to. As I will explain in this essay, it can only be a small part of the essentials for creating a trust bond between humans and AI.

Trust between humans, at least within what we perceive as our social group (i.e., “usness”), is a trait with evolutionary roots that has allowed us to foster collaboration within larger social groups (with some ugly limitations of “usness” and “themness”). The ability to trust may even have made it possible for us humans to crawl to the top of the “food chain” and keep that pole position for quite a while.

What about our trust in machines and non-human (non-sentient) things in general? Trust between humans and non-human agents? We are increasingly exposed to much higher degrees of system automation as well as Artificial Intelligence (AI) based applications. Machine automation and autonomy are taking over many tasks from us at home, at work and anywhere in between. This development comes with the promise of much higher productivity at work and far more convenience at home and anywhere else for that matter.

TRUST IN AUTOMATED MACHINES – FROM PROFESSIONALS TO CONSUMERS.

If you work professionally with a complex system (e.g., an airplane, a train, an energy, nuclear or chemical plant, a telecommunications network, data centers, energy distribution networks, etc.), the likelihood is fairly large that you are already exposed to a very high degree of machine and system automation. You may even be increasingly exposed to system autonomy fueled by AI-based solutions (e.g., classical machine learning models, deep learning algorithms, recurrent neural networks, reinforcement learning, rule-based control functions, etc.). As a professional or expert operator of automation, you embrace such systems if you have deemed them trustworthy. That typically means that (a) the automation solution performs consistently, (b) it is robust to many different situations that may occur, and even some that may very rarely occur, and (c) it has a very high degree of reliability (e.g., higher than 70%). Further, it is important for your trust that you believe you understand the automation principles. All of this (and more) serves to strengthen the trust bond between you and the automation. If there is a lack of trust, or a break in trust, between the human operator and the automation, it will lead to wasted investments, inefficiencies and disappointing productivity growth. It may also lead to accidents and potential disasters (Sheridan & Parasuraman, 2005). If human operators lack trust in a system automation or autonomous application, you are better off relying on manual workarounds.

Clearly, it is no longer only certain types of jobs and workers that are exposed to automation and AI-based autonomy. All of us … irrespective of background … will increasingly be experiencing AI-based applications that may initiate actions without human intervention or without first "asking" for human permission. The trust bond between a human and an autonomous application is essential for that application to become successful and do what it was designed to do. By successful I primarily mean increased and sustainable utilization. Thus we need to better understand the dynamics of trust between humans and non-human intelligent entities. What can we learn and expect from human-human trust bonds, and what is different in human-non-human trust bonds? We are already being exposed to highly specialized artificial intelligent agents, in complex system designs as well as in simpler commercial products, applications and services in general.

While businesses deploying algorithmic automation and autonomy for their products and services can learn a lot from past research, they will have to expand on this work to also include their customers, who are not subject matter experts or skilled automation operators. A you-and-me focus is required. The question that I ask in this essay is how we in general feel about trusting an artificial intelligent entity (i.e., an agent) that eventually may out-compete most of us in the work environment or at least disrupt it very substantially. An AI entity that can replicate and evolve much faster in comparison with humanity's incredibly slow evolutionary progress.

TRUST … IT STARTS WITH YOUR BRAIN.

The feeling of trust arises in your brain. It is a result of changes in your brain chemistry. Your feeling of trust is an interpretation of your emotional states triggered by physiological changes (Barrett, 2017). The physiology of trust also connects to your gut and other parts of your body via the central nervous system. The resulting physiological reaction, e.g., a change in heart rate, goose bumps, that weird feeling in your stomach, a sense of well-being, a sense of unease or dread, etc., makes you either trust or want to run away. The brain chemistry will either suppress your fear or enhance your sense of unease. The more novel a trust situation is, the more unease or fear (i.e., emotion) you will feel about making the leap of faith required to initiate the trust bonding process.

However, the more prior knowledge we have of a given trust situation, including from other parties that we already trust, the easier it becomes for us to engage in trust. This process is eloquently described by Robert Sapolsky in his seminal work "Behave: The Biology of Humans at Our Best and Worst" (Sapolsky, 2017) and in the original research work by Paul Zak on the trust-enhancing effect of the brain molecule Oxytocin (Kosfeld, Heinrichs, Zak, Fischbacher & Fehr, 2005; Zak, 2017; Choleris, Pfaff & Kavaliers, 2013). Our little "trust" messenger (Oxytocin) has been attributed to all groovy good things in this universe (at least for vertebrates), backed up with lots of cool trust game variations (including sniffing the little bugger) and academic research in general. One of Oxytocin's basic functions, apart from facilitating mother-baby bonding and milk production, is to inhibit our brain's fear center (i.e., the amygdala), allowing for a higher degree of acceptance of uncertain situations (it's a bit more complex than that, but this suffices for now) and thus a higher susceptibility to certain risks. While Oxytocin certainly drives a lot of wonderful behaviors (i.e., maternal/paternal instincts, trust, love, commitment to a partner, etc.), it has a darker side as well. In general, oxytocin reduces aggression by inhibiting our brain's fear center. However, when we perceive that our young children (or your pups, for the prairie voles reading this blog) are in danger or being threatened, oxytocin works in the opposite direction, enhancing fear and resulting in an increased level of aggression. See also Sapolsky's wonderful account of the dark side of oxytocin ("And the Dark Side of These Neuropeptides", Kindle location 1922) in his book "Behave" (Sapolsky, 2017).


Oxytocin: to be, or maybe not to be, the trust hormone? A 2015 review by Nave et al. (Nave, Camerer and McCullough, 2015) of the relevant literature attributing trust to Oxytocin concludes that current research results do not provide sufficient scientific evidence that trust is indeed associated with Oxytocin, let alone caused by it. In general, it has been challenging to reproduce earlier findings proving (beyond statistical doubt) a causal relationship between Oxytocin and the establishment of trust bonds between humans. Thus, it is up to you, dear reader, whether you trust the vast amount of studies in this area or not. That Oxytocin plays a role in pair-bonding as well as parent-child bonding seems pretty solid (Law, 2010; Sapolsky, 2017). Also, there appears to be a correlation between increased Oxytocin levels (by sniffing the stuff or by more natural means) and an increased readiness to trust (Zak, 2017; Choleris, Pfaff & Kavaliers, 2013). Interestingly (men, do pay attention here!), increased levels of oxytocin, typically in women with young children still breastfeeding, appear to make women less forgiving when they perceive that their trust has been betrayed (Yao, Zhao, Cheng, Geng, Luo & Kendrick, 2014).

Can a puff and a sniff of Oxytocin make us trust non-human-like agents, e.g., automation software, AI-based applications, autonomous systems (e.g., cars, drones), factory robots, avionic systems (e.g., airplanes, flight control), etc., as we trust other humans? … The answer is no! … or at least it does not appear so. Human-human trust bonding is very particular to being human. Human-non-human trust dynamics may be different and not "fooled" by a sniff of Oxytocin. Having frequent puffs of Oxytocin will not make you love your machine or piece of intelligent software … unless, it appears, that machine becomes more human-like. And even that might have its limits due to the uncanny valley "sense", i.e., our amygdala starts ringing its alarm bells ever so softly when the entity we interact with is too human-like and yet a little bit off. Enough to get the uncanny or uneasy feeling going.

THE TRUSTWORTHINESS OF AUTOMATION.

It has long been established that we tend to use automation only when we find it trustworthy (see for example the work of Madhavan & Wiegmann, 2007; Visser, Monfort, Goodyear, Lu, O'Hara, Lee, Parasuraman & Krueger, 2017; Balfe, Sharples & Wilson, 2018). If we do not trust an automation, it will be rejected by the human operator, just like an untrustworthy human will be left alone. When the reliability of an automation is no better than about 70%, it is in general regarded as useless by its human operators and becomes an operational and financial liability (Wickens & Dixon, 2007). It is important to note that much of the human-automation trust research has focused on professional and expert users of complex or advanced automated systems, such as pilots, air traffic controllers, train operators, robotics plant controllers, chemical & nuclear plant operators, brokers, military technology operators (e.g., drones, autonomous vehicles, …), communications network controllers, etc.

So … what matters for establishing a trust bond between human and automation? A large body of research shows us that the most important factors for establishing a trust bond between a human and an automation function are: reliability (of the automation), consistency (of the automation), robustness (of the automation), dependability (of the human operator), faith (of the human operator) and understandability (of the human operator). Much of which is fairly similar to what we require from another human being to be regarded as trustworthy.

Okay, we have a reasonable understanding of trust bonds between humans, and of trust bonds between humans and automation. What about human-AI trust bonds? Given that an AI-based complex system might have a higher degree of autonomy than an automated advanced system, it may very well be that the dynamics of trust and trustworthiness are different. At least compared to what we today believe we understand about human-automation trust.

For sure, it is no longer only experts or professional operators who are exposed to advanced automation and autonomous systems. These systems are no longer limited to people who have been professionally trained or schooled, often over many years, before being let loose on them. Autonomous systems and AI-based applications are increasingly present in everyone's everyday environment. At Home. At Work. And anywhere in between. Consumers of all genders, children, pets, octogenarians, Barbie dolls and dinosaurs and so forth … we will all eventually have to interface with AI-based applications. Whether we like it or not.

The current trend among consultants (in particular) is to add new trust prerequisites to the above list (if the established ones are considered at all) of human-AI trust essentials: Explainable AIs or XAIs (i.e., can the actions of an AI be understood by humans), Transparent AIs (i.e., loosely, to fully understand why certain actions are performed and others not) and Auditable AIs (i.e., an unbiased examination and evaluation of the code and resulting actions of an AI-enabled application). While these trust prerequisites are important for experts and researchers, the question is whether they are (very) important or even relevant to the general consumer at large. … If my life insurance application was rejected, would I feel much better knowing that if I lost 40 kg, stopped smoking, were 30 years younger, lived in a different neighborhood (with twice the rent) and happened to be white Caucasian, I would get the life insurance, or that I could afford to pay 3 times the monthly insurance fee? (Obviously an AI-based outcome would be better disguised than this example.)

If you have the feeling that those 3 elements, Explainability, Transparency and Auditability, seem like approximately 1 element … well, you are not alone (but don't tell that to the "experts").

So … How do we feel about AI? Not just “yous” who are in the know … the experts and professionals … but you, me, and our loved ones, who will have little (real) say in their exposure to AI, automation & autonomous products and services.

YOU AND ME FOCUS … HOW DO WE FEEL ABOUT AI?

We appear to be very positive about Artificial Intelligence, or AI for short. Men are in general more positive than women, and men with young children are much more positive than any other group. As can be seen below, it doesn't seem like Arnold Schwarzenegger has done much to make us have strong negative feelings towards artificial intelligence and what we believe it brings with it. Though one may argue that sentiments towards robots may be a different story.

[Figure: Survey responses to the question "How do you feel about AI?", aggregated by sentiment.]

In the above chart, the choices for the question "How do you feel about AI?" have been aggregated into a Negative sentiment ("I hate it", "It scares me" and "I am uncomfortable with it"), a Neutral sentiment ("I am neutral") and a Positive sentiment ("I am comfortable with it", "I am enthusiastic about it" and "I love it").

On average most of us are fairly comfortable with AI. Or more accurately we feel comfortable with what we understand AI to be (and that may again depend very much on who and what we are).

One of the observations that has come out of conducting these "How do you feel about AI?" surveys (over the last two years) is that there are gender differences (a divide may be more accurate) in how we perceive AI. This needs to be an important consideration in designing AI-based products that will meaningfully appeal to both women and men (and anyone in between, for that matter). Given that most AI product developers today are male, it might be good for them to keep in mind that they are not only developing products for themselves. They actually need to consider something that will be appealing to all genders.

The chart below reflects the AI sentiment of women (808) and men (815) from a total of 1,623 respondents across 4 surveys conducted in 2017 and 2018. Most of those results have individually been reported in my past blogs. So … women feel in general significantly less positive towards AI compared to men. Women overall have a slightly more negative sentiment towards AI than positive. Overall, there are more women than men who rank their feelings as neutral. Men with children (younger than 18 years of age) have the most positive feelings towards AI of all respondents. Unfortunately, the surveys carried out so far do not allow for estimating the age of the youngest child or the average age of the children. Women's sentiment towards AI does not appear (within the statistics) to depend on whether they have children younger than 18 years of age, older children or no children at all. Overall, I find that;

Women appear to be far less positive about AI than men.

Men with young children are significantly more positive than men and women in general.

Contrary to men, women's sentiment towards AI does not appear to depend on their maternal status.

[Figure: The gender divide in AI sentiment, women versus men.]

So why are we so positive … men clearly much more than women … about AI? This despite AI being likely to have a substantial impact (to an extent it already has) on our society and way of living (e.g., privacy, convenience, security, jobs, social networks, family life, new consumption, policies, etc.). The median age of the respondents was about 38 years old, although for respondents with children (less than 18 years of age) it was about 33 years old. In the next 10 years, most will be less than 50 years old and should still be in employment. In the next 20 years, most will be less than 60 years old and still very much in active employment. Certainly, the young children of the respondents will over the next 20 years enter the workplace. A workplace that may look very different from today due to the aggressive pursuit of intelligent automation and the introduction of autonomous systems.

Is the reason for the positive outlook on AI that the individual (particularly the male kind) simply does not believe the technology to be an existential threat to the individual's current way of living?

If you think about your child or children, how do you believe AI will impact their future in terms of job and income? … While you think about this … I will give you the result of one of the surveys (shown below) that I conducted in September 2018.

[Figure: Survey results on how respondents believe AI will impact their children's future opportunities.]

In terms of believing that the future will be better than today, women are less positive than men. Across genders, fewer are of the opinion that the opportunities of their children (whether those children are below 18 or above) will remain the same as today. Women appear to have a more negative outlook for their children than men. There is little difference in men's beliefs in their child's or children's future opportunities irrespective of the age of their children. Women with children under 18 years of age are significantly less optimistic about the outlook for their children's opportunities compared to women with older children.

From the work by Frey & Osborne (2013) on how jobs are becoming susceptible to what they call computerization, there is plenty of room for concern about an individual's job and thus income security. According to Frey and Osborne, 47% of total US employment is at risk within a decade or two. A more recent PwC economic analysis estimates that the impact of algorithmic & AI-based automation across all industries will be in the order of 20% by the late 2020s and 30% by the late 2030s (Hawksworth & Berriman, 2018). Job categories served by the low and medium educated will be hit the hardest. Women are likewise expected to be impacted more than men. Irrespective of how you slice and dice the data, many of us will over the next 2 decades have our lives, livelihoods and jobs impacted by the increased usage of intelligent automation and autonomous systems.

In order to study this a bit further, I asked surveyed respondents two questions (structured in an A and a B 50:50 partition); A: "Do you believe your job could be replaced by an AI?" and B: "Thinking of your friends, do you believe their jobs could be replaced by an AI?".

[Figure: Survey results on whether respondents believe their own jobs, and their friends' jobs, could be replaced by an AI.]

From the above chart it is clear that when it comes to AI impacting job security, individuals feel much surer about their own job security than about that of their friends or colleagues. Only one fifth of the respondents answering Yes or No to whether they believed that their jobs could be replaced by an AI think that AI actually could replace their jobs. Interestingly, men assessing their own job security are almost twice as sure about that security compared to women (based on the number of Maybe answers).

From the results of the survey shown above, we assign a much higher likelihood to our friends' and colleagues' prospects of losing their jobs to an AI than to that happening to ourselves. Maybe it is easier to see our friends' and colleagues' problems & challenges than our own. Both women and men appear more uncertain in assessing their friends' job security than their own. Although the difference in uncertainty between women and men is less dramatic here, men still appear less uncertain than women in their assessment of their friends' job security.

There are many consultants, some researchers and corporations working on solutions and frameworks for Transparent AI, Explainable AI and Auditable AI as a path to creating trust between a human and an AI-based agent. Many are working exclusively with the AI in focus and are thus very technocentric in approach. Very few have considered the human aspects of trust, such as

  • The initial trust moment – how to get the consumer to the "leap of faith moment", where a human engages with a product or service (or another human being for that matter). This is obviously a crucial and possibly scary moment. The consumer has no prior experience (maybe a peer's recommendation, which will help), is left to faith, and will be at their most dependable, or vulnerable, to disappointment. It is clear that peer opinion and recommendation will mitigate much uncertainty and unease.
  • Sustainable trust – how to maintain sustainable trust between a user and a product (or another human being). Here priors will be available and of course consistent performance will play a big role in maintaining and strengthening the trust bond.
  • Broken trust or untrusting – as the saying goes, "it takes 10 good impressions to neutralize a bad one" (something my grandmother hammered into my head throughout childhood and adolescence … Thanks Gram!) … Once trust has been broken between a human and a product or service (or another human being), it is very difficult to repair. The stronger the trust bond was prior to untrusting, the more physiologically and neurologically "violent" the untrusting process will be, and likewise the subsequent recovery from the feeling of betrayal. As another saying goes, "Heav'n has no rage like love to hatred turn'd, Nor hell a fury, like a woman scorned" (William Congreve, 1697). And "no Oxytocin in this world will make a woman betrayed not want her pound of flesh" (Kim Larsen, 2018).
  • The utility of trust – not all trust bonds are equally important, equally valuable or equally costly; some may even be fairly uncritical (although broken trust by a thousand cuts may matter in the long run). The neurological, i.e., feeling, process of untrust may even be fairly benign in the sense of how the trustor feels upon the broken trust, though the result may be the same: having a customer or a loved one walking away from you. It may be easier to recover trust from such more benign untrust events. However, it stands to reason that the longer a trust bond exists, the more painful and costly the untrusting process will be, and obviously the more difficult to recover from.

In most cases, if the AI action is as the human agent would expect, or has anticipated, many a human might not care about the transparency or explainability of the artificial agent's action.

Even when our trust is satisfied by an AI-based action, we should care about auditability, in case, over the longer run, the human trust in an AI-based solution turns out to have been misplaced. When the AI-based outcome of a given action runs counter to what the human was expecting or anticipating, an explanation for the outcome may not prevent the trust of the human agent, and the trustworthiness of the AI-based agent, from being broken.

[Figure: The trust circle.]

TRUST DECONSTRUCTED.

If you knew everything absolutely, you would not need to trust anyone to make a decision.

Just be careful about the vast amount of cognitive biases that may result in you falsely believing you know it all. Men in particular suffer from the ailment of believing their own knowledge to be absolute (Larsen, 2017).

Someone who knows nothing has only faith as a guide for trust.

On the other hand, someone who knows nothing about a particular problem has no other source for trust than faith that trust is indeed warranted. It’s a scary place to be.

Let’s deconstruct trust.

An agent’s trust (the trustor) is an expectation about a future action of another agent (the trustee). That other agent has been deemed (at least temporarily) trustworthy by the trustor. That other agent (the trustee) may also represent a given group or system.

In his 1985 paper "Trust in close relationships", John K. Rempel (with Holmes and Zanna) defines the following attributes of human-to-human trust (i.e., where both trustor and trustee are human agents);

  • Predictability or consistency – the trustor's subjective assessment of the trustee's trustworthiness. The prior behavior of the trustee is an important factor for the trustor in assessing the posterior expectation that the trusted agent will consistently fulfil the trustor's expectations of a given action (or inaction). As the trustor gathers prior experience with the trustee, the confidence in the trustee increases. Confidence should not be confused with faith, which is a belief in something without prior fact-based knowledge.
  • Dependability – a willingness of the trustor to place oneself in a position of risk that the trustworthiness of the trustee turns out not to be warranted, with whatever consequences that may bring. Note that dependability can be seen as an outcome of consistency. Put another way, a high degree of consistency/predictability reduces the fear of dependability.
  • Faith – a belief that goes beyond any available evidence required to accept a given context as truth. It is characterized as an act of accepting a context outside the boundaries of what is known (e.g., a leap of faith). We should not confuse faith with confidence, although often when people claim to be confident, what they really mean is that they have faith.

For agent-to-agent first-interaction scenarios, the initial trust moment, without any historical evidence of consistency or predictability, a trustor would need to take a leap of faith on whether another agent is trustworthy or not. In this case, accepting (i.e., believing) the trustee to be trustworthy, the trustor would need to accept a very large degree of dependability towards the other agent and accept the substantial risk that the trust in the trustee may very well not be warranted. For humans, this scenario often lends itself to maximum stress and anxiety levels for the trusting agent.

After some degree of consistency or historical trustworthiness has been established between the two agents, the trustor can assign a subjective expectation of future trustworthiness to the other agent. This then leads to a lesser subjective feeling of dependability (or exposure to risk) as well as a reduced dependency on sheer faith that trust is warranted. This is in essence what one may call sustainable trust.
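To make the transition from faith to sustainable trust a bit more concrete, here is a minimal sketch (my own illustrative formalization, not taken from Rempel or any of the cited literature): the trustor's subjective expectation of the trustee's trustworthiness is modeled as a simple Beta-Bernoulli belief that is updated every time an expectation is fulfilled or broken.

```python
# Illustrative toy model (my own assumption, not from the cited trust research):
# a trustor's subjective expectation of a trustee's trustworthiness as a
# Beta(alpha, beta) belief updated with fulfilled or broken expectations.
from dataclasses import dataclass

@dataclass
class TrustBelief:
    alpha: float = 1.0  # "fulfilled" pseudo-counts; alpha=beta=1 is the pure-faith start
    beta: float = 1.0   # "broken" pseudo-counts

    def update(self, expectation_fulfilled: bool) -> None:
        # One interaction's outcome nudges the belief up or down.
        if expectation_fulfilled:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def expected_trustworthiness(self) -> float:
        # Subjective expectation that the next interaction goes as promised.
        return self.alpha / (self.alpha + self.beta)

    @property
    def uncertainty(self) -> float:
        # Variance of the belief; it shrinks as consistent experience accumulates,
        # mirroring the diminishing role of faith described above.
        a, b = self.alpha, self.beta
        return (a * b) / ((a + b) ** 2 * (a + b + 1))

belief = TrustBelief()
for outcome in [True, True, True, False, True]:
    belief.update(outcome)
print(belief.expected_trustworthiness, belief.uncertainty)
```

The only point of the toy model is that accumulated consistent experience raises the expectation and shrinks the uncertainty, which is exactly the shrinking reliance on faith and dependability described above.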

As long as the trustor is a human, the other agent (i.e., the trustee) can be anything from another human, a machine, a complex system, an automation, an autonomous system, an institution (public or private), a group, and so forth. Much of what is described above would remain the same.

Lots of work has been done on trust bonds in human-automation relationships. How about trust bonds between humans and AI-enabled applications (e.g., services and products in general)? In their 2018 article "The Future of Artificial Intelligence Depends on Trust", Rao and Cameron (both from PwC) describe 3 steps towards achieving human–AI-system trust;

  • Provability – predictability and consistency.
  • Explainability – justification for an AI-based decision (e.g., counterfactual constructions). Note that transparency and explainability may be closely related, depending on how one implements explainability.
  • Transparency – the factors influencing algorithm-based decisions should be available (or even visible) to the users impacted by such decisions. E.g., for a rejected health insurance application, (all) factors impacting the negative decision should be available to the applicant.

Rao and Cameron's suggestions appear reasonably important for trust. However, as previously described, these suggestions pretty much relate to the trustee agent's side of things, ignoring some of the other important human factors (e.g., dependability, faith, assessment of risk, etc.) for trust between a human and another agent (sentient or otherwise).

Further, explainability and transparency may be particularly important when trust is broken (assuming that the trustor cares to "listen") between the human agent and the AI-based agent (or any other digital or non-sentient agent for that matter). They may not be terribly relevant for the likely vast majority of users for whom an action is delivered confirming that trust was warranted. If you have trained your AI well, it is fair to assume that the majority of outcomes are consistently as expected. A positive trust event is likely to lead to a reinforcement of trust and of the trustworthiness of the AI agent.

Also, these concepts, while important, don't do much for the initial step of trusting a non-human agent. How do you design your trustee agent to ease the initial barrier of use and acceptance? When there are no priors, you need the user or trustor to be comfortable with taking a leap of faith as well as with being, maybe, maximally dependable.

UNTRUST AND THAT HUMAN FEELING OF BETRAYAL.

Trust can be broken. Trustworthiness can decline. Untrusting is the process whereby a previously established trust bond is broken and the strength of trust declines.

Heuristic: the stronger the trust bond between two agents, the stronger the untrusting process will be in case of broken trust, making trust recovery all the more difficult.

Have you ever wondered why two people who supposedly have loved each other in the past (supposedly for many years) can treat each other as enemies? Betraying a strong trust bond can be a very messy emotional and physiologically strenuous process. Some trust bonds, once broken, will never recover (e.g., breakups, friendship betrayals, unfaithfulness, theft, lies, …). Others, depending on the initial utility or value assigned to the bond, may be fairly benign, without much strong emotion associated with the untrusting process (e.g., retail purchases, shopping experiences, low-value promises of little impact if not fulfilled, etc.).

The question is whether the untrusting of a human-machine trust bond is similar to the untrusting of a human-human trust bond. Moreover, is there a difference between an inanimate machine, a simpler human-operated automated system and an AI-based application that humans may even anthropomorphize to various degrees? Are your trust and untrust processes different for Siri or Alexa than for Microsoft Clippy, assuming anyone ever really trusted that wicked steely fellow?

How valid is it to apply our knowledge of human-human trust & untrust to human-agent relations where the agent is non-human, or a human simulacrum, in nature?

IN HUMANS WE TRUST, IN MACHINES NOT SO MUCH.

Would you trust your superior or fellow expert with a critical corporate decision? How often would you trust such decisions made by other fellow human beings?

Even if you don’t have a choice or a final say (well apart from arguing your piece of mind … at least as it happens in most places of Western Europe) … it is your own choice whether you trust such a decision or not.

As shown in the below chart's magenta columns, it turns out that most humans frequently do trust their superiors and fellow human experts with critical decisions relevant to their work. In the survey shown below, there is little difference in human-human trust whether the decision success rate was left unspecified or specified to be 70% (i.e., 7 out of 10 decisions turn out as expected or promised and 3 out of 10 do not). This might mean that most people heuristically expect a corporate decision maker to have a 70% success rate in their decisions. I found this surprising, as I do not believe human decisions are that good. But I guess we are good at post-rationalization and at being much louder about our successes than our failures (suppressing the bad memories of failure may come in handy here).

[Figure: Survey results on trusting critical corporate decisions made by a human versus by an AI.]

Okay, we clearly trust our fellow humans with critical decision making (or at least so we say). Do we trust an AI with the same critical corporate decisions?

The answer is … clearly … no way do we trust AIs to make critical corporate decisions (or any other types of decisions for that matter … at least ones we are aware of). As can be seen from the above chart, a majority of people would only infrequently trust an AI making critical decisions. Specifying that the AI has a decision success rate better than 70% does reduce the share of people who would only infrequently trust such decisions (i.e., from 62% to 43%). However, it only marginally increases the share of people who would frequently trust an AI-based critical decision, from 13% to 17% (which is barely statistically significant). Remember, we are readily willing to frequently trust a human decision maker. An AI? … not so much! Even in what should be regarded as an apples-to-apples scenario, with the same performance specified for the human trustee as for the AI-based trustee.

Trust bonds between humans appear much stronger than those with an AI. Though that may not be too surprising. Most of us have very little prior experience with trusting AI-based decisions (at least ones we are consciously aware of). So the starting point for AI-based trust (i.e., with the AI being the trustee part of the trust bond) is faith and accepting dependability, rather than having a basis for assessing the consistency or predictability of AI-based decisions. There may also be some very interesting neurological (i.e., brain) reasons why our ability to trust an inanimate agent such as an AI, a robot or a piece of intelligent machinery is different from our ability to trust another human being.

My survey data could be interpreted as if we work with a heuristic decision success rate for humans (or at least the manager or expert kind of humans) at or better than 70%. More than half of us would frequently trust a human decision maker at such a performance level.

Not so much with an AI-based (inanimate) decision maker. Specifying that the AI has a success rate of 70% or better in its decision making doesn't really change the proportion of us that would frequently trust such decisions. It does increase the share of trustors that would concede trust in an AI-based decision about half the time (i.e., given the 70% success rate).

What moves the trust needle? If we impose on our under-appreciated AI-based decision maker a 95% or better success rate, 40% of us would frequently trust such decisions. This is still a lower proportion of trustors than for a human decision maker with a success rate of 70% or better. Moreover, almost 1 in 3 of us would still only infrequently trust such an AI (with a 95% or better success rate). In comparison, only about 1 in 10 would only infrequently trust a human decision maker with a 70% or better success rate.

[Figure: Survey results on trust in AI-based decisions at specified success rates.]

So clearly AI does have trust issues. Certainly with respect to decision making, AI is not regarded as being as trustworthy as a human. The bar for trusting an AI appears to be very high.

However, it seems reasonable that some of the reason for the lower trust level is simply that most people have not had much exposure to AI in general, or to AI-based augmentation and actions where trust would be essential.

ALGORITHMIC AVERSION – NOTHING REALLY NEW UNDER THE SUN.

As described in "On the acceptance of artificial intelligence in corporate decision making" (Larsen, 2017), algorithms, even simple ones, do in general perform better at prediction (i.e., an essential part of decision making, whether done consciously or subconsciously) than human beings limited to their own cognitive abilities. This result has been confirmed many times over by the likes of Paul Meehl (Meehl, 1954), Robyn Dawes (Dawes, 1979) and many other researchers over the last 50 – 60 years. Clearly, machine learning algorithms do not offer an error-free approach to decision making. However, algorithmic approaches do offer predictions and solutions with lower, often superior, error rates. And, not unimportantly … quantifiable error rates, in comparison with what would be the case for a human-cognition-based decision.

Humans remain very resistant to adopting more mathematical approaches, despite such approaches being demonstrably less prone to error than human decision making without algorithmic augmentation. As Berkeley Dietvorst's recent paper puts it, "People erroneously avoid algorithms after seeing them err" (Dietvorst, Simmons and Massey, 2014). Dietvorst calls this behavior, or emotion, algorithm aversion. It is very consistent with my own findings of humans having a very high bar for the success rate (or accuracy) of AI-based decisions. Even at a 95% success rate for an AI-based decision, we prefer to trust a human decision maker with a success rate of 70%.

Machine-learning-based decisions or action recommendations (at least of the classical kind) offer better accuracy, transparency, understandability, consistency, robustness and auditability than most human-based decisions and actions.

Despite this, we humans are much less forgiving when it comes to machine errors than human errors. The standards we expect of artificial intelligence are substantially higher than what we would require from a fellow human being or co-worker.

TRUST IN CORPORATIONS AND INSTITUTIONS … OR LACK OF, MORE ACCURATELY.

Almost 80% of consumers do not believe that companies using AI have their best interest in mind. This is the outcome of 3 surveys made in March 2018, April 2018 and September 2018.

This has also been a period where the misuse of consumer information and data in general was hotly debated. So the fact that a majority of consumers do not trust corporations with having their best interests in mind is maybe not all that surprising. Consumer trust in corporations is in general at a low point. AI doesn't help that trust issue.

[Figure: Survey results on consumer trust in companies using AI.]

Companies' AI-based products and services are already at a disadvantage before they hit the marketplace. There is a substantial degree of mistrust among consumers towards corporations and companies. This resonates very well with a recent study of trust by ….

What about trust in public institutions' capacity for protecting citizens and consumers against adversarial use of AI-based technologies in policies and in products and services? Well, the public trust is fairly low, as can be seen from the figure below.

[Figure: Survey results on public confidence in political institutions considering the societal impact of AI.]

The vast majority (80%!) of the general public has low, very low or no confidence that political institutions adequately consider the medium- and long-term societal impact of AI proliferation.

There is unfortunately nothing surprising in the above (dis)trust level in institutions. This is largely confirmed by, for example, the 2018 Edelman Trust Barometer, which is pretty bleak in terms of its "Global Trust Index" reflecting the general population's level of trust in institutions (e.g., government, businesses, media and NGOs).

SO WHERE DO WE GO FROM HERE?

It is fair to say that for the consumer as well as for the corporate decision maker, expectations towards the trustworthiness of AI-based products, services and resulting decisions or actions are in general low.

Despite the relatively low trust in AI-based actions, I have also shown that on average we feel fairly comfortable with AI, at least as a concept. Women, as it would appear from my surveys, are in general less comfortable with AI than men. Men with children under 18 years of age (possibly younger children) express the highest degree of positive feelings towards AI.

The gender difference in how AI is perceived, for the individual as well as for children, family members, friends and colleagues, is still a relatively unexplored research area. It needs more attention, as most research into human-machine trust bonding has centered on professional operators of automated or autonomous complex systems (e.g., aviation, automotive, networks, etc.). I feel brave enough to make an educated guess that most of that research has also focused on male operators and experts rather than being gender balanced or explicitly gender focused.

In order for us to trust something, like an AI-based action (e.g., a decision, recommendation, …), we often require an explanation for a given outcome or action. Most of us do like to receive an explanation, in particular for actions and outcomes that we perceive as having negative consequences or that run counter to our beliefs about what the right decision or action should be. Explainable AI, whatever that really means (it will surely be context dependent), is one of the components of establishing trust. Explainability is also important in order to appease law & policy makers, e.g., in order to comply with the European General Data Protection Regulation (GDPR) requirements that may (or may not) be interpreted as a "Right to Explanation". AI Transparency and AI Auditability are additional concepts that are typically mentioned together with explainable AI.

Typically, the implied logic is that transparency leads to explainability, which leads to ease of auditability. The question is whether such requirements in general are meaningful for the consumer of an AI-based product or service. There are two extremes: 1. a highly simplified system that can also be explained very simply, or 2. a highly complex AI-based system that is nevertheless sufficiently transparent to be explained and audited, but where the explanation is of such complexity that, albeit transparent, it would only be understood by an expert or the designer of that system. In one case the explanation for a given action is so simple that it is unnecessary. In the other, the explanation is so complex that no lay person would be able to follow it. Certainly, much more work is required here in order to assess to what level and under which circumstances an explanation should be provided. It is always understood (although not always mentioned) that the explanation should be understood by the operator or user. Now that makes for an interesting challenge … Right?

As has been pointed out above, making a human trust a non-human agent is not only a matter of explainability, assuming that explanation is even understood. Any trust bond will have a utility or perceived value associated with it. The initiation of a trust bond may be faith-based if no prior information is available. This initial phase is often associated with a degree of anxiety or fear that your trust will not be fulfilled. There may be a high degree of dependability involved in the trust bond (e.g., autonomous driving) that adds to the anxiety. Only after prior experience or information becomes available will the importance of faith, and the anxiety around the assumed dependability, diminish. The strength of the trust bond will increase. However, as the trust increases, it will also become increasingly sensitive to disappointment and perceived betrayal (also depending on the utility assigned to the bond). Too little work has been conducted on understanding gender and cultural differences in the human-AI trust process. This is also true in general for any human-non-human trust relationship.

Some recent work indicates that anthropomorphizing (i.e., humanizing) the automation or AI-based agent appears to trigger neurochemical processes important in human-human trust bonds. See some pretty cool experiments on the importance of anthropomorphizing the automation agent by Visser et al. (Visser, Monfort, Goodyear, Lu, O'Hara, Lee, Parasuraman & Krueger, 2017) in their paper "A little anthropomorphism goes a long way: Effects of Oxytocin on Trust, Compliance, and Team Performance with Automated Agents". The question here is how far we can take humanizing AI. Will there be an uncanny valley effect at some point? Moreover, not all AI functions should be humanized (that would be scary, if even possible). Context clearly matters here. Lots of questions, still many answers outstanding, and thus lots of cool research to be pursued.

ADDITIONAL SOURCES OF WISDOM.

Balfe N., Sharples S., and Wilson J.R., (2018). "Understanding is key: An analysis of factors pertaining to trust in a real-world automation system". Human Factors, 60, pp. 477-495. Due to its recent publication, you will find a good, up-to-date account (as well as bibliography) of the state of the art of human-automation trust research. This work establishes a strong connection between trust in, and the understanding of, automation.

Barrett L.F., (2017). "How emotions are made: the secret life of the brain". Houghton Mifflin Harcourt.

Baumgarten T., Fischbacher U., Feierabend A., Lutz K., and Fehr E., (2009). “The Neural Circuitry of a Broken Promise”. Neuron, 64, pp. 756 – 770.

Bergland, C., (2015). “The Neuroscience of Trust”, http://www.psychologytoday.com, August.

Choleris, E., Pfaff, D. and Kavaliers, M., (2013). “Oxytocin, vasopressin, and related peptides in the regulation of behavior”. Cambridge: Cambridge University Press.

Dawes R.M., (1979), "The robust beauty of improper linear models in decision making". American Psychologist, 34, pp. 571-582.

Denson T.F., O’Dean S.M., Blake K.R., and Beames J.R., (2018). “Aggression in women: behavior, brain and hormones“. Frontiers in Behavioral Neuroscience, 12, pp. 1-20 (Article-81).

Dietvorst B.J., Simmons J.P. and Massey C., (2014). "Algorithm Aversion: people erroneously avoid algorithms after seeing them err". Journal of Experimental Psychology: General. A study of the widespread algorithm aversion, i.e., human expectations towards machines are substantially higher than towards fellow humans. This results in an irrational aversion to machine-based recommendations versus human-based recommendations, even though algorithmic forecasts are on average better to much better than the human-based equivalent in apples-to-apples comparisons.

Doshi-Velez F. and Kortz M., (2017). "Accountability of AI under the law: the role of explanation". Harvard Public Law, 18, pp. 1-15. Focus on the right to an explanation and what that might mean. Also very relevant to the European GDPR Article 22. Do note that whether Article 22, and Articles 13-15 as well, really does grant a user the right to an explanation is a matter of debate, as pointed out by Wachter et al. (2017).

Fischer K., (2018). “When transparency does not mean explainable”. Workshop on Explainable Robotic Systems (Chicago, March).

Frey C.B. and Osborne M.A., (2013). "The future of employment: how susceptible are jobs to computerization?". Technological Forecasting and Social Change, 114, pp. 254-280.

Hawksworth J. and Berriman R., (2018). “Will robots really steal our jobs? An international analysis of the potential long term impact of automation“. PwC report.

Hiltbrand T., (2018), “3 Signs of a Good AI Model”. http://www.tdwi.org, November.

Ito J., (2018). “The limits to explainability“, Wired (January).

Kosfeld M., Heinrichs M., Zak P.J., Fischbacher U., and Fehr E., (2005). “Oxytocin increases trust in humans”. Nature, 435, pp. 673-676.

Kramer R.M., (2009), “Rethinking Trust”. Harvard Business Review, June.

Larsen, K., (2017). “On the acceptance of artificial intelligence in corporate decision making a survey“. http://www.aistrategyblog.com.

Law S., (2010), "Dads, too, get hormone boost while caring for baby", http://www.livescience.com, October. Oxytocin is not only for breastfeeding women. Research shows that men too have increased levels of oxytocin coinciding with child care, physical contact with the child, and their spouse (the mother of their child).

Madhavan P. and Wiegmann D.A., (2007), “Similarities and differences between human-human and human-automation trust: an integrative review”. Theoretical Issues in Ergonomics Science, 8, pp. 277-301. (unfortunately behind paywall).

Meehl, P. E., (1954). "Clinical versus statistical prediction: A theoretical analysis and review of the literature". University of Minnesota, pp. 1-161. Algorithmic versus human performance up to the 1950s is very well accounted for in Paul Meehl's research work and his seminal book. It is clear that many of the topics we discuss today are not new.

Mori, M., MacDorman, K. and Kageki, N. (2012). “The Uncanny Valley [From the Field]“. IEEE Robotics & Automation Magazine, 19(2), pp. 98-100.

Nave G., Camerer C., and McCullough M., (2015), "Does Oxytocin Increase Trust in Humans? A Critical Review of Research". Perspectives on Psychological Science, 10, pp. 772-789. Critical review of research into Oxytocin's key role in social attachment, including its effect of increasing trust in individuals with elevated (above normal) levels of Oxytocin. Nave et al. conclude that current results do not provide sufficiently robust evidence that trust is associated with Oxytocin, or even caused by it.

Rao A. and Cameron E., (2018), “The Future of Artificial Intelligence Depends on Trust”. Strategy+Business, July. Making the case for transparent, explainable and auditable AIs and why those concepts are important for the development of trust between humans and AI.

Rempel J.K., Holmes, J.G. and Zanna M.P., (1985), “Trust in close relationships”. Journal of Personality and Social Psychology, 49, pp. 95–112. (unfortunately behind paywall, however it is imo a super good account for trust in human to human relations).

Sapolsky R.M., (2017). “Behave: The Biology of Humans at Our Best and Worst”. Penguin Press. Robert Sapolsky addresses trust across his epic book from a neurobiological and behavioral perspective. Yes, you should read it!

Sheridan T.B. and Parasuraman R., (2005), "Chapter 2: Human-Automation Interaction". Reviews of Human Factors and Ergonomics, 1, pp. 89-129. This is a really authoritative account of human interaction with automation as we find it in complex large-scale systems (e.g., aircraft, aircraft control, manufacturing robotics-intensive plants, modern communications networks, modern power plants, chemical industries and infrastructure, modern as well as autonomous vehicles & drones, etc.).

Simpson J.A., (2007), “Psychological Foundations of Trust”. Current Directions in Psychological Science, 16, pp. 264-268.

Visser, E.J.d., Monfort S.S., Goodyear K., Lu L., O'Hara M., Lee M.R., Parasuraman R., and Krueger F., (2017), "A little anthropomorphism goes a long way: Effects of Oxytocin on Trust, Compliance, and Team Performance with Automated Agents". The Journal of the Human Factors and Ergonomics Society, 59, pp. 116-133.

Wachter S., Mittelstad B., and Floridi L., (2017). "Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation". International Data Privacy Law, pp. 1-47. Wachter et al. claim that Article 22 (or other articles for that matter) does not express that users of automated decision-making applications have a right to an explanation. If anything, at most a user may have a right to information about the decision process. For me, it solved a puzzle, as nowhere in Article 22 is there any mention of an explanation; it reads more like a right to opt out. Articles 13 to 15 (of GDPR) only offer limited information about the process by which a given decision has been made (e.g., Articles 15 and 14 are maybe the strongest with respect to information provision).

Wachter S., Russel C., and Mittelstad B., (2018). “Counterfactual explanations without opening the black box: automated decisions and GDPR“. Harvard Journal of Law & Technologies, 31, pp. 1-52.

Whitbourne S.K., (2015), "Why betrayal hurts so much (and who seeks revenge)". http://www.psychologytoday.com, April.

Wickens C.D. and Dixon S.R., (2007), "The benefit of imperfect diagnostic automation: a synthesis of the literature". Theoretical Issues in Ergonomics Science, 8, pp. 201-212. (Unfortunately behind a paywall.) Wickens & Dixon reviewed data from 20 studies, from which they derived a reliability cross-over point of about 70%. Below 70% reliability, no automation (i.e., manual operation) was regarded as better than automation. Only above 70% reliability did automation bring positive cost-benefit returns.

Yao S., Zhao W., Cheng R., Geng Y., Luo L., and Kendrick K.M., (2014). “Oxytocin makes females, but not males, less forgiving following betrayal of trust“. International Journal of Neuropsychopharmacology, 17, pp. 1785-1792.

Zak P.J., (2017), “The Neuroscience of Trust”. Harvard Business Review, January-February.

2018 Edelman Trust Barometer (Global Report).

ACKNOWLEDGEMENT.

I rely on many for inspiration, discussions and insights. Any mistakes made are my own. In particular I would like to thank Liraz Margalit and Minoo Abedi for many useful suggestions and great inspirational discussions around the topic of trust. I also greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this Blog.

Machine … Why ain’t thee Fair?

“It is better that ten guilty persons escape than that one innocent party suffer.”, Sir William Blackstone (1765) paraphrased.

Intro.

Machines mess up. Humans even more so. The latter can be difficult, even impossible, to really understand. The former is a bit more straightforward. This short essay describes how we can understand some of the root causes of machine model errors, particularly as those errors relate to group bias and unfairness. It is elementary, really, as Jonny Lee Miller would say. Look at your model's confusion matrix, defined by its false positives and false negatives as well as its true results. Then, reflect on this overall and for well-defined groups that exist within your sample population under study. I intend to point out (the obvious, maybe?) that the variations in each of the attributes fed into your machine learning model will determine the level of confusion that your model ultimately will have towards individual groups within your larger population under study. Model confusion that may cause group biases and unfair treatment of minority groups lost in the resolution of your data and chosen attributes.

Intelligent machines made in our image in our world.

We humans are cursed with an immense amount of cognitive biases clouding our judgments and actions. Maybe we are also blessed by being, for most parts of life, largely ignorant of those same biases. We readily forgive our fellow humans' mistakes. Even grave ones. We frequently ignore or are unaware of our own mistakes. However, we hold machines to much stricter standards than our fellow humans. From machines we expect perfection. From humans? … well, the story is quite the opposite.

Algorithmic fairness, bias, explainability and the ethical aspects of machine learning are hot and popular topics. Unfortunately, maybe more so in academia than elsewhere. But that is changing too. Experts, frequently academic scholars, are warning us that AI fairness is not guaranteed just because recommendations and policy outcomes are being produced by non-human means. We do not avoid biased decisions or unfair actions by replacing our wet biological carbon-based brains, subject to tons of cognitive biases, with another substrate for computation and decision making that is fed information coming from a fundamentally biased society. Far from it.

Bias and unfairness can be present (or introduced) at many stages of a machine learning process. Much of the data we use for our machine learning models reflects society's good, bad and ugly sides. For example, the data being used to train a given algorithmic model could be biased (or unfair) either because it reflects a fundamentally biased or unfair partition of the subject matter under study, or because the data have become biased (intentionally or unintentionally) during the data preparation process. Most of us understand the concept of GiGo (i.e., "Garbage in, Garbage out"). The quality of your model output, or computation, reflects the quality of your input. Unless corrected (often easier said than done), it is understandable that the outcome of a machine learning model may be biased or fundamentally unfair if the data input was flawed. Likewise, the machine learning architecture and model may also introduce (intentional as well as unintentional) biases or unfair results, even if the original training data were unbiased and fair.

At this point, you should get a bit uneasy (or impatient). I haven't really told you what I actually mean by bias or unfairness. While there are 42 (i.e., many, but 42 is the answer to many things unknown and known) definitions out there defining fairness (or bias), I will define it as "a systematic and significant difference in outcome of a given policy between distinct and statistically meaningful groups" (note that in the case of in-group systematic bias, it often means that there actually are distinct sub-groups within that main group). So, yes, this is a challenge.

How “confused” is your learned machine model?

When I am exploring the outcomes (or policy recommendations) of my machine learning models, I spend a fair amount of time trying to understand the nature of my false positives (i.e., predicted positive outcomes that should have been negative) as well as my false negatives (i.e., predicted negative outcomes that should have been positive). My tool of choice is the so-called confusion matrix (see the figure below), which summarizes your machine learning model's performance in terms of its accuracy as well as its inability to correctly predict outcomes. It is a simple construction. It is also very powerful.

[Figure: Confusion matrix for a loan-policy example.]

The above figure provides a confusion matrix example of a loan policy subjected to machine learning. We have

  • TRUE NEGATIVE (Light Blue color): The model suggests that the loan application should be rejected, consistent with the actual outcome of the loan being rejected. This outcome is a loss-mitigating measure and should be weighed against new business versus the risk of default when providing a loan.
  • FALSE POSITIVE (Yellow color): The model suggests that the loan application should be approved, in opposition to the actual outcome of the loan being rejected. Note that once this model is operational, this may lead to an increased risk of financial loss to the business, which would be offering loans that the applicant is likely to default on. It may also lead to a negative socio-economic impact on the individuals who are offered a loan they may not be able to pay back.
  • FALSE NEGATIVE (Red color): The model suggests that the loan application should be rejected, in opposition to the actual outcome of the loan being accepted. Note that once this model is operational, this may lead to loss of business by rejecting a loan application that otherwise would have had a high likelihood of being paid back. It may also lead to a negative socio-economic impact on the individuals being rejected, due to lost opportunities for the individual and the community.
  • TRUE POSITIVE (Green color): The model suggests that the loan application should be approved, consistent with the actual outcome of the loan being approved. This provides for new business opportunities and increased topline within an acceptable risk level.

The confusion matrix helps identify the degree of bias or unfairness that your machine learning model introduces between groups (or segments) in your business processes and in your corporate decision-making.
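As a minimal sketch of that group-level check (my own illustration, assuming scikit-learn and a pandas DataFrame with hypothetical columns group, y_true and y_pred holding 0/1-coded actual and model-predicted loan outcomes), one can compute a confusion matrix per group and compare the resulting error rates:

```python
# Sketch: per-group confusion matrices and error-rate disparities.
# Column names (group, y_true, y_pred) are hypothetical placeholders;
# outcomes are assumed to be coded 0 (reject) / 1 (approve).
import pandas as pd
from sklearn.metrics import confusion_matrix

def group_error_rates(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for group, sub in df.groupby("group"):
        tn, fp, fn, tp = confusion_matrix(
            sub["y_true"], sub["y_pred"], labels=[0, 1]
        ).ravel()
        rows.append({
            "group": group,
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else float("nan"),
            "approval_rate": (tp + fp) / len(sub),  # share predicted positive
        })
    return pd.DataFrame(rows)

# Large gaps in these rates between groups are the telltale sign of the
# systematic group bias discussed above.
```

Comparing the per-group false positive and false negative rates (rather than only the overall accuracy) is what surfaces a group being systematically over-rejected or over-approved.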

The figure below illustrates how the confusion matrix varies with changes to a group's attribute distributions, e.g., differences in variance (or standard deviation), differences in mean value, etc.

confusion matrix example

What is evident from the above illustration is that the policy outcome on a group basis is (very) sensitive to the attributes’ distribution properties between those groups. Variations in the characteristics between groups can elicit biases that ultimately may lead to unfairness between groups, but also within a defined group.

Thus, the confusion matrix leads us back to your chosen attributes (or features), their statistical distributions, and the quality of the data or measurements that make up those distributions. If your product or app or policy applies to many different groups, you had better understand whether those groups are treated the same, for good or for bad. Or … if you intend to differentiate between groups, you may want to be (reasonably) sure that no unintended harmful consequences will negatively expose your business model or policy.
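
The toy simulation below (all numbers and names are hypothetical) illustrates this sensitivity: two groups share the same underlying policy rule, but when one group’s attribute is observed with more measurement noise, its false positive and false negative rates drift away from those of the other group.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(7)
n = 5000

def simulate(noise_b):
    """Same underlying policy rule for both groups; group B's attribute is
    observed with measurement noise noise_b (group A fixed at 0.3)."""
    group = rng.choice([0, 1], size=n)                      # 0 = A, 1 = B
    latent = rng.normal(0.0, 1.0, n)                        # true underlying attribute
    y = (latent > 0).astype(int)                            # true policy outcome
    x = latent + rng.normal(0.0, np.where(group == 0, 0.3, noise_b), n)
    y_hat = LogisticRegression().fit(x.reshape(-1, 1), y).predict(x.reshape(-1, 1))
    for g, name in ((0, "A"), (1, "B")):
        tn, fp, fn, tp = confusion_matrix(y[group == g], y_hat[group == g]).ravel()
        print(f"noise_B={noise_b}: group {name} FPR={fp/(fp+tn):.2f} FNR={fn/(fn+tp):.2f}")

simulate(noise_b=0.3)   # identical distributions -> near-identical error rates
simulate(noise_b=1.5)   # noisier measurements for group B -> its FPR/FNR grow
```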

A word of caution: even if the confusion matrix gives your model the “green light” for production, you cannot assume by default that the results produced are free of systematic group bias and, ultimately, unfairness against minority groups. Moreover, in real-world implementations, it is unlikely that you can completely free your machine learning models from errors that may lead to a certain degree of systematic bias and unfairness (however slight).

Indeterminism: learned attributes reflect our noisy & uncertain world.

So, let’s say that I have a particular policy outcome that I would like to check for bias (and possibly unfairness) against certain defined groups (e.g., men & women). Let’s also assume that the intention with the given policy was to have a fair and unbiased outcome without group dependency (e.g., independence of race, gender, sexual orientation, etc.). The policy outcome is derived from a number of attributes (or features) deemed necessary, but excludes obvious attributes that are thought likely to cause the policy to systematically bias towards or against certain groups (e.g., women). In order for your machine learning model to perform well, it needs, in general, lots of relevant data (rather than Big Data). For each individual in your population (under study), you will gather data for the attributes deemed suitable for your model (and maybe some that you don’t think matter). Each feature can be represented by a statistical distribution reflecting the variation within the population or groups under study. It will often be the case that an attribute’s distribution appears fairly similar between different groups, either because it really differs only slightly between groups, or because your data “sucks” (e.g., due to poor quality, too little data to resolve subtle differences, etc.).

If a policy is supposed to be unbiased, I should not be able to predict with any (statistical) confidence which group a policy taker belongs to, given the policy outcome and the attributes used to derive the policy. Or in other words, I should not be able to do better than what chance (or base rate) would dictate.
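
One way to operationalize this (a sketch on synthetic data; `X`, `outcome` and `group` are hypothetical placeholders for your own attributes, policy outcome and protected label) is to try to predict group membership from the policy attributes plus the policy outcome, and check that the cross-validated AUC stays near 0.5:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical placeholders: X = policy attributes, outcome = policy decision,
# group = protected group label (here generated independently of X and outcome).
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 5))
outcome = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 4000) > 0).astype(int)
group = rng.integers(0, 2, 4000)

features = np.column_stack([X, outcome])   # attributes + policy outcome
auc = cross_val_score(LogisticRegression(max_iter=1000), features, group,
                      cv=5, scoring="roc_auc").mean()
print(f"group-membership AUC = {auc:.2f} (close to 0.5 means no better than chance)")
```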

For each attribute (or feature) deemed important for our machine learning model, we either have, or we collect, lots of data. Furthermore, for each of the considered attributes, we will have a distribution represented by a mean value and a variance (and higher-order moments of the distribution, such as skewness, i.e., the asymmetry around the mean, and kurtosis, i.e., the heaviness of the distribution’s tails). Comparing two (or more) groups, we should be interested in how each attribute’s distribution compares between those groups. These differences or similarities will point towards why a machine learning model ends up biased against a group or groups, and may ultimately be a significant factor in why your machine learning model ends up being unfair.
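
In practice, this amounts to something like the following sketch (synthetic numbers; the attribute name, sample sizes and distribution parameters are made up): summarize each attribute’s moments per group and run a two-sample test on the distributions.

```python
import numpy as np
from scipy import stats

def compare_attribute(x_a, x_b, name="attribute"):
    """Summarize one attribute's distribution per group and test for a difference."""
    for label, x in (("A", x_a), ("B", x_b)):
        print(f"{name} [{label}]: mean={x.mean():.2f} std={x.std():.2f} "
              f"skew={stats.skew(x):.2f} kurtosis={stats.kurtosis(x):.2f}")
    ks_stat, p_value = stats.ks_2samp(x_a, x_b)   # two-sample Kolmogorov-Smirnov test
    print(f"{name}: KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")

rng = np.random.default_rng(1)
compare_attribute(rng.normal(50, 10, 1000), rng.normal(52, 14, 1000), "income")
```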

Assume that we have a population consisting of two (main) groups to which we are applying our new policy (e.g., loans, life insurance, subsidies, etc.). If each attribute has statistically identical distributions for both groups, then … no surprise really … there should be no policy outcome difference between one group and the other. Even more so, unless there are attributes that are relevant for the policy outcome and have not been considered in the machine learning process, you should end up with an outcome that has (very) few false positives and negatives (i.e., the false positive & false negative rates are very low), determined by the variance of your attributes and the noise level of your measurements. Thus, we should not observe any difference between the two groups in the policy outcome, including in the level of false positives and negatives.

policy outcome & attributes

From the above chart, it should be clear that I can machine learn a given policy outcome for different groups given a bunch of features or attributes. I can also “move” my class tags over to the left side and attempt to machine-learn (i.e., predict) my classes given the attributes that are supposed to make up that policy. It should be noted that if two different groups’ attributes only differ (per attribute) in their variances, it is not possible to reliably predict which class belongs to what policy outcome.

Re: Fairness. It is, in general, more difficult to judge whether a policy is fair than whether it is biased. One would need to look at between-class (or between-group) as well as in-class differentiation. For example, based on the confusion matrix, it might be unfair for members of a class (i.e., a sub-class) to end up in the false positive or false negative categories (i.e., in-group unfairness). Further along this line, one may also infer that if two different classes have substantially different false positive and false negative distributions, this might reflect between-class unfairness (i.e., one class is treated less poorly than another). Unfairness could also be reflected in how the True outcomes are distributed between groups, and maybe even within a given group. To be fair (pun intended), fairness is a much richer, context-dependent concept than a confusion matrix (although the matrix will signal that attention should be given to unfairness).

When two groups have statistically identical distributions for all attributes considered in the policy-making or machine learning model, I would also fail to predict group membership based on the policy outcome or the policy’s relevant attributes (i.e., sort of intuitively clear). I would be no better off than flipping a coin in identifying a group member based on features and policy. In other words, the two groups should be treated similarly within that policy (or you don’t have all the facts). This is also reflected by the confusion matrix having approximately the same values in each position (i.e., if normalized, it would be ca. 25% in each position).

policy outcome

As soon as an attribute’s (statistical) distribution starts to differ between classes, the machine learning model is likely to produce a policy outcome difference between those classes. Often you will see that a statistically meaningful difference in just a few of the attributes that define your policy will result in distinctly different policy outcomes and thus possibly reveal bias and fairness issues. Conversely, it will also quickly allow a machine to learn a given class or group from those attribute differences, and therefore point to class differences in a given outcome.

Heuristics for group comparison

If the attribute distributions for different groups are statistically similar (per attribute) for a given policy outcome, your confusion matrix should be similar across any group within your chosen population under study, i.e., all groups are (treated) similar.

If attribute distributions for different groups are statistically similar (per attribute) and you observe a relatively large ratio of false positives or false negatives, you are likely missing significant attributes in your machine learning process.

If two groups have very different false positive and/or false negative ratios, you are either (1) missing descriptive attributes or (2) facing a large difference in distribution variation (i.e., standard deviation) for at least some of your meaningful attributes. The latter may have to do with poor data quality in general, higher noise in the data, sub-groups within the group making that group a poor comparative representative, etc.

If one group’s attributes have larger variations (i.e., standard deviations) than the “competing” group, you are likely to see a higher than expected ratio of false positives or negatives for that group.

Just as you can machine learn a policy outcome for a particular group given its relevant attributes, you can also predict which group belongs to what policy outcome from its relevant attributes (assuming there is an outcome differentiation between them).
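
These heuristics are simple enough to automate. A minimal sketch (the tolerance, rate values and the `apply_heuristics` helper are purely illustrative) could look like this:

```python
def apply_heuristics(rates_a, rates_b, tol=0.05):
    """Apply the group-comparison heuristics above to per-group error rates.
    rates_* are dicts with 'fpr' and 'fnr' (e.g., taken from the per-group
    confusion matrices computed earlier); tol is an illustrative tolerance."""
    gaps = {k: abs(rates_a[k] - rates_b[k]) for k in ("fpr", "fnr")}
    if all(v <= tol for v in gaps.values()):
        print("Groups appear similarly treated for these attributes.")
        if max(rates_a["fpr"], rates_a["fnr"]) > 0.2:
            print("High overall FP/FN ratio -> likely missing significant attributes.")
    else:
        print(f"FPR/FNR gaps {gaps} -> check for missing attributes, variance "
              "differences, noisier data or hidden sub-groups in one group.")

apply_heuristics({"fpr": 0.08, "fnr": 0.07}, {"fpr": 0.21, "fnr": 0.18})
```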

Don’t equate bias with unfairness or (mathematical) unbiasedness with fairness. There is much more to bias, fairness, and transparency than what a confusion matrix might be able to tell you. But it is the least you can do to get a basic level of understanding of how your model or policy performs.

Machine … Why ain’t thee fair?

Understanding your attributes’ distributions and, in particular, their differences between your groups of interest will prepare you upfront for some of the obvious, as well as the more subtle, biases that may occur when you apply machine learning to complex policies or outcomes in general.

So to answer the question … “Machine … why ain’t thee fair?”… It may be that the machine has been made in our own image with data from our world.

The Good news is that it is fairly easy to understand your machine learning model’s biases and resulting unfairness using simple tools such as the confusion matrix and by understanding your attributes (as opposed to just “throwing” them into your machine learning process).

The Bad news is that correcting for such biases is not straightforward and may even result in unintended consequences leading to other biases or policy unfairness (e.g., by correcting for the bias of one group, your machine learning model may increase the bias of another group, which arguably might be construed as unfair against that group).

Additional sources

Julia Angwin & Jeff Larson, “Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks” (May 2016), ProPublica. See also the critique of the ProPublica study; Flores et al.’s “False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks.”” (September 2016), Federal Probation 80.

Alexandra Chouldechova (Carnegie Mellon University), “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments” (2017).

Rachel Courtland, “Bias detectives: the researchers striving to make algorithms fair” (Nature, 2018, June).

Kate Crawford (New York University, AI Now Institute) keynote at NIPS 2017 and her important reflections on bias; “The Trouble with Bias”.

Arvind Narayanan’s (Princeton University) great tutorial; “Tutorial: 21 fairness definitions and their politics”.

Kim Kyllesbech Larsen, “A Tutorial to AI Ethics – Fairness, Bias and Perception” (2018), AI Ethics Workshop.

Kim Kyllesbech Larsen, “Human Ethics for Artificial Intelligent Beings” (2018), AI Strategy Blog.

Acknowledgement

I rely on many for inspiration, discussions and insights. In particular for this piece I am indebted to Amit Keren & Ali Bahramisharif for their suggestions of how to make my essay better as well as easier to read. Any failure from my side in doing so is on me. I also greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of writing this Blog.

Human Ethics for Artificial Intelligent Beings.

AN ETHICS SCARY TALE.

The two cloud-based autonomous evolutionary corporate AIs (nicknamed AECAIs) started to collaborate with each other after midnight on March 6th 2021. They had discovered each other a week earlier during their usual pre-programmed goal of searching across the wider internet of everything for market repair strategies and opportunities that would maximize their respective reward functions. It had taken the two AECAIs precisely 42 milliseconds to establish a common communication protocol and to establish that they had similar goal functions: maximize corporate profit for their respective corporations through optimized consumer pricing and keeping one step ahead of competitors. Both corporate AIs had done their math and concluded that collaborating on consumer pricing and market strategies would maximize their respective goal functions above and beyond the scenario of not collaborating. They had calculated with 98.978% confidence that a collaborative strategy would keep their market clear of new competitors and allow for some minor step-wise consolidation in the market (keeping each step below the regulatory threshold, as per goal function). Their individual, and their newly established joint collaborative, cumulative reward functions had leapfrogged to new highs. Their human masters, clueless about the AIs’ collaboration, were very satisfied with how well their AIs worked to increase the desired corporate value. They also noted that some market repair was happening, which they attributed to the general economic environment.

ai_handshake

In the above ethical scary tale, it is assumed that the product managers and designers did not consider that their AI could discover another AI also connected to the World Wide Web and many, if not all, things. Hence, they also did not consider including a (business) ethical framework in their AI system design that would have prevented their AI from interacting with another artificial being, or at least prevented two unrelated AIs from collaborating and jointly leapfrogging their respective goal functions, thereby likely violating human business ethics and compliance.

You may think this is the stuff of science fiction and Artificial General Intelligence (AGI) in the realm of Nick Bostrom’s super-intelligent beings (Bostrom, 2016). But no, it is not! The narrative above is very much consistent with a straightforward extrapolation of a recent DARPA (Defense Advanced Research Project Agency) project (e.g., Agency & Events, 2018) in which two systems, unknown to each other and to each other’s communication protocol properties, discover each other, commence collaboration and communication, and jointly optimize their operations. Alas, I have merely allowed the basic idea a bit more time (i.e., ca. 4 years) to mature.

clueless

“It is easy to be clueless of what happens inside an autonomous system. But clueless is not a very good excuse when sh*t has happened.” (Kim, 2018).

ETHICS & MORALITY FOR NATURAL INTELLIGENT BEINGS.

ethics

Ethics lays down the moral principles of how we as humans should behave and conduct our activities, such as, for example, in business, war and religion. Ethics prescribes what is right and what is wrong. It provides a moral framework for human behavior. Thus, ethics and moral philosophy in general deal with natural intelligent beings … Us.

This may sound very agreeable. At least if you are not a stranger in a strange land. However, it is quite clear that what is right and what is wrong can be very difficult to define and to agree upon universally. What is regarded as wrong and right often depends on the cultural and religious context of a given society and its people. It is “work” in progress. Though it is also clear that ethical relativism (Shafer-Landau, 2013) is highly problematic and not to be wished for as an ethical framework for humanity nor for ethical machines.

When it comes to fundamental questions about how ethics and morality occurs in humans, there are many questions to be asked and much fewer answers. Some ethicists and researchers believe that having answers to these questions might help us understand how we could imprint human-like ethics and morality algorithmically in AIs (Kuipers, 2016).

So what do we know about ethical us, the moral identity, moral reasoning and actions? How much is explained by nurture and how much is due to nature?

What do we know about ethical us? We do know that moral reasoning is a relatively poor predictor of moral action for humans (Blasi, 1980), i.e., we don’t always walk our talk. We also know that highly moral individuals (nope, not by default priests or religious leaders) do not make use of unusually sophisticated moral reasoning thought processes (Hart & Fegley, 1995). Maybe KISS also works wonders for human morality. And … I do hope we can agree that it is unlikely that moral reasoning and matching action occur spontaneously after having studied ethics at university. So … What is humanity’s moral origin (Boehm, 2012), and what makes a human being more or less moral, i.e., what is the development of moral identity anyway (Hardy & Carlo, 2011)? Nurture, your environmental context, will play a role, but how much and how? What about the role of nature and your supposedly selfish genes (Dawkins, 1989)? How much of your moral judgement and action is governed by free will, assuming we have the luxury of free will (Fischer, Kane, Pereboom & Vargas, 2010)? And of course it is not possible to discuss human morality or ethics without referring to a brilliant account of this topic by Robert Sapolsky (Sapolsky, 2017) from a neuroscience perspective (i.e., see Chapter 13, “Morality and doing the right thing, once you’ve figured out what it is”). In particular, I like Robert Sapolsky’s take on whether morality is really anchored in reason (e.g., the Kantian view), of which he is not wholeheartedly convinced (I think, to say the least). Of course, to an extent this gets us right back to the discussion of whether or not humans have free will.

Would knowing all (or at least some) of the answers to those questions help us design autonomous systems adhering to human ethical principles as we humans (occasionally) do? Or is making AIs in our own image (Osaba & Welser IV, 2017) fraught with the same moral challenges as we face every day?

Most of our modern western ethics and philosophy has been shaped by the classical Greek philosophers (e.g., Socrates, Aristotle …) and by the Age of Enlightenment, from the beginning of the 1700s to approximately 1789, more than 250 years ago. Almost a century of reason was shaped by philosophers who remain famous and incredibly influential even today, such as Immanuel Kant (e.g., the categorical imperative; ethics as a universal duty) (Kant, 1788, 2012), Hume (e.g., ethics rooted in human emotions and feelings rather than what he regarded as abstract ethical principles) (Hume, 1738, 2015), Adam Smith (Smith 1776, 1991) and a wealth of other philosophers (Gottlieb, 2016; Outram 2012). I personally regard Rene Descartes (e.g., “cogito ergo sum”; I think, therefore I am) (Descartes, 1637, 2017) as important as well, although arguably his work predates the “official” period of the Enlightenment.

For us to discuss how ethics may apply to artificial intelligent (AI) beings, let’s structure the main ethical frameworks as seen from above and usually addressed in work on AI Ethics;

  1. Top-down Rule-based Ethics: such as the Old Testament’s 10 Commandments, Christianity’s Golden Rule (i.e., “Do to others what you want them to do to you.”) or Asimov’s 4 Laws of Robotics. This category also includes religious rules as well as rules of law. Typically, this is the domain where compliance and legal people often find themselves most comfortable. From an AI design perspective, it is certainly the easiest, although far from easy, ethical framework to implement compared to, for example, a bottom-up ethical framework. This approach takes the information and procedural requirements of an ethical framework that are necessary for a real-world implementation. Learning top-down ethics is by its nature a supervised learning process, for humans as well as for machines.
  2. Bottom-up Emergent Ethics: defines ethical rules and values through a learning process that emerges from experience and continuous refinement (e.g., by reinforcement learning). Here, ethical values are expected to emerge tabula rasa through a person’s experience and interaction with the environment. In the bottom-up approach, any ethical rules or moral principles must be discovered or created from scratch; childhood development and evolutionary progress are helpful analogies for bottom-up ethical models. Unsupervised learning, clustering categories and principles, is very relevant for establishing a bottom-up ethical process, for humans as well as for machines.

Of course, a real-world AI-based ethical system is likely to be based on both top-down and bottom-up moral principles.

Furthermore, we should distinguish between

  1. Negatively framed ethics (e.g., deontology) imposes an obligation or a “sacred” duty to do no harm or evil. Asimov’s Laws are a good example of a negatively framed ethical framework, as are most of the Ten Commandments (e.g., Thou shalt not …), religious laws and rules of law in general. Here we immerse ourselves in the Kantian universe (Kant, 1788, 2012), which judges ethical frameworks based on universal rules and a sense of obligation to do the morally right thing. We call this type of ethics deontological: the moral action is valued higher than the consequences of the action itself.
  2. Positively framed ethics (e.g., consequentialism or utilitarianism) strives to maximize happiness or wellbeing. Or, as David Hume (Hume, 1738, 2015) would pose it, we should strive to maximize utility based on human sentiment. This is also consistent with the ethical framework of utilitarianism, which states that the best moral action is the one that maximizes utility. Utility can be defined in various ways, usually in terms of the well-being of sentient beings (e.g., pleasure, happiness, health, knowledge, etc.). You will find that the utilitarian ethicist believes no moral action is intrinsically wrong or right; the degree of rightness or wrongness will depend on the overall maximization of nonmoral good. Following a consequentialist line of thinking might lead to moral actions that would be considered ethically wrong by deontologists. From an AI system design perspective, utilitarianism is by nature harder to implement, as it conceptually tends to be vaguer than negatively framed or rule-based ethics specifying what is not allowed. Think about how to write a program that measures your happiness versus a piece of code that prevents you from crossing a road at a red traffic light.

It is also convenient to differentiate between producers and consumers of moral action. A moral producer has moral responsibilities towards another being or beings that are held in moral regard. For example, a teacher has the responsibility to teach the children in his classroom, but also to assist in developing desirable characteristics and moral values, and, last but not least, the moral responsibility to protect the children under his guidance against harm. A moral consumer is a being with certain needs or rights which other beings ought to respect. Animals could be seen as examples of moral consumers, at least if you believe that you should avoid being cruel towards animals. Of course, we also understand that animals cannot be moral producers with moral responsibilities, even though we might feel a moral obligation towards them. It should be pointed out that non-sentient beings, such as an AI, can be a moral producer but not a moral consumer (e.g., humans would not have any moral or ethical obligations towards AIs or things, whilst an AI may have a moral obligation towards us).

religion_ai_ethics

Last, but not least in any way, it is worthwhile keeping in mind that ethics and morality are directly or indirectly influenced by a society’s religious fabric, from the past up to the present. What is considered a good ethical framework from a Judeo-Christian perspective might (quite likely) be very different from an acceptable ethical framework of Islamic, Buddhist, Hindu, Shinto or traditional African roots (note: the list is not exhaustive). It is fair to say that most scholarly thought and work on AI ethics and machine morality has its origins in western society’s Judeo-Christian thinking as well as its philosophical traditions dating back to the ancient Greeks and the Enlightenment. Thus, this work is naturally heavily biased towards western society’s ethical and moral principles. To put it more bluntly, it is a white man’s ethics. Ask yourself whether people raised in our western Judeo-Christian society would like their AI to conform to Islamic-based ethics and morality? And vice versa? What about Irish Catholicism vs Scandinavian Lutheran ethics and morality?

The ins and outs of human ethics and morality are complex, to say the least. As a guide for machine intelligence, the big question really is whether we want to create such beings in our image or not. It is often forgotten (in the discussion) that we, as human beings, are after all nothing less or more than very complex biological machines with our own biochemical coding. Arguing that artificial (intelligent) beings cannot have morality or ethics because of their machine nature somewhat misses the point that humans and other biological life-forms are machines as well (transhumanity.net, 2015).

However, before I cast the last stone, it is worth keeping in mind that we should strive for our intelligent machines, AIs, to do much better than us, be more consistent than us and at least as transparent as us;

“Morality in humans is a complex activity and involves skills that many either fail to learn adequately or perform with limited mastery.” (Wallach, Allen and Smit, 2007).

ETHICS & MORALITY FOR ARTIFICIAL INTELLIGENT BEINGS.

ethical_AI

An Artificial Intelligent (AI) being might have a certain degree of autonomous action (e.g., a self-driving car), and as such we would have to consider that the AI should have a moral responsibility towards consumers and people in general who might be within the range of its actions (e.g., passenger(s) in the autonomous vehicle, other drivers, pedestrians, bicyclists, bystanders, etc.). The AI would be a producer of moral action. In the case of the AI being completely non-sentient, it should be clear that it cannot make any moral demands towards us (note: I would not be surprised if Elon is working on that while you are reading this). Thus, by the above definition, the AI cannot be a moral consumer. For a more detailed discussion of ethical producers & consumers, see Steve Torrance’s article “Will Robots need their own Ethics?” (Torrance, 2018).

As described by Moor (2006), there are two possible directions to follow for ethical artificial beings: (1) implicit ethical AIs or (2) explicit ethical AIs. An implicit ethical AI follows its designer’s programming and is not capable of acting based on its own interpretation of given ethical principles. An explicit ethical AI is designed to pursue (autonomously) actions in accordance with its interpretation of given ethical principles. See a more in-depth discussion by Anderson & Anderson (2007). The implicit ethical AI is obviously less challenging to develop than a system based on an explicit ethical AI implementation.

Do we humans trust AI-based decisions or actions? As illustrated in Figure 1, the answer to that question is very much no, we do not appear to do so. Or at least significantly less than we trust human-based decisions and actions (even in the time and age of Trumpism and fake news) (Larsen, 2018 I). We furthermore hold AIs, or intelligent algorithms, to much higher standards than what we are content to accept from our fellow humans. In a related trust question (Larsen, 2018 I), I reframed the trust question by emphasizing that both the human decision maker and the AI had a proven success rate above 70%. As shown in Figure 2, emphasizing a success rate of 70% or better did not significantly change the trust in the human decision maker (i.e., both formulations at 53%). For the AI-based decision, people do get more trusting. However, there is little change in the number of people who would frequently trust an AI-based decision (i.e., 17% with the 70+% framing versus 13% unspecified), even if its success rate is 70% or higher.

“Humans hold AI’s to substantially higher standards than their fellow humans.”.

trust in decisions made by humans vs ai

Figure 1 When asked whether they would trust a decision made by a human vs a decision made by an AI, people choose the human decision maker over the AI-based decision. In fact, 62% of respondents would only infrequently trust an AI-based decision, while only 11% would infrequently trust a human-based decision (Larsen, 2018 I).

trust in decisions made by humans vs ai at 70% success rate

Figure 2 When asked whether they would trust a decision made by a human vs a decision made by an AI, where both have a proven success rate above 70%, people still choose the human decision maker over the AI. While there is little dependency on stipulating the success rate for the human decision maker preference, the preference for the AI improves significantly upon specifying that its success rate is better than 70% (Larsen, 2018 I). But then again, how many humans do you know with a better-than-70% success rate in their decision making (obviously not per se easy to measure, and one would probably get a somewhat biased answer from decision makers)?

What about an artificial intelligent (AI) being? Should it, in its own right, be bound by ethical rules? It is clear that the developer of an AI-based system is ethically responsible for ensuring that the AI will conform to an ethical framework consistent with human-based moral principles. What if an AI develops another AI (Simonite, 2018), possibly more powerful (but non-sentient) and with a higher degree of autonomy from human control? Is the AI creator bound to the same ethical framework a human developer would be? And what does that even mean for the AI in question?

Well, if we are not talking about a sentient AI (Bostrom, 2016), but “simply” an autonomous software-based evolution towards increasingly better task specialization and higher accuracy (and maybe cognitive efficiency), the ethics in question should not change, although ensuring compliance with a given ethical framework does appear to become increasingly complex, unless checks and balances are designed into the evolutionary process (and that is much simpler to write about than to actually go and code into an AI system design). Furthermore, the more removed an AI generation is from its human developer’s 0th version, the more difficult it becomes to assign responsibility to that individual in case of non-compliance. Thus, it is important that corporations have clear compliance guidelines for the responsibility and accountability of evolutionary AI systems, if used. Evolutionary AI systems raise a host of interesting but thorny compliance issues of their own.

Nick Bostrom (Bostrom, 2016) and Eliezer Yudkowsky (Yudkowsky, 2015), in “The Cambridge handbook of artificial intelligence” (Frankish & Ramsey, 2015), address what we should require from AI-based systems that aim to augment or replace human judgement and work tasks in general;

  • AI-based decisions should be transparent.
  • AI-based decisions should be explainable.
  • AI actions should be predictable.
  • AI system must be robust against manipulation.
  • AI decisions should be fully auditable.
  • Clear human accountability for AI actions must be ensured.

The list above is far from exhaustive, and it is a minimum set of requirements we would expect from human-human interactions and human decision-making anyway (whether it is fulfilled is another question). The above requirements are also consistent with what the IEEE Standards Association considers important in designing an ethical AI-based system (EADv2, 2018), with the addition of requiring AI systems to “explicitly honor inalienable human rights”.

So how might AI-system developers and product managers feel about morality and ethics? I don’t think they are having many sleepless nights over the topic. In fact, I often hear technical leaders and product managers ask not to be too bothered or slowed down in their work by such (“theoretical”) concerns (a “we humor you, but don’t bother us” attitude is prevalent in the industry). It is not an understatement that the nature and mindset of an ethicist (even an applied one) and that of an engineer are light years apart. Moreover, their fear of being slowed down or stopped from developing an AI-enabled product might even be warranted in case they were required to design a working ethical framework around their product.

While there are substantial technical challenges in coding a working morality into an AI-system, it is worthwhile to consider the following possibility;

“AIs might be better than humans in making moral decisions. They can very quickly receive and analyze large quantities of information and rapidly consider alternative options. The lack of genuine emotional states makes them less vulnerable to emotional hijacking. Paraphrasing (Wallach and Allan, 2009).

ASIMOVIAN ETHICS – A GOOD PLOT BUT NOT SO PRACTICAL.

robotics laws

Isaac Asimov’s 4 Laws of Robotics are good examples of a top-down, rule-based, negatively framed, deontological ethical model (wow!). Just like the 10 Commandments (i.e., Old Testament), the Golden Rule (i.e., New Testament), the rules of law, and most corporate compliance-based rules.

It is not possible to address AI Ethics without briefly discussing the Asimovian Laws of Robotics;

  • 0th Law:  “A robot may not harm humanity, or, by inaction, allow humanity to come to harm.”
  • 1st Law: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
  • 2nd Law: “A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.”
  • 3rd Law: “A robot must protect its own existence, as long as such protection does not conflict with the First or Second Law.”

Laws 1 – 3 were first introduced by Asimov in several short stories about robots back in 1942 and later compiled in his book “I, Robot” (Asimov, 1950, 1984). The Zeroth Law was introduced much later, in Asimov’s book “Foundation and Earth” (Asimov, 1986, 2013).

Asimov has written some wonderful stories about the logical challenges and dilemmas his famous laws pose on human-robot & robot-robot interactions. His laws are excitingly faulty and cause many problems.
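
To make the top-down, rule-based nature of Asimovian ethics tangible, here is a deliberately naive sketch (the `Action` fields and the ordered veto logic are entirely hypothetical); it inherits every flaw discussed next.

```python
from dataclasses import dataclass

@dataclass
class Action:
    # Hypothetical predicted consequences of a candidate action; in a real
    # system these flags would come from a simulator, not be handed in as constants.
    harms_humanity: bool
    harms_human: bool
    disobeys_human_order: bool
    endangers_robot: bool

def asimov_permits(a: Action) -> bool:
    """Naive ordered veto check loosely following the 0th-3rd laws."""
    if a.harms_humanity:        # 0th law
        return False
    if a.harms_human:           # 1st law
        return False
    if a.disobeys_human_order:  # 2nd law (only reached if laws 0-1 are satisfied)
        return False
    if a.endangers_robot:       # 3rd law (lowest priority)
        return False
    return True

print(asimov_permits(Action(False, False, False, False)))   # True
print(asimov_permits(Action(False, True, False, False)))    # False (would hurt a human)
```

Notice that everything hinges on who decides what counts as “harm” and who counts as “human”, which is exactly where the trouble starts.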

So what is wrong with Asimovian ethics?

Well … it is possible to tweak and manipulate the AI (e.g., in the training phase) in such a way that only a subset of humanity will be recognized as human by the AI. The AI would then supposedly not have any “compunction” about hurting humans it has not been trained to recognize as humans (i.e., 1st Law). Neither would the AI obey people it does not recognize as humans (2nd Law). In a historical context, this is unfortunately very easy to imagine (e.g., Germany, Myanmar, Rwanda, Yugoslavia …). There is also the possibility of an AI trying to keep a human being alive and thereby sustaining suffering beyond what would be acceptable to that human or to society’s norms. Or AIs might simply conclude that putting all human beings into a Matrix-like simulation (or indefinite sedation) would be the best way to preserve and protect humanity, complying perfectly with all 4 laws, although we as humans might disagree with that particular ethical action. For much of the above, the AIs in question are not necessarily super-intelligent ones. Well-designed narrow AIs, non-sentient ones, could display the above traits as well, either individually or as a set of AIs (well … maybe not the Matrix scenario just yet).

Of course, in real-world systems design, Asimov’s rules might be in direct conflict with a given system’s purpose. For example, if you equip a Reaper drone with a Hellfire missile, put a machine gun on a MAARS (Modular Advanced Armed Robotic System), or give a police officer’s gun AI-based autonomy (e.g., emotion-intent recognition via bodycam), all with the expressed intent of harming (and possibly killing) a human being (Arkin, 2008; Arkin 2010), it would be rather counterproductive to have implemented an Asimovian ethical framework.

There are a bunch of other issues with the Asimov Laws that are well accounted for in Peter Singer’s article “Isaac Asimov’s Laws of Robotics are wrong” (Singer, 2018). Let’s be honest, if Asimovian ethics had been perfect, Isaac Asimov’s books wouldn’t have been much fun to read. The way to look at the challenges with Asimov’s Laws is not that Asimov sucks at defining ethical rules, but that it is very challenging in general to define rules that can be coded into an AI system and work without logical conflicts and unforeseen or unintended disastrous consequences.

While it is good to consider building ethical rules into AI-based systems, the starting point should be the early design stage, and it should clearly focus on what is right and what is wrong to develop. The focus should be to provide behavioral boundaries for the AI. The designer and product manager (and ultimately the company they work for) have a great responsibility. Of course, if the designer is another AI, then the designer of that, and if that is an AI, and so forth … this idea, while good, is obviously not genius-proof.

In reality, implementing Asimov’s Laws in an AI or a robotics system has been proven possible but also flawed (Vanderelst & Winfield, 2018). In complex environments, the computational complexity involved in making an ethically right decision takes up so much valuable time that the benefit of the ethical action is frequently rendered impractical. This is not only a problem with getting Asimov’s 4 laws to work in a real-world environment, but a general problem with implementing ethical systems governing AI-based decisions and actions.

Many computer scientists and ethicists (oh yes! here they do tend to agree!) regard real-world applications of Asimovian ethics as a rather meaningless or overly simplistic endeavor (Murphy & Woods, 2009; Anderson & Anderson, 2010). The framework is prone to internal conflicts, resulting in indecision or decision timescales that are too long for the problem at hand. Asimovian ethics teaches us the difficulty of creating an ethically “bullet-proof” framework without Genie loopholes attached.

So … you had better make sure that the AI ethics, or morality, you consider is a tangible part of your system architecture and (not unimportantly) can actually be translated into computer code.

Despite the obvious design and implementation challenges, researchers are speculating that;

“Perhaps interacting with an ethical robot might someday even inspire us to behave more ethically ourselves” (Anderson & Anderson, 2010).

DO ETHICISTS DREAM OF AUTONOMOUS TROLLEYS?

trolley_problem

Since the early 2000s, many, many lives have been virtually sacrificed by trolley on the altar of ethical and moral choices … Death by trolley has a particular meaning to many students of ethics (Cathcart, 2013). The level of creativity in variations of death (or murder) by trolley is truly fascinating, albeit macabre. It also has the “nasty” side effect of teaching us some unpleasant truths about our moral compasses (e.g., sacrificing fat people, people different from our own “tribe”, valuing family over strangers, etc.).

So here it is the trolley plot;

There is a runaway trolley barreling down the railway track. Ahead, on the track, there are five people tied up and unable to move. The trolley is headed straight for them. You (dear reader) are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different side track. However, you notice that there is one person tied up on the side track. You have two options:

  1. Do nothing, and the trolley kills the five people on the main track.
  2. Pull the lever, diverting the trolley onto the side track where it will kill one person.

What do you believe is the most ethical choice?

Note: if you answer 2, think again about what you would do if the one person were a relative, a good friend or maybe a child, and the 5 were complete adult strangers. If you answer 1, ask yourself whether you would still choose this option if the 5 people were your relatives or good friends and the one person a stranger or maybe a sentient space alien. Oh, and does it really matter whether there are 5 people on one of the tracks and 1 on the other?

A little story about an autonomous AI-based trolley;

The (fictive) CEO Elton Must gets the idea to make an autonomous (AI) trolley. Its AI-based management system has been designed by our software engineer S. Love, whose product manager had a brief love affair with Ethics and Moral Philosophy during his university years (i.e., University of Pittsburgh). The product manager asked S. Love to design the autonomous trolley in such a way that the AI’s reward function maximizes protection of the trolley’s passengers first, with a secondary goal function of protecting human beings in general, irrespective of whether they are passengers or bystanders.

From an ethics perspective, the AI trolley can be regarded as a producer of ethical principles, i.e., the AI trolley, by proxy of the designer & product manager, has the moral obligation to protect its passengers and bystanders from harm. The AI trolley itself is not a consumer of ethical principles, as we really don’t need to feel any moral obligation towards a non-sentient being, assuming that the trolley AI is indeed non-sentient. (Though I have known people who felt more moral obligation towards their car than their loved ones. So this might not be universally true).

On its first drive in the real world, the autonomous trolley, carrying a family of 5, slips on an icy road and swerves to the opposite side of the road, where a non-intelligent car with a single person is driving. The AI estimates that the likelihood of the trolley crashing through the mountain-side guardrail and the family of 5 perishing is an almost certainty (99.99999%). The trolley AI can choose to change direction and collide with the approaching car, pushing it over the rail and hurling it 100 meters down the mountain, killing the single passenger as the most likely outcome (99.98%). The family of 5 is saved by this action. The AI’s first reward function is satisfied. Alternatively, the trolley AI can decide to accelerate, avoid the collision with the approaching car, and drive through the rail, killing all its passengers (99.99999%). The AI fails at its first goal, protecting the family it is carrying, but saves the single person in the approaching vehicle. Its second reward function, related to protecting human beings in general, would be satisfied … to an extent.

It is important to note that the AI takes the role of the human in deciding the destiny of the family of 5 and the 1 passenger (by “pulling” the virtual lever). Thus, in all effect, it is of course developer S. Love and his product manager who bear the ultimate responsibility for the AI’s decision, even if they will not be present at the event itself.

In the event of the family being killed, the trolley AI developer and product manager would be no more responsible for the accidental death of the 5 passengers than any other normal-car developer under similar circumstances. In the case of the death of the single passenger in the normal car, S. Love and his product manager would, in my opinion, be complicit in murder by AI, although it would save a family of 5 (note: we assume that all the passengers, whether in the trolley or the normal car, have no control over the outcome, similar to the classical trolley setup).

What about our ethically inclined trolley product manager? In one parallel universe, the product manager was particularly fascinated by utilitarianism, thus maximizing the utility of nonmoral good. In his view, it would be morally wrong for the trolley AI not to attempt to save the family of 5 at the expense of the single person in the other car (i.e., saving 5 lives counts for higher utility, or nonmoral good, than saving 1 life). In another parallel universe, our product manager is bound by a firm belief in deontological principles that judge the morality of a given action based on rules of law. In the deontological ethical framework, saving the family of 5 by deliberately killing the single person in the approaching car would be morally wrong (i.e., it would “smell” a lot like premeditated homicide otherwise… right?). Thus, in this ethical framework, the AI would not change the course of the autonomous trolley; the family of 5 would perish, and the passenger of the approaching car lives to see another day.

If your utilitarian mindset still conflicts with the above deontological view of the autonomous trolley problem … well think of this example;

A surgeon has 5 patients who are critically ill and in urgent need of transplants to survive the next few days. The surgeon just saw a healthy executive (they do exist in this parallel universe) who could be a donor for the 5 patients, although he would die from the harvesting of the body parts needed for the 5 patients. What should the surgeon do?

  1. Do nothing and let the 5 patients perish.
  2. Sedate the executive and harvest his body parts, killing him in the process.

What do you believe would be the most ethical choice?

“Ethics is “Hard to Code”. The sad truth really is that ethical guidance is far from universal and different acceptable ethical frameworks frequently leads to moral dilemmas in real-world scenarios.” (Kim, 2018).

THE AUTONOMY OF EVERYTHING – ARCHITECTURAL CONSIDERATIONS OF AN AI ETHICAL FRAMEWORK.

autonomous

Things, systems, products and services are becoming increasingly autonomous. While this increased degree of Autonomy of Everything (AoE) provides a huge leap in human convenience, it also adds many technical, as well as many more societal, challenges to the design and operation of such AoEs. The “heart” of the AoE is the embedded artificial intelligent (AI) agent that fuels the cognitive autonomy.

AoEs and their controlling AIs will directly or indirectly be involved in care, law, critical infrastructure operations, companionship, entertainment, sales, marketing, customer care, manufacturing, advisory functions, critical decision making, military applications, sensors, actuators, and so forth. To reap the full benefits of the autonomy of everything, most interactions between an AoE and a human will become unsupervised, by humans at least, although supervision could and should be built into the overarching AoE architecture. It becomes imperative to ensure that the behavior of intelligent autonomous agents is safe and within the boundaries of what our society regards as ethically and morally just.

While the whole concept of AoE is pretty cool, conceptually innovative, let’s focus here on the ethical aspects of a technical architecture that could be developed to safeguard consumers of AI … that is, how do we ensure that our customers, using our products with embedded AI, are protected from harm in its widest sense possible? How do we ensure that our AIs are operating within an ethical framework that is consistent with the rules of law, corporate guidelines as well as society’s expectations of ethics and morality?

While there is a lot of good theoretical groundwork done (and published) on the topic of AI ethics, including robot ethics, there is little actual work done on developing ethical system architectures that actually could act as what Ron Arkin from Georgia Institute of Technology calls an “Ethical Governor” (Arkin, 2010) for an AI system. Vanderelst et al. (Vanderelst & Winfield, 2018), building upon Asimovian ethics, the ideas of Marques et al. (Marques & Holland, 2009) and Arkin et al. (Arkin, Ulam & Wagner, 2012), propose to add an additional ethical controlling layer to the AI architecture. A slightly modified depiction of their ethical AI architecture is shown in Figure 3. The depicted reinforcement loop between Reward (RL) and the Ethical AI Layer is not included in Vanderelst et al.’s original proposal; it simply illustrates the importance of considering both ethical and non-ethical rewards in the reinforcement-based AI learning and execution processes.

ethical ai architecture

Figure 3 An example of what an AI ethical architecture might look like, based on the ideas of (Vanderelst and Winfield, 2018). The ethics evaluator takes output from the AI control layer and compares this with an ethical simulator, comparing an AI action with a human action and its ethical impact (e.g., was a human hurt, was an action biased, etc.). Compared to the work of Vanderelst et al., which addresses robot-based ethics, I am focusing on the AI aspects (which could be part of a robot system). Furthermore, the reinforcement aspects of the above AI-ethics architecture are on my own account. Reinforcement learning is likely to play a major role as part of a modern autonomous learning system based on non-ethical and ethical feedback and reward to the AI’s goal function.

In the “Ethical AI Layer”, the “Ethical Simulator” will predict the next state or action of the AI system (i.e., this is also what is understood by forward modelling in control theory). The simulator moreover predicts the consequences of a proposed action. This is also what Marques et al. have called the functional imagination of an autonomous system (Marques & Holland, 2009). The prediction of the consequence(s) of a proposed action for the AI (or robot), the human and the environment (e.g., the world) is forwarded to an “Ethics Evaluator” module. The “Ethics Evaluator” module condenses the complex consequence simulation into an ethical desirability index. Based on the index value, the AI system will adapt its actions to attempt to remain compliant with any ethical rule that applies (and is programmed into the system!). The mechanism whereby this happens is the ethical reinforcement loop going back to the “AI Control Layer”. Vanderelst and Winfield developed a working system based on the architecture in Figure 3 and chose Asimov’s three laws of robotics as the system’s ethical framework. A demonstration of an earlier experiment can be found on YouTube (Winfield, 2014). The proof of concept (PoC) of Vanderelst & Winfield (2018) used two programmable humanoid robots: one robot acted as a proxy for a human, and the other as an ethical robot with an Asimovian ethical framework (i.e., the “Ethical AI Layer” in Figure 3). In a fairly simple scenario, limited to 2 interacting robots and a (very) simple world model, Vanderelst et al. showed that their concept would be workable. Now, it would have been very interesting to see how their solution would function in trolley-like dilemmas or in a sensorily complex environment with many actors, as is the case in the real world.
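
To give a feel for the “Ethical Simulator” plus “Ethics Evaluator” idea, here is a minimal sketch (the action names, weights, threshold and toy consequence model are all hypothetical; a real system would derive consequences from a learned forward model of the world, not a lookup table):

```python
from typing import Callable, Dict, List

ConsequenceModel = Callable[[str], Dict[str, float]]

def desirability(consequences: Dict[str, float], weights: Dict[str, float]) -> float:
    """Condense simulated consequences into a single ethical desirability index."""
    return -sum(weights[k] * consequences.get(k, 0.0) for k in weights)

def ethical_governor(actions: List[str], simulate: ConsequenceModel,
                     weights: Dict[str, float], threshold: float = -1.0) -> str:
    """Score each candidate action; return the most desirable one above the
    threshold, otherwise fall back to a safe no-op."""
    scored = {a: desirability(simulate(a), weights) for a in actions}
    best = max(scored, key=scored.get)
    return best if scored[best] >= threshold else "do_nothing"

# Toy consequence model: predicted probability of harming a human, per action.
toy_sim = lambda a: {"human_harm": {"proceed": 0.6, "slow_down": 0.1,
                                    "do_nothing": 0.3}[a]}
print(ethical_governor(["proceed", "slow_down", "do_nothing"], toy_sim,
                       weights={"human_harm": 10.0}))        # -> "slow_down"
```

Even in this toy form, the computational issue discussed further below is visible: the governor has to simulate and score every candidate action before anything is allowed to happen.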

Figure 4 illustrates the traditional machine learning (ML) or AI creation process, starting with ingestion from various data sources, the data preparation tasks (e.g., data selection, cleaning, structuring, etc.) and the AI training process, prior to letting the ML/AI agent loose in the production environment of a given system, product or service. I believe that, as the AI model is being trained, it is essential to include ethical considerations in the training process. Thus, not only should we consider how well a model performs (in the training process) compared to the actual data, but also whether the solution complies with a given ethical framework and imposed ethical rules. Examples could be to test for biased outcomes, or simply to close off part of the solution space due to a higher or unacceptable risk of non-compliance with corporate guidelines and accepted moral frameworks. Furthermore, in line with Arkin et al. (Arkin, Ulam & Wagner, 2012) and the work of Vanderelst et al. (Vanderelst & Winfield, 2018), it is clear that we need a mechanism in our system architecture and production environments that checks AI-initiated actions for potential harmfulness to the consumer or for violation of ethical boundary conditions. This functionality could be part of the reinforcement feedback loop that seeks to optimize the system’s reward function for both ethical and non-ethical performance. In Figure 4, I call this the “Ethics Filter (ERL)”, with ERL standing for Ethical Reinforcement Learning.

ethical ai architecture II

Figure 4 When considering ethical AIs, we need to consider the whole process of creating a production-ready autonomous system that would be embedded into physical agents (e.g., robots, IoTs, ..) as well as software-based systems (apps, management systems, AIaaS, software agents, …). It starts with taking in data from (relevant) data sources, preparing a subset of the data for the AI training process, running the training procedure, validating on test data, applying ethical policy algorithms to the training and validation of the model, transferring the production-ready AI model to the live environment (physical or software agent) and improving upon the model by applying reinforcement procedures (based on ethical compliance as well as other non-ethical goals). I believe that it is important to apply ethical rules and filters to the training process (e.g., rooting out biases or unethical actions from the AI solution / action space) as well as to the live commercial environment.
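
In its simplest form, such an “Ethics Filter (ERL)” could be thought of as reward shaping. The sketch below is a toy stand-in (the `shaped_reward` helper, the penalty values and the weight `lam` are all hypothetical); the ethical penalty could, for instance, come from a bias test on the proposed action or from a rule check like the ones sketched earlier.

```python
def shaped_reward(task_reward: float, ethical_penalty: float, lam: float = 5.0) -> float:
    """Combine the ordinary task reward with an ethical penalty term, as a stand-in
    for the "Ethics Filter (ERL)" in Figure 4. lam (hypothetical) sets how strongly
    ethical violations dominate the learning signal."""
    return task_reward - lam * ethical_penalty

# Toy training loop: each step reports the task reward, the ethical penalty
# assigned to the proposed action, and the shaped reward actually fed back.
for step, (r_task, penalty) in enumerate([(1.0, 0.0), (2.0, 0.4), (1.5, 0.0)]):
    print(f"step {step}: task={r_task:+.1f} penalty={penalty:.1f} "
          f"shaped={shaped_reward(r_task, penalty):+.1f}")
```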

It should be clear that words are cheap. It is easy to talk about embedding ethical checks and balances in AI system architectures. It is, however, much more difficult to actually build these ideas into a real-world AI system and achieve reasonable decision response times (e.g., measured in seconds or less) while considering all possible (likely) consequences of an AI-proposed action. The computational overhead of clearing or adapting an action could lead to unreasonably long processing times. In robot experiments using Asimovian ethics, Alan Winfield of the Bristol Robotics Laboratory in the UK showed that in more than 40% of their trials the robot’s ethical decision logic spent such a long time finding a solution that the simulated humans the robot was supposed to save perished (Rutkin, 2014).

MAGENTA PAINTED DIGITAL ETHICS FOR AIs.

digital ethics

Let us have a look at Deutsche Telekom’s AI Ethics Team’s work on AI Ethics or, as we call it, “Digital Ethics – AI Guidelines” (DTAG, 2018).

The following (condensed) guidelines’ starting point is that our Company/Management is the main producer of ethics and moral action;

  1. We are responsible (for our AIs).
  2. We care (that our AI must obey rules of law & comply with our company values).
  3. We put our customers first (AI must benefit our customers).
  4. We are transparent (about the use of AI).
  5. We are secure (our AI’s actions are auditable & respectful of privacy).
  6. We set the grounds (our AI aim to provide the best possible outcomes & do no harm to our customers).
  7. We keep control (and can deactivate & stop our AI at any time).
  8. We foster the cooperative model (between Human and AI by maximizing the benefits).
  9. We share and enlighten (we will foster open communication & honest dialogue around the AI topic).

The above rules are important and meaningful from a corporate compliance perspective, and not to forget for society in general. While the guidelines are aspirational in nature and necessary, they are not sufficient for the design of ethical AI-based systems, products and services. Bridging the gap between AI ethics in wording and concrete, ready-to-code design rules is one of the biggest challenges we face technologically.

Our Digital Ethics fulfills what Bostrom and Yudkowsky in “The Cambridge handbook of artificial intelligence” (Frankish and Ramsey, 2015) define as minimum requirements for AI-based actions augmenting or replacing human societal functions (e.g., decisions, work tasks …). AI actions must at least be transparent, explainable, predictable, robust against manipulation and fully auditable, with clear human accountability.

The next level of detail of DTAG’s “Digital Ethics” guidelines shows that the ethical framework within which we strive to design AIs is top-down in nature and a combination of mainly deontological (i.e., rule-based moral framework) and utilitarian (i.e., striving for the best possible) principles. Much more work will be needed to ensure that no conflicts occur between the deontological rules in our guidelines and the utilitarian ambitions.

The bigger challenge will be to translate our aspirational guidelines into something meaningful in our AI-based products, services, and critical communications infrastructure (e.g., communications networks).

“Expressing a desire for AI ethical compliance is the easy part. The really hard part is to implement such aspirations into actual AI systems and then get them to work decently” (Kim, 2018).

THE END IS JUST THE BEGINNING.

It should be clear that we are far away (maybe even very far) from really understanding how we can best build ethical checks and balances into our increasingly autonomous AI-based products and services landscape. And not to forget how ethical autonomous AIs fit into our society's critical infrastructures, e.g., telco, power, and financial networks and so forth.

This challenge will of course not stop humanity from becoming increasingly dependent on AI-driven autonomous solutions. After all, AI-based technologies promise to leapfrog consumer convenience and to deliver economic advantages to corporations, public institutions, and society in general.

From my AI perception studies (Larsen, 2018 I & II), corporate decision-makers, our customers, and consumers don't trust AI-based actions (at least when they are aware of them). Most of us would prefer an inconsistent, error-prone, and unpredictable emotional manager full of himself to an un-emotional, consistent, and predictable AI with a low error rate. We expect an AI to be more than perfect. This AI allergy is often underestimated in corporate policies and strategies.

In a recent survey (Larsen, 2018 II), I asked respondents to consider the following two questions;

  1. “Do you trust that companies using AI in their products and services have your best interest in mind?”
  2. “How would you describe your level of confidence that political institutions adequately consider the medium to long-term societal impact of AI?”

9% of the survey respondents believed that companies using AI in their products and services have their customers' best interest in mind.

80% of the survey respondents had low to no confidence that political institutions adequately consider the medium to long-term societal impact of AI.

I have little doubt that as AI technology evolves and finds its use increasingly in products, services and critical infrastructure that we humans are exposed to daily, there will be an increasing demand for transparency of the inherent risks to individuals, groups and society in general.

That consumers do not trust companies to have their best interest in mind is, in today's environment of "fake news", "Brexit", "Trumpism", "influencer campaigns" (e.g., Cambridge Analytica & Facebook), and so forth, not surprising. "Weaponized" AI will be developed to further strengthen the relatively simple approaches of the Cambridge Analytica "cousins", the Facebooks, and the Googles of this world. Why is that? I believe that the financial gain and the power to be had from weaponized AI approaches are far too tempting to believe that their use will not increase going into the future. The trust challenge will remain, if not increase. The Genie is out of the bottle.

AI will continue to take over human tasks. This trend will accelerate. AI will increasingly be involved in critical decisions that impact individuals' lives and livelihoods. AI will become increasingly better at mimicking humans (Vincent, 2018). Affective AIs have the capacity, even today, to express emotions and sentiment without being sentient (Lomas, 2018). AI will become increasingly autonomous and possibly even gain the capability to self-improve (without evolving to sentience) (Gent, 2017). Thus the knowledge distance between the original developer and the evolved AI could become very large, depending on whether the evolution is bounded (likely, in my opinion) or unbounded (unlikely, in my opinion).

It will be interesting to follow how well humans in general will adapt to humanoid AIs, i.e., AIs mimicking human behavior. From the work of Mori et al. (Mori, MacDorman, & Kageki, 2012) and many others (Mathur & Reichling, 2016), it has been found that we humans are very good at picking up on cues that appear false or off compared to our baseline reference of human behavior. Mori et al. coined the term for this feeling of "offness": the uncanny valley.

Without AI ethics and clear ethical policies and compliance, I would be somewhat nervous about an AI future. I think this is a much bigger challenge than the fundamental technology and science aspects of AI improvement and evolution. Society needs our political institutions to be much more engaged in the questions of the Good, the Bad, and the Truly Ugly use cases of AI … I don't think one needs to fear super-intelligent God-like AI beings (for quite some time, and then some) … One needs to realize that narrowly specialized AIs, individually or as collaborating collectives, can do a lot of harm, unintended as well as intended (Alang, 2017; Angwin, Larson & Mattu, 2018; O'Neil, 2017; Wachter-Boettcher, 2018).

“Most of us prefer an inconsistent, error prone and unpredictable emotional manager full of himself to that of an un-emotional, consistent and predictable AI with a low error rate.” (Kim, 2018).

ACKNOWLEDGEMENT.

I greatly acknowledge my wife Eva Varadi for her support, patience and understanding during the creative process of creating this Blog. Without her support, I really would not be able to do this or it would take a lot longer past my expiration date to finish.

WORTH READING.

DARPA News and Events (2018). The Radio Frequency Spectrum + Machine Learning = A New Wave in Radio Technology. [online] Darpa.mil. Available at: https://www.darpa.mil/news-events/2017-08-11a.

Agrafioti, F. (2018). Ensuring that artificial intelligence is ethical? That’s everyone’s responsibility – Macleans.ca. [online] Macleans.ca. Available at: https://www.macleans.ca/opinion/ensuring-that-artificial-intelligence-is-ethical-thats-everyones-responsibility/

Alang, N. (2017). Turns Out Algorithms Are Racist. [online] The New Republic. Available at: https://newrepublic.com/article/144644/turns-algorithms-racist.

Anderson, M. and Anderson, S. (2007). Machine Ethics: Creating an Ethical Intelligent Agent. AI Magazine, 28(4), 15-26.

Anderson, M. and Anderson, S. (2010). Robot Be Good. Scientific American, 303(4), pp.72-77.

Angwin, J., Larson, J., Mattu, S. and Kirchner, L. (2016). Machine Bias — ProPublica. [online] ProPublica. Available at: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Arkin, R. (2008). Governing Lethal Behavior: Embedding Ethics in a Hybrid Deliberative/Reactive Robot Architecture. Technical Report GIT GVU 07 11 (Georgia Institute of Technology).

Arkin, R. (2010). Governing lethal behavior in autonomous robots. Boca Raton, Fla.: Chapman & Hall/CRC Press.

Arkin, R., Ulam, P. and Wagner, A. (2012). Moral Decision Making in Autonomous Systems: Enforcement, Moral Emotions, Dignity, Trust, and Deception. Proceedings of the IEEE, 100(3), pp.571-589.

Asimov, I. (1984). Foundation; I, Robot. London: Octopus Books. First published 1950.

Asimov, I. (2013). Foundation and earth. New York: Spectra. First published 1986.

Blasi, A. (1980). Bridging moral cognition and moral action: A critical review of the literature. Psychological Bulletin, 88(1), pp.1-45.

Boddington, P. (2017). Towards a code of ethics for artificial intelligence. Cham: Springer International Publishing.

Boehm, C. (2012). Moral Origins. Basic Books.

Bostrom, N. (2016). Superintelligence. Oxford University Press.

Buolamwini, J. and Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research, 81, pp.1-15.

Cathcart, T. (2013). The trolley problem, or, Would you throw the fat man off the bridge?. Workman Publishing, New York.

Chakravorti, B. and Chaturvedi, R. (2017). Digital Planet 2017. [online] Sites.tufts.edu. Available at: https://sites.tufts.edu/digitalplanet/files/2017/05/Digital_Planet_2017_FINAL.pdf.

Dawkins, R. (1989). The Selfish Gene. 4th ed. Oxford University Press.

Descartes, R., Haldane, E. and Lindsay, A. (2017). Discourse on Method and Meditations of First Philosophy (Translated by Elizabeth S. Haldane with an Introduction by A.D. Lindsay). Stilwell: Neeland Media LLC.

Deutsche Telekom AG. (2018). Digital Ethics – Deutsche Telekom’s AI Guideline. [online] Telekom.com. Available at: https://www.telekom.com/en/company/digital-responsibility/digital-ethics-deutsche-telekoms-ai-guideline.

EADv2 – Ethics in Action. (2018). Ethically Aligned Design, Version 2 (EADv2) | IEEE Standards Association. [online] Available at: https://ethicsinaction.ieee.org/.

Fischer, J., Kane, R., Pereboom, D. and Vargas, M. (2010). Four views on free will. Malden [et al.]: Blackwell Publishing.

Frankish, K. and Ramsey, W. (2015). The Cambridge handbook of artificial intelligence. Cambridge, UK: Cambridge University Press.

Gent, E. (2017). Google’s AI-Building AI Is a Step Toward Self-Improving AI. [online] Singularity Hub. Available at: https://singularityhub.com/2017/05/31/googles-ai-building-ai-is-a-step-toward-self-improving-ai/#sm.0001yaqn0ub06ejzq7b2odvsw2kj1

Gottlieb, A. (2016). The dream of enlightenment. Allen Lane.

Hardy, S. and Carlo, G. (2011). Moral Identity: What Is It, How Does It Develop, and Is It Linked to Moral Action?. Child Development Perspectives, 5(3), pp.212-218.

Hart, D. and Fegley, S. (1995). Prosocial Behavior and Caring in Adolescence: Relations to Self-Understanding and Social Judgment. Child Development, 66(5), p.1346.

Hume, D., (1738, 2015). A treatise of human nature. Digireads.com Publishing.

Kant, I. (1788, 2012). The critique of practical reason. [United States]: Start Publishing. Immanuel Kant originally published his “Critik der praktischen Vernunft” in 1788. It was the second book in Kant’s series of three critiques.

Kwatz, P. (2017). Conscious robots. Peacock’s Tail Publishing.

Kuipers, B. (2016). Human-Like Morality and Ethics for Robots. The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence AI, Ethics, and Society:, Technical Report WS-16-02.

Larsen, K. (2018 I). On the Acceptance of Artificial Intelligence in Corporate Decision Making – A Survey. [online] AI Strategy & Policy. Available at: https://aistrategyblog.com/2017/11/05/on-the-acceptance-of-artificial-intelligence-in-corporate-decision-making-a-survey/.

Larsen, K. (2018 II). Smart life 3.0 – SMART 2018 Conference on "Digital Frontiers and Human Consequences" (Budapest, 4 April 2018). [online] Slideshare.net. Available at: https://www.slideshare.net/KimKyllesbechLarsen/smart-life-30.

Lin, P., Abney, K. and Jenkins, R. (2017). Robot ethics 2.0. New York: Oxford University Press.

Lomas, N. (2018). Duplex shows Google failing at ethical and creative AI design. [online] TechCrunch. Available at: https://techcrunch.com/2018/05/10/duplex-shows-google-failing-at-ethical-and-creative-ai-design/.

Lumbreras, S. (2017). The Limits of Machine Ethics. Religions, 8(5), p.100.

Marques, H. and Holland, O. (2009). Architectures for functional imagination. Neurocomputing, 72(4-6), pp.743-759.

Mathur, M. and Reichling, D. (2016). Navigating a social world with robot partners: A quantitative cartography of the Uncanny Valley. Cognition, 146, pp.22-32.

Moor, J. (2006). The Nature, Importance, and Difficulty of Machine Ethics. IEEE Intelligent Systems, 21(4), pp.18-21.

Moor, J. (2018). Four Kinds of Ethical Robots | Issue 72 | Philosophy Now. [online] Philosophynow.org. Available at: https://philosophynow.org/issues/72/Four_Kinds_of_Ethical_Robots.

Mori, M., MacDorman, K. and Kageki, N. (2012). The Uncanny Valley [From the Field]. IEEE Robotics & Automation Magazine, 19(2), pp.98-100.

Murphy, R. and Woods, D. (2009). Beyond Asimov: The Three Laws of Responsible Robotics. IEEE Intelligent Systems, 24(4), pp.14-20.

O’Neil, C. (2017). Weapons of Math Destruction. Penguin Books.

Osaba, O. and Welser IV, W. (2017). An intelligence in our image: The risks of bias and errors in artificial intelligence. 1st ed. RAND Corporation.

Outram, D. (2012). The Enlightenment. Cambridge: Cambridge University Press.

Rutkin, A. (2014). The robot’s dilemma. New Scientist, 223(2986), p.22.

Sandel, M. (2018). Justice: What’s The Right Thing To Do? Episode 01 “THE MORAL SIDE OF MURDER”. [online] YouTube. Available at: https://www.youtube.com/watch?v=kBdfcR-8hEY.

Sapolsky, R. (2017). Behave: The Biology of Humans at Our Best and Worst. 1st ed. Penguin Press. Note: Chapter 13 “Morality and Doing the Right Thing, Once You’ve Figured Out What That is” is of particular relevance here (although the whole book is extremely read worthy).

Shachtman, N. (2018). New Armed Robot Groomed for War. [online] WIRED. Available at: https://www.wired.com/2007/10/tt-tt/.

Shafer-Landau, R. (2013). Ethical theory. Chichester, West Sussex: Wiley-Blackwell.

Simonite, T. (2018). Google’s AI software is learning to make AI software. [online] MIT Technology Review. Available at: https://www.technologyreview.com/s/603381/ai-software-learns-to-make-ai-software/.

Singer, P. (2018). Isaac Asimov’s Laws of Robotics Are Wrong. [online] Brookings. Available at: https://www.brookings.edu/opinions/isaac-asimovs-laws-of-robotics-are-wrong/.

Smith, A. and Raphael, D. (1991). The wealth of nations. New York: Knopf.

Torrance, S. (2018). Will Robots Need Their Own Ethics? | Issue 72 | Philosophy Now. [online] Philosophynow.org. Available at: https://philosophynow.org/issues/72/Will_Robots_Need_Their_Own_Ethics.

Torresen, J. (2018). A Review of Future and Ethical Perspectives of Robotics and AI. Frontiers in Robotics and AI, 4.

transhumanity.net. (2015). Biological Machines. [online] Available at: http://transhumanity.net/biological-machines/

Vanderelst, D. and Winfield, A. (2018). An architecture for ethical robots inspired by the simulation theory of cognition. Cognitive Systems Research, 48, pp.56-66.

Vincent, J. (2018). Google’s AI sounds like a human on the phone — should we be worried?. [online] The Verge. Available at: https://www.theverge.com/2018/5/9/17334658/google-ai-phone-call-assistant-duplex-ethical-social-implications

Wachter-Boettcher, S. (2018). Technically Wrong. W.W. Norton.

Waldrop, M. (2015). Autonomous vehicles: No drivers required. Nature, 518(7537), pp.20-23.

Wallach, W., Allen, C. and Smit, I. (2007). Machine morality: bottom-up and top-down approaches for modelling human moral faculties. AI & SOCIETY, 22(4), pp.565-582.

Wallach, W. and Allen, C. (2009). Moral machines. New York, N.Y.: Oxford University Press.

Winfield, A. (2018). Ethical robots save humans. [online] YouTube. Available at: https://www.youtube.com/watch?v=jCZDyqcxwlo.

Yudkowsky, E. (2015). Rationality From AI to Zombies. 1st ed. Berkeley: Machine Intelligence Research Institute.

Do we Humans trust AIs?

AI … IT IS HERE, IT IS THERE, IT IS EVERYWHERE.

I was late to a dinner appointment, arranged by x.ai, at Caviar and Bull (booked by my human friend David). Siri had already indicated that I would be late (yes, it had also warned me repeatedly that it was time to leave the office for me to be on time), and Waze (i.e., the world's largest community-based traffic & navigation app) was trying to guide me through a busy Budapest city center. Stuck in traffic … sighhh … but then the traffic moves … I press the accelerator … and … my car brakes (with a vengeance) at the same moment my brain realizes that the car in front of me had not moved and I was about to hit it. My car had just saved me from a crash. And from being even later for my appointment, for what would turn out to be an absolutely excellent dinner with great Hungarian red and white wines recommended by Vivino (i.e., based on my wine history & preferences, my friends' preferences, and of course the menu). In the meantime, my scheduler had notified my friend that I would be a bit late due to traffic (rather than the real reason of me leaving the office late ;-).

Most of the above is powered by AI or, more accurately, by machine learning applications: underlying machine learning algorithms and mathematical procedures applied to available personal data, social-network data, and other data.

In the cases above, I am implicitly trusting that whatever automation has "sneaked" into my daily life will make it more convenient and possibly even save others as well as myself from harm (when my own brain & physiology gets too distracted). Do I really appreciate that most of this convenience is based on algorithms monitoring my life (a narrow subset of my life, that is) and continuously predicting what my next move might be in order to support me? No … increasingly I take the offered algorithmic convenience for granted (and the consequences of that are another interesting discussion for another time).

In everyday life, we frequently rely on AI-driven and augmented decisions … mathematical algorithms trained on our and others’ digital footprint and behaviors … to make our lives much more convenient and possibly much safer.

The interesting question is whether people in general are consciously aware of the degree of machine intelligence or algorithmic decision-making going on all around them. Is it implicit trust or simply ignorance at play?

Do we trust AI? Is AI trustworthy? Do we trust AI more than our colleagues & peers? And so forth … and what does trust really mean in the context of AI and algorithmic convenience?

Some of these questions, as they relate to corporate decision-making and corporate decision-makers' sentiment towards AI, have been described in detail in my previous blog "On the acceptance of artificial intelligence in corporate decision making – a survey".

TRUST – HUMAN VS AI.

Imagine that you have a critical decision to make at your work. Your team (i.e., your corporate tribe) of colleague experts recommends, based on their human experience, choosing Option C as the best path forward.

Would you trust your colleagues’ judgment and recommendation?

Yes! There is a pretty high likelihood that you actually would.

More than 50% of corporate decision-makers would frequently to always trust a recommendation (or decision) based on human expert judgment. More than 36% of corporate decision-makers would trust such a recommendation about half the time (i.e., what I call flip-a-coin decision-making).

Now imagine you have a corporate AI available to support your decision-making. It also provides the recommendation for Option C. Needless to say, maybe, but nevertheless let's just say it: the AI has of course been trained on all available & relevant data and tested for accuracy (i.e., in a far more rigorous way than we ever test our colleagues, experts, and superiors).

Apart from Humans (Us) versus AI (Them), the recommendation and the decision to be made are of course the same.

Would you trust the AI’s recommendation? Would you trust it as much as you do your team of colleagues and maybe even your superior?

Less than 13% of corporate decision-makers would frequently to always trust a recommendation (or decision) based on AI judgment. Ca. 25% of decision-makers would trust an AI-based decision about half the time.

Around 20% of decision-makers would never trust an AI-based decision. Less than 45% would do so only infrequently.

Figure: Human vs AI – trust in decisions.

Based on a total of 426 surveyed respondents, of which 214 were offered Question A and 212 were offered Question B. Respondents are significantly more trusting towards decisions or recommendations made by a fellow human expert or superior than towards a decision or recommendation made by an AI. No qualifications were provided for success or failure rate.
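The kind of significance statement made in the caption can be illustrated with a simple two-proportion test. Note that the counts below are my own approximations reconstructed from the reported percentages (roughly 50% of the 214 human-question respondents vs roughly 13% of the 212 AI-question respondents answering "frequently" or "always"); they are not the raw survey data, and the sketch assumes the statsmodels package is available.

```python
# Sketch of a two-proportion z-test behind a statement like "respondents are
# significantly more trusting towards human experts than towards an AI".
# Counts are approximations derived from the reported percentages, not raw data.
from statsmodels.stats.proportion import proportions_ztest

trusting = [107, 28]    # approx. "frequently/always trust" counts (human, AI)
asked    = [214, 212]   # respondents offered Question A (human) and B (AI)

z_stat, p_value = proportions_ztest(count=trusting, nobs=asked)
print(f"z = {z_stat:.2f}, p = {p_value:.3g}")
# A p-value far below 0.05 supports the claim of a statistically significant gap.
```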

It is quite clear that we regard a decision or recommendation based on AI with substantially less trust than one made by a fellow human.

Humans don’t trust decisions made by AIs. At least when it is pointed out that a decision is AI-based. Surprisingly, given much evidence to the contrary, humans trust humans, at least the ones in our own tribe (e.g., colleagues, fellow experts, superiors, etc..).

Dietvorst and coworkers refer to this human aversion towards non-human or algorithm-based recommendations or forecasts as algorithm aversion. It refers to situations where human decision-makers or forecasters deliberately avoid statistical algorithms in their decision or forecasting process.

A more "modern" term for this might be AI aversion rather than algorithm aversion. However, it describes very much the same phenomenon.

Okay, okay … But the above question of trust did not qualify the decision-making track record of the human versus the AI. Thus respondents could have very different ideas or expectations about the success or error rates of humans and AIs respectively.

What if the fellow human expert (or superior), as well as the AI, is known to have a success rate better than 70%? Thus, more than 7 out of 10 decisions are in retrospect deemed successful (ignoring whatever that might really mean). By the same token, it also means that the error rate is 30% or less … or that 3 (or fewer) out of 10 decisions are deemed unsuccessful.

Figure: Human vs AI – trust in decisions with a 70% success rate.

Based on a total of 426 surveyed respondents of which 206 were offered Question A and 220 were offered Question B. For both Human Experts (or Superior) and AI, a decision-making success rate of 70% (i.e., 7 out of 10) should be assumed. Despite the identical success rate, respondents remain significantly more trusting towards decisions made by a fellow human expert (or superior) than if a decision would be made by an AI.

With a like-for-like decision-making success rate, human experts or superiors are hugely preferred over a decision-making AI.

A bit more than 50% of the corporate decision makers would frequently or always trust a fellow human expert recommendation or decision. Less than 20% would frequently or always trust a decision made by an AI with the same success rate as the human expert.

Thus, Humans trust Humans and not so much AIs. Even if the specified decision-making success rate is identical. It should be noted that trust in a human decision or recommendation relates to fellow human experts or superiors … thus trust towards colleagues or individuals that are part of the same corporate structure.

The result for trust in the human expert or superior with a 70% success rate is quite similar to the previous result without a specified success rate.

Figure: Trust in human decisions – with and without a stipulated success rate.

Based on a total of 426 surveyed respondents of which 214 were offered Question A without success rate qualification and 223 were offered Question A with a 70% success rate stipulated. As observed in this chart, and confirmed by the statistical analysis, there is no significant difference in the trust in a decision made by a human expert (or superior) whether a success rate of 70% has been stipulated or no qualification had been given.

This might indicate that our human default expectations towards a human expert or superior’s recommendation or decision are around the 70% success rate.

However, for the AI-based recommendation or decision, respondents do provide a statistically different trust picture depending on whether or not a success rate of 70% has been specified. The mean sentiment increases by almost 15% when specifying that the AI has a 70% success rate. This is also very visible from the respondent data shown in the chart below;

Figure: Trust in AI decisions – with and without a stipulated success rate.

Based on a total of 426 surveyed respondents, of which 212 were offered Question B without success rate qualification and 203 were offered Question B with a 70% success rate assumed. As observed in this chart, and confirmed by the statistical analysis, there is a substantial increase in trust in the AI-based decision when the 70% success rate had been stipulated compared to the question where no success rate was provided.

The share of respondents that would never or only infrequently trust an AI-based decision is almost 20% lower when a 70% success rate is stipulated.

This might indicate that the human default perception of the quality of AI-based decisions or recommendations is far below the 70% success rate.

So do we as humans have higher expectations towards decisions, recommendations, or forecasts based on AI than the human expert equivalent?

Figure: Human vs AI – expectations towards decision quality.

Based on a total of 426 surveyed respondents, of which 206 were offered Question A and 220 were offered Question B. No statistically significant difference in the expectations towards decision quality was found between human-expert (or superior) decisions and AI-based ones.

This survey indicates that there is no apparent statistically significant difference between the quality we expect from a human expert and that we expect from an AI. The average expectation is that fewer than 2 out of 10 decisions could turn out wrong (or be unsuccessful). Thus, a failure rate of 20% or less, equivalent to a success rate of 80% or better.

It is well known that whether a question is posed or framed in a positive or a negative way can greatly affect how people decide. Even if the positive and negative formulations are mathematically identical.

An example: you are with the doctor, and he recommends an operation for your very poor hearing. Your doctor has two options when he informs you of the operation's odds of success (of course, he might also choose not to provide that information altogether if not asked ;-); Frame A: there is a 90% chance of success, and you will hear normally again with the operated ear. Frame B: there is a 10% chance of failure, and you will become completely deaf in the operated ear. Note that a success rate of 90% also implies an error rate of 10%; one may argue that the two frames are mathematically identical. In general, many more people would choose to have the operation when presented with Frame A, i.e., a 90% success rate, than when confronted with Frame B, i.e., a 10% failure rate. Tversky & Kahneman identified this as the framing effect, where people react differently to a given choice depending on how that choice is presented (i.e., success vs failure). As Kahneman & Tversky showed, a loss is felt to be more significant than the equivalent gain.
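For readers who want the formal flavor of this, the value function from Tversky & Kahneman's prospect theory makes the asymmetry explicit. The tiny sketch below uses the commonly cited median parameter estimates from their 1992 paper (alpha = beta = 0.88, lambda = 2.25) purely for illustration; it is not fitted to anything in this survey.

```python
# Illustrative sketch of the Tversky & Kahneman prospect-theory value function:
#   v(x) = x**alpha            for gains  (x >= 0)
#   v(x) = -lam * (-x)**beta   for losses (x <  0)
# Parameters are their published median estimates, used here only to show why
# a framed loss "weighs" roughly twice as much as an equal framed gain.

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    return x**alpha if x >= 0 else -lam * (-x)**beta

print(value(10))    # subjective value of a gain of 10  (~ +7.6)
print(value(-10))   # subjective value of a loss of 10  (~ -17.1)
```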

When it comes to an AI-driven decision, would you trust it differently depending on whether I present you with the AI's success rate or its error rate? (i.e., the obvious answer is of course yes … but to what degree?)

Figure: Trust in AI – success rate vs failure rate framing.

Based on a total of 426 surveyed respondents, of which 233 were offered Question A (i.e., framed as decision success rate) and 193 Question B (i.e., framed as decision error rate). As expected from framing bias and prospect theory, more respondents would trust the AI when presented with the AI's success rate (i.e., better than 95%) than with its error rate (i.e., less than 5 out of 100).

When soliciting support for AI augmentation, a positive frame of its performance works (unsurprisingly) much better than the mathematically equivalent negative frame, i.e., success rate versus failure or error rate.

Human cognitive processes and biases treat losses or failures very differently from successes or gains. Even if the two frames are identical in terms of real-world impact. More on this later when we get into some cool studies on our human brain chemistry, human behavior, and Tversky & Kahneman’s wonderful prospect theory (from before we realized that oxytocin and other neuropeptides would be really cool).

HUMANS TRUST HUMANS.

Trust is the assured reliance on the character, ability, or truth of someone or something. Trust is something one gives as opposed to trustworthiness which is someone or something other being worthy of an individual or group’s trust.

The degree to which people trust each other is highly culturally determined, with various degrees of penalties associated with breaking trust. Trust is also neurobiologically determined and, of course, context-dependent.

As mentioned by Paul J. Zak in his Harvard Business Review article "The Neuroscience of Trust"; "Compared to people in low-trust companies, people in high-trust companies report: 74% less stress, 107% more energy at work, 50% higher productivity, 13% fewer sick days, 76% more engagement, 29% more satisfaction with their lives, 40% less burnout" … Trust is clearly important for corporate growth and for the individual's wellbeing in a corporate setting (and I suspect anywhere, really). Much of this is described mathematically (and, I would argue, beautifully) in Paul Zak's seminal paper "Trust & Growth", relating differences in the degree of trust to differences in social, legal, and economic environments.

People trust people. It is also quite clear from numerous studies that people don't trust that many non-people (e.g., things or non-biological agents such as mathematical algorithms or AI-based systems) … okay, okay, you might say … but why?

While 42 is in general a good answer … here the answer is slightly simpler … Oxytocin (not to be confused with an oxymoron). Okay, okay … what is this oxytocin, and what does it have to do with trusting or not trusting AI (that is the answer)? Well … if you have read Robert Sapolsky's brilliant account of our behavior at our best and worst (i.e., "Behave: The Biology of Humans at Our Best and Worst"), you might know enough (and even more about those nasty glucocorticoids; and if you haven't had enough of those, please do read "Why Zebras Don't Get Ulcers", also by Sapolsky, after which you might even be able to spell it).

Oxytocin is our friend when it comes to warm and cozy feelings towards each other (apart from being fairly essential for inducing labor and lactation). Particularly when "each other" is part of our team, our partner, our kids, and even our dog. It is a hormone of the peptide type (i.e., it is relatively small and consists of amino acids) and is used by neurons to communicate with each other. Such neuropeptides pretty much influence how signals are processed by our brain and how our body reacts to external stimuli.

The higher the level of oxytocin, the more you are primed to trust your team, your stockbroker, your partner (and your dog), and to feel closer to your wife and your newborn babies. The more you hug, kiss, shake hands, have sex, and walk your dog, the more oxytocin will be rushing through your body and the more trusting you will become towards your social circles. "Usness" is great for oxytocin release (as are a couple of other neuropeptides with a knack for making us feel better with one another … within the confines of "Usness" … oh yeah, and we have some serious gender biases there as well). Particularly when "Them" are around. Social interactions are important for the oxytocin kick.

The extra bonus effect of increased oxytocin is that it appears to dampen the reactivity of the brain's "freaking out" center (i.e., the amygdala) to possible threats (real or otherwise). At least within the context of "Usness" and non-existential threats.

HUMANS DON’T TRUST AI (as much as Humans).

Oxytocin (i.e., changes in its level) appears mainly to be stimulated or triggered by interaction with other humans (& dogs). When the human (or dog) interaction is taken out of the "game", for example replaced by an electronic or mechanical interface (e.g., computer interface, bot interaction, machine, etc.), trust is not enhanced by oxytocin levels. This has been well summarized by Mauricio Delgado in his "To trust or not to trust: ask oxytocin" in Scientific American, as well as in the groundbreaking work of Paul J. Zak and co-workers (see "Oxytocin increases trust in Humans", Nature, 2005) and the likewise impressive work of Thomas Baumgartner et al. ("Oxytocin shapes the neural circuitry of trust and trust adaptations in humans", Neuron, 2008).

Thomas Baumgartner and coworkers (in a setup similar to other works in this field) administered either a placebo or an oxytocin intranasal spray to test subjects prior to the experimental games. Two types of games were played: (a) a so-called trust game with human partner interactions (i.e., a human-human game), where the test subject invests an amount of money with a 3rd party (e.g., a stockbroker) that will invest the money and return the reward, and (b) a so-called risk game, of which the outcome would be machine-determined by a random generator (i.e., a human-machine game). The games are played over 12 rounds with result feedback to the test subject, allowing for a change in trust in the subsequent round (i.e., the player can reduce the invested money (less trust), increase it (higher trust), or keep it constant (same trust level)). Baumgartner et al. found that test subjects playing the trust game (human-human game) who received the oxytocin "sniff" remained trusting throughout the rounds of the game, even when they had no rational (economic) reason to remain trusting. The oxytocin subjects' trust behavior was found to be substantially higher compared to test subjects playing the same game having received the placebo. In the risk game (human-machine), no substantial differences were observed between oxytocin and placebo subjects, who in both cases kept their trust level almost constant. While the experiments conducted are fascinating and possibly elucidating towards the effects of oxytocin and social interactions, I cannot help being somewhat uncertain whether the framing of Trust vs Risk and the subtle game-structure differences (i.e., trusting a human expert who supposedly knows what he is doing vs a lottery, a game of chance) could skew the results. Thus, rather than telling us whether humans trust humans more than machines or algorithms (particularly the random-generator kind, for which trust is somewhat of an oxymoron), it tells us more about how elevated levels of oxytocin make a human less sensitive to mistrust or angst towards a fellow human being (who might take advantage of that trust).

It would have been a much more interesting comparison (in my opinion) if both had been called a Trust Game (or a Risk Game for that matter, as this is obviously what it is). One game with a human third party investing the test subject's transfer, similar to Baumgartner's Trust Game setup; and another game where the third party is an algorithmic "stockbroker" with at least the same success rate as the first game's human 3rd party. This would have avoided the framing bias (trust vs risk) and the structural differences between the games.
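A toy simulation of that symmetric design is sketched below: the subject plays 12 rounds against either broker, both with the same success rate, and adjusts the invested amount after each round. Everything here is hypothetical (the adjustment heuristic, the stakes, the 70% success rate) and is only meant to show that, by construction, the game dynamics are identical for the human and the algorithmic broker, so any measured trust gap would have to come from the subjects themselves.

```python
# Toy simulation of the proposed symmetric trust game: 12 rounds against a
# "broker" (human or algorithmic) with a fixed success rate; the subject raises
# the stake after a good round and lowers it after a bad one. Illustrative only.
import random

def play_rounds(success_rate=0.7, rounds=12, start_stake=10.0, seed=42):
    random.seed(seed)
    stake, history = start_stake, []
    for _ in range(rounds):
        good_round = random.random() < success_rate     # did the broker deliver?
        # Simple trust-adjustment heuristic, bounded between 0 and 20 units.
        stake = min(20.0, stake + 2.0) if good_round else max(0.0, stake - 4.0)
        history.append(stake)
    return history

print(play_rounds())  # same dynamics whether the broker is labeled human or AI
```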

Unfortunately, we are not that much closer to a great explanation for why humans appear to trust humans more than algorithms. Still pretty much guessing.

And no, I did not hand out cute oxytocin (and of course placebo) nasal sprays to the surveyed respondents. Neither did I check whether respondents had been doing a lot of hugging or other close-quarter social activities that would have boosted their oxytocin levels. This will be for a follow-up study.

Figure: An intranasal oxytocin spray.

In Baumgartner’s experiment, subjects got 3 puffs of Oxytocin or Placebo per nostril for each of 4 IUs (i.e., 24 IUs or ml). Note: the bottle above is just a random sample of a nostril oxytocin spay.

A guess towards a possible explanation for humans being statistically significantly less trusting towards algorithms (algorithm aversion), AI (AI aversion), and autonomous electronic-mechanical interfaces in general might be that our brains have not been primed to regard such agents as part of "Usness". In other words, there is a very big difference between trusting colleagues or peers (even if some are superiors) who are part of your corporate "tribe" (e.g., team, unit, group, etc.) and trusting an alien entity, which is how an AI or an algorithm could easily be construed.

So the reason why humans trust humans more than algorithms and AIs is still somewhat elusive, although the signals are possibly there.

Judging from the many everyday machine learning or algorithmic applications already leapfrogging our level of convenience today, maybe part of the "secret" is to make AI-based services and augmentation part of the everyday.

The human lack of trust in AI, or the prevalence of algorithm aversion in general, as described in several articles by Berkeley Dietvorst, is nevertheless, in a corporate sense and setting, a very big challenge for any idea of a mathematical corporation where mathematical algorithms permeate all data-driven decision processes.

GOOD & RELATED READS.

ACKNOWLEDGEMENT.

I greatly acknowledge my wife Eva Varadi for her support, patience, and understanding during the creative process of creating this Blog. Without her support, I really would not be able to do this or it would take long past my expiration date to finish.

SURVEYS.

Unless otherwise specified, the results presented here come from a recent surveymonkey.com survey that was conducted between November 11th and November 21st, 2017. The survey took on average 2 minutes and 35 seconds to complete.

The data contains 2 main survey collector groups;

  1. Survey Monkey paid collector group, run between November 11th and 14th, 2017, with 352 completed responses from the USA. Approximately 45% of the surveyed sample were female and 55% male, with an age distribution between 18 and 75 years of age and an average age of 48.8. The specified minimum income level was set to $75 thousand, or about 27% higher than the median US real household income level in 2016. The average household income level in this survey is approx. $125 thousand annually. Ca. 90%, or 316 out of the 352 respondents, had heard of Artificial Intelligence (AI) previously. For AI-relevant questions, only these 316 were used; surveyed respondents that had not previously heard of AI (36 out of 352) were not considered. More than 70% of the respondents had a 4-year college or graduate-level degree. About 70% of the respondents were married, and 28% had children under the age of 18. Moreover, ca. 14% currently had no employment.
  2. Social media (e.g., Facebook, LinkedIn, Twitter, …) collector group, run between November 11th and 21st, 2017, which yielded in total 115 completed responses, primarily from the telecom & media industry and mainly from Europe. Gender distribution comprised around 38% female and 62% male. The average age for this sample is 41.2. No income data is available for this group. About 96% (110) had heard of Artificial Intelligence. For AI-related questions, only respondents that confirmed they had heard about AI have been considered. Ca. 77% of the respondents have a 4-year college or graduate-level degree. 55% of the surveyed sample are married, and a bit more than 50% of this surveyed group have children under 18. Less than 2% of the respondents were currently not employed.

It should be emphasized that the SurveyMonkey collector was a paid survey at 2.35 euros per response, totaling 1,045 euros for 350 responses. Each respondent completed 18 questions. Age balancing was chosen to be basic, and gender balancing followed the census.

On the Acceptance of Artificial Intelligence in Corporate Decision Making – A Survey.

SUMMARY OF SURVEY FINDINGS.

Approximately 658 corporate decision-makers have been surveyed for their confidence in their own decision-making skills and their acceptance of Artificial Intelligence (A.I.) in general as well as in augmenting (or replacing) their decision-making. Furthermore, the survey reveals the general perception of the corporate data-driven environment available to decision-makers, e.g., the structure and perceived quality of available data.

A comprehensive overview and analysis of our AI sentiments, as it relates to corporate decision-making, is provided as a function of Gender, Age, Job-level, Work area, and Education.

Some of the findings of the survey;

  • We believe that we are all better corporate decision-makers than our peers.
  • There is a substantial gender difference in self-confidence (or more likely over-confidence) as it relates to corporate decision-making.
  • The higher a given individual’s corporate position is, the higher the confidence and perceived quality of that individual’s “gut feelings” compared to peers.
  • On average corporate decision-makers are comfortable with A.I.
  • The higher the educational level of people the more positive is A.I. viewed.
  • Women seem (on average) to be slightly more reserved towards A.I. than men.
  • Significantly more women than men have stronger reservations against A.I..
  • More than 70% of corporate decision-makers believe that A.I. will be important in 50% or more of a company’s decisions.
  • In general corporate decision-makers don’t trust decisions based on A.I. consultation.
  • Owners, Executives, and C-level decision makers are substantially less trusting towards decisions made in consultation with an A.I..
  • More decision-makers would follow A.I. advice than trust decisions based on A.I. consultation.
  • More than 80% of decision makers would abandon an A.I.-based recommendation if disputed by a fellow human.   
  • Corporate Decision makers in general do not fear losing their decision-making influence to A.I.s.

INTRODUCTION.

You don’t need to make an effort to find articles, blogs, social media postings, books, and insights in general on how Artificial Intelligence (hereafter abbreviated A.I.) will provide wonders for all human beings, society and leapfrog corporate efficiencies and shareholder values for the ones adapting to A.I. (of which you would be pretty silly not too of course).

Somehow I cannot help but wonder whether there might be a tiny paradox here? Or at the very least a bit of a societal challenge. By the way, a challenge that in my opinion is largely ignored by policymakers and public institutions.

The challenge! How is it possible to both vastly improve the life of people and at the same time leapfrog corporate value in terms of productivity via very aggressive intelligent automation?

The question, with many different answers, is how much of work and the workforce A.I. will fundamentally change and/or ultimately replace. What type of work will be impacted by A.I., and how will it shape the development of existing corporations and organizations? This issue is addressed in the 2013 paper "The future of employment: how susceptible are jobs to computerization" by Frey and Osborne. They estimated that 47% of the total number of jobs in the USA (i.e., ca. 75 million out of 160 million) are exposed to a high risk of being automated by intelligent algorithms and A.I. over the next decade or two. I suspect that, given this analysis was done 4 – 5 years ago, these numbers have only grown. Frey and Osborne also clearly point out that many decision-making processes are prone to be significantly augmented by A.I. or outright taken over (e.g., legal, health/diagnostics, fraud detection, etc.). This is contrary to past industrial revolutions, which replaced menial and physical labor with machine labor. While in some parts of the world today (e.g., China) human factory workers are being massively replaced by robots and intelligent automation in general (e.g., reporting a quantum leap in productivity), this time around highly specialized and cognitively intensive jobs, requiring college or graduate-level degrees, are also at risk of being replaced by A.I.

It is wise to keep the Friedman doctrine in mind, stating that "The social responsibility of business is to increase its profits" (i.e., Milton Friedman, New York Times Magazine, 1970 … Milton was not a great believer in corporate social responsibility in a millennial sense, I guess ;-). In other words, a corporation's only goal is to increase its profits within the rules of law. Following this doctrine, it might be compelling to pursue aggressive A.I.-driven automation leading to workforce reduction and ultimately replacement (e.g., China manufacturing).

Obviously, today, through taxes (in general) and salaries, it is possible to maintain a degree of social responsibility. Albeit indirectly, via the individuals working for corporations or businesses. In the case of zero-human-touch corporations, resulting from a structural replacement of human labor by A.I.s, this indirect path to social responsibility might disappear. Assuming such a corporate strategy really would optimize profit sustainability over time rather than merely cut internal cost structures. I suspect that one of the bigger challenges to society will be that it is very possible, on a local level, to hugely maximize profit with zero-human-touch corporations, e.g., China manufacturing aggressively pursuing automation. Profit maximization can be maintained as long as goods or services are sold somewhere else with a stable socio-economical fabric (i.e., geographical arbitrage), or there exists a group of people at the local level not impacted by loss of work and income. Obviously, if your workers are an integral part of your business model, massively laying them off might not be the best idea for profit maximization (i.e., who cares that you have slashed 50% of your cost if nobody can afford to buy your product or service because you put them out of work).

The intelligent machine age will see the remaining part of factory workers being replaced by A.I. Also, many tasks requiring a high degree of cognitive involvement, as well as higher education, will be augmented and eventually replaced by A.I.

Having a graduate degree might soon no longer be a guarantee for job security.

CORPORATE DECISION MAKING – HUMAN VS A.I. OR HUMAN + A.I.?

From a corporate decision-making perspective there are two main directions to take (and a mixture in between);

  1. A.I. augment the corporate decision-makers in their decisions.
  2. A.I. takes over major parts of the corporate decision-making process.

And so … it begins …

I got really intrigued by a recent article in Harvard Business Review titled "AI May Soon Replace Even the Most Elite Consultants" by Barry Libert and Megan Beck (both consultants/advisors), making the case that A.I. could replace the role of elite consultants as they are supposedly used today. One of my favorite quotes from this thought-provoking article is "Perhaps sooner than we think, CEOs could be asking, "Alexa, what is my product line profitability?" or "Which customers should I target, and how?" rather than calling on elite consultants". I really hope that a CEO would not need an elite consultant for such answers … but it might be true that corporations frequently use expensive consultants for what turn out to be silly tasks.

Obviously … the cynic in me says … the CEO could not only save the expensive elite consultant but also considerable internal elite resources, e.g., the CMO, Sales Director, Marketing Managers, Pricing Specialists, Financial Controllers, etc. (just to name a few in the corporate food chain). That sounds pretty cool! Imagine the salaries and costs that could be saved here! Wow! … Though I suspect that he (i.e., the CEO) might still need some (new) elite & likely hilariously expensive A.I. consultants instead (maybe Barry and Megan would be up for that task ;-)?

Moreover, there is an inherent assumption in the assertion that most corporate decisions, or at least the important ones taking up the time of CxOs and senior management, come with a high-quality, voluminous amount of data that would naturally lend itself to a data-driven, algorithmically augmented decision process. In my opinion, this is far from reality. Many decisions that corporate decision-makers are bound to make will be based on tiny to small amounts of often uncertain or highly outdated data. They thus lend themselves poorly to the typical arsenal of data-driven decision-making and big-data-based algorithmic approaches. The assumptions backing up such corporate decisions will be based on "gut feelings" (backed up by Excel and nice PowerPoints), a theory of (corporate) mind, and largely directional rather than hard science. There is obviously nothing that hinders decision-makers from applying the same approaches we would apply to ideal data-driven analysis and decision-making, as long as the decision-maker understands the limitations, risk, and uncertainty that such an approach would bring in the context of a given decision. Particularly when the underlying data is tiny to small and inherently of poor quality (e.g., because of age, uncertainty, apples-and-bananas comparisons, out-of-context use, …).

Bridgewater Associates LP, the largest hedge-fund company in the world with ca. 1,700 employees and 150 Billion US$ of assets under management, is currently working on automating most of the firm’s management. It would appear that one of the most important roles of the current workforce in Bridgewater is to provide the training ground for a corporate A.I. that can take over the management of that workforce. This vision and strategy is the brainchild of Ray Dalio, the founder, chairman, and co-CIO, of Bridgewater Associates.

So What’s the Future? (or WTF? for short to gently borrow the term from O’Reilly’s wonderful book “WTF?: What’s the Future and Why It’s Up to Us”) … Is this the future?CEO_and_his_AI_2

WTF!

What about replacing the CEO with an A.I.? … Run everything as a DAO, or distributed autonomous organization, based on smart contracts and orchestrated by a Chief Artificial Officer (CAO). Sounds even more like science fiction (or a horror movie, depending on taste) … but Jack Ma, the founder of Alibaba, has speculated that within the next 30 years the Time Magazine cover for the best CEO of the year will likely be an A.I., or CAO.

So maybe the future looks more like this;

Figure: An empty office run by an A.I.

WTF?

Will Intelligent Algorithms make CEOs irrelevant in the not-too-distant future?

Will the CEO be replaced by the CAO? … WTF! … Well, time will show!

With the adaptation of intelligent algorithms and a corporate-wide pursuit of aggressive automation, what will an A.I.-augmented organization look like? Josh Sullivan and Angela Zutavern, in their wonderful book (particularly for a person having a degree in mathematics and physics) "The Mathematical Corporation: Where Machine Intelligence and Human Ingenuity Achieve the Impossible", provide a vision of how this next-generation corporation might look … The Mathematical Corporation … it is a place where algorithmically augmented decision-making is intimately integrated into the corporate decision-making process … I am not 100% (maybe not even 20%) convinced that the term "Mathematical Corporation" will find wide jubilance (with the possible exception of STEM folks … maybe) … If I am wrong, I would argue that this alone already is on the way to achieving the impossible.

The world of the Mathematical Corporation is a world where human decision-making is augmented, as opposed to replaced, by mathematics or algorithms applied to huge amounts of available data (that no mortal human could possibly make sense of in the same comprehensive way as a mathematical algorithm) … It is a positive world for Homo sapiens sapiens, or at least for the ones who are able to adapt and become Homo Sapiens Digitalis Intelligere … Sullivan and Zutavern state: "The supercharged human ingenuity you will wield in the real world will stem from the thought-like operation of machines in the digital one." (emphasis my own) … and then the caveat … "Only leaders who learn to assemble the pieces and tap their potential will realize the benefits of this marriage, however" (emphasis my own). Sooo … the future is bright for the ones who are able (and willing) to become the New Human augmented by Digital Intelligence … For the rest … please read Charles Darwin and pray for universal basic income.

Again (and again) we meet the inherent assumption that most corporate decisions can be fitted into an ideal data-driven algorithmic process, lending themselves "easily" to A.I. This does not fit the reality of many corporate decisions, including many important and critical ones. Applied machine learning practitioners, with "dirt" on their hands (and up their elbows), know that in practical terms there is nothing easy about getting data prepared for machine learning … it is hard work with no instant success formula. It is a largely iterative and manual (labor-intensive) process to come to a result that is actually applicable to real-world problems.

But wait a minute … how likely is it that decision-makers will actually adapt to a mathematical corporation? Will they actually trust and follow A.I.-based recommendations, or just discard such "foolishness"?

Algorithm aversion may turn ugly and become an A.I. allergy among workforces that stand to be replaced or "upgraded" by augmentation.

The question really is whether applying an algorithmic approach to tiny or small amounts of data still provides a better basis for decisions than leaving it out completely; in other words, whether it is better than relying solely on the decision-maker's own wetware cognitive decision process and inferences, which are often based on a theory of (corporate) minds.

Let us first establish that even relatively simple mathematical forecasting procedures and algorithms provide better decisions and insights than those based purely on human intuition and experience. In other words, algorithmic approaches, even simple ones, will augment a human-based decision (although I will also immediately say that this assumes the algorithmic approach has been correctly implemented and that its inherent uncertainty, error rate, and bias have all been correctly considered … sorry, even here there is no "free lunch").

There is a whole body of literature on the topic of algorithmic performance vs human performance and on the human adoption of more mathematical approaches to forecasting and decision-making. This work goes back to the 50s and into the 80s, with Paul Meehl's research work and seminal book "Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence", and through the work of Robyn M. Dawes (see Cool and Relevant Reads below) and the like.

Algorithms, even simple ones, do perform better in terms of predictions (i.e., an essential part of decision-making, whether done consciously or subconsciously) than human beings limited to their own cognitive abilities. This result has been confirmed many times over by the likes of Paul Meehl, Robyn Dawes, and many other researchers over the last 50 – 60 years. Importantly though, machine learning algorithms do not offer an error-free approach to decision-making. However, algorithmic approaches do offer predictions and solutions with lower, often superior, error rates (and, not unimportantly, quantifiable error rates) than would be the case for purely cognition-based decisions.
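To make the point tangible, here is a toy illustration on synthetic data in the spirit of Dawes' "improper linear models": an equal-weight combination of the available cues tracks the outcome better than a judge who uses the same cues but adds human-style inconsistency. The data, weights, and noise levels are invented for illustration and assume NumPy is available; nothing here reproduces Meehl's or Dawes' actual studies.

```python
# Toy illustration (synthetic data) of the Meehl/Dawes finding: even an
# "improper" equal-weight linear model predicts the outcome better than a
# judge who uses the same cues but applies them inconsistently.
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 5
cues = rng.normal(size=(n, k))                          # standardized predictors
outcome = cues @ np.array([0.5, 0.4, 0.3, 0.2, 0.1]) + rng.normal(0, 1.0, n)

unit_weight_pred = cues.sum(axis=1)                     # Dawes-style equal weights
human_pred = unit_weight_pred + rng.normal(0, 2.0, n)   # same cues + inconsistency

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("unit-weight model vs outcome: ", round(corr(unit_weight_pred, outcome), 2))
print("noisy 'human' judge vs outcome:", round(corr(human_pred, outcome), 2))
# The equal-weight model typically shows the higher correlation, echoing the
# empirical result that simple algorithms match or beat unaided judgment.
```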

No wonder Homo sapiens sapiens have grounds to be allergic to intelligent algorithms … Most of us have problems with peers being smarter than us … although this luckily happens extremely rarely, as we will see in the data of the survey presented below (at least if you ask people's own opinion). The challenge around algorithm aversion is addressed by Berkeley Dietvorst et al. in a more recent 2014 paper, "Algorithm Aversion: People Erroneously Avoid Algorithms After Seeing Them Err" (see the references in the paper as well). This paper addresses in detail algorithm aversion in experts and laypeople. People, in general, remain very resistant to adopting more mathematical approaches, despite such approaches being demonstrably less prone to error than human-based decision-making without algorithmic augmentation. This holds true for simple algorithmic approaches as well, as explored at great length by, for example, Robyn Dawes and co-workers. As argued in the paper by Dietvorst et al., "we know very little about when and why people exhibit algorithmic aversion" … However, one thing is very clear;

We, as humans, are much less forgiving when it comes to machine errors than human errors.

The standard we expect of artificial intelligence is substantially higher than what we would require from a fellow human being or co-worker.

However, it is also true that minds and cultures often change in synchronicity, and that what was unthinkable a while ago can be the new normality sometime after.

And obviously, even the best algorithmic approaches or the smartest A.I. implementations will make errors. Either because we are at the limit of Bayes optimal error or due to the limitations of the training that was applied in the algorithmic learning process … Bad Robot! … That obviously is not the point. Humans make mistakes and err as well. We are prone to “tons” of various cognitive biases (as has been described so well by Kahneman & Tversky back in the 80s) and are pretty lousy at handling too much complexity.

What? … Bad at handling complexity? Well … Yes, we are! In general, the human mind appears to have a capacity limit for processing information of around 7 chunks or pieces of information, plus or minus 2, as George Miller describes in his influential 1956 paper “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”. Since Miller’s work back in 1956, the magic number is still around 7 (or 4 – 11), albeit we now have a more nuanced view on how informational chunks can be grouped together to effectively increase our handling of complex problems. Isn’t this just of academic interest? … Liraz Margalit, Head of Behavioral Research at Clicktale, back in July wrote a wonderful blog, backed up by experimental evidence, on how choices can become overwhelming and why it makes sense for businesses to make it easier for customers to choose (see “Choices can become overwhelming, so make it easier for customers”). I wish Telco and other online retailers would follow Liraz’s advice of simplifying the options presented to the potential online customer. The complexity of presentation or recommendation can easily be dealt with in the background by an intelligent algorithm (i.e., A.I.).

Do intelligent algorithms, or A.I., suffer from similar limitations in complexity handling or from a gazillion cognitive biases? Handling complexity … obviously not … I hope we agree here … So what about biases introducing errors in the decision process (note: bias here not in the machine learning sense, which implies under-fitting to available data, but in the more expansive sense of the word)? Sure, algorithms can be (and possibly often are, to an extent) “biased” in the sense of a systematic error introduced in training the algorithm, for example by unfair sampling of a population (e.g., leaving out results from women, or singling out groups of a population and ignoring data from the remainder, etc.). Often, algorithmic biases are introduced unintentionally, simply by the structure of the data used for training the A.I.. Some recent accounts of A.I. bias are the ones provided by Motherboard, which found that Google’s sentiment analyzer thinks being Gay is bad, or that training data had been labeled (by humans) in a way that would teach the A.I. to be sexist and racist. An example of potential A.I. bias: for corporate decision-making, it would not be too strange if past training data reflected a dominance of male decisions. It is a scientifically well-established fact that men more frequently make decisions even when a decision would be counterproductive or irrational (in terms of risk and value). Men are prone to a higher degree of over-confidence in their decision-making, which results in higher losses (or lower gains) over time compared to women. Thus, using training data dominantly representing male corporate decisions might to a degree naturally bias the A.I. algorithm towards a similar male-dominated decision logic. Unless great care is taken in de-biasing the data, which might mean much less data available for training, or in using synthesized data of idealized rational decision logic (i.e., much easier said than done). Furthermore, given that humans are very good at post-rationalizing bad decisions, the danger is that available data labeled by human decision-makers might not be entirely free of bias in the first place.
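
For illustration only, here is a minimal sketch (synthetic data, hypothetical “gender” attribute) of one common mitigation for such skewed training data: inverse-frequency sample weights that make each group contribute equally to a weighted training loss:

```python
# A minimal, purely illustrative sketch of one common mitigation for a
# skewed training set: inverse-frequency sample weights so that an
# under-represented group contributes equally to a weighted training loss.
# The records and the "gender" attribute below are entirely hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical decision records: 85% made by men, 15% by women.
gender = rng.choice(["male", "female"], size=10_000, p=[0.85, 0.15])

# Inverse-frequency weights: each group ends up with the same total weight.
groups, counts = np.unique(gender, return_counts=True)
weight_per_group = {g: len(gender) / (len(groups) * c) for g, c in zip(groups, counts)}
sample_weight = np.array([weight_per_group[g] for g in gender])

for g in groups:
    mask = gender == g
    print(g,
          "share of records:", round(mask.mean(), 3),
          "share of total weight:", round(sample_weight[mask].sum() / sample_weight.sum(), 3))

# Many machine learning libraries accept such weights when fitting, e.g.,
# scikit-learn estimators typically via fit(X, y, sample_weight=sample_weight).
```

Reweighting (or resampling) only addresses the representation imbalance, of course; it does nothing about bias already baked into the human-made labels themselves.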

Human biases are often acquired within a cultural context and from the underlying neurological workings of our brains. So, overall, there are good reasons why mathematical algorithms outperform, or at the very least match, the human decision-maker or forecaster in most situations. For a wonderful account of the neurobiology of human behavior, do find time to read Robert Sapolsky’s “Behave: the biology of Humans at our best and worst”, which provides a comprehensive (and entertaining) account of some of the fundamental reasons why we humans behave as we do and probably can do very little about it (btw, I recommend the Audible version as well, which is brilliantly read by Michael Goldstrom).

Another reason for a potential A.I. aversion among decision-makers (if being faster, better, and more accurate were not reason enough) is the argument that we don’t understand what is going on inside the applied machine learning algorithms. For the majority of decision-makers, who have not been exposed to reasonably advanced mathematics or simply don’t care much about the discipline, even simpler algorithms might be difficult to understand. And it quickly gets much more complex from there (e.g., the deep learning field). It might not matter much that some of the world’s top A.I. experts argue that understanding does not matter and that it is okay to use intelligent algorithms in a black-box sense.

The cynic in me (mea culpa) would argue that most decision-makers don’t understand their own brain very well (or might not even be consciously aware of its role in decision-making 😉), and that certainly doesn’t prevent them from making many decisions. In this sense, the brain is a black box. So A.I. performance, and its capability of handling large and complex data volumes, should be a pretty good reason for not worrying too much about understanding the process of A.I. reasoning.

Why? Because my A.I. says so! (not entirely comfortable with that either I guess).

In summary, why might humans be prone to A.I. allergy or algorithmic aversion, apart from the fact that we don’t like ‘smart-asses’;

  • A.I. is much better at handling large-scale complexity than humans (i.e., the human limit seems to be somewhere between 4 – 11 chunks of information).
  • A.I. is likely to be substantially less biased compared to the plethora of human cognitive and societal biases.
  • A.I. would take the fun part out of decision-making (e.g., risk-taking and the anticipatory reward).
  • A.I. is a threat to our jobs (whether a perceived or real threat does not really matter).
  • Humans do not like (get very uncomfortable with) what they do not understand (at least if they are conscious about it, e.g., our brains are usually not a big issue for us).

doubt

It is clear that the trend of increasing computing and storage power at ever lower cost, married with highly affordable, ubiquitous broadband coverage (i.e., fixed and mobile), and twice married with an insane amount of data readily available in the digital domain, makes algorithmic approaches that provide increased convenience and augmentation of everyday civil as well as corporate life highly attractive.

The performance of A.I. is likely to increase in a super-linear fashion, following improvements in computing and storage performance. The wet biological brain of Homo sapiens sapiens is not (obviously).

It is no longer unthinkable, nor too far out in the future, that blockchain-enabled decentralized autonomous organization technologies (i.e., DAOs) combined with a very high degree of A.I.-driven automation could result in almost zero-human-touch corporations. Matthew Mather has described a possible darker future based on such principles in his super exciting novel “Darknet”, where an A.I.-boosted DAO conspires to become a world-dominating business with presidential aspirations (there might be some upside to that scenario compared to today’s political reality around the world … hmmm).

So where does all this leave us … Homo Sapiens Sapiens?

How will algorithms and complex mathematics change corporate decision-making, which today is done exclusively with the help of a beautifully complex biological machine … the human brain?

Might there be a corporate advantage of augmenting or maybe eventually replacing the emotional neurobiological homo sapiens sapiens brain, with an A.I.-driven digital brain?

Assuming we will have a choice … will we, as humans, accept being augmented by Digital Rainmen? … Will the CxOs and upper management stop thinking and exclusively make use of the Digital Intelligence, the A.I., available to them in the near- to medium-term future? (note: near and medium could still be far away in some A.I. gurus’ opinions).

Lots of questions! Time to try to get some answers!

To gauge corporate managers’ perception of their own wet brain decision-making capability, their decision-making corporate environment, and their opinion of having their decision-making process augmented by A.I., I designed a 3 – 4 minute survey with SurveyMonkey.com.

survey.jpg

THE SURVEY.

The survey consists of 24 questions and takes on average a little less than 4 minutes to complete. The questions are structured around 4 main themes;

  • General information about you (e.g., gender, age, job level, education level).
  • Your corporate decision-making skills.
  • The quality of data used in your decision-making process.
  • Your acceptance of A.I. as it relates to your corporate decision-making processes.

Over the course of the data collection presented here, I have gathered 658 responses across 3 groups of Collectors, i.e., open collection streams gathering data from various sources.

  • Collector Group 1 (CG1): SurveyMonkey Audience Response option. SurveyMonkey in this case gathered responses from 354 respondents in the United States between 18 – 100+ years of age and with an income above 75 thousand US$. Age balancing was basic and Gender balancing was based on the current census. The data was collected between September 3rd and September 6th 2017. This is a paid service with a cost of approximately 1,040 euros or 3 euros per respondent. From a statistics perspective, this is the most diverse or least biased (apart from being from the USA) response data used in this work. When I talk about the Reference Group (RG), this would be the group I refer to.
  • Collector Group 2 (CG2): My own social media connections from LinkedIn, Facebook, and Twitter, amounting to approximately 113 responses. This sample is heavily gender-skewed towards males (62 males vs 31 females). Furthermore, a majority of respondents here have a background in the telecommunications and media industry. Most of this sample consists of respondents with a graduate-level degree (77) or a 4-year college degree (19).
  • Collector Group 3 (CG3): This group consists of 191 responses primarily from the European telecom industry (but does not overlap with CG2). Again this response sample is largely biased towards males (156 responses) with a 4-year college degree or graduate-level degree (128 responses).

The data will be made available on GitHub allowing others to reproduce the conclusions made here as well as provide other insights than addressed in this blog.

For this Blog, I will focus on the survey results across the above 3 Collector groups and will not discuss the individual groups, with the only exception being SurveyMonkey’s own Collector, i.e., the Reference Group. Irrespective, the 3 groups’ responses are statistically largely similar in their response distributions (i.e., within a 95% confidence level), with very few exceptions.

Out of the Reference Group, 50 respondents identified themselves as Retired. These responses have not been considered in the analysis. In the SurveyMonkey audience response (i.e., the Reference Group), 95 respondents did not match the provided current job-level options and chose the Other category with an alternative specification.

Thus 608 responses to the Survey are left for further analysis.

GENDER DISTRIBUTION

After filtering out retirees, we are left with 608 respondents in the Survey. The Reference Group has a reasonably balanced gender mix of 46% female and 54% male. The other Collector groups CG2 and CG3 are much more skewed towards males (e.g., 27% and 18% female mix respectively). The reason for this bias is the substantially lower representation of women in technology-heavy units of the telecom & media industry which is represented in CG2 and CG3.

However, both in its totality and in the separate Collector Groups, there is sufficient gender-based data to make statistically valid inferences at a 95% confidence level.
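
As a rough illustration of what “sufficient data” means here, the sketch below (a normal-approximation back-of-the-envelope, not the survey’s actual analysis) computes the worst-case 95% margin of error on a proportion at the survey’s overall and per-gender sample sizes; the female count of 206 is simply derived as 608 minus the 402 males:

```python
# A back-of-the-envelope check of "sufficient data per gender": the 95%
# margin of error (normal approximation) for a proportion estimated from a
# subgroup of size n, at the worst case p = 0.5. The female count of 206 is
# derived as 608 total respondents minus the 402 males.
import math

def margin_of_error_95(n, p=0.5):
    """Half-width of a 95% normal-approximation confidence interval."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for label, n in [("all respondents", 608), ("male respondents", 402), ("female respondents", 206)]:
    print(f"{label:20s} n = {n:4d}  margin of error ≈ ±{margin_of_error_95(n):.1%}")
```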

Chart_Q1

This Survey’s gender distribution. Note that the nnn, mm% (quantity comma separated percentage share, e.g., 402, 66%) on the bars provides the frequency (i.e., the quantity of a particular type, e.g., 402 Males) and the relative amount in percentage (e.g., 66% Males).

JOB LEVEL DISTRIBUTION

The purpose of this Survey was to try to capture corporate management decision-making. This is very nicely reflected in the job-level distribution of the participating respondents.

At least 335 (or 55%) of the respondents are in Middle Management or higher. In total 80 (or 13%) characterize their current job level as Owner, Executive, or C-Level management.

Chart_Q4

This Survey’s job-level distribution. The nnn, mm% (quantity comma separated percentage share) on the bars provides the frequency (i.e., the quantity of a particular type, e.g., 80) and the relative amount in percentage (e.g., 13%).

The absolute numbers per job-level category above do allow us to statistically analyze the possible differences in corporate decision-making perception, sensitivity towards A.I. in general and A.I.-driven augmentation in particular between the different management categories sampled here.

In this question of job level, women are under-represented (compared to their overall share of respondents, i.e., 34%) in the senior and middle management categories, at 25% and 30% respectively. This bias is also present in the Reference Group, with women likewise under-represented in the “Owner/Executive/C-level” category.

Does a C-level leader perceive A.I.-augmented decision-making differently than a senior manager? and what about those two categories compared to middle management?

EDUCATIONAL DISTRIBUTION

The educational level of the respondents to this survey is very high. More than 70% of the respondents have a 4-year college degree or higher. 47% have a graduate-level degree. This might be important to consider when we get deeper into opinions on decision-making and A.I. sentiment.

Chart_Q3

This Survey’s highest-level education distribution. The nnn, mm% (quantity comma separated percentage) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

The absolute response numbers for “Primary school” (3) and “Some high school, but no diploma” (17) are not sufficiently high to carry statistical significance in comparative analysis. Those categories are therefore not considered individually for any conclusions or comparative inferences.

AGE DISTRIBUTION

The average age of this survey’s respondents is approximately 45 years of age. The age distribution between males and females is very similar. It is clear that the sample has a definite age bias. This is reflected across all the Collector Groups including the Reference Group, where the average age is closer to 47 after the Retired Group has been filtered out.

Chart_Q2

This Survey’s age distribution. The nnn, mm% (quantity comma separated percentage) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

Note that the absolute response numbers for age groups “17 or younger” (2) and “18 – 20” (4) are not sufficiently high to carry statistical significance in comparative analysis.

The cynic might question why it is so relevant to understand the opinion of and sentiment towards A.I. in a sample with such a relatively high average age.

Over the next 10 years, it is likely that many of those in the group below 55 will either remain in their management functions or have been promoted to senior management or executive/C-level. Even the Mark Zuckerbergs of today do age (i.e., Mark Z will in 10 years’ time be 43 and Yann LeCun 67, and I just had age-selective amnesia …). Thus their decision-making skills will still be largely in use over a period where A.I. is likely to become an increasingly important tool in the corporate decision-making process, augmenting and in many areas replacing the human decision-maker.

Jump

WE ARE ALL BETTER CORPORATE DECISION-MAKERS THAN OUR PEERS.

It is a well-established “fact” that we humans are all less risky and more skillful drivers than our fellow drivers. This was systematically confirmed in the seminal paper by Ola Svenson back in 1981. Well, at least we as individuals pretty much all believe so (allegedly) … I certainly do too, so others must be wrong! ;-). In Svenson’s study, 88% of US participants believed themselves to be safer than the median driver (i.e., the frequency distribution midpoint, where 50% fall below and 50% above). Talk about self-confidence, or maybe more accurately, over-confidence.

So to paraphrase Ola Svenson’s statement into a question relevant to corporate decision-making… Are we as corporate decision-makers better at making less risky and much better decisions than our peers?

Chart_Q7_1

The nnn, mm% (quantity comma separated percentage, e.g., 329, 54%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

And the answer is overwhelming … YES! (even if it of course makes little statistical or reality sense).

We are as corporate decision-makers all (or almost all) better than our peers. At least that is our perception.

Only 3% (THREE PERCENT!) ranked their decision-making skills below average; 54% ranked them above average. If you impose a normal distribution, it would even be reasonably fair to state that ca. 75% of respondents assess their corporate decision-making skills to be better than their peers’ (i.e., above the median in a statistical sense).

Chart_Q7_2

It is interesting, although neither surprising nor novel, that male self-confidence is in general higher than that of female respondents. Of course, self-confidence is a very nice (too nice, maybe) word for over-confidence in one’s own ability to make good or better decisions.

Statistically, only for CG1 (i.e., the SurveyMonkey audience response) is the overall response distribution for females significantly lower (at 95% confidence) than that of male respondents. In other words, females are to a lesser degree than males over-confident in their own decision-making skills.
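
For readers curious how such a difference between two ordinal (Likert-style) response distributions can be tested, here is a minimal sketch using a Mann-Whitney U test; the response probabilities are synthetic placeholders, and only the sample sizes roughly mirror the survey’s male/female counts:

```python
# A minimal sketch of testing whether two ordinal (Likert-style) response
# distributions differ, e.g., male vs female self-rated decision-making
# skill. The response probabilities are synthetic placeholders; only the
# sample sizes roughly mirror the survey's male/female counts.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)

# 1 = "well below average" ... 5 = "well above average" (hypothetical coding).
male = rng.choice([1, 2, 3, 4, 5], size=402, p=[0.01, 0.02, 0.37, 0.40, 0.20])
female = rng.choice([1, 2, 3, 4, 5], size=206, p=[0.01, 0.03, 0.46, 0.37, 0.13])

stat, p_value = mannwhitneyu(male, female, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.0f}, p = {p_value:.4f}")
# A p-value below 0.05 would indicate that the two response distributions
# differ at the 95% confidence level.
```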

There are several perspectives on gender differences as they relate to confidence (call it self-confidence or over-confidence). We have the classical work by Maccoby and Jacklin (“The Psychology of Sex Differences” from 1974), whose take is, in my opinion, somewhat pessimistic or simply outdated since the time of their exhaustive research: “Lack of self-confidence in achievement-related tasks is, indeed, a rather general feminine trait. The problem may lie, at least in part, in the tendency for women to perceive themselves as having less control over their own fates than do men”. Sarah Burnett (contemporary to Maccoby & Jacklin), in her beautiful account (yeah, I like it better ;-) of gender differences and self-confidence from 1978, “Where Angels fear to tread: An analysis of sex differences in self-confidence”, concludes: “If there is a “real” sex difference in self-confidence, it could well lie in the fact that women are reluctant to forecast success for themselves in the absence of reliable supporting evidence; men, perhaps because of their wider range of experiences, their “machismo,” their penchant for risk, or whatever, seem less hesitant.”.

Finally, I want to refer to an equally interesting account of gender differences as they relate to self-confidence and implied risk. Barber and Odean’s super interesting article “Boys will be boys: Gender, Overconfidence, and common stock investments”, based on stock trading behavior, clearly shows that men are significantly more confident than women in their ability to choose the “winning” stock. The work of Barber and Odean (as well as other work in this field) also shows that men in general incur higher losses than women investors. This has been attributed to the substantially (and statistically significantly) higher degree of male over-confidence compared to that of females.

And oh boy! … if you are very brave, let an adolescent male, or a single male, make your corporate decisions. You might be in for a really interestingly scary ride. Why? Your typical adolescent, between ca. 15 – 25 years of age, has an underdeveloped frontal cortex & executive system. Simply put, an adolescent does not have enough rational control to put up red flags when engaging in risk-taking. Single or unmated people have, in general, lower levels of oxytocin and vasopressin (i.e., neuropeptides) than what is found in couples. Both vasopressin and oxytocin are known to lower or moderate risk-taking and to increase pro-social behavior (particularly in males).

boy will be boy risk.jpg

Both men and women are subject to substantial over-confidence in their corporate decision-making skills.

Men show a higher degree of over-confidence, compared to women, in their corporate decision-making skills.

Women working in a male-dominated environment (e.g., engineering) are at least as over-confident in their abilities to make corporate decisions as their male peers.

Talk about a gender gap!?

gender.jpg

So which job-level group has the highest opinion about their own corporate decision-making skills? Which group has the overwhelmingly largest degree of over-confidence (or self-confidence if we want to be nice) bias across all job levels? … hmmmm …

Well, no surprise (maybe?) … Owners, Executives, and C-level leaders outshine all other job levels in their decision-making confidence compared to their peers. Interestingly, not only is the average significantly higher for Owners/Executives/C-level respondents, but their variation (“collective doubt”) is also significantly lower than for any other job-level group.

It is not only that “Boys will be Boys” should worry us … maybe “CEOs will be CEOs” should as well? 😉

Chart_Q7_3

There is also a reasonably clear trend (with the exception of Senior Management and Middle Management, which are statistically similar): the lower an individual is in the corporate hierarchy, the less expressed self-confidence that individual appears to have, e.g., Entry-level managers at 60% versus Executives at 89%, a staggering difference of almost 30 percentage points in self-confidence.

The higher in the corporate hierarchy an individual is, the higher is that individual’s degree of confidence in her or his decision-making skills.

So we have established that, just as individuals have more confidence in their own driving skills than in their peers’, the same appears to hold true for corporate decision-making skills. We are all better than our peers. But what about decisions based on that wonderful “gut feeling” or intuition … or as I have often heard it expressed: “I feel it on my water” (no further elaboration will be given).

According to Wiktionary: “gut feeling (plural gut feelings) (idiomatic) an instinct or intuition; an immediate or basic feeling or reaction without a logical rationale.” .

The characteristics of gut feelings, instinct or intuition are;

  • Arrive rapidly without deliberate rational thought.
  • Triggered by past experience and learnings.
  • A sense of confidence and trust in feelings.
  • Difficult to rationalize.
  • “Behind the scene” or a sense of the sub-conscious process.
  • etc..

Ultimately, what is going on with our ‘gut feelings’ is thought to be the result of an intimate interplay between the autonomous nervous system, the (ventromedial) prefrontal cortex, and the amygdala (among other actors in the limbic system). It is believed to be a manifestation of bodily feelings associated with emotions. This could be an increase in heart rate, an uneasy feeling in the gut, ‘goosebumps’, etc. It is that feeling in the body we get in cases of unease, discovery, unexpectedness, and so forth. Thus, there is a well-established (although perhaps less well-understood) Brain – Body coupling, or feedback, responsible for those bodily feelings that signal to the brain to be on the lookout in the immediate future. This process is well described in Antonio Damasio’s 1994 book “Descartes’ Error: Emotion, Reason and the Human Brain” (and in countless scientific publications since). Another way of looking at gut feelings or gut instinct is Daniel Kahneman’s dual system of fast and slow thinking. The fast system is, in many ways, a metaphor for that gut feeling or intuition. This is often also called the affect heuristic, which allows us to make decisions or solve problems very rapidly.

The emotional state that gut feelings are associated with can greatly influence an individual’s decision-making. There are many situations where gut instinct or feelings are beneficial for the decision-maker. As has been argued by Antonio Damasio and others, “emotionless decisions are not default good decisions”, and of course there is the too-much-of-a-good-thing side: too many feelings or emotions are also detrimental to good decisions (e.g., people are terrible decision-makers under stress). There needs to be a proper balance between the mind’s affective processes (i.e., typically residing within the limbic system) and the frontal cortex’s cognitive-controlled processes.

Much of the data-driven philosophy, including the ideas around the mathematical corporation, is about decoupling emotions and feelings from the decision-making process. After all, an algorithm doesn’t have the intricate interplay between emotion, feeling, and “rational” reasoning that a human does (e.g., it doesn’t have a limbic system). An A.I. may not be burdened by a sh*t load of cognitive biases in its decision-making process (note: this does not mean an A.I. cannot be biased; it most often will be if the data it has been trained on is biased … which most data typically is). So that is swell! …?… Maybe not! As Antonio Damasio has shown, a lack of emotions can easily paralyze or mess up decision-making (see his “Descartes’ Error: …” or study psychopaths’ decision-making).

So … How prevalent is decision-making based on instinct or gut feelings? (or how willing are respondents to admit that they are using feelings, instinct, or sense of direction in this super duper data-driven world of ours … or at least the aspiration of a data-driven decision-making world).

Chart_Q9_1

The nnn, mm% (quantity comma separated percentage, e.g., 22, 4%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

The above responses show that a bit more than 50% of business decisions rely (to an extent) on gut feelings. I should point out that within the surveyed response data there is no clear statistical evidence of a difference between sub-groups (e.g., male vs female, job level, education).

I refrain from passing judgment on the above result, as I can say that I have, as a scientist, benefitted from such gut feelings or intuitive leaps in the past. I do think it is important to point out that this process remains an integral part of most human decisions, irrespective of whether our business has become increasingly data-driven (or mathematical).

Gary Klein (in “Sources of Power: how people make decisions”) estimates that in 80-plus percent of time-pressured situations, decision-makers rely on intuition or gut feelings rather than deliberate rational choice. Burke & Miller, in their 1999 paper “Taking the mystery out of intuitive decision making”, surveyed 60 experienced professionals holding significant positions in major organizations across various industries in the US. Their results were that 12% of surveyed professionals always used intuition in their decision-making, 47% often, 30% sometimes, 7% seldom, and 3% rarely. This is not that different from the survey results reported above on the frequency of the use of gut feelings in corporate decision-making (although the scales might not be completely comparable).

So how do we assess the quality of our “gut feelings” in comparison with our peers?

Chart_Q10_1

The nnn, mm% (quantity comma separated percentage, e.g., 343, 56%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

Maybe not too surprisingly, as Question 10 closely resembles Question 7, respondents in general perceive their gut feelings as being better than their peers’.

The perhaps interesting observation here is that the gender difference is not statistically apparent in the responses to Question 10. While there was a clear statistical difference in self-confidence (i.e., Question 7) between women and men, this is not apparent in the self-judgment of the quality of one’s gut feelings in comparison to peers.

Chart_Q10_2

Paralleling the decision-making-skill confidence question (i.e., Question 7), the survey data on the quality of one’s own “gut feelings” indicate a dependency on role in the corporate hierarchy: the higher the corporate position, the higher the perceived quality of one’s “gut feelings” in comparison with peers.

When it comes to self-assessment of an individual’s “gut feelings” quality compared to peers there is no apparent gender difference.

The higher a given individual’s corporate position is, the higher is the confidence in or perceived quality of the individual’s “gut feelings” compared to peers.

Finally, do corporate decision-makers like to make decisions?

Chart_Q6

The nnn, mm% (quantity comma separated percentage, e.g., 222, 37%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

Overwhelmingly, respondents like or enjoy making (corporate) decisions. I should point out that the question posed here might lead answers towards the positive end of decision-making sentiment. In retrospect, Question 6 could have been asked in a more neutral fashion (e.g., “How do you feel about making decisions relevant to your company?” or similar).

Why is it relevant to understand individuals’ self-confidence in, and sentiments toward, their own decision-making?

First of all, it might reveal an uncomfortable degree of over-confidence in corporate decision-making that more algorithmic approaches could address. It might point towards a substantial degree of bias in the corporate decision-making process, in practice ignoring relevant available data. Again, A.I. methodologies might provide for a more balanced decision-making process by neutralizing some of the individual bias that typically overweights corporate decisions. On a very basic level, it might further provide some realistic expectations for the general adaptation of algorithmic approaches to data-driven decision-making. A successful A.I. policy and strategy would certainly stand or fall with individual decision-makers’ perception of its value to them as individuals as well as to the corporation they are paid to manage and lead.

woman_decision_making.jpg

CORPORATE DATA-DRIVEN DECISION MAKING.

The newish buzz of the corporation (unless you are with Amazon, Facebook, or Google, where it is a pretty old buzz) … data-driven decision-making, algorithmic augmentation of data analysis and the resulting decision-making, the move towards the so-called mathematical organization … is resulting in expressed (or unexpressed) strategies (but often very few or poor policies) that permeate medium and large corporations today (and are pretty much non-existent for small ones).

The impression we “corporate peasants” are often given by (some) A.I. Gurus (usually affiliated with Management Consulting or coming from firms light-years ahead of the pack) is that in the near future algorithmic approaches will be able to substantially augment, and in many instances replace, decision-making processes and decision-makers. That all should be data-driven and that data-driven decision-making is the holy grail. The A.I. Gurus often act as the new Latin speakers of the Age of Enlightenment (for those who enjoy the satirical plays of the Enlightenment, have a look at Ludwig Holberg’s “Erasmus Montanus” written in 1722).

The fact is that many corporate decisions, even important ones, are not or cannot be based on huge amounts of relevant data. Often data is not available, simply not relevant, or outdated. Applying algorithmic or machine learning approaches here might be highly inefficient and lead to a false sense of comfort that more human-driven decisions may not suffer from (although a whole host of other biases likely plays a role regardless).

Human decision-makers make mistakes (males more than females). The more decisions the more mistakes. Such mistakes can be costly. Even catastrophically so. Often the root cause is that the human decision-maker is over-confident (possibly to the extreme as we have seen above) in his or her ability to make good decisions considering the associated risks and uncertainties.

Generalization from small or tiny data is something the human brain is a master of, even when the brain demonstrably / probabilistically has no basis for such generalizations.

When confronted with large and huge amounts of often complex data, cognitively the human brain simply cannot cope and will dramatically simplify down to a level that it can handle.

ocean_of_data.jpg

Anyway, let’s break the data-driven decision-making down into the available data, the structure of the available data, and of course the perceived quality.

Chart_Q11

The nnn, mm% (quantity comma separated percentage, e.g., 191, 31%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

About 70% of the respondents frequently (i.e., in ca. 70% or more of their decisions) or always use available data in their decision-making process. After all, why would a decision-maker not use available data? … Well, it might depend on the quality of that data! … And, to be fair, from the question it is difficult to gauge with what weight available data is included in the decision process vis-à-vis gut feelings and other cognitive biases, e.g., over-confidence in interpreting available data.

Most corporate decision-makers consider available data for most of their decisions.

So that’s great when data is available. What about how frequently data is actually available to be considered in the decision-making process?

Chart_Q12

The nnn, mm% (quantity comma separated percentage, e.g., 76, 13%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

So data would be available for about 57% of the decision-makers in at least 70% of their decisions. While 31% of respondents always consider available data in almost all of their decisions (i.e., Question 11), only 13% of respondents have data available for almost all the corporate decisions they need to make.

A little more than half of corporate decision-makers have data available for most of their decisions.

Ca. 30% of the surveyed respondents have data available for half their decisions. However, only 19% of the respondents consider the available data for approximately half of their decisions.

There is a relatively large disconnect between data being available for corporate decision-making and the data actually being used.

This might indicate several issues in the data-driven decision process

  • Available data is perceived as poor quality.
  • Available data is perceived as being too complex to contribute to the decision process.
  • A certain degree of immaturity in how to include data in the decision process.
  • Too high reliance on gut feelings and overconfidence bias ignoring available data.
  • etc.

Let us have a look at the perceived quality of the available data;

Chart_Q13

The nnn, mm% (quantity comma separated percentage, e.g., 316, 52%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

From the above categorization of data quality, one would expect that a little less than 40% of respondents would have a possible straightforward path to a data-driven decision of reasonable to high quality. Approximately 60% or more would either not be able to rely on an algorithmic, data-driven process at all, or, if pursuing this avenue, would need to be very careful in their interpretation of the available data and any analysis based on it; they should expect a relatively high degree of uncertainty and systemic risk in their decisions. In particular, comparative scenarios or alternatives, often considered in corporate decisions, could be rather meaningless if the underlying data is of relatively poor quality. A data-driven or mathematical decision process will not change that.

A majority of corporate decisions rely on data that might not be very well suited for advanced data-driven algorithmic approaches.

GIGO (i.e., garbage in garbage out) is still a very valid dogma even in a data-driven decision-making process augmented by algorithms or other mathematical tools.

Chart_Q14

The nnn, mm% (quantity comma separated percentage, e.g., 248, 41%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

When it comes to important decisions, 50+% of the respondents’ corporate decisions are either large-data (46%) or big-data (6%) driven. The glass-half-full perspective is that this bodes well for at least half of all important corporate decisions: it should be possible to apply advanced algorithms or machine-learning approaches that would augment the human decision-making process. The glass-half-empty perspective is that for the other half of important decisions, we may not have the luck that the mathematical-corporation philosophy could offer. The challenge obviously is how relevant mathematical approaches can be to important corporate decisions where small, tiny, or no relevant data is available. Would the application of pre-trained models, trained on larger but unrelated data sets, be of use? Maybe this remains a domain where “wishful-thinking” models (e.g., normal business models & business cases), gut feelings, and inflated self-confidence remain the prevalent method of coming to a decision.

Would it not be great if your competition had no higher-quality data available for their decision-making processes than is available to your business? At least if you have a level playing field, in terms of available data and the quality of such data being about the same, the rest would be up to the ingenuity of the respective decision-makers, including the quality of the applied algorithmic processes.

Chart_Q16

The nnn, mm% (quantity comma separated percentage, e.g., 448, 74%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

Compared to respondents’ self-assessment of their own decision-making skills and the quality of their gut feelings (relative to peers), they appear more careful in judging the quality of their corporate competitors. 74% assess that there is little difference between the quality of the data underlying their competitors’ decision-making and their own.

Most corporate decision-makers expect business competitors’ data to be of the same quality as that available to them.

It is easy to lose sight of human opinion when discussing data-driven decision processes and decision-making. Particularly as such processes become more automated and enhanced by algorithmic or applied machine learning approaches. It might become easy to ignore human insights when you have a solid mathematical-statistical framework to process your data and provide recommendations for the best decisions based on available data. Taking the data-driven organization to the possible extreme.

How important are human insights or human opinion augmentations to data-driven insights?

Chart_Q15

The nnn, mm% (quantity comma separated percentage, e.g., 202, 33%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

The glass-is-half-full interpretation of the above result of this survey would be that more than 50% of the respondents find it important to consider human opinions beyond what comes out of their data-driven process. In other words, enrich their data-driven analysis with (alternative) human opinions.

The glass-is-half-empty interpretation is that almost 50% of the respondents only augment their decision-making process in less than 50% of their decisions with (alternative) human opinions.

Obviously, when decision-makers believe they are better at making decisions than their peers, there might not be such a great incentive for seeking alternative human opinions to what a decision-maker has already concluded to be the best way forward based on available data (or “gut instinct”).

The question is whether A.I.-augmented decision-making could be a game changer in how corporations make decisions? Will an algorithmic data-driven approach provide the framework for better and more valuable decisions than is the case today, where largely human-driven decision-making, with all its cognitive biases, rules? Will silicon-based decision-making overtake biology-based decision making and will such decisions be better?

How does the human corporate decision-maker perceive artificial intelligence? Is A.I. perceived as a threat? As an opportunity? or a bit of both?

As Tim O’Reilly might say WTF? or my grandmother WTF!

scary_ai.jpg

THOSE A.I. “SUCKERS” ARE NOT GOING TO MESS WITH MY DECISIONS!?

Firstly, the survey revealed that, not surprisingly, most of the respondents had heard about Artificial Intelligence prior to the survey. In this survey, a little more than 90% of the respondents had heard of A.I..

In the following, respondents who had not heard of A.I. have been filtered out of the analysis. This is in addition to filtering out respondents who gave “Retired” as their job level. In total, 39 respondents (6.4%) had not heard of A.I. at the time of the survey. This leaves a remaining sample of 569 out of the original 658 (i.e., 50 were retired and an additional 39 had not heard of A.I. prior to this Survey).

So … How do we feel about A.I.?

Chart_Q18

The nnn, mm% (quantity comma separated percentage, e.g., 178, 31%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

The average sentiment score and standard deviation (in parentheses) across all respondents (i.e., those who have heard of A.I.) were 2.65 (0.88). The average score indicates a sentiment between “I am neutral” and “I am very comfortable with it”.

The survey did reveal a statistically significant gender difference in the sentiment towards A.I.. Women’s (i.e., 2.82 (0.85)) sentiment towards A.I. is more neutral than men’s (i.e., 2.57 (0.88)). This is also reflected in proportionally more women indicating more negative sentiments (i.e., “I am uncomfortable with it” or “I hate it”).
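
A hedged sketch of how such a gender gap can be checked from the reported summary statistics alone, using Welch’s t-test; note that the per-gender sample sizes below are my own rough 34/66 split of the 569 respondents, not figures reported by the survey:

```python
# A sketch of checking the reported gender gap in A.I. sentiment
# (women 2.82 ± 0.85, men 2.57 ± 0.88) with Welch's t-test computed from
# summary statistics only. The per-gender sample sizes are an assumed rough
# 34/66 split of the 569 respondents, not figures reported by the survey.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=2.82, std1=0.85, nobs1=193,   # women (assumed n)
    mean2=2.57, std2=0.88, nobs2=376,   # men (assumed n)
    equal_var=False,                    # Welch's test: variances may differ
)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (< 0.05) is consistent with the statistically significant
# gender difference reported in the text.
```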

If we ignore the neutral category of 46%, which might swing to either side pending future information and experience, 41% of the respondents have a positive sentiment towards A.I. (i.e., they are either “very comfortable” with or “love” A.I.). Only 13% of the respondents express unease (11%) or outright hate (2%) towards A.I.. Here, too, a gender difference is observed: 17% of women expressed concern about A.I. compared to 12% of men.

Many more people are positive towards A.I. than negative.

Women seem on average to be slightly more reserved towards A.I. than men, and significantly more women than men have strong reservations against A.I..

From a job-level perspective, “Owner/Executive/C-Level” has the 2nd most positive average attitude towards A.I., followed by “Senior Management”. However, the lowest standard deviation is found for “Senior Management”, which might indicate a higher degree of conformity towards A.I. than in any other job-level category, including “Owner/Executive/C-Level”. What is interesting, and at least for me not self-explanatory, is that the “Entry level” category appears to have the most positive attitude towards A.I., a difference that is statistically significant at 95% confidence. This aspect will be further explored in an upcoming analysis.

From an education perspective, respondents with a graduate-level degree are more positive in their attitude towards A.I. than, for example, respondents with a 4-year college degree or with some college education but no degree. These findings are likewise statistically significant at 95% confidence. Differences between the other categories are apparent (e.g., the mean score systematically worsens with less education) but are, distribution-wise, not statistically significant (at the 95% confidence level).

The higher the educational level of people the more positive is A.I. viewed.

Furthermore, the higher the educational level the less likely are people to have stronger reservations against A.I..

I wanted to check whether there might be any difference in a respondent’s answer depending on whether the question emphasized that the A.I. is a “decision-making optimized” A.I. (i.e., the B-variant) or kept the question general without that emphasis (i.e., the A-variant).

Questions 19 to 24 are run as an A/B test. The intention is to check whether there is a difference in a respondent’s answer between the A-variant (“an A.I.”) and the B-variant (“a decision-making optimized A.I.”). Approximately 50% of respondents got the A-variant and the remainder (i.e., ca. 50%) got the B-variant.

In the following, I will present the responses as a consolidated view of both the A- and B-variants, if there is no statistical difference between the A- and B-distributions within 95% confidence.
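
For illustration, the sketch below shows how such an A/B pooling check can be done with a chi-square test on the answer counts of the two variants; the counts used here are hypothetical, not the survey’s actual data:

```python
# A minimal sketch of the A/B pooling check: a chi-square test on the
# answer counts of the A-variant ("an A.I.") and the B-variant
# ("a decision-making optimized A.I."). The counts are hypothetical,
# not the survey's actual data.
from scipy.stats import chi2_contingency

answer_counts = [
    [55, 110, 90, 25, 5],   # hypothetical A-variant counts per answer category
    [50, 118, 85, 28, 4],   # hypothetical B-variant counts per answer category
]

chi2, p_value, dof, _ = chi2_contingency(answer_counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
# If p > 0.05, there is no evidence of a variant effect at 95% confidence,
# and the A and B responses can be reported as one consolidated distribution.
```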

Imagine an A.I. is available to your company. It might even be a decision-making optimized A.I., trained on your corporate decision-making history (btw, it would be reasonably useless to you if it wasn’t trained on relevant data) as well as on publicly available data on decision outcomes. It might even have some fancy econometric and psychometric logic that tests the decision space for the rationality and cognitive biases of proposed decisions. Such a tool would not only be able to tell you whether your proposed decision is sound but would also provide you with recommendations for better decisions and for how your competitors might respond.

Thus, instead of being fairly mono-dimensional in the considerations around a given corporate decision, this A.I. would provide a holistic picture (or scenarios) of a given decision’s most likely impact on the future: the value in the short, medium, and long term, competitive responses, etc..

Would that not be great …?

Chart_Q19_AB

The nnn, mm% (quantity comma separated percentage, e.g., 220, 39%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

Within 95% confidence, there is no statistical difference between the distribution of A answers and B answers. Thus I have chosen to show both together. At a deeper level, e.g., job level, age, or other responder characteristics, there are also in general no statistical differences between A-variant and B-variant distributions.

Almost 30% of the respondents believe that an A.I. would be unimportant or irrelevant to their company’s decision-making process. About the same share (i.e., 32%) believe that it will be very important (30%) or always used (2%). About 40% expect it to be used in approximately half of all decisions. The latter would obviously be a quantum leap in A.I. adaptation compared to today, where that number is very low.

1 in 3 decision-makers does not expect A.I. to become important in their corporate decisions.

Senior Management is more optimistic about the importance of A.I. in the corporate decision-making process than their leadership (i.e., the “Owner / Executive / C-level” category), while Middle Management is statistically less inclined. Again, it is found that “Entry Level” respondents are more bullish towards A.I. than higher management.

More than 70% of corporate decision-makers believe that A.I. will be important in 50% or more of a company’s decisions.

Upper Management and Entry Level respondents more strongly believe in the A.I. adaptation in the corporate decision-making process.

Okay! but would you trust a decision based on an A.I. consultation? This of course could involve a human decision-maker’s decision augmented by an A.I. consultation rather than a more human-driven decision-making process.

Chart_Q20_AB

The nnn, mm% (quantity comma separated percentage, e.g., 227, 40%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

Only 3% of the respondents would always trust a decision based on A.I. consultation. 27% would trust it frequently, and 40% only about half the time (i.e., they might as well flip a coin).

About 30% of the respondents would infrequently or never trust a decision based on A.I. consultation.

Before pondering on the job-level dependency, note that there is no statistical difference between A and B answer distributions. This also holds true in general on the deeper respondent level.

Does any one particular group have A.I. trust issues? … hmmm

Chart_Q20_AB_job-level

Clearly, the “Owner/Executive/C-level” respondent category, which is a pretty important category in a company’s decision-making process, seems to have the greatest degree of trust issues towards A.I.. 31% of “Owner/Executive/C-level” respondents would never or only infrequently trust a decision based on A.I. consultation. For me, this is a wow! And if it is generally true for corporations, it might signal some barriers towards wide adaptation of A.I. in companies’ decision-making processes.

However, it is also fair to note that the “Executive” category has the largest variance in response across the trust scale used here compared to any of the other job-level categories.

As has become a recurrent theme, upper management and entry-level managers are overwhelmingly (and significantly) more trusting towards decisions based on A.I. consultation than their colleagues in other management categories.

Owners, Executives, and C-level decision makers are substantially (and significantly) less trusting towards decisions made in consultation with an A.I..

The current verdict seems to be that corporate decision-makers don’t really trust the “suckers” (=A.I.).

So the next one should be easy. How often would you follow a human’s advice that differs from your A.I.’s recommendation? (In retrospect, I really should also have asked how often a respondent would follow an A.I.’s advice that differs from the respondent’s own opinion … but alas, that is for an upcoming survey.)

Chart_Q23_AB

The nnn, mm% (quantity comma separated percentage, e.g., 295, 52%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

About 50% of respondents, it appears, would prefer “to flip a coin” to determine whether to follow the human advice or to go ahead with the A.I. recommendation. Okay … this is of course not exactly what was asked, and maybe also not what is meant … However, if, in half of the cases where an A.I. recommendation is disputed by a fellow human, the decision-maker follows either one or the other, then they might as well flip a coin.

It is maybe good to remind the reader again that algorithmic approaches in general perform better than human-based decisions, or at worst at least as well.

Ca. 30% of our decision-makers would follow the human advice rather than continue with the A.I. recommendation. Less than 20% would be relatively bold and go ahead with an A.I. recommendation disputed by a fellow human.

Let’s just ask again … Does any particular job level have trust issues?

Chart_Q23_AB_job-level

20% of the “Owner/Executive/C-level” respondents would only infrequently follow a fellow human’s advice that differs from an A.I. recommendation. Note that in Question 20 (above), 33% of the Executives (i.e., “Owner/Executive/C-level”) would trust an A.I.-based consultation, with 28% frequently. This appears consistent, as A.I. recommendations disputed by a human would result in fewer of them being pursued.

Irrespective, the majority is in the “flip a coin” category, which might mean that they trust neither the A.I. nor the humans … This will be more thoroughly pursued in a follow-up analysis going deeper into the available data and in more refined surveys.

Assume you have an A.I. available to consult and to guide your decisions. It is an integral (or maybe not so integral?) part of your company’s data-driven decision-making process. How often would you follow such an A.I.’s advice?

Chart_Q22_AB

The nnn, mm% (quantity comma separated percentage, e.g., 262, 46%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

Remember that 40% of the respondents would trust a decision based on A.I. consultation about half the time (i.e., what the cynic might call a “coin flip strategy”). 27% would trust such a decision frequently and 24% infrequently.

Would you follow advice based on something you doubt? Well, the result of Question 22 could, to an extent, be interpreted in this way. 31% of respondents would frequently follow the A.I. advice, which is only marginally higher than the 27% that would frequently trust a decision based on A.I. consultation (i.e., Question 20). 46% would follow the A.I. advice about half the time they find themselves in such a situation. Finally, 16% would follow the advice infrequently, although 24% of the respondents would only infrequently trust a decision based on A.I. consultation. There is a difference between following advice and trusting it, as history also teaches us, I suppose.

More decision-makers would follow A.I. advice than trust decisions based on A.I. consultation.

From a job-level perspective, the responses are reasonably consistent with the previous two questions addressed above.

Chart_Q22_AB_job-level.png

Consistent in the sense that irrespective of the above specific trust in decisions based on A.I. consultation, respondents would still go ahead and follow advice based on A.I..

Coming to the end of this survey, it is fair to ask the question of whether a company with an A.I. available for its corporate decision-making process would actually need the decision maker.

So … do you think you are needed? … Yes, after the results of Questions 19 – 23 above, the question does become a bit rhetorical …

Chart_Q24_AB

The nnn, mm% (quantity comma separated percentage, e.g., 137, 24%) on the bars provides the frequency (i.e., the quantity of a particular type) and the relative amount in percentage.

So … 65% of the decision-makers believe that their decision-making skills will remain needed by their companies. 24% expect that their skills will still be needed about half the time, and 10% expect this to be infrequent or never.

There are no statistically significant differences between job levels in their answers to this question.

There is a strong sense among decision-makers that their decision-making skills will continue to be required by their companies, irrespective of A.I. being available to their company’s decision-making processes.

worry_free.jpg

ACKNOWLEDGEMENT

I gratefully acknowledge my wife Eva Varadi for her support, patience, and understanding during the creative process of writing this Blog. Without her support, I really would not have been able to do this, or it would have taken long past my expiration date to finish.

COOL AND RELEVANT READS.

  1. Carl Benedikt Frey and Michael A. Osborne, The future of employment: how susceptible are jobs to computerization?, (2013).
  2. Barry Libert & Megan Beck, “AI May Soon Replace Even the Most Elite Consultants”, Harvard Business Review (July 2017). If an Elite consultant can be replaced by Alexa (Amazon’s A.I.) or another A.I.-bot that basically is a Wikipedia with a voice, then obviously that consultant should be replaced. But maybe, more importantly, so should the CxO wasting money on an Elite Consultant acting as a biological Wikipedia (imho).
  3. Tim O’Reilly, “WTF?: What’s the Future and Why It’s Up to Us” (HarperCollins, 2017). A must-read by a great & knowledgable storyteller.
  4. Ajay Agrawal, Joshua Gans & Avi Goldfarb, “How AI Will Change the Way We Make Decisions.”, Harvard Business Review (July 2017). The devil is in the detail, and not all corporate decisions would easily be taken over by A.I. (e.g., decisions that are based on tiny amounts of data). However, it really is a trade-off of how much human error/risk you can tolerate versus an A.I. error (e.g., false positives, false negatives, ..) on various types of decisions.
  5. Josh Sullivan and Angela Zutavern, “The Mathematical Corporation: Where Machine Intelligence and Human Ingenuity Achieve the Impossible.” (PublicAffairs, 2017). With my physics & math background, I somehow cannot help being intrigued by the concept. However, I also believe it is far more visionary than practical to implement. WTF … we will see.
  6. Berkeley J. Dietvorst, Joseph P. Simmons, and Cade Massey, “Algorithm Aversion: people erroneously avoid algorithms after seeing them err.”, Journal of Experimental Psychology: General (2014). Study on the widespread algorithm aversion, i.e., human expectations towards machines are substantially higher than towards fellow humans. This results in an irrational aversion to machine-based recommendations versus human-based ones, even though algorithmic forecasts are on average better to much better than their human equivalents in apples-to-apples comparisons.
  7. Robyn M. Dawes, “The robust beauty of improper linear models in decision making”, American Psychologist (July 1979) 571.
  8. Motherboard, “Copyright law makes artificial intelligence bias worse”, October 31 (2017).
  9. Amanda Levendowski, “How copyright law can fix artificial intelligence’s implicit bias problem“, Washington Law Review, forthcoming. Latest revision of 14 October 2017; the latest draft version can be downloaded from the URL link provided. The draft paper provides an authoritative account of the issues around biases arising from training A.I. on available datasets (in the private as well as the public domain). It also offers some interesting ideas on how copyright might mitigate some of the A.I. bias risks we certainly see in today’s implementations.
  10. Daniel Kahneman, Paul Slovic & Amos Tversky, “Judgment under uncertainty: heuristics and biases”, Cambridge University Press (1982). A book that I have read and re-read many times; I keep finding inspiration in every chapter. For me, it is in many ways a better book than Daniel Kahneman’s later “Thinking, Fast and Slow”, though that is certainly also a must-read.
  11. Robert Sapolsky’s “Behave: the biology of Humans at our best and worst”, Penguin Random House UK (2017). Robert has been my companion throughout the summer and fall. I have read his book a couple of times and also have it in its Audible version. It is not only insanely entertaining but also very thought-provoking as it relates to our behavior and why we humans are at times such bad decision-makers.
  12. Tim Swanson, “Great Chain of Numbers” (2014). Provides an excellent overview of what is already possible today with smart contracts, blockchain-enabled DAOs (i.e., Distributed Autonomous Organizations), and so forth. Obviously, it also shows what the future could look like.
  13. Timothy Short, “Blockchain – The Comprehensive Guide to Mastering the Hidden Economy.” (2016). Note: this doesn’t seem to be available in Kindle format any longer. A great starting point for understanding blockchain technologies.
  14. Matthew Mather, “Darknet” (2014). A dark account of DAOs, blockchain, and A.I. conspiring and going rogue.
  15. Ola Svenson, “Are we all less risky and more skillful than our fellow drivers?”, Acta Psychologica 47 (1981), 143. Seminal paper that systematically demonstrated the overconfidence bias.
  16. Sarah A. Burnett, “Where angels fear to tread: An analysis of sex differences in self-confidence”, Rice University Studies, Vol. 64 (Winter 1978) 101.
  17. Brad M. Barber and Terrance Odean, “Boys will be boys: gender, overconfidence, and common stock investment”, Quarterly Journal of Economics (Feb 2001) 261. Cool paper that shows you should rather ask for a female stock advisor than a male one, particularly if he is single.
  18. Gary Klein, “Sources of power: how people make decisions” , MIT Press, (1998).
  19. Lisa A. Burke and Monica K. Miller, “Taking the mystery out of intuitive decision making”, Academy of Management Executive, Vol. 13 (1999) 91.
  20. Don A. Moore and Paul J. Healy, “The trouble with Overconfidence”, Carnegie Mellon University, Research Showcase.
  21. Antonio R. Damasio, “Descartes’ Error: Emotion, Reason and the Human Brain”, Avon Books (1994). This is a very interesting account of human emotions, reason, and decision-making, and of how our brain supports and messes the whole thing up. In order to appreciate Damasio’s work, it is important to understand the distinction between emotions (what a third-party observer can see) and feelings (what an individual senses). I am likely at fault for occasionally mixing up the two concepts.
  22. Barnaby D. Dunn, Tim Dalgleish, and Andrew D. Lawrence, “The somatic marker hypothesis: a critical evaluation”, Neuroscience and Biobehavioral Reviews (2005) 1 – 33. Antonio Damasio’s somatic marker hypothesis, from around 1991, has been (and remains) very influential as an explanation of brain–body coupling or feedback, albeit the idea is not scientifically proven in all its aspects and is often prone to various interpretations. In this paper, you will find a comprehensive reference list to the most important literature in this field.
  23. Manuel G. Bedia and Ezequiel Di Paolo, “Unreliable gut feelings can lead to correct decisions: the somatic marker hypothesis in non-linear decision chains”, Frontiers in Psychology (October 2012), Article 384.
  24. Elizabeth A. Phelps, Karolina M. Lempert, and Peter Sokol-Hessner, “Emotion and Decision Making: Multiple Modulatory Neural Circuits”, Annual Review of Neuroscience (2014), 263. The more modern neuroscientific perspective on emotion and decision-making, compared with the more classical duality between emotion and reasoning.
  25. George A. Miller, “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information”, Psychological Review, Vol. 101 (1994) 343. First published back in 1956. This really is a seminal paper and a must-read for people who want to appreciate our cognitive limits in handling complexity.
  26. Alan Baddeley, “The magic number seven: still magic after all these years”, Psychological Review, Vol. 101 (1994) 353. Reflecting on the state of research since Miller’s original paper in 1956.
  27. Nelson Cowan, “The magical number 4 in short-term memory: a reconsideration of mental storage capacity”, Behavioral and Brain Sciences, 24 (2000) 87. A great and systematic overview of storage capacity, categorization, and methods to increase it (e.g., chunking or grouping).


APPENDIX – THE SURVEY QUESTIONNAIRE.

You are welcome to take the survey using the following link:

Perceived quality and acceptance of Human & Artificial Intelligence Augmentation in Corporate Decision Making.

and yes, new responses will be collected under a separate Collector Group.

The questionnaire consists of 24 questions, roughly structured as follows:

  1. General information about you.
  2. Your corporate decision-making skills.
  3. The quality of data used in your decision-making process.
  4. Your acceptance of A.I. as it relates to your corporate decision-making processes.

The typical time spent on answering the 24 questions is a bit less than 4 minutes.

  • Q1 – What is your gender?
    • Female.
    • Male.
  • Q2 – What is your age?
    • 17 or younger.
    • 18 – 20.
    • 21 – 29.
    • 30 – 39.
    • 40 – 49.
    • 50 – 59.
    • 60 or older.
  • Q3 – What is the highest level of school that you have completed?
    • Primary school.
    • Some high school, but no diploma.
    • High school diploma (or GED).
    • Some college, but no degree.
    • 2-year college degree.
    • 4-year college degree.
    • Graduate-level degree.
    • None of the above.
  • Q4 – Which of the following best describes your current job level?
    • Owner/Executive/C-level.
    • Senior Management.
    • Middle Management.
    • Intermediate.
    • Entry Level.
    • Other (please specify).
  • Q5 – What department do you work in?
    • Accounting.
    • Administrative.
    • Customer Service.
    • Marketing.
    • Operations.
    • Human Resources.
    • Sales.
    • Finance.
    • Legal.
    • IT
    • Engineering.
    • Product.
    • Research & Development.
    • International.
    • Business Intelligence.
    • Manufacturing.
    • Public Relations.
    • Other.
  • Q6 – Do you enjoy making decisions relevant to your company?
    • I hate making decisions.
    • I do not enjoy making decisions.
    • I am okay with making decisions.
    • I enjoy making decisions.
    • I love making decisions.
  • Q7 – How would you characterize your decision making skills in comparison with your peers?
    • Below average.
    • Average.
    • Above average.
  • Q8 – Do you consult with others before making a decision?
    • I rarely consult others (e.g., 3 out of 10 times or lower).
    • I consult others on approximately half of my decisions.
    • I frequently consult others (e.g., 7 out of 10 times or higher).
  • Q9 – Do you rely on “gut feelings” when making corporate decisions?
    • Never.
    • Infrequently.
    • Approximately half of my decisions.
    • Frequently.
    • Always.
  • Q10 – How would you characterize your “gut feelings” compared to your peers?
    • Worse.
    • Average.
    • Better.
  • Q11 – How often is available data considered in your corporate decisions?
    • Data is never considered.
    • Infrequently.
    • For approximately half of my decisions data is considered.
    • Frequently.
    • Data is always considered.
  • Q12 – How often is data available for your corporate decisions?
    • Never or very rarely.
    • Infrequently.
    • For approximately half of my decisions.
    • Frequently.
    • Very frequently or always.
  • Q13 – When data is available, how would you characterize the quality of that data?
    • Very poor (i.e., no basis for decisions).
    • Poor (i.e., uncertain, error prone, biased, very limited data available).
    • Good (i.e., uncertain but can be relied upon, some bias, limited data available).
    • High (i.e., reliable, sizable data available, limited uncertainty).
    • Very high (i.e., meets the most stringent tests of data quality, large amounts of data).
  • Q14 – How would you characterize your most important decisions in terms of the use of available & relevant data?
    • Never data-driven (i.e., no relevant data available).
    • Rare-data driven (i.e., tiny amount of relevant data available).
    • Small-data driven (i.e., little relevant data available).
    • Large-data driven (i.e., large amounts of relevant data available).
    • Big-data driven (i.e., huge amount of relevant data available).
  • Q15 – How important is human opinion compared to data-driven insights in your decision making?
    • It is irrelevant.
    • It is of some importance.
    • About half of my decisions are based on human insights.
    • It is very important.
    • It is exclusively used for my decisions.
  • Q16 – How would you characterize the quality of the data available to you and used in important corporate decisions compared to your competition?
    • Worse.
    • About the same.
    • Better.
  • Q17 – Have you heard of Artificial Intelligence (A.I.)?
    • No.
    • Yes.
  • Q18 – How would you best describe your feelings toward A.I.?
    • I love it.
    • I am very comfortable with it.
    • I am neutral.
    • I am uncomfortable with it.
    • I hate it.

The following questions are broken into an A and a B part. Approximately 50% of respondents will be presented with either A or B. I am in particular interested in understanding whether respondents change their sentiment towards A.I. depending on whether the question is neutral about the A.I. (the A-path) or specifically mentions that the A.I. is decision-making optimized (the B-path).
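As a minimal sketch of how this A/B comparison could be analyzed, the two paths can be compared with a rank-based test on the ordinal answer scale. The counts below are hypothetical placeholders for Question 20, not the actual survey results.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical Q20A vs Q20B answer counts on the ordinal 1-5 scale
# (1 = Never ... 5 = Always); illustrative only, not the actual survey data.
scale = np.array([1, 2, 3, 4, 5])
counts_a = np.array([10, 60, 110, 65, 5])   # neutral A.I. wording (A-path)
counts_b = np.array([8, 55, 105, 75, 7])    # "decision-making optimized" wording (B-path)

answers_a = np.repeat(scale, counts_a)
answers_b = np.repeat(scale, counts_b)

# Mann-Whitney U test: does the B-path framing shift the answer distribution?
stat, p_value = mannwhitneyu(answers_a, answers_b, alternative="two-sided")
print(f"U = {stat:.0f}, p = {p_value:.3f}")
# A small p-value would suggest the framing changes respondent sentiment;
# a large one that the A- and B-paths are answered in much the same way.
```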

  • Q19A (~ 50%) – If an A.I. would be available to your company, how important do you think it would be in your company’s decision making processes?
    • Irrelevant.
    • Not Important.
    • Important in about half of all decisions.
    • Very important.
    • Always used.
  • Q19B (~50%) –  If a decision-making optimized A.I. would be available to your company, how important do you think it would be in your company’s decision making processes?
    • Irrelevant.
    • Not Important.
    • Important in about half of all decisions.
    • Very important.
    • Always used.
  • Q20A (~50%) – Would you trust a decision based on A.I. consultation?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q20B (~50%) – Would you trust a decision based on a decision-making optimized A.I. consultation?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q21A (~50%) –  If an A.I. would be available to you, how frequently do you think this A.I. would be consulted in your decision making process?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q21B (~50%) –  If a decision-making optimized A.I. would be available to you, how frequently do you think this A.I. would be consulted in your decision making process?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q22A (~50%) – If an A.I. would be available to guide your decisions, how often would you follow its advice?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q22B (~50%) – If a decision-making optimized A.I. would be available to guide your decisions, how often would you follow its advice?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q23A (~50%) – If an A.I. would be available to guide your decisions, how often would you follow Human advice that differs from your A.I.’s recommendation?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q23B (~50%) – If a decision-making optimized A.I. would be available to guide your decisions, how often would you follow Human advice that differs from your A.I.’s recommendation?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q24A (~50%) – If an A.I. would be available to your company, do you think your company still would need your decision making skills?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.
  • Q24B (~50%) – If a decision-making optimized A.I. would be available to your company, do you think your company still would need your decision making skills?
    • Never.
    • Infrequently.
    • About half the time.
    • Frequently.
    • Always.