Using AI vs. Finding Your Ground Truth Supremacy

Today, most companies, VCs, and product teams are approaching the current generation of AIs as if their core economics will work just like the ‘big data’ revolution that created titans like Google, Facebook, and Amazon. But the economics are substantially different. Unlike the previous ‘big data’ revolution, the current generation of AIs is waiting on broader participation to give up its big gains. We’re not there yet, and just about every company has major AI wins waiting in the wings. Every company can and should be aggressively working to realize those wins.

So why aren’t they?

I think it’s because they’re reading from the playbook of the last 20 years and expecting big tech to figure it all out for them. And I think that’s a mistake. Just as it took a long time after the advent of personal computers for companies to figure out how to actually get gains from ‘computerizing’ their operations, the big, unrealized gains from AI will have more to do with individual companies learning for themselves what works for their specific operation. Just as computers delivered gains in very different ways across spaces like retail, manufacturing, and health care, so it may be with AI.

Specifically, I think you get the durable wins when you execute on what I’ll call your ‘Ground Truth Supremacy’. Ground truth is an AI/data-science term that describes the degree to which you have relevant observations about your user (or subject) that you truly, deeply understand. In the context of a human’s interaction with AI, you have Ground Truth Supremacy in the cases where you have better:

  1. context about what the user is trying to do and
  2. comparable data about similar user interactions.
Figure 1: Finding Your Ground Truth Supremacy

Who has this supremacy? If you’re running a business with any substantial scale, you do! That hasn’t been the case with big data/machine learning so far, so the shift may be easier to understand if we compare it to the ‘big data’ revolution that preceded the current generation of AIs.

The Big Data Revolution of 2015

The last generation of big data models allowed players with a lot of data to out-predict their competition on economically important interventions, like which search results to show a given user and which ad maximized the likelihood that they would click-through.

These interventions fostered a winner-take-all dynamic for two key reasons. First, the interventions were (and still are) mass market—the way Google or Facebook decides what search results or posts to show a user works the same for everyone. Second, those models worked better the more proprietary data you had. The two facts interacted in a virtuous cycle which perpetuated the key players’ dominance in their respective categories.

For example, Google created one of the century’s great fortunes with a two-sided market selling search results, email, etc. to end users on the one side and ads on the other. Key to the durability of their leadership in search is the virtuous cycle they’ve created: their proprietary data about users allows them to present more relevant search results (and ads). That, in turn, creates more data to continue to improve those results.

Unpacking this dynamic brings us to a slightly technical item that will help your understanding of AI a ton: the relationship between dependent and independent variables. Let’s say you run an ice cream shop and you want to make sure you have just enough (but not too many) employees on hand to service your customers. You want an AI to predict your store traffic by hour for the next two weeks so you can make better decisions about staffing. The way the AI thinks about it, the dependent variable would be peak customer visits by hour and the independent variables would be factors that you think are tied to your foot traffic like time of year, holidays, temperature, and rain, to name a few.

Figure 2: Dependent and Independent Variables
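To make that concrete, here’s a minimal sketch of the ice cream example in Python, using scikit-learn’s linear regression. All the features and numbers are invented for illustration:

```python
# A toy model: predict hourly customer visits (the dependent variable)
# from weather and calendar features (the independent variables).
# All data here is made up for illustration.
from sklearn.linear_model import LinearRegression

# Independent variables: [temperature_F, is_raining, is_holiday, hour_of_day]
X = [
    [72, 0, 0, 13],
    [85, 0, 1, 15],
    [60, 1, 0, 11],
    [90, 0, 0, 16],
]
# Dependent variable: customer visits observed in each of those hours
y = [24, 61, 8, 47]

model = LinearRegression().fit(X, y)

# Predict traffic for a hot, dry holiday afternoon to inform staffing
print(model.predict([[88, 0, 1, 14]]))
```

In practice you’d train on months of hourly history, but the shape of the problem is exactly this: known IVs in, predicted DV out.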

Bringing this back around to the Google example, for search their focal DVs are:

  1. the set of search results most likely to be what you want (observed through proxies like click-through rates, return searches, etc.) and
  2. the search ad that you’re most likely to click on.

They’re the best at predicting these DVs because they have a wealth of proprietary data on IVs for you and your behavior online: when you shop, what interests you, etc. They can even finish sentences for you, like ‘Why do I…’. The figure below describes the core basis for Google’s wildly profitable and, to date, unbeatable search business. Some companies talk about ‘moats’ to keep out competition. Google has had an ocean.

Figure 3: Google’s Data-Driven Dominance in Search
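If you want to picture the mechanics, here’s a hedged sketch of that kind of click-prediction task. The behavioral features and data are invented stand-ins for Google’s proprietary IVs:

```python
# A toy click-through model: predict whether a user clicks an ad (DV)
# from behavioral features (IVs). All data is invented.
from sklearn.linear_model import LogisticRegression

# IVs: [past_clicks_on_category, minutes_on_site_today, query_matches_ad]
X = [
    [0, 2, 0],
    [5, 30, 1],
    [1, 5, 1],
    [7, 45, 1],
    [0, 1, 0],
    [2, 12, 0],
]
# DV: did the user click? (1 = yes, 0 = no)
y = [0, 1, 0, 1, 0, 0]

model = LogisticRegression().fit(X, y)

# Estimated probability that an engaged, well-matched user clicks this ad
print(model.predict_proba([[4, 25, 1]])[0][1])
```

The point isn’t the model; it’s the rows. Whoever observes more labeled user behavior fits this kind of model better.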

The various AIs/models that Google uses to do this have gotten better and better over time, but it’s their proprietary data on user behavior that really matters. That’s the reason why, for example, Google famously open-sourced TensorFlow, its core machine learning framework.

As remote as the Google example may seem, the AI capabilities of today are immediately, specifically relevant to you. And you should be acting on them.

The AI Revolution of Right Now

The large language models that power services like ChatGPT are very different from the prior generation of models. They differ in how they operate and in the DVs they predict, but also, and more importantly, in the independent variables (IVs) they rely on and in who has custody of that data.

In the case of large language models (LLMs) like the one that runs the ChatGPT service, the DV is, ‘What set of words will most please this user?’ The IVs are the initial prompt from the user and all the text the model in question could lay its hands on: everything on the Internet, books in the public domain, and anything else its creators can find.

Figure 4: An LLM’s Dependent and Independent Variables

If that DV (“What set of words will most please this user?”) sounds a little vague, it is. But there are ways to sharpen it, and a group of researchers described one in a seminal paper called ‘Attention Is All You Need’. It wasn’t that big a deal at the time (2017), but it is now because it led to the prevailing model architecture for AI, which trains in three general steps:

1. ‘Unsupervised Learning’
The AI model looks at lots of text and builds up a notion of how its pieces relate to each other. It will create many, many such notions and see which ones do best in the tests that follow. Most of the data involved is text that’s generally available.
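As a toy illustration of what one of those ‘notions’ can look like, here’s a sketch that learns which words tend to follow which from raw, unlabeled text. Real LLMs learn vastly richer relationships, but the raw material is the same:

```python
# A toy 'language model': count which word follows which in raw text.
# No labels needed; the text itself supplies the structure.
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept"
words = text.split()

following = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1

# The most likely word after 'the', given what we've seen
print(following["the"].most_common(1))  # [('cat', 2)]
```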

2. ‘Self-Supervised Learning’
This may be one of the less obvious parts: the AI then finds comparable bodies of text to test itself against. For example, researchers relied heavily on translations of the same text, such as a book or web page that’s already available in multiple languages. The previous unsupervised step has, say, created a notion of how German and English relate to each other. Now the model tests that particular notion by creating its own translation and comparing it to the already existing (human-generated) translation. The AI prunes away the worse-matching models and cultivates the better-performing ones. Here too, most of the data involved is text that’s generally available.
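Here’s a minimal sketch of that self-test mechanic: score a candidate model’s translation against an existing human translation. Real systems use more sophisticated metrics (BLEU, for example); simple word-sequence similarity stands in for them here:

```python
# Score a model's translation against a known human translation.
# Production systems use metrics like BLEU; this uses simple similarity.
from difflib import SequenceMatcher

human_translation = "the weather is beautiful today"
model_a_output = "the weather is lovely today"
model_b_output = "weather beautiful is the today"

def score(candidate: str, reference: str) -> float:
    return SequenceMatcher(None, candidate.split(), reference.split()).ratio()

# Keep the candidate model whose output better matches the reference
print(score(model_a_output, human_translation))  # higher
print(score(model_b_output, human_translation))  # lower
```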

3. ‘Supervised Learning’
Now, there’s an opportunity to further improve the model using ‘known good’ and ‘known bad’ examples, which are called ‘labeled training data’. The AI takes a version of itself and tests it against these more definitive examples to see how it does.

This labeled training data is, for example, AI responses to prompts that human raters have reviewed and ‘labeled’ in some way as good/bad/OK. To date, these raters have mostly been low-cost, offshore contractors. However, as applications like ChatGPT collect more feedback from end users (thumbs up/down, follow-on requests), popular models are acquiring more of this data organically from actual end users.
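To make ‘labeled training data’ concrete, here’s a minimal sketch with invented examples of the kind of rated records a supervised step consumes. The field names and the keep-the-good-ones rule are illustrative assumptions, not any particular vendor’s format:

```python
# Hypothetical labeled training data: prompts paired with AI responses
# that human raters have marked good or bad. All examples are invented.
labeled_data = [
    {"prompt": "Summarize this contract.", "response": "A plain-language summary.", "label": "good"},
    {"prompt": "Summarize this contract.", "response": "An off-topic ramble.", "label": "bad"},
    {"prompt": "Explain APR to a teenager.", "response": "A clear, simple analogy.", "label": "good"},
]

# A supervised pass keeps (or up-weights) the 'good' examples so the
# model learns to prefer responses like them.
training_set = [ex for ex in labeled_data if ex["label"] == "good"]
print(len(training_set))  # 2
```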

Back to our comparison with the ‘big data’ analytics that powers Google’s search, their entire trove of proprietary data is ‘labeled’: the user did/didn’t click on this set of ads, the user did/didn’t return to these search results to try again, etc. With a market share of >90% in search and all that great data, it’s easy to see why Google’s remained so dominant.

Now, if ChatGPT were to dominate the ‘AI market’ the way Google dominates search, OpenAI might end up with an equally durable franchise. But that’s not happening. First, many AI transactions don’t come from end users at all; they come from embedded applications (ex: chatbots) that use the AI provider’s API. Second, many of the important buyers are enterprises who demanded (and got) early service agreements where their data stays private. Finally, even with its massive early lead, OpenAI (creator of ChatGPT) has a market share well below 50%. No single company is achieving the kind of dominance we’ve come to accept from the big data revolution that started in the mid-2010s. And that’s your opportunity.

Your Ground Truth Supremacy

So, where’s the money? Where are the competitive ‘moats’ that lead to durable wins? This brings me back to ‘Ground Truth Supremacy’ and the two key assets you need to achieve it on your focal user experiences (UXs):

  1. better context about the experience
  2. better data about comparable experiences

Most of the data involved in creating AIs is text that’s generally available, but on your most valuable UXs, you already have an edge. For example, let’s say you’re a health care system and one of your patients wants to know what to do about their kid’s tick bite; they send you a written inquiry and a photo. They’re your patient, and they’re using your phone system or website, and that gives you ground truth supremacy on context. Specifically, this allows you to better engineer the prompt you’ll give to an AI for a response, supplementing it with, for example, the prevalence of tick-borne illnesses in their area, relevant background on the patient, and the general type of response you think would be useful.
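Here’s a minimal sketch of what that contextual prompt engineering could look like. The patient record, the prevalence lookup, and call_llm are hypothetical stand-ins for your own systems and whichever LLM API you use:

```python
# A hypothetical sketch: enrich a patient inquiry with context that only
# you hold before handing it to a general-purpose LLM.
# call_llm, the record fields, and the prevalence data are stand-ins.

def build_prompt(inquiry: str, patient: dict, regional_prevalence: str) -> str:
    return (
        "You are drafting a response for a clinician to review.\n"
        f"Patient age: {patient['age']}; allergies: {patient['allergies']}.\n"
        f"Local context: {regional_prevalence}\n"
        f"Patient inquiry: {inquiry}\n"
        "Respond with clear next steps and when to seek urgent care."
    )

patient = {"age": 9, "allergies": "none on file"}
prevalence = "Tick-borne illness is common in this county in summer."
prompt = build_prompt("My kid has a tick bite, photo attached.", patient, prevalence)

# response = call_llm(prompt)  # hypothetical stand-in for your LLM call
print(prompt)
```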

You also have data about what’s worked (and not) for your other, similar patients, so you likely have ground truth supremacy on comparable data as well. Let’s say there’s a hospital system that’s 10x your size: possibly its scale outweighs the specific relevance of your data to your patients, but a lot of what companies like OpenAI and their robust set of competitors are doing is licensing more and more data and making it generally available, which erodes raw-scale advantages. What your comparable data allows you to do is run the result through a set of further supervised refinements. You can think of this as a fourth step beyond ‘supervised learning’: ‘targeted supervised learning’, as sketched below.
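As an illustration of that fourth step, here’s a hedged sketch that turns your own clinician-rated interactions into chat-style fine-tuning records. The source fields are invented; the JSON Lines ‘messages’ format is one that hosted fine-tuning services (OpenAI’s, for example) accept, but check your provider’s docs:

```python
# Turn your own rated interactions into targeted supervised tuning data.
# The source records are invented; the output is JSON Lines of
# chat-style messages, a common hosted fine-tuning format.
import json

past_interactions = [
    {"inquiry": "Tick bite on my kid, what now?",
     "approved_response": "Remove the tick with tweezers, save it in a bag, and watch for a rash.",
     "clinician_rating": "good"},
    {"inquiry": "Is a fever of 104 an emergency?",
     "approved_response": "Probably fine.",
     "clinician_rating": "bad"},
]

with open("targeted_tuning.jsonl", "w") as f:
    for ex in past_interactions:
        if ex["clinician_rating"] != "good":
            continue  # only clinician-approved responses become training data
        record = {"messages": [
            {"role": "user", "content": ex["inquiry"]},
            {"role": "assistant", "content": ex["approved_response"]},
        ]}
        f.write(json.dumps(record) + "\n")
```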

Figure 5: AI & The Ground Truth Supremacy

What does a good execution look like in practice? Non-profit online course giant Khan Academy has been early and public about how they’re executing with AI to maximize what I’d call their ground truth supremacy. They now operate an always-on AI tutor called ‘Khanmigo’, which uses OpenAI’s GPT service. However, they also created their own markup language to supply hyper-contextual prompts, improving their particular inputs to general-purpose GPT for their specific user interactions. They’re also using various AIs to layer further refinements onto ‘basic’ GPT with proprietary labeled training data from their users.

Finding Your Ground Truth Supremacy

Ironically, finding your Ground Truth Supremacy starts with qualitative evidence: defining your user experiences in qualitative terms and then identifying the most valuable interventions an AI could make. Like any good design, you should start with what you know and what you want, and worry about implementation later. Fortunately, the tools we have today make it relatively fast and easy to iteratively figure that out.

Once you get the hang of identifying those valuable UX interventions and the user context that matters, the path to cementing your Ground Truth Supremacy means leveraging the contextual and comparable data you already have (or can easily create). In the early days of AI, you’d be crazy not to focus on easy wins like AI-driven coding and supplemental chatbots. However, the key to durable AI wins is thoughtfully experimenting your way to better user experiences by way of AI-powered interventions. That means identifying your most valuable user interventions, testing alternative UXs to see what delivers, and then making sure you have a data strategy that will keep enhancing that UX.

Personally, I’m excited about tech creating a broader set of economic wins, and I hope you are, too. Here’s to finding our ground truth supremacies! If you want a playbook for how product teams are getting there, I can’t help but recommend this delightful book: Hypothesis-Driven Development.