Since the concept of a "digital assistant" (an app listening continually for a pre-configured trigger word before initiating helpful actions) first launched with Siri in 2010 and then spread to other smartphones and televisions, even unsophisticated consumers seemed to recognize an inherent privacy danger in this technology:
If you're always listening to EVERYTHING to match on the trigger word, what are you DOING with the information gleaned from what I said BEFORE saying the trigger word?
That privacy issue is coming up again, along with even larger economic and legal issues, as capabilities first referenced by Cox Media in December 2023 get new attention in the media in September 2024. In December 2023, information Cox Media provided to investors referenced a capability for advertisers and commerce firms to leverage information about "pre-trigger speech" captured from digital assistants to more explicitly target ads and -- AND -- influence shopping interactions after a user follows an ad to a commerce site to buy. The message from Cox Media was not merely referencing a hypothetical; it touted the capability in the present tense, essentially saying: we can do this for you now.
This capability, and concerns about its potential for abuse, have likely resurfaced in September 2024 after multiple companies testified that their commerce sites have leveraged data mined from customer behavior reflected in cookies and other data sources to tailor shopping recommendations. That's a nice way of saying the sites are leveraging per-customer data to filter the products, providers AND PRICES listed to a user shopping on the site. And that's a nice way of saying that some of these companies have been implementing per-customer price discrimination in their systems, something that is illegal and a tremendous abuse of market power.
How did we get here? And where is this headed?
Evolution of Voice Recognition
In the 1990s and 2000s, relatively limited CPU processing power, available memory and speech audio quality forced recognition algorithms to be "speaker-dependent." To recognize user X's voice saying a limited number of utterances, the system had to record user X saying each utterance multiple times (often 10-20 times), then average those samples together mathematically into a template used for recognition. After that training, if user Y said those same words, the recognition would not be as good as for user X, who trained the system.
When smartphones introduced digital assistants, recognition could be performed ON the device, using audio captured locally at much higher fidelity than a classic telephone signal limited to 4 kHz of bandwidth to fit into a 64,000 bits/sec digital stream in phone networks. The greater processor power aboard the phone also allowed use of "speaker-independent" speech models trained on (presumably) thousands or millions of human voices, making recognition quality much higher. But while those initial speaker-independent models may have involved thousands or millions of voices, their "dictionary" of trained words was still relatively small. And therein lies the ethical and legal root of this story.
Illicit Data Collection
As already stated, at the introduction of these digital assistants in 2010, even non-technical users immediately raised the concern:
If you're always listening to EVERYTHING to match on the trigger word, what are you DOING with the information gleaned from what I said BEFORE saying the trigger word?

From the start, device manufacturers essentially waved their hands and said the device does nothing with "pre-trigger" utterances. Any digital data reflecting parsed speech PRIOR to the trigger word and subsequent "request" is simply sent to the bit bucket, never to be seen again.
That was the claim. However, anyone with a background in software engineering suspected this would not remain the case for very long, if it was ever true to begin with. Why? Because all of that digitized speech uttered PRIOR to a trigger phrase is a gold mine of data to use in expanding the "dictionary" of speaker-independent training models that would boost accuracy rates of utterances AFTER the trigger word.
All the device and application makers had to do was enable collection of ALL of the recognition data, both BEFORE and AFTER the trigger word. And clearly they did. It isn't clear whether each device and application maker accurately reflected that practice in its terms of service agreement with the customer. If not, this business decision alone could result in multi-billion dollar class action lawsuits over privacy violations. These lawsuits would be perfectly justified.
But the thought of "just collect the pre-trigger utterance data as well" clearly triggered another thought. Since we have that pre-trigger utterance data parsed and mapped to the originating speaker by their device and IP address, what if this utterance data was combined with THAT data for ad targeting? Keep in mind Cox Media is a subsidiary of Cox Communications. Cox Communications sells cell phone service and high-speed internet service. It has viewership data from set-top boxes, viewership data from any streaming video app it provides to its customers and some data from its customers' internet usage. Cox Media sells on-screen and online advertising to businesses on its parent's traditional cable TV network and acts as a broker for buying online ad impressions. This same business model is used by other communication carriers, whether telco or cable.
In the case of Cox Media, their December 2023 presentation to investors included this smoking gun:
Don't just know what they're searching for, know what they're talking about.
That's CLEARLY a reference to sharing "pre-trigger" data with ADVERTISERS. And that is not likely an action that Cox Communications divulged in its terms of service agreement with subscribers of its cell phone service.
The core point here is that data collection methods for improving voice recognition capabilities clearly strayed beyond the consent collected from paying customers. At least beyond the consent of the universe of humans who turned on the assistant feature. Or did it stop there? Were device makers sampling speech regardless of whether the customer enabled the assistant function? These capabilities are hard to turn off and, in some cases, impossible to un-install even if "disabled." So are they really off? That's a different technical topic that won't be addressed here.
Illicit Price Discrimination
All of those data collection capabilities on the part of device makers and service providers interacted with another capability that many online commerce companies have already been implementing for years, one just now coming to light: the use of per-user profile data to drive shopping experiences. Again, that's a nice way of putting it. This practice essentially amounts to highly customizable price discrimination. Instead of a monopoly setting a single market price above the producer's marginal cost, reducing overall supply and creating inefficiency that way, this approach lets the seller set a unique price per customer, reflecting everything the seller can derive about that customer's individual "demand curve" for the product involved. How does this work? In multiple ways, all of which would likely be undetectable to you as a consumer.
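A minimal sketch of what such per-customer pricing could look like. Everything here is hypothetical: the profile fields, the weights and the formula are invented for illustration, not taken from any company's actual system:

```python
def personalized_price(base_price, profile):
    """Toy first-degree price discrimination: nudge the price toward each
    customer's estimated willingness to pay instead of posting one market
    price. All profile fields are hypothetical."""
    # "urgency": 0.0-1.0 score mined from searches and utterances.
    # "price_sensitivity": 0.0-1.0; higher means more likely to comparison-shop.
    markup = 0.25 * profile["urgency"] * (1.0 - profile["price_sensitivity"])
    return round(base_price * (1.0 + markup), 2)

casual = {"urgency": 0.1, "price_sensitivity": 0.9}   # browsing, will shop around
hooked = {"urgency": 0.9, "price_sensitivity": 0.1}   # talking about it daily
print(personalized_price(1000.00, casual))
print(personalized_price(1000.00, hooked))
```

The shopper profiled as "hooked" quietly pays a steep premium over the casual browser for the identical item, and neither ever sees the other's price.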
Scenario #1 - Big Ticket Purchase

Imagine you've been considering purchasing a new car for the last year. You've considered sedans, you've considered trucks, you've considered SUVs. Over the past year, you've been thinking more and more about one vehicle, a Toyota 4Runner. That term has come up in conversations with your spouse, and it's been appearing in your Google searches. You've watched a couple dozen videos on YouTube about 2024 versus 2025 4Runners. You've been surfing to Toyota's web site at an increasing frequency as the 2025 model year nears.
Now, in light of all this data collection going on, ponder all of the personal data tied to your smart phone, tablet, smart TV and PC browser related to your actions over the past year.
- DNS data shows repeated surfing to www.toyota.com
- YouTube viewership data shows videos correlated to 4Runner
- pre-trigger utterances show an increase in the appearance of "4Runner" in speech
Now the new 4Runner comes out and Toyota sends you an email announcing new inventory arriving shortly: click here to see the brochure or schedule a test drive. If you click on that test-drive link, all of that prior data collection can be linked with the click that registered you for a test drive and a chat with a salesman.
Now imagine showing up to that dealership. That salesman could have your surf and search history summarized and know you aren't just a casual shopper, you probably already have your mind made up. How good a deal are you going to get when the seller knows you are 50% less likely to buy any other vehicle?
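The signal fusion implied by Scenario #1 could be sketched as a simple weighted intent score. The signal names, weights and scale are hypothetical, invented only to illustrate the mechanism:

```python
def purchase_intent(signals, weights=None):
    """Combine normalized (0.0-1.0) behavioral signals into a single
    purchase-intent score. Field names and weights are hypothetical."""
    weights = weights or {"dns_visits": 0.3, "video_views": 0.3, "utterances": 0.4}
    score = sum(weights[k] * signals.get(k, 0.0) for k in weights)
    return min(score, 1.0)

# A shopper with heavy toyota.com traffic, many 4Runner videos watched,
# and rising mentions of "4Runner" in pre-trigger speech.
hot_lead = {"dns_visits": 0.9, "video_views": 0.8, "utterances": 0.7}
print(round(purchase_intent(hot_lead), 2))
```

A salesman handed a score like this knows which walk-ins already have their minds made up before they say a word.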
Scenario #2 - Routine Purchases

Imagine you are a regular Amazon shopper and user of Alexa on various devices. Imagine you've been having conversations about buying a new big-screen TV. Imagine Alexa has been capturing your pre-trigger utterances for two years, analyzing them against your Amazon shopping behavior, and has determined that you tend to be rather, ahem, "decisive" about purchases. Once you start talking about buying something, you generally buy within a day or so, rather than debating features and prices for two weeks.
That behavioral pattern could be used by Amazon to "customize" your shopping experience so the TV models most profitable for Amazon appear on the first page, despite being significantly more expensive than competing products. Amazon knows you won't scroll beyond four pages of results and adjusts the ordering of products accordingly. In contrast, a different user profiled as more cautious would also see a customized shopping view, but with more price-sensitive options prioritized, still to Amazon's financial benefit.
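That margin-biased ranking could be sketched like this. The data model and the "decisiveness" score are hypothetical, invented to illustrate the mechanic, not a description of any retailer's actual ranking system:

```python
def rank_results(products, shopper):
    """Blend relevance with the retailer's margin, weighting margin more
    heavily for shoppers profiled as "decisive" (unlikely to comparison-shop).
    All fields are hypothetical."""
    margin_weight = shopper["decisiveness"]  # 0.0-1.0, mined from past behavior
    def score(p):
        return (1 - margin_weight) * p["relevance"] + margin_weight * p["margin"]
    return sorted(products, key=score, reverse=True)

products = [
    {"name": "Budget TV",  "relevance": 0.9, "margin": 0.2},
    {"name": "Premium TV", "relevance": 0.7, "margin": 0.9},
]
decisive = {"decisiveness": 0.8}
print([p["name"] for p in rank_results(products, decisive)])
```

Both shoppers see what looks like an ordinary page of search results; only the ordering quietly serves the seller rather than the buyer.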
These are not hypotheticals. Just today, I received an email from an online musical instrument retailer at 12:54pm touting new Gibson Les Paul Studio models. I opened that email around 2:20pm, clicked on the link to bounce to their web site and looked at the detail page of one specific color. At 2:27pm, I got another email from the vendor with the subject "Take another look?" showing the exact color I viewed online. That's how fast data collected by cookies can be turned into new outbound communication and fed back into online portals for customized experiences.
Where are These Capabilities Headed?
The first answer to that question should be federal court. If the summary above is remotely close to what these companies are actually doing, the service providers involved would be subject to massive lawsuits for violating their own terms and conditions and customer privacy laws. The commerce customization capabilities implemented by online merchants would amount to illegal price discrimination. If taken to the extreme, they could be used to silently, invisibly red-line sales to specific areas or members of specific ethnicities.
For example, if a company doesn't want to sell to a specific group, their "commerce customization" could include logic to fake inventory levels of zero when the shopper maps to designated metadata. Unless that shopper suspected something odd and asked a different person in a different home to surf to the same site at the same time for the same item and find available stock, they might never know they were being discriminated against.
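The red-lining logic described above would take only a few lines to implement, which is part of what makes it so dangerous. A sketch, with hypothetical field and segment names:

```python
def displayed_stock(true_stock, shopper_metadata, blocked_segments):
    """Sketch of the discriminatory logic described above (all fields
    hypothetical): report zero inventory whenever the shopper's profile
    maps to a segment the seller has chosen not to serve."""
    if shopper_metadata.get("segment") in blocked_segments:
        return 0  # silently fake a stock-out for this shopper
    return true_stock

blocked = {"zip_90210"}  # hypothetical designated segment
print(displayed_stock(14, {"segment": "zip_90210"}, blocked))  # shopper sees 0
print(displayed_stock(14, {"segment": "zip_10001"}, blocked))  # shopper sees 14
```

From the blocked shopper's side of the screen, this is indistinguishable from an honest "out of stock" page, which is exactly why it would evade casual detection.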
The fact that there are three smartphone makers, two dominant PC operating system makers and only five or six dominant service providers in the entire United States makes these data mining economies of scale nearly irresistible to those in control of the data. The real question is whether the value provided to consumers -- in this case seemingly driven by the "convenience" of a digital assistant -- outweighs the economic harm resulting from abuse of these monopolistic powers.
As stated multiple times in the recent past, American consumers have lapsed into thinking that our cool toys will somehow stop working or stop getting more cool if ANY limits are imposed on the giant corporations so generously bestowing upon us the fruits of their genius. This is what many thought prior to the divestiture of AT&T in 1984.
Forty years later, innovation and economic efficiency are again getting strangled by abusive monopolies. Sure, long distance calls are free instead of eighteen cents per minute but now the phone costs $1300, you're convinced you need a new phone every three years, cell service is $19.99/month AND your modern day "Daughter Bell" is listening to everything you say and selling your profile to every merchant you deal with.
WTH