Is an A.I. better if it sounds more human?
Before I get into what could be considered a rant, or an undocumented opinion at best, I will say this: I know that generative-A.I. tools like ChatGPT will obviously continue to be used more and more and deepen their mark on history. For once, I think the word “disruptive” is the right one.
But I wonder if the priorities of the industry are right.
I wonder if these companies focus too much on how their products appear rather than on how much more useful and accurate they could be. Again, I am just wondering, and I don’t think I know enough about this topic to write a proper rant, but I feel like I need to jot down my questions — which I would not call “concerns” here — about what’s coming.
Casey Newton, writing for Platformer:
On Monday, OpenAI announced the latest updates for ChatGPT. One feature lets you interact with its large language model via voice.
OpenAI wants to improve the UI of ChatGPT: sounds good. While what these generative-A.I. apps do is impressive, the way we interact with them still has room for improvement. Typing requests into a bland text field may not always be the best way to benefit from all the power these tools can offer, and having to read big chunks of text doesn’t feel as innovative as it could be in many cases.
You can imagine the next steps here. A bot that gets to know your quirks; remembers your life history; offers you coaching or tutoring or therapy; entertains you in whichever way you prefer. A synthetic companion not unlike the real people you encounter during the day, only smarter, more patient, more empathetic, more available.
On paper, the potential of this new voice feature looks promising and exciting. As a tech enthusiast and an innovation-driven consumer, I can see how it could change the way we use tools like ChatGPT, and even the whole way we consume technology in our daily lives. I get what Newton is saying about “the next steps”: I can also see how this synthetic voice could soon be paired with an A.I.-generated video — a little like what Meta unveiled — to create a human lookalike that gives you answers and helps you with your day.
I just wonder if this is not putting the UI cart before the accuracy horse.
Also, do we really want this? Will the public want something like this? I’m not asking about the need — much of the stuff we own we don’t need anyway — but the want. Will I want an A.I. assistant with a voice that sounds exactly like a human voice? I certainly doubt it. Maybe that’s just me. Didn’t the whole “Instagram filter era” end because it didn’t feel authentic enough? I feel that making these A.I. tools more “human” is a bit like putting filters on pictures to make them look better. They do look better and more appealing, but users ended up going back to “real” photos and videos, and TikTok’s success was arguably built on that perceived authenticity.
Those of us who are blessed to have many close friends and family members in our life may look down on tools like this, experiencing what they offer as a cloying simulacrum of the human experience.
Well, you got me there, Casey.
But I imagine it might feel different for those who are lonely, isolated, or on the margins. On an early episode of Hard Fork, a trans teenager sent in a voice memo to tell us about using ChatGPT to get daily affirmations about identity issues. The power of giving what were then text messages a warm and kindly voice, I think, should not be underestimated.
100% agree. Some people might love this technology: a machine that can create valuable social interactions for many people, why not? Still, even if a human voice is better than an ugly robotic one for people who are lonely and isolated, and even if it makes them feel less lonely, I’m not sure the company of an A.I. is a good solution, regardless of how good it is, because people will know they are not really talking to a person. Will these A.I. assistants be like smarter cats? Conscious TVs?
I’m no expert in human psychology, but I’m just not sure that making the voices and messages more human — while everyone knows very well they are interacting with a computer — will make the A.I. apps more effective. It will surely sound cooler and encourage people to use them more (eventually making them better). It will be impressive at first. No doubt that a few years from now, today’s computer-generated voices will sound like they were created in the 1950s. This could be the “Retina display” moment for A.I., making it very difficult to go back to the previous generation.
Maybe it will be a breakthrough. Maybe this is the premise of something revolutionary, maybe via some sort of future personal device, who knows? Maybe I’m already an old man yelling at a cloud, but I like my virtual assistants to remain as virtual as possible. Humans are not even that great to begin with.
I don’t want to fall into the sceptics’ camp. I try to have a wait-and-see approach to this kind of experiment (I say “experiments” to describe these newly announced features because they always seem to be introduced and released like prototypes or alpha versions).
Newton writes:
OpenAI tends to present its products as productivity tools: simple utilities for getting things done.
I am wondering if making an A.I. sound more human really makes it better at getting things done.
Another example: Spotify wants to automatically translate podcast episodes and have them read in the host’s A.I.-cloned voice. That sounds cool, but I’d like to see a focus on the quality of the translations first. Once we know the translations are truly great, then maybe we can focus on having a similar-sounding voice in every language. Like I said, it feels like the priorities are wrong, but maybe I’m just in over my head on this topic.
I like the potential of this technology, but I would like it even better if these A.I. companies could address the issues associated with A.I. just as quickly as they imagine new ways to monetise their technology. I would like to see them figure out the whole copyright issue with their technology first, for example, before moving on to making it more appealing.
Hemant Taneja, writing in Harvard Business Review back in 2019, in his article “The Era of ‘Move Fast and Break Things’ Is Over”:
Many of today’s entrepreneurs live by Facebook founder Mark Zuckerberg’s now-famous motto: “Move fast and break things.” Zuckerberg intended for this to inform internal design and management processes, but it aptly captures how entrepreneurs regard disruption: more is always better. We raced to put our products into consumers’ hands as fast as possible, without regard for the merit of—and rationale for—offline systems of governance.
Does this sound familiar to you too?