What Have Humans Just Unleashed?

Call it tech’s optical-illusion era: Not even the experts know exactly what will come next in the AI revolution.

Erik Carter

GPT-4 is here, and you’ve probably heard a good bit about it already. It’s a smarter, faster, more powerful engine for AI programs such as ChatGPT. It can turn a hand-sketched design into a functional website and help with your taxes. It got a 5 on the AP Art History test. There were already fears about AI coming for white-collar work, disrupting education, and so much else, and there was some healthy skepticism about those fears. So where does a more powerful AI leave us?

Perhaps overwhelmed or even tired, depending on your leanings. I feel both at once. It’s hard to argue that new large language models, or LLMs, aren’t a genuine engineering feat, and it’s exciting to experience advancements that feel magical, even if they’re just computational. But nonstop hype around a technology that is still nascent risks grinding people down, because being constantly bombarded by promises of a future that will look very little like the past is both exhausting and unnerving. Any announcement of a technological achievement at the scale of OpenAI’s newest model inevitably sidesteps crucial questions—ones that simply don’t fit neatly into a demo video or blog post. What does the world look like when GPT-4 and similar models are embedded into everyday life? And how are we supposed to conceptualize these technologies at all when we’re still grappling with their quite novel, but certainly less powerful, predecessors, including ChatGPT?

Over the past few weeks, I’ve put questions like these to AI researchers, academics, entrepreneurs, and people who are currently building AI applications. I’ve become obsessive about trying to wrap my head around this moment, because I’ve rarely felt less oriented toward a piece of technology than I do toward generative AI. When reading headlines and academic papers or simply stumbling into discussions between researchers or boosters on Twitter, even the near future of an AI-infused world feels like a mirage or an optical illusion. Conversations about AI quickly veer into unfocused territory and become kaleidoscopic, broad, and vague. How could they not?

The more people I talked with, the more it became clear that there aren’t great answers to the big questions. Perhaps the best phrase I’ve heard to capture this feeling comes from Nathan Labenz, an entrepreneur who builds AI video technology at his company, Waymark: “Pretty radical uncertainty.”

He already uses tools like ChatGPT to automate small administrative tasks such as annotating video clips. To do this, he’ll break videos down into still frames and use different AI models that do things such as text recognition, aesthetic evaluation, and captioning—processes that are slow and cumbersome when done manually. With this in mind, Labenz anticipates “a future of abundant expertise,” imagining, say, AI-assisted doctors who can use the technology to evaluate photos or lists of symptoms to make diagnoses (even as error and bias continue to plague current AI health-care tools). But the bigger questions—the existential ones—cast a shadow. “I don’t think we’re ready for what we’re creating,” he told me. AI, deployed at scale, reminds him of an invasive species: “They start somewhere and, over enough time, they colonize parts of the world … They do it and do it fast and it has all these cascading impacts on different ecosystems. Some organisms are displaced, sometimes landscapes change, all because something moved in.”

The uncertainty is echoed by others I spoke with, including an employee at a major technology company that is actively engineering large language models. They don’t seem to know exactly what they’re building, even as they rush to build it. (I’m withholding the names of this employee and the company because the employee is prohibited from talking about the company’s products.)

“The doomer fear among people who work on this stuff,” the employee said, “is that we still don’t know a lot about how large language models work.” For some technologists, the black-box notion represents boundless potential and the ability for machines to make humanlike inferences, though skeptics suggest that uncertainty makes addressing AI safety and alignment problems exponentially more difficult as the technology matures.

There’s always been tension in the field of AI—in some ways, our confused moment is really nothing new. Computer scientists have long held that we can build truly intelligent machines, and that such a future is around the corner. In the 1960s, the Nobel laureate Herbert Simon predicted that “machines will be capable, within 20 years, of doing any work that a man can do.” Such overconfidence has given cynics reason to write off AI pontificators as the computer scientists who cried sentience!

Melanie Mitchell, a professor at the Santa Fe Institute who has been researching the field of artificial intelligence for decades, told me that this question—whether AI could ever approach something like human understanding—is a central disagreement among people who study this stuff. “Some extremely prominent people who are researchers are saying these machines maybe have the beginnings of consciousness and understanding of language, while the other extreme is that this is a bunch of blurry JPEGs and these models are merely stochastic parrots,” she said, referencing a term coined by the linguist and AI critic Emily M. Bender to describe how LLMs stitch together words based on probabilities and without any understanding. Most important, a stochastic parrot does not understand meaning. “It’s so hard to contextualize, because this is a phenomenon where the experts themselves can’t agree,” Mitchell said.

One of her recent papers illustrates that disagreement. She cites a survey from last year that asked 480 natural-language researchers if they believed that “some generative model trained only on text, given enough data and computational resources, could understand natural language in some non-trivial sense.” Fifty-one percent of respondents agreed and 49 percent disagreed. This division makes evaluating large language models tricky. GPT-4’s marketing centers on its ability to perform exceptionally on a suite of standardized tests, but, as Mitchell has written, “when applying tests designed for humans to LLMs, interpreting the results can rely on assumptions about human cognition that may not be true at all for these models.” It’s possible, she argues, that the performance benchmarks for these LLMs are not adequate and that new ones are needed.

There are plenty of reasons for all of these splits, but one that sticks with me is that understanding why a large language model like the one powering ChatGPT arrived at a particular inference is difficult, if not impossible. Engineers know what data sets an AI is trained on and can fine-tune the model by adjusting how different factors are weighted. Safety consultants can create parameters and guardrails for systems to make sure that, say, the model doesn’t help somebody plan an effective school shooting or give a recipe to build a chemical weapon. But, according to experts, to actually parse why a program generated a specific result is a bit like trying to understand the intricacies of human cognition: Where does a given thought in your head come from?

The fundamental lack of common understanding has not stopped the tech giants from plowing ahead without providing valuable, necessary transparency around their tools. (See, for example, how Microsoft’s rush to beat Google to the search-chatbot market led to existential, even hostile interactions between people and the program as the Bing chatbot appeared to go rogue.) As they mature, models such as OpenAI’s GPT-4, Meta’s LLaMA, and Google’s LaMDA will be licensed by countless companies and infused into their products. ChatGPT’s API has already been licensed out to third parties. Labenz described the future as generative AI models “sitting at millions of different nodes and products that help to get things done.”

AI hype and boosterism make talking about what the near future might look like difficult. The “AI revolution” could ultimately take the form of prosaic integrations at the enterprise level. The recent announcement of a partnership between the Bain & Company consultant group and OpenAI offers a preview of this type of lucrative, if soulless, collaboration, which promises to “offer tangible benefits across industries and business functions—hyperefficient content creation, highly personalized marketing, more streamlined customer service operations.”

These collaborations will bring ChatGPT-style generative tools into tens of thousands of companies’ workflows. Millions of people who have no interest in seeking out a chatbot in a web browser will encounter these applications through productivity software that they use every day, such as Slack and Microsoft Office. This week, Google announced that it would incorporate generative-AI tools into all of its Workspace products, including Gmail, Docs, and Sheets, to do things such as summarizing a long email thread or writing a three-paragraph email based on a one-sentence prompt. (Microsoft announced a similar product too.) Such integrations might turn out to be purely ornamental, or they could reshuffle thousands of mid-level knowledge-worker jobs. It’s possible that these tools don’t kill all of our jobs, but instead turn people into middle managers of AI tools.

The next few months might go like this: You will hear stories of call-center employees in rural areas whose jobs have been replaced by chatbots. Law-review journals might debate GPT-4 co-authorship in legal briefs. There will be regulatory fights and lawsuits over copyright and intellectual property. Conversations about the ethics of AI adoption will grow in volume as new products make little corners of our lives better but also subtly worse. Say, for example, your smart fridge gets an AI-powered chatbot that can tell you when your raw chicken has gone bad, but it also gives false positives from time to time and leads to food waste: Is that a net positive or net negative for society? There might be great art or music created with generative AI, and there will definitely be deepfakes and other horrible abuses of these tools. Beyond this kind of basic pontification, no one can know for sure what the future holds. Remember: radical uncertainty.

Even so, companies like OpenAI will continue to build out bigger models that can handle more parameters and operate more efficiently. The world hadn’t even come to grips with ChatGPT before GPT-4 rolled out this week. “Because the upside of AGI is so great, we do not believe it is possible or desirable for society to stop its development forever,” OpenAI’s CEO, Sam Altman, wrote in a blog post last month, referring to artificial general intelligence, or machines that are on par with human thinking. “Instead, society and the developers of AGI have to figure out how to get it right.” Like most philosophical conversations about AGI, Altman’s post oscillates between the vague benefits of such a radical tool (“providing a great force multiplier for human ingenuity and creativity”) and the ominous-but-also-vague risks (“misuse, drastic accidents, and societal disruption” that could be “existential”) it might entail.

Meanwhile, the computational power demanded by this technology will continue to increase, with the potential to become staggering. AI could eventually demand supercomputers that cost an astronomical amount of money to build (by some estimates, Bing’s AI chatbot could “need at least $4 billion of infrastructure to serve responses to all users”), and it’s unclear how that would be financed, or what strings might ultimately get attached to related fundraising. No one—Altman included—could ever fully answer why they should be the ones trusted with and responsible for bringing what he argues is potentially civilization-ending technology into the world.

Of course, as Mitchell notes, the basics of OpenAI’s dreamed-of AGI—how we can even define or recognize a machine’s intelligence—are unsettled debates. Once again, the wider our aperture, the more this technology behaves and feels like an optical illusion, even a mirage. Pinning it down is impossible. The further we zoom out, the harder it is to see what we’re building and whether it’s worthwhile.

Recently, I had one of these debates with Eric Schmidt, the former Google CEO who wrote a book with Henry Kissinger about AI and the future of humanity. Near the end of our conversation, Schmidt brought up an elaborate dystopian example of AI tools taking hateful messages from racists and, essentially, optimizing them for wider distribution. In this situation, the company behind the AI is effectively doubling the capacity for evil by serving the goals of the bigot, even if it intends to do no harm. “I picked the dystopian example to make the point,” Schmidt told me—that it’s important for the right people to spend the time and energy and money to shape these tools early. “The reason we’re marching toward this technological revolution is it is a material improvement in human intelligence. You’re having something that you can communicate with; they can give you advice that’s reasonably accurate. It’s pretty powerful. It will lead to all sorts of problems.”

I asked Schmidt if he genuinely thought such a trade-off was worth it. “My answer,” he said, “is hell yeah.” But I found his rationale unconvincing. “If you think about the biggest problems in the world, they are all really hard—climate change, human organizations, and so forth. And so, I always want people to be smarter. The reason I picked a dystopian example is because we didn’t understand such things when we built up social media 15 years ago. We didn’t know what would happen with election interference and crazy people. We didn’t understand it and I don’t want us to make the same mistakes again.”

Having spent the past decade reporting on the platforms, architecture, and societal repercussions of social media, I can’t help but feel that the systems, though human and deeply complex, are of a different technological magnitude than the scale and complexity of large language models and generative-AI tools. The problems—which their founders didn’t anticipate—weren’t wild, unimaginable, novel problems of humanity. They were reasonably predictable problems of connecting the world and democratizing speech at scale for profit at lightning speed. They were the product of a small handful of people obsessed with what was technologically possible and with dreams of rewiring society.

Trying to find the perfect analogy to contextualize what a true, lasting AI revolution might look like without falling victim to the most overzealous marketers or doomers is futile. In my conversations, the comparisons ranged from the agricultural revolution to the industrial revolution to the advent of the internet or social media. But one comparison never came up, and I can’t stop thinking about it: nuclear fission and the development of nuclear weapons.

As dramatic as this sounds, I don’t lie awake thinking of Skynet murdering me—I don’t even feel like I understand what advancements would need to happen with the technology for killer AGI to become a genuine concern. Nor do I think large language models are going to kill us all. The nuclear comparison isn’t about any version of the technology we have now—it is related to the bluster and hand-wringing from true believers and organizations about what technologists might be building toward. I lack the technical understanding to know what later iterations of this technology could be capable of, and I don’t wish to buy into hype or sell somebody’s lucrative, speculative vision. I am also stuck on the notion, voiced by some of these visionaries, that AI’s future development might be an extinction-level threat.

ChatGPT doesn’t really resemble the Manhattan Project, obviously. But I wonder if the existential feeling that seeps into most of my AI conversations parallels the feelings inside Los Alamos in the 1940s. I’m sure there were questions then. If we don’t build it, won’t someone else? Will this make us safer? Should we take on monumental risk simply because we can? Like everything about our AI moment, what I find calming is also what I find disquieting. At least those people knew what they were building.