The pitfalls of impression-based tech policy analysis

In late 2022, OpenAI released a Transformer-based Large Language Model (LLM) called “ChatGPT.” Against the expectations of OpenAI’s staff, ChatGPT became the fastest-growing web-based app in history at the time, rising to 100 million active users in two months (a pace since beaten only by Meta’s Threads). The first public impressions of ChatGPT had both sublime qualities and portents of doom. In February 2023, Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher wrote that generative artificial intelligence (AI) is comparable to the intellectual revolution initiated by the printing press, this time consolidating and ‘distilling’ the storehouse of human knowledge. In March 2023, Eliezer Yudkowsky, foreseeing extinction-level risks, implored the world’s governments and militaries to shut down the project of AI and “be willing to destroy a rogue datacenter by airstrike.”

These first impressions represent two ends of a spectrum, but the reasoning that occupies the space between them is commonplace in technology policy analysis: personal impressions of generative AI infiltrate the background assumptions from which policy analyses are conducted. When assumptions of fundamental importance go unquestioned, it is all too easy to fall into the trap of extrapolating from current technological conditions to future technological marvels. Technology policy analysts of all stripes do excellent work, but it is time to identify the gaps in our reasoning and aim individually and collectively higher.

An example illustrates the general tendency. The Center for a New American Security’s Paul Scharre, in his book “Four Battlegrounds” (which on the whole is a treasure trove of insights), hedges on the future of AI but leans toward the idea that “Building larger, more diverse datasets may result in more robust models. Multimodal datasets may help to build models that can associate concepts represented in multiple formats, such as text, images, video, and audio.” This expectation rests on the idea that scaling up AI systems (making their internal capacity and training datasets larger) will lead to new capabilities, with positive reference to Richard Sutton’s famous argument in “The Bitter Lesson” about the benefits of such techniques.

Not long after, Microsoft’s researchers helped set the tone for a flurry of over-optimistic claims about the future of LLMs with their provocatively titled “Sparks of Artificial General Intelligence” paper on GPT-4. It is not difficult to see how one’s personal impression of GPT-4 could lead to an equivalent sense of “We’re on the brink of something big here.” Yet this is no justification for allowing the assumptions bound up in this sentiment to fester in one’s analyses.

Ample research highlights the limits of LLMs and other Transformer-based systems. Hallucinations (authoritative but factually incorrect statements) continue to plague LLMs, with some researchers suggesting these are simply innate features of this technology. According to one recent study, voters using chatbots for basic information about the 2024 elections can readily be misled by hallucinated polling places and other false or outdated information. Other research shows that LLMs’ ability to form abstractions and generalize them lags behind that of humans; the reasoning abilities of multimodal systems are a similar story. OpenAI’s most recent development, the text-to-video generator “Sora,” while remarkable in its realism, invents objects and people out of thin air and fails to comply with real-world physics.

So much for the idea that new modalities like image and video would lead to the reliable, robust, and explainable AI systems we desire.

None of this suggests that there is only hype in the technology world. Carnegie’s Matt O’Shaughnessy correctly notes that talk of “superintelligence” is likely to negatively influence policymaking because of machine learning’s fundamental limitations. Additionally, the Biden administration’s extensive October 2023 executive order on AI, while dramatically invoking the Defense Production Act to authorize the monitoring of certain computationally powerful AI systems, was more diverse in tone than one might expect.

Yet, the problem we identify here is not a hype problem per se. Hype is a result of getting stuck in analytic frames that are too easily ignored in favor of quick publications and individual or organizational self-promotion. Lest we mistakenly believe this is just a peculiar LLM-specific tendency, the disappointment of AI-enabled and autonomous drones on the battlefield in Ukraine should raise eyebrows about the alleged rapidity of fundamental breakthroughs occurring in 2023. Moreover, it is easier to find nuance in the domain of quantum information science, but at the same time, little individual or collective reflection appears to arise when its crown jewel of quantum computing begins to see its future downgraded.

Nevertheless, generative AI today is starting to look like a parody of Mao’s Continuous Revolution—the transformation of this technology into a human-like “general” intelligence or some other marvel of technological imagination is always one model upgrade away, and it cannot be allowed to succumb to challenges from regulatory bodies or popular movements.

The takeaway is that policy analysts make choices when assessing technology. The choice of certain assumptions over others presents the analyst with a certain set of possible policy options at the expense of others. That individuals have first impressions about new technologies is unavoidable and can serve as a source of diversity of opinion. The problem for policy analysis arises when practitioners fail to pour their first (or second, or third, etc.) impressions into a shared crucible that exposes unstable ideas to high-temperature intellectual criticism, thereby guiding them toward an articulation of specific policy challenges and solutions without unduly neglecting other possibilities wholesale.

Policy analysis generally is a concoction of ingredients from industry, domestic politics, and international affairs. Merely identifying that a policy challenge exists is not done de novo but from an intuitive link between the needs and values of a society and anticipated or actual impacts by developments within its borders or abroad. That intuition—we all have it—should be the focus of our honest and shared scrutiny.

Vincent J. Carchidi is a Non-Resident Scholar at the Middle East Institute’s Strategic Technologies and Cyber Security Program. He is also a member of Foreign Policy for America’s NextGen Initiative 2024 Cohort.
