Mere Orthodoxy exists to create media for Christian renewal. Support this mission today.

“AI Safety” Without Virtue Will Never Be Safe

December 19th, 2023 | 8 min read

By Robin Phillips

The recent return of Sam Altman to the helm of OpenAI has brought new urgency to the nationwide debate about AI safety.

Altman is acutely aware of the dangers of artificial general intelligence (AGI), has warned that AGI could harm the world, and has been unable to alleviate fears that his company might inadvertently be building something dangerous. Yet rather than joining Eliezer Yudkowsky and the doomers, Altman has positioned himself with the accelerationists. Their creed: race to conquer AI before it conquers us.

It is widely speculated, not without some evidence, that this accelerationist approach may have been behind Altman’s recent (though short-lived) ouster. Specific concern has been raised about a mysterious “Q*” or “Q-star” system the company has been developing. By integrating existing language processing power with new capabilities in mathematical reasoning, Q* might be investing AI with capabilities for which we are unprepared.

The solution for all this, we are told, is AI safety, safety, safety. In practice, this means regulations, rules, guidelines, standards, monitoring, policies, watchdogs, etc. And here the Biden administration seems to be ahead of the game: on October 30th, the President released an Executive Order promising “new standards for AI safety and security.”

Philosophers are getting into the game by adding ethics into the discussion, with the result that most philosophy courses on ethics now include discussion of AI and robotics.

Yet in all the discussion about AI ethics and safety, you’ll be hard-pressed to find much discussion about AI and virtue. Virtue, though enjoying something of a resurgence among contemporary moral philosophers, has for the most part receded from popular vocabulary. For many, the concept of virtue may bring to mind the quaint world of Jane Austen, in which characters were concerned with preserving their virtue.

What Are Virtues?

Historically, virtues have been central to the good life, and have been understood to be character traits constitutive of human flourishing. Virtues include qualities like fortitude, integrity, temperance, attentiveness, prudence, etc. Such traits are not developed overnight but require steady habituation over time.

These virtues involve more than merely knowing the answer to the question “What should I do?” They include a complex amalgam of intuition, character, and a type of know-how that emerges over a lifetime of formation, usually in community with others who are wise and virtuous.

Virtues vs. Ethics

Ethics, by contrast, is less about persons, and more about actions, duties, and laws. It is the branch of philosophy dealing, among other things, with questions like, “Under what conditions would such-and-such be morally permissible?” Whereas virtue is, by definition, personal and human-specific, ethics can be discussed in the abstract, as when we formalize the ethics in different professional fields (e.g., business ethics, medical ethics, legal ethics, etc.).

It isn’t hard to see why those involved in AI are focused on ethics while tending to neglect the virtues. On one level, this is purely practical: it’s comparatively easy to do ethics well, but very hard to excel at virtue. Ethics is something you can learn or teach quickly; virtue requires a lifetime of good habits and character formation. Ethics is something that can be programmed into a machine; virtue is something only humans can have.

How We Became a Post-Virtue Society

There are also historical-cultural reasons for the eclipse of virtue among AI researchers and practitioners. In his seminal work After Virtue, Alasdair MacIntyre showed that it was only after the Enlightenment that ethics and morality came to be perceived as something detached from teleology, and therefore separable from the questions of human flourishing that had been central to all previous accounts of the virtues. This led to a number of difficulties, including the classic is-ought problem made famous by David Hume: how do you infer normative claims from purely empirical observations?

In MacIntyre’s reading, what emerged from these shifts was “morality” coming to exist as its own domain, suspended in a nether realm that is neither theological, legal, aesthetic, nor phenomenological, occupying a cultural space entirely its own but with a rather hazy ontological status. By thus detaching morality from the rest of life, the philosophers of the 18th century created space for interminable questions about whether morality is even real (or, as the problem is often framed, whether morality is “objective”).

Beyond conundrums about the objectivity of morality, the philosophical sea-change of the Enlightenment bequeathed a legacy in which there is no rational context in which to situate the virtues. Consequently, we live in what MacIntyre described as a post-virtue condition, in which the status previously afforded to the virtues has been replaced by what he calls emotivism. In MacIntyre’s analysis, virtues remain a driving force in our public discourse, yet the virtues to which we might appeal (whether tolerance, justice, cooperation, equity, etc.) exist in a merely fragmentary form, the remnants of an older tradition that once provided a coherence now lacking.

What Has Virtue to Do With AI?

Nowhere is this post-virtue condition more evident than in the field of AI. Those working in the burgeoning domain of “AI safety” are proceeding on the assumption that virtues can be neglected in favor of a reductionism in which AI ethics is simply one more programming problem.

We see this reductionism in OpenAI, the company responsible for ChatGPT. For example, in his AI safety lecture, Scott Aaronson from OpenAI suggested that programming goodness into AI is being approached as a problem of engineering. Aaronson shared how he was under pressure from Ilya Sutskever, cofounder and chief scientist at OpenAI, to find “the mathematical definition of goodness” and “the complexity-theoretic formalization of an AI loving humanity.” Here is perhaps the final culmination of Hume’s is-ought problem: what is real is what can ultimately be reduced to algorithmic mathematics.

How might the discussion of AI safety be reframed if we positioned the virtues as central to the debate? I suggest this would open up new sets of questions currently neglected in the industry, including perhaps:

  • What habits would engineers and managers need to cultivate to steer tech companies (and by extension, the industry) in a humane direction?
  • What might an AI company look like that valued engineers and managers with virtues like moderation, prudence, temperance, courage, etc.?
  • What sort of character traits should be prized among those working to align AI with human values?
  • What types of habits of mind and body might be conducive to the cultivation of such traits over time?

For many, these types of questions may seem a category mistake. When companies like Microsoft and Google are hiring engineers, the last thing they are likely to ask is whether the candidate makes his or her bed every morning or practices prudence in relationships. In fact, what stockholders seek in corporate executives is not virtue but skills. As long as executives do not break the law or violate professional ethics, their level of personal virtue is seen as irrelevant. Similarly, the stereotype of the good programmer is often precisely someone who lacks human virtues, perhaps a person on the spectrum whose brain works in a single-minded, Spock-like fashion—in short, someone like Sam Altman.

As this neglect of virtue becomes systemic to an entire industry, the result is what we see today: largely amoral tech companies concerned primarily with making money, with ethics departments obsessed with laws, protocols, guardrails, codes, policies, etc., but paying comparatively little attention to the virtues.

President Biden’s Executive Order on “Safe, Secure, and Trustworthy Artificial Intelligence” is a case in point. The sprawling 19,682-word document never mentions virtue, but treats AI safety and ethics (often grouped under the larger umbrella of “trustworthy AI”) as problems that can be adequately addressed through rules and guidelines alone. Of course, law cannot mandate personal virtue, but that is precisely the point: if we trust the law alone and neglect the larger conversation about virtue, we will be opening ourselves up to a false security.

Beyond Law: A Plea for Human Virtues

The current law-based approach echoes the ethics of Immanuel Kant (1724–1804), whose answer to Hume’s is-ought problem was to place ethics on the same footing as scientific laws. For the Prussian philosopher, ethics became a discipline dealing with impersonal laws that are universally descriptive: what he famously called the “categorical imperative.” By thus detaching morality from the personal, the human, and all consideration of circumstance, Kant imagined he had rescued ethics from receding into subjectivity. On such a scheme, ethics does not emerge out of the messy and imprecise realm of virtuous habits, let alone character traits developed over time (Kant actually downplayed the role of character); rather, ethics involves knowing the right ethical laws and then going out into the world to apply those laws irrespective of circumstance. As such, ethics becomes not a matter of situation-specific prudence, but something like the law of gravity, an exact science dealing with impersonal laws.

To any of the ancient wise men—Confucius, Plato, Aristotle, Jesus, the saints of the Christian tradition—the idea that ethics could be impersonal would have been nonsense. For them, ethics is about nothing if not character. Even the first five books of the Hebrew scriptures, which go a long way to itemizing laws to cover almost every conceivable circumstance, ultimately contextualize ethics in terms of character traits and wisdom that emerge through a well-disciplined life over time.

It is significant that contemporary AI safety works, not in the wake of ancient philosophers or the Bible, but in the shadow of Kant. For example, Isaac Asimov’s seminal (though largely hypothetical) work on robotics in the 1950s proceeded on the assumption that robotic safety could be achieved by programming machines to follow the right procedures. According to this framework, if something goes wrong in an automated system, whether in real life or in a simulation, then we simply need more comprehensive laws. Yet as the AI community is finding out from the notorious alignment problem, such an approach to AI safety may not be feasible within current, or even future, engineering frameworks.

The turn to human-specific virtues, though it strikes many as inefficient, may actually prove the only option in the era we are quickly entering. Yet our society seems largely unready for that discussion. Modern tech companies like Microsoft, Google, Facebook, and OpenAI seem intent on continuing the legacy of the 19th-century scientific management movement through a fetish for “efficiency,” a word which, in practice, functions as code-speak for achieving all but total automation many degrees removed from human agency. Through AI-powered automation, it is hoped we can eliminate the variabilities and unpredictability of “human factors,” and thus achieve the technological analogue to Kant’s categorical imperative.

Yet we would do well to pause and consider the debt we owe to human factors, especially a type of virtuous intuition that cannot be quantified. Consider the 1983 incident in which the Soviets nearly launched the world into nuclear holocaust. Around midnight on September 26, a time of particular strain in the Cold War, unusual atmospheric conditions caused the Soviet satellite-based early-warning system, monitored from a command center near Moscow, to incorrectly register five incoming American missiles. The computer responded by issuing a retaliatory order “Launch,” which would have triggered a full-scale nuclear war against the United States and her allies. Even after procedures were followed to rule out computer error (for example, by resetting the systems), the order to retaliate continued.

Stanislav Petrov, the lieutenant colonel on duty in the Soviet Air Defense Forces Dome that night, was cool-headed and thoughtful. Petrov drew on his past training, which had convinced him that any U.S. first strike would be massive, likely more than only five missiles. While the rest of the base was in panic, Petrov put his career on the line, deviated from orders, and declined to follow the instruction to launch an attack. That night Petrov exercised the virtues of patience, prudence, attentiveness, thoughtfulness, and courage, while avoiding vices like reactivity, impulsivity, rigidity, and fear. As a result of his virtue, a nuclear disaster was averted.

If this incident had happened not in 1983 but in 2023, the outcome would likely have been very different. Today widespread bias for automation has resulted in a shrunken space left to human virtue, including situation-specific prudence. One of the reasons that systems regulating everything from commerce to trading to military defense are increasingly handed over to AI is precisely to eliminate the messiness of human judgment, as if “human factors” are a bug and not a feature. Yet without human agency there can be no virtue, including the type of virtue Petrov exercised when he proved the triumph of the human over the machine.

Discussions of possible AI doomsday scenarios often draw comparisons to the nuclear threat. Yet if we can learn anything from our near brush with nuclear holocaust, it is that rules, guidelines, policies, or procedures will never save us: we need old-fashioned human virtue.

Robin Phillips

Robin Phillips has a Master’s in Historical Theology from King’s College London and a Master’s in Library Science through the University of Oklahoma. He is the blog and media managing editor for the Fellowship of St. James and a regular contributor to Touchstone and Salvo. He has worked as a ghost-writer, in addition to writing for a variety of publications, including the Colson Center, World Magazine, and The Symbolic World. Phillips is the author of Gratitude in Life's Trenches (Ancient Faith, 2020), and Rediscovering the Goodness of Creation (Ancient Faith, 2023).