Do ChatGPT and Co. understand humor?

Language models such as ChatGPT know millions of jokes. But do they understand why they are funny, or do they just pretend they do? Malte Elson deals with the psychology of digitalization and knows the answer.

Text: Malte Elson 2025/12/02

Humor often arises from the unsaid and thus becomes a test for language models. © iStock

“You can only pull jokes out of your sleeve if you’ve put them there beforehand,” Dutch entertainer Rudi Carrell once said. Those who, like Carrell in his day, seem to tell jokes effortlessly draw on an internal archive of countless turns of phrase and puns. The right line for any given moment comes from this archive, and the spot-on delivery is painstakingly rehearsed.

The sleeves of today’s large language models such as ChatGPT contain more snippets of humor than all the wardrobes of human comedians put together: Millions of punchlines and corny jokes as well as countless variations of classic joke structures, from slapstick to anti-jokes. So, if Carrell’s quip is true, why does language model comedy sometimes seem to have slipped out of an undershirt rather than been pulled out of a sleeve?

Why are jokes funny?

Even allowing for the enormous individual differences in what people laugh at (and what they don’t), psychology offers three classic (and rival) theories of why people laugh:

Incongruity theory sees humor as the result of a collision between expectations and reality: unexpected contradictions or twists. A famous example is a joke by British comedian Bob Monkhouse: “When I die, I want to go peacefully like my grandfather did – in his sleep. Not yelling and screaming like the passengers in his car.”

Magazine uniFOKUS

"Funny, isn't it?"

This article first appeared in uniFOKUS, the University of Bern print magazine. Four times a year, uniFOKUS focuses on one specialist area from different points of view. Current focus topic: humor.

Relief theory regards humor as a valve for built-up tension or nervousness. At a recent committee meeting, a colleague said: “As long as ChatGPT has trouble making coffee, my job here is safe.” This form of gallows humor can be observed wherever people are exposed to high levels of stress. It helps them gain emotional distance and also fosters camaraderie among fellow sufferers.

According to superiority theory, every joke has a victim. We laugh when, at least for a moment, we feel superior to others – the clumsy, the naive, the embarrassed or our former selves.

Humorous communication often consists of cognitive, social and affective components that fit all three theories.

As a large language model, I can’t understand humor

Language models can play with expectations, break taboos or provoke with clichés. So do they use these theories of humor to generate particularly funny output for humans? No way! Or to put it more accurately: Only insofar as these theories are encoded in linguistic traces of humor. Language models do not apply explicit models of incongruity, superiority or relief in the psychological sense, but rather reconstruct empirical regularities.

Instead of humor theories, language models use statistical shortcuts. They do not optimize for humor by insight, but, where sufficient data is available, by probabilities: If enough people find a certain linguistic pattern funny, the model learns to reproduce these semantic, syntactic and pragmatic structures. Of course, people also learn through pattern recognition what is funny without necessarily having to understand why (e.g. with regular formats such as memes).

On the one hand, this leads to a paradoxical insight into language models: They implement the psychological theories of humor better than they understand them. In fact, language models don’t understand anything about what they produce all day long. On the other hand, this also leads to a truth about us: When we laugh at a model-generated joke, it’s not because the model understands humor, but because we want to recognize our own humor in the joke.

From parameter to punchline

Humor is a test for language models: Sometimes it arises not only from what is said, but also from what is meant (which in turn can be exactly what is not said). Timing is also crucial – pauses, eye contact, precise coordination between expectation and resolution, but also tone of voice and intonation. Humor is also strongly influenced by (sub)culture and makes use of implicit norms or references. All of this creates meaning that is not always encoded in pure text data. The fact that humor can also be difficult for people to recognize becomes clear at the latest when only your own laughter breaks the awkward silence after a failed joke.

“Humor is a test for language models, because it arises not only from what is said, but often also from what is not said.”

Malte Elson

Research into the humor of artificial intelligence distinguishes between three complementary abilities: Recognition, understanding and generation.

Humor recognition is first and foremost a classification problem: For example, a system must use lexical or semantic features to determine whether a sentence, a picture or a scene is comical. Especially for clearly defined domains and highly rule-based humor (e.g. “knock, knock” jokes), recognition performance is already relatively good. The main causes of misclassification are implicit, culturally encoded peculiarities, emotional ambiguity and target group references (for whom and about whom a joke is made).
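
To make the idea of humor recognition as classification more concrete, here is a minimal sketch in Python. It is not how any particular research system works: the six training sentences are invented for illustration, and the names `TRAIN`, `tokenize`, `train` and `classify` are hypothetical. The technique is a plain Naive Bayes classifier over word counts, the simplest kind of lexical feature.

```python
import math
import re
from collections import Counter

# Invented toy data (an assumption for illustration only);
# real systems rely on large, curated datasets.
TRAIN = [
    ("knock knock who is there", "joke"),
    ("why did the chicken cross the road", "joke"),
    ("what do you call a sheep that loves knitting", "joke"),
    ("the meeting starts at nine in room four", "plain"),
    ("please send the report by friday", "plain"),
    ("the train to bern leaves from platform two", "plain"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label; these counts are the whole 'model'."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace smoothing: add 1 so unseen word/label pairs are not zero.
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train(TRAIN)
print(classify("knock knock who is at the door", MODEL))
print(classify("great, another meeting at nine", MODEL))
```

The first sentence is labeled a joke because it shares surface words with a rule-based format. The second, which a colleague might well mean sarcastically, is labeled plain: nothing in the words themselves signals the irony, which is the kind of misclassification described above.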

Overall, humor recognition still relies heavily on curated datasets. The reason lies less in the technical performance of the models than in the nature of humor itself: Humor is a matter of negotiation. Whether a statement is funny or not depends on the cultural context, social relationship, intention and situational appropriateness – none of this can easily be “learned” from raw text data.

Humor as the ultimate discipline

Recognizing what is funny is different from being able to explain why something is funny (similar to the way people feel about films by Yorgos Lanthimos). Understanding humor includes semantics (such as why an expectation arises only to be broken), assessing social roles and socio-cultural contexts (when and for whom a joke is funny or appropriate), and understanding ironic phrases and exaggerations (when the opposite of what is said is meant).

Modern models usually have a plausible explanation for simple puns. However, as soon as world knowledge or ambiguity comes into play, they regularly fail or provide the helpful explanation that something is funny because it is funny. Understanding sarcasm and irony is particularly difficult when knowledge of tone, facial expressions or the situational context is lacking. Nevertheless, some models now show surprising abilities in terms of contextual understanding and can often react correctly when users joke (“LOL”).

Generating humor is considered the supreme discipline. For a long time, algorithms were limited to joke templates or grammar rules (for example, the pun generator JAPE). Language models began to change the picture. In studies in which people rate the quality of jokes, language models are roughly on a par with human authors. Witscript, a model by comedy writer Joe Toplyn specially trained in classic joke structures, generates high-quality one-liners (“What is a good birthday present for someone who loves knitting? A sheep.”). Yet AI jokes are often still clichéd or predictable. Paired with human curators, however, language models can also master more challenging humor tasks (see the highly recommended book “Meine Witze sind alle nur gecloud,” a co-production by the satirist Cornelius W. M. Oettle and the character Quippy conceived with ChatGPT).

Humor is when you laugh despite the language model

Artificial intelligence has made the leap from the simple recognition of humorous patterns to creative production and situational application. But will it ever understand why people laugh at its output? In my view, this question is not relevant. Either way, artificial intelligence will undoubtedly play a greater role in the coming years in terms of how humor is created, disseminated and evaluated. Language models are already writing jokes on demand, drafting sketch ideas and helping to test punchlines. More than ever, comedian Danny Kaye’s warning will apply in the future: “It’s dangerous to laugh at a joke. You get to hear it again and again.”

About the person

Malte Elson

has been Professor of the Psychology of Digitalisation since 2023. His special areas are quality management in academia, in particular the identification and avoidance of errors, as well as the privacy and security of (research) data. Depending on who you ask, his research is a joke.

What makes you laugh personally?

“1. My wife, 2. Whoopee cushions, preferably both together.”