The Science of Personalization: Can We Narrow the Gap Between Bot and Human?

AI / January 29, 2026

When users study how machines respond to human questions, they often find something both ordinary and unsettling. The answers are competent, even articulate, but they lack something that matters: the flicker of personality that makes a reply feel real. A recent paper from Posterum Software seeks to measure this difference and use it to make AI more human.

Posterum AI’s search engine, developed in conjunction with its Human-AI Variance Score, promises a conversational experience that feels more familiar and intuitive. Users create detailed private profiles that capture traits such as occupation, values, beliefs, and other demographic data, and Posterum AI channels these profiles into a human-like search process. Instead of drawing from a universal data pool, the system tailors its responses to each individual, producing answers that account for context and nuance.


Testing Whether Machines Think Like Humans

The Human–AI Variance Score, or HAVS, was born from a simple experiment: What if machines were asked to answer national survey questions as if they were people with distinct lives? Posterum’s research team created sixteen human profiles representing different ages, incomes, religions, and political affiliations. These profiles were entered into four large language models—ChatGPT, Claude, Gemini, and DeepSeek—through the Posterum AI app, available on Google Play.

Each model was instructed to answer questions drawn from surveys by respected polling organizations such as Gallup, Pew Research, and YouGov. Posterum’s research examined how the AI models respond across five major areas: Economics, Life, Morality, Science, and Politics. The team then compared the models’ responses to actual responses from survey participants. To measure how close the AI came to sounding human, they used a version of the root-mean-square formula: the closer the match, the higher the HAVS score. The results stood out. ChatGPT and Claude came closest to human-like responses, scoring above 94. Gemini and DeepSeek followed, with scores in the low 90s.
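The article does not publish Posterum’s exact formula, but a root-mean-square comparison of the kind described can be sketched as follows. The function name `havs_score`, the 0–100 scaling, and the sample agreement percentages are illustrative assumptions, not Posterum’s actual method:

```python
import math

def havs_score(human_pcts, model_pcts):
    """Toy HAVS-style score: 100 minus the root-mean-square error
    between human and model agreement percentages (0-100 scale).
    The exact Posterum formula is unpublished; this is an illustration."""
    assert len(human_pcts) == len(model_pcts) and human_pcts
    rmse = math.sqrt(
        sum((h - m) ** 2 for h, m in zip(human_pcts, model_pcts))
        / len(human_pcts)
    )
    return 100.0 - rmse

# Hypothetical data: % of real respondents vs. simulated profiles
# agreeing with each of three survey questions.
human = [62.0, 48.0, 75.0]
model = [58.0, 55.0, 71.0]
score = havs_score(human, model)  # close answers -> score near 100
```

Under this sketch, a perfect match yields 100, and larger per-question gaps pull the score down, which matches the article’s description that closer matches earn higher HAVS scores.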


On moral and political questions, where values and empathy matter more than facts, the models did best. But in Economics, the gap was widest. The AI tended to fall back on textbook logic, assuming people act rationally. Real humans, though, bring emotion, experience, and bias into their choices. The models got the facts right, but something essential was missing: the human element.

Profiles and Perspective

One of the most surprising results came when the team asked the models to simulate political or racial perspectives. When given Republican or Democrat identities, the answers shifted, echoing the divides seen in real-world polling. The study found that the way a model is built has a greater effect than the profile it simulates. For example, when race was factored in, the differences between models were greater than those between real demographic groups. When Posterum gave the models clear profiles, political bias softened. Systems that once leaned one way started to answer more evenly.

This could shape the future of search tools. Instead of hiding bias with neutral answers, systems might achieve fairness by being upfront about perspective. Posterum AI already does this by letting users share their traits and values. The data stays on the user’s device, so personalization is honest and private.

The Numbers Behind the Human Signal

The HAVS study analyzed over 1,000 comparisons between human and machine answers. On political questions, the models scored between 95 and 97, almost matching human judgment. They did almost as well on science and morality. Economics was the weak spot, with scores between 86 and 89. Claude and ChatGPT led the pack, while Gemini and DeepSeek were more uneven. DeepSeek, in particular, stood out. Though it ranked third overall, it scored lowest in several areas. The researchers think this might be because its training data is less focused on U.S. social and political content.


The implications go beyond rankings. By tracking HAVS scores over time, Posterum hopes to see how models get closer to human thinking. They see the index as a potential standard for measuring artificial reasoning, much like the Turing Test once measured intelligence.

From Measurement to Meaning

Posterum is already using these findings in its search platform. The same method that compares models to humans now helps Posterum AI interpret a user’s question. When someone asks about policy, morality, or science, the system draws on those tested reasoning patterns and adapts them to the user’s profile. The result feels less like a search and more like a conversation.

There’s a quiet irony here. By making machines more closely resemble people, Posterum learned more about human inconsistency than about machine precision. The HAVS data showed that people tend to agree on morality and science, but split widely on economics. The machines mirror that confusion with surprising accuracy.

Posterum plans to keep refining its approach, expanding the dataset, and adding more demographic profiles. The next step is to use the index not just as a mirror, but as a tool to improve the models themselves. As the company prepares for its full public release, the goal has shifted. It’s no longer just about building smarter systems. It’s about measuring understanding. And if Posterum’s research is right, that understanding might not belong to humans alone.



About The Author

William Jones is a staff writer for Under30CEO. He has written for major publications such as Due and MSN.

