Study Finds How AI Can Identify Anonymous Internet Users

It was long assumed that deanonymizing anonymous internet users, although theoretically possible, would be too difficult, time-consuming, and expensive to be practical. Artificial intelligence, particularly large language models, has made the process far more efficient.

Large Language Models Can Re-Identify Individuals Through Cross-Referencing

Researchers have demonstrated that modern artificial intelligence systems can identify anonymous internet users by analyzing patterns hidden in ordinary online conversations. Large language models can reconstruct identity profiles and match pseudonymous accounts with real-world professional profiles.

Background and Findings

A team of AI engineers and researchers headed by Florian Tramèr suspected that large language models had become powerful enough to deanonymize pseudonymous internet users. To test this hypothesis, they designed an automated framework that imitates the reasoning process human investigators use when attempting to identify anonymous online authors.

Instead of depending on direct identifiers like names or profile links, the system analyzes unstructured text from posts, comments, and discussions across online platforms. The experiment involved nearly 1,000 LinkedIn profiles paired with accounts on the tech discussion forum Hacker News. All direct identifying information was intentionally removed beforehand.

The framework operates via a multi-stage methodology that analyzes textual information, constructs identity profiles, and compares those profiles against large public datasets. The goal was to determine whether AI could reconstruct identities even after obvious identifiers such as names, usernames, and profile links were removed from datasets. The stages are as follows:

• Data Extraction

The system scanned user posts across discussion platforms and identified microdata such as education references, career descriptions, personal experiences, geographic indicators, and topic expertise embedded within ordinary online conversations.

• Semantic Profiling

The system converted textual clues from unstructured writing into structured profiles using semantic analysis. The model generated detailed descriptions of probable educational and professional backgrounds, interests, and technical knowledge levels.

• Embedding Generation

Extracted information was transformed into numerical representations known as semantic embeddings. These allow AI to compare anonymous user profiles with millions of publicly available profiles across platforms and other online databases.

• Candidate Matching

Embeddings from anonymous accounts were compared with those of profiles across the internet. The system then generated ranked lists of possible identity matches, based on similarity scores calculated from language patterns, interests, and professional indicators.

• Evidence Reasoning

Candidate matches were evaluated using reasoning-based comparisons. The system analyzed whether technical knowledge, discussion topics, career hints, and linguistic characteristics aligned between anonymous posts and potential real-world profiles.

• Confidence Scoring

Probability-based confidence scores were then assigned to potential matches. The system reported identifications only when evidence met predetermined reliability thresholds to reduce the likelihood of incorrect conclusions or random speculation.
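The matching and scoring stages above can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers' implementation: the three-dimensional vectors stand in for real semantic embeddings, cosine similarity is an assumed similarity measure, and the function names, profile names, and the 0.95 threshold are all hypothetical. The LLM-based evidence reasoning stage is omitted.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Vector, candidates: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Candidate matching: score every public profile, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def confident_match(
    ranked: List[Tuple[str, float]], threshold: float
) -> Optional[str]:
    """Confidence scoring: report the top match only if it clears the
    threshold; otherwise abstain by returning None."""
    if not ranked:
        return None
    name, score = ranked[0]
    return name if score >= threshold else None


# Toy embeddings (assumed values); real embeddings have far more dimensions.
anonymous_profile = [0.9, 0.1, 0.4]
public_profiles = {
    "profile_a": [0.88, 0.12, 0.41],
    "profile_b": [0.1, 0.9, 0.2],
    "profile_c": [0.5, 0.5, 0.5],
}

ranked = rank_candidates(anonymous_profile, public_profiles)
best = confident_match(ranked, threshold=0.95)
print(ranked)
print(best)
```

Returning None below the threshold mirrors the abstention behavior described above: the sketch prefers reporting no identification over reporting a weak one.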

Additional Conclusions

The researchers further evaluated system performance using accuracy and precision, two measurements commonly used to evaluate machine learning systems. Their evaluation showed about 67 percent accuracy in identifying the correct individuals, while precision reached roughly 90 percent when the system produced confident identity matches.
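To see how the two figures can differ, consider the following sketch. It assumes that accuracy means the share of all targets correctly identified and that precision means the share of confident outputs that are correct; the study's exact definitions may differ. The names and results below are invented toy data, not data from the study.

```python
from typing import List, Optional, Tuple


def evaluate(results: List[Tuple[str, Optional[str]]]) -> Tuple[float, float]:
    """Each result is (true_identity, predicted_identity), where the
    prediction is None if the system abstained.

    Returns (accuracy, precision) under the assumed definitions:
      accuracy  = correct predictions / all targets
      precision = correct predictions / predictions actually made
    """
    total = len(results)
    answered = [(truth, pred) for truth, pred in results if pred is not None]
    correct = sum(1 for truth, pred in answered if truth == pred)
    accuracy = correct / total if total else 0.0
    precision = correct / len(answered) if answered else 0.0
    return accuracy, precision


# Toy data: 6 correct matches, 1 wrong match, 3 abstentions.
toy_results = [
    ("alice", "alice"), ("bob", "bob"), ("carol", "carol"),
    ("dave", "dave"), ("erin", "erin"), ("frank", "frank"),
    ("grace", "heidi"),
    ("ivan", None), ("judy", None), ("mallory", None),
]

accuracy, precision = evaluate(toy_results)
print(f"accuracy={accuracy:.2f} precision={precision:.2f}")
# prints: accuracy=0.60 precision=0.86
```

Abstentions lower accuracy but do not count against precision, which is why a system that reports only confident matches can show a higher precision than accuracy.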

The researchers also assessed the economic feasibility of large-scale internet deanonymization using large language models. They estimated that identifying a single user required between 1 and 4 U.S. dollars in computational costs. This suggests that large organizations could potentially analyze thousands or millions of accounts at a manageable cost.
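As a rough back-of-the-envelope illustration, the per-user estimate can be scaled to larger account counts. The sketch below assumes costs grow linearly with the number of accounts, which the source does not state; the function name is hypothetical.

```python
from typing import Tuple


def estimated_cost_range(
    num_accounts: int, low_per_user: float = 1.00, high_per_user: float = 4.00
) -> Tuple[float, float]:
    """Return (low, high) total cost in U.S. dollars, assuming the
    per-user cost range applies uniformly to every account."""
    if num_accounts < 0:
        raise ValueError("num_accounts must be non-negative")
    return num_accounts * low_per_user, num_accounts * high_per_user


for n in (1_000, 1_000_000):
    low, high = estimated_cost_range(n)
    print(f"{n:,} accounts: ${low:,.0f} to ${high:,.0f}")
# prints:
# 1,000 accounts: $1,000 to $4,000
# 1,000,000 accounts: $1,000,000 to $4,000,000
```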

Additional testing explored whether artificial intelligence could draw connections among multiple pseudonymous accounts belonging to the same individual across different communities or time periods. Results indicated that the system could detect consistent writing patterns and topic interests that persist even when users attempt to separate identities.

The findings suggest that persistent usernames and pseudonymous online identities may provide far less anonymity and privacy than previously assumed. Moreover, as analysis using AI models becomes increasingly powerful and more accessible, the research indicates that every post contributes incremental clues capable of revealing real-world identities.

FURTHER READING AND REFERENCE

  • Lermen, S., Paleka, D., Swanson, J., Aerni, M., Carlini, N., and Tramèr, F. 2026. “Large-Scale Online Deanonymization with LLMs.” arXiv. DOI: 10.48550/arXiv.2602.16800