Former Meta News Chief’s Investigation: Nearly All AI Models Lean Politically Left, and Gemini Has Cited Chinese Communist Party Official Media

Campbell Brown, Meta’s former head of news, founded Forum AI and has spent the past 17 months systematically evaluating the information quality of mainstream AI models. She found that Gemini has cited Chinese Communist Party official websites when handling reports unrelated to China, and that nearly all tested models exhibit a left-leaning political bias.

Table of Contents

  • The problem no one is testing
  • Fluent errors are harder to detect than silence
  • Regulations forcing change, not moral self-awareness

Brown is a former journalist who anchored news at CNN and later became Meta’s head of news, directly overseeing the policies that determined how Facebook presented news to its 3 billion users worldwide.

That position gave her a close-up view of how platforms shape the flow of information. She left Meta 17 months ago and founded Forum AI in New York, focusing on something foundation model companies generally skip: systematically assessing whether AI-generated information is accurate, fair, and presented from multiple perspectives.

The problem no one is testing

Forum AI’s core product is a “geopolitical event benchmark framework.”

It works by inviting senior advisers spanning the political spectrum and a range of professional backgrounds, including Niall Ferguson, Fareed Zakaria, former U.S. Secretary of State Tony Blinken, former House Speaker Kevin McCarthy, and former U.S. Deputy National Security Advisor Anne Neuberger, to rate how mainstream AI models respond to the same complex geopolitical event.

Forum AI’s evaluations currently reach about 90% agreement with its human experts, giving the results a defensible standard rather than just one person’s opinion.
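
To illustrate what that 90% figure could mean in practice, here is a minimal sketch of an agreement-rate calculation between an automated judge and human expert raters. The scores, the 1-to-5 scale, and the agreement_rate function are hypothetical illustrations, not Forum AI’s published methodology.

```python
from statistics import mean

# Hypothetical ratings on a 1-5 scale for the same ten model responses:
# one list from an automated evaluator, one consensus list from experts.
auto_scores   = [4, 2, 5, 3, 4, 1, 5, 3, 4, 2]
expert_scores = [4, 2, 5, 3, 4, 2, 5, 3, 4, 2]

def agreement_rate(auto, expert, tolerance=0):
    """Fraction of items where the two raters agree within `tolerance` points."""
    assert len(auto) == len(expert)
    return mean(1 if abs(a - e) <= tolerance else 0 for a, e in zip(auto, expert))

print(f"agreement: {agreement_rate(auto_scores, expert_scores):.0%}")  # 90%
```

In practice, the definition of “agreement” matters as much as the number: consensus on subjective ratings is often counted within a one-point tolerance rather than as exact matches.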

Brown identified three layers of problems, each harder to fix technically than the last.

The first is a flaw in source-selection logic. When handling certain reports unrelated to China, Gemini cited content from Chinese Communist Party official websites. This isn’t merely a factual error; it’s a failure of filtering in source selection: the model sees only “this is text, this is a link,” never “what is this source’s stance, how credible is it, and does it serve a clear political purpose?”

The political character of a source is simply invisible at the point where the AI produces its output.
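
To make the missing step concrete, here is a minimal sketch of what a source-credibility gate in a retrieval pipeline could look like. The domains, the ratings table, and the filter_sources function are all hypothetical illustrations; this is not Gemini’s or Forum AI’s actual mechanism.

```python
from urllib.parse import urlparse

# Hypothetical credibility metadata; a real pipeline would draw on a curated
# source-rating database rather than a hard-coded dictionary.
SOURCE_RATINGS = {
    "example-wire-service.com": {"credibility": 0.9, "state_affiliated": False},
    "example-state-outlet.cn":  {"credibility": 0.3, "state_affiliated": True},
}

def filter_sources(urls, min_credibility=0.7, allow_state_media=False):
    """Drop retrieved links whose source fails a credibility/stance check,
    the filtering step Brown argues is missing today."""
    kept = []
    for url in urls:
        rating = SOURCE_RATINGS.get(urlparse(url).netloc)
        if rating is None:
            continue  # unknown source: exclude rather than trust by default
        if rating["state_affiliated"] and not allow_state_media:
            continue
        if rating["credibility"] >= min_credibility:
            kept.append(url)
    return kept

print(filter_sources([
    "https://example-wire-service.com/report",
    "https://example-state-outlet.cn/editorial",
]))  # only the wire-service link survives
```

The design choice worth noting is the default for unknown sources: excluding them trades recall for safety, which is precisely the trade-off current models do not appear to make.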

The second is structural political bias. Brown tested nearly all mainstream models, and they showed a tendency toward left-leaning political bias. This isn’t a conspiracy theory; it’s a natural consequence of the training data distribution: AI learns from the texts it’s trained on and tends to replicate their tone and ideological framing.

Mainstream content on the English internet—mainstream media reports, academic papers, social media posts—generally carries a certain political bias. Models trained on this data inherit that bias without awareness.

More troubling, this bias isn’t a bug that can be patched; it’s embedded in the logic behind every output the model produces.

The third is a lack of context and multiple perspectives. Brown says existing models generally lack “background context, multiple viewpoints, and transparent reasoning.” The answers AI provides are flat assertions, not structured explanations of the form “Party A sees this event as representing X, Party B as Y, with fundamental disagreements rooted in…”

It gives you an answer but doesn’t tell you from which perspective it was derived.
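
For contrast with a flat answer, here is a sketch of what such a structured, multi-perspective response could look like as a data structure. Every field name is invented for illustration; no model currently emits this format.

```python
# Hypothetical schema for the kind of answer Brown describes: context,
# competing viewpoints, and transparent reasoning, instead of a bare claim.
structured_answer = {
    "event": "an example geopolitical event",
    "perspectives": [
        {"party": "Party A", "reading": "sees the event as representing X"},
        {"party": "Party B", "reading": "sees the event as representing Y"},
    ],
    "root_disagreement": "where the two readings fundamentally diverge",
    "reasoning": "how the summary above was derived, stated openly",
}
```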

Fluent errors are harder to detect than silence

Brown points out a structural blind spot: foundation model companies prioritize mathematical, coding, and logical reasoning abilities when evaluating and ranking models; information accuracy and political diversity are almost never included in mainstream benchmark tests.

The reason is simple. Code either works or it doesn’t, and tests can expose errors. Math problems have standard answers, so accuracy can be measured. But for “what constitutes an accurate and fair report of a geopolitical event,” who judges? How many people with different viewpoints need to reach consensus? There’s no engineering solution to this.

In a product development process led by engineers and ranked by benchmarks, this question is systematically skipped. The result is that information accuracy becomes an almost invisible metric in the industry’s capability assessments.

The cost of this omission can be seen in a concrete case. Last year, New York City audited employers’ AI hiring tools for compliance with anti-discrimination employment law. In more than half of the cases, the audits detected no violations.

The problem isn’t that the violation rate was low; it’s that the AI tools used for the audits may themselves be too inaccurate to detect issues, not that the issues don’t exist.

This is Brown’s core point: AI’s problem is not just that it states wrong facts, but that it gets people to trust and accept false ones.
If people know they don’t know something, they can still look it up. But when AI states a falsehood confidently and fluently, most users have no reason to doubt it.

Fluent errors are harder to detect than silence, and even harder to correct.

Regulations forcing change, not moral self-awareness

Brown’s judgment is straightforward: what drives change isn’t moral pressure or public opinion, but the business risks posed by compliance issues.

Her argument is rooted in a pragmatic reality: under the AI industry’s current incentive structure, no one has a strong enough reason to proactively solve this problem until its costs become unavoidable.
In credit approval, insurance underwriting, and recruitment screening, AI decisions are already governed by existing law.

Once AI outputs discriminatory or inaccurate results, companies using AI bear legal responsibility.
That pressure ultimately propagates upward to model providers, who are pushed to deliver auditable, verifiable outputs with accuracy guarantees.
Not because they believe it’s morally right, but because their corporate clients start writing such requirements into contracts.

Lerer Hippeau led a $3 million seed round for Forum AI last year.
The figure is small by AI-industry standards, but it signals a judgment: “AI evaluation” is a business, and demand for it may grow faster than is currently visible.
