AI-Generated Financial Reports: A Ticking Time Bomb of Inaccuracy
Marcus Chen
Lead Engineer, Aretify · Jan 22, 2026
The Study
Over the past quarter, we collected and analyzed 500 AI-generated financial summaries, earnings reports, and market analyses produced by four major LLMs. Each document was independently checked by financial analysts against SEC filings, Bloomberg terminal data, and other verified market sources.
The results should concern anyone using AI for financial analysis.
Key Findings
Overall Inaccuracy Rate: 23%
Nearly one in four AI-generated financial documents contained at least one material inaccuracy — defined as an error that could influence an investment decision.
Types of Financial Hallucinations
We categorized the errors we found:
- Fabricated metrics (35%): Invented revenue figures, P/E ratios, or growth rates that don't match any filings
- Misattributed data (25%): Real financial data attributed to the wrong company or quarter
- Calculation errors (20%): Incorrect arithmetic when computing margins, ratios, or year-over-year changes
- Temporal confusion (15%): Mixing data from different reporting periods
- Entity conflation (5%): Merging financial data from parent and subsidiary companies
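The calculation-error category above is the easiest to guard against mechanically, since margins, ratios, and year-over-year changes are deterministic arithmetic. As a minimal sketch (the function name and the use of exact decimal arithmetic are our choices, not part of the study), a deterministic recomputation looks like this:

```python
from decimal import Decimal

def yoy_change_pct(current, prior):
    """Year-over-year change as a percentage of the prior-period figure.

    Uses Decimal so the recomputed value is exact, which makes it safe
    to compare against a figure quoted in an AI-generated report.
    """
    current_d = Decimal(str(current))
    prior_d = Decimal(str(prior))
    return (current_d - prior_d) / prior_d * 100

# Example: revenue moved from $3.8B to $4.2B
print(round(yoy_change_pct(4.2e9, 3.8e9), 1))  # prints 10.5
```

Recomputing every derived figure this way catches arithmetic slips, though it cannot catch fabricated inputs, which is why the fabricated-metrics category remains the harder problem.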
The Most Dangerous Pattern: Plausible Numbers
The most concerning finding was that fabricated financial metrics were almost always plausible. When GPT-4 invented a revenue figure for a company, it typically fell within a realistic range — close enough to the real number that a casual reader wouldn't question it, but far enough off to be materially misleading.
For example, one AI-generated summary reported a company's Q3 revenue as $4.2 billion when the actual figure was $3.8 billion — a 10.5% overstatement that could significantly affect valuation models.
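A check like the one this example implies can be sketched in a few lines. The 5% materiality threshold below is a placeholder assumption for illustration; an actual threshold would depend on the metric and the use case:

```python
def deviation_pct(reported, actual):
    """Signed deviation of a reported figure from the verified figure."""
    return (reported - actual) / actual * 100

def is_material(reported, actual, threshold_pct=5.0):
    """Flag deviations beyond the materiality threshold.

    threshold_pct is an illustrative assumption, not a regulatory standard.
    """
    return abs(deviation_pct(reported, actual)) > threshold_pct

# The case from the study: $4.2B reported vs. $3.8B actual
print(round(deviation_pct(4.2e9, 3.8e9), 1))  # prints 10.5
print(is_material(4.2e9, 3.8e9))              # prints True
```

The point of the example is that a 10.5% overstatement sails past a human skim but fails even the crudest automated comparison, provided the verified figure is available to compare against.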
Sector-by-Sector Analysis
Accuracy varied significantly by sector:
- Technology: 19% error rate (most data available for training)
- Healthcare/Biotech: 28% error rate (complex financial structures)
- Energy: 25% error rate (commodity price confusion)
- Financial Services: 31% error rate (highest of any sector; complex reporting structures)
Case Study: The Phantom Acquisition
One particularly striking hallucination involved a complete fabrication of a corporate acquisition. The AI generated a detailed summary of "Company X's $2.1 billion acquisition of Company Y," including deal terms, regulatory approval timeline, and expected synergies. No such acquisition existed or had been announced.
This type of hallucination is especially dangerous in financial contexts because:
- It could trigger trading activity based on false information
- It might be interpreted as insider knowledge of an unannounced deal
- It could expose users to regulatory scrutiny
Regulatory Implications
Financial hallucinations don't just carry business risk — they carry legal risk. The SEC has increasingly focused on AI-generated content in financial communications, and distributing materially inaccurate financial information — even if AI-generated — can trigger enforcement actions.
Recommendations
- Never publish AI-generated financial data without human verification against source documents
- Implement automated verification that checks figures against filing databases
- Flag AI-generated content clearly in any financial communications
- Use Aretify's financial verification module for real-time accuracy checking
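The second recommendation, automated verification against filing databases, can be sketched as follows. Everything here is hypothetical scaffolding: the `FILING_DATA` lookup table, the company ticker, and the 1% tolerance stand in for a real extraction pipeline and a real filings database such as EDGAR.

```python
# Hypothetical verified figures, keyed by (company, period, metric).
# In practice this would be populated from a filings database.
FILING_DATA = {
    ("ACME", "2025-Q3", "revenue"): 3.8e9,
}

def verify_figure(company, period, metric, reported, tolerance_pct=1.0):
    """Compare a figure extracted from AI output to the verified filing value.

    Returns (status, deviation_pct). Status is 'unverifiable' when no
    source figure exists -- exactly the case that needs human review,
    since fabricated metrics by definition match no filing.
    """
    key = (company, period, metric)
    if key not in FILING_DATA:
        return ("unverifiable", None)
    actual = FILING_DATA[key]
    deviation = abs(reported - actual) / actual * 100
    status = "ok" if deviation <= tolerance_pct else "mismatch"
    return (status, deviation)

print(verify_figure("ACME", "2025-Q3", "revenue", 4.2e9)[0])  # prints mismatch
print(verify_figure("ACME", "2025-Q3", "eps", 1.10)[0])       # prints unverifiable
```

Note the three-way outcome: a figure can match, mismatch, or be unverifiable. Treating "unverifiable" as a failure state rather than silently passing it through is what keeps fabricated metrics from slipping past the check.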
Conclusion
The financial industry's rapid adoption of AI tools for analysis and reporting has outpaced the development of adequate verification systems. Until verification catches up with generation, every AI-produced financial number should be treated as unverified until proven otherwise.
The cost of getting this wrong isn't just financial — it's regulatory, reputational, and potentially legal.