Beyond the Hype: How Structured Data Can Save AI Financial Analysis

XBRL International CEO John Turner on the need for digital analytics to be grounded in traceable, high-quality digital data.
AI is coming for financial analysis, and it’s coming fast. Will that mean actionable insights or a hot mess of guesswork and hallucinations – and how do we know? In the last couple of weeks we’ve seen two AI developments that seem highly relevant to digital reporting. We believe that structured digital data is a key ingredient for high-quality AI analysis, and the latest evidence to support that view is stacking up.
First, a new academic paper – released as a pre-peer-review preprint, as seems to be the norm in AI – is challenging assumptions about AI’s readiness for financial analysis. The research team [1], from The Fin AI, Columbia University, the Georgia Institute of Technology and Gustavus Adolphus College, has created FinTagging, a benchmark for evaluating how well large language models (LLMs) can extract and structure financial information from corporate reports.
Their findings are sobering. While AI models are good at locating and classifying financial facts in documents, they struggle dramatically with the precise semantic understanding that tagging (and financial analysis) requires. When it comes to the crucial task of linking extracted numbers and text to correct concepts in the full US-GAAP taxonomy – the list of digital definitions used for US reporting – even the best models achieve only 17% accuracy. In other words, they are guessing incorrectly about the meaning of each fact at least 83% of the time. Of course, this pinpoint identification becomes trivial (merely a case of linking back to the precise digital facts being used) if AI is supplied with XBRL tags and taxonomies.
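To make the contrast concrete, here is a minimal Python sketch (with invented figures) of what a single XBRL-tagged fact already carries. With the tag supplied, determining the meaning of a number is a direct lookup; without it, a model must guess among the 10,000+ concepts in the taxonomy.

```python
# A minimal, illustrative sketch: the fields below are typical of an XBRL
# fact, but the concrete values are invented for demonstration purposes.
from dataclasses import dataclass

@dataclass
class XbrlFact:
    concept: str      # authoritative US-GAAP taxonomy concept
    value: float
    unit: str
    period_end: str   # reporting context

fact = XbrlFact(
    concept="us-gaap:AssetsCurrent",  # the tag resolves the meaning directly
    value=1_234_000_000.0,
    unit="USD",
    period_end="2025-03-31",
)

# With the tag supplied, "which concept is this?" is a direct lookup...
print(fact.concept)  # -> us-gaap:AssetsCurrent

# ...whereas an untagged pipeline must guess the concept from the 10,000+
# candidates in the full US-GAAP taxonomy -- the step where the FinTagging
# paper reports at most 17% accuracy.
```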
Second, companies like Perplexity AI are announcing ambitious plans to democratise financial data access via AI – without tapping into available digital structure. The FinTagging research appears to reveal fundamental gaps in current approaches.
I have to emphasise at the outset that both the research and the live LLM efforts in this area are entirely new. So, as with everything in the AI world, readers should approach these subjects with caution and test findings continuously.
Data access yes, actionable insights no
Perplexity AI’s recent announcement of SEC/EDGAR integration promises to make complex corporate filings by US companies instantly understandable through natural language queries. It’s an ambitious vision that could democratise financial analysis for millions of investors, and at XBRL International we wholeheartedly support the goal of leveraging AI for enhanced data access and financial transparency.
But there’s a fundamental problem lurking beneath the surface: as far as we can tell, the AI model appears to be consuming only the human-readable HTML portions of SEC filings, as well as third-party analysis – not the rich, structured machine-readable data that’s already embedded in every filing.
In our experiments, we’ve found that using the Enterprise Pro version of the Perplexity tool throws up a decidedly mixed set of results. Here are just a few examples:
- Asking the tool about Microsoft’s [MSFT] Current Assets for the quarter ended 31 March 2025 provides the correct results.
- On requesting a breakdown of current assets, the tool first says that the detailed balance sheet is not available in the press release or financial highlights for that quarter. Then, when prompted to check the SEC filing, it confidently returns figures from June 2024, claiming they are from the March 2025 10-Q report. After being told the numbers are incorrect, it corrects itself.
- When asked about “Net recognized gains/losses on investments,” it returns only the total figure for “Other Expenses,” stating that the item is included within it. However, it fails to identify the specific amount – even though it is disclosed separately in the report.
- When asked to provide a breakdown of the segment disclosures made by Microsoft, the tool will identify business segments, but not the geographic segments that appear directly beneath them.
- Ask about Amazon’s [AMZN] R&D expenditure (a recurring favourite!) and it will provide a third-party estimate, or a (probably much larger) figure drawn from Amazon’s “Technology and Content” and “Technology and Infrastructure” line items. Unhelpfully, it does so with confidence. Amazon (in)famously doesn’t disclose its R&D spend.
We don’t have the resources at XBRL International to do more than scratch the surface of the new tool from Perplexity. It is very clear, however, that it is not using the SEC’s published structured data and metadata. Instead it is primarily relying on the HTML on the page, as well as third-party assessments available online. As others have noticed, the resulting output is a long way from investor-grade information and analysis.
All of this represents a misstep – but the opportunity is there to be seized. The FinTagging research seems to provide both a roadmap for improvement and a testing framework to identify current limitations.
Reading like humans, not machines
When companies file with the SEC, they submit a single Inline XBRL document that is both human and machine-readable. Each financial figure and piece of narrative is tagged with a standardised concept from the US-GAAP taxonomy, or sometimes a company-specific definition known as an “extension.” The tags give each fact a precise meaning – making it clear, for example, which disclosures to compare across multiple reports – and they provide links to other concepts, creating a sophisticated set of semantic relationships. The SEC filings are exactly the kind of structured data that should make AI analysis more accurate and comprehensive.
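As a simplified illustration of the mechanism, the Python sketch below parses a synthetic Inline XBRL fragment. The `ix:nonFraction` markup pattern and its attributes reflect real Inline XBRL mechanics; the figures, context and unit identifiers are invented.

```python
# Parse a simplified, synthetic Inline XBRL fragment. The markup pattern
# (ix:nonFraction with name/contextRef/unitRef attributes) is how SEC
# filings tag numbers; the values and identifiers here are invented.
import xml.etree.ElementTree as ET

FRAGMENT = """
<body xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <p>Total current assets were
    <ix:nonFraction name="us-gaap:AssetsCurrent" contextRef="Q1-2025"
                    unitRef="usd" decimals="-6" scale="6">1,234</ix:nonFraction>
    million.</p>
</body>
"""

IX = "{http://www.xbrl.org/2013/inlineXBRL}"
root = ET.fromstring(FRAGMENT)
for fact in root.iter(f"{IX}nonFraction"):
    raw = fact.text.replace(",", "")
    scale = int(fact.get("scale", "0"))
    value = float(raw) * 10 ** scale  # scale="6": the displayed figure is in millions
    print(fact.get("name"), fact.get("contextRef"), fact.get("unitRef"), value)
# -> us-gaap:AssetsCurrent Q1-2025 usd 1234000000.0
```

The same machine-readable fact sits inside the human-readable page – no guessing required.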
Yet our tests seem to show that Perplexity, like most AI-powered financial tools to date, ignores this digital information. It appears to process only the narrative text and tables in HTML format, supplemented by third-party commentary from across the web. This is like having access to a precisely catalogued library but choosing to read only the book covers, skim a few volumes and mix in some random book reviews.
FinTagging: a reality check for AI financial claims
The FinTagging research provides a stark reality check for AI capabilities in financial data processing. The team’s approach uses existing XBRL reports to put LLMs through their paces. The AIs make their best guesses (without XBRL guidance) at what is in each report, and the known XBRL tags work as a kind of answer sheet to compare against. The researchers were particularly interested in LLMs’ potential in automated XBRL tagging, but their ability to accurately identify data is equally relevant to AI-assisted analysis.
The team tested ten state-of-the-art LLMs on two critical tasks: extracting financial facts from documents (FinNI) and linking those facts to correct XBRL taxonomy concepts (FinCL). The results were revealing:
- Strong(ish) extraction capabilities: Models like DeepSeek-V3 achieved F1 scores of up to 72% at locating and extracting financial facts from both text and tables. GPT-4o was the second-placed performer at 60%, and smaller models performed poorly.
- Weak semantic understanding: The same high-performing models managed no more than 17% accuracy at linking those facts to correct XBRL concepts, selecting from the more than 10,000 concepts that make up the full US-GAAP taxonomy.
- Complete failure under traditional approaches: When concept linking was posed as a simple classification task across the full taxonomy, even top models scored 0%.
These findings suggest that while AI can read financial documents reasonably well – although no analyst, anywhere, is happy with 72% accuracy – it struggles profoundly with the precise semantic understanding that structured financial analysis requires.
The FinTagging paper’s methodology, comparing extracted facts against known XBRL data, could serve as a powerful testing framework for LLMs’ financial analysis capabilities. Such testing would likely reveal significant gaps – financial facts missed entirely, concepts conflated incorrectly, or nuanced accounting treatments oversimplified.
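Here is a minimal sketch of the idea, assuming a simplified scoring in which each extracted fact is reduced to a (value, concept) pair – the paper’s own metrics are more sophisticated:

```python
# Illustrative only: score an LLM's extracted (value, concept) pairs against
# the XBRL tags already present in the filing. This mirrors the spirit of the
# FinTagging "answer sheet" approach; the paper's own metrics are richer.

def f1_against_xbrl(predicted: set[tuple[str, str]],
                    gold: set[tuple[str, str]]) -> float:
    """Each pair is (reported value, taxonomy concept)."""
    if not predicted or not gold:
        return 0.0
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented example: the model extracted the right numbers but linked one of
# them to the wrong concept -- exactly the failure mode FinTagging highlights.
gold = {("1234000000", "us-gaap:AssetsCurrent"),
        ("250000000", "us-gaap:ResearchAndDevelopmentExpense")}
pred = {("1234000000", "us-gaap:AssetsCurrent"),
        ("250000000", "us-gaap:MarketingExpense")}  # wrong concept chosen
print(round(f1_against_xbrl(pred, gold), 2))  # -> 0.5
```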
The way forward: AI Plus XBRL
Do we really need XBRL? Both the research and Perplexity’s initial outputs give us a resounding yes. On its own, AI is nowhere near capable of interpreting financial reports accurately.
Tools like Perplexity could dramatically improve their financial analysis by incorporating the machine-readable XBRL data that’s already freely available right now. The SEC provides complete Inline XBRL filings with every numerical fact precisely tagged, standardised taxonomies defining relationships between financial concepts, and historical data enabling trend analysis with semantic consistency. This means that it is possible to trace every number and every piece of text back to source: specific individual facts in the company’s official regulatory filing.
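As a small proof of how accessible this data already is, the sketch below pulls the tagged history of one concept from the SEC’s public `companyconcept` endpoint. The URL pattern and Microsoft’s CIK (0000789019) are real; the contact address in the User-Agent header is a placeholder you would replace with your own, per the SEC’s fair-access guidelines.

```python
# A minimal sketch of reading a tagged fact straight from the SEC's free
# structured-data API. Endpoint and Microsoft's CIK (0000789019) are real;
# the User-Agent contact address is a placeholder -- use your own.
import json
import urllib.request

URL = ("https://data.sec.gov/api/xbrl/companyconcept/"
       "CIK0000789019/us-gaap/AssetsCurrent.json")
req = urllib.request.Request(URL, headers={"User-Agent": "name@example.com"})

with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# Every observation is traceable: value, period end, the form type, and the
# accession number of the exact filing it came from.
for obs in data["units"]["USD"][-3:]:
    print(obs["end"], obs["val"], obs["form"], obs["accn"])
```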
The technical pathway exists: see our next article. And yes, we are of course aware that there is more to do:
- LLMs would need comprehensive error-checking mechanisms that ensure that they are not using facts that have been incorrectly tagged; and
- Tools need to define their approaches to data normalisation to permit comprehensive comparisons across corporates (a toy sketch follows this list).
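On that second point, here is a toy Python sketch of what we mean by normalisation – folding company-specific extension concepts into a common comparison line. The mapping and the extension names are entirely hypothetical:

```python
# Invented, illustrative normalisation map: company-specific extension
# concepts folded into a common line item so figures can be compared
# across corporates. Real normalisation policies are far more involved.
NORMALISATION_MAP = {
    "us-gaap:ResearchAndDevelopmentExpense": "R&D",
    "examplecorp:TechnologyAndContentExpense": "R&D-like",  # hypothetical extension
    "othercorp:ProductDevelopmentCosts": "R&D-like",        # hypothetical extension
}

def normalise(concept: str) -> str:
    # Unmapped concepts are surfaced rather than silently dropped, so that
    # analysts can see where comparability breaks down.
    return NORMALISATION_MAP.get(concept, f"UNMAPPED({concept})")

print(normalise("examplecorp:TechnologyAndContentExpense"))  # -> R&D-like
```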
We don’t, therefore, foresee the rapid demise of information providers, who (after all) have been working with XBRL data for more than a decade now, have well-honed normalisation processes, and access to the very same cutting-edge AI capabilities.
Nevertheless, these developments are important.
Beyond current limitations
By integrating structured XBRL data, LLMs can move beyond simple content analysis toward true financial intelligence. The FinTagging research shows this integration won’t be trivial – but it also suggests that structured approaches work incredibly well when properly implemented. Economies with authoritative structured data repositories (like the US SEC’s EDGAR system, Japan’s EDINET system, Korea’s DART, the NSE and BSE XBRL data repositories in India, the BMV data in Mexico… among others) provide the basic building blocks for AI-driven tools to offer up new levels of insight for investors, regulators and every other kind of user.
Financial analysis isn’t just another AI application – accuracy matters enormously for investment decisions. It doesn’t matter if Amazon’s AI incorrectly guesses that I would like a pair of white sneakers. It really does matter if an AI incorrectly guesses what AMZN’s R&D spend is. If AI tools are going to democratise access to financial data, they need to leverage the structured, machine-readable formats that ensure precision and completeness. The FinTagging research provides both the wake-up call and a roadmap.
For regulators and policy makers, we think that these developments underscore:
- The importance and urgency of having complete investment-grade repositories of digital reports: they are significant national assets that are coming into their own.
- The correlated importance of focussing on continuous data quality improvements across digital reports of all kinds.
AI is here to stay. If it is to act as a decision-making copilot, it’s critical that we give it the structured information it needs to navigate with confidence as we speed towards (hopefully!) exciting new horizons.
[1] Preprint: Yan Wang, Yang Ren, Lingfei Qian, Xueqing Peng, Keyi Wang, Yi Han, Dongji Feng, Xiao-Yang Liu, Jimin Huang and Qianqian Xie, “FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information”, available at https://arxiv.org/abs/2505.20650v1.