How well do AI models like GPT-4 understand XBRL Data?

Posted on January 18, 2024 by Revathy Ramanan

A couple of weeks ago, in the XBRL International newsletter, we commented unfavourably on the use of PDF versions of SEC filings as inputs to Large Language Models (LLMs). We are much more interested in how these tools can be leveraged to examine structured data, such as the Inline XBRL formatted structured filings provided to the SEC by every listed company in the United States. This blog provides the details of our initial findings, prompted by some interesting original research by Patronus AI – and, in short, it shows promise!

Helpfully, the authors of the Patronus AI research provided a set of example natural language queries, as well as their findings, which showed (as might be expected) poor accuracy from a variety of LLMs that were fed a diet of unstructured plain PDF financial reports. By their nature, financial statements (even in the relatively constrained “rules based” world of US GAAP) are extremely variable and very, very complex. It seems to us that expecting investor grade analysis to arise from this kind of unstructured information is overly optimistic.

On the other hand, what can an LLM achieve when it is fed a healthier diet of structured data that has been prepared by management, providing a single source of digital truth that all users can consume?

Artificial Intelligence (AI) large language models (LLMs) are increasingly being developed to comprehend and interpret queries made in natural language and to interact with the underlying datasets. An example of such a query could be, “Show me the average debt-to-equity ratio for the pharmaceutical industry in 2023”.

XBRL data is well structured and designed for machines to consume. The next logical question is whether the AI models understand XBRL data and can answer queries asked in natural language.

Over the past few weeks I have been experimenting in this area, and the result is that Chat GPT did pretty well! The LLM was fairly good at understanding an xBRL-JSON report and answering natural language queries. The experiment did not feed the LLM the taxonomy; doing so could be a simple way to train an AI to better understand XBRL data and provide more accurate answers in future. While it’s not perfect, it’s a promising start.

This blog explains my process and sets out the initial, tentative findings. I’ve used Chat GPT 4 for this work (even XBRL International can afford it).

Importantly, I’ve used xBRL-JSON formatted reports. This is one of our newer formats, and conforms to XBRL International’s Open Information Model. It is designed for analytics and can be consumed directly into a range of off the shelf analytic tools, including Microsoft Office 365 business intelligence, and Tableau. The xBRL-JSON format is almost always where users should start when they are looking to analyse XBRL data. A range of tools lets you transform from other XBRL formats, such as Inline XBRL, into xBRL-JSON.

To explore this, we began by supplying GPT-4 with 10 publicly available Financial Statement XBRL reports (these are directly available from the US SEC’s EDGAR portal) and asking queries such as:

  • What was the profit growth rate?
  • Did the company pay a dividend?
  • What were the significant acquisitions?

The first step was to convert the XBRL reports accessed from EDGAR to xBRL-JSON format. For this study, the open source XBRL processor Arelle was used. Arelle’s documentation describes the command line usage for the xBRL-JSON report conversion.
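
As a rough illustration of that conversion step, the sketch below builds an Arelle command line from Python. The executable name, the `saveLoadableOIM` plugin name and its option are assumptions quoted from memory rather than taken from this post, and the file names are hypothetical, so please check them against the Arelle documentation before relying on them.

```python
import subprocess


def build_arelle_command(source, target, executable="arelleCmdLine"):
    """Build a command line asking Arelle to save a report as xBRL-JSON.

    ASSUMPTION: the plugin name and option below are recalled from Arelle's
    saveLoadableOIM plugin; verify both against the Arelle documentation.
    """
    return [
        executable,
        "--file", source,
        "--plugins", "saveLoadableOIM",
        "--saveLoadableOIM", target,
    ]


def convert(source, target):
    """Run the conversion; requires Arelle to be installed and on the PATH."""
    return subprocess.run(build_arelle_command(source, target), check=True)


# Hypothetical file names, shown only to illustrate the shape of the command.
print(" ".join(build_arelle_command("report.htm", "report.json")))
```

Calling `convert(...)` is left to the reader, since it only works where Arelle is installed.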

I then loaded the resulting xBRL-JSON files into a Chat GPT-4 session, using the upload feature provided by the LLM. First, I wanted to gauge whether the GPT-4 Data Analyst understands the xBRL-JSON data structure. The following is a snippet of the conversation, starting with my prompt and followed by the response from GPT-4’s Data Analyst.

Analyse this XBRL-JSON structure

The XBRL-JSON file starts with a “documentInfo” section, which includes information about the document type (identified as “https://xbrl.org/2021/xbrl-json”) and various features and namespaces used within the document. The namespaces include URLs for different standards and identifiers like “amzn” (Amazon), “cik” (SEC’s Central Index Key), “country”, “dei” (Data Element Identifiers), “iso4217” (currency codes), “srt” (SEC Reporting Taxonomy), “us-gaap” (U.S. Generally Accepted Accounting Principles), and “xbrl” (eXtensible Business Reporting Language).

To provide a thorough analysis, I will need to further examine the contents of the file, focusing on key sections such as financial data, company information, and any other relevant details. Let’s proceed with a more detailed examination.

The XBRL-JSON file contains two top-level keys: ‘documentInfo’ and ‘facts’.

1. documentInfo: We’ve already seen that this section includes metadata about the document, such as the document type, features, and various namespaces.

2. facts: This section likely contains the actual data reported in the XBRL document, such as financial statements, notes, and other relevant business information.

To provide a detailed analysis, I will delve into the ‘facts’ section to examine the types of data it contains and their structure. Let’s explore this section next.

The ‘facts’ section of the XBRL-JSON file contains detailed data entries, each identified by a unique key. These entries are structured with the following attributes.

1. Value: The actual reported value.
2. Dimensions: This includes several sub-attributes:

    Concept: Refers to the specific accounting concept being reported (e.g., ‘dei:AmendmentFlag’, ‘dei:DocumentFiscalYearFocus’).

    Entity: The entity to which the data pertains, often represented by a CIK (Central Index Key) number.

    Period: The time period for which the data is relevant.

    Additional dimensions may include language (for text fields) and other specific identifiers depending on the concept.

From the preview, we can see entries related to basic document information like whether the document is an amendment (‘dei:AmendmentFlag’), the fiscal year focus (‘dei:DocumentFiscalYearFocus’), the fiscal period focus (‘dei:DocumentFiscalPeriodFocus’), and the entity’s Central Index Key (‘dei:EntityCentralIndexKey’). There is also an entry for a specific accounting concept (‘us-gaap:OtherAssetsNoncurrent’).

To provide a more comprehensive analysis, we can explore specific types of data within this section, such as income statement, balance sheet items, or specific notes, depending on what is required. Please let me know what specific aspects or types of data you would like to analyze further.

Here, GPT-4 does a good job of understanding the xBRL-JSON structure without the need to provide additional instructions.
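
For readers who have not looked inside an xBRL-JSON file, here is a minimal sketch of the structure GPT-4 describes above: a “documentInfo” block plus a “facts” block in which each fact carries a value and its dimensions. The namespace URIs, entity identifier and monetary value are placeholders, not data from a real filing.

```python
import json

# Illustrative xBRL-JSON fragment. Namespace URIs, the entity identifier and
# the value of f2 are placeholders, not data from a real filing.
SAMPLE = """
{
  "documentInfo": {
    "documentType": "https://xbrl.org/2021/xbrl-json",
    "namespaces": {
      "dei": "http://example.com/dei",
      "us-gaap": "http://example.com/us-gaap"
    }
  },
  "facts": {
    "f1": {
      "value": "false",
      "dimensions": {
        "concept": "dei:AmendmentFlag",
        "entity": "cik:0000000000",
        "period": "2022-01-01T00:00:00/2023-01-01T00:00:00"
      }
    },
    "f2": {
      "value": "1000",
      "dimensions": {
        "concept": "us-gaap:OtherAssetsNoncurrent",
        "entity": "cik:0000000000",
        "period": "2023-01-01T00:00:00",
        "unit": "iso4217:USD"
      }
    }
  }
}
"""


def list_facts(report):
    """Return (fact id, concept, period, value) for every fact in the report."""
    rows = []
    for fact_id, fact in report["facts"].items():
        dims = fact["dimensions"]
        rows.append((fact_id, dims["concept"], dims.get("period"), fact["value"]))
    return rows


report = json.loads(SAMPLE)
print(sorted(report))  # the two top-level keys
for row in list_facts(report):
    print(row)
```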

I started by asking a simple question – “What is the reported Profit?” for various companies. Here is a snippet for the profit question from ‘Alphabet Inc’ 2022 10K.

What is the reported Profit?

The reported Net Income (Profit) for each available period in the XBRL-JSON data is as follows:

1. Period: January 1, 2020, to January 1, 2021

    Net Income: $40,269,000,000

2. Period: January 1, 2021, to January 1, 2022

    Net Income: $76,033,000,000

3. Period: January 1, 2022, to January 1, 2023

    Net Income: $59,972,000,000

These values represent the company’s net income for the respective periods, considering only the entries with the basic dimensions, which typically include the fundamental details of the financial data.

GPT-4 was able to answer this question correctly, and the results matched the reported values in the 10K report.

The next step was to ask a question involving calculations: “What was the profit growth percentage for last year?” Following is the snippet for this question for ‘Alphabet Inc’ 2022 10K:

What was the profit growth percentage for last year?

The profit growth percentage for the last year (from 2021 to 2022) was approximately -21.12%. This indicates a decrease in profit compared to the previous year.

GPT-4 got it right!
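
The figure is easy to check by hand from the two net income values quoted earlier. A minimal sketch of the arithmetic, where the function name is just an illustrative choice:

```python
def growth_percentage(previous, current):
    """Percentage change from the previous period to the current one."""
    return (current - previous) / abs(previous) * 100


# Net income values as reported in the conversation above (USD).
net_income = {2021: 76_033_000_000, 2022: 59_972_000_000}

change = growth_percentage(net_income[2021], net_income[2022])
print(f"{change:.2f}%")  # -21.12%
```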

At times, GPT-4 may not be able to provide an answer if it cannot guess the correct concept for the question asked. In such situations, one may need to guide it towards using the desired concepts. For instance, in the example below, GPT-4 was unable to identify cash flow concepts from the ‘COCA-COLA CONSOLIDATED, INC.’ 2022 10K. However, with the assistance of an additional prompt, it was able to provide the information:

What were investing, and financing activities cash flows?

There are no recorded values in the JSON file for cash flows from operations, investing, and financing activities under the concepts
us-gaap:CashProvidedByUsedInInvestingActivities, and

This indicates that either the data for these specific cash flow activities isn’t included in the file, or it’s categorized under different concept names. If there are other aspects or specific terms you’d like me to search for, please let me know.

Search for concepts like NetCashProvidedByUsed.. for above query

For the year 2022, the cash flow values for operations, investing, and financing activities are as follows:

1. Operations: $554,506,000 (Net Cash Provided)
2. Investing: -$324,985,000 (Net Cash Used)
3. Financing: -$174,187,000 (Net Cash Used)

Among these activities, the **Operations** activity brought in the most cash flow in 2022. It had a positive cash flow, indicating that it generated cash, whereas the Investing and Financing activities had negative cash flows, indicating cash usage.

On the first try, GPT-4 attempts to guess the names of concepts based on the patterns it has identified in the report, which does not work for this query. When one gives a subtle hint, such as a concept name pattern, it is able to do the job.
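
The hint amounts to a substring search over the concept names in the report. A minimal sketch of that idea, using the three cash flow values from the conversation above; the concept names are the standard us-gaap ones implied by the hint, while the period strings and the unrelated fourth fact are illustrative assumptions.

```python
# Facts in xBRL-JSON shape. The cash flow values come from the conversation
# above; the period strings and fact f4 are illustrative placeholders.
PERIOD = "2022-01-01T00:00:00/2023-01-01T00:00:00"
facts = {
    "f1": {"value": "554506000", "dimensions": {
        "concept": "us-gaap:NetCashProvidedByUsedInOperatingActivities", "period": PERIOD}},
    "f2": {"value": "-324985000", "dimensions": {
        "concept": "us-gaap:NetCashProvidedByUsedInInvestingActivities", "period": PERIOD}},
    "f3": {"value": "-174187000", "dimensions": {
        "concept": "us-gaap:NetCashProvidedByUsedInFinancingActivities", "period": PERIOD}},
    "f4": {"value": "1", "dimensions": {  # placeholder, not from the filing
        "concept": "us-gaap:Assets", "period": "2023-01-01T00:00:00"}},
}


def find_concepts(facts, fragment):
    """Return the sorted concept names that contain the given fragment."""
    fragment = fragment.lower()
    return sorted({
        fact["dimensions"]["concept"]
        for fact in facts.values()
        if fragment in fact["dimensions"]["concept"].lower()
    })


for concept in find_concepts(facts, "NetCashProvidedByUsed"):
    print(concept)
```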

The next query was to see whether it could calculate a few financial ratios. Here I asked it to calculate the Debt-Equity ratio for 2022. Let’s see how the conversation went for the ‘Alphabet Inc’ 2022 10K report.

What is the Debt-Equity ratio in 2022

The Debt-to-Equity ratio for the year 2022 is approximately 0.43. This ratio is calculated by dividing the company’s total liabilities (debt) by its total equity.

What were the facts used in the calculation

The facts used for calculating the Debt-to-Equity ratio for the year 2022 are as follows:

1. Total Liabilities (Debt) for 2022:
– Value: $107,633,000,000
– Concept: This value was identified under a concept likely representing total liabilities or debt in the financial report.

2. Total Equity for 2022:
– Value: $251,635,000,000
– Concept: This value was identified under a concept likely representing total equity in the financial report.

These values were extracted from the JSON file based on the concepts typically used in financial reports to represent total liabilities (as an indicator of debt) and total equity. The Debt-to-Equity ratio was then calculated using these values.

GPT-4 correctly identified the concepts and used the right formula, but it misinterpreted the periods. The values it listed for 2022 actually belong to 2021. In xBRL-JSON, periods are represented with time components to ensure consistent representation and avoid different interpretations of the same date. The instant period 2022-12-31 is represented as 2023-01-01T00:00:00, which is equivalent to 24:00:00 on the previous day. My guess is GPT-4 interpreted “2022-01-01T00:00:00” as 2022.
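
The period convention can be made explicit in a few lines. This is a minimal sketch, assuming instants are given as timestamps without a time zone as in the examples above; the function name is illustrative.

```python
from datetime import datetime, time, timedelta


def reporting_date(instant):
    """Return the calendar date that an xBRL-JSON instant refers to.

    A timestamp at exactly midnight marks the end of the previous day,
    so 2023-01-01T00:00:00 means the close of 2022-12-31.
    """
    moment = datetime.fromisoformat(instant)
    if moment.time() == time(0, 0):
        return (moment - timedelta(days=1)).date()
    return moment.date()


print(reporting_date("2023-01-01T00:00:00"))  # 2022-12-31
print(reporting_date("2022-01-01T00:00:00"))  # 2021-12-31
```

Read this way, the second timestamp belongs to the 2021 year end, which is where the misreading came from.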

I clarified the period interpretations. Here is the extract of conversation after the additional prompt.

2023-01-01T00:00:00 is equivalent to 24:00:00 on that previous day.

You are correct in pointing out that “2023-01-01T00:00:00” is equivalent to the very start of January 1st, 2023, which is effectively the same as the end of December 31st, 2022. In financial reporting, such timestamps are often used to indicate the exact cutoff point for the reporting period.

Now calculate debt equity ratio for 2022

The Debt-to-Equity ratio for the year ending on December 31, 2022, is approximately 0.426. This ratio is calculated by dividing the company’s total liabilities (debt) by its total equity.

List the facts used

The facts used for calculating the Debt-to-Equity ratio for the year ending on December 31, 2022, are as follows:

1. Total Liabilities (Debt) for 2022:
– Value: $109,120,000,000
– Concept: The value was identified under a concept representing total liabilities in the financial report for the period ending on 2023-01-01T00:00:00, which corresponds to the end of the 2022 fiscal year.

2. Total Equity for 2022:
– Value: $256,144,000,000
– Concept: The value was identified under a concept representing total equity in the financial report for the same period.

These values, extracted from the JSON file, were used to calculate the Debt-to-Equity ratio for the year ending on December 31, 2022.

With the additional prompt, it corrected its period interpretation and recomputed the Debt-Equity ratio using the correct values for 2022. It then applied the same period interpretation correctly to the subsequent queries in that session.
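
To see how little the two answers differ, and why the error was easy to miss, here is a minimal sketch that recomputes the ratio from the values listed in both responses. Keying the balances by xBRL-JSON instant is an illustrative choice, not something taken from the report itself.

```python
# Balance sheet values quoted in the two GPT-4 responses above (USD),
# keyed by the xBRL-JSON instant they belong to.
balances = {
    "2022-01-01T00:00:00": {"liabilities": 107_633_000_000, "equity": 251_635_000_000},
    "2023-01-01T00:00:00": {"liabilities": 109_120_000_000, "equity": 256_144_000_000},
}


def debt_to_equity(liabilities, equity):
    """Total liabilities divided by total equity."""
    return liabilities / equity


ratios = {instant: debt_to_equity(**values) for instant, values in balances.items()}
for instant, ratio in ratios.items():
    print(instant, round(ratio, 3))
```

The first instant is the end of 2021 (ratio about 0.428) and the second is the end of 2022 (about 0.426).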

Let’s level up. I went on to ask questions about the detailed tagged notes, such as “What was customer concentration?”, “What are the business segments?” or “What were the major acquisitions?”. Let’s look at one such question from the ‘Amazon.com, Inc’ 2022 10K.

What is the segment revenue reported?

The revenue reported for each business segment in 2022 is as follows:

1. North America Segment (amzn:NorthAmericaSegmentMember): The revenue reported for this segment is $315,880,000,000.
2. International Segment (amzn:InternationalSegmentMember): The revenue reported for this segment is $118,007,000,000.
3. Amazon Web Services Segment (amzn:AmazonWebServicesSegmentMember): The revenue reported for this segment is $80,096,000,000.

These figures provide a comprehensive view of the revenue distribution across the company’s major business segments, highlighting where the company is generating most of its income. The North America segment shows the highest revenue, followed by the International segment and then the Amazon Web Services segment.

Here GPT-4 gets the information right. One nuance to notice is that GPT-4 tries to make the segment name human-readable from the technical member name present in the report.
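
A minimal sketch of how segment facts look in xBRL-JSON and how a member name can be turned into a readable label. The member names and values come from the response above; the revenue concept name and the segment axis are assumptions made for illustration, and the helper functions are hypothetical.

```python
import re

# ASSUMPTION: the axis and revenue concept are illustrative choices;
# the member names and values are those quoted in the response above.
SEGMENT_AXIS = "us-gaap:StatementBusinessSegmentsAxis"
REVENUE = "us-gaap:Revenues"

facts = [
    {"value": "315880000000", "dimensions": {
        "concept": REVENUE, SEGMENT_AXIS: "amzn:NorthAmericaSegmentMember"}},
    {"value": "118007000000", "dimensions": {
        "concept": REVENUE, SEGMENT_AXIS: "amzn:InternationalSegmentMember"}},
    {"value": "80096000000", "dimensions": {
        "concept": REVENUE, SEGMENT_AXIS: "amzn:AmazonWebServicesSegmentMember"}},
]


def humanise(member):
    """Turn 'amzn:NorthAmericaSegmentMember' into 'North America Segment'."""
    local_name = member.split(":", 1)[-1]
    if local_name.endswith("Member"):
        local_name = local_name[: -len("Member")]
    # Insert a space wherever a lowercase letter is followed by an uppercase one.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local_name)


def segment_revenue(facts):
    """Map readable segment names to the revenue reported for each segment."""
    return {
        humanise(fact["dimensions"][SEGMENT_AXIS]): int(fact["value"])
        for fact in facts
        if SEGMENT_AXIS in fact["dimensions"]
    }


for name, value in segment_revenue(facts).items():
    print(f"{name}: ${value:,}")
```
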
Let’s look at one more example from the same report querying the detailed tagged notes.

What were the major acquisitions?

One of the major acquisitions in 2022 was the purchase of MGM Holdings Inc. by Amazon. The acquisition was completed for a cash consideration of approximately $6.1 billion, net of cash acquired. Additionally, Amazon assumed $2.5 billion of debt, which was immediately repaid after the closing of the deal. The assets acquired primarily included $3.4 billion of video content and $4.9 billion of goodwill. In the same year, Amazon also acquired certain other companies for an aggregate purchase price of $141 million, net of cash acquired.

Here GPT-4 tried to summarise the narrative information tagged as a text block for ‘Business Combination Disclosure’; again it was impressive.

Here is a query about customer concentration from the ‘COCA-COLA CONSOLIDATED, INC.’ 2022 10K.

What was customer concentration reported?

The document reports that the company faces concentration risks related to a few customers, which comprise a large portion of the company’s annual sales volume and net sales. The report includes a table that summarizes the percentage of the company’s total bottle/can sales volume to its largest customers, as well as the percentage of the company’s total net sales, which are included in the Nonalcoholic Beverages segment, that this volume represents. It is noted that no other customer represented more than 10% of the company’s total net sales for the years presented.

Look for detailed information on customer concentration

The detailed information on customer concentration for Coca-Cola Consolidated, Inc. is as follows:

1. For the period from January 1, 2021, to January 1, 2022, Walmart Stores Inc. accounted for 20% of the concentration risk percentage.
2. For the period from December 30, 2019, to January 1, 2021, Walmart Stores Inc. also featured in the customer concentration details.
3. During the same period, both Walmart Stores Inc. and The Kroger Company were considered for customer concentration risk, indicating significant sales volume or net sales attributed to these customers.
4. For the period from January 1, 2021, to January 1, 2022, Walmart Stores Inc. and The Kroger Company together represented a 33% concentration risk percentage.
5. Specifically for The Kroger Company, during this period, the concentration risk percentage was 9%.
6. From January 1, 2022, to January 1, 2023, The Kroger Company alone accounted for a 10% concentration risk percentage.
7. For the same period, the combined concentration risk percentage for Walmart Stores Inc. and The Kroger Company was 32%, as measured against the Coke bottle/can sales volume product member.

These figures indicate the significant reliance of Coca-Cola Consolidated, Inc. on major customers like Walmart Stores Inc. and The Kroger Company, highlighting potential risks associated with customer concentration.

This is a pretty good summary of the detailed notes tagged using XBRL dimensions.

GPT-4 seems to be good at understanding xBRL-JSON reports and answering natural language queries. Having said that, there are some general points to be read in context:

  • The experiment was purely based on the XBRL report and did not refer to the taxonomy. This means it did not have access to the rich documentation about each concept and their interrelationships. Converting a taxonomy into a “triple set” is a very straightforward exercise and training a bespoke LLM with this information, including corporate extension taxonomies and the underlying data would be a promising area for experimentation.
  • The element names in the sample reports resembled their semantic meaning, which helped this analysis. XBRL reports that use alphanumeric codes as element names cannot be analysed in the manner explained in this blog. Again, teaching the model about the taxonomy would overcome any limitations in this field, as it would understand the labels in use.
  • In a few instances, GPT-4 failed to upload XBRL-JSON reports, citing that they contained a lot of text data. This might be a current scaling constraint for GPT-4 users. There is a limit to what can be done with an open environment of this sort.
  • At times it needs to be directed to the correct concept or dimensions. Remember, this is “untrained” so the results are still impressive.
  • If required data are a mix of dimensional and non-dimensional facts, it might need to be explicitly told to include only basic dimensions.
  • Sometimes GPT-4 generalises concept names. For example, “Goodwill paid in a business acquisition” is re-termed as “acquisition value”.
  • Consistency and reliability are a concern; in some instances, GPT-4 did not understand the xBRL-JSON structure. In other cases, I was not able to reproduce the exact answer I got before, reducing reliability.
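
The point above about mixing dimensional and non-dimensional facts can be illustrated with a small filter. This is a sketch under assumptions: the set of core dimension names follows the xBRL-JSON format as I understand it, the axis and concept are illustrative choices, and the values are placeholders rather than figures from any filing.

```python
# Dimensions built into xBRL-JSON; anything else on a fact is a
# taxonomy-defined dimension such as a segment axis.
CORE_DIMENSIONS = {"concept", "entity", "period", "unit", "language", "noteId"}

# Placeholder facts: one reported total and one segment breakdown of it.
facts = {
    "total": {"value": "100", "dimensions": {
        "concept": "us-gaap:Revenues",
        "period": "2022-01-01T00:00:00/2023-01-01T00:00:00",
        "unit": "iso4217:USD"}},
    "segment": {"value": "60", "dimensions": {
        "concept": "us-gaap:Revenues",
        "period": "2022-01-01T00:00:00/2023-01-01T00:00:00",
        "unit": "iso4217:USD",
        "us-gaap:StatementBusinessSegmentsAxis": "ex:ExampleSegmentMember"}},
}


def has_only_core_dimensions(fact):
    """True when the fact carries no taxonomy-defined dimensions."""
    return set(fact["dimensions"]) <= CORE_DIMENSIONS


basic_ids = [fact_id for fact_id, fact in facts.items() if has_only_core_dimensions(fact)]
print(basic_ids)  # ['total']
```

Filtering like this before asking for a headline number avoids picking up a segment or other breakdown value by mistake.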

This was an interesting experiment to test if the GPT-4 model can understand XBRL data and answer fundamental questions, and it seems to be doing this pretty well. GPT-4 answered accurately, as the XBRL data was structured and consistent across reports. I do not consider myself an expert in writing optimised prompts; I am sure many others can do much better. This experiment demonstrates the potential. I’d encourage our readers to dive deeper into this topic – hopefully this blog provides some food for thought. We are interested to hear about your results!
