Inline XBRL is a technology for embedding XBRL into human-readable documents, such as XHTML Web pages. XBRL International has published an open-source reference implementation for Inline XBRL, called the Inline XBRL Extractor. The implementation makes it easy to add support for Inline XBRL to existing products. This document is a guide to integrating the Inline XBRL Extractor into existing applications.
The Extractor is implemented in both XSLT 1.0 and 2.0. While it can be invoked from the command line, its most common expected use is as an integrated feature of third-party applications. For one example of such integration, see the XBRL Add-On for Firefox.
Part 1 is a step-by-step tutorial designed to get you up and running as quickly as possible, walking you through the following tasks:
The tutorial shows screenshots from a Mac but will work just as well on a Windows or Linux machine. The only requirement is that you have Java installed on your machine.
Part 2 discusses some of the Extractor's technical details, usage scenarios, and more specific ways of customizing it.
Finally, the Appendix at the end of this document includes detailed instructions on how to install an XSLT 2.0 processor.
Download the reference implementation files from here: https://sourceforge.net/projects/inlinexbrl/files/. Click the "Download" button as shown below:
Unzip the zip file and browse to the "processor" directory. For example, on Windows, you'd see these folder contents:
Main_xslt20.xsl is the file you'll be concerned with in this tutorial.
This tutorial comes with a sample Inline XBRL file (right-click this link to save the MassiveDynamic.html file to a directory of your choice on your machine). The sample contains a financial report for a fictitious company called "Massive Dynamic Inc." An excerpt is shown below:
Open the page in your browser and view the HTML source. Intermixed with the HTML, you will see Inline XBRL tags that look like this:
The Extractor is not a standalone utility. It requires an XSLT processor to do its magic. For the sake of simplicity, in this tutorial, you will be using a tool called Kernow for Saxon, which provides a GUI wrapper for the XSLT 2.0 processor, Saxon. Moreover, it can be launched and automatically installed from the browser (using Java Web Start). The only requirement is that you have Java 1.6 on your machine.
To launch Kernow, go to Kernow's Java Web Start page, and click the link at the top of the page, as shown below:
Confirm that you are okay with running the application by clicking "Run" (on Windows) or "Allow" (on a Mac) when your computer prompts you.
You may then (on a Mac) have the opportunity to save the application to a local shortcut. (This step is optional and may be skipped by clicking the "Cancel" button.)
If you have any problems running Kernow for Saxon, you may need to clear your Java network cache. For details on how to do that, read the instructions on Kernow's Java Web Start page. (The Mac equivalent of the "Java Control Panel" is "Java Preferences," located in the /Applications/Utilities folder.)
Once Kernow is up and running, you will see a number of tabs at the top of the window. The "Single File" tab is the only one that you'll be concerned with. Select it as shown below:
Click the first "..." button (to the right of the "XML File" field), and browse on your machine to where you saved the sample Inline XBRL file, MassiveDynamic.html.
Similarly, click the second "..." button (to the right of the "Stylesheet" field), and browse on your machine to where you unzipped the Inline XBRL Extractor. Select the file named Main_xslt20.xsl in the "processor" directory.
Finally, check the "Send output to file" checkbox and enter the path to an output file. This should be a new file name. (WARNING: If the file already exists, it will be overwritten.)
Once you've entered all three file names, the form should look something like this:
Note the three file names. The first two must match the files you downloaded (MassiveDynamic.html and Main_xslt20.xsl). The third one (result.xbrl) can be named whatever you want.
The final step is to click the "Run" button. This causes Saxon to apply the stylesheet, Main_xslt20.xsl, to the source document, MassiveDynamic.html, producing the result document, result.xbrl. It may take in excess of 10 seconds, as the sample input file is fairly large.
Once processing is complete, Kernow will display a message such as "Done in 12 seconds 11 ms". If you view the contents of the result.xbrl file, you will see a complete XBRL instance document. For example, here's Firefox's "View Source" display of result.xbrl:
That concludes the tutorial. For more technical details and for other ways of invoking the processor, please continue reading.
Since Kernow includes its own API, you conceivably could use Kernow in your application. However, unless you need its value-added features (such as batch XSLT processing in the same VM instance), you are more likely to use an XSLT processor directly, either from the command-line or via its own API. Since the Inline XBRL Extractor works in XSLT 1.0 processors as well, this enables you to integrate it into any application, regardless of your XSLT processor constraints. This section discusses everything else you'll need to know to use the Inline XBRL Extractor to the full extent of its power.
The Inline XBRL processor takes a primary input file (e.g., "input.xhtml" or "collection.xml") and produces a primary output file (e.g., "output.xbrl"). It may also process secondary input documents and produce secondary output documents, as described below.
The primary input document must be in one of the following two formats:
<collection>
  <doc href="dir/part1.xhtml"/>
  <doc href="dir/part2.xhtml"/>
  <doc href="dir/part3.xhtml"/>
  <doc href="dir/part4.xhtml"/>
  ...
</collection>
(where each href value is the path or URI of a file in the collection).
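If your application assembles the collection file itself, the step can be scripted. The following is a minimal sketch in Python, assuming the collection format is exactly as shown above (a collection root element with one doc child per file, and no namespace). The function name build_collection is hypothetical and not part of the Extractor.

```python
import xml.etree.ElementTree as ET


def build_collection(hrefs):
    """Return a <collection> document (as a string) listing each href in order."""
    root = ET.Element("collection")
    for href in hrefs:
        # One <doc href="..."/> child per Inline XBRL file in the collection.
        ET.SubElement(root, "doc", {"href": href})
    return ET.tostring(root, encoding="unicode")


# Example file names taken from the listing above.
collection_xml = build_collection(["dir/part1.xhtml", "dir/part2.xhtml"])
print(collection_xml)
```

Save the resulting string to a file (for example, collection.xml) and pass that file to the Extractor as the primary input.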
The primary output document is the XBRL 2.1 default target document. Secondary target documents (identified by the target attribute on Inline XBRL facts) are additionally created in the same directory as the base output URI, using the following file naming convention:
If there is no default target document (because all Inline XBRL facts explicitly specify a named target), then the primary output document will be empty.
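To anticipate which output documents a given input will produce, you can list the target names it uses before running the Extractor. The sketch below is an illustration only, not part of the Extractor. It assumes well-formed XHTML input, the Inline XBRL 1.0 namespace URI, and that facts are the nonNumeric, nonFraction, and fraction elements; adjust these to match your documents. The sample markup and its fact names are made up.

```python
import xml.etree.ElementTree as ET

# Assumption: Inline XBRL 1.0 namespace. Change this if your input uses another version.
IX_NS = "http://www.xbrl.org/2008/inlineXBRL"
# Assumption: these element names carry the facts (and the optional target attribute).
FACT_NAMES = ("nonNumeric", "nonFraction", "fraction")


def targets_used(xhtml_text):
    """Return the set of target names used by facts; None stands for the default target."""
    root = ET.fromstring(xhtml_text)
    found = set()
    for name in FACT_NAMES:
        for fact in root.iter("{%s}%s" % (IX_NS, name)):
            found.add(fact.get("target"))  # None when the attribute is absent
    return found


# Hypothetical sample in which every fact names an explicit target.
sample = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2008/inlineXBRL">
  <body>
    <ix:nonNumeric name="ex:Name" contextRef="c1" target="notes">Massive Dynamic Inc.</ix:nonNumeric>
    <ix:nonFraction name="ex:Revenue" contextRef="c1" unitRef="u1" target="financials">100</ix:nonFraction>
  </body>
</html>"""

used = targets_used(sample)
named_targets = sorted(t for t in used if t is not None)
primary_will_be_empty = None not in used
print("named targets:", named_targets)
print("primary output will be empty:", primary_will_be_empty)
```

In this sample every fact names a target, so the primary output document would be expected to be empty, as described above.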
The Inline XBRL processor does not itself perform structural schema validation. You're encouraged to validate your input using the W3C XML schemas provided with the Inline XBRL specification.
For validation constraints that go beyond the structural constraints supported by W3C XML Schemas, you can validate input using the Schematron-based validation code found in the "validator" directory of the Extractor distribution (see validator/inlineXBRL.sch). Also, the Main_xslt20.xsl wrapper stylesheet incorporates validation into its processing (including termination upon finding a validation error), effectively making it a validating Inline XBRL processor. Note, however, that if you use any of the other mechanisms described below, then the processor will not perform input validation. In that case, you can always validate in a separate process beforehand, as described below in "Running the validator as a standalone tool".
For information on how to disable termination upon invalid input, see the "Disabling termination on invalid input" section below.
If you're running an XSLT 2.0 processor (such as Saxon), you can invoke a single top-level script, named Main_xslt20.xsl, as follows:
Transform -s:input.xml -xsl:Main_xslt20.xsl -o:output/mainResult.xbrl
As described above, mainResult.xbrl will contain the default target document (or will be empty if all targets are explicit). Any secondary output documents (with explicit target names) will appear in the same directory as the primary output document, provided that you set the "base output URI" when you invoke the XSLT processor. For Saxon, the above command-line example effectively uses the -o option to set the base output URI. Thus, all secondary output documents will appear in the same directory (named "output" in this case).
An advantage of using this script is that it automatically invokes the Schematron-based input validation (imported from the Extractor's "validator" directory). See "Validating vs. non-validating processing" above.
Another advantage is that relative URIs in HTML content that are to be escaped in the output (as dictated by escape="true" on <ix:nonNumeric>) are resolved even when the HTML <base> element is not included in the input. That's because XSLT 2.0 includes the base-uri() function for accessing the base URI of the document, a feature not present in XSLT 1.0.
If you're running an XSLT 1.0 processor (such as libxslt), you can invoke a series of transformations as follows:
For example, using libxslt, your script might look like this:
xsltproc prepare-input.xsl input.xml >stage1output.xml
xsltproc extractXBRL.xsl stage1output.xml >stage2output.xml
xsltproc split-output-documents.xsl stage2output.xml >mainResult.xbrl
NOTE: The third stylesheet, split-output-documents.xsl, depends on an EXSLT extension for producing multiple output documents. If your processor, e.g., MSXML, does not support EXSLT, then you'll need to provide another way to split the result of the stage 2 process into separate files.
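If you drive the three-stage pipeline from an application rather than a shell script, the same sequence can be expressed programmatically. The following Python sketch mirrors the xsltproc commands shown above; the function names plan_pipeline and run_pipeline are hypothetical, and the sketch assumes that xsltproc is on your PATH and that the three stylesheets are in the current directory.

```python
import subprocess
from pathlib import Path

# Stylesheet names as used in the libxslt example above.
STAGES = ["prepare-input.xsl", "extractXBRL.xsl", "split-output-documents.xsl"]


def plan_pipeline(input_file, final_output, workdir="."):
    """Return (argv, output_path) pairs for the three xsltproc stages."""
    steps = []
    current = str(input_file)
    for i, stylesheet in enumerate(STAGES, start=1):
        is_last = i == len(STAGES)
        # Intermediate results go to stageNoutput.xml; the last stage writes the final file.
        out = str(final_output) if is_last else str(Path(workdir) / f"stage{i}output.xml")
        steps.append((["xsltproc", stylesheet, current], out))
        current = out  # the next stage reads what this stage wrote
    return steps


def run_pipeline(steps):
    """Run each stage in order, writing its standard output to the planned file."""
    for argv, out in steps:
        with open(out, "wb") as handle:
            subprocess.run(argv, stdout=handle, check=True)


steps = plan_pipeline("input.xml", "mainResult.xbrl")
for argv, out in steps:
    print(" ".join(argv), ">", out)
# run_pipeline(steps)  # uncomment once xsltproc and the stylesheets are available
```

Separating planning from execution makes it easy to log or inspect the commands before running them; check=True stops the pipeline at the first stage that fails.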
The resolution of relative URI references in escaped HTML content (using the @escape attribute) depends on the presence of the XHTML <base> element in the input document. For URI resolution to work, you must include the <base> element in the XHTML input. This is not an issue for the 2.0 implementation (Main_xslt20.xsl).
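To make the role of the base URI concrete, the sketch below shows generic relative-reference resolution using Python's standard library. It illustrates the general mechanism only and is not the Extractor's own code; the base URI and the relative references are made-up examples standing in for the value of a base element's href attribute and for links found in escaped HTML content.

```python
from urllib.parse import urljoin

# Hypothetical value of <base href="..."/> in the XHTML input.
BASE_HREF = "http://example.com/reports/2010/"


def resolve(reference, base=BASE_HREF):
    """Resolve a relative URI reference against the given base URI."""
    return urljoin(base, reference)


# Hypothetical relative references such as might appear in escaped content.
for ref in ["images/logo.png", "../styles/report.css"]:
    print(ref, "->", resolve(ref))
```

Without a known base URI there is nothing to resolve against, which is why the XSLT 1.0 implementation relies on the base element being present in the input.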
If you're running an XSLT 1.0 processor that supports EXSLT (such as libxslt), you can also invoke a single top-level script, named Main_exslt.xsl, as follows:
xsltproc Main_exslt.xsl input.xml >output.xml
This should behave exactly the same as Main_xslt20.xsl, except that the input is not validated.
There are two top-level parameters that can be used to customize aspects of the Extractor's behavior:
The Main_xslt20.xsl stylesheet has an optional parameter called disable-termination-on-invalid-input. If you set this to the string "true", then it will charge ahead and create an output regardless of whether the input has validation errors. In this case, when there is a validation error, instead of terminating with an error message, the processor will print a warning to the console.
For example, here's the default behavior (when you do not set this option):
$ Transform -s:invalidInput.html -xsl:Main_xslt20.xsl -o:output/result.xbrl
ERROR: Input is invalid. Please look for "failed-assert" in validation-results.xml.
Processing terminated by xsl:message at line 72 in Main_xslt20.xsl
And here's the behavior when you do set the option:
$ Transform -s:invalidInput.html -xsl:Main_xslt20.xsl -o:output/result.xbrl disable-termination-on-invalid-input=true
WARNING: Input is invalid. Please look for "failed-assert" in validation-results.xml.
In the second case, processing does not terminate, because the disable-termination-on-invalid-input parameter was set to "true".
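When launching Saxon from another program, the same command line can be assembled as an argument list. The following Python sketch assumes Saxon is run as java -jar with the jar path used in the Appendix of this guide (adjust it for your installation); the function names saxon_command and run_saxon are hypothetical. Stylesheet parameters are passed as name=value arguments after the options, as in the example above.

```python
import subprocess

# Assumption: jar location as used in the Appendix; change to match your installation.
DEFAULT_JAR = "/Applications/saxon/saxon9he.jar"


def saxon_command(source, stylesheet, output, params=None, jar=DEFAULT_JAR):
    """Build the argument list for one Saxon transformation."""
    argv = ["java", "-jar", jar, f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}"]
    # Stylesheet parameters follow the options as name=value pairs.
    for name, value in (params or {}).items():
        argv.append(f"{name}={value}")
    return argv


def run_saxon(argv):
    """Run Saxon and return its exit code (non-zero when processing terminates with an error)."""
    return subprocess.run(argv).returncode


command = saxon_command(
    "invalidInput.html",
    "Main_xslt20.xsl",
    "output/result.xbrl",
    params={"disable-termination-on-invalid-input": "true"},
)
print(" ".join(command))
# exit_code = run_saxon(command)  # uncomment once Java and Saxon are installed
```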
Setting this parameter to boolean true will prevent namespace declarations from ever appearing in escaped output. This is only applicable when your Inline XBRL uses escape="true" on an <ix:nonNumeric> element. Setting it has the effect of removing the XHTML namespace declarations (and any other namespace declaration) from the resulting escaped HTML markup.
Typically, validation will be encompassed in the Extractor behavior (specifically, when you use the Main_xslt20.xsl stylesheet). However, there may be cases where you wish to perform validation independent of extraction. To ensure that a given document is a valid Inline XBRL document, first validate it against the W3C XML schema provided with the Inline XBRL specification. Then validate it against the Schematron schema provided in the "validator" directory (inlineXBRL.sch). For your convenience (and for integration with the Extractor), the Schematron rules are already pre-compiled into an XSLT 2.0 stylesheet (validator.xsl).
The validation script requires an XSLT 2.0 processor, such as Saxon. To run the validator, apply validator.xsl to your input document. The output is an XML-based report of all the constraints that were checked, as well as information about which constraints were violated, if any.
Here's an example of how to apply validator.xsl to your input document:
Transform YourInlineXBRL.xhtml validator.xsl >validation-results.xml
Query the resulting file (validation-results.xml in the above example) to see if there are any validation errors and what they were. Validation errors are signified by the presence of an <svrl:failed-assert> element.
The validating Extractor (Main_xslt20.xsl) uses this exact same mechanism for deciding whether to terminate upon validating the input.
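Querying the validation report is straightforward to automate. The sketch below, in Python, looks for svrl:failed-assert elements in a report and collects their location and message. It assumes the standard SVRL namespace URI and the usual SVRL layout (a location attribute and an svrl:text child); the function name failed_asserts and the sample report content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard Schematron Validation Report Language namespace.
SVRL_URI = "http://purl.oclc.org/dsdl/svrl"
SVRL_NS = {"svrl": SVRL_URI}


def failed_asserts(report_text):
    """Return a list of (location, message) pairs, one per svrl:failed-assert."""
    root = ET.fromstring(report_text)
    failures = []
    for failure in root.iter("{%s}failed-assert" % SVRL_URI):
        text = failure.find("svrl:text", SVRL_NS)
        message = "".join(text.itertext()).strip() if text is not None else ""
        failures.append((failure.get("location"), message))
    return failures


# Hypothetical report fragment; a real one would be read from validation-results.xml.
sample_report = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:fired-rule context="/"/>
  <svrl:failed-assert test="example-test" location="/html/body/p[1]">
    <svrl:text>Example constraint was violated.</svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""

failures = failed_asserts(sample_report)
is_valid = not failures
print("valid:", is_valid)
for location, message in failures:
    print(location, "-", message)
```

An empty list means no constraint was violated; any entry corresponds to a validation error of the kind that causes Main_xslt20.xsl to terminate by default.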
This section contains instructions for installing the open-source XSLT processor, Saxon-HE. You can choose whether to use the Java or .NET versions. Either will work.
If you are running on Windows and can use the .NET platform, it may be slightly more convenient to choose the .NET version, as it comes pre-packaged with an executable called Transform.exe, which you can invoke directly, provided it remains in the same directory as the DLL files downloaded with it.
The .NET version requires the .NET platform version 2.0.
The Java version requires JDK 1.5 (i.e., the Java 2 Platform, Standard Edition 5.0) or later (it also runs under JDK 1.6).
Here's how I did it on my Mac, from a Terminal prompt. This should work just as well on Linux or Cygwin:
#!/bin/bash
java -jar /Applications/saxon/saxon9he.jar "$@"
Test the installation at a command prompt.
You should see a long list of command-line options for Saxon, as shown below:
If you need more help installing Saxon, see the "Getting Started" section on the Saxonica website. For information about embedding Saxon into your application, see "Invoking XSLT from an application".