xBRL-CSV design 1.0

Working Group Note 4 August 2021

This version
https://www.xbrl.org/WGN/xbrl-csv-design/WGN-2021-08-04/xbrl-csv-design-2021-08-04.html
Editor
Paul Warren, XBRL International Inc. <pdw@xbrl.org>
Contributor
Mark Goodhand, CoreFiling <mrg@corefiling.com>

1 Overview

This document records the motivation behind certain design decisions made in xBRL-CSV. As the metadata format for xBRL-CSV is closely related to the xBRL-JSON syntax, it should be read in conjunction with the xBRL-JSON design document.

2 Identifier constraints on column headings

xBRL-CSV constrains column headers to be "identifiers", which are strings that cannot contain whitespace, amongst other constraints. This means that column headers must be defined using a string such as country_of_incorporation rather than a more human-readable "Country of incorporation".

As the column names are used to tie columns to their associated metadata, it was felt that restricted identifiers were more appropriate, particularly as human-readability of the CSV files is not a design priority.
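The following is a minimal sketch of the kind of check this implies. The identifier rule it uses (a letter or underscore followed by letters, digits or underscores) is a deliberately narrow assumption made for illustration; the actual constraints are those defined by the xBRL-CSV specification. The function names are hypothetical.

```python
import re

# Assumed, simplified identifier rule (not taken from the specification):
# a letter or underscore, followed by letters, digits or underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(heading: str) -> bool:
    """Return True if the heading satisfies the simplified rule above."""
    return IDENTIFIER_RE.fullmatch(heading) is not None


def to_identifier(label: str) -> str:
    """Derive an identifier-style heading from a human-readable label."""
    # Lower-case the label, then collapse each run of other characters to "_".
    candidate = re.sub(r"[^a-z0-9_]+", "_", label.strip().lower()).strip("_")
    if not is_identifier(candidate):
        raise ValueError(f"cannot derive an identifier from {label!r}")
    return candidate


print(is_identifier("Country of incorporation"))   # False
print(to_identifier("Country of incorporation"))   # country_of_incorporation
```

A data collector would more likely choose headings by hand; the conversion function only illustrates how a human-readable label relates to its identifier form.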

3 Support for CSV dialects

There are many different dialects of CSV file in common use, varying in details such as escape characters and line endings. Consideration was given to allowing the metadata file to specify exactly which dialect of CSV is being used in a report.

It is expected that existing CSV files will not be used directly as part of an xBRL-CSV report; at least some degree of value transformation will typically be required. As such, requiring conversion to a prescribed CSV dialect should not be a significant burden. It was decided to adopt a single, permissive CSV format that accommodates all of the most commonly used dialects, rather than take on the non-trivial additional complexity of making the dialect configurable.

4 Abbreviated period syntax formats

Earlier drafts of the xBRL-CSV specification required all periods to be denoted using the full ISO8601 datetime format, minimising the burden on the implementers of consuming applications. This is consistent with the approach taken in xBRL-JSON, and removes any ambiguity about whether a date refers to the start or end of the day.

Many common applications of XBRL operate at a "whole day" level of granularity, and periods are naturally described using inclusive dates. For example, a year is typically described as running from "1st January to 31st December" (and includes both of those dates). Expressed as an ISO8601 duration, this would be:

2019-01-01T00:00:00/2020-01-01T00:00:00

This is both verbose and unintuitive, as the end date is expressed as the start of 1st January of the following year.

As compactness of representation is a design priority for xBRL-CSV, it was felt that a more concise format should be supported. To avoid repeating the confusion caused by XBRL v2.1's handling of dates and datetimes, it was decided to use a different notation for the abbreviated syntax, with .. as the date separator. For example:

2019-01-01..2019-12-31

The .. separator is explicitly defined as an inclusive date range, and cannot be used with datetimes. Similarly, the / separator cannot be used with dates.
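The relationship between the two notations can be sketched as follows. This is an illustrative conversion only, not a complete period parser; the function name is hypothetical. It treats the end date as inclusive, so the expanded end point is midnight at the start of the following day.

```python
import re
from datetime import date, datetime, time, timedelta

# A plain calendar date; datetimes are rejected in the ".." form.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def expand_date_range(period: str) -> str:
    """Convert 'YYYY-MM-DD..YYYY-MM-DD' to the full 'start/end' datetime form."""
    start_text, separator, end_text = period.partition("..")
    if not separator:
        raise ValueError(f"not an inclusive date range: {period!r}")
    for text in (start_text, end_text):
        if not DATE_RE.fullmatch(text):
            raise ValueError(f"'..' may only be used with dates: {text!r}")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    # The end date is inclusive, so the period ends as the next day begins.
    start_dt = datetime.combine(start, time.min)
    end_dt = datetime.combine(end + timedelta(days=1), time.min)
    return f"{start_dt.isoformat()}/{end_dt.isoformat()}"


print(expand_date_range("2019-01-01..2019-12-31"))
# 2019-01-01T00:00:00/2020-01-01T00:00:00
```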

Having taken the step to move away from a single notation for durations, in the name of conciseness and convenience, it was felt reasonable to add further abbreviated notations for common calendar periods, e.g.:

5 Transformation of values

Aside from the shorthand period notations described above, xBRL-CSV does not provide any mechanism for transforming the values that appear in a CSV report.

This leads to somewhat verbose representations for some values. For example, currencies need to be specified as iso4217:USD rather than just USD. Similarly, entity identifiers must include a prefix to indicate the scheme, for example, cik:12345678.

The two examples cited above could be addressed by permitting the definition of a default namespace for a column, but this would break the simplifying assumption, used in both xBRL-JSON and xBRL-CSV, of a 1:1 mapping between prefixes and namespaces that applies throughout a report. Breaking that assumption would prevent simple lexical comparison of values, and would require processors to maintain scoped namespace mappings.

An alternative approach would be a more general value transformation mechanism. This was discussed, but deemed out-of-scope for 1.0. Supporting pre-existing CSV formats was explicitly ruled out-of-scope in the xBRL-CSV requirements, meaning that some level of transformation of inputs is acceptable.

Where the possible values for a dimension column are readily enumerable, it is possible to simplify the input format using property groups.

Transformation of input values may be added in a future version of the xBRL-CSV specification.

6 Extensibility of metadata files

A primary use case for xBRL-CSV is the reporting of large volumes of data within a regulatory reporting environment. Such reporting environments are typically closed reporting systems, where the data to be reported is entirely prescribed by the data collector. In such a system, the ability of preparers to define the layout of the reported tables is neither required nor desirable.

On the other hand, where entities are publishing data in xBRL-CSV, it is desirable for such reports to be self-describing, by including metadata that documents the meaning of the CSV data tables.

xBRL-CSV's use of extensible metadata files caters for a number of scenarios:

7 Multiple inheritance of metadata files

It is anticipated that the designers of xBRL-CSV reports will wish to modularise metadata definitions, using separate metadata files for different reporting templates. xBRL-CSV supports inheritance from multiple metadata files, allowing the overall metadata definition to be assembled from such modular metadata definitions.

The specification allows the same properties to be inherited from multiple sources provided they have the same values. This is to allow common information, such as namespace declarations, to be inherited by all modular metadata definitions. Provided such modular definitions do not introduce namespace definitions that bind different values to the same prefix, they can be combined without conflict.
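As an illustration of this rule, the sketch below combines namespace declarations from several modular definitions. It assumes each definition has already been reduced to a simple prefix-to-URI dictionary; the function name and the example.com URIs are hypothetical. Repeating an identical binding is harmless, while binding the same prefix to a different URI is reported as a conflict.

```python
from __future__ import annotations


def merge_namespaces(*definitions: dict[str, str]) -> dict[str, str]:
    """Combine prefix-to-URI maps, rejecting conflicting bindings."""
    merged: dict[str, str] = {}
    for namespaces in definitions:
        for prefix, uri in namespaces.items():
            # setdefault keeps the first binding; a later one must agree with it.
            if merged.setdefault(prefix, uri) != uri:
                raise ValueError(
                    f"prefix {prefix!r} is bound to both {merged[prefix]!r} and {uri!r}"
                )
    return merged


common = {"iso4217": "http://www.xbrl.org/2003/iso4217"}
template_a = {**common, "a": "https://example.com/taxonomy/a"}
template_b = {**common, "b": "https://example.com/taxonomy/b"}

print(sorted(merge_namespaces(template_a, template_b)))
# ['a', 'b', 'iso4217']
```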

8 Single parameter CSV file

The specification allows report parameters to be provided in a CSV file. The group discussed the possibility of permitting multiple such CSV files. The motivation is that multiple inheritance of metadata files allows a single report to combine multiple metadata definitions in order to satisfy multiple reporting requirements, possibly from different authorities, in a single submission. In this scenario, the different component parts of the metadata may specify different names for the report parameter CSV file, which is an error under the specification as currently drafted.

It was agreed that any such reporting arrangement would require co-ordination between the different parts of the reporting requirements, and that structuring the metadata so as to name only a single CSV parameter file was not an unreasonable additional burden. As such, the additional complexity of multiple CSV parameter files was not warranted.

9 Namespacing of CSV metadata components

CSV metadata may be assembled from multiple component files, and it is envisaged that these files may relate to different report requirements and may be defined by different authorities. As such, consideration was given to the possibility of supporting namespaces for components such as table templates and parameters.

It was noted that whilst xBRL-CSV has some level of namespace awareness, in order to support the QNames and SQNames used in both data and metadata, namespaces are not a native or common feature in JSON, and their inclusion may be confusing to new users of the specification. It was felt that in real-world reporting scenarios, there will be sufficient co-ordination between the authorities defining metadata that naming conflicts can be avoided with simple naming schemes, without the need to resort to the globally unique, URI-based names that XML namespaces provide.

10 Completeness of mapping

The specification has a number of constraints that require that data reported in CSV data files, and as parameter values, is incorporated into the resulting report model. For example, it is an error to have a value in a parameter column if there is not a value in at least one fact column that references it.

This avoids a situation in which data included in an xBRL-CSV report is not reflected in the resulting report model, which could lead to disputes over whether such data has been shared with, and is therefore known to, a regulator.
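The parameter column example can be sketched as a per-row check. The inputs here are hypothetical simplifications: a row is a dictionary of column name to cell text, and the references map states which parameter columns each fact column uses, something a real processor would derive from the metadata. The precise conditions are those given in the specification.

```python
from __future__ import annotations


def unreferenced_parameter_values(
    row: dict[str, str],
    parameter_columns: list[str],
    references: dict[str, set[str]],
) -> list[str]:
    """Return parameter columns that hold a value no populated fact column uses."""
    used: set[str] = set()
    for fact_column, referenced in references.items():
        # Only fact columns that actually hold a value in this row count.
        if row.get(fact_column, ""):
            used.update(referenced)
    return [
        column
        for column in parameter_columns
        if row.get(column, "") and column not in used
    ]


references = {"revenue": {"unit"}}
print(unreferenced_parameter_values(
    {"unit": "iso4217:USD", "revenue": ""}, ["unit"], references))
# ['unit']
print(unreferenced_parameter_values(
    {"unit": "iso4217:USD", "revenue": "100"}, ["unit"], references))
# []
```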

11 Comment columns

The specification provides support for "comment columns". These are columns that are explicitly not mapped to XBRL data. It is hoped that this feature will be used sparingly; where a data collector is requesting the inclusion of data in a report, it should be included as structured XBRL data, so that it can be stored and analysed within the OIM framework.

Nonetheless, there are possible use cases where such additional, unmapped data may be useful. Generally this would be data that is of use to the preparer of the report rather than the collector, such as information about the source system from which data was obtained.

Comment columns should not be used to provide explanatory information about data that may be of use to a consumer of the report. Such information should be provided as separate XBRL facts and linked to the relevant data using links (footnote relationships) or other mechanisms.

12 Document info

xBRL-CSV metadata files contain a top-level documentInfo object. Amongst other properties, this includes a documentType string which identifies the document as conforming to the xBRL-CSV specification.

A documentInfo object is also defined in xBRL-JSON, and it is planned to include documentInfo objects in other JSON-based XBRL formats so that processors can automatically determine the document type.

documentInfo is also used to hold other information, including namespace prefixes, and there was some discussion within the working group about which components should be defined within documentInfo.

The principle that has been adopted is that documentInfo should be used to contain information that is necessary to correctly understand the rest of the document.
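A processor might use documentInfo to identify a document along the following lines. This is a sketch only: the documentType URIs in the table are assumptions that should be checked against the relevant specifications, and the function name is hypothetical.

```python
import json

# Assumed documentType values (verify against the published specifications).
KNOWN_DOCUMENT_TYPES = {
    "https://xbrl.org/2021/xbrl-csv": "xBRL-CSV",
    "https://xbrl.org/2021/xbrl-json": "xBRL-JSON",
}


def detect_format(text: str) -> str:
    """Identify a JSON-based XBRL document from documentInfo.documentType."""
    document = json.loads(text)
    info = document.get("documentInfo") if isinstance(document, dict) else None
    if not isinstance(info, dict):
        raise ValueError("no documentInfo object found")
    document_type = info.get("documentType")
    if not isinstance(document_type, str) or document_type not in KNOWN_DOCUMENT_TYPES:
        raise ValueError(f"unrecognised documentType: {document_type!r}")
    return KNOWN_DOCUMENT_TYPES[document_type]


metadata = '{"documentInfo": {"documentType": "https://xbrl.org/2021/xbrl-csv"}}'
print(detect_format(metadata))  # xBRL-CSV
```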

13 Namespace prefix scoping

xBRL-CSV makes use of QNames and SQNames which rely on prefixes to refer to namespaces. xBRL-CSV does not require prefixes to be in scope at the point that they are used; a metadata file may legally use a prefix that is only defined by another metadata file that extends it.

The reason for this is that it is not always possible to determine which values contain prefixed content until all metadata files are consumed. For example, it is not possible to determine the expected type(s) of a parameter without knowing all the places in which that parameter is used.

As full validation of prefixes cannot take place until all metadata files have been processed, requiring processors to validate that prefixes were in scope at the point of use would be non-trivial additional validation.

It is worth noting that xBRL-CSV's use of prefixes is much simpler than that of XML, as each prefix can only be bound to a single namespace throughout all metadata files in an xBRL-CSV report.
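The consequence for a processor is that prefixed names are resolved only after every metadata file has been read. The sketch below assumes a hypothetical, simplified in-memory form for each file, with a "namespaces" dictionary and a "qnames" list of prefixed names used in that file; it is not the actual metadata structure.

```python
from __future__ import annotations


def resolve_all(files: list[dict]) -> list[tuple[str, str]]:
    """Resolve prefixed names once namespaces from every file are known."""
    # Pass 1: collect report-wide bindings; each prefix has one namespace.
    namespaces: dict[str, str] = {}
    for metadata in files:
        for prefix, uri in metadata.get("namespaces", {}).items():
            if namespaces.setdefault(prefix, uri) != uri:
                raise ValueError(f"conflicting bindings for prefix {prefix!r}")
    # Pass 2: resolve names, wherever their prefix happened to be declared.
    resolved: list[tuple[str, str]] = []
    for metadata in files:
        for qname in metadata.get("qnames", []):
            prefix, separator, local_name = qname.partition(":")
            if not separator or prefix not in namespaces:
                raise ValueError(f"cannot resolve {qname!r}")
            resolved.append((namespaces[prefix], local_name))
    return resolved


# The base file uses "eg" without declaring it; the extending file declares it.
base = {"qnames": ["eg:Revenue"]}
extension = {"namespaces": {"eg": "https://example.com/taxonomy"}}
print(resolve_all([base, extension]))
# [('https://example.com/taxonomy', 'Revenue')]
```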

14 Validation of unused data and metadata

It is possible that metadata definitions will include values that are not used in a report. This is particularly likely where the metadata is defined by a regulator collecting data from a number of report preparers.

The working group considered possible approaches to the validation of metadata.

  1. Only validate the minimum required to ensure that the report itself is valid.
  2. Validate as much as possible, including unused information.
  3. Require (1) by default, but specify a "full metadata validation" mode that can be enabled.

An underlying principle is that all validators should return the same overall result on a given report, so it was not considered acceptable to require (1) but permit additional validation other than as an explicitly specified mode as per (3).

As an example, consider a unit dimension with an invalid value:

{
    "unit": "this is invalid"
}

It can be immediately determined that this value is invalid, and it seems undesirable to ever permit this in metadata.

A dimension may come from a parameter reference:

{
    "unit": "$unitParameter"
}

unitParameter may be the name of a parameter defined in the metadata or parameter CSV file, or the name of a column in the table in which it is used. In this case, the dimension can only be validated as part of processing a CSV file.

As some dimension values can only be validated upon use, the simplest approach to both specify and implement would be to require that all values are only validated upon use (option (1) above).

The downside of this simple approach is that errors in a metadata file will only be caught when used in a report. In a scenario where a preparer is attempting to create a report that complies with regulator-provided metadata, this is problematic as fixing the error will require an update to the metadata by the regulator.

Option (2) is more complicated as it requires that values are validated even if unused where possible, but the same validation needs to be applied to all used values once parameters have been resolved.

It was decided that the benefits of this additional validation outweighed the additional implementation and specification burden. The additional implementation burden was not considered large enough to justify specifying a separate "mode" for this behaviour.

14.1 Edge cases

Unused dimension values are only validated where they are specified as literal values. There are cases where a parameter reference can be resolved using only metadata, but these are not validated as it is possible that the values will be overridden.
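The distinction between literal values and parameter references can be sketched as follows. The unit syntax check is a stand-in assumption that accepts only a single prefixed name such as iso4217:USD, and any value starting with $ is treated as a parameter reference, ignoring any escaping rules. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Stand-in check (assumption): one prefixed name, e.g. "iso4217:USD".
SIMPLE_UNIT_RE = re.compile(r"[A-Za-z_][\w.\-]*:[A-Za-z_][\w.\-]*")


def validate_literal_dimensions(dimensions: dict[str, str]) -> list[str]:
    """Validate literal dimension values; parameter references are deferred."""
    errors: list[str] = []
    for name, value in dimensions.items():
        if value.startswith("$"):
            # Can only be checked once resolved while processing a CSV file.
            continue
        if name == "unit" and not SIMPLE_UNIT_RE.fullmatch(value):
            errors.append(f"invalid unit value: {value!r}")
    return errors


print(validate_literal_dimensions({"unit": "this is invalid"}))
# ["invalid unit value: 'this is invalid'"]
print(validate_literal_dimensions({"unit": "$unitParameter"}))
# []
```

The same check would then be applied again to each used value once its parameter references have been resolved.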

Some dimensions (unit and language) may be excluded based on the datatype of the concept. Where both concept and one of these dimensions are obtained via parameter reference, there is a required order of validation: the concept dimension must be resolved and validated first, in order to determine whether the potentially-excluded dimension should be used and thus validated.

14.2 Future developments

Early work on the xBRL-CSV specification assumed that at least some of the metadata would be defined or customised by the preparer of the report. As the specification developed, it has become clear that the most likely use case is for a preparer to use an unmodified regulator-provided metadata definition.

As such, it is considered desirable to introduce a clearer separation between data and metadata, and to specify separate validation requirements for metadata. This may be considered in future versions of the specification.

15 Support for links (footnotes)

The xBRL-CSV specification provides limited support for the inclusion of links (footnote relationships). Links can be included in xBRL-CSV metadata, and must be specified by referring to individual fact IDs. As fact IDs are generated based on table, column and row identifiers, and it is an error for links to refer to facts that are not present in the report, it will not typically be possible for a regulator to define xBRL-CSV metadata containing links that can be imported and used by report preparers.

The working group explored various options for more powerful support for links, but the approaches added significantly to the complexity of the specification. Existing environments for which xBRL-CSV adoption is envisaged do not generally make use of XBRL footnotes, and as such, it was felt that this complexity was not justified. Instead, minimal functionality for including links in JSON metadata files is provided in order to ensure that all OIM report information can be included in an xBRL-CSV report, but this functionality is not expected to be widely used for data collection purposes.

The freedom and flexibility provided by links is not considered to be consistent with the highly constrained nature of the typical "closed form" reporting use cases for xBRL-CSV.

Data collectors can prevent the use of links in xBRL-CSV reports by not defining any linkGroups or linkTypes and declaring these objects to be final.