xBRL-CSV design 1.0

Working Group Note 3 February 2021

This version: https://www.xbrl.org/WGN/xbrl-csv-design/WGN-2021-02-03/xbrl-csv-design-2021-02-03.html
Editor: Paul Warren, XBRL International Inc. <pdw@xbrl.org>
Contributor: Mark Goodhand, CoreFiling <mrg@corefiling.com>

1 Overview

This document records the motivation behind certain design decisions made in xBRL-CSV. As the metadata format for xBRL-CSV is closely related to the xBRL-JSON syntax, this document should be read in conjunction with the xBRL-JSON design document.

2 Identifier constraints on column headings

xBRL-CSV constrains column headers to be "identifiers", which are strings that cannot contain whitespace, amongst other constraints. This means that column headers must be defined using a string such as country_of_incorporation rather than a more human-readable "Country of incorporation".

As the column names are used to tie columns to their associated metadata, it was felt that restricted identifiers were more appropriate, particularly as human-readability of the CSV files is not a design priority.
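As an illustration of this trade-off, the following Python sketch derives an identifier-style header from a human-readable label. The pattern used here (a letter or underscore followed by letters, digits or underscores) is a simplifying assumption made for illustration and is not the specification's definition of an identifier; the function names are hypothetical.

```python
import re

# Assumed, simplified identifier pattern. The xBRL-CSV specification
# defines the actual rules; this is only a stand-in for illustration.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(header: str) -> bool:
    """Return True if the header matches the simplified identifier pattern."""
    return IDENTIFIER.fullmatch(header) is not None


def to_identifier(label: str) -> str:
    """Turn a human-readable label into an identifier-style column header."""
    # Keep runs of letters and digits, lower-cased, joined by underscores.
    words = re.findall(r"[A-Za-z0-9]+", label.lower())
    candidate = "_".join(words)
    if not is_identifier(candidate):
        raise ValueError(f"cannot derive an identifier from {label!r}")
    return candidate


print(to_identifier("Country of incorporation"))  # country_of_incorporation
```

A human-readable label can be kept elsewhere (for example in documentation or a taxonomy label), while the header itself stays safe to use as a key into the metadata.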

3 Support for CSV dialects

There are many different dialects of CSV file in common use, varying in details such as escape characters and line endings. Consideration was given to allowing the metadata file to specify exactly which dialect of CSV is being used in a report.

It is expected that existing CSV files will not be used directly as part of an xBRL-CSV report; at least some degree of value transformation will typically be required. As such, requiring transformation to a required CSV dialect should not be a significant burden. It was decided to adopt a single, permissive CSV format that accommodates all of the most commonly used dialects, rather than the non-trivial additional complexity of making the dialect configurable.
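The kind of transformation envisaged here can be small. The sketch below uses Python's standard csv module to re-serialise a semicolon-delimited file as comma-separated text. The source dialect, the function name and the choice of Python's default quoting as the target are all assumptions made for illustration; the target format that a report must actually use is defined by the specification.

```python
import csv
import io


def normalise_csv(text: str, delimiter: str = ";") -> str:
    """Re-serialise delimiter-specific CSV text as comma-separated CSV."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    # Default quoting adds quotes only where a field needs them.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


source = "name;amount\nSmith, Jones;1000\n"
print(normalise_csv(source))
```

In this example the field "Smith, Jones" needs no quoting in the semicolon-delimited source, but is quoted in the output because it contains the target delimiter.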

4 Abbreviated period syntax formats

Earlier drafts of the xBRL-CSV specification required all periods to be denoted using the full ISO8601 datetime format, minimising the burden on the implementers of consuming applications. This is consistent with the approach taken in xBRL-JSON, and removes any ambiguity about whether a date refers to the start or end of the day.

Many common applications of XBRL operate at a "whole day" level of granularity, and periods are naturally described using inclusive dates. For example, a year is typically described as running from "1st January to 31st December" (and includes both of those dates). Expressed as an ISO8601 duration, this would be:

2019-01-01T00:00:00/2020-01-01T00:00:00

This is both verbose and unintuitive, as the end of the period is expressed as the start of 1st January of the following year.

As compactness of representation is a design priority for xBRL-CSV, it was felt that a more concise format should be supported. In an attempt to avoid a repetition of the confusion caused by XBRL v2.1's handling of dates and datetimes, it was decided to use a different notation for the abbreviated syntax, with .. as the date separator. For example:

2019-01-01..2019-12-31

The .. separator is explicitly defined as an inclusive date range, and cannot be used with datetimes. Similarly, the / separator cannot be used with dates.
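The relationship between the two notations can be sketched in Python as follows. The function name is hypothetical and the error handling is minimal; the point is only that an inclusive end date corresponds to midnight at the start of the following day.

```python
from datetime import date, datetime, timedelta


def inclusive_range_to_interval(period: str) -> str:
    """Convert 'YYYY-MM-DD..YYYY-MM-DD' to a 'start/end' datetime interval."""
    start_text, sep, end_text = period.partition("..")
    if not sep:
        raise ValueError(f"not an inclusive date range: {period!r}")
    # date.fromisoformat rejects datetimes, so '..' cannot be mixed with them.
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("end date precedes start date")
    # The inclusive end date runs to midnight at the start of the next day.
    end_exclusive = end + timedelta(days=1)
    start_dt = datetime.combine(start, datetime.min.time())
    end_dt = datetime.combine(end_exclusive, datetime.min.time())
    return f"{start_dt.isoformat()}/{end_dt.isoformat()}"


print(inclusive_range_to_interval("2019-01-01..2019-12-31"))
# 2019-01-01T00:00:00/2020-01-01T00:00:00
```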

Having taken the step to move away from a single notation for durations, in the name of conciseness and convenience, it was felt reasonable to add further abbreviated notations for common calendar periods, e.g.:

5 Extensibility of metadata files

A primary use case for xBRL-CSV is the reporting of large volumes of data within a regulatory reporting environment. Such reporting environments are typically closed reporting systems, where the data to be reported is entirely prescribed by the data collector. In such a system, the ability of preparers to define the layout of the reported tables is neither required nor desirable.

On the other hand, where entities are publishing data in xBRL-CSV, it is desirable for such reports to be self-describing, by including metadata that documents the meaning of the CSV data tables.

6 Multiple inheritance of metadata files

It is anticipated that the designers of xBRL-CSV reports will wish to modularise metadata definitions, using separate metadata files for different reporting templates. xBRL-CSV supports inheritance from multiple metadata files, allowing the overall metadata definition to be assembled from such modular metadata definitions.

The specification allows the same properties to be inherited from multiple sources provided they have the same values. This is to allow common information, such as namespace declarations, to be inherited by all modular metadata definitions. Provided that such modular definitions do not introduce namespace definitions that bind different values to the same prefix, they can be combined without conflict.
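A minimal Python sketch of this rule is shown below. It treats each metadata file as a flat mapping of property name to value, which is a simplification of the real, nested metadata structure; the function name and the example URIs are hypothetical.

```python
def merge_inherited(*sources: dict) -> dict:
    """Merge properties from several metadata files, rejecting conflicts."""
    merged: dict = {}
    for source in sources:
        for key, value in source.items():
            # Repeating a property is fine only if the value is identical.
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values inherited for {key!r}")
            merged[key] = value
    return merged


# Two modular definitions that share a common namespace declaration.
common = {"eg": "https://example.com/taxonomy"}
template_a = {**common, "a": "https://example.com/a"}
template_b = {**common, "b": "https://example.com/b"}
namespaces = merge_inherited(template_a, template_b)
print(sorted(namespaces))  # ['a', 'b', 'eg']
```

Here the shared "eg" declaration is inherited twice without error because both sources agree on its value; binding "eg" to a different URI in one of them would raise an error.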

7 Single parameter CSV file

The specification allows report parameters to be provided in a CSV file. The group discussed the possibility of permitting multiple such CSV files. The motivation is that multiple inheritance of metadata files allows a single report to combine multiple metadata definitions in order to satisfy multiple reporting requirements, possibly from different authorities, in a single submission. In this scenario, the different component parts of the metadata may specify different names for the report parameter CSV file, which is an error under the specification as currently drafted.

It was agreed that any such reporting arrangement would require co-ordination between the different parts of the reporting requirements, and that structuring the metadata so as to name only a single CSV parameter file was not an unreasonable additional burden. As such, the additional complexity of multiple CSV parameter files was not warranted.

8 Namespacing of CSV metadata components

CSV metadata may be assembled from multiple component files, and it is envisaged that these files may relate to different report requirements and may be defined by different authorities. As such, consideration was given to the possibility of supporting namespaces for components such as table templates and parameters.

It was noted that whilst xBRL-CSV has some level of namespace awareness, in order to support the QNames and SQNames used in both data and metadata, namespaces are not a native or common feature in JSON, and their inclusion may be confusing to new users of the specification. It was felt that in real-world reporting scenarios, there will be sufficient co-ordination between the authorities defining metadata that naming conflicts can be avoided with simple naming schemes, without the need to resort to the globally unique, URI-based names that XML namespaces provide.

9 Completeness of mapping

The specification has a number of constraints that require that data reported in CSV data files, and as parameter values, is incorporated into the resulting report model. For example, it is an error to have a value in a parameter column if there is not a value in at least one fact column that references it.

This is to avoid data being included in an xBRL-CSV report without being reflected in the resulting report model, which could lead to disputes over whether such data had been shared with, and was therefore known to, a regulator.
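The parameter column example can be sketched as a per-row check in Python. The data structures here are hypothetical simplifications: a row is a mapping of column name to cell text, an empty string is assumed to mean "no value", and the mapping of fact columns to the parameter columns they reference is supplied directly rather than derived from metadata.

```python
def unreferenced_parameter_values(row, parameter_columns, references):
    """Return parameter columns whose value is used by no populated fact column.

    row: column name -> cell text ("" is assumed to mean no value).
    parameter_columns: names of the parameter columns in the table.
    references: fact column name -> set of parameter columns it references.
    """
    referenced = set()
    for fact_column, params in references.items():
        # Only fact columns that actually hold a value count as users.
        if row.get(fact_column, "") != "":
            referenced.update(params)
    return sorted(
        name
        for name in parameter_columns
        if row.get(name, "") != "" and name not in referenced
    )


references = {"revenue": {"unit_col"}, "headcount": set()}
row = {"unit_col": "iso4217:EUR", "revenue": "", "headcount": "12"}
print(unreferenced_parameter_values(row, ["unit_col"], references))
# ['unit_col']
```

In this row the only fact column that references unit_col is empty, so the reported unit would not appear anywhere in the report model; under the constraint described above this is treated as an error rather than silently discarded.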

10 Document info

xBRL-CSV metadata files contain a top-level documentInfo object. Amongst other properties, this includes a documentType string which identifies the document as conforming to the xBRL-CSV specification.

A documentInfo object is also defined in xBRL-JSON, and it is planned to include documentInfo objects in other JSON-based XBRL formats so that processors can automatically determine the document type.

documentInfo is also used to hold other information, including namespace prefixes, and there was some discussion within the working group about which components should be defined within documentInfo.

The principle that has been adopted is that documentInfo should be used to contain information that is necessary to correctly understand the rest of the document.
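Automatic detection of the document type might look like the following Python sketch. The documentType URIs listed are assumptions included for illustration and should be checked against the relevant specifications; the function name is hypothetical.

```python
import json

# Assumed documentType URIs; consult each specification for the
# definitive values before relying on them.
KNOWN_DOCUMENT_TYPES = {
    "https://xbrl.org/2021/xbrl-csv": "xBRL-CSV",
    "https://xbrl.org/2021/xbrl-json": "xBRL-JSON",
}


def identify_format(text: str) -> str:
    """Identify a JSON-based XBRL document from documentInfo.documentType."""
    document = json.loads(text)
    info = document.get("documentInfo") if isinstance(document, dict) else None
    if not isinstance(info, dict):
        raise ValueError("missing documentInfo object")
    document_type = info.get("documentType")
    if not isinstance(document_type, str):
        raise ValueError("documentType must be a string")
    return KNOWN_DOCUMENT_TYPES.get(document_type, "unknown")


example = '{"documentInfo": {"documentType": "https://xbrl.org/2021/xbrl-csv"}}'
print(identify_format(example))  # xBRL-CSV
```

Because documentType sits at a fixed location, a processor can decide how to interpret the rest of the file before examining any other property.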

11 Namespace prefix scoping

xBRL-CSV makes use of QNames and SQNames which rely on prefixes to refer to namespaces. xBRL-CSV does not require prefixes to be in scope at the point that they are used; a metadata file may legally use a prefix that is only defined by another metadata file that extends it.

The reason for this is that it is not always possible to determine which values contain prefixed content until all metadata files are consumed. For example, it is not possible to determine the expected type(s) of a parameter without knowing all the places in which that parameter is used.

As full validation of prefixes cannot take place until all metadata files have been processed, requiring processors to validate that prefixes were in scope at the point of use would be non-trivial additional validation.

It is worth noting that xBRL-CSV's use of prefixes is much simpler than that of XML, as each prefix can only be bound to a single namespace throughout all metadata files in an xBRL-CSV report.
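The deferred approach can be sketched in Python as follows: prefix bindings are first collected from every metadata file, and only then are prefixed names resolved. The location of the declarations (a "namespaces" object inside documentInfo), the function names and the example URI are assumptions made for illustration.

```python
def collect_namespaces(metadata_files):
    """Build one prefix map from all files, allowing one binding per prefix."""
    bindings = {}
    for metadata in metadata_files:
        declared = metadata.get("documentInfo", {}).get("namespaces", {})
        for prefix, uri in declared.items():
            # setdefault keeps the first binding; a different URI is an error.
            if bindings.setdefault(prefix, uri) != uri:
                raise ValueError(f"prefix {prefix!r} bound to two namespaces")
    return bindings


def resolve_qname(qname, bindings):
    """Resolve 'prefix:local' to (namespace URI, local name)."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in bindings:
        raise ValueError(f"cannot resolve prefix in {qname!r}")
    return bindings[prefix], local


# The base file uses the "eg" prefix but does not declare it;
# the declaration is supplied by the file that extends it.
base = {"documentInfo": {}}
extension = {"documentInfo": {"namespaces": {"eg": "https://example.com/taxonomy"}}}
bindings = collect_namespaces([base, extension])
print(resolve_qname("eg:Revenue", bindings))
# ('https://example.com/taxonomy', 'Revenue')
```

Since each prefix has exactly one binding across the whole report, a single flat map is sufficient and no scope tracking is needed.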

12 Validation of unused data and metadata

It is possible that metadata definitions will include values that are not used in a report. This is particularly likely where the metadata is defined by a regulator collecting data from a number of report preparers.

Three approaches to validating such values were considered: (1) validate values only when they are used in a report; (2) additionally validate unused values where this is possible; and (3) require (1), with the additional validation described in (2) available only as an explicitly specified processing mode.

An underlying principle is that all validators should return the same overall result for a given report. It was therefore not considered acceptable to require (1) while permitting additional validation, other than as an explicitly specified mode as per (3).

For example, a dimension value may be specified in metadata as a literal:

{
    "unit": "this is invalid"
}

It can be immediately determined that this value is invalid, and it seems undesirable to ever permit this in metadata.

Alternatively, a dimension value may be given as a parameter reference:

{
    "unit": "$unitParameter"
}

unitParameter may be the name of a parameter defined in the metadata or parameter CSV file, or the name of a column in the table in which it is used. In this case, the dimension can only be validated as part of processing a CSV file.

As some dimension values can only be validated upon use, the simplest approach to both specify and implement would be to require that all values are only validated upon use (option (1) above).

The downside of this simple approach is that errors in a metadata file will only be caught when used in a report. In a scenario where a preparer is attempting to create a report that complies with regulator-provided metadata, this is problematic as fixing the error will require an update to the metadata by the regulator.

Option (2) is more complicated: values must be validated even if unused, where this is possible, and the same validation must also be applied to all used values once parameters have been resolved.

It was decided that the benefits of this additional validation outweighed the additional implementation and specification burden. It was considered that the additional implementation burden did not justify specifying a separate additional "mode" for this behaviour.

12.1 Edge cases

Unused dimension values are only validated where they are specified as literal values. There are cases where a parameter reference can be resolved using only metadata, but these are not validated as it is possible that the values will be overridden.
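The distinction between literal values and parameter references can be sketched in Python as follows. The check applied to literals (rejecting empty values and values containing whitespace) is only an assumed stand-in for the real syntax rules of a unit dimension, the sketch ignores any escaping rules the specification may define for the $ character, and the function name is hypothetical.

```python
def check_unused_dimension_value(value: str) -> str:
    """Classify a dimension value in metadata that no report has used.

    Returns "deferred" for parameter references, which are left until use,
    and "valid" for acceptable literals. Raises ValueError for literals
    that can be rejected immediately.
    """
    if value.startswith("$"):
        # Parameter reference: the value may be overridden, so defer.
        return "deferred"
    # Assumed, simplified literal check standing in for real validation.
    if value == "" or any(ch.isspace() for ch in value):
        raise ValueError(f"invalid literal dimension value: {value!r}")
    return "valid"


print(check_unused_dimension_value("$unitParameter"))  # deferred
print(check_unused_dimension_value("iso4217:EUR"))     # valid
```

Calling the function with "this is invalid" raises an error straight away, which mirrors the intent that such a literal should be rejected even if no report ever uses it.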

Some dimensions (unit and language) may be excluded based on the datatype of the concept. Where both concept and one of these dimensions are obtained via parameter reference, there is a required order of validation: the concept dimension must be resolved and validated first, in order to determine whether the potentially-excluded dimension should be used and thus validated.

12.2 Future developments

Early work on the xBRL-CSV specification assumed that at least some of the metadata would be defined or customised by the preparer of the report. As the specification developed, it has become clear that the most likely use case is for a preparer to use an unmodified regulator-provided metadata definition.

As such, it is considered desirable to introduce a clearer separation between data and metadata, and to specify separate validation requirements for metadata. This may be considered in future versions of the specification.