Crushing It: Central Bank of the Russian Federation Embraces Granular Data
This is a guest post by Stanislav Korop, Deputy Director of the Data Governance Department at the Bank of Russia. It is based on his 15 April 2021 session at Data Amplified Virtual, catching us up on the Bank’s new xBRL-CSV framework, which has just been launched for Russian filers.
Like many regulators we find ourselves wanting to collect more and more detailed information, and to analyse and aggregate it ourselves in different ways. In other words, we need to be able to handle granular data as effectively and simply as possible, minimising the technological burden of such huge volumes of data. As one of the first to implement xBRL-CSV as a new XBRL format to help us achieve that, I will be exploring here the background, roadmap and outcomes of our xBRL-CSV deployment at the Central Bank of the Russian Federation.
The Russian context
As the country’s financial regulator, the Bank of Russia collects a large amount of data to understand the health of supervised companies and ensure their stability. Our work on xBRL-CSV is the latest advance in the project I lead on implementation of XBRL-based reporting for non-credit financial institutions. In other words, our scope – so far – extends to all financial companies except for banks. We started work in 2015, and began collecting reports in XBRL in 2018.
We divide the project into four streams: organisational, incorporating several pilots; methodological, including the taxonomy and legislation; IT, covering the filing platform and software; and change management, which includes training for stakeholders. You can see the detailed roadmap, alongside other slides, in my updated (end April 2021) conference presentation here.
A lot of work has been done in every stream, but a few specifics are particularly worth pointing out. It took us only two years to develop what I suspect is the world’s most complex taxonomy. Right now it consists of approximately 17,000 unique tags, and 25,000 validation rules. That makes it very challenging to update and maintain, but so far we are equal to it!
We established a provisional jurisdiction very early on, and a permanent jurisdiction when we began XBRL data collection. XBRL RU is now working very effectively, with over 90 members. It has become a place to discuss and share opinions, and it is helping us a great deal.
We also provide two software services to help our regulated companies file XBRL reports. I believe that we are one of very few regulators around the world that have provided free software. I mention the software and jurisdiction because these have proved to be two important tools in enabling us to successfully start collecting financial information in XBRL.
Looking forward, in 2021 we have three new markets approaching the XBRL transition, so by the end of the year the only non-credit financial institutions not filing in XBRL will be microfinance companies and digital financial assets infrastructure participants, and we aim to convince those of the benefits of XBRL over the following year. I hope that by the end of 2022 we will have the scope of our project fully covered, and we will be able to start talking about XBRL for credit institutions. That alone is a very difficult and challenging project – which is what makes it exciting!
The transition to granular data
I certainly agree with John Turner that the world is changing fast. When we were designing our software systems in around 2016-17, we and our colleagues from our supervisory departments really did not anticipate that granular data would become useful and that we would want to collect it to carry out deeper, more detailed and more flexible analyses. We all thought that aggregated figures were the way to go, and we did not foresee changing that paradigm.
Yet by 2018 we already had three forms – two from pension funds and one from securities market participants – that can be considered to require granular data. That was a problem, because neither our IT systems nor traditional XBRL itself was designed to handle that kind of very detailed data. The data is not complex per se; in fact, it is basically a simple list, albeit a very long one with five or ten million rows.
That means, however, that we are generating absolutely enormous files. For example, form No. 0420257, which deals with pension funds, created XBRL reports up to 50 GB in size and zip files up to 1.8 GB, with average files around half that.
That is clearly a heavy burden on IT systems, and at the Bank of Russia we also had two particular technical constraints that prevented us from collecting the data in the normal way. File size limits within the system stopped users from uploading very large zip files to the data collection system, and then made it difficult to validate and process larger files ready for storage and analysis.
We found ourselves, in some cases, accepting files in a custom CSV format, and manually loading them into our system, bypassing the validation rules. Clearly, this was a far from optimal approach! The most obvious workaround was to cut these large submitted files into several pieces, and merge them in our system. However, Russian rules require that each file should be business meaningful – or in other words, complete. This ‘solution’ would therefore have required significant legislative and IT updates, without really addressing the fundamental issue of excessive file sizes. We needed a solution that would reduce technological costs and simplify – rather than complicate – the collection process for granular data.
At around this time, we heard the news that XBRL International was developing the Open Information Model (OIM), and that there was a new xBRL-CSV specification available in draft form. It sounded perfect – you just continue to use the same reporting metadata, i.e. the XBRL taxonomy, combined with the most appropriate format for granular data transfer, namely comma-separated values, or CSV, and your files become as light as possible. So we decided to study it and test whether it was suitable for us to use.
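To give a flavour of the approach, here is a minimal, hypothetical sketch of what an xBRL-CSV report looks like: a small JSON metadata file binding CSV columns to taxonomy concepts, plus plain CSV files carrying the bulky facts. The property names loosely follow the xBRL-CSV specification, but the taxonomy URL, namespace, concept names and column layout below are invented for illustration and are not the Bank of Russia’s actual taxonomy.

```python
import json

# Hypothetical xBRL-CSV metadata file. Structural property names
# ("documentInfo", "tableTemplates", "tables") follow the xBRL-CSV
# specification; everything else here is an invented example.
metadata = {
    "documentInfo": {
        "documentType": "https://xbrl.org/2021/xbrl-csv",
        "namespaces": {"ex": "http://example.com/taxonomy"},
        "taxonomy": ["http://example.com/pension-fund-taxonomy.xsd"],
    },
    "tableTemplates": {
        "holdings": {
            "columns": {
                "security_id": {},  # row identifier column (illustrative)
                "market_value": {"dimensions": {"concept": "ex:MarketValue"}},
            },
        }
    },
    "tables": {"holdings_table": {"template": "holdings", "url": "holdings.csv"}},
}

# The millions of facts themselves travel as ordinary CSV rows,
# with none of the per-fact tag and attribute overhead of XML:
holdings_csv = (
    "security_id,market_value\n"
    "RU000A0JX0J2,1250000.00\n"
    "RU000A0ZYBS1,987654.32\n"
)

print(json.dumps(metadata, indent=2))
```

The key point is that the taxonomy and the JSON metadata describe the table structure once, so each additional row costs only its own values.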
Success with xBRL-CSV
Here I must thank our colleagues from Fujitsu, with whom we worked very closely on our 2018-19 xBRL-CSV trial. We generated record-based data for testing and assessed collection using the new xBRL-CSV format, achieving highly promising initial results. For tables with more than 1,000 rows, we obtained a 30-fold reduction in file size. For example, a table with one million rows and 63 million facts produced a report with an uncompressed file size of 22 GB using traditional XML-based XBRL, and only 0.7 GB using xBRL-CSV. (Each xBRL-CSV report is also accompanied by a JSON metadata file of fixed negligible size, 28 kB in our example, describing the structure of the CSV files.)
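The scale of that reduction is easy to reproduce in a back-of-the-envelope way. The sketch below is illustrative only – the element and attribute names are simplified stand-ins, not real xBRL syntax – but it shows why per-fact XML overhead dominates for long fact lists:

```python
# Compare a verbose XML-style fact list with the same values as CSV rows.
# In XML, the tag and attribute boilerplate is repeated on every fact;
# in CSV, the structure is stated once in the header.
def xml_fact(row_id: int, value: float) -> str:
    # One fact in a simplified, xBRL-XML-like serialization (illustrative).
    return (f'<fact contextRef="ctx{row_id}" unitRef="rub" decimals="2" '
            f'name="ex:MarketValue">{value:.2f}</fact>\n')

rows = [(i, i * 1.05) for i in range(100_000)]

xml_size = sum(len(xml_fact(i, v)) for i, v in rows)
csv_size = len("row_id,market_value\n") + sum(
    len(f"{i},{v:.2f}\n") for i, v in rows
)

print(f"XML-like: {xml_size:,} bytes; CSV: {csv_size:,} bytes; "
      f"ratio: {xml_size / csv_size:.1f}x")
```

The exact ratio depends on how verbose the serialization is; real xBRL instances also carry context and unit declarations, so the overhead – and therefore the gain from CSV – is larger still.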
We were greatly impressed by those outcomes, as were our IT department colleagues. Put into practice, it would significantly reduce the unnecessary data-transfer burden generated by these forms. In fact, the results were so good that we almost immediately decided to implement the technology in the data collection system of the Bank of Russia.
That was an endeavour that took us approximately nine months as we developed and adjusted our XBRL engine (you can see the roadmap in my slides). While initial work was based on the xBRL-CSV Candidate Recommendation, we have continued to review it as the specification has evolved to reach its final stage. Another important improvement in our system was to increase file size limits. We now have two separate, parallel options for filers that allow them to submit using either traditional XBRL or xBRL-CSV. Importantly, in either case files are being successfully validated and transferred to our data warehouses.
We carried out pilot data collection using the new system at the beginning of 2021. The results with real-world data were not quite as impressive, achieving a 15-fold reduction in file size, but still very positive. We have also started to explore the theoretical possibility that in some cases we might be able to achieve substantially greater reductions.
We hope that the system will be relatively futureproof, and we have increased the size constraint on zip files from around 2 GB to 5+ GB. Given that XBRL is usually archived with very high compression rates of over 95%, that gives us good assurance that as we switch to more and more granular data we will still have headroom available without needing to upgrade every year. Right now we are looking into receiving additional granular data for insurance markets, and while we have not estimated the volume it could be considerable.
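Such high compression rates are plausible because XBRL reports are extremely repetitive. This toy Python check – synthetic, simplified data, not a real report – shows how well DEFLATE, the algorithm behind standard zip, handles that kind of structure:

```python
import zlib

# Generate highly repetitive, xBRL-XML-like content: every line repeats
# the same tag and attribute boilerplate, with only the numbers varying.
facts = "".join(
    f'<fact contextRef="ctx{i}" unitRef="rub" decimals="2" '
    f'name="ex:MarketValue">{i * 1.05:.2f}</fact>\n'
    for i in range(50_000)
)
raw = facts.encode("utf-8")
compressed = zlib.compress(raw, 9)  # maximum DEFLATE compression level

saving = 1 - len(compressed) / len(raw)
print(f"{len(raw):,} bytes -> {len(compressed):,} bytes ({saving:.1%} saved)")
```

On real reports the Bank observes savings of over 95%; this toy example already demonstrates the mechanism, since the repeated boilerplate compresses to almost nothing.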
One other important part of the rollout was to issue training materials for IT vendors. This supported the upgrades to regulated companies’ IT systems required for the switch to xBRL-CSV.
A (first) drop in the ocean
While we have embraced granular data collection, it’s fair to say that the development remains largely unnoticed as yet. That is because only around 3-5 companies currently produce these huge files. The vast majority of the data we collect today is not granular, and traditional XBRL works well for most reporting. So why did we go through with this project for the sake of a very small number of regulated companies?
The answer, of course, is that we are not stopping here with the current achievements of the project, and I believe that our new xBRL-CSV capacity will be very valuable to us. Our need for granular data is only likely to grow, and we are even now discussing implementation of granular data collection for insurance companies and expansion for securities market participants and pension funds.
We are proud to be among the first regulators to deploy xBRL-CSV, and we are very grateful for all the help of Fujitsu, XBRL International and the XBRL community. In fact, we even took Data Amplified as our springboard to officially launch the new framework with the publication of guidance documents on the Bank of Russia website. We’re looking forward to analysing our first filings and bringing you the next chapter in the story.