Explainer: What makes a data standard a standard?

23 May 2024

Written by members of the Data Foundation’s FDTA Roundtable

The term data standard has become common in laws and policies that directly affect businesses, investors, and government regulators, with benefits accruing across society. Even with laws like the Financial Data Transparency Act (FDTA) that set out expectations and definitions for data standards, there are differences in interpreting what falls under the term.

Language in the FDTA defines data standards most simply as “a standard that specifies rules by which data is described and recorded.” But in greater detail, the FDTA also describes data standards for financial services in that law as including “...a common nonproprietary legal entity identifier that is available under an open license,” and having the ability to “...render data fully searchable and machine-readable,” and as using “schemas, with accompanying metadata documented in machine-readable taxonomy or ontology models, which clearly define the semantic meaning of the data.” 

Standardization brings a practice or process into conformity, with the goal of consistency and efficiency across a range of actors or users. Standards can make an existing process less expensive, more productive, and more timely. Common standards used every day include UPC codes and QR codes.

A QR (quick-response) code is a two-dimensional matrix barcode that contains information that can be scanned and converted to something useful like a website URL. QR codes can be scanned by a commercial authenticator app or by a smartphone camera.

UPC codes are machine-readable images used to track product type and pricing, among other purposes. UPC barcodes can be “read” by any UPC-scanning machine or smartphone app. 
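
As a simple illustration of the kind of “rules by which data is described and recorded,” the UPC-A standard includes a check-digit rule that every scanner applies to validate a code. Below is a minimal sketch in Python; it is an illustration only, not GS1 tooling.

def upc_a_check_digit(first_eleven: str) -> int:
    # UPC-A rule: weight the 1st, 3rd, 5th, ... digits by 3 and the rest by 1,
    # then return the digit that brings the total up to a multiple of 10.
    digits = [int(d) for d in first_eleven]
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10

print(upc_a_check_digit("03600029145"))  # prints 2, so the full code is 036000291452

Because every compliant scanner applies the same rule, any device can verify that a scanned code is well formed before looking up the product it identifies.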

Both illustrate data standards read with modern technologies. Their widespread use stems from accessible technology: a broad range of scanning machines and apps can read the codes, providing the support system for each standard. These are well-known examples of successful standards.

What is a “Standard”?

The International Organization for Standardization (ISO) and its sister organization, the International Electrotechnical Commission (IEC), define a “standard” as “a document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context.”

In the context of the FDTA, five key characteristics define a standard in practice:

Global, widely used.

The term “standard” refers to something that is widely used. For example, billions of barcodes and QR codes are scanned every day around the world and convey information through a common framework. When an application or process is used by numerous entities, it becomes a standard or common practice. 

Supported by a competitive marketplace.

Because the QR code and the UPC code are widely used, their technologies are broadly supported by an infrastructure of tools that create and distribute the codes, perform the scanning and collect the data reported by each code. 

GS1 is the nonprofit standards organization responsible for maintaining the UPC code standard, commonly referred to as the “barcode” in the United States, which can be freely used by any market participant. The competitive market keeps it inexpensive to obtain codes and to extract and collect data.

The QR code was created by the automotive company Denso Wave because the UPC code could not hold the amount of data that needed to be recorded for automotive products with their numerous parts. While Denso Wave owns the QR patent, it made the standard open and freely available for creation, distribution, and use. Because the QR code is open, nonproprietary, and readable by cameras, it gained widespread use and a support system when smartphones began to include built-in cameras.

Open, nonproprietary.

Openness and nonproprietary status are important characteristics for regulatory adoption and access because they allow rapid adoption by the market and keep costs low. Open, nonproprietary standards may be safer, less risky options for regulators than proprietary standards: they are freely available and not subject to commercial business needs, which could otherwise lead to discontinuation of the technology or application, changes in features, or unexpected price increases.

Machine-readable.

Machine-readable data are “data in a format that can be easily processed by a computer without human intervention while ensuring no semantic meaning is lost,” as defined by the OPEN Government Data Act (Title II of the Foundations for Evidence-Based Policymaking Act of 2018).

Scanning a UPC code or QR code automatically and unambiguously captures the information it represents with no need for manual review or data entry.  

The FDTA aims to similarly automate the capture of financial and business data. A person can understand a number on a financial table by looking at the corresponding rows and columns for context: the number represents, for example, product revenues, reported in thousands of US dollars, for the year ending 12/31/22.

A computer needs the same kind of context transported along with the fact to know what it means. PDFs and spreadsheets are electronic files that can be sent electronically, but the data inside them is not machine-readable because the context is not digitally attached to each fact. A data standard that renders facts machine-readable must be able to accurately and consistently represent and transport that context, and in the case of the FDTA, semantic meaning, along with each fact.
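
As a rough sketch of what “context digitally attached to each fact” can look like, the snippet below uses a simplified, hypothetical structure in Python; it is not the actual format prescribed by the FDTA or any particular standard, and the field names are illustrative only.

import json

# One reported fact with its context attached, so a computer can interpret it
# without human intervention. All field names are hypothetical.
fact = {
    "concept": "ProductRevenues",   # what the number means
    "value": 1250,
    "unit": "USD",
    "scale": "thousands",           # reported in thousands of US dollars
    "period_end": "2022-12-31",     # the year ending 12/31/22
    "entity": "ExampleCo",
}

print(json.dumps(fact, indent=2))   # any system can parse this without a human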

Taxonomy and ontology.

The terms “taxonomy” and “ontology” refer to systems of classification. Similar in concept to classification in biology, the approach applies to information systems that send and receive data, distinguishing information and figures that are alike from those that are not. A taxonomy classifies information into categories and sub-categories. An ontology describes knowledge by defining concepts and characteristics within a particular domain and establishing the relationships among them. Together, taxonomies and ontologies establish an organized system, clarify definitions, and make connections among related data. Effective data standards require both: classification and relationships.

Taxonomies and ontologies express data in a standardized framework that defines what can be reported, how reported data relates to other data, and how data should be presented. For example, a taxonomy for a financial statement includes a hierarchy of terms that illustrates how line items should appear on a balance sheet.

Ontologies provide structured “metadata models” that consistently convey the information about the data needed for unambiguous understanding. Metadata (data about the data) may include a definition, a label, and a data type (whether the value is monetary, an integer, text, etc.). These elements are vital for a concrete shared understanding and can enable deep analytics and the future application of technologies associated with artificial intelligence (AI).
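
As a loose sketch of how a taxonomy fragment and its metadata might be organized, the Python snippet below uses made-up concept names; it is not drawn from any published taxonomy.

# Each concept carries metadata (label, data type, definition) and a parent
# that places it in the reporting hierarchy. All names are hypothetical.
taxonomy = {
    "Revenues": {
        "label": "Revenues",
        "data_type": "monetary",
        "definition": "Total income from sales of goods and services.",
        "parent": "IncomeStatement",
    },
    "ProductRevenues": {
        "label": "Product revenues",
        "data_type": "monetary",
        "definition": "Income from sales of products.",
        "parent": "Revenues",
    },
}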

The taxonomy structure enables data standardization by expressing a consistent data model. Machine-readable digital business and financial data reports are generated by referencing the taxonomy. 

Digital reports can be prepared in schema formats like XML or JSON without using a taxonomy, but those reports will not produce standardized data. An XML file, for example, that carries machine-readable data but is not associated with a taxonomy or schema must itself contain all the contextual information that definitively describes the data.

When a taxonomy is used to create machine-readable information, data is more consistent, more comparable, more efficiently produced, less costly, and more timely because the data can be automatically and rapidly ingested, consumed, and used. Most of the contextual information needed to fully understand a fact does not need to be incorporated in the reporting file because it is already in the taxonomy (which the reporting file references). The taxonomy serves as the “single data model” so that every digital report that references the taxonomy is structured in the same, standardized way.
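
Continuing the hypothetical sketches above, a digital report that references a taxonomy needs to carry only the facts themselves plus a pointer to the taxonomy; labels, definitions, and relationships are resolved from the taxonomy rather than repeated in every report.

# Hypothetical illustration: the report stays small because its context is
# resolved from the referenced taxonomy (the "single data model").
taxonomy = {  # a one-concept excerpt of the toy taxonomy sketched earlier
    "ProductRevenues": {"label": "Product revenues", "data_type": "monetary"},
}

report = {
    "taxonomy": "https://example.org/toy-taxonomy/2022",  # hypothetical reference
    "facts": [
        {"concept": "ProductRevenues", "value": 1250,
         "unit": "USD", "scale": "thousands", "period_end": "2022-12-31"},
    ],
}

fact = report["facts"][0]
label = taxonomy[fact["concept"]]["label"]  # context comes from the taxonomy
print(f"{label}: {fact['value']} thousand {fact['unit']}")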

Conclusion

Effective standards have certain common characteristics: they are widely used, supported by a competitive marketplace of tools, and open and nonproprietary. Successful data standards have these characteristics and more: they render data machine-readable and are supported by taxonomies that consistently express the data model.

When you ask what makes a data standard a standard in the context of the FDTA, the law provides definitions and functions for the related data standards. But to understand why those characteristics and definitions matter, we look to the integrity, efficiency, reliability, and timeliness of the data and information shared under those structures.

Additional resources:

  • Ritz, D. and T. Randle. 2023. Implementing the FDTA: From Data Sharing to Data Meaning. Washington, D.C.: Data Foundation. https://www.datafoundation.org/implementing-the-fdta1
  • XBRL US. 2023. Data Standards & the Financial Data Transparency Act (FDTA). https://xbrl.us/research/data-standards-and-fdta/