Why Use Self-Describing Files?

Before answering the question posed by the title of this section, it is necessary to define what a self-describing file is. As used here, data in self-describing files has the following attributes:

The data is accessed by name and by class. For example, one might ask for "the column of data called X" or "the array of data called Y". Self-describing data is not accessed by position in a file; one would not ask for "the third column of data".
Various attributes of the data that may be needed in order to use it are available. For example, one can ask "what are the units of column X?", "what is the data type of array Y?", or "how many dimensions does array Y have?".
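
The two attributes above can be illustrated with a small in-memory model. This is only a sketch in Python, not part of SDDS or its libraries: the Column and Dataset classes, their fields, and the sample values are all hypothetical, and only columns (not arrays) are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical column: the values plus the attributes that describe them."""
    name: str
    type: str                      # e.g. "double" or "long"
    units: str = ""
    values: list = field(default_factory=list)


@dataclass
class Dataset:
    """Hypothetical container that hands out columns by name only."""
    columns: dict = field(default_factory=dict)   # maps name -> Column

    def add_column(self, column):
        self.columns[column.name] = column

    def column(self, name):
        # Lookup is by name; the order in which columns were stored is irrelevant.
        if name not in self.columns:
            raise KeyError(f"no column named {name!r}")
        return self.columns[name]


data = Dataset()
data.add_column(Column("Count", "long", "", [3, 7, 2]))
data.add_column(Column("X", "double", "m", [0.0, 0.5, 1.0]))

x = data.column("X")               # "the column of data called X"
print(x.units, x.type, x.values)   # "what are the units of column X?"
```

A caller never needs to know that X happens to be stored second; it asks for X and then queries its units and type.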

The primary advantage of accessing data and its attributes by name, rather than by position as is traditional, is that one can then construct generic tools to manipulate data. Self-describing data contains the information that tools need to manipulate various types of data correctly. For example, one can plot data with a generic tool that accepts the names of the quantities to plot; such a tool will be able to plot data of different types (e.g., integer or floating-point) and display relevant information (e.g., units) on the plot.

Another advantage of self-describing data is that it makes the interface between programs more robust and flexible. Since programs look for data only by name, inserting additional data into a file does not affect them. Multiple programs may interface to a single program even if they differ in what data each places in its output files. For example, program A may create data in single precision, with columns called X, Y, and Z. Program B may create data in double precision, with columns called X, Y, and W. If all programs employ self-describing files, then a properly written program C could access X and Y from the output of either program A or B. It could also determine that the output of program B does not contain data called Z, and warn the user of this.
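
The program A, B, and C scenario can be sketched in a few lines of Python. This is an illustration only: the dictionaries stand in for the contents of self-describing files, and select_columns, the type labels, and the sample values are hypothetical rather than anything defined by SDDS.

```python
import warnings


def select_columns(dataset, required, optional=()):
    """Return the named columns, warning about optional ones that are absent."""
    missing = [name for name in required if name not in dataset]
    if missing:
        raise KeyError(f"required columns not found: {missing}")
    selected = {name: dataset[name]["values"] for name in required}
    for name in optional:
        if name in dataset:
            selected[name] = dataset[name]["values"]
        else:
            warnings.warn(f"column {name!r} not found; continuing without it")
    return selected


# Stand-in for the output of program A: single precision, columns X, Y, Z.
output_a = {
    "X": {"type": "float", "values": [1.0, 2.0]},
    "Y": {"type": "float", "values": [3.0, 4.0]},
    "Z": {"type": "float", "values": [5.0, 6.0]},
}
# Stand-in for the output of program B: double precision, columns X, Y, W.
output_b = {
    "W": {"type": "double", "values": [9.0, 8.0]},
    "X": {"type": "double", "values": [1.5, 2.5]},
    "Y": {"type": "double", "values": [3.5, 4.5]},
}

# "Program C" uses the same call for both inputs.
from_a = select_columns(output_a, ["X", "Y"], optional=["Z"])
from_b = select_columns(output_b, ["X", "Y"], optional=["Z"])  # warns about Z
```

The extra column W in the second data set, and the different column order, have no effect on the caller; only the absence of Z is reported.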

The SDDS file protocol incorporates these aspects of self-describing data. It has been found extremely valuable for storing data from simulation, experiment, and accelerator operation at the Advanced Photon Source (APS). SDDS is made more valuable by the existence of a growing "toolkit" of over 40 generic command-line programs that perform many varied operations using SDDS files. Indeed, while there are more general self-describing protocols than SDDS, to the author's knowledge only SDDS has a powerful, generic program toolkit built around it. In the author's opinion, this is possible because the SDDS protocol is general but not too general. The SDDS Toolkit is used to postprocess simulation output, to analyze experimental and archival data, to prepare data for input to other programs, to provide a bridge between separate simulation codes, to display data graphically, to collate and section accelerator save/restore files, and much more.

While it is very flexible, SDDS is also fairly simple. Because SDDS features interchangeable binary and ASCII formats, it is an easy matter to create an SDDS data set "by hand" when necessary. It is also easy to modify existing programs to write their output in SDDS protocol, and to create headers that convert existing text data to SDDS. At the same time, data archivers, large-scale simulations, and similar applications can store data in binary for quick access and disk economy. These and other features contribute to the widespread use of SDDS at APS.
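
As a rough illustration of how little is involved in converting plain text data, the following Python sketch prepends an ASCII header to rows of numbers. The helper sdds_ascii and the sample columns are hypothetical. The header layout shown (an SDDS1 version line, one &column entry per column, and an &data entry with mode=ascii and no_row_counts=1) is written from memory of the protocol and should be checked against the protocol definition in the next section before being relied upon.

```python
def sdds_ascii(columns, rows):
    """Build ASCII text with an assumed SDDS-style header followed by data rows.

    columns: list of (name, type, units) tuples; units may be "".
    rows:    list of rows, each a sequence with one value per column.
    """
    lines = ["SDDS1"]                          # assumed version line
    for name, type_, units in columns:
        entry = f"&column name={name}, type={type_}"
        if units:
            entry += f", units={units}"
        lines.append(entry + " &end")
    # Assumption: no_row_counts=1 lets the rows follow without a count line.
    lines.append("&data mode=ascii, no_row_counts=1 &end")
    for row in rows:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


text = sdds_ascii(
    [("X", "double", "m"), ("Count", "long", "")],
    [(0.0, 3), (0.5, 7)],
)
print(text)
```

Because the header is the only part added, the same approach works for an existing text file: generate the header once and concatenate the original rows after it.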


Hairong Shang 2004-01-16