Greetings. My name is Sakuya Izayoi, and I am the chief maid of the Scarlet Mansion.
The Scarlet Mansion is known throughout Gensokyo for its vast library, consisting of kilometers of dusty bookshelves made to fit into a smaller space by dimensional manipulation.
However, a large part of the library’s knowledge is not contained within the books or within its aptly named librarian, but within electronic storage. Physical books are still necessary because certain magic spells need a physical object to be bound to, but electronic storage does a better job of cramming information into a small space than dimensional magic ever could.
The electronic storage of the Scarlet Mansion contains a catalog of every physical book in the library, research data generated by the Kappas of Youkai Mountain, feeds from the many dimensional gaps opened by the Yakumo house, a certificate from every soul passing before the Yama’s throne for judgment, a record of all donations received by the Hakurei shrine (a file less than 1 MB in size), and more. Some of this data is in relational databases or XML files and is easy to search and organize, but some of it is not.
And that’s where PADS comes in.
When you receive data from sources you’re interested in, often the source won’t be bothered to give it to you in a format you can easily process. The data comes to you in whichever arbitrary format is easiest for their machines (or even fairy slave labor in some cases) to output.
A line reading “U, U, U, U” is not as helpful as a database table.
Miss Knowledge and her assistant Koakuma take care of the physical books, but the task of managing the digital data falls to me, the chief maid. I tried writing a parser for some data using regular expressions and Perl, but such solutions are not easily reusable for data they were not specifically written for (not to mention regular expressions are difficult to read). The same thing holds for parsers written in C-family languages. The problem is made even worse by the fact that these arbitrary data streams are often buggy (particularly the ones entered by fairy slave labor), requiring error checking code that grows even larger than the parsing code itself.
In fact, most of the data feeds going into the Scarlet electronic libraries were unusable until I discovered PADS.
“Processing Ad hoc Data Sources”, or PADS for short, is a data description language from AT&T, a famous telecommunications company from the country known as “USA” in the outside world. It is used to parse data such as phone calls monitored for fraud, but has many other applications. It applies programming language concepts such as typing and recursion so that users can describe to the computer how the data is structured.
The example above is data from commuter train networks around the USA, some of which supply the great Yukari Yakumo with trains. The string of “U”s is described as an arbitrary-length array whose elements are separated (sep) by commas and which is terminated (term) by an “end of record” character, usually a line break. The elements of the array are described as “OptInts”, each of which is either “U” or an integer. You continue in this way until you have described to the computer how the data is structured, hence the term “data description language”.
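For those who would rather see the idea than read about it, here is a minimal Python sketch of the same description. It is not PADS syntax; the names `SEP`, `TERM`, `parse_opt_int` and `parse_record` are my own illustration, and representing “U” as `None` is an assumption on my part about what the feed means by it.

```python
from typing import List, Optional

SEP = ","    # element separator (the "sep" in the description)
TERM = "\n"  # end-of-record marker (the "term" in the description)


def parse_opt_int(token: str) -> Optional[int]:
    """An OptInt is either the literal "U" (returned as None) or an integer."""
    token = token.strip()
    if token == "U":
        return None
    return int(token)  # raises ValueError when the token fits neither case


def parse_record(line: str) -> List[Optional[int]]:
    """Parse one record: OptInts separated by SEP and terminated by TERM."""
    if line.endswith(TERM):
        line = line[: -len(TERM)]
    return [parse_opt_int(tok) for tok in line.split(SEP)]


print(parse_record("U, 12, U, 7\n"))  # [None, 12, None, 7]
```

The point of a description language is that the two small definitions above are the whole specification; everything else is generated from them rather than written by hand.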
Once the description has been given and applied to the data, PADS handles the error checking, telling you how many records in the arbitrary data did not fit the description. These errors can be sought out, allowing you to correct them (if they are indeed errors) or change your description to accommodate them (if they are actually not errors).
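As a rough illustration of that kind of report, the Python sketch below checks each record against the “U or integer” description and counts the ones that do not fit. The function names and the sample feed are invented for this example; PADS produces its own error reports, and this is only meant to show the shape of the idea.

```python
from typing import List, Tuple


def fits_description(line: str) -> bool:
    """True when every comma-separated element is "U" or an integer."""
    for token in line.split(","):
        token = token.strip()
        if token == "U":
            continue
        try:
            int(token)
        except ValueError:
            return False
    return True


def check_records(lines: List[str]) -> Tuple[int, List[Tuple[int, str]]]:
    """Return the count of good records and (record number, text) for bad ones."""
    bad = [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if not fits_description(line)
    ]
    return len(lines) - len(bad), bad


feed = ["U, 3, U, 8", "U, U, U, U", "4, ?, 9", "12, U"]  # made-up sample data
good, bad = check_records(feed)
print(good, bad)  # 3 [(3, '4, ?, 9')]
```

With the offending record numbers in hand, you can decide whether the fairy made a typo or whether “?” deserves a place in the description.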
While sometimes you would want a human familiar with the data to enter the description, PADS can also try to do it for you. If you specify what begins and ends each record, PADS can break down each record into its components.
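To give a feel for what “breaking a record into its components” can mean, here is a deliberately naive Python sketch: it splits the data on a record terminator you supply, classifies each token as an integer, a word, or a punctuation character, and tallies which record shapes occur. This is my own simplification with hypothetical names (`shape`, `infer_shapes`), not the method PADS actually uses.

```python
import re
from collections import Counter
from typing import List

# One token: an integer, a run of letters, or a single other non-space character.
TOKEN = re.compile(
    r"\s*(?:(?P<int>-?\d+)|(?P<word>[A-Za-z]+)|(?P<punct>[^\sA-Za-z0-9]))"
)


def shape(record: str) -> List[str]:
    """Return the token classes in one record, e.g. ['word', ',', 'int']."""
    record = record.strip()
    classes: List[str] = []
    pos = 0
    while pos < len(record):
        match = TOKEN.match(record, pos)
        if match is None:  # defensive; every non-space character is covered
            raise ValueError(f"cannot tokenize at position {pos}")
        kind = match.lastgroup
        # Keep punctuation literally so separators stay visible in the shape.
        classes.append(match.group("punct") if kind == "punct" else kind)
        pos = match.end()
    return classes


def infer_shapes(data: str, terminator: str = "\n") -> Counter:
    """Count how often each record shape occurs in the data."""
    records = [r for r in data.split(terminator) if r.strip()]
    return Counter(tuple(shape(r)) for r in records)


for record_shape, count in infer_shapes("U, 3\n4, 5\nU, 7\n").most_common():
    print(count, record_shape)
```

Shapes that occur often suggest the main structure of the feed, and rare ones are candidates either for an extra case in the description or for the error pile.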
When the data description is complete, the computer can generate tools from it. You can parse the data, sort it, export it into a database or an XML file, or do whatever else you want with it.
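As one example of such a tool, the Python sketch below exports already-parsed records to XML using the standard library. The element names `records`, `record` and `value`, the `missing` attribute, and the choice to represent “U” as `None` are all assumptions made for this illustration, not the output format PADS itself generates.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def records_to_xml(records: List[List[Optional[int]]]) -> str:
    """Serialize parsed records; None (a "U" in the feed) becomes missing="true"."""
    root = ET.Element("records")
    for record in records:
        record_el = ET.SubElement(root, "record")
        for value in record:
            value_el = ET.SubElement(record_el, "value")
            if value is None:
                value_el.set("missing", "true")
            else:
                value_el.text = str(value)
    return ET.tostring(root, encoding="unicode")


print(records_to_xml([[None, 12], [7]]))
```

Once the data is in a form like this, it can be searched and organized with the same tools as the rest of the catalog.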
In dealing with arbitrary data streams, PADS is as useful to me as my broom and time-stopping powers are in doing house cleaning. In fact, I would even say I can’t get through the day without PADS.
Chief Maid, Scarlet Mansion, Gensokyo
Proud user of PADS since 2004
(Information taken from the University of Washington colloquium lecture by Kathleen Fisher)