Lost in the Cloud

Any software development process involves a fair amount of extraneous creation. Code is revised, documents are created and destroyed, and prototypes and demos are constructed, all in the pursuit of a final, stable digital object. Digital games add even more to this crush of documentation, with an unending multitude of art assets and proprietary file types and, often, little internal documentation to explain them. Since most development today relies on cloud storage and backup, code repositories, and all forms of digital communication across space and time, just finding out where everything is stored takes significant technical effort and time.

The team for Prom Week, the object at the heart of my current research for the NEH (info here), made use of numerous cloud services throughout the duration of the project. Fortunately, most of the documents are stored on only two services, Dropbox and Google Docs. Unfortunately, the organization is about as structured as I would expect from a rotating development team with intense time pressures and significant distractions. The Dropbox repository proved particularly onerous to analyze. Each team member had their own individual directory, which usually duplicated some files from another major folder. Aside from the duplicates, there is no real structure to the folder names or documentation. This was probably not a problem during development, since Dropbox is searchable and I’m assuming that, when this folder was active, each person responsible for a file knew where and what it was. As an outsider to the Prom Week development process, I can usually ascertain what a document relates to, but that is definitely due to the last few months I’ve spent researching the project.
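For the curious, here is a minimal sketch of how one might flag the duplicated files mentioned above, assuming Python 3 and a local copy of the folder. It groups files by a SHA-256 hash of their contents, so copies are caught even when they have been renamed. The function name find_duplicates and the example path are my own illustrations, not part of the Prom Week project.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash; return only groups with more than one file."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Hypothetical usage; the path below is an assumption, not the real folder name:
# for group in find_duplicates("/path/to/dropbox-copy"):
#     print([str(p) for p in group])
```

Hashing contents rather than comparing names is a deliberate choice here, since a file copied into a personal directory may well have been renamed along the way.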

I’m going to continue the focus on Dropbox for two reasons: first, the Google Documents for the project, while interesting and post-worthy, are only 24 in number and 5 in type, and second, the points I want to make about file extensions and confusion in the cloud are easier to argue when I’m dealing with the 1.8 gigabytes of haphazardly organized Dropbox data. Those nearly two gigabytes of information break down into 2,051 individual files spanning 4 years of creation and modification by 8 people. While this appears to be a rather small set of data, making sense of it and potentially using it turns out to be more difficult than even I had assumed. And I’m generally rather cynical about such things. The following post is mainly about file extensions, and is the first in a series on file formats and the cloud.
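If you want a quick census of your own folder, a tally of file extensions is a cheap first pass. The following is a rough sketch, assuming Python 3; the function name count_extensions and the example path are hypothetical. Extensions are lowercased so that .AU and .au count together, and files with no extension are grouped under "(none)".

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under root by lowercased extension; extensionless files go under '(none)'."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


# Hypothetical usage; the path is an assumption:
# for extension, total in count_extensions("/path/to/dropbox-copy").most_common():
#     print(extension, total)
```

A tally like this only says what the files claim to be; as the rest of this post shows, an extension is a hint, not a guarantee.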

The major issue for archiving such a collection of documents isn’t necessarily figuring out what they represent at the level of content. Although that does get quite hairy, the major issue I had with the documents is at a much lower level that I’ll discuss in a second. What a document is about can usually be ascertained from context, like the name of the document and its related folders, and from personal development experience. I’ve worked on many software projects with multiple collaborators, so I’m generally familiar with the types of documents created. To give a sense of Prom Week’s documentary complexity, here is a rough outline of the types of documents I found in just the Dropbox folder:

  • Research Papers
    • outlines
    • shared notes
    • major versions
    • revisions
    • abstracts
    • submissions
    • figures
    • templates
  • Conferences
    • posters
    • poster templates
    • presentations
    • ephemera documents and records
  • Assignments for undergraduate researchers
  • Data analysis maps
  • Demo and Test Programs for different game elements
  • Screenshots
  • Demo Videos
  • Game Files
    • background images
    • character art
    • sound files
    • video files
    • application project files for a specific Integrated Development Environment (IDE) (ActionScript project files)
    • application files
    • structured data files
    • operating system scripting files
    • system configuration files
    • IDE configuration files
    • source code files
      • game processes
      • data processing
      • asset management
      • software objects
      • data structures
    • user interface files
      • mock ups
      • test applications
      • database files
  • Developer Specific Folders
  • Secondary Creative Software Files
  • Web Site Resources
    • web embedding files
    • website icons
  • Backup Files

This list is assembled from document names and my personal understanding of Flash game development files. However, not all the files were easy to identify, which leads to what I consider the most pressing issue: file types.

There are many files in the folder that are obviously named but not easy to open. Essentially, you can know the context of a file (what type of file it should be) and still have no idea what program created it or how the data is organized. This leads to the three major ways I figure out how to read a file:

  1. Examine the Context
  2. Search the Internet
  3. Mine the Headers

I’m going to illustrate these approaches through examples of problematic files from the Prom Week Dropbox folder, illuminating the pitfalls of each approach.

The first example is the aptly named e0000d3a.au. Now you probably know exactly what type of file this is (aren’t you smart!) but I had no flippin’ clue. So the first thing I did was examine the context, and the file’s full path gave me a pretty big clue:

…/altprom/Jacob’s SFX/VOCAL/utterances/mohawk/mohawk_data/e00/d00/e0000d3a.au

Evidently this is a part of an audio file for the vocal sound effects for Prom Week. The ‘mohawk’ refers to (I think) an earlier version of the be-mohawked character in the earlier demos of the game.

I still don’t know what program is associated with the file extension .au, so I use the second approach: I search the Internet. The first hit is a Wikipedia page describing an audio format created by Sun Microsystems and popular on NeXT workstations and early Web sites. This seems totally off, since I’m positive that no one has used a NeXT machine at UCSC for at least 15 years and possibly never (cue angry UCSC NeXT users). Now NeXT’s operating system is the progenitor of Apple’s OS X (Apple announced its purchase of NeXT in late 1996), and that is a totally interesting topic, just not one for this blog post. In fact, my iTunes application detected the .au as an audio file but could not play it, which is a good sign it’s not a standard Sun .au file.

Looking at the Internet results again, I noticed that the second result is a FAQ answer for the open-source application Audacity. The site asks, “Why does Audacity create a folder full of .au files when I save a project?” Looking at the file in question, our friend e0000d3a.au, I see that she’s in a folder with a bunch of other .au files; there’s a big (.au)dacious party up in there. The FAQ page also mentions that there should be an .aup (Audacity Project File) associated with the .au Audacity Block Files and, sure enough, there’s an .aup file in a parent directory.
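The check I did by hand, looking upward from a block file for a project file, can be sketched in a few lines. This is only an illustration, assuming Python 3; the function name find_project_file and its max_levels limit are my own inventions, and the limit is there simply to stop the search from climbing all the way to the root of the disk.

```python
from pathlib import Path


def find_project_file(block_file, suffix=".aup", max_levels=5):
    """Search the block file's folder, then each parent in turn, for a file with the given suffix.

    Returns the first match (alphabetically within a folder), or None if nothing
    is found within max_levels folders.
    """
    folder = Path(block_file).resolve().parent
    for _ in range(max_levels):
        matches = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() == suffix
        )
        if matches:
            return matches[0]
        if folder.parent == folder:  # reached the filesystem root
            break
        folder = folder.parent
    return None


# Hypothetical usage, with the file copied somewhere convenient:
# print(find_project_file("/path/to/mohawk_data/e00/d00/e0000d3a.au"))
```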

Now I’ve figured out what type of file .au is referring to, and it makes sense that a student researcher on the project would use a free, open-source audio editor for an academic project. However, I still haven’t mentioned the third identification method, mine the headers, because I actually didn’t need to do that for this particular file. If I had, I would have seen this:

[Screenshot: Audacity Block File in vi editor. So many colors...]

The file clearly states in the header information that it’s associated with Audacity, so I could have examined that first and probably saved a bit of work. Regardless, I’ll explain the process for doing dirt-cheap header analysis on UNIX-based systems. I don’t generally use a Windows PC for anything but gaming, so most of the methodology on this blog will come from the technical context of the tools available on Mac OS X. All of OS X’s flashy graphical flourishes are underwritten by the BSD-derived Darwin operating system, which is UNIX compliant, so I’m using mostly common UNIX tools for my surface analysis.

A good many file formats, though not all, have some text-based header information at the ‘head’ of the file. You know, at the top. So if you open such a file in a format-agnostic text editor, any text encoded in the header will be readable and can tell you what type of file it is. To obtain the screenshot of the Audacity file above I used a terminal application, basically a program that lets a user interact with the command line interface to the Darwin OS running my Mac. Every Mac has the Terminal application installed, so you can follow along if you open it. If you’re on Linux I’m assuming you are already aware of how to access the terminal. When I’m in a command line interface, I just use the command: vi path-to-file to open the file in the vi editor. vi will open pretty much anything, though whatever isn’t encoded as text will show up as gobbledygook, like most of the Audacity file above. I keep files I just want to snoop on in a convenient place, so I copy them to my desktop temporarily. Therefore, the command to look at the audio file was:

vi ~/Desktop/e0000d3a.au
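If squinting at vi isn’t your thing, the same snooping can be roughed out in a script. The sketch below, assuming Python 3, reads only the first few bytes of a file and pulls out any runs of printable ASCII, much as the UNIX strings tool does. The function names peek_header and printable_runs are my own, and the 64-byte default is an arbitrary guess at how much header to read, not a property of any particular format.

```python
import re


def peek_header(path, size=64):
    """Return the first `size` bytes of a file without reading the rest."""
    with open(path, "rb") as handle:
        return handle.read(size)


def printable_runs(data, min_len=4):
    """Return runs of printable ASCII at least min_len characters long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


# Hypothetical usage, assuming the file was copied to the desktop as above:
# import os
# print(printable_runs(peek_header(os.path.expanduser("~/Desktop/e0000d3a.au"))))
```

Reading only the head of the file keeps this cheap even for large media files, at the cost of missing any identifying text stored further in.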

Okay, so I’ve covered the types of files and common methods I use to find out about file types. In the next few blog posts I’ll elaborate on how these methods can lead to some confusion with particularly knotty files, and discuss some other issues related to file formats, like versioning and dependent applications. I’ll also try to make them less than 1,537 words in length. Bye for now.

National Endowment for the Humanities

Last April I, along with Noah Wardrip-Fruin and Christy Caldwell at University of California, Santa Cruz (UCSC) and Henry Lowood at Stanford, received a National Endowment for the Humanities Digital Humanities Start-Up Grant. Our project proposal (online here) is a first pass on an archival and appraisal strategy for academically produced computer games. The focus of the research is on the process of creating computer games in an academic context. We are looking deeply at a game produced by UCSC graduate students and divining the trajectory of development and all the types of artifacts produced by such an effort. There is a general lack of knowledge about how game production functions in an academic research project and our goal is to shed some light on it, both through a technical dive into the development process and an archival narrative of object production.


The game we chose is Prom Week, a social simulation game produced by my lab (the Expressive Intelligence Studio) at UCSC. Although its selection is slightly self-serving, we needed full access to a development process and its resulting game. If we want to understand how the development process worked and aggregate all the different outputs it produced, then we have to choose something close to home. Private software development is notoriously insular and shielded, in an effort to protect intellectual property and development talent. Therefore we figured an academic game would provide more open tools and access, especially one in which I could just ask the developers questions if I ran into them at lab meetings, in the hallway or at the food truck.

Given that this type of work is new, we needed to find helpful examples to provide some initial guidance. The two major sources of inspiration are the 1983 Joint Committee on the Archives of Science and Technology (JCAST) report on scientific process, and the Preserving Virtual Worlds Report on game preservation issues. The JCAST report is essentially a detailed description of the problems inherent in the records management and archiving of the scientific process. It has less interest in the official publications, since those are generally clean and organized documents representing the output of a messy research process. JCAST is concerned with archiving the mess, specifically how research institutions should handle and evaluate the myriad artifacts that attend scientific research. This type of investigation seemed applicable to the archival treatment of video game development processes and has provided nice guidance so far. The other source, Preserving Virtual Worlds, was the first major government-funded research into the issue of game preservation specifically, and while not concerned with the game development process, it still highlights numerous types of documents and offers extensive technical warnings about the issues inherent in reproducing digital documents.

The process of my current work on Prom Week is informed by both reports and seeks a middle path to explain how the development process works in an academic context, what types of documents are produced and what technical issues one would face if they were crazy enough to actually archive it all.

A final note here. The academically produced games that we are concerned with are those pieces of digital entertainment software produced with a specifically teleological bent. They are designed to research some process or validate some novel system of play, design, or pedagogy with hopes of publishable academic results. This context is then slightly different from the corporate or independent development process, but I hope that many considerations will map across.

Some Research Goals

The problem is that most computer game history focuses on the outputs of the development process. A game is made, it plays a certain way, introduces a novel mechanic and gets recorded into the annals as a completed experience. Everything is packaged and ready for consumption. As a designer and a historian I find this type of perspective maddening. Development is a complex and fraught process, rarely clean and delineated. Developers and designers could learn more from knowing how a particular game was made than from the analysis of its final outputs. The issue is that in most commercial development all records of the development process are eliminated or sealed away after production. Some craft knowledge escapes through sporadic conference presentations and post mortem analysis in trade publications, but there is no place to go for a full technical history of level design, lighting design, environmental art, etc. I generally feel that if we develop ways to organize and share the creative process it will benefit the collective as a whole and allow even more amazing advancements in this medium I adore.

My research goals are then targeted at revealing process, technique and design, and then remediating it back to students, researchers, and professionals looking for answers to questions that have surely been addressed in secret hundreds of times. I aim for deep technical history. My goal is primarily pedagogical and organizational: I’m concerned with the long-term survival and parse-ability of process and craft, including novel ways to share and disseminate information.

As a solution I’ve chosen first to devote effort to the saving of digital artifacts in their entirety and the artifacts of their development processes. For without a basic archive, an intellectual and historical basis, learning and growth cannot even begin to take shape. There must be a firm foundation to stand on, and I figure I might as well help build it.

I’m currently working on two research projects devoted to archiving and organization: a National Endowment for the Humanities Digital Humanities Start-Up Grant to record and appraise the development records of Prom Week, a social AI game developed by graduate researchers at UCSC, and an Institute of Museum and Library Services National Leadership Grant hoping to reshape game cataloging, metadata and citation efforts. Both projects will be discussed extensively in future blog posts.

A Place for Research and Thoughts

Howdy. I’ve had a couple of people ask me to start putting my work up online. This quick, minimal template is the first stab at that. This site will change significantly once I have time to work on it. However, that time is not now. I’m flirting with writing my own blogging platform for the site, so that it can eventually capture elements of my research ethos and philosophy. For now I’m just trying to get something usable up as quickly as possible.

Posts here will be cross-posted to Twitter, Facebook (for my friends and associates), possibly Medium, and the Expressive Intelligence Studio blog when relevant.

You’ve found yourself here somehow, so welcome. More to come.