Fencing Apparently Infinite Objects
Dragan Espenschied, Klaus Rechert, 2018
The digital preservation discipline has traditionally worked with object definitions in which object boundaries are mostly based on established metaphors like “files” and “records,” which seem largely self-explanatory and help with establishing systems for measuring completeness and preservation success.
As performative digital objects—“software” in the broadest sense rather than digital simulacra of documents—become targets of preservation, object boundaries appear increasingly “blurry”: for instance, many software applications are staged to look and behave like locally running binaries, when in fact an orchestration of networked processes is required for their operation. This ranges from simple software installers checking for license servers to desktop applications presenting networked resources such as maps, videos, or communication messages exclusively or mixed with local artifacts.
Furthermore, creating such performative objects has become increasingly accessible to general computer users, for example via popular frameworks like React Native, Electron, or Jupyter Notebooks (Ragan-Kelley, 2014), and is already an established practice in many areas like digital humanities, net art, and computer games.
It appears that the good old “desktop application” is becoming a metaphor similar to “files” and “records”: it quickly approaches the limits of its usefulness and creates conceptual limitations for preservation practice.
Hence, this paper explores an expanded definition of object boundaries for performative objects.
The demarcation of a digital object is usually done in an “at rest” state during storage, i.e. on a static representation of data while no computing activity is happening, and is therefore shaped by units of storage alone, such as files and storage media. As part of this practice, the term “object” is usually used synonymously with “file” or a collection thereof. Even the simplest file, however, depends on technical performance to transform it from a static bitstream-preserved artifact into an active object that is fit for human consumption or interaction. (Heslop, 2002)
For that reason, we propose to examine such objects in a “switched-on” state and to identify components based on their effect on the performative potentials of the object.
When it comes to software in the form of executables, it can be assumed that any binary that is fully locally available can be executed or performed via hardware emulation, re-creating the same potentials as if the binary were run on actual hardware.
Software preservation frameworks like EaaS (Rechert, 2012) have demonstrated that productive abstraction layers can be drawn by gradually separating “common” or “mass-produced” artifacts from “unique” ones; when put together, each component changes a system’s performance. For instance, every basic installation of a Microsoft Windows operating system offers the same capabilities, such as hardware abstraction and the possibility to execute a wide range of binaries; adding a copy of the popular QuickTime 6 player extends these with capabilities to replay certain types of media files; finally, a specific QuickTime movie is added which can be enacted by the previously combined components. Each artifact in this example is clearly _bound_ and therefore all potentials, known and unknown, can be reproduced.
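This layering can be made concrete with copy-on-write disk overlays, as used by emulation setups built on QEMU. The following is a minimal sketch, assuming qemu-img is installed; the image names are hypothetical:

```python
# Sketch: stack copy-on-write overlays so each layer stores only its own
# changes while the layers below remain untouched.
import subprocess

def add_layer(backing_image: str, new_overlay: str) -> None:
    # Create a qcow2 overlay backed by the layer below; anything installed
    # while the system runs on the overlay (e.g. QuickTime 6) lands there.
    subprocess.run(
        ["qemu-img", "create", "-f", "qcow2",
         "-b", backing_image, "-F", "qcow2", new_overlay],
        check=True,
    )

add_layer("windows-base.qcow2", "windows-quicktime6.qcow2")
add_layer("windows-quicktime6.qcow2", "windows-quicktime6-movie.qcow2")
```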
A _blurry_ boundary is introduced once a locally kept artifact performs interactions with remote objects or reacts to states specific to the context of the actual execution. Remote APIs or resources could be required on every abstraction layer: the operating system might query a license server (Windows XP), the QuickTime player’s installer might attempt to download system-dependent components (QuickTime 6.01), the QuickTime movie itself might embed remote resources via SMIL (supported since QuickTime 4). All of the technical interfaces designed to interact with remote resources in this example have changed over time or disappeared completely.
If an object is in its entirety located remotely, exposing an unknown range of possible performances, it must be considered _boundless_. A typical example might be any social media platform like Twitter, which provides many modes of access to items with complex relationships inside the platform and further remote sources, but offers no way to inspect the defining processes or even to create an index of provided items from an outside perspective.
In some cases, the computer code that causes interactions with remote resources might be removed from a digital artifact. Skipping calls to obsolete usage-tracking services or licensing checks could remove a major blockade and make a piece of software usable again. However, commercial, closed-source software (esp. in binary form) cannot be easily modified or adapted. Even if the effort might be economically justified—when it happens on a high abstraction layer like an operating system, positively affecting the re-performance of many other objects—simple (binary) patches that, let’s say, redirect requests to a different network target may not be sufficient to re-enable the object’s original behavior; more sophisticated patches may have unpredictable side effects.
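To illustrate how simple such a patch can be, and how constrained: the sketch below swaps a hard-coded license-server hostname in a binary for a local target of equal or shorter length. Filenames and the hostname are hypothetical; as stated above, this kind of intervention may fail or misbehave in practice.

```python
# Naive binary patch: replace a hard-coded hostname with a local target.
# The replacement must fit into the original bytes, since strings in
# compiled binaries are fixed in place and usually null-terminated.
old = b"activation.example.com"
new = b"127.0.0.1"
assert len(new) <= len(old), "replacement must not be longer than original"

with open("app.exe", "rb") as f:
    data = f.read()

patched = data.replace(old, new + b"\x00" * (len(old) - len(new)))

with open("app-patched.exe", "wb") as f:
    f.write(patched)
```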
In order to keep a blurry object operational, three different technical approaches are possible:
1) Auxiliary machines, e.g. databases, web servers, or even license servers, may be preservable using an emulation strategy and included in an execution environment for a class of software or a single specific artifact.
2) If auxiliary machines are not within the curator's reach, stub interfaces emulating the original external interface could be implemented to allow a reduced but functional set of communication with a locally available software object (a minimal stub sketch follows this list).
3) Finally, if a piece of software sends a finite number of queries in a fixed range to a remote system, the occurring network traffic could be recorded and bundled with the software. For instance, a software installer that downloads components from the web would be stored with a web archive containing the required resources.
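As an illustration of approach 2), here is a minimal stub license server built on Python’s standard library. The endpoint path and response body are hypothetical; in practice they would have to be reverse-engineered from the preserved software’s actual traffic:

```python
# Stub interface: answer every license check affirmatively, refuse the rest.
from http.server import BaseHTTPRequestHandler, HTTPServer

class LicenseStub(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/license/check"):
            body = b"STATUS=VALID"   # hypothetical expected response
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

HTTPServer(("127.0.0.1", 8080), LicenseStub).serve_forever()
```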
The option to fully preserve a computational service via emulation (1) seems to be the most desirable, as it offers the full range of potentials of the original setup. For bound units, emulating and orchestrating such services is technically within reach.
When it comes to web-based services such as YouTube, Twitter, etc., technical complexity and size pose limits. These objects are to be regarded as boundless: there is no way to preserve them while ensuring the continuous availability of all provided interfaces and potentials. Even if the technical infrastructure to create a complete copy of YouTube were available, the main purpose of preservation—reducing the actively maintained surface and maintenance frequency of an object by abstracting its complexity—would be economically unattainable. YouTube the service requires YouTube the organization to provide its full performative potential.
Even a small- to mid-scale web service under the control of the organization that seeks to preserve it might turn out to be spread across several virtual machines or to make use of external microservices, and thus requires new strategies and concepts.
Web archiving as a discipline has probably the longest development history with the preservation of black-box networked resources: by storing only the requests and responses occurring between a client and servers on the HTTP protocol layer, web archiving abstracts away any software running on remote servers, effectively creating documentary rather than performative resources.
Still, concepts for object boundaries have not been articulated with much clarity in web archiving. Either the web as a whole is defined as a single object, or, in the common scenario of crawler-based web collecting, object boundaries are assumed to technically match a hierarchical structure of URLs pointing to web resources, usually located under a single domain.
Both assumptions are problematic. “The whole web” doesn’t really exist, since the most relevant web sites today are interactive and customizable, so they appear different on each access and in each user’s (or robot’s) context. The responses to a request for data from a URL such as https://twitter.com naturally have to differ for each user in order for the service to make any sense. As Twitter in itself has no generalizable form, it remains boundless from the perspectives of web archiving and, as previously laid out, software preservation.
Additionally, in today’s web, within a single session, static resources, services, and complex JavaScript building blocks are pulled in from dozens of CDNs, service providers, social media sites, advertising networks, and more, in order for the browser to create the impression of a single web page object. Many web sites, such as Instagram, do not implement a hierarchical URL scheme at all.
Even if all URLs under a single domain could be accessed and stored, blurriness would occur under a host-centric boundary definition when the amount of meaningful requests and responses approaches infinity; this can easily be the case with, for example, database-driven sites where the relationship of request and response depends on the computation of complex or unknown states on client and server. For instance, while it is possible, given enough time and resources, to preserve every map tile graphic and its URL from a mapping service such as Google Maps by web archiving practices, it is impossible to store all possible requests and responses for lat/long coordinate queries to the Google Maps service.
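The asymmetry can be made concrete: tile URLs are enumerable, coordinate queries are not. A sketch, assuming the common z/x/y tile addressing scheme (the host name is made up):

```python
# Enumerate every tile URL up to a zoom level: a large but finite set.
def tile_urls(max_zoom: int):
    for z in range(max_zoom + 1):
        for x in range(2 ** z):        # 2^z columns per zoom level
            for y in range(2 ** z):    # 2^z rows per zoom level
                yield f"https://tiles.example.org/{z}/{x}/{y}.png"

print(sum(1 for _ in tile_urls(4)))    # 341 tiles up to zoom level 4

# By contrast, queries like "?lat=52.5200&lng=13.4050" form a practically
# infinite input space: no crawl can exhaust all request/response pairs.
```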
New web collecting mechanisms and concepts, as implemented by Rhizome's tool Webrecorder (Perricci, 2017) and described by Hawes as “the act of archiving,” have introduced the possibility of creating contextual object boundaries that take into account the perspective and intentions of the curator as a web user, shifting to storing HTTP traffic occurring during time-bound web sessions rather than relying on namespace-defined delimiters. Actions performed in collecting sessions are recallable in access sessions:
The recording represents the curator's own vantage point reflecting the specific _requests_ made (commands, clicks) — and tracing the actual path taken. Unlike a traditional web-crawler, which is provided with a seed URL and automated to explore a site in full, Webrecorder is curator-operated: subjectivity and selection replace automation and exhaustivity. (Hawes, 2018)
This approach has the curator defining a synthetic boundary inside a boundless object, consciously discarding any performative potential outside of it, and creating an attainable and verifiable goal: the actions performed during the capturing of the object need to be reproducible in the future.
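In spirit, though greatly simplified (Webrecorder itself writes standard WARC files), such a session-bound capture amounts to recording exactly the request/response pairs a curator triggers. A minimal sketch with made-up file names:

```python
# Record the HTTP exchanges of a curated session into a simple JSON store.
import json
import urllib.request

archive = {}

def record(url: str) -> None:
    with urllib.request.urlopen(url) as resp:
        archive[f"GET {url}"] = {
            "status": resp.status,
            "headers": dict(resp.headers),
            "body": resp.read().decode("utf-8", "replace"),
        }

# Only what the curator actually visits enters the archive; everything
# outside this path is consciously discarded.
record("https://example.org/")

with open("session.json", "w") as f:
    json.dump(archive, f)
```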
Digital preservation practitioners have been dealing with variable objects by defining “significant properties,” core attributes of objects that should resist change over time even as performance environments change (Laurenson, 2014). With blurry and boundless objects discussed above, all of which expose aspects of infinity, unavailability, and unknownness, significant properties cannot provide much guidance for preservation purposes. For instance, countless works of net art achieve their affect by performing computation on resources from all over the web, and make use of or are fully located on black box services. The same might be true for a modest Excel sheet that attempts to load tabular data from a remote data source.
Describing significant properties of that kind means setting preservation projects up for failure, as there isn’t a way to meaningfully address blurry or boundless objects within this framework. In the field of art conservation, this often leads to “remakes,” in which large parts of a dysfunctional piece are re-created from scratch. This process represents a very large “actively maintained surface” with significant economic demands.
This paper suggests defining _reproducible properties_: clearly stating which potentials or “paths” will be preserved from a curatorial perspective, and then taking the required preservation actions. These properties shall be reproducible in an objective manner, to guide the development or adaptation of future preservation strategies and technical systems.
The tighter such a performative boundary is drawn and the fewer reproducible properties are included, the closer an object moves from full performative preservation towards documentation.
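One conceivable way to operationalize this is a small, machine-checkable manifest per object; the schema below is illustrative only, not an established standard:

```python
# Sketch: a reproducible property as a record that a future preservation
# system could replay and verify; all field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ReproducibleProperty:
    description: str                             # curatorial intent, human-readable
    actions: list = field(default_factory=list)  # ordered, replayable steps
    expected: str = ""                           # objectively checkable outcome

properties = [
    ReproducibleProperty(
        description="Home page renders with the intro animation",
        actions=["boot environment", "launch browser", "load home page"],
        expected="screenshot matches stored reference within tolerance",
    ),
]
```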
The following curatorial measures can be used to define and preserve reproducible properties:
1) Network traffic occurring during certain performance states of a piece of software could be recorded, bundled with the preserved software, and made available to the software during re-performance (a replay sketch follows this list).
2) The breadth of possible execution paths varies throughout performance sessions of an object. A curator could identify the most relevant state in a session and snapshot the environment during performance, in a “switched-on state.” With that snapshot being stored and used for re-enactments later, the object becomes pre-configured with the resources available and the input that occurred before the freeze, making it less dependent on these sources being available in the future.
3) The access environment is modified to remove the user’s ability to perform certain actions; for instance, an emulation framework would prevent local input events like mouse clicks or key presses from being routed to the re-performed environment under certain conditions, preventing the user from bringing the system into undesired states.
4) The environment used for re-performance is configured or modified to reduce the potential breadth of certain computational performances by deactivating any interaction with external entities that are determined not to be relevant for the preservation goal.
5) Finally, the mise-en-scène of a preserved object could highlight certain affordances to the user inside the environment that lead to a successfully bound re-performance, while not technically preventing other paths from being explored. For instance, a browser could be configured to launch with a certain home page, a limited set of icons could be placed in the center of an otherwise empty desktop, and so forth. Rhizome has used this technique for its online exhibition program “Net Art Anthology”[1]. Similar techniques have been successfully used for gallery space (“offline”) exhibitions of net art (Espenschied, 2016).
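Measure 1) has a natural counterpart at access time: a replay component that serves only the recorded traffic and deterministically refuses anything outside the captured boundary. A rough sketch, continuing the recording example above (key scheme and file name remain illustrative):

```python
# Serve recorded responses back to the re-performed environment; any
# request outside the curated boundary fails deterministically with 404.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

with open("session.json") as f:
    archive = json.load(f)

class ReplayProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        hit = archive.get(f"GET https://example.org{self.path}")
        if hit is None:
            self.send_response(404)   # outside the preserved boundary
            self.end_headers()
            return
        body = hit["body"].encode("utf-8")
        self.send_response(hit["status"])
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("127.0.0.1", 8080), ReplayProxy).serve_forever()
```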
None of these methods provides a generalized solution, but they can be applied in combination to reduce an object’s performative breadth for preservation purposes. Their shortcomings become apparent with the following kinds of objects:
A major category of software in need of preservation will be (mobile) “apps.” Most apps have been designed to run on “always-on,” network-enabled devices, where the locally available executable only provides a viewer for remote content. Additionally, some apps rely on an extended user context such as GPS data. An out-of-context execution of such apps results in a defunct application. These circumstances are not tied to the rise of mobile apps: defining object boundaries based on locally running binaries has already caused the effective loss of software created in the late 1990s, when Windows’ OLE architecture made web resources easily available to developers (Moulds, 2017).
Also, the “traditional” software industry is changing, as app stores for desktop computers (e.g. Apple’s OS X App Store), so-called in-application purchases (with only basic software versions shipped to customers and further features added on demand), as well as remote applications like Office 365 or rolling releases like Windows 10 are becoming more popular. These new kinds of objects form a huge body of novel challenges for the preservation community, requiring not only technical analysis but increasingly curatorial agency. The aspiration of fully preserving any type of computational performance has to meet a reality of highly complex, networked ensembles, limited access to core components, highly context-dependent operation, and—as always—limited resources for preservation.
To address these challenges, the concept of object boundaries and reproducible properties can help to define preservation intentions and to measure the (future) success of preservation actions.
Working with boundless objects entails the option, and sometimes the need, of future changes, additions, or adaptations of preservation strategies. Reproducible object properties are therefore important to protect the results of past preservation activities. Furthermore, boundless objects require closer collaboration: different preservation initiatives may contribute different aspects of or views on the same object.
Similarly, (better) orchestration of different preservation strategies is a precondition. It is difficult to imagine a single strategy or technological solution for preserving complex, distributed, and multi-faceted objects. Web archiving, emulation, and migration are all able to contribute to an object's fidelity and future quality of access. A coherent way of orchestrating such objects—time context, technologies used, etc.—is yet to be defined.
[1] See https://anthology.rhizome.org/
Dragan Espenschied, Klaus Rechert. 2018. “Fencing Apparently Infinite Objects.” iPRES 2018. DOI 10.17605/OSF.IO/6F2NM. https://osf.io/6f2nm/. gemini://despens.systems/fencing-apparently-infinite-objects/.