A Confederation of Tools for Capturing and Accessing Collaborative Activity
Scott Minneman
Xerox Palo Alto Research Center (PARC)
3333 Coyote Hill ...
Perhaps the most successful implementation decision in Coral was to develop it such that minimal buy-in was required for an application to begin participating in the Coral framework. A program that wished to become an indexing client simply needed to locate a master WhereWereWe object and submit Events. Simple dedicated indexing clients can be written in less than a page of Python; piggybacking on Tivoli or Emacs requires minimal initial programming investment. This meant that, after minimal modifications, existing programs could participate in activity capture settings without major interruption to their developers' own research agendas.(15) Coral's basis as a loose confederation has proved very powerful, because applications can choose to participate at a variety of levels.
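To make the low buy-in concrete, the following is a minimal sketch of a dedicated indexing client of the kind described above. The actual WhereWereWe API is not reproduced in this text, so the class and method names here (WhereWereWe, Event, submit) are illustrative assumptions, not the real interfaces.

```python
# Hypothetical sketch of a minimal Coral indexing client.
# All names (WhereWereWe, Event, submit) are assumptions.
import time

class Event:
    """A timestamped index mark, as submitted by an indexing client."""
    def __init__(self, kind, payload, when=None):
        self.kind = kind
        self.payload = payload
        # Absolute time, per the paper's use of absolute timestamps.
        self.when = when if when is not None else time.time()

class WhereWereWe:
    """Stand-in for the master WhereWereWe object a client would locate."""
    def __init__(self):
        self.events = []
    def submit(self, event):
        self.events.append(event)

# A dedicated indexing client needs only these two steps:
www = WhereWereWe()                            # 1. locate the master object
www.submit(Event("note", "decision reached"))  # 2. submit Events
```

In the real system the master object would be located over the network (via ILU) rather than constructed locally, but the client-side shape of the interaction is this small.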
Generalizing and extending the infrastructure turned out to be more difficult than desired. For example, the inclusion of new media types, perhaps higher quality audio or video, required that WhereWereWe be recompiled. While not a fundamental obstacle, this ran counter to the spirit of a collection of loosely connected elements. We have since redesigned and repartitioned the Coral infrastructure to include the notion of independent media servers, which implement and serve all media-specific functionality. Something like the MIME types mechanism determines what media server is needed for a particular stream, and a broker connects to an existing instance or launches a new one.
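The broker arrangement just described can be sketched as follows. This is an illustration of the design idea only; the class names, the registry scheme, and the media-type strings are all assumptions, not the actual Coral interfaces.

```python
# Illustrative sketch of the media-server broker: MIME-like type
# strings map to media servers, and the broker reuses a running
# instance or launches a new one. All names are assumptions.
class MediaServer:
    def __init__(self, media_type):
        self.media_type = media_type  # e.g. "video/mjpeg"

class Broker:
    def __init__(self):
        self._registry = {}   # media type -> server factory
        self._running = {}    # media type -> live instance

    def register(self, media_type, factory):
        """New media types are added without recompiling the core."""
        self._registry[media_type] = factory

    def connect(self, media_type):
        """Return an existing server for this type, or launch one."""
        if media_type not in self._running:
            self._running[media_type] = self._registry[media_type](media_type)
        return self._running[media_type]

broker = Broker()
broker.register("video/mjpeg", MediaServer)
a = broker.connect("video/mjpeg")
b = broker.connect("video/mjpeg")   # reuses the running instance
```

The point of the design is visible in `register`: adding a new datatype touches only the registry, not the WhereWereWe core.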
Time is a slippery quantity in the WhereWereWe internals, the API, and in many of the application programs.
Although the use of absolute time makes many problems simpler, the complexity of time in the tools does not vanish.
Indexing applications used during playback, which create post hoc indices, obviously create marks whose creation time is not coincident with the time they mark--both times need to be retained, but their representation is problematic. Further, it is clear that client applications may need the ability to reach into the future with events, since they may need to begin the event generation process before the actual event occurs. While the application programmer can easily reference these quantities using absolute time and the current WhereWereWe API, supporting these capture and access concepts is a significant implementation challenge.
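One way to retain both times for a post hoc mark is sketched below, using the absolute-time representation the text describes. The field names (`created_at`, `marks_time`) are assumptions introduced for illustration; the paper notes only that both quantities must be kept.

```python
# Sketch: a mark carries both the absolute time it was created and
# the absolute session time it points at. Field names are assumptions.
class IndexMark:
    def __init__(self, created_at, marks_time):
        self.created_at = created_at   # when the mark was made
        self.marks_time = marks_time   # the session moment it marks

    @property
    def is_post_hoc(self):
        # A live mark has the two times coincident; a mark made during
        # later review points backward into the recording.
        return self.created_at > self.marks_time

live = IndexMark(created_at=1000.0, marks_time=1000.0)
review = IndexMark(created_at=5000.0, marks_time=1000.0)
```

The representational problem the text raises shows up as soon as applications must query or display such marks: which of the two times is "the" time of the mark depends on whether one is browsing the session or browsing the reviewing activity.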
While Coral's confederation approach has worked well for getting a suite of diverse applications working together, it has resulted in a few problems. Applications have the opportunity to stay blissfully unaware that they are participating in an activity capture setting. We need to provide lightweight ways to keep the tools adequately coordinated. For example, WEmacs beams text up onto the current Tivoli page, submits it to WhereWereWe as an event, and retains a local copy in its buffer. Modifications made in any one of those locations are not necessarily reflected in the others. Our current suite of applications has evolved a set of ad hoc interfaces for portions of this functionality (e.g., WEmacs to Tivoli for beaming does not go through the WhereWereWe infrastructure). We are working on an extended notification system--one that includes events--that will help with some of these difficulties, but a general solution to this problem remains a major challenge.
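The kind of notification system alluded to above might look like the following sketch: when a shared item (such as a line of beamed WEmacs text) changes in one tool, subscribers holding copies are told. This is a minimal illustration under assumed names, not the extended notification system the project is actually building.

```python
# Minimal publish/subscribe sketch for coordinating copies of an item
# held by several tools. All names here are assumptions.
class Notifier:
    def __init__(self):
        self._subscribers = {}   # item id -> list of callbacks

    def subscribe(self, item_id, callback):
        self._subscribers.setdefault(item_id, []).append(callback)

    def modified(self, item_id, new_value):
        """A tool reports a change; all other holders are updated."""
        for cb in self._subscribers.get(item_id, []):
            cb(new_value)

notifier = Notifier()
tivoli_copy = {}   # stand-in for the copy on the Tivoli page
emacs_copy = {}    # stand-in for the copy in the WEmacs buffer
notifier.subscribe("line-42", lambda v: tivoli_copy.update({"line-42": v}))
notifier.subscribe("line-42", lambda v: emacs_copy.update({"line-42": v}))
notifier.modified("line-42", "revised agenda item")
```

Routing such notifications through the shared infrastructure, rather than through pairwise ad hoc interfaces, is precisely the coordination the text identifies as missing.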
At the level of applications, we are still gaining experience with various types of functionality and their numerous interactions. WEmacs and Tivoli offer a solid start in using multimedia capture in simple capture settings, but both are somewhat lacking in the access setting. If hooking Tivoli to WhereWereWe spotlighted how any program with a time-based history is already 90% of a marking client, then writing access applications is revealing how everything is potentially a stream. If we want Tivoli or WEmacs to look the way it did when a particular utterance was made, then the best way to have that happen is for Tivoli or WEmacs to act as players. Tivoli has already been augmented with some of this functionality, working in both a playback mode, which animates the exact appearance and construction of past states, and a "bouncing-ball" mode, where a cursor points to the area where drawing or editing was happening.
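The observation that a tool with a time-based history is most of the way to being a player can be made concrete: to show its state as it was at time t, the tool replays its recorded operations up to t. The sketch below uses assumed names; it illustrates the replay idea, not Tivoli's actual implementation.

```python
# Sketch: a tool's time-based history doubles as a player. To present
# the appearance at time t, replay operations up to t. Names assumed.
class HistoryPlayer:
    def __init__(self):
        self.history = []     # list of (timestamp, operation)
        self.state = []       # operations currently "visible"

    def record(self, t, op):
        """Capture side: every user operation is timestamped."""
        self.history.append((t, op))

    def seek(self, t):
        """Access side: reconstruct appearance at time t."""
        self.state = [op for (ts, op) in self.history if ts <= t]

p = HistoryPlayer()
p.record(10.0, "draw stroke A")
p.record(20.0, "draw stroke B")
p.seek(15.0)   # state as it was between the two strokes
```

A "bouncing-ball" mode would differ only in what `seek` does with the selected operations: pointing at the region of the most recent one rather than redrawing all of them.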
Once more and more of the functionality of the capture and access tools is exported via recorder and player interfaces, we gain a uniformity that can be exploited to solve other interface problems. Currently, using the suite of tools for review is plagued by having a variety of applications that each may want to control the playback of assorted multimedia streams and each other. This coordination has been the source of many of the ad hoc inter-process communication paths described above or compromises in user interface generality. Once these programs all appear as players, they can then more easily be gathered into composite players and uniformly handled.
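The gain from uniformity can be sketched directly: once every tool exposes the same player interface, a composite player simply fans control out to its members. The interface below (play/pause/seek) is an assumed minimal surface, not the actual Coral player protocol.

```python
# Sketch of composite players: a uniform (assumed) player interface
# lets heterogeneous tools be gathered and controlled as one.
class Player:
    def __init__(self, name):
        self.name = name
        self.position = 0.0
        self.playing = False
    def play(self):
        self.playing = True
    def pause(self):
        self.playing = False
    def seek(self, t):
        self.position = t

class CompositePlayer(Player):
    """A player whose members are themselves players."""
    def __init__(self, name, members):
        super().__init__(name)
        self.members = members
    def play(self):
        super().play()
        for m in self.members:
            m.play()
    def pause(self):
        super().pause()
        for m in self.members:
            m.pause()
    def seek(self, t):
        super().seek(t)
        for m in self.members:
            m.seek(t)

# One control surface drives audio playback and Tivoli replay together.
meeting = CompositePlayer("review", [Player("audio"), Player("tivoli")])
meeting.seek(42.0)
meeting.play()
```

Because a CompositePlayer is itself a Player, composites nest: a review session can be gathered into a larger composite and handled uniformly in turn.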
The unification of tools into composite streams quickly gathers other potential uses. Recording the activity of a composite stream, i.e., its constitution and the messages it distributed to its member objects, will allow us to play back a playback session. This is potentially a crucial notion when a user wants to review the accessing done by another user (e.g., seeing what a close colleague found interesting in a recorded seminar). These situations quickly bring up the time and past- vs. present-event subtleties discussed above.
We currently have minimal query support; application programmers end up writing code to sift through all the Events for a Session in order to find those that they want to represent. As we shift to a greater focus on accessing, we will need finer-grained query support for getting subsets of events and sessions. Furthermore, we will need to devise formalisms for formulating and performing temporal queries.
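The kind of sifting code that programmers currently write by hand, and that finer-grained query support would subsume, looks roughly like this. The event representation and the query parameters are assumptions introduced for illustration.

```python
# Sketch of a small temporal query over a Session's events, of the
# sort application programmers currently hand-roll. Names assumed.
def query(events, kind=None, start=None, end=None):
    """Return events of a given kind falling within [start, end]."""
    out = []
    for e in events:
        if kind is not None and e["kind"] != kind:
            continue
        if start is not None and e["time"] < start:
            continue
        if end is not None and e["time"] > end:
            continue
        out.append(e)
    return out

session = [
    {"kind": "note", "time": 10.0},
    {"kind": "slide", "time": 15.0},
    {"kind": "note", "time": 30.0},
]
notes_early = query(session, kind="note", end=20.0)
```

Genuinely temporal queries ("the note nearest this utterance", "events overlapping this interval of another stream") go beyond such simple filtering, which is why the text calls for new formalisms rather than just a filter API.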
Although video is a supported datatype in the infrastructure and current suite of tools, we have had little chance to adequately explore its utility. The low resolution and frame rate of our current video offering leave much to be desired, particularly in settings where documents or detailed physical objects are of interest. On the other hand, the sheer size of video streams is, in large part, the root of our inexperience with video, so improved quality will need to be balanced against the costs of transmission and storage. We are improving the quality and reliability of the video datatype in this current reworking of our infrastructure, and expect to be using it more prevalently in the near future.
The Coral architecture and the particular tools described here have already proven remarkably flexible, and are proving their utility in regular use. As the examples illustrate, the WhereWereWe API, coupled with ILU, makes it relatively painless to explore multimedia recording, playback, and indexing in a variety of settings. The Coral suite of tools is providing us with a foundation for interesting applications and has supplied invaluable fodder for our current infrastructure revision efforts. We think there are further gains that can be made, particularly in areas involving automatic and semiautomatic indexing, studies of use and refinement of the applications, and in browsing tools for timestream data.
Related Work
The work has its historical roots in CoLab [Stefik et al., 1987] and Media Space [Stults, 1986]. From the former came a focus on meeting support tools and from the latter a focus on multimedia communications environments. What has emerged is neither the intersection nor union of those two projects; in fact, few multimedia systems have aimed at recovering casual information from everyday work settings. Most research in multimedia systems has concentrated on either real-time interaction or static, authored, multimedia documents.(16) A couple of projects have superficially similar motivations, but tackle other aspects of the problem. There are electronic meeting rooms to enhance decision making, multimedia systems for instructional presentation or usability testing, systems that augment human memory recall, video-on-demand systems, video conferences, and hypermedia systems with audio and video to organize records of informal activity.
Electronic Meeting Rooms. Early electronic meeting rooms, such as the EDS Capture Lab [Mantei, 1989], attempt to provide computer support for meeting process. The Capture Lab and our project both support and extend some otherwise paper- or whiteboard-based activity using computational tools. However, the Capture Lab was focused on decision-making and more formal meeting process, while our project involves making records of informal aspects of meeting room activity.
More recent meeting-room systems that have included multimedia focus on making and accessing recordings of technical presentations; e.g., Bellcore's STREAMS [Cruz and Hill, 1994] is aimed directly at this application.
Importantly, these tend to be monolithic systems with a clearly defined model of use into which all tools buy: there is a speaker/audience model of the setting, they are integral with a multimedia telecommunications system, and notetaking is a purely private activity outside the scope of the system. Consistent with the focus on presentation, such systems provide individuals with means of locating and displaying meetings in remote or post hoc settings. In contrast, our system does not distinguish or privilege particular users' activity and integrates through a confederation strategy.
Memory Aids. One way in which the recordings and notes are employed is to improve recollections of the meeting; a couple of systems have tackled this problem area directly. The IBM We-Met system [Wolf et al., 1992] started down this path; it was followed by H-P's Filochat [Whittaker et al., 1994], which used a pen-based computer and digital audio recording to provide a single user with a means to take notes in a meeting and, by selecting a handwritten note, replay the recording made when the note was taken. Although discussing many issues common to our effort, the Filochat work, with its emphasis on personal use, excludes many aspects that arise when that same functionality becomes collaborative and is offered as a network service.
Pepys [Newman et al., 1991] kept an automatic diary of offices visited and colleagues encountered using a network of sensors and communicating identification badges; it did not employ recording--in our parlance, it created events but not streams. Although the system demonstrably stimulated recall of some events, remembering is but one step in recovering content from casual activity.
Xcapture [Hindus et al., 1993] is a short-term memory device; by constantly rerecording the last few minutes of audio, it is possible to replay something that was just uttered. This scheme obviates the need for marking but requires immediate action on the part of the user.
Video on demand. Video on demand systems allow a user to select a video clip (perhaps a long clip, like a movie) and have the video, audio, and perhaps supporting documents be instantly available for viewing [Rangan et al., 1992; Rowe and Smith, 1992]. The data usually can be played back at various speeds and with random access. These systems concentrate on allowing synchronous access to data recorded at a previous time.
Teleconferencing. Video conferencing systems tend to be modelled on telephony; they give the user the capability to conduct a face-to-face type interaction with a user at a remote location [Fish et al., 1990; Watabe, 1990; Ahuja and Ensor, 1992]. These systems do not usually allow the user to review the session; the data is not stored in the system.
These systems focus on connecting people who are synchronized in time.
Multimedia documents. Multimedia document systems focus on the construction, layout, and retrieval of mixed media documents, especially those containing video or high-resolution images [Buchanan and Zellweger, 1992; Hardman et al., 1993]. These systems focus on the presentation of previously constructed data, allowing asynchronous communication between author and reader. Of particular note in this category is Raison d'Etre [Carroll et al., 1994].
This system did not augment the capture of activity, but rather organized fragments of recorded video using an issue-based hypermedia framework. The source material consisted of video recordings of interviews with members of a design project, which were then manually segmented and categorized. Thus, segment retrieval was conceptualized as a pre-structured (rather than emergent) activity, organized around content rather than activity-based indices.
Summary
This work has demonstrated that users can reap considerable benefit from appropriately designed and deployed activity capture and access technologies. Working closely with a set of motivated users has done much to hone our notions of what such systems might do. Further experiences of how these users' work practices and our technologies have coevolved over the 18 months we have been working together will be reported elsewhere.
The Coral confederation of applications resulted from a mixture of top-down and bottom-up development; the flexible approach permits expedient changes to serve the needs of our users, while supporting a smooth transition from prototype to architectural changes. The confederation approach has served us well over the course of the project, but elements of the system are currently being redesigned to better support the uses and demands that have emerged from our experiences with real applications and actual users. In particular, the move to media servers will permit easier exploration of new datatypes, and an improved notification system should ease application coordination.
We are indeed shifting some attention from the capture setting to the range of accessing that might be useful for a population of users. A wide range of scenarios surface here, from looking over a meeting that one missed to searching for a remembered comment to maintaining a group notebook. These applications take further advantage of the network and multi-user aspects of the infrastructure, allowing us to investigate the power of merging information from multiple users' marking activity and derived indices.
Activity capture and access via the recording of time-based data has turned out to be an extremely rich area with diverse research threads--speech signal processing, pen-based user interfaces, distributed object systems, real-time multimedia indexing, and so on. The niche of near-synchronous and pre-narrative multimedia has proven to hold opportunities for both novel applications and truly useful functionality.