Alan Kaplan
Department of Computer Science, Clemson University, Box 341906, Clemson, SC 29634-1906 USA, kaplan@cs.clemson.edu
Jack C. Wileden
Department of Computer Science, University of Massachusetts, Box 34610, Amherst, MA 01003-4610 USA, wileden@cs.umass.edu
Our research is directed toward developing computer science foundations for transparent data exchange and integration, exploring both theoretical and practical aspects of this problem domain. The primary objective of our work is to hide the boundaries, or seams, between heterogeneous data repositories and between data repositories and the applications that need to access them. Developing appropriate theoretical and practical foundations for transparent data exchange and integration results in software and data that are easier to develop, reuse, share and maintain.
In the remainder of this paper, we give an overview of our research program. In Section 2 we outline some formal models of type compatibility, type safety, and name management. We also describe how these models are used to evaluate and compare various approaches to data exchange and integration. In Section 3 we discuss the practical aspects of our work, specifically the development of a new, highly transparent approach to data exchange and integration, called PolySPIN. A collection of prototype, automated tools supporting the use of the PolySPIN approach, as well as our experience with their application, is also described. The paper concludes with a summary and directions for future work in Section 4.
To improve our understanding and assessment of the suitability of different interoperability mechanisms, we have begun to develop a taxonomy of important facets of interoperability approaches. (See [KW96, BKW96] for details.) One critical aspect of this taxonomy involves the time at which the decision to share data components is made. Specifically, three distinct timing scenarios for interoperability decisions can be characterized by the relationship between the times at which the sharing or shared components are developed and the time at which the decision to share them is made:
Our work also includes the development of formal models of name management. Name management, i.e., how a computing system allows names to be established for objects, permits objects to be accessed using names, and controls the availability and meaning of names at any point in time, is fundamental to almost all aspects of computing. It is of particular importance in the domain of data exchange and integration since how an application assigns and controls the meaning of names has a direct impact on the ease with which other applications can share that data. At present, applications typically rely on ad hoc mechanisms that hinder sharing and integration efforts.
For example, an application may create a collection of named data files. The bindings (i.e., the name-file pairs) may reside in a single directory or be organized in a directory hierarchy. Other applications that require access to these data files must have, therefore, a precise understanding of the way these data files are named and organized in order to use them properly.
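The dependence on naming conventions described above can be illustrated with a small sketch. It is not Piccolo or any mechanism from our work; the `Context` class, the `/` path separator, and the file names are hypothetical and chosen only for illustration. Two applications bind the same data under different organizations, and a client that assumes the wrong organization fails to resolve the name.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Context:
    """A set of name-to-object bindings, loosely analogous to a directory.

    Hypothetical illustration only; not part of any formal model in the text.
    """
    bindings: dict[str, object] = field(default_factory=dict)

    def bind(self, name: str, obj: object) -> None:
        """Establish a binding (a name-object pair) in this context."""
        self.bindings[name] = obj

    def resolve(self, path: str) -> object:
        """Resolve a '/'-separated path by walking nested contexts."""
        current: object = self
        for part in path.split("/"):
            if not isinstance(current, Context) or part not in current.bindings:
                raise KeyError(f"unbound name: {path!r}")
            current = current.bindings[part]
        return current


# One application keeps all of its bindings in a single, flat context...
flat = Context()
flat.bind("amherst.map", "map data")

# ...while another organizes the same data in a hierarchy.
nested = Context()
regions = Context()
regions.bind("amherst.map", "map data")
nested.bind("regions", regions)

print(flat.resolve("amherst.map"))
print(nested.resolve("regions/amherst.map"))
```

A client written against the flat organization would call `nested.resolve("amherst.map")` and get a `KeyError`, which is the kind of implicit coupling that an explicit specification of name management requirements is meant to expose.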
We have developed a formal model of name management called Piccolo, with an operational semantics based on evolving algebras [KW95]. The model allows for the precise specification of some formal properties and analyses of name management approaches. Using Piccolo, we can precisely express the name management requirements for applications, thus facilitating the sharing and integration of (named) data.
We are also developing a suitable formal foundation, based on concepts from signature matching and object-oriented type theory, that can aid in reasoning about, and implementing support for, cross-language type compatibility [BRW97,B98]. Programming language type systems are extremely rich and flexible mechanisms for describing complex data and their relationships. However, once an application has defined and created complex data using a specific programming language's type system, it is extremely difficult to access that data from applications written in different languages. Our formal model allows developers to precisely express compatibility relationships between object-oriented types that are defined in different programming languages.
For example, several C++ and CLOS (Common Lisp Object System) applications may define and create geographical maps using the native type systems of the languages in which they are written (i.e., C++ and CLOS, respectively). Another application (written in either C++ or CLOS) may need to access and manipulate these maps. A formal understanding of the differences and similarities between the C++ and CLOS map data definitions (i.e., type definitions) provides a basis for allowing other applications to access and process both the C++- and CLOS-defined maps.
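To make the idea of comparing type definitions across languages concrete, the following sketch checks whether two map types offer corresponding operations. It is a deliberately simplified, exact-match check and should not be read as our formal model; the type descriptions, the operation names, and the `PRIMITIVE_EQUIV` correspondence table are all assumptions made for the example.

```python
# Hypothetical correspondence between primitive type names in the two languages.
PRIMITIVE_EQUIV = {("double", "double-float"), ("int", "fixnum")}


def types_correspond(cpp_type, clos_type):
    """True if the two primitive type names are declared equivalent."""
    return (cpp_type, clos_type) in PRIMITIVE_EQUIV


def signatures_match(cpp_sig, clos_sig):
    """Compare two (parameter types, return type) pairs position by position."""
    cpp_params, cpp_ret = cpp_sig
    clos_params, clos_ret = clos_sig
    return (
        len(cpp_params) == len(clos_params)
        and all(types_correspond(a, b) for a, b in zip(cpp_params, clos_params))
        and types_correspond(cpp_ret, clos_ret)
    )


def compatible(cpp_type, clos_type, name_map):
    """Every mapped operation must exist on both sides with matching signatures."""
    return all(
        cpp_op in cpp_type
        and clos_op in clos_type
        and signatures_match(cpp_type[cpp_op], clos_type[clos_op])
        for cpp_op, clos_op in name_map.items()
    )


# Illustrative descriptions of a map type as written in each language.
cpp_map = {
    "distance": (("double", "double"), "double"),
    "zoomLevel": ((), "int"),
}
clos_map = {
    "euclidean-distance": (("double-float", "double-float"), "double-float"),
    "zoom-level": ((), "fixnum"),
}
name_map = {"distance": "euclidean-distance", "zoomLevel": "zoom-level"}

print(compatible(cpp_map, clos_map, name_map))
```

Changing the return type of `zoom-level` to `double-float` makes `compatible` return `False`, since `("int", "double-float")` is not in the assumed correspondence table. A realistic treatment would also have to account for subtyping and for relaxed forms of matching, which this sketch omits.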
The PolySPIN approach provides a transparent interoperability mechanism for programming languages. More specifically, it provides support for polylingual interoperability [KW96], where applications can access compatible types defined in distinct languages as if they were defined in the language of the application. The fact that the types are defined and implemented in a different programming language is hidden from the application. Related to this mechanism is PolySPINner, a collection of tools that automates PolySPIN and supports type-safe polylingual interoperability [BKW96,B98,K96].
To better understand the concept of polylingual interoperability and its relationship to the PolySPIN mechanism and the PolySPINner toolset, we expand on the geographical map example outlined in Section 2. In this scenario, a data repository is populated with geographical maps, some of which have been defined in C++ and others defined in CLOS. (Data repositories provided by object-oriented databases [WBT92] provide such capabilities.) Later an application, perhaps written in CLOS, needs to access and manipulate these maps. Approaches that support polylingual interoperability allow the application to treat the maps as if they were all implemented in CLOS (i.e., hiding the fact that some of the maps are also implemented in C++). Figure 1 illustrates the concept.

Although approaches such as standard file formats, relational databases, and IDLs (e.g., CORBA, COM) support certain aspects of polylingual interoperability, our approach offers several advantages over such mechanisms:
For example, suppose both the CLOS and C++ class (type) definitions for the map objects provide operations for computing Euclidean distance. The re-engineered implementation of the CLOS operation first checks whether the object to which it is being applied is implemented in CLOS or in C++. If the object is implemented in CLOS, the CLOS code is executed just as it would have been prior to PolySPINner's re-engineering. If the object is implemented in C++, the parameters to the CLOS operation are mapped to corresponding C++ representations and the C++ operation corresponding to the invoked CLOS operation is invoked. The information that supports this interoperability logic (e.g., language implementation information) is maintained by the PolySPIN mechanism, and the code itself is generated by the PolySPINner toolset. Equally important, this interlanguage code is transparent to the application. Thus, a CLOS application views and accesses all data as CLOS data. Furthermore, operation signatures remain unchanged and modifications are made only to the implementations of the relevant operations. With respect to software engineering concerns, this means that only the operations need to be recompiled and therefore there is no impact on existing applications.
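The dispatch logic described above can be sketched as follows. This is not code generated by PolySPINner; Python stands in for the application's language (CLOS in the example), and `MapPoint`, `to_foreign`, and `foreign_distance` are hypothetical names, with the foreign operation simulated by an ordinary function rather than a real cross-language call.

```python
import math


def to_foreign(value):
    """Hypothetical marshalling step: map a native argument to the
    representation expected by the other language."""
    return float(value)


def foreign_distance(handle, x, y):
    """Stand-in for the distance operation implemented in the other language."""
    hx, hy = handle
    return math.hypot(hx - x, hy - y)


class MapPoint:
    """A map object as the application sees it, whatever language implements it."""

    def __init__(self, x=None, y=None, language="native", handle=None):
        self.x, self.y = x, y
        self.language = language  # implementation-language tag for this object
        self.handle = handle      # reference to the foreign object, if any

    def distance(self, x, y):
        # The signature is unchanged; only the body has been re-engineered.
        if self.language == "native":
            # Original code path, executed exactly as before.
            return math.hypot(self.x - x, self.y - y)
        # Foreign object: convert the arguments, then call the matching operation.
        return foreign_distance(self.handle, to_foreign(x), to_foreign(y))


native = MapPoint(0, 0)
foreign = MapPoint(language="foreign", handle=(0.0, 0.0))

# The caller uses both objects identically.
for point in (native, foreign):
    print(point.distance(3, 4))
```

Because callers only ever invoke `distance(x, y)`, code written before the foreign branch was added continues to work without modification, which mirrors the claim that existing applications are unaffected.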

[BRW98] Barrett, D.J., Ridgway, J.V.E. and Wileden, J.C., Polylingual Object Usage Made Easy ... and Safe (submitted).
[B98] Barrett, D.J., Polylingual Systems: An Approach to Seamless Interoperability, PhD Thesis, Department of Computer Science, University of Massachusetts, Amherst, MA, February 1998.
[K96] Kaplan, A., Name Management in Convergent Computing Systems: Models, Mechanisms and Applications, PhD Thesis, Technical Report TR-96-60, Department of Computer Science, University of Massachusetts, Amherst, MA, May 1996.
[KMRW96] Kaplan, A., Myrestrand, G.A., Ridgway, J.V.E. and Wileden, J.C., Our SPIN on Persistent Java: The JavaSPIN Approach, Proceedings First International Workshop on Persistence and Java, Drymen, Scotland, September 1996.
[KRW97] Kaplan, A., Ridgway, J.V.E. and Wileden, J.C., Why IDLs Are Not Ideal, Ninth IEEE International Workshop on Software Specification and Design, Ise-Shima, Japan, April 1998.
[KW96] Kaplan, A. and Wileden, J.C., Toward Painless Polylingual Persistence, Proceedings Seventh International Workshop on Persistent Object Systems, Cape May, NJ, May 1996.
[KW95] Kaplan, A. and Wileden, J.C., Formalization and Application of a Unifying Model for Name Management, Third Symposium on the Foundations of Software Engineering, Washington, D.C., October 1995.
[KW94] Kaplan, A. and Wileden, J.C., Conch: Experimenting with Enhanced Name Management for Persistent Object Systems, Proceedings Sixth International Workshop on Persistent Object Systems, Washington, D.C., October 1995.
[RTW97] Ridgway, J.V.E., Thrall, C. and Wileden, J.C., Toward Assessing Approaches to Persistence for Java, Proceedings Second International Workshop on Persistence and Java, Half Moon Bay, CA, August 1997.
[WWRT91] Wileden, J.C., Wolf, A.L., Rosenblatt, W.R. and Tarr, P.L., Specification Level Interoperability, Communications of the ACM, 34(5), May 1991, pp. 72-87.
[WBT92] Wells, D.L., Blakely, J.A. and Thompson, C.W., Architecture of an Open Object-Oriented Database Management System, IEEE Computer, 25(10), October 1992, pp. 74-82.