Web Magazine for Information Professionals


Matthew Dovey outlines an Object Oriented approach to metadata.

In the last few decades, the concepts underlying computer science have undergone a number of paradigm shifts. Behind the majority of these is the concept of object-orientation: the recognition that the separation of data from the code that acts upon it rests on an artificial distinction, and that models which combine data and code into distinct "objects" offer conceptual entities that are both more intuitive and functionally richer. This paradigm shift can be seen particularly in three areas: programming languages, databases and user interfaces. Within programming, the evolution has been from procedural languages, where the code was encapsulated within procedures and kept separate from the data, to object-oriented languages, where the program is built up of self-contained "objects" which encapsulate both the data and the actions of the items being modeled. A similar shift has occurred in database development, where it has been recognized that it is often not appropriate to separate the data from the business processes that act upon them, and that it is better to store the processes with the data, not least to preserve the integrity of the data. Both of these paradigm shifts have occurred at a level not usually perceived by the end user, but they have provided the foundations for a shift in the metaphors of the user interface: the move from application-centric to object-oriented, or more descriptively, document-centric user interfaces. Whereas in the past the user was aware of both the data, in the form of files or documents, and the tools required to work with that data, in the form of applications, in a document-centric user interface the user is aware only of the documents, and the document itself is responsible for running the code or application appropriate to the nature of the data and the task in hand.
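The contrast between the two programming styles can be made concrete with a small sketch. The example below is purely illustrative (in Python, simply as a convenient notation; the record fields are invented), showing the same behaviour written first with data and code kept apart, then with both encapsulated in a single object.

```python
# Procedural style: the data and the code that acts on it are separate.
record = {"title": "Ariadne", "year": 1998}

def describe(rec):
    return f"{rec['title']} ({rec['year']})"

# Object-oriented style: the object encapsulates both data and behaviour.
class Record:
    def __init__(self, title, year):
        self.title = title
        self.year = year

    def describe(self):
        return f"{self.title} ({self.year})"

print(describe(record))                     # Ariadne (1998)
print(Record("Ariadne", 1998).describe())   # Ariadne (1998)
```

In the procedural version the recipient must already possess a suitable `describe` procedure; in the object-oriented version the record carries its own, which is the intuition the rest of the article builds upon.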

There has always been a gray area between what is hardware and what is software, as anything that can be done in hardware can be emulated in software, and to a certain degree vice versa. Hardware and software in fact form two ends of a continuum. In recent years the rapid increase in processing power, when not being eaten by superfluous user interface "enhancements", has led to more emphasis on emulating other platforms, for example running MS Windows software on a Macintosh platform and vice versa. This has ultimately led to the concept of the virtual machine, realized in the form of the Java VM, which aims to achieve the nirvana of writing code that can run anywhere, independent of platform. This concept, when applied to object-oriented code, leads to "objects" which can run their code anywhere, and ultimately to agent-oriented systems, in which objects encapsulating both data and code can take autonomous action as they move (or are moved) across the network, independently of the platforms through which they pass.
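The mobility idea can be sketched crudely as follows. This is only an illustration under loose assumptions: Python's `pickle` serialization stands in for a platform-neutral agent format, and the host names are invented. A real agent system would also have to ship the code itself (here the `Agent` class must already exist at the destination).

```python
import pickle

class Agent:
    """A toy mobile object: it carries state and acts as it travels."""
    def __init__(self):
        self.hops = []

    def visit(self, host):
        # The agent records its own itinerary as it moves.
        self.hops.append(host)

agent = Agent()
agent.visit("host-a")

wire_bytes = pickle.dumps(agent)     # "move" the object across the network
resumed = pickle.loads(wire_bytes)   # resume it on the destination
resumed.visit("host-b")

print(resumed.hops)  # ['host-a', 'host-b']
```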

However, despite these changes in computer science, there has been no corresponding change in the concepts underlying metadata. Currently the data is "tagged" to indicate its construction and content. The paradigm behind this approach is application-centric, in that the recipient of the data is expected to have an application that can parse and interpret the metadata tags. This can result in interoperability problems between different metadata standards, and also between different variants or modifications of the same standard. The recent RDF proposals attempt to address this issue by constructing a hierarchy of tagging schemas, so that an application which does not "understand" a particular schema can make educated guesses based upon other schemas in the hierarchy. As can be appreciated, there are limitations to this approach, and questions as to how feasible it is in practice. There are also issues, such as the maintenance of these schema hierarchies, which the RDF proposals do not address.
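The fallback mechanism behind such a schema hierarchy can be sketched in a few lines. This is a hypothetical illustration, not RDF itself: the schema names, the parent relation and the set of schemas the application "understands" are all invented.

```python
# Each schema names its parent; an application that does not understand
# a schema walks up the hierarchy until it finds one it does.
SCHEMA_PARENT = {
    "library:marc-variant": "library:marc",
    "library:marc": "core:bibliographic",
}

# Schemas this particular application can actually interpret.
UNDERSTOOD = {"core:bibliographic"}

def resolve(schema):
    """Return the nearest understood ancestor schema, or None."""
    while schema is not None and schema not in UNDERSTOOD:
        schema = SCHEMA_PARENT.get(schema)
    return schema

print(resolve("library:marc-variant"))  # core:bibliographic
print(resolve("unknown:schema"))        # None
```

The second call shows the limitation the article notes: when no ancestor is understood, or the hierarchy itself is out of date, the educated guesswork fails entirely.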

The application of the object-oriented or document-centric paradigms to current metadata practices leads to the concept of intelligent documents, where the document not only contains the metadata tags detailing its construction and content, but also the application code for interpreting and manipulating this metadata. In effect, the document is introspective: it understands itself. The use of virtual machine concepts would enable the application code embedded in the document to be platform independent, and agent-oriented paradigms would enable the document to take autonomous action. Some applications of such intelligent documents include:

* Platform and application independent metadata "tagging" in that the document "understands" itself and its own construction and content.

* The document can autonomously assist the user in navigating, understanding and even manipulating its own content.

* The document could autonomously communicate with other intelligent documents, for example so that documents automatically organise themselves when placed in an object-oriented database or container, or dynamically establish and maintain links with other relevant documents (relevance being defined within the context of the user's requirements).
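A minimal sketch can combine two of the ideas above: a document that carries both its metadata tags and the code to interpret them, and that establishes links to relevant neighbours when placed in a container. Everything here is invented for illustration, including the crude relevance test (a shared subject keyword); plain Python source stands in for platform-independent virtual-machine code shipped inside the document.

```python
class IntelligentDocument:
    def __init__(self, name, metadata, interpreter_source):
        self.name = name
        self.metadata = metadata
        # The document ships its own interpreter as embedded code.
        self.interpreter_source = interpreter_source
        self.links = set()

    def interpret(self):
        # Introspection: the document runs its own embedded code to
        # make sense of its own metadata tags.
        namespace = {}
        exec(self.interpreter_source, namespace)
        return namespace["interpret"](self.metadata)

    def on_added(self, container):
        # The document itself decides which links to establish.
        for other in container:
            if other is not self and \
                    self.metadata["subjects"] & other.metadata["subjects"]:
                self.links.add(other.name)
                other.links.add(self.name)

source = ("def interpret(meta):\n"
          "    return ', '.join(sorted(meta['subjects']))\n")

container = []
for name, subjects in [("a", {"metadata", "rdf"}),
                       ("b", {"rdf"}),
                       ("c", {"music"})]:
    doc = IntelligentDocument(name, {"subjects": subjects}, source)
    container.append(doc)
    doc.on_added(container)

print(container[0].interpret())    # metadata, rdf
print(sorted(container[0].links))  # ['b']
```

No central application parses the metadata here: each document both interprets itself and maintains its own links, which is the shift from application-centric to document-centric processing that the article argues for.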

All of the above, however, is only a vision. Whilst it may sound an attractive concept in theory, this does not necessarily mean that it works well (or at all) in practice. Even granted that such an approach can be achieved and offers advantages over other approaches, there are still practical issues that would need to be addressed. It may be that current technology is not yet mature enough for such a vision to be realized; there are still question marks over the future of the Java virtual machine technology. The blurring of data and code, which is rapidly occurring in many areas, also brings with it concerns, sometimes exaggerated and sometimes justified, as regards security. In conclusion, however, I think that this is a direction which warrants some investigative research.

Author details

Matthew J. Dovey