External data representation and marshalling
SENG 41283 — Distributed and Cloud Computing
External data representation
The information stored in running programs is represented as data structures, whereas the information in messages passed between the components of a distributed system consists of sequences of bytes. To communicate any information, these data structures must therefore be converted to a sequence of bytes before transmission. When a message arrives, the bytes must be converted back into the original data structure.
Computers use several kinds of primitive data, and their representations are not the same on every machine that takes part in a transfer. Let’s see how they differ from one machine to another.
- Integers: stored in either big-endian or little-endian byte order
- Floats: represented differently on different architectures
- Characters: encoded as ASCII or Unicode
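The byte-order and character-encoding differences listed above can be observed with Python's standard struct module. This is only an illustrative sketch: the value 0x12345678 and all variable names are arbitrary choices made for the example.

```python
import struct

value = 0x12345678  # arbitrary example value

# The same 32-bit unsigned integer packed in both byte orders.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<I", big)[0]

# The same character occupies a different number of bytes per encoding.
ascii_bytes = "A".encode("ascii")      # one byte
utf16_bytes = "A".encode("utf-16-be")  # two bytes

print(big.hex(), little.hex(), hex(misread))
print(len(ascii_bytes), len(utf16_bytes))
```

The misread value shows why sender and receiver need to agree on a representation before exchanging raw bytes.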
To communicate these different types of data effectively between computers, every value must be converted to a common format. An external data representation is an agreed standard for representing data structures and primitive values; it acts as the intermediate format during transmission.
Marshalling is the process of taking a collection of data items and assembling them into an external data representation that is suitable for transmission in a message.
Unmarshalling is the inverse process: the transferred data is disassembled on arrival to reproduce the original data structures at the destination.
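A minimal sketch of this round trip is shown here, assuming a hypothetical message layout that both sides have agreed on in advance (a 32-bit signed integer followed by a 64-bit float, in network byte order). The names RECORD_FORMAT, marshal, and unmarshal are invented for the illustration.

```python
import struct

# Hypothetical layout agreed by sender and receiver:
# "!" = network (big-endian) order, "i" = 32-bit int, "d" = 64-bit float.
RECORD_FORMAT = "!id"

def marshal(sensor_id, reading):
    """Flatten the two values into a byte sequence for transmission."""
    return struct.pack(RECORD_FORMAT, sensor_id, reading)

def unmarshal(message):
    """Rebuild the original values from the received bytes."""
    return struct.unpack(RECORD_FORMAT, message)

message = marshal(7, 21.5)     # sender side
restored = unmarshal(message)  # receiver side
print(len(message), restored)
```

Because the format string fixes the byte order, the receiver recovers the same values regardless of its own native architecture.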
Let’s look at how external data representation works in three different approaches.
CORBA’s common data representation
The Common Object Request Broker Architecture (CORBA) is a specification developed by the Object Management Group (OMG) for creating, distributing, and managing objects in distributed networks, and it has been widely adopted as middleware in distributed systems. CORBA describes a messaging mechanism by which objects distributed over a network can exchange messages with each other irrespective of the platform or language used to create those objects. This enables collaboration between systems that differ in architecture, operating system, programming language, and hardware.
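CORBA’s Common Data Representation (CDR) specification defines 15 primitive data types (for example short, long, float, double, char, boolean, and octet) together with constructed types such as sequences, strings, arrays, and structs.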
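To give a feel for a binary representation of this kind, the sketch below marshals a hypothetical Person record (name, place, year) in a simplified, CDR-inspired layout: each string is sent as a 4-byte length followed by its characters, padded to a multiple of four bytes, and the year is sent as a 4-byte unsigned integer. This is an assumption-laden simplification, not the actual CDR wire format; real CDR has further rules (for example alignment of every primitive, a terminating null in strings, and a byte-order flag in the enclosing message) that are omitted here. The Person record and all function names are invented for the example.

```python
import struct

def marshal_string(s):
    """4-byte big-endian length, then the characters, padded to 4 bytes."""
    data = s.encode("ascii")
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def unmarshal_string(buf, offset):
    """Return the string at offset and the offset of the next item."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    s = buf[start:start + length].decode("ascii")
    return s, start + length + (-length) % 4

def marshal_person(name, place, year):
    # Only values are written; field order and types are agreed in advance.
    return marshal_string(name) + marshal_string(place) + struct.pack(">I", year)

def unmarshal_person(buf):
    name, offset = unmarshal_string(buf, 0)
    place, offset = unmarshal_string(buf, offset)
    (year,) = struct.unpack_from(">I", buf, offset)
    return name, place, year

message = marshal_person("Smith", "London", 1984)
print(len(message), unmarshal_person(message))
```

Note that the message carries no field names or type tags, so the receiver can only decode it because it already knows the record's definition.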
Java’s object serialization
In Java remote method invocation (RMI), both objects and primitive data values may be passed as arguments and results of method invocations. In Java, the term serialization refers to the activity of flattening an object (an instance of a class) or a connected set of objects into a serial form that is suitable for storing on disk or transmitting in a message.
XML (Extensible Markup Language)
XML is a markup language that was defined by the World Wide Web Consortium for general use on the web. XML was initially developed for writing structured documents for the web. XML is used to enable clients to communicate with web services and for defining the interfaces and other properties of web services.
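As a rough illustration of a textual external representation, the sketch below marshals a hypothetical Person record to XML and back using Python's standard xml.etree.ElementTree module. The element names (person, name, place, year) and the function names are assumptions made for the example, not part of any web service standard.

```python
import xml.etree.ElementTree as ET

def marshal_person_xml(name, place, year):
    """Build a small XML document and return it as bytes."""
    person = ET.Element("person")
    ET.SubElement(person, "name").text = name
    ET.SubElement(person, "place").text = place
    ET.SubElement(person, "year").text = str(year)  # numbers become text
    return ET.tostring(person)

def unmarshal_person_xml(data):
    """Parse the XML bytes and rebuild the original values."""
    person = ET.fromstring(data)
    return (
        person.findtext("name"),
        person.findtext("place"),
        int(person.findtext("year")),  # convert the text back to a number
    )

document = marshal_person_xml("Smith", "London", 1984)
print(document.decode("ascii"))
print(unmarshal_person_xml(document))
```

The tags make the message readable and self-describing to some degree, at the cost of extra bytes compared with a binary layout.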
In CORBA’s common data representation and Java’s object serialization, the marshalling and unmarshalling activities are intended to be carried out by a middleware layer without any involvement on the part of the application programmer. Even in the case of XML, which is textual and therefore more accessible to hand-encoding, software for marshalling and unmarshalling is available for all commonly used platforms and programming environments. Because marshalling requires the consideration of all the finest details of the representation of the primitive components of composite objects, the process is likely to be error-prone if carried out by hand. Compactness is another issue that can be addressed in the design of automatically generated marshalling procedures.
In CORBA’s common data representation and Java’s object serialization, the primitive data types are marshalled into a binary form. In XML, the primitive data types are represented textually. The textual representation of a data value will generally be longer than the equivalent binary representation. The HTTP protocol, which is described in Chapter 5, is another example of the textual approach.
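The size difference between the two approaches can be checked with a small sketch. The value and the element name used here are arbitrary choices for the illustration.

```python
import struct

value = 1234567890  # arbitrary example value that fits in 32 bits

binary_form = struct.pack(">I", value)                # fixed 4 bytes
text_form = str(value).encode("ascii")                # one byte per digit
xml_form = f"<value>{value}</value>".encode("ascii")  # digits plus markup

sizes = (len(binary_form), len(text_form), len(xml_form))
print(sizes)  # (4, 10, 25)
```

The gap is not universal: a small number such as 7 takes one byte as text but still four bytes in this binary form, which is why the textual form is described as generally, rather than always, longer.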
Another issue with regard to the design of marshalling methods is whether the marshalled data should include information concerning the type of its contents. For example, CORBA’s representation includes just the values of the objects transmitted and nothing about their types. On the other hand, both Java serialization and XML do include type information, but in different ways. Java puts all of the required type information into the serialized form, but XML documents may refer to externally defined sets of names (with types) called namespaces.
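The trade-off can be sketched as follows. The first part shows that bytes sent without type information can be decoded in more than one way; the second part uses a hypothetical one-byte type tag to make the message self-describing. The tagging scheme is invented purely for illustration and is not how Java serialization or XML encode types.

```python
import struct

# Untyped encoding: only the bytes of the value are sent.
payload = struct.pack(">f", 1.0)

# The receiver must already know that the bytes hold a float.
as_float = struct.unpack(">f", payload)[0]
as_int = struct.unpack(">I", payload)[0]  # same bytes, wrong interpretation

# Hypothetical self-describing encoding: a one-byte tag names the type.
TAGS = {b"f": ">f", b"I": ">I"}

def marshal_tagged(tag, value):
    return tag + struct.pack(TAGS[tag], value)

def unmarshal_tagged(message):
    tag, body = message[:1], message[1:]
    return struct.unpack(TAGS[tag], body)[0]

print(as_float, as_int)
print(unmarshal_tagged(marshal_tagged(b"f", 1.0)))
print(unmarshal_tagged(marshal_tagged(b"I", 42)))
```

Including the tag costs one extra byte per value here, but it lets the receiver decode the message without prior knowledge of its contents.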
Although we are interested in the use of an external data representation for the arguments and results of RMIs and RPCs, it does have a more general use for representing data structures, objects or structured documents in a form suitable for transmission in messages or storing in files.