Usability - Productivity - Business - The web - Singapore & Twins

What's your Data Definition Language?

After recent insights in data structures I was wondering what's the right format to describe data models. Its a thought almost alien to Notes developers since we " just add what we need". Nevertheless having a data model eases maintenance and documentation (which brings up the question what comes first: the application or the data model). Clarity about data models also fosters the contract first way of developing applications where agreements about interfaces and data structures are made up-front before implementation. There are a number of contestants available to choose from (with no claim to completeness):
  • SQL Data Definition Language. Basically that's all the CREATE TABLE statements you use to create your RDBMS tables and views. Advantage of SQL DDL is its closeness to RDBMS, which makes implementing the described data easy, the ability to create the definition in a simple text editor but also rich visual tools (like ERwin which was one of the first of its kind) and the capablity of other DDL (like UML) to read/write SQL DDL. The biggest drawback in a world where SQL no longer rules alone is its closeness to RDBMS and its lack of support for transmittable data (think web service, sync or two folded MVC pattern). Today I would say SQL DDL is hardly the source of your data model anymore, but an output from one of the other DDLs (most likely UML)
  • UML and its XML representation. The Unified Modeling Language is designed to do much more than describing data. Besides data one can model other structures (like components, deployments or packages), behaviours and interactions. There is a huge offering of UML tools available on the market: Rational System Architect, Visual Paradigm, DIA (or Visio ), Violet, UMlet, Altova UModel (Windows only) and many more. UML can output almost any other format including export to XML Schema. There is also a lot of literature available about this topic.
    While UML certainly covers all aspects of modelling it doesn't come without drawbacks. Working with UML does require a dedicated tool or plug-in, so it is hard to quickly getting stuff done (graphics can get in the way of execution speed). UML offers 2 ways to model data: Entity Relationship Diagrams and Object Diagrams. None of them really fit the document working style we find in web services or web 2.0 applications, so some of the XML Schema generated look a bit "hammered into shape". Furthermore I haven't seen a lot of tooling that would verify data to conformance with an UML model at runtime.
  • Eclipse EMF models. EMF has been designed mainly as a modelling framework to generate code (and thus other data definitions) out of its models. EMF data itself is stored as XML and there is an impressive list of documentation available online and in print. Nathan is a big EMF fan. One would most likely work in Eclipse to work with EMF, I haven't explored the suitability for text editing. There also are a lot of transformations available from and to EMF, which are pending my attention.
  • XML Schema (including RELAX.NG and DTD). When you are comfortable reading and writing XML, using XML Schema comes natural. While you perfectly well can write it in a plain text editor, you most likely will use at least an XML aware editor like oXygen XML, Stylus Studio, XMLSpy (which all run inside Eclipse and are Schema aware), Eclipse's base XML Editor or any other of the countless offerings. If you deal with SOA you will realise, that WSDL, the contract language of SOA, uses XML Schema in its bowels. XML Schema also can be used to validate documents on the fly without the need to first generate code out of it. I like the capability to define my own data types and to mix and match existing Schemas to fit my specific needs. Custom data types would be classes/objects in UML and are (to my knowledge) absent in SQL DDL. Since XML Schemas live at a (usually public) URL and there are a lot of <="">existing Schemas available already (diverse topics like music or eBusiness (UBL) etc) duplication of efforts can be reduced. XML Schema can be transformed to SQL (I would of course rather suggest to use PureXML, but that's a story for another day)
  • I didn't find any tool to model/verify JSON data. Since JSON is by definition schema free. However I expect, that some generic JSON schema will appear over time, the declaration driven validation is too valuable.
Looking at Notes and Domino, the XML nature of XPages, ATOM and RDF, Activity Streams and the IBM Social Business Toolkit I made XML Schema my first choice, however I'm ready to be convinced otherwise.
What is your Data Definition Language?

Posted by on 06 March 2011 | Comments (2) | categories: Show-N-Tell Thursday Software XPages


  1. posted by vector on Sunday 06 March 2011 AD:
    thanks for posting this article Emoticon smile.gif
  2. posted by Dan Sickles on Monday 07 March 2011 AD:
    Great post Stephan.

    JSON Schema been brewing for a while:

    { Link }

    For Notes, forms have been used as schema to varying degrees as plain ol' Notes forms or DXL.

    With the use of ORMs, classes can be the DDL. Many modern web frameworks do this (Rails, Grails, Django, Lift...)

    I'm currently working with iOS Core Data, a UMLish object graph model over sqllite or binary. XCode has a graphical editor to build the model.

    As Tim Tripcony mentioned in passing in his hacking DDE sesion at LS, pointing EMF at the DXL schema is *really worth exploring*. Read that again. Once more.