David Moskowitz, President, Infoblazer LLC.
© David Moskowitz, 2006. All Rights Reserved.
What is XX?
This article describes our approach to application development, which we call XX. The methodology was originally called Extreme XML. However, with the adjective “extreme” being applied to everything from team programming methodologies to soft drinks and deodorant, a name change was deemed necessary.
Our development methodology is both UML- and XML-centric. Most of our development work is currently done on the Java J2EE platform, though the principles discussed can also be applied to other platforms, such as Microsoft ASP. The concepts behind XX are platform and language independent and apply to application design in general and XML development in particular.
The Origins of XX
My first use of XML was in 1999, in the creation of a web-based ActiveX reporting/charting application. This component was implemented as an XML web service, retrieving XML data from a back end server to the web client, where it was processed and formatted.
The most common approach at that time to build such an application was to install (Microsoft) ADO client libraries on the client computer, an ActiveX control in this case, and access remote data in a client-server fashion. Another common approach, taken a little later on, was to use something like Microsoft RDO, which lets a web server act as a proxy for connecting to the database. This approach alleviated the problem of excessive data connections (and related performance issues) to the database. While performance improved somewhat, the problem of distributing a proprietary client (ADO/RDO) still remained. In addition, that approach is only compatible with one company's web server, namely Microsoft's, which is counter to the nature of the web.
One other, thinner and less proprietary approach I had seen around that time had an application query a web site, with the web site returning a comma-delimited list of data (a flat file). The problem with flat files in general is that they are not amenable to format changes, and the format is difficult to describe (usually done in a separate Word document). XML proved to be the perfect solution for this problem. It seemed a logical step, though somewhat of a chasm-crossing leap at the time, to have the data returned as XML instead of a flat file stream. This proved advantageous for the application and changed my approach to development.
Returning to the reporting application mentioned earlier, the ActiveX component instantiated an internet control that queried a web server using a simple HTTP GET. Based on the URL parameters, the web server queried the database and converted the resulting recordset to XML, generating the XML by manually processing each record in the resultant recordset. The XML was streamed back to the client using a simple ASP "response.write" output statement. Later on, Microsoft built XML capabilities into ADO and SQL Server to automatically convert a recordset to XML or to retrieve an SQL query as XML, though these never proved satisfactory in their actual implementation and results.
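The manual recordset-to-XML step can be sketched as follows. This is an illustrative Java reconstruction, not the original ASP code; the class, method, and element names are my own, and rows are modeled as simple name/value maps standing in for a recordset.

```java
import java.util.*;

// Illustrative sketch: convert query results (rows as name/value maps)
// to an XML string by walking each record, mirroring the manual
// recordset-to-XML conversion described above.
public class RecordsetToXml {

    static String toXml(String rootName, String rowName, List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(rootName).append('>');
        for (Map<String, String> row : rows) {
            sb.append('<').append(rowName).append('>');
            for (Map.Entry<String, String> col : row.entrySet()) {
                sb.append('<').append(col.getKey()).append('>')
                  .append(escape(col.getValue()))
                  .append("</").append(col.getKey()).append('>');
            }
            sb.append("</").append(rowName).append('>');
        }
        sb.append("</").append(rootName).append('>');
        return sb.toString();
    }

    // Minimal escaping so field values cannot break the markup
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("city", "NY City");
        row.put("sales", "42");
        System.out.println(toXml("report", "row", Collections.singletonList(row)));
    }
}
```

The same hand-rolled string assembly is what the article's later XSL approach replaces with a declarative template.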
In later applications, I added the ability not only to retrieve data as XML but to send data as XML as well, also over HTTP, this time as a POST. This was a precursor to the current SOAP web services approach, though admittedly in an ad-hoc, non-standardized manner.
In subsequent years, thin client approaches to web applications became more common. However, fat clients were, and are still, used when a more interactive client side experience is needed.
While this example illustrates a simple web service, I found I was spending much of my time translating database queries into XML, then manually parsing the XML and outputting HTML with ASP code. It felt quite fortuitous when I discovered a language especially geared toward processing XML, namely XSL. Using XSL, I could apply a declarative template and have the necessary HTML generated in a non-procedural manner. In some cases, the performance of the transform was a significant improvement over the hand-coded approach.
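The declarative approach can be sketched using the standard javax.xml.transform API. The data, stylesheet, and class names here are illustrative: a tiny inline stylesheet turns records into HTML list items with no hand-written output loops in the host language.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: apply a declarative XSL template to XML data, yielding HTML.
public class XslDemo {

    static final String XML =
        "<people><person>Ann</person><person>Bob</person></people>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/people'>"
      + "<ul><xsl:apply-templates select='person'/></ul>"
      + "</xsl:template>"
      + "<xsl:template match='person'><li><xsl:value-of select='.'/></li></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

The templates declare what each element should become; the processor handles iteration and assembly.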
The approach described above eventually developed into the XX framework.
XML Based Development
XML solves many programming problems, and the more we work with it, the more problems we find to which it can be applied. These XML-based solutions have generally proved superior to the alternatives.
Many of the advantages of XML are well known. XML is a self-describing, text-based representation of data that is both human and machine readable. The XML format can be used universally across programming languages and platforms, making it a natural choice for cross-platform data exchange.
XML is used as a universal data storage model. Many applications use XML as their native data format, from desktop tools, such as the upcoming Microsoft Office 12, to server-based XML databases, and all types of applications in between. XML is also advantageous for storing application preferences, as opposed to the older registry or .ini file approaches.
XML is cross platform and language independent. By implementing a large portion of an application in XML, that portion also becomes cross platform and language independent. In XML-centric development, a large part of the application's creation will be the development of XSL transforms and the various XML-based communication and configuration files and data representations.
XML also has advantages for application performance. In discussions of stateless versus stateful operation, stateless operations are generally considered more scalable. XML makes stateless operations possible, since an entire request can be packaged as a single XML document and the result returned as XML as well, all in a single method call. XML allows state to be maintained in the data itself, not on the server. In a binary or Object Oriented approach, multiple requests are generally made to retrieve individual attributes and related objects. The developer thus has the flexibility to use either XML requests or individual OO requests as needed; a combination of the two is usually the best approach.
Type 1 vs. Type 2 Architectures
Most application development discussions compare Type 1 and Type 2 architectures, with the conclusion (correct, I might add) that Type 2 is preferable for all but the simplest applications.
Type 1 architecture is typically seen in ASP or simple JSP applications. Using the ASP example, a web response is served by a single ASP page, possibly with some include files containing additional functions. Database access code is interspersed with "response.write" or other HTML formatting statements. An application built as a collection of such pages is very hard to maintain. This Type 1 scripting approach also doesn't lend itself to optimized performance, code reuse, and other critical application features.
Many of today's Java applications are nothing more than JSP pages implementing this Type 1 architecture. The better ones might make use of Java beans for some of the embedded business logic or database access. Either way, the web page is based around a single JSP script, with lots of Java and HTML output interspersed.
Classical Type 2, or Model-View-Controller (MVC), architecture addresses this problem. The MVC architecture was adopted by Sun Microsystems as a best practice for J2EE development.
The classical MVC approach adopted by Sun uses servlets as the entry point to the application Controller, Java and Enterprise Java Beans as the Model, and JSP pages as the View. XML based development allows the use of XSL, instead of pure JSP, as the presentation layer. The XX framework uses JSP in conjunction with XSL as the presentation layer, utilizing the best features of each in a flexible implementation.
We also feel the Type 2 approach is a preferable design even for the simplest applications, since simple applications eventually morph into complex applications. The XX framework, described later, makes it simple to implement MVC even for simple applications.
While the MVC pattern originated in Smalltalk, it can be implemented on the two most popular development platforms, Microsoft technologies and Java.
Microsoft vs. Java
We see enterprise level application development technology selection today as a choice between Microsoft technologies and Java. The good news is that our XML based development methodology can be implemented in both of these platforms, and on other platforms as well. A standard MVC approach can be also used in both.
Using Java Servlets as the view
Another approach to generating web output is through Java servlets. Most Java Servlet/JSP books start off showing the servlet approach, complete with multiple "out.println" statements to generate HTML output. This is a simple Type 1 approach that illustrates basic web output using a single component. Most discussions then correctly go on to describe the problems (e.g. maintainability) with this approach, and introduce JSP as a solution.
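A minimal sketch of that "out.println" servlet style follows. In a real servlet, the PrintWriter would come from the HttpServletResponse; here a StringWriter stands in so the sketch runs on its own, and the page content is illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch of the Type 1 servlet style: markup and logic mixed in one method,
// with HTML emitted line by line through println calls.
public class Type1Output {

    static void writePage(PrintWriter out, String user) {
        out.println("<html><body>");
        out.println("<h1>Welcome, " + user + "</h1>");
        out.println("<p>Account summary goes here.</p>");
        out.println("</body></html>");
    }

    public static void main(String[] args) {
        StringWriter sw = new StringWriter();
        writePage(new PrintWriter(sw, true), "Ann");
        System.out.print(sw);
    }
}
```

Even in this tiny example, any change to the page layout means editing and recompiling Java code, which is exactly the maintainability problem the books point out.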
Using Java Server Pages (JSP) as the View
JSP can be used as the view in a web application. This approach is an improvement over a simple servlet, since the presentation layer is more cleanly separated from the rest of the application functionality. This method still uses a servlet as the controller, with Java objects (beans) made available to the JSP page via the session object. However, JSP is itself not without problems. Any non-trivial formatting task involves a large amount of Java code, making the application less maintainable, since JSP is a scripted and not an Object Oriented language.
Microsoft ASP Implementation
We will only briefly discuss an example of the Microsoft approach to MVC, and focus for the rest of this discussion on Java programming.
In conventional ASP scripting, a single ASP page acts as both the controller and view. HTML output is generated by embedding HTML within ASP blocks, or using “response.write” output statements from ASP code.
An improvement can be made by using XSL as the view. The ASP controller applies XSL transforms to produce output. The controller now needs access to an XML generation routine to produce the XML needed for the transformation, and this XML layer needs to access the Model component, possibly implemented as a COM object. After retrieving the model data as XML, control is not really transferred to another View component, but remains within the original controller page. More recent versions of ASP allow for this (forward) functionality, finally putting ASP on equal footing with Java in this respect.
When using a COM component as the model, we still need an ASP facade to make it visible. This scenario is similar in function to a JSP controller in Java programming. Microsoft created WebClasses to allow a more binary entry point (namely a DLL), similar to Java servlets. However, this technology wasn't widely adopted and was discontinued.
A Better MVC: XSL as the view
A great advancement in MVC architecture is the ability to use XSL as the presentation layer. XML returned from the model layer is passed to an XSL transform, yielding the presentation. The transformation can produce whatever presentation format is needed, usually HTML or XML, though other client-specific formats are also possible. XSL can also be used to generate PDF through XSL-FO.
By using an XSL layer, you achieve greater separation among the MVC components. XSL simply cannot access a database. If Java is used as the view, the temptation exists to shortcut good application design and use a Type 1 approach for simple applications. In addition, XML is often easier to work with than Java: XML is an easily recognizable data format, and XSL is a declarative language, while JSP and Java beans involve procedural programming against a machine-oriented data format.
XSL also opens up a greater choice of application architectures, since transformational capabilities are now available on most client platforms. This can be used to offload XSL processing from the back end server to the client. By contrast, it is not always possible to offload processing of Java objects to a client. Even given these possible architectures, most applications will still benefit from centralizing the XSL processing on the server side.
It is likely that transformations will still take place on back end servers, with only simple HTML being transmitted to the client. The alternative is for the client to receive pure XML along with a stylesheet, which may also be transmitted, may exist on the client, or may be updated periodically in some dynamic fashion. While this approach serves to offload work from the server, it greatly burdens individual nodes/clients, which may not have the power needed to run a complicated transform. We still see the server-based approach as preferred. However, new paradigms such as peer to peer, where no server even exists, may cause a break from this model in some cases.
The example of offloading XSL processing to a client is simply another example of an XML web service. It doesn't matter whether the client retrieving XML is a handheld PDA or a large payroll system; either way, it is acting as a client of an XML web service. This approach provides greater standardization and may become an even more appropriate choice, as most data is now beginning to be made available as XML.
XSL is not always the best choice. XSL presupposes the availability of data in XML format. If the data is easily available in, say, Java object format, it might not make sense to convert it to XML. The format conversion is another step in the process and could impact performance and response times. On the other hand, processing data with Java is code intensive and therefore suffers from readability and maintenance issues. Sometimes it is worth the possible performance penalty to have a more logical application model, especially in cases where performance is sufficient with either approach. In general, if the data you are processing is already in XML, use XSL. If the data you are processing is contained in Java classes/beans or other native Java data structures, use JSP.
While using XSL with the single servlet approach is preferable to a single JSP page, there are still several issues that can be improved upon.
First, since we are using the single document-transformation approach, it is necessary to combine all XML "data" into a single XML document. This sometimes makes sense, but often doesn't. Remember the need for XML documents to make semantic sense: combining disparate data elements that have no real relationship is neither an optimal nor a logical organization.
As an example, consider the need to include a US state pull-down list on a contact registration screen. Needing to provide a single XML document for the resulting page, we might create a document along these lines (element names illustrative):

<contact>
  <name/>
  <email/>
  <states>
    <state>AL</state>
    <state>AK</state>
    <!-- more states go here -->
  </states>
</contact>
However, all States are really not part of a contact element. A better alternate document might be:

<Contactdetail>
  <contact>
    <name/>
    <email/>
  </contact>
  <states>
    <state>AL</state>
    <state>AK</state>
    <!-- more states go here -->
  </states>
</Contactdetail>
Contact is now independent of States, but a fictitious "Contactdetail" element has been created. This XML element only serves to describe the data behind the Contact Detail page; it doesn't really correspond to a real world entity. This type of compound document is often necessary.
CSS: Icing on the cake
One downside of implementing a large portion of application functionality in XSL is that the transformations can become large and complex as well. It can also become more difficult to interact with the web designer, since the XSL must produce the final version of the HTML display. The XSL code is therefore responsible for understanding the web designer's intention and how the final HTML markup must be implemented.
Cascading Style Sheets (CSS) can go most of the way toward solving this problem. Using CSS-based HTML display allows the XSL programmer to produce very slim HTML. The graphical markup, including positioning and formatting, can be handled by an entirely separate CSS stylesheet. The XSL transformation can now produce XHTML containing solely semantic data, stripped of all formatting codes. The transformation from XML to XHTML is likely to be very simple, since both formats are XML.
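As an illustrative sketch (the markup and class names are invented for the example), the transform emits only semantic XHTML, while a separate CSS stylesheet carries all of the visual formatting:

```html
<!-- Output of the XSL transform: semantic markup only, no formatting -->
<div class="contact">
  <h2>Contact Details</h2>
  <dl>
    <dt>Name</dt><dd>Ann Smith</dd>
    <dt>State</dt><dd>NY</dd>
  </dl>
</div>

<!-- Separate CSS stylesheet: positioning and formatting, no tables -->
<style type="text/css">
  .contact    { float: left; width: 20em; font-family: sans-serif; }
  .contact dt { font-weight: bold; }
</style>
```

The web designer can now restyle the page entirely in CSS, without touching the XSL.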
CSS is a relatively recent addition to our approach, and we advocate its complete incorporation into page design, including the elimination of all HTML tables other than for the display of actual tabular data.
CSS now supplies the final component for a true separation of application data, control, and presentation.
Portal based approaches
The XML design example above corresponds to a type of design predominant in many web sites, the portal approach. Portals commonly refer to sites that aggregate data from numerous other sources, such as bank accounts, mail systems, etc. Even sites that are not true portals usually contain multiple, independent elements. For example, a recurring menu or banner may not be directly related to the page's data. These should be represented as separate XML and transformed independently.
The portal approach to design breaks up the page into several independent sections. Independence implies parallelization, so it should be possible to process sections of a page in parallel to speed up display and processing. This parallelization is, in fact, one of the core features of the XX framework.
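The parallel idea can be sketched with standard Java concurrency utilities. The section renderers here are illustrative stand-ins for per-section XSL transforms, and the class and section names are my own, not part of the XX framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: each independent page section is rendered by its own task
// (in a real XX application, an XSL transform), and the results are
// assembled in page order.
public class ParallelSections {

    static String renderPage(List<Callable<String>> sections) {
        ExecutorService pool = Executors.newFixedThreadPool(sections.size());
        try {
            StringBuilder page = new StringBuilder();
            // invokeAll returns futures in submission order, so the page
            // is assembled correctly even though sections run concurrently.
            for (Future<String> f : pool.invokeAll(sections)) {
                page.append(f.get());
            }
            return page.toString();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> sections = Arrays.asList(
            () -> "<div id='menu'>...</div>",
            () -> "<div id='banner'>...</div>",
            () -> "<div id='data'>...</div>");
        System.out.println(renderPage(sections));
    }
}
```

The total page time approaches that of the slowest section rather than the sum of all sections.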
A portal approach also allows the use of smaller XML documents and XSL stylesheets, greatly enhancing transformation performance. As mentioned elsewhere, the XSL transformation will most likely be the bottleneck in any XML/XSL application. Using smaller documents also avoids the need to (artificially) assemble separate documents into a single large document for a single transformation.
By implementing a pure XSL display layer, the presentation component becomes fully platform neutral. Porting to a new platform can then be accomplished by simply porting the control layer.
A Note on Web Services
A common approach to "web service enable" existing applications is to create an XML or SOAP layer on top of the existing application. Tools exist to automatically create SOAP calls for existing functions.
The problem with this approach is that such tools take methods such as method(param1, param2) and wrap them in an XML document such as:
<param name="param1" type="int">2</param>
<param name="param2" type="string">NY City</param>
Microsoft .NET takes a similar approach with its "WebMethod" attribute.
A problem with this approach is that it does not make for good XML and is limited to a single dimension of parameters. Passing an arbitrarily complex and deep XML document allows more flexibility, as any data transmission need can be modeled as a single XML document. In many cases, it is better to develop a document-based approach to building an XML interface rather than a parameterized, function-based approach. Simply converting existing parameter-based functions to flat XML will assist in cross-platform transmission, but when possible, function and method signatures should be designed around the required data input and output. In many cases, this will yield a more hierarchical XML representation.
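To illustrate the contrast (element names invented for the example), compare the flat parameter wrapping shown above with a document designed around the data itself:

```xml
<!-- Document-based interface: the signature is the document, and the
     hierarchy carries structure that a flat parameter list cannot. -->
<orderRequest>
  <customer id="2">
    <city>NY City</city>
  </customer>
  <items>
    <item sku="A100" quantity="3"/>
    <item sku="B200" quantity="1"/>
  </items>
</orderRequest>
```

The repeating, nested items element has no natural representation as a one-dimensional parameter list.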
New XML-to-Java bridges, such as Castor, can easily translate a complicated Java object model to a very reasonable XML representation. Therefore, using an object oriented approach to method design, the method parameters can be translated to the appropriate XML for transmission. These improved XML translations point to the continued use of Object Oriented design, with its inherent advantages in application design.
XX currently implements rudimentary web service capabilities, where XML can be returned from or posted to an arbitrary URL. This approach is generally simpler than, and just as usable as, SOAP. To allow for greater interoperability, SOAP compatibility is in development and will be part of future versions of the XX framework.
XML and Caching
One of the best uses for XML is in data caching. Using the XX approach, most of the application's operations return XML which is then transformed using XSL. The resulting XML document can easily be cached for future use. The XX implementation handles this caching in a declarative manner.
I first used XML caching in the web-based ActiveX control discussed at the beginning of this document. Since this was a reporting tool whose data was updated infrequently, data from one request was stored on the server as XML and served to any other user requesting that data. This approach decreased response times dramatically.
As a general caching principle, store frequently accessed data in an easy-to-access location. For frequent queries, the resulting XML can be stored on the file system of the web server. This type of caching makes sense when the time to create the resultant XML is the bottleneck, as when database queries are involved.
To cache data in a non-XML environment, you are forced to cache binary objects. In ASP or JSP, this can be done using the session object. However, the overhead for this is much greater than caching XML strings to disk. If warranted, simple XML can also be cached in the session object, as long as the volume of requests doesn't tax the memory and response times of the system.
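A minimal sketch of file-based XML caching follows; the directory layout, key scheme, and class names are illustrative, not part of the XX framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Sketch: serve a stored XML result if present, otherwise run the
// (expensive) query, store its result to disk, and return it.
public class XmlFileCache {

    private final Path dir;

    public XmlFileCache(Path dir) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public String get(String key, Supplier<String> query) {
        Path file = dir.resolve(key + ".xml");
        try {
            if (Files.exists(file)) { // cache hit: skip the database entirely
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            String xml = query.get(); // cache miss: run the query once
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        XmlFileCache cache = new XmlFileCache(Files.createTempDirectory("xmlcache"));
        System.out.println(cache.get("report42", () -> "<report id=\"42\"/>"));
        System.out.println(cache.get("report42", () -> "<unused/>")); // served from disk
    }
}
```

A production version would also need an invalidation policy (age- or event-based) for data that changes.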
XML and Testing
XML has advantages when it comes to application testing. A use case may have a complex Java implementation class, which does behind-the-scenes data processing, etc. Instead of waiting for that class to be fully implemented, we can create a wrapper class which simply returns the same XML that the actual class will eventually return. This XML could even be read from a file. The XSL transformation and surrounding application logic can then be fully tested. In a more typical Java database application, we would need to fully implement the database access logic before the application flow could be tested. Furthermore, using tools such as XML Spy, the XSL transformation can be tested independently of the application. This approach is also conceptually neater, since much of the process is contained in a single, easily understandable and viewable XML document.
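The stub idea can be sketched as follows; the interface, class names, and canned XML are illustrative, invented for the example.

```java
// Sketch: agree on the XML first, then let a trivial stand-in return it
// so the XSL layer and application flow can be tested before the real
// data-access class exists.
public class StubDemo {

    interface ContactSource {
        String getContactXml(int id);
    }

    // The real implementation, backed by database access, comes later:
    // static class DatabaseContactSource implements ContactSource { ... }

    // Testing stand-in: returns the agreed-upon document verbatim.
    static class StubContactSource implements ContactSource {
        public String getContactXml(int id) {
            return "<contact id=\"" + id + "\"><name>Ann Smith</name></contact>";
        }
    }

    public static void main(String[] args) {
        ContactSource source = new StubContactSource();
        System.out.println(source.getContactXml(7));
    }
}
```

Any component programmed against ContactSource works identically with the stub and the eventual real class.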
Once the XML interfaces are defined, the back end coding can be given to the Java programmers and the XSL transform given to the XSL programmer. This will provide a more exact specification of the desired results, improving programmer efficiency and results.
This testing approach also locks implementations into an agreed-upon standard, namely the XML document returned. In complex applications, DTDs can be enforced on the returned XML and errors trapped in this manner. This is similar to internet-based web services, where an XML contract exists between the service and the subscriber. In XX, the service and subscriber both exist within the application.
Use cases can be modeled as XML request/response pairs. Barring side effects of a function call, correctness can be specified purely as a function of the input and the output. By looking at the request and response XML, simple unit tests can be set up for each use case.
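Such a request/response check can be sketched as follows; the handler is a hypothetical stand-in for a use case implementation, and the document shapes are invented for the example.

```java
// Sketch: correctness specified purely as "this request XML must yield
// this response XML", checked by comparing documents as strings.
public class XmlRoundTripTest {

    // Hypothetical use case: echo the requested id back in a response document.
    static String handle(String requestXml) {
        String id = requestXml.replaceAll(".*id=\"(\\d+)\".*", "$1");
        return "<response id=\"" + id + "\" status=\"ok\"/>";
    }

    public static void main(String[] args) {
        String request = "<request id=\"7\"/>";
        String expected = "<response id=\"7\" status=\"ok\"/>";
        if (!handle(request).equals(expected)) {
            throw new AssertionError("unexpected response: " + handle(request));
        }
        System.out.println("use case check passed");
    }
}
```

Because both sides of the check are plain XML, the same pair can be pasted directly into the test plan document, as noted below.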
In JSP or other non-XML approaches, you usually need a live database for meaningful development to take place. This necessitates hardware and software infrastructure more extensive than pure XML development would need. While this infrastructure will be necessary for the final stages of development of the XML application, it is not needed in the initial design and early implementation stages. These non-XML approaches also involve working with arrays, collections, beans, and other proprietary Java objects. JSP approaches can also be more complicated, requiring detailed documentation and flow charts to fully understand.
On the documentation side of testing, you can store the XML in and XML out in the application test plan and testing results document. This allows for easier documentation and testing records.
Designing a good XML document
In XML based development, it is important to design good XML documents. This aids in readability of the application and enhances the application's documentation. XML documents appear throughout an application. The documents that appear as part of a public interface, especially those available to resources outside the application, should be given greater care. A DTD should be used in most cases when an outside interface is needed.
It helps to think outside the specific application and look at the XML as a standalone, meaningful representation of data. Does the document read well? Does it make logical sense at all levels? Should the document be broken into multiple sub-documents? If so, the operation may address too much functionality and may be better broken into more than one component.
Are attributes or elements used where appropriate? Attributes should describe the main element. If the attributes can stand alone, be described in their own right, or be used in multiple locations, they should be created as elements.
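As an illustration (element and attribute names invented for the example):

```xml
<!-- id and type merely qualify the contact, so they are attributes.
     Addresses can repeat, have structure of their own, and could be
     reused elsewhere, so they are elements. -->
<contact id="7" type="personal">
  <address kind="home">
    <city>NY City</city>
    <state>NY</state>
  </address>
  <address kind="work">
    <city>Albany</city>
    <state>NY</state>
  </address>
</contact>
```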
Issues and Problems with XX and XML-Based approaches
The main issue seen with XML-centric approaches is the performance of the XSL transformation. In this respect, Microsoft XML parsers seem to have an advantage over Java-based ones. Care must be taken to break down larger transformations into smaller ones, cache data appropriately, and limit the XML passed into the transform.
One reason for the XSL bottleneck is that processing that would be done elsewhere is simply moved into the XSL transform. Therefore, even though a transform may take 90% of processing time, this is really equal to the sum of the other, more manual processes that would be done anyway using some other approach.
In general, I believe XSL is performance neutral, falling somewhere in the middle of the various manual approaches. One can always fine tune a manual approach, but sometimes application readability will suffer. In keeping with the rule of avoiding premature optimization, fine tuning or rewriting XSL should take place in the later, optimization stage of development.
You can profit from advances in XML parser technology (such as the latest MSXML or Apache XSLTC) without changing the underlying code. This is one advantage of using a framework or third party tools as part of a design approach: these tools generally improve as time goes on, and you can take advantage of those improvements. With a manual process, any improvement rests solely in the hands of the application developer.
XSL can be confusing, though luckily you can accomplish a lot with just a little code. However, some features, such as string processing, are more difficult to implement. Libraries can be downloaded to handle some of these tasks, but most of them are quite hard to follow. XSL doesn't contain all the native functions of a true programming language; even simple operations we take for granted are missing. XSL is not yet meant to be a full programming language.
As mentioned earlier, the incorporation of CSS into page design can simplify the XSL greatly, therefore enhancing readability and improving transformation performance.
The Role of UML
UML goes hand in hand with XML in the overall design approach promoted here. UML also has many advantages unrelated to XML.
Our development approach is very Use Case oriented. Use Cases look at proposed system functionality from a user's perspective, describing the desired system behavior. Composed of a simple schematic and explanatory text, a Use Case model lets the client or user see, at a glance, the overall high-level functionality of a system, and is most useful to end users and others who may not be interested in the more technical details of the implementation. The use cases ultimately serve as a check that the system provides all the services needed and satisfies the client's requirements.
A valid UML model should always be the first step of development. Resist the impulse to begin coding early in the project, other than experimental undertakings to determine the feasibility of certain approaches. Refine and review the model, preferably in group brainstorming sessions and with clients and stakeholders. This is sure to pay excellent dividends and will actually shorten project time.
UML can also be used to represent the interaction between various XX components. For example, UML collaboration diagrams can visually display the components of a servlet implementation.
A note on stored procedures
We don't support the use of stored procedures in most cases, and never have. We prefer to keep database access and related logic in a data access tier, not in the database itself. Business logic, which is sometimes incorporated into stored procedures, should be implemented in the business object layer of the application.
Stored procedures were originally developed for client-server applications as a way to centralize common routines unrelated to the user interface. By implementing this functionality in a central database, business rules could be updated without updating the client.
Centralizing this code also yielded performance improvements in client-server applications. In the web-based applications of today, this is no longer necessary, since applications are server and thin client based.
There are other problems with stored procedures. Their use limits database independence, since procedures written for one database will almost certainly not run on another, making it difficult to switch back end databases. Using stored procedures also adds another language skill requirement to the development team. While database administrators usually have these skills, DBAs are usually concerned with database management tasks and rarely get involved in application development. Contrast this approach with a cross platform language such as Java, or tools such as Hibernate, made to work transparently with various back end databases. If we use flexible tools like Java and Hibernate, we shouldn't lock ourselves into a particular platform through the use of stored procedures.
While stored procedures are sometimes seen as improving performance and scalability, they can actually reduce scaling options, since procedures can only run on the database tier, and on a particular database at that. It is generally more difficult to scale databases horizontally (add more database servers), so increasing database performance usually means moving the database to a more powerful server. By contrast, J2EE and web servers can be scaled horizontally more easily, which is often the more cost-effective approach.
Developers generally have less control over the database than over their Java environments, which often run on their own workstations. This lack of control makes the development process more cumbersome and less flexible, especially when debugging: assistance from the DBA is often required, and access protocols must be determined.
Modern persistence technologies, such as Enterprise Java Beans (EJB), preclude the use of stored procedures entirely, since database access is handled by the application server. While possible, it does not make sense to use stored procedures in an EJB application. Tools like Hibernate can be used in place of EJBs and also handle database access in a cross platform manner.
Stored procedures should only be used during the optimization phase, if performance testing shows that this is the only way to achieve the required results. We've never come across a situation where other optimization options weren't sufficient, but it may happen. Even then, all other optimization approaches should be considered first.
In the original development of the XX methodology (then called Extreme XML), the idea was proposed to use XML as the common parameter to all application methods. This is similar to some approaches to Visual Basic development that recommend using variants as parameters to all functions. The advantage of this approach is that parameters can be modified without changing the method signature, thereby avoiding breaking existing applications or requiring a recompile.
The original name of Extreme XML proved apt, as this approach turned out to be too unwieldy. In practice it proved cumbersome, mainly because development tools still work better with binary object approaches. In addition, Object Oriented design is still more conceptually grounded than XML or service oriented representations: Object Oriented design more accurately represents real world entities, while XML is still best as a convenient packaging or transport mechanism, or a readable textual representation. Also, the performance overhead of such an extreme approach is probably too great, since every function would generally require lookup of relevant data from the XML parameter, or an XSL transformation to produce another format of the data to be passed to another function or processing node.
We recommend that XML document parameters be used between application layers, while object oriented and functional calls are used within layers. Use of XML between logical layers maintains consistent interfaces by providing the communication "contract", as in web services, and allows an XML pipeline form of architecture to be implemented. Using XML communications also eliminates some of the additional classes that are sometimes used to decouple layers, as in object marshalling approaches such as DCOM and EJB.
Recommendations
We currently recommend a combination of Object Oriented and XML approaches.
Object Oriented approaches work best within the tiers or packages of an application, while XML works best as a transport mechanism (as platform independent text) across applications, or across application tiers or packages.
Application design should be done using Object Oriented approaches, incorporating UML. A use case based approach should be used, where each use case is implemented by a single servlet.
Web page design should incorporate CSS and avoid table based layouts entirely.
When designing functions, those that return rows or sets of data should be implemented as XML. Functions that return or operate on a single row or object should use object access approaches. OO techniques, using OO-relational mapping tools, can also be used, but performance needs to be watched when large numbers of rows are returned. By using tools like Hibernate, performance improvements, through caching, etc., can often be achieved.