XML — the Slower, Larger, and More Complicated Option for Today’s Apps

XML stands for Extensible Markup Language. The W3C began work on it in 1996 and published XML 1.0 as a Recommendation in 1998, more than two decades ago. Until JSON took over, XML was the primary format for data interchange on the Web. Douglas Crockford specified JSON in the early 2000s and later documented it in RFC 4627 (2006).

It has also been used for everything from schema definition to configuration files: anywhere structured or hierarchical text data was needed. As a markup language, it was extended with namespaces, schemas (XSD), transformations (XSL), and queries (XPath and XQuery). However, in the last decade, it has lost a lot of ground to JSON, which has proven to be simpler, faster, and a lot more compact.

JSON Advantages

With JavaScript prevalent from the browser to the server (Node.js), and REST-like web services gaining popularity over SOAP/WSDL, JSON has become the preferred format for the Web. When there is a lot of data to return and display, the speed and size advantages of JSON become significant and compelling. The graphs below show that, in both Google searches and Stack Overflow questions, XML has lost a lot of ground while JSON has been picking up speed:

Google Trends Search Interest for XML and JSON — Source: https://www.toptal.com/web/json-vs-xml-part-1
Stack Overflow Questions for XML and JSON — Source: https://www.toptal.com/web/json-vs-xml-part-1

XML Advantages

To deal with XML in a performant way, there are two main types of parsers available: SAX and DOM. SAX parsers can stream the XML as it is downloaded. This makes it easy to extract just the parts you want if you need only a little bit of the data. It also works well if you need to process or aggregate specific fields. DOM parsers, on the other hand, build an in-memory object model of the whole document, similar to the HTML DOM in a browser. Your program can access and modify individual nodes and save the XML back. DOM parsing is typically necessary for the advanced features of XML, such as schema validation and transformations.
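To illustrate the difference between the two parser types, here is a minimal sketch using Python's standard `xml.sax` and `xml.dom.minidom` modules. The sample document, its element names, and the helper functions `sax_sum` and `dom_add_order` are made up for this example and do not come from any real API.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document; element names are invented for illustration.
XML_DATA = b"""<orders>
  <order id="1"><total>19.99</total></order>
  <order id="2"><total>5.01</total></order>
</orders>"""


class TotalHandler(xml.sax.ContentHandler):
    """SAX handler that sums <total> values as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.in_total = False
        self.buffer = []
        self.total = 0.0

    def startElement(self, name, attrs):
        if name == "total":
            self.in_total = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the tag closes.
        if self.in_total:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "total":
            self.total += float("".join(self.buffer))
            self.in_total = False


def sax_sum(data: bytes) -> float:
    """Aggregate one field without building a tree of the whole document."""
    handler = TotalHandler()
    xml.sax.parseString(data, handler)
    return handler.total


def dom_add_order(data: bytes, order_id: str, total: str) -> str:
    """Load the whole document, append a new <order> node, and serialize it."""
    doc = minidom.parseString(data)
    order = doc.createElement("order")
    order.setAttribute("id", order_id)
    total_el = doc.createElement("total")
    total_el.appendChild(doc.createTextNode(total))
    order.appendChild(total_el)
    doc.documentElement.appendChild(order)
    return doc.toxml()
```

The SAX version only ever holds the current field in memory, which is why it suits large downloads. The DOM version holds the entire tree, which is what makes in-place modification possible.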

XML's hierarchical structure powers query languages like XPath and XQuery, as well as transformations with XSLT. With XPath and XQuery, developers can formalize how the data is used in the application: for instance, which fields of which children are used. XSLT can formally transform one XML document into another XML document or into other formats like plain text and HTML. XSLT itself is written in XML, so its structure and format can be validated with XSD.
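As a small illustration of an XPath-style query, the sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The catalog document and the `titles_in_category` helper are hypothetical. XQuery and XSLT are not covered here because the Python standard library does not implement them.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the structure is invented for illustration.
CATALOG = """<catalog>
  <book category="web"><title>Learning XML</title><price>39.95</price></book>
  <book category="cooking"><title>Everyday Italian</title><price>30.00</price></book>
  <book category="web"><title>XQuery Kick Start</title><price>49.99</price></book>
</catalog>"""


def titles_in_category(xml_text: str, category: str) -> list:
    """Return the <title> text of every <book> whose category attribute matches."""
    root = ET.fromstring(xml_text)
    # The path states exactly which field of which children is used:
    # book elements filtered by attribute, then their title child.
    path = f"./book[@category='{category}']/title"
    return [title.text for title in root.findall(path)]


print(titles_in_category(CATALOG, "web"))
```

The path expression doubles as documentation: it records that the application depends on the `category` attribute and the `title` child, and on nothing else in the document.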

Unfortunately, all the power and extensibility of XML comes with significant complexity. Even though XML is supposed to be human-readable, once you start adding namespaces or putting together XSD or XSLT (including XPath and XQuery), it starts to look very arcane. Various editors come to the rescue by highlighting syntax errors as you write the XML, and even deviations from the document's schema. Editors like Visual Studio can also complete tag and attribute names based on the schema. However, when something does not work, it can be tough to figure out what went wrong. Whenever there is complexity, there is a high price to pay.
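Namespaces are a good example of where this complexity shows up. The sketch below, again using Python's standard `xml.etree.ElementTree`, parses a hypothetical invoice whose namespace URIs and element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using two made-up namespaces.
DOC = """<inv:invoice xmlns:inv="http://example.com/invoice"
             xmlns:addr="http://example.com/address">
  <inv:number>42</inv:number>
  <addr:city>Seattle</addr:city>
</inv:invoice>"""

# Prefix-to-URI map used for lookups; the prefixes here are local to this
# code and need not match the ones used inside the document.
NS = {"inv": "http://example.com/invoice", "addr": "http://example.com/address"}

root = ET.fromstring(DOC)

# A plain tag name no longer matches once the element is namespaced.
plain_lookup = root.find("number")

# The lookup only succeeds when the namespace is supplied.
number = root.find("inv:number", NS).text
city = root.find("addr:city", NS).text

# Internally, every tag carries its full namespace URI.
root_tag = root.tag

print(plain_lookup, number, city, root_tag)
```

A lookup that silently returns nothing because a namespace was omitted is a typical case of "something does not work" with no obvious error to point at.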

Support

On the other hand, when it comes to documents, XML is still the primary format. Office suites such as Microsoft Office, Apache OpenOffice, and LibreOffice save documents in standardized, XML-based interchange formats. This allows different editors to open and edit the documents, even if they were created in another application. This is especially important for enterprises, which want to be able to move from one vendor’s application suite to another without losing their data or having to convert it all.

On the database front, XML support is standard in most of the major SQL databases from large vendors, such as Microsoft SQL Server, Oracle, and IBM DB2. MySQL also provides XML functions such as ExtractValue() and UpdateXML(). This is because XML has been used in enterprises for the last couple of decades, and enterprises are slower to move their applications and data forward.

By contrast, many of the newer NoSQL databases that have gained popularity in the last decade, such as MongoDB and CouchDB, use JSON or JSON-like documents as their data format. These databases focus on simplicity and scalability, so lightweight JSON works well for them. Some of the older databases, like Oracle, PostgreSQL, and Microsoft SQL Server, have also added support for JSON, so it is easier to extract fields from columns that hold JSON data. This makes it easier for enterprise developers to store richer and more dynamic data in the classic, larger databases without splitting it out or changing table schemas.

Future

On the web, however, JSON will continue to pick up speed and dominate over XML. Startups and big internet companies need to move fast, and the proven way to do that is to keep things simple and fast. These are both strengths of JSON. A major development that increased the prevalence of JSON was the popularity of Node.js, a runtime that brings JavaScript development to the backend. With NoSQL databases like MongoDB or CouchDB that store and exchange JSON-style documents, the entire stack for web applications can exchange data in this format, all the way down to the client, where the web browser natively runs JavaScript.

One of the areas expected to grow very fast over the next decade is the Internet of Things. These are typically devices that do not have a lot of computing power. However, they are connected to the internet, and they provide and consume data. Examples of these devices include thermostats, monitoring and security devices, as well as refrigerators, televisions, and lights.

A lot of smart home devices are in this category, including smart speakers and smart plugs. These devices can provide data to the internet, ranging from when doors are opened or motion or smoke is detected to the temperature of different rooms. They can also get data from the internet, such as weather and temperature reports for a smart landscape irrigation system. In addition to data, these devices can receive commands from the internet, such as turning on the lights or unlocking gates and doors remotely for visitors. JSON is a better fit for almost all of these scenarios because the data exchanged needs to be compact, simple, and fast to process. These devices often don’t have fast internet connections or a lot of computing power, and some of them may need to run on battery for long periods and still provide service, such as a security system during a power outage.
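To make the compactness point concrete, here is a minimal sketch that serializes the same sensor reading both ways using only the Python standard library. The reading, its field names, and the element-per-field XML layout are assumptions chosen for illustration; a different XML design (for example, one using attributes) would yield a different size.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical reading from a smart thermostat; field names are invented.
reading = {
    "device": "thermostat-1",
    "room": "kitchen",
    "temperature_c": 21.5,
    "motion": False,
}


def to_json(data: dict) -> str:
    """Serialize to JSON without optional whitespace."""
    return json.dumps(data, separators=(",", ":"))


def to_xml(data: dict) -> str:
    """Serialize to XML with one child element per field (an assumed layout)."""
    root = ET.Element("reading")
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(reading)
xml_text = to_xml(reading)

print(len(json_text), json_text)
print(len(xml_text), xml_text)
```

In this layout every field name appears twice in the XML, once in the opening tag and once in the closing tag, which accounts for most of the extra bytes. The JSON version also keeps numbers and booleans typed, while the XML version reduces everything to text.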

XML was critical to the growth of the web more than two decades ago. Since then, it has not been the centerpiece of any new major development in computing. JSON came later, but its strengths have proven essential for fast growth. As web development and the Internet of Things become more prevalent, together with JavaScript across the stack, the future looks to be dominated by JSON.

