XML — the Slower, Larger, and More Complicated Option for Today’s Apps

XML stands for Extensible Markup Language. The W3C published its first working draft in 1996 and made XML 1.0 a Recommendation in 1998. Until Douglas Crockford specified JSON in the early 2000s (it was later published as RFC 4627 in 2006 and revised as RFC 7159 in 2014), XML was the primary format for data interchange on the Web.

It has also been used for everything from schema definition to configuration files, wherever structured or hierarchical text data was needed. Companion standards extended it with namespaces, schemas, transformations, and queries: XML Namespaces, XSD, XSL/XSLT, XPath, and XQuery. However, in the last decade, it has lost a lot of ground to JSON, which has proven to be simpler, faster to parse, and a lot more compact.

JSON Advantages

JSON’s primary advantage is in data transfer and exchange. It is lightweight: documents are smaller, and parsing them does not require complicated libraries or software. As a result, quick and straightforward parsers exist for many languages and environments. The JSON website lists almost 200 tools and libraries for over 60 different languages. Escaping values is simple: inside a string, only quotation marks, backslashes, and control characters need a backslash escape, whereas XML relies on HTML-like entity references. In JavaScript, the standard JSON.parse and JSON.stringify functions turn JSON text into a ready-to-use object and back.
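The parse-and-serialize round trip described above can be sketched in a few lines. The example uses Python's standard json module rather than JavaScript, and the payload and field names are invented for illustration.

```python
import json

# Hypothetical payload, invented for this example.
text = '{"user": {"name": "Ada", "tags": ["admin", "dev"], "active": true}}'

# Parse JSON text into ordinary dicts, lists, strings, and booleans.
data = json.loads(text)
first_tag = data["user"]["tags"][0]

# Add a value containing quotes and angle brackets, then serialize back.
# Only the double quotes need a backslash escape; "<" and ">" pass through.
data["user"]["note"] = 'She said "hi" <now>'
round_trip = json.dumps(data)

print(first_tag)
print(round_trip)
```

No schema, handler class, or configuration is involved: the parsed result is a plain data structure that the rest of the program can use directly.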

With JavaScript now running everywhere from the browser to servers with Node.js, and with REST-like web services gaining popularity over SOAP/WSDL, JSON has become the preferred format for the Web. Especially if you have a lot of data to return and display, the speed and size gains from JSON become significant and compelling. As the graphs below show, both Google searches and Stack Overflow questions reflect this: XML has lost a lot of ground, and JSON has been picking up speed:

Google Trends Search Interest for XML and JSON — Source: https://www.toptal.com/web/json-vs-xml-part-1
Stack Overflow Questions for XML and JSON — Source: https://www.toptal.com/web/json-vs-xml-part-1

XML Advantages

While JSON is simple, XML is robust and extensible. It has support for namespaces as well as schemas. Namespaces, together with attributes, make sure that various tags and elements can be unique and schematized, even across different versions over time. Schema support is the primary differentiator from JSON. Developers can explain what the data should look like and validate it when producing and consuming the data. With the namespaces and extensible schema support, even as the data format changes across clients and backend, developers can make sure that the data will be exchanged as intended.
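As a rough illustration of how namespaces keep identically named elements apart, the sketch below parses a small document with Python's standard xml.etree.ElementTree module. The namespace URIs, prefixes, and element names are all hypothetical. The standard library does not validate against XSD, so schema validation is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two vocabularies both define a <title> element.
doc = """<order xmlns:book="http://example.com/ns/book"
       xmlns:person="http://example.com/ns/person">
  <book:title>Example Book</book:title>
  <person:title>Dr.</person:title>
</order>"""

root = ET.fromstring(doc)

# Map prefixes to namespace URIs for lookups (the prefixes are local names;
# only the URIs identify the vocabulary).
ns = {
    "book": "http://example.com/ns/book",
    "person": "http://example.com/ns/person",
}

book_title = root.find("book:title", ns).text
person_title = root.find("person:title", ns).text

# Internally, each tag carries its full namespace URI.
first_tag = root[0].tag

print(book_title, "|", person_title)
print(first_tag)
```

Both elements are called title, but because each is qualified by a different namespace URI, a consumer can ask for exactly the one it means.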

To deal with XML in a performant way, two main types of parsers are available: SAX and DOM. SAX parsers can stream the XML as it is downloaded. This makes it easy to extract just the parts you want if you need only a little data. This also works well if you need to process or aggregate specific fields. On the other hand, DOM parsers build an in-memory object model of the whole document, similar to the DOM of an HTML page. Your program can access and modify various nodes and save the XML back. DOM parsing is typically necessary for the advanced features of XML, such as schema validation and transformations.
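The difference between the two parsing styles can be sketched with Python's standard xml.sax and xml.dom.minidom modules. The document, the TempHandler class, and the element names are invented for illustration; this is a minimal sketch, not a tuned implementation.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sensor feed, invented for this example.
doc = b"""<readings>
  <reading sensor="kitchen"><temp>21.5</temp></reading>
  <reading sensor="garage"><temp>14.0</temp></reading>
</readings>"""


class TempHandler(xml.sax.ContentHandler):
    """Collects the text of every <temp> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.temps = []
        self._buf = None  # a list only while the parser is inside <temp>

    def startElement(self, name, attrs):
        if name == "temp":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "temp":
            self.temps.append(float("".join(self._buf)))
            self._buf = None


# SAX: aggregate one field without building a tree of the whole document.
handler = TempHandler()
xml.sax.parseString(doc, handler)
average = sum(handler.temps) / len(handler.temps)

# DOM: load the whole document, modify a node, and serialize it back.
dom = minidom.parseString(doc)
first = dom.getElementsByTagName("reading")[0]
first.setAttribute("sensor", "pantry")
updated = dom.documentElement.toxml()

print(average)
print(updated)
```

The SAX half keeps only the values it cares about, while the DOM half holds the full tree in memory in exchange for being able to edit it and write it back out.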

XML's hierarchical structure powers query languages like XPath and XQuery, as well as transformations with XSLT. With XPath and XQuery, developers can formalize how the data is used in the application: for instance, which fields of which children are used. XSLT can formally transform one XML document into another XML document or into other formats like plain text and HTML. XSLT itself is written in XML, so a stylesheet's structure can be validated against a schema (XSD) like any other XML document.
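To give a feel for path-based queries, the sketch below uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module. Full XPath, XQuery, and XSLT need third-party libraries and are not shown. The document and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this example.
doc = """<library>
  <book year="1999"><title>First</title><price>10</price></book>
  <book year="2015"><title>Second</title><price>25</price></book>
  <magazine year="2015"><title>Third</title></magazine>
</library>"""

root = ET.fromstring(doc)

# Titles of every <book> that is a direct child of the root.
book_titles = [t.text for t in root.findall("./book/title")]

# Titles of any element, at any depth, whose year attribute is 2015.
titles_2015 = [t.text for t in root.findall(".//*[@year='2015']/title")]

print(book_titles)
print(titles_2015)
```

Each query string states which children and which attributes the application depends on, which is the kind of formalization the paragraph above describes.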

Unfortunately, all the power and extensibility of XML come with significant complexity. Even though XML is supposed to be human-readable, when you start adding namespaces or trying to put together XSD schemas or XSLT stylesheets (along with XPath and XQuery), it starts to look very arcane. Various editors come to the rescue to help highlight syntax errors as you write the XML and even any deviations from the document's schema. Editors like Visual Studio can even do completion for tags and attribute names based on the schema. However, when something does not work, it can be tough to figure out what went wrong. Whenever there is complexity, there is a high price to pay.

Support

The majority of the public web APIs from big companies like Facebook and Google use JSON as the data exchange format. For instance, Facebook Graph API and Google Maps API return data in JSON, similar to Twitter, Pinterest, Reddit, Foursquare, and even AccuWeather. On the other hand, only older APIs, like Amazon Product Advertising API, still support XML. Other older APIs have added support for JSON and now support both. These include LinkedIn, Flickr, and Google Cloud Storage API.

On the other hand, when it comes to documents, XML is still the primary format. Microsoft Office, Apache OpenOffice, and LibreOffice all save documents in standardized, XML-based formats (Office Open XML and OpenDocument). This allows different editors to open and edit the documents, even if they were created in another application. This is especially important for enterprises. They want to move from one vendor’s application suite to another without losing their data or having to convert it all.

On the database front, XML support is standard in most big SQL databases from big companies, such as Microsoft SQL Server, Oracle, and IBM DB2. MySQL also provides built-in XML functions. This is because XML has been used in enterprises for the last couple of decades, and enterprises are slower to migrate their applications and data.

On the other hand, many of the newer NoSQL databases that have gained popularity in the last decade use JSON as the data format. These include MongoDB and CouchDB. These databases focus on simplicity and scalability, so lightweight JSON works well for them. Some of the older databases like Oracle, PostgreSQL, and Microsoft SQL Server have also added support for JSON, so it is easier to extract fields from columns with JSON data. This makes it easier for enterprise developers to store richer and more dynamic data in the classical bigger databases without splitting it out or updating tables.

Future

XML has a solid foothold in enterprises today. Stability and versioning are essential in the enterprise space, and they will continue to pay the extra overhead and cost of XML over JSON. In return, they will get extra power and extensibility. Enterprises are already invested in the tools and systems to deal with both the complexity and overhead of XML. They also have existing data in this format that is harder to move forward.

On the web, however, JSON will continue to pick up speed and dominate XML. Startups and big internet companies need to move fast, and the proven way to do that is to keep things simple and fast. These are both strengths of JSON. A significant development that increased the prevalence of JSON was the popularity of Node.js, a runtime that brings JavaScript development to the backend. With NoSQL databases like MongoDB or CouchDB that store JSON-style documents, the entire stack for web applications can exchange data in this format, all the way down to the web browser, which runs JavaScript natively.

One of the areas expected to grow very fast over the next decade is the Internet of Things. These are typically devices that do not have a lot of computing power or performance. However, they are connected to the internet, and they provide and consume data. These devices include thermostats, monitoring and security appliances, refrigerators, televisions, and lights.

A lot of smart home devices are in this category, including smart speakers and smart plugs. These smart devices can provide data to the internet, ranging from when doors are opened or motion or smoke is detected to the temperature of different rooms. They can also get data from the internet, such as the weather and temperature reports for an intelligent landscape irrigation system. In addition to data, these devices can also get commands from the internet, such as turning on the lights or unlocking gates or doors remotely for visitors. JSON is a better fit for the data exchanged in almost all of these scenarios because that data needs to be compact, simple, and fast to process. These devices don’t have fast internet connections or a lot of computing power: some of them may run on battery for long periods and still provide service, such as a security system in a power outage.
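The compactness argument can be made concrete by encoding the same reading both ways and comparing byte counts. The device, field names, and XML layout below are assumptions made up for this sketch. It illustrates the tendency and is not a benchmark: an attribute-based XML layout would be shorter than the element-based one used here.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical thermostat reading, invented for this example.
reading = {
    "device": "thermostat-1",
    "room": "kitchen",
    "temp_c": 21.5,
    "motion": False,
}

# Compact JSON: no spaces after separators.
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# The same fields as child elements of a <reading> root.
root = ET.Element("reading")
for key, value in reading.items():
    child = ET.SubElement(root, key)
    # Strings are stored as-is; numbers and booleans use their JSON spelling.
    child.text = value if isinstance(value, str) else json.dumps(value)
xml_bytes = ET.tostring(root)  # bytes, without an XML declaration

print(len(json_bytes), "bytes as JSON")
print(len(xml_bytes), "bytes as XML")
```

Every XML field pays for both an opening and a closing tag, which is where most of the extra bytes come from in a payload this small.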

XML was critical to the growth of the web in the late 1990s and early 2000s. Since then, it has not been the centerpiece of any new significant development in computing. JSON came later, but its strengths have proven essential for fast growth. As Web development and the Internet of Things become more prevalent, together with JavaScript across the stack, the future looks to be dominated by JSON.

Takeaway

Both XML and JSON are frequently used to accomplish similar tasks on the Web and in application development. They both have areas of strength and will continue to be used in their domains. However, the overhead of XML limits its use in fast-growing new areas of development on the Web as well as for the Internet of Things. This means JSON is getting a lot more mindshare and much broader use. Over time, XML will likely be limited to enterprise systems and file formats, while simpler and lightweight JSON will be the prevalent data exchange format across the board.
