XML — the Slower, Larger, and More Complicated Option for Today’s Apps
XML stands for Extensible Markup Language. The W3C began work on it in 1996 and published XML 1.0 as a W3C Recommendation in 1998, more than two decades ago. Until JSON gained traction, XML was the primary format for data interchange on the Web. Douglas Crockford specified JSON in the early 2000s; it was later published as RFC 4627 and revised in RFC 7159.
It has also been used for everything from schema definition to configuration files, wherever structured or hierarchical text data was needed. As a markup language, it was extended with namespaces, schemas (XSD), transformations (XSL and XSLT), and queries (XPath and XQuery). However, in the last decade, it has lost a lot of ground to JSON, which has proven simpler, faster, and more compact.
While JSON is simple, XML is robust and extensible. It supports both namespaces and schemas. Namespaces keep element and attribute names unique, so tags from different vocabularies, or from different versions of the same vocabulary, do not collide. Schema support is the primary differentiator from JSON: developers can describe what the data should look like and validate it both when producing and when consuming it. With namespaces and extensible schemas, developers can ensure that data is exchanged as intended even as the format changes across clients and the backend.
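As a minimal sketch of how namespaces keep identically named tags apart, the following uses Python's standard `xml.etree.ElementTree`. The document, the `book` and `person` prefixes, and the `example.com` namespace URIs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two vocabularies both define a <title> element.
DOC = """\
<order xmlns:book="http://example.com/ns/book"
       xmlns:person="http://example.com/ns/person">
  <book:title>XML in Practice</book:title>
  <person:title>Dr.</person:title>
</order>
"""

# Prefix-to-URI map used when searching (the URIs are illustrative only).
NS = {
    "book": "http://example.com/ns/book",
    "person": "http://example.com/ns/person",
}

root = ET.fromstring(DOC)

# Each prefix resolves to its namespace URI, so the two <title>
# elements can be addressed independently of each other.
book_title = root.find("book:title", NS).text
person_title = root.find("person:title", NS).text

print(book_title)
print(person_title)
```

Internally, ElementTree stores each tag together with its namespace URI, which is why the two `title` elements never clash.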
Two main types of parsers are available for processing XML efficiently: SAX and DOM. SAX parsers stream the XML as it is downloaded and report each element as an event. This makes it easy to extract only the parts you want when you need just a little of the data, and it also works well if you need to process or aggregate specific fields. DOM parsers, on the other hand, load the whole document into an object model, similar to the DOM that browsers build for HTML. Your program can access and modify individual nodes and save the XML back. DOM parsing is typically necessary for the advanced features of XML, such as schema validation and transformations.
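The difference between the two parsing styles can be sketched with Python's standard library (`xml.sax` and `xml.dom.minidom`). The sensor document and the `TempCollector` class are hypothetical names invented for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample data.
DOC = """<readings>
  <reading sensor="kitchen"><temp>21.5</temp></reading>
  <reading sensor="garage"><temp>12.0</temp></reading>
  <reading sensor="attic"><temp>26.5</temp></reading>
</readings>"""


class TempCollector(xml.sax.ContentHandler):
    """SAX handler that keeps only the <temp> values and ignores the rest."""

    def __init__(self):
        super().__init__()
        self.temps = []
        self._in_temp = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "temp":
            self._in_temp = True
            self._buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node.
        if self._in_temp:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "temp":
            self.temps.append(float("".join(self._buffer)))
            self._in_temp = False


# SAX: aggregate one field while streaming, without building a tree.
collector = TempCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
average = sum(collector.temps) / len(collector.temps)

# DOM: load the whole document, modify a node, and serialize it back.
dom = minidom.parseString(DOC)
first = dom.getElementsByTagName("reading")[0]
first.setAttribute("sensor", "kitchen-north")
updated_xml = dom.toxml()

print(average)
print(updated_xml)
```

The SAX handler never holds more than one text node in memory, while the DOM version keeps the full tree so that any node can be edited and written back.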
XML's hierarchical structure powers query languages like XPath and XQuery, as well as transformations with XSLT. With XPath and XQuery, developers can formalize how the data is used in the application: for instance, which fields of which children are read. XSLT can formally transform one XML document into another XML document or into other formats like plain text and HTML. XSLT stylesheets are themselves written in XML, so they can be parsed and checked with the same XML tooling, including schema validation.
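As a small illustration of selecting "which fields of which children" with a path expression, here is a sketch using `xml.etree.ElementTree`, which supports only a limited subset of XPath (a full XPath, XQuery, or XSLT engine requires a third-party library). The catalog document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only for illustration.
DOC = """<catalog>
  <book id="b1" lang="en"><title>Learning XML</title><price>30</price></book>
  <book id="b2" lang="de"><title>XML Kompakt</title><price>25</price></book>
  <book id="b3" lang="en"><title>XSLT Cookbook</title><price>40</price></book>
</catalog>"""

root = ET.fromstring(DOC)

# Titles of all books whose lang attribute is "en".
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# Price of the single book with a given id.
book = root.find(".//book[@id='b2']")
price = int(book.findtext("price"))

print(english_titles)
print(price)
```

The path strings document exactly which parts of the tree the application depends on, which is the formalization the paragraph above describes.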
Unfortunately, all the power and extensibility of XML come with significant complexity. Even though XML is supposed to be human-readable, it starts to look very arcane when you add namespaces or write XSLT stylesheets with embedded XPath expressions. Various editors come to the rescue by highlighting syntax errors as you write the XML, and even deviations from the document's schema. Editors like Visual Studio can also complete tag and attribute names based on the schema. However, when something does not work, it can be tough to figure out what went wrong. Whenever there is complexity, there is a high price to pay.
Most public web APIs from big companies like Facebook and Google use JSON as the data exchange format. For instance, the Facebook Graph API and the Google Maps API return data in JSON, as do the APIs of Twitter, Pinterest, Reddit, Foursquare, and even AccuWeather. By contrast, only older APIs, like the Amazon Product Advertising API, still support XML. Other older APIs, including those of LinkedIn, Flickr, and Google Cloud Storage, have added JSON and now support both formats.
When it comes to documents, however, XML is still the primary format. Office suites from Microsoft Office to Apache OpenOffice and LibreOffice save documents in standardized, XML-based interchange formats. This allows different editors to open and edit the documents, even if they were created in another application. This is especially important for enterprises, which want to be able to move from one vendor's application suite to another without losing their data or having to convert it all.
On the database front, XML support is standard in most major SQL databases from large vendors, such as Microsoft SQL Server, Oracle, and IBM DB2. MySQL also provides built-in XML functions. This is because XML has been used in enterprises for the last few decades, and enterprises need to carry their existing applications and data forward.
On the other hand, many of the newer NoSQL databases that have gained popularity in the last decade use JSON or a JSON-like document model as the data format. These include MongoDB and CouchDB. These databases focus on simplicity and scalability, so lightweight JSON works well. Some older databases like Oracle, PostgreSQL, and Microsoft SQL Server have also added support for JSON, making it easier to extract fields from columns with JSON data. This makes it easier for enterprise developers to store richer and more dynamic data in traditional relational databases without splitting it out or updating tables.
XML has a solid foothold in enterprises today. Stability and versioning are essential in the enterprise space, so enterprises will continue to pay the extra overhead and cost of XML over JSON. In return, they get extra power and extensibility. Enterprises have already invested in the tools and systems needed to deal with XML's complexity and overhead. They also have existing data in this format that is hard to migrate.
One of the areas expected to grow very fast over the next decade is the Internet of Things. These are typically devices that have a limited amount of computing power or performance. However, they are connected to the Internet and provide and consume data. These devices include thermostats, monitoring and security appliances, refrigerators, televisions, and lights.
Many smart home devices are in this category, including smart speakers and plugs. These smart devices can provide data to the Internet, ranging from when doors are opened or motion or smoke is detected to the temperature of different rooms. They can also get data from the Internet, such as the weather and temperature reports for an intelligent landscape irrigation system. In addition to data, these devices can also receive commands from the Internet, such as turning on the lights or unlocking gates or doors remotely for visitors. JSON is a better fit for almost all of these scenarios because the data exchanged needs to be compact, simple, and fast to process. These devices don't have fast internet connections or a lot of computing power, and some of them may need to run on battery for long periods and still provide service, such as a security system during a power outage.
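To make the compactness argument concrete, the sketch below serializes the same hypothetical sensor reading as JSON and as XML and compares the payload sizes. The field names and the one-element-per-field XML mapping are assumptions made for illustration; an attribute-based XML layout would be smaller, so the exact gap depends on the encoding chosen.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical reading from a smart thermostat.
reading = {
    "device": "thermostat-1",
    "room": "kitchen",
    "temp_c": 21.5,
    "motion": False,
}

# Compact JSON: no whitespace between tokens.
json_payload = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Simple XML mapping: one child element per field, no XML declaration.
root = ET.Element("reading")
for key, value in reading.items():
    child = ET.SubElement(root, key)
    child.text = str(value)
xml_payload = ET.tostring(root, encoding="unicode").encode("utf-8")

print(len(json_payload), "bytes as JSON")
print(len(xml_payload), "bytes as XML")
```

Most of the extra XML bytes come from repeating every field name in a closing tag, an overhead that grows with the number of fields sent.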
XML and JSON are frequently used to accomplish similar tasks in Web and application development. Both have areas of strength and will continue to be used in their domains. However, the overhead of XML limits its use in fast-growing new areas of development on the Web and for the Internet of Things. This means JSON is getting a lot more mindshare and much broader use. Over time, XML will likely be limited to enterprise systems and file formats, while the simpler and more lightweight JSON will be the prevalent data exchange format across the board.