In case you haven’t noticed, XML is not a silver bullet (google xml+silver+bullet). It is not, and should not be, an automatic choice when picking a data storage format. The ubiquitous libraries for working with XML are often hard to use, and are often overkill for a simple storage format. In today’s world, I’d suggest that the following options should be considered (at least briefly):
- Native Object Serialisation
- Custom format
- XML – Extensible Markup Language
- YAML – YAML Ain’t a Markup Language (obviously created by geeks with the recursive name)
- JSON – JavaScript Object Notation
Join me in having a look at these formats, and I’ll let you know some of the issues to consider. The main problem I’m solving is for data that belongs to your own application. I’m not considering databases or interoperability.
Native Object Serialisation
Consider this briefly before running away. I’m particularly familiar with Java object serialisation. I’ve used Prevayler in the past, storing Java objects and XML (so while I’m having a dig at Java object serialisation in general, I’m not specifically having a go at Prevayler).
While native object serialisation is often easy to use, it has costs: it makes the content unreadable by humans, couples the data storage to your implementation language, and can create object migration issues. These costs will typically outweigh the benefits. Even if there were nothing else, the value of human-readable data as a debugging aid would be reason enough to avoid native object serialisation.
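To make the readability cost concrete, here is a minimal sketch. It uses Python's pickle module as a stand-in for native object serialisation (my experience above is with Java, so treat this as an analogy), and the settings data and the is_readable_text helper are invented for the example.

```python
import pickle

# Hypothetical application data, invented for this example.
settings = {"username": "alice", "font_size": 12}


def is_readable_text(blob: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 text."""
    try:
        blob.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Saving and loading is a one-liner each way, which is the attraction.
blob = pickle.dumps(settings)
restored = pickle.loads(blob)

# The stored form is a binary stream, not something you can open in an editor.
# Only unpickle data your own application wrote: loading can execute code.
print(restored == settings)
print(is_readable_text(blob))
```

The round trip is painless, but the stored bytes are tied to Python and are not valid text. Pickling instances of your own classes also records the module and class name, so renaming or moving a class can break old data, which is one flavour of the migration problem.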
Custom format
The use of a simple custom text format should not be discarded out of hand. The lack of any third-party dependencies is a useful feature, and should be considered. That said, a library that does the parsing for you is not to be sneezed at.
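As a rough illustration of how small a custom format can be, here is a sketch of a key=value file reader and writer with no third-party dependencies. The format itself (one pair per line, "#" for comments) and the function names are assumptions made up for this example, not a recommendation of a particular layout.

```python
def dump_settings(settings: dict[str, str]) -> str:
    """Write one key=value pair per line."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def parse_settings(text: str) -> dict[str, str]:
    """Read key=value lines, skipping blank lines and '#' comments."""
    result: dict[str, str] = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"line {line_number}: expected key=value, got {raw!r}")
        result[key.strip()] = value.strip()
    return result


text = dump_settings({"username": "alice", "font_size": "12"})
print(text)
print(parse_settings(text))
```

Note what you give up: every value comes back as a string, there is no nesting, and escaping (a value containing a newline, say) is your problem. That is the point at which an existing parser starts to look attractive.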
XML
As Wikipedia says, “XML is a general-purpose specification for creating custom mark-up languages” (Wikipedia on XML). Parsers and tools exist for many platforms and environments, which makes it a useful tool when you want to share information between different environments. While it is a good tool, the syntax is verbose and can be hard for humans to read.
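For a feel of the verbosity, here is a small sketch using Python's standard xml.etree.ElementTree module. The "settings" element layout and the two helper functions are assumptions invented for this example; real XML schemas vary widely.

```python
import xml.etree.ElementTree as ET


def settings_to_xml(settings: dict[str, str]) -> str:
    """Store each setting as a child element (keys must be valid XML names)."""
    root = ET.Element("settings")
    for key, value in settings.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def settings_from_xml(text: str) -> dict[str, str]:
    """Read the child elements back into a dictionary of strings."""
    root = ET.fromstring(text)
    return {child.tag: child.text or "" for child in root}


document = settings_to_xml({"username": "alice", "font_size": "12"})
print(document)
# <settings><username>alice</username><font_size>12</font_size></settings>
print(settings_from_xml(document))
```

Every name is written twice, once to open and once to close, and the values still come back as untyped strings.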
These factors combine to make JSON an excellent choice.
Tim Bray makes a good case that the choice can be close to automatic once you know your circumstances (http://www.tbray.org/ongoing/When/200x/2006/12/21/JSON). You’ll still need to think about the pros and cons of the different technologies for your situation (see http://webignition.net/articles/xml-vs-yaml-vs-json-a-study-to-find-answers/), but you’ll often find that JSON is a good format to use for data storage.
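To round things off, here is a minimal sketch of JSON as a storage format, using Python's standard json module. The settings data and the function names are invented for this example.

```python
import json


def save_settings(settings: dict) -> str:
    """Serialise to indented JSON with sorted keys, so diffs stay stable."""
    return json.dumps(settings, indent=2, sort_keys=True)


def load_settings(text: str) -> dict:
    """Parse JSON text back into plain dictionaries, lists, strings and numbers."""
    return json.loads(text)


stored = save_settings({"username": "alice", "font_size": 12})
print(stored)
# {
#   "font_size": 12,
#   "username": "alice"
# }
print(load_settings(stored))
```

The stored text is readable in any editor, the number comes back as a number rather than a string, and nothing in the file ties it to Python.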