The SOSI-format: The crazy, Norwegian, geospatial file format

Imagine trying to coordinate the exchange of geospatial data long before the birth of the Shapefile, before XML and JSON were thought of. Around the time when “microcomputers” were _really_ new, and mainframes were the norm. Before I was born.

Despite this (rather sorry) state of affairs, you realize that the “growing use of digital methods used in the production and use of geospatial data raises several coordination issues” [my translation]. In addition, “there is an expressed wish from both software companies and users of geospatial data that new developments do not lead to a chaos of digital information that cannot be used without in-depth knowledge and large investments in software” [my translation].

Pretty forward-thinking if you ask me. Who was thinking about this in 1980? It turns out that two Norwegians, Stein W. Bie and Einar Stormark, were, and they put it down in writing in this report.

This report is fantastic. It’s the first hint of a format that Norwegians working with geospatial data (and few others) still have to relate to today. The format, known as the “SOSI-format” (not to be confused with the SOSI Standard), is a plaintext format for representing points, lines, areas and a bunch of other shapes, in addition to attribute data.

My reaction when I first encountered this format some 8 years ago was “what the hell is this?”, and I started on a crusade to get rid of the format (“there surely are better formats”). But I was hit by a lot of resistance. Partly because I confused the format with the standard, partly because I was young and did not know my history, partly because the format is still in widespread use, and partly because the format is in many ways really cool!

So, I started reading up on the format a bit (and made a parser for it in JavaScript, sosi.js). One thing that struck me was that a lot of things I’ve seen popping up lately have been in the SOSI-format for ages. Shared borders (as in TopoJSON)? Check! Local origins (to save space)? Check! Complex geometries (like arcs etc.)? Check!
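The local-origin trick is easy to sketch. The idea is to store small integers relative to a shared origin and unit, instead of writing out full coordinates for every point. What follows is a minimal illustration of that idea, not the format's specified decoding rules: the function name `decode`, the origin and the unit are all assumptions picked for the example.

```python
from decimal import Decimal


def decode(stored, origin, unit):
    """Turn stored integer offsets into real-world (north, east) pairs.

    stored: list of (north, east) integers as they would appear in a file
    origin: (north, east) of the local origin, in whole metres (assumed)
    unit:   size of one integer step, given as a string such as "0.01"
    """
    step = Decimal(unit)  # Decimal keeps the centimetres exact
    return [(origin[0] + n * step, origin[1] + e * step) for n, e in stored]


# Hypothetical values: an origin in metres and centimetre resolution.
points = decode([(12345, 67890)], origin=(6500000, 100000), unit="0.01")
print(points)
```

With a well-chosen origin, every coordinate in the file is a short integer rather than a long decimal number, which mattered a great deal when storage was scarce.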

But what is it like? It’s a file written in what’s referred to as “dot-notation” (take a look at this file and you’ll understand why). The format was inspired by the British/Canadian format FILEMATCH and a French database system called SIGMI (anyone?).
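To give a feel for dot-notation, here is a rough sketch of a reader that treats the number of leading dots as nesting depth. The sample is a hand-written, simplified fragment in the style of the format (not taken from a real dataset and not checked against the specification), and the `Element` class and `parse_dot_notation` function are hypothetical names. Real files have more rules than this handles, and this is not a description of how sosi.js works internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hand-written, simplified sample; element names are illustrative.
SAMPLE = """\
.HODE
..TEGNSETT UTF-8
..TRANSPAR
...KOORDSYS 22
...ORIGO-NØ 0 0
...ENHET 0.01
.PUNKT 1:
..OBJTYPE Fastmerke
..NØ
650012345 10067890
.SLUTT
"""


@dataclass
class Element:
    name: str
    value: str = ""
    children: list[Element] = field(default_factory=list)
    data: list[str] = field(default_factory=list)  # dot-less lines, e.g. coordinates


def parse_dot_notation(text: str) -> list[Element]:
    """Build a tree where each extra leading dot means one level deeper."""
    roots: list[Element] = []
    stack: list[tuple[int, Element]] = []  # (depth, element) of open ancestors
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        depth = len(line) - len(line.lstrip("."))
        if depth == 0:
            # A line without dots belongs to the most recent element.
            if stack:
                stack[-1][1].data.append(line)
            continue
        name, _, value = line[depth:].partition(" ")
        element = Element(name, value.strip())
        # Close every element that is at the same depth or deeper.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            stack[-1][1].children.append(element)
        else:
            roots.append(element)
        stack.append((depth, element))
    return roots


for top in parse_dot_notation(SAMPLE):
    print(top.name, [child.name for child in top.children])
```

The nice property of the notation is visible here: the reader needs no closing tags or brackets, only a count of dots, which is about as simple as a hierarchical text format gets.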

The format is, as stated, text (i.e. ASCII) based, the reason being that this ensured that data could be stored and transferred on a wide range of media. At the time the report was written, there existed FORTRAN implementations (for both Nord-10/S and UNIVAC 1100) for reading and writing. Nowadays, there exist several closed-source readers and writers for the format (implemented by several Norwegian GIS vendors), in addition to several open-source readers.

The format is slated for replacement by some GML variation, but we are still waiting. There is also GDAL/OGR support for the format, courtesy of Statens Kartverk. However, this requires a lot of hoop-jumping and make-magic on Linux. In addition, the current implementation does not work with UTF-8, which is a major drawback as most .SOS files these days are in fact UTF-8.

So, there we are. The official Norwegian format for exchange of geographic information in 2018 is a nearly 40-year-old plain text format. And the crazy thing is that this Norwegian oddity is actually something other countries are envious of, as we actually have a common, open (!) standard for this purpose, not some de facto reverse-engineered binary format.

And indeed, why should the age of a format be an issue, as long as it works?
