The SOSI-format: The crazy, Norwegian, geospatial file format

Imagine trying to coordinate the exchange of geospatial data long before the birth of the Shapefile, before XML and JSON were thought of. Around the time when “microcomputers” were _really_ new, and mainframes were the norm. Before I was born.

Despite this (rather sorry) state of affairs, you realize that the “growing use of digital methods used in the production and use of geospatial data raises several coordination issues” [my translation]. In addition, “there is an expressed wish from both software companies and users of geospatial data that new developments do not lead to a chaos of digital information that cannot be used without in-depth knowledge and large investments in software” [my translation].

Pretty forward-thinking if you ask me. Who was thinking about this in 1980? Turns out that two Norwegians, Stein W. Bie and Einar Stormark, were, and they wrote it all down in this report.

This report is fantastic. It’s the first hint of a format that Norwegians working with geospatial data (and few others) still have to relate to today. The format, known as the “SOSI-Format” (not to be confused with the SOSI Standard), is a plaintext format for representing points, lines, areas and a bunch of other shapes, in addition to attribute data.

My reaction when I first encountered this format some 8 years ago was “what the hell is this?”, and I started on a crusade to get rid of the format (“there surely are better formats”). But I was hit by a lot of resistance. Partly because I confused the format with the standard, partly because I was young and did not know my history, partly because the format is still in widespread use, and partly because the format is in many ways really cool!

So, I started reading up on the format a bit (and made a parser for it in JavaScript, sosi.js). One thing that struck me was that a lot of things I’ve seen popping up lately have been in the SOSI-format for ages. Shared borders (as in TopoJSON)? Check! Local origins (to save space)? Check! Complex geometries (like arcs etc.)? Check!
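To make the “local origin” idea a bit more concrete, here is a minimal Python sketch. It only illustrates the general trick (store small integer offsets from a shared origin, scaled by a unit, instead of full coordinates) and is not taken from the SOSI specification. The function names `encode` and `decode`, the origin, the unit and the coordinates are all made up for the example.

```python
def encode(coords, origin, unit):
    """Store each (north, east) pair as integer steps of `unit` from `origin`."""
    return [
        (round((n - origin[0]) / unit), round((e - origin[1]) / unit))
        for n, e in coords
    ]


def decode(offsets, origin, unit):
    """Turn the integer offsets back into full coordinates."""
    return [(origin[0] + dn * unit, origin[1] + de * unit) for dn, de in offsets]


# Hypothetical values: a local origin, centimetre resolution, two points.
ORIGIN = (7034000, 570000)
UNIT = 0.01
points = [(7034123.45, 570012.30), (7034130.10, 570020.75)]

offsets = encode(points, ORIGIN, UNIT)
restored = decode(offsets, ORIGIN, UNIT)
```

The offsets are short integers, which is where the space saving comes from: the large, repetitive part of every coordinate is written once, as the origin.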

But, what is it like? It’s a file written in what’s referred to as “dot-notation” (take a look at this file and you’ll understand why). The format was inspired by the British/Canadian format FILEMATCH and a French database system called SIGMI (anyone?).
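If you don’t have a file at hand, the gist of dot-notation is that the number of leading dots on a line gives its nesting level. Below is a rough Python sketch of how such a file can be turned into a tree. It is not how sosi.js or any other real reader works. The sample is hand-written and heavily simplified (the object type is made up), and treating lines starting with “!” as comments is an assumption on my part.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element: a name, its value, and the elements nested below it."""
    name: str
    value: str = ""
    children: list["Node"] = field(default_factory=list)


def parse_dot_notation(text: str) -> list[Node]:
    """Build a tree where the number of leading dots is the nesting level."""
    roots: list[Node] = []
    stack: list[tuple[int, Node]] = []  # (level, node) for the open elements
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):  # assumption: "!" marks a comment
            continue
        if not line.startswith("."):
            # No dots: the line continues the latest element (e.g. coordinates).
            if stack:
                node = stack[-1][1]
                node.value = (node.value + "\n" + line).strip()
            continue
        level = len(line) - len(line.lstrip("."))
        name, _, value = line[level:].partition(" ")
        node = Node(name, value.strip())
        # Close elements at the same or a deeper level before attaching.
        while stack and stack[-1][0] >= level:
            stack.pop()
        (stack[-1][1].children if stack else roots).append(node)
        stack.append((level, node))
    return roots


# Hand-written, simplified sample; not a complete or valid SOSI file.
SAMPLE = """\
.HODE
..TEGNSETT UTF-8
.PUNKT 1:
..OBJTYPE Fastmerke
..NØ
7034000 570000
.SLUTT
"""

tree = parse_dot_notation(SAMPLE)
```

A real reader has to do a lot more (coordinate units, references between shared borders, character sets), but the dot-counting above is the part that gives the notation its name.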

The format is, as stated, text-based (i.e. ASCII). The reason was that this ensured that data could be stored and transferred on a wide range of media. At the time the report was written, there were FORTRAN implementations (for both Nord-10/S and UNIVAC 1100) for reading and writing. Nowadays, there exist several closed-source readers and writers for the format (implemented by several Norwegian GIS vendors), in addition to several Open Source readers.

The format is slated for replacement by some GML-variation, but we are still waiting. There is also GDAL/OGR support for the format, courtesy of Statens Kartverk. However, this requires a lot of hoop-jumping and make-magic on Linux. In addition, the current implementation does not work with UTF-8, which is a major drawback as most .SOS-files these days are in fact UTF-8.

So, there we are. The official Norwegian format for exchange of geographic information in 2018 is a nearly 40-year-old plain text format. And the crazy thing is that this Norwegian oddity is actually something other countries are envious of, as we actually have a common, open (!), standard for this purpose, not some de-facto reverse-engineered binary format.

And indeed, why should the age of a format be an issue, as long as it works?

The Open Geospatial Data Ecosystem

This summer my first peer-reviewed article, “The Open Geospatial Data Ecosystem”, was published in “Kart og plan”. Unfortunately, the journal is not that digital, and they decided to withhold the issue from the web for a year, “in order to protect the printed version”. What?!

However, I was provided a link to a pdf of my article, and told I could distribute it. I interpret this as an approval of me publishing the article on my blog, so that is exactly what I’ll do.

The full article can be downloaded here: http://docs.atlefren.net/ogde.pdf, and the abstract is provided here:

Open Governmental Data, Linked Open Data, Open Government, Volunteered Geographic Information, Participatory GIS, and Free and Open Source Software are all parts of The Open Geospatial Data Ecosystem. How do these data types shape what we define as Open Geospatial Data; Open Data of a geospatial nature? While all these areas are well described in the literature, there is a lack of a formal definition and exploration of the concept of Open Geospatial Data as a whole. A review of current research, case-studies, and real-world examples, such as OpenStreetMap, reveals some common features; governments are a large source of open data due to their historical role and as a result of political pressure on making data public, and the large role volunteers play both in collecting and managing open data and in developing open source tools. This article provides a common base for discussion. Open Geospatial data will be even more important as it matures and more governments and corporations release and use open data.

Project description in the bag

Things take time. Last autumn I was told that I had to submit a formal project description of my PhD project to the doctoral committee at IVT. I ended up submitting it on April 18, and by May 2 it had been processed. A pleasant message followed:

The doctoral committee approves the final project description for the PhD thesis of Atle Frenvik Sveen

But what is a project description? That’s pretty self-explanatory, isn’t it? More specifically, it is a description of the background, goals, scope (and limitations), method, ethical considerations, expected results, and a plan for the work. To me it seems a bit contrived to have to answer this much in detail before I’ve properly started, but I do understand that some reflection is needed. I’m not sure whether this document counts as public material, but I’m going to clip the “most important” content here anyway, to give an overview of what I’m working on.

In the background section I go into what has been done previously, and talk about what makes this work relevant:

Geospatial Data has been created and managed since the first maps were made (Garfield, 2013). The impact of the digital revolution on this field has far-ranging consequences. A map is but one of several representations of the underlying digital data. The digitalization of the map-making process thus involves several shifts. One is the de-coupling of the printed map from the actual data, another is the fact that geospatial data can be used for more than printing maps.

Open Data is another consequence of digitalization. There is an increasing political pressure to make digital data produced and maintained by governments available to the public (Cox & Alemanno, 2003; Ginsberg, 2011; Yang & Kankanhalli, 2013). Political accountability, business opportunities, and a more general trend towards openness are all cited as reasons behind this movement (Huijboom & Broek, 2011; Janssen, Charalabidis, & Zuiderwijk, 2012; Sieber & Johnson, 2015). In practice this means that geospatial data from a range of sources are becoming available for everyone to use for whatever purpose they see fit.

A third trend is crowdsourcing, or Volunteered Geographic Information (VGI) (Goodchild, 2007). This concept bears some resemblance to Free and Open Source Software. The underlying concept is that amateurs collaborate on tasks such as writing online encyclopedias, writing computer software, or, as in the case of geospatial data, creating a database of map data covering the world: OpenStreetMap (OSM) (Haklay & Weber, 2008).

What is lacking is a combined overview and a set of best practices. What characterizes a system built to handle an automated gathering of geospatial data published in a myriad of formats, with different metadata standards (or no metadata at all), with different update frequencies, and different licenses? A thorough investigation of these problems will enable a better understanding of what data is of interest, how it should be shared, and how the promised value of Open Geospatial Data can be extracted.

I sum up the goals quite simply like this:

The overall objectives of this project are (1) to establish guidelines on how to store and manage geospatial data from disparate sources, with different structure and quality, and (2) to explore how this data can be utilized for value generation and decision support. The overarching theme of both objectives is how the Open Source mindset can be utilized.

As for expected results, this sums up most of it quite well:

There are two main results we hope to obtain from this project. The first is a better understanding of how geospatial data can be gathered from disparate sources and stored in an efficient manner that can be utilized. The other main result is to find new areas, products, and methods that can be carried out by using this data. Establishing systems for assessing quality and fitness for use of the data is also an important aspect.

If you’re interested in reading the whole project description, you’ll find it here: phd_prosjektbeskrivelse_atlefren.

Fastmail not taking security seriously?

About three years ago I figured I’d had enough Google-control of my online communication and was looking for an alternative email-provider. A friend of mine recommended Fastmail, which seemed like a good solution: Great web-interface, Android app, and the possibility of using an address from my own domain.

I signed up and have been using Fastmail since (with a redirect from my Gmail-address). The service has had some small issues (mainly the Android app being anything but “fast”), but overall I’ve been a happy customer.

Yesterday I decided that I wanted to test 1Password, moving away from LastPass after the recent security issues. In this process I decided to use the “generate password” functionality in 1Password to set a new, strong password for my Fastmail account. Before I did that I made sure to set the “Account Recovery” email and phone number, so that if I made an error I would still be able to access my email.

And I was right. Indeed I made an error. I copied the generated password from 1Password and pasted it into the change password dialog on Fastmail. This logged me out, and then I managed to copy something else, removing the password from my clipboard. Then I managed to do something stupid in the 1Password app, and my generated, 30-character, completely random password was lost. I had managed to lock myself out of my email account! Stupid! But hey, I have a recovery email, right?

So I headed to the “Lost password” screen and typed in my Gmail address (to which I had received a confirmation mail from Fastmail 10 minutes before).

Then I got the message:

The existing email address you entered was not for an existing user, or was for an account that has been disabled. Please try again

What?! OK, after retrying 5-6 times I had to open a ticket and provide a lot of information to regain access through a manual process. Eventually, support replied:

Thanks for the verification details.
I have now set your backup email address to:
*****@gmail.com

And I’m back in. Hooray! But I’m still wondering why the recovery email I entered did not work, so I’m asking:

Wasn’t my backup email set, or was there some problems regarding this feature? I am quite sure that I set my backup email yesterday.

The reply to this confused me:

Looks like the backup email address was not set. We then set it from our end and it worked for you. Please let me know if you need any further assistance.

After some back and forth I find out why:

Did you set this address from the Password & Security screen? If that is the case, you had set the “Recovery email address”. This is currently different from the backup email. Backup email can be set from the backend only.

And the password reset can be done using the backup email address only. The recovery process through recovery email address is not yet released into production. So I am afraid it will not work as of now.

What the actual, flying, fuck? The “Password & Security screen” is a frontend for some code that does not work? It presents itself as a way of setting a recovery mail, while it actually does nothing? The situation seems to have been like this for about 8 months, as this page from July 2016 clearly states:

Add your mobile phone number(s) and backup email address to the recovery options on the Password & Security screen. If you get locked out, we can use this to help verify your identity and restore access to your account.

I did express these concerns, and the reply I got was:

I really understand your frustration. I am sorry about that. I will pass your feedback to our supervisors.

We hope to implement the recovery procedure very soon.

But who knows? If they’ve been delaying this for 8 months now, I’m not confident that this will be fixed anytime soon, and I fear that the “Password & Security screen” will continue to be a non-functioning, misleading page that does nothing but confuse users. If the information isn’t used, don’t give the user the impression that it will be. I can understand that not everything can be implemented at once, but have the balls to admit it, don’t lie to me. And about security issues? This is talentless!

So, to recap: The “Password & Security screen” of Fastmail is a sham. The information entered there is not used. In order to regain access to your account if you lose your password, you have to have a “backup email”. This backup email is not the same as the “recovery email”. The backup email has to be set by Fastmail staff.

Geospatial anarchy

It’s not that long since I started my PhD, but it feels like more than a mere 2.5 months.

But, what have I been doing? Well, one thing is that I’m taking classes, so some time has been spent attending lectures and exams (had my first exam in 8 years today, strange feeling). I’ve also started my literature review, so I’ve done a lot of reading.

But, to not derail too much: the title of this blog post is “Geospatial Anarchy”, which was the title of a talk I gave at the Danish mapping conference “Kortdage” a week ago (see abstract here). The talk was in Norwegian (but understandable by Danes, I hope). There is not much of a point in sharing my slides, as they are kinda devoid of meaning without me talking.

But, even better, the conference also asked if I could write an article covering the topic of the talk. Given the rather short deadline I opted out of the peer-review-process, but submitted a non-reviewed article.

I’ll post the abstract here, and if you want to read the whole article it’s available here.

OpenStreetMap (OSM) is the largest and best-known example of geospatial data creation using Volunteered Geographic Information (VGI). A large group of non-specialists join their efforts online to create an open map of the world. The project differs from traditional management of geospatial data on several counts: both the underlying technology (Open Source components) and the mindset (schema-less structures using tags and changesets). We review how traditional organizations are currently using the OSM technology to meet their needs and how the mindset of OSM could be applied to traditional management of spatial datasets as well.