Thursday, June 23, 2016

Public review of Handling and Analyzing Marine Traffic Data

This morning's reading: Handling and Analyzing Marine Traffic Data, a Master's thesis by Eric Ahlberg and Joakim Danielsson.  I hate to be harsh in public, but this thesis is more of a tease than anything else.  I was hoping for more, and I hope that those involved follow up with more depth and, next time, give better background to increase the value of the research.  This thesis shows the start of some interesting work.
With the emergence of the Automatic Identification System (AIS), the ability to track and analyze vessel behaviour within the marine domain was introduced. Nowadays, the ubiquitous availability of huge amounts of data presents challenges for systems aimed at using AIS data for analysis purposes regarding computability and how to extract valuable information from the data. This thesis covers the process of developing a system capable of performing AIS data analytics using state of the art Big data technologies, supporting key features from a system called Marine Traffic Analyzer 3. The results show that the developed system has improved performance, supports larger files and is accessible by more users at the same time. Another problem with AIS is that since the technology was initially constructed for collision avoidance-purposes, there is no solid mechanism for data validation. This introduces several issues, among them is what is called identity fraud, that is when a vessel impersonates another vessel for various malicious purposes. This thesis explores the possibility of detecting identity fraud by using clustering techniques for extracting voyages of vessels using movement patterns and presents a prototype algorithm for doing so. The results concerning the validation show some merits, but also exposes weaknesses such as time consuming tuning of parameters.
I skimmed to the reference section and conclusion and, while they reference some key relevant papers, they are missing a lot of references that you might expect.  No reference to ITU, IEC, IMO, IALA, or other relevant specifications.  No references to papers, presentations, or blog posts by me, ESR, or SkyTruth about AIS troubles or using "Big Data" type methods for AIS.  I'm uncomfortable tooting my own horn here, but come on.

Reading through the thesis, I couldn't find any real meat to the introduction and, when I got to the evaluation section, I was disappointed by this.  No references to even what model they used.  They could have easily reached out to a number of folks with AIS data and stats about data errors.  The thesis hasn't even described how AIS messages really work or given any background on what perfectly functioning AIS message traffic might look like and its error characteristics.  Their one reference to spoofing was to the annoying web hack of injecting AIS messages into a company's feed, which no other ships would even see on their bridge.
The problem of AIS validation has been studied before, but to the knowledge of the authors of this thesis, there is no data consisting of documented cases of invalid data openly accessible. In addition, there is no measure of how often the specific problem occurs in real situations, which means that it might be too time consuming to use real data. Therefore, the evaluation focused on constructing dummy data to realistically model interesting scenarios which could be a sign of invalid AIS messages, and thereby get an indication of how well the solution performs.
Hey guys, check out my 2012 blog post: AIS Security and Integrity.

It was nice to see them go through various computing platforms, but the analysis was rather weak.  I have to wonder what they mean when they say that a command line interface is hard to upgrade.  To me, that seems easier than updating web apps.

Later we get to 4.2.2 AIS message validation.  When they refer to "Static validation, i.e. checking that the messages conform to the syntax of an AIS message" I really have no idea what they mean.  They haven't even defined a syntax for AIS nor told the reader where it might be defined.
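For what it's worth, here is one guess at what a "static" syntax check could look like.  This is a minimal sketch and not the thesis's method: it assumes the messages arrive as NMEA 0183 style !AIVDM/!AIVDO sentences and only checks the framing and the XOR checksum, not the decoded payload.  The names nmea_checksum and is_well_formed are made up for illustration.

```python
import re
from functools import reduce

# Assumed framing (not taken from the thesis):
# !AIVDM,<fragment count>,<fragment number>,<sequence id>,<channel>,<payload>,<fill bits>*<checksum>
_SENTENCE = re.compile(
    r"^!(AIVD[MO],[1-9],[1-9],\d?,[AB12]?,[0-9:;<=>?@A-W`a-w]*,[0-5])"
    r"\*([0-9A-Fa-f]{2})$"
)


def nmea_checksum(body: str) -> str:
    """XOR of every character between '!' and '*', as two uppercase hex digits."""
    return format(reduce(lambda acc, ch: acc ^ ord(ch), body, 0), "02X")


def is_well_formed(sentence: str) -> bool:
    """True if the sentence matches the assumed framing and its checksum agrees."""
    match = _SENTENCE.match(sentence.strip())
    if match is None:
        return False
    body, checksum = match.groups()
    return nmea_checksum(body) == checksum.upper()
```

A check like this only says that a sentence is intact on the wire.  It says nothing about whether the position, MMSI, or other contents are plausible, which is the harder validation problem.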

The clustering stuff is okay, but the figures are very difficult to read until you get to 4.10.  Just when things are starting to get interesting, the thesis ends.  There is a section on ethical concerns that appears to be an afterthought and provides no new information, analysis, or opinions (and not even a reference to the IMO announcement on the topic from more than 15 years ago).  A whole pile of thoughts was submitted to the US federal government in response to a request some years ago, with both sides of the argument submitting opinions.

Wishing for more...
