Monday, July 5, 2010

More thoughts on sensor data in GeoRSS

We have started working with GeoRSS feeds, and my frustration with GeoRSS continues. I posted back in March asking for thoughts on GeoRSS (Sensor data in GeoRSS?). I got some helpful thoughts from Sean Gillies, but have not heard from anyone else.

Here is an example of my USCGC Healy feed of science data that is only in HTML table form within the description:

Healy Science feed in Google Maps



It's got some really nice features that include:

  • Being pretty simple (when compared to SensorML and SOS)

  • You can just throw a URL in Google Maps and see the data

  • It works inside normal RSS readers (e.g. my USCGC Healy Feeds)

  • It's a simple file that you can copy from machine to machine or email.

  • It should be easy to parse

  • Points and lines work great


The trouble comes with:

  • Validation is a pain, but appears to be getting better

  • What should you put in the title and description?

  • It seems you have to create separate item tags for points and lines for things like ships

  • There is no standard for machine readable content other than the location


Here are my thoughts on the last point. First a typical entry that has info in the title and description:
<item>
<title>HOS IRON HORSE - 235072115: 0KTS @ 17 Jun 2010 16:07:10 GMT</title>
<description>Time: 17 Jun 2010 16:07:10 GMT Speed: 0</description>
<pubDate>17 Jun 2010 16:07:10 GMT</pubDate>
<georss:where>
<gml:Point>
<gml:pos>28.0287 -89.1005</gml:pos>
</gml:Point>
</georss:where>
</item>

This makes the data easy for humans to read and works with Google Maps and OpenLayers.

This is how the AIS feed looks in Google Maps... mostly enough to see what is going on. If you know the name of a vessel, you can find it in the list of ships on the left side. That's a great start, but it is hard to get that info into a database. If the fields in the title or description change at all, every parser that uses the feed MUST be rewritten.
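To see why this is fragile, here is a sketch (in Python, with a hypothetical parse_title function) of what every consumer of the feed ends up writing: a regular expression welded to the exact current title format. Change the format and this breaks.

```python
import re

# Hypothetical parser for the human-readable title field.
# The pattern is tied to the exact current layout:
#   "<name> - <mmsi>: <speed>KTS @ <timestamp>"
TITLE_RE = re.compile(
    r"^(?P<name>.+) - (?P<mmsi>\d+): (?P<sog>[\d.]+)KTS @ (?P<when>.+)$")

def parse_title(title):
    m = TITLE_RE.match(title)
    if m is None:
        # Any change to the title format silently breaks every consumer.
        raise ValueError("unrecognized title format: %r" % title)
    return m.groupdict()

fields = parse_title("HOS IRON HORSE - 235072115: 0KTS @ 17 Jun 2010 16:07:10 GMT")
print(fields["mmsi"], fields["sog"])  # → 235072115 0
```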



This buoy feed shows the opposite approach: putting all of the information into separate fields. It's also missing pubDate and such, but we should focus on the SeaSurfaceTemp and SampleDate elements. These are more machine readable. The XML and an image:
<item>
<title>SLDMB 43166</title>
<description>Most recent posrep from 43166</description>
<SeaSurfaceTemp>30C</SeaSurfaceTemp>
<SampleDate>2010-06-25 11:00:00.0Z</SampleDate>
<georss:where>
<gml:Point>
<gml:pos>27.241828, -84.663689</gml:pos>
</gml:Point>
</georss:where>
</item>




There are some problems in the sample above. First, the description doesn't contain a human-readable version. This causes the Google Maps display to give us nothing more than the current position of this
"buoy" (and a recent track history that comes as a separate entry). Ouch. That's hard to preview.

Second, the machine-readable portion is fine, but I can't write anything that can discover additional data fields if they are added. If someone adds <foo>1234</foo>, is that field part of something else, or is it more sensor data that I should be tracking? A namespace for sensor data would help: I could then pick off all of the fields in a "SimpleSensorData" namespace. But namespaces, while good, are a pain. I would prefer a data block, where everything inside is a data field. It would also be good to separate units from the values. Here is how it might look:
<rss xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml" version="2.0">
<channel>
<title>NOAA GeoRSS Feed</title>
<description>Vessel Tracks</description>

<item>
<title>HOS IRON HORSE - 235072115: 0KTS @ 17 Jun 2010 16:07:10 GMT</title>
<description>Time: 17 Jun 2010 16:07:10 GMT Speed: 0KTS
<a href="http://photos.marinetraffic.com/ais/shipdetails.aspx?MMSI=235072115">MarineTraffic entry for 235072115</a> <!-- or some other site to show vessel details -->
</description>
<updated>2010-06-17T16:07:10Z</updated>
<link href="http://gomex.erma.noaa.gov/erma.html#x=-89.1005&y=28.0287&z=11&layers=3930+497+3392"/>
<!-- Proposed new section -->
<data>
<mmsi>235072115</mmsi>
<name>HOS IRON HORSE</name>

<!-- Enumerated lookup data types -->
<type_and_cargo value="244">INVALID</type_and_cargo>
<nav_status value="3">Restricted Maneuverability</nav_status>

<!-- Values with units -->
<cog units="deg true">287</cog>
<sog units="knots">0</sog>
<!-- Add more data fields here -->
</data>

<!-- The meat of GeoRSS -->
<georss:where>
<gml:Point>
<gml:pos>28.0287 -89.1005</gml:pos>
</gml:Point>
</georss:where>
</item>

<!-- more items -->
</channel>
</rss>
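A sketch of how a consumer might read the proposed data block: every child element is treated as a sensor field, so newly added fields are picked up automatically instead of being ignored. The element names and layout follow the example above; this is a proposal, not an existing standard.

```python
import xml.etree.ElementTree as ET

# A cut-down item using the proposed <data> block from the example above.
ITEM = """
<item>
  <title>HOS IRON HORSE</title>
  <data>
    <mmsi>235072115</mmsi>
    <nav_status value="3">Restricted Maneuverability</nav_status>
    <sog units="knots">0</sog>
  </data>
</item>
"""

def read_data_block(item_xml):
    item = ET.fromstring(item_xml)
    fields = {}
    for child in item.find("data"):
        # Keep the text plus any attributes (units, raw enumeration value).
        fields[child.tag] = dict(child.attrib, text=child.text)
    return fields

print(read_data_block(ITEM))
```

The key point is that the loop never needs to know the field names in advance: add a new child to data and it simply shows up in the result.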

Perhaps it would be better, inside of data, for each data item to use a single generic tag with a name attribute:
<data_value name="sog" long_name="Speed Over Ground" units="knots">12.1</data_value>
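Reading that attribute-based form is just as easy; a quick sketch using the example line above (the attribute names are part of the proposal, not any standard):

```python
import xml.etree.ElementTree as ET

# Parse the single generic data_value tag; name/long_name/units are
# self-describing attributes from the proposed form above.
el = ET.fromstring(
    '<data_value name="sog" long_name="Speed Over Ground" units="knots">'
    '12.1</data_value>')
print(el.get("name"), float(el.text), el.get("units"))  # → sog 12.1 knots
```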

Or we could just embed JSON within a data tag... but that would be mixing apples and oranges. If we start using JSON, the entire response should be GeoJSON.

For reference, here is what one of those blue track tails looks like in the GeoRSS:
<item>
<title>SLDMB 43166 Track</title>
<georss:where>
<gml:LineString>
<gml:posList>
27.241828 -84.663689
27.243782 -84.664666
27.245442 -84.66574
27.246978 -84.666779
27.248248 -84.668049
27.250104 -84.669318
27.251699 -84.670985
27.253158 -84.673045
27.254232 -84.6749
27.255209 -84.676561
27.256076 -84.678023
</gml:posList>
</gml:LineString>
</georss:where>
</item>
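Pulling the track back out of the gml:posList is straightforward; a sketch that splits the flat "lat lon lat lon ..." text into coordinate pairs (the namespace URI is the standard GML one used in the feed):

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"

# A shortened version of the track entry above.
track_xml = """
<georss:where xmlns:georss="http://www.georss.org/georss"
              xmlns:gml="http://www.opengis.net/gml">
  <gml:LineString>
    <gml:posList>
      27.241828 -84.663689
      27.243782 -84.664666
    </gml:posList>
  </gml:LineString>
</georss:where>
"""

def pos_list(where_xml):
    root = ET.fromstring(where_xml)
    text = root.find(".//{%s}posList" % GML).text
    nums = [float(v) for v in text.split()]
    # GeoRSS/GML order is lat lon; pair up alternating values.
    return list(zip(nums[0::2], nums[1::2]))

print(pos_list(track_xml))  # → [(27.241828, -84.663689), (27.243782, -84.664666)]
```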

There was also an assertion that KML would be better. However, I would argue against KML as a primary data transport mechanism. KML is a presentation description, not a data encoding format. It's much like when I ask someone for an image and they take a perfectly good JPG, TIF, or PNG and put it in a PowerPoint... sending me only the PowerPoint. That can be useful, especially if there are annotations they want on a final version, but I must have the original image or a lot is lost.

Looking at the KML, you can see that we have all the same problems with data being delivered by GeoRSS.
<Placemark>
<name>38681</name>
<description>Sample Date: 2010-07-02 18:00:00.0 Lat: 30.247965 Lon: -87.690424 SST: 35.8C</description>
<styleUrl>#msn_circle</styleUrl>
<Point>
<coordinates>-87.690424,30.247965,0</coordinates>
</Point>
</Placemark>




It's too bad that in Google Earth, you can't put a GeoRSS in a Network Link.

In case you are curious, wikipedia has an entry on SLDMBs: Self-Locating Datum Marker Buoy

Friday, June 11, 2010

Planning an AUV mission on the lake

The Gavia is about to be driven across the lake to the test site.

Monday, June 7, 2010

Friday, June 4, 2010

Which c++ logging and unit test framework(s)?

Please comment!

Back a few years ago, I wrote slogcxx as an experiment in creating an easy-to-use logging system. It spent a little under a year as the logging system for GeoZui. I haven't touched the code in a long time and I am sure it can be done better. So, I have two questions that were triggered by a discussion with BRC: what packages would you suggest for C++ logging, and what for unit testing? I did not include anything real in libais and would really like to work towards the best possible infrastructure to make it a long-lived package. Some basic thoughts on requirements:
  • Must not be GPL, but LGPL is okay. It has to play nicely with other licenses

  • Be open source

  • Must be thread safe - a unit test may fire off a bunch of threads for a test

  • The logger must be able to handle multiple producers and consumers that are both in the same process and external

  • Likely to stick around for a few years

  • Relatively easy to use

  • Cross platform to Windows, Mac, and Linux

  • If not already available in fink, rpm, and deb form, then easily packaged

  • Have a safe for work project name... ahem

Just looking on Freshmeat, there are a lot of projects out there, and slogcxx is still in the pile: C++ logging and C++ unit testing. The usual thought is to look at Boost (e.g. Boost Logging Lib v2 and Boost test) because of the incredible peer review, but Boost is a pretty heavy dependency.

There are just so many options. Building a large system on the wrong one will lead to serious pain, but using a good one will encourage better software.

I dream of being able to subscribe to an AMQP feed for a cluster and a user interface tearing through massive data crunching, all while knowing that the system passed a slew of unit tests that give us confidence that new code is less likely to pull the system down...

Another source of info: Wikipedia List of unit testing frameworks. There doesn't seem to be an equivalent page for loggers, but there is a little in the Java lib4j ports section.

P.S. I'm really starting to think that GNU autoconf sucks. It may be the standard, but a configure system is supposed to help, not break my brain. It shouldn't require massive training and experimentation to get it into place. I really should go look at cmake and scons again.

Wednesday, June 2, 2010

BAG file naming convention

Originally posted here: NOAA needs a public software interface specification for multibeam data. Reposted here for discussion.

Next week, I will be talking in Norfolk about technology transfer between NASA and NOAA. One thing is clear: we need a file naming convention for published data files that come out of NOAA. These things are not fun, but they make or break the long-term usability of data. When I join a NASA mission, the first few times I try to use these filenames it is a bit confusing, but once you look up the codes and start to use them, you realize the power of well-controlled file names. The naming convention for NOAA BAGs is great in that it includes a survey name, but the rest of the filename is basically free-form. It is done differently from file to file, and there is no reference in the metadata to the standard applied. To kick off this idea, let us first think about just BAGs. What should be in the filename and how can we code it?

Take a look through the Mars Exploration Rover (MER) Software Interface Specification (SIS): MER_camsis_v4_9-25-07.pdf [JPL]

First, take a look at the Data Processing Level. With multibeam and lidar data, we have similar kinds of concepts, but the data usually starts off being collected with an IMU providing GPS-derived positioning (not something we have on Mars at the moment).



Now for the part that I really want you to take a look at: Section 4.4, File Naming, P22. The file names are broken into a whole bunch of fields:
<scid><inst><sclk><prod><site><pos><seq><eye><filt><who><ver><ext>

Each of these fields is defined in the document. For a BAG instrument type, I could see codes something like this:

  • S = Single beam sonar

  • M = Multibeam

  • l = topo lidar

  • L = bathy lidar

  • G = predicted from Gravity

  • c = combined sensors

  • C = lead line or similar from a Cable

Then when I had a BAG that was from a bathy lidar, I would not be surprised when I opened it and it did not look like multibeam sonar data. There would be no need to look at the metadata (which would not have told me) or the non-machine-readable Descriptive Reports.
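As a sketch of how such a convention pays off in software, here is a decoder for a made-up filename layout using the instrument codes above. Both the layout (survey, instrument code, version) and the example filename are purely illustrative, not a real NOAA convention:

```python
# Hypothetical BAG filename convention, loosely following the MER SIS idea
# of fixed, coded fields. Layout and codes are illustrative only.
INSTRUMENT_CODES = {
    "S": "single beam sonar",
    "M": "multibeam",
    "l": "topo lidar",
    "L": "bathy lidar",
    "G": "predicted from gravity",
    "c": "combined sensors",
    "C": "lead line or similar from a cable",
}

def decode_bag_name(filename):
    # Assumed layout: <survey>_<instrument><version>.bag, e.g. "H11703_M1.bag"
    stem = filename.rsplit(".", 1)[0]
    survey, code = stem.split("_")
    return {"survey": survey,
            "instrument": INSTRUMENT_CODES[code[0]],
            "version": code[1:]}

print(decode_bag_name("H11703_M1.bag"))
# → {'survey': 'H11703', 'instrument': 'multibeam', 'version': '1'}
```

With a scheme like this, a reader learns what is in the file before opening it, with no trip to the metadata.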



The document has some examples of decoding image file names:



These kinds of conventions are really powerful, but they do take a while to create and debug. I've spent hundreds of hours working on spacecraft imaging teams making sure that everything in the SIS document is correct.