Friday, December 24, 2010

The best possible python file reading example?

I need help from the community. I've been working on a chapter that walks through the process of creating a python driver for reading a fairly simple binary format: Smoothed Best Estimate of Trajectory (SBET).

Python - parsing binary data files

My main constraint is that I am requiring Python 2.7.x. I should really make the reader work with Python 3.x as well, but I'm not sure how to do that.

But... I am now at a point where I want to create the complete and final version. I can then backfill the second half of the chapter. The question is: what do you all think should be in the end code, and how does what I have so far stack up? I've just taken a stab at a __geo_interface__ and I've used Shapely. It can now do GeoJSON. I want to provide simple SQLite, CSV, and KML writing.

If you are into python and up for giving the chapter and code a look, I could really use some feedback!

Things on the todo list... should I do any of these or not, and if I should implement them, are there any strong opinions on how, or best examples?

  • How to produce a good python package with a nice setup.py using distribute

  • documentation - pydoc and/or sphinx (it seems that epydoc is out of favor)

  • unit and doc testing

  • optparse command line interface

  • using sqlite3 to make a simple database

  • allowing the iterator to handle every nth sample or have a minimum distance between returned datagrams (see the sketch after this list)

  • kml output that has some flexibility

  • Possibly examples using sqlalchemy and/or sqlobject ORMs to access a database
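
For the every-nth-sample item above, here is a minimal sketch of one possible approach, using the SbetFile class from the code later in this post. The step and min_distance_deg parameters are hypothetical, not part of the existing code:

# Sketch only: a decimating wrapper around SbetFile iteration.
import math

def decimate(sbet, step=1, min_distance_deg=None):
    'Yield every step-th datagram, optionally enforcing a minimum spacing.'
    last = None
    for index, dg in enumerate(sbet):
        if index % step != 0:
            continue
        if min_distance_deg is not None and last is not None:
            # Crude flat-earth separation in degrees; good enough for thinning a track.
            if math.hypot(dg.lon_deg - last.lon_deg, dg.lat_deg - last.lat_deg) < min_distance_deg:
                continue
        last = dg
        yield dg

# for dg in decimate(SbetFile('sample.sbet'), step=10):
#     print dg.time, dg.lat_deg, dg.lon_deg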


Should I include templated ISO XML metadata creation? If yes, which templating engine would be best? I can use Python 2.7's .format or one of the many other templating engines, but which one?
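
As a point of comparison, the plain str.format option would look something like the sketch below. The element names and values here are made-up placeholders, not a real ISO 19115 record:

# Sketch only: filling an XML metadata fragment with str.format.
# The elements and values below are placeholders for illustration.
template = '''<metadata>
  <fileIdentifier>{filename}</fileIdentifier>
  <westBoundLongitude>{lon_min}</westBoundLongitude>
  <eastBoundLongitude>{lon_max}</eastBoundLongitude>
</metadata>'''

print template.format(filename='sample.sbet', lon_min=-70.9, lon_max=-70.7)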

The code I have so far:
#!/usr/bin/env python

import math
import struct
import os
import mmap

import shapely.geometry
import geojson

datagram_size = 136

field_names = ('time', 'latitude', 'longitude', 'altitude',
               'x_vel', 'y_vel', 'z_vel',
               'roll', 'pitch', 'platform_heading', 'wander_angle',
               'x_acceleration', 'y_acceleration', 'z_acceleration',
               'x_angular_rate', 'y_angular_rate', 'z_angular',
               'lon_deg', 'lat_deg')

class SbetDatagram(object):
    def __init__(self, data, offset=0):
        'Unpack the values from a datagram'
        values = struct.unpack('17d', data[offset:offset + datagram_size])
        sbet_values = dict(zip(field_names, values))

        # The file stores latitude and longitude in radians; add degree versions.
        sbet_values['lat_deg'] = math.degrees(sbet_values['latitude'])
        sbet_values['lon_deg'] = math.degrees(sbet_values['longitude'])

        self.__dict__.update(sbet_values)

    @property
    def geom(self):
        return shapely.geometry.Point(self.lon_deg, self.lat_deg)

    @property
    def __geo_interface__(self):
        'Provide a Geo Interface for GeoJSON serialization'
        return {'type': 'Point', 'coordinates': (self.lon_deg, self.lat_deg)}

class SbetFile(object):
    def __init__(self, filename, use_mmap=True):
        self.filename = filename
        sbet_file = open(filename, 'rb')

        if use_mmap:
            sbet_size = os.path.getsize(filename)
            self.data = mmap.mmap(sbet_file.fileno(), sbet_size, access=mmap.ACCESS_READ)
        else:
            self.data = sbet_file.read()

        # Make sure the file is sane: it must hold a whole number of datagrams.
        assert len(self.data) % datagram_size == 0

        self.num_datagrams = len(self.data) / datagram_size

    def get_offset(self, datagram_index):
        return datagram_index * datagram_size

    def get_datagram(self, datagram_index):
        offset = self.get_offset(datagram_index)
        return SbetDatagram(self.data, offset)

    def __repr__(self):
        # http://docs.python.org/reference/datamodel.html#object.__repr__
        return 'SbetFile(' + repr(self.filename) + ')'

    def __unicode__(self):
        return unicode(self.__str__())

    def __str__(self):
        # http://docs.python.org/reference/datamodel.html#object.__str__
        return 'SBet: ' + self.filename

    @property
    def metadata(self):
        '''Summary of an SBet - give min/max for all parameters.
        May be slow the first time it is requested.'''
        if '_metadata_' not in self.__dict__:
            # Compute and cache the metadata
            m = {}
            for name in field_names:
                m[name + '_min'] = None
                m[name + '_max'] = None
            for dg in self:
                for name in field_names:
                    value = dg.__dict__[name]
                    if m[name + '_min'] is None or value < m[name + '_min']:
                        m[name + '_min'] = value
                    if m[name + '_max'] is None or value > m[name + '_max']:
                        m[name + '_max'] = value

            self._metadata_ = m
        return self._metadata_

    def geom(self):
        m = self.metadata
        return shapely.geometry.Polygon([
            (m['lon_deg_min'], m['lat_deg_min']),
            (m['lon_deg_min'], m['lat_deg_max']),
            (m['lon_deg_max'], m['lat_deg_max']),
            (m['lon_deg_max'], m['lat_deg_min']),
            ])

    @property
    def __geo_interface__(self):
        # http://geojson.org/geojson-spec.html#bounding-boxes
        m = self.metadata
        r = {"type": "Feature",
             "id": self.filename,
             "bbox": [m['lon_deg_min'], m['lat_deg_min'], m['lon_deg_max'], m['lat_deg_max']],
             "geometry": {
                 "type": "Polygon",
                 "coordinates": list(self.geom().boundary.coords)
                 }
             }
        return r

    def __iter__(self):
        return SbetIterator(self)

class SbetIterator(object):
    'Independent iterator class for Sbet files'
    def __init__(self, sbet):
        self.sbet = sbet
        self.iter_position = 0

    def __iter__(self):
        return self

    def next(self):
        if self.iter_position >= self.sbet.num_datagrams:
            raise StopIteration

        values = self.sbet.get_datagram(self.iter_position)
        self.iter_position += 1
        return values

def main():
    sbet = SbetFile('sample.sbet')

    geodata = geojson.dumps(sbet)
    print 'geodata:', geodata
    print
    decoded = geojson.loads(geodata)
    print 'decoded:', decoded

if __name__ == '__main__':
    main()
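
As one possible direction for the CSV writing mentioned above, here is a minimal sketch that reuses the iteration support. The column selection is just an example:

# Sketch only: dump selected fields of every datagram to a CSV file.
import csv

def write_csv(sbet, out_filename, fields=('time', 'lat_deg', 'lon_deg', 'altitude')):
    out = open(out_filename, 'wb')
    writer = csv.writer(out)
    writer.writerow(fields)
    for dg in sbet:
        writer.writerow([dg.__dict__[name] for name in fields])
    out.close()

# write_csv(SbetFile('sample.sbet'), 'sample.csv')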

Wednesday, December 1, 2010

Which personal wiki / note taking system?

As you can probably guess if you read my blog, I mostly use emacs org-mode for my personal note taking and wiki needs. However, that isn't always the best way for a particular project, and not everyone is comfortable with emacs. At CCOM, we have a general mediawiki instance for the whole group (70+ people) and a trac instance for coding related wiki'ing, bugs, and svn/mercurial monitoring. At NASA, I've been around a lot of twiki, and elsewhere I've seen people using MoinMoin. I didn't like twiki, but I saw Kanef do some amazing auto-updating calendaring in it with some javascript magic that he whipped up. Monica was using Wiki On A Stick (WOAS), but version 0.12.0 has been super buggy. There is also TiddlyWiki and commercial software (what was it that J. Dingler used?).

Wikipedia: Comparison of notetaking software and Comparison of wiki software. Note: neither one has WOAS. There is Personal wiki, which is more the topic at hand.

So... what do people use and why? Things like Trac and such can be
backed up, but I can see cross machine deployments being trouble.

Friday, November 12, 2010

Fortran 2008 demo / template file?

Does anybody have suggestions or pointers for a demo Fortran 2008 file or files? I'd like to find a couple that do a good job of simply demonstrating excellent Fortran 2008 coding style. I'm looking for things like properly defining functions outside of modules, simple modules, pointers, handling command line arguments, and parsing data files. What should people do if they want a linked list, hash/dictionary, deque (deck), or set? Are there implementations of these in the Fortran standard library?

I gave a somewhat pathetic attempt at doing this...

First impressions of Fortran 2008

Wednesday, September 1, 2010

Open Source and GSF - the "Generic Sensor Format" for multibeam sonars

I am repeating myself... see Generic Sensor Format (GSF) Meeting (Sept 2008).

What makes open source software successful is a community that contributes back to the code base to make it better. Val is making a huge step towards that for the multibeam Generic Sensor Format by working on a sidescan addition to the format and posting about how to use GSF: A GSF Primer. Val even called for a code review. Yesterday, 5 of us sat down with Val and the code to give it a look. Many eyes for review is a great thing (unlike design by committee, which typically makes everyone equally unhappy).

That said, I worry about people using GSF as an archive or interchange format for multibeam sonar data right now. Here are some of the issues, some of which can be fixed and others that are intrinsic to the design. There needs to be open discussion and I argue that the original data (pre GSF) and the code that generated that data need to be archived.

First, the name implies that it is a "generic" format, but one look into the code at the gsfSensorSpecific struct should put that notion to rest. There is a huge amount of information in a GSF file that is not generic. For some vendors, there are even multiple model-specific structures (I count 7 for Simrad/EM/Kongsberg). This comes from the rapid evolution of sonars since GSF was first started in the early 1990's (I see a first date of 1994). If we really do want to have a generic sonar format, I think we need to design a series of new messages that cover the basic multibeam and backscatter data returned, such that we don't need these vendor specific payloads. Compare with the MGD77 format, which needs no special cases: it is just x, y, z, depth, gravity, and magnetic field. The 77 means 1977. This format has been around for a long time.

The next thing that is needed is a major code overhaul. This would entail distributing a better build system (maybe CMake) that builds proper libraries for all the major architectures. As a part of this, GSF needs a series of unit tests that take a very wide range of sample multibeam files, convert them to GSF, read them back in, and verify that the resulting GSF files make sense. Even simpler, we need code that exercises all of the GSF code base without needing large input files. This unit test suite needs to be public, and the non-input-file based tests should be a part of the standard build process - aka unit testing. These unit tests also serve a second purpose: providing documentation for how the library should be used. To go along with this code base update, the library should be put into the major Linux distributions as a standard package. This will mean that the library can't be called "libgsf", as that conflicts with the libgsf that is the GNOME Structured File library. GNOME is going to trump the sonar library for Ubuntu, Debian, RedHat, etc.

The next code update would be to have functions that can do very basic validation of every structure that is passed around the GSF library. Client code can then call these to verify that they are, at least at a basic level, passing in data that makes sense. There is still tons of room for errors, but if roll is +/- 180 degrees, we should not pass in a roll of 720 degrees. NOTE: roll right now is +/- 90 degrees, which will cause trouble for vehicles under overhangs (e.g. under ice, under ships, or in caves). The no data value for roll is 99.0. That is going to be a problem. I guess we can have rolls that go from -270 to +90 to get around this.

We also need to look at the performance of GSF. An mmap implementation of GSF would likely be much faster. What else can be done to speed up the code? We should discuss the idea of a standard SQLite companion file to go along with GSF and other multibeam log files, similar to what MB-System does. If it contains the basic metadata and possibly a prebuilt index of packets, anything beyond the first pass over a GSF file will go much faster. For example, pulling the navigation and sound velocity profiles (SVP) from the tail of the prior log file would be faster if the file were already indexed in a standard way.
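
A minimal sketch of what such a companion index might look like follows. The table layout and field names are purely illustrative, not an existing format:

# Sketch only: a hypothetical SQLite sidecar index for a sonar log file.
import sqlite3

def build_index(index_filename, packets):
    '''packets is an iterable of (offset, packet_type, timestamp) tuples
    produced by one sequential pass through the log file.'''
    cx = sqlite3.connect(index_filename)
    cx.execute('''CREATE TABLE IF NOT EXISTS packet_index
                  (offset INTEGER, packet_type TEXT, timestamp REAL)''')
    cx.executemany('INSERT INTO packet_index VALUES (?, ?, ?)', packets)
    cx.commit()
    return cx

# Later passes can then jump straight to, for example, the SVP packets:
# cx.execute("SELECT offset FROM packet_index WHERE packet_type = 'SVP'")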

A final step of the code update would be to turn on all the major compiler warnings and fix them. At least -Wall for gcc should return no warnings. There appear to be headers that should be included, and there is a lot of pointer magic that works right now but should be better documented so compilers can check any code changes. Also, the readers and writers should probably be switched to a set of type-checked inline functions that wrap the byte swapping and memcpy packing. Is the code totally 32- and 64-bit safe for all execution paths?

A very useful addition would be to package native reader/writer interfaces for the common languages used by people who process this kind of data. This means having Perl, Python, and Matlab interfaces. These should be a part of GSF and distributed alongside it. I know many people who have written their own interfaces to GSF and, while creating one is instructional, at least one for each major language should be included in the distribution.

Finally, the documentation that goes with GSF needs to be updated. I have heard from several people who have written GSF code that the documentation is not enough to write a working encoder/decoder. Missing from the specification document are a lot of the motivations behind these packets. SAIC has put 16 years of hard work into GSF and learned a lot of lessons that can benefit the whole sonar community. We need to capture this.

It is super important to note that SAIC is only able to work on GSF based on its contracts with the US Navy. Either someone needs to pay SAIC to do some of this work or we, as a community, need to get cracking on this if GSF is going to have staying power. The same goes for MB-System and other critical software. The lead authors are up to their eyeballs in work. This is a plea for the community to jump in. I try to contribute back as much as possible, but am maxed out. Find any niche and pitch in. It doesn't matter if your contribution is large or small. Code, documentation, translating documentation to other languages, quality bug reports, building test cases, even just learning the tools and how they work... it's all important.

Sunday, August 29, 2010

Which clock should I get?

A seriously geeky clock:



We need to get a new clock for the table across the room in the bedroom. It would be best if it was readable at night without my glasses (my vision isn't that bad, but a lighted display is essential) and if it reset itself using radio broadcasts (referred to as a radio-controlled clock: http://en.wikipedia.org/wiki/Radio_clock), as we have so many power outages. More features would be nice (weather, AM/FM/Scanner, etc), but I don't want a cluttered display. What do people recommend?

NIST has a list of manufacturers of time and frequency receivers.

I'd like to spend less than $100.

Thursday, August 12, 2010

NSFastEnumeration success?

I am definitely a beginner when it comes to Objective-C! Any help would be hugely appreciated.

This code works for me, but I really feel like it will blow up on me in the future. For example, what if someone drains the autorelease pool in the middle of a for loop? Also, what the heck is the difference between itemsPtr and stackbuf? The documentation for NSFastEnumeration is really weak and my code is not behaving as described:
stackbuf
A C array of objects over which the sender is to iterate.
itemsPtr
A C array of objects

That's not very helpful. I'm using only itemsPtr and it works. What exactly am I supposed to do with stackbuf and how should I handle memory allocation/deallocation of stackbuf and itemsPtr??? I read this discussion on Implementing NSFastEnumeration and feel even less confident that I have any idea what is going on.

The code is online here: simple-fast-enum2.m. I know this is a lot of code, but hang in there with me.
#import <Foundation/Foundation.h>
#include <assert.h>

@interface Datagram : NSObject
{
    int dgId;
}
-(id)initWithDatagramType:(int)datagramType;
-(void)dealloc;
-(NSString *)description;
@property (readonly) int dgId;
@end

@implementation Datagram
@synthesize dgId;
- (NSString *)description {
    return [NSString stringWithFormat: @"Datagram: dgId: %d", dgId];
}

-(id)initWithDatagramType:(int)datagramType {
    self = [super init];
    if (!self) return self;
    dgId = datagramType;
    return self;
}

-(void)dealloc {
    NSLog(@"dealloc datagram: %d", dgId);
    [super dealloc];
}
@end

// Pretend sequence of packet ID's coming from a sonar
int testSeq[] = {
    3, 12, 4, 19, 8,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 9,
    2, 2, 2, 2, 9,
    2, 2, 2, 2, 9,
    1, 2, 3, 4, 5, 6, 7, 8, 9,
    11, 12, 13, 14, 15, 16, 17, 18, 19,
    3,
    0 // End of sequence / array sentinel
};

@interface DatagramFile : NSObject <NSFastEnumeration>
{
    // No ivars
}
-(id)init;
@end

@implementation DatagramFile
-(id)init {
    self = [super init];
    if (!self) return self;
    // NOP
    return self;
}

- (NSUInteger)countByEnumeratingWithState:(NSFastEnumerationState *)state objects:(id *)stackbuf count:(NSUInteger)len
{
    NSLog(@"In countByEnumeratingWithState: stackbuf: %p, count: %d", stackbuf, len);
    NSLog(@"\t state struct: state=%d %p %p", state->state, state->itemsPtr, state->mutationsPtr);
    if (stackbuf) {
        NSLog(@"***INSPECTING STACKBUF\n");
        for (int i = 0; i < 1000 && stackbuf[i] != 0; i++) {
            NSLog(@"Stackbuf %d: %p", i, stackbuf[i]); // What should I do with stackbuf[i]?
        }
    }
    if (0 == state->state) {
        NSLog(@"Initializing loop");
        assert(0 == state->itemsPtr);
        state->itemsPtr = malloc(sizeof(id)*16);
        memset(state->itemsPtr, 0, sizeof(id)*16);
    } else if (0 == len) {
        // Will this get called if the caller uses break inside the for loop?
        NSLog(@"Finished loop. cleanup");
        free(state->itemsPtr);
        state->itemsPtr = 0;
        return 0;
    }
    state->mutationsPtr = (unsigned long *)self; // Tell the caller that the file has not changed

    NSUInteger count = 0;
    for (; count < len && testSeq[state->state] != 0; count++, state->state++) {
        NSLog(@"Creating datagram of type %d state: %d count %d", testSeq[state->state], state->state, count);
        Datagram *dg = [[Datagram alloc] initWithDatagramType:testSeq[state->state]];
        state->itemsPtr[count] = dg;
        [dg autorelease];
    }
    NSLog(@"countByEnumeratingWithState read %d datagrams. state->state: %d", count, state->state);
    return count;
}
@end // implementation DatagramFile

int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    DatagramFile *df = [[DatagramFile alloc] init];
    for (Datagram *dg in df) {
        NSLog(@"About to read datagram in for");
        NSLog(@" Datagram type: %d", [dg dgId]);
    }

    NSLog(@"about to drain pool");
    [pool drain];
    NSLog(@"pool drained. ready for winter");
    return 0;
}

Here is the output of the program showing that, at least in this case, it works.

2010-08-12 07:56:25.213 simple-fast-enum2[22266:a0f] In countByEnumeratingWithState: stackbuf: 0x7fff5fbfd930, count: 16
2010-08-12 07:56:25.215 simple-fast-enum2[22266:a0f] state struct: state=0 0x0 0x0
2010-08-12 07:56:25.216 simple-fast-enum2[22266:a0f] ***INSPECTING STACKBUF
2010-08-12 07:56:25.216 simple-fast-enum2[22266:a0f] Stackbuf 0: 0x7fff5fc005f8
2010-08-12 07:56:25.217 simple-fast-enum2[22266:a0f] Stackbuf 1: 0x7
2010-08-12 07:56:25.217 simple-fast-enum2[22266:a0f] Stackbuf 2: 0x7fff5fc3c028
2010-08-12 07:56:25.218 simple-fast-enum2[22266:a0f] Stackbuf 3: 0x7fff5fbfda40
2010-08-12 07:56:25.218 simple-fast-enum2[22266:a0f] Stackbuf 4: 0x7fff5fbfda00
2010-08-12 07:56:25.219 simple-fast-enum2[22266:a0f] Stackbuf 5: 0x7fff5fc016d2
2010-08-12 07:56:25.219 simple-fast-enum2[22266:a0f] Initializing loop
2010-08-12 07:56:25.220 simple-fast-enum2[22266:a0f] Creating datagram of type 3 state: 0 count 0
2010-08-12 07:56:25.220 simple-fast-enum2[22266:a0f] Creating datagram of type 12 state: 1 count 1
[snip - parsing data and adding it to itemsPtr]
2010-08-12 07:56:25.227 simple-fast-enum2[22266:a0f] Creating datagram of type 9 state: 14 count 14
2010-08-12 07:56:25.227 simple-fast-enum2[22266:a0f] Creating datagram of type 2 state: 15 count 15
2010-08-12 07:56:25.228 simple-fast-enum2[22266:a0f] countByEnumeratingWithState read 16 datagrams. state->state: 16
2010-08-12 07:56:25.228 simple-fast-enum2[22266:a0f] About to read datagram in for
2010-08-12 07:56:25.229 simple-fast-enum2[22266:a0f] Datagram type: 3
[snip - inside for loop using datagrams]
2010-08-12 07:56:25.243 simple-fast-enum2[22266:a0f] About to read datagram in for
2010-08-12 07:56:25.244 simple-fast-enum2[22266:a0f] Datagram type: 2

[ snip - 16 blocks handled]

[ snip - 12 blocks handled ... finished the list of datagrams]

Last block...

2010-08-12 07:56:25.314 simple-fast-enum2[22266:a0f] In countByEnumeratingWithState: stackbuf: 0x7fff5fbfd930, count: 16
2010-08-12 07:56:25.314 simple-fast-enum2[22266:a0f] state struct: state=44 0x1001100a0 0x10010c760
2010-08-12 07:56:25.315 simple-fast-enum2[22266:a0f] ***INSPECTING STACKBUF
2010-08-12 07:56:25.315 simple-fast-enum2[22266:a0f] Stackbuf 0: 0x7fff5fc005f8
2010-08-12 07:56:25.316 simple-fast-enum2[22266:a0f] Stackbuf 1: 0x7
2010-08-12 07:56:25.316 simple-fast-enum2[22266:a0f] Stackbuf 2: 0x7fff5fc3c028
2010-08-12 07:56:25.317 simple-fast-enum2[22266:a0f] Stackbuf 3: 0x7fff5fbfda40
2010-08-12 07:56:25.317 simple-fast-enum2[22266:a0f] Stackbuf 4: 0x7fff5fbfda00
2010-08-12 07:56:25.318 simple-fast-enum2[22266:a0f] Stackbuf 5: 0x7fff5fc016d2
2010-08-12 07:56:25.318 simple-fast-enum2[22266:a0f] countByEnumeratingWithState read 0 datagrams. state->state: 44

2010-08-12 07:56:25.319 simple-fast-enum2[22266:a0f] about to drain pool
2010-08-12 07:56:25.319 simple-fast-enum2[22266:a0f] dealloc datagram: 3
2010-08-12 07:56:25.320 simple-fast-enum2[22266:a0f] dealloc datagram: 19
2010-08-12 07:56:25.320 simple-fast-enum2[22266:a0f] dealloc datagram: 18
2010-08-12 07:56:25.321 simple-fast-enum2[22266:a0f] dealloc datagram: 17
[ snip ]
2010-08-12 07:56:25.340 simple-fast-enum2[22266:a0f] dealloc datagram: 12
2010-08-12 07:56:25.341 simple-fast-enum2[22266:a0f] dealloc datagram: 3
2010-08-12 07:56:25.341 simple-fast-enum2[22266:a0f] pool drained. ready for winter

This shows that the code runs all the way through and deallocs all of the datagrams when the pool is drained. However, I'm leaking my itemsPtr array, since "Finished loop" never appears in the output. I need to handle the case where the loop that fills itemsPtr adds no items and free itemsPtr there.

So... help! Is this right? How do I make it better? What should I be doing with stackbuf? How do I not get in trouble with a break?

Wednesday, August 4, 2010

A cleaner way to have an NSMutableArray of counters?

How do I make this clean? I know how to make this really simple with a C array or a C++ vector, but I'm trying to learn ObjC! Any help greatly appreciated.

NSMutableArray *count_id = [NSMutableArray new];
// Can this be done better for an initialization?
for (int i = 0; i < 256; i++) {
    [count_id addObject:[NSNumber numberWithInt:0]];
}

NSData *data = [NSData dataWithContentsOfFile:fileName];

for (size_t dgStart = 0; dgStart < data.length; ) {
    unsigned char id;
    unsigned int size;
    [data getBytes:&size range:NSMakeRange(dgStart, 4)];
    [data getBytes:&id range:NSMakeRange(dgStart+5, 1)];

    //
    // Yuck! I just want to increment the number by 1. How do I improve this?
    //
    [count_id replaceObjectAtIndex:id
                        withObject:[NSNumber numberWithInt:[[count_id objectAtIndex:id] integerValue] + 1]];

    // Jump to the next packet
    dgStart += size + 4;
}

Tech and science podcasts?

I have been listening to a lot of FLOSS podcasts lately. What suggestions do you all have for great Mac, Linux, spatial, and science podcasts?

Monday, July 5, 2010

More thoughts on sensor data in GeoRSS

We have started working with GeoRSS feeds and my frustration with GeoRSS continues. I posted back in March asking for thoughts on GeoRSS: Sensor data in GeoRSS?. I got some helpful thoughts from Sean Gillies, but have not heard from any others.

Here is an example of my USCGC Healy feed of science data that is only in HTML table form within the description:

Healy Science feed in Google Maps



It's got some really nice features that include:

  • Being pretty simple (when compared to SensorML and SOS)

  • You can just throw a URL in Google Maps and see the data

  • It works inside normal RSS readers (e.g. my USCGC Healy Feeds)

  • It's a simple file that you can copy from machine to machine or email.

  • It should be easy to parse

  • Points and lines work great


The trouble comes with:

  • Validation is a pain, but appears to be getting better

  • What should you put in the title and description?

  • It seems you have to create separate item tags for points and lines for things like ships

  • There is no standard for machine readable content other than the location


Here are my thoughts on the last point. First, a typical entry that has info in the title and description:
<item>
  <title>HOS IRON HORSE - 235072115: 0KTS @ 17 Jun 2010 16:07:10 GMT</title>
  <description>Time: 17 Jun 2010 16:07:10 GMT Speed: 0</description>
  <pubDate>17 Jun 2010 16:07:10 GMT</pubDate>
  <georss:where>
    <gml:Point>
      <gml:pos>28.0287 -89.1005</gml:pos>
    </gml:Point>
  </georss:where>
</item>

This makes for easy human reading of data and works with Google Maps and OpenLayers.

This is how the AIS feed looks in Google Maps... mostly enough to see what is going on. If you know the name of a vessel, you can find it on the left side list of ships. That's a great start, but it is hard to take that info into a database. If the fields change at all in the title or description, all parsers that use that feed MUST be rewritten.



This buoy feed shows the opposite approach of putting all the information into fields. It's also missing pubDate and such, but we should focus on the SeaSurfaceTemp and SampleDate. These are more machine readable. The XML and an image:
<item>
  <title>SLDMB 43166</title>
  <description>Most recent posrep from 43166</description>
  <SeaSurfaceTemp>30C</SeaSurfaceTemp>
  <SampleDate>2010-06-25 11:00:00.0Z</SampleDate>
  <georss:where>
    <gml:Point>
      <gml:pos>27.241828, -84.663689</gml:pos>
    </gml:Point>
  </georss:where>
</item>




There are some problems in the sample above. First, the description doesn't contain a human readable version. This causes the Google Maps display to give us nothing more than the current position of this "buoy" (and a recent track history that comes as a separate entry). Ouch. That's hard to preview. Second, the machine readable portion is fine, but I can't write anything that can discover additional data fields if they are added. If someone adds <foo>1234</foo>, is that field part of something else or is it more sensor data that I should be tracking? A namespace for sensor data would help: I could then pick off all of the fields that are in the "SimpleSensorData" namespace. Namespaces are good, but they are also a pain. I would prefer a data block, where everything inside is a data field. It would also be good to separate units from the values. Here is how it might look:
<rss xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml" version="2.0">
  <channel>
    <title>NOAA GeoRSS Feed</title>
    <description>Vessel Tracks</description>

    <item>
      <title>HOS IRON HORSE - 235072115: 0KTS @ 17 Jun 2010 16:07:10 GMT</title>
      <description>Time: 17 Jun 2010 16:07:10 GMT Speed: 17KTS
        <a href="http://photos.marinetraffic.com/ais/shipdetails.aspx?MMSI=235072115">MarineTraffic entry for 235072115</a> <!-- or some other site to show vessel details -->
      </description>
      <updated>2010-06-17T16:07:10Z</updated>
      <link href="http://gomex.erma.noaa.gov/erma.html#x=-89.1005&y=28.00287&z=11&layers=3930+497+3392"/>
      <!-- Proposed new section -->
      <data>
        <mmsi>235072115</mmsi>
        <name>HOS IRON HORSE</name>

        <!-- Enumerated lookup data types -->
        <type_and_cargo value="244">INVALID</type_and_cargo>
        <nav_status value="3">Restricted Maneuverability</nav_status>

        <!-- Values with units -->
        <cog units="deg true">287</cog>
        <sog units="knots">0</sog>
        <!-- Add more data fields here -->
      </data>

      <!-- The meat of GeoRSS -->
      <georss:where>
        <gml:Point>
          <gml:pos>28.0287 -89.1005</gml:pos>
        </gml:Point>
      </georss:where>
    </item>

    <!-- more items -->
  </channel>
</rss>

Perhaps it would be better, inside data, to have each data item use a generic tag with a name attribute:
<data_value name="sog" long_name="Speed Over Ground" units="knots">12.1</data_value>

Or we could just embed JSON within a data tag... but that would be mixing apples and oranges. If we start doing JSON, the entire response should be GeoJSON.
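
If a feed carried a data block like the one sketched above, pulling the machine readable fields back out would be straightforward. A minimal sketch follows; the element names match the hypothetical example above, not any existing feed:

# Sketch only: pull name/value/units out of the proposed <data> blocks.
import xml.etree.ElementTree as etree

def extract_data(feed_xml):
    'Return a list of {field name: (value text, units or None)} dicts, one per item.'
    root = etree.fromstring(feed_xml)
    results = []
    for item in root.iter('item'):
        fields = {}
        data = item.find('data')
        if data is not None:
            for child in data:
                fields[child.tag] = (child.text, child.get('units'))
        results.append(fields)
    return results

# Each dict might look like {'mmsi': ('235072115', None), 'sog': ('0', 'knots'), ...}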

For reference, here is what one of those blue tails looks like in the GeoRSS:
<item>
  <title>SLDMB 43166 Track</title>
  <georss:where>
    <gml:LineString>
      <gml:posList>
        27.241828 -84.663689
        27.243782 -84.664666
        27.245442 -84.66574
        27.246978 -84.666779
        27.248248 -84.668049
        27.250104 -84.669318
        27.251699 -84.670985
        27.253158 -84.673045
        27.254232 -84.6749
        27.255209 -84.676561
        27.256076 -84.678023
      </gml:posList>
    </gml:LineString>
  </georss:where>
</item>

There was also an assertion that KML would be better. However, I would argue against KML as a primary data transport mechanism. KML is a presentation description, not a data encoding format. It's much like when I ask someone for an image and they take a perfectly good JPG, TIF, or PNG and put it in a PowerPoint... sending me only a PowerPoint. That can be useful, especially if there are annotations they want on a final, but I must have the original image or a lot is lost.

Looking at the KML, you can see that it has all the same problems as the data being delivered by GeoRSS.
<Placemark>
  <name>38681</name>
  <description>Sample Date: 2010-07-02 18:00:00.0 Lat: 30.247965 Lon: -87.690424 SST: 35.8C</description>
  <styleUrl>#msn_circle</styleUrl>
  <Point>
    <coordinates>-87.690424,30.247965,0</coordinates>
  </Point>
</Placemark>




It's too bad that in Google Earth, you can't put a GeoRSS in a Network Link.

In case you are curious, wikipedia has an entry on SLDMBs: Self-Locating Datum Marker Buoy

Friday, June 11, 2010

Planning an AUV mission on the lake

The Gavia is about to be driven across the lake to the test site.

Monday, June 7, 2010

Friday, June 4, 2010

Which c++ logging and unit test framework(s)?

Please comment!

Back a few years ago, I wrote slogcxx as an experiment in creating an easy to use logging system. It spent a little under a year as the logging system for GeoZui. I haven't touched the code in a long time and I am sure it can be done better. So, I have two questions that were triggered by a discussion with BRC: what packages would you suggest for C++ logging and unit testing? I did not include anything real in libais and would really like to work towards the best possible infrastructure to make this a long lived package. Some basic thoughts on requirements:
  • Must not be GPL, but LGPL is okay. It has to play nice

  • Be open source

  • Must be thread safe - a unit test may fire off a bunch of threads for a test

  • The logger must be able to handle multiple producers and consumers that are both in the same process and external

  • Likely to stick around for a few years

  • Relatively easy to use

  • Cross platform to Windows, Mac, and Linux

  • If not already available in fink, rpm, and deb form, then easily packaged

  • Have a safe for work project name... ahem

Just looking on Freshmeat, there are a lot of projects out there and slogcxx is in the pile still: C++ logging and C++ unit testing. The thought is often to look at Boost (e.g. Boost Logging Lib v2 and Boost test) because of the incredible peer review, but it is a pretty heavy dependency.

There are just so many options. Building a large system on the wrong one will lead to serious pain, but using a good one will encourage better software.

Dreaming of being able to subscribe to an AMQP feed for a cluster and user interface tearing through massive data crunching and knowing that the system passed a slew of unit tests that give us confidence that new code is less likely to pull the system down...

Another source of info: Wikipedia List of unit testing frameworks. There doesn't seem to be an equivalent page for loggers, but there is a little in the Java log4j ports section.

P.S. I'm really starting to think that GNU autoconf sucks. It may be the standard, but a configure system is supposed to help, not break my brain. It shouldn't require massive training and experimentation to get it into place. I really should go look at cmake and scons again.

Wednesday, June 2, 2010

BAG file naming convention

Originally posted here: NOAA needs a public software interface specification for multibeam data. Reposted here for discussion.

Next week, I will be talking in Norfolk about technology transfer between NASA and NOAA. One thing that is clear: we need a file naming convention for published data files that come out of NOAA. These things are not fun, but they make or break the long term usability of data. When I join a NASA mission, the first few times I try to use these filenames it is a bit confusing, but once you look up the codes and start to use them, you realize the power of well controlled file names. The naming convention for NOAA BAGs is great in that it has a survey name, but the rest of the filename is basically uncontrolled; it is done differently from file to file and there is no reference in the metadata to the standard applied. To kick off this idea, let us first think about just BAGs. What should be in the filename and how can we code it?

Take a look through the Mars Exploration Rover (MER) Software Interface Specification (SIS): MER_camsis_v4_9-25-07.pdf [JPL]

First, take a look at the Data Processing Level. With multibeam and lidar data, we have similar kinds of concepts, but the data usually starts off collected with an IMU providing GPS derived positioning (not something we have on Mars at the moment).



Now for the part that I really want you to take a look at: Section 4.4, File Naming, P22. The file names are broken into a whole bunch of fields:
<scid><inst><sclk><prod><site><pos><seq><eye><filt><who><ver><ext>

Each of these fields is defined in the document. For BAG instrument type, I could see codes something like this: S = single beam sonar, M = multibeam, l = topo lidar, L = bathy lidar, G = predicted from gravity, c = combined sensors, C = lead line or similar from a cable, etc. Then when I had a BAG that was from a bathy lidar, I would not be surprised when I opened it and it did not look like multibeam sonar data. There would be no need to look at the metadata (which would not have told me) or the non-machine-readable Descriptive Reports.
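
To make the idea concrete, here is a tiny sketch of what encoding and decoding such a name might look like. The field layout and codes are hypothetical, just to illustrate the approach; a real SIS document would nail these down:

# Sketch only: build and parse a hypothetical BAG filename convention.
instrument_codes = {
    'S': 'single beam sonar', 'M': 'multibeam', 'l': 'topo lidar',
    'L': 'bathy lidar', 'G': 'predicted from gravity', 'c': 'combined sensors',
    'C': 'lead line or cable',
}

def build_name(survey, instrument, version, ext='bag'):
    assert instrument in instrument_codes
    return '{0}_{1}_v{2:02d}.{3}'.format(survey, instrument, version, ext)

def parse_name(filename):
    base, ext = filename.rsplit('.', 1)
    survey, instrument, version = base.split('_')
    return {'survey': survey,
            'instrument': instrument_codes[instrument],
            'version': int(version.lstrip('v')),
            'ext': ext}

# build_name('H12345', 'L', 2) -> 'H12345_L_v02.bag'
# parse_name('H12345_L_v02.bag')['instrument'] -> 'bathy lidar'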



The document has some examples of decoding image file names:



These kinds of conventions are really powerful, but they do take a while to create and debug. I've spent hundreds of hours working on spacecraft imaging teams making sure that everything in the SIS document is correct.

Friday, May 7, 2010

PostgreSQL insert or update

What is the fastest way to do this for large numbers of entries? This is working for one entry... at least I thought it was until I tried it again.

#!/usr/bin/env python

import psycopg2

cx = psycopg2.connect("dbname='ais_test'")
cu = cx.cursor()

for i in range(5):
    print i

    cu.execute('''BEGIN;
    SAVEPOINT sp1;
    INSERT INTO vessel_name VALUES (123456789, 'RUST BUCKET1', 67);
    ROLLBACK TO sp1;
    UPDATE vessel_name SET name='RUST BUCKET', type_and_cargo=66 WHERE mmsi = 123456789;
    COMMIT;''')

cx.commit()


This is based on http://www.postgresql.org/docs/8.3/interactive/sql-update.html
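
For what it's worth, here is a minimal sketch of the classic update-then-insert pattern, assuming the same vessel_name(mmsi, name, type_and_cargo) table. It is not safe against two writers racing on the same mmsi without a retry loop or a unique constraint plus savepoint handling:

# Sketch only: update first; if no row was touched, insert.
import psycopg2

def upsert_vessel_name(cu, mmsi, name, type_and_cargo):
    cu.execute('UPDATE vessel_name SET name=%s, type_and_cargo=%s WHERE mmsi=%s',
               (name, type_and_cargo, mmsi))
    if cu.rowcount == 0:
        cu.execute('INSERT INTO vessel_name VALUES (%s, %s, %s)',
                   (mmsi, name, type_and_cargo))

cx = psycopg2.connect("dbname='ais_test'")
cu = cx.cursor()
upsert_vessel_name(cu, 123456789, 'RUST BUCKET', 66)
cx.commit()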

Friday, April 30, 2010

C++ python - need help

Arg... I really need a simple working example of calling C++ from a Python module with a reasonable and simple setup.py. Why does that seem like too much to ask?

http://paste.lisp.org/display/98607

Updated 2010-May-01:
Asked and answered here: Frustration with C++ and Python. I may have also stumbled onto a gcc 4.4.2 bug that added to my confusion.

Sunday, April 4, 2010

NMEA NTP status sentence ZNT

Comments? The original post is here: A proprietary NMEA sentence for NTP status

nmea_znt.py

./nmea_znt.py -h
Usage: nmea_znt.py

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -H HOSTNAME, --hostname=HOSTNAME
                        Host IPv4 address [default: 127.0.0.1]
  -v, --verbose

./nmea_znt.py
$NTZNT,1270398503.49,127.0.0.1,17.151.16.23,3,1270398342.13,0.000074,-20,0.268707,0.034073*44

./nmea_znt.py -v
$NTZNT,1270398471.54,127.0.0.1,17.151.16.23,3,1270398342.13,0.000071,-20,0.268707,0.033585*42

ZNT - NMEA Proprietary NTP status report

talker: NT
timestamp: 1270398471.54
host: 127.0.0.1
ref_clock: 17.151.16.23
stratum: 3
last_update: 1270398342.13
offset: 7.1e-05
precision: -20.0
root_delay: 0.268707
root_dispersion: 0.033585

{'root_delay': 0.26870699999999997, 'timestamp': 1270398471.54, 'stratum': 3,
'precision': -20.0, 'nmea_type': 'ZNT', 'last_update': 1270398342.1300001,
'ref_clock': '17.151.16.23', 'host': '127.0.0.1', 'talker': 'NT', 'root_dispersion':
0.033584999999999997, 'offset': 7.1000000000000005e-05, 'checksum': '42'}
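
For anyone wanting to check one of these sentences by hand: the NMEA checksum is the XOR of all characters between the '$' and the '*', written as two hex digits. A quick sketch:

# Sketch only: verify the checksum on an NMEA sentence such as the ZNT above.
import operator

def nmea_checksum_ok(sentence):
    body, _, checksum = sentence.lstrip('$').partition('*')
    calculated = reduce(operator.xor, (ord(c) for c in body), 0)
    return '{0:02X}'.format(calculated) == checksum.strip().upper()

# nmea_checksum_ok(full_znt_sentence) -> True if the trailing hex digits match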

Wednesday, March 24, 2010

SONS 2010

I just walked in the door. I am still trying to get organized and
figure out what is going on.

Friday, March 19, 2010

Sensor data in GeoRSS?


  1. The OGC Atom+GeoRSS validation results are confusing. Feed Validator has a better presentation

  2. It looks like the OGC version only supports GML

  3. How should the payload be encoded? This is the big question. See line 24, where we produce hard-to-parse human readable text.

  4. I think the ID is wrong, but it is still not clear to me how ids should be handled.

  5. How many data points should be in the feed? If we are collecting a point every 12 seconds, do we want to just have 120 seconds of data? Do we need a realtime feed and one that is decimated? And if decimated, what is the "best" algorithm? I hate systems where you can't find out the actual algorithm used (Ahem! You know who you are!)

  6. Does anything actually pay attention to the syndication flags? We definitely need to have these match the data window (see previous point)

  7. And lots of other issues that escape me right now...


Thoughts?

Tuesday, February 16, 2010

USCG Request For Comment on NAIS data release

I just submitted this to the USCG's RFC 2010-632. I hope everybody who has a stake in this has submitted comments. I am sure that I will frustrate some with these comments, but I strongly believe that getting raw feeds to as many people as possible is critical. There are competing commercial interests, but my end goal is to find the best ways to support mariners getting their jobs done and those who manage the waterways to keep marine use rolling (err... steaming). This includes protection, monitoring, and restoration of the environment. How do we make business, the environment, and safety all work well together? It's a challenge that might not be easy, but it's worth the effort.

I strongly encourage people to comment here or anywhere else the discussion is going on. If you don't agree with what I've written, I would like to hear it.

NAIS data release policies for the USCG

Response to RFC
Docket No. USCG-2009-0701
by Kurt Schwehr

Background

These comments are by Kurt Schwehr and only represent my personal opinions. These do not represent the Center for Coastal and Ocean Mapping, the Joint Hydrographic Center, the University of New Hampshire, the RTCM, NOAA, the USCG, or any other organizations with which I work.

First, a brief summary of my experience with AIS: I have worked with AIS since 2006 and have been a member of the RTCM SC121 working group on Expanded AIS for VTS. I have worked with Class A transceivers, Class B transceivers, Basestations, ATON transceivers, and receive only units (including terrestrial, shipborne, and space-based). As a part of my work with NOAA, I have been receiving realtime NAIS for over one year and from the USCG RDC development network starting in 2006. I helped initiate the creation of the Area Notice binary message that is being used to transmit acoustic detections of right whale calls off of Boston, MA.

Summary

My overall opinion is that the only increased risk from releasing raw AIS data comes from the economic impact to those who are trying to sell AIS feeds. Outside of the 3 or 4 groups doing Satellite based AIS, I don't think these companies should get protection. There is already stiff competition, e.g. AIS Hub will give you AIS data for free if you contribute back at least a little bit of data. AIS receivers start from about $190. The real value comes from interpretation - analysis and display of these feeds. More access to AIS data means to me that we will get more people involved in analysis and it will speed the uptake of AIS binary messages. As for security, if the USCG needs to keep this data restricted as sensitive, then it should not have been broadcast in the clear to start with. Hiding addressed messages is strange when anyone with a receiver and gpsd can see most of these addressed messages anyway.

Comments / responses as organized in the RFC

General

We request your general comments on the applicability and levels of
the sharing of information collected by the NAIS, the definition of
historical NAIS information, and any commercial or security
sensitivities with respect to sharing of information collected by the
NAIS.

The key aspect of AIS is that this is a public broadcast technology. Anyone with a receiver in the area is permitted to receive any message at any time without restrictions. There is no copyright associated with these broadcasts. Therefore, placing any restrictions requiring USCG or DHS personnel to protect this public data is unnecessary and places an extra burden and cost of time and money on the USCG. Protecting already public data does not make sense. Messages that are sensitive are already encrypted in the blue force / Hawkeye system. If there is sensitive data being broadcast, these transmissions are inappropriate for AIS.

The NAIS system is paid for with public money and is intended to benefit the public. Public broadcast of AIS messages over the VHF marine channels is mandated by IMO and the USCG. There is no notion of privacy in the system (with the exception of blue force encrypted military transmissions).

1. How might providing real-time, near real-time, or historical NAIS information to the public impact maritime commerce?

Understanding how US waterways are used and have historically been used is critically important for anyone involved in any of these categories:

  • Operating ships
  • Managing the movement of vessels while not on board (e.g. shoreside management or VTS type operations)
  • Search and rescue (SAR)
  • Protecting and allocating marine resources (Coastal & Marine Spatial Planning / CMSP)
  • Emergency response
  • Restoration of marine resources

Analysts in public, private, and academic environments can benefit from the availability of AIS. An important concept to remember is that with any new technology, we cannot predict all of the new uses that will be valuable to at least some segment of the community.

2. What would be the impact of providing this information, if any, on the following?

a. Safety of ships and passengers or crew,

There would be no negative impacts on safety. This data is already public and can be had through a range of commercial and non-commercial services. Anyone with a receiver or access to a service can already get this data.

Greater access to AIS data would allow all involved access to analyze events and usage levels of navigable waters. Hopefully more researchers will use AIS to investigate the causes of incidents and work towards improved mariner training.

b. Security of ships and their cargo,

These messages are already broadcast in the clear for anyone to receive, such that there are no negative impacts from releasing this through N-AIS. If ship position and cargo are at risk, then these should be encrypted at the broadcast level or not transmitted at all.

c. Economic advantage or disadvantage to commercial stakeholders,

The only real negative impact would be for commercial groups selling AIS feeds. However, these companies are competing against groups like AIS Hub, where access is given to all who share their own local data. The one area that currently needs to be grown with protections is the space-based AIS companies (S-AIS). Their costs are high and the maritime community needs these companies to succeed to provide coverage where land based receivers cannot reach.

The competitive advantage for AIS providers comes not from the raw AIS data, but from the value added to the data stream through analysis, filtering, and presentation of what is going on in the waterways and what trends exist.

The gains are likely negligible to moderate for most commercial enterprises. Most large companies are not likely to change whether or not they are analyzing AIS data. N-AIS data release might encourage more stakeholders to undertake AIS analysis, thereby spurring innovation.

d. Environmental impact on extractable resources or coastal activities.

Increased access to AIS data through N-AIS data release will likely improve the use of AIS for protecting the environment and understanding the impact of maritime uses on the environment. AIS is likely to be the backbone of Coastal and Marine Spatial Planning (CMSP) and is critical to effective environmental response to incidents in the maritime environment.

3. Is information collected by the NAIS considered sensitive?

In general, nothing in the raw data feeds from NAIS should be considered sensitive. The raw data is purely data that has been broadcast in the clear. The key is that all sensitive data must be encrypted before broadcasting. Passing this information through NAIS leaves the data encrypted.

a. Is real-time or near real-time information collected by the NAIS viewed differently than historical NAIS information, and if so, how?

No. Since all broadcasts are done in the clear.

b. Does the sharing of information collected by the NAIS generate concern about unfair commercial advantage? If so, for which segments of the industry is this a concern?

There should be no unfair commercial advantage. The data is already publicly broadcast by ships and it is up to companies to decide if and how they want to use this information, whether it be from NAIS, commercial providers, non-profit groups, or their own ship/shore based receivers.

c. Is there a timeframe within which real-time or historical information collected by the NAIS is considered sensitive or is no longer considered sensitive?

There is no timeframe in which the data should be considered sensitive as it is initially broadcast over public channels.

d. Given that ships last for decades and that their capabilities and capacities are relatively stable, is there a concern that historical NAIS information might be analyzed to derive a competitive advantage?

This data is already broadcast publicly. NAIS will make no difference on whether or not companies choose to use this information (which currently is often incorrect).

4. What controls on sharing real-time, near real-time, or historical information collected by the NAIS with the public are suitable?

a. Who should receive each type of NAIS information?

The data is already publicly broadcast for anyone to receive. Therefore, why should it be restricted?

b. What are appropriate uses of information collected by the NAIS?

Any legal use. These uses range from general Coastal & Marine Spatial Planning (CMSP), resource use, needs for dredging, hydrographic survey planning, risk analysis of collisions and groundings, allocation of infrastructure improvements, controlling loads on port facilities and docks, pollution control, estimating the best times for activities in ports and waterways (maintenance, port entry, etc), situational awareness, search and rescue (SAR), vessel traffic analysis, predicting arrival times, monitoring RF propagation and noise conditions, incident analysis, detecting unusual ship behavior (e.g. engine failure), reselling modified feeds with value added, and many more.

c. Do message types matter?

No. These are publicly broadcast messages.

d. Should addressed messages be handled differently from broadcast messages? Do addressed messages contain information significant to understanding maritime activity? Should addressed messages be shared with the public?

No. These messages are broadcast publicly without encryption. Anybody with a receiver and appropriate software (e.g. GPSD) can decode and read these messages. The goal of an addressed message is not privacy, but to direct information to a vessel. If a private message is required, some other means of communications should be used with appropriate encryption.