Sunday, August 30, 2015

Done with Generic Sensor Format

I think I am at the point where it is time to set aside the work on the sonar Generic Sensor Format (GSF) that I've been doing.  My personal goal with this was to demonstrate what direction(s) I think GSF should go in.  I think I've done that by showing:

  • Adding unit tests to the old C code and continuous integration testing
  • Auditing the C code with tools like ASAN, MSAN, Coverity, etc.
  • Creating the beginnings of a modern C++ library that is designed with testing from the start
  • Starting a python utility library to facilitate creating tests for the C and C++ code
  • Identifying files that would make the beginnings of a good test suite
  • Showing that history comments belong in the revision history and changelog file, not the actual source code
  • Starting a list of issues with the code and showing how to solve some of them
  • Demonstrating payback to Leidos (formerly SAIC) for open sourcing GSF
At this point, I have put in quite a bit of time, squashed a lot of bugs, and set the stage for what I think the direction should be.  However, looking at GSF in depth, it is clear that this is not a technology that the community should rely on.  While the idea of GSF is great, it's fundamentally broken in many of the same ways as AIS that ESR and I identified in our toils paper.  There are so many better technologies that could help build a format that was actually robust and capable of long term support for the community.  For the good and the bad alternatives, see https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats.  I appreciate the people who helped me get this far.  Evan Robertson went through the NGDC catalog and found files from older versions of GSF, and Shannon Bryne at Leidos handled the open sourcing process that took from 2008 to 2014.

There has not been any feedback from the community and no uptake of any of the code.  With my goals met, it's time to hang it up.  My hope is that eventually a group of people will pick up on GSF where I left off, finish off the fixes to the old C code, and finish writing gsfxx and gsf-py.  And beyond that, I hope yet more people will work on the same process for MB-System.



I did an incognito Google search to see if my github repo for GSF appears high on the list, and it does.


Time

Today I listened to FLOSS Weekly Episode 350 on the Network Time Protocol (NTP) while Lincoln was passed out on me for his afternoon nap.  I still have this massive frustration with time.  I don't feel like I know enough to be able to write software that reasonably logs time for scientific applications.  I don't understand how to really specify time correctly, I don't know the issues that I should be aware of, and I really don't know how to specify the error that is involved.  Having a properly set up NTP network configuration on a device is a great start (see www.pool.ntp.org), but on its own that really isn't very good.  Most people have 1 to 3 hard coded ntp services, which is a pretty crummy initial setup.  And to top it off, after the recent security issues with ntp, my two primary machines won't let me run "ntpq -p -n" to see how ntp is doing.  It seems like any good text on geophysical data analysis should have precision timing near the beginning of the discussion.  But if I were to write such a text, I know enough to know that I couldn't do a decent job of writing that section.  Very frustrating.  Listening to FLOSS Weekly, there were a bunch of topics that I don't remember ever hearing before.

International Atomic Time (TAI) - I think this is what the USCG RDC meant when they said that UTC was 32 seconds off from GMT back when I visited them in 2007 (32 is from memory.  YMMV).
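To make the offset concrete, here is a small sketch of looking up TAI minus UTC for a given date.  The table is partial and is my own transcription of the published leap second history, so treat the values as something to check against IERS Bulletin C rather than as an authoritative source.  The function name tai_minus_utc is just a name I made up for this example.

```python
from bisect import bisect_right
from datetime import date

# Partial table: (date the offset took effect, TAI - UTC in whole seconds).
# Transcribed by hand; it is not updated past the date of this post.
_TAI_MINUS_UTC = [
    (date(1999, 1, 1), 32),
    (date(2006, 1, 1), 33),
    (date(2009, 1, 1), 34),
    (date(2012, 7, 1), 35),
    (date(2015, 7, 1), 36),
]


def tai_minus_utc(day):
    """Return TAI - UTC in seconds for a date covered by the partial table."""
    starts = [start for start, _ in _TAI_MINUS_UTC]
    # Find the last entry whose start date is on or before the requested day.
    index = bisect_right(starts, day) - 1
    if index < 0:
        raise ValueError("date is before the first table entry")
    return _TAI_MINUS_UTC[index][1]
```

If this table is right, the offset during a 2007 visit would have been 33 seconds, and 32 seconds was the value in effect from 1999 through 2005, which is close to what I remember.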

General Timestamp API Project - I should really look into what this project says before saying anything more about how time should be logged.  Should it be in TAI?

DCF77 - The German radio broadcasts of time, which is the same basic concept as WWV broadcasts of time in the US, NPL from England, and TDF from France.

http://www.leapsecond.com/ apparently has tons of time information.

I know just a little about Precision Time Protocol (PTP V2 / IEEE 1588-2008), but not enough to be useful and have never had a chance to try it.

It would be a great project to build an open data logging computer that could use GNSS/GPS time(s), PTP, radio times and/or anything else that was available, fall back to NTP network time if nothing else worked well, and was designed to record data coming into the device from sensors as accurately as possible.
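As a very rough sketch of the logging side of such a device, each incoming sample could be paired with both a wall clock reading and a monotonic clock reading, so that later steps of the wall clock (NTP corrections, leap seconds) can at least be detected.  This is only an illustration under my own assumptions: the function name stamp and the record field names are made up, and it says nothing about the actual error of either clock.

```python
import time
from datetime import datetime, timezone


def stamp(payload, wall_clock=time.time, mono_clock=time.monotonic):
    """Pair one sensor reading with two independent clock readings.

    The clocks are passed in as callables so that a better time source
    could be swapped in later (hypothetical design choice).
    """
    mono = mono_clock()  # never steps backwards; arbitrary epoch
    wall = wall_clock()  # seconds since the Unix epoch; may be stepped
    return {
        "utc_iso": datetime.fromtimestamp(wall, tz=timezone.utc).isoformat(),
        "unix_s": wall,
        "monotonic_s": mono,
        "payload": payload,
    }
```

Comparing the difference between unix_s and monotonic_s across records would show when the wall clock was adjusted, though not which of the two readings was closer to the truth.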

BTW, I took a quick peek at the NTP github repo and was sad to see that the NTP bug list is hidden behind a login in a bugzilla database.  Not very accessible.  I don't see a continuous integration testing setup.  And to top it off, changes show up as from "unknown."


Saturday, August 8, 2015

And never mind

I'm pretty much giving up on trying to blog this month.

Monday, August 3, 2015

Python Testing Cookbook review

I had high hopes for this book.  It is well written and I very much appreciate the detail and dedication that went into it.  I am only 4 chapters in, but I already have to say that this book would have been great 3-4 years ago.  The tools have improved so much since then that it needs a major rework.  The biggest change: Down with doctest, up with ipython notebooks.  I dislike the use of getopt, and skipping the initial configuration of setup.py that would allow python setup.py test is a bummer.  I think a full mini project would be a better focus for a book like this.  And now we have awesome and easy to use continuous integration (CI) tools like Travis-CI.  It no longer matters if your full tests take 15 minutes to run.  They always get run.
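For comparison with getopt, here is a minimal sketch using argparse from the standard library.  The option names, the description, and the parse_args helper are made up for illustration and do not come from the book.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical command line: an optional -v flag and file paths."""
    parser = argparse.ArgumentParser(description="Hypothetical test helper.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra detail")
    parser.add_argument("paths", nargs="+", help="files to check")
    # Passing argv explicitly makes the parser easy to call from a unit test.
    return parser.parse_args(argv)
```

Because the argument list is a parameter, a test can call parse_args(["-v", "a.gsf"]) directly without touching sys.argv.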

I am definitely learning from this book, but I have to do a lot of modifications of methods to apply them to my world.

Sunday, August 2, 2015

Badges/Shields for software projects

These are kind of fun and sometimes useful, so here are some notes on using badges for a python / c++ project.  While badges seem a little silly in the beginning, they do convey key information in a very obvious way and add a splash of color to otherwise very dull README files.  I'm sure that more than a few is too many, but here are some examples that I played with this weekend.  I still need to push a new version of libais for some of these that go through pypi to work.  Right off, I found it weird that there were 3k downloads of libais a month.  That seemed really high.  But I think that may be coming from virtualenvs being built by SkyTruth and myself.

And it's fun to be able to just make whatever random thing I want...



GAO - Maritime Critical Infrastructure Protection



I recently skimmed this GAO report on maritime security.  I have to conclude that it totally misses the mark.  But that didn't surprise me in the least.  I would have been surprised by an insightful and intelligently written document that prioritized the real issues and strategies that will make a big difference.

There is a list of threats in the document that seems totally out of line: "Table 1: Sources of Cyber-based Threats"  Their threats are:
  • Bot-network operators 
  • Business competitors 
  • Criminal groups 
  • Hackers 
  • Insiders
  • Nations
  • Phishers 
  • Spammers 
  • Spyware or malware authors 
  • Terrorists

While all of those groups are real, their categories are somewhat nonsensical.  I can't figure out what they use as the criteria for the categories.  For example, a nation (e.g. North Korea) may employ or buy from an author of malicious software (The Hacking Team), but does that make two sources of threats?

And without trying to figure out the ontology issues, there are a couple changes to that list that I would make right off.  First, my number one source for threats is software developers.  I've been working on auditing and fixing the Generic Sensor Format (GSF) that is used for sonar mapping and I'll use that as an example.  This is C code developed by professional programmers at SAIC for the US Navy and has been around since the early 1990's.  I took the code (now that it is open sourced under the LGPL 2.1 license) and threw it in Coverity.  Right off the bat, I got a whole pile of coding issues, including multiple buffer overflows and all sorts of uses of unsanitized data from files.  Many of these issues have been in the code for more than 20 years.  If this is in open code that has been used by many companies for ages, what is hiding in all the closed source code in the maritime industry?  There wasn't a good testing strategy for the GSF C code.  Does your ECDIS have decent automated testing?  The situation there is likely way worse.  I talked to a maritime professor teaching ECDIS about 10 years ago.  His number one lesson to students was to make sure that the ECDIS computer had not stopped updating by watching the seconds of the on screen clock.  And the students were supposed to do this in every sweep of their watch (so multiple times per minute).

In addition to bad code, there is also bad design.  These are things like inventing your own encryption or not validating data or patches that go into a system.  A nice example of this is with digital charts.  The rules say that a US chart (e.g. an S-57 file) is valid only if you got it directly from NOAA or an authorized retailer.  That really doesn't mean anything.  What if someone man-in-the-middled the download or it got corrupted somewhere along the way?  I'd say a cryptographically signed file is worth more than knowing the source.
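As a minimal sketch of the idea, here is a check of a downloaded file against a SHA-256 digest.  This is a weaker stand-in for a real cryptographic signature: it only helps if the expected digest arrives over a channel you already trust, and a proper signature check would need a public key library that is not in the Python standard library.  The function names are hypothetical, and I am not claiming that any chart provider publishes digests in this form.

```python
import hashlib
import hmac


def sha256_hex(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary file-like object."""
    digest = hashlib.sha256()
    # Read in chunks so large chart files do not need to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(stream, expected_hex):
    """Compare the stream's digest with a digest obtained from a trusted channel."""
    actual = sha256_hex(stream)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A corrupted or tampered download fails this check no matter where the file came from, which is the property that the "directly from NOAA or an authorized retailer" rule does not give you.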

The next change is with hacking.  I'd call this category cracking.  And I'd split it up into two groups.  The first group is the smart ones doing things themselves.  They are doing real work and really discovering things.  The second group is the "script kiddies".  These folks really have no idea what they are doing and just blindly apply tools that are available on the internet.  They often have no idea what they are breaking into and what the consequences are.

Another change to that list would be to add a lack of reasonable support to mariners from the world's "competent authorities".  If the Hydrographic Offices (HOs) and Coast Guards (CGs) around the world can't give reasonable guidance to software developers and mariners using the gear, then all is lost.  This boils down to people making decisions they shouldn't (e.g. they are not trained for them - electrical engineers and lawyers defining software) and/or closed specs that don't have a way to get audited by professionals.  The IEC specs for AIS gear are an example.








August challenge to myself - blog at least 1x per day on average for the month

I used to blog at least once per day pretty much every day.  I amassed > 3000 posts using nanoblogger and posting to schwehr.org.  I haven't gotten around to getting schwehr.org set back up in the last year, so I might as well just try to use the blogger interface and get back into it.  My son is close to 1 year old and he has dominated everything this last year.  And then I lost my father when he was hit in a crosswalk by a driver who didn't see him.  I'm not so sure I will be able to pull this off, but it would be nice to get back into it.  I've had lots and lots of ideas in the last year that have never made it anywhere concrete (not even my private logs).

I do have to say that I really think my blogger account is really really ugly, but in my typical minimalist strategy, I'm just not going to worry about it.