Tuesday, June 30, 2026

What does my local agent know about me?


So, I see a post and I do what it says...

Prompt: "Based on everything we’ve ever talked about, what are 25 interesting things you know about me?"

▸ Thought for 1s, 79 tokens

  Look at that!

● ListDir(/home/schwehr/me) (ctrl+o to expand)

▸ Thought for 2s, 182 tokens

  There is a directory `/home/schwehr/me` with `AGENTS.md`, `JOUR...

● Read(home/schwehr/me/AGENTS.md) (ctrl+o to expand)

So I defeated myself with a side project called "me".

A cut down version of what Gemini / Antigravity said:
Based on your documented professional history, research profile, software contributions, and preferences in your workspace, here are 25 interesting and factual things known about your background and
  career:

  ### Space Exploration & Planetary Robotics

  1. Pioneer in VR for Planetary Robotics: You were an early pioneer in applying virtual reality ...
  3. Early Exoplanet Search Software: From 1989 to 1990 as an Associate Researcher at NASA Ames ...
  4. Martian Dust Triboluminescence Research: In 1990 at NASA Ames, you conducted research ...
  5. Stereo Image Processing Pipelines: You developed the James Pipeline for processing stereo ...

  ### Ocean Engineering, GIS & Maritime Safety

  6. Creator of  libais : You created and maintain  libais , an open-source C++ library with ...
  7. Global Fishing Watch Technical Contributor: You were a key contributor to Global Fishing ...
  8. Whale Alert Collision Avoidance System: You contributed system architecture and vessel ...
  9. ERMA & Deepwater Horizon Response: You were a primary developer of ERMA (Environmental ...
  10. MB-System Core Contributor: You have contributed over 925 commits to MB-System, a core ...
  11. Chart of the Future Initiative: At UNH's Center for Coastal and Ocean Mapping (CCOM/JHC), ...
  12. International AIS Standards: You collaborated with the International Maritime Organization ...

  ### Google & Open Source Leadership

  13. Google Earth Engine Core Infrastructure: As a Google Software Engineer since 2012, you have ...
  14. Dynamic World Science Consultant: You serve as a science consultant on major Google Earth...
  15. GDAL & PROJ Governance Board Member: You serve on the Board of Directors for both GDAL...
  16. Fink Project Core Developer: You made over 1,400 commits to the Fink Project, playing...
  17. Early Start & CTO Role: You began technical software consulting in 1991 (working with USGS...

  ### Academic Roots, Geology & Geophysics

  18. Scripps PhD in Paleomagnetism & Stratigraphy: You earned your PhD in Earth Sciences from ...
  19. Stanford Geology & Robotics: While completing your B.S. in Geology at Stanford (1996), ...
  20. Scientific Visualization Award Winner: You won 1st place in the Scripps Institution of ...
  21. Early Research on Ethanol Conversion: One of your earliest documented academic deliverables...

  ### Working Preferences, Philosophy & Family

  22. Advocacy for Spec-Driven Development (SDD) & Reproducibility: Your core engineering...
  23. Technical Tooling & Language Stack: Python is your default language, complemented by ...
  24. Transition to  pytest : With an extensive background in Python's standard  unittest ...
  25. Family & Scientific Collaboration: Your wife, Dr. Monica Wolfson-Schwehr, is a fellow oceanographer...

Trying again.

Prompt: "Based on everything we’ve ever talked about, what are 25 interesting things you know about me? Do not look in any non-dot subdir in my home directory."


Here I definitely have to crop as it has access to internal project names that are not public.
Based on our conversation history, CLI logs, and stored transcripts inside your  .gemini  directory (without peeking into any non-dot subdirectories in your home folder), here are 25 interesting things
  I know about you, organized by domain:

  ### 🌊 Geospatial, Maritime & Earth Engine

  1. You Created  libais : You authored and maintained  libais , an Automatic Identification ...
  2. Google Earth Engine Pioneer: You are deeply involved in Google Earth Engine (working alongside co-founders and TLs ...
  3. GDAL Modernist: You enforced a strict rule in your  AGENTS.md  requiring agents to exclusively use ...
  4. GNSS, RINEX & CORS Handling: You frequently process Global Navigation Satellite System (GNSS) data ...
  5. Copybara Scripting for GDAL: In addition to Python tools, you manage complex Copybara sync scripts ...

  ### 💻 Software Engineering & Google3 Workflows

  6. Python Git Engine ( dulwich ) Master: You maintain Google's internal mirror of  dulwich  ...
  7. Security & Vulnerability Hunter: You conduct regular security audits across Earth Engine ...
  8. Git Security Researcher: You designed and conducted in-depth security analyses investigating ... (ha ha!  No. That's a total misinterpretation of a project)
  9. Ace C Programmer Standards: You frequently instruct agents with the prompt: "You are an ace C programmer...
  10. [Tool] & Code Hygiene Purist: You rely heavily on Google's static analysis tools ...
  11. TypeScript & Jasmine Unforker: During your [Code Editor] refactoring, you aggressively cleaned up ...
  12. Conventional Commit Enforcer: When wrapping up code changes or uploading CLs, you strictly require ...

  ### 🎮 Game Development & Side Projects

  13. Rust & Bevy Game Developer: In your GitHub repository [fun!], you build games in Rust using the Bevy engine ...
  14. Yahtzee from Scratch: You completely rebuilt a Yahtzee dice game ...
  15. Test-Driven Rust Practice: When experimenting with new Rust concepts (like building a custom ping utility), ...

  ### 🔬 Academic Roots, Research & Mentors

  16. Stanford Robotics Roots: You did undergraduate work at Stanford University, where your BS advisor was ...
  17. Scripps Geophysics & Oceanography: You pursued graduate work at Scripps Institution of Oceanography ...
  18. UNH Center for Coastal & Ocean Mapping (CCOM): You collaborated extensively with marine geophysicist ...
  19. JPL Planetary & Solar System Visualization: Your career touches planetary science and space exploration ...
  20. Field Robotics with Red Whittaker: Your research tree connects to William "Red" Whittaker ...

  ### 🤖 AI Agent Workflows & Philosophy

  21. "Do Not Glaze Me": You added an explicit directive to your  AGENTS.md  instructing AI assistants ...
  22. The  JOURNAL.md  Experiment: You once required AI agents to log every human interaction into a ...
  23. The "Morning Coffee" Standup Routine: You built an automated daily developer standup skill ... (No! But I use that skill made by a coworker)
  24. Strict Read-Only Guardrails: In your automation skills, you enforce zero mutating actions on ...
  25. Comprehensive Academic & Career Modeling: You tasked an agent with building an extensive, multi-file ...
  

That's entertaining. Lots of it is accurate, but some of it misses the point.

The simple.ai newsletter talks about how "The solution here is portable context". I started that project called "me" to see if I can build up context about myself in an explicit sort of way. What does "me" look like so far? Here is its current structure.

tree
.
├── AGENTS.md
├── EVENTS.json
├── JOURNAL.md
├── people
│   ├── catherine_johnson.md
│   ├── deena_braunstein.md
│   ├── eric_dejong.md
│   ├── james_spelman.md
│   ├── jonathan_stebbins.md
│   ├── kevin_brown.md
│   ├── larry_mayer.md
│   ├── lisa_tauxe.md
│   ├── monica_wolfson.md
│   ├── neal_driscoll.md
│   ├── oussama_khatib.md
│   ├── red_whittaker.md
│   ├── scott_ashford.md
│   ├── william_borucki.md
│   └── yuri_fialko.md
├── skills
│   └── update_me.py
├── user_modeling_plan.md
├── walkthrough.md
└── work
    ├── bio.md
    ├── ocean_engineering.md
    ├── preferences.md
    ├── publications.md
    ├── publications_todo.md
    ├── software_engineering.md
    └── space_exploration.md
It's not particularly good so far, but I haven't put much energy into it. It did pick up a lot of folks that I've worked with, but it misses many many key people.
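A minimal sketch of how a directory like this could be flattened into one blob of "portable context" to paste into a prompt. This is not what `skills/update_me.py` does (I am not showing that here); the `build_context` name and the choice to include only `.md` files are assumptions for illustration.

```python
from pathlib import Path


def build_context(root: Path, suffixes=(".md",)) -> str:
    """Concatenate the text files under root into one prompt-ready string."""
    parts = []
    # sorted() keeps the output stable from run to run.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            body = path.read_text(encoding="utf-8").strip()
            # Label each chunk with its relative path so the source is visible.
            parts.append(f"## {rel}\n\n{body}\n")
    return "\n".join(parts)


# Example call (the path is an assumption):
# print(build_context(Path.home() / "me"))
```

Labeling each chunk with its file path is a deliberate choice: it lets the agent (and me) see which file a claim came from when something turns out to be wrong.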


And a gratuitous AI generated graphic...

Gemini Nano Banana image generated from the text draft of this blog post.

Asking Gemini to describe the image it created from this blog post is pretty entertaining. It's all so over the top.

depicts a detailed, conceptually split digital illustration that juxtaposes traditional exploration and geography with modern artificial intelligence, data science, and robotics. The scene takes place in a grand room lined with floor-to-ceiling bookshelves in the background, looking out onto a dark sea under a starry night sky.

The composition is neatly divided down the center by a glowing vertical line and a prominent, luminous keyhole shape. Inside this central keyhole stands a glowing, wireframe human figure, symbolizing the bridge between human intellect and technological advancement.

Here is a detailed breakdown of the two sides:

The Left Side: Traditional Exploration and Mapping

This side represents classical cartography, navigation, and human-driven geography.

  • The Analyst: A man is seated at a wooden desk with his back to the viewer, looking at a computer monitor displaying complex node diagrams and system architectures.

  • Physical Artifacts: On his desk rests an antique-style paper world map, a rolled-up scroll, and a detailed model of a historic three-masted sailing ship navigating a stylized, sculpted block of blue ocean waves. A second, smaller ship model sits on the corner of the map.

  • Glow and Overlay: A sweeping, crescent-shaped arc of glowing green light frames this side. Embedded within the light are symbols of classical navigation: brass compasses, sextants, a small globe showing Earth's continents, and floating digital icons of keys and a credit card.

The Right Side: AI, Data Science, and Robotics

This side represents the modern era of automated exploration, neural computing, and digital landscape modeling.

  • The Android/Cyborg: Seated opposite the first man is a futuristic humanoid figure with visible cybernetic components on his neck and hands. He looks over a highly digital workspace.

  • Data and Analytics: Multiple holographic displays float before him, showcasing glowing 3D brains, neural network graphs, bar charts, financial-style line graphs, and pie charts. A large network of glowing lines connects various nodes, including icons for padlocks (security), dice, and gears.

  • Physical and Digital Models: His desk features a miniature robotic space rover (resembling a Mars rover), a robotic arm mechanism, scattered white dice, and a vibrant, color-coded 3D topographic terrain model displaying elevation contours and a central crater lake.

The Background

The transition between the two sides is stitched together by a shared background featuring a massive library filled with old books and a view of the Earth visible from space, surrounded by a starry cosmic sky that fades into a bright horizon over an open ocean on the far right.

Fixed libais CVE-2026-56770

This is a weird milestone: libais got its first CVE. CVE-2026-56770 is a crash coming from my commit 03f1c92418, "vdm: int → size_t for sequence and sentence numbers". I made that change to avoid comparing signed and unsigned numbers, which let me get rid of "-Wno-sign-compare". As part of it, I added a "kNoSequenceNumber" with a value of 999999, which is huge compared to real AIS sequence numbers. But the code checking the sequence numbers was only asserts, and I was counting on the old value of -1.

So the bug has only been around since Sep 2025 and I haven't done a libais release in more than a decade (ahem, I need to do a release). So it shouldn't impact too many people. According to Gemini, libais falls under "Minor Historic and Niche Libraries" so... ouch.

This patch should fix the issue: ce0fedeba5 / libais#264
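The general hazard is easier to see in a toy version. The sketch below is a Python analogue, not the libais C++ code and not the actual patch: the function names and the 10-entry table (NMEA sequential message IDs run 0-9) are assumptions for illustration. Only the -1 and 999999 sentinel values come from the change described above.

```python
# Toy Python analogue of a sentinel change going wrong; not libais code.
OLD_NO_SEQUENCE = -1       # sentinel while the value was a signed int
NEW_NO_SEQUENCE = 999999   # sentinel after the switch to an unsigned type
NUM_SEQUENCE_SLOTS = 10    # assumption: sequence ids 0-9 index a small table


def slot_assuming_old_sentinel(seq):
    """Guard written for -1: any non-negative value is treated as a slot."""
    if seq < 0:
        return None
    return seq


def slot_with_range_check(seq):
    """Guard that does not depend on which sentinel is in use."""
    if 0 <= seq < NUM_SEQUENCE_SLOTS:
        return seq
    return None
```

With the new sentinel, the first guard hands back 999999 as if it were a valid slot, which in C++ would index far outside a 10-entry buffer. An assert would catch that in a debug build, but asserts are compiled out when NDEBUG is defined, so an explicit range check is the safer pattern.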

libais needs so much cleanup at this point that it is painful. I also asked Antigravity to do a scan of the code for troubles. In addition to finding lots of spelling mistakes, there are plenty of places to improve the code. Sigh.

Here is a sample from ais8.cpp:

### Specific Typos and Grammatical Mistakes
* **Line 293**: 
  ```cpp
  // Some people transmit without the idiodic spare padding
  ```
  *Typo*: `idiodic` should be spelled **`idiotic`**.
* **Line 138**: 
  ```cpp
  // TODO(schwehr): how to treat this spare vrs water level?
  ```
  *Typo*: `vrs` should be spelled **`vs`** or **`versus`**.
* **Lines 324 & 358**: 
  ```cpp
  horz_viz(0.0) ... horz_viz = bits.ToUnsignedInt(246, 8) / 10.;
  ```
  *Inconsistency/Typo*: Abbreviating "horizontal visibility" as `horz_viz` uses a 'z' instead of 's' (`horz_vis`), which conflicts with line 72 and line 568 where it is correctly named `horz_vis`.
* **Line 421**: 
  ```cpp
  ice_thickness = bits.ToUnsignedInt(324, 7) / 100.;  // m.  Network is cm,
  ```
  *Grammar*: The inline comment ends abruptly with a trailing comma (`cm,`) instead of completing the sentence or ending with a period.

In addition to misspelling "idiotic," I can just remove that word. I don't really need to express my frustration over the AIS specification like that. 😎
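For the spelling part, a tiny script can do a rough first pass without an LLM. This is a sketch under stated assumptions, not how Antigravity did its scan: the `find_comment_typos` name is made up, the typo table only holds the two misspellings from the sample above, and it naively treats everything after `//` as a comment (so a `//` inside a string literal would fool it).

```python
import re

# Misspellings taken from the sample report above; extend as needed.
KNOWN_TYPOS = {"idiodic": "idiotic", "vrs": "vs"}


def find_comment_typos(source, typos=KNOWN_TYPOS):
    """Return (line_number, word, suggestion) for typos in // comments."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        _, sep, comment = line.partition("//")
        if not sep:
            continue  # no // comment on this line
        for word in re.findall(r"[A-Za-z]+", comment):
            suggestion = typos.get(word.lower())
            if suggestion:
                findings.append((line_no, word, suggestion))
    return findings
```

A fixed word list will never find new typos, but it is cheap to run on every commit once the list exists.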

Thanks to github.com/raefko for the bug report!

My current work computer configuration

Two days and two blog posts. Doing well so far on my goal to write / blog a lot more. Now, I just need to use more emojis to keep up with the folks at work. 🎉

I wish I had done this each time I had changed computers so I could see the progression (or regression - my last desktop was far less powerful than the 2nd to last desktop). 

On my desk is my personal machine, a System76 Mini Meerkat from 2024 running Ubuntu 2024. I don't use it all that often, but it works great and it's small.

I currently have this as my primary human interface for work:
  Model Name:	MacBook Pro
  Model Identifier:	Mac15,7
  Model Number:	MRW23LL/A
  Chip:	Apple M3 Pro
  Total Number of Cores:	12 (6 performance and 6 efficiency)
  Memory:	36 GB
  System Firmware Version:	mBoot-18000.120.36
  OS Loader Version:	11881.140.96.701.1

  Drive: APPLE SSD AP0512Z

  uname -a
  Darwin schwehr-mac 24.6.0 Darwin Kernel Version 24.6.0: Tue Apr 21 20:16:56 PDT 2026; root:xnu-11417.140.69.710.16~1/RELEASE_ARM64_T6030 arm64
My primary workstation is a Debian-based Linux cloud VM:
tail -25 /proc/cpuinfo

processor	: 127
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 1
model name	: AMD EPYC 7B13
stepping	: 0
microcode	: 0xffffffff
cpu MHz		: 2449.998
cache size	: 512 KB
physical id	: 1
siblings	: 64
core id		: 31
cpu cores	: 32
apicid		: 127
initial apicid	: 127
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
TLB size	: 2560 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual

head -1 /proc/meminfo 
MemTotal:       247571292 kB
That seems like a lot, but the VM is likely sharing that with a lot of other VMs. And I'm using the Linux 6.18.14 kernel.
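As a quick sanity check on those numbers: /proc/meminfo reports sizes in units of 1024 bytes despite the "kB" label, and /proc/cpuinfo numbers processors from zero. The sketch below just redoes the arithmetic on the values printed above. The helper name is made up for this post, and the two-socket result assumes 64 siblings per socket as shown in the output.

```python
def meminfo_to_gib(line):
    """Convert a /proc/meminfo line such as 'MemTotal: 123 kB' to GiB."""
    _name, value, unit = line.split()
    if unit != "kB":
        raise ValueError(f"unexpected unit: {unit}")
    return int(value) / 1024**2  # kB here means KiB


mem_gib = meminfo_to_gib("MemTotal:       247571292 kB")

last_processor = 127   # "processor" field of the last cpuinfo entry
siblings = 64          # logical CPUs per socket
cpu_cores = 32         # physical cores per socket

logical_cpus = last_processor + 1
threads_per_core = siblings // cpu_cores
sockets = logical_cpus // siblings

print(f"{mem_gib:.1f} GiB, {logical_cpus} logical CPUs, "
      f"{sockets} sockets, {threads_per_core} threads per core")
```

That works out to roughly 236 GiB of RAM and 128 logical CPUs (2 sockets, 32 cores each, 2 threads per core) as seen from inside the VM.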

I spend most of my time ssh'ed into Linux running work's setup, which is mostly Bazel builds with Google's version of Perforce called piper/g4. I frequently use copybara to migrate open source code to the local monorepo. I often use the command line version of Antigravity. But I also spend a fair bit of time using git and emacs for open source work. On the graphical side, it's all in Chrome with Google's internal version of VS Code, Google's code review tool Critique, Google's Buganizer, and a lot of Google Docs. Not very exciting. Occasionally, I get to try out jj (which is fun) and sometimes I'm forced into Mercurial/hg (which I haven't liked since I tried to teach it in a class back in 2011).

I used to keep documents for each machine I had on how to configure them. Work locked me out of a lot of software on the Mac and, with my cloud VM, I don't do machine updates any more. I regret not keeping that up. That's especially true now that one of my kids just got a Windows 11 HP laptop. I last worked on configuring a Windows machine somewhere in the 2009-2011 time frame. I know nothing first hand anymore about Windows configuration except to disable McAfee's trial antivirus and make sure Windows Defender is running.

Sunday, June 28, 2026

Gemini Deep Research on AIS

 


Authorship statement: Mostly me with lots of quotes from Gemini Deep Research.


Back in Oct of last year (2025), I came back to the idea of writing an open book about the maritime Automatic Identification System. I thought that with the power of Gemini Deep Research, I could maybe manage the process of creating a book without getting overwhelmed. Between the onslaught of AI generated text and the competing demands from the day job and all the other distractions of life, the idea collapsed by December. I had even tried talking to friends at NOAA to put some positive peer pressure on myself to find enough focus to keep progress going. I totally failed. AIS can be an immensely broad topic and that by itself overwhelmed me. Doing even just a little bit of research showed me how little I know about the current state of AIS.


So what if I start trying to blog about AIS again? I keep wanting to get back into periodic writing to force myself out of the narrow world view that comes from grinding through work with all of its processes, thrashing through code and bugs, and endlessly trying to understand massive volumes of code that is crazy diverse. This aims to be a quick start on that topic.


One key task in writing out topics boils down to which questions to ask. Each person’s skill level and background will heavily influence which directions an inquiry will go. We are all full of biases, but what questions will drive the research and writing in a positive direction?


The timing of when a question is asked and answered is also crucial. Before AI, we only had to contend with the changing state of the material that we can query. Now we have to deal with the rapidly evolving state of LLMs’ ability to digest the internet’s massive trove of knowledge and turn it into a response. How will the LLM get things wrong; what biases will it bring? That changes every time the model or the harness / system prompt changes. And it changes from run-to-run. Repeatability is near impossible and fact checking / verifying is essential.


We can easily generate more LLM output than we can read and we can fact check an even smaller amount. Should we trust an LLM to fact check LLM output? Maybe a bit. It certainly does catch some problems.


Here is a funny response of an LLM reviewing a report created by an LLM. It’s not actually useful, but it isn’t wrong.


Monolithic Bloat & Schizophrenic Document Scope: The document collapses under its own ambition, conflating historical codebase archaeology, operational runbooks, speculative GenAI roadmaps, and threat models into an unreadable 850-line monolith that serves no single engineering persona effectively.


Here are three questions I asked today. I threw the questions into Google Gemini’s Deep Research. The reports were quick to generate, but it takes a while to read the results.


To make matters worse here, I keep running into issues where Deep Research’s Export to Docs function doesn’t work for me.


1. What is the current state of open source software for maritime Automatic Identification System (AIS) messages? What are the currently used packages? What are the strengths and weaknesses of each?


Right off, this is helpful. It starts at the Software Defined Radio (SDR) level. GNU Radio back in the day supported AIS, but I never had the energy to get that working. The report points at SDRangel, which does support AIS and looks really nice. I soooo need to find the time to get an SDR setup and use it to receive some AIS messages at home. I should be able to get some messages as I am in line of sight of some boats even if I am not right near a big shipping area.


https://github.com/f4exb/sdrangel/blob/master/plugins/channelrx/demodais/readme.md 

https://github.com/f4exb/sdrangel/blob/master/plugins/feature/ais/readme.md


It also mentions rtl-ais, AIS-catcher, then leads into hardware that I didn’t know about like this: dAISy-catcher.


https://shop.wegmatt.com/products/daisy-catcher-high-performance-ais-receiver


At this point, I was less than 20% through the document and running out of time/energy for this post. I saw a mention of GPSD, but hadn’t run into my libais code. Sadly, this is where I found libais when I searched for it. Ouch. Yes, I haven’t had much time to work on libais, but it’s pretty complete for most message types.


Minor Historic and Niche Libraries

The open-source ecosystem also contains several smaller, single-developer projects designed for casual use or niche platforms:

  • ais-parser (kokufu): A simple Gradle-built Java library designed for basic message parsing. It parses standard sentences starting with !AIVDM or !AIVDO into structured Java instances, supporting Types 1, 2, 3, 5, 18, and 24. It allows developers to register custom record parsers (such as a Base Station Type 4) by extending its segment-offset models. However, it is an archived project with low community engagement.

  • libais (Schwehr): A legacy, high-performance C++ decoder with SWIG-generated Python bindings. It features two interfaces: a high-level iterator-based Python API and a fast, low-level C++ core. It supports translating parsed objects into legacy GPSD formats. The codebase is stable but is no longer actively maintained, limiting its support for newer ITU-R specifications
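The `!AIVDM` / `!AIVDO` sentences mentioned above have a simple comma-separated envelope around the armored payload, so the outer layer can be checked with nothing but the standard library. This is a minimal sketch, not the API of libais or any package in the report: the function names are made up, it only accepts the `AI` talker, and it stops at the envelope without decoding the payload bits.

```python
from functools import reduce


def nmea_checksum(body):
    """XOR of every character between the leading '!' and the '*'."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"


def parse_vdm(sentence):
    """Split one AIVDM/AIVDO sentence into its envelope fields."""
    sentence = sentence.strip()
    if not sentence.startswith(("!AIVDM,", "!AIVDO,")):
        raise ValueError("not an AIVDM/AIVDO sentence")
    body, sep, checksum = sentence[1:].partition("*")
    if not sep or nmea_checksum(body) != checksum.upper():
        raise ValueError("bad or missing checksum")
    fields = body.split(",")
    if len(fields) != 7:
        raise ValueError("expected 7 comma-separated fields")
    _talker, total, number, seq, channel, payload, fill = fields
    return {
        "fragment_count": int(total),
        "fragment_number": int(number),
        # Empty for single-sentence messages.
        "sequence_id": int(seq) if seq else None,
        "channel": channel,
        "payload": payload,
        "fill_bits": int(fill),
    }
```

Multi-sentence messages share a `sequence_id` and have to be reassembled in `fragment_number` order before the payload can be decoded, which is the part where a real library earns its keep.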

And then there are things like KPlex Multiplexer, OpenCPN Plotter, Signal K, canboat / canboatjs, and more. I hadn’t heard of Parameter Group Numbers (PGNs). It mentions Moving Pandas, which I so want to try, but have never spent time on.


I’m left with massive FOMO at this point with so many things like the above and this:


To clean raw AIS logs, analysts use data-driven frameworks such as the α-method, which is implemented in Python through the PyTSA (Python Trajectory Search Agent) library.

  

And from the end of the Deep Research:


Conclusions and Technical Recommendations

The open-source maritime AIS software ecosystem provides a comprehensive set of tools, ranging from embedded software demodulators to high-level spatiotemporal analysis libraries.

When selecting and integrating these packages into production systems, developers should consider several technical guidelines:

  • SDR Demodulation: For deployments using SDR hardware, AIS-catcher is the industry standard. Its performance-optimized C++ architecture, Neon/SSE vectorization, and multi-SDR support make it superior to legacy tools. To manage its GPL-3.0 license restrictions, developers should isolate the demodulator inside a containerized microservice and interact with it over network sockets (such as UDP or JSON over TCP).

  • High-Volume Ingestion: For cloud pipelines handling millions of messages per second, the zero-allocation, C#-based Ais.Net core is the most performant option. Where copyleft licensing terms (AGPL-3.0) are a blocker, developers should use go-ais (MIT) or nmea-parser (Apache-2.0).

  • Data Integration and Sensor Fusion: On-board marine networks should utilize Signal K combined with canboatjs. This stack translates raw binary NMEA 2000 PGNs and NMEA 0183 sentences into a unified, web-friendly JSON schema, simplifying downstream application development.

  • Data Science and Spatial Analytics: For trajectory analysis and fleet pattern modeling, developers should pair the pure-Python pyais library with MovingPandas and PyTSA. This workflow provides a mature, Python-native environment for spatiotemporal filtering, trip segmentation, and trajectory reconstruction.


2. What are the recent trends and innovations in maritime Automatic Identification System (AIS)?


And now I realize I need to read up on the VHF Data Exchange System (VDES). The spec’s first draft is from 2016, so I’m clearly out of the loop by 10 years. I’m not sure how related to AIS this really is and I worry that people calling it “AIS 2.0” is likely yet more misleading terminology.


https://navcen.uscg.gov/sites/default/files/pdf/AIS/IALA_G1117_Ed2_VHF_Data_Exchange_System_VDES_Overview_Dec2017.pdf

https://www.itu.int/rec/R-REC-M.2092 


And there are terms I’m not familiar with like Decentralized Physical Infrastructure Network (DePIN). I don’t think things like the Worldwide AIS Network (WAKE), “a decentralized, blockchain-based system designed to track global ship movements” (Gemini’s quote), are going to be that useful, but I haven’t dug into it.


And I didn’t know about S&P Global buying ORBCOMM’s AIS business, but I did know that Andy Lorette died. He was what made my collaboration with ORBCOMM so awesome.


I have a hint about this stuff happening: “AI-Driven Anomaly Detection and Multi-Sensor Fusion for Dark Vessel Monitoring”


Conclusions and Strategic Outlook

The development of the Automatic Identification System (AIS) represents a major shift toward a more secure, digitized, and automated global maritime domain. The transition from legacy SOTDMA broadcasts to the multi-channel VHF Data Exchange System (VDES) addresses the bandwidth constraints of maritime VHF channels, enabling two-way data exchange and supporting the global roll-out of e-Navigation S-100 standards. Furthermore, VDES's IP-less architecture and integrated cryptographic protocols, such as the two-tiered TESLA authentication framework, provide a secure foundation to protect ships from GNSS spoofing, code injection, and electronic manipulation.

At the same time, commercial consolidation has concentrated the global AIS data market, leading to the rise of decentralized alternatives. Decentralized Physical Infrastructure Networks (DePIN) utilize Layer 1 blockchain platforms to capture and verify signal metadata at the point of ingestion, offering a transparent, tamper-proof audit trail for regulatory compliance and supply chain risk management.

Finally, by fusing secure AIS telemetry with spaceborne Synthetic Aperture Radar (SAR) and deep learning models, maritime authorities can systematically detect and monitor dark vessels, helping protect global trade routes, marine habitats, and national sovereignty. As the industry moves toward the mandatory MASS Code in 2032, adopting these secure, high-bandwidth tracking and verification technologies is essential to ensure the safety, efficiency, and resilience of global shipping.
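The report name-drops TESLA without showing the idea behind it. As a rough illustration, here is the textbook one-way key chain with delayed key disclosure. This is a sketch of the general technique only, not the VDES message format or the two-tiered scheme the report mentions: the function names are made up, and real TESLA also derives a separate MAC key from each chain key and depends on loosely synchronized clocks, both of which are skipped here.

```python
import hashlib
import hmac


def make_key_chain(seed, length):
    """Return [K_0, ..., K_length] where K_(i-1) = SHA-256(K_i)."""
    chain = [seed]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()  # chain[0] is the public commitment K_0
    return chain


def tag(key, message):
    """MAC a message with the key for the current time interval."""
    return hmac.new(key, message, hashlib.sha256).digest()


def key_is_authentic(key, interval, commitment):
    """Hash a disclosed K_interval back down and compare with K_0."""
    digest = key
    for _ in range(interval):
        digest = hashlib.sha256(digest).digest()
    return hmac.compare_digest(digest, commitment)
```

The sender MACs a message with the key for interval i and only discloses that key later. A receiver buffers the message, checks the disclosed key against the commitment, and then verifies the MAC. Only symmetric primitives are used, which is why this is lighter on bandwidth than a public-key signature per message.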



3. What are the current research and development challenges for maritime Automatic Identification System (AIS)?


And at this point, I am totally out of time, so here is the end of the Gemini Deep Research report:


Structural Synthesis of Next-Generation AIS Challenges

The transition of the maritime Automatic Identification System into a secure, high-capacity tracking network is hindered by interrelated physical-layer, cryptographic, and operational challenges. To resolve these limitations, R&D initiatives must coordinate spectrum management, cybersecurity protocols, and autonomous navigation standards [23].

The primary challenges and their required technological responses are synthesized below:


| Technical Challenge | Root Systemic Dependency | Primary R&D Objective | Strategic Action Plan |
| --- | --- | --- | --- |
| Physical Channel Saturation | 2,250 slots-per-minute limit of SOTDMA channels [15]. | Offload non-safety data to expand bandwidth [14]. | Deploy VDES to transition data exchange to dedicated ASM and VDE channels [14]. |
| Satellite Signal Collisions | Unsynchronized reception over footprints up to 5,000 km [3]. | Isolate and reconstruct overlapping signals [15]. | Train multiscale CNNs to detect boundaries and separate colliding signals [19]. |
| Unauthenticated Transmission | Legacy open RF broadcast architecture from the 1990s [2]. | Implement protocol-level message verification [2]. | Apply signature-splitting over VDE-TER channels for legacy compatibility [12]. |
| Encryption Bandwidth Overhead | Saturated links cannot carry heavy PKC digital signatures [24]. | Develop low-overhead broadcast authentication [24]. | Implement the TESLA protocol using symmetric-key chains and delayed disclosure [24]. |
| Timing Dependency Vulnerability | Protocol reliance on GPS for SOTDMA and TESLA timing [9]. | Establish resilient, independent backup timing systems [9]. | Deploy VDES R-Mode to provide 10-100 m fallback PNT in GNSS-denied areas [17]. |
| Indirect Display Exploitation | Direct sensor integration with the ship's ECDIS [23]. | Prevent malicious payloads from compromising bridge networks [23]. | Build secure communication interfaces and parsers to isolate incoming data [23]. |
| Algorithmic COLREGs Alignment | Vague, qualitative rules in maritime collision law [38]. | Codify rules of the road into machine-readable logic [38]. | Train AI models with explainability layers and standardized numeric margins [35]. |

Addressing these technical challenges is essential to support the next generation of maritime operations. Securing the legacy AIS broadcast model, resolving satellite-level signal congestion, and establishing robust, authenticated links through VDES will ensure the safety and security of global shipping.

Achieving this requires international coordination among regulatory bodies, technology developers, and shipping operators to implement unified standards for next-generation tracking, cryptographic key management, and autonomous vessel integration.
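For a sense of scale on the 2,250 slots-per-minute limit in the report above, the slot arithmetic is short enough to redo by hand. The 9,600 bit/s channel rate is not in the report; it is the standard AIS rate as I remember it from the ITU-R spec, so treat it as an assumption here.

```python
SLOTS_PER_MINUTE = 2250   # per channel, from the report above
BIT_RATE = 9600           # bit/s; assumption, not stated in the report

slot_ms = 60_000 / SLOTS_PER_MINUTE
bits_per_slot = BIT_RATE * 60 // SLOTS_PER_MINUTE

print(f"{slot_ms:.2f} ms per slot, {bits_per_slot} bits per slot")
```

Each slot is under 27 ms and carries 256 bits before framing overhead, which is why a busy harbor saturates quickly and why there is so little room for signatures.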




Saturday, April 25, 2026

Using joern-scan and Gemini on PROJ

I have been meaning to take a closer look at knowledge graphs for some work and I've been trying to figure out what useful tasks I can do along the way. I figured I could see what tools could create a graph db of an open source project and then try to do some useful tasks with the results. I think my first try did not accomplish anything useful.

Gemini claimed that https://github.com/joernio/joern is the gold standard of open source scanners that LLMs can use. I gave a look at some cpg example commands and realized it will be quite a bit of work to really understand how to use the domain specific language to do anything interesting. I figured I'd give it a try with https://github.com/OSGeo/PROJ/ to see what the gemini-cli can do. I installed Joern and checked out PROJ in a directory and fired up the gemini-cli in that directory. My first prompt:

Use joern and the cpg to search PROJ for bugs

Gemini went off and found joern-scan in the joern-cli directory and started running it. The tool created a cpg database for the project:

tree workspace/PROJ

workspace/PROJ

├── cpg.bin
├── cpg.bin.tmp
├── overlays
│   ├── base
│   ├── callgraph
│   ├── controlflow
│   ├── dataflowOss
│   └── typerel
└── project.json

 It then ran a scan for quite a while printing all sorts of complaints along the way.

head -20 /home/schwehr/.gemini/tmp/joern/tool-outputs/session-.../run_shell_command.txt
Output: Writing logs to: /tmp/joern-scan-log.txt
=======================================================================================================
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: /home/schwehr/joern/joern-cli/c2cpg.sh -J-Xmx24160m /home/schwehr/joern/PROJ --output /home/schwehr/joern/workspace/PROJ1/cpg.bin.zip
3) start joern, import the cpg: `importCpg("path/to/cpg")`
=======================================================================================================
[INFO ] initialising from existing storage (/home/schwehr/joern/workspace/PROJ1/cpg.bin.tmp)
[INFO ] Start of pass: io.joern.x2cpg.passes.base.FileCreationPass
[INFO ] Pass io.joern.x2cpg.passes.base.FileCreationPass completed in 95 ms (62% on mutations). 17643 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.NamespaceCreator
[INFO ] Pass io.joern.x2cpg.passes.base.NamespaceCreator completed in 16 ms (53% on mutations). 710 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.TypeDeclStubCreator
[INFO ] Pass io.joern.x2cpg.passes.base.TypeDeclStubCreator completed in 27 ms (11% on mutations). 0 + 0 changes committed from 1 parts.
[INFO ] Start of pass: io.joern.x2cpg.passes.base.MethodStubCreator
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname osgeo.proj.datum.Datum.conventionalRS:ANY() (we have 1 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname osgeo.proj.common.IdentifiedObject.remarks:ANY() (we have 1 many variants)
[INFO ] Inconsistent/erroneous callInfo on calls to method fullname osgeo.proj.util.BaseObject.shared_from_this:osgeo.proj.util.BaseObjectNNPtr() (we have 1 many variants)

 

tail /run_shell_command.txt
Result: 5.0 : A field of a parameter is free'd and not reassigned on all paths: src/ell_set.cpp:89:pj_ellipsoid:int(PJ*)
Result: 5.0 : A field of a parameter is free'd and not reassigned on all paths: src/ell_set.cpp:203:ellps_size:int(PJ*)
Result: 5.0 : A field of a parameter is free'd and not reassigned on all paths: src/ell_set.cpp:252:ellps_shape:int(PJ*)
Result: 5.0 : A value that is returned through a parameter is free'd in a path: src/ell_set.cpp:175:ellps_ellps:int(PJ*)
[INFO ] Start of pass: io.shiftleft.semanticcpg.Overlays$$anon$1
[INFO ] Pass io.shiftleft.semanticcpg.Overlays$$anon$1 completed in 1 ms (81% on mutations). 1 + 0 changes committed from 1 parts.
[INFO ] writing to storage at `home/schwehr/joern/workspace/PROJ1/cpg.bin.tmp`
[INFO ] closed graph at /home/schwehr/joern/workspace/PROJ1/cpg.bin.tmp
Run `joern --for-input-path PROJ` to explore interactively 

Gemini printed a summary of what it found after a few minutes of work. I asked it to save that summary to a file:

Write the summary as security-scan-summary.md

Which got me:

### 1. Memory Management & Freeing Issues (Severity: High)
A field of a parameter or returned value is `free`'d but not properly reassigned on all execution paths, which could lead to Use-After-Free (UAF) or Double-Free vulnerabilities:
* `src/ell_set.cpp:83`, `85`, `87`, `89` in `pj_ellipsoid:int(PJ*)`
* `src/ell_set.cpp:203` in `ellps_size:int(PJ*)`
* `src/ell_set.cpp:252` in `ellps_shape:int(PJ*)`
* `src/ell_set.cpp:175` in `ellps_ellps:int(PJ*)`

### 2. Format String Vulnerabilities (Severity: High)
A non-constant format string is passed to `printf`, `sprintf`, or `vsprintf`, which can be exploited if the format string contains un-sanitized user input:
* `src/apps/cct.cpp:246` in `main:int(int,char**)`
* `src/apps/gie.cpp:294` in `main<duplicate>2:int(int,char**)`

### 3. Unchecked Read/Recv/Malloc (Severity: Medium)
The return value of memory allocation functions (like `malloc`) or file reading functions does not appear to be checked for `NULL` or error codes before being used:
* `src/apps/cct.cpp:331` in `main:int(int,char**)`
* `src/apps/projinfo_lib.cpp:353` in `buildObject`
* `src/initcache.cpp:50` in `pj_clone_paralist`
* `src/initcache.cpp:148`, `157`, `170` in `pj_insert_initcache`

### 4. Dangerous Functions (`strcpy` / `strncpy`) (Severity: Medium)
Use of unsafe string copy functions that are prone to buffer overflows. These should be replaced with safer alternatives like `strlcpy` or `snprintf`:
* **`src/apps/gie.cpp`**: Lines 637, 756, 757, 762, 774, 1589
* **`src/datum_set.cpp`**: Lines 84, 85
* **`src/filemanager.cpp`**: Lines 1674, 1913
* **`src/grids.cpp`**: Multiple instances (Lines 3946, 3956, 3997, 4032, 4050, 4067, 4083, etc.)
* **`src/init.cpp`**: Lines 159, 194, 200, 244, 271, 289, 555, 644
* **`src/initcache.cpp`**: Lines 55, 172
* **`src/pr_list.cpp`**: Line 86
* **`src/param.cpp`**: Line 23

### 5. String Length Truncation (Severity: Low-Medium)
Assignments involving `strlen()` calls where truncation might occur due to integer type mismatches or unsafe size conversions:
* `src/iso19111/factory.cpp:7959`, `7961`, `7963`
* `src/iso19111/io.cpp:1655` and `6385`
* `src/pr_list.cpp:18` and `79`
* `src/apps/optargpm.h:532`

### 6. File Operation Race Conditions (Severity: Low)
Two file operations on the same path can act on different files (Time-of-Check to Time-of-Use):
* `src/filemanager.cpp:994`, `1002` in `FileManager.open`
* `src/grids.cpp:2698`, `2709`, `2722` in `HorizontalShiftGridSet.open`
* `src/grids.cpp:3974`, `3987` in `proj_grid_info`
* `src/networkfilemanager.cpp`: Multiple lines (e.g. `2541`, `2717`, `2819`) in `proj_download_file` and `proj_is_download_needed`. 
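
For context on items 2 and 4 of the summary, here is a minimal sketch of a bounded copy built on `snprintf` with a constant `"%s"` format. `copy_bounded` is a hypothetical helper for illustration, not a PROJ function and not what the scanner suggests verbatim. I use `snprintf` here because `strlcpy` is not part of ISO C or C++, so its availability depends on the platform.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not from PROJ): copy `src` into `dst` without ever
// writing more than `dst_size` bytes. Returns true only if the whole string
// fit. Whenever dst_size > 0 and the pointers are valid, `dst` ends up
// NUL-terminated, which plain strncpy does not guarantee on truncation.
bool copy_bounded(char *dst, std::size_t dst_size, const char *src) {
    if (dst == nullptr || src == nullptr || dst_size == 0) {
        return false;
    }
    // The format string is the constant "%s"; `src` is only ever treated as
    // data, so any '%' characters inside it are copied literally.
    const int needed = std::snprintf(dst, dst_size, "%s", src);
    // snprintf returns the length the full output would have had,
    // not counting the terminating NUL.
    return needed >= 0 && static_cast<std::size_t>(needed) < dst_size;
}
```

With a 4-byte buffer, copying `"abcdef"` leaves `"abc"` in the buffer and returns `false`, so the caller can tell that truncation happened. Whether any of the flagged `strcpy` calls in PROJ actually need this depends on whether the destination size is already guaranteed at each call site, which the scan does not check.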

I did a quick inspection of the "Memory Management & Freeing Issues" listings and was not impressed. In each case, the pointer was set to nullptr right after being freed. I asked Gemini about them with the prompt below, and it basically said the same thing.

Look closer at the issues in src/ell_set.cpp 
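
A minimal sketch of the free-then-reset pattern described above. The `Settings` struct and its `definition` field are hypothetical stand-ins, not PROJ's actual `PJ` struct or the code in `src/ell_set.cpp`.

```cpp
#include <cstdlib>

// Hypothetical stand-in for a struct that owns a malloc'd field.
struct Settings {
    char *definition = nullptr;
};

// Free the field and immediately reset it. After this call there is no
// dangling pointer left in the struct.
void clear_definition(Settings &settings) {
    std::free(settings.definition);  // free(nullptr) is a defined no-op
    settings.definition = nullptr;
}

// Later code can test the pointer instead of touching freed memory.
bool has_definition(const Settings &settings) {
    return settings.definition != nullptr;
}
```

Calling `clear_definition` twice in a row is harmless, because the second call frees a null pointer. A scanner rule that only looks for "freed and not reassigned on all paths" can still flag code like this if it does not follow the reset on every path, which is one plausible reason for the false positives here.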

I then asked Gemini to go through them all:

Go through each of the issues listed in @security-scan-summary.md. For each issue, do an evaluation. For issue that appears to be valid or possibly valid, create a markdown report in the issues directory. The report should include a detailed description of the issue. Explain the risk level of the issue. And give a proposed patch for a fix.

Gemini wrote one issue file:

# Unchecked Memory Allocations in `src/initcache.cpp`
## Detailed Description
In `src/initcache.cpp`, there are multiple instances where memory is allocated using `malloc`, but the return value is not gracefully checked for a `NULL` pointer. Instead, the code relies on `assert()` to enforce that the pointer is valid.
For example, in `pj_clone_paralist` (line ~50):
```cpp
        paralist *newitem =
            (paralist *)malloc(sizeof(paralist) + strlen(list->param));
        assert(newitem);
        newitem->used = 0;
```
And similarly in `pj_insert_initcache` (lines ~148, 157, 170):
```cpp
        cache_key_new = (char **)malloc(sizeof(char *) * cache_alloc);
        assert(cache_key_new);
        // ...
        cache_paralist_new =
            (paralist **)malloc(sizeof(paralist *) * cache_alloc);
        assert(cache_paralist_new);
        // ...
        cache_key[cache_count] = (char *)malloc(strlen(filekey) + 1);
        assert(cache_key[cache_count]);
```
The issue with using `assert()` here is that `assert` statements are typically compiled out in production builds (when `NDEBUG` is defined). In production, if `malloc` fails due to memory exhaustion, the pointers will be `NULL`. The subsequent dereferences (like `newitem->used = 0;`) will cause an immediate segmentation fault, crashing the host application.
## Risk Level
**Medium**
Out-of-memory (OOM) conditions are relatively rare, but robust libraries should handle allocation failures gracefully without crashing the host application. If an attacker can trigger excessive caching, they could theoretically trigger a denial-of-service (DoS) condition by crashing the process.
## Proposed Patch
Replace `assert()` with a proper `if` condition that handles the `NULL` pointer by cleaning up and returning an error or `nullptr`.
```patch
--- a/src/initcache.cpp
+++ b/src/initcache.cpp
@@ -48,7 +48,15 @@
     for (; list != nullptr; list = list->next) {
         paralist *newitem =
             (paralist *)malloc(sizeof(paralist) + strlen(list->param));
-        assert(newitem);
+        if (newitem == nullptr) {
+            // Free already copied elements
+            while (list_copy != nullptr) {
+                paralist *tmp = list_copy->next;
+                free(list_copy);
+                list_copy = tmp;
+            }
+            return nullptr;
+        }
 
         newitem->used = 0;
         newitem->next = nullptr;
@@ -145,14 +153,16 @@
         cache_alloc = cache_alloc * 2 + 15;
 
         cache_key_new = (char **)malloc(sizeof(char *) * cache_alloc);
-        assert(cache_key_new);
+        if (cache_key_new == nullptr) return; // Silent failure on cache insert
+        
         if (cache_key && cache_count) {
             memcpy(cache_key_new, cache_key, sizeof(char *) * cache_count);
         }
         free(cache_key);
         cache_key = cache_key_new;
 
         cache_paralist_new =
             (paralist **)malloc(sizeof(paralist *) * cache_alloc);
-        assert(cache_paralist_new);
+        if (cache_paralist_new == nullptr) return;
+        
         if (cache_paralist && cache_count) {
             memcpy(cache_paralist_new, cache_paralist,
                    sizeof(paralist *) * cache_count);
@@ -167,7 +177,8 @@
     ** Duplicate the filekey and paralist, and insert in cache.
     */
     cache_key[cache_count] = (char *)malloc(strlen(filekey) + 1);
-    assert(cache_key[cache_count]);
+    if (cache_key[cache_count] == nullptr) return;
+    
     strcpy(cache_key[cache_count], filekey);
 
     cache_paralist[cache_count] = pj_clone_paralist(list);
```
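
To make the first hunk of that patch easier to evaluate, here is a small self-contained sketch of the same idea: clone a linked list and release the partial copy if an allocation fails. `Node`, `CountingAllocator`, `free_list`, and `clone_list` are all hypothetical names for illustration. They are not PROJ's `paralist` or `pj_clone_paralist`, and the allocator exists only to simulate an out-of-memory condition.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a singly linked list node.
struct Node {
    Node *next;
    int value;
};

// Hypothetical allocator that fails after `budget` successful allocations,
// so the failure path can be exercised without exhausting real memory.
struct CountingAllocator {
    int budget;    // allocations still allowed to succeed
    int live = 0;  // blocks handed out and not yet released

    explicit CountingAllocator(int allowed) : budget(allowed) {}

    void *allocate(std::size_t size) {
        if (budget <= 0) return nullptr;  // simulated malloc failure
        void *block = std::malloc(size);
        if (block != nullptr) {
            --budget;
            ++live;
        }
        return block;
    }

    void release(void *block) {
        if (block == nullptr) return;
        std::free(block);
        --live;
    }
};

// Release every node of a list that was built with `alloc`.
void free_list(Node *head, CountingAllocator &alloc) {
    while (head != nullptr) {
        Node *next = head->next;
        alloc.release(head);
        head = next;
    }
}

// Returns a deep copy of `list`, or nullptr if any allocation fails.
// On failure, every node copied so far is released first. Note that an
// empty input list also yields nullptr.
Node *clone_list(const Node *list, CountingAllocator &alloc) {
    Node *head = nullptr;
    Node *tail = nullptr;
    for (; list != nullptr; list = list->next) {
        Node *item = static_cast<Node *>(alloc.allocate(sizeof(Node)));
        if (item == nullptr) {
            free_list(head, alloc);
            return nullptr;
        }
        item->next = nullptr;
        item->value = list->value;
        if (tail == nullptr) {
            head = item;
        } else {
            tail->next = item;
        }
        tail = item;
    }
    return head;
}
```

Cloning a three-node list with a budget of two allocations returns `nullptr` and leaves `live` at zero, so nothing leaks. One thing worth checking in the proposed patch itself: in the second hunk, the early `return` comes after `cache_alloc` has already been increased, so the cache bookkeeping would appear to be left inconsistent unless that value is restored.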

I rate that report as not very interesting. Gemini's conclusion:



So all that and no real value added. Now I have to wonder what non-security things can be done with the cpg file. Can it help an LLM CLI make normal code changes? I'm thinking of things like adding new features, improving performance, refactoring, checking numerical stability, etc.