Kurt Schwehr's more recent blog

Tuesday, July 21, 2026

BitVector 0.0.4 - pylint focused

I'm learning lots. I'd love feedback and suggestions from the Python community.

AI-generated summary:

The v0.0.4 release of bitvector-modern primarily focuses on extensive code cleanup, linter adherence, and testing improvements. A major portion of the updates involves setting up pylint configurations (including the pylint-pytest plugin) and refactoring code across the repository to address linter rules—such as updating function/class docstrings, switching to f-strings and with statements, simplifying chained comparisons, and cleaning up break, return, and raise statements. Additionally, the release enhances overall quality by expanding unit test coverage for rotation logic, bit shifts, and BitVector methods such as reset(), __int__, __getitem__, and __setitem__, while resolving a bug related to slice assignment length calculations with negative start indices.
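For anyone curious about the negative-start slice bug: Python's built-in `slice.indices()` already normalizes negative and out-of-range bounds, so a length calculation can lean on it. This is a minimal sketch under my own assumptions, not the actual fix in bitvector-modern. `slice_length` and the deliberately wrong `naive_length` are illustrative names only.

```python
def slice_length(key: slice, size: int) -> int:
    """Return how many positions `key` selects in a sequence of length `size`.

    slice.indices() converts negative start/stop values to offsets from the
    end and clamps anything out of range, e.g. slice(-4, None) on 16 bits
    becomes (12, 16, 1).
    """
    start, stop, step = key.indices(size)
    return len(range(start, stop, step))


def naive_length(key: slice, size: int) -> int:
    """Deliberately buggy: treats a negative start as a literal offset."""
    start = 0 if key.start is None else key.start
    stop = size if key.stop is None else key.stop
    return stop - start


# Assigning to bits[-4:] of a 16-bit vector must supply exactly 4 values.
print(slice_length(slice(-4, None), 16))  # 4
print(naive_length(slice(-4, None), 16))  # 20
```

Using `len(range(...))` also handles negative steps, so `slice(None, None, -1)` correctly covers all 16 positions.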

Sunday, July 19, 2026

bitvector-modern release 0.0.3

I just did the first proper release of bitvector-modern with version 0.0.3. I hadn't set up two-factor authentication (2FA) for my personal PyPI identity, but now I have, so I can actually send updates out. I have been out of town with only mobile phone access to the internet, so I have been doing a bunch of updates to the package via Google Jules. That process went really well in general. I had some trouble getting it to make more ambitious changes like adding mmap and numpy based implementations, but as you can see from the release notes, there were plenty of successes.

Sadly, I did the 0.0.2 release just before this one incorrectly. That's where the changes are listed, but it triggered a 0.0.1 release on PyPI. I could yank it, but I don't think I need to, as nothing is broken. So the changes for 0.0.3 are in the 0.0.1 package on PyPI.

I wouldn't recommend using bitvector-modern for anything other than experimental work right now, as I intend to rework the API in a few intrusive ways. Top of the list is simplifying __init__ and moving the extra functionality into class methods of the form from_*. I have already removed the filename and filepointer args from __init__, but I have yet to add back file reading outside of __init__.
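To make the "small __init__ plus from_* class methods" idea concrete, here is a rough sketch. `Bits`, `from_int`, and `from_file` are hypothetical names I made up for illustration; the release notes do mention a `from_string`, but the signatures below are my assumptions and not bitvector-modern's actual API.

```python
from __future__ import annotations

from typing import BinaryIO


class Bits:
    """Hypothetical bit container: a trivial __init__ plus from_* constructors.

    Illustration only, not bitvector-modern's API.
    """

    __slots__ = ("_bits",)

    def __init__(self, size: int = 0) -> None:
        # __init__ does exactly one thing: create `size` zero bits.
        if size < 0:
            raise ValueError(f"size must be non-negative, got {size}")
        self._bits = [0] * size

    @classmethod
    def from_string(cls, text: str) -> Bits:
        """Build from a string of '0' and '1' characters."""
        if any(ch not in "01" for ch in text):
            raise ValueError(f"not a bit string: {text!r}")
        obj = cls(len(text))
        obj._bits = [int(ch) for ch in text]
        return obj

    @classmethod
    def from_int(cls, value: int, size: int | None = None) -> Bits:
        """Build from a non-negative int, most significant bit first."""
        if value < 0:
            raise ValueError(f"value must be non-negative, got {value}")
        width = value.bit_length() if size is None else size
        if value.bit_length() > width:
            raise ValueError(f"{value} does not fit in {width} bits")
        return cls.from_string(format(value, f"0{width}b")) if width else cls(0)

    @classmethod
    def from_file(cls, fileobj: BinaryIO) -> Bits:
        """Read the rest of an already-open binary file, 8 bits per byte."""
        return cls.from_string("".join(format(b, "08b") for b in fileobj.read()))

    def __len__(self) -> int:
        return len(self._bits)

    def __str__(self) -> str:
        return "".join(str(bit) for bit in self._bits)
```

Keeping file handling in a separate class method means the caller owns opening and closing the file (e.g., in a `with` block), and `__init__` stays trivial to type check and test.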

At this point, there are quite a few performance improvements over the old library. For starters, I switched the array.array storage from 16-bit to 64-bit blocks. This means that BitVectors of 48 bits or fewer take more storage, but that should not be a problem unless folks are working with large numbers of very small bitvectors that have to remain in memory. With the in-progress Protocol class work, that issue should be mitigated, as the package should let users pick the implementation that works best for their use case.
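To put numbers on the 16-bit versus 64-bit trade-off, here is a small sketch that counts only the packed element bytes of an `array.array` (it ignores the Python object's own header overhead). `payload_bytes` is a hypothetical helper, and I'm assuming the usual type codes: `'H'` is an unsigned 16-bit element and `'Q'` an unsigned 64-bit element on common platforms.

```python
from array import array


def payload_bytes(nbits: int, typecode: str) -> int:
    """Bytes of element storage needed to hold `nbits` bits in array blocks.

    Only the packed elements are counted; the array object's header is ignored.
    """
    bits_per_block = 8 * array(typecode).itemsize
    blocks = -(-nbits // bits_per_block)  # ceiling division
    storage = array(typecode, [0]) * blocks
    return len(storage) * storage.itemsize


for nbits in (8, 40, 48, 49, 64, 65, 128, 1000):
    print(
        f"{nbits:5d} bits: 16-bit blocks {payload_bytes(nbits, 'H'):4d} B, "
        f"64-bit blocks {payload_bytes(nbits, 'Q'):4d} B"
    )
```

The 64-bit layout never costs more than 6 extra bytes per vector (up to 63 padding bits instead of up to 15). It costs extra whenever the length modulo 64 falls between 1 and 48, which is why it only matters when there are huge numbers of tiny vectors.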

I hope to start working on my old noaadata package soon and port it to bitvector-modern. I'm excited about the increased performance and improved usability.

The release:

The detailed release notes are here:

From the 0.0.2 release notes:

docs: add Code of Conduct by @schwehr in #33
style(docs): format CODE_OF_CONDUCT.md with mdformat by @schwehr in #35
test(properties): add hypothesis property-based testing and fix iadd padding bug by @schwehr in #44
ci(github): add zizmor static analysis of GitHub Actions by @schwehr in #46
refactor: convert asserts to ValueErrors and add bandit check by @schwehr in #52
ci: remove osv-scanner workflow by @schwehr in #54
⚡ perf: optimize runs to avoid unnecessary list allocation by @google-labs-jules[bot] in #55
🧹 Refactor complex type ignoring map into list comprehension by @google-labs-jules[bot] in #57
🧹 [Code Health] Remove commented-out code in BitVector.py by @google-labs-jules[bot] in #58
⚡ Optimize divide_into_two to use native slicing by @google-labs-jules[bot] in #56
⚡ Optimize integer conversion using bitwise shifts by @google-labs-jules[bot] in #59
⚡ [Performance] Optimize string concatenation in _readblock by @google-labs-jules[bot] in #60
🔒 Use cryptographically secure pseudo-random number generator by @google-labs-jules[bot] in #61
docs: add mkdocs and initial documentation outline by @google-labs-jules[bot] in #65
chore: add .editorconfig file by @google-labs-jules[bot] in #68
Add slots to BitVector class by @google-labs-jules[bot] in #70
Create BitVectorProtocol by @google-labs-jules[bot] in #67
Migrate BitVector to use 64-bit array (Q type) by @google-labs-jules[bot] in #71
feat: add py.typed marker for PEP 561 compliance by @google-labs-jules[bot] in #75
fix: resolve hexlist type collision in BitVector constructor by @google-labs-jules[bot] in #78
Refactor BitVector array type "Q" to a constant ARRAY_TYPE by @google-labs-jules[bot] in #81
Remove list option from self.vector in BitVector by @google-labs-jules[bot] in #82
Change divide_into_two to return a tuple instead of a list by @google-labs-jules[bot] in #86
Refactor: Rename local variable intVal to int_val by @google-labs-jules[bot] in #85
Rename _getsize to __len__ by @google-labs-jules[bot] in #87
build(deps): bump astral-sh/setup-uv from e58605a9b6da7c637471fab8847a5e5a6b8df081 to d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 by @dependabot[bot] in #83
Refactor BitVector to use __getitem__ instead of _getbit by @google-labs-jules[bot] in #88
Optimize BitVector.__eq__ using zip by @google-labs-jules[bot] in #95
Refactor gf_MI to return f-string instead of string tuple by @google-labs-jules[bot] in #97
style: Rename reverseList to reverse_list by @google-labs-jules[bot] in #98
Simplify is_power_of_2_sparse conditional by @google-labs-jules[bot] in #99
Refactor tuple return syntax in divide_into_two by @google-labs-jules[bot] in #100
Optimize permute and unpermute methods by @google-labs-jules[bot] in #101
Refactor get_bitvector_in_ascii to use comprehension by @google-labs-jules[bot] in #102
refactor: rename int_val to __int__ by @google-labs-jules[bot] in #103
Remove _setbit and only use __setitem__ by @google-labs-jules[bot] in #104
Refactor BitVector size attribute to be private by @google-labs-jules[bot] in #105
refactor: simplify get_bitvector_in_hex with comprehension by @google-labs-jules[bot] in #106
Optimize shift_left and shift_right implementations by @google-labs-jules[bot] in #107
Remove Any from reset signature by @google-labs-jules[bot] in #108
Optimize reset method performance using direct array operations by @google-labs-jules[bot] in #109
Remove deprecated length method from BitVector by @google-labs-jules[bot] in #110
Optimize BitVector.reverse() by @google-labs-jules[bot] in #111
docs: Add explanatory comment to ARRAY_TYPE by @google-labs-jules[bot] in #112
Remove file I/O operations from BitVector by @google-labs-jules[bot] in #113
chore: remove array like subscripting comment by @google-labs-jules[bot] in #116
Refactor: Extract textstring instantiation to from_string by @google-labs-jules[bot] in #115
chore(github): setup CODEOWNERS with repository owners by @schwehr in #119
Add __reversed__ benchmark and optimized implementation by @schwehr in #121
ci(github): configure PyPI secret token for release publishing by @schwehr in #122

Friday, July 10, 2026

The future of Generic Sensor Format (GSF) for sonar data and processing

This is a response to this post: Lindsay Gee: GSF meeting at CCOM next week. I won’t be able to attend the meeting next week at CCOM even remotely, so I’ll try to capture my current take on GSF. I’ve not been shy in saying that the Generic Sensor Format (GSF) is not my vision of a generic sharing / processing format for multibeam sonar. However, it is critically important to the community. The key text from Lindsay:

The toolkit supports the GEBCO Technical Strategy by improving the sharing, archiving, and reuse of processed swath bathymetry data. It is designed to enhance interoperability across software applications, reduce barriers to working with GSF data, and support the generation of bathymetry and backscatter products without always needing to return to raw sonar data.

The first public release of the open-source toolkit, documentation, and code repository is available, and this session will provide an overview of the project, its current status, roadmap for future development, followed by an open discussion.

A key component of the project is a profile framework and JSON-based schema that captures differences between GSF library implementations. As this framework expands, community participation and feedback will be essential to ensure it meets the needs of the broader hydrographic and ocean mapping community.

The success of the toolkit depends on broad community engagement, and we welcome feedback from software developers, data producers, archives, and end users.

https://github.com/oceanmapping/mbtoolkit

Getting my thoughts organized quickly is tough, so apologies if this is a bit disorganized.

I see at least 3 key packages that need to exist and be well maintained: the original GSF library from SAIC/Leidos, written in C, which should be converted to C++; the pure Python package mbtoolkit; and a complete from-scratch rewrite of a GSF library in Rust. Those packages should live in a public version control repo on GitHub, GitLab, etc. and have permissive licenses allowing the largest number of people to use and contribute to them. In the age of LLM coding agents, this work will go faster than it ever could have in the past, but it’s not easy to do it correctly and make the results maintainable. Still, I have high hopes considering the positive messages from folks around the community recently.

0. Common to all three - best practices

Each package should follow the best available modern software engineering practices. The surface area is pretty small, so achieving this isn’t that hard. What does it entail? (For Rust, a lot of this is already baked in.)

Code in git version control, publicly available, with the ability to take pull requests. I will assume GitHub for now.
Standard package management for the repo (e.g., CMake for C/C++, uv for Python, Cargo for Rust)
Automatic code formatting so that code is always consistent no matter who contributes a change
Static analyzers automatically run (Coverity, cppcheck, clang static analyzer, ty/mypy/pyright/pyrefly, codespell, etc.)
100% unit test coverage
A shared set of GSF files used for integration testing
Coverage-guided fuzzing to detect all sorts of bugs and provide artifacts that can be turned into new tests
Performance benchmarks
GitHub Actions and local automation to run all the checks (e.g., pre-commit)
The best security scanning available (e.g., zizmor for GitHub Actions, bandit for Python, CodeQL, etc.)
Great documentation built with each of the packages
A strong defensive AGENTS.md

Part of this should also be moving the GSF spec from a PDF to a GitHub project written in Markdown.

1. The future of the C/C++ libgsf

It has been frustrating that SAIC/Leidos developed GSF behind closed doors and only occasionally released versioned snapshots to the community. It was hard to see what was going on with development, and the community wasn’t able to contribute fixes. I tried to jump-start this process with https://github.com/schwehr/generic-sensor-format/ back in 2015 (check out all the issues that I documented). I wasn’t able to get any engagement, and there wasn’t any uptake as far as I can tell. This work should start from the most recent release.

This is about fortifying the valuable historical usage of the GSF library. This is the reference implementation and extreme care should be taken not to lose the value that comes with that. As a community, we should make sure that existing usage can continue without requiring major reworking of code by the current users.

What should this entail? First off, I should be clear about what I mean by switching to C++. I don’t mean converting the core to an object-oriented design. I do mean that converting the code from C to C++ lets the code base use the C++ standard library for numerical values, more const-ness, constexpr, static_assert, safer casting, and internal use of unique_ptr for safer memory management. The core library should still export an `extern "C"` interface with the same calls. From there, it is reasonable to later add an optional wrapper on top that presents a more natural C++ interface: a class that knows how to manage memory and reports errors in a more natural C++ way with std::optional, exceptions, and/or StatusOr<T>.

After that, it’s reasonable to provide a parallel, from-scratch, all-C++ rewrite of GSF that takes advantage of modern (and some not so modern 🙂) capabilities like mmap, select, epoll, etc.

Maybe there should be language wrappers for one or both of these, but I’m not sure what is best. SWIG is pretty frustrating, and I don’t know what else is around that has a solid future.

2. Pure Python

The community should get behind https://github.com/oceanmapping/mbtoolkit or something similar and drive it to cover GSF completely in pure Python. Working with Python is great for learning and developing protocols, even if it’s not the fastest. I suggest aiming for Python >= 3.14 and pushing for the strongest possible package.
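As a taste of what a pure Python reader can look like, here is a sketch that iterates over length-prefixed binary records using only the standard library. The header layout (a big-endian unsigned 32-bit payload size followed by a 32-bit record ID) is my assumption, loosely modeled on how GSF frames its records; it ignores checksums, padding, and anything else the real spec defines. `iter_records` and `HEADER` are hypothetical names, not part of mbtoolkit.

```python
import struct
from collections.abc import Iterator
from typing import BinaryIO

# Assumed header: big-endian uint32 payload size, then uint32 record ID.
# Verify against the GSF specification before relying on this layout.
HEADER = struct.Struct(">II")


def iter_records(stream: BinaryIO) -> Iterator[tuple[int, bytes]]:
    """Yield (record_id, payload) pairs from `stream` until end of file."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream between records
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        size, record_id = HEADER.unpack(header)
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError(
                f"record {record_id}: expected {size} bytes, got {len(payload)}"
            )
        yield record_id, payload
```

Because it takes any binary file object, unit tests can feed it `io.BytesIO` data built with `HEADER.pack(...)`, while real use passes a file opened with `open(path, "rb")`.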

An example of what can be done with a package started from scratch is https://github.com/schwehr/bitvector-modern. With the help of a coding agent and a lot of careful reviews, I was able to get that package to a pretty good place. Please adapt the best ideas from it and from any other solid package out there. Old-school Python packaging and setup were creaky and painful.

3. A Rust implementation

The Rust Cargo ecosystem is amazing. Unlike Python and C++, Rust has one primary packaging tool: Cargo. The trick is making a Rust-based IO system that is both performant and reasonable to maintain. Async Rust has a bad reputation with many people. I have little experience with it, but there is great potential. I suggest having LLM agents cook up a range of designs and comparing the results.

Once that basic IO layer is done, it will unlock huge potential. First, many low-level codecs have recently gotten pure Rust replacement libraries that can be used in place of old C libraries, gaining both safety and performance (e.g., zlib, png, jxl). Second, Rust is a great ecosystem for building CLIs, TUIs (text/terminal user interfaces), and GUIs/3D interfaces, cf. ratatui and Bevy. It even works well to have Bevy use ratatui to build terminal-based 2D applications that can be extremely useful.

Conclusion

The community needs security, stability, performance, and collaboration with GSF. This post leaves out a lot, but I hope it gets people thinking, motivated, prototyping, and implementing. The time to start is now!

Thursday, July 9, 2026

Open source multibeam sonar processing software

I tried to do another code review for MB-System this morning. It's been many years since I dug deep into the system. Back then (2019/2020), I was trying to add testing to the core library to make it easier to maintain. I made progress, but didn't get traction from the community. The review today reminded me how much friction there is in the old code. The style works great for Dave, the project lead, so I don't want to change it, but here is an example. It's pure C.

int mb_sonarsettings(int verbose, void *mbio_ptr, void *store_ptr, int *kind, double *frequency,
                     double *sample_rate, double *tx_pulse_width, double *power_selection, double *gain_selection,
                     double *absorption, double *spreading, double *sound_velocity, double *beamwidth_tx,
                     double *beamwidth_rx, int *error) {
  if (verbose >= 2) {
    fprintf(stderr, "\ndbg2  MBIO function <%s> called\n", __func__);
    fprintf(stderr, "dbg2  Input arguments:\n");
    fprintf(stderr, "dbg2       verbose:    %d\n", verbose);
    fprintf(stderr, "dbg2       mb_ptr:     %p\n", (void *)mbio_ptr);
    fprintf(stderr, "dbg2       store_ptr:  %p\n", (void *)store_ptr);
  }

  struct mb_io_struct *mb_io_ptr = (struct mb_io_struct *)mbio_ptr;

  /* call the appropriate mbsys_ extraction routine (NULL for formats that do not
     record per-ping sonar settings; reson7k3 is the first to implement it) */
  int status = MB_SUCCESS;
  if (mb_io_ptr->mb_io_sonarsettings != NULL) {
    status = (*mb_io_ptr->mb_io_sonarsettings)(verbose, mbio_ptr, store_ptr, kind, frequency, sample_rate,
                                               tx_pulse_width, power_selection, gain_selection, absorption,
                                               spreading, sound_velocity, beamwidth_tx, beamwidth_rx, error);
  }
  else {
    status = MB_FAILURE;
    *error = MB_ERROR_BAD_SYSTEM;
  }

  if (verbose >= 2) {
    fprintf(stderr, "\ndbg2  MBIO function <%s> completed\n", __func__);
    fprintf(stderr, "dbg2  Return values:\n");
    fprintf(stderr, "dbg2       kind:       %d\n", *kind);
  }
  if (verbose >= 2 && *error == MB_ERROR_NO_ERROR) {
    fprintf(stderr, "dbg2       frequency:       %f\n", *frequency);
    fprintf(stderr, "dbg2       sample_rate:     %f\n", *sample_rate);
    fprintf(stderr, "dbg2       tx_pulse_width:  %f\n", *tx_pulse_width);
    fprintf(stderr, "dbg2       power_selection: %f\n", *power_selection);
    fprintf(stderr, "dbg2       gain_selection:  %f\n", *gain_selection);
    fprintf(stderr, "dbg2       absorption:      %f\n", *absorption);
    fprintf(stderr, "dbg2       spreading:       %f\n", *spreading);
    fprintf(stderr, "dbg2       sound_velocity:  %f\n", *sound_velocity);
    fprintf(stderr, "dbg2       beamwidth_tx:    %f\n", *beamwidth_tx);
    fprintf(stderr, "dbg2       beamwidth_rx:    %f\n", *beamwidth_rx);
  }
  if (verbose >= 2) {
    fprintf(stderr, "dbg2       error:      %d\n", *error);
    fprintf(stderr, "dbg2  Return status:\n");
    fprintf(stderr, "dbg2       status:     %d\n", status);
  }

  return (status);
}

That really boils down to this:

int mb_sonarsettings(int verbose, void *mbio_ptr, void *store_ptr, int *kind, double *frequency,
                     double *sample_rate, double *tx_pulse_width, double *power_selection, double *gain_selection,
                     double *absorption, double *spreading, double *sound_velocity, double *beamwidth_tx,
                     double *beamwidth_rx, int *error) {
  struct mb_io_struct *mb_io_ptr = (struct mb_io_struct *)mbio_ptr;

  if (mb_io_ptr->mb_io_sonarsettings == NULL) {
    *error = MB_ERROR_BAD_SYSTEM;
    return MB_FAILURE;
  }

  // Call the appropriate mbsys_ extraction routine. It's NULL for formats that do not
  // record per-ping sonar settings; reson7k3 is the first to implement it.
  return (*mb_io_ptr->mb_io_sonarsettings)(
      verbose, mbio_ptr, store_ptr, kind, frequency, sample_rate,
      tx_pulse_width, power_selection, gain_selection, absorption,
      spreading, sound_velocity, beamwidth_tx, beamwidth_rx, error);
}

Now imagine that spread across a huge number of drivers. It adds up really quickly. I've been wanting to redesign an open source multibeam sonar processing system for more than two decades. In the beginning, I was thinking C++ while trying to avoid its crazier side, and that looked better and better as new C++ standards came out and smoothed over a lot of the rough edges of old C++.

But now that there is Rust, Bevy (a 3D and 2D game engine), and Ratatui (a Rust TUI library), combined with help from LLM coding agents, I'd like to find the time to do it that way instead. I don't yet have an intuition for how best to build the core IO library. It will also need a solid core of geospatial code equivalent to substantial parts of PROJ, GEOS, and GDAL. I think we as a community have the potential to create something that scales for the next couple of decades and honors the accomplishments of MB-System and GMT over the last 33 years.
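Here is a rough idea of how the `verbose >= 2` boilerplate in mb_sonarsettings can be written once instead of in every driver function. It's in Python rather than C to keep it short, and `traced`, `sonar_settings`, and `BadSystemError` are hypothetical names for this sketch, not MB-System APIs.

```python
import functools
import logging
from collections.abc import Callable
from typing import Any, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])
log = logging.getLogger("mbio_sketch")


def traced(func: F) -> F:
    """Log a function's arguments and result at DEBUG level.

    Written once, this replaces the per-function `if (verbose >= 2)` blocks.
    """

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        log.debug("%s called: args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("%s returned: %r", func.__name__, result)
        return result

    return cast(F, wrapper)


class BadSystemError(Exception):
    """Hypothetical stand-in for MB_ERROR_BAD_SYSTEM."""


@traced
def sonar_settings(driver: Any, store: Any) -> dict[str, float]:
    """Hypothetical dispatcher: delegate to the driver's extractor, if any."""
    extractor = getattr(driver, "sonar_settings", None)
    if extractor is None:
        raise BadSystemError(f"{type(driver).__name__} has no sonar settings")
    return extractor(store)
```

Calling `logging.basicConfig(level=logging.DEBUG)` plays the role of `verbose >= 2`, and the exception replaces the status return plus `*error` out-parameter. Each driver then only contains its real logic.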

Sunday, July 5, 2026

Antigravity with an AGENTS.md file requesting suggestions

I added a section like this to AGENTS.md:

## 7. Version Control & Commit Messages

- **Code Review**:
  - Always do a code review before committing. In addition to finding and
    suggesting fixes to issues, try to create 1-3 suggestions for improvement to
    the code based on the current changes.

I have gone with the suggested changes for the last couple of runs, as almost all of them have been great so far.

● Bash(git status -s)

▸ Thought for 3s, 27 tokens

Everything is completed, verified, tested, and cleanly committed! Let's present ...

I have added macOS to the GitHub Actions CI test matrix, updated the project documentation, and committed the changes.

### Summary of Work

• Updated CI Workflow: Expanded the matrix strategy in test.yml to include macos-latest alongside ubuntu-latest (with fail-fast: false), ensuring matrix jobs run across Python 3.13 and 3.14 on both Linux and macOS.

• Updated Documentation: Added a note under Section 5 in AGENTS.md documenting that CI tests run automatically across Linux and macOS.

• Verified & Committed: All pre-commit checks passed (uv run pre-commit run --all-files), and created conventional commit 4e50f4c on branch main:

ci(workflow): add macOS to GitHub Actions test matrix

Expand test job strategy matrix in .github/workflows/test.yml to execute across both ubuntu-latest and macos-latest with fail-fast disabled, and update AGENTS.md cross-platform testing guidelines.

──────

### Code Review Suggestions for Improvement

In accordance with our mandatory code review policy in AGENTS.md, here are 3 suggestions for future improvement based on this CI addition:

1. Windows CI Support: To guarantee full cross-platform compatibility across all desktop environments, we could also evaluate adding windows-latest to the OS matrix in test.yml, specifically to verify line endings, file paths, and file object streaming behavior on Windows.

2. Benchmark Execution Scoping: Currently, --benchmark-only runs in CI when matrix.python-version == '3.14' across all operating systems. Shared macOS runners in GitHub Actions can experience high timing variance; if this causes flaky benchmark thresholds in CI, we could restrict benchmark execution strictly to Ubuntu (if: ... && matrix.os == 'ubuntu-latest').

3. OS-Specific Cache Monitoring: We should monitor uv cache restore and save durations across OS runners in GitHub Actions to ensure that astral-sh/setup-uv cache eviction and key scoping remain optimal on macOS virtual environments.