How to draw a line, through the center

05/13/2020

One of the most fascinating aspects of being a scientist, in my opinion, is all the unexpected things one has to learn. Not the science, nor the instruments, nor the software. But rather, the bits from highly technical fields that one might not even otherwise know existed, let alone have plunged into, if not for this very odd career we call science.

In particular, I'm thinking today of a pet project of mine from grad school that is now, thankfully, just a few minor revisions away from publication. That project lured me into the weird and wonderful world of computational geometry, with a smattering of computer vision for good measure. There are a couple reasons that I find computational geometry so beguiling.

First, I'm downright terrible at spatial reasoning! Or rather, I'm sufficiently (if not especially) naturally talented at just about everything else that I feel wholly incompetent when forced to reason spatially. Yet, as a geologist, especially when working in the field, that is a very common need. Computational geometry therefore allows me to leverage something I have a knack for--namely, programming skills and numerical analysis--to tackle something that I find unpleasantly challenging--namely, spatial reasoning.

Second, computational geometry often seems profoundly unintuitive to the uninitiated, precisely because it requires one to think abstractly and numerically about concepts that are fundamentally spatial and geometric. What "ought" to be almost tactile instead is purely mathematical. Of course, this is where my non-spatial skills come in quite handy, but all the same, the path from A to B in computational geometry often seems to me a splendid puzzle.

So, without further ado, let's touch on this pet project, shall we?

In short, I created a new algorithm. It just so happens that by some useful metrics, it's actually the state of the art in its very specialized field. So, what does it do, you might ask? Well, recalling what I said about the sometimes extraordinary counter-intuitiveness of computational geometry, let me whimsically put it this way: It does something any 5-year-old could do, and takes a very roundabout way of getting there!

Now, to answer your (or rather, my) question more directly: Imagine the outline of a river. It's basically a very long, squiggly, complicated polygon, when you get right down to it. Better still, let's make that a river network: the Mississippi, the Ohio, the Arkansas, the Tennessee, etc. Now we're talking about an extremely long, squiggly, complicated polygon. Next, imagine that you wanted to draw a line down the center of each river, lengthwise. In other words, you want to shrink this polygonal footprint to a bunch of lines, or more accurately, a network (or even more technically, a "graph"). You wouldn't imagine it would be that hard, right? Simply take out a crayon and you're off to the races, wending your way around each bend in each river as though you were tracing through a maze on the back of a kids' placemat menu. Like a 5-year-old.
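To make that framing concrete, here's a toy sketch (my own illustration, not anything from the actual paper or the numgeo code; all coordinates are made up) of the input and output types, using the shapely and networkx Python libraries:

```python
# Toy illustration: the input is a polygon (a river footprint), and the
# desired output is a graph of centerline segments. Coordinates are made up.
from shapely.geometry import Polygon
import networkx as nx

# A crude "river" footprint: a long, thin, gently bending polygon.
river = Polygon([
    (0, 0), (10, 0.5), (20, 3), (30, 3.5),   # down one bank
    (30, 5.5), (20, 5), (10, 2.5), (0, 2),   # back along the other bank
])

# The goal: a graph whose nodes are coordinates and whose edges are
# centerline segments running lengthwise down the footprint.
centerline = nx.Graph()
centerline.add_edge((0, 1.0), (10, 1.5))
centerline.add_edge((10, 1.5), (20, 4.0))
centerline.add_edge((20, 4.0), (30, 4.5))
```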

The problem is, although you could certainly draw a "centerline" using this crayon approach, it wouldn't be reproducible. The next person might draw a line that deviates from yours in many places. And in science, of course, reproducibility is the gold standard. Worse yet, arguably, these deviations could significantly affect your results. As one simple example, say that you were testing a hypothesis about the factors that influence a river's width. In that case, you'd have to measure the width correctly at every point along the river to minimize the noise in your measurements. However, it may not be (and generally is not) obvious what the correct width orientation is at any point in the river. To be rigorous, one could define the width as the span in the cross-river direction, but then one would have to define what the along-river direction is in the first place! Knowing the location of the centerline resolves that circularity and allows you to measure the river's width more accurately.
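In case that circularity is clearer in code than in prose, here's a minimal sketch (mine, not the paper's; the function name, parameters, and transect length are my own inventions) of how a centerline, once you have one, turns width into a well-defined measurement:

```python
# Sketch: given a centerline, the width at a station is the length of the
# cross-river transect, i.e., the perpendicular to the centerline clipped
# to the river polygon. Assumes shapely; names here are hypothetical.
import math
from shapely.geometry import LineString, Polygon

def width_at(polygon: Polygon, centerline: LineString, s: float,
             eps: float = 0.5, reach: float = 1e3) -> float:
    """Estimate the river width at distance s along the centerline."""
    p = centerline.interpolate(s)
    # Local along-river direction, from two nearby centerline points.
    a = centerline.interpolate(max(s - eps, 0.0))
    b = centerline.interpolate(min(s + eps, centerline.length))
    dx, dy = b.x - a.x, b.y - a.y
    norm = math.hypot(dx, dy)
    # Cross-river (perpendicular) direction.
    px, py = -dy / norm, dx / norm
    transect = LineString([(p.x - reach * px, p.y - reach * py),
                           (p.x + reach * px, p.y + reach * py)])
    # Clip the transect to the footprint; its length is the local width.
    # (On a very sinuous river the transect may also clip other reaches;
    # a careful implementation would keep only the piece containing p.)
    return transect.intersection(polygon).length
```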

So, how does this algorithm go about deriving this centerline? Well, it's a multi-step process that's best shown in pictures.

Above, in (a), one starts with a polygon. One then samples the polygon's boundary at some regular interval (b) and generates what's called a Voronoi diagram (c), which is basically a bunch of polygonal cells. Each of those cells represents the "neighborhood" around one of the boundary sample points, enclosing all coordinates closer to that sample point than to any other sample point. In (d), the Voronoi cell sides between consecutive boundary samples are removed. Then, in (e), I reconstruct a crude approximation of the original polygon from (a), using only the boundary samples as vertices, and finally, in (f), I isolate those remaining Voronoi segments whose intersection points (or "hubs") lie within the input polygon from (a). There are carefully considered reasons for every unintuitive step here, but for this post (at least for now), I'll just focus on explaining the steps.
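Since this part is decades-old and standard, I can safely sketch it in code. Here's a minimal version (my own condensed rendition, assuming scipy and shapely, not the numgeo implementation itself), which approximates steps (b) through (f) by keeping only the Voronoi segments that fall inside the polygon:

```python
# Sketch of the classic Voronoi-skeleton step: sample the boundary, build a
# Voronoi diagram, and keep the finite Voronoi edges interior to the polygon.
import numpy as np
from scipy.spatial import Voronoi
from shapely.geometry import LineString, Point, Polygon

def voronoi_skeleton(polygon: Polygon, spacing: float):
    """Return Voronoi segments whose endpoints lie inside the polygon."""
    # (b) Sample the polygon boundary at a regular interval.
    boundary = polygon.exterior
    distances = np.arange(0.0, boundary.length, spacing)
    samples = np.array([boundary.interpolate(d).coords[0] for d in distances])
    # (c) Build the Voronoi diagram of those boundary samples.
    vor = Voronoi(samples)
    # (d)-(f) Keep only finite Voronoi edges whose endpoints (including the
    # "hubs") fall within the input polygon.
    segments = []
    for v0, v1 in vor.ridge_vertices:
        if v0 == -1 or v1 == -1:  # skip edges that run off to infinity
            continue
        p0, p1 = vor.vertices[v0], vor.vertices[v1]
        if polygon.contains(Point(*p0)) and polygon.contains(Point(*p1)):
            segments.append(LineString([p0, p1]))
    return segments
```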

Now, everything I just described is standard. It's been around for decades. However, with this image, we're getting more into what's really novel about this algorithm. In short, I route all possible paths between all hub pairs and then find the longest of all those length-optimized paths. That path I call the trunk, and it forms something like the "backbone" of my centerline. To that trunk, I add branches (hub-to-"stub", i.e., end-node, paths) and bridges (hub-to-hub paths) in a particular sequence. At the end, I've simply bundled all Voronoi segments into a smaller number of paths and labeled each of those paths a trunk, branch, or bridge. So, how does that help me? Well, here comes the trick...
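To give a flavor of the trunk-finding step, here's a hedged sketch (my own simplification, not the released code: I route between all nodes rather than just the hubs, and I ignore the branch/bridge sequencing entirely), using networkx:

```python
# Sketch: build a graph from the Voronoi segments, compute shortest paths
# between all node pairs, and call the longest of those optimal paths the
# "trunk". The real algorithm routes between hubs and adds branches/bridges.
import networkx as nx

def find_trunk(segments):
    """Return the longest length-optimized path through the segment graph."""
    G = nx.Graph()
    for seg in segments:
        u, v = tuple(seg.coords[0]), tuple(seg.coords[-1])
        G.add_edge(u, v, weight=seg.length)
    # For each source, find the farthest node by shortest-path distance.
    best, ends = -1.0, None
    for source, dists in nx.all_pairs_dijkstra_path_length(G, weight="weight"):
        target, dist = max(dists.items(), key=lambda kv: kv[1])
        if dist > best:
            best, ends = dist, (source, target)
    # The trunk: the shortest path between the farthest-apart pair of nodes.
    return nx.dijkstra_path(G, *ends, weight="weight")
```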

As you can see here, the smaller the spacing between those boundary samples way back at the beginning, the busier our network gets, and the more unwanted bits there are. So, all of the path-finding--that trunk, branch, and bridge identification--that I just did was to serve one purpose: to allow me to calculate what I call the "normalized length", which is simply the length of a branch divided by the width at its stem. (Incidentally, this logic is guided by Horton stream ordering, which is a geomorphic concept from the 1940s!) Once I've calculated that normalized length, I can prioritize which bits are important and which aren't, and voila! I have an algorithm that is state of the art in that it (1) preserves continuity while at the same time being (2) smooth and (3) well-centered.
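And here's how that pruning criterion might look in code. (Again, this is my schematic reading, not the numgeo source; the threshold value is made up, and in reality the sequencing of trunks, branches, and bridges matters.)

```python
# Sketch of normalized-length pruning: a branch survives only if its length,
# divided by the local channel width at its stem, exceeds some threshold.
def prune_branches(branches, width_at_stem, threshold=1.0):
    """Keep branches whose normalized length exceeds the threshold.

    branches      -- iterable of (branch_length, stem_point) pairs
    width_at_stem -- function mapping a stem point to the local width
    threshold     -- cutoff on normalized length (value here is illustrative)
    """
    kept = []
    for length, stem in branches:
        normalized_length = length / width_at_stem(stem)
        if normalized_length > threshold:
            kept.append((length, stem))
    return kept
```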


Now, I know that was a whirlwind, and I could have done a much better job, but I'm already late as it is. Therefore, I'll leave you with a comparison image between my algorithm and a couple of other state-of-the-art algorithms (below). If you want to learn more, check out this link, which will bring you to my algorithm's repository: https://github.com/eischaefer/numgeo/releases

Ethan I. Schaefer
All rights reserved 2020