Flutterby™! : Interpreting data


Interpreting data

2010-04-08 18:10:47.370321+00 by Dan Lyke 4 comments

Ogle Earth: In GIS as in economics, a little knowledge is a dangerous thing (Zimbabwe edition). Over at Marginal Revolution there was a blog post examining a Center for Global Development analysis of the effects of land reform in Zimbabwe. Stefan Geens noticed that the differences in the imagery were due to different image processing; the underlying satellite images were most likely the same.

A recurring theme at WhereCamp was that map data is produced for a use, and understanding that use is necessary to know what sort of information to present in the map and how to process the original source data into the final map. This is a very difficult process, and somewhere in the chain between someone deciding that the Google Earth imagery would be more useful if it were processed differently and someone assuming that the processing of those images was constant and that the quantity of green was indicative of vegetation, there was a breakdown.

[ related topics: Maps and Mapping Economics ]

comments in descending chronological order (reverse):

#Comment Re: made: 2010-04-11 16:02:46.88856+00 by: Dan Lyke

If you consider that at the equator 1 degree is roughly 60 nautical miles, and that 32 bit floating point math is good to about 5 places (with lots and lots of caveats), then lat/lon with floats gives you about, what, 3½ feet? That's not good enough for lot lines.
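
A quick way to sanity-check that arithmetic (my sketch in Python, not part of the original comment; numpy's spacing() gives the gap between adjacent representable floats):

    # How much position error does a 32-bit float introduce when
    # storing a longitude near the equator?
    import numpy as np

    lon = np.float32(-122.4567891)        # a longitude, single precision
    ulp_deg = np.spacing(np.abs(lon))     # ~8e-6 degrees at this magnitude

    feet_per_degree = 60 * 6076.12        # 60 nautical miles/degree, 6076.12 ft/nmi
    print(ulp_deg * feet_per_degree)      # roughly 2.8 feet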

Never mind your observation that if the boundaries are static and globally defined, all it takes is one good earthquake and my shed is in my neighbor's parcel.

And actually, computer-graphics-wise, a topic of discussion at Pixar back when I was there was ways to use more local coordinate systems. It didn't really go anywhere because in the end it all had to go back to global space for rendering, and memory was getting cheap enough fast enough that moving to doubles for those situations was better (that, and a couple of other reasons). It was one of my "wow, that guy's brilliant at math, but he thinks symbolically and in a different conceptual space than I do and I already have that answer" moments.
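
For what it's worth, the local-coordinate trick looks something like this (a minimal sketch of the general idea, not Pixar's actual code): keep one double-precision origin per object or chunk, store per-point offsets in cheap single precision, and reconstruct global coordinates only when you need them.

    import numpy as np

    # One high-precision origin per object/chunk...
    origin = np.array([4050000.0, 1850000.0], dtype=np.float64)
    # ...and cheap single-precision offsets relative to it.
    points_local = np.array([[1.25, 3.50], [7.75, 2.00]], dtype=np.float32)

    # Go back to global space only at render (or export) time:
    points_global = origin + points_local.astype(np.float64)
    print(points_global)   # full precision preserved near the origin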

Two related topics:

The first is that I can get the city budget in its entirety. In some years that's even in text-based PDF. Getting an abstraction from that budget so I can compare it to an abstraction of other budgets is something I'd like, but because Petaluma owns its fire department and Novato pays into the Novato Fire Protection District in some less obvious ways, the mechanisms for coming up with those abstractions aren't easy.

The second is that Apple has made third-party-generated apps verboten on the iPhone, and I've been spending the last few weeks playing with C# to C++ conversion, a similar lock-in. The vast majority of coding on both Windows and the Mac is generic. I've been pondering a language built for translation to Objective-C and C#: one that has objects, lets you call methods on objects, and understands a few basic dynamic types, but does away with the goofier aspects of syntax lock-in.

The thing that's got me thinking like this is converting gobs of C# to C++, because I'm going to do this translation layer at runtime with objects. What's important in that conversion isn't the data representation in either space, it's the process of converting one abstraction to another. Which, dragging back to maps, suggests that where we need more awareness isn't of the data but of the processes of abstracting the data for human consumption.
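
To make that concrete, here's a hand-wavy sketch of the shape of the thing (entirely my invention, not Dan's actual translation layer): one abstract representation of a method call, with per-target emitters, so the interesting logic lives in the conversion process rather than in either representation.

    class AbstractMethodCall:
        def __init__(self, receiver, method, args):
            self.receiver, self.method, self.args = receiver, method, args

        def to_objc(self):
            # Objective-C message-send syntax: [receiver selector:arg]
            pairs = " ".join(f"{m}:{a}" for m, a in
                             zip(self.method.split(":"), self.args))
            return f"[{self.receiver} {pairs}]"

        def to_csharp(self):
            # C# method-call syntax: receiver.Method(arg)
            return f"{self.receiver}.{self.method.rstrip(':')}({', '.join(self.args)})"

    call = AbstractMethodCall("view", "setFrame:", ["frame"])
    print(call.to_objc())     # [view setFrame:frame]
    print(call.to_csharp())   # view.setFrame(frame)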

#Comment Re: made: 2010-04-10 18:34:39.736898+00 by: ebradway

One of the recurring themes I hear from CompSci people is a desire to make a 1:1 map - be it automated feature extraction from hi-res satellite imagery or massive sensor networks. Where the "Isn't this all in Lat/Long?" comes from is good ol' graphics systems. When you display something, say a spaceship, on screen, you start at some x,y position. You use the same x,y position for things like collision detection.

There is a strong tendency to think Lat/Long works like x,y in such a system. And if we lived on a perfect, smooth sphere, Lat/Long would work very much like x,y (or r,theta). Unfortunately, the surface of the earth is very complex and dynamic. So much so that for precise surveys, you always have to measure from a local control point or survey monument.
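
As a rough illustration of the difference (my sketch, not from the original comment): a degree of longitude shrinks with the cosine of latitude, so treating lat/long as a uniform x,y grid overstates east-west distances everywhere off the equator.

    import math

    def naive_km(lat1, lon1, lat2, lon2):
        # WRONG away from the equator: treats degrees as a flat grid
        return math.hypot(lat2 - lat1, lon2 - lon1) * 111.0   # ~111 km/degree

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance on a spherical earth (itself an approximation!)
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2)**2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2)**2
        return 2 * r * math.asin(math.sqrt(a))

    # One degree of longitude at San Francisco's latitude (~37.8 N):
    print(naive_km(37.8, -122.0, 37.8, -121.0))      # ~111 km -- wrong
    print(haversine_km(37.8, -122.0, 37.8, -121.0))  # ~88 km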

Lat/Long is nice for describing things on a global scale - but those descriptions are gross generalizations. For instance, the San Francisco Bay may be omitted in maps used for things like continental drift. So the Lat/Long of Angel Island is somewhat superfluous.

Most parcel data is described in State Plane. State Plane systems start from some benchmark and measure northings and eastings, usually in feet. Because State Plane assumes a square grid, states that are long south to north usually have two or more State Plane systems. California has six State Plane zones, divided not along lines of latitude but along county boundaries (fun, eh?).
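
If you want to see this in practice, here's a sketch assuming the pyproj package; EPSG:2226 is NAD83 / California zone 2 in US survey feet, the zone covering Sonoma County (the zone choice here is illustrative):

    from pyproj import Transformer

    # WGS84 lat/long to California State Plane zone 2, US survey feet
    to_state_plane = Transformer.from_crs("EPSG:4326", "EPSG:2226",
                                          always_xy=True)

    lon, lat = -122.6367, 38.2324     # roughly downtown Petaluma
    easting_ft, northing_ft = to_state_plane.transform(lon, lat)
    print(easting_ft, northing_ft)    # eastings/northings in feet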

But parcel data is actually measured to lower standards than what you'd use to build a bridge. For bridge work, you see the engineers and surveyors out not with GPSes, but with lasers and tripods.

Of course, it doesn't help that the surface of the earth in California tends to slip and slide regularly! There were a couple of scientists from GNS Science in New Zealand at Where 2.0 who were talking about dynamic datums, because their entire country moves by centimeters per year.
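
The back-of-the-envelope version of why that matters (my numbers are illustrative, not GNS's):

    # A plate moving ~5 cm/year drifts a "fixed" coordinate by meters
    # over the working lifetime of a static datum.
    velocity_m_per_yr = 0.05        # ~5 cm/yr, the order of NZ's plate motion
    epoch_age_years = 20            # surveys referenced to a 20-year-old epoch
    print(velocity_m_per_yr * epoch_age_years)   # 1.0 m of accumulated offset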

#Comment Re: made: 2010-04-09 17:54:17.944302+00 by: Dan Lyke

Yeah, I think you can't be responsible for the ways in which people may misinterpret whatever data you publish (no matter how you say something, there'll always be some whackjob who'll try to misinterpret it to suit their own ends), but you can make the limitations on what you publish as clear as possible.

I'm reminded of a recent conversation with four smart geek-type people, one of whom was a GIS guy, one of whom was me. One of the others asked, "Isn't all that parcel boundary information stored as latitude and longitude?" The GIS guy knew all the reasons why it wasn't necessarily, I knew all the reasons why it wasn't a good idea, but without that cross-discipline communication the idea that there's one common coordinate space everyone could use, i.e. WGS84, sounds really, really great.

In fact, unrelated to maps, I had a conversation about color spaces with another really, really smart guy about two weeks ago, and I'm afraid he's going to get it very, very wrong because he's bought into the "we'll just have one color space with a very clear definition and store everything in that" simplification. Despite the fact that a number of smart people who understand color very well are involved, I think it's going to end badly. Or at least sub-optimally.

In other words, the problem happens any time you have abstractions, and the best you can do is to make sure you publish the assumptions and abstractions prominently along with the data sets.

#Comment Re: made: 2010-04-09 17:21:10.072786+00 by: ebradway

This is a perfect example of the core problem of "open data" as well as demonstrating the need for more dynamic metadata.

To approach this from a slightly different angle: Is it possible to produce a representation of a geographic phenomenon that is universally appropriate? This is the fallacy of the 1:1 scale map. All representations of real phenomena are approximations appropriate for only a limited range of applications. If the limits to application are not easily understood from the metadata, the data will be used inappropriately.

From an ethical standpoint, should data be opened? Should we not allow people to learn from the data what they can? As a GIS professional and academic, I see it as my job to help people understand these limits and how to get real meaning from data.