Breaking up shapefiles Re: [Xastir] ESRI shapefiles and dbfawk

Fri Dec 17 07:28:00 EST 2004

On Fri, Dec 17, 2004 at 04:01:51AM -0700, we recorded a bogon-computron collision of the <russo at bogodyn.org> flavor, containing:
> On Fri, Dec 17, 2004 at 12:54:19AM -0700, we recorded a bogon-computron collision of the <russo at bogodyn.org> flavor, containing:
> > On Fri, Dec 17, 2004 at 12:42:19AM -0700, we recorded a bogon-computron collision of the <jewen at shaw.ca> flavor, containing:
> > > 
> > > Excellent, I will chop the file up into smaller chunks. I'm going to make
> > > them the same size that the B250K Toporama files are chopped up in.
> > 
> > Might want to wait just a bit.
> 
> or more than a bit.

Or not, if you are feeling adventurous.

> > My toy code that built the index and queried it worked out just fine, now 
> > I'm trying to incorporate it into Xastir.  Might be done in a few hours.
> 
> Well, it works.  Once indexed, viewing even huge shapefiles is really fast
> at tight zoom levels.
> 
> Unfortunately, the RTree spatial indexing takes up a *vast* amount of memory. 
> As in, when I view the state of New Mexico with all 66 of the TIGER/Line 
> shapefiles (both linear and area versions) ranging from 300K to 13M each, the 
> NOAA county shapefiles ranging from 8M to 16M each, and a handful of 
> additional ones, the nodes of the RTree take up some 74MB of ram.  

I managed to pare this down to 40MB of index-related memory for the same set
of maps.  That's still an obscenity, but it's as far as I can squeeze it down
for the moment.  The algorithm can certainly be streamlined: I just found
a reference to a journal article that describes a way to compress the tree
node structure and get as much as a 60% improvement in memory usage.  Even
without that, there must be a way to make a more intelligent decision
about when to index and when to just blast through the whole shapefile the
way we did before.

Since I set up my code so it could be selectively enabled or disabled at 
configure time, I just committed it for anyone who wants to play with it.
By default it is disabled and this commit will have no effect on those who
don't choose to experiment.  To enable it use "--with-rtree" on the 
configure command line.

What this new code does is build a spatial index of the bounding boxes for
the shapes within each shapefile the first time that shapefile shows up
in your viewport.  After that, the rtree search algorithm is used to get
the indices of only those shapes whose bounding boxes intersect the
current viewport, and only those shapes are read out of the shapefile.
Without the option, Xastir reads the entire shapefile every time it tries
to render any part of it.
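
To make the idea concrete, here is a minimal Python sketch of the
index-then-query pattern described above.  It is not the Xastir code (which
is C): it assumes a simplified, two-level packed tree in which shapes are
sorted by x-center and grouped into leaves, each leaf carrying the bounding
rectangle of its members.  The names build_index, query and leaf_size are
made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed layout for this sketch: (min_x, min_y, max_x, max_y)
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes: List[BBox]) -> BBox:
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


@dataclass
class Leaf:
    mbr: BBox                          # bounding rectangle of the whole leaf
    entries: List[Tuple[int, BBox]]    # (shape index in the file, its bbox)


def build_index(shape_bboxes: List[BBox], leaf_size: int = 4) -> List[Leaf]:
    """Built once per shapefile: sort by x-center, pack into leaves."""
    order = sorted(range(len(shape_bboxes)),
                   key=lambda i: shape_bboxes[i][0] + shape_bboxes[i][2])
    leaves = []
    for start in range(0, len(order), leaf_size):
        chunk = [(i, shape_bboxes[i]) for i in order[start:start + leaf_size]]
        leaves.append(Leaf(union([b for _, b in chunk]), chunk))
    return leaves


def query(leaves: List[Leaf], viewport: BBox) -> List[int]:
    """Return indices of shapes whose bbox intersects the viewport."""
    hits = []
    for leaf in leaves:
        if not intersects(leaf.mbr, viewport):
            continue  # skip every shape in this leaf with a single test
        hits.extend(i for i, b in leaf.entries if intersects(b, viewport))
    return sorted(hits)


# Toy data: four shape bounding boxes, viewport over the lower-left corner.
bboxes = [(0, 0, 1, 1), (5, 5, 6, 6), (10, 10, 11, 11), (0.5, 0.5, 2, 2)]
index = build_index(bboxes, leaf_size=2)
visible = query(index, (0, 0, 1.5, 1.5))
```

Only the shapes whose indices come back from query would then need to be
read from the shapefile.  A real RTree has more levels and handles inserts,
but the pruning step works the same way.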

Please let me know if you can think of ways to improve it further.

BTW, one of the things I like about having these spatial indices set up is that
it opens up the possibility of querying shapefile maps with the mouse
(i.e. you use the spatial indices to determine what shapes of what files
are under the mouse pointer, and read in the DBF records).  That's a ways
off, though --- first order of business is to get that memory usage under
control.
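
As a rough illustration of that mouse-query idea, here is a Python sketch
under stated assumptions: each file's index is reduced to a plain list of
bounding boxes (a real implementation would search the tree instead), the
DBF records are stand-in dictionaries, and the file names and helper names
are hypothetical.  A bounding-box hit is only a candidate, so a real version
would still need to test the actual geometry.

```python
from typing import Dict, List, Tuple

# Assumed layout for this sketch: (min_x, min_y, max_x, max_y)
BBox = Tuple[float, float, float, float]


def contains(box: BBox, x: float, y: float) -> bool:
    """True if the point lies inside or on the edge of the rectangle."""
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def candidates_under_pointer(index: Dict[str, List[BBox]],
                             x: float, y: float) -> List[Tuple[str, int]]:
    """Return (filename, shape index) pairs whose bbox contains the point."""
    hits = []
    for filename, boxes in index.items():
        for shape_idx, box in enumerate(boxes):
            if contains(box, x, y):
                hits.append((filename, shape_idx))
    return hits


def lookup_records(hits: List[Tuple[str, int]],
                   dbf_rows: Dict[str, List[dict]]) -> List[dict]:
    """Fetch the attribute row for each candidate (stand-in for a DBF read)."""
    return [dbf_rows[filename][shape_idx] for filename, shape_idx in hits]


# Hypothetical data: two files, with the pointer at (0.5, 0.5).
index = {
    "roads.shp": [(0, 0, 1, 1), (5, 5, 6, 6)],
    "counties.shp": [(0, 0, 10, 10)],
}
dbf_rows = {
    "roads.shp": [{"NAME": "Road A"}, {"NAME": "Road B"}],
    "counties.shp": [{"NAME": "County X"}],
}
hits = candidates_under_pointer(index, 0.5, 0.5)
records = lookup_records(hits, dbf_rows)
```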

-- 
Tom Russo    KM5VY     SAR502  DM64ux         http://www.swcp.com/~russo/
Tijeras, NM  QRPL#1592 K2#398  SOC#236 AHTB#1 http://www.qsl.net/~km5vy/
 "When life gives you lemons, find someone with a paper cut."