[Xastir] Rtree

Sun Mar 27 00:36:28 EST 2005

On Sat, Mar 26, 2005 at 10:12:31PM -0700, we recorded a bogon-computron collision of the <jewen at shaw.ca> flavor, containing:
> >I think your last comments there are the key.  Rtree appears to work
> >best for me when I'm using larger shapefiles or zooming in quite a
> >ways into a shapefile.  It helps in quickly finding the shapes in
> >the file that fit your view.
> >
> >If you have smaller shapefiles, Xastir's native code that checks
> >whether the bounding rectangle fits within your view will knock any
> >that don't fit out of the loading procedure.  Therefore rtree
> >doesn't get a chance to do its thing.
> 
> I split up my shapefiles just before Tom created the rtree option. I use a 
> province wide file at wide zoom levels that only has major highways. When I 
> get in closer, I use sections of the full data set. At the worst, I would 
> have 4 grids to parse.

You are describing what I think would be the optimum shapefile set as far
as speed goes.  The fewer shapes that need to be sifted through by the 
rendering code, the faster it is.

rtree will be most beneficial for shapefile sets that are not so
optimized, such as the full dataset you describe below.

> When I am zoomed into my local area (about 20 miles across), I can pan in 
> about 2 seconds. With rtree enabled, that cuts down to about 1 second, but 
> for the first couple of times to display can take 15 to 20 seconds.

That long delay should really only happen the first time any specific file
is loaded.

The very first time a shapefile is drawn in a window that's smaller than
the full file extent, every shape in the file is read and its extents saved
in the rtree index for that file.  Then any shape that's in the current
window is read in and drawn.  That one-time pass over the whole file is why
the first draw is slower; later draws of the same file reuse the saved index.
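
To make the "slow first draw, fast afterwards" behavior concrete, here is a
minimal Python sketch.  It is not Xastir's code: it uses a simple uniform
grid as a stand-in for a real R-tree, and the names GridIndex,
shapes_in_view and read_all_extents are hypothetical, chosen only for
illustration.  The point is just that the extents of every shape are read
once per file, cached, and then reused for every later query.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area or touch at an edge."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


class GridIndex:
    """Uniform grid of shape extents; a simple stand-in for an R-tree."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.boxes: List[Box] = []
        self.cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)

    def _cells_for(self, box: Box) -> Iterator[Tuple[int, int]]:
        # Every grid cell that the rectangle touches.
        x0, y0 = int(box[0] // self.cell_size), int(box[1] // self.cell_size)
        x1, y1 = int(box[2] // self.cell_size), int(box[3] // self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def insert(self, box: Box) -> None:
        shape_id = len(self.boxes)
        self.boxes.append(box)
        for cell in self._cells_for(box):
            self.cells[cell].append(shape_id)

    def query(self, view: Box) -> List[int]:
        # Only shapes registered in cells under the view are examined.
        candidates: Set[int] = set()
        for cell in self._cells_for(view):
            candidates.update(self.cells.get(cell, ()))
        return sorted(i for i in candidates if overlaps(self.boxes[i], view))


# One cached index per shapefile name (hypothetical cache, for illustration).
_index_cache: Dict[str, GridIndex] = {}


def shapes_in_view(
    filename: str,
    read_all_extents: Callable[[str], Iterable[Box]],
    view: Box,
    cell_size: float = 1.0,
) -> List[int]:
    index = _index_cache.get(filename)
    if index is None:
        # First draw of this file: read every shape's extent (the slow part).
        index = GridIndex(cell_size)
        for box in read_all_extents(filename):
            index.insert(box)
        _index_cache[filename] = index
    # Every draw: ask the index which shapes fall in the window.
    return index.query(view)
```

Calling shapes_in_view twice for the same file name invokes
read_all_extents only on the first call; the second call goes straight to
the cached index.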

Without rtree, every time a shapefile map is read, all shapes in that file
are scanned linearly from the top, and those that lie in the current window
are drawn.  If the file contains many more shapes than are visible in the
current window, this leads to poor performance, because the code has to do
disk I/O and a bunch of calculating just to determine whether to draw a shape 
or not.
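
As a rough illustration of that linear scan (again a sketch, not the actual
Xastir code; Box, overlaps and linear_scan are made-up names), note that
the number of shapes examined equals the number of shapes in the file, no
matter how few of them end up being drawn:

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area or touch at an edge."""
    return (a.min_x <= b.max_x and a.max_x >= b.min_x and
            a.min_y <= b.max_y and a.max_y >= b.min_y)


def linear_scan(shape_boxes: List[Box], view: Box) -> Tuple[List[int], int]:
    """Check every shape in file order; return (visible ids, shapes examined)."""
    examined = 0
    visible: List[int] = []
    for shape_id, box in enumerate(shape_boxes):
        examined += 1  # each shape costs a read plus a comparison
        if overlaps(box, view):
            visible.append(shape_id)
    return visible, examined
```

With 100 shapes laid out in a row and a window covering three of them, the
scan still examines all 100.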

Since you've picked apart your master shapefiles into only the shapes you 
want displayed at specific zoom levels, the inefficiency of the non-rtree
code is minimized and you wouldn't see much benefit from rtree.
But you had to pick apart your data manually to do it.  

> Should I change to the full dataset (40 MB shapefile and 492 MB dbf) to get 
> the full speed advantages of rtree?

I think you have a very efficient set of shapefiles that appears to suit your
needs, and you did a bunch of work to get them set up.  I'd say that you're
better off without the rtree stuff as long as that's all you need.

> (Speaking of which, what file(s) does rtree index?)

An rtree index is built for each shapefile (the .shp file, the one
with the actual geometry data in it) the first time it is displayed
when the current window is smaller than the bounding box for the whole
file.
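
One way to picture that decision, as a hedged Python sketch: the function
and the three return labels are hypothetical, and I'm assuming here that
"window smaller than the bounding box" means the window does not cover the
file's whole extent.  The "skip_file" branch corresponds to the whole-file
bounding rectangle check mentioned earlier in the thread.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area or touch at an edge."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1] and
            outer[2] >= inner[2] and outer[3] >= inner[3])


def draw_strategy(view: Box, file_bbox: Box) -> str:
    # Assumption: labels and ordering are illustrative, not Xastir's logic.
    if not overlaps(view, file_bbox):
        return "skip_file"        # file is entirely off screen
    if contains(view, file_bbox):
        return "draw_all_shapes"  # whole file is visible; an index saves nothing
    return "use_rtree_index"      # window shows only part of the file
```

For a file covering (0, 0) to (10, 10), a window of (2, 2) to (4, 4) gives
"use_rtree_index", while a window that swallows the whole file gives
"draw_all_shapes".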

> What is considered a large shapefile? How many MB is large?

I'd consider a 40MB shapefile of an entire province quite large.  My
guess is that if you tried to use that whole file as is, the
difference between rtree and non-rtree would be more noticeable.  But
since you've already gone through the effort of making smaller
shapefiles that suit your needs, I wouldn't go back to the huge one.

I see noticeable speedups when using unaltered TIGER/Line county-sized
files of 8-13MB, as long as my view is smaller than a county (which it
usually is).

-- 
Tom Russo    KM5VY     SAR502  DM64ux         http://www.swcp.com/~russo/
Tijeras, NM  QRPL#1592 K2#398  SOC#236 AHTB#1 
 "The only thing you can do easily is be wrong, and that's hardly
  worth the effort." -- Norton Juster