STRUCTURING MAPS
- Vector Data Structures
  - Spaghetti Format
  - Polygon Format
  - Point Dictionary Format
  - Arc/Node Format
- TIN Data Structures
- Raster Data Structures
  - Storage
  - Advantages
  - Disadvantages
  - Alternatives
    - Run-length Encoding
    - Quad-Tree Format
- Topology
  - Definition
  - Advantages
  - Disadvantages
Structuring Maps
All geographic data can be reduced to three basic topological
concepts: the point, the line, and the area. A label is also needed to
identify what each entity is. For example, a section of railroad could
be represented by a line consisting of a starting and an ending x, y coordinate
and the label "railroad".
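A minimal sketch of the three basic entities, each carrying the label that identifies it, might look like the Python below. The class names `Point`, `Line` and `Area` and the closed-ring convention for areas are illustrative assumptions, not a file format defined in these notes.

```python
from dataclasses import dataclass

# Hypothetical classes: one per basic entity, each with an identifying label.
Coordinate = tuple[float, float]


@dataclass
class Point:
    label: str
    location: Coordinate


@dataclass
class Line:
    label: str
    coords: list[Coordinate]  # starting point, any intermediate points, ending point


@dataclass
class Area:
    label: str
    boundary: list[Coordinate]  # assumed closed ring: first coordinate repeated at the end

    def is_closed(self) -> bool:
        return len(self.boundary) >= 4 and self.boundary[0] == self.boundary[-1]


# The railroad example: a starting and ending x, y coordinate plus a label.
railroad = Line("railroad", [(2.0, 3.0), (15.0, 9.0)])
print(railroad)  # Line(label='railroad', coords=[(2.0, 3.0), (15.0, 9.0)])
```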
Vector Data Structures
In computer cartography and GIS, vector data structures
were the first to be used because they were created simply from digitizing
tablets, they better represented complex features such as land parcels,
and they could be easily drawn on the output devices in use at the time
(plotters). Here the coordinate space is assumed to be continuous,
allowing all positions, lengths, and dimensions to be precisely defined.
However, the exact representation of a coordinate is limited by the computer's
word size and by the step size of vector display devices.
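To make the word-size limit concrete, the sketch below (an illustration with a made-up easting, not data from the notes) stores a coordinate in a 32-bit float using Python's standard `struct` module and reads it back. Near 4.5 million, a 32-bit float can only step in increments of 0.5, so sub-metre detail is lost; a 64-bit float keeps it.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a coordinate through a 4-byte (32-bit) IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


easting = 4512345.67           # hypothetical easting in metres
stored = to_float32(easting)
print(stored)                  # 4512345.5, the nearest value a 32-bit float can hold
print(abs(easting - stored))   # roughly 0.17 m lost in a 32-bit word
```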
Spaghetti Format
The first vector files were just a series of lines,
with "arbitrary starting and ending points, that duplicated the way a cartographer
would draw a map" (Clarke 1997, 78). A file might have a few long lines,
some short lines, or a mix of both, and was usually written in
binary or ASCII code. The problem with this format is that it had
no structure at all: spatial relationships are not recorded and there are no
attributes, so beyond simply drawing the lines there was almost nothing one could do with it.
A programmer named Nick Chrisman compared this file to following a strand
of spaghetti, hence the name cartographic spaghetti.
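A minimal sketch, assuming a made-up ASCII layout with one digitized line per record written as `x y` pairs, shows why spaghetti data are hard to use: the lines can be read and drawn, but nothing in the file says that two lines meet. Here the second line was meant to start where the first ends, yet it was digitized a few hundredths of a unit away, and a simple endpoint comparison cannot tell.

```python
# Hypothetical spaghetti file: whitespace-separated x y pairs, one line per record.
# There are no ids, no attributes and no stored topology.
SPAGHETTI = """\
0.0 0.0 10.0 10.0
10.02 9.98 20.0 0.0
"""


def read_spaghetti(text):
    """Parse each non-empty record into a list of (x, y) tuples."""
    lines = []
    for record in text.splitlines():
        if not record.strip():
            continue
        values = [float(v) for v in record.split()]
        lines.append(list(zip(values[0::2], values[1::2])))
    return lines


def share_endpoint(a, b):
    """True only if two lines have an exactly identical end coordinate."""
    return bool({a[0], a[-1]} & {b[0], b[-1]})


first, second = read_spaghetti(SPAGHETTI)
print(share_endpoint(first, second))  # False: the intended junction is not recorded
```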
Polygon Format
In this format the user digitizes the world as a series
of polygons, or shapes, in order to describe the characteristics of areas
in such a way that they can be displayed and manipulated as thematic
map data. This format works well when dealing with a map made up only
of counties. There are, however, some definite problems with it.
One is that of duplicate borders: because each polygon is stored separately,
a border shared by two polygons has to be digitized twice, once for each polygon.
The two copies rarely coincide exactly, and this double digitizing creates
what are called sliver polygons. Despite these problems, this format is
still used today in the shapefile.
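The sliver problem can be sketched with two hypothetical square parcels (coordinates in metres) stored independently in polygon format. Their shared border at x = 100 is digitized twice, and the second copy lands at x = 101, leaving a thin strip that belongs to neither polygon. This is a simplified case showing only a gap; overlaps are also possible. Areas are computed with the standard shoelace formula.

```python
def shoelace_area(ring):
    """Area of a simple polygon; ring is a list of (x, y) vertices, not closed."""
    total = 0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Each parcel stores its own copy of the shared border.
parcel_a = [(0, 0), (100, 0), (100, 100), (0, 100)]
parcel_b = [(101, 0), (200, 0), (200, 100), (101, 100)]  # border redigitized at x = 101

study_area = 200 * 100
sliver = study_area - (shoelace_area(parcel_a) + shoelace_area(parcel_b))
print(sliver)  # 100.0 square metres covered by neither parcel
```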
Point Dictionary Format
Here all coordinate pairs are numbered sequentially and referenced
by a dictionary that records which points are associated with each polygon.
The dictionary enables boundaries between adjacent polygons to be uniquely
identified, but problems with neighborhood functions still exist, because
which polygons border which is not stored and must be worked out by searching.
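A minimal sketch of the point dictionary idea, using two hypothetical adjacent polygons: each coordinate pair is stored once under a number, and each polygon lists point numbers. A shared boundary shows up as shared point numbers, but because no adjacency is stored, finding a polygon's neighbors means scanning every other polygon. Treating two or more shared points as "adjacent" is a simplifying assumption.

```python
# Hypothetical data: numbered coordinate pairs and the points used by each polygon.
POINTS = {1: (0, 0), 2: (100, 0), 3: (200, 0), 4: (200, 100), 5: (100, 100), 6: (0, 100)}
POLYGONS = {"A": [1, 2, 5, 6], "B": [2, 3, 4, 5]}


def shared_points(polygons, p, q):
    """Point numbers two polygons have in common (their shared boundary)."""
    return sorted(set(polygons[p]) & set(polygons[q]))


def neighbors(polygons, p):
    """No adjacency is stored, so every other polygon has to be checked."""
    return sorted(
        q for q in polygons
        if q != p and len(shared_points(polygons, p, q)) >= 2  # simplifying assumption
    )


border = [POINTS[i] for i in shared_points(POLYGONS, "A", "B")]
print(border)                    # [(100, 0), (100, 100)]
print(neighbors(POLYGONS, "A"))  # ['B']
```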

Arc/Node Format
Beginning as far back as the 1960s, "a hierarchy for spatial
data was worked out that became the arc/node model" (Clarke 1997, 79).
This method uses arcs to store a series of points (coordinate
pairs) that start and end with nodes and may include vertices (shape
points) in between. A node is the intersection of two or more arcs. This means that
areas are composed of connected lines and lines are composed of connected
points, with each component in turn having its own separate file. In practice,
three data tables are used to record the topology: the Polygon Topology
table (lists the arcs that make up each polygon), the Node Topology table
(defines the arcs that belong to each node), and the Arc Topology table
(defines the relationships of the nodes and polygons for each arc). The arc-node
data structure is used in ArcInfo and ArcView because spatial relationships
are explicitly stored, which makes it very efficient for spatial analysis.
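A minimal sketch of a single arc, assuming hypothetical node names and coordinates: the arc records only its from-node and to-node plus any vertices (shape points) in between, and the node coordinates are kept separately. The vertices matter; without the shape point, the arc below would be measured as a 10-unit straight line instead of 14 units.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Arc:
    """An arc starts and ends at nodes and may have vertices (shape points) between."""
    arc_id: int
    from_node: str
    to_node: str
    vertices: list = field(default_factory=list)


def arc_coordinates(arc, nodes):
    """Full coordinate sequence: from-node, then vertices, then to-node."""
    return [nodes[arc.from_node], *arc.vertices, nodes[arc.to_node]]


def arc_length(arc, nodes):
    """Sum of straight segments between consecutive points of the arc."""
    points = arc_coordinates(arc, nodes)
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Hypothetical node file and one arc with a single shape point.
NODES = {"I": (0.0, 0.0), "II": (6.0, 8.0)}
arc1 = Arc(1, "I", "II", vertices=[(0.0, 8.0)])
print(arc_coordinates(arc1, NODES))  # [(0.0, 0.0), (0.0, 8.0), (6.0, 8.0)]
print(arc_length(arc1, NODES))       # 14.0
```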
Arc Topology Table
Below are four polygons (A - D) with an Arc Topology
Table describing their location with respect to one another. The arcs that
make up the polygons are numbered 1 - 8, and the nodes that connect
the arcs are numbered I - V. Arc 1 begins at node I and ends at node
II. Polygon A is to the right of arc 1, and 0, a null value for the
area with no polygons (called the "universe" in ArcInfo terminology),
is to the left. Arc 6 starts at node IV and terminates at node V.
Note that polygon D is the right polygon; this location is based on the
perspective of movement from node IV to node V.
| Arc | Begin Node | End Node | Right Poly | Left Poly |
|-----|------------|----------|------------|-----------|
| 1 | I | II | A | 0 |
| 2 | II | III | B | 0 |
| 3 | III | IV | C | 0 |
| 4 | V | I | B | 0 |
| 5 | III | V | B | C |
| 6 | IV | V | D | C |
| 7 | I | IV | D | A |
| 8 | II | IV | A | B |
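Because the table above records the begin node, end node, right polygon and left polygon of every arc, the other two tables (Polygon Topology and Node Topology) can be derived from it. The sketch below does this for the eight arcs listed; the function names are illustrative, and "0" stands for the universe.

```python
from collections import defaultdict

# The table above: (arc, begin node, end node, right poly, left poly).
ARC_TABLE = [
    (1, "I", "II", "A", "0"),
    (2, "II", "III", "B", "0"),
    (3, "III", "IV", "C", "0"),
    (4, "V", "I", "B", "0"),
    (5, "III", "V", "B", "C"),
    (6, "IV", "V", "D", "C"),
    (7, "I", "IV", "D", "A"),
    (8, "II", "IV", "A", "B"),
]


def polygon_topology(arc_table):
    """Polygon Topology table: the arcs that make up each polygon."""
    table = defaultdict(list)
    for arc, _begin, _end, right, left in arc_table:
        table[right].append(arc)
        table[left].append(arc)
    return dict(table)


def node_topology(arc_table):
    """Node Topology table: the arcs that meet at each node."""
    table = defaultdict(list)
    for arc, begin, end, _right, _left in arc_table:
        table[begin].append(arc)
        table[end].append(arc)
    return dict(table)


polygons = polygon_topology(ARC_TABLE)
nodes = node_topology(ARC_TABLE)
print(polygons["B"])  # [2, 4, 5, 8]
print(nodes["IV"])    # [3, 6, 7, 8]
```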
TIN Data Structures
A triangulated irregular network (TIN) is a list of points
and their coordinates, stored in a file that also contains information
about the topology. The network itself is a series of triangles,
constructed by connecting the points into a group of triangles
called a Delaunay triangulation. There are two separate ways to store
a TIN: one uses a file that holds information about the arcs (edges) connecting
the points of the triangles, and the other holds all of the information
about the topology of the triangle network. Contour lines are easy to draw
from a TIN, and so are 3-D views of an area.
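One reason contours are easy to draw from a TIN is that each triangle is a flat facet, so a contour at height `h` crosses a triangle edge at a point found by linear interpolation between the edge's two elevations. The sketch below, with a made-up triangle, returns the crossing points for one triangle; joining such segments across neighboring triangles gives the contour line. Vertices lying exactly at `h` are skipped for simplicity, an assumption a real implementation would need to handle.

```python
def contour_crossings(triangle, h):
    """Points where the contour at elevation h crosses the edges of one TIN facet.

    triangle: three (x, y, z) vertices. Edges with an endpoint exactly at h are skipped.
    """
    crossings = []
    for i in range(3):
        x1, y1, z1 = triangle[i]
        x2, y2, z2 = triangle[(i + 1) % 3]
        if (z1 - h) * (z2 - h) < 0:   # h lies strictly between the two elevations
            t = (h - z1) / (z2 - z1)  # fraction of the way along the edge
            crossings.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return crossings


facet = [(0, 0, 10), (10, 0, 20), (0, 10, 30)]  # hypothetical elevations
print(contour_crossings(facet, 15))             # [(5.0, 0.0), (0.0, 2.5)]
```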

Raster Data Structures
Storage
Raster data structures consist of an array of grid cells
(pixels), each referenced by a row and column number and containing a number
representing the type or value of the attribute being mapped. The two-dimensional
surface to which the geographic data are linked is not continuous, and
this can have an important effect on estimates of lengths and areas
when the grid cells are large relative to the features being represented.
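The effect of cell size on area estimates can be sketched by rasterizing a circle of radius 10 (true area about 314.16) and counting cells. The rule that a cell belongs to the circle when its center falls inside it is an assumption; other rasterization rules give different counts. With 10-unit cells the estimate is 400, while 1-unit cells give 316.

```python
import math


def raster_area(radius, cell_size):
    """Estimate a circle's area by counting grid cells whose center falls inside it.

    Assumes 2 * radius is a whole multiple of cell_size.
    """
    n = int(round(2 * radius / cell_size))  # cells per row/column over the bounding box
    inside = 0
    for row in range(n):
        y = -radius + (row + 0.5) * cell_size
        for col in range(n):
            x = -radius + (col + 0.5) * cell_size
            if x * x + y * y <= radius * radius:
                inside += 1
    return inside * cell_size * cell_size


print(math.pi * 10 ** 2)   # true area, about 314.16
print(raster_area(10, 10)) # 400: only four large cells, all counted as inside
print(raster_area(10, 1))  # 316: much closer with small cells
```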
Advantages
One of the biggest advantages of the raster data structure
is that within the computer's own memory the data form their own map: a
cell's row and column position in the array is its location. This
means that neighborhood analysis, comparing a grid cell with its neighbors,
can be performed simply by looking at the values in the adjacent rows and
columns. Overlay is also much easier: two grids covering the same area with
the same cell size are matched up and laid one on top of the other, cell by cell.
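A minimal sketch with hypothetical 3 x 3 grids stored as nested Python lists shows both points: a cell's neighbors are found just by stepping one row or column away, and overlay is a cell-by-cell combination of two grids of the same shape. The 4-neighbor (rook's case) rule and the masking rule used in the overlay are illustrative choices.

```python
def neighbors4(grid, row, col):
    """Values of the up, down, left and right neighbors of one cell."""
    values = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
            values.append(grid[r][c])
    return values


def overlay(grid_a, grid_b, combine):
    """Combine two grids of identical shape cell by cell."""
    return [[combine(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]


land_use = [[1, 1, 2], [1, 2, 2], [3, 3, 2]]  # hypothetical class codes
suitable = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]  # hypothetical 1 = suitable slope

print(neighbors4(land_use, 1, 1))  # [1, 3, 1, 2]
# Keep the land-use code only where the slope is suitable, else 0.
print(overlay(land_use, suitable, lambda use, ok: use if ok else 0))
# [[1, 0, 2], [1, 2, 0], [0, 3, 2]]
```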
Disadvantages
One of the first disadvantages of the raster data structure
is that it is not very good at showing lines or points, because
a point becomes a whole cell and a line becomes a whole chain of cells in the grid.
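To see how a line becomes a chain of cells, the sketch below rasterizes a segment with Bresenham's line algorithm, a standard method chosen here for illustration (the notes do not say which rasterization rule a GIS uses). A line from cell (0, 0) to cell (4, 2), given as (column, row), becomes five separate cells, and the exact line itself is no longer stored.

```python
def bresenham(x0, y0, x1, y1):
    """Grid cells (column, row) visited by a line between two cell positions."""
    cells = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:   # step in x
            err += dy
            x0 += sx
        if e2 <= dx:   # step in y
            err += dx
            y0 += sy
    return cells


print(bresenham(0, 0, 4, 2))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
```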
Another major disadvantage is the problem of mixed
pixels: a single pixel cannot show two attributes at once, so edge pixels,
those that are not exclusively in one class or another, must somehow be
assigned to a single class.
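One common way to assign a mixed pixel is the dominant-class (majority) rule: give the pixel to whichever class covers the largest share of it. The sketch below assumes the class fractions for each edge pixel are already known, which is itself an assumption; it shows how the minority class simply disappears from the grid.

```python
def assign_dominant_class(fractions):
    """Assign a mixed pixel to the class with the largest share (ties go to the first listed)."""
    return max(fractions, key=fractions.get)


# Hypothetical edge pixels along a shoreline, with the share of each class inside them.
edge_pixels = [
    {"water": 0.45, "forest": 0.55},
    {"water": 0.80, "forest": 0.20},
    {"water": 0.30, "forest": 0.70},
]
classes = [assign_dominant_class(p) for p in edge_pixels]
print(classes)  # ['forest', 'water', 'forest']
```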
Alternative Storage
At least two alternatives have been developed to deal with the
problem that a grid often contains duplicated or altogether missing data:
- Run-length encoding is a format in which "along each row, only changes
between attributes and the numbers of pixels of that same attribute are
stored" (Clarke 1997, 83). To save space, if an entire row is
one class, it is stored as just that class and the number of pixels.
- Quad tree (like the US Public Land Survey) is a method that works by dividing
a grid into four quadrants. Where a quadrant is not all one class, it is further
divided into four quadrants, and so on until each quadrant is uniform.
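Both alternatives can be sketched in a few lines of Python, assuming a grid held as a list of rows. Run-length encoding stores each row as (value, count) pairs using the standard `itertools.groupby`, so a row of a single class collapses to one pair. The quad tree part is a simple region quadtree, assuming a square grid whose side is a power of two: a uniform quadrant is stored as one value, otherwise it is split into four (NW, NE, SW, SE) and each part is examined in turn.

```python
from itertools import groupby


def rle_encode(row):
    """Store each run of identical values as (value, number of pixels)."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(runs):
    """Expand (value, count) pairs back into a full row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


def build_quadtree(grid, r=0, c=0, size=None):
    """Region quadtree: a uniform block becomes one value, otherwise four children."""
    if size is None:
        size = len(grid)  # assumes a square grid with a power-of-two side
    first = grid[r][c]
    if all(grid[i][j] == first for i in range(r, r + size) for j in range(c, c + size)):
        return first
    half = size // 2
    return [
        build_quadtree(grid, r, c, half),                # NW
        build_quadtree(grid, r, c + half, half),         # NE
        build_quadtree(grid, r + half, c, half),         # SW
        build_quadtree(grid, r + half, c + half, half),  # SE
    ]


def count_leaves(node):
    """Number of stored values (leaves) in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


GRID = [  # hypothetical class codes
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 1, 3],
    [1, 1, 3, 3],
]
print(rle_encode(GRID[2]))  # [(1, 3), (3, 1)]
print(rle_encode([5] * 8))  # [(5, 8)]: a uniform row is one pair
tree = build_quadtree(GRID)
print(tree)                 # [1, 2, 1, [1, 3, 3, 3]]
print(count_leaves(tree))   # 7 stored values instead of 16 cells
```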

Topology
Definition
According to the textbook, topology is the "property that
describes adjacency and connectivity of features. A topological data structure
encodes topology with the geocoded features". Another definition is "The
numerical description of the relationships between geographic features,
as encoded by adjacency, linkage, inclusion, or proximity. Thus a point
can be inside a region, a line can connect to others, and a region can
have neighbors. The numbers describing topology can be stored as attributes
in the GIS and used for validation and other stages of description and
analysis".
Advantages
Topology has, for the first time, allowed the user to do some
automatic error detection. The major problems associated with sliver polygons and
unsnapped nodes are avoided because each line is stored only once, and
the only duplication is at the endpoints. The user can now "clean"
the map. The greatest advantage of having a "topologically consistent map is
that when two or more maps must be overlain, much of the initial preparation
work has been done" (Clarke 1997, 85). A final major advantage is
that many retrieval and analysis operations can be carried out
without using the x, y data at all.
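Two of these advantages can be sketched with a hypothetical arc table in which each arc records its from-node, to-node, left polygon and right polygon. Polygons A and B share arc 2, and arc 4 ends at a node that touches nothing else. A node used by only one arc is a "dangling" node, which gives a simple automatic check for an unsnapped line; and finding a polygon's neighbors needs only the left and right polygon columns, never the x, y coordinates.

```python
from collections import Counter

# Hypothetical arc table: arc id -> (from_node, to_node, left_poly, right_poly); "0" is the universe.
ARCS = {
    1: ("N1", "N2", "0", "A"),
    2: ("N1", "N2", "A", "B"),
    3: ("N1", "N2", "B", "0"),
    4: ("N2", "N3", "0", "0"),  # N3 touches no other arc: an unsnapped line
}


def dangling_nodes(arcs):
    """Nodes used by only one arc: candidates for unsnapped lines."""
    counts = Counter()
    for from_node, to_node, _left, _right in arcs.values():
        counts[from_node] += 1
        counts[to_node] += 1
    return sorted(node for node, n in counts.items() if n == 1)


def neighbors(arcs, poly):
    """Polygons across any arc bounding poly, found from attributes alone."""
    found = set()
    for _from, _to, left, right in arcs.values():
        if left == poly:
            found.add(right)
        elif right == poly:
            found.add(left)
    found.discard("0")
    found.discard(poly)
    return sorted(found)


print(dangling_nodes(ARCS))  # ['N3']
print(neighbors(ARCS, "A"))  # ['B']
```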
Disadvantages
The basic disadvantages of using this system are that, when
using arcs or polygons, some reconstruction of the map is necessary (polygons,
for example, must be rebuilt from their arcs before they can be drawn), and
the database is sophisticated to work through and requires very complex software.
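The reconstruction step can be sketched as chaining a polygon's arcs back into one closed ring of coordinates. The sketch assumes a hypothetical arc table storing the from-node, to-node, left polygon, right polygon and full coordinate list of each arc; each arc is turned so the polygon lies on its right, and the arcs are then joined end to start. It handles only a single ring without holes, which hints at how much more a full topological package has to do.

```python
def build_ring(poly, arcs):
    """Chain the arcs bounding poly into one closed ring (single ring, no holes)."""
    oriented = []
    for from_node, to_node, left, right, coords in arcs.values():
        if right == poly:
            oriented.append((from_node, to_node, list(coords)))
        elif left == poly:  # reverse so that poly lies on the right
            oriented.append((to_node, from_node, list(reversed(coords))))
    if not oriented:
        return []
    start, end, ring = oriented.pop(0)
    while oriented:
        for i, (begin, finish, coords) in enumerate(oriented):
            if begin == end:
                ring.extend(coords[1:])  # skip the node already at the end of the ring
                end = finish
                oriented.pop(i)
                break
        else:
            raise ValueError(f"boundary of {poly} is broken at node {end}")
    if end != start:
        raise ValueError(f"boundary of {poly} does not close")
    return ring


# Hypothetical arcs: (from_node, to_node, left_poly, right_poly, coordinates).
ARCS = {
    1: ("N1", "N2", "0", "A", [(0, 0), (0, 10), (10, 10)]),
    2: ("N1", "N2", "A", "0", [(0, 0), (10, 0), (10, 10)]),
}
print(build_ring("A", ARCS))  # [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]
```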
Possible test questions:
- Name and describe two types of vector data storage, including any advantages
and disadvantages.
- When representing points and lines, which data structure format is better,
raster or vector, and why?
- What is the definition of topology?
Submitted by Lauren Shapiro on 2/28/98.
Updated by William Kinnison on 5 Dec 98.