Structuring Maps

All geographic data can be reduced to three basic topological concepts: the point, the line, and the area. A label is also needed to identify what each entity is. For example, a section of railroad line could be represented by a line consisting of a starting and an ending x, y coordinate and the label "railroad".
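
As a minimal sketch in Python, a labelled feature can be modelled as a label, a geometry type, and a list of x, y coordinates. The `Feature` class, its field names, and the `is_valid` check are hypothetical illustrations, not part of any particular GIS.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A hypothetical labelled feature: what it is, its geometry type, and its coordinates."""
    label: str                         # identifies the entity, e.g. "railroad"
    kind: str                          # "point", "line", or "area"
    coords: list[tuple[float, float]]  # x, y pairs


def is_valid(f: Feature) -> bool:
    """Check that the number of coordinates suits the geometry type."""
    if f.kind == "point":
        return len(f.coords) == 1
    if f.kind == "line":
        return len(f.coords) >= 2
    if f.kind == "area":
        # A closed ring: at least three corners plus the repeated starting point.
        return len(f.coords) >= 4 and f.coords[0] == f.coords[-1]
    return False


# The railroad example: a line from a starting to an ending x, y coordinate.
railroad = Feature("railroad", "line", [(0.0, 0.0), (12.5, 4.0)])
print(is_valid(railroad))  # True
```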

Vector Data Structures

In computer cartography and GIS, vector data structures were the first to be used, because they were created directly from digitizing tablets, they represented complex features such as land parcels well, and they could be drawn easily on the output devices in use at the time (pen plotters). The coordinate space is assumed to be continuous, so all positions, lengths, and dimensions can be defined precisely. In practice, however, the precision of a coordinate is limited by the computer's word size and by the step size of vector display devices.

Spaghetti Format

The first vector files were simply a series of lines, with "arbitrary starting and ending points, that duplicated the way a cartographer would draw a map" (Clarke 1997, 78). A file might contain a few long lines, many short lines, or a mix of both, and was usually written in binary or ASCII code. The problem with this format is that it has no structure at all: it records no spatial relationships and no attributes, so the lines can be drawn but not analyzed. A programmer named Nick Chrisman compared reading such a file to following a strand of spaghetti, hence the name "cartographic spaghetti".

Polygon Format

In this format the user digitizes the world as a series of polygons (closed shapes), so that areas can be displayed and manipulated as thematic map data. The format works very well for a map made up only of areas, such as a map of counties. It does, however, have definite problems. The main one is duplicate borders: a border shared by two adjacent polygons is digitized twice, once for each polygon. Because the two digitized versions never match exactly, this double digitizing also creates thin gaps and overlaps along the border, called sliver polygons. Despite these problems, the format is still used today in the shapefile.
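
The duplicate-border problem shows up in a small sketch. It assumes two hypothetical square counties stored as closed rings of x, y coordinates, as in the polygon format, and counts boundary segments that appear in more than one ring.

```python
from collections import Counter

# Hypothetical polygon-format file: two counties sharing the border x = 1.
# Each polygon stores its complete closed ring, so the shared border is stored twice.
polygons = {
    "West": [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)],
    "East": [(1, 0), (1, 1), (2, 1), (2, 0), (1, 0)],
}


def segments(ring):
    """Boundary segments of a ring, each as an undirected (sorted) pair of points."""
    return [tuple(sorted(pair)) for pair in zip(ring, ring[1:])]


counts = Counter(seg for ring in polygons.values() for seg in segments(ring))
duplicated = [seg for seg, n in counts.items() if n > 1]
print(duplicated)  # [((1, 0), (1, 1))]
```

If the second digitization of the shared border were even slightly off, for example (1.01, 0) instead of (1, 0), the two segments would no longer match, and the thin area between the two versions would become a sliver polygon.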

Point Dictionary Format

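Here all coordinate pairs are numbered sequentially and referenced by a dictionary that records which points belong to each polygon. Because every point is stored only once, the dictionary allows the boundary between adjacent polygons to be identified uniquely, but problems with neighborhood functions remain: the structure does not record which polygons are adjacent, so finding a polygon's neighbors still requires a search.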
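
A minimal sketch of a point dictionary, using a hypothetical pair of adjacent squares: shared boundaries are found by matching point IDs, but because adjacency is not stored, finding neighbors still means checking every other polygon. All names here are illustrative assumptions.

```python
# Hypothetical point dictionary: each coordinate pair is stored once, under a sequential ID.
points = {1: (0, 0), 2: (0, 1), 3: (1, 1), 4: (1, 0), 5: (2, 1), 6: (2, 0)}

# Each polygon lists the IDs of its points (closed: the first ID is repeated at the end).
polygons = {
    "West": [1, 2, 3, 4, 1],
    "East": [4, 3, 5, 6, 4],
}


def ring_coords(name):
    """Look up the actual coordinates of a polygon through the dictionary."""
    return [points[i] for i in polygons[name]]


def edges(ids):
    """Undirected boundary edges of a ring, as frozensets of point IDs."""
    return {frozenset(pair) for pair in zip(ids, ids[1:])}


def shared_boundary(p, q):
    """Edges two polygons have in common, matched by point ID rather than by coordinates."""
    return edges(polygons[p]) & edges(polygons[q])


def neighbors(p):
    """Adjacency is not stored, so every other polygon has to be checked."""
    return [q for q in polygons if q != p and shared_boundary(p, q)]


print(shared_boundary("West", "East"))  # {frozenset({3, 4})}
print(neighbors("West"))                # ['East']
```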

Arc/Node Format

Beginning as far back as the 1960s, "a hierarchy for spatial data was worked out that became the arc/node model" (Clarke 1997, 79). This method uses arcs to store a series of points (coordinate pairs) that start and end at nodes and may include intermediate vertices (shape points). A node is the intersection of two or more arcs. Thus areas are composed of connected lines, and lines are composed of connected points, with each component stored in its own file. Three data tables are used to record the topology: the Polygon Topology table (lists the arcs that make up each polygon), the Node Topology table (lists the arcs that meet at each node), and the Arc Topology table (records the start and end nodes and the left and right polygons of each arc). ArcInfo and ArcView use the arc-node data structure because it stores spatial relationships explicitly and is very efficient for spatial analysis.
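
The relationship between the three tables can be sketched by treating the arc topology table as the primary record and deriving the other two from it. This is one possible arrangement, using hypothetical names (`Arc`, `nodes`, `polygon_topology`, `node_topology`) and a made-up pair of adjacent squares; it is not meant to reproduce ArcInfo's actual file layout.

```python
from collections import defaultdict
from typing import NamedTuple

# Nodes are stored once, with their coordinates.
nodes = {"a": (1, 0), "b": (1, 1)}


class Arc(NamedTuple):
    arc_id: int
    from_node: str
    to_node: str
    left_poly: str
    right_poly: str
    vertices: tuple  # shape points between the two nodes


# Hypothetical arc topology table for two squares, West and East, that share arc 1.
# "0" stands for the area outside all polygons.
arcs = [
    Arc(1, "a", "b", "West", "East", ()),
    Arc(2, "b", "a", "West", "0", ((0, 1), (0, 0))),
    Arc(3, "a", "b", "East", "0", ((2, 0), (2, 1))),
]


def arc_coords(arc):
    """A line is composed of connected points: start node, shape points, end node."""
    return [nodes[arc.from_node], *arc.vertices, nodes[arc.to_node]]


def polygon_topology(arcs):
    """Polygon topology table: the arcs that make up each polygon."""
    table = defaultdict(list)
    for arc in arcs:
        table[arc.left_poly].append(arc.arc_id)
        table[arc.right_poly].append(arc.arc_id)
    return dict(table)


def node_topology(arcs):
    """Node topology table: the arcs that meet at each node."""
    table = defaultdict(list)
    for arc in arcs:
        table[arc.from_node].append(arc.arc_id)
        table[arc.to_node].append(arc.arc_id)
    return dict(table)


print(polygon_topology(arcs))  # {'West': [1, 2], 'East': [1, 3], '0': [2, 3]}
print(node_topology(arcs))     # {'a': [1, 2, 3], 'b': [1, 2, 3]}
```

Storing left and right polygons on each arc means every shared border exists exactly once, and the polygon and node tables can be rebuilt from it at any time.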

Arc Topology Table

Below are four polygons (A-D) with an Arc Topology Table describing their location with respect to one another. The arcs that make up the polygons are numbered 1-7, and the nodes that connect the arcs are numbered I-V. Arc 1 begins at node I and ends at node II. Polygon A is to the right of arc 1, and 0 (a null value for the area containing no polygons, called the "universe" in ArcInfo terminology) is to the left. Arc 6 starts at node IV and ends at node V. Note that polygon D is the right polygon; left and right are judged from the perspective of movement from node IV to node V.

Arc   Begin Node   End Node   Right Polygon   Left Polygon
1     I            II         A               0
2     II           III        B               0
3     III          IV         C               0
4     V            I          B               0
6     IV           V          D               C
7     I            IV         D               A
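
Because each row records the polygons on both sides of an arc, adjacency can be read straight from the table without any coordinates. The sketch copies only the rows listed in the table; the `neighbors` function is a hypothetical helper.

```python
# Rows of the arc topology table: (arc, begin node, end node, right poly, left poly).
# "0" is the universe, the area outside all polygons.
ARC_TABLE = [
    (1, "I", "II", "A", "0"),
    (2, "II", "III", "B", "0"),
    (3, "III", "IV", "C", "0"),
    (4, "V", "I", "B", "0"),
    (6, "IV", "V", "D", "C"),
    (7, "I", "IV", "D", "A"),
]


def neighbors(poly):
    """Polygons sharing at least one arc with `poly`, read from the right/left columns."""
    found = set()
    for _arc, _begin, _end, right, left in ARC_TABLE:
        if right == poly and left != poly:
            found.add(left)
        elif left == poly and right != poly:
            found.add(right)
    return found


print(sorted(neighbors("D")))  # ['A', 'C']
```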


TIN Data Structures

A Triangulated Irregular Network (TIN) is a set of points with their coordinates (x, y, and elevation), stored in a file that also contains information about the topology. The network itself is a series of triangles constructed by connecting the points, typically by a method called Delaunay triangulation. There are two ways to store a TIN: one uses a file that records the arcs (edges) connecting the points of the triangles, and the other records the full topology of the triangle network. Contour lines are easy to draw from a TIN, as are 3-D views of an area.
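
Why contours are easy to draw from a TIN can be illustrated with linear interpolation along triangle edges: within each triangle, a contour at a given elevation crosses either two edges or none. The sketch assumes the triangulation has already been built (the two triangles covering a 10 x 10 square are written out by hand rather than computed by a Delaunay routine), and the function names are hypothetical.

```python
def edge_crossing(p, q, level):
    """Point where the contour `level` crosses edge p-q, or None if it does not.

    A vertex exactly at `level` is treated as lying above it, so each triangle
    is crossed on either two edges or none.
    """
    (x1, y1, z1), (x2, y2, z2) = p, q
    if (z1 < level) == (z2 < level):
        return None  # both ends on the same side of the contour
    t = (level - z1) / (z2 - z1)  # linear interpolation along the edge
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def contour_segments(vertices, triangles, level):
    """One line segment per triangle that the contour passes through."""
    segments = []
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        hits = [edge_crossing(p, q, level) for p, q in ((a, b), (b, c), (c, a))]
        hits = [h for h in hits if h is not None]
        if len(hits) == 2:
            segments.append((hits[0], hits[1]))
    return segments


# Points as (x, y, z); triangles as triples of point indices.
vertices = [(0, 0, 0), (10, 0, 10), (0, 10, 10), (10, 10, 20)]
triangles = [(0, 1, 2), (1, 3, 2)]
print(contour_segments(vertices, triangles, 5))   # [((5.0, 0.0), (0.0, 5.0))]
print(contour_segments(vertices, triangles, 15))  # [((10.0, 5.0), (5.0, 10.0))]
```

Joining the segments from neighboring triangles end to end gives the complete contour line.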

Raster Data Structures


Raster data structures consist of an array of grid cells (pixels), each referenced by a row and column number and containing a number that represents the type or value of the attribute being mapped. The two-dimensional surface to which the geographic data are referenced is not continuous but divided into discrete cells, and this can significantly affect estimates of lengths and areas when the cell size is large relative to the features being represented.
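
The effect of cell size on area estimates can be checked with a small experiment: rasterize a circle of radius 10 by marking every cell whose centre falls inside it, then multiply the count by the cell area. The centre-in-circle rule and the function name are assumptions made for illustration.

```python
import math


def raster_area(radius, cell):
    """Estimate a circle's area by counting cells whose centre lies inside it."""
    n = math.ceil(radius / cell)  # cells from the centre to the edge along each axis
    inside = 0
    for row in range(-n, n):
        for col in range(-n, n):
            cx, cy = (col + 0.5) * cell, (row + 0.5) * cell  # cell centre
            if cx * cx + cy * cy <= radius * radius:
                inside += 1
    return inside * cell * cell


for cell in (20, 10, 1, 0.1):
    print(cell, raster_area(10, cell), round(math.pi * 10 ** 2, 2))
```

With cells as large as the circle's radius (10 units) the estimate is 400 instead of about 314.16, and with 20-unit cells no cell centre falls inside the circle at all, so the area comes out as 0. Shrinking the cells brings the estimate close to the true value.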


One of the biggest advantages of the raster data structure is that, within the computer's memory, the data form their own map. Neighborhood analysis, comparing a grid cell with its neighbors, can therefore be performed simply by looking at the values in the adjacent rows and columns. Overlay is also much easier: two grids covering the same area with the same cell size can be matched up and combined cell by cell.
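
Both operations reduce to index arithmetic on a grid, as this sketch shows. The grid is a plain list of rows, the land-use and flood grids are made-up examples, and the function names are hypothetical; the overlay assumes both grids have the same shape and cell size.

```python
def neighbors(grid, row, col):
    """Values of the (up to 8) cells surrounding grid[row][col]."""
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            # Skip the cell itself and positions outside the grid.
            if (dr, dc) != (0, 0) and 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                values.append(grid[r][c])
    return values


def overlay(a, b, rule):
    """Combine two grids of identical shape cell by cell using `rule`."""
    return [[rule(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


land_use = [[1, 1, 2],
            [1, 2, 2],
            [3, 3, 2]]
flooded = [[0, 1, 1],
           [0, 0, 1],
           [0, 0, 0]]

print(neighbors(land_use, 0, 0))  # [1, 1, 2]
# Keep the land-use code only where the flood grid is 1.
print(overlay(land_use, flooded, lambda lu, fl: lu if fl else 0))
# [[0, 1, 2], [0, 0, 2], [0, 0, 0]]
```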


The first disadvantage of the raster data structure is that it does not represent lines or points well: a line becomes a staircase of cells, and a point becomes an entire cell, so exact positions and shapes are lost.
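
A sketch of how a vector line turns into cells: Bresenham's line algorithm, a standard way of choosing the grid cells that approximate a straight line, is used here as an example of rasterization (the notes do not specify a method). Cell positions are (column, row) integers, and the function name is hypothetical.

```python
def rasterize_line(x0, y0, x1, y1):
    """Cells (column, row) chosen by Bresenham's line algorithm between two cells."""
    cells = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error term, kept in integers
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return cells
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


print(rasterize_line(0, 0, 7, 3))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)]
```

The line from (0, 0) to (7, 3) is about 7.6 cell widths long but becomes a staircase of 8 cells, each a full cell wide.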

Another major disadvantage is the problem of mixed pixels. A pixel can hold only one value, so a pixel that straddles a boundary cannot show both of the attributes it contains. The analyst must then decide how to assign these edge pixels, which do not belong exclusively to one class or another.
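
The notes leave the assignment rule open. One common choice, assumed here purely for illustration, is the dominant-class rule: give the pixel the class that covers most of its area. The function and class names are hypothetical.

```python
def assign_pixel(fractions, min_share=0.0):
    """Assign a mixed pixel to the class that covers the largest share of it.

    `fractions` (non-empty) maps each class to the fraction of the pixel's area
    it covers. Returns None when no class reaches `min_share`, leaving the pixel
    for manual review.
    """
    cls, share = max(fractions.items(), key=lambda item: item[1])
    return cls if share >= min_share else None


edge_pixel = {"forest": 0.55, "water": 0.45}
print(assign_pixel(edge_pixel))                 # forest
print(assign_pixel(edge_pixel, min_share=0.6))  # None
```

Whatever rule is used, the minority attribute (here 45% water) is lost from that cell.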

Alternative Storage

At least two alternatives have been developed to deal with the problem that a grid often contains duplicated or altogether missing data.
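
The notes do not name the alternatives here. As an illustration only, one widely used way of storing runs of repeated values in a grid more compactly is run-length encoding, which records each run as a value and a count. It addresses duplicated data, not missing data, and the function names are hypothetical.

```python
def rle_encode(row):
    """Run-length encode one grid row as [value, run_length] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original row."""
    return [value for value, count in runs for _ in range(count)]


row = [2, 2, 2, 2, 5, 5, 1, 1, 1, 1, 1]
print(rle_encode(row))  # [[2, 4], [5, 2], [1, 5]]
```

Eleven cells shrink to three runs; the saving grows with the length of the runs, so the scheme suits grids with large uniform areas.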



Topology

According to the textbook, topology is the "property that describes adjacency and connectivity of features. A topological data structure encodes topology with the geocoded features". Another definition is "The numerical description of the relationships between geographic features, as encoded by adjacency, linkage, inclusion, or proximity. Thus a point can be inside a region, a line can connect to others, and a region can have neighbors. The numbers describing topology can be stored as attributes in the GIS and used for validation and other stages of description and analysis".


Topology allowed users, for the first time, to detect errors automatically. The major problems of sliver polygons and unsnapped nodes can be detected and fixed because each line is stored only once; the only duplicated coordinates are the endpoints (nodes) that arcs share. The user can therefore "clean" the map. The greatest advantage of having a "topologically consistent map is that when two or more maps must be overlain, much of the initial preparation work has been done" (Clarke 1997, 85). A final major advantage is that many retrieval and analysis operations can be carried out without using the x, y data at all.
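
One of the simplest topological checks can be sketched directly from an arc list: a node touched by only one arc end is a dangling node, which often marks an unsnapped line (an undershoot or overshoot). The `(arc_id, from_node, to_node)` tuple layout and the function name are assumptions. Not every dangle is an error (a stream that simply ends also produces one), so the result is a list of places to inspect rather than a list of mistakes.

```python
from collections import Counter


def dangling_nodes(arcs):
    """Nodes touched by exactly one arc end: candidates for unsnapped nodes."""
    ends = Counter()
    for _arc_id, from_node, to_node in arcs:
        ends[from_node] += 1
        ends[to_node] += 1
    return sorted(node for node, count in ends.items() if count == 1)


# Hypothetical arcs: a closed triangle a-b-c plus arc 4, which stops short at node d.
arcs = [(1, "a", "b"), (2, "b", "c"), (3, "c", "a"), (4, "c", "d")]
print(dangling_nodes(arcs))  # ['d']
```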


The basic disadvantages of this system are that, when arcs or polygons are used, the map must be partly reconstructed (for example, polygons must be reassembled from their arcs before they can be drawn), and that the database is sophisticated to work through and requires very complex software.

Possible test questions:

  1. Name and describe two types of vector data storage including any advantages and disadvantages.
  2. When representing points and lines, which data structure format is better, raster or vector, and why?
  3. What is the definition of topology?
Submitted by Lauren Shapiro on 2/28/98.
Updated by William Kinnison on 5 Dec 98.