GIS Analysis Functions

Outline

1. Introduction
2. Spatial Data Functions
• Format Transformations
• Geometric Transformations
a. Relative Position
b. Absolute Position
• Projection Transformations
• Conflation
• Edge-matching
• Editing Functions
• Line Coordinate Thinning
3. Attribute Data Functions
• Retrieval
• Classification
• Verification
4. Integrated Analysis of Spatial and Attribute Data
• Overlay
a. Region-wide Overlay
b. Topological Overlay
• Neighborhood Function
• Point-in-Polygon and Line-In-Polygon
• Topographic Functions
a. Slope
b. Aspect
c. Sun Intensity
• Thiessen Polygon
• Interpolation
• Density Functions
5. Cartographic Modeling
• Entirely in the Raster Domain
• Ability to form a logical sequence
• Map Algebra
6. Connectivity Functions
• Contiguity Measures
• Proximity Functions
a. Target locations.
b. The unit of measurement.
c. A function to calculate proximity.
d. The area to be analyzed.
• Network Functions
a. Prediction of loading on the network itself
b. Route optimizing
c. Resource allocation
• Sets of Constraints
a. Set of resources
b. One or more locations where the resources are located
c. An objective to deliver the resources to a set of destinations
d. Limits on how the objective can be met
• Spread Functions
• Seek or Stream Functions
• Intervisibility Functions
7. Output Functions
• Map Annotation
• Texture Patterns and Line Styles
• Graphic Symbols

Introduction

Geographic Information Systems (GIS) analysis functions use spatial and non-spatial attribute data to answer questions about the real world. It is the spatial analysis functions that distinguish a GIS from other types of information systems. When a GIS is used for inventory or analysis, the question arises of which specific analysis function will solve the problem at hand. Careful selection and use of these functions leads to high-quality information from the GIS. Individual analysis functions must be used within the context of a complete analysis strategy (Aronoff, 1989).

Spatial Data Functions

Spatial data refers to information about the location and shape of, and relationships among, geographic features, usually stored as coordinates and topology. Spatial data functions are used to transform spatial data files, such as a digitized map, to edit them, and to assess their accuracy. They operate mainly on the spatial data rather than on the attributes.

Format Transformations
Format is the pattern into which data are systematically arranged for use on a computer. Format transformations are used to get data into an acceptable GIS format. Digital files must be transformed into the data format used by the GIS, such as transforming from a raster to a vector data structure. Raster data often requires no re-formatting. Vector data often requires topology to be built from coordinate data, such as arc/node translations. Transformation can be very costly and time-consuming when the coordinate data are poor.

Geometric Transformations
Geometric transformations are used to assign coordinates to a map or to a data layer within the GIS. Such transformations adjust one data layer so it can be correctly overlaid on another data layer of the same area. The procedure used to accomplish this correction is termed registration. Two approaches are used in registration, absolute and relative.

a. Relative Position refers to the location of features in relation to other features, here the features of another data layer. Rubber sheeting (registration by Relative Position) is the procedure in which a “slave” layer is fitted to a “master” layer by mathematical transformations that adjust coverage features in a non-uniform manner. Links representing from- and to- locations are used to define the adjustment. The procedure needs easily identifiable, accurate, well-distributed control points. This solution is more common when the number of layers is small.
b. Absolute Position is the location in relation to the ground, expressed in a geographic coordinate system. This registration is done to each individual layer. The advantage of the absolute method is that it does not propagate errors from one layer to the next. When the GIS is extensive and the layers are many and varied, the absolute method of registration is the most viable.

Projection Transformations
Geodesy is the study of the size, shape and motion of the Earth. It is used to calculate map projections, which are mathematical transformations used to represent a spherical surface on a flat map. The transformation assigns to each location on a spherical surface a unique location on a 2-dimensional map. Map projections always cause some distortion, which can produce errors in calculations of area, shape, distance, or direction. Commercial GIS applications commonly support several projections and have software to transform data from one projection to another. The map projection most commonly used for mapping at scales of 1:500,000 or larger in North America is the Universal Transverse Mercator (UTM) Projection. For maps of continental extent, the Albers, Lambert’s Azimuthal, and Polyconic projections are commonly used.

Conflation
Conflation is the procedure of reconciling the positions of corresponding features in different data layers. Specifically, it is a set of functions and procedures that aligns the arcs of one coverage with those of another and then transfers the attributes of one to the other. Alignment precedes the transfer of attributes and is most commonly performed by rubber-sheeting operations.

Edge-matching
Edge matching is an editing procedure to adjust the position of features extending across adjacent map sheet boundaries. This function ensures that all features that cross adjacent map sheets have the same edge locations. Links are used when matching features in adjacent coverages.

Editing Functions
Editing functions are used to add, delete, or manipulate the geographic position of features. They are also used to clean up sliver or splinter polygons, which are thin polygons that occur along the borders of polygons following digitizing and the topological overlay of two or more coverages.
Address Matching is a mechanism used to integrate two files using their features’ respective addresses as the common item. Geographic coordinates and attributes can be transferred from one address to the other. For example, a data file containing a student’s street address can be matched to a spatial map coverage that contains addresses, thus creating a point coverage of where the student lives.

Line Coordinate Thinning
The Line Coordinate Thinning function reviews all the coordinate data in a file, then identifies and removes non-pertinent (insignificant) coordinates. Depending on the scale, the number of coordinate pairs can often be reduced without a perceived loss of detail. An example would be the Kansas-Nebraska border (a straight east-west line). For all practical purposes, it could be represented on a map of the USA by two points. By reducing the number of coordinate points, coordinate thinning reduces the size of the data file, thereby reducing the volume of data that must be stored by the GIS and increasing processing speed.
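The source does not name a particular thinning algorithm. As a minimal sketch, the widely used Douglas-Peucker method is assumed here: a vertex is kept only if it lies farther than a tolerance from the straight line joining the endpoints of its segment. The function names, the tolerance value, and the sample border coordinates are illustrative assumptions.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def thin_line(points, tolerance):
    """Douglas-Peucker thinning (assumed method, not named in the text)."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the endpoints.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    # If every interior vertex is insignificant, keep only the endpoints.
    if farthest <= tolerance:
        return [points[0], points[-1]]
    # Otherwise keep the farthest vertex and thin both halves recursively.
    left = thin_line(points[:index + 1], tolerance)
    right = thin_line(points[index:], tolerance)
    return left[:-1] + right

# Hypothetical digitized border: nearly straight, with one tiny wobble.
border = [(0.0, 40.0), (1.0, 40.0), (2.0, 40.001), (3.0, 40.0), (4.0, 40.0)]
thinned = thin_line(border, tolerance=0.01)
```

With this tolerance the five digitized vertices collapse to the two endpoints, which mirrors the straight-border example above.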

Attribute Data Functions

Attribute data describe the characteristics of a geographic feature. They can be numbers, characters, images or CAD drawings, and they are typically stored in tabular format and linked to the feature by a user-defined identifier (e.g., the attributes of a well might include depth and gallons per minute).

Retrieval (selective search)
Retrieval operations on spatial and attribute data involve the selective search, manipulation, and output of data without modifying the geographic location of features or creating new spatial entities. These operations work with the spatial elements as they were entered in the database. Information from database tables can be accessed directly through the map, or new maps can be created using information in the tabular database. Both graphic and tabular data must be stored in formats the computer can recognize and retrieve.

Classification
Classification is the procedure of identifying a set of features as belonging to a group. Some form of classification function is provided within every GIS. The simplest form of this function occurs in a raster-based GIS, with similar numerical values used to indicate classes. Classification is the basis for defining patterns, and one of the most important functions of a GIS is to assist in recognizing new patterns. Classification can be done using single data layers, as well as integrating multiple data layers as part of an overlay operation. One type of generalization, called map dissolve, is the process of making a classification less detailed by combining classes. Generalization is often used to reduce the level of classification detail to make an underlying pattern more apparent.

Verification
Since all data entry is subject to error, the first step after entry should be verification. Verification is a simple process in which a report or printout of the data produced by the GIS is checked against the original input. Nearly all GIS are in a state of flux due to insertion, deletion and modification; therefore verification is an ongoing process. (Clarke, 1997)

Integrated Analysis of Spatial and Attribute Data

Overlay
Overlay is a GIS operation in which layers with a common, registered map base are joined on the basis of their occupation of space (Clarke, 1997). Another definition of overlay is an analysis procedure for determining the spatial coincidence of geographic features. Overlay function output is capable of creating composite maps by combining diverse data sets. These outputs can reflect simple operations such as laying a road map over a map of local wetlands, or more sophisticated operations such as multiplying and adding map attributes of different value to determine averages and co-occurrences.
Raster and vector models differ significantly in the way overlay operations are implemented. Overlay operations are usually performed more efficiently in raster-based systems. Many GISs use a hybrid approach that takes advantage of the capabilities of both data models. A vector-based system may implement some functions in the raster domain by performing a vector-to-raster conversion on the input data, doing the processing as a raster operation, and converting the raster result back to a vector file.

a. Region Wide Overlay: “Cookie Cutter Approach”
The region wide, or “cookie cutter,” approach to overlay analysis allows natural features, such as forest stand boundaries or soil polygons, to become the spatial area(s) which will be analyzed on another map.
For example: given two data sets, forest patches and slope, what is the area-weighted average slope within each separate patch of forest? To answer this question, the GIS overlays each patch of forest from the forest patch data set onto the slope map and then calculates the area-weighted average slope for each individual forest patch.
b. Topological Overlay:
Co-Occurrence mapping in a vector GIS is accomplished by topological overlaying. Any number of maps may be overlaid to show features occurring at the same location. To accomplish this, the GIS first stacks maps on top of one another and finds all new intersecting lines. Second, new nodes (point features where three or more arcs or lines come together) are set at these new intersections. Lastly, the topologic structure of the data is rebuilt and the multifactor attributes are attached to the new area features.
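The forest patch and slope question from the region-wide overlay example can be sketched in the raster domain. This sketch assumes both layers are already registered grids of equal-sized cells, so the area-weighted average reduces to a plain mean of the cells in each patch. The function name, the patch identifiers, and the sample values are illustrative assumptions.

```python
from collections import defaultdict

def zonal_mean(zones, values):
    """Average the value grid within each zone id; zone 0 means 'no patch'."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            if zone != 0:
                totals[zone] += value
                counts[zone] += 1
    # Equal cell areas: the area-weighted mean is the simple cell mean.
    return {zone: totals[zone] / counts[zone] for zone in totals}

# Hypothetical layers: forest patch ids and slope in percent.
forest = [[1, 1, 0],
          [1, 0, 2],
          [0, 2, 2]]
slope = [[10.0, 20.0, 5.0],
         [30.0, 5.0, 4.0],
         [5.0, 6.0, 8.0]]
mean_slope = zonal_mean(forest, slope)
```

Here the forest layer acts as the cookie cutter and the slope layer is the map being analyzed.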

Neighborhood Function
A neighborhood function analyzes the relationship between an object and similar surrounding objects. For example, land potentially available for a playground could be analyzed with reference to its adjacency to attractive nuisances such as junkyards. Neighborhood functions are also often used in image processing. In this raster-based process a new map is created in which the value assigned to each location is computed by averaging the values surrounding that location. Neighborhood functions are particularly valuable in evaluating the character of a local area.
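The raster averaging process described above can be sketched as a moving 3 by 3 window. The window size, the decision to include the center cell in the average, and the treatment of edge cells are assumptions made for this illustration.

```python
def focal_mean(grid):
    """Return a new grid where each cell is the mean of its 3x3 neighborhood.

    Assumption: the center cell is included in the average, and edge cells
    use only the neighbors that exist inside the grid.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out

# Hypothetical grid with one outlier in the middle.
grid = [[1, 1, 1],
        [1, 10, 1],
        [1, 1, 1]]
smoothed = focal_mean(grid)
```

The outlier is pulled toward its surroundings while its neighbors are raised slightly, which is the smoothing effect used in image processing.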

Point-in-Polygon and Line-In-Polygon
Point-in-Polygon is a topological overlay procedure which determines the spatial coincidence of points and polygons. Points are assigned the attributes of the polygons within which they fall. For example, this function can be used to analyze an address and find out if it (point) is located within a certain zip code area (polygon).
Line-in-Polygon is a spatial operation in which lines in one coverage are overlaid with the polygons of another coverage to determine which lines, or portions of the lines, are contained within the defined polygons. This function would determine, for example, the total miles of state highways within the boundaries of a certain county. Furthermore, polygon attributes can be associated with corresponding lines in the resulting line coverage. Using the state highway example, overlaying property ownership with a “to be widened” state highway would produce a mailing list for condemnation notices.
In a vector-based GIS, the identification of points and lines contained within a polygon area is a specialized search function. In a raster-based GIS, it is essentially an overlay operation, with the polygons in one data layer and the points and/or lines in a second data layer.
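As a minimal sketch of the vector search for points inside a polygon, the ray-casting test below counts how many polygon edges a horizontal ray from the point crosses. This is one common technique, assumed here for illustration; the text does not say which method a given GIS uses. The polygon and point coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: an odd number of edge crossings means the point is inside.

    polygon is a list of (x, y) vertices in order; points exactly on the
    boundary are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical zip code area (polygon) and two geocoded addresses (points).
zip_area = [(0, 0), (4, 0), (4, 4), (0, 4)]
address_a_inside = point_in_polygon(2, 2, zip_area)
address_b_inside = point_in_polygon(5, 2, zip_area)
```

Once a point is known to fall inside a polygon, the polygon's attributes (here, the zip code) can be assigned to it.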

Topographic Functions
Topography refers to surface characteristics whose value changes continuously over an area. Elevations, aeromagnetics, noise levels, income levels, and pollution levels are examples of topography. Data are simple numeric values, such as meters above sea level. Topographic functions are used to calculate values that describe the topography at a specific geographic location or in the vicinity of the location. The two most commonly used terrain parameters are slope and aspect, both of which are calculated using the elevation data of the neighboring points. Topography of a land surface can be represented in a GIS in the raster format by the digital elevation model (DEM). An alternative form, used in vector-based systems, is the Triangulated Irregular Network or TIN.

a. Slope is the measure of change in surface value over distance, and can be expressed in degrees or as a percentage. For example, a rise of 2 meters over a distance of 100 meters describes a 2% slope with an angle of 1.15 degrees. Mathematically, slope is referred to as the first derivative of the surface. The maximum slope is termed the gradient. In a raster format, the Digital Elevation Model (DEM) is a grid where each cell is a value referenced to a common datum (sea level). Any two points on the grid will be sufficient to ascertain a slope. Once the slopes have been calculated, the maximum difference can be found and the gradient can be determined.
b. Aspect is the compass direction that a surface faces, measured in a clockwise direction from North. In a raster format DEM, another grid can be created for aspect and a number can be assigned to a specific direction.
c. Sun intensity is a measure of the combination of slope and aspect. An illumination source (sun) is algorithmically calculated as originating from a certain direction and elevation. The directness of its impingement on each cell is then calculated (the more direct, closest to perpendicular, the higher the value). Digitally displaying these values (illumination) portrays the effect of shining a light onto a 3-dimensional surface.
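Slope and aspect for one DEM cell can be sketched with central differences on its four edge neighbors. This is a simplified estimator chosen for illustration; production GIS packages typically use weighted 3 by 3 kernels. The grid orientation (row 0 is the northern edge, columns run west to east), the function name, and the sample DEM are assumptions.

```python
import math

def slope_aspect(dem, r, c, cell_size):
    """Return (slope_percent, slope_degrees, aspect_degrees) for an interior cell.

    Assumes row 0 is north and columns increase eastward. Aspect is the
    downslope direction, measured clockwise from north.
    """
    dz_east = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell_size)
    dz_north = (dem[r - 1][c] - dem[r + 1][c]) / (2.0 * cell_size)
    gradient = math.hypot(dz_east, dz_north)   # rise over run
    slope_percent = 100.0 * gradient
    slope_degrees = math.degrees(math.atan(gradient))
    # The surface faces downhill, opposite to the uphill gradient.
    # Note: a perfectly flat cell has no meaningful aspect.
    aspect = math.degrees(math.atan2(-dz_east, -dz_north)) % 360.0
    return slope_percent, slope_degrees, aspect

# Hypothetical DEM rising 2 m per 100 m cell toward the east.
dem = [[0.0, 2.0, 4.0],
       [0.0, 2.0, 4.0],
       [0.0, 2.0, 4.0]]
percent, degrees, aspect = slope_aspect(dem, 1, 1, cell_size=100.0)
```

The sample reproduces the 2% slope (about 1.15 degrees) from the text, and because the ground rises to the east the cell faces west, an aspect of 270 degrees.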

Thiessen Polygons
Thiessen or Voronoi polygons define individual areas of influence around each of a set of points. They are generated from a set of points, and their boundaries define the area that is closest to each point relative to all other points. They are mathematically defined by the perpendicular bisectors of the lines between neighboring points. A TIN structure can be used to create Thiessen polygons.

Interpolation
Interpolation is the procedure of predicting unknown values using the known values at neighboring locations. The quality of the interpolation results depends on the accuracy, number, and distribution of the known points used in the calculation and on how well the mathematical function correctly models the phenomenon.
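The text does not single out an interpolation method. As a minimal sketch, inverse distance weighting (IDW) is assumed here: each known point contributes to the estimate with a weight that falls off with distance. The function name, the power of 2, and the sample well data are illustrative assumptions.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples is a list of (x, y, value) tuples for the known points.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a known point: use its value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical wells with a measured depth at each location.
wells = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
estimate = idw(5.0, 0.0, wells)
```

Halfway between the two wells the estimate is the average of their values; closer to one well, that well dominates. How well this matches reality depends on the number and distribution of known points, as noted above.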

Density Functions
"
In a simple density calculation, points or lines that fall within the search area are summed and then divided by the search area size to get each cell's density value.  Possible uses include finding density of roads as an influence on wildlife habitat, or the density of utility lines in a town. The Population field could be used to weigh some roads or utility lines more heavily than others, depending on their size or class. For example, a divided highway probably has more impact than a narrow dirt road to wildlife habitat, and a high-tension line has more impact than a standard electric pole to visual quality."  (ArcGIS Desktop Help, Density Calculations.)
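The simple density calculation quoted above can be sketched for point features. This is an illustration of the idea, not the ArcGIS implementation: the circular search area, the function name, and the sample points are assumptions, and the per-point weight plays the role of the Population field.

```python
import math

def point_density(cx, cy, points, radius):
    """Sum the weights of points within radius of (cx, cy), divided by the search area.

    points is a list of (x, y, weight) tuples.
    """
    total = sum(weight for x, y, weight in points
                if math.hypot(x - cx, y - cy) <= radius)
    return total / (math.pi * radius ** 2)

# Hypothetical features: two fall inside the search circle, one outside.
features = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (5.0, 5.0, 1.0)]
density = point_density(0.0, 0.0, features, radius=2.0)
```

Repeating the calculation with the search circle centered on every cell of a grid would produce a density surface.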

Questions
Definitions
1. conflation 2. edge-matching 3. line coordinate thinning
4. address matching 5. rubber sheeting 6. retrieval
7. classification 8. overlay 9. point-in-polygon
10. Thiessen polygon  11. interpolation 12. line-in-polygon 13. density functions
Essay
1. Please present your own opinion on how to use GIS analysis functions.
2. Please discuss: “Spatial Analysis Functions are the power of GIS.”
3. Please give examples to describe GIS overlay functions.
4. Describe the differences and the relationships of spatial data and attribute data.
5. Using figures or examples describe the Edge Matching Function.
6. What is a sliver polygon? Describe how sliver polygons are created and then what function you would use to delete them.
7. Describe slope and aspect, and then how they are measured.

Cartographic Modeling

Entirely in the Raster Domain
Cartographic Modeling is the use of basic GIS functions in a logical sequence to solve complex spatial problems. It was developed to model land use planning alternatives and other applications that require the integrated analysis of multiple geographically distributed factors. Dana Tomlin coined the term in 1983. Cartographic Modeling occurs entirely in the raster domain. Because the analysis is additive (or subtractive) in nature, it relies on the raster format, in which every cell carries an assigned value. Digitized data is layered, and these layers are combined to construct constraint maps that can then be subjectively analyzed to find the best alternative.

Ability to form a Logical Sequence
Cartographic Modeling mathematically provides the ability to form a logical sequence. The process of cartographic modeling is characterized by working backward through the process to ensure that all the data that will be needed are identified. Data that will not be used are simply not collected. This backward procedure ensures that any judgments made are explicitly identified. Subjective judgments, therefore, are an integral feature of cartographic modeling.

Map Algebra
The premise of map algebra is that, because digital data can be assigned a value, mathematical algorithms can manipulate it. For example, (Map #1 + Map #2) / (Map #3 * Map #4). This is one of the basic foundations of an analytical GIS. Maps have VALUE.
For example, the usual map user perceives the distance from Chicago to Omaha as 500 miles, but to a trucker it is also:
$0.45 per mile on the interstate road system, and $0.67 per mile on the roads around the DOT (Department of Transportation) scales. Given that the fine for being caught overloaded is $500.00, and that 32% of truckers driving the roads to avoid the scales get caught by mobile DOT patrols, what is the cheapest way to get to Omaha from Chicago?
Some examples of programs that use Cartographic Modeling are Map Factory, GRASS, and MAP II.
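The map algebra expression (Map #1 + Map #2) / (Map #3 * Map #4) can be sketched as a cell-by-cell calculation on registered grids. The function name and the sample values are illustrative assumptions, and the sketch assumes no cell of Map #3 or Map #4 is zero.

```python
def map_algebra(map1, map2, map3, map4):
    """Evaluate (map1 + map2) / (map3 * map4) for every cell.

    All four grids must have the same shape; zero denominators are not
    handled in this sketch.
    """
    return [[(a + b) / (c * d)
             for a, b, c, d in zip(row1, row2, row3, row4)]
            for row1, row2, row3, row4 in zip(map1, map2, map3, map4)]

# Hypothetical 2 x 2 layers.
map1 = [[1, 2], [3, 4]]
map2 = [[3, 2], [1, 0]]
map3 = [[2, 2], [1, 1]]
map4 = [[1, 2], [2, 4]]
result = map_algebra(map1, map2, map3, map4)
```

The output is itself a map, so it can be fed into the next step of a cartographic model.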

Connectivity Functions

Contiguity Measures
Contiguity measures evaluate the characteristics of spatial units that are connected. These units share one or more characteristics with adjacent units and form a group. The term UNBROKEN is the key concept. Different adjacent features may have more than one attribute but they must all have a COMMON attribute to be considered as reflecting contiguity.  Contiguity is used to measure shortest and longest straight-line distances across an area and to identify areas of terrain with specified size and shape characteristics.
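The idea of an unbroken group of units sharing a common attribute can be sketched on a raster with a flood fill. The sketch assumes that cells are connected only when they share an edge (not a corner) and measures each group by its cell count; the function name and the land cover codes are hypothetical.

```python
def contiguous_regions(grid):
    """Return (value, cell_count) for each unbroken group of edge-adjacent cells
    that share the same value, in scan order (row by row)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            value = grid[r][c]
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            while stack:
                cr, cc = stack.pop()
                size += 1
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] == value):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            regions.append((value, size))
    return regions

# Hypothetical land cover grid: F = forest, W = water.
land = [["F", "F", "W"],
        ["W", "F", "W"],
        ["F", "W", "W"]]
regions = contiguous_regions(land)
```

Multiplying each cell count by the cell area gives the size of each contiguous area, which can then be screened against a minimum size requirement.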

Proximity Measures
Proximity is the distance between features, commonly measured in units of length. A proximity function is an algorithm that calculates this quantity. It is always spatial but not always linear.

Four parameters are used to measure proximity.
a. Target locations.
b. The unit of measurement.
c. A function to calculate proximity.
d. The area to be analyzed.

Probably the most common type of proximity analysis is the buffer zone. A buffer zone can be quite simple (ten feet from the property line), or very complicated, involving many layers. It can also be mathematically complex, such as the decrease in sound level, following the inverse-square law, of noise generated by various types of air traffic in the vicinity of a housing area.
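A simple raster buffer zone can be sketched using the four proximity parameters listed above: target locations, a unit of measurement (the cell size), a function to calculate proximity (straight-line distance between cell centers), and the area to be analyzed (the grid extent). All names and values are illustrative assumptions.

```python
import math

def buffer_cells(targets, rows, cols, cell_size, distance):
    """Return a 0/1 grid: 1 where a cell center lies within distance of any target.

    targets is a list of (row, col) cells; cell_size and distance share one unit.
    """
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for tr, tc in targets:
                if math.hypot(r - tr, c - tc) * cell_size <= distance:
                    out[r][c] = 1
                    break
    return out

# Hypothetical case: one target cell, 10-foot cells, 10-foot buffer.
zone = buffer_cells([(1, 1)], rows=3, cols=3, cell_size=10.0, distance=10.0)
```

The corner cells fall outside the zone because their centers are about 14.1 feet from the target.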

Network Functions
A network is a set of interconnected linear features that form a pattern or framework. Networks are commonly used for moving resources from one location to another. City streets, power transmission lines, and airline service routes are examples.

There are three principal types of GIS analysis performed with network functions.
a. Prediction of loading on the network itself (prediction of flood crests)
b. Route optimizing (emergency routing of ambulances)
c. Resource allocation (zones for servicing rescue areas)
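Route optimizing, such as the ambulance example above, can be sketched as a least-cost path search over a network. Dijkstra's algorithm is assumed here as one standard way to do this; the network layout, node names, and travel times are hypothetical.

```python
import heapq

def shortest_route(network, start, goal):
    """Dijkstra's algorithm: return (total_cost, path) or None if unreachable.

    network maps each node to a list of (neighbor, travel_cost) links.
    """
    queue = [(0.0, start, [start])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this node at least as cheaply
        best[node] = cost
        for neighbor, step in network.get(node, []):
            heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None

# Hypothetical one-way street network with travel times in minutes.
streets = {
    "station": [("A", 4.0), ("B", 1.0)],
    "B": [("A", 2.0), ("hospital", 6.0)],
    "A": [("hospital", 1.0)],
}
route = shortest_route(streets, "station", "hospital")
```

The cheapest route is not the one with the fewest links: going through B and then A takes 4 minutes, while the direct options take 5 or 7.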

Sets of Constraints
Constraints are limits imposed on a model (network). For example, in an interaction model, a constraint could be that the number of trips generated from an origin to all destinations cannot exceed the origin’s production capacity.

Common constraints are as follows.
a. Set of resources (goods to be delivered)
b. One or more locations where the resources are located (several warehouses where the goods are located)
c. An objective to deliver the resources to a set of destinations (customer location data base)
d. Limits on how the objective can be met (is it economically feasible to deliver pizzas to Lincoln from a store in Omaha?).

Spread Functions
The Spread Function is simply the “best” way to get from point A to point B. “Best” can be the fastest, the most economical, or a subjective measurement such as the most scenic. It is an evaluation of phenomena that accumulate with distance.
Imagine a square in which you are going to travel from the lower left corner to the upper right corner. The straight-line distance is 1.414 times the side of the square, and the distance along the sides is 2.0 times the length of a side. If this square represented a pasture containing angry buffaloes, it would probably be beneficial to walk around the fenced perimeter and go the extra distance.
Output of this particular GIS function is sometimes referred to as ACCUMULATION SURFACE or FRICTION SURFACE. These concepts refer to the “effort” it takes to get from A to B. For example, if the square traversed was knee deep mud (or a lake) across the diagonal but dry at the perimeter, it would be “farther,” but “easier,” to again go the extra distance.
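The idea of effort accumulating with distance can be sketched on a raster. In this sketch the input grid holds the effort needed to enter each cell, and the output grid holds the least total effort to reach each cell from the start. The use of Dijkstra's algorithm, movement through edge-adjacent cells only, and the sample values are all assumptions made for illustration.

```python
import heapq

def accumulated_cost(friction, start):
    """Least accumulated effort from start to every cell.

    friction[r][c] is the effort to enter cell (r, c); movement is allowed
    between edge-adjacent cells only (a simplifying assumption).
    """
    rows, cols = len(friction), len(friction[0])
    cost = [[float("inf")] * cols for _ in range(rows)]
    cost[start[0]][start[1]] = 0.0
    queue = [(0.0, start)]
    while queue:
        current, (r, c) = heapq.heappop(queue)
        if current > cost[r][c]:
            continue  # a cheaper way to this cell was already found
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                candidate = current + friction[nr][nc]
                if candidate < cost[nr][nc]:
                    cost[nr][nc] = candidate
                    heapq.heappush(queue, (candidate, (nr, nc)))
    return cost

# Hypothetical pasture: easy going on the perimeter, knee-deep mud in the middle.
pasture = [[1.0, 1.0, 1.0],
           [1.0, 10.0, 1.0],
           [1.0, 1.0, 1.0]]
surface = accumulated_cost(pasture, start=(2, 0))  # lower left corner
```

The upper right corner is reached most cheaply around the perimeter, and the muddy center cell ends up with the highest accumulated value on the surface.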

Seek or Stream Functions
Seek and Stream are synonymous and refer to functions that are directed outward in a step by step manner using a specified decision rule. This procedure is initiated and proceeds until any further movement violates the decision rule.  This GIS function, as an example, could be used to evaluate erosion potential. The decision rule in this case would be elevation. As the process proceeds outward from the source (rainfall), the decision will always proceed downhill, never uphill. The path of least resistance best describes this function.  Sea level, interior drainage or the edge of the area analyzed will cause this function to terminate.
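The step-by-step downhill search can be sketched on a DEM grid. The decision rule assumed here is "move to the lowest of the eight neighboring cells, and stop when no neighbor is lower"; elevation differences are compared directly, without correcting for the longer diagonal distance. The function name and the sample elevations are hypothetical.

```python
def downhill_path(dem, start):
    """Follow the lowest neighbor from start until no neighbor is lower.

    Stops at a pit (interior drainage) or wherever the edge of the grid
    leaves no lower neighbor. Returns the list of (row, col) cells visited.
    """
    rows, cols = len(dem), len(dem[0])
    r, c = start
    path = [start]
    while True:
        neighbors = [(dem[nr][nc], (nr, nc))
                     for nr in range(max(0, r - 1), min(rows, r + 2))
                     for nc in range(max(0, c - 1), min(cols, c + 2))
                     if (nr, nc) != (r, c)]
        if not neighbors:
            return path
        lowest, cell = min(neighbors)
        if lowest >= dem[r][c]:
            return path  # decision rule violated: nothing lower to move to
        r, c = cell
        path.append(cell)

# Hypothetical DEM sloping from the upper left down to the lower right.
dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 4, 1]]
flow = downhill_path(dem, start=(0, 0))
```

Because each step must go strictly downhill, the search always terminates.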

Intervisibility Functions
This GIS function is typified by the phrase LINE OF SIGHT. It is a graphic depiction of the area that can be seen from the specified target areas. Areas visible from a scenic lookout or the required overlap of microwave transmission towers can be mapped using this procedure. Intervisibility functions rely on digital elevation data to define the surrounding topography. Applications such as landscape layouts, military planning, and communications are best served by this raster conceptualization. The output of this function is somewhat unique in that it is often displayed in a SIDE VIEW format. The vertical field of view and maximum viable distance are the component parameters. It is a powerful tool for trial and error analysis in which the placement of objects can constantly be re-evaluated. Offshoots of this type of procedure can produce graphics that exhibit three-dimensional perspective. SHADED RELIEF IMAGES or SHADED RELIEF MODELS, along with PERSPECTIVE VIEWS, are valuable presentational tools. The process called draping is used to apply another data set over this shaded depiction to further enhance presentation.
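A line-of-sight test can be sketched along a single elevation profile, which corresponds to the side view mentioned above. The sketch assumes equally spaced elevation samples starting at the observer, ignores earth curvature, and treats a sample as visible when the sight line to it is at least as steep as the sight line to every sample nearer the observer. Names and values are hypothetical.

```python
def visible_cells(profile, observer_height=0.0):
    """Return one boolean per sample: True if visible from the first sample.

    profile[0] is the ground elevation at the observer; observer_height is
    added to it to get the eye level.
    """
    eye = profile[0] + observer_height
    visible = [True]
    max_slope = float("-inf")
    for distance in range(1, len(profile)):
        # Slope of the sight line from the eye to this sample.
        slope = (profile[distance] - eye) / distance
        visible.append(slope >= max_slope)
        max_slope = max(max_slope, slope)
    return visible

# Hypothetical profile from a lookout: a rise at sample 2 hides sample 3.
profile = [100.0, 90.0, 95.0, 80.0, 110.0]
seen = visible_cells(profile, observer_height=2.0)
```

Running the same test along profiles radiating in all directions from the observer would build up the visible area as a map.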

Output Functions

Map Annotation
Map Annotation is best defined by inventory. Titles, Legends, Scale Bars, and North Arrows are all forms of Map Annotation. Functionally they are used to depict information concerning the map. The various programs available usually handle this as user input, and the software does not generate it. Flexibility in position, fonts, symbology, and size varies among the individual programs. Text labels are an important aspect of map viewing and are proprietary to the program. Sophistication is increasing, and actual hard copy maps can be enhanced with secondary software applications.

Texture Patterns and Line Styles
Texture patterns and line styles are difficult to alter from program guidelines, so initial analysis of the output should be considered when choosing software.

Graphic Symbols
Graphic Symbols are used to portray the various entities depicted on the map. Some software packages provide a simple standard symbol set, but do not allow user input.  Others store them within the GIS and they can be called upon to be used as needed. Still others assign a symbol to an attribute and allow the symbology to be automatically plotted. As before, the selection of the software and its application should be carefully considered as to the output presentation needed.

___________________________________________________________________________________________________________
References

ArcGIS Desktop Help

Aronoff, Stan (1989) Geographic Information Systems: A Management Perspective, Ottawa: WDL Publications.

Clarke, Keith (1996) Getting Started with Geographic Information Systems, Upper Saddle River, NJ: Prentice Hall.

Submitted by Michael L. Hauschild.

Updated by Karen Rock, Nov. 2004