# Data Quality

## Outline

1. Map sources and abstraction
• What maps are used for
• Relative accuracy
• Scale as a limiting factor
• Map audience
• Currency
• Maps at different scales
2. Map Accuracy
• Accuracy vs. Precision
• U.S. National Map Accuracy Standard
• National Standard for Spatial Data Accuracy
• Accuracy specifications for USGS land use / land cover maps
• USDA NRCS soil maps
• Attribute data
• Sources of Error
3. Now and into the future
• Analog to digital conversion
• Remote Sensing
4. Questions

## Map Sources and Abstraction

What Maps Are Used For

A large amount of GIS data is based on maps, many of which were created without any knowledge that they would later be used in a GIS.

Relative Accuracy

The accuracy of these maps is determined by the mapmaker, and the value of that accuracy is judged by the map's user. Someone looking for simple directions may not need extreme accuracy, whereas someone planning where to lay a gas line will. Just how accurate are the maps we use today?

Scale as a limiting factor

Maps are abstractions or generalizations of reality, and mapmakers may emphasize certain parts of a map to accomplish their goals. Users of GIS today are often unaware of the limitations of conventional maps. Consider the effect of map scale. At 1:50,000, a common scale for topographic maps, a feature must be at least 25 m (about 82 ft) wide to be drawn at its true size. A single traffic lane is typically about 10 ft wide, so only a road of nine or more lanes (90 ft) would be wide enough to appear at true scale. Most roads and streams are much narrower than this, yet they are still shown on the map, drawn with symbols wider than the features themselves. This abstraction, or generalization, of features affects the accuracy of the map. The minimum discernible mark (MDM) is the smallest size a feature can be and still be represented at a particular scale; each value below corresponds to a mark about 0.5 mm across on the printed map. The minimum discernible marks at several scales are:

| Map scale | MDM |
|---|---|
| 1:10,000 | 5 m |
| 1:50,000 | 25 m |
| 1:100,000 | 50 m |
| 1:250,000 | 125 m |
| 1:500,000 | 250 m |
| 1:1,000,000 | 500 m |
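Every MDM value in the table equals the scale denominator multiplied by about 0.5 mm, the assumed width of the smallest mark that can be drawn and read on paper. The Python sketch below is a minimal illustration under that assumption; the 0.5 mm constant and the 10 ft lane width are its only inputs, and the function names are hypothetical. It reproduces the table and the nine-lane road example for 1:50,000.

```python
import math

MDM_ON_MAP_MM = 0.5          # assumed smallest drawable mark on the printed map
FEET_PER_METRE = 1 / 0.3048
LANE_WIDTH_FT = 10           # typical single traffic lane


def mdm_ground_m(scale_denominator: int) -> float:
    """Ground distance (metres) covered by a 0.5 mm mark at 1:scale_denominator."""
    return scale_denominator * MDM_ON_MAP_MM / 1000


def min_lanes_visible(scale_denominator: int) -> int:
    """Fewest 10 ft lanes a road needs to be at least one MDM wide."""
    mdm_ft = mdm_ground_m(scale_denominator) * FEET_PER_METRE
    return math.ceil(mdm_ft / LANE_WIDTH_FT)


for denom in (10_000, 50_000, 100_000, 250_000, 500_000, 1_000_000):
    print(f"1:{denom:,}  MDM = {mdm_ground_m(denom):g} m, "
          f"road needs {min_lanes_visible(denom)} lanes to show at true width")
```

At 1:50,000 the MDM is 25 m (about 82 ft), so eight 10 ft lanes (80 ft) fall just short and nine are needed.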

Another problem arises when making soil and vegetation maps: the minimum mapping unit (MMU). The MMU is the smallest area that will be mapped as a separate unit; anything smaller is not shown on its own. If a vegetation polygon is too small to show, is it eliminated, or is it merged with a neighboring polygon? In either case, the accuracy of the map is lowered. Because of generalization like this, there is no simple way to state the accuracy of a map with a single figure. Map inaccuracies are most often related to map scale.
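The two choices for an undersized polygon, elimination or merging, can be sketched in Python. This is a toy model, not any GIS package's generalization tool: the `Patch` type, the 1 ha MMU, and the rule of merging into the largest surviving neighbour are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    pid: str
    cover: str
    area_ha: float
    neighbors: list = field(default_factory=list)  # ids of adjacent patches


def apply_mmu(patches, mmu_ha, merge=True):
    """Return a new map with patches smaller than the MMU removed.

    merge=True : each undersized patch's area is added to its largest
                 surviving neighbour (an illustrative rule, not a standard).
    merge=False: undersized patches are simply eliminated.
    """
    kept = {pid: Patch(p.pid, p.cover, p.area_ha, list(p.neighbors))
            for pid, p in patches.items() if p.area_ha >= mmu_ha}
    if merge:
        for p in patches.values():
            if p.area_ha >= mmu_ha:
                continue
            hosts = [kept[n] for n in p.neighbors if n in kept]
            if hosts:
                host = max(hosts, key=lambda q: q.area_ha)
                host.area_ha += p.area_ha
    return kept


# Hypothetical vegetation map: a small wetland between forest and grassland.
patches = {
    "A": Patch("A", "forest", 10.0, ["B"]),
    "B": Patch("B", "wetland", 0.5, ["A", "C"]),
    "C": Patch("C", "grassland", 4.0, ["B"]),
}
for label, result in (("merged", apply_mmu(patches, 1.0)),
                      ("eliminated", apply_mmu(patches, 1.0, merge=False))):
    print(label, {pid: (p.cover, p.area_ha) for pid, p in result.items()})
```

Either way the wetland disappears from the map; merging also overstates the forest by 0.5 ha, while elimination leaves 0.5 ha unaccounted for.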

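Digital databases used in GIS can seem to avoid these problems because they appear independent of scale: they can easily be zoomed in to depict very small areas or lines. However, data digitized from a map keep the accuracy of the source map, and zooming in does not restore detail that was generalized away.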
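One way to see why zooming does not add accuracy is to carry a source-map error through to the screen. This sketch assumes, hypothetically, a positional error equal to one 0.5 mm mark on a 1:24,000 source map, which is 12 m on the ground.

```python
def error_on_display_mm(source_denom, display_denom, source_error_mm=0.5):
    """Size on screen (mm) of an error that measured source_error_mm on the source map."""
    ground_error_m = source_error_mm * source_denom / 1000   # the error on the ground
    return ground_error_m * 1000 / display_denom             # the same error, redrawn


for display in (24_000, 10_000, 2_000):
    print(f"shown at 1:{display:,}: error is "
          f"{error_on_display_mm(24_000, display):.1f} mm on screen")
```

The 12 m ground error never shrinks; zooming to 1:2,000 simply draws it 6 mm wide instead of 0.5 mm.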

Map Audience

The audience viewing or using a map controls its complexity, and different viewers may be interested in different features. A tourist looking at a road atlas map of a U.S. state may want only the simplest route from point A to point B, while a GPS mapper may pay close attention to detail and to alternative routes between the two points.

Currency

Currency reflects the idea that maps are outdated as soon as they are created. Paper maps can become out of date quickly. With GIS, maps can be updated easily, which keeps them more current and therefore more accurate.

Maps at Different Scales

Map scales vary by application. They depend on the part of the world being mapped, the type of map, and the financial resources of the area being mapped. In the United States, general-purpose maps are generally at scales of 1:24,000 to 1:100,000, whereas in England maps at scales as large as 1:1,250 are used. Soil maps in the U.S. are generally produced at scales of 1:15,840 to 1:31,680 (4 and 2 inches to the mile), while vegetation and elevation maps use yet other scale ranges. Geologic maps in the U.S. tend to be made only where the land has economic value; as a result, less than half of the U.S. has been mapped geologically at scales of 1:250,000 or larger. Many poorer countries lack both topographic and thematic maps, and the U.N. has been using satellite images to help these countries keep up.

## Map Accuracy

Accuracy vs. Precision

The concepts of accuracy and precision must be understood. Accuracy is how close an observation is to reality, while precision refers to the ability to reproduce a process or measurement. A feature on a map may be drawn with great precision yet still be inaccurate in its location.
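A common way to put numbers on the distinction is to measure the same point several times: the offset of the average position from the true position reflects accuracy, and the scatter of the readings about their own average reflects precision. A minimal Python sketch with hypothetical readings of one map corner:

```python
import math
import statistics


def accuracy_and_precision(measurements, true_xy):
    """Return (bias, spread) for repeated (x, y) measurements of one point.

    bias   - distance from the mean measured position to the true position
             (small bias = accurate)
    spread - RMS distance of the measurements from their own mean
             (small spread = precise)
    """
    xs = [x for x, _ in measurements]
    ys = [y for _, y in measurements]
    mean_xy = (statistics.fmean(xs), statistics.fmean(ys))
    bias = math.dist(mean_xy, true_xy)
    spread = math.sqrt(statistics.pvariance(xs) + statistics.pvariance(ys))
    return bias, spread


# Hypothetical: a corner digitized four times, consistently about 5 m east of truth.
readings = [(105.0, 200.1), (105.1, 199.9), (104.9, 200.0), (105.0, 200.0)]
bias, spread = accuracy_and_precision(readings, true_xy=(100.0, 200.0))
print(f"bias = {bias:.2f} m, spread = {spread:.2f} m")  # bias = 5.00 m, spread = 0.10 m
```

The readings agree with each other to about 0.1 m (precise) but sit about 5 m from the true corner (inaccurate).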

U.S. National Map Accuracy Standard

In 1941, the U.S. Bureau of the Budget issued standards to ensure the accuracy of published maps, and it revised them in 1947. These standards are available from the USGS website in both digital and printed form.

The following were the standards:

1. For maps at publication scales larger than 1:20,000, not more than 10% of the well-defined points tested may be in error by more than 1/30 inch; at scales of 1:20,000 or smaller, the limit is 1/50 inch.

2. No more than 10% of the elevations tested (on a contour map) may be in error by more than one-half the contour interval.

3. Accuracy is to be tested by comparison with a survey of higher accuracy.

These three rules leave many details unspecified. There is no standard for roads and road classifications, lengths, directions, and so on, and rivers, streams, and a host of other map objects are not tested for accuracy.
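The two numeric NMAS rules can be written as simple pass/fail checks. This is a sketch, not an official test procedure: the error values are hypothetical, errors are assumed to be already measured in ground metres against a survey of higher accuracy, and the tolerance function also covers the NMAS limit of 1/30 inch for publication scales larger than 1:20,000.

```python
INCH_M = 0.0254  # metres per inch


def nmas_horizontal_tolerance_m(scale_denominator):
    """Ground tolerance for well-defined points under NMAS:
    1/30 inch at publication scales larger than 1:20,000,
    1/50 inch at 1:20,000 or smaller."""
    limit_in = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return limit_in * INCH_M * scale_denominator


def passes_nmas(errors, tolerance):
    """True if no more than 10% of the tested errors exceed the tolerance."""
    exceeding = sum(1 for e in errors if abs(e) > tolerance)
    return exceeding <= 0.10 * len(errors)


# Hypothetical check of a 1:24,000 map: horizontal errors (m) at 10 test points.
tol = nmas_horizontal_tolerance_m(24_000)          # about 12.2 m (40 ft)
horizontal_errors = [2.1, 4.8, 3.3, 7.9, 1.2, 5.5, 9.0, 6.4, 3.7, 14.6]
print(f"tolerance {tol:.1f} m, passes: {passes_nmas(horizontal_errors, tol)}")

# Vertical: no more than 10% may exceed half the (hypothetical) contour interval.
contour_interval_m = 6.0
vertical_errors = [0.5, -1.2, 2.8, -0.9, 3.4, 1.1, -3.1, 0.3, -1.7, 2.2]
print("vertical passes:", passes_nmas(vertical_errors, contour_interval_m / 2))
```

One horizontal point in ten exceeds 12.2 m, which the 10% rule allows; two elevations exceed 3 m, so the vertical test fails.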

National Standard for Spatial Data Accuracy

The National Standard for Spatial Data Accuracy (NSSDA), issued in 1998, replaced the National Map Accuracy Standards of 1941/47. It is available from the Federal Geographic Data Committee (FGDC) website and provides statistical testing methods for determining map accuracy. Unlike the 1941/47 standards, the NSSDA expresses accuracy in ground units rather than map units.
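Under the NSSDA, positional accuracy is reported in ground units at the 95% confidence level, computed from the root-mean-square error (RMSE) between the data set and an independent check survey of higher accuracy at no fewer than 20 check points. A minimal sketch, assuming RMSE_x and RMSE_y are roughly equal (the case for which the standard's horizontal factor of 1.7308 applies) and using hypothetical residuals:

```python
import math


def _rmse(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def nssda_horizontal(residuals):
    """Horizontal accuracy at 95% confidence, in ground units.

    residuals: (dx, dy) differences between the data set and the check survey.
    Assumes RMSE_x is about equal to RMSE_y, so Accuracy_r = 1.7308 * RMSE_r.
    """
    if len(residuals) < 20:
        raise ValueError("NSSDA calls for at least 20 check points")
    rmse_x = _rmse([dx for dx, _ in residuals])
    rmse_y = _rmse([dy for _, dy in residuals])
    rmse_r = math.sqrt(rmse_x ** 2 + rmse_y ** 2)
    return 1.7308 * rmse_r


def nssda_vertical(dz):
    """Vertical accuracy at 95% confidence: 1.9600 * RMSE_z."""
    if len(dz) < 20:
        raise ValueError("NSSDA calls for at least 20 check points")
    return 1.9600 * _rmse(dz)


residuals = [(3.0, -4.0), (-3.0, 4.0)] * 10     # hypothetical, metres
dz = [0.5, -0.5] * 10                           # hypothetical, metres
print(f"horizontal accuracy (95%): {nssda_horizontal(residuals):.2f} m")
print(f"vertical accuracy (95%): {nssda_vertical(dz):.2f} m")
```

Because the result is a ground distance, it can be compared directly across data sets compiled at different map scales.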

Accuracy Specifications for USGS Land Use / Land Cover Maps

The USGS published accuracy specifications for its land use and land cover maps. The following guidelines must be met:

1. 85% is the minimum level of accuracy in identifying land use and land cover categories.

2. The several categories shown should have about the same accuracy.

3. Accuracy should be maintained between interpreters and times of sensing.

In short, up to 15% of land use or land cover classifications may be incorrect, the errors may not all fall in one land use or land cover type, and accuracy must be consistent between interpreters and between dates of sensing. One may reasonably question whether these specifications are stringent enough.
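The first two specifications can be checked from a sample of sites whose true class is known. In this sketch the per-class figure is the share of each reference class mapped correctly (producer's accuracy); because the specification says only "about the same", the 10-point `max_class_gap` threshold is a hypothetical choice, as are the sample data.

```python
from collections import Counter


def lulc_accuracy(pairs, min_overall=0.85, max_class_gap=0.10):
    """pairs: (reference_class, mapped_class) for each checked sample site.

    Returns overall accuracy, per-class accuracy, and whether the map meets
    the 85% minimum with classes of roughly equal accuracy.
    """
    overall = sum(ref == mapped for ref, mapped in pairs) / len(pairs)
    totals = Counter(ref for ref, _ in pairs)
    correct = Counter(ref for ref, mapped in pairs if ref == mapped)
    per_class = {c: correct[c] / n for c, n in totals.items()}
    gap = max(per_class.values()) - min(per_class.values())
    return overall, per_class, overall >= min_overall and gap <= max_class_gap


balanced = ([("forest", "forest")] * 45 + [("forest", "urban")] * 5 +
            [("urban", "urban")] * 44 + [("urban", "forest")] * 6)
lopsided = ([("forest", "forest")] * 50 +
            [("urban", "urban")] * 37 + [("urban", "forest")] * 13)
for name, sample in (("balanced", balanced), ("lopsided", lopsided)):
    overall, per_class, ok = lulc_accuracy(sample)
    print(name, f"overall={overall:.0%}", per_class, "meets spec:", ok)
```

Both samples clear 85% overall, but the lopsided one concentrates its errors in the urban class and fails the second specification.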

USDA NRCS soil maps

The following is a list of the specifications that must be followed when making a soil map:

1. Up to 25% of a polygon may consist of soil types other than the one named, provided these other soils do not significantly hinder land management.

2. If the other soil types do hinder land management, the limit is 10%.

Again, whether these accuracy requirements for soil maps are adequate is open to question, and they may need to be revised.
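The soil map rule can be sketched as a per-polygon check. The reading encoded here is an assumption: other soils may make up 25% of the polygon when none of them hinders land management, and only 10% when any of them does. The soil names and fractions are hypothetical.

```python
def soil_polygon_ok(composition, named, limiting):
    """Check one polygon against the 25% / 10% rule (assumed reading).

    composition: soil type -> fraction of the polygon's area
    named:       soil type the polygon is labelled with
    limiting:    set of soil types that would hinder land management
    """
    others = {soil: frac for soil, frac in composition.items() if soil != named}
    total_other = sum(others.values())
    limit = 0.10 if any(soil in limiting for soil in others) else 0.25
    return total_other <= limit


poly = {"Alpha loam": 0.8, "Beta sand": 0.2}   # hypothetical polygon
print(soil_polygon_ok(poly, "Alpha loam", limiting=set()))          # True
print(soil_polygon_ok(poly, "Alpha loam", limiting={"Beta sand"}))  # False
```

The same 20% of "Beta sand" is acceptable or not depending only on whether it limits how the land can be managed.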

A cadastre is a database of land ownership. In the U.S., a high level of accuracy is expected of these files and maps for financial reasons. In many other countries, however, legal deeds are not maintained, and land boundaries are still defined by metes and bounds. Metes and bounds describes a parcel by distances and directions between landmarks such as trees, rocks, or other natural markers. The problem is that the natural world is dynamic, and these markers change or disappear over time. The U.S. once followed the European metes-and-bounds system, but most of the landmarks have since disappeared, and property descriptions have had to be updated.
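A metes-and-bounds description is essentially a traverse: a series of courses that should return to the point of beginning. The Python sketch below turns courses into corner coordinates, then reports the misclosure and the enclosed area. Assumptions: the parcel is hypothetical, and courses are given as azimuths (degrees clockwise from north) rather than the quadrant bearings such as "N 45° E" used in actual deeds.

```python
import math


def traverse(courses, start=(0.0, 0.0)):
    """Convert (azimuth_deg, distance) courses into corner coordinates.

    Azimuths are degrees clockwise from north; x is east, y is north.
    """
    x, y = start
    corners = [(x, y)]
    for azimuth_deg, distance in courses:
        a = math.radians(azimuth_deg)
        x += distance * math.sin(a)
        y += distance * math.cos(a)
        corners.append((x, y))
    return corners


def misclosure(corners):
    """Gap between where the last course ends and the point of beginning."""
    return math.dist(corners[0], corners[-1])


def shoelace_area(corners):
    """Parcel area, closing the last true corner back to the first."""
    pts = corners[:-1]
    s = sum(x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2


# Hypothetical 100 m square parcel, then the same deed with one distance misread.
deed = [(0, 100), (90, 100), (180, 100), (270, 100)]
misread = deed[:-1] + [(270, 98)]
for name, courses in (("deed", deed), ("misread", misread)):
    c = traverse(courses)
    print(f"{name}: misclosure {misclosure(c):.2f} m, area {shoelace_area(c):,.0f} m^2")
```

A description that fails to close, as in the misread case, signals an error in one of the calls, which becomes hard to resolve once the original landmarks are gone.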

Attribute Data

Much attribute data, particularly population data, comes from the census. The problem with census data is that the methods used to obtain it are sometimes questioned. Today much census data is based on sampling of populations. Using statistical methods, the population of an area can be estimated by finding the average number of people per unit area in sampled locations and multiplying it by the size of the whole area. This gives a fairly accurate estimate of the number of people living in the area, but it does not account for every person living there. Many people simply don't want to be counted, for example people living in the country illegally or people avoiding the government. The counting is also done by people, which adds the element of human error. Some areas are undercounted because census takers hurry to get out of them. The result is maps and databases that may not depict populations accurately.
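The density-times-area estimate described above can be sketched in a few lines. The plot counts, plot sizes, total area, and the 10% undercount are all hypothetical; the point is that any systematic miss in the sampled plots is scaled up to the whole area.

```python
def estimate_population(sample_counts, sample_areas_km2, total_area_km2):
    """Scale the pooled density of the sampled plots up to the whole area."""
    density = sum(sample_counts) / sum(sample_areas_km2)   # people per km^2
    return density * total_area_km2


areas = [1.0, 1.0, 1.0]                              # three 1 km^2 sample plots
full_count = [120, 80, 100]                          # everyone actually living there
undercount = [round(c * 0.9) for c in full_count]    # 10% missed in each plot
print(estimate_population(full_count, areas, 50.0))  # 5000.0
print(estimate_population(undercount, areas, 50.0))  # 4500.0
```

A 10% undercount in the sampled plots becomes a 10% undercount for the entire 50 km² area.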

Attribute data is also temporal, meaning it quickly becomes outdated. A census is taken only once every 10 years, and because population data is so dynamic, many changes can occur in that time. Relying on a population map that is five or more years old can mislead a user who assumes the map is accurate.

Sources of Error

Map errors can be traced to many different sources. Common ones include problems with source data, poor conceptual representation of reality, data processing, data output, data conversion, and data encoding.

## Now and Into the Future

Analog to Digital Conversion

When a map is converted from analog to digital form, some accuracy is lost. Digitizing produces a caricature of each line on the map, not a precise representation. Human error can also occur during digitizing: a large map can take hours to complete, and fatigue makes mistakes by the technician more likely.
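The caricature effect follows from recording a curved line as a finite number of vertices joined by straight segments. This sketch uses a hypothetical semicircular stream bend of radius 100 m and shows how much length is lost with few vertices.

```python
import math


def digitized_length(radius, n_vertices):
    """Length of a semicircle recorded as n_vertices joined by straight segments."""
    step = math.pi / (n_vertices - 1)
    pts = [(radius * math.cos(i * step), radius * math.sin(i * step))
           for i in range(n_vertices)]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


true_length = math.pi * 100.0   # semicircular bend, radius 100 m
for n in (3, 5, 9, 33):
    d = digitized_length(100.0, n)
    print(f"{n:>2} vertices: {d:7.2f} m ({100 * (1 - d / true_length):.1f}% short)")
```

With only three vertices the digitized line is about 10% shorter than the real bend; adding vertices reduces the loss but never removes it.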

Remote Sensing

Although remote sensing is state-of-the-art technology, it too can introduce considerable error into the data supplied to a GIS. Remote sensing works best when pixels are pure, meaning that each pixel covers an area of a single cover type. This is rarely the case: the Earth's surface is very diverse, and more than one cover type often contributes to a pixel's reflectance value. Classification methods such as the parallelepiped classifier nonetheless assign each pixel, even one with several cover types represented in it, to a single class. With remote sensing it is often difficult to achieve better than 85% accuracy.
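A parallelepiped classifier assigns a pixel to a class when every band value falls inside that class's minimum-maximum range (its "box"). The sketch below uses hypothetical two-band boxes for water and forest and assumes reflectance mixes linearly, to show what happens to mixed pixels.

```python
# Hypothetical class boxes: (min, max) digital number in each band (red, NIR).
BOXES = {
    "water":  [(10, 30), (5, 20)],
    "forest": [(20, 40), (80, 120)],
}


def classify(pixel, boxes=BOXES):
    """Parallelepiped rule: the first class whose box holds every band value."""
    for label, ranges in boxes.items():
        if all(lo <= v <= hi for v, (lo, hi) in zip(pixel, ranges)):
            return label
    return "unclassified"


def mix(a, b, frac_a):
    """Band values of a pixel covered frac_a by cover a and the rest by cover b
    (a simple linear mixing assumption)."""
    return tuple(frac_a * va + (1 - frac_a) * vb for va, vb in zip(a, b))


water, forest = (20, 12), (30, 100)
for frac in (1.0, 0.85, 0.5, 0.0):
    px = mix(forest, water, frac)
    print(f"{frac:.0%} forest -> {classify(px)}")
```

The 85% forest pixel is labelled entirely forest, and the half-and-half pixel fits neither box, so both mixed cases feed error into the GIS layer.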

Metadata is most simply described as data about data. It is important for determining the source, scale, accuracy, and intended audience of a particular data set. The Federal Geographic Data Committee publishes metadata standards on its website. In GIS, where the data layers used in one application can come from multiple sources, it is important to keep track of each layer's specifications. ESRI recognized the importance of metadata and included metadata templates with its ArcGIS software. These templates let GIS data "producers" and "consumers" easily access information about their data.
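Keeping metadata with each layer makes it possible to check what a combined analysis can honestly claim. In this sketch the field names are illustrative, not the FGDC metadata schema or ESRI's templates, and the layers and accuracy values are hypothetical. It applies the rule of thumb that an overlay is no better than its coarsest, least accurate input.

```python
from dataclasses import dataclass


@dataclass
class LayerMetadata:
    """A few 'data about data' fields (illustrative names only)."""
    title: str
    source: str
    source_scale_denominator: int
    horizontal_accuracy_m: float
    intended_audience: str


def overlay_limits(layers):
    """Coarsest source scale and worst accuracy among the input layers."""
    coarsest = max(layers, key=lambda m: m.source_scale_denominator)
    worst = max(m.horizontal_accuracy_m for m in layers)
    return coarsest.source_scale_denominator, worst


layers = [
    LayerMetadata("Roads", "digitized 1:24,000 quads", 24_000, 12.2, "planners"),
    LayerMetadata("Soils", "county soil survey", 15_840, 8.0, "farmers"),
    LayerMetadata("Land cover", "classified satellite image", 100_000, 30.0, "analysts"),
]
print(overlay_limits(layers))   # (100000, 30.0)
```

Without the metadata, nothing in the merged layer would reveal that its weakest input was compiled at 1:100,000.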

## Questions

1. Explain the idea of a minimum discernible mark and how it can lead to problems when making a digital map from a conventional map.

2. Explain what a minimum mapping unit is and how it can be a negative factor when digitizing a map.

3. Why is so little of the U.S. represented with geology maps?

4. Why are cadastral maps and databases so important in the United States compared to less developed countries?

5. Explain why the accuracy of remote sensing is less than the technology suggests when it comes to map making.

Updated by Julie Holtz, 2004.