GIS Data Formats and Data Conversion


Outline

  1. Vector Formats
  2. Raster Formats
  3. Data Conversion
  4. Data Standards
  5. Questions
  6. References

 

Vector Formats

Hardware Specific Formats

Vector formats fall into two types: those that preserve and use the actual ground coordinates of the data, and those that describe the map in page coordinates instead. Page coordinates are used when a map is being drafted for display in a computer mapping program or in a GIS data display module. Early formats of this kind were tied to particular output devices; device-independent programs began to appear in the late 1970s.

The Hewlett-Packard Graphics Language (HPGL) is a page description language designed for use with plotters and printers. Each line of the file contains one move command, so each line segment is drawn between the points given on two successive lines. The format is unstructured and does not store or use topology.
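
As a rough illustration of how such a file becomes line work, the sketch below reads a small subset of HPGL pen commands (PU for pen up, PD for pen down, each followed by absolute x,y page coordinates) and returns the line segments they draw. The function name and the sample string are made up for this example, and real plot files use many more commands than this handles.

```python
def hpgl_to_segments(text):
    """Convert a tiny subset of HPGL (PU/PD with absolute x,y) into segments."""
    segments = []
    position = None          # current pen position in page (plotter) units
    for command in text.split(";"):
        command = command.strip()
        if len(command) < 2:
            continue
        opcode, args = command[:2].upper(), command[2:]
        if opcode not in ("PU", "PD") or not args:
            continue         # ignore everything outside this sketch's subset
        values = [float(v) for v in args.split(",")]
        for point in zip(values[0::2], values[1::2]):
            # A pen-down move draws a segment from the previous position.
            if opcode == "PD" and position is not None:
                segments.append((position, point))
            position = point
    return segments


sample = "IN;SP1;PU0,0;PD100,0;PD100,100;PU200,200;PD300,200;"
segments = hpgl_to_segments(sample)
for start, end in segments:
    print(start, "->", end)
```

Note that the result is only a list of unconnected segments in page units: nothing records which segments share a node or bound the same area, which is what "no topology" means in practice.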

PostScript

PostScript is a page description language developed by Adobe. It is usually used to export or print a finished map rather than to exchange data. It supports both vector and raster graphics, and most printers are able to read it.

Drawing Exchange Format (DXF)

DXF is an external format for transferring drawings between computers or between software packages. It originated with AutoCAD. It does not store topology, but it records drawing detail well: line widths and styles, colors, and text. A DXF file is typically organized into 64 layers, each holding a different set of features, so the user can separate features by layer.
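
The layer idea can be sketched in a few lines. An ASCII DXF file is a sequence of two-line pairs: a numeric group code followed by a value. The example below assumes only the common codes for a LINE entity (0 for entity type, 8 for layer name, 10/20 for the start x/y, 11/21 for the end x/y) and groups the lines by layer. The function name and the sample data are illustrative, and a real file has header and table sections that this sketch does not handle.

```python
from collections import defaultdict


def lines_by_layer(dxf_text):
    """Group LINE entities by layer name from ASCII DXF code/value pairs."""
    rows = [row.strip() for row in dxf_text.strip().splitlines()]
    pairs = zip(rows[0::2], rows[1::2])      # (group code, value)
    layers = defaultdict(list)
    entity = None

    def flush(e):
        # Keep only LINE entities; every other entity type is skipped.
        if e and e["type"] == "LINE":
            start = (float(e["10"]), float(e["20"]))
            end = (float(e["11"]), float(e["21"]))
            layers[e.get("8", "0")].append((start, end))

    for code, value in pairs:
        if code == "0":                      # group code 0 starts a new entity
            flush(entity)
            entity = {"type": value}
        elif entity is not None:
            entity[code] = value
    flush(entity)
    return dict(layers)


sample = "\n".join([
    "0", "LINE", "8", "STREETS", "10", "0.0", "20", "0.0", "11", "50.0", "21", "0.0",
    "0", "LINE", "8", "POWER", "10", "5.0", "20", "5.0", "11", "5.0", "21", "40.0",
    "0", "ENDSEC",
])
layers = lines_by_layer(sample)
print(sorted(layers))
```

The layers come back as independent lists of coordinates. Nothing in the file says that a line on one layer touches or follows a line on another.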

Omaha Public Power District uses this kind of software: a turnkey system with street and power line layers. Because the format lacks topology and spatial analysis, you cannot tell which street a power line is on or closest to.

Digital Line Graph (DLG)

DLGs are distributed by the U.S. Geological Survey (USGS) and are available at 1:100,000 and 1:24,000 scales. Features are stored in separate files that most GIS packages will import, although extra data manipulation is often necessary. DLGs consist of line work with the contours removed, so elevation is not available.

TIGER

The TIGER format was first distributed by the U.S. Census Bureau in 1990. It includes block-level maps of every village, town, and city in the United States, with geocoded block faces that carry street address ranges. This means that the files include topology and can be used for address matching. The maps combine DLG and GBF/DIME files: the Census Bureau's 1980 maps were merged with the USGS DLG maps, thus covering both urban and nonurban areas.

TIGER consists of an arc/node arrangement with separate files for points (zero cells), lines (one cells), and areas (two cells) that are linked together by cross-indexing. Cross-indexing also means that some features can be encoded as landmarks, which allow GIS layers to be tied together.
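
Address matching against a block face is commonly done by interpolating along the street segment between the low and high house numbers. The sketch below shows that idea only. It assumes a straight segment and evenly spaced addresses, and the function and parameter names are hypothetical rather than TIGER field names.

```python
def geocode_on_segment(house_number, from_addr, to_addr, start_xy, end_xy):
    """Linearly interpolate a house number along one street segment."""
    low, high = sorted((from_addr, to_addr))
    if not low <= house_number <= high:
        return None                      # number is outside this block face
    if from_addr == to_addr:
        fraction = 0.5                   # single-address range: use the midpoint
    else:
        fraction = (house_number - from_addr) / (to_addr - from_addr)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return (x, y)


# A block face running from address 100 at (0, 0) to address 200 at (100, 0).
location = geocode_on_segment(150, 100, 200, (0.0, 0.0), (100.0, 0.0))
print(location)
```

A production geocoder would also pick the left or right side of the street from the odd or even number and follow the shape of a curved segment.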

Shapefile

A shapefile is a vector data format for storing the location, shape, and attributes of geographic features. A shapefile is stored in a set of related files and contains one feature class.
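
The related files normally include a main file (.shp) with the geometry, an index file (.shx), and a dBASE table (.dbf) with the attributes. As a small sketch of what the main file looks like inside, the code below decodes the fixed 100-byte header that begins every .shp file, following the published shapefile layout (file code 9994, version, shape type, bounding box). The function name and the synthetic header used to exercise it are assumptions made for this example.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(header_bytes):
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header_bytes) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # The file code and file length are big-endian; the rest is little-endian.
    (file_code,) = struct.unpack(">i", header_bytes[0:4])
    (length_words,) = struct.unpack(">i", header_bytes[24:28])
    version, shape_type = struct.unpack("<ii", header_bytes[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header_bytes[36:68])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    return {
        "length_bytes": length_words * 2,    # length is stored in 16-bit words
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header for an empty polygon shapefile, built only for this demo.
demo_header = (
    struct.pack(">i", 9994) + b"\x00" * 20 + struct.pack(">i", 50)
    + struct.pack("<ii", 1000, 5)
    + struct.pack("<4d", -96.5, 41.0, -95.5, 41.5)
    + b"\x00" * 32                           # Z and M ranges, unused here
)
header = read_shp_header(demo_header)
print(header)
```

The single shape type in the header is the reason a shapefile holds only one feature class.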

Scalable Vector Graphics

SVG is a vector image format written in XML. Any program that can render SVG, including most web browsers, can display the image. The "scalable" part of the name emphasizes that you can zoom in on an image without losing resolution. For line drawings such as maps, SVG files are also often smaller, and so arrive faster, than conventional files in formats such as GIF, PDF, and JPEG.
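
Because SVG is plain XML text, a simple map feature can be written out with ordinary string handling. The sketch below turns a list of page coordinates into a minimal SVG document containing one polyline. The function name, canvas size, and styling are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET


def polyline_svg(points, width=200, height=200):
    """Return a minimal SVG document that draws one polyline."""
    coords = " ".join(f"{x},{y}" for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline points="{coords}" fill="none" stroke="black"/>'
        "</svg>"
    )


svg_text = polyline_svg([(10, 10), (100, 40), (180, 150)])
root = ET.fromstring(svg_text)      # confirms the output is well-formed XML
print(svg_text)
```

The file stores the vertices themselves rather than pixels, which is why the drawing stays sharp at any zoom level.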

ArcInfo Coverage

This is a data model for storing geographic features using ArcInfo software. A coverage stores a set of thematically associated data considered to be a unit. It usually represents a single layer, such as soils, streams, roads, or land use. In a coverage, features are stored as both primary features (points, arcs, polygons) and secondary features (tics, links, annotation). Feature attributes are described and stored independently in feature attribute tables. Coverages cannot be edited in ArcGIS.

ArcInfo Interchange File (.e00)

An ArcInfo interchange file, also known as an export file, is a file format used to enable a coverage, grid or TIN and an associated INFO table to be transferred between different machines. ArcInfo interchange files have a .e00 extension, which increments to .e01, .e02, and so on, if the interchange file is composed of several separate files.

GeoDatabase

A geodatabase is an object-oriented data model that represents geographic features and their attributes as objects, along with the relationships between those objects, and is hosted inside a relational database management system. A geodatabase can store objects such as feature classes, feature datasets, nonspatial tables, and relationship classes.

Raster Formats

Standard Raster Format

Many raster formats are based on photographic image formats. The file typically begins with a fixed-length header that contains a keyword or "magic number" identifying the format. The header also gives the length of one record in bits and the number of rows and columns. Often the header includes a color table as well, which specifies the color in which each pixel value is displayed.
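
The "magic number" test is easy to show in code. The sketch below checks the first few bytes of a file against the well-known signatures of GIF, TIFF, and JPEG. The table and function name are illustrative, and the list is far from complete.

```python
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
]


def identify_raster(first_bytes):
    """Guess a raster format from the bytes at the very start of the file."""
    for signature, name in SIGNATURES:
        if first_bytes.startswith(signature):
            return name
    return "unknown"


print(identify_raster(b"GIF89a\x10\x00\x10\x00"))
print(identify_raster(b"II*\x00\x08\x00\x00\x00"))
```

To use it on a real file, read the first eight or so bytes in binary mode and pass them to the function.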

Tagged Image File Format (TIFF)

This format is associated with scanners, which use it to save scanned images. TIFF can use run-length encoding and other image compression schemes. Unlike GIF, it is not limited to 256 colors.
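
Run-length encoding replaces each run of identical pixel values with a count and the value. The sketch below shows the general idea on one row of pixels. It is not the specific byte-level scheme that TIFF uses, and the function names are made up for this example.

```python
def rle_encode(row):
    """Collapse a row of pixel values into (count, value) runs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1             # extend the current run
        else:
            runs.append([1, value])      # start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (count, value) runs back into the original row."""
    row = []
    for count, value in runs:
        row.extend([value] * count)
    return row


row = [0, 0, 0, 0, 7, 7, 0, 0, 0]
encoded = rle_encode(row)
print(encoded)
```

Scanned maps and classified rasters compress well this way because long stretches of a row share one value. A noisy photograph gains little.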

GeoTIFF

GeoTIFF is an extension of TIFF that stores georeferencing information in the file header, tying the edges of the pixels to ground coordinates such as latitude and longitude.
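
Once the header gives the ground coordinate of the image's upper-left corner and the ground size of a pixel, any pixel can be located on the ground. The sketch below assumes a north-up image with no rotation. The function and parameter names are hypothetical and are not GeoTIFF tag names.

```python
def pixel_to_map(col, row, origin_x, origin_y, pixel_width, pixel_height):
    """Ground coordinates of the upper-left corner of pixel (col, row)."""
    x = origin_x + col * pixel_width
    y = origin_y - row * pixel_height    # rows count downward, so y decreases
    return (x, y)


# Image whose upper-left corner sits at longitude -96.0, latitude 41.5,
# with pixels a quarter of a degree on a side.
corner = pixel_to_map(2, 4, -96.0, 41.5, 0.25, 0.25)
print(corner)
```

Adding half a pixel to the column and row gives the pixel center instead of its corner.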

Graphics Interchange Format (GIF)

GIF is a file format for image files, commonly used on the Internet. It is well suited to images with sharp edges and relatively few gradations of color.

Joint Photographic Experts Group (JPEG)

JPEG is a common picture format. It uses a variable-resolution compression system offering both partial and full resolution recovery.

DEM

Digital Elevation Models (DEMs) are available in two forms. The first is 30-meter elevation data derived from the 1:24,000 seven-and-a-half-minute quadrangle maps. The second is 3 arc-second digital terrain data at 1:250,000 scale. DEMs are produced by the National Mapping Division of the USGS.

Band Interleaved by Pixel (BIP), Band Interleaved by Line (BIL)

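BIP and BIL are formats produced by remote sensing systems. The primary difference between them is the order in which they store the brightness values captured simultaneously in each of several colors or spectral bands. BIP stores all of the band values for one pixel together before moving to the next pixel; BIL stores one full line of pixels for the first band, then the same line for the next band, and so on.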
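
The two storage orders come down to different index arithmetic. The sketch below computes where the value for a given row, column, and band sits in a flat array under each layout, and uses those indices to reorder a tiny BIP array into BIL. The function names and the sample data are made up for illustration.

```python
def bip_index(row, col, band, n_cols, n_bands):
    """Position of a value when all bands of each pixel are stored together."""
    return (row * n_cols + col) * n_bands + band


def bil_index(row, col, band, n_cols, n_bands):
    """Position of a value when each line is stored band after band."""
    return (row * n_bands + band) * n_cols + col


def bip_to_bil(values, n_rows, n_cols, n_bands):
    """Reorder a flat BIP array into BIL order."""
    out = [None] * len(values)
    for row in range(n_rows):
        for col in range(n_cols):
            for band in range(n_bands):
                source = bip_index(row, col, band, n_cols, n_bands)
                target = bil_index(row, col, band, n_cols, n_bands)
                out[target] = values[source]
    return out


# One line of two pixels (p0, p1), each with two bands (b0, b1).
bip = ["p0b0", "p0b1", "p1b0", "p1b1"]
bil = bip_to_bil(bip, n_rows=1, n_cols=2, n_bands=2)
print(bil)
```

BIP suits work on the full spectrum of one pixel at a time, while BIL keeps each band of a scan line contiguous.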

RS Landsat

Landsat satellite imagery and BIL information are used in RS Landsat. In one format, using BIL, pixel values from each band are pulled out and combined. Programs that use this kind of information include IDRISI, GRASS, and MapFactory. It is fairly easy to exchange information among these raster formats.

Data Conversion

Raster-to-Raster & Vector-to-Vector

Many vector formats are used in GIS, and even more raster formats. It is often necessary to convert between file formats, even when both are raster or both are vector, to make data sets usable together. Many free and commercial translators and converters are available on the web. Some GIS programs also support this type of conversion; for example, the conversion tools in ArcGIS can switch between a number of formats.

Raster-to-Vector & Vector-to-Raster

Moving from vector to raster is not that difficult: each cell that a line or polygon covers is simply given a pixel value. The opposite is not true. One line might be several pixels wide, so it has to be skeletonized (thinned to a single-pixel centerline), which often leaves it very jagged. This is a time-consuming and complicated procedure. Sometimes the conversion is not possible at all, and the map has to be re-digitized. In other instances the translation is simply poor, and data is lost in the exchange.
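
The easy direction can be sketched with Bresenham's line algorithm, one common way of choosing which cells a line should fill. The example assumes the line's endpoints have already been converted to integer column and row positions. The function names, grid size, and cell value are arbitrary choices for illustration.

```python
def rasterize_line(x0, y0, x1, y1):
    """Return the grid cells on a line between two cells (Bresenham)."""
    cells = []
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    error = dx - dy
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        doubled = 2 * error
        if doubled > -dy:                # step along x
            error -= dy
            x0 += step_x
        if doubled < dx:                 # step along y
            error += dx
            y0 += step_y
    return cells


def burn(cells, n_cols, n_rows, value=1):
    """Write a value into every listed cell of an otherwise empty grid."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for col, row in cells:
        grid[row][col] = value
    return grid


cells = rasterize_line(0, 0, 3, 1)
grid = burn(cells, n_cols=4, n_rows=2)
for grid_row in grid:
    print(grid_row)
```

The grid keeps only the filled cells. The original endpoints and the fact that the cells formed a single line are gone, which is part of why going back from raster to vector is so much harder.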

Data Standards

Government Standards

The Federal Information Processing Standard 173, called the Spatial Data Transfer Standard (SDTS), was established for the exchange of data between different formats. It is extremely complicated because it has to provide a bibliography, a terminology, and a complete list of geographic and map features. It also has to address the issue of data accuracy.

Industry Standards

Two major points can be made about the industry. The first is that none of the industry standards exchange topology with the data; they only transfer the graphic information. The second point is that with many different formats each package has to include a large number of format translators.

Open Geospatial Consortium

The Open Geospatial Consortium, Inc. (OGC) is a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location-based services. Through member-driven consensus programs, OGC works with government, private industry, and academia to create open software application programming interfaces for geographic information systems (GIS) and other mainstream technologies.

GML, or Geography Markup Language, is an XML-based encoding standard for geographic information developed by the Open Geospatial Consortium (OGC). The objective is to allow internet browsers to display web-based mapping without additional components or viewers.
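
To give a feel for the encoding, the sketch below writes a single point as a GML fragment using Python's standard XML library. It assumes the older GML 2 style, in which a Point holds a coordinates element, and leaves out the coordinate reference system. The function name and sample coordinates are illustrative.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def point_to_gml(x, y):
    """Encode one point as a GML 2 style Point element (no CRS attached)."""
    ET.register_namespace("gml", GML_NS)
    point = ET.Element(f"{{{GML_NS}}}Point")
    coordinates = ET.SubElement(point, f"{{{GML_NS}}}coordinates")
    coordinates.text = f"{x},{y}"
    return ET.tostring(point, encoding="unicode")


gml_text = point_to_gml(-96.0, 41.25)
parsed = ET.fromstring(gml_text)
print(gml_text)
```

A complete GML document would wrap such geometries in features with attributes and would state which coordinate reference system the numbers belong to.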

Questions

  1. List and describe three types of vector format.
  2. Which vector formats have been created by the government?
  3. List and describe three types of raster format.
  4. Describe raster to vector and vector to raster conversion.
  5. What is the overall problem facing the industry regarding formats?
  6. Why do we have so many types of data formats, and why are they so difficult to use together?

References

Clarke, Keith (1996) Getting Started with Geographic Information Systems, Upper Saddle River, NJ: Prentice Hall, pages 86-96.

The ESRI GIS Glossary

www.geocomm.com

www.gislounge.com


Submitted by Amy Schaeufele, March 10, 1998. Revised by Aaron Quinn, Nov. 2004.