Geoscience Data Analysis and Modeling

Week 1: Introduction to course

Outline: A. Introduction. B. Thoughts on data. C. Precision and Accuracy. D. Exercise 1. E. Databases and spreadsheets. F. Modeling in Excel. G. Grain size measures.


Readings for next lecture:

Chapt. 1, Data Collection and Preparation, in Swan & Sandilands, 1995, Introduction to Geological Data Analysis; Blackwell Science, p. 1-12. This is an introduction to the basic statistics you will begin to tackle next week. Please read by next week. Take notes.

Chapt. 1, Examples and Principles, in Paulos, John A., 1988, Innumeracy: Mathematical Illiteracy and Its Consequences; Hill and Wang, New York, 135 p. This is a fun introduction to why the skills you will learn in this course are important. Just breeze through this. Read this if you have the time and desire.


Some intro thoughts for the course:

The democratization of science through widely available computing power: We live in a time when almost any employed person can own a supercomputer if they so desire. Even the typical desktop PC has the capacity to do some pretty heavy-duty analysis. This puts the power of analysis into many hands, instead of having it housed with a privileged few, as was the case just a decade ago. You have opportunities and power not available to your predecessors of decades ago. Combine this with the ability to acquire or mine data from the web, and to disseminate results through it, and again possibilities exist that did not exist just a decade or two ago. Part of the purpose of this course is to give you the skills you need to explore and make good use of these new possibilities. Feedback suggests these are skills you will definitely use in your careers. This includes the skill of knowing not only what buttons to push or boxes to check, but also of interpreting the results and recognizing when the results don't make sense.

Familiarity breeds comfort: There is a growing need for quantitative literacy in the earth sciences as the discipline continues to evolve and mature. There can be a major emotional component to how some students approach math. Some initially fear or struggle with math. Interestingly, many earth science majors seem to take to computers quite readily (an impression of mine, not backed by hard data)! This course is based on the realization that both quantitative and computer skills are crucial to earth science and is dedicated to the premise that, taught with a geoscience perspective, such material can become engrossing to earth science majors as they learn to better see and navigate the world that fascinates them. However, recognizing the variable background students bring to this course, we will start from a high school level of math and build from there.

Statistics as a basic life skill: Somewhere inside our brains are statistical routines of some type. They are clearly fundamentally different from those encoded in Excel software, which is probably a good thing for both parties. We use our 'intuitive statistical routines' in conducting risk assessment all the time. In Omaha the classic example is: can I make it through this yellow light? When hiking the question might be: will I get sick if I drink this water? When climbing a mountainside the assessment might be of the stability of a handhold. The basis of this internal risk assessment is in part the statistics of past experience and the data input of the moment. The biophysics is partly in how past experience rewires our brain, through a combination of event repetition and associated pleasure or pain. If several of the recent holds on a particular climb have given way (crappy rock), then one makes extra sure the rope is secure. Natural selection has favored those people with better internal routines. Sometimes it doesn't work (Hey y'all, watch this!). Physical evolution hasn't permitted development of appropriate hardwired routines for dealing with the modern technological world, and indeed the pace of technological change does not permit this. Cultural evolution and the growth of knowledge have met the need through the development of logic, statistics, decision science, and common sense. However, the nature of the universe assures us that even if statistical analysis provides a 99% confidence of success, there is still a risk: the remaining 1% are failures. Failure could be due to a bad analytic routine, or it could be just plain bad luck. The Darwin awards highlight spectacular failures in risk assessment routines, otherwise usually known as intense stupidity, a.k.a. a bad analytic routine. Statistics is very much rooted in the real world, is fundamental to decision making and risk assessment, and is accessible in some form or another to the great majority of people.


Data

Data is information that you can see patterns in, that you can draw generalizations from, and that you can use to test ideas. Data is a basic currency of science and is highly valued. But to understand its worth you must understand exactly how it was produced. Some data is much more valuable than other data. Data can be worthless in certain contexts. Context, meaning the questions of interest, is of course crucial. The value of the data is determined by the questions you hope to answer with it and by the type of insights you hope to gain from it. The concept of what defines data is also fairly complex. One can think of primary and secondary data. Conclusions based on one set of data can then become data in another study. For example, isotope ratios measured in a mass spectrometer are used to estimate/model the time of crystallization. The age of crystallization then becomes a data point in a plot of igneous activity with time.

Basic types of data:

What are fundamental measures in physical science:

Common types of data in geoscience (this list is far from exhaustive):

Sampling plans and protocols: This is extremely important. It is often what determines the value of your data. A bad sampling plan results in worthless data. The sampling plan is highly dependent on the situation and on the questions being addressed. We will discuss sampling plans as we go along in this course. Some basic types of spatial sampling plans are: grids, random, clustered, traverse. For the non-random patterns many variations exist. For example, a traverse can have regular spacing or exponentially increasing spacing. Consider the different types of temporal sampling plans that might exist.
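The spatial plans named above can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended protocol: the function names, the spacing values, and the choice of a doubling factor for the exponential traverse are all assumptions made for the example, and the clustered plan is left out.

```python
import random

def grid_plan(nx, ny, spacing):
    """Regular grid: nx by ny points, `spacing` apart in both directions."""
    return [(i * spacing, j * spacing) for j in range(ny) for i in range(nx)]

def random_plan(n, width, height, seed=0):
    """n points drawn uniformly at random inside a width-by-height area."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the plan is repeatable
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def regular_traverse(n, spacing):
    """Distances along a traverse line with constant spacing."""
    return [i * spacing for i in range(n)]

def exponential_traverse(n, first_spacing, factor=2.0):
    """Distances along a traverse where each gap is `factor` times the previous gap."""
    positions = [0.0]
    gap = first_spacing
    for _ in range(n - 1):
        positions.append(positions[-1] + gap)
        gap *= factor
    return positions

# Hypothetical example values, in meters
print(grid_plan(3, 2, 10.0))
print(random_plan(4, 100.0, 50.0))
print(regular_traverse(5, 5.0))
print(exponential_traverse(5, 1.0))
```

The exponential traverse concentrates samples near the starting point, which might suit a property that changes quickly near a contact and slowly farther away.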


Precision and accuracy of data.

What is the difference?

Propagation of error: If the precision of your numbers is crucial (say you are testifying in court), then you might explore the following exercise.
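Separately from that exercise, here is a minimal Python sketch of the two ideas in this section. It assumes the usual conventions: accuracy is how close the average of repeated measurements is to the true value, and precision is how tightly the repeated measurements cluster. It also assumes the standard first-order rules for propagating independent errors (absolute errors add in quadrature for sums and differences, relative errors add in quadrature for products and quotients). The measurements are made-up values for illustration, and the function names are hypothetical.

```python
import math
import statistics

def bias_and_spread(measurements, true_value):
    """Return (bias, spread): mean offset from the true value, and sample standard deviation."""
    bias = statistics.mean(measurements) - true_value   # relates to accuracy
    spread = statistics.stdev(measurements)             # relates to precision
    return bias, spread

def sum_uncertainty(sigma_a, sigma_b):
    """Uncertainty of a + b (or a - b), assuming independent errors."""
    return math.hypot(sigma_a, sigma_b)

def product_relative_uncertainty(rel_a, rel_b):
    """Relative uncertainty of a * b (or a / b), assuming independent errors."""
    return math.hypot(rel_a, rel_b)

# Made-up repeated measurements of a 10.0 cm standard:
# tightly clustered (precise) but offset from the true value (inaccurate).
readings = [10.4, 10.5, 10.4, 10.6, 10.5]
bias, spread = bias_and_spread(readings, 10.0)
print(f"bias = {bias:.2f} cm, spread = {spread:.2f} cm")

# Two lengths, each measured to +/- 0.1 cm, added together
print(f"uncertainty of the sum = {sum_uncertainty(0.1, 0.1):.2f} cm")
```

A small spread with a large bias is the precise-but-inaccurate case; repeating the measurement more times does not remove the bias.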


Exercise 1: Examples of data analysis in the literature.

The purpose of this exercise is to: a) familiarize yourself with journals that can provide data, and b) get a feel for how data is treated in the profession.

Peruse a professional journal of geoscience flavor and choose an article that includes analysis of some distinct type of geoscience data. A great variety of journals are available in the library and online. The American Geophysical Union is especially good, with data-rich online journals. If you have spent half an hour in the library without finding something appropriate, come see me.

Prepare a short word-processed report for the next class on the article you chose that specifically includes the following elements:

You will be asked to give a 2-3 minute summary to the class next time. Bring a copy of the data and/or graphs to illustrate your summary.


Introduction to databases and spreadsheets (specifically to Excel).

What is a database?

First introduction to Excel:

Column 1    Column 2    Column 3    Column 4
Row 1       Cell B2     Cell C2     Cell D2
Row 2       Cell B3     Cell C3     Cell D3
Row 3       Cell B4     Cell C4     Cell D4
Row 4       Cell B5     Cell C5     Cell D5

Sample #    grain size (cm)    grain type    grain shape
1A          1.5                granite       rounded
2A          1                  granite       subrounded
3A          2                  carbonate     subangular
4A          1.75               quartzite     subrounded

Example of part of an Excel sheet. The top part shows the basic architecture of cells at the intersection of rows and columns. Each cell is referred to by a letter that denotes its vertical column, followed by a number that designates its horizontal row. Typically, each column holds measurements of one variable from a series of samples, and each row holds the measurements of all the variables for a given sample or entity. Traditionally, the first row contains variable labels and the first column contains sample labels, while the rest of the cells contain the corresponding values. The lower part is a simple example.
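For those who want to see the same row-and-column idea outside of Excel, here is a minimal Python sketch that stores the grain example as a list of rows and looks up values by Excel-style cell references. It assumes the variable labels sit in row 1 and the sample labels in column A; the helper names `cell` and `column_values` are hypothetical, and only single-letter columns are handled.

```python
# The example sheet as a list of rows; row 1 holds the variable labels.
sheet = [
    ["Sample #", "grain size (cm)", "grain type", "grain shape"],
    ["1A", 1.5, "granite", "rounded"],
    ["2A", 1.0, "granite", "subrounded"],
    ["3A", 2.0, "carbonate", "subangular"],
    ["4A", 1.75, "quartzite", "subrounded"],
]

def cell(sheet, ref):
    """Look up a cell by a reference such as 'B2' (single-letter columns only)."""
    column = ord(ref[0].upper()) - ord("A")  # 'A' -> 0, 'B' -> 1, ...
    row = int(ref[1:]) - 1                   # Excel rows start at 1
    return sheet[row][column]

def column_values(sheet, letter):
    """All values in one column, skipping the label in row 1."""
    index = ord(letter.upper()) - ord("A")
    return [row[index] for row in sheet[1:]]

sizes = column_values(sheet, "B")
print(cell(sheet, "B2"))        # grain size of sample 1A
print(cell(sheet, "C4"))        # grain type of sample 3A
print(sum(sizes) / len(sizes))  # mean grain size in cm
```

Keeping one variable per column is what makes a column-wise summary, such as the mean grain size here, a one-line operation.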


An example of using Excel - Modeling mountain roots due to isostasy.

Isostasy is a very important concept in geophysics and earth science. It is the simple idea that the crust floats. Floating in equilibrium means that there is a compensation level, a level at which the weights of all the overlying columns are the same. If they are not, the differential vertical loads cause fluid movement from the high-pressure to the low-pressure areas. Isostasy results in crustal uplift and/or subsidence in response to a change in crustal load. Gravity studies show us that much of the earth's crust is close to isostatic equilibrium. When equilibrium exists we can set up the following equality.

$x_m \rho_c + x_r \rho_c = x_r \rho_m$, where $x_m$ = the average crustal mountain elevation above sea level, $\rho_c$ = the average density of continental rock, $x_r$ = the column root depth of continental material, and $\rho_m$ = the average density of mantle material.

This formula simply says the mantle material displaced must be equal in weight to the continental material in the root and the mountain. We can then input a range of mountain elevations and model what the root should be. We can also explore how different model densities would affect the results. The table below is taken from an Excel sheet where the 0.2 value is in the A2 cell. The formula for C2 is given. Below is the Excel graph depicting the results. Note that, in order to better visualize things, we have first made the root negative and second added in the 35 km of normal crustal thickness. Thus the chart shows how the root grows as topography does, and what the effect of density variation is. If you set it up right, then simply changing the density value in the mantle cell will cause the graph to change in an appropriate manner, and one can explore the significance of various variables in the model. You may obtain a copy of the appropriate Excel sheet from me upon request.

xm in km    xr in km #1    xr in km #2    xr in km #3
0.2         -35.7          -35.9          -36.0
0.5         -36.8          -37.2          -37.5
1           -38.6          -39.4          -39.9
2           -42.2          -43.8          -44.8
3           -45.7          -48.3          -49.7
4           -49.3          -52.7          -54.6
5           -52.9          -57.1          -59.5

"=-(A2*A$12)/(3.25-A$12)-35"

cont. density    mantle density
2.54             3.25              #1 model
2.65             3.25              #2 model
2.7              3.25              #3 model
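The same model can be sketched outside of Excel. Solving the equality for the root gives xr = xm * (continental density) / (mantle density - continental density), which is what the Excel formula computes before the sign flip and the 35 km offset. The Python below is a minimal sketch of that calculation using the densities and elevations from the table; the function and variable names are hypothetical.

```python
NORMAL_CRUST_KM = 35.0   # normal crustal thickness added in for the chart
MANTLE_DENSITY = 3.25    # mantle density used in all three models

def root_depth_km(elevation_km, crust_density, mantle_density=MANTLE_DENSITY):
    """Root thickness xr = xm * rho_c / (rho_m - rho_c), in km."""
    return elevation_km * crust_density / (mantle_density - crust_density)

def plotted_depth_km(elevation_km, crust_density, mantle_density=MANTLE_DENSITY):
    """Value charted in the sheet: the root made negative, plus 35 km of normal crust."""
    return -root_depth_km(elevation_km, crust_density, mantle_density) - NORMAL_CRUST_KM

elevations_km = [0.2, 0.5, 1, 2, 3, 4, 5]
crust_densities = [2.54, 2.65, 2.7]   # models #1, #2, #3

for x_m in elevations_km:
    depths = [plotted_depth_km(x_m, rho_c) for rho_c in crust_densities]
    print(x_m, " ".join(f"{d:8.2f}" for d in depths))
```

The printed values should agree with the table to within rounding. Passing a different `mantle_density` plays the same role as changing the mantle cell in the sheet.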

What else can you model with this general model for isostasy?

The point is that Excel can be a good place to build numerical models of geologic and other phenomena and then visualize them.

In-class exercise: exploring the measurements of grain size.


Recommended reading:

Schumm, S. A., 1991, To Interpret the Earth: Ten Ways to Be Wrong; Cambridge University Press, 133 p. This is one of the best introductions to the philosophy of science and common pitfalls in reasoning, told humorously and from deep familiarity and extensive experience. It takes some thinking to get through, though.

Vacher, H. L., 1998, Computational geology 1 - Significant Figures!; Journal of Geoscience Education, v. 46, p. 292-295. It is amazing how many texts do not really address significant figures.


Copyright by Harmon D. Maher Jr. This material may be used for non-profit educational purposes if proper attribution is given. Otherwise please contact Harmon D. Maher Jr.