Geoscience Data Analysis and Modeling

Week 1: Introduction to course

Outline: A. Introduction. B. Thoughts on data. C. Precision and Accuracy. D. Exercise 1. E. Databases and spreadsheets. F. Modeling in Excel. G. Grain size measure. H. Professional Behavior.


Suggested readings for next lecture:

Chapt. 1, Data Collection and Preparation, in Swan & Sandilands, 1995, Introduction to Geological Data Analysis; Blackwell Science, p. 1-12. This is an introduction to the basic statistics you will begin to tackle next week. Please read by next week. Take notes.

Chapt. 1, Examples and Principles, in Paulos, John A., 1988, Innumeracy - Mathematical Illiteracy and Its Consequences; Hill and Wang, New York, 135 p. This is a fun introduction to why the skills you will learn in this course are important. Read this if you have the time and desire.


Some introductory thoughts for the course:

The democratization of science through widely available computing power: We live in a time where almost any employed person can own a supercomputer if they so desire. Even the typical desktop PC has the capacity to do some pretty heavy duty analysis. This puts the power of analysis into many hands, instead of having it housed with a privileged few, as was the case just a decade ago. You have opportunities and power not available to your predecessors of decades ago. Combine this with the ability to acquire or mine data from and disseminate results via the web and again possibilities exist that did not just a decade or two ago. Part of the purpose of this course is to give you the skills you need to explore and make good use of these new possibilities. Feedback from past students suggests these are skills you will definitely use in your careers. This includes the skill of knowing not only what buttons to push or boxes to check, but also of interpreting the results, and recognizing when the results don't make sense (avoiding the Black Box Syndrome).

Familiarity breeds comfort: There is a growing need for quantitative literacy in the earth sciences as the discipline continues to evolve and mature. There can be a major emotional component to how some students approach math. Some initially fear or struggle with math. Interestingly, many earth science majors seem to take to computers quite readily (an impression of mine, not backed by hard data)! This course is based on the realization that quantitative, computer, and conceptual skills are all crucial to earth science, and is dedicated to the premise that, taught with a geoscience perspective, such material can become engrossing to earth science majors as they learn to better see, understand, and navigate the world that fascinates them. However, recognizing the variable backgrounds students bring to this course, we will always start from a high school level of math and build from there. If at any point you do not understand, ask questions!

Statistics as a basic life skill: Somewhere inside our brains are statistical routines of some type. They are clearly fundamentally different from those encoded in Excel software, which is probably a good thing for both parties. We use our 'intuitive statistical routines' in conducting risk assessment all the time. In Omaha the classic example is: can I make it through this yellow light? When hiking, the question might be: will I get sick if I drink this water? When climbing a mountainside the assessment might be of the stability of a handhold. The basis of this internal risk assessment is in part the statistics of past experience, data input of the moment, and emotions. The biophysics is partly in how past experience rewires our brain, through a combination of event repetition and associated pleasure or pain. If several of the recent holds on a particular climb have given way (crappy rock), then one makes extra sure the rope is secure or abandons the climb. Natural selection has favored those people with better internal routines. Sometimes it doesn't work (Hey y'all - watch this!). Physical evolution hasn't permitted development of appropriate hardwired routines for dealing with the modern technological world, and indeed the pace of technological change does not permit this. Cultural evolution and the growth of knowledge have met the need through the development of logic, statistics, decision science, and common sense. However, the nature of the universe assures us that even if statistical analysis provides a 99% confidence of success, there is still a risk, and the 1% of failures will occur. Failure could be due to a bad analytic routine, or it could be just plain bad luck. The Darwin awards highlight spectacular failures in risk assessment routines, otherwise usually known as intense stupidity, a.k.a. a bad analytic routine.
Statistics is very much rooted in the real world, is fundamental to decision making and risk assessment, and is accessible in some form or another to the great majority of people. May all your failures be due to bad luck and not bad decisions.


Data

Data comes in a great variety of forms. Data is information that you can see patterns in, that you can draw generalizations from, and that you can use to test ideas. Data is a basic currency of science and is highly valued. However, to understand its worth you must understand exactly how it was produced. Some data is much more valuable than other data. Data can be worthless in certain contexts. Context, meaning the questions of interest driving a research effort, is of course crucial. The value of the data is determined by the questions you hope to answer with it. In one very interesting example, 'noise' in data, in this case seismic noise recorded by an earthquake seismometer, became data itself once the right question was asked: had the source of the noise, waves crashing on a nearby shoreline, changed in magnitude with time? The concept of what defines data is also fairly complex. One can think of primary and secondary data. Conclusions based on one set of data can then become data in another study. For example, isotope ratios measured in a mass spectrometer are used to estimate/model the time of crystallization. The age of crystallization then becomes a data point when looking at a plot of igneous activity with time.

Basic types of data:

What are fundamental measures in physical science:

Common types of data in geoscience (this list is far from exhaustive):

Sampling plans and protocols: Sampling plans/protocols are extremely important. They are often what determines the value of your data. A bad sampling plan results in worthless data. The sampling plan is highly dependent on the situation and on the questions being addressed. We will discuss sampling plans as we go along in this course. Some basic types of spatial sampling plans are: grids, random, clustered, traverse. For the non-random patterns many variations exist. For example, a traverse can have regular spacing or exponentially increasing spacing. There is not only the spatial aspect to consider (where to sample), but also the temporal aspect (when to sample). Consider the different types of temporal sampling plans that might exist, and when and why you might use them.
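As a rough illustration of the spatial plans named above, the sketch below generates sample locations for a grid, a random plan, and two kinds of traverse. The function names, the spacing values, and the doubling factor for the exponential traverse are all assumptions made for this example; they are not part of the course material.

```python
import random

def grid_plan(nx, ny, spacing):
    """Regular grid: nx by ny points separated by a constant spacing."""
    return [(i * spacing, j * spacing) for j in range(ny) for i in range(nx)]

def random_plan(n, width, height, seed=0):
    """n points placed uniformly at random inside a width by height area."""
    rng = random.Random(seed)  # fixed seed so the plan can be reproduced
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def traverse_regular(n, spacing):
    """Distances along a traverse with constant spacing between stations."""
    return [i * spacing for i in range(n)]

def traverse_exponential(n, first_spacing, factor=2.0):
    """Distances along a traverse where each gap is 'factor' times the last."""
    positions = [0.0]
    step = first_spacing
    for _ in range(n - 1):
        positions.append(positions[-1] + step)
        step *= factor
    return positions

print(grid_plan(3, 2, 10.0))
print(random_plan(4, 100.0, 50.0))
print(traverse_regular(4, 2.5))
print(traverse_exponential(5, 1.0))
```

An exponentially spaced traverse of this kind concentrates stations near the starting point, which might suit a case where change is expected to be most rapid close to a contact or source.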


Precision and accuracy of data.

What is the difference?

Propagation of error: Knowing the precision of your numbers is crucial (imagine you are testifying in court). The exercise listed below is one example of how to approach significant figures and propagation of error.
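One common way to propagate error, sketched here under the assumption that the measurement errors are independent and random, is to combine them in quadrature: absolute uncertainties for sums and differences, relative uncertainties for products and quotients. The mass and volume values below are hypothetical and are only meant to show the arithmetic.

```python
import math

def sum_uncertainty(*abs_errors):
    """Absolute uncertainty of a sum or difference of independent values."""
    return math.sqrt(sum(e ** 2 for e in abs_errors))

def product_rel_uncertainty(*rel_errors):
    """Relative uncertainty of a product or quotient of independent values."""
    return math.sqrt(sum(r ** 2 for r in rel_errors))

# Hypothetical measurements: density = mass / volume
mass, mass_err = 265.0, 0.5        # grams
volume, volume_err = 100.0, 2.0    # cubic centimeters

density = mass / volume
density_rel_err = product_rel_uncertainty(mass_err / mass, volume_err / volume)
density_err = density * density_rel_err

print(f"density = {density:.2f} +/- {density_err:.2f} g/cm^3")
```

In this example the volume measurement dominates the final uncertainty, so reporting the density to more digits than the volume supports would overstate its precision.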


Exercise 1: Examples of data analysis in the literature.

The purpose of this exercise is to: a) introduce how data is treated in professional forums, and b) begin to familiarize yourself with journals and sources that can provide information and data. You probably realize in some fashion that one reads science in a different way than you would read a novel. It might help to be more explicit, and this link leads you to a one-page description of how to read a scientific journal article.

First you need to find an example of a journal article that analyzes some data. There are at least two routes suggested: 1) Peruse a professional journal of geoscience flavor and choose an article that includes analysis of some distinct type of geoscience data. A great variety of journals are available in the library and online. The American Geophysical Union is especially good with data-rich online journals, which you can get full copies of in the library. 2) Pick a geoscience topic of interest to you (e.g. Hawaiian eruptions) and use Google, Google Scholar, or another search engine to find relevant articles online. Sometimes you won't be able to get full copies of the articles from the journal sites themselves. However, in some cases authors are allowed to post copies of the article as pdfs on their own websites, which they do (since they are interested in having other people read their stuff), and in this way you can get copies of the entire article. If you spend half an hour in the library or online without finding something appropriate, please contact me, and I can provide you with one.

Prepare a short word-processed report upon the article you chose that specifically includes the following elements:

I realize that many of you are just starting in geology, and professional geology journal articles will be challenging for you to read and understand. Concentrate on the portion that you do understand, and ask me or others about the part you don't understand. It is not expected that all will be clear to you, but you can still learn a lot.

Submit your report in Blackboard in the assignment function as a Word document or pdf. It should be written in full sentences and free of spelling and grammatical mistakes.


Introduction to spreadsheets (specifically to Excel) and to databases

What is a database?

First introduction to Excel:

Column A    Column B    Column C    Column D
Row 1       Cell B2     Cell C2     Cell D2
Row 2       Cell B3     Cell C3     Cell D3
Row 3       Cell B4     Cell C4     Cell D4
Row 4       Cell B5     Cell C5     Cell D5

Sample #    grain size (cm)    grain type    grain shape
1A          1.5                granite       rounded
2A          1                  granite       subrounded
3A          2                  carbonate     subangular
4A          1.75               quartzite     subrounded

Example of part of an Excel sheet. The top part shows the basic architecture of cells at the intersection of rows and columns. Each cell is referred to by a letter that denotes which vertical column it is in, followed by a number that designates the horizontal row it is in. Typically, each column contains measurements of some variable from a series of samples, and each row contains the measurements of the variables from a given sample or entity. Traditionally, the first row contains variable labels, and the first column contains sample labels, while the rest of the cells contain the appropriate values.
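The same layout, with variables in columns and samples in rows, carries over to most other data tools. As a minimal sketch, the grain size table above can be held as comma-separated text and read with Python's standard csv module. The column names used here (sample, grain_size_cm, and so on) are renamed for convenience and are an assumption of this example.

```python
import csv
import io

# The example sheet: first row holds variable labels, first column sample labels.
SHEET = """sample,grain_size_cm,grain_type,grain_shape
1A,1.5,granite,rounded
2A,1,granite,subrounded
3A,2,carbonate,subangular
4A,1.75,quartzite,subrounded
"""

# Each row becomes a dictionary keyed by the variable labels.
rows = list(csv.DictReader(io.StringIO(SHEET)))

# Work down one column: the mean grain size of all samples.
sizes = [float(row["grain_size_cm"]) for row in rows]
mean_size = sum(sizes) / len(sizes)

# Filter rows by the value of another variable.
granite_samples = [row["sample"] for row in rows if row["grain_type"] == "granite"]

print(f"mean grain size: {mean_size} cm")
print(f"granite samples: {granite_samples}")
```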


An example of using Excel to model - depth of mountain roots due to isostasy.

In addition to being a simple repository for data, Excel has significant ability to analyze, visualize and model data. Excel is limited in what it can model, but it is also quite accessible. Below is one example focused on isostasy.

Isostasy is a very important concept in geophysics and earth science. It is the simple idea that the crust floats. Floating in equilibrium means that there is a compensation level, a level at which the weights of all the overlying columns are the same. If the weights aren't the same, then the differential vertical loads cause fluid movement from the high to low pressure areas until they even out. Isostasy results in crustal uplift and/or subsidence in response to a change in crustal load. Naturally, geologic response rates are slow. Gravity studies show us that much (but not all) of the earth's crust is close to isostatic equilibrium. When equilibrium exists we can set up the following equality.

$$x_m \rho_c + x_r \rho_c = x_r \rho_m$$

where $x_m$ = the average crustal mountain elevation above sea level, $\rho_c$ = the average density of continental rock, $x_r$ = the root depth of continental material, and $\rho_m$ = the average density of mantle material. The root depth of continental material is added to the normal continental thickness at sea level (here assumed to be 35 km).

This formula simply says the mantle material displaced must be equal in weight to the continental material in the root and the mountain. We can then input a range of mountain elevations and model what the root should be. We can also explore how different model densities would affect the results. The table below is taken from an Excel sheet where the 0.2 value is in the A2 cell. The formula for C2 is given. Below is the Excel graph depicting the results. Note that what we have done here in order to better visualize things is to, first, make the root negative, and second, add in the 35 km of normal crustal thickness. Thus the chart shows how the root grows as topography does, and what the effect of density variation is. If you set it up right, then simply changing the density value in the mantle cell will cause the graph to change in an appropriate manner, and one can explore the significance of various variables in the model. A version of the sheet can be downloaded here.

xm (km)    xr+35 (km) #1    xr+35 (km) #2    xr+35 (km) #3
0.2        -35.7            -35.9            -36.0
0.5        -36.8            -37.2            -37.5
1          -38.6            -39.4            -39.9
2          -42.2            -43.8            -44.8
3          -45.7            -48.3            -49.7
4          -49.3            -52.7            -54.6
5          -52.9            -57.1            -59.5

Formula: =-(A2*A$12)/(3.25-A$12)-35

cont. density    mantle density    model
2.54             3.25              #1
2.65             3.25              #2
2.7              3.25              #3

Example of input and computation cells above, and output graph below.
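For readers who want to check the spreadsheet outside Excel, the sketch below repeats the same calculation in Python. It solves the equality above for the root depth, root = elevation x continental density / (mantle density - continental density), and then applies the same plotting convention as the sheet (root made negative, 35 km of normal crust added). The function and constant names are assumptions of this example; the densities and elevations are those in the table.

```python
NORMAL_CRUST_KM = 35.0                       # normal crustal thickness at sea level
MANTLE_DENSITY = 3.25                        # same value for all three models
CONTINENTAL_DENSITIES = [2.54, 2.65, 2.70]   # models #1, #2, #3
ELEVATIONS_KM = [0.2, 0.5, 1, 2, 3, 4, 5]

def root_depth_km(elevation_km, crust_density, mantle_density=MANTLE_DENSITY):
    """Root depth from the isostatic balance solved for the root."""
    return elevation_km * crust_density / (mantle_density - crust_density)

def plotted_depth_km(elevation_km, crust_density, mantle_density=MANTLE_DENSITY):
    """Value as plotted in the sheet: negative, with normal crust included."""
    root = root_depth_km(elevation_km, crust_density, mantle_density)
    return -(root + NORMAL_CRUST_KM)

# Print one row per elevation, one column per density model.
for xm in ELEVATIONS_KM:
    cells = [f"{plotted_depth_km(xm, pc):6.1f}" for pc in CONTINENTAL_DENSITIES]
    print(f"{xm:4}", *cells)
```

Changing MANTLE_DENSITY here plays the same role as changing the mantle density cell in the sheet: every computed depth updates on the next run.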

What else can you model with a general model for isostasy?

The same basic phenomenon of floating occurs with freshwater floating on top of salt water. You could therefore use a similar approach to model how changing the elevation of the groundwater table (e.g. by well withdrawal and development of a draw-down cone) would change the configuration of the fresh-salt water divide at depth. This could help model/understand salt water incursion, a common phenomenon in coastal aquifers.
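A minimal sketch of that idea follows. It reuses the form of the isostasy balance, with fresh water playing the role of the crust and salt water the role of the mantle; in hydrogeology this balance is often called the Ghyben-Herzberg relation. The density values (1.000 and 1.025 g/cm^3) and the function name are assumptions of this example, and the model ignores flow and mixing.

```python
FRESH_DENSITY = 1.000   # g/cm^3, assumed value for fresh groundwater
SALT_DENSITY = 1.025    # g/cm^3, assumed value for sea water

def interface_depth_m(water_table_elev_m, fresh=FRESH_DENSITY, salt=SALT_DENSITY):
    """Depth of the fresh-salt interface below sea level for a static balance.

    Same form as the mountain root: depth = elevation * fresh / (salt - fresh).
    """
    return water_table_elev_m * fresh / (salt - fresh)

# Explore how drawing down the water table moves the interface.
for elevation in [2.0, 1.5, 1.0, 0.5]:
    depth = interface_depth_m(elevation)
    print(f"water table {elevation:.1f} m above sea level -> "
          f"interface about {depth:.0f} m below sea level")
```

With these assumed densities, each meter of water table drawdown raises the interface by roughly 40 meters, which suggests why pumping near a coast can bring salt water into a well.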

An example of using Excel - modeling differential compaction.

What is differential compaction? As new sediments pile up on top of previously deposited sediment, the underlying sediment compacts. The compaction is associated with squeezing the water out, rearranging grains, crushing grains, and changes in the mineralogy. This means that several feet of mud can end up being one foot of shale, the compacted and lithified equivalent of mud. Different sediments compact to different degrees, with a general trend of more clay-rich and finer-grained sediments tending to compact more, and a good clean quartz sandstone compacting very little. Given that lateral compositional variation is common in sediments, one expects differential compaction, which in turn will deform the existing overlying beds. Differential compaction at depth will also produce differential subsidence at the surface, and can influence sediment accumulation at the surface. Differential compaction produces an array of structures, often subtle, but not always, that can be quite important for understanding fluid migration and pressure development. These structures include gentle folds, fracture systems, and in some cases faults. Differential compaction can also occur due to water or oil withdrawal, producing surface depressions and even reactivating shallow faulting (USGS site on detecting subsidence from space). Differential compaction during construction is also a real challenge with real costs associated with it.

Faults that are attributed to differential compaction. This is a seismic section through part of the Chesapeake impact structure. Faulting is interpreted to have occurred due to uneven compaction of post-impact sediment over the underlying topography. Image source: http://woodshole.er.usgs.gov/epubs/bolide/faulting.html

A simple model in Excel for differential compaction. Since Excel can reference cells, you can transform one array of values into another and in this way produce a model with input and output. In this particular case we have an input array of values where successive columns represent horizontal distance along a bed and rows represent stacked beds. The value in each cell represents the amount of compaction that the sediment at that point undergoes. A 1 means no compaction - the sediment retains its initial thickness. Sandstones under certain conditions would be close to a value of 1. A .5 would mean that it compacts to half its original thickness, and so on. Muds can realistically have values of .5. Note that in the input example below there is a general increase in compaction with depth. This would be reasonable given that more sediment overhead means more compaction. However, one can experiment with different assumptions and values, which is the point of this exercise. In the input matrix example below one can also see the lateral variation. The suite of ones that extends to depth is meant to capture a sandstone channel within more compressible adjacent flood plain sediments. Another anomaly at X13 and YD could represent collapse over a void. You can see in the graphical representation of the geometry of beds after compaction that a subtle asymmetric anticline developed where the channel sandstones are. Note that the anticline does not persist at depth. By inputting different geometries that mirror different geologic situations you can explore the array of differential compaction features that can result. Please ask if you would like a copy of the Excel sheet that contains this model. You can think of many additions and improvements you can make to this model, and some rather sophisticated models exist for modeling basin sedimentation that include compaction. An obvious improvement would be to increase the resolution by increasing the number of cells.
One quickly reaches the limit of what Excel can do in constructing these models, and so not surprisingly most modeling software does not use an Excel platform.

In this very simple model the above Excel cells represent a matrix input of compaction values that represents a cross section view. Imagine these values as depicting a crude channel form. The graph below shows the geometry of initial straight lines after compaction has occurred.
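The same kind of input-to-output transformation can be sketched in Python. The matrix of compaction factors below is hypothetical (it is not the matrix from the Excel sheet) and is only meant to mimic a non-compacting sandstone channel cut into compactable flood plain sediments above a uniform lower bed. The sketch also assumes that every bed starts with the same thickness and that the base of the section stays fixed, so bed tops are found by summing compacted thicknesses upward.

```python
# Hypothetical compaction factors. Rows are stacked beds (top bed first),
# columns are positions along the cross section. 1.0 means no compaction.
FACTORS = [
    [0.9, 0.9, 1.0, 0.9, 0.9],
    [0.8, 0.8, 1.0, 0.8, 0.8],
    [0.7, 0.7, 1.0, 0.7, 0.7],
    [0.6, 0.6, 0.6, 0.6, 0.6],
]

def compacted_bed_tops(factors, initial_thickness=1.0):
    """Elevation of the top of each bed above a fixed base, after compaction.

    Rows of the result are in the same order as the input (top bed first).
    """
    n_columns = len(factors[0])
    elevation = [0.0] * n_columns          # start at the fixed base
    tops_bottom_up = []
    for row in reversed(factors):          # stack beds from the bottom up
        elevation = [z + f * initial_thickness for z, f in zip(elevation, row)]
        tops_bottom_up.append(elevation)
    return tops_bottom_up[::-1]

tops = compacted_bed_tops(FACTORS)
for bed_top in tops:
    print("  ".join(f"{z:4.2f}" for z in bed_top))
```

With this input the top of the uppermost bed stands higher over the channel column than on either side, while the top of the lowest bed stays flat, which is the same qualitative behavior described for the Excel model.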

These are very simple examples of how software can be used to explore, learn about, and visualize models of geologic and other phenomena.


In-class exercise - exploring the measurements of grain size.


Professional Conduct

In addition to the various 'hard', scientific skills that are the focus of this course, there are 'soft', people skills that bear mentioning and some exploration. Science is increasingly collaborative, and there is also the crucial necessity of conveying science to various stakeholders. For these reasons these skills are very important. Professional conduct is an essential part of those needed soft skills and is tied to credibility, which is basically a measure of how much trust others have in you and your work. You can't save or improve lives if people won't listen.

What constitutes professional behavior for geologists, and why is it important?

Some resources that address that concern:


Additional recommended reading:

Schumm, S. A., 1991, To Interpret the Earth, Ten ways to be wrong; Cambridge University Press, 133 p. This is one of the best introductions to the philosophy of science and common pitfalls in reasoning, told humorously and from deep familiarity and extensive experience. It takes some thinking to get through though.

Vacher, H. L., 1998, Computational geology 1 - Significant Figures!: Journal of Geoscience Education, v. 46, p. 292-295.


Copyright by Harmon D. Maher Jr. This material may be used for non-profit educational purposes if proper attribution is given. Otherwise please contact Harmon D. Maher Jr.