**Outline:** A. Introduction. B. Thoughts on data. C. Precision and Accuracy. D. Exercise 1. E. Databases and spreadsheets. F. Modeling in Excel. G. Grain size measurement. H. Professional Behavior.

Suggested readings for next lecture

Chapt. 1. Data Collection and Preparation, in Sandilands & Swan, 1995, Introduction to Geological Analysis; Blackwell Science, p. 1-12. This is an introduction to the basic statistics you will begin to tackle next week. **Please read by next week.** Take notes.

Chapt. 1 Examples and Principles, in Paulos, John A., 1988, Innumeracy - Mathematical Illiteracy and Its Consequences, Hill and Wang, New York, 135 p. This is a fun introduction to why the skills you will learn in this course are important. Read this if you have the time and desire.

__The democratization of science through widely
available computing power:__ We live
in a time when almost any employed person can own a supercomputer
if they so desire. Even the typical desktop PC has the capacity
to do some pretty heavy-duty analysis. This puts the power of
analysis into many hands, instead of having it housed with a privileged
few, as was the case just a decade ago. You have opportunities
and power not available to your predecessors of decades ago. Combine
this with the ability to acquire or mine data from the web and to disseminate
results via it, and again possibilities exist that did not exist
just a decade or two ago. Part of the purpose of this course is
to give you the skills you need to explore and make good use of
these new possibilities. Feedback from past students suggests these are skills you
will definitely use in your careers. This includes not only
knowing what buttons to push or boxes to check, but also
interpreting the results and recognizing when the results
don't make sense (avoiding the Black Box Syndrome).

__Familiarity breeds comfort:__ There is a growing need for quantitative literacy
in the earth sciences as the discipline continues to evolve and
mature. There can be a major emotional component to how some students
approach math. Some initially fear or struggle with math. Interestingly,
many earth science majors seem to take to computers quite readily
(an impression of mine, not backed by hard data)! This course
is based on the realization that quantitative, computer, and conceptual
skills are all crucial to earth science, and is dedicated to the premise
that, taught with a geoscience perspective, such material can
become engrossing to earth science majors as they learn to better
see, understand, and navigate the world that fascinates them. However, recognizing
the variable backgrounds students bring to this course, we will
always start from a high school level of math and build from there. **If at any point you do not understand, ask questions!**

__Statistics as a basic life skill:__ Somewhere inside our brains are statistical routines
of some type. They are clearly fundamentally different from those
encoded in Excel software, which is probably a good thing for
both parties. We use our 'intuitive statistical routines' in
conducting risk assessment all the time. In Omaha the classic example
is: can I make it through this yellow light? When hiking the
risk might be: will I get sick if I drink this water? When
climbing a mountainside the assessment might be of the stability
of a handhold. The basis of this internal risk assessment is
in part the statistics of past experience, data input of the
moment, and emotions. The biophysics is partly in how past experience rewires
our brain, through a combination of event repetition and associated
pleasure or pain. If several of the recent holds on a particular
climb had given way (crappy rock) then one makes extra sure the
rope is secure or abandons the climb. Natural selection has favored those people with
better internal routines. Sometimes it doesn't work (Hey y'all
- watch this!). Physical evolution hasn't permitted development
of appropriate hardwired routines for dealing with the modern
technologic world, and indeed the pace of technologic change
does not permit this. Cultural evolution and the growth of knowledge
have developed to meet the need through the development of logic,
statistics, decision science, and common sense. However, the nature
of the universe assures us that even if statistical analysis provides
a 99% confidence of success, it is still a risk, and the
1% of failures remains. **Failure could be a bad analytic routine, or
it could be just plain bad luck.** The Darwin awards highlight
spectacular failures in risk assessment routines, otherwise usually
known as intense stupidity, a.k.a. a bad analytic routine. Statistics
is very much rooted in the real world, is fundamental to decision
making and risk assessment, and is accessible in some form or
another to the great majority of people. May all your failures be due to bad luck and not bad decisions.

Data comes in a great variety of forms. Data is information that you can see patterns in, that you can draw generalizations from, that you can use to test ideas. Data is a basic currency of science and is highly valued. However, to understand its worth you must understand exactly how it was produced. Some data is much more valuable than other data. Data can be worthless in certain contexts. Context, or the questions of interest driving a research effort, is of course crucial. The value of the data is determined by the questions you hope to answer with it. There is a very interesting example where 'noise' in data, in this particular case seismic noise recorded by an earthquake seismometer, becomes data itself when the right question is asked, which in this case was whether the source of the noise, crashing waves on a nearby shoreline, had changed in magnitude with time. The concept of what defines data is also fairly complex. One can think of primary and secondary data. Conclusions based on one set of data can then become data in another study. Isotope ratios measured in a mass spectrometer are used to estimate/model the time of crystallization. The age of crystallization becomes a data point when looking at a plot of igneous activity with time.

**Basic types of data:**

- nominal (e.g. mineral types, rock types, fossil species).
- ordinal (rank order without constant intervals between elements; fairly uncommon in geoscience, though the Mercalli intensity scale for earthquakes and the Mohs hardness scale are examples).
- scalar (e.g. temperature, length).
- directional/orientation data (without magnitude, e.g. strike and dip, trend and plunge).
- vector (direction and magnitude, e.g. glacial ice velocities, crustal GPS velocities).
- matrix (e.g. stress states).
- one uses different types of analysis and visualization for different types of data.

**What are the fundamental measures in physical
science:**

- length.
- time.
- mass.
- volume.
- energy units (e.g. calories).
- position in a coordinate system (latitude and longitude as angular coordinates).
- pressure.
- temperature.
- elemental composition.
- these can generate other more complex quantities (e.g. a gradient, or a stress state).
**What are your units?** It is absolutely crucial to keep track of your units. One way to do this when working in an Excel environment is to always include the units in the column heading description. Tracking your units is one way to detect mistakes.

**Common types of data in geoscience** (this list is far from exhaustive):

- structural data: orientation data of structural elements (azimuth, strike and dip, trend and plunge, vector orientation).
- geochemical data:
- whole rock.
- trace element.
- isotopic.
- pH.

- geophysical data:
- magnetic.
- gravity.
- seismic.
- material properties.
- heat flow.

- geochronologic data: note that isotopic data can create geochronologic data.
- sedimentologic data:
- grain size.
- grain shape.
- grain type.
- grain orientation (e.g. till fabrics).
- paleocurrent data.

- stratigraphic data - think of the wealth of data in a stratigraphic column.
- paleontologic data - e.g. morphometric data from fossils.
- paleoclimate data - often a derivative from more basic types of data.
- The above list is quite incomplete - try adding some of your own.

**Sampling plans and protocols:** Sampling plans/protocols are extremely important. They often determine
the value of your data. A bad sampling plan results in worthless
data. The sampling plan is highly dependent on the situation and
on the questions being addressed. We will discuss sampling plans
as we go along in this course. Some basic types of **spatial sampling
plans** are: **grids, random, clustered, traverse**. For the non-random
patterns many variations exist. For example, for the traverse
you can have regular spacing, or exponentially increased spacing.
There is not only the spatial aspect to consider, where to sample, but also the temporal aspect, when to sample. Consider the different types of temporal sampling plans
that might exist, and when and why you might use them.

**What is the difference?**

- **precision**, often cast as measurement error, is a function of the statistics of repeatability of measurement.
- **accuracy** is much more complex and is a function of whether your measurement protocol truly measures what you are trying to measure. An incorrectly calibrated thermometer could give high precision, but low accuracy.
- **measurement error** (the variation if you measure equivalent specimens repeatedly in the same way) vs. **natural variation** (the variation in results due to inhomogeneity of the sample body): generally you want to reduce the first and capture the second.
- test yourself with regard to the difference between precision and accuracy:
- how will a bias in sampling by itself affect precision? accuracy?
- how will instrument drift affect precision? accuracy?

**significant places**. The basic question is how well do you know that number, or how well do you need to know it in order to make a conclusion. Generally, the more significant places in a number, the more precise it is assumed to be. Some rules from Vacher, 1998 (a good reference):

- "When you multiply or divide, round off to the same number of figures as in the factor with the fewest figures."
- "When you add or subtract, it is the position of the digits that is important, not how many there are."
- "If you have a calculation that involves multiple steps, keep additional digits through the intermediate results and round off at the end."

- you can set up a cell in Excel to only display a certain number of significant places through formatting.
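The rounding rules above can be illustrated with a short helper. This function is my own illustration, not something from Vacher (1998), and note that Excel cell formatting changes only the displayed value, not the stored one:

```python
import math

# A small helper to round a value to n significant figures.
# (Illustrative only; not from the text or from Vacher, 1998.)

def round_sig(x, n):
    """Round x to n significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of leading digit
    return round(x, n - 1 - exponent)

# Multiplication rule: keep as many figures as the factor with the fewest.
# 2.1 (2 sig figs) * 0.33 (2 sig figs) = 0.693 -> report 0.69
print(round_sig(2.1 * 0.33, 2))  # 0.69
```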

**Propagation of error**:
Knowing the precision of your numbers is crucial (imagine you are testifying
in court). The exercise listed below is one example of how to approach significant numbers and propagation of error.

- Treat every number as if it has an implied error just beyond the number of significant places. This expresses the degree of implied uncertainty. For example 2.1 would be treated as 2.1 +/- .05.
- Run through your calculations in three ways, one just with the original numbers, one to maximize the end result, and one to minimize the end results. For example, if you were dividing by 2.1, divide by 2.1 in case 1, 2.05 in case 2, and 2.15 in case 3.
- Report the number with the appropriate, significant
places but with the error of propagation as indicated by the
differences between case 1 and cases 2 and 3.
- As an example consider the calculation (2.1 * .3) + (3233 / 640)
- case 1 (original numbers): (2.1 * .3) + (3233 / 640) = 5.7
- case 2 (maximizing): (2.15 * .35) + (3233.5 / 639.5) = 5.8
- case 3 (minimizing): (2.05 * .25) + (3232.5 / 640.5) = 5.6
- report as 5.7 +/- .1. This is one expression of how well you know that number.

- You can build this into your Excel spreadsheet calculations by adding extra columns.
- You can take a more sophisticated approach
that looks at the statistical likelihood of error, and the more probable
cancellation effects if the numbers are independent of each other.
Consult a statistician if this becomes important.
- Note that the more operations the greater the error of propagation can be. Note also that this does not include the notion of accuracy.

- YouTube video example of above - https://www.youtube.com/watch?v=2sWWdYs9Gxk .
- YouTube video example of propagation of uncertainty using differential equations - https://www.youtube.com/watch?v=V0ZRvvHfF0E .
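The three-case procedure can be checked in a few lines. This is a sketch of the worked example above, not a general routine:

```python
# Min/max propagation of implied error for the worked example above:
# (2.1 * .3) + (3233 / 640), treating each number as carrying an
# implied uncertainty of half its last significant place.

case_original = (2.1 * 0.3) + (3233 / 640)
case_max = (2.15 * 0.35) + (3233.5 / 639.5)  # each term pushed toward a larger result
case_min = (2.05 * 0.25) + (3232.5 / 640.5)  # each term pushed toward a smaller result

print(round(case_original, 1))  # 5.7
print(round(case_max, 1))       # 5.8
print(round(case_min, 1))       # 5.6
# Report as 5.7 +/- 0.1, the spread between the original and extreme cases.
```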

The purpose of this exercise is to: a) introduce how data is treated in professional forums, and b) begin to familiarize yourself with journals and sources that can provide information and data. You probably realize in some fashion that one reads science in a different way than you would read a novel. It might help to be more explicit, and this link leads you to a one-page description of how to read a scientific journal article.

First you need to **find an example of a journal article that analyzes some data**. There are at least two routes suggested: 1) Peruse a professional journal of geoscience
flavor and choose an article that includes analysis of some distinct
type of geoscience data. A great variety of journals are available
in the library and on line. The American
Geophysical Union is especially good with data-rich online
journals, which you can get full copies of in the library. 2) Pick a geoscience topic of interest to you (e.g. Hawaiian eruptions) and use Google, Google Scholar, or another search engine to find relevant articles online. Sometimes you won't be able to get full copies of the articles from the journal sites themselves. However, in some cases authors are allowed to post pdf copies of their articles on their own websites, which they often do (since they are interested in having other people read their work), and in this way you can get the entire article. **If after spending half an hour in the library or online you have not found something appropriate, please contact me, and I can provide you with one.**

**Prepare a short word-processed report** on the article you chose that specifically includes
the following elements:

- an intro sentence or two describing the focus of the paper (this can be a quote from the abstract).
- a short description:
- of the types of data used.
- how the data was acquired.
- how the data was plotted and/or analyzed (be as specific as possible).

- describe any mention of error or analytic uncertainty.
- describe **one** major conclusion that was or could be drawn from the data.
- a full reference citation, including the URL if the article is available on-line (or include a pdf copy of the article).

I realize that many of you are just starting in geology, and professional geology journal articles will be challenging for you to read and understand. Concentrate on the portion that you do understand, and ask me or others about the part you don't understand. It is not expected that all will be clear to you, but you can still learn a lot.

**Submit your report in Blackboard in the assignment function as a Word document or pdf**. It should be written in full sentences and free of spelling and grammatical mistakes.

**What is a database?**

- A collection of related information of some type; __data from multiple, but related, variable populations__. Databases come in many different forms, some of which we will explore more later in the semester.
- A later lab focuses on the nature and organizational structures databases can take.
- Database software: Excel, Access, Oracle (many others).
- GIS databases - map based databases plus more. We will focus on these some.

**First introduction to Excel:**

- very widely used, and 'communicates' well with other software programs (i.e. can trade data back and forth easily).
- widely used in the real world.
- fundamental hierarchical architecture components - Excel file, sheets, rows and columns, cells.
- some basic operations (examples will be reviewed
in lecture):
- formatting cells.
- entering values.
- cell referencing.
- entering formulas and functions.

- copy, paste, cut.

| Column A | Column B | Column C | Column D |
| --- | --- | --- | --- |
| Row 1 | Cell B2 | Cell C2 | Cell D2 |
| Row 2 | Cell B3 | Cell C3 | Cell D3 |
| Row 3 | Cell B4 | Cell C4 | Cell D4 |
| Row 4 | Cell B5 | Cell C5 | Cell D5 |

| Sample # | grain size (cm) | grain type | grain shape |
| --- | --- | --- | --- |
| 1A | 1.5 | granite | rounded |
| 2A | 1 | granite | subrounded |
| 3A | 2 | carbonate | subangular |
| 4A | 1.75 | quartzite | subrounded |

*Example of part of an Excel sheet. The top part shows the
basic architecture of cells at the intersection of rows and columns.
Each cell is referred to by a letter that denotes which vertical
column it is in, followed by a number that designates the horizontal
row it is in. Typically, each column contains measurements of some variable
from a series of samples, and each row contains measurements of the
variables from a given sample or entity. Traditionally, the first
row contains variable labels, and the first column contains sample
labels, while the rest of the cells contain the appropriate values.*

In addition to being a simple repository for data, Excel has significant ability to analyze, visualize and model data. Excel is limited in what it can model, but it is also quite accessible. Below is one example focused on isostasy.

**Isostasy** is a very important concept in geophysics
and earth science. It is the simple idea that **the crust floats**.
Floating in equilibrium means that there is a compensation level,
a level at which the weights of all the overlying columns are
the same. If the weights aren't the same, then the differential vertical loads cause
fluid movement from the high to the low pressure areas until they even out. Isostasy results
in crustal uplift and/or subsidence in response to a change in
crustal load. Naturally, geologic response rates are slow. Gravity studies show us that much (but not all) of the earth's
crust is close to isostatic equilibrium. When equilibrium exists
we can set up the following equality.

xm pc + xr pc = xr pm

where xm = the average crustal mountain elevation above sea level, pc = the average density of continental rock, xr = the root depth of continental material, and pm = the average density of mantle material. The root depth of continental material is added to the normal continental thickness at sea level (here assumed to be 35 km).

This formula simply says the mantle material displaced must be equal in weight to the continental material in the root and the mountain. We can then input a range of mountain elevations and model what the root should be. We can also explore how different model densities would affect the results. The table below is taken from an Excel sheet where the 0.2 value is in cell A2. The formula for C2 is given. Below is the Excel graph depicting the results. Note that in order to better visualize things we have, first, made the root negative, and second, added in the 35 km of normal crustal thickness. Thus the chart shows how the root grows as topography does, and what the effect of density variation is. If you set it up right, then simply changing the density value in the mantle cell will cause the graph to change in an appropriate manner, and one can explore the significance of various variables in the model. A version of the sheet can be downloaded here.

| Xm in km | xr+35 in km #1 | xr+35 in km #2 | xr+35 in km #3 |
| --- | --- | --- | --- |
| 0.2 | -35.7 | -35.9 | -36.0 |
| 0.5 | -36.8 | -37.2 | -37.5 |
| 1 | -38.6 | -39.4 | -39.9 |
| 2 | -42.2 | -43.8 | -44.8 |
| 3 | -45.7 | -48.3 | -49.7 |
| 4 | -49.3 | -52.7 | -54.6 |
| 5 | -52.9 | -57.1 | -59.5 |

Formula for cell C2: `=-(A2*A$12)/(3.25-A$12)-35`

| cont. density | mantle density | model |
| --- | --- | --- |
| 2.54 | 3.25 | #1 |
| 2.65 | 3.25 | #2 |
| 2.7 | 3.25 | #3 |

*Example of input and computation cells above, and output graph below.*
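The model is easy to reproduce outside Excel as well. Solving the balance xm pc + xr pc = xr pm for xr gives xr = xm pc / (pm - pc); the sketch below uses the same densities as the sheet and reports -(xr + 35), as in the chart. The computed values agree with the table, though rounding of a value that falls exactly on a half may differ in the final digit:

```python
# Isostatic root model from the text: xm*pc + xr*pc = xr*pm,
# so xr = xm*pc / (pm - pc). As in the Excel sheet, the reported
# value is -(xr + 35): the root made negative, plus the 35 km of
# normal crustal thickness at sea level.

PM = 3.25  # mantle density, as in the sheet

def root_depth(xm_km, pc):
    """Root depth xr (km) needed to support mountain elevation xm_km."""
    return xm_km * pc / (PM - pc)

def crustal_base(xm_km, pc):
    """Depth of the crustal base below sea level (negative, in km)."""
    return -(root_depth(xm_km, pc) + 35)

# Reproduce the table rows for the three continental densities:
for xm in (0.2, 0.5, 1, 2, 3, 4, 5):
    print(xm, [round(crustal_base(xm, pc), 1) for pc in (2.54, 2.65, 2.7)])
```

Changing `PM` or the density tuple plays the same role as editing the density cells in the sheet.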

What else can you model with a general model for isostasy?

- the consequence of erosional denudation of mountains.
- consequence of sediment loading of a continental shelf.
- amount of depression due to large scale ice loading (have to take density of ice into account).
- isostatic response is a common and important crustal process.

The same basic phenomenon of floating occurs with fresh water floating on top of salt water. You could therefore use a similar approach to model how changing the elevation of the groundwater table (e.g. by well withdrawal and development of a draw-down cone) would change the configuration of the fresh-salt water divide at depth. This could help model/understand the common phenomenon of salt water incursion in coastal aquifers.
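One way to sketch that fresh-over-salt balance is the classic Ghyben-Herzberg relation, which follows from exactly this floating argument. The relation and the densities used here are standard hydrogeology values, not from the text:

```python
# Ghyben-Herzberg relation for a freshwater lens floating on salt water
# (a standard hydrogeology result, supplied here as an illustration;
# the densities are typical values, not from the text).

RHO_FRESH = 1.000  # g/cm^3
RHO_SALT = 1.025   # g/cm^3, typical seawater

def interface_depth(head_m):
    """Depth (m) of the fresh/salt interface below sea level, given the
    height of the water table above sea level."""
    return RHO_FRESH / (RHO_SALT - RHO_FRESH) * head_m

# Each meter of water-table draw-down raises the interface ~40 m,
# which is why modest pumping can drive significant salt water incursion.
print(round(interface_depth(2.0), 1))  # 80.0
print(round(interface_depth(1.5), 1))  # 60.0
```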

**An example of using Excel - modeling differential compaction.**

*What is differential compaction?* As new sediments pile up on top of previously deposited sediment, the underlying sediment compacts. The compaction is associated with squeezing the water out, rearranging grains, crushing grains, and changes in the mineralogy. This means that several feet of mud can end up as one foot of shale, the compacted and lithified equivalent of mud. Different sediments compact to different degrees, with a general trend of more clay-rich and finer-grained sediments tending to compact more, and a good clean quartz sandstone compacting very little. Given that lateral compositional variation is common in sediments, one expects differential compaction, which in turn will deform the existing overlying beds. Differential compaction at depth will also produce differential subsidence at the surface, and can influence sediment accumulation at the surface. Differential compaction produces an array of structures, often subtle, but not always, that can be quite important for understanding fluid migration and pressure development. These structures include gentle folds, fracture systems, and in some cases faults. Differential compaction can also occur due to water or oil withdrawal, producing surface depressions and even reactivating shallow faulting (USGS site on detecting subsidence from space). Differential compaction during construction is also a real challenge with real costs associated with it.

*Faults that are attributed to differential compaction. This is a seismic section through part of the Chesapeake impact structure. Faulting is interpreted to have occurred due to uneven compaction of post-impact sediment over the underlying topography. Image source: http://woodshole.er.usgs.gov/epubs/bolide/faulting.html*

*A simple model in Excel for differential compaction*. Since Excel can reference cells, you can basically transform one array of values into another and in this way produce a model with input and output. In this particular case we have an input array of values where successive columns represent horizontal distance along a bed and rows represent stacked beds. The value in each cell represents the amount of compaction that the sediment at that point undergoes. A 1 means no compaction - the sediment retains its initial thickness. Sandstones under certain conditions would be close to a value of 1. A .5 would mean that it compacts to half its original thickness, and so on. Muds can realistically have values of .5. Note in the input example below that there is a general increase with depth. This would be reasonable given that more sediment overhead means more compaction. However, one can experiment with different assumptions and values, which is the point of this exercise. In the input matrix example below one can also see the lateral variation. The suite of ones that extends to depth is meant to capture a sandstone channel within more compressible adjacent flood plain sediments. Another anomaly at X13 and YD could represent collapse over a void. You can see in the graphical representation of the geometry of beds after compaction that a subtle asymmetric anticline developed where the channel sandstones are. Note that the anticline does not persist at depth. By inputting different geometries that mirror different geologic situations you can explore the array of differential compaction features that can result. Please ask if you would like a copy of the Excel sheet that contains this model. You can think of many additions and improvements to make to this model, and some rather sophisticated models exist for modeling basin sedimentation that include compaction. An obvious improvement would be to increase the resolution by increasing the number of cells. One quickly reaches the limit of what Excel can do in constructing these models, and so not surprisingly most modeling software does not use an Excel platform.

*In this very simple model the above Excel cells represent a matrix input of compaction values that represents a cross section view. Imagine these values as depicting a crude channel form. The graph below shows the geometry of initial straight lines after compaction has occurred.*
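The same array-to-array transformation can be sketched outside Excel. This is a minimal illustration with made-up compaction factors rather than the sheet's actual values, assuming each bed starts with unit thickness:

```python
# A minimal sketch of the compaction model described above, with made-up
# compaction factors (illustrative only; not the actual sheet's values).
# Each cell holds a compaction factor: 1 = no compaction, 0.5 = compacts
# to half thickness. Rows are stacked beds (shallowest first), columns
# are horizontal positions, and each bed starts 1 unit thick.

INITIAL_THICKNESS = 1.0

# The two middle columns of 1.0s mimic a sandstone channel that resists
# compaction within more compressible floodplain muds; factors decrease
# (more compaction) with depth, as in the text.
compaction = [
    [0.9, 0.9, 1.0, 1.0, 0.9, 0.9],
    [0.7, 0.7, 1.0, 1.0, 0.7, 0.7],
    [0.5, 0.5, 1.0, 1.0, 0.5, 0.5],
]

def bed_tops(factors):
    """Height of each bed's top above a fixed basement, summing the
    compacted thicknesses from the deepest bed upward. Returned in
    shallowest-first order to match the input."""
    heights = [0.0] * len(factors[0])
    tops = []
    for row in reversed(factors):  # start at the deepest bed
        heights = [h + INITIAL_THICKNESS * f for h, f in zip(heights, row)]
        tops.append(list(heights))
    return tops[::-1]

# Every horizon ends up higher over the channel columns: a subtle
# compaction anticline, as described in the text.
for bed in bed_tops(compaction):
    print([round(h, 2) for h in bed])
```

Measured from a fixed basement, the uncompacted channel columns stand higher at every level, reproducing the anticline geometry the text describes.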

These are very simple examples of how software can be used to explore, learn about, and visualize models of geologic and other phenomena.

**In class exercise - exploring the measurement of grain size**.

In addition to the various 'hard' scientific skills that are the focus of this course, there are 'soft' people skills that bear mentioning and some exploration. Science is increasingly collaborative, and there is also the crucial necessity of conveying science to various stakeholders. For these reasons these skills are very important. Professional conduct is an essential part of those needed soft skills and is tied to credibility, which is basically a measure of how much trust others have in you and your work. You can't save or improve lives if people won't listen.

*What constitutes professional behavior for geologists, and why is it important?*

Some resources that address that concern:

- AGI Guidelines for Ethical Professional Conduct - http://www.americangeosciences.org/community/agi-guidelines-ethical-professional-conduct .
- Geological Society of America Ethical Guidelines for Publication: http://www.geosociety.org/pubs/ethics.htm.

**Additional recommended reading:**

Schumm, S. A., 1991, To Interpret the Earth, Ten ways to be wrong; Cambridge University Press, 133 p. This is one of the best introductions to the philosophy of science and common pitfalls in reasoning, told humorously and from deep familiarity and extensive experience. It takes some thinking to get through though.

Vacher, H. L., 1998, Computational geology 1 - Significant Figures!: Journal of Geoscience Education, v. 46, p. 292-295.

Copyright by Harmon D. Maher Jr. This material may be used for non-profit educational purposes if proper attribution is given. Otherwise please contact Harmon D. Maher Jr.