A first comparison of the distribution of silica contents of oceanic crust and a look at the influence of sample size on estimating population parameters.

Geologic background of the first data set: Below are the first 19 of the 1338 rows of data downloaded from Lamont-Doherty's basalt geochemistry and petrography database. You can download the data set yourself if you want a full copy, or come see me. Specifically, it is the prepared data set for back-arc basins. Because these basins are smaller and often filled with significant sediment, they do not show the well-developed magnetic anomalies and ridge geometry seen at seafloor spreading ridges, yet they must be some type of oceanic crust. They may have somewhat different mechanisms of crustal formation than standard seafloor spreading ridges, and this should be reflected in their geochemistry. The oxide columns (SiO2, TiO2, Al2O3) are in weight percent.

sample_id material SiO2 TiO2 Al2O3
CHRSPS3-005-001 GL 49.55 1.53 14.85
CHRSPS3-005-002 GL 50.41 1.86 15.01
CHRSPS3-005-003 GL 50.17 1.63 15.43
CHRSPS3-005-004 GL 49.74 1.49 15.61
CHRSPS3-005-005 GL 50.01 1.56 15.49
KAIYO88-006-003 GL 51.38 1.64 13.96
KAIYO88-006-005 GL 50.83 1.83 14.15
KAIYO87-003-001 GL 48.91 1.06 17.11
KAIYO87-003-002 GL 48.77 1.02 17.29
KAIYO88-003-027 GL 50.11 1.78 15.81
KAIYO88-005-003 GL 54.04 1.19 15.11
KAIYO87-002-001 GL 50.15 1.5 15.34
KAIYO87-002-002 GL 50.29 1.51 15.03
KAIYO87-002-003 GL 50.04 1.53 14.99
KAIYO87-002-004 GL 50.22 1.55 15.11
KAIYO87-002-005 GL 50.3 1.51 14.91
KAIYO87-002-006 GL 50.34 1.52 15.01
KAIYO87-002-007 GL 50.48 1.54 14.98
KAIYO87-002-008 GL 49.87 1.53 15.31
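If you want to work with these rows outside a spreadsheet, here is a minimal Python sketch for reading the whitespace-separated layout shown above. It assumes a single header line and no spaces inside any field, which holds for this excerpt but may not hold for the full download. `parse_rows` and `BACKARC_EXCERPT` are hypothetical names, and only the 19 excerpt rows are embedded.

```python
# Minimal sketch: parse the whitespace-separated excerpt shown above.
# Assumes one header line and no spaces inside fields (true for this excerpt).
# parse_rows and BACKARC_EXCERPT are hypothetical names used for illustration.

BACKARC_EXCERPT = """\
sample_id material SiO2 TiO2 Al2O3
CHRSPS3-005-001 GL 49.55 1.53 14.85
CHRSPS3-005-002 GL 50.41 1.86 15.01
CHRSPS3-005-003 GL 50.17 1.63 15.43
CHRSPS3-005-004 GL 49.74 1.49 15.61
CHRSPS3-005-005 GL 50.01 1.56 15.49
KAIYO88-006-003 GL 51.38 1.64 13.96
KAIYO88-006-005 GL 50.83 1.83 14.15
KAIYO87-003-001 GL 48.91 1.06 17.11
KAIYO87-003-002 GL 48.77 1.02 17.29
KAIYO88-003-027 GL 50.11 1.78 15.81
KAIYO88-005-003 GL 54.04 1.19 15.11
KAIYO87-002-001 GL 50.15 1.5 15.34
KAIYO87-002-002 GL 50.29 1.51 15.03
KAIYO87-002-003 GL 50.04 1.53 14.99
KAIYO87-002-004 GL 50.22 1.55 15.11
KAIYO87-002-005 GL 50.3 1.51 14.91
KAIYO87-002-006 GL 50.34 1.52 15.01
KAIYO87-002-007 GL 50.48 1.54 14.98
KAIYO87-002-008 GL 49.87 1.53 15.31
"""

OXIDES = ("SiO2", "TiO2", "Al2O3")  # weight-percent columns


def parse_rows(text):
    """Return one dict per data row, with the oxide columns converted to float."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    rows = []
    for fields in body:
        record = dict(zip(header, fields))
        for col in OXIDES:
            record[col] = float(record[col])
        rows.append(record)
    return rows


rows = parse_rows(BACKARC_EXCERPT)
sio2 = [r["SiO2"] for r in rows]
print(f"{len(rows)} rows, mean SiO2 = {sum(sio2) / len(sio2):.2f} wt%")
```

With the full download you would read the file's text instead of the embedded string; if the real file turns out to be comma- or tab-separated, Python's `csv.DictReader` is the more robust choice.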

Geologic background of the second data set: This is from the same data source as above, but all the samples are from the EPR (East Pacific Rise), one of the faster seafloor spreading ridges active on Earth today. With some 3817 rows of data, it is a substantial data set.

sample_id material SiO2 TiO2 Al2O3
ALV0976-005 GL 49.8 1.94 14.7
ALV0976-005 GL 49.4 1.64 18.3
ALV0976-003 GL 50.5 1.73 14.37
ALV0976-005 GL 50.16 2.07 14.56
GIL7904-009-A1 GL 48.8 1.88 14.8
GIL7904-009-A1 GL 49.3 1.87 15.2
GIL7904-009-001 GL 48.84 1.95 14.77
GIL7904-009-005 GL 49.68 1.99 14.51
GIL7904-009-008 GL 49.88 1.95 14.61
GIL7904-009-020 GL 49.57 1.94 14.6
GIL7904-009-A1 GL 49.36 1.89 14.61
ALV0972-002 GL 49 1.53 15.7
ALV0972-006 GL 49.1 1.51 15.8
ALV0975-002 GL 49.4 2 14.4
ALV0975-004 GL 49.4 2.03 14.3
ALV0975-007 GL 49.3 2.13 14.1
ALV0975-008 GL 49.5 2.21 14.2
ALV0972-002 GL 48.8 1.52 16.2
ALV0972-006 GL 49.1 1.48 16
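Several sample_id values repeat in the EPR excerpt above (ALV0976-005 and GIL7904-009-A1 each appear three times, ALV0972-002 and ALV0972-006 twice). This may reflect repeat analyses of the same sample, though the excerpt alone does not say. Because row count and number of independent samples are then not the same thing, the sketch below counts both and compares a row-weighted mean with a mean that first averages each sample_id. It uses only the 19 excerpt rows; `EPR_ROWS` is a hypothetical name.

```python
from collections import Counter, defaultdict

# (sample_id, SiO2 wt%) for the 19-row EPR excerpt only.
EPR_ROWS = [
    ("ALV0976-005", 49.8), ("ALV0976-005", 49.4), ("ALV0976-003", 50.5),
    ("ALV0976-005", 50.16), ("GIL7904-009-A1", 48.8), ("GIL7904-009-A1", 49.3),
    ("GIL7904-009-001", 48.84), ("GIL7904-009-005", 49.68),
    ("GIL7904-009-008", 49.88), ("GIL7904-009-020", 49.57),
    ("GIL7904-009-A1", 49.36), ("ALV0972-002", 49.0), ("ALV0972-006", 49.1),
    ("ALV0975-002", 49.4), ("ALV0975-004", 49.4), ("ALV0975-007", 49.3),
    ("ALV0975-008", 49.5), ("ALV0972-002", 48.8), ("ALV0972-006", 49.1),
]

# How many rows share each sample_id?
counts = Counter(sid for sid, _ in EPR_ROWS)
repeated = {sid: n for sid, n in counts.items() if n > 1}

# Average repeat rows first, so each sample_id contributes one value.
by_sample = defaultdict(list)
for sid, sio2 in EPR_ROWS:
    by_sample[sid].append(sio2)
sample_means = [sum(v) / len(v) for v in by_sample.values()]

row_mean = sum(s for _, s in EPR_ROWS) / len(EPR_ROWS)
per_sample_mean = sum(sample_means) / len(sample_means)

print(f"{len(EPR_ROWS)} rows, {len(counts)} distinct sample_ids")
print("repeated:", repeated)
print(f"row-weighted mean = {row_mean:.2f}, per-sample mean = {per_sample_mean:.2f}")
```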

Below are histograms of the data with the backarc basin data on top and the EPR data on bottom:

How do they compare? What conclusions can you draw from this? Does the difference in sample size between the two data sets strengthen or weaken your conclusions? Even though there is a lot of data here, do you really know how representative this data set is of back-arc basins? Can you think of biases that might exist?
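One way to put numbers on the comparison is to compute each data set's mean, standard deviation, and standard error of the mean, $s/\sqrt{n}$, alongside a simple binned count. The sketch below does this in Python for the two 19-row SiO2 excerpts only, since the full data sets are not reproduced here; the 0.5 wt% bin width starting at 48 wt% is an arbitrary assumption, and `summarize` and `text_histogram` are hypothetical names.

```python
import math
import statistics

# SiO2 (wt%) from the two 19-row excerpts above only; the full data sets
# (1338 back-arc rows, 3817 EPR rows) are not reproduced here.
BACKARC_SIO2 = [49.55, 50.41, 50.17, 49.74, 50.01, 51.38, 50.83, 48.91, 48.77,
                50.11, 54.04, 50.15, 50.29, 50.04, 50.22, 50.3, 50.34, 50.48, 49.87]
EPR_SIO2 = [49.8, 49.4, 50.5, 50.16, 48.8, 49.3, 48.84, 49.68, 49.88, 49.57,
            49.36, 49.0, 49.1, 49.4, 49.4, 49.3, 49.5, 48.8, 49.1]


def summarize(values):
    """Return n, mean, sample standard deviation, and standard error of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # uses the n - 1 denominator
    return n, mean, sd, sd / math.sqrt(n)


def text_histogram(values, lo=48.0, width=0.5, nbins=14):
    """Count values in fixed-width bins starting at lo (an assumed bin choice).

    Values outside [lo, lo + width * nbins) are silently ignored.
    """
    counts = [0] * nbins
    for v in values:
        i = int((v - lo) // width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts


for name, data in (("back-arc", BACKARC_SIO2), ("EPR", EPR_SIO2)):
    n, mean, sd, se = summarize(data)
    print(f"{name}: n={n} mean={mean:.2f} sd={sd:.2f} se={se:.3f}")
    for i, c in enumerate(text_histogram(data)):
        start = 48.0 + 0.5 * i
        print(f"  {start:5.1f}-{start + 0.5:5.1f} {'#' * c}")
```

The standard error shrinks like $1/\sqrt{n}$, so with hundreds to thousands of rows the random uncertainty in each mean becomes small. It does nothing, however, to remove systematic bias, such as which sites happened to be sampled or how many times each sample was analyzed.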


An Excel experiment: The back-arc basin data rows were assigned random numbers, and the rows were then sorted by those random numbers. Averages were then computed for the first n rows, with n increasing by one starting from 1. This shows how the sample average changes as you incrementally increase the sample size. The difference between the population average and the sample average is plotted against sample size below. As you would expect, with increasing n the sample average converges on the 'true' population average. However, note some interesting behavior before that point: the sample average can temporarily move farther from the true average as you happen to hit a random 'cluster' of higher or lower values. Two separate random-number assignments were made, resulting in the two charts below. This, by the way, would look different if for each new sample average the entire population were sampled anew, with n increased by 1. Instead, you are only seeing the effect of adding one more sample while keeping the samples you already have. If you have time and interest, you can explore what the difference would be.
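The spreadsheet procedure can be sketched in Python, along with the alternative of drawing a fresh sample at each step. This is an illustration under stated assumptions, not the original spreadsheet: `random.shuffle` stands in for assigning random numbers and sorting by them (both produce a random ordering), the 19-row excerpt stands in for the full 1338-row population, and `running_deviation` and `fresh_sample_deviation` are hypothetical names.

```python
import random

# SiO2 (wt%) from the 19-row back-arc excerpt, used as a stand-in
# "population"; the spreadsheet experiment used all 1338 rows.
BACKARC_SIO2 = [49.55, 50.41, 50.17, 49.74, 50.01, 51.38, 50.83, 48.91, 48.77,
                50.11, 54.04, 50.15, 50.29, 50.04, 50.22, 50.3, 50.34, 50.48, 49.87]


def running_deviation(population, seed=None):
    """Shuffle once, then return (mean of first n) - (population mean) for n = 1..N.

    Earlier rows are kept and one more row is added at each step, as in the
    spreadsheet procedure.
    """
    rng = random.Random(seed)
    order = list(population)  # copy so the caller's list is not reordered
    rng.shuffle(order)
    true_mean = sum(population) / len(population)
    deviations, total = [], 0.0
    for n, value in enumerate(order, start=1):
        total += value
        deviations.append(total / n - true_mean)
    return deviations


def fresh_sample_deviation(population, seed=None):
    """Alternative: draw a brand-new random sample (without replacement) of size n at each step."""
    rng = random.Random(seed)
    true_mean = sum(population) / len(population)
    return [sum(rng.sample(population, n)) / n - true_mean
            for n in range(1, len(population) + 1)]


# Two different random orderings, analogous to the two charts.
for seed in (1, 2):
    devs = running_deviation(BACKARC_SIO2, seed=seed)
    print(f"ordering {seed}:", [round(d, 2) for d in devs])

fresh = fresh_sample_deviation(BACKARC_SIO2, seed=1)
print("fresh samples:", [round(d, 2) for d in fresh])
```

In both versions the deviation reaches zero at n = N, because the sample is then the whole population. The difference lies in the path: the running version carries its early 'clusters' forward, while the fresh-sample version can jump around from one n to the next. Fixing the seed makes a run repeatable; leaving it out gives a different curve each time, as re-assigning random numbers in the spreadsheet would.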


Copyright by Harmon D. Maher Jr. This material may be used for non-profit educational purposes if proper attribution is given. Otherwise please contact Harmon D. Maher Jr.