Statistical and Geo-statistical Data Analysis

Statistical and geo-statistical data analyses are important to geologists and geophysicists because they help interpret a variable’s behavior within a data sample and predict its values where there are no measurements. These analyses also use mathematical and graphical methods to make the assumptions behind the data explicit, so that the characteristics of the data can be obtained and then tested to predict the unknowns. The data that a geologist or geophysicist collects are mainly geologic, hydrologic, and environmental. Central tendency, variability, and graphical description are the main ways of describing such data, and together they form a type of data analysis called “descriptive analysis”. This analysis studies the behavior of only one variable at a time, such as the organic content of a sedimentary rock or the depth to groundwater.
Central tendency is the first type of descriptive analysis, and geologists and geophysicists use it to obtain a typical, representative value of the data. It is based on the mode, the median, and the mean. The mode is the value that occurs most often; a data set can have one mode (unimodal), two modes (bimodal), three modes (trimodal), or more (multimodal). The median is the middle value of the data after sorting from smallest to largest or the reverse, and for an even number of values it is the average of the two middle values. The mean is the average value of the data, and unlike the median it is sensitive to outliers.
Measurement of spread, or variability, is the second type of descriptive analysis, and it focuses on the range, the inter-quartile range (IQR), and the variance. The range is the difference between the largest and the smallest value of the data. The IQR is the range of the middle 50% of the data sample, obtained by excluding the lowest 25% and the highest 25% of the values. The variance is built from the deviation of each sample from the mean: it is the average of the squared deviations. For example, in a sample of heights, a tall person has a positive deviation from the mean and a short person has a negative one; squaring the deviations before averaging keeps them from cancelling each other out.
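As a minimal sketch of the central tendency and spread measures described above, the following Python example uses only the standard `statistics` module. The groundwater depths are made-up illustrative values, the function name `describe` is hypothetical, and two conventions are assumed: the "inclusive" quartile method and the population variance (dividing by the number of samples). Other software may use different conventions and give slightly different numbers.

```python
import statistics

# Hypothetical groundwater depths in metres (illustrative values, not real data).
# The last value is a deliberate outlier.
depths = [12.0, 15.0, 15.0, 18.0, 21.0, 24.0, 90.0]


def describe(values):
    """Return central-tendency and spread measures for a single variable."""
    ordered = sorted(values)
    # Quartiles with the "inclusive" method (an assumed convention).
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "modes": statistics.multimode(ordered),    # all most-frequent values
        "median": statistics.median(ordered),      # middle value after sorting
        "mean": statistics.mean(ordered),          # arithmetic average
        "range": ordered[-1] - ordered[0],         # largest minus smallest
        "iqr": q3 - q1,                            # spread of the middle 50%
        "variance": statistics.pvariance(ordered), # mean squared deviation
    }


summary = describe(depths)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the single outlier pulls the mean well above the median, while the median and the IQR barely react to it, which is the sensitivity to outliers mentioned above.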
Graphical description of the data is the third type of descriptive analysis, and it mainly includes the box plot and the histogram. A box plot summarizes the data by drawing the median and the quartiles as a box and marking the values that lie far outside it. A histogram is a display of statistical information that uses bins to show the frequency of the data in numerical intervals of equal size; the number of bins is chosen by the analyst or by the software, for example one interval per 10 samples. Other useful descriptive statistics include variance, standard deviation, skewness, and kurtosis. Variance measures how far the data set is spread out; mathematically, it is the average of the squared differences from the mean. Standard deviation is the square root of the variance, and it shows how much the data are spread out around the mean in the same units as the data. Skewness indicates the symmetry or asymmetry of the data around the mean. Lastly, kurtosis describes how heavy the tails of the distribution are, which is often read from the histogram as how sharp the peak near the mean is. In conclusion, descriptive analysis characterizes the behavior of the values within a data sample, and that characterization is the basis for predicting values where there are no measurements. Such predictions depend on the mathematical methods used to describe the data, such as central tendency, variability, variance, standard deviation, skewness, and kurtosis.
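The histogram binning and the shape statistics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the organic-content values are invented, the function names `histogram` and `shape` are hypothetical, and skewness and kurtosis are computed as the plain population moments (dividing by the number of samples, with no bias correction and without subtracting 3 from the kurtosis). Statistical packages often apply different conventions.

```python
import statistics

# Hypothetical organic content of rock samples in percent (illustrative values).
organic = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 9.0]


def histogram(values, n_bins):
    """Count values in n_bins equal-width intervals between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin instead of opening a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def shape(values):
    """Return (standard deviation, skewness, kurtosis) as population moments."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # square root of the population variance
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in values) / n
    return sd, skew, kurt


counts = histogram(organic, n_bins=4)
sd, skew, kurt = shape(organic)
print("bin counts:", counts)
print("standard deviation:", sd)
print("skewness:", skew)
print("kurtosis:", kurt)
```

With these values the single high reading stretches the right tail, so the skewness comes out positive; a perfectly symmetric sample would give a skewness of zero.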