# Exploratory Data Analysis

“Exploratory data analysis is detective work” [Tukey, 1977, p.2].
This package provides graphical tools for finding
‘quantitative indications’ that lead to a better understanding of the data
at hand. “As all detective stories remind us, many of the
circumstances surrounding a crime are accidental or misleading. Equally,
many of the indications to be discerned in bodies of data are accidental
or misleading” [Tukey, 1977, p.3]. The solution is to compare many
different graphical tools, with the goal of either finding agreement
among them or generating a hypothesis that is then confirmed with
statistical methods. This package serves as a starting point.

## Synoptic Overview

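```
library(DataVisualizations)
# Load the example dataset shipped with the package
data("Lsun3D")
# Draw the data matrix as a pixel image to get an overview of all values at once
Pixelmatrix(Lsun3D$Data)
```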

## Distribution Analysis

“A scientifically sound procedure for the identification and analysis
of empirical distributions is a comparison to a known theoretic
distribution. The quantile/quantile plot (QQ-plot) allows comparing an
empirical distribution to a known distribution [Michael, 1983]. Here, in
100 quantiles the model of a Gaussian distribution is compared to the
data, and a straight line confirms a good data fit of the model. The
Gaussian distribution is the canonical starting point for such a
comparison[…]

[t]he precise form, i.e., the type, nature and parameters of the
formal model of the probability density function (pdf) is the […] goal
of [Distribution] analysis. Usually, this is performed using kernel
density estimators. The simplest of such a density estimation is the
histogram. However, histograms are often misleading and require critical
parameters such as the width of the bin [Keating and Scott, 1999]. A
specially designed density estimation, which has been successfully
proved in many practical applications is the “Pareto Density Estimation”
(PDE). PDE consists of a kernel density estimator representing the
relative likelihood of a given continuous random data [Ultsch, 2005].
PDE has been shown to be particularly suitable for the discovery of
structures in continuous data hinting at the presence of distinct groups
of data and particularly suitable for the discovery of mixtures of
Gaussians [Ultsch, 2005]. The parameters of the kernels are auto-adapted
to the data using an information theoretic optimum on skewed
distributions [Ultsch, Thrun, Hansen-Goos, and Lötsch, 2015].”
[Thrun and Ultsch, 2018].
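
The QQ-plot comparison described in the quote can be sketched in base R. This is a minimal illustration, not the package's implementation: the synthetic sample `x`, the probability grid `p`, and the choice to fit the Gaussian by the sample mean and standard deviation are all assumptions made for the example.

```r
# Minimal QQ-plot sketch in base R (illustrative only, not the package's code).
set.seed(42)
x <- rnorm(500, mean = 10, sd = 2)   # assumed synthetic data standing in for a real variable

# 100 probability levels, kept away from 0 and 1 to avoid infinite Gaussian quantiles
p <- seq(0.005, 0.995, length.out = 100)

# Empirical quantiles of the data at those levels
empirical <- quantile(x, probs = p, names = FALSE)

# Quantiles of a Gaussian model fitted by sample mean and standard deviation
theoretical <- qnorm(p, mean = mean(x), sd = sd(x))

# Points close to the identity line suggest that the Gaussian model fits the data
plot(theoretical, empirical,
     xlab = "Gaussian model quantiles", ylab = "Empirical quantiles",
     main = "QQ-plot with 100 quantiles")
abline(a = 0, b = 1, col = "red")

# A rough numeric companion to the visual check: linear correlation of the quantile pairs
qq_cor <- cor(theoretical, empirical)
```

A value of `qq_cor` near 1 agrees with the visual impression of a straight line, but it does not replace a statistical test of the hypothesis.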

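```
library(DataVisualizations)
# Load the example dataset shipped with the package
data(MTY)
# Inspect the distribution of a single variable; the second argument is the name used in the plots
InspectVariable(MTY,'MTY')
```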
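
The remark in the quote that histograms depend critically on the bin width can be illustrated with base R. The sketch below is an assumption-laden example: the bimodal sample is synthetic, the bin counts of 3 and 30 are arbitrary, and `density()` is R's standard Gaussian kernel density estimator, used here as a stand-in and not as an implementation of PDE.

```r
# Illustrative sketch: the same data shown with two bin widths and a kernel density estimate.
set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 1),   # assumed synthetic mixture of two Gaussians
       rnorm(300, mean = 4, sd = 1))

# Common range for all histograms, rounded outwards so every point falls inside
lo <- floor(min(x))
hi <- ceiling(max(x))

# Very wide bins (3 bins) versus narrow bins (30 bins) on the same data
coarse <- hist(x, breaks = seq(lo, hi, length.out = 4),  plot = FALSE)
fine   <- hist(x, breaks = seq(lo, hi, length.out = 31), plot = FALSE)

# Base R Gaussian kernel density estimate (not PDE) for comparison
kde <- density(x)

# Draw the three views side by side
old_par <- par(mfrow = c(1, 3))
plot(coarse, main = "3 bins")
plot(fine,   main = "30 bins")
plot(kde,    main = "Kernel density estimate")
par(old_par)
```

With very wide bins the two groups may merge into a single block, while narrower bins or a density estimate tend to reveal them. Which view is trustworthy still has to be judged by comparing several plots, as described in the introduction.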