---
title: "FuzzyPovertyR"
output: rmarkdown::html_vignette
bibliography: bib.bib
vignette: >
%\VignetteIndexEntry{FuzzyPovertyR}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
markdown:
wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
library(FuzzyPovertyR)
library(kableExtra)
```
## Introduction
`FuzzyPovertyR` is a package for estimating fuzzy poverty indexes.
Broadly speaking, a fuzzy poverty index is an index that ranges in the
set $Q=[0,1]$ (@alkire2015multidimensional; @silber2023research). A fuzzy indexes assigns to each statistical unit a value
in this interval according to a given membership function (mf)
$\mu(x_i)\in Q$ where $x$ is a poverty predicate. The higher the value
of $\mu$, the more the individual is regarded as "poor" with respect to
the poverty predicate $x$. In socio-economic surveys $x$ may be the
equivalised disposable income, or expenditure. However, in principle $x$
may be a generic poverty predicate that the researcher needs to analyse,
for example this could be a variable that relates to access to
transports, services and other facilities.
Below we distinguish between monetary and supplementary poverty indexes
(or measures). A fuzzy monetary poverty measure is calculated over a
`numeric` vector of length $n$ (the available sample size). A
supplementary poverty index is calculated on a `data.frame` of items of
a questionnaire that relates to other dimensions of poverty other than
monetary.
The dataset coming from the package is loaded in the environment with
```{r}
data(eusilc)
```
## Fuzzy Monetary poverty measures
The package lets the user choose among different membership functions
trough the argument `fm` of the `fm_construct` function. The membership
function available are:
- `fm="verma"` (see @cheli1995totally)
$$
\mu_i=\left(1-F_X(x_i)\right)^{\alpha-1} =
\left(\frac{\sum_{j=i+1} w_j|x_j> x_i}{\sum_{j\ge 2} w_j|x_j> x_1}\right)^{\alpha-1}
$$
where $F_X$ is the empirical cumulative distribution function of
$X$ calculated for the $i-th$ individual and $w_i$ is the sampling
weight of statistical unit $i$. The parameter $\alpha\ge 2$ is chosen so
that the average over the function equals the head count ratio (i.e. the
proportion of units whose equivalised disposable income falls below the
poverty line) of the whole population.
- `fm="verma1999"` (see @betti1999measuring)
$$
\mu_i=(1-L_X(x_i))^{\alpha-1} =
\left(\frac{\sum_{j=i+1} w_jx_j|x_j> x_i}{\sum_{j\ge 2} w_jx_j|x_j> x_1}\right)^{\alpha-1}
$$
where $L_X$ is the Lorenz Curve of $X$ calculated for the $i-th$
individual and $w_i$ is the sampling weight of statistical unit $i$.
Again, the parameter $\alpha\ge 2$ is chosen once again so that the
average over the function equals the head count ratio of the whole
population or its estimate.
- `fm="verma"` (see @betti2008fuzzy)
$$
\begin{split}
\mu_i&=\left(1-F_X(x_i)\right)^{\alpha-1}\left(1-L_X(x_i)\right)=\\
&=\left(\frac{\sum_{j=i+1} w_j|x_j> x_i}{\sum_{j\ge 2} w_j|x_j> x_1}\right)^{\alpha-1} \left(\frac{\sum_{j=i+1} w_jx_j|x_j> x_i}{\sum_{j\ge 2} w_jx_j|x_j> x_1}\right)
\end{split}
$$
Again, the parameter $\alpha\ge 2$ is chosen once again so that the
average over the function equals the head count ratio of the whole
population or its estimate.
- `fm="chakravarty"` (see @chakravarty2019axiomatic)
$$
\mu_i = \begin{cases}
1 & x_i = 0\\
\frac{z - x_i}{z} & 0 \le x_i < z\\
0 & x_i \ge z
\end{cases}
$$
where $z$ is a threshold to be chosen by the researcher.
- `fm="cerioli"` (see @cerioli1990fuzzy) \$\$ \mu\_i=
$$
\mu_i = \begin{cases}
1, \quad 0=1)$ rules the shape of the membership functions
(set `b=1` for linearity)
```{r}
z1 = 20000; z2 = 70000; b = 2
belhadj = fm_construct(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "belhadj2015", z1 = z1, z2 = z2, b = b)
```
```{r, echo=FALSE, fig.height=6, fig.width=6}
summary(belhadj)
plot(belhadj)
```
Using `fm="cerioli"`. Again we have to set the values of `z1`, `z2` as follows:
```{r}
z1 = 10000; z2 = 70000
cerioli = fm_construct(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "cerioli", z1 = z1, z2 = z2)
```
```{r, echo=FALSE, fig.height=6, fig.width=6}
summary(cerioli)
plot(cerioli)
```
### Example using `fm=chakravarty` and `fm=belhadj2011`
@chakravarty2019axiomatic fuzzy index is obtained setting
`fm = "chakravarty"`. The argument `z` needs user's specification as follows:
```{r}
z = 60000
chakravarty = fm_construct(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "chakravarty", z = z)
```
```{r, echo=FALSE, fig.height=6, fig.width=6}
summary(chakravarty)
plot(chakravarty)
```
again is is possible to specify the `breakdown` argument to obtain
estimates at sub-domains.
```{r}
chakravarty.break = fm_construct(predicate = eusilc$eq_income, eusilc$DB090, fm = "chakravarty", z = z, breakdown = eusilc$db040)
```
```{r}
knitr::kable(data.frame(verma.break$estimate, chakravarty.break$estimate), col.names = c("Verma", "Chakravarty"), digits = 4)
```
Instead, for `fm = "belhadj2011"` the values of $z_{min}$ and $z_{MAX}$ need user's specification as follows:
```{r}
zmin = 5000; zmax = 60000
belhadj2011 = fm_construct(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "belhadj2011", z_min = zmin, z_max = zmax)
```
```{r, echo=FALSE, fig.height=6, fig.width=6}
summary(belhadj2011)
plot(belhadj2011)
```
### Example using `fm=ZBM`
As previously showed to compute the index related to this mf is necessary to compute three parameters $a, b, c$ which are computed with bootstrap techniques (@zedini2015new). Moreover the definition of the index require the knowledge of the household size. The index is obtained as follows:
```{r}
ZBM = fm_construct(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "ZBM", hh.size = eusilc$ncomp)
```
```{r, echo=FALSE, fig.height=6, fig.width=6}
plot(ZBM)
```
------------------------------------------------------------------------
## Fuzzy supplementary
The package include also a multidimensional fuzzy poverty index known as Fuzzy Supplementary (FS) index (see @betti2008fuzzy and @betti2018simplified). This index is defined with multiple steps. The package has an ad-hoc function for each step (excluding the third one).
The steps are:
- Step 1 - Identification
- Step 2 - Transformation
- Step 3 - Factor analysis to identify dimensions of poverty
- Step 4 - Calculation of weights
- Step 5 - Calculation of scores in dimensions
- Step 6 - Calculation of the $\alpha$ parameter
- Step 7 - Construction of the FS measure for each dimension
### Step 1 - Identification
This step is pretty simple. The user has to select the columns of the
data that correspond to the items that he/she has decided to keep in the
analysis.
Step 1 is done with the following selection
```{r}
# eusilc = na.omit(eusilc)
step1 = eusilc[,4:23]
```
If the data in the dataset are not "ordered" in the right way, i.e. the highest values represents the highest deprivation it is possible to invert them using the following function:
```{r}
#Create a dataframe in which the variable X is not ordered in the right way:
data=data.frame("X"=rep(c(1,2,3,4),20), "Y"=rep(c(7,8,9,1),20))
#Crete vec_order a vector of length n with TRUE or FALSE. True if the order of the variable is to be inverted, False otherwise
vec_order=c(TRUE,FALSE)
head(fs_order(data=data, vec_order))
```
### Step 2 - Transformation
In this step the items are mapped from their original space to the
$[0,1]$ interval using the function `fs_transform` (see
@betti2018simplified and @betti2008fuzzy). For each item, a positive score $s_{ij}$ is determined as follows
$$
s_{ij}= 1- d_{ij} = 1-\frac{1-F(c_{ij})}{1-F(1)},\quad i=1,\dots,n \quad \text{and}\quad j=1,\dots,k
$$
where $c_{ij}$ is the value of the category of the $j$-th item for the $i$-th household and $F(c_{ij})$ is the value of the $j$-th item cumulative function for the $i$-th household. This step is done as follows using the function `fs_transform`:
```{r}
step2 = fs_transform(step1, weight = eusilc$DB090, ID = eusilc$ID); class(step2)
summary(step2$step2)
# step2.1 = fs_transform(step1, weight = eusilc$DB090, ID = eusilc$ID, depr.score = "d")
```
which outputs
```{r}
knitr::kable(head(step2$step2), digits = 3, align = "c", caption = "Transformed items")
```
### Step 3 - Factor Analysis
This fuzzy supplementary measure use factor analysis to undercover
latent dimension in the data. There are multiple approaches to get
factor analysis in R which we do not cover in this vignette, however the
user can check for example the `lavaan` package. Anyways, factor
analysis is not mandatory and the user may wish to undertake a different
approach to undercover a latent structure in the data. Indeed, it is
possible to skip factor analysis or to use a personal assignment of
columns in dimensions.
Regardless of the chosen method, to go trough Step 3 the user has to
specify a numeric vector of the same length of the number of items
selected in Step 1 that assigns each column to a given dimension.
```{r}
dimensions = c(1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5)
```
### Steps 4 and 5 - Calculation of weights and measures within dimensions
In this step, the weights to be assigned to each item, belonging to a given dimension $h$, are determined separately within each dimension. Such weighting procedure takes into account of two different aspects: the dispersion of the deprivation indicator and its correlation with other deprivation indicators in the given dimension. The weight of item $j$ belonging to dimension $h$ is taken as
$$
\omega_{hj}=\omega^a_{hj}\times \omega^b_{hj}, \quad h=1,\dots,H \quad \text{and}\quad j=1,\dots,k_h
$$
$\omega_{hj}^a$ is taken as proportional to the coefficient of variation of the complement to one of the positive score for the variable concerned
$$
\omega^a_{hj}\propto \frac{\sigma_{s_{hj}}}{1-\bar{s}_{hj}}
$$
where $\sigma_{s_{hj}}$ is the standard deviation of the deprivation score $s$ for item $j$ in dimension $h$ and $\bar s_{hj}$ its sample mean. The second factor is
$$
\omega^b_{hj}\propto\Biggl(\frac{1}{1+\sum_{j=1}^{k_h}\rho_{e_{hjhj^{\ast}}}|\rho_{e_{hjhj^{\ast}}}s_{hi}}{\sum_{\gamma\ge 2}w_{h\gamma}|s_{h\gamma}>s_{h1}}\Biggr]^{\alpha-1}\Biggl[\frac{\sum_{\gamma=i+1}w_{h\gamma}s_{h\gamma}|s_{h\gamma}>s_{hi}}{\sum_{\gamma\ge 2}w_{h\gamma}s_{h\gamma}|s_{h\gamma}>s_{h1}}\Biggr]
\end{split}
$$
The function `fs_construct` is used to compute the FS for each dimension and overall as follows:
```{r, echo=FALSE, fig.height=6, fig.width=6}
FS = fs_construct(steps4_5 = steps4_5, weight = eusilc$DB090, alpha = alpha, breakdown = NULL) # no breakdown
summary(FS)
FS = fs_construct(steps4_5 = steps4_5, weight = eusilc$DB090, alpha = alpha, breakdown = eusilc$db040)
FS$estimate
plot(FS)
```
The output of the `fs_construct` function is a list containing:
- `membership` a `list` containing the FS measures for each
statistical unit in the sample. Results for each dimension can be
obtained by
- `estimate` the average of the membership function for each dimension
```{r}
FS$estimate
```
- `alpha` the parameter $\alpha$ estimated from data.
Again, it is possible to obtain results for sub-domains by specifying
the `breakdown` argument
```{r echo=FALSE}
knitr::kable(fs_construct(steps4_5 = steps4_5, weight = eusilc$DB090, alpha = alpha, breakdown = eusilc$db040)$estimate, digits = 4 )
```
The package contains also a function named `fs_construct_all` which constructs the fuzzy supplementary poverty measure based without step-by-step functions.
------------------------------------------------------------------------
## Variance Estimation
The variance of each Fuzzy Monetary measure can be estimated either via
Bootstrap (naive or calibrated) or Jackknife Repeated Replications. We recommend the former
each time the user has no knowledge of the sampling design, while we
recommend the Jackknife when there is full information on the design and
of the PSUs (see @betti2018simplified). In the following, for the unidimensional indices, we report only the examples linked with `fm=verma`. For the other specification of the mf the function is identical the only changes are linked with the parameters required by the mf (see `fm_construct`).
### Example using `fm=verma`
In case of `fm="verma"`, we recommend the user to use the value of
`alpha` from obtained from the function `fm_construct`. It is possible
to specify different values of the parameter (i.e. `alpha=2`). We do not
recommend to leave the argument `alpha=NULL` for the computation of
variance.
```{r}
alpha = fm_construct(predicate = eusilc$eq_income, weight = eusilc$DB090, ID = NULL, HCR = 0.12, interval = c(1,10), alpha = NULL)$alpha
```
```{r}
boot.var = fm_var(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "verma", type = "bootstrap_naive", HCR = .12, alpha = alpha, verbose = F, R = 10)
# plot(boot.var)
fm_var(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "verma", type = "jackknife", HCR = .12, alpha = 9, stratum = eusilc$stratum, psu = eusilc$psu, verbose = F)
```
which gives the bootstrap estimate or the jackknife estimate.
If there are multiple sub-domains or sub-populations that need variance
estimation, the user can specify the breakdown to the `breakdown`
argument of the function `fm_var`. For example:
```{r echo=FALSE}
Bootstrap = fm_var(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "verma", breakdown = eusilc$db040, type = "bootstrap_naive", HCR = hcr, alpha = alpha, verbose = FALSE) %>% summary()
Jackknife = fm_var(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "verma", breakdown = eusilc$db040, type = "jackknife", HCR = hcr, alpha = alpha, stratum = eusilc$stratum, psu = eusilc$psu, verbose = F)%>% summary()
```
```{r, echo=FALSE, fig.height=6, fig.width=6}
Bootstrap = fm_var(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "verma", breakdown = eusilc$db040, type = "bootstrap_naive", HCR = hcr, alpha = alpha, verbose = FALSE)
plot(Bootstrap)
Bootstrap = fm_var(predicate = eusilc$eq_income, weight = eusilc$DB090, fm = "verma", breakdown = eusilc$db040, type = "jackknife", HCR = hcr, alpha = alpha, stratum = eusilc$stratum, psu = eusilc$psu, verbose = F)
```
### Fuzzy supplementary
```{r eval=FALSE}
variance = fs_var(data = eusilc[,4:23], weight = eusilc$DB090, ID = NULL, dimensions = dimensions, breakdown = NULL, HCR = 0.12, alpha = 2, rho = NULL, type = 'bootstrap', M = NULL, R = 2, verbose = F)
summary(variance)
```
```{r, echo=FALSE, fig.height=6, fig.width=6}
Bootstrap = fs_var(data = eusilc[,4:23], weight = eusilc$DB090, ID = NULL,
dimensions = dimensions, breakdown = eusilc$db040, HCR = .12,
alpha = 2, rho = NULL, type = 'bootstrap_naive', M = NULL, R = 10, verbose = F)
plot(Bootstrap)
```
The following uses the Jackknife
```{r eval=FALSE}
fs_var(data = eusilc[,4:23], weight = eusilc$DB090, ID = NULL, dimensions = dimensions,
stratum = eusilc$stratum, psu = eusilc$psu, verbose = F, f = .01,
breakdown = eusilc$db040, alpha = 3, rho = NULL, type = "jackknife")%>%summary()
```