Title: | Estimation of Indicators on Social Exclusion and Poverty |
---|---|
Description: | Estimation of indicators on social exclusion and poverty, as well as Pareto tail modeling for empirical income distributions. |
Authors: | Andreas Alfons [aut, cre], Josef Holzer [aut], Matthias Templ [aut], Alexander Haider [ctb] |
Maintainer: | Andreas Alfons <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5.4 |
Built: | 2025-02-01 05:13:46 UTC |
Source: | https://github.com/aalfons/laeken |
Estimation of indicators on social exclusion and poverty, as well as Pareto tail modeling for empirical income distributions.
The DESCRIPTION file:
Package: | laeken |
Type: | Package |
Title: | Estimation of Indicators on Social Exclusion and Poverty |
Version: | 0.5.4 |
Date: | 2024-02-05 |
Depends: | R (>= 3.2.0) |
Imports: | boot, MASS |
Description: | Estimation of indicators on social exclusion and poverty, as well as Pareto tail modeling for empirical income distributions. |
License: | GPL (>= 2) |
Authors@R: | c(person("Andreas", "Alfons", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-2513-3788")), person("Josef", "Holzer", role = "aut"), person("Matthias", "Templ", role = "aut"), person("Alexander", "Haider", role = "ctb")) |
Author: | Andreas Alfons [aut, cre] (<https://orcid.org/0000-0002-2513-3788>), Josef Holzer [aut], Matthias Templ [aut], Alexander Haider [ctb] |
Maintainer: | Andreas Alfons <[email protected]> |
URL: | https://github.com/aalfons/laeken |
BugReports: | https://github.com/aalfons/laeken/issues |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Repository: | https://aalfons.r-universe.dev |
RemoteUrl: | https://github.com/aalfons/laeken |
RemoteRef: | HEAD |
RemoteSha: | b96407d7e2e10c0db2ae45d94cae75e81a401a42 |
Index of help topics:
Topic | Title |
---|---|
arpr | At-risk-of-poverty rate |
arpt | At-risk-of-poverty threshold |
bootVar | Bootstrap variance and confidence intervals of indicators on social exclusion and poverty |
calibVars | Construct a matrix of binary variables for calibration |
calibWeights | Calibrate sample weights |
eqInc | Equivalized disposable income |
eqSS | Equivalized household size |
eusilc | Synthetic EU-SILC survey data |
fitPareto | Fit income distribution models with the Pareto distribution |
gini | Gini coefficient |
gpg | Gender pay (wage) gap |
incMean | Weighted mean income |
incMedian | Weighted median income |
incQuintile | Weighted income quintile |
laeken-package | Estimation of Indicators on Social Exclusion and Poverty |
meanExcessPlot | Mean excess plot |
minAMSE | Weighted asymptotic mean squared error (AMSE) estimator |
paretoQPlot | Pareto quantile plot |
paretoScale | Estimate the scale parameter of a Pareto distribution |
paretoTail | Pareto tail modeling for income distributions |
plot.paretoTail | Diagnostic plot for the Pareto tail model |
prop | Proportion of an alternative distribution |
qsr | Quintile share ratio |
replaceTail | Replace observations under a Pareto model |
reweightOut | Reweight outliers in the Pareto model |
rmpg | Relative median at-risk-of-poverty gap |
ses | Synthetic SES survey data |
shrinkOut | Shrink outliers in the Pareto model |
thetaHill | Hill estimator |
thetaISE | Integrated squared error (ISE) estimator |
thetaLS | Least squares (LS) estimator |
thetaMoment | Moment estimator |
thetaPDC | Partial density component (PDC) estimator |
thetaQQ | QQ-estimator |
thetaTM | Trimmed mean estimator |
thetaWML | Weighted maximum likelihood estimator |
utils | Utility functions for indicators on social exclusion and poverty |
variance | Variance and confidence intervals of indicators on social exclusion and poverty |
weightedMean | Weighted mean |
weightedMedian | Weighted median |
weightedQuantile | Weighted quantiles |
Andreas Alfons [aut, cre] (<https://orcid.org/0000-0002-2513-3788>), Josef Holzer [aut], Matthias Templ [aut], Alexander Haider [ctb]
Maintainer: Andreas Alfons <[email protected]>
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.
Estimate the at-risk-of-poverty rate, which is defined as the proportion of persons with equivalized disposable income below the at-risk-of-poverty threshold.
arpr( inc, weights = NULL, sort = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, p = 0.6, var = NULL, alpha = 0.05, threshold = NULL, na.rm = FALSE, ... )
inc |
either a numeric vector giving the equivalized disposable income,
or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
breakdown |
optional; either a numeric vector giving different domains,
or (if |
design |
optional and only used if |
cluster |
optional and only used if |
data |
an optional |
p |
a numeric vector of values in |
var |
a character string specifying the type of variance estimation to
be used, or |
alpha |
numeric; if |
threshold |
if 'NULL', the at-risk-of-poverty threshold is estimated from the data. |
na.rm |
a logical indicating whether missing values should be removed. |
... |
if |
The implementation strictly follows the Eurostat definition.
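The definition above reduces to a weighted share of persons. The following base-R sketch makes that concrete. The function name `arpr_sketch` is hypothetical and not part of laeken. It assumes the threshold is already known, that there are no missing values, and that "below" means strictly below. It illustrates the definition only and does not reproduce laeken's implementation.

```r
# Hypothetical sketch (not laeken's implementation) of the at-risk-of-poverty
# rate: the weighted percentage of persons whose income lies strictly below a
# given threshold. Missing values are assumed to be absent.
arpr_sketch <- function(inc, weights = rep(1, length(inc)), threshold) {
  stopifnot(length(inc) == length(weights))
  poor <- inc < threshold                   # TRUE for persons below the threshold
  100 * sum(weights[poor]) / sum(weights)   # weighted share, in percent
}

inc <- c(8000, 12000, 15000, 20000, 30000)
w   <- c(2, 1, 1, 3, 1)
arpr_sketch(inc, w, threshold = 13000)  # (2 + 1) / 8 * 100 = 37.5
```

In the package itself, the threshold is estimated from the data when `threshold = NULL`.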
A list of class "arpr" (which inherits from the class "indicator") with the following components:
value |
a numeric vector containing the overall value(s). |
valueByStratum |
a |
varMethod |
a character string specifying the type of variance
estimation used, or |
var |
a numeric vector containing the variance estimate(s), or
|
varByStratum |
a |
ci |
a numeric vector or matrix containing the lower and upper
endpoints of the confidence interval(s), or |
ciByStratum |
a |
alpha |
a numeric value giving the significance level used for
computing the confidence interval(s) (i.e., the confidence level is |
years |
a numeric vector containing the different years of the survey. |
strata |
a character vector containing the different domains of the breakdown. |
p |
a numeric giving the percentage of the weighted median used for the at-risk-of-poverty threshold. |
threshold |
a numeric vector containing the at-risk-of-poverty threshold(s). |
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat, Luxembourg.
data(eusilc)
# overall value
arpr("eqIncome", weights = "rb050", data = eusilc)
# values by region
arpr("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
Estimate the at-risk-of-poverty threshold. The standard definition is to use 60% of the weighted median equivalized disposable income.
arpt( inc, weights = NULL, sort = NULL, years = NULL, data = NULL, p = 0.6, na.rm = FALSE )
inc |
either a numeric vector giving the equivalized disposable income,
or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
data |
an optional |
p |
a numeric vector of values in |
na.rm |
a logical indicating whether missing values should be removed. |
The implementation strictly follows the Eurostat definition.
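The threshold is p times a weighted median, so the part that needs care is the weighted median. The base-R sketch below makes an assumption about the rule: take the smallest income whose cumulative weight share reaches 50%, and average it with the next value when the share equals 50% exactly. The names `weighted_median_sketch` and `arpt_sketch` are hypothetical. For the package's exact rule, see weightedMedian.

```r
# Hypothetical weighted median (assumed rule, not laeken's code): smallest
# value whose cumulative weight share reaches 0.5; if that share is exactly
# 0.5, average it with the next larger value.
weighted_median_sketch <- function(x, weights = rep(1, length(x))) {
  ord <- order(x)
  x <- x[ord]
  share <- cumsum(weights[ord]) / sum(weights)
  i <- min(which(share >= 0.5))
  if (share[i] == 0.5) mean(x[i:(i + 1)]) else x[i]
}

# At-risk-of-poverty threshold: p times the weighted median (p = 0.6 by default)
arpt_sketch <- function(inc, weights = rep(1, length(inc)), p = 0.6) {
  p * weighted_median_sketch(inc, weights)
}

arpt_sketch(c(10000, 20000, 30000, 40000))                 # 0.6 * 25000
arpt_sketch(c(10000, 20000, 30000), weights = c(1, 1, 4))  # 0.6 * 30000
```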
A numeric vector containing the value(s) of the at-risk-of-poverty threshold is returned.
Andreas Alfons
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
arpr, incMedian, weightedMedian
data(eusilc)
arpt("eqIncome", weights = "rb050", data = eusilc)
Compute variance and confidence interval estimates of indicators on social exclusion and poverty based on bootstrap resampling.
bootVar( inc, weights = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, indicator, R = 100, bootType = c("calibrate", "naive"), X, totals = NULL, ciType = c("perc", "norm", "basic"), alpha = 0.05, seed = NULL, na.rm = FALSE, gender = NULL, method = NULL, ... )
inc |
either a numeric vector giving the equivalized disposable income,
or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
breakdown |
optional; either a numeric vector giving different domains,
or (if |
design |
optional; either an integer vector or factor giving different
strata for stratified sampling designs, or (if |
cluster |
optional; either an integer vector or factor giving different
clusters for cluster sampling designs, or (if |
data |
an optional |
indicator |
an object inheriting from the class |
R |
a numeric value giving the number of bootstrap replicates. |
bootType |
a character string specifying the type of bootstrap to be
performed. Possible values are |
X |
if |
totals |
numeric; if |
ciType |
a character string specifying the type of confidence
interval(s) to be computed. Possible values are |
alpha |
a numeric value giving the significance level to be used for
computing the confidence interval(s) (i.e., the confidence level is |
seed |
optional; an integer value to be used as the seed of the random number generator, or an integer vector containing the state of the random number generator to be restored. |
na.rm |
a logical indicating whether missing values should be removed. |
gender |
either a numeric vector giving the gender, or (if |
method |
a character string specifying the method to be used (only for
|
... |
if |
An object of the same class as indicator is returned. See arpr, qsr, rmpg or gini for details on the components.
This function gives reasonable variance estimates for basic sample designs such as simple random sampling or stratified simple random sampling.
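The base-R sketch below illustrates what a naive stratified bootstrap does, loosely in the spirit of `bootType = "naive"`. Observations are resampled with replacement within each stratum, and the indicator is recomputed on every replicate. The variance and a percentile interval are then taken from the replicates. The function name `naive_boot_sketch` and its arguments are hypothetical. Unlike the calibrated variant, the sketch does not recalibrate the weights in each replicate.

```r
# Hypothetical naive stratified bootstrap (not laeken's code).
# statistic: a function(x, w) returning a single number.
naive_boot_sketch <- function(inc, weights, strata, statistic,
                              R = 100, alpha = 0.05, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx_by_stratum <- split(seq_along(inc), strata)
  reps <- replicate(R, {
    # resample row indices with replacement separately within each stratum
    idx <- unlist(lapply(idx_by_stratum, function(i)
      i[sample.int(length(i), replace = TRUE)]), use.names = FALSE)
    statistic(inc[idx], weights[idx])
  })
  list(var = var(reps),                                           # bootstrap variance
       ci  = quantile(reps, c(alpha / 2, 1 - alpha / 2), names = FALSE))  # percentile CI
}

set.seed(1)
inc    <- rlnorm(200, meanlog = 10)
w      <- runif(200, 50, 150)
strata <- rep(c("A", "B"), each = 100)
res <- naive_boot_sketch(inc, w, strata, weighted.mean, R = 200, seed = 123)
res$var
res$ci
```

Here the weighted mean stands in for an indicator; any function of incomes and weights, such as a weighted Gini, could be plugged in the same way.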
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
variance, calibWeights, arpr, qsr, rmpg, gini
data(eusilc)
a <- arpr("eqIncome", weights = "rb050", data = eusilc)
## naive bootstrap
bootVar("eqIncome", weights = "rb050", design = "db040", data = eusilc,
        indicator = a, R = 50, bootType = "naive", seed = 123)
## bootstrap with calibration
bootVar("eqIncome", weights = "rb050", design = "db040", data = eusilc,
        indicator = a, R = 50, X = calibVars(eusilc$db040), seed = 123)
Construct a matrix of binary variables for calibration of sample weights according to known marginal population totals.
calibVars(x)
x |
a vector that can be interpreted as factor, or a matrix or
|
A matrix of binary variables that indicate membership to the corresponding factor levels.
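For a single factor, the matrix described above has one 0/1 column per level. The base-R sketch below builds such a matrix. The name `calib_vars_sketch` is hypothetical and not part of laeken. Base R's `model.matrix(~ f - 1)` yields comparable indicator columns, with different column names.

```r
# Hypothetical sketch: one 0/1 indicator column per factor level
# (illustrates the idea behind calibVars(), not its implementation).
calib_vars_sketch <- function(x) {
  x <- as.factor(x)
  m <- outer(as.character(x), levels(x), "==") * 1L  # n x (number of levels)
  colnames(m) <- levels(x)
  m
}

gender <- factor(c("male", "female", "female", "male"))
m <- calib_vars_sketch(gender)
m  # columns "female" and "male"; each row has exactly one 1
```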
Andreas Alfons
data(eusilc)
# default method
aux <- calibVars(eusilc$rb090)
head(aux)
# data.frame method
aux <- calibVars(eusilc[, c("db040", "rb090")])
head(aux)
Calibrate sample weights according to known marginal population totals. Based on initial sample weights, the so-called g-weights are computed by generalized raking procedures.
calibWeights( X, d, totals, q = NULL, method = c("raking", "linear", "logit"), bounds = c(0, 10), maxit = 500, tol = 1e-06, eps = .Machine$double.eps )
X |
a matrix of binary calibration variables (see
|
d |
a numeric vector giving the initial sample weights. |
totals |
a numeric vector of population totals corresponding to the
calibration variables in |
q |
a numeric vector of positive values accounting for heteroscedasticity. Small values reduce the variation of the g-weights. |
method |
a character string specifying the calibration method to be
used. Possible values are |
bounds |
a numeric vector of length two giving bounds for the g-weights to be used in the logit method. The first value gives the lower bound (which must be smaller than or equal to 1) and the second value gives the upper bound (which must be larger than or equal to 1). |
maxit |
a numeric value giving the maximum number of iterations. |
tol |
the desired accuracy for the iterative procedure. |
eps |
the desired accuracy for computing the Moore-Penrose generalized
inverse (see |
The final sample weights need to be computed by multiplying the resulting g-weights with the initial sample weights.
A numeric vector containing the g-weights.
This is a faster implementation of parts of calib from package sampling. Note that the default calibration method is raking and that the truncated linear method is not yet implemented.
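When the calibration variables are 0/1 indicators of factor levels, the raking solution can be reached by classical iterative proportional fitting. The weights are rescaled in turn so that each population total is matched, and the cycle repeats until all totals hold. The base-R sketch below takes that route. It is not necessarily the algorithm used by calibWeights. The function name `raking_sketch` is hypothetical; `d`, `totals`, `maxit` and `tol` only mirror the argument names above. It assumes consistent totals and no empty indicator columns, and it returns g-weights, so the final weights are `g * d`.

```r
# Hypothetical raking via iterative proportional fitting (not laeken's code).
# X: 0/1 matrix of level indicators for one or more factors;
# d: initial weights; totals: known population totals for the columns of X.
raking_sketch <- function(X, d, totals, maxit = 500, tol = 1e-6) {
  w <- d
  for (iter in seq_len(maxit)) {
    for (j in seq_len(ncol(X))) {
      in_j <- X[, j] == 1
      w[in_j] <- w[in_j] * totals[j] / sum(w[in_j])  # match total j exactly
    }
    # stop once every total is reproduced within the relative tolerance
    if (max(abs(colSums(X * w) - totals) / totals) < tol) break
  }
  w / d  # g-weights
}

gender <- c("m", "f", "f", "m", "f", "m")
region <- c("N", "N", "S", "S", "S", "N")
X <- cbind(f = gender == "f", m = gender == "m",
           N = region == "N", S = region == "S") * 1
d <- rep(10, 6)
totals <- c(f = 40, m = 30, N = 35, S = 35)
g <- raking_sketch(X, d, totals)
colSums(X * (g * d))  # approximately 40, 30, 35, 35
```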
Andreas Alfons
Deville, J.-C. and Särndal, C.-E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418), 376–382.
Deville, J.-C., Särndal, C.-E. and Sautory, O. (1993) Generalized raking procedures in survey sampling. Journal of the American Statistical Association, 88(423), 1013–1020.
data(eusilc)
# construct auxiliary 0/1 variables for genders
aux <- calibVars(eusilc$rb090)
# population totals
totals <- c(3990798, 4191431)
# compute g-weights
g <- calibWeights(aux, eusilc$rb050, totals)
# compute final weights
weights <- g * eusilc$rb050
summary(weights)
Compute the equivalized disposable income from household and personal income variables.
eqInc(hid, hplus, hminus, pplus, pminus, eqSS, year = NULL, data = NULL)
hid |
if |
hplus |
if |
hminus |
if |
pplus |
if |
pminus |
if |
eqSS |
if |
year |
if |
data |
a |
All income components should already be imputed; otherwise, NAs are simply removed before the calculations.
A numeric vector containing the equivalized disposable income for every individual in data.
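The computation described above can be sketched in base R. The household-level net income is added to the personal net incomes summed over all household members, and the total is divided by the equivalized household size. The name `eq_inc_sketch` is hypothetical, and the sketch rests on several assumptions. Household-level components are repeated on every member's row, as in eusilc. Income components are passed as numeric matrices with one column per component. NAs are dropped, as stated above. This is an illustration, not laeken's code.

```r
# Hypothetical sketch of the equivalized disposable income (not laeken's code).
eq_inc_sketch <- function(hid, hplus, hminus, pplus, pminus, eqSS) {
  # household-level net income: identical on every member's row, so it is
  # taken per row rather than summed over members
  h_net <- rowSums(hplus, na.rm = TRUE) - rowSums(hminus, na.rm = TRUE)
  # personal net income, summed over all members of the same household
  p_net <- rowSums(pplus, na.rm = TRUE) - rowSums(pminus, na.rm = TRUE)
  p_hh  <- ave(p_net, hid, FUN = sum)
  # household disposable income divided by the equivalized household size
  (h_net + p_hh) / eqSS
}

hid    <- c(1, 1, 2)
hplus  <- cbind(rental   = c(1000, 1000, 500))
hminus <- cbind(paid     = c(100, 100, 0))
pplus  <- cbind(employee = c(2000, 1000, 1500))
pminus <- cbind(none     = c(0, 0, 0))
res <- eq_inc_sketch(hid, hplus, hminus, pplus, pminus, eqSS = c(1.5, 1.5, 1))
res  # household 1: (900 + 3000) / 1.5; household 2: (500 + 1500) / 1
```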
Andreas Alfons
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
data(eusilc)
# compute a simplified version of the equivalized disposable income
# (not all income components are available in the synthetic data)
hplus <- c("hy040n", "hy050n", "hy070n", "hy080n", "hy090n", "hy110n")
hminus <- c("hy130n", "hy145n")
pplus <- c("py010n", "py050n", "py090n", "py100n", "py110n", "py120n", "py130n", "py140n")
eqIncome <- eqInc("db030", hplus, hminus, pplus, character(), "eqSS", data = eusilc)
# combine with household ID and equivalized household size
tmp <- cbind(eusilc[, c("db030", "eqSS")], eqIncome)
# show the first 8 rows
head(tmp, 8)
Compute the equivalized household size according to the modified OECD scale adopted in 1994.
eqSS(hid, age, year = NULL, data = NULL)
hid |
if |
age |
if |
year |
if |
data |
a |
A numeric vector containing the equivalized household size for every observation in data.
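The modified OECD scale assigns a weight of 1 to the first adult, 0.5 to each additional person aged 14 or over, and 0.3 to each child under 14. The base-R sketch below applies these weights per household. The name `eq_ss_sketch` is hypothetical. The sketch takes ages as given (it ignores the year argument) and assumes every household contains at least one person aged 14 or over.

```r
# Hypothetical sketch of the modified OECD equivalence scale (not laeken's code).
eq_ss_sketch <- function(hid, age) {
  adult <- age >= 14
  n_adults   <- ave(as.numeric(adult), hid, FUN = sum)   # persons aged 14+
  n_children <- ave(as.numeric(!adult), hid, FUN = sum)  # persons under 14
  1 + 0.5 * (n_adults - 1) + 0.3 * n_children
}

hid <- c(1, 1, 1, 1, 2)
age <- c(40, 38, 10, 5, 70)
eq_ss_sketch(hid, age)  # 2.1 for each member of household 1, 1 for household 2
```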
Andreas Alfons
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
data(eusilc)
# calculate equivalized household size
eqSS <- eqSS("db030", "age", data = eusilc)
# combine with household ID and household size
tmp <- cbind(eusilc[, c("db030", "hsize")], eqSS)
# show the first 8 rows
head(tmp, 8)
This data set is synthetically generated from real Austrian EU-SILC (European Union Statistics on Income and Living Conditions) data.
data(eusilc)
A data frame with 14827 observations on the following 28 variables.
db030
integer; the household ID.
hsize
integer; the number of persons in the household.
db040
factor; the federal state in which the household is located (levels Burgenland, Carinthia, Lower Austria, Salzburg, Styria, Tyrol, Upper Austria, Vienna and Vorarlberg).
rb030
integer; the personal ID.
age
integer; the person's age.
rb090
factor; the person's gender (levels male and female).
pl030
factor; the person's economic status (levels 1 = working full time, 2 = working part time, 3 = unemployed, 4 = pupil, student, further training or unpaid work experience or in compulsory military or community service, 5 = in retirement or early retirement or has given up business, 6 = permanently disabled and/or unfit to work or other inactive person, 7 = fulfilling domestic tasks and care responsibilities).
pb220a
factor; the person's citizenship (levels AT, EU and Other).
py010n
numeric; employee cash or near cash income (net).
py050n
numeric; cash benefits or losses from self-employment (net).
py090n
numeric; unemployment benefits (net).
py100n
numeric; old-age benefits (net).
py110n
numeric; survivor's benefits (net).
py120n
numeric; sickness benefits (net).
py130n
numeric; disability benefits (net).
py140n
numeric; education-related allowances (net).
hy040n
numeric; income from rental of a property or land (net).
hy050n
numeric; family/children related allowances (net).
hy070n
numeric; housing allowances (net).
hy080n
numeric; regular inter-household cash transfer received (net).
hy090n
numeric; interest, dividends, profit from capital investments in unincorporated business (net).
hy110n
numeric; income received by people aged under 16 (net).
hy130n
numeric; regular inter-household cash transfer paid (net).
hy145n
numeric; repayments/receipts for tax adjustment (net).
eqSS
numeric; the equivalized household size according to the modified OECD scale.
eqIncome
numeric; a slightly simplified version of the equivalized household income.
db090
numeric; the household sample weights.
rb050
numeric; the personal sample weights.
The data set consists of 6000 households and is used in the examples of package laeken. Note that this is a synthetic data set based on original EU-SILC survey data.
Only a few of the large number of variables in the original survey are included in this example data set. The variable names are rather cryptic codes, but these are the standardized names used by the statistical agencies. Furthermore, the variables hsize, age, eqSS and eqIncome are not included in the standardized format of EU-SILC data, but have been derived from other variables for convenience. Moreover, some very sparse income components were not included in the generation of this synthetic data set. Thus the equivalized household income is computed from the available income components.
This is a synthetic data set based on Austrian EU-SILC data from 2006. The original sample was provided by Statistics Austria.
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2011) Simulation of close-to-reality population data for household surveys with application to EU-SILC. Statistical Methods and Applications, 20(3), 383–407.
Eurostat (2004) Description of target variables: Cross-sectional and longitudinal. EU-SILC 065/04, Eurostat.
data(eusilc)
summary(eusilc)
Fit a Pareto distribution to the upper tail of income data. Since a theoretical distribution is used for the upper tail, this is a semiparametric approach.
fitPareto( x, k = NULL, x0 = NULL, method = "thetaPDC", groups = NULL, w = NULL, ... )
x |
a numeric vector. |
k |
the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 |
the threshold (scale parameter) above which the Pareto distribution is fitted. |
method |
either a function or a character string specifying the function
to be used to estimate the shape parameter of the Pareto distribution, such as
|
groups |
an optional vector or factor specifying groups of elements of
|
w |
an optional numeric vector giving sample weights. |
... |
additional arguments to be passed to the specified method. |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. On the other hand, if the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
The function supplied to method should take a numeric vector (the observations) as its first argument. If k is supplied, it will be passed on (in this case, the function is required to have an argument called k). Similarly, if the threshold x0 is supplied, it will be passed on (in this case, the function is required to have an argument called x0). As above, only k is passed on if both are supplied. If the function specified by method can handle sample weights, the corresponding argument should be called w. Additional arguments are passed via the ... argument.
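The contract above (the observations as the first argument, plus an argument named k) can be met by a user-defined function. The sketch below implements the classical unweighted Hill estimator of the Pareto shape parameter under that contract. The name `hill_sketch` is hypothetical, and the function is not laeken's thetaHill. Based on the description of method, it could in principle be passed as `method = hill_sketch`; that call is not shown here.

```r
# Hypothetical shape estimator meeting the documented contract:
# numeric vector first, argument named k. Classical (unweighted) Hill estimator.
hill_sketch <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  stopifnot(k >= 1, k < n)
  x0 <- x[n - k]                       # threshold: (n - k)-th smallest value
  k / sum(log(x[(n - k + 1):n] / x0))  # inverse of the mean log-excess
}

# Pareto sample with scale 10000 and shape 3 via inverse-transform sampling
set.seed(1)
x <- 10000 * runif(5000)^(-1 / 3)
hill_sketch(x, k = 500)  # should be close to 3
```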
A numeric vector with a Pareto distribution fit to the upper tail.
The arguments x0 for the threshold (scale parameter) of the Pareto distribution and w for sample weights were introduced in version 0.2. This results in slightly different behavior regarding the function calls to method compared to prior versions.
Andreas Alfons and Josef Holzer
thetaPDC, thetaWML, thetaHill, thetaISE, thetaLS, thetaMoment, thetaQQ, thetaTM
data(eusilc)
## Gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)
## Gini coefficient with Pareto tail modeling
# using number of observations in tail
eqIncome <- fitPareto(eusilc$eqIncome, k = 175, w = eusilc$db090, groups = eusilc$db030)
gini(eqIncome, weights = eusilc$rb050)
# using threshold
eqIncome <- fitPareto(eusilc$eqIncome, x0 = 44150, w = eusilc$db090, groups = eusilc$db030)
gini(eqIncome, weights = eusilc$rb050)
Estimate the Gini coefficient, which is a measure of inequality.
gini( inc, weights = NULL, sort = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, var = NULL, alpha = 0.05, na.rm = FALSE, ... )
inc |
either a numeric vector giving the equivalized disposable income,
or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
breakdown |
optional; either a numeric vector giving different domains,
or (if |
design |
optional and only used if |
cluster |
optional and only used if |
data |
an optional |
var |
a character string specifying the type of variance estimation to
be used, or |
alpha |
numeric; if |
na.rm |
a logical indicating whether missing values should be removed. |
... |
if |
The implementation strictly follows the Eurostat definition.
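The base-R sketch below computes a weighted Gini coefficient in percent. It uses a formula assumed here to match the Eurostat definition: with incomes sorted ascending, weights w_i and cumulative weights c_i, G = 100 * ((2 * sum(w_i * x_i * c_i) - sum(w_i^2 * x_i)) / (sum(w_i) * sum(w_i * x_i)) - 1). With equal weights this reduces to the familiar unweighted Gini coefficient. The name `gini_sketch` is hypothetical; compare its results against gini() before relying on it.

```r
# Hypothetical weighted Gini coefficient in percent (assumed Eurostat-style
# formula, not laeken's code). Incomes are sorted ascending first.
gini_sketch <- function(x, weights = rep(1, length(x))) {
  ord <- order(x)
  x <- x[ord]
  w <- weights[ord]
  wx <- w * x
  cw <- cumsum(w)  # cumulative weights
  100 * ((2 * sum(wx * cw) - sum(w^2 * x)) / (sum(w) * sum(wx)) - 1)
}

gini_sketch(c(5, 5, 5, 5))  # 0: perfect equality
gini_sketch(c(0, 0, 0, 1))  # 75: one of four persons holds all income
```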
A list of class "gini" (which inherits from the class "indicator") with the following components:
value |
a numeric vector containing the overall value(s). |
valueByStratum |
a |
varMethod |
a character string specifying the type of variance
estimation used, or |
var |
a numeric vector containing the variance estimate(s), or
|
varByStratum |
a |
ci |
a numeric vector or matrix containing the lower and upper
endpoints of the confidence interval(s), or |
ciByStratum |
a |
alpha |
a numeric value giving the significance level used for
computing the confidence interval(s) (i.e., the confidence level is |
years |
a numeric vector containing the different years of the survey. |
strata |
a character vector containing the different domains of the breakdown. |
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat, Luxembourg.
data(eusilc)
# overall value
gini("eqIncome", weights = "rb050", data = eusilc)
# values by region
gini("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
Estimate the gender pay (wage) gap.
gpg( inc, gender = NULL, method = c("mean", "median"), weights = NULL, sort = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, var = NULL, alpha = 0.05, na.rm = FALSE, ... )
inc |
either a numeric vector giving the equivalized disposable income,
or (if |
gender |
either a factor giving the gender, or (if |
method |
a character string specifying the method to be used. Possible
values are |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
breakdown |
optional; either a numeric vector giving different domains,
or (if |
design |
optional and only used if |
cluster |
optional and only used if |
data |
an optional |
var |
a character string specifying the type of variance estimation to
be used, or |
alpha |
numeric; if |
na.rm |
a logical indicating whether missing values should be removed. |
... |
if |
The implementation strictly follows the Eurostat definition (with default method "mean" and alternative method "median"). If weights are provided, the weighted mean or weighted median is estimated.
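The unadjusted gender pay gap is commonly defined as the difference between the average earnings of men and women, relative to the average earnings of men. The base-R sketch below covers only the weighted-mean variant. The name `gpg_mean_sketch` is hypothetical, and gender is assumed to be coded with a level "male". It returns a proportion; multiply by 100 for a percentage. This section does not state whether gpg() reports a proportion or a percentage.

```r
# Hypothetical gender pay gap with weighted means (not laeken's code).
gpg_mean_sketch <- function(earnings, gender, weights = rep(1, length(earnings))) {
  wmean <- function(x, w) sum(w * x) / sum(w)
  is_male <- gender == "male"
  m <- wmean(earnings[is_male], weights[is_male])    # men's average earnings
  f <- wmean(earnings[!is_male], weights[!is_male])  # women's average earnings
  (m - f) / m                                        # gap relative to men
}

earnings <- c(20, 22, 18, 15, 16, 14)
gender   <- factor(c("male", "male", "male", "female", "female", "female"))
gpg_mean_sketch(earnings, gender)  # (20 - 15) / 20 = 0.25
```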
A list of class "gpg" (which inherits from the class "indicator") with the following components:
value |
a numeric vector containing the overall value(s). |
valueByStratum |
a |
varMethod |
a character string specifying the type of variance
estimation used, or |
var |
a numeric vector containing the variance estimate(s), or
|
varByStratum |
a |
ci |
a numeric vector or matrix containing the lower and upper
endpoints of the confidence interval(s), or |
ciByStratum |
a |
alpha |
a numeric value giving the significance level used for
computing the confidence interval(s) (i.e., the confidence level is |
years |
a numeric vector containing the different years of the survey. |
strata |
a character vector containing the different domains of the breakdown. |
Matthias Templ and Alexander Haider, using code for breaking down estimation by Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat, Luxembourg.
data(ses)
# overall value with mean
gpg("earningsHour", gender = "sex", weights = "weights", data = ses)
# overall value with median
gpg("earningsHour", gender = "sex", weights = "weights", data = ses, method = "median")
# values by education with mean
gpg("earningsHour", gender = "sex", weights = "weights", breakdown = "education", data = ses)
# values by education with median
gpg("earningsHour", gender = "sex", weights = "weights", breakdown = "education", data = ses, method = "median")
Compute the weighted mean income.
incMean(inc, weights = NULL, years = NULL, data = NULL, na.rm = FALSE)
inc |
either a numeric vector giving the (equivalized disposable)
income, or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
data |
an optional |
na.rm |
a logical indicating whether missing values should be removed. |
A numeric vector containing the value(s) of the weighted mean income is returned.
Andreas Alfons
data(eusilc)
incMean("eqIncome", weights = "rb050", data = eusilc)
Compute the weighted median income.
incMedian( inc, weights = NULL, sort = NULL, years = NULL, data = NULL, na.rm = FALSE )
inc |
either a numeric vector giving the (equivalized disposable)
income, or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
data |
an optional |
na.rm |
a logical indicating whether missing values should be removed. |
The implementation strictly follows the Eurostat definition.
A numeric vector containing the value(s) of the weighted median income is returned.
Andreas Alfons
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
data(eusilc)
incMedian("eqIncome", weights = "rb050", data = eusilc)
Compute weighted income quintiles.
incQuintile( inc, weights = NULL, sort = NULL, years = NULL, k = c(1, 4), data = NULL, na.rm = FALSE )
inc |
either a numeric vector giving the (equivalized disposable)
income, or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
k |
a vector of integers between 0 and 5 specifying the quintiles to be computed (0 gives the minimum, 5 the maximum). |
data |
an optional |
na.rm |
a logical indicating whether missing values should be removed. |
The implementation strictly follows the Eurostat definition.
A numeric vector (if years is NULL) or matrix (if years is not NULL) containing the values of the weighted income quintiles specified by k is returned.
Andreas Alfons
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
data(eusilc)
incQuintile("eqIncome", weights = "rb050", data = eusilc)
The Mean Excess plot is a graphical method for detecting the threshold (scale parameter) of a Pareto distribution.
meanExcessPlot( x, w = NULL, probs = NULL, interactive = TRUE, pch = par("pch"), cex = par("cex"), col = par("col"), bg = "transparent", ... )
x |
a numeric vector. |
w |
an optional numeric vector giving sample weights. |
probs |
an optional numeric vector of probabilities with values in
|
interactive |
a logical indicating whether the threshold (scale parameter) can be selected interactively by clicking on points. Information on the selected threshold is then printed on the console. |
pch, cex, col, bg |
graphical parameters for the plot symbol of each data
point or quantile (see |
... |
additional arguments to be passed to
|
The corresponding mean excesses are plotted against the values of x (or, if probs is supplied, only against the corresponding quantiles of x). If the tail of the data follows a Pareto distribution, these points show a positive linear trend. The leftmost point of a fitted line can thus be used as an estimate of the threshold (scale parameter).
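To make the plotted quantity concrete, the following base R sketch computes the (optionally weighted) empirical mean excess e(u), the average amount by which observations above a threshold u exceed it. The helper name meanExcess is hypothetical and not part of laeken; meanExcessPlot may compute and weight the points differently. For an exact Pareto tail with shape theta > 1, e(u) = u / (theta - 1), which is why the points follow a positive linear trend.

```r
# Hypothetical helper (not part of laeken): empirical mean excess at
# threshold u, i.e. the (weighted) average of x - u over all x > u.
meanExcess <- function(x, u, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))
  above <- x > u
  sum(w[above] * (x[above] - u)) / sum(w[above])
}

# Simulated Pareto sample with scale 1000 and shape theta = 3
# (inverse transform: 1000 * U^(-1/theta) for U uniform on (0, 1))
set.seed(1)
theta <- 3
x <- 1000 * runif(10000)^(-1 / theta)

# Mean excesses at a grid of thresholds (sample quantiles of x)
u <- quantile(x, probs = seq(0.5, 0.9, by = 0.1))
e <- sapply(u, function(t) meanExcess(x, t))

# Slope of a fitted line; for a Pareto tail it should be roughly
# 1 / (theta - 1) = 0.5, up to sampling noise
coef(lm(e ~ u))[2]
```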
The interactive selection of the threshold (scale parameter) is implemented using identify. For the usual X11 device, the selection process is thus terminated by pressing any mouse button other than the first. For the quartz device (on Mac OS X systems), the process is terminated either by a secondary click (usually the second mouse button or Ctrl-click) or by pressing the ESC key.
If interactive is TRUE, the last selection for the threshold is returned invisibly as an object of class "paretoScale", which consists of the following components:
x0 |
the selected threshold (scale parameter). |
k |
the number of observations in the tail (i.e., larger than the threshold). |
The functionality to account for sample weights and to select the threshold (scale parameter) interactively was introduced in version 0.2.
Andreas Alfons and Josef Holzer
paretoScale, paretoTail, minAMSE, paretoQPlot, identify
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# with sample weights
meanExcessPlot(eusilc$eqIncome, w = eusilc$db090)
# without sample weights
meanExcessPlot(eusilc$eqIncome)
Estimate the scale and shape parameters of a Pareto distribution with an iterative procedure based on minimizing the weighted asymptotic mean squared error (AMSE) of the Hill estimator.
minAMSE(x, weight = c("Bernoulli", "JASA"), kmin, kmax, mmax, tol = 0, maxit = 100)
## S3 method for class 'minAMSE'
print(x, ...)
x |
for |
weight |
a character vector specifying the weighting scheme to be used
in the procedure. If |
kmin |
An optional integer giving the lower bound for finding the
optimal number of observations in the tail. It defaults to
|
kmax |
An optional integer giving the upper bound for finding the optimal number of observations in the tail (see “Details”). |
mmax |
An optional integer giving the upper bound for finding the
optimal number of observations for computing the nuisance parameter
|
tol |
an integer giving the desired tolerance level for finding the optimal number of observations in the tail. |
maxit |
a positive integer giving the maximum number of iterations. |
... |
additional arguments to be passed to
|
The weights used in the weighted AMSE depend on a nuisance parameter rho. Both the optimal number of observations in the tail and the nuisance parameter rho are estimated iteratively using nonlinear integer minimization. This is currently done by a brute-force algorithm, hence it is strongly recommended to supply the upper bounds kmax and mmax.
See the references for more details on the iterative algorithm.
An object of class "minAMSE"
with the following components:
kopt |
the optimal number of observations in the tail. |
x0 |
the corresponding threshold. |
theta |
the estimated shape parameter of the Pareto distribution. |
MSEmin |
the minimal MSE. |
rho |
the estimated nuisance parameter. |
k |
the examined range for the number of observations in the tail. |
MSE |
the corresponding MSEs. |
Josef Holzer and Andreas Alfons
Beirlant, J., Vynckier, P. and Teugels, J.L. (1996) Tail index estimation, Pareto quantile plots, and regression diagnostics. Journal of the American Statistical Association, 91(436), 1659–1667.
Beirlant, J., Vynckier, P. and Teugels, J.L. (1996) Excess functions and estimation of the extreme-value index. Bernoulli, 2(4), 293–318.
Dupuis, D.J. and Victoria-Feser, M.-P. (2006) A robust prediction error criterion for Pareto modelling of upper tails. The Canadian Journal of Statistics, 34(4), 639–658.
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
minAMSE(eusilc$eqIncome[!duplicated(eusilc$db030)], kmin = 60, kmax = 150, mmax = 250)
The Pareto quantile plot is a graphical method for inspecting the parameters of a Pareto distribution.
paretoQPlot( x, w = NULL, xlab = NULL, ylab = NULL, interactive = TRUE, x0 = NULL, theta = NULL, pch = par("pch"), cex = par("cex"), col = par("col"), bg = "transparent", ... )
x |
a numeric vector. |
w |
an optional numeric vector giving sample weights. |
xlab, ylab |
axis labels. |
interactive |
a logical indicating whether the threshold (scale parameter) can be selected interactively by clicking on points. Information on the selected threshold is then printed on the console. |
x0, theta |
optional; if estimates of the threshold (scale parameter)
and the shape parameter have already been obtained, they can be passed
through the corresponding argument ( |
pch, cex, col, bg |
graphical parameters for the plot symbol of each data
point (see |
... |
additional arguments to be passed to
|
If the Pareto model holds, there exists a linear relationship between the logarithms of the observed values and the quantiles of the standard exponential distribution, since the logarithm of a Pareto distributed random variable follows a (shifted) exponential distribution. Hence the logarithms of the observed values are plotted against the corresponding theoretical quantiles. If the tail of the data follows a Pareto distribution, these observations form approximately a straight line. The leftmost point of a fitted line can thus be used as an estimate of the threshold (scale parameter). The slope of the fitted line is in turn an estimate of 1/theta, the reciprocal of the shape parameter.
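The linear relationship can be checked with a minimal, unweighted base R sketch. The helper paretoQQ is hypothetical, and the plotting positions i/(n + 1) are an assumption; paretoQPlot may use different positions and additionally supports sample weights. With exact Pareto quantiles for scale x0 and shape theta, the points lie on a line with intercept log(x0) and slope 1/theta.

```r
# Hypothetical helper (not part of laeken): coordinates of an unweighted
# Pareto quantile plot, i.e. standard exponential quantiles against log(x).
paretoQQ <- function(x) {
  n <- length(x)
  p <- seq_len(n) / (n + 1)            # plotting positions (assumption)
  data.frame(exp_q = -log(1 - p),      # quantiles of the standard exponential
             log_x = log(sort(x)))     # logarithms of the ordered data
}

# Exact Pareto quantiles with scale x0 = 1000 and shape theta = 2.5:
# log(x) = log(x0) + exp_q / theta, so the fitted line recovers both parameters
x0 <- 1000
theta <- 2.5
n <- 200
x <- x0 * (1 - seq_len(n) / (n + 1))^(-1 / theta)

pq <- paretoQQ(x)
fit <- lm(log_x ~ exp_q, data = pq)
coef(fit)   # intercept is log(x0), slope is 1 / theta
```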
The interactive selection of the threshold (scale parameter) is implemented using identify. For the usual X11 device, the selection process is thus terminated by pressing any mouse button other than the first. For the quartz device (on Mac OS X systems), the process is terminated either by a secondary click (usually the second mouse button or Ctrl-click) or by pressing the ESC key.
If interactive is TRUE, the last selection for the threshold is returned invisibly as an object of class "paretoScale", which consists of the following components:
x0 |
the selected threshold (scale parameter). |
k |
the number of observations in the tail (i.e., larger than the threshold). |
The functionality to account for sample weights and to select the threshold (scale parameter) interactively was introduced in version 0.2. Also starting with version 0.2, a logarithmic y-axis is now used to display the axis labels in the scale of the original values.
Andreas Alfons and Josef Holzer
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.
Beirlant, J., Vynckier, P. and Teugels, J.L. (1996) Tail index estimation, Pareto quantile plots, and regression diagnostics. Journal of the American Statistical Association, 91(436), 1659–1667.
paretoScale, paretoTail, minAMSE, meanExcessPlot, identify
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# with sample weights
paretoQPlot(eusilc$eqIncome, w = eusilc$db090)
# without sample weights
paretoQPlot(eusilc$eqIncome)
Estimate the scale parameter of a Pareto distribution, i.e., the threshold for Pareto tail modeling.
paretoScale( x, w = NULL, groups = NULL, method = "VanKerm", center = c("mean", "median"), probs = c(0.97, 0.98), na.rm = FALSE )
x |
a numeric vector. |
w |
an optional numeric vector giving sample weights. |
groups |
an optional vector or factor specifying groups of elements of
|
method |
a character string specifying the estimation method. If
|
center |
a character string specifying the estimation method for the
center of the distribution. Possible values are |
probs |
a numeric vector of length two giving probabilities to be used
for computing weighted quantiles of the distribution. Values should be close
to 1 such that the quantiles correspond to the upper tail. This is used if
|
na.rm |
a logical indicating whether missing values in |
Van Kerm's formula is given by

$$x_0 = \min(\max(2.5\bar{x}, q(0.97)), q(0.98)),$$

where $\bar{x}$ denotes the weighted mean and $q(\cdot)$ denotes weighted quantiles. This function can also compute generalizations of Van Kerm's formula, in which the mean is replaced by the median and different quantiles are used.
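The following base R sketch implements the generalized formula: 2.5 times the center of the distribution is clamped between the two weighted quantiles given by probs. The helpers wQuantile and vanKerm are hypothetical. In particular, the simple weighted quantile used here (the smallest value whose cumulative weight share reaches p) is an assumption and may differ from laeken's weightedQuantile in how ties and interpolation are handled.

```r
# Hypothetical helper: smallest x whose cumulative weight share reaches p.
# NOTE: an assumption; laeken's weightedQuantile() may differ in details.
wQuantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  cw <- cumsum(w[ord]) / sum(w)
  sapply(probs, function(p) x[which(cw >= p)[1]])
}

# Sketch of (generalized) Van Kerm's formula: clamp 2.5 times the center
# between the lower and upper weighted quantiles specified by probs.
vanKerm <- function(x, w = rep(1, length(x)), center = c("mean", "median"),
                    probs = c(0.97, 0.98)) {
  center <- match.arg(center)
  m <- if (center == "mean") sum(w * x) / sum(w) else wQuantile(x, w, 0.5)
  q <- wQuantile(x, w, probs)
  x0 <- min(max(2.5 * m, q[1]), q[2])
  list(x0 = x0, k = sum(x > x0))   # threshold and number of tail observations
}

vanKerm(1:100)
```

For x = 1:100 with equal weights, 2.5 times the mean (126.25) exceeds the 98% quantile, so the threshold is clamped to 98 and k = 2 observations lie above it.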
An object of class "paretoScale" with the following components:
x0 |
the threshold (scale parameter). |
k |
the number of observations in the tail (i.e., larger than the threshold). |
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
Van Kerm, P. (2007) Extreme incomes and the estimation of poverty and inequality indicators from EU-SILC. IRISS Working Paper Series 2007-01, CEPS/INSTEAD.
minAMSE, paretoQPlot, meanExcessPlot
data(eusilc)
paretoScale(eusilc$eqIncome, eusilc$db090, groups = eusilc$db030)
Fit a Pareto distribution to the upper tail of income data. Since a theoretical distribution is used for the upper tail, this is a semiparametric approach.
paretoTail( x, k = NULL, x0 = NULL, method = "thetaPDC", groups = NULL, w = NULL, alpha = 0.01, ... )
x |
a numeric vector. |
k |
the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 |
the threshold (scale parameter) above which the Pareto distribution is fitted. |
method |
either a function or a character string specifying the function
to be used to estimate the shape parameter of the Pareto distribution, such as
|
groups |
an optional vector or factor specifying groups of elements of
|
w |
an optional numeric vector giving sample weights. |
alpha |
numeric; values above the theoretical |
... |
additional arguments to be passed to the specified method. |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th order statistic of x (i.e., the (k + 1)-th largest value), where n is the number of observations. Conversely, if the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used.
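The correspondence between k and x0 can be illustrated with two hypothetical base R helpers that ignore sample weights. Note that with ties at the threshold, the round trip from k to x0 and back need not recover k exactly.

```r
# Hypothetical helpers (not part of laeken) illustrating how k and x0
# determine each other; sample weights are ignored.
thresholdFromK <- function(x, k) {
  n <- length(x)
  sort(x)[n - k]          # (n - k)-th order statistic = (k + 1)-th largest value
}
kFromThreshold <- function(x, x0) {
  sum(x > x0)             # number of observations above the threshold
}

x <- c(5, 1, 4, 2, 3)
x0 <- thresholdFromK(x, k = 2)   # 3
kFromThreshold(x, x0)            # 2
```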
The function supplied to method should take a numeric vector (the observations) as its first argument. If k is supplied, it will be passed on (in this case, the function is required to have an argument called k). Similarly, if the threshold x0 is supplied, it will be passed on (in this case, the function is required to have an argument called x0). As above, only k is passed on if both are supplied. If the function specified by method can handle sample weights, the corresponding argument should be called w. Additional arguments are passed via the ... argument.
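A function intended for method therefore needs a numeric first argument, an argument named k or x0, and optionally an argument w. The following weighted Hill-type estimator is a sketch written to that interface; the name myHill and the particular weighting are assumptions for illustration, not laeken's thetaHill.

```r
# Hypothetical shape estimator following the interface described above:
# observations first, then k or x0, plus optional sample weights w.
myHill <- function(x, k = NULL, x0 = NULL, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))
  if (!is.null(k)) {                       # k takes precedence over x0
    x0 <- sort(x)[length(x) - k]
  } else if (is.null(x0)) {
    stop("either 'k' or 'x0' must be supplied")
  }
  inTail <- x > x0
  # total tail weight divided by the weighted sum of log-excesses over x0
  sum(w[inTail]) / sum(w[inTail] * (log(x[inTail]) - log(x0)))
}

x <- c(1, 2, 4, 8)
myHill(x, k = 2)     # threshold x0 = 2 is derived from k
myHill(x, x0 = 2)    # same estimate when the threshold is given directly
```

Supplied as method = myHill, such a function would receive k (or x0) and w from paretoTail as described above.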
An object of class "paretoTail"
with the following
components:
x |
the supplied numeric vector. |
k |
the number of observations in the upper tail to which the Pareto distribution has been fitted. |
groups |
if supplied, the vector or factor specifying groups of elements. |
w |
if supplied, the numeric vector of sample weights. |
method |
the function used to estimate the shape parameter, or the name of the function. |
x0 |
the scale parameter. |
theta |
the estimated shape parameter. |
tail |
if |
alpha |
the tuning parameter |
out |
if |
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.
reweightOut, shrinkOut, replaceOut, replaceTail, fitPareto
thetaPDC, thetaWML, thetaHill, thetaISE, thetaLS, thetaMoment, thetaQQ, thetaTM
data(eusilc)
## gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)
## gini coefficient with Pareto tail modeling
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090, groups = eusilc$db030)
# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k, w = eusilc$db090, groups = eusilc$db030)
# calibration of outliers
w <- reweightOut(fit, calibVars(eusilc$db040))
gini(eusilc$eqIncome, w)
# winsorization of outliers
eqIncome <- shrinkOut(fit)
gini(eqIncome, weights = eusilc$rb050)
# replacement of outliers
eqIncome <- replaceOut(fit)
gini(eqIncome, weights = eusilc$rb050)
# replacement of whole tail
eqIncome <- replaceTail(fit)
gini(eqIncome, weights = eusilc$rb050)
Produce a diagnostic Pareto quantile plot for evaluating the fitted Pareto distribution. Reference lines indicating the estimates of the threshold (scale parameter) and the shape parameter are added to the plot, and any detected outliers are highlighted.
## S3 method for class 'paretoTail'
plot(x, pch = c(1, 3), cex = 1, col = c("black", "red"), bg = "transparent", ...)
x |
an object of class |
pch, cex, col, bg |
graphical parameters. Each can be a vector of length two, with the first and second element giving the graphical parameter for the good data points and the outliers, respectively. |
... |
additional arguments to be passed to
|
While the first horizontal line indicates the estimated threshold (scale parameter), the estimated shape parameter is indicated by a line whose slope is given by the reciprocal of the estimate. In addition, the second horizontal line represents the theoretical quantile of the fitted distribution that is used for outlier detection. Thus all values above that line are the detected outliers.
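Assuming the outlier cutoff is the theoretical 1 - alpha quantile of the fitted Pareto distribution (alpha being the tuning parameter of paretoTail), its position follows from the Pareto survival function: solving (x0/q)^theta = alpha gives q = x0 * alpha^(-1/theta), which lies at log(x0) - log(alpha)/theta on the logarithmic scale of the plot. The helpers below are hypothetical sketches of this computation, not laeken functions.

```r
# Hypothetical helper: 1 - alpha quantile of a Pareto distribution with
# scale x0 and shape theta, obtained by solving (x0 / q)^theta = alpha.
paretoCutoff <- function(x0, theta, alpha = 0.01) {
  x0 * alpha^(-1 / theta)
}

# Flag observations above the cutoff as outliers
flagOutliers <- function(x, x0, theta, alpha = 0.01) {
  x > paretoCutoff(x0, theta, alpha)
}

paretoCutoff(x0 = 1000, theta = 2, alpha = 0.01)
flagOutliers(c(5000, 20000), x0 = 1000, theta = 2, alpha = 0.01)
```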
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
data(eusilc)
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090, groups = eusilc$db030)
# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k, w = eusilc$db090, groups = eusilc$db030)
# produce plot
plot(fit)
Estimate the proportion of an alternative distribution.
prop( bin, weights = NULL, sort = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, var = NULL, alpha = 0.05, na.rm = FALSE, ... )
bin |
either a factor vector giving the values,
or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
breakdown |
optional; either a numeric vector giving different domains,
or (if |
design |
optional and only used if |
cluster |
optional and only used if |
data |
an optional |
var |
a character string specifying the type of variance estimation to
be used, or |
alpha |
numeric; if |
na.rm |
a logical indicating whether missing values should be removed. |
... |
if |
If weights are provided, the weighted proportion is estimated.
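A minimal base R sketch of a weighted proportion: the share of the total sample weight that falls into each category of bin. The helper weightedProp is hypothetical and does not reproduce details of prop such as breakdowns, variance estimation, or which category's share is reported.

```r
# Hypothetical sketch: weighted share of each factor level, i.e. the sum of
# the weights in a category divided by the total sum of weights.
weightedProp <- function(bin, weights = rep(1, length(bin))) {
  bin <- as.factor(bin)
  tapply(weights, bin, sum) / sum(weights)
}

sex <- factor(c("male", "female", "female", "male"))
w <- c(1, 3, 2, 2)
weightedProp(sex, w)   # female 0.625, male 0.375
```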
A list of class "prop" (which inherits from the class "indicator") with the following components:
value |
a numeric vector containing the overall value(s). |
valueByStratum |
a |
varMethod |
a character string specifying the type of variance
estimation used, or |
var |
a numeric vector containing the variance estimate(s), or
|
varByStratum |
a |
ci |
a numeric vector or matrix containing the lower and upper
endpoints of the confidence interval(s), or |
ciByStratum |
a |
alpha |
a numeric value giving the significance level used for
computing the confidence interval(s) (i.e., the confidence level is 1 - alpha). |
years |
a numeric vector containing the different years of the survey. |
strata |
a character vector containing the different domains of the breakdown. |
Matthias Templ, using code for breaking down estimation by Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat, Luxembourg.
data(eusilc)
# overall value
prop("rb090", weights = "rb050", data = eusilc)
# values by region
p1 <- prop("rb090", weights = "rb050", breakdown = "db040", cluster = "db030", data = eusilc)
p1
## Not run:
variance("rb090", weights = "rb050", breakdown = "db040", data = eusilc, indicator = p1, cluster = "db030", X = calibVars(eusilc$db040))
## End(Not run)
eusilc$agecut <- cut(eusilc$age, 2)
p1 <- prop("agecut", weights = "rb050", breakdown = "db040", cluster = "db030", data = eusilc)
p1
## Not run:
variance("agecut", weights = "rb050", breakdown = "db040", data = eusilc, indicator = p1, X = calibVars(eusilc$db040), cluster = "db030")
## End(Not run)
eusilc$eqIncomeCat <- factor(ifelse(eusilc$eqIncome < quantile(eusilc$eqIncome, 0.2), "one", "two"))
p1 <- prop("eqIncomeCat", weights = "rb050", breakdown = "db040", data = eusilc, cluster = "db030")
p1
## Not run:
variance("eqIncomeCat", weights = "rb050", breakdown = "db040", data = eusilc, indicator = p1, X = calibVars(eusilc$db040), cluster = "db030")
## End(Not run)
Estimate the quintile share ratio, which is defined as the ratio of the sum of equivalized disposable income received by the top 20% to the sum of equivalized disposable income received by the bottom 20%.
qsr( inc, weights = NULL, sort = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, var = NULL, alpha = 0.05, na.rm = FALSE, ... )
inc |
either a numeric vector giving the equivalized disposable income,
or (if |
weights |
optional; either a numeric vector giving the personal sample
weights, or (if |
sort |
optional; either a numeric vector giving the personal IDs to be
used as tie-breakers for sorting, or (if |
years |
optional; either a numeric vector giving the different years of
the survey, or (if |
breakdown |
optional; either a numeric vector giving different domains,
or (if |
design |
optional and only used if |
cluster |
optional and only used if |
data |
an optional |
var |
a character string specifying the type of variance estimation to
be used, or |
alpha |
numeric; if |
na.rm |
a logical indicating whether missing values should be removed. |
... |
if |
The implementation strictly follows the Eurostat definition.
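A simplified base R sketch of the ratio: the weighted income of persons above the fourth quintile divided by the weighted income of persons at or below the first quintile. The helper qsrSketch is hypothetical; its quintiles come from a simple weighted empirical distribution function, which may differ from the Eurostat definition used by qsr in how ties and boundaries are handled.

```r
# Hypothetical, simplified quintile share ratio with sample weights.
qsrSketch <- function(inc, weights = rep(1, length(inc))) {
  ord <- order(inc)
  inc <- inc[ord]
  weights <- weights[ord]
  cw <- cumsum(weights) / sum(weights)   # cumulative weight share
  q1 <- inc[which(cw >= 0.2)[1]]         # first quintile
  q4 <- inc[which(cw >= 0.8)[1]]         # fourth quintile
  top <- inc > q4                        # top 20%
  bottom <- inc <= q1                    # bottom 20%
  sum(weights[top] * inc[top]) / sum(weights[bottom] * inc[bottom])
}

qsrSketch(1:10)   # (9 + 10) / (1 + 2) = 19 / 3
```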
A list of class "qsr" (which inherits from the class "indicator") with the following components:
value |
a numeric vector containing the overall value(s). |
valueByStratum |
a |
varMethod |
a character string specifying the type of variance
estimation used, or |
var |
a numeric vector containing the variance estimate(s), or
|
varByStratum |
a |
ci |
a numeric vector or matrix containing the lower and upper
endpoints of the confidence interval(s), or |
ciByStratum |
a |
alpha |
a numeric value giving the significance level used for
computing the confidence interval(s) (i.e., the confidence level is 1 - alpha). |
years |
a numeric vector containing the different years of the survey. |
strata |
a character vector containing the different domains of the breakdown. |
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat, Luxembourg.
data(eusilc)
# overall value
qsr("eqIncome", weights = "rb050", data = eusilc)
# values by region
qsr("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
Replace observations under a Pareto model for the upper tail with values drawn from the fitted distribution.
replaceTail(x, ...)
## S3 method for class 'paretoTail'
replaceTail(x, all = TRUE, ...)
replaceOut(x, ...)
x |
an object of class |
... |
additional arguments to be passed down. |
all |
a logical indicating whether all observations in the upper tail should be replaced or only those flagged as outliers. |
replaceOut(x, ...) is a simple wrapper for replaceTail(x, all = FALSE, ...).
A numeric vector consisting mostly of the original values, but with observations in the upper tail replaced with values from the fitted Pareto distribution.
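The core idea of drawing replacement values from a fitted Pareto distribution can be sketched with the inverse transform x0 * U^(-1/theta), where U is uniform on (0, 1). The helper replaceTailSketch and its idx argument are hypothetical; laeken's replaceTail may, for instance, preserve the ranks of the replaced observations or handle sample weights, which this sketch does not.

```r
# Hypothetical sketch (not laeken's replaceTail): replace the selected
# observations with random draws from a Pareto distribution with scale x0
# and shape theta, using the inverse transform x0 * U^(-1/theta).
replaceTailSketch <- function(x, x0, theta, idx = x > x0) {
  n <- sum(idx)                              # number of values to replace
  x[idx] <- x0 * runif(n)^(-1 / theta)       # draws always exceed x0
  x
}

set.seed(123)
inc <- c(12000, 18000, 25000, 40000, 150000, 900000)
# replace the whole tail above x0 (compare all = TRUE)
replaceTailSketch(inc, x0 = 30000, theta = 2.5)
# replace only observations flagged as outliers (compare all = FALSE)
replaceTailSketch(inc, x0 = 30000, theta = 2.5, idx = inc > 500000)
```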
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.
paretoTail, reweightOut, shrinkOut
data(eusilc)
## gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)
## gini coefficient with Pareto tail modeling
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090, groups = eusilc$db030)
# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k, w = eusilc$db090, groups = eusilc$db030)
# replacement of outliers
eqIncome <- replaceOut(fit)
gini(eqIncome, weights = eusilc$rb050)
# replacement of whole tail
eqIncome <- replaceTail(fit)
gini(eqIncome, weights = eusilc$rb050)
Reweight observations that are flagged as outliers in a Pareto model for the upper tail of the distribution.
reweightOut(x, ...)
## S3 method for class 'paretoTail'
reweightOut(x, X, w = NULL, ...)
x |
an object of class |
... |
additional arguments to be passed down. |
X |
a matrix of binary calibration variables (see
|
w |
a numeric vector of sample weights. This is only used if |
If the data contain sample weights, the weights of the outlying observations are set to 0 and the weights of the remaining observations are calibrated according to auxiliary variables. Otherwise, weight 0 is assigned to outliers and weight 1 to other observations.
If the data contain sample weights, a numeric vector containing the recalibrated weights is returned; otherwise, a numeric vector assigning weight 0 to outliers and weight 1 to other observations is returned.
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.
paretoTail, shrinkOut, replaceOut, replaceTail
data(eusilc)
## gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)
## gini coefficient with Pareto tail modeling
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090, groups = eusilc$db030)
# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k, w = eusilc$db090, groups = eusilc$db030)
# calibration of outliers
w <- reweightOut(fit, calibVars(eusilc$db040))
gini(eusilc$eqIncome, w)
Estimate the relative median at-risk-of-poverty gap, which is defined as the relative difference between the median equivalized disposable income of persons below the at-risk-of-poverty threshold and the at-risk-of-poverty threshold itself (expressed as a percentage of the at-risk-of-poverty threshold).
rmpg( inc, weights = NULL, sort = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, var = NULL, alpha = 0.05, na.rm = FALSE, ... )
inc | either a numeric vector giving the equivalized disposable income, or (if |
weights | optional; either a numeric vector giving the personal sample weights, or (if |
sort | optional; either a numeric vector giving the personal IDs to be used as tie-breakers for sorting, or (if |
years | optional; either a numeric vector giving the different years of the survey, or (if |
breakdown | optional; either a numeric vector giving different domains, or (if |
design | optional and only used if |
cluster | optional and only used if |
data | an optional |
var | a character string specifying the type of variance estimation to be used, or |
alpha | numeric; if |
na.rm | a logical indicating whether missing values should be removed. |
... | if |
The implementation strictly follows the Eurostat definition.
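The definition in the description can be illustrated with a short base-R sketch. It assumes the standard Eurostat convention that the at-risk-of-poverty threshold is 60% of the weighted median equivalized disposable income. It also uses a simple weighted-median convention. Both weightedMedianSketch() and rmpgSketch() are hypothetical names, not laeken functions. laeken's own weighted median, including its tie-breaking via sort, may give slightly different values.

```r
# Hypothetical helper: smallest value whose cumulative normalized weight
# reaches 0.5 (one of several weighted-median conventions)
weightedMedianSketch <- function(x, w = rep(1, length(x))) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  cw <- cumsum(w) / sum(w)
  x[which(cw >= 0.5)[1]]
}

# Hypothetical sketch of the relative median at-risk-of-poverty gap (in %)
rmpgSketch <- function(inc, w = rep(1, length(inc))) {
  # at-risk-of-poverty threshold: 60% of the weighted median income
  threshold <- 0.6 * weightedMedianSketch(inc, w)
  # persons strictly below the threshold
  poor <- inc < threshold
  medianPoor <- weightedMedianSketch(inc[poor], w[poor])
  # gap relative to the threshold, expressed as a percentage
  100 * (threshold - medianPoor) / threshold
}

inc <- c(2, 4, 6, 10, 20, 30, 40, 50, 60)
rmpgSketch(inc)  # threshold 12, median below it 4, gap 200/3 (about 66.7)
```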
A list of class "rmpg" (which inherits from the class "indicator") with the following components:
value | a numeric vector containing the overall value(s). |
valueByStratum | a |
varMethod | a character string specifying the type of variance estimation used, or |
var | a numeric vector containing the variance estimate(s), or |
varByStratum | a |
ci | a numeric vector or matrix containing the lower and upper endpoints of the confidence interval(s), or |
ciByStratum | a |
alpha | a numeric value giving the significance level used for computing the confidence interval(s) (i.e., the confidence level is 1 - alpha). |
years | a numeric vector containing the different years of the survey. |
strata | a character vector containing the different domains of the breakdown. |
threshold | a numeric vector containing the at-risk-of-poverty threshold(s). |
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat, Luxembourg.
data(eusilc)
# overall value
rmpg("eqIncome", weights = "rb050", data = eusilc)
# values by region
rmpg("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
This data set is a subset of synthetic data generated from real Austrian SES (Structural Earnings Survey) data.
data(ses)
A data frame with 115691 observations on the following 28 variables.
location
geographical location with levels AT1 (eastern Austria), AT2 (southern Austria), and AT3 (western Austria).
NACE1
economic branch given in NACE (C - O) 1-digit classification.
size
employment size range in 5 categories.
economicFinanc
form of economic and financial control (levels A = public and financial control, B = private control).
payAgreement
collective bargaining agreement with levels
A = national level pay agreement or interconfederal agreement,
B = industry agreement,
C = agreement of individual industries in individual regions,
D = enterprise or single employer agreement,
E = agreement applying only to workers in the local unit,
F = any other type of agreement,
N = no collective agreement exists.
IDunit
ID for place of employment.
sex
gender with levels female and male.
age
age in age classes.
education
highest education.
occupation
occupation with levels
11 = Legislators and senior officials,
12 = Corporate managers,
13 = Managers of small enterprises,
21 = Physical, mathematical and engineering science professionals,
22 = Life science and health professionals,
23 = Teaching professionals,
24 = Other professionals,
31 = Physical and engineering science associate professionals,
32 = Life science and health associate professionals,
33 = Teaching associate professionals,
34 = Other associate professionals,
41 = Office clerks,
42 = Customer services clerks,
51 = Personal and protective services workers,
52 = Models, salespersons and demonstrators,
61 = Skilled agricultural and fishery workers,
71 = Extraction and building trades workers,
72 = Metal, machinery and related trades workers,
73 = Precision, handicraft, craft printing and related trades workers,
74 = Other craft and related trades workers,
81 = Stationary plant and related operators,
82 = Machine operators and assemblers,
83 = Drivers and mobile plant operators,
91 = Sales and services elementary occupations,
92 = Agricultural, fishery and related labourers,
93 = Labourers in mining, construction, manufacturing and transport.
contract
type of contract with levels A = indefinite duration employment contract, B = temporary or fixed-duration contract, C = apprentice.
fullPart
full-time (FT) or part-time (PT) employee.
lengthService
the total length of service in the enterprise in the reference month, based on the number of completed years of service.
weeks
the number of weeks in the reference year to which the gross annual earnings relate, i.e., the employee's working time actually paid during the year, which should correspond to the actual gross annual earnings.
hoursPaid
the number of hours paid in the reference month, i.e., the hours actually paid, including all normal and overtime hours worked and remunerated during the month.
overtimeHours
the number of overtime hours paid in the reference month. Overtime hours are those worked in addition to those of the normal working month.
shareNormalHours
the share of a full-timer's normal hours, i.e., the contractual hours of a part-time employee expressed as a percentage of the normal hours worked by a full-time employee in the local unit.
holiday
the annual days of holiday leave (in full days).
notPaid
annual bonuses and allowances, e.g., Christmas and holiday bonuses, 13th and 14th month payments and productivity bonuses, hence any periodic, irregular and exceptional bonuses and other payments that do not feature in every pay period. The inclusion of such payments is the main difference between annual and monthly earnings.
earningsOvertime
earnings related to overtime.
paymentsShiftWork
special payments for shift work, i.e., premium payments during the reference month for shift work, night work or weekend work where these are not treated as overtime.
earningsMonth
the gross earnings in the reference month, covering remuneration in cash paid during the reference month before any tax deductions and social security contributions payable by wage earners and retained by the employer.
earnings
gross annual earnings in the reference year.
earningsHour
hourly earnings, being the quotient of monthly earnings and the number of hours paid in the reference month.
weightsEmployers
sampling weights in the first stage at employer level.
weightsEmployees
sampling weights corresponding to the second stage at employee level.
weights
the final sampling weights, i.e., the product of weightsEmployers and weightsEmployees.
The Structural Earnings Survey (SES) is conducted in almost all European countries, and the most important figures are reported to Eurostat. SES is a complex survey of enterprises and establishments with more than 10 employees in NACE sections C-O, including a large sample of employees. Many countries use a two-stage design. In the first stage, a stratified sample of enterprises and establishments is drawn, with strata defined by NACE 1-digit level, NUTS 1 region and employment size range, and large enterprises have higher inclusion probabilities. In the second stage, systematic sampling is applied within each enterprise, with unequal inclusion probabilities depending on the employment size range category.
The data set in the package consists of enterprise and employee data from 500 places of work. Note that this is a subset of a synthetic data set simulated from the original Austrian SES data.
Matthias Templ, Karoline Geissler
This is a synthetic data set based on Austrian SES data from 2006.
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
T. Geissberger (2009) Verdienststrukturerhebung 2006, Struktur und Verteilung der Verdienste in Oesterreich, Statistik Austria, ISBN 978-3-902587-97-8.
M. Templ (2012) Comparison of perturbation methods based on pre-defined quality indicators, UNECE Work Session on Statistical Data Editing, Tarragona, Spain.
data(ses)
summary(ses)
Shrink observations that are flagged as outliers in a Pareto model for the upper tail of the distribution to the theoretical quantile used for outlier detection.
shrinkOut(x, ...)
## S3 method for class 'paretoTail'
shrinkOut(x, ...)
x | an object of class "paretoTail" (see paretoTail). |
... | additional arguments to be passed down (currently ignored, since the only implemented method has no additional arguments). |
A numeric vector consisting mostly of the original values, but with outlying observations in the upper tail shrunk to the corresponding theoretical quantile of the fitted Pareto distribution.
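The idea of shrinking can be sketched in base R for a Pareto tail with threshold x0 and shape theta. For illustration only, the sketch assumes that an observation counts as an outlier when it exceeds the theoretical quantile at a chosen probability prob. Both prob and shrinkSketch() are hypothetical. The sketch does not reproduce how paretoTail() actually flags outliers.

```r
# Quantile function of the Pareto distribution with scale x0 and shape theta:
# F(q) = 1 - (x0 / q)^theta, hence q = x0 * (1 - p)^(-1 / theta)
paretoQuantileSketch <- function(p, x0, theta) {
  x0 * (1 - p)^(-1 / theta)
}

# Hypothetical helper: shrink values above the theoretical quantile to it
shrinkSketch <- function(x, x0, theta, prob = 0.99) {
  q <- paretoQuantileSketch(prob, x0, theta)
  pmin(x, q)  # values at or below q are left unchanged
}

shrinkSketch(c(0.5, 1.5, 3, 10), x0 = 1, theta = 2, prob = 0.75)
# the 0.75 quantile is 0.25^(-1/2) = 2, giving the values 0.5, 1.5, 2, 2
```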
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
paretoTail, reweightOut, replaceOut, replaceTail
data(eusilc)
## gini coefficient without Pareto tail modeling
gini("eqIncome", weights = "rb050", data = eusilc)
## gini coefficient with Pareto tail modeling
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090, groups = eusilc$db030)
# estimate shape parameter
fit <- paretoTail(eusilc$eqIncome, k = ts$k, w = eusilc$db090, groups = eusilc$db030)
# shrink outliers
eqIncome <- shrinkOut(fit)
gini(eqIncome, weights = eusilc$rb050)
The Hill estimator uses the maximum likelihood principle to estimate the shape parameter of a Pareto distribution.
thetaHill(x, k = NULL, x0 = NULL, w = NULL)
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
w | an optional numeric vector giving sample weights. |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
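The estimator is short enough to sketch in base R. For the k observations above the threshold x0, the Hill estimator is theta = k / sum(log(x_i / x0)). With sample weights w_i, the sketch uses the weighted analogue sum(w_i) / sum(w_i * log(x_i / x0)). This weighted form is an assumption made for illustration. hillSketch() is a hypothetical helper, not laeken's thetaHill(). It resolves k and x0 following the rules given above.

```r
# Hypothetical sketch of the (optionally weighted) Hill estimator
hillSketch <- function(x, k = NULL, x0 = NULL, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))
  o <- order(x)
  x <- x[o]
  w <- w[o]
  n <- length(x)
  if (!is.null(k)) {
    # threshold: the (n - k)-th smallest value, leaving k values above it
    x0 <- x[n - k]
  } else if (is.null(x0)) {
    stop("either 'k' or 'x0' must be supplied")
  }
  tail <- x > x0
  # sum of weights divided by the weighted sum of log relative excesses
  sum(w[tail]) / sum(w[tail] * log(x[tail] / x0))
}

hillSketch(c(1, 2, 4, 8), k = 2)  # x0 = 2, so 2 / (log(2) + log(4))
```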
The estimated shape parameter.
The arguments x0 for the threshold (scale parameter) of the Pareto distribution and w for sample weights were introduced in version 0.2.
Andreas Alfons and Josef Holzer
Hill, B.M. (1975) A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3(5), 1163–1174.
paretoTail, fitPareto, thetaPDC, thetaWML, thetaISE, minAMSE
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaHill(eusilc$eqIncome, k = ts$k, w = eusilc$db090)
# using threshold
thetaHill(eusilc$eqIncome, x0 = ts$x0, w = eusilc$db090)
The integrated squared error (ISE) estimator estimates the shape parameter of a Pareto distribution based on the relative excesses of observations above a certain threshold.
thetaISE(x, k = NULL, x0 = NULL, w = NULL, ...)
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
w | an optional numeric vector giving sample weights. |
... | additional arguments to be passed to |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
The ISE estimator minimizes the integrated squared error (ISE) criterion with a complete density model. The minimization is carried out using nlm. By default, the starting value is obtained with the Hill estimator (see thetaHill).
optimize
.
The estimated shape parameter.
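To make the ISE criterion concrete, the base-R sketch below fits the complete Pareto density f(y) = theta * y^(-theta - 1), y > 1, to the relative excesses y = x / x0. It minimizes the integral of f^2 minus twice the average of f(y_i). Over (1, Inf) the integral of f^2 equals theta^2 / (2 * theta + 1). The sketch is unweighted and searches a fixed interval with optimize() rather than running nlm() from a Hill starting value as described above. iseSketch() is a hypothetical name.

```r
# Hypothetical sketch of the (unweighted) ISE estimator of the Pareto shape
iseSketch <- function(x, x0, interval = c(0.01, 20)) {
  y <- x[x > x0] / x0  # relative excesses above the threshold
  criterion <- function(theta) {
    # integral of the squared Pareto density over (1, Inf)
    intF2 <- theta^2 / (2 * theta + 1)
    # minus twice the average model density at the observed excesses
    intF2 - 2 * mean(theta * y^(-theta - 1))
  }
  optimize(criterion, interval = interval)$minimum
}

# simulated Pareto sample with x0 = 1 and shape 2 (inverse transform)
set.seed(1)
x <- runif(5000)^(-1 / 2)
iseSketch(x, x0 = 1)  # should be close to 2
```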
The arguments x0 for the threshold (scale parameter) of the Pareto distribution and w for sample weights were introduced in version 0.2.
Andreas Alfons and Josef Holzer
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.
Vandewalle, B., Beirlant, J., Christmann, A., and Hubert, M. (2007) A robust estimator for the tail index of Pareto-type distributions. Computational Statistics & Data Analysis, 51(12), 6252–6268.
paretoTail, fitPareto, thetaPDC, thetaHill
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaISE(eusilc$eqIncome, k = ts$k, w = eusilc$db090)
# using threshold
thetaISE(eusilc$eqIncome, x0 = ts$x0, w = eusilc$db090)
Estimate the shape parameter of a Pareto distribution using a least squares (LS) approach.
thetaLS(x, k = NULL, x0 = NULL)
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
The estimated shape parameter.
The argument x0 for the threshold (scale parameter) of the Pareto distribution was introduced in version 0.2.
Andreas Alfons and Josef Holzer
Brazauskas, V. and Serfling, R. (2000) Robust estimation of tail parameters for two-parameter Pareto and exponential models via generalized quantile statistics. Extremes, 3(3), 231–249.
Brazauskas, V. and Serfling, R. (2000) Robust and efficient estimation of the tail index of a single-parameter Pareto distribution. North American Actuarial Journal, 4(4), 12–27.
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaLS(eusilc$eqIncome, k = ts$k)
# using threshold
thetaLS(eusilc$eqIncome, x0 = ts$x0)
Estimate the shape parameter of a Pareto distribution based on moments.
thetaMoment(x, k = NULL, x0 = NULL)
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
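The moment estimator of Dekkers, Einmahl and de Haan (1989) works with the log excesses l_i = log(x_i / x0) of the k largest observations. With M1 = mean(l_i) and M2 = mean(l_i^2), the extreme value index is estimated as gamma = M1 + 1 - 0.5 / (1 - M1^2 / M2). For a Pareto tail, the shape parameter is 1 / gamma. The base-R sketch below implements this formula. momentSketch() is hypothetical, only accepts k, and is not necessarily how thetaMoment() computes its estimate.

```r
# Hypothetical sketch of the Dekkers-Einmahl-de Haan moment estimator,
# returned on the Pareto shape scale (reciprocal of the estimated gamma)
momentSketch <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  x0 <- x[n - k]  # threshold: the (n - k)-th smallest value
  logExc <- log(x[(n - k + 1):n] / x0)  # log excesses of the k largest values
  m1 <- mean(logExc)
  m2 <- mean(logExc^2)
  gamma <- m1 + 1 - 0.5 / (1 - m1^2 / m2)
  1 / gamma
}

# simulated Pareto sample with x0 = 1 and shape 2 (inverse transform)
set.seed(1)
x <- runif(10000)^(-1 / 2)
momentSketch(x, k = 2000)  # should be close to 2
```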
The estimated shape parameter.
The argument x0 for the threshold (scale parameter) of the Pareto distribution was introduced in version 0.2.
Andreas Alfons and Josef Holzer
Dekkers, A.L.M., Einmahl, J.H.J. and de Haan, L. (1989) A moment estimator for the index of an extreme-value distribution. The Annals of Statistics, 17(4), 1833–1855.
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaMoment(eusilc$eqIncome, k = ts$k)
# using threshold
thetaMoment(eusilc$eqIncome, x0 = ts$x0)
The partial density component (PDC) estimator estimates the shape parameter of a Pareto distribution based on the relative excesses of observations above a certain threshold.
thetaPDC(x, k = NULL, x0 = NULL, w = NULL, ...)
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
w | an optional numeric vector giving sample weights. |
... | additional arguments to be passed to |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
The PDC estimator minimizes the integrated squared error (ISE) criterion with an incomplete density mixture model. The minimization is carried out using nlm. By default, the starting value is obtained with the Hill estimator (see thetaHill).
optimize
.
The estimated shape parameter.
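The incomplete density model u * f(y), with the Pareto density f(y) = theta * y^(-theta - 1) for the relative excesses y = x / x0, can be sketched in base R. For fixed theta, the ISE is minimized by u = mean(f(y_i)) / integral(f^2). Substituting this back leaves the criterion -mean(f(y_i))^2 / integral(f^2), with integral(f^2) = theta^2 / (2 * theta + 1). The sketch profiles out u in this way and minimizes over theta with optimize(). It is unweighted and differs from the nlm-based procedure described above. pdcSketch() is a hypothetical name.

```r
# Hypothetical sketch of the (unweighted) PDC estimator of the Pareto shape,
# with the mixture weight u profiled out of the ISE criterion
pdcSketch <- function(x, x0, interval = c(0.01, 20)) {
  y <- x[x > x0] / x0  # relative excesses above the threshold
  criterion <- function(theta) {
    intF2 <- theta^2 / (2 * theta + 1)     # integral of f^2 over (1, Inf)
    meanF <- mean(theta * y^(-theta - 1))  # average density at the data
    -meanF^2 / intF2                       # ISE at the optimal weight u
  }
  optimize(criterion, interval = interval)$minimum
}

# simulated Pareto sample with x0 = 1 and shape 2 (inverse transform)
set.seed(1)
x <- runif(5000)^(-1 / 2)
pdcSketch(x, x0 = 1)  # should be close to 2
```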
The arguments x0 for the threshold (scale parameter) of the Pareto distribution and w for sample weights were introduced in version 0.2.
Andreas Alfons and Josef Holzer
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
A. Alfons, M. Templ, P. Filzmoser (2013) Robust estimation of economic indicators from survey samples based on Pareto tail modeling. Journal of the Royal Statistical Society, Series C, 62(2), 271–286.
Vandewalle, B., Beirlant, J., Christmann, A., and Hubert, M. (2007) A robust estimator for the tail index of Pareto-type distributions. Computational Statistics & Data Analysis, 51(12), 6252–6268.
paretoTail, fitPareto, thetaISE, thetaHill
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaPDC(eusilc$eqIncome, k = ts$k, w = eusilc$db090)
# using threshold
thetaPDC(eusilc$eqIncome, x0 = ts$x0, w = eusilc$db090)
Estimate the shape parameter of a Pareto distribution using a quantile-quantile approach.
thetaQQ(x, k = NULL, x0 = NULL)
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
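The QQ-estimator of Kratz and Resnick (1996) is based on the least-squares slope of the Pareto quantile plot. Regressing the log of the j-th largest observation on the exponential quantile -log(j / (k + 1)), j = 1, ..., k, estimates 1 / theta, and the reciprocal estimates the shape parameter. The base-R sketch below shows this. qqSketch() is hypothetical and may differ in detail from thetaQQ(), for example in the choice of plotting positions.

```r
# Hypothetical sketch of the QQ-estimator: the reciprocal of the
# least-squares slope of the Pareto quantile plot of the k largest values
qqSketch <- function(x, k = NULL, x0 = NULL) {
  if (is.null(k)) {
    if (is.null(x0)) stop("either 'k' or 'x0' must be supplied")
    k <- sum(x > x0)  # number of observations above the threshold
  }
  top <- sort(x, decreasing = TRUE)[seq_len(k)]  # k largest values
  q <- -log(seq_len(k) / (k + 1))                # exponential quantiles
  slope <- cov(q, log(top)) / var(q)             # least-squares slope
  1 / slope
}

# exact Pareto quantiles (scale 3, shape 2.5) plus three smaller values
i <- 1:10
x <- c(1, 2, 3, 3 * (i / 11)^(-1 / 2.5))
qqSketch(x, k = 10)  # recovers 2.5 up to rounding error
```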
The estimated shape parameter.
The argument x0 for the threshold (scale parameter) of the Pareto distribution was introduced in version 0.2.
Andreas Alfons and Josef Holzer
Kratz, M.F. and Resnick, S.I. (1996) The QQ-estimator and heavy tails. Stochastic Models, 12(4), 699–724.
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaQQ(eusilc$eqIncome, k = ts$k)
# using threshold
thetaQQ(eusilc$eqIncome, x0 = ts$x0)
Estimate the shape parameter of a Pareto distribution using a trimmed mean approach.
thetaTM(x, k = NULL, x0 = NULL, beta = 0.05)
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
beta | a numeric vector of length two giving the trimming proportions for the lower and upper end of the tail, respectively. If a single numeric value is supplied, it is recycled. |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
The estimated shape parameter.
The argument x0 for the threshold (scale parameter) of the Pareto distribution was introduced in version 0.2.
Andreas Alfons and Josef Holzer
Brazauskas, V. and Serfling, R. (2000) Robust estimation of tail parameters for two-parameter Pareto and exponential models via generalized quantile statistics. Extremes, 3(3), 231–249.
Brazauskas, V. and Serfling, R. (2000) Robust and efficient estimation of the tail index of a single-parameter Pareto distribution. North American Actuarial Journal, 4(4), 12–27.
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaTM(eusilc$eqIncome, k = ts$k)
# using threshold
thetaTM(eusilc$eqIncome, x0 = ts$x0)
Estimate the shape parameter of a Pareto distribution using a weighted maximum likelihood approach.
thetaWML( x, k = NULL, x0 = NULL, weight = c("residuals", "probability"), const, bias = TRUE, ... )
x | a numeric vector. |
k | the number of observations in the upper tail to which the Pareto distribution is fitted. |
x0 | the threshold (scale parameter) above which the Pareto distribution is fitted. |
weight | a character string specifying the weight function to be used. If |
const | tuning constant(s) that control the robustness of the method. If |
bias | a logical indicating whether bias correction should be applied. |
... | additional arguments to be passed to |
The arguments k and x0 correspond to each other. If k is supplied, the threshold x0 is estimated by the (n - k)-th smallest value in x (i.e., the (k + 1)-th largest value), where n is the number of observations. If instead the threshold x0 is supplied, k is given by the number of observations in x larger than x0. Therefore, either k or x0 needs to be supplied. If both are supplied, only k is used (mainly for backward compatibility).
The weighted maximum likelihood estimator belongs to the class of M-estimators. To obtain the estimate, the root of a certain function needs to be found, which is implemented using uniroot.
The estimated shape parameter.
The argument x0 for the threshold (scale parameter) of the Pareto distribution was introduced in version 0.2.
Andreas Alfons and Josef Holzer
Dupuis, D.J. and Morgenthaler, S. (2002) Robust weighted likelihood estimators with an application to bivariate extreme value problems. The Canadian Journal of Statistics, 30(1), 17–36.
Dupuis, D.J. and Victoria-Feser, M.-P. (2006) A robust prediction error criterion for Pareto modelling of upper tails. The Canadian Journal of Statistics, 34(4), 639–658.
data(eusilc)
# equivalized disposable income is equal for each household
# member, therefore only one household member is taken
eusilc <- eusilc[!duplicated(eusilc$db030),]
# estimate threshold
ts <- paretoScale(eusilc$eqIncome, w = eusilc$db090)
# using number of observations in tail
thetaWML(eusilc$eqIncome, k = ts$k)
# using threshold
thetaWML(eusilc$eqIncome, x0 = ts$x0)
Test for class, print and take subsets of indicators on social exclusion and poverty.
is.indicator(x)
is.arpr(x)
is.qsr(x)
is.rmpg(x)
is.gini(x)
is.prop(x)
is.gpg(x)
## S3 method for class 'indicator'
print(x, ...)
## S3 method for class 'arpr'
print(x, ...)
## S3 method for class 'rmpg'
print(x, ...)
## S3 method for class 'indicator'
subset(x, years = NULL, strata = NULL, ...)
## S3 method for class 'arpr'
subset(x, years = NULL, strata = NULL, ...)
## S3 method for class 'rmpg'
subset(x, years = NULL, strata = NULL, ...)
x | for |
... | additional arguments to be passed to and from methods. |
years | an optional numeric vector giving the years to be extracted. |
strata | an optional vector giving the domains of the breakdown to be extracted. |
is.indicator returns TRUE if x inherits from class "indicator" and FALSE otherwise.
is.arpr returns TRUE if x inherits from class "arpr" and FALSE otherwise.
is.qsr returns TRUE if x inherits from class "qsr" and FALSE otherwise.
is.rmpg returns TRUE if x inherits from class "rmpg" and FALSE otherwise.
is.gini returns TRUE if x inherits from class "gini" and FALSE otherwise.
is.prop returns TRUE if x inherits from class "prop" and FALSE otherwise.
is.gpg returns TRUE if x inherits from class "gpg" and FALSE otherwise.
print.indicator, print.arpr and print.rmpg return x invisibly.
subset.indicator, subset.arpr and subset.rmpg return a subset of x of the same class.
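The class tests follow the usual S3 inheritance pattern. The base-R sketch below builds a hypothetical stand-in object with the class vector c("arpr", "indicator") and checks it with inherits(). The class vector is an assumption suggested by the example below, where is.indicator() is applied to an arpr() result. The sketch only illustrates the mechanism and is not laeken's code.

```r
# A hypothetical stand-in for an at-risk-of-poverty rate result
a <- structure(list(value = NA_real_), class = c("arpr", "indicator"))
# checks of the kind performed by is.arpr() and is.indicator()
inherits(a, "arpr")       # TRUE
inherits(a, "indicator")  # TRUE
inherits(a, "qsr")        # FALSE
```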
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
data(eusilc)
# at-risk-of-poverty rate
a <- arpr("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
print(a)
is.arpr(a)
is.indicator(a)
subset(a, strata = c("Lower Austria", "Vienna"))
# quintile share ratio
q <- qsr("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
print(q)
is.qsr(q)
is.indicator(q)
subset(q, strata = c("Lower Austria", "Vienna"))
# relative median at-risk-of-poverty gap
r <- rmpg("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
print(r)
is.rmpg(r)
is.indicator(r)
subset(r, strata = c("Lower Austria", "Vienna"))
# Gini coefficient
g <- gini("eqIncome", weights = "rb050", breakdown = "db040", data = eusilc)
print(g)
is.gini(g)
is.indicator(g)
subset(g, strata = c("Lower Austria", "Vienna"))
Compute variance and confidence interval estimates of indicators on social exclusion and poverty.
variance( inc, weights = NULL, years = NULL, breakdown = NULL, design = NULL, cluster = NULL, data = NULL, indicator, alpha = 0.05, na.rm = FALSE, type = "bootstrap", gender = NULL, method = NULL, ... )
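inc | either a numeric vector giving the equivalized disposable income, or (if data is not NULL) a character string, an integer or a logical vector specifying the corresponding column of data. |
weights | optional; either a numeric vector giving the personal sample weights, or (if data is not NULL) a character string, an integer or a logical vector specifying the corresponding column of data. |
years | optional; either a numeric vector giving the different years of the survey, or (if data is not NULL) a character string, an integer or a logical vector specifying the corresponding column of data. |
breakdown | optional; either a numeric vector giving different domains, or (if data is not NULL) a character string, an integer or a logical vector specifying the corresponding column of data. |
design | optional; either an integer vector or factor giving different strata for stratified sampling designs, or (if data is not NULL) a character string, an integer or a logical vector specifying the corresponding column of data. |
cluster | optional; either an integer vector or factor giving different clusters for cluster sampling designs, or (if data is not NULL) a character string, an integer or a logical vector specifying the corresponding column of data. |
data | an optional data.frame. |
indicator | an object inheriting from the class "indicator" that contains the point estimates of the indicator (see arpr, qsr, rmpg or gini). |
alpha | a numeric value giving the significance level to be used for computing the confidence interval(s) (i.e., the confidence level is 1 - alpha). |
na.rm | a logical indicating whether missing values should be removed. |
type | a character string specifying the type of variance estimation to be used. Currently, only "bootstrap" is implemented. |
gender | either a numeric vector giving the gender, or (if data is not NULL) a character string, an integer or a logical vector specifying the corresponding column of data. |
method | a character string specifying the method to be used (only for gpg). |
... | additional arguments to be passed to bootVar. |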
This is a wrapper function for computing variance and confidence interval estimates of indicators on social exclusion and poverty.
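As a rough illustration of what a bootstrap variance and confidence interval estimate involves, the base-R sketch below resamples observations with replacement, recomputes a weighted mean on each replicate, and reports the replicate variance together with a percentile interval at confidence level 1 - alpha. The helper boot_ci_weighted_mean, the choice of a weighted mean as the statistic, and the percentile interval are assumptions for illustration only; this is a naive bootstrap that ignores strata and clusters, and it is not the package's routine (see bootVar for that).

```r
# Hypothetical helper: naive bootstrap for a weighted mean.
# Observations are resampled together with their weights, ignoring any
# strata or clusters, so this is NOT a design-based bootstrap.
boot_ci_weighted_mean <- function(x, w, R = 200, alpha = 0.05, seed = 123) {
  set.seed(seed)  # note: resets the global random number stream
  n <- length(x)
  estimate <- sum(w * x) / sum(w)
  # each replicate recomputes the weighted mean on resampled row indices
  reps <- replicate(R, {
    i <- sample.int(n, n, replace = TRUE)
    sum(w[i] * x[i]) / sum(w[i])
  })
  list(value = estimate,
       var = var(reps),
       # percentile interval with confidence level 1 - alpha
       ci = unname(quantile(reps, c(alpha / 2, 1 - alpha / 2))))
}

# Usage with simulated (illustrative) incomes and weights
set.seed(1)
inc <- rlnorm(500, meanlog = 10, sdlog = 0.6)
wt <- runif(500, 50, 150)
res <- boot_ci_weighted_mean(inc, wt, R = 200, alpha = 0.05)
res$var
res$ci
```

Smaller values of alpha widen the interval, since the lower and upper cut-offs alpha/2 and 1 - alpha/2 move further into the tails of the replicate distribution.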
An object of the same class as indicator is returned. See arpr, qsr, rmpg or gini for details on the components.
Andreas Alfons
A. Alfons and M. Templ (2013) Estimation of Social Exclusion Indicators from Complex Surveys: The R Package laeken. Journal of Statistical Software, 54(15), 1–25. doi:10.18637/jss.v054.i15
bootVar, arpr, qsr, rmpg, gini
data(eusilc)
a <- arpr("eqIncome", weights = "rb050", data = eusilc)
## naive bootstrap
variance("eqIncome", weights = "rb050", design = "db040",
         data = eusilc, indicator = a, R = 50,
         bootType = "naive", seed = 123)
## bootstrap with calibration
variance("eqIncome", weights = "rb050", design = "db040",
         data = eusilc, indicator = a, R = 50,
         X = calibVars(eusilc$db040), seed = 123)
Compute the weighted mean.
weightedMean(x, weights = NULL, na.rm = FALSE)
x | a numeric vector. |
weights | an optional numeric vector giving the sample weights. |
na.rm | a logical indicating whether missing values in x should be omitted. |
This is a simple wrapper function calling weighted.mean if sample weights are supplied and mean otherwise.
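The dispatch described above can be sketched in a few lines of base R. The name weighted_mean_sketch is hypothetical and the function is not the package source; it only mirrors the documented behaviour of calling weighted.mean when weights are given and mean otherwise.

```r
# Hypothetical sketch of the wrapper logic: weighted.mean() when sample
# weights are supplied, mean() otherwise
weighted_mean_sketch <- function(x, weights = NULL, na.rm = FALSE) {
  if (is.null(weights)) {
    mean(x, na.rm = na.rm)
  } else {
    # with na.rm = TRUE, weighted.mean() drops missing x together with
    # the corresponding weights
    weighted.mean(x, w = weights, na.rm = na.rm)
  }
}

weighted_mean_sketch(c(1, 2, 3))              # 2
weighted_mean_sketch(c(1, 2, 3), c(1, 1, 2))  # (1 + 2 + 6) / 4 = 2.25
weighted_mean_sketch(c(1, NA, 3), c(1, 1, 2), na.rm = TRUE)  # (1 + 6) / 3
```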
The weighted mean of values in x is returned.
Andreas Alfons
data(eusilc)
weightedMean(eusilc$eqIncome, eusilc$rb050)
Compute the weighted median (Eurostat definition).
weightedMedian(x, weights = NULL, sorted = FALSE, na.rm = FALSE)
x | a numeric vector. |
weights | an optional numeric vector giving the sample weights. |
sorted | a logical indicating whether the observations in x are already sorted. |
na.rm | a logical indicating whether missing values in x should be omitted. |
The implementation strictly follows the Eurostat definition.
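The text does not spell out the Eurostat rule. Assuming it reads as follows (our paraphrase of the cited EU-SILC 131-rev/04 document, not a quotation), let $x_1 \le \dots \le x_n$ be the incomes sorted in ascending order, $w_i$ the corresponding weights, and $W = \sum_{i=1}^{n} w_i$ the total weight. The weighted median is then

```latex
\operatorname{med} =
\begin{cases}
  \dfrac{x_j + x_{j+1}}{2}, & \text{if } \sum_{i=1}^{j} w_i = \dfrac{W}{2}, \\[1ex]
  x_{j+1}, & \text{if } \sum_{i=1}^{j} w_i < \dfrac{W}{2} < \sum_{i=1}^{j+1} w_i .
\end{cases}
```

For example, with $x = (1, 2, 3, 4)$ and equal weights, the cumulative weight after $j = 2$ observations equals $W/2 = 2$ exactly, so the median is $(2 + 3)/2 = 2.5$. With weights $(1, 1, 1, 5)$ the cumulative weights are $1, 2, 3, 8$ and $W/2 = 4$ lies strictly between $3$ and $8$, so the median is $x_4 = 4$.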
The weighted median of values in x is returned.
Andreas Alfons and Matthias Templ
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
arpt, incMedian, weightedQuantile
data(eusilc)
weightedMedian(eusilc$eqIncome, eusilc$rb050)
Compute weighted quantiles (Eurostat definition).
weightedQuantile( x, weights = NULL, probs = seq(0, 1, 0.25), sorted = FALSE, na.rm = FALSE )
x | a numeric vector. |
weights | an optional numeric vector giving the sample weights. |
probs | numeric vector of probabilities with values in [0, 1]. |
sorted | a logical indicating whether the observations in x are already sorted. |
na.rm | a logical indicating whether missing values in x should be omitted. |
The implementation strictly follows the Eurostat definition.
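A minimal base-R sketch of a weighted quantile rule in the Eurostat style is shown below. It assumes the rule is: sort the observations, compute the cumulative weight share, and for each probability p take the first observation whose share reaches p, averaging it with the next observation when the share equals p exactly. Both this reading of the definition and the helper name weighted_quantile_sketch are assumptions; the sketch omits the sorted and na.rm arguments and is not the package's implementation.

```r
# Hypothetical sketch of an Eurostat-style weighted quantile (assumed rule)
weighted_quantile_sketch <- function(x, weights = NULL,
                                     probs = seq(0, 1, 0.25)) {
  if (is.null(weights)) weights <- rep.int(1, length(x))
  ord <- order(x)                        # sort observations with their weights
  x <- x[ord]
  weights <- weights[ord]
  n <- length(x)
  share <- cumsum(weights) / sum(weights)  # cumulative weight share
  q <- vapply(probs, function(p) {
    if (p == 0) return(x[1])
    if (p == 1) return(x[n])
    j <- which(share >= p)[1]            # first index whose share reaches p
    # exact hit: average with the next observation; otherwise take x[j]
    if (share[j] == p) (x[j] + x[j + 1]) / 2 else x[j]
  }, numeric(1))
  unname(q)  # unnamed, like the documented return value
}

weighted_quantile_sketch(c(4, 1, 3, 2), probs = 0.5)                 # 2.5
weighted_quantile_sketch(c(1, 2, 3, 4), c(1, 1, 1, 5), probs = 0.5)  # 4
weighted_quantile_sketch(c(1, 2, 3, 4))  # 1.0 1.5 2.5 3.5 4.0
```

With p = 0.5 this reproduces the median rule stated for weightedMedian, which is why a separate median helper is not needed in the sketch.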
A numeric vector containing the weighted quantiles of values in x at probabilities probs is returned. Unlike quantile, this returns an unnamed vector.
Andreas Alfons and Matthias Templ
Working group on Statistics on Income and Living Conditions (2004) Common cross-sectional EU indicators based on EU-SILC; the gender pay gap. EU-SILC 131-rev/04, Eurostat.
data(eusilc)
weightedQuantile(eusilc$eqIncome, eusilc$rb050)