Dineq Manual

Package ‘dineq’

February 24, 2018

Type Package

Title Decomposition of (income) inequality

Version 0.1.0

Date 2018-02-01

Author René Schulenberg

Maintainer René Schulenberg <reneschulenberg@gmail.com>

Description Decomposition of (income) inequality by population subgroups.

For a decomposition on a single variable, the mean log deviation can be used.

For a decomposition on multiple variables, a regression-based technique can be used.

Recentered influence function regression for marginal effects of covariates on the income distribution.

Some extensions to inequality functions to handle weights and/or missing values.

Depends R (>= 2.10)

Imports boot (>= 1.3-20), Hmisc (>= 4.0-3)

License GPL-3

Encoding UTF-8

LazyData true

RoxygenNote 6.0.1

NeedsCompilation no

R topics documented:

dineq_change_rb
dineq_rb
gini.wtd
gini_decomp
mex_inc_2008
mex_inc_2016
mld.wtd
mld_change
mld_decomp
ntiles.wtd
polar.wtd
rif
rifr
rifrSE
theil.wtd
Index

dineq_change_rb Decomposition of the change in inequality

Description

Decomposes the change in (income) inequality into the contributions of multiple characteristics, each split into a price effect and a quantity effect.

Usage

dineq_change_rb(formula1, weights1 = NULL, data1, formula2, weights2 = NULL,

data2)

Arguments

formula1 an object of class "formula" (or one that can be coerced to that class) for the first year/dataset: a symbolic description of the model to be fitted in the ordinary least squares regression.

weights1 an optional vector of weights to be used in the fitting process for the first year/dataset. Should be NULL or a numeric vector. The vector must be a column of data1 and is passed by name, between quotation marks (for instance weights1 = "factor").

data1 a data frame containing the variables for the first year/dataset in the model.

formula2 an object of class "formula" (or one that can be coerced to that class) for the second year/dataset: a symbolic description of the model to be fitted in the ordinary least squares regression.

weights2 an optional vector of weights to be used in the fitting process for the second year/dataset. Should be NULL or a numeric vector. The vector must be a column of data2 and is passed by name, between quotation marks (for instance weights2 = "factor").

data2 a data frame containing the variables for the second year/dataset in the model.

Details

This function uses a multivariate regression-based decomposition method. Multiple characteristics can be added to the function in order to calculate the contribution of each individual variable (including the residual) to the change in inequality. For instance, socio-economic, demographic and geographic characteristics (such as age, household composition, gender, region, education) of the household or the individual can be added.

The change decomposition is divided into a price and a quantity effect for each characteristic. The quantity effect is caused by changes in the relative size of subgroups (for instance: a higher percentage of elderly households). The price effect is caused by a change in the influence of the characteristic on the dependent variable (for instance: a higher income for elderly households).

It uses a logarithmic transformation of the values of the dependent variable. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function.

The decomposition can only be used on the variance of log income.

The main difference with the decomposition of the change of the mean log deviation (mld_change) is that multiple characteristics can be analyzed at the same time, whereas that function analyzes only one characteristic at a time.

The function compares two datasets, one for each year. The characteristics should be the same in both formulas (although they can be named differently) and should appear in the same order.

Value

a list with the results of the decomposition and the parts used for the decomposition, containing the following components:

attention optional note on the difference in the input.

variance_logincome the values of the variance of log income of both years/datasets and the difference between both.

decomposition_inequality the (relative) decomposition of the inequality of both years/datasets into the different variables. See function 'dineq_rb'.

decomposition_change_absolute decomposition of the change in the variance of log income into the different variables and residual, split into price and quantity effects. Adds up to the absolute change in variance of log income.

decomposition_change_relative decomposition of the change in the variance of log income into the different variables and residual, split into price and quantity effects. Adds up to 100 percent.

notes number of zero or negative observations in both datasets/years. The function uses a logarithmic transformation of the dependent variable as input for the regression. Therefore these observations are deleted from the analysis.

References

Yun, M.-S. (2006) Earnings Inequality in USA, 1969–99: Comparing Inequality Using Earnings Equations, Review of Income and Wealth, 52 (1), p. 127–144.

Fields, G. (2003) Accounting for income inequality and its change: a new method, with application to the distribution of earnings in the United States, Research in Labor Economics, 22, p. 1–38.

Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.

See Also

dineq_rb

Examples

#Decomposition of the change in income inequality into 4 variables using the Mexican Income
#data sets
data(mex_inc_2008)
data(mex_inc_2016)
inequality_change <- dineq_change_rb(formula1=income~hh_structure+education+domicile_size+age_cat,
weights1="factor", data1=mex_inc_2008, formula2=income~hh_structure+education+
domicile_size+age_cat, weights2="factor", data2=mex_inc_2016)

#selection of output: change in variance of log income decomposed in variables split into price
#and quantity effect and residual.
inequality_change["decomposition_change_absolute"]

#selection of output: relative change in variance of log income decomposed in variables split
#into price and quantity effect and residual. Because of negative change in variance of log
#income, the negative contribution of education (quantity) becomes a positive number.
inequality_change["decomposition_change_relative"]

dineq_rb Regression-based decomposition of inequality

Description

Decomposition of (income) inequality into multiple characteristics. A regression-based decomposition method is used.

Usage

dineq_rb(formula, weights = NULL, data)

Arguments

formula an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted in the ordinary least squares regression.

weights an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. The vector must be a column of data and is passed by name, between quotation marks (for instance weights = "factor").

data a data frame containing the variables in the model.

Details

This function uses a multivariate regression-based decomposition method. Multiple variables can be added to the function in order to calculate the contribution of each individual variable (including the residual) to the inequality. For instance, socio-economic, demographic and geographic characteristics (such as age, household composition, gender, region, education) of the household or the individual can be added.

This decomposition can be used on a broad range of inequality measures, like the Gini coefficient, Theil index, mean log deviation, Atkinson index and variance of log income.

It uses a logarithmic transformation of the values of the dependent variable. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function.

The main difference with the decomposition of the mean log deviation or Gini coefficient is that multiple characteristics can be analyzed at the same time, whereas the other decomposition functions analyze only one characteristic at a time.
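The idea behind the regression-based shares can be sketched in a few lines of base R. This is a minimal, unweighted illustration on simulated data in the spirit of Fields (2003), not the code used by dineq_rb. The data frame d, its columns and all object names are made up for the example. The share of each variable is the covariance between its fitted contribution and log income, divided by the variance of log income; together with the residual, the shares add up to 1.

```r
# Simulated data (hypothetical): income depends on age and years of education
set.seed(1)
n <- 500
d <- data.frame(age = runif(n, 20, 70), educ = rnorm(n, 12, 3))
d$income <- exp(8 + 0.01 * d$age + 0.08 * d$educ + rnorm(n, sd = 0.5))

# Zero or negative incomes cannot be log-transformed, so they are dropped
d <- d[d$income > 0, ]

# OLS regression of log income on the characteristics
fit <- lm(log(income) ~ age + educ, data = d)
X <- model.matrix(fit)[, -1, drop = FALSE]   # regressors without the intercept
b <- coef(fit)[-1]                           # matching slope coefficients
ly <- log(d$income)

# Relative contribution of each variable: cov(b_j * X_j, ln y) / var(ln y)
share <- sapply(colnames(X), function(v) cov(b[v] * X[, v], ly) / var(ly))

# Contribution of the residual, computed the same way
share_resid <- cov(residuals(fit), ly) / var(ly)

# Variables plus residual account for all of the variance of log income
total <- sum(share) + share_resid
```

Because log income equals the intercept plus the fitted contributions plus the residual, the shares sum to 1 by construction. Survey weights are left out here to keep the sketch short.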

Value

a list with the results of the decomposition, containing the following components:

inequality_measures the values of 4 inequality measures: Gini, mean log deviation, Theil and variance of log income

decomposition_inequality the (relative) decomposition of the inequality into the different variables

regression_results results of the OLS regression which is used to make the decomposition of inequality

note number of zero or negative observations. The function uses a logarithmic transformation of the dependent variable as input for the regression. Therefore these observations are deleted from the analysis.

References

Fields, G. S. (2003) Accounting for income inequality and its change: a new method, with application to the distribution of earnings in the United States, Research in Labor Economics, 22, p. 1–38.

Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.

See Also

dineq_change_rb

Examples

#Decomposition of the income inequality into 4 variables using Mexican Income data set:

data(mex_inc_2008)

inequality_decomp <- dineq_rb(income~hh_structure+education+domicile_size+age_cat,

weights="factor", data=mex_inc_2008)

#selection of the output: decomposition of the inequality into the contribution of the

#different variables and residual (adds up to 100 percent)

inequality_decomp["decomposition_inequality"]

gini.wtd Gini coefficient

Description

Returns the (optionally weighted) Gini coefficient for a vector.

Usage

gini.wtd(x, weights = NULL)

Arguments

x a numeric vector containing at least non-negative elements.

weights an optional vector of weights of x to be used in the computation of the Gini coefficient. Should be NULL or a numeric vector.

Details

The Gini coefficient is a measure of inequality among values of a distribution. It is the most widely used single measure of income inequality. The coefficient can theoretically range between 0 and 1, with 1 being the highest possible inequality (for instance: 1 person in a society has all income; the others none). Coefficients that are negative or greater than 1 are also possible when the distribution contains negative values. Compared to other measures of inequality, the Gini coefficient is especially sensitive to changes in the middle of the distribution.

Extension of the gini function in the reldist package in order to handle missing values.
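As an illustration, a weighted Gini coefficient can be computed from cumulative population and income shares (the points of the Lorenz curve). The helper below, gini_sketch, is a hypothetical name for this example and is not part of the package. It assumes that observations with a missing value or a missing weight are simply dropped, which may differ in detail from what gini.wtd does.

```r
# Sketch of a weighted Gini coefficient; gini_sketch is a hypothetical helper
gini_sketch <- function(x, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  keep <- !is.na(x) & !is.na(weights)   # drop missing values pairwise
  x <- x[keep]
  w <- weights[keep]
  ord <- order(x)                       # sort by income
  x <- x[ord]
  w <- w[ord] / sum(w)                  # normalised weights
  p <- cumsum(w)                        # cumulative population share
  nu <- cumsum(w * x)
  n <- length(nu)
  nu <- nu / nu[n]                      # cumulative income share
  sum(nu[-1] * p[-n]) - sum(nu[-n] * p[-1])
}

g_equal <- gini_sketch(c(1, 1, 1, 1))              # everyone has the same income
g_one   <- gini_sketch(c(0, 0, 0, 1))              # one person has all income
g_wtd   <- gini_sketch(c(1, 3), weights = c(3, 1)) # weights act like replication
g_rep   <- gini_sketch(c(1, 1, 1, 3, NA))          # the NA is dropped
```

Equal incomes give 0. With four observations and one person holding all income the sketch gives 0.75, that is 1 - 1/n, which approaches 1 as n grows. The weighted call returns the same value as the replicated vector.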

Value

The value of the Gini coefficient.

Source

Handcock, M. (2016), Relative Distribution Methods. Version 1.6-6. Project home page at http://www.stat.ucla.edu/~handcock/RelDist.

References

Haughton, J. and S. Khandker. (2009) Handbook on poverty and inequality, Washington, DC:

World Bank.

Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook

of Income Distribution. Amsterdam: Elsevier, p. 87-166.

Examples

#calculate Gini coefficient using Mexican Income data set

data(mex_inc_2008)

#unweighted Gini coefficient:

gini.wtd(mex_inc_2008$income)

#weighted Gini coefficient:

gini.wtd(x=mex_inc_2008$income, weights=mex_inc_2008$factor)

gini_decomp Decomposition of the Gini coefficient

Description

Decomposes the Gini coefficient into population subgroups. A distinction is made between between-group inequality, within-group inequality and an overlap (interaction) term.

Usage

gini_decomp(x, z, weights = NULL)

Arguments

x a numeric vector containing at least non-negative elements.

z a factor containing the population subgroups.

weights an optional vector of weights of x to be used in the computation of the decomposition. Should be NULL or a numeric vector.

Details

The Gini coefficient is decomposed into between-group and within-group inequality. In most cases the distributions of the subgroups overlap. As a consequence, between- and within-group inequality do not add up to the total Gini coefficient, and the remainder is an overlap term, also referred to as the interaction effect.

Within-group inequality is calculated by using the Gini coefficient of each subgroup. Between-group inequality is calculated by using the Gini coefficient of the subgroup averages.

Value

a list with the results of the decomposition and the parts used for the decomposition, containing the following components:

gini_decomp a list containing the decomposition: gini_total (value of the Gini coefficient of x), gini_within (value of within-group inequality), gini_between (value of between-group inequality) and gini_overlap (value of overlap in inequality)

gini_group a list containing gini_group (the Gini coefficients of the different subgroups) and gini_group_contribution (the contribution of the subgroups to the total within-group inequality: adds up to gini_within)

gini_decomp a list containing the means of x: mean_total (value of the mean of x of all subgroups combined) and mean_group (value of the mean of x of the individual subgroups)

share_groups the distribution of the subgroups z

share_income_groups the distribution of vector x by subgroups z

number_cases a list containing the number of cases in total and by subgroup (weighted and unweighted): n_unweighted (total number of unweighted x), n_weighted (total number of weighted x), n_group_unweighted (number of unweighted x by subgroup z), n_group_weighted (number of weighted x by subgroup z)

References

Mookherjee, D. and A. Shorrocks (1982) A decomposition analysis of the trend in UK income

inequality, Economic Journal, 92 (368), p. 886-902.

Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook

of Income Distribution. Amsterdam: Elsevier, p. 87-166.

See Also

mld_decomp

Examples

#Decomposition of the gini coefficient by level of education using Mexican Income data set

data(mex_inc_2008)

education_decomp <- gini_decomp(x=mex_inc_2008$income,z=mex_inc_2008$education,

weights=mex_inc_2008$factor)

#complete output

education_decomp

#Selected output: decomposition into between- and within-group inequality and overlap (interaction)

education_decomp["gini_decomp"]

mex_inc_2008 Mexican income data 2008

Description

Selection of Mexican income (survey) data and household characteristics for 2008. Extracted from ENIGH (Household Income and Expenditure Survey).

Usage

data(mex_inc_2008)

Format

A data frame containing 5000 observations and 8 variables (a selection from the original).

hh_number Household ID.

factor Population inflating weights.

income Household income.

hh_structure Household structure, factor with levels unipersonal, nuclear, ampliado, compuesto and coresidente.

education Highest achieved education of the head of the household, factor with levels Sin instruccion, Preescolar, Primaria incompleta, Primaria completa, Secundaria incompleta, Secundaria completa, Preparatoria incompleta, Preparatoria completa, Profesional incompleta, Profesional completa, Posgrado.

domicile_size Population of domicile, factor with levels <2500, 2500-15000, 15000-100000, >100000.

age Age (integer) of the head of the household.

age_cat Age (categorical) of the head of the household, factor with levels <25, 25-34, 35-44, 45-54, 55-64, 65-74, >=75.

Details

This data set is a selection from the original dataset of the National Institute of Statistics and Geography in Mexico (INEGI). The original contains 29468 observations and 129 variables with information on the income and household characteristics in Mexico. This selection is only meant to be used as a calculation example for the functions in this package. Results are not representative of the actual situation in Mexico.

Source

http://en.www.inegi.org.mx/proyectos/enchogares/regulares/enigh/nc/2008/default.html, the whole data set can be obtained here.

References

INEGI (2009), ENIGH 2008 Nueva construcción. Ingresos y gastos de los hogares, Aguascalientes:

INEGI.

mex_inc_2016 Mexican income data 2016

Description

Selection of Mexican income (survey) data and household characteristics for 2016. Extracted from ENIGH (Household Income and Expenditure Survey).

Usage

data(mex_inc_2016)

Format

A data frame containing 5000 observations and 8 variables (a selection from the original).

hh_number Household ID.

factor Population inflating weights.

income Household income.

hh_structure Household structure, factor with levels unipersonal, nuclear, ampliado, compuesto and coresidente.

education Highest achieved education of the head of the household, factor with levels Sin instruccion, Preescolar, Primaria incompleta, Primaria completa, Secundaria incompleta, Secundaria completa, Preparatoria incompleta, Preparatoria completa, Profesional incompleta, Profesional completa, Posgrado.

domicile_size Population of domicile, factor with levels <2500, 2500-15000, 15000-100000, >100000.

age Age (integer) of the head of the household.

age_cat Age (categorical) of the head of the household, factor with levels <25, 25-34, 35-44, 45-54, 55-64, 65-74, >=75.

Details

This data set is a selection from the original dataset of the National Institute of Statistics and Geography in Mexico (INEGI). The original contains 70311 observations and 127 variables with information on the income and household characteristics in Mexico. This selection is only meant to be used as a calculation example for the functions in this package. Results are not representative of the actual situation in Mexico.

Source

http://en.www.inegi.org.mx/proyectos/enchogares/regulares/enigh/nc/2016/default.html, the whole data set can be obtained here.

References

INEGI (2017), Encuesta Nacional de Ingresos y Gastos de los Hogares 2016. ENIGH. Nueva serie.

Temas, categorías y variables, Aguascalientes: INEGI.

mld.wtd Mean log deviation

Description

Returns the (optionally weighted) mean log deviation for a vector.

Usage

mld.wtd(x, weights = NULL)

Arguments

x a numeric vector containing at least non-negative elements.

weights an optional vector of weights of x to be used in the computation of the mean log deviation. Should be NULL or a numeric vector.

Details

The mean log deviation is a measure of inequality among values of a distribution. It is a member of the Generalized Entropy measures, also referred to as GE(0). A value of zero is the lowest possible inequality. The measure does not have an upper bound for the highest inequality. It uses a logarithmic transformation of the values of the distribution. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function. The mean log deviation is more sensitive to changes in the lower tail of the distribution.

Extension of the calcGEI function in the IC2 package in order to handle missing values.
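The measure itself is short to write down: it is the log of the (weighted) mean minus the (weighted) mean of the logs. The sketch below uses a hypothetical helper, mld_sketch, which is not part of the package. It assumes that missing, zero and negative values are dropped before the computation, as described above.

```r
# Sketch of a weighted mean log deviation; mld_sketch is a hypothetical helper
mld_sketch <- function(x, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  # keep only complete, strictly positive observations (log needs x > 0)
  keep <- !is.na(x) & !is.na(weights) & x > 0
  x <- x[keep]
  w <- weights[keep] / sum(weights[keep])   # normalised weights
  log(sum(w * x)) - sum(w * log(x))         # log of mean minus mean of logs
}

m_equal <- mld_sketch(c(5, 5, 5))                # no inequality
m_two   <- mld_sketch(c(1, exp(2)))              # two unequal incomes
m_clean <- mld_sketch(c(1, exp(2), 0, -3, NA))   # 0, -3 and NA are excluded
```

For the two incomes 1 and exp(2), the mean of the logs is 1, so the result equals log((1 + exp(2)) / 2) - 1. Adding a zero, a negative value and a missing value does not change it, because those observations are excluded.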

Value

the value of the mean log deviation index.

Source

Plat, D. (2012). IC2: Inequality and Concentration Indices and Curves. R package version 1.0-1.

https://CRAN.R-project.org/package=IC2

References

Haughton, J. and S. Khandker. (2009) Handbook on poverty and inequality, Washington, DC:

World Bank.

Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook

of Income Distribution. Amsterdam: Elsevier, p. 87-166.

Examples

#calculate mean log deviation using Mexican Income data set

data(mex_inc_2008)

#unweighted mean log deviation:

mld.wtd(mex_inc_2008$income)

#weighted mean log deviation:

mld.wtd(x=mex_inc_2008$income, weights=mex_inc_2008$factor)

mld_change Decomposition of the change of the mean log deviation

Description

Decomposes the change of the mean log deviation between two years/data sets into population

subgroups.

Usage

mld_change(x1, z1, weights1 = NULL, x2, z2, weights2 = NULL)

Arguments

x1 a numeric vector for the first year/dataset containing at least non-negative elements.

z1 a factor for the first year/dataset containing the population subgroups.

weights1 an optional vector of weights of x1 for the first year/dataset to be used in the computation of the decomposition. Should be NULL or a numeric vector.

x2 a numeric vector for the second year/dataset containing at least non-negative elements.

z2 a factor for the second year/dataset containing the population subgroups.

weights2 an optional vector of weights of x2 for the second year/dataset to be used in the computation of the decomposition. Should be NULL or a numeric vector.

Details

The change of the mean log deviation can be decomposed into three components: inequality changes between groups, inequality changes within groups, and changes in the relative sizes of the groups. The change of between-group inequality is measured by a change in the relative income of the subgroups. The change of within-group inequality is measured by adding up all changes in mean log deviation within the subgroups. The contribution of changes in relative population size affects both the within- and between-group components. For the relative contributions those two are added together.

This method was introduced by Mookherjee and Shorrocks. It is an accurate approximation of the exact decomposition. It uses a logarithmic transformation of the values of the distribution. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function.
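For orientation, the approximate decomposition is commonly written as below. The symbols are assumptions chosen for this illustration and do not come from the package: v_k is the population share of subgroup k, \mu_k its mean, \lambda_k = \mu_k / \mu its mean relative to the overall mean, \theta_k = v_k \lambda_k its income share, and I_k its mean log deviation. A bar denotes the average of the two years and \Delta the change between them.

```latex
\Delta I \;\approx\;
\underbrace{\sum_k \bar{v}_k \,\Delta I_k}_{\text{within-group changes}}
\;+\; \underbrace{\sum_k \bar{I}_k \,\Delta v_k}_{\text{group size (within component)}}
\;+\; \underbrace{\sum_k \left(\bar{\lambda}_k - \overline{\log \lambda_k}\right) \Delta v_k}_{\text{group size (between component)}}
\;+\; \underbrace{\sum_k \left(\bar{\theta}_k - \bar{v}_k\right) \Delta \log \mu_k}_{\text{between-group changes}}
```

The first two terms are exact for the within-group part. The approximation enters in the last two terms, through the change in the log of the overall mean.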

Value

a list with the results of the decomposition and the parts used for the decomposition, containing the following components:

mld_data1 the value of the mean log deviation index of x for the first year/dataset, and the decomposition into within-group and between-group inequality

mld_data2 the value of the mean log deviation index of x for the second year/dataset, and the decomposition into within-group and between-group inequality

mld_difference the difference in the mean log deviation and in its decomposition between the second and first year/dataset

absolute_contributions_difference decomposition of the absolute change in inequality into: within-group changes, group size changes (split into the effect on the within- and between-group components) and between-group changes.

relative_contributions_difference decomposition of the change in inequality into relative contributions of: within-group changes, group size changes and between-group changes. Adds up to 100 percent (or -100 percent for negative change)

note number of zero or negative observations in both datasets. The mean log deviation uses a logarithmic transformation of x. Therefore these observations are deleted from the analysis.

References

Mookherjee, D. and A. Shorrocks (1982) A decomposition analysis of the trend in UK income inequality, Economic Journal, 92 (368), p. 886–902.

Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.

See Also

mld_decomp

Examples

#Decomposition of the change in mean log deviation by level of education using
#Mexican Income data sets
data(mex_inc_2008)
data(mex_inc_2016)
change_education <- mld_change(x1=mex_inc_2008$income, z1=mex_inc_2008$education,
weights1=mex_inc_2008$factor, x2=mex_inc_2016$income, z2=mex_inc_2016$education,
weights2=mex_inc_2016$factor)

#selection of the output: decomposition of the change into within- and between-group
#contribution and change in the size of groups (adds up to 100 percent)
change_education["relative_contributions_difference"]

mld_decomp Decomposition of the mean log deviation

Description

Decomposes the mean log deviation into non-overlapping population subgroups. A distinction is made between between-group and within-group inequality.

Usage

mld_decomp(x, z, weights = NULL)

Arguments

x a numeric vector containing at least non-negative elements.

z a factor containing the population subgroups.

weights an optional vector of weights of x to be used in the computation of the decomposition. Should be NULL or a numeric vector.

Details

The mean log deviation is decomposed into between-group and within-group inequality. Within-group inequality is calculated by using the mean log deviation of each subgroup. Between-group inequality is calculated by using the mean log deviation of the subgroup averages.

It uses a logarithmic transformation of the values of the distribution. Therefore it cannot handle negative or zero values. Those are excluded from the computation in this function.

Based on the calcGEI function in the IC2 package. Handles missing values.
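Unlike the Gini coefficient, the mean log deviation splits exactly into a within-group and a between-group part. The base R sketch below shows this on a small made-up example. The vectors x, z and w and the helper mld are hypothetical and chosen only for illustration; this is not the package's implementation.

```r
# Weighted mean log deviation for strictly positive x (hypothetical helper)
mld <- function(x, w) {
  w <- w / sum(w)
  log(sum(w * x)) - sum(w * log(x))
}

# Made-up incomes, subgroups and weights
x <- c(10, 20, 30, 40, 80, 120)
z <- factor(c("a", "a", "a", "b", "b", "b"))
w <- c(1, 2, 1, 1, 1, 2)

mu <- sum(w * x) / sum(w)                  # overall weighted mean
groups <- split(seq_along(x), z)           # row indices per subgroup

p_k   <- sapply(groups, function(i) sum(w[i]) / sum(w))           # population shares
mu_k  <- sapply(groups, function(i) sum(w[i] * x[i]) / sum(w[i])) # subgroup means
mld_k <- sapply(groups, function(i) mld(x[i], w[i]))              # subgroup MLDs

mld_total   <- mld(x, w)
mld_within  <- sum(p_k * mld_k)            # population-weighted subgroup MLDs
mld_between <- sum(p_k * log(mu / mu_k))   # MLD if everyone had the subgroup mean
```

Here mld_within plus mld_between reproduces mld_total up to floating point error, so no overlap term is needed.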

Value

a list with the results of the decomposition and the parts used for the decomposition, containing the following components:

mld_decomp a list containing the decomposition: mld_total (value of the mean log deviation index of x), mld_within (value of within-group inequality) and mld_between (value of between-group inequality)

mld_group a list containing mld_group (the mean log deviations of the different subgroups) and mld_group_contribution (the contribution of the subgroups to the total within-group inequality: adds up to mld_within)

mld_decomp a list containing the means of x: mean_total (value of the mean of x of all subgroups combined) and mean_group (value of the mean of x of the individual subgroups)

share_groups the distribution of the subgroups z

share_income_groups the distribution of vector x by subgroups z

number_cases a list containing the number of cases in total and by subgroup (weighted and unweighted): n_unweighted (total number of unweighted x), n_weighted (total number of weighted x), n_group_unweighted (number of unweighted x by subgroup z), n_group_weighted (number of weighted x by subgroup z)

note number of zero or negative observations. The mean log deviation uses a logarithmic transformation of x. Therefore these observations are deleted from the analysis.

Source

Plat, D. (2012). IC2: Inequality and Concentration Indices and Curves. R package version 1.0-1.

https://CRAN.R-project.org/package=IC2

References

Mookherjee, D. and A. Shorrocks (1982) A decomposition analysis of the trend in UK income inequality, Economic Journal, 92 (368), p. 886–902.

Brewer, M. and L. Wren-Lewis (2016) Accounting for Changes in Income Inequality: Decomposition Analyses for the UK, 1978–2008, Oxford Bulletin of Economics and Statistics, 78 (3), p. 289–322.

Haughton, J. and S. Khandker (2009) Handbook on poverty and inequality, Washington, DC: World Bank.

See Also

mld_change, gini_decomp

Examples

#Decomposition of mean log deviation by level of education using Mexican Income data set

data(mex_inc_2008)

education_decomp <- mld_decomp(x=mex_inc_2008$income,z=mex_inc_2008$education,

weights=mex_inc_2008$factor)

#complete output

education_decomp

#Selected output: decomposition into between- and within-group inequality

education_decomp["mld_decomp"]

ntiles.wtd Weighted tiles

Description

Breaks the input vector into n groups. Returns the (optionally weighted) tile of each individual observation in vector x.

Usage

ntiles.wtd(x, n, weights = NULL)

Arguments

x a numeric vector for which the quantiles are computed. Missing values are left as missing.

n the number of desired subgroups to break vector x into.

weights an optional vector of weights of x to be used in the computation of the tiles. Should be NULL or a numeric vector.

Details

Breaks vector x into n subgroups. The main difference with other tile functions (for instance ntile from dplyr) is that those functions break up vector x into subgroups of exactly equal size, so observations with the same value can end up in different tiles. In this function, observations with the same value always end up in the same tile; therefore subgroups may have different sizes, especially when the weights argument is used. For a weighted tile function with equal group sizes, see for instance weighted_ntile from the grattan package.

When using a short vector (compared to the number of tiles) or weights with high variance, the output may differ from what is anticipated.

Value

A vector of integers corresponding to the quantiles of vector x.

Examples

#Break up the income variable in the Mexican Income data set into 10 groups (tiles)

data(mex_inc_2008)

#unweighted tiles:

q <- ntiles.wtd(x=mex_inc_2008$income, n=10)

#weighted tiles:

qw <- ntiles.wtd(x=mex_inc_2008$income, n=10, weights=mex_inc_2008$factor)

polar.wtd Polarization index

Description

Returns the (possibly weighted) polarization index for a vector. The Wolfson index of bipolarization is used.

A bipolarized (income) distribution has fewer observations in the middle and more in the lower and/or higher part of the distribution. The regular measures of inequality (like the Gini coefficient) do not give information about the polarization of the distribution. This polarization index computes the level of bipolarization of the distribution. The concept is closely related to the Lorenz curve, and therefore the scalar measure is also related to the Gini coefficient. A lower number means a lower level of polarization.

Extension of the polar.aff function in the affluenceIndex package. The option of weighting the index is included.

Usage

polar.wtd(x, weights = NULL)

Arguments

x a numeric vector.

weights an optional vector of weights of x to be used in the computation of the polarization index. Should be NULL or a numeric vector.

Value

The value of the Wolfson polarization index.

Source

Wolny-Dominiak, A. and A. Saczewska-Piotrowska (2017). affluenceIndex: Affluence Indices. R package version 1.0. https://CRAN.R-project.org/package=affluenceIndex

References

Wolfson M. (1994) When inequalities diverge, The American Economic Review, 84, p. 353-358.

Schmidt, A. (2002) Statistical Measurement of Income Polarization. A Cross-National, Berlin 10th

International conference on panel data.

Examples

#calculate Polarization Index using Mexican Income data set

data(mex_inc_2008)

#unweighted Polarization Index:

polar.wtd(mex_inc_2008$income)

#weighted Polarization Index:

polar.wtd(x=mex_inc_2008$income, weights=mex_inc_2008$factor)

rif Recentered influence function (RIF)

Description

Returns the (optionally weighted) recentered influence function of a distributional statistic.

Usage

rif(x, weights = NULL, method = "quantile", quantile = 0.5,

kernel = "gaussian")

rif 17

Arguments

xa numeric vector for which the recentered inﬂuence function is computed.

weights an optional vector of weights of x to be used in the computation of the recentered

inﬂuence function. Should be NULL or a numeric vector.

method the distribution statistic for which the recentered inﬂuence function is estimated.

Options are "quantile", "gini" and "variance". Default is "quantile".

quantile quantile to be used when method "quantile" is selected. Must be a numeric

between 0 and 1. Default is 0.5 (median). Only a single quantile can be selected.

kernel a character giving the smoothing kernel to be used in method "quantile". Options are

"gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine".

Default is "gaussian".

Details

The RIF can be used as input for a RIF regression approach. RIF regressions are mostly used to

estimate the marginal effect of covariates on distributional statistics of income or wealth.

The RIF is calculated by adding the distributional statistic (quantile, Gini or variance) to the

influence function. RIF is a numeric vector where each element corresponds to a particular

individual's influence on the distributional statistic.
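As an illustration of "statistic plus influence function", the sketch below computes an unweighted RIF for a quantile, using the form given in Firpo, Fortin and Lemieux (2009): RIF(y; q) = q + (tau - 1{y <= q}) / f(q), with f(q) a kernel density estimate at the quantile. The helper rif_quantile_sketch and the simulated incomes are assumptions made for the example; the package's own rif function may differ in details such as weighting and density evaluation.

```r
# Unweighted sketch of the RIF of a quantile (hypothetical helper, not part of dineq).
rif_quantile_sketch <- function(y, tau = 0.5, kernel = "gaussian") {
  q   <- unname(quantile(y, probs = tau))   # the distributional statistic q_tau
  d   <- density(y, kernel = kernel)        # kernel density estimate of y
  f_q <- approx(d$x, d$y, xout = q)$y       # estimated density at q_tau
  # statistic + influence function
  q + (tau - as.numeric(y <= q)) / f_q
}

set.seed(1)
y <- rlnorm(1000, meanlog = 10, sdlog = 0.8)  # simulated incomes (assumption)
rif_q20 <- rif_quantile_sketch(y, tau = 0.2)

# One value per observation; the mean of the RIF recovers the statistic itself.
length(rif_q20)
mean(rif_q20)
quantile(y, 0.2)
```

For a quantile the RIF takes only two distinct values, one for observations at or below the quantile and one for observations above it.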

Value

A numeric vector of the recentered influence function of the selected distributional statistic.

References

Firpo, S., N. Fortin and T. Lemieux (2009) Unconditional quantile regressions. Econometrica,

77(3), p. 953-973.

Heckley, G., U.-G. Gerdtham and G. Kjellsson (2016) A general method for decomposing the causes of socioeconomic inequality in health. Journal of Health Economics, 48, p. 89–106.

Pereira, J. and A. Galego (2016) The drivers of wage inequality across Europe, a recentered influence function regression approach, 10th Annual Meeting of the Portuguese Economic Journal, University of Evora.

See Also

rifr

Examples

data(mex_inc_2008)

#Recentered influence function of the 0.2 quantile (20th percentile)

rif_q20 <- rif(x=mex_inc_2008$income, weights=mex_inc_2008$factor, method="quantile",

quantile=0.2)

#Recentered influence function of the Gini coefficient

rif_gini <- rif(x=mex_inc_2008$income, weights=mex_inc_2008$factor, method="gini")

rifr Recentered influence function regression (RIF Regression)

Description

Recentered influence function regression of a distributional statistic.

Usage

rifr(formula, data, weights = NULL, method = "quantile", quantile = 0.5,

kernel = "gaussian")

Arguments

formula an object of class "formula" (or one that can be coerced to that class): a symbolic

description of the model to be fitted in the RIF regression.

data a data frame containing the variables and weights of the model.

weights optional weights to be used in the computation of the recentered influence function.

Should be NULL or a numeric variable that is a column of the selected data frame, referred to

by its name in quotation marks (for example weights = "factor").

method the distribution statistic for which the recentered inﬂuence function is estimated.

Options are "quantile", "gini" and "variance". Default is "quantile".

quantile quantile to be used when method "quantile" is selected. Must be a numeric

between 0 and 1. Default is 0.5 (median). Multiple quantiles can be used.

kernel a character giving the smoothing kernel to be used in method "quantile". Options are

"gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine".

Default is "gaussian".

Details

RIF Regressions can be used to estimate the marginal effects of covariates on distributional statistics

(such as quantiles, Gini and variance). It is based on the recentered influence function of a statistic.

The transformed RIF is used as the dependent variable in an ordinary least squares regression. RIF

regressions are mostly used to estimate the marginal effect of covariates on distributional statistics

of income or wealth.
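The two steps described above (compute the RIF, then regress it on the covariates by ordinary least squares) can be sketched in base R. The example uses the variance, whose RIF has the simple closed form (y - mean(y))^2. The data are simulated and the variable names are assumptions for illustration; this is a sketch of the technique, not the internals of rifr.

```r
set.seed(42)
n      <- 2000
educ   <- sample(0:3, n, replace = TRUE)              # simulated covariate (assumption)
income <- exp(9 + 0.3 * educ + rnorm(n, sd = 0.6))    # simulated incomes (assumption)

# Step 1: RIF of the variance. The influence function is (y - mu)^2 - sigma^2,
# so adding the statistic sigma^2 back gives (y - mu)^2.
rif_var <- (income - mean(income))^2

# Step 2: OLS with the RIF as the dependent variable.
fit <- lm(rif_var ~ educ)

# The slope estimates the marginal effect of educ on the variance of income.
summary(fit)$coefficients
summary(fit)$adj.r.squared
```

Because the mean of the RIF equals the statistic, the fitted values of a regression with an intercept average to the (population) variance of income.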

Value

A list containing the results of the RIF regression.

coefficients the coefficient estimates.

SE the coefficient standard errors.

t the coefficient t-values.

p the coefficient p-values.

adjusted_r2 the adjusted R-squared.


References

Firpo, S., N. Fortin and T. Lemieux (2009) Unconditional quantile regressions. Econometrica,

77(3), p. 953-973.

Heckley, G., U.-G. Gerdtham and G. Kjellsson (2016) A general method for decomposing the causes of socioeconomic inequality in health. Journal of Health Economics, 48, p. 89–106.

Pereira, J. and A. Galego (2016) The drivers of wage inequality across Europe, a recentered influence function regression approach, 10th Annual Meeting of the Portuguese Economic Journal, University of Evora.

See Also

rif, rifrSE

Examples

data(mex_inc_2008)

#RIF regression at each decile

rifr_q <- rifr(income~hh_structure+education, data=mex_inc_2008, weights="factor",

method="quantile", quantile=seq(0.1,0.9,0.1), kernel="gaussian")

#RIF regression of the Gini coefficient

rifr_gini <- rifr(income~hh_structure+education, data=mex_inc_2008, weights="factor",

method="gini")

rifrSE Inference of recentered influence function regression (RIF regression)

Description

Inference of a RIF Regression using a bootstrap method.

Usage

rifrSE(formula, data, weights = NULL, method = "quantile", quantile = 0.5,

kernel = "gaussian", Nboot = 100, confidence = 0.95)

Arguments

formula an object of class "formula" (or one that can be coerced to that class): a symbolic

description of the model to be fitted in the RIF regression.

data a data frame containing the variables and weights of the model.

weights optional weights to be used in the computation of the recentered influence function.

Should be NULL or a numeric variable that is a column of the selected data frame, referred to

by its name in quotation marks (for example weights = "factor").

method the distribution statistic for which the recentered inﬂuence function is estimated.

Options are "quantile", "gini" and "variance". Default is "quantile".


quantile quantile to be used when method "quantile" is selected. Must be a numeric

between 0 and 1. Default is 0.5 (median). Only a single quantile can be used.

kernel a character giving the smoothing kernel to be used in method "quantile". Options are

"gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine".

Default is "gaussian".

Nboot the number of bootstrap replicates. Default is 100.

confidence confidence level used for the confidence interval of the fitted model.

Default is 0.95.

Details

RIF Regressions can be used to estimate the marginal effects of covariates on distributional statistics

(such as quantiles, Gini and variance). It is based on the recentered influence function of a statistic.

The transformed RIF is used as the dependent variable in an ordinary least squares regression. RIF

regressions are mostly used to estimate the marginal effect of covariates on distributional statistics

of income or wealth.

The standard errors, confidence intervals and Z- and P-values are calculated by using a standard

bootstrap method (from the boot package).
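The bootstrap idea can be sketched with a hand-rolled resampling loop in base R, used here in place of the boot package so that the example has no dependencies. It resamples rows, recomputes the RIF and the OLS coefficients in each replicate, and derives standard errors, Z values, P values and a normal-approximation confidence interval. The simulated data, the helper rif_coef and the choice of a normal-approximation interval are all assumptions; rifrSE may construct its intervals differently.

```r
set.seed(123)
n      <- 500
educ   <- sample(0:3, n, replace = TRUE)              # simulated covariate (assumption)
income <- exp(9 + 0.3 * educ + rnorm(n, sd = 0.6))    # simulated incomes (assumption)
dat    <- data.frame(income, educ)

# Statistic to bootstrap: coefficients of a RIF regression for the variance.
# The RIF is recomputed inside each resample (hypothetical helper).
rif_coef <- function(d) {
  d$rif <- (d$income - mean(d$income))^2
  coef(lm(rif ~ educ, data = d))
}

Nboot      <- 100
confidence <- 0.95

est   <- rif_coef(dat)                                 # original, non-bootstrapped fit
boots <- t(replicate(Nboot,
                     rif_coef(dat[sample(nrow(dat), replace = TRUE), ])))

se    <- apply(boots, 2, sd)                           # bootstrap standard errors
z     <- est / se
p     <- 2 * pnorm(-abs(z))
crit  <- qnorm(1 - (1 - confidence) / 2)
ci    <- cbind(lower = est - crit * se, upper = est + crit * se)

data.frame(Coef = est, ci, SE = se, Z = z, P = p)
```

With only 100 replicates the standard errors are themselves noisy; a larger Nboot gives more stable results at a higher computational cost.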

Value

A data frame containing the results of the RIF regression.

Coef estimated coefficients of the original (non-bootstrapped) RIF regression

lower lower bound of the confidence interval of the estimated coefficient

upper upper bound of the confidence interval of the estimated coefficient

SE standard error

Z Value Z value

P Value P value

Signif Significance codes of P: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

References

Firpo, S., N. Fortin and T. Lemieux (2009) Unconditional quantile regressions. Econometrica,

77(3), p. 953-973.

Heckley, G., U.-G. Gerdtham and G. Kjellsson (2016) A general method for decomposing the causes of socioeconomic inequality in health. Journal of Health Economics, 48, p. 89–106.

Pereira, J. and A. Galego (2016) The drivers of wage inequality across Europe, a recentered influence function regression approach, 10th Annual Meeting of the Portuguese Economic Journal, University of Evora.

See Also

rif, rifr


Examples

data(mex_inc_2008)

#Bootstrap inference for the RIF regression at the 0.2 quantile (20th percentile)

rifr_q <- rifrSE(income~hh_structure+education, data=mex_inc_2008, weights="factor",

method="quantile", quantile=0.2, kernel="gaussian", Nboot=100, confidence=0.95)

#Bootstrap inference for the RIF regression of the Gini coefficient

rifr_gini <- rifrSE(income~hh_structure+education, data=mex_inc_2008, weights="factor",

method="gini", Nboot=100, confidence=0.95)

theil.wtd Theil index

Description

Returns the (optionally weighted) Theil index for a vector.

Usage

theil.wtd(x, weights = NULL)

Arguments

x a numeric vector containing at least non-negative elements.

weights an optional vector of weights of x to be used in the computation of the Theil

index. Should be NULL or a numeric vector.

Details

The Theil index is a measure of inequality among the values of a distribution. It is a member of the

Generalized Entropy measures, also referred to as GE(1). The index takes a value between 0

and ln N (the logarithm of the number of values), with 0 being the lowest possible inequality. It uses

a logarithmic transformation of the values of the distribution and therefore cannot handle negative

or zero values; these are excluded from the computation in this function. The Theil index is more

sensitive to changes in the upper tail of the distribution.

Extension of the calcGEI function in the IC2 package in order to handle missing values.
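A minimal sketch of the computation described above, assuming the standard GE(1) form T = sum_i w_i (x_i / mu) ln(x_i / mu), with weights normalised to sum to one and mu the weighted mean. The helper theil_sketch is hypothetical and only mirrors the documented behaviour (missing, zero and negative values are dropped); it is not the package's implementation.

```r
# Weighted Theil index, GE(1) (hypothetical helper, not part of dineq).
theil_sketch <- function(x, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(x))
  # Drop missing values and values the log transformation cannot handle.
  keep <- !is.na(x) & !is.na(weights) & x > 0
  x <- x[keep]
  w <- weights[keep] / sum(weights[keep])   # normalise weights to sum to 1
  mu <- sum(w * x)                          # weighted mean
  sum(w * (x / mu) * log(x / mu))
}

theil_sketch(rep(3, 5))                 # perfect equality
theil_sketch(c(1, 2, 2))                # unweighted
theil_sketch(c(1, 2), weights = c(1, 2))  # same distribution via frequency weights
theil_sketch(c(1, NA, -2, 0, 4))        # NA, negative and zero values are dropped
```

With equal values the index is zero, and frequency weights reproduce the result obtained from the expanded vector.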

Value

The value of the Theil index.

Source

Plat, D. (2012). IC2: Inequality and Concentration Indices and Curves. R package version 1.0-1.

https://CRAN.R-project.org/package=IC2


References

Haughton, J. and S. Khandker. (2009) Handbook on poverty and inequality, Washington, DC:

World Bank.

Cowell F. (2000) Measurement of Inequality. In Atkinson A. and Bourguignon F. (eds.) Handbook

of Income Distribution. Amsterdam: Elsevier, p. 87-166.

Examples

#calculate Theil Index using Mexican Income data set

data(mex_inc_2008)

#unweighted Theil Index:

theil.wtd(mex_inc_2008$income)

#weighted Theil Index:

theil.wtd(x=mex_inc_2008$income, weights=mex_inc_2008$factor)

Index

∗Topic datasets

mex_inc_2008

mex_inc_2016

dineq_change_rb

dineq_rb

gini.wtd

gini_decomp

mex_inc_2008

mex_inc_2016

mld.wtd

mld_change

mld_decomp

ntiles.wtd

polar.wtd

rif

rifr

rifrSE

theil.wtd
