Package ‘akmeans’

February 20, 2019

Type Package

Title akmeans: 'Anchored' kmeans for Longitudinal Data

Version 0.1.0

Date 2019-02-06

Author Monsuru Adepeju [cre, aut], Samuel Langton [aut], Jon Bannister [aut]

Maintainer Monsuru Adepeju <monsuurg2010@gmail.com>

Description Advances an akmeans clustering technique and a stability-based quality criterion for longitudinal data. Also contains functions useful for the analysis of longitudinal data.

License GPL-2

Encoding UTF-8

LazyData TRUE

Imports kml, devtools, Hmisc, ggplot2, rgdal, base, utils, reshape2, later

Suggests knitr, rmarkdown

RoxygenNote 6.1.1

VignetteBuilder knitr

R topics documented:

akmeans_clust
alphaLabel
assault_data
gm_crime_data
lpm_centroids
missingV_filler
outlierDetect
plot_clust
props
qpm_centroids
whiteSpaces
Index

akmeans_clust akmeans_clust

Description

This function groups trajectories based on a given list of initial centroids.

Usage

akmeans_clust(traj, id_field = FALSE, init_method = "lpm", n_clusters = 3)

Arguments

traj A matrix or data.frame with each row representing the trajectory of observations of a unique location. The columns show the observations at consecutive time steps.

id_field Whether the first column is a unique (id) field. Default: FALSE

init_method Initialisation method, i.e. the method used to determine the initial centroids for clustering. Default: "lpm" (linear partitioning medoids); see also lpm_centroids.

n_clusters Number of clusters to generate. Default: 3 (the minimum value).

Details

Given a list of trajectories represented in a matrix or data.frame, and a method for choosing initial cluster centroids (e.g. lpm_centroids), a list of clusters is generated after a limited number of iterations.

Examples

traj <- assault_data

print(traj)

result <- akmeans_clust(traj, id_field = TRUE, init_method = "lpm", n_clusters = 3)

plot_clust(result)

Value

The original (traj) data with a cluster label appended.
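The general idea of clustering trajectories from a supplied set of initial centroids can be sketched with ordinary k-means from base R. This is only an illustration under stated assumptions: it uses stats::kmeans rather than the package's anchored variant, and the toy data and the `init` centroids are hypothetical stand-ins for real input and for the output of lpm_centroids.

```r
# Hypothetical toy data: 6 locations (rows) observed at 4 time steps (columns).
traj <- rbind(
  c(1, 2, 3, 4),
  c(2, 3, 4, 5),
  c(1, 1, 2, 3),
  c(9, 8, 7, 6),
  c(8, 8, 7, 5),
  c(9, 9, 8, 7)
)

# Hypothetical initial centroids, one per row (a stand-in for lpm_centroids output).
init <- rbind(
  c(1, 2, 3, 4),
  c(9, 8, 7, 6)
)

# stats::kmeans accepts a matrix of starting centres and a cap on iterations.
# Note: this is plain k-means, not the anchored k-means of the package.
fit <- kmeans(traj, centers = init, iter.max = 10)

# Append an alphabetical cluster label to the original data.
result <- data.frame(traj, cluster = LETTERS[fit$cluster])
print(result)
```

The shape of `result` mirrors the documented return value: the input data with one extra column holding the cluster label.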

alphaLabel Numeric ids to alphabetical ids

Description

Function to transform a list of numeric ids to alphabetic ids

Usage

alphaLabel(x)

Arguments

x A vector of numeric ids
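As a rough illustration of the mapping, the sketch below converts numeric ids to letters with base R. The helper `to_alpha` is hypothetical and is not the package's implementation; it assumes ids lie between 1 and 26, and how alphaLabel treats larger ids is not stated here.

```r
# Hypothetical helper (not the package's code): map ids 1..26 to "A".."Z".
to_alpha <- function(x) {
  stopifnot(is.numeric(x), all(x >= 1), all(x <= 26))
  LETTERS[x]
}

labels <- to_alpha(c(1, 2, 3, 2))
print(labels)
```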

assault_data Sample crime (assault) dataset

Description

Simulated crime dataset with missing values.

Usage

assault_data

Format

A matrix

gm_crime_data Sample crime dataset

Description

Crime data for Greater Manchester, aggregated at the LSOA geographical level (Source: data.police.uk).

Usage

gm_crime_data

Format

A matrix

lpm_centroids Linear Partition Medoids (LPM) Centroids

Description

This function creates the initial centroids based on the linear partitioning medoids (lpm) initialisation (Adepeju et al. 2019, submitted).

Usage

lpm_centroids(dat, id_field2 = FALSE, n_centroids = 3)

Arguments

dat A matrix or data.frame with each row representing the trajectory of observations of a unique location. The columns show the observations at consecutive time steps.

id_field2 Whether the first column is a unique (id) field. Default: FALSE

n_centroids Number of initial (linear) centroids to generate based on the lpm technique.

Value

l_centroids

References

Adepeju M, Langton S, Bannister J (2019). akmeans: Anchored k-means: A longitudinal clustering technique for measuring long-term inequality in the exposure to crime at the micro-area levels (submitted).

missingV_filler Data imputing for longitudinal data

Description

This function fills in any missing entries (NA, Inf, 0) in a matrix or data.frame using a value derived by a chosen method.

Usage

missingV_filler(traj, id_field = FALSE, method = 2, replace_with = 1, fill_zeros = FALSE)

Arguments

traj A matrix or data.frame with each row representing the trajectory of a unique location. The columns show the observations at consecutive time steps.

id_field Whether the first column is a unique (id) field. Default: FALSE

method Method for calculating the missing values. Available options: 1: arithmetic, 2: regression. Default: 2

replace_with How to calculate the missing value. For the arithmetic method, the replace_with options are: 1: Mean value of column, 2: Minimum value of column, 3: Maximum value of column, 4: Mean value of row, 5: Minimum value of row, or 6: Maximum value of row. For the regression method, the only available option for replace_with is 1: linear. That is, a linear regression is used to interpolate or extrapolate the missing data values. Note: only the missing data points derive their new values from the regression line, while the rest of the data points retain their original values. Trajectories with only one observation will be removed.

fill_zeros Whether to consider zeros (0) as missing values. Default: FALSE. Only available for the regression method (method = 2).

Details

Given a matrix or data.frame with some missing values represented by (NA, Inf, 0), the function missingV_filler determines the missing values using either the arithmetic or regression method.
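The two methods can be illustrated with base R alone. The sketch below is not the package's implementation: the helpers `fill_col_mean` and `fill_row_linear` and the toy data are hypothetical, only NA is treated as missing (Inf and 0 are ignored), and the regression is assumed to be fitted separately for each trajectory against the time-step index.

```r
# Hypothetical toy data: 3 trajectories (rows) over 5 time steps (columns).
traj <- rbind(
  c(2, 4, NA, 8, 10),
  c(5, NA, 5, 5, 5),
  c(1, 2, 3, NA, 5)
)

# Arithmetic method, option 1: replace each NA with the mean of its column.
fill_col_mean <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    m[miss, j] <- mean(m[!miss, j])
  }
  m
}

# Regression method: fit a line through the observed points of one row and
# predict only the missing time steps; observed values are left untouched.
fill_row_linear <- function(y) {
  time_step <- seq_along(y)
  miss <- is.na(y)
  fit <- lm(y ~ time_step, subset = !miss)
  y[miss] <- predict(fit, newdata = data.frame(time_step = time_step[miss]))
  y
}

by_col_mean <- fill_col_mean(traj)
by_regression <- t(apply(traj, 1, fill_row_linear))

print(by_col_mean)
print(by_regression)
```

In this toy case the two methods disagree on the first trajectory: the column mean draws on the other locations at that time step, whereas the regression line draws only on the trajectory's own trend.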

Value

A data.frame with missing values (NA, Inf, 0) filled in

Examples

traj <- assault_data

print(traj)

missingV_filler(traj, id_field = TRUE, method = 2, replace_with = 1, fill_zeros = FALSE)

outlierDetect Outlier detection in longitudinal or repeated observations

Description

Detects outliers in longitudinal or repeated data. This function identifies the outlier observations according to a specified method. A matrix, 'outlier_mat', is created with entries 'TRUE' or 'FALSE' indicating whether or not an observation is an outlier. The final list of outlier trajectories is determined by the 'hortz_tolerance' parameter, i.e. how many observations in a trajectory exceed the 'threshold' value.

Usage

outlierDetect(dat, id_field = FALSE, method = "quantile",

threshold = 0.95, hortz_tolerance = 1, replace_with = "Mean_row")

Arguments

dat A matrix or data.frame with each row representing the trajectory of observations of a unique location. The columns show the observations at consecutive time steps.

id_field Whether the first column is a unique (id) field. [default: FALSE]

method The method for identifying outliers. Available methods: (1) "quantile", (2) "manual" (a user-defined value).

threshold Value that an observation must exceed in order to be flagged as an outlier. Depending on the method specified: (1) for the "quantile" method, a numeric vector of probabilities with values in [0,1]; (2) for the "manual" method, a user-specified value.

hortz_tolerance The number of observations of a trajectory that have to exceed the cut-off 'threshold' value in order for the trajectory to be flagged as an outlier. [default: 1]

replace_with Value with which to replace an outlier observation. [Values: "Mean_col" or "Mean_row"]. The default is "Mean_row", meaning that the outlier is replaced with the mean of the row (trajectory) in which the observation is located.

Value

dat_
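The flagging logic described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the toy data are hypothetical, the quantile cut-off is assumed to be computed over all observations at once, a trajectory is assumed to be flagged when at least hortz_tolerance of its observations exceed the cut-off, and the row mean used for replacement is assumed to exclude the flagged values.

```r
# Hypothetical toy data: 4 trajectories (rows) over 5 time steps (columns).
dat <- rbind(
  c(1, 2, 3, 2, 1),
  c(2, 3, 2, 3, 2),
  c(1, 2, 50, 2, 1),
  c(3, 2, 3, 2, 3)
)

threshold <- 0.95      # quantile probability
hortz_tolerance <- 1   # flagged observations needed to mark a trajectory

# Cut-off value: the chosen quantile of all observations (an assumption).
cutoff <- quantile(dat, probs = threshold)

# TRUE where an observation exceeds the cut-off.
outlier_mat <- dat > cutoff

# A trajectory is an outlier if enough of its observations are flagged.
outlier_rows <- which(rowSums(outlier_mat) >= hortz_tolerance)

# "Mean_row"-style replacement: use the mean of the unflagged values in the row.
cleaned <- dat
for (i in outlier_rows) {
  flagged <- outlier_mat[i, ]
  cleaned[i, flagged] <- mean(dat[i, !flagged])
}

print(outlier_rows)
print(cleaned)
```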

plot_clust Plot the clusters

Description

Plots the clusters generated by akmeans_clust.

Usage

plot_clust(data_clusters_list, id_field = TRUE)

Arguments

data_clusters_list A data.frame of clusters from akmeans_clust, in which the last column represents alphabetical cluster ids (labels).

id_field Whether the first column is a unique (id) field. [default: TRUE]

Value

data_clusters_list

props Function to convert counts or rates to proportions

Description

Function to convert counts or rates to proportions

Usage

props(rates, id_field = FALSE)

Arguments

rates A matrix or data.frame with each row representing the trajectory of observations of a unique location. The columns show the observations at consecutive time steps.

id_field Whether the first column is a unique (id) field. [default: FALSE]

Value

props
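A conversion of this kind can be sketched in base R. The manual does not say over which dimension the proportions are taken, so the sketch assumes each value is divided by the total of its time step (column); the toy data are hypothetical and this is not the package's implementation.

```r
# Hypothetical toy counts: 3 locations (rows) over 4 time steps (columns).
counts <- rbind(
  c(10, 20, 30, 40),
  c(30, 20, 10, 40),
  c(60, 60, 60, 20)
)

# Divide every entry by its column total so that each column sums to 1.
# (Assumption: proportions are taken within each time step.)
proportions <- sweep(counts, 2, colSums(counts), "/")
print(round(proportions, 2))
```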

qpm_centroids Quadratic Partition Medoids (QPM) Centroids

Description

Quadratic Partition Medoids (QPM) Centroids

Usage

qpm_centroids(dat, n_centroids = 3, id_field = FALSE)

Arguments

dat A matrix or data.frame with each row representing the trajectory of observations of a unique location. The columns show the observations at consecutive time steps.

n_centroids Number of initial (quadratic) centroids to generate based on the qpm method (see the attached vignette).

id_field Whether the first column is a unique (id) field. [default: FALSE]

Value

q_centroids

whiteSpaces Function to remove whitespaces in data entries

Description

Function to remove whitespaces in data entries

Usage

whiteSpaces(dat, head = TRUE)

Arguments

dat A matrix or data.frame

head Whether column names exist. [default: TRUE]

Value

dat_Cleaned
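A base R sketch of this kind of clean-up is shown below. It is not the package's implementation: the toy data are hypothetical, and it assumes only leading and trailing whitespace should be removed (use gsub("\\s+", "", x) instead of trimws to strip all whitespace).

```r
# Hypothetical toy data with stray spaces in the entries.
dat <- data.frame(
  id = c(" a1", "a2 ", " a3 "),
  value = c(" 10", "20 ", "30"),
  stringsAsFactors = FALSE
)

# Trim leading and trailing whitespace in every column, keeping column names.
dat_cleaned <- as.data.frame(lapply(dat, trimws), stringsAsFactors = FALSE)
print(dat_cleaned)
```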

Index

∗Topic datasets

assault_data
gm_crime_data

akmeans_clust
alphaLabel
assault_data
gm_crime_data
lpm_centroids
missingV_filler
outlierDetect
plot_clust
props
qpm_centroids
whiteSpaces
