Title: | Simulation Framework |
---|---|
Description: | A general framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. The package provides functionality for drawing samples from a distribution or a finite population, for adding outliers and missing values, as well as for visualization of the simulation results. It follows a clear object-oriented design and supports parallel computing to increase computational performance. |
Authors: | Andreas Alfons [aut, cre], Yves Tille [ctb] (original R code of certain sampling algorithms), Alina Matei [ctb] (original R code of certain sampling algorithms) |
Maintainer: | Andreas Alfons <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5.4 |
Built: | 2025-01-03 02:56:38 UTC |
Source: | https://github.com/aalfons/simframe |
A general framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. The package provides functionality for drawing samples from a distribution or a finite population, for adding outliers and missing values, as well as for visualization of the simulation results. It follows a clear object-oriented design and supports parallel computing to increase computational performance.
The DESCRIPTION file:
Package: | simFrame |
Version: | 0.5.4 |
Title: | Simulation Framework |
Date: | 2021-10-11 |
Depends: | R (>= 3.0.0), Rcpp (>= 0.8.6), lattice, parallel |
Imports: | methods, stats4 |
LinkingTo: | Rcpp |
Description: | A general framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with minimal programming effort. The package provides functionality for drawing samples from a distribution or a finite population, for adding outliers and missing values, as well as for visualization of the simulation results. It follows a clear object-oriented design and supports parallel computing to increase computational performance. |
License: | GPL (>= 2) |
LazyLoad: | yes |
Authors@R: | c(person("Andreas", "Alfons", email = "[email protected]", role = c("aut", "cre")), person("Yves", "Tille", role = "ctb", comment = "original R code of certain sampling algorithms"), person("Alina", "Matei", role = "ctb", comment = "original R code of certain sampling algorithms")) |
Author: | Andreas Alfons [aut, cre], Yves Tille [ctb] (original R code of certain sampling algorithms), Alina Matei [ctb] (original R code of certain sampling algorithms) |
Maintainer: | Andreas Alfons <[email protected]> |
Encoding: | UTF-8 |
Repository: | https://aalfons.r-universe.dev |
RemoteUrl: | https://github.com/aalfons/simframe |
RemoteRef: | HEAD |
RemoteSha: | 23314f0b1f6632560e0d95dc568f708f3c1286a9 |
Index of help topics:
BasicVector-class | Class "BasicVector" |
ContControl | Create contamination control objects |
ContControl-class | Class "ContControl" |
DARContControl-class | Class "DARContControl" |
DCARContControl-class | Class "DCARContControl" |
DataControl-class | Class "DataControl" |
NAControl-class | Class "NAControl" |
NumericMatrix-class | Class "NumericMatrix" |
OptBasicVector-class | Class "OptBasicVector" |
OptCall-class | Class "OptCall" |
OptCharacter-class | Class "OptCharacter" |
OptContControl-class | Class "OptContControl" |
OptDataControl-class | Class "OptDataControl" |
OptNAControl-class | Class "OptNAControl" |
OptNumeric-class | Class "OptNumeric" |
OptSampleControl-class | Class "OptSampleControl" |
SampleControl-class | Class "SampleControl" |
SampleSetup-class | Class "SampleSetup" |
SimControl-class | Class "SimControl" |
SimResults-class | Class "SimResults" |
Strata-class | Class "Strata" |
SummarySampleSetup-class | Class "SummarySampleSetup" |
TwoStageControl-class | Class "TwoStageControl" |
VirtualContControl-class | Class "VirtualContControl" |
VirtualDataControl-class | Class "VirtualDataControl" |
VirtualNAControl-class | Class "VirtualNAControl" |
VirtualSampleControl-class | Class "VirtualSampleControl" |
aggregate-methods | Method for aggregating simulation results |
clusterRunSimulation | Run a simulation experiment on a cluster |
clusterSetup | Set up multiple samples on a cluster |
contaminate | Contaminate data |
draw | Draw a sample |
eusilcP | Synthetic EU-SILC data |
generate | Generate data |
getAdd | Accessor and mutator functions for objects |
getStrataLegend | Utility functions for stratifying data |
head-methods | Methods for returning the first parts of an object |
inclusionProb | Inclusion probabilities |
length-methods | Methods for getting the length of an object |
plot-methods | Plot simulation results |
runSimulation | Run a simulation experiment |
setNA | Set missing values |
setup | Set up multiple samples |
simApply | Apply a function to subsets |
simBwplot | Box-and-whisker plots |
simDensityplot | Kernel density plots |
simFrame-package | Simulation Framework |
simSample | Set up multiple samples |
simXyplot | X-Y plots |
srs | Random sampling |
stratify | Stratify data |
summary-methods | Methods for producing a summary of an object |
tail-methods | Methods for returning the last parts of an object |
Andreas Alfons [aut, cre]; C++ implementations of certain sampling algorithms are based on R code by Yves Tille and Alina Matei.
Maintainer: Andreas Alfons <[email protected]>
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
Get values of slots of objects via accessor functions and set values via mutator functions. If no mutator methods are available, the slots of the corresponding objects are not supposed to be changed by the user.
getAdd(x)

getAux(x)
setAux(x, aux)

getCall(x, ...)

getCollect(x)
setCollect(x, collect)

getColnames(x)
setColnames(x, colnames)

getContControl(x)
setContControl(x, contControl)

getControl(x)

getDataControl(x)

getDesign(x)
setDesign(x, design)

getDistribution(x)
setDistribution(x, distribution)

getDots(x, ...)
setDots(x, dots, ...)

## S4 method for signature 'TwoStageControl'
getDots(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setDots(x, dots, stage = NULL)

getEpsilon(x)
setEpsilon(x, epsilon)

getFun(x, ...)
setFun(x, fun, ...)

## S4 method for signature 'TwoStageControl'
getFun(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setFun(x, fun, stage = NULL)

getGrouping(x)
setGrouping(x, grouping)

getIndices(x)

getIntoContamination(x)
setIntoContamination(x, intoContamination)

getK(x)
setK(x, k)

getLegend(x)

getNAControl(x)
setNAControl(x, NAControl)

getNArate(x)
setNArate(x, NArate)

getNr(x)

getNrep(x)

getProb(x, ...)
setProb(x, prob, ...)

## S4 method for signature 'TwoStageControl'
getProb(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setProb(x, prob, stage = NULL)

getSAE(x)
setSAE(x, SAE)

getSampleControl(x)

getSeed(x)

getSize(x, ...)
setSize(x, size, ...)

## S4 method for signature 'TwoStageControl'
getSize(x, stage = NULL)
## S4 method for signature 'TwoStageControl'
setSize(x, size, stage = NULL)

getSplit(x)

getTarget(x)
setTarget(x, target)

getValues(x)
x | an object. |
aux | a character string specifying an auxiliary variable (see |
collect | a logical indicating whether groups should be collected after sampling individuals or sampled directly (see |
colnames | a character vector specifying column names (see |
contControl | an object of class |
design | a character vector specifying columns to be used for stratification (see |
distribution | a function generating data (see |
dots | additional arguments to be passed to a function (see |
epsilon | a numeric vector giving contamination levels (see |
fun | a function (see |
grouping | a character string specifying a grouping variable (see |
intoContamination | a logical indicating whether missing values should also be inserted into contaminated observations (see |
k | a single positive integer giving the number of samples to be set up (see |
NAControl | an object of class |
NArate | a numeric vector or matrix giving missing value rates (see |
prob | a numeric vector giving probability weights (see |
SAE | a logical indicating whether small area estimation will be used in the simulation experiment (see |
size | a non-negative integer or a vector of non-negative integers (see |
stage | optional integer; for certain slots of |
target | a character vector specifying target columns (see |
... | only used to allow for the |
For accessor functions, the corresponding slot of x is returned.
For mutator functions, the corresponding slot of x is replaced.
signature(x = "SimResults")
signature(x = "ContControl")
signature(x = "NAControl")
signature(x = "SampleSetup")
signature(x = "SimResults")
signature(x = "Strata")
signature(x = "SampleControl")
signature(x = "DataControl")
signature(x = "SimResults")
signature(x = "DataControl")
signature(x = "SimControl")
signature(x = "SampleSetup")
signature(x = "SimResults")
signature(x = "SimResults")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
signature(x = "SimResults")
signature(x = "Strata")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
signature(x = "DataControl")
signature(x = "DCARContControl")
signature(x = "DataControl")
signature(x = "DARContControl")
signature(x = "DCARContControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
signature(x = "SimResults")
signature(x = "VirtualContControl")
signature(x = "VirtualContControl")
signature(x = "DARContControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
signature(x = "ContControl")
signature(x = "NAControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SampleSetup")
signature(x = "NAControl")
signature(x = "VirtualSampleControl")
signature(x = "Strata")
signature(x = "SimControl")
signature(x = "SimResults")
signature(x = "VirtualNAControl")
signature(x = "VirtualNAControl")
signature(x = "Strata")
signature(x = "SimResults")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SampleSetup")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "SimControl")
signature(x = "SimResults")
signature(x = "SampleSetup")
signature(x = "SimResults")
signature(x = "DataControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "Strata")
signature(x = "SummarySampleSetup")
signature(x = "DataControl")
signature(x = "SampleControl")
signature(x = "TwoStageControl")
signature(x = "Strata")
signature(x = "VirtualContControl")
signature(x = "VirtualNAControl")
signature(x = "SimResults")
signature(x = "Strata")
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
nc <- NAControl(NArate = 0.05)
getNArate(nc)

setNArate(nc, c(0.01, 0.03, 0.05, 0.07, 0.09))
getNArate(nc)
Aggregate simulation results, i.e., split the data into subsets if applicable and compute summary statistics.
## S4 method for signature 'SimResults'
aggregate(x, select = NULL, FUN = mean, ...)
x | the simulation results to be aggregated, i.e., an object of class |
select | a character vector specifying the columns to be aggregated. It must be a subset of the |
FUN | a scalar function to compute the summary statistics (defaults to |
... | additional arguments to be passed down to |
If contamination or missing values have been inserted or the simulations have been split into different domains, a data.frame is returned, otherwise a vector.
If contamination or missing values have been inserted or the simulations have been split into different domains, aggregate is called to compute the summary statistics for the respective subsets. Otherwise, apply is called to compute the summary statistics for each column specified by select.
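The two code paths described above can be sketched with base R alone. This is only an illustration of the idea, not the package's implementation; the data frame `res` and its column layout are hypothetical stand-ins for the values stored in a "SimResults" object.

```r
# Hypothetical simulation results: two estimators over six runs, with the
# contamination level of each run in column 'Epsilon' (assumed layout).
res <- data.frame(
  Epsilon = rep(c(0, 0.02), each = 3),
  mean    = c(1.0, 1.2, 0.8, 2.0, 2.2, 1.8),
  median  = c(1.0, 1.1, 0.9, 1.0, 1.2, 0.8)
)
select <- c("mean", "median")

# Case 1: subsets exist (here: contamination levels), so stats::aggregate()
# computes the summary statistic per subset and returns a data.frame.
bySubset <- aggregate(res[, select], by = res["Epsilon"], FUN = mean)

# Case 2: no subsets, so apply() computes the statistic for each selected
# column and returns a named numeric vector.
overall <- apply(as.matrix(res[, select]), 2, mean)

print(bySubset)
print(overall)
```

Any scalar function such as `sd` or `median` could be passed as `FUN` in the same way.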
x = "SimResults"
aggregate simulation results.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
aggregate, apply, "SimResults"
#### design-based simulation
set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}

## run simulation
results <- runSimulation(eusilcP,
    sc, contControl = cc, fun = sim)

## aggregate
aggregate(results)            # means of results
aggregate(results, FUN = sd)  # standard deviations of results


#### model-based simulation
set.seed(12345)  # for reproducibility

## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}

## run simulation
results <- runSimulation(dc, nrep = 50,
    contControl = cc, design = "group", fun = sim)

## aggregate
aggregate(results)            # means of results
aggregate(results, FUN = sd)  # standard deviations of results
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
Class "OptBasicVector", directly.
getStrataLegend | signature(x = "data.frame", design = "BasicVector"): get a data.frame describing the strata. |
getStrataSplit | signature(x = "data.frame", design = "BasicVector"): get a list in which each element contains the indices of the observations belonging to the corresponding stratum. |
getStrataTable | signature(x = "data.frame", design = "BasicVector"): get a data.frame describing the strata and containing the stratum sizes. |
getStratumSizes | signature(x = "data.frame", design = "BasicVector"): get the stratum sizes. |
getStratumValues | signature(x = "data.frame", design = "BasicVector", split = "missing"): get the stratum number for each observation. |
getStratumValues | signature(x = "data.frame", design = "BasicVector", split = "list"): get the stratum number for each observation. |
simApply | signature(x = "data.frame", design = "BasicVector", fun = "function"): apply a function to subsets. |
simSapply | signature(x = "data.frame", design = "BasicVector", fun = "function"): apply a function to subsets. |
stratify | signature(x = "data.frame", design = "BasicVector"): stratify data. |
A slightly simplified UML class diagram of the framework can be found in Figure 1 of the package vignette An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Use vignette("simFrame-intro") to view this vignette.
Andreas Alfons
showClass("BasicVector")
Generic function for running a simulation experiment on a cluster.
clusterRunSimulation(cl, x, setup, nrep, control,
                     contControl = NULL, NAControl = NULL,
                     design = character(), fun, ...,
                     SAE = FALSE)
cl | a cluster as generated by |
x | a |
setup | an object of class |
nrep | a non-negative integer giving the number of repetitions of the simulation experiment (for model-based simulation, mixed simulation designs or simulation based on real data). |
control | a control object of class |
contControl | an object of a class inheriting from |
NAControl | an object of a class inheriting from |
design | a character vector specifying variables (columns) to be used for splitting the data into domains. The simulations, including contamination and the insertion of missing values (unless |
fun | a function to be applied in each simulation run. |
... | for |
SAE | a logical indicating whether small area estimation will be used in the simulation experiment. |
Statistical simulation is embarrassingly parallel, hence computational performance can be increased by parallel computing. Since version 0.5.0, parallel computing in simFrame is implemented using the package parallel, which is part of the R base distribution since version 2.14.0 and builds upon work done for the contributed packages multicore and snow. Note that all objects and packages required for the computations (including simFrame) need to be made available on every worker process unless the worker processes are created by forking (see makeCluster).
In order to prevent problems with random numbers and to ensure reproducibility, random number streams should be used. With parallel, random number streams can be created via the function clusterSetRNGStream().
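As a minimal sketch of why random number streams matter, the following uses only the parallel package (no simFrame objects). The helper `drawOnCluster` is a hypothetical name introduced for this illustration; it seeds the streams on a two-worker cluster and draws a few normal deviates on each worker.

```r
library(parallel)

# Hypothetical helper: start two workers, seed their random number
# streams from 'seed', and draw three normal deviates on each worker.
drawOnCluster <- function(seed) {
  cl <- makeCluster(2, type = "PSOCK")
  on.exit(stopCluster(cl))
  # assigns one L'Ecuyer-CMRG stream per worker, derived from 'seed'
  clusterSetRNGStream(cl, iseed = seed)
  clusterEvalQ(cl, rnorm(3))
}

first  <- drawOnCluster(12345)
second <- drawOnCluster(12345)

# With the same seed and the same number of workers, the draws match.
print(identical(first, second))
```

Without the call to `clusterSetRNGStream()`, each worker would start from its own unseeded state, so repeated runs would generally not agree.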
There are some requirements for slot fun of the control object control. The function must return a numeric vector, or a list with the two components values (a numeric vector) and add (additional results of any class, e.g., statistical models). Note that the latter is computationally slightly more expensive. A data.frame is passed to fun in every simulation run. The corresponding argument must be called x. If comparisons with the original data need to be made, e.g., for evaluating the quality of imputation methods, the function should have an argument called orig. If different domains are used in the simulation, the indices of the current domain can be passed to the function via an argument called domain.
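The requirements on fun can be illustrated with three small functions. These are sketches in base R under the conventions stated above (argument names x and orig, return value either a numeric vector or a list with components values and add); the function names and the toy data are hypothetical.

```r
# Variant 1: return a named numeric vector.
simVector <- function(x) {
  c(mean = mean(x$value), median = median(x$value))
}

# Variant 2: return a list with 'values' (numeric vector) and 'add'
# (any additional result, here a fitted model object).
simList <- function(x) {
  fit <- lm(value ~ 1, data = x)
  list(values = c(intercept = unname(coef(fit))), add = fit)
}

# Variant 3: compare the modified data 'x' with the original data 'orig',
# e.g. after missing values have been inserted.
simOrig <- function(x, orig) {
  c(diff = mean(x$value, na.rm = TRUE) - mean(orig$value))
}

# Hypothetical toy data to try the functions outside a simulation.
toy <- data.frame(value = c(1, 2, 3, 10))
withNA <- toy
withNA$value[4] <- NA

print(simVector(toy))
print(simList(toy)$values)
print(simOrig(withNA, toy))
```

Returning a plain numeric vector is the cheaper option; the list form is only needed when objects such as fitted models should be kept alongside the numeric results.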
For small area estimation, the following points have to be kept in mind. The slot design of control for splitting the data must be supplied and the slot SAE must be set to TRUE. However, the data are not actually split into the specified domains. Instead, the whole data set (sample) is passed to fun. Also contamination and missing values are added to the whole data (sample). Last, but not least, the function must have a domain argument so that the current domain can be extracted from the whole data (sample).
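A function for small area estimation might therefore look like the following sketch. It assumes that domain indexes the rows of the current domain within x; the composite estimator with a fixed weight of 0.5 is an arbitrary choice made only for illustration, and the function name and toy data are hypothetical.

```r
# Sketch of a simulation function for small area estimation.
# 'x' is the whole sample; 'domain' is assumed to index the rows that
# belong to the current domain.
simSAE <- function(x, domain) {
  overall <- mean(x$value)          # uses information from all domains
  direct  <- mean(x$value[domain])  # uses the current domain only
  # arbitrary illustrative weight of 0.5 between the two estimates
  c(direct = direct, composite = 0.5 * direct + 0.5 * overall)
}

# Hypothetical toy sample with two regions.
toy <- data.frame(region = rep(c("a", "b"), each = 3), value = 1:6)
est <- simSAE(toy, domain = which(toy$region == "b"))
print(est)
```

The point of the design is that the function sees the whole sample, so it can borrow strength from other domains while still reporting results for the current one.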
In every simulation run, fun is evaluated using try. Hence no results are lost if computations fail in any of the simulation runs.
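The effect of wrapping each run in try can be sketched in base R as follows. This is not the package's internal loop; the list `runs` and the function `fun` are hypothetical and only mimic three simulation runs of which one fails.

```r
# Hypothetical simulation function that fails on data with missing values.
fun <- function(x) {
  if (anyNA(x$value)) stop("missing values in data")
  c(mean = mean(x$value))
}

# Three hypothetical runs; the second one triggers an error.
runs <- list(
  data.frame(value = 1:3),
  data.frame(value = c(1, NA)),
  data.frame(value = 4:6)
)

# Evaluate every run inside try() so that an error in one run
# does not abort the remaining runs.
out <- lapply(runs, function(d) try(fun(d), silent = TRUE))

# Keep the successful runs and report how many failed.
failed <- vapply(out, inherits, logical(1), what = "try-error")
ok <- do.call(rbind, out[!failed])
print(ok)
print(sum(failed))
```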
An object of class "SimResults".
cl = "ANY", x = "ANY", setup = "ANY", nrep = "ANY", control = "missing" | convenience wrapper that allows the slots of control to be supplied as arguments |
cl = "ANY", x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl" | run a simulation experiment based on real data with repetitions on a cluster. |
cl = "ANY", x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl" | run a design-based simulation experiment with previously set up samples on a cluster. |
cl = "ANY", x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl" | run a design-based simulation experiment on a cluster. |
cl = "ANY", x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl" | run a model-based simulation experiment with repetitions on a cluster. |
cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl" | run a simulation experiment using a mixed simulation design with repetitions on a cluster. |
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6), 1073–1075.
Rossini, A., Tierney, L. and Li, N. (2007) Simple Parallel Statistical Computing in R. Journal of Computational and Graphical Statistics, 16(2), 399–420.
Tierney, L., Rossini, A. and Li, N. (2009) snow: A Parallel Computing Framework for the R System. International Journal of Parallel Programming, 37(1), 78–90.
makeCluster, clusterSetRNGStream, runSimulation, "SimControl", "SimResults", simBwplot, simDensityplot, simXyplot
## Not run:
## these examples require at least a dual core processor


## design-based simulation

data(eusilcP)  # load data

# start cluster
cl <- makeCluster(2, type = "PSOCK")

# load package and data on workers
clusterEvalQ(cl, {
    library(simFrame)
    data(eusilcP)
})

# set up random number stream
clusterSetRNGStream(cl, iseed = "12345")

# control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)

# function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}

# export objects to workers
clusterExport(cl, c("sc", "cc", "sim"))

# run simulation on cluster
results <- clusterRunSimulation(cl, eusilcP,
    sc, contControl = cc, fun = sim)

# stop cluster
stopCluster(cl)

# explore results
head(results)
aggregate(results)
tv <- mean(eusilcP$eqIncome)  # true population mean
plot(results, true = tv)


## model-based simulation

# start cluster
cl <- makeCluster(2, type = "PSOCK")

# load package on workers
clusterEvalQ(cl, library(simFrame))

# set up random number stream
clusterSetRNGStream(cl, iseed = "12345")

# function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

# control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))

# function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}

# export objects to workers
clusterExport(cl, c("rgnorm", "means", "dc", "cc", "sim"))

# run simulation on cluster
results <- clusterRunSimulation(cl, dc, nrep = 100,
    contControl = cc, design = "group", fun = sim)

# stop cluster
stopCluster(cl)

# explore results
head(results)
aggregate(results)
plot(results, true = means)

## End(Not run)
Generic function for setting up multiple samples on a cluster.
clusterSetup(cl, x, control, ...)

## S4 method for signature 'ANY,data.frame,SampleControl'
clusterSetup(cl, x, control)
cl | a cluster as generated by |
x | the |
control | a control object inheriting from the virtual class |
... | if |
A fundamental design principle of the framework in the case of design-based simulation studies is that the sampling procedure is separated from the simulation procedure. Two main advantages arise from setting up all samples in advance.
First, the repeated sampling reduces overall computation time dramatically in certain situations, since computer-intensive tasks like stratification need to be performed only once. This is particularly relevant for large population data. In close-to-reality simulation studies carried out in research projects in survey statistics, often up to 10000 samples are drawn from a population of millions of individuals with stratified sampling designs. For such large data sets, stratification takes a considerable amount of time and is a very memory-intensive task. If the samples are taken on-the-fly, i.e., in every simulation run one sample is drawn, the function to take the stratified sample would typically split the population into the different strata in each of the 10000 simulation runs. If all samples are drawn in advance, on the other hand, the population data need to be split only once and all 10000 samples can be taken from the respective strata together.
Second, the samples can be stored permanently, which simplifies the reproduction of simulation results and may help to maximize comparability of results obtained by different partners in a research project. In particular, this is useful for large population data, when complex sampling techniques may be very time-consuming. In research projects involving different partners, usually different groups investigate different kinds of estimators. If the two groups use not only the same population data, but also the same previously set up samples, their results are highly comparable.
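The first advantage, splitting the population only once, can be sketched in base R. This is an illustration of the idea rather than the package's implementation; the population `pop`, the stratum sizes in `size`, and the number of samples `k` are hypothetical.

```r
set.seed(1)

# Hypothetical population with a stratification variable.
pop <- data.frame(
  id     = 1:1000,
  region = sample(c("north", "south", "west"), 1000, replace = TRUE)
)
size <- c(north = 5, south = 3, west = 4)  # sample size per stratum
k <- 10                                    # number of samples to set up

# The expensive step is done once: row indices grouped by stratum.
strata <- split(seq_len(nrow(pop)), pop$region)

# Draw n indices from one stratum without replacement.
drawStratum <- function(idx, n) idx[sample.int(length(idx), n)]

# All k samples reuse the precomputed strata; each sample is stored
# as a vector of row indices into the population.
samples <- replicate(
  k,
  unlist(mapply(drawStratum, strata[names(size)], size, SIMPLIFY = FALSE),
         use.names = FALSE),
  simplify = FALSE
)

print(table(pop$region[samples[[1]]]))
```

Storing only the indices keeps the set-up samples small, so they could also be saved to disk (for example with `saveRDS()`) and shared between project partners, which relates to the second advantage.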
The computational performance of setting up multiple samples can be increased by parallel computing. Since version 0.5.0, parallel computing in simFrame is implemented using the package parallel, which is part of the R base distribution since version 2.14.0 and builds upon work done for the contributed packages multicore and snow. Note that all objects and packages required for the computations (including simFrame) need to be made available on every worker process unless the worker processes are created by forking (see makeCluster).
In order to prevent problems with random numbers and to ensure reproducibility, random number streams should be used. With parallel, random number streams can be created via the function clusterSetRNGStream().
The control class "SampleControl" is highly flexible and allows stratified sampling as well as sampling of whole groups rather than individuals with a specified sampling method. Hence it is often sufficient to implement the desired sampling method for the simple non-stratified case to extend the existing framework. See "SampleControl" for some restrictions on the argument names of such a function, which should return a vector containing the indices of the sampled observations.
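A sampling function for the simple non-stratified case could be sketched as follows. The function `sysSample` is hypothetical, and its argument names `N` (population size) and `size` (sample size) are assumptions made for this illustration; the actual restrictions on argument names are documented in "SampleControl" and should be checked there.

```r
# Hypothetical sampling function: systematic sampling with a random start.
# Argument names 'N' and 'size' are assumptions for this sketch.
# Returns a vector with the indices of the sampled observations.
sysSample <- function(N, size) {
  step  <- N / size             # sampling interval
  start <- runif(1, 0, step)    # random start within the first interval
  floor(start + step * (seq_len(size) - 1)) + 1
}

set.seed(42)
idx <- sysSample(100, 10)
print(idx)
```

Because the function only returns indices for one non-stratified draw, stratification and group sampling can be left to the surrounding framework, as described above.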
Nevertheless, for very complex sampling procedures, it is possible to define a control class "MySampleControl" extending "VirtualSampleControl", and the corresponding method clusterSetup(cl, x, control) with signature 'ANY, data.frame, MySampleControl'. In order to optimize computational performance, it is necessary to efficiently set up multiple samples. Thereby the slot k of "VirtualSampleControl" needs to be used to control the number of samples, and the resulting object must be of class "SampleSetup".
An object of class "SampleSetup".
cl = "ANY", x = "data.frame", control = "character" | set up multiple samples on a cluster using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments. |
cl = "ANY", x = "data.frame", control = "missing" | set up multiple samples on a cluster using a control object of class "SampleControl". Its slots may be supplied as additional arguments. |
cl = "ANY", x = "data.frame", control = "SampleControl" | set up multiple samples on a cluster as defined by the control object control. |
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An Object-Oriented Random-Number Package with Many Long Streams and Substreams. Operations Research, 50(6), 1073–1075.
Rossini, A., Tierney, L. and Li, N. (2007) Simple Parallel Statistical Computing in R. Journal of Computational and Graphical Statistics, 16(2), 399–420.
Tierney, L., Rossini, A. and Li, N. (2009) snow: A Parallel Computing Framework for the R System. International Journal of Parallel Programming, 37(1), 78–90.
makeCluster, clusterSetRNGStream, setup, draw, "SampleControl", "TwoStageControl", "VirtualSampleControl", "SampleSetup"
    ## Not run:
    # these examples require at least a dual core processor

    # load data
    data(eusilcP)

    # start cluster
    cl <- makeCluster(2, type = "PSOCK")

    # load package and data on workers
    clusterEvalQ(cl, {
        library(simFrame)
        data(eusilcP)
    })

    # set up random number stream
    clusterSetRNGStream(cl, iseed = "12345")

    # simple random sampling
    srss <- clusterSetup(cl, eusilcP, size = 20, k = 4)
    summary(srss)
    draw(eusilcP[, c("id", "eqIncome")], srss, i = 1)

    # group sampling
    gss <- clusterSetup(cl, eusilcP, grouping = "hid", size = 10, k = 4)
    summary(gss)
    draw(eusilcP[, c("hid", "id", "eqIncome")], gss, i = 2)

    # stratified simple random sampling
    ssrss <- clusterSetup(cl, eusilcP, design = "region",
        size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
    summary(ssrss)
    draw(eusilcP[, c("id", "region", "eqIncome")], ssrss, i = 3)

    # stratified group sampling
    sgss <- clusterSetup(cl, eusilcP, design = "region",
        grouping = "hid", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
    summary(sgss)
    draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgss, i = 4)

    # stop cluster
    stopCluster(cl)

    ## End(Not run)
Generic function for contaminating data.
    contaminate(x, control, ...)

    ## S4 method for signature 'data.frame,ContControl'
    contaminate(x, control, i)
x |
the data to be contaminated. |
control |
a control object of a class inheriting from the virtual class
|
i |
an integer giving the element of the slot |
... |
if |
With the control classes implemented in simFrame, contamination is modeled as a two-step process. The first step is to select observations to be contaminated, the second is to model the distribution of the outliers.
In order to extend the framework by a user-defined control class
"MyContControl" (which must extend "VirtualContControl"), a method
contaminate(x, control, i) with signature 'data.frame, MyContControl'
needs to be implemented. In case the contaminated observations need to be
identified at a later stage of the simulation, e.g., if conflicts with
inserting missing values should be avoided, a logical indicator variable
".contaminated" should be added to the returned data set.
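The role of the indicator variable can be sketched in base R without any simFrame classes. The example below is only an illustration under stated assumptions: the data, the 5% contamination level, the rounding of the number of contaminated observations, and the later missing-value step are all made up for this sketch and do not reproduce the package's internal code.

```r
set.seed(42)
x <- data.frame(id = 1:100, income = rlnorm(100, meanlog = 10))

# Step 1: select the observations to be contaminated (5% here).
n_cont <- round(0.05 * nrow(x))
idx <- sample.int(nrow(x), n_cont)

# Step 2: model the outliers, here by drawing from a far-away normal.
x$income[idx] <- rnorm(n_cont, mean = 5e5, sd = 1e4)

# Record which observations were contaminated.
x$.contaminated <- seq_len(nrow(x)) %in% idx

# Later stage: insert missing values only among non-contaminated rows.
candidates <- which(!x$.contaminated)
na_idx <- sample(candidates, 10)
x$income[na_idx] <- NA

table(contaminated = x$.contaminated, missing = is.na(x$income))
```

Because the indicator travels with the data, the later step can avoid overwriting outliers with missing values.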
A data.frame containing the contaminated data. In addition, the
column ".contaminated", which consists of logicals indicating the
contaminated observations, is added to the data.frame.
x = "data.frame", control = "character"
contaminate data using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "ContControl"
contaminate data as defined by the control object control.
x = "data.frame", control = "missing"
contaminate data using a control object of class "ContControl". Its slots may be supplied as additional arguments.
Since version 0.3, contaminate no longer checks whether the auxiliary
variable with probability weights is numeric and contains only finite
positive values (sample still throws an error in these cases). This check
has been removed to improve computational performance in simulation studies.
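Since the check is no longer performed on every call, it may be worthwhile to validate the probability weights once before a simulation is started. The helper below is a hypothetical sketch in base R (it is not part of simFrame) that tests exactly the conditions named above: numeric, finite and positive.

```r
# Hypothetical helper: TRUE if w can be used as probability weights.
valid_weights <- function(w) {
  is.numeric(w) && length(w) > 0 && all(is.finite(w)) && all(w > 0)
}

valid_weights(c(1, 2.5, 3))    # valid
valid_weights(c(1, NA, 3))     # missing value
valid_weights(c(1, 0, 3))      # not strictly positive
valid_weights(c("1", "2"))     # not numeric
```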
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
"DCARContControl"
, "DARContControl"
,
"ContControl"
, "VirtualContControl"
    ## distributed completely at random
    data(eusilcP)
    sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)

    # using a control object
    dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
        dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
    contaminate(sam, dcarc)

    # supply slots of control object as arguments
    contaminate(sam, target = "eqIncome", epsilon = 0.05,
        dots = list(mean = 5e+05, sd = 10000))

    ## distributed at random
    foo <- generate(size = 10, distribution = rnorm,
        dots = list(mean = 0, sd = 2))

    # using a control object
    darc <- DARContControl(target = "V1", epsilon = 0.2,
        fun = function(x) x * 100)
    contaminate(foo, darc)

    # supply slots of control object as arguments
    contaminate(foo, "DARContControl", target = "V1",
        epsilon = 0.2, fun = function(x) x * 100)
Create objects of a class inheriting from "ContControl".
ContControl(..., type = c("DCAR", "DAR"))
... |
arguments passed to |
type |
a character string specifying whether a control object of class
|
If type = "DCAR", an object of class "DCARContControl".
If type = "DAR", an object of class "DARContControl".
This constructor exists mainly for backward compatibility with early draft
versions of simFrame.
Andreas Alfons
"DCARContControl"
, "DARContControl"
,
"ContControl"
    ## distributed completely at random
    data(eusilcP)
    sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)
    dcarc <- ContControl(target = "eqIncome", epsilon = 0.05,
        dots = list(mean = 5e+05, sd = 10000), type = "DCAR")
    contaminate(sam, dcarc)

    ## distributed at random
    foo <- generate(size = 10, distribution = rnorm,
        dots = list(mean = 0, sd = 2))
    darc <- ContControl(target = "V1", epsilon = 0.2,
        fun = function(x) x * 100, type = "DAR")
    contaminate(foo, darc)
Virtual class for controlling contamination in a simulation experiment (used internally).
A virtual Class: No objects may be created from it.
target: Object of class "OptCharacter"; a character vector specifying the
variables (columns) to be contaminated, or NULL to contaminate all variables
(except the additional ones generated internally).
epsilon
:Object of class "numeric"
giving the
contamination levels.
grouping
:Object of class "character"
specifying a
grouping variable (column) to be used for contaminating whole groups
rather than individual observations.
aux
:Object of class "character"
specifying an
auxiliary variable (column) whose values are used as probability weights
for selecting the items (observations or groups) to be contaminated.
Class "VirtualContControl"
, directly.
Class "OptContControl"
, by class "VirtualContControl",
distance 2.
In addition to the accessor and mutator methods for the slots inherited from
"VirtualContControl"
, the following are available:
getGrouping
signature(x = "ContControl")
: get slot
grouping
.
setGrouping
signature(x = "ContControl")
: set slot
grouping
.
getAux
signature(x = "ContControl")
: get slot
aux
.
setAux
signature(x = "ContControl")
: set slot
aux
.
In addition to the methods inherited from
"VirtualContControl"
, the following are available:
contaminate
signature(x = "data.frame",
control = "ContControl")
: contaminate data.
show
signature(object = "ContControl")
: print the
object on the R console.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
The slot grouping
was named group
prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup
already
exists.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"DCARContControl"
, "DARContControl"
,
"VirtualContControl"
, contaminate
showClass("ContControl")
Class for controlling contamination in a simulation experiment. The values of the contaminated observations will be distributed at random (DAR), i.e., they will depend on the original values.
Objects can be created by calls of the form new("DARContControl", ...),
DARContControl(...) or ContControl(..., type="DAR").
target: Object of class "OptCharacter"; a character vector specifying the
variables (columns) to be contaminated, or NULL to contaminate all variables
(except the additional ones generated internally).
epsilon
:Object of class "numeric"
giving the
contamination levels.
grouping
:Object of class "character"
specifying a
grouping variable (column) to be used for contaminating whole groups
rather than individual observations.
aux
:Object of class "character"
specifying an
auxiliary variable (column) whose values are used as probability weights
for selecting the items (observations or groups) to be contaminated.
fun
:Object of class "function"
generating
the values of the contamination data. The original values of the
observations to be contaminated will be passed as its first argument.
Furthermore, it should return an object that can be coerced to a
data.frame
, containing the contamination data.
dots
:Object of class "list"
containing additional
arguments to be passed to fun
.
Class "ContControl"
, directly.
Class "VirtualContControl"
, by class "ContControl", distance 2.
Class "OptContControl"
, by class "ContControl", distance 3.
With this control class, contamination is modeled as a two-step process. The
first step is to select observations to be contaminated, the second is to
model the distribution of the outliers. In this case, the original values
will be modified by the function given by slot fun, i.e., values of
the contaminated observations will depend on the original values.
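The two steps can be mimicked in base R to show what "distributed at random" means in practice. This is only a sketch under stated assumptions: the data are simulated, the contamination level of 20% and the rounding of the number of selected observations are choices made for the illustration, and the code does not reproduce the package's internals.

```r
set.seed(1)
x <- data.frame(V1 = rnorm(10, mean = 0, sd = 2))
original <- x$V1

# Step 1: select 20% of the observations.
idx <- sample.int(nrow(x), round(0.2 * nrow(x)))

# Step 2: the contaminated values are a function of the original values.
fun <- function(v) v * 100
x$V1[idx] <- fun(x$V1[idx])

cbind(original = original, contaminated = x$V1)
```

Only the selected rows change, and each new value is determined by the value it replaces.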
In addition to the accessor and mutator methods for the slots inherited from
"ContControl"
, the following are available:
getFun
signature(x = "DARContControl")
: get slot
fun
.
setFun
signature(x = "DARContControl")
: set slot
fun
.
getDots
signature(x = "DARContControl")
: get slot
dots
.
setDots
signature(x = "DARContControl")
: set slot
dots
.
Methods are inherited from "ContControl"
.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
The slot grouping
was named group
prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup
already
exists.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
"DCARContControl"
, "ContControl"
,
"VirtualContControl"
, contaminate
    foo <- generate(size = 10, distribution = rnorm,
        dots = list(mean = 0, sd = 2))
    cc <- DARContControl(target = "V1", epsilon = 0.2,
        fun = function(x) x * 100)
    contaminate(foo, cc)
Class for controlling model-based generation of data.
Objects can be created by calls of the form new("DataControl", ...)
or
DataControl(...)
.
size
:Object of class "numeric"
giving the number of
observations to be generated.
distribution
:Object of class "function"
generating
the data, e.g., rnorm
(the default) or
rmvnorm
from package mvtnorm. It should take a positive
integer as its first argument, giving the number of observations to be
generated, and return an object that can be coerced to a
data.frame
.
dots
:Object of class "list"
containing additional
arguments to be passed to distribution
.
colnames
:Object of class "OptCharacter"
; a character
vector to be used as column names for the generated data.frame
, or
NULL
.
Class "VirtualDataControl"
, directly.
Class "OptDataControl"
, by class "VirtualDataControl", distance 2.
getSize
signature(x = "DataControl")
: get slot
size
.
setSize
signature(x = "DataControl")
: set slot
size
.
getDistribution
signature(x = "DataControl")
: get slot
distribution
.
setDistribution
signature(x = "DataControl")
: set slot
distribution
.
getDots
signature(x = "DataControl")
: get slot
dots
.
setDots
signature(x = "DataControl")
: set slot
dots
.
getColnames
signature(x = "DataControl")
: get slot
colnames
.
setColnames
signature(x = "DataControl")
: set slot
colnames
.
In addition to the methods inherited from
"VirtualDataControl"
, the following are available:
generate
signature(control = "DataControl")
: generate
data.
show
signature(object = "DataControl")
: print the
object on the R console.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"VirtualDataControl"
, generate
    dc <- DataControl(size = 10, distribution = rnorm,
        dots = list(mean = 0, sd = 2))
    generate(dc)
Class for controlling contamination in a simulation experiment. The values of the contaminated observations will be distributed completely at random (DCAR), i.e., they will not depend on the original values.
Objects can be created by calls of the form new("DCARContControl", ...),
DCARContControl(...) or ContControl(..., type="DCAR") (the latter exists
mainly for backward compatibility with early draft versions of simFrame).
target: Object of class "OptCharacter"; a character vector specifying the
variables (columns) to be contaminated, or NULL to contaminate all variables
(except the additional ones generated internally).
epsilon
:Object of class "numeric"
giving the
contamination levels.
grouping
:Object of class "character"
specifying a
grouping variable (column) to be used for contaminating whole groups
rather than individual observations (the same values are used for all
observations in the same group).
aux
:Object of class "character"
specifying an
auxiliary variable (column) whose values are used as probability weights
for selecting the items (observations or groups) to be contaminated.
distribution
:Object of class "function"
generating
the values of the contamination data, e.g.,
rnorm
(the default) or rmvnorm
from
package mvtnorm. It should take a non-negative integer as its
first argument, giving the number of items to be created, and return an
object that can be coerced to a data.frame
, containing the
contamination data.
dots
:Object of class "list"
containing additional
arguments to be passed to distribution
.
Class "ContControl"
, directly.
Class "VirtualContControl"
, by class "ContControl", distance 2.
Class "OptContControl"
, by class "ContControl", distance 3.
With this control class, contamination is modeled as a two-step process. The
first step is to select observations to be contaminated, the second is to
model the distribution of the outliers. In this case, the values of the
contaminated observations will be generated by the function given by slot
distribution and will not depend on the original values.
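A base-R sketch may clarify how the slots distribution and dots listed above interact. It assumes that the generating function is called with the number of items as its first argument followed by the elements of dots; the data and the choice of a single contaminated observation (5% of 20) are made up for the illustration.

```r
set.seed(1)
x <- data.frame(eqIncome = rlnorm(20, meanlog = 10, sdlog = 0.5))

# Step 1: select the observations to be contaminated (5% of 20 rows).
idx <- sample.int(nrow(x), 1)

# Step 2: generate new values that ignore the original ones.
distribution <- rnorm
dots <- list(mean = 5e5, sd = 1e4)
x$eqIncome[idx] <- do.call(distribution, c(list(length(idx)), dots))

x$eqIncome[idx]
```

In contrast to the DAR case, the replaced value carries no information about the original observation.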
In addition to the accessor and mutator methods for the slots inherited from
"ContControl"
, the following are available:
getDistribution
signature(x = "DCARContControl")
: get
slot distribution
.
setDistribution
signature(x = "DCARContControl")
: set
slot distribution
.
getDots
signature(x = "DCARContControl")
: get slot
dots
.
setDots
signature(x = "DCARContControl")
: set slot
dots
.
Methods are inherited from "ContControl"
.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
The slot grouping
was named group
prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup
already
exists.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
Alfons, A., Templ, M. and Filzmoser, P. (2010) Contamination Models in the R Package simFrame for Statistical Simulation. In Aivazian, S., Filzmoser, P. and Kharin, Y. (editors) Computer Data Analysis and Modeling: Complex Stochastic Data and Systems, volume 2, 178–181. Minsk. ISBN 978-985-476-848-9.
Béguin, C. and Hulliger, B. (2008) The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology, 34(1), 91–103.
Hulliger, B. and Schoch, T. (2009) Robust Multivariate Imputation with Survey Data. 57th Session of the International Statistical Institute, Durban.
"DARContControl"
, "ContControl"
,
"VirtualContControl"
, contaminate
    data(eusilcP)
    sam <- draw(eusilcP[, c("id", "eqIncome")], size = 20)
    cc <- DCARContControl(target = "eqIncome", epsilon = 0.05,
        dots = list(mean = 5e+05, sd = 10000))
    contaminate(sam, cc)
Generic function for drawing a sample.
    draw(x, setup, ...)

    ## S4 method for signature 'data.frame,SampleSetup'
    draw(x, setup, i = 1)

    ## S4 method for signature 'data.frame,VirtualSampleControl'
    draw(x, setup)
x |
the data to sample from. |
setup |
an object of class |
i |
an integer specifying which one of the previously set up samples should be drawn. |
... |
if |
A data.frame containing the sampled observations. In addition, the
column ".weight", which consists of the sample weights, is added to
the data.frame.
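To indicate how sample weights are typically used, the following base-R sketch draws a simple random sample from a made-up population and estimates the population total. It assumes that the weights are the inverse inclusion probabilities, which for simple random sampling without replacement equal N / n; the population and variable names are hypothetical.

```r
set.seed(123)
N <- 1000
population <- data.frame(id = seq_len(N),
                         y = rgamma(N, shape = 2, scale = 500))

# Simple random sample of size n with constant weights N / n.
n <- 50
sam <- population[sample.int(N, n), ]
sam$.weight <- N / n

# Weighted (Horvitz-Thompson type) estimate of the population total.
est_total <- sum(sam$.weight * sam$y)
true_total <- sum(population$y)
c(estimate = est_total, truth = true_total)
```

The weights sum to the population size, which is a quick sanity check for any sampling design.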
x = "data.frame", setup = "character"
draw a sample using a
control class specified by the character string setup
. The slots of
the control object may be supplied as additional arguments.
x = "data.frame", setup = "missing"
draw a sample using a
control object of class "SampleControl"
. Its slots may be supplied
as additional arguments.
x = "data.frame", setup = "SampleSetup"
draw a previously set up sample.
x = "data.frame", setup = "VirtualSampleControl"
draw a sample
using a control object inheriting from the virtual class
"VirtualSampleControl"
.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
setup
, "SampleSetup"
,
"SampleControl"
, "TwoStageControl"
,
"VirtualSampleControl"
    ## load data
    data(eusilcP)

    ## simple random sampling
    draw(eusilcP[, c("id", "eqIncome")], size = 20)

    ## group sampling
    draw(eusilcP[, c("hid", "id", "eqIncome")],
        grouping = "hid", size = 10)

    ## stratified simple random sampling
    draw(eusilcP[, c("id", "region", "eqIncome")],
        design = "region", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))

    ## stratified group sampling
    draw(eusilcP[, c("hid", "id", "region", "eqIncome")],
        design = "region", grouping = "hid",
        size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))
This data set is synthetically generated from real Austrian EU-SILC (European Union Statistics on Income and Living Conditions) data.
data(eusilcP)
A data.frame
with 58 654 observations on the following 28 variables:
hid
integer; the household ID.
region
factor; the federal state in which the household is
located (levels Burgenland
, Carinthia
,
Lower Austria
, Salzburg
, Styria
, Tyrol
,
Upper Austria
, Vienna
and Vorarlberg
).
hsize
integer; the number of persons in the household.
eqsize
numeric; the equivalized household size according to the modified OECD scale.
eqIncome
numeric; a simplified version of the equivalized household income.
pid
integer; the personal ID.
id
the household ID combined with the personal ID. The first five digits represent the household ID, the last two digits the personal ID (both with leading zeros).
age
integer; the person's age.
gender
factor; the person's gender (levels male
and
female
).
ecoStat
factor; the person's economic status (levels
1
= working full time, 2
= working part time, 3
=
unemployed, 4
= pupil, student, further training or unpaid work
experience or in compulsory military or community service, 5
= in
retirement or early retirement or has given up business, 6
=
permanently disabled or/and unfit to work or other inactive person,
7
= fulfilling domestic tasks and care responsibilities).
citizenship
factor; the person's citizenship (levels
AT
, EU
and Other
).
py010n
numeric; employee cash or near cash income (net).
py050n
numeric; cash benefits or losses from self-employment (net).
py090n
numeric; unemployment benefits (net).
py100n
numeric; old-age benefits (net).
py110n
numeric; survivor's benefits (net).
py120n
numeric; sickness benefits (net).
py130n
numeric; disability benefits (net).
py140n
numeric; education-related allowances (net).
hy040n
numeric; income from rental of a property or land (net).
hy050n
numeric; family/children related allowances (net).
hy070n
numeric; housing allowances (net).
hy080n
numeric; regular inter-household cash transfer received (net).
hy090n
numeric; interest, dividends, profit from capital investments in unincorporated business (net).
hy110n
numeric; income received by people aged under 16 (net).
hy130n
numeric; regular inter-household cash transfer paid (net).
hy145n
numeric; repayments/receipts for tax adjustment (net).
main
logical; indicates the main income holder (i.e., the person with the highest income) of each household.
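The description of the combined identifier (five digits for the household ID followed by two digits for the personal ID) can be made concrete with a small base-R sketch. The values below are made up, and whether the data set stores the identifier in exactly this numeric form is an assumption; the sketch merely shows how such a code can be built and split again.

```r
# Hypothetical household and personal IDs.
hid <- c(1L, 1L, 25000L)
pid <- c(1L, 2L, 3L)

# Combine: household ID in the leading digits, personal ID in the last two.
id <- hid * 100L + pid

# Split the combined code back into its parts.
id %/% 100L   # household ID
id %% 100L    # personal ID

# Seven-digit representation with leading zeros.
sprintf("%07d", id)
```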
The data set is used as population data in some of the examples in package
simFrame. Note that it is included for illustrative purposes only. It
consists of 25 000 households, hence it does not represent the true population
sizes of Austria and its regions.
Only a few of the large number of variables in the original survey are included
in this example data set. Some variable names are different from the
standardized names used by the statistical agencies, as the latter are rather
cryptic codes. Furthermore, the variables hsize, eqsize, eqIncome and age
are not included in the standardized format of EU-SILC data, but have been
derived from other variables for convenience. Moreover, some very sparse
income components were not included in the generation of this synthetic data
set. Thus the equivalized household income is computed from the available
income components.
This is a synthetic data set based on Austrian EU-SILC data from 2006. The original sample was provided by Statistics Austria.
Eurostat (2004) Description of target variables: Cross-sectional and longitudinal. EU-SILC 065/04, Eurostat.
    data(eusilcP)
    summary(eusilcP)
    strata <- stratify(eusilcP, c("region", "gender"))
    summary(strata)
Generic function for generating data based on a (distribution) model.
    generate(control, ...)

    ## S4 method for signature 'DataControl'
    generate(control)
control |
a control object inheriting from the virtual class
|
... |
if |
The control class "DataControl" is quite simple but general. For
user-defined data generation, it often suffices to implement a function and
use it as the distribution slot in the "DataControl" object.
See "DataControl" for some requirements for such a function.
However, if more specialized data generation models are required, the
framework can be extended by defining a control class "MyDataControl"
extending "VirtualDataControl" and the corresponding method
generate(control) with signature 'MyDataControl'. If, e.g., a specific
distribution or mixture of distributions is frequently used in simulation
experiments, a distinct control class may be more convenient for the user.
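For the simpler route of supplying a function for the distribution slot, a sketch of a suitable function is given below. The function rmixnorm() and its arguments p and shift are hypothetical; the only properties taken from the documentation are that the number of observations is the first argument and that the result can be coerced to a data.frame.

```r
# Hypothetical distribution function: two-component normal mixture.
# n:     number of observations (first argument, as required)
# p:     probability of the main component
# shift: mean of the second component
rmixnorm <- function(n, p = 0.9, shift = 10) {
  outlying <- runif(n) > p
  values <- rnorm(n, mean = ifelse(outlying, shift, 0))
  data.frame(V1 = values)
}

set.seed(1)
dat <- rmixnorm(10, p = 0.8)
dat
```

With simFrame loaded, such a function would presumably be used as DataControl(size = 10, distribution = rmixnorm, dots = list(p = 0.8)), with further arguments passed through the dots slot.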
A data.frame
.
control = "character"
generate data using a control class
specified by the character string control
. The slots of the control
object may be supplied as additional arguments.
control = "missing"
generate data using a control object of
class "DataControl"
. Its slots may be supplied as additional
arguments.
control = "DataControl"
generate data as defined by the control
object control
.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"DataControl"
, "VirtualDataControl"
    # using a control object
    dc <- DataControl(size = 10, distribution = rnorm,
        dots = list(mean = 0, sd = 2))
    generate(dc)

    # supply slots of control object as arguments
    generate(size = 10, distribution = rnorm,
        dots = list(mean = 0, sd = 2))
Return the first parts of an object.
    ## S4 method for signature 'SampleSetup'
    head(x, k = 6, n = 6, ...)

    ## S4 method for signature 'SimControl'
    head(x)

    ## S4 method for signature 'SimResults'
    head(x, ...)

    ## S4 method for signature 'Strata'
    head(x, ...)

    ## S4 method for signature 'VirtualContControl'
    head(x)

    ## S4 method for signature 'VirtualDataControl'
    head(x)

    ## S4 method for signature 'VirtualNAControl'
    head(x)

    ## S4 method for signature 'VirtualSampleControl'
    head(x)
x |
an object. |
k |
for objects of class |
n |
for objects of class |
... |
additional arguments to be passed down to methods. |
An object of the same class as x, but in general smaller. See the
“Methods” section below for details.
signature(x = "SampleSetup")
returns the first parts of set up
samples. The first n
indices of each of the first k
set up
samples are kept.
signature(x = "SimControl")
currently returns the object itself.
signature(x = "SimResults")
returns the first parts of
simulation results. The method of head
for the
data.frame
in slot values
is thereby called.
signature(x = "Strata")
returns the first parts of strata
information. The method of head
for the vector in
slot values
is thereby called and the slots split
and
size
are adapted accordingly.
signature(x = "VirtualContControl")
currently returns the object itself.
signature(x = "VirtualDataControl")
currently returns the object itself.
signature(x = "VirtualNAControl")
currently returns the object itself.
signature(x = "VirtualSampleControl")
currently returns the object itself.
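The effect of the arguments k and n for set up samples can be imitated in base R. The sketch assumes that the samples are stored as a list of index vectors; the list created here is a hypothetical stand-in and not a real "SampleSetup" object.

```r
set.seed(1)

# Hypothetical stand-in for 50 set up samples of 1000 indices each.
indices <- replicate(50, sample.int(58654, 1000), simplify = FALSE)

# Keep the first n = 10 indices of each of the first k = 5 samples.
k <- 5
n <- 10
first <- lapply(head(indices, k), head, n)

length(first)    # number of samples kept
lengths(first)   # number of indices kept per sample
```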
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
head
, "SampleSetup"
,
"SimResults"
, "Strata"
## load data
data(eusilcP)

## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)
# get the first 10 indices of each of the first 5 samples
head(set, k = 5, n = 10)

## class "Strata"
# stratify the data by region
strata <- stratify(eusilcP, "region")
summary(strata)
# get strata information for the first 10 observations
head(strata, 10)
Get the first-order inclusion probabilities from a vector of probability weights.
inclusionProb(prob, size)
prob |
a numeric vector of non-negative probability weights. |
size |
a non-negative integer giving the sample size. |
A numeric vector of the first-order inclusion probabilities.
This is a faster C++ implementation of
inclusionprobabilities
from package sampling
.
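To make the computation concrete, here is a pure R sketch of the usual construction: probabilities proportional to the weights, scaled to the sample size and capped at 1. The helper `inclusion_prob_sketch` is hypothetical and is not the package's C++ code; whether the package treats edge cases in exactly the same way is an assumption.

```r
# Pure R sketch: first-order inclusion probabilities proportional to the
# weights, capped at 1 (hypothetical helper, not the package's C++ code)
inclusion_prob_sketch <- function(prob, size) {
  stopifnot(all(prob >= 0), size <= sum(prob > 0))
  pik <- size * prob / sum(prob)
  # Units whose probability reaches 1 are taken with certainty; the
  # remaining sample size is redistributed among the other units
  repeat {
    certain <- pik >= 1
    if (!any(pik > 1)) break
    pik[certain] <- 1
    rest <- !certain
    pik[rest] <- (size - sum(certain)) * prob[rest] / sum(prob[rest])
  }
  pik
}

pik <- inclusion_prob_sketch(c(1, 1, 2, 2, 10), size = 3)
print(pik)
print(sum(pik))
```

In this toy input the last unit has a weight large enough to be selected with certainty, so the remaining two draws are spread over the other four units in proportion to their weights.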
Andreas Alfons
setup
, "SampleSetup"
pweights <- sample(1:5, 25, replace = TRUE)
inclusionProb(pweights, 10)
Get the length of an object.
## S4 method for signature 'SampleSetup'
length(x)

## S4 method for signature 'VirtualContControl'
length(x)

## S4 method for signature 'VirtualNAControl'
length(x)

## S4 method for signature 'VirtualSampleControl'
length(x)
x |
an object. |
An integer giving the length of the object. See the “Methods” section below for details.
signature(x = "SampleSetup")
get the number of set up samples.
signature(x = "VirtualContControl")
get the number of contamination levels to be used.
signature(x = "VirtualNAControl")
get the number of missing
value rates to be used (the length in case of a vector in slot
NArate
or the number of rows in case of a matrix).
signature(x = "VirtualSampleControl")
get the number of samples to be set up.
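The rule stated for "VirtualNAControl" (vector length versus number of matrix rows) can be illustrated in base R. This is only a sketch of the rule as described; `n_rates` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper mirroring the rule stated for slot NArate
n_rates <- function(NArate) {
  if (is.matrix(NArate)) nrow(NArate) else length(NArate)
}

# Same rates for all target variables
rates_vec <- c(0.1, 0.2, 0.3)
# One column of rates per target variable
rates_mat <- cbind(a = c(0.1, 0.2), b = c(0.05, 0.1))

print(n_rates(rates_vec))
print(n_rates(rates_mat))
```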
Andreas Alfons
## load data
data(eusilcP)

## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)
length(set)

## class "ContControl"
cc <- ContControl(target = "eqIncome",
    epsilon = c(0, 0.0025, 0.005, 0.0075, 0.01),
    dots = list(mean = 5e+05, sd = 10000))
length(cc)

## class "NAControl"
nc <- NAControl(target = "eqIncome", NArate = c(0.1, 0.2, 0.3))
length(nc)
Class for controlling the insertion of missing values in a simulation experiment.
Objects can be created by calls of the form new("NAControl", ...)
or
NAControl(...)
.
target
:Object of class "OptCharacter"
; a character
vector specifying the variables (columns) in which missing values should
be inserted, or NULL
to insert missing values in all variables
(except the additional ones generated internally).
NArate
:Object of class "NumericMatrix"
giving the
missing value rates, which may be selected individually for the target
variables. In case of a vector, the same missing value rates are used for
all target variables. In case of a matrix, on the other hand, the missing
value rates to be used for each target variable are given by the
respective column.
grouping
:Object of class "character"
specifying a
grouping variable (column) to be used for setting whole groups to
NA
rather than individual values.
aux
:Object of class "character"
specifying auxiliary
variables (columns) whose values are used as probability weights for
selecting the values to be set to NA
in the respective target
variables. If only one variable (column) is specified, it is used for
all target variables.
intoContamination
:Object of class "logical"
indicating whether missing values should also be inserted into
contaminated observations. The default is to insert missing values only
into non-contaminated observations.
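The role of the slots target, NArate and aux can be sketched in base R. The code below is an illustration under simplifying assumptions (a single target, a single rate, no grouping and no contamination), and `insert_na_sketch` is a hypothetical helper rather than the package's setNA method.

```r
set.seed(1)  # for reproducibility of this toy example

# Toy data: 'age' acts as the auxiliary variable, 'income' as the target
dat <- data.frame(age = c(20, 35, 50, 65, 80, 30, 45, 60, 75, 25),
                  income = c(12, 18, 25, 22, 15, 17, 24, 26, 16, 13))

# Hypothetical helper: set a share 'rate' of the target values to NA,
# optionally using the values of an auxiliary variable as probability weights
insert_na_sketch <- function(x, target, rate, aux = NULL) {
  n <- nrow(x)
  m <- round(rate * n)
  w <- if (is.null(aux)) NULL else x[[aux]]
  idx <- sample.int(n, m, prob = w)
  x[idx, target] <- NA
  x
}

mcar <- insert_na_sketch(dat, "income", rate = 0.2)               # no weights
mar  <- insert_na_sketch(dat, "income", rate = 0.2, aux = "age")  # weights from age
print(mar)
```

Without an auxiliary variable every value is equally likely to be set to NA; with one, observations with larger auxiliary values are more likely to be selected.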
Class "VirtualNAControl"
, directly.
Class "OptNAControl"
, by class "VirtualNAControl",
distance 2.
In addition to the accessor and mutator methods for the slots inherited from
"VirtualNAControl"
, the following are available:
getGrouping
signature(x = "NAControl")
: get slot
grouping
.
setGrouping
signature(x = "NAControl")
: set slot
grouping
.
getAux
signature(x = "NAControl")
: get slot
aux
.
setAux
signature(x = "NAControl")
: set slot
aux
.
getIntoContamination
signature(x = "NAControl")
: get
slot intoContamination
.
setIntoContamination
signature(x = "NAControl")
: set
slot intoContamination
.
In addition to the methods inherited from
"VirtualNAControl"
, the following are available:
setNA
signature(x = "data.frame",
control = "NAControl")
: set missing values.
show
signature(object = "NAControl")
: print the object
on the R console.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Since version 0.3, this control class allows an auxiliary variable with probability weights to be specified for each target variable.
The slot grouping
was named group
prior to version 0.2.
Renaming the slot was necessary since accessor and mutator functions were
introduced in this version and a function named getGroup
already
exists.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
data(eusilcP)
eusilcP$age[eusilcP$age < 0] <- 0  # this actually occurs
sam <- draw(eusilcP[, c("id", "age", "eqIncome")], size = 20)

## missing completely at random
mcarc <- NAControl(target = "eqIncome", NArate = 0.2)
setNA(sam, mcarc)

## missing at random
marc <- NAControl(target = "eqIncome", NArate = 0.2, aux = "age")
setNA(sam, marc)

## missing not at random
mnarc <- NAControl(target = "eqIncome", NArate = 0.2, aux = "eqIncome")
setNA(sam, mnarc)
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "NumericMatrix"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("NumericMatrix")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptBasicVector"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("OptBasicVector")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptCall"
in the signature.
Andreas Alfons
showClass("OptCall")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptCharacter"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("OptCharacter")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptContControl"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("OptContControl")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptDataControl"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("OptDataControl")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptNAControl"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("OptNAControl")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptNumeric"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("OptNumeric")
Virtual class used internally for convenience.
A virtual Class: No objects may be created from it.
No methods defined with class "OptSampleControl"
in the signature.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
showClass("OptSampleControl")
Plot simulation results. A suitable plot function is selected automatically, depending on the structure of the results.
## S4 method for signature 'SimResults,missing'
plot(x, y, ...)
x |
the simulation results. |
y |
not used. |
... |
further arguments to be passed to the selected plot function. |
An object of class "trellis"
. The
update
method can be used to update
components of the object and the print
method (usually called by default) will plot it on an appropriate plotting
device.
The results of simulation experiments with at most one contamination level and at most one missing value rate are visualized by (conditional) box-and-whisker plots. For simulations involving different contamination levels or missing value rates, the average results are plotted against the contamination levels or missing value rates.
x = "SimResults", y = "missing"
plot simulation results.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
simBwplot
, simDensityplot
,
simXyplot
, "SimResults"
#### design-based simulation
set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}

## run simulation
results <- runSimulation(eusilcP, sc, contControl = cc, fun = sim)

## plot results
tv <- mean(eusilcP$eqIncome)  # true population mean
plot(results, true = tv)


#### model-based simulation
set.seed(12345)  # for reproducibility

## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}

## run simulation
results <- runSimulation(dc, nrep = 50,
    contControl = cc, design = "group", fun = sim)

## plot results
plot(results, true = means)
Generic function for running a simulation experiment.
runSimulation(x, setup, nrep, control, contControl = NULL,
    NAControl = NULL, design = character(), fun, ...,
    SAE = FALSE)

runSim(...)
x |
a |
setup |
an object of class |
nrep |
a non-negative integer giving the number of repetitions of the simulation experiment (for model-based simulation, mixed simulation designs or simulation based on real data). |
control |
a control object of class |
contControl |
an object of a class inheriting from
|
NAControl |
an object of a class inheriting from
|
design |
a character vector specifying variables (columns) to be used
for splitting the data into domains. The simulations, including
contamination and the insertion of missing values (unless |
fun |
a function to be applied in each simulation run. |
... |
for |
SAE |
a logical indicating whether small area estimation will be used in the simulation experiment. |
For convenience, the slots of control
may be supplied as arguments.
There are some requirements for slot fun
of the control object
control
. The function must return a numeric vector, or a list with
the two components values
(a numeric vector) and add
(additional results of any class, e.g., statistical models). Note that the
latter is computationally slightly more expensive. A data.frame
is
passed to fun
in every simulation run. The corresponding argument
must be called x
. If comparisons with the original data need to be
made, e.g., for evaluating the quality of imputation methods, the function
should have an argument called orig
. If different domains are used
in the simulation, the indices of the current domain can be passed to the
function via an argument called domain
.
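A function meeting these requirements might look as follows. This is a minimal sketch that is called directly on toy data rather than through runSimulation; the name `sim_sketch` and the toy data frames are hypothetical.

```r
# The data argument is called x; the optional argument orig receives the
# original data, e.g. for comparisons after missing values were inserted
sim_sketch <- function(x, orig) {
  est <- mean(x$value, na.rm = TRUE)
  list(values = c(mean = est, diff = est - mean(orig$value)),  # numeric vector
       add = summary(x$value))                  # additional results, any class
}

# Direct call on a toy data.frame with one value set to NA
orig <- data.frame(value = c(1, 2, 3, 4, 5))
x <- orig
x$value[5] <- NA
out <- sim_sketch(x, orig = orig)
print(out$values)
```

Returning a plain numeric vector instead of the list is sufficient when no additional results need to be kept.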
For small area estimation, the following points have to be kept in mind. The
design
for splitting the data must be supplied and SAE
must be set to TRUE
. However, the data are not actually split into
the specified domains. Instead, the whole data set (sample) is passed to
fun
. Also contamination and missing values are added to the whole
data (sample). Last, but not least, the function must have a domain
argument so that the current domain can be extracted from the whole data
(sample).
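The role of the domain argument can be illustrated without the package. In the sketch below, the loop over domains is written by hand with `split` and `sapply`; within runSimulation this is handled by the framework, and the names `sae_sketch`, `toy` and `domains` are hypothetical.

```r
# The whole data set is passed as x; domain holds the indices of the
# current domain, so the function can use both
sae_sketch <- function(x, domain) {
  overall <- mean(x$value)                      # based on all data
  c(direct = mean(x$value[domain]), overall = overall)
}

toy <- data.frame(area = c("a", "a", "b", "b", "b"),
                  value = c(1, 3, 10, 20, 30))

# Indices of each domain, derived here from the variable 'area'
domains <- split(seq_len(nrow(toy)), toy$area)
res <- sapply(domains, function(d) sae_sketch(toy, domain = d))
print(res)
```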
In every simulation run, fun
is evaluated using try
. Hence
no results are lost if computations fail in any of the simulation runs.
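The effect of evaluating a function with try can be seen in a small base R sketch. How the package records failed runs internally is not shown here; `sim_sketch` and `runs` are hypothetical names.

```r
# A function that fails for some inputs
sim_sketch <- function(x) {
  if (anyNA(x)) stop("missing values not supported")
  mean(x)
}

runs <- list(c(1, 2, 3), c(4, NA, 6), c(7, 8, 9))

# Evaluate each run inside try(); failures yield "try-error" objects
# instead of aborting the whole loop
res <- lapply(runs, function(x) try(sim_sketch(x), silent = TRUE))
ok <- !vapply(res, inherits, logical(1), what = "try-error")
print(unlist(res[ok]))
```

The second run fails, but the results of the first and third run remain available.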
runSim
is a wrapper for runSimulation
.
An object of class "SimResults"
.
x = "ANY", setup = "ANY", nrep = "ANY", control = "missing"
convenience wrapper that allows the slots of control
to be supplied
as arguments
x = "data.frame", setup = "missing", nrep = "missing",
control = "SimControl"
run a simulation experiment based on real data without repetitions (probably useless, but for completeness).
x = "data.frame", setup = "missing", nrep = "numeric",
control = "SimControl"
run a simulation experiment based on real data with repetitions.
x = "data.frame", setup = "SampleSetup", nrep = "missing",
control = "SimControl"
run a design-based simulation experiment with previously set up samples.
x = "data.frame", setup = "VirtualSampleControl",
nrep = "missing", control = "SimControl"
run a design-based simulation experiment.
x = "VirtualDataControl", setup = "missing", nrep = "missing",
control = "SimControl"
run a model-based simulation experiment without repetitions (probably useless, but for completeness).
x = "VirtualDataControl", setup = "missing", nrep = "numeric",
control = "SimControl"
run a model-based simulation experiment with repetitions.
x = "VirtualDataControl", setup = "VirtualSampleControl",
nrep = "missing", control = "SimControl"
run a simulation experiment using a mixed simulation design without repetitions (probably useless, but for completeness).
x = "VirtualDataControl", setup = "VirtualSampleControl",
nrep = "numeric", control = "SimControl"
run a simulation experiment using a mixed simulation design with repetitions.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"SimControl"
, "SimResults"
,
simBwplot
, simDensityplot
, simXyplot
#### design-based simulation
set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}

## run simulation and explore results
results <- runSimulation(eusilcP, sc, contControl = cc, fun = sim)
head(results)
aggregate(results)
tv <- mean(eusilcP$eqIncome)  # true population mean
plot(results, true = tv)


#### model-based simulation
set.seed(12345)  # for reproducibility

## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}

## run simulation and explore results
results <- runSimulation(dc, nrep = 50,
    contControl = cc, design = "group", fun = sim)
head(results)
aggregate(results)
plot(results, true = means)
Class for controlling the setup of samples.
Objects can be created by calls of the form new("SampleControl", ...)
or SampleControl(...)
.
design
:Object of class "BasicVector"
specifying
variables (columns) to be used for stratified sampling.
grouping
:Object of class "BasicVector"
specifying a
grouping variable (column) to be used for sampling whole groups rather
than individual observations.
collect
:Object of class "logical"
; if a grouping
variable is specified and this is FALSE
(which is the default
value), groups are sampled directly. If a grouping variable is specified
and this is TRUE
, individuals are sampled in a first step. In a
second step, all individuals that belong to the same group as any of the
sampled individuals are collected and added to the sample. If no
grouping variable is specified, this is ignored.
fun
:Object of class "function"
to be used for
sampling (defaults to srs
). It should return a vector
containing the indices of the sampled items (observations or groups).
size
:Object of class "OptNumeric"
; an optional
non-negative integer giving the number of items (observations or groups)
to sample. In case of stratified sampling, a vector of non-negative
integers, each giving the number of items to sample from the
corresponding stratum, may be supplied.
prob
:Object of class "OptBasicVector"
; an optional
numeric vector giving the probability weights, or a character string or
logical vector specifying a variable (column) that contains the
probability weights.
dots
:Object of class "list"
containing additional
arguments to be passed to fun
.
k
:Object of class "numeric"
; a single positive
integer giving the number of samples to be set up.
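The difference between the two settings of collect can be sketched in base R on a toy population. This illustrates the two strategies as described for the slot; it is not the package's implementation, and the data frame `pop` is hypothetical.

```r
set.seed(1)  # for reproducibility of this toy example
pop <- data.frame(id = 1:8, hid = c(1, 1, 2, 2, 2, 3, 4, 4))

# collect = FALSE: sample whole groups directly
groups <- unique(pop$hid)
g <- sample(groups, 2)
direct <- which(pop$hid %in% g)

# collect = TRUE: sample individuals first, then add everyone who shares
# a group with any of the sampled individuals
i <- sample(nrow(pop), 2)
collected <- which(pop$hid %in% pop$hid[i])

print(pop[direct, ])
print(pop[collected, ])
```

With direct group sampling every group has the same chance of selection, whereas collecting after sampling individuals favours larger groups.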
There are some restrictions on the argument names of the function
supplied to fun
. If it needs population data as input,
the corresponding argument should be called x
and should expect
a data.frame
. If the sampling method only needs the population size
as input, the argument should be called N
. Note that fun
is
not expected to have both x
and N
as arguments, and that the
latter is much faster for stratified sampling or group sampling.
Furthermore, if the function has arguments for sample size and probability
weights, they should be called size
and prob
, respectively.
Note that a function with prob
as its only argument is perfectly valid
(for probability proportional to size sampling). Further arguments of
fun
may be supplied as a list via the slot dots
.
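As an illustration of these naming conventions, the following sketch defines a sampling function that only needs the population size. Systematic sampling is chosen purely as an example, and `systematic_sketch` is a hypothetical function that is not part of the package.

```r
# Systematic sampling with a random start; only the population size is
# needed, so the arguments are named N and size
systematic_sketch <- function(N, size) {
  step <- N / size
  start <- runif(1, 0, step)
  floor(start + step * (seq_len(size) - 1)) + 1  # indices of sampled items
}

set.seed(1)
idx <- systematic_sketch(N = 100, size = 10)
print(idx)
```

Because the arguments follow the expected names, such a function could presumably be supplied via the slot fun, with the sample size given in the slot size.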
Class "VirtualSampleControl"
, directly.
Class "OptSampleControl"
, by class "VirtualSampleControl", distance 2.
In addition to the accessor and mutator methods for the slots inherited from
"VirtualSampleControl"
, the following are available:
getDesign
signature(x = "SampleControl")
: get slot
design
.
setDesign
signature(x = "SampleControl")
: set slot
design
.
getGrouping
signature(x = "SampleControl")
: get slot
grouping
.
setGrouping
signature(x = "SampleControl")
: set slot
grouping
.
getCollect
signature(x = "SampleControl")
: get slot
collect
.
setCollect
signature(x = "SampleControl")
: set slot
collect
.
getFun
signature(x = "SampleControl")
: get slot
fun
.
setFun
signature(x = "SampleControl")
: set slot
fun
.
getSize
signature(x = "SampleControl")
: get slot
size
.
setSize
signature(x = "SampleControl")
: set slot
size
.
getProb
signature(x = "SampleControl")
: get slot
prob
.
setProb
signature(x = "SampleControl")
: set slot
prob
.
getDots
signature(x = "SampleControl")
: get slot
dots
.
setDots
signature(x = "SampleControl")
: set slot
dots
.
In addition to the methods inherited from
"VirtualSampleControl"
, the following are available:
clusterSetup
signature(cl = "ANY", x = "data.frame",
control = "SampleControl")
: set up multiple samples on a cluster.
setup
signature(x = "data.frame",
control = "SampleControl")
: set up multiple samples.
show
signature(object = "SampleControl")
: print the
object on the R console.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
The slots grouping
and fun
were named group
and
method
, respectively, prior to version 0.2. Renaming the slots was
necessary since accessor and mutator functions were introduced in this
version and functions named getGroup
, getMethod
and
setMethod
already exist.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"VirtualSampleControl"
,
"TwoStageControl"
, "SampleSetup"
,
setup
, draw
data(eusilcP)

## simple random sampling
srsc <- SampleControl(size = 20)
draw(eusilcP[, c("id", "eqIncome")], srsc)

## group sampling
gsc <- SampleControl(grouping = "hid", size = 10)
draw(eusilcP[, c("hid", "id", "eqIncome")], gsc)

## stratified simple random sampling
ssrsc <- SampleControl(design = "region",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))
draw(eusilcP[, c("id", "region", "eqIncome")], ssrsc)

## stratified group sampling
sgsc <- SampleControl(design = "region", grouping = "hid",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2))
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgsc)
Class for set up samples.
Objects can be created by calls of the form new("SampleSetup", ...)
or
SampleSetup(...)
.
However, objects are expected to be created by the function setup or clusterSetup; these constructor functions are not supposed to be called by the user.
indices
:Object of class "list"
; each list element
contains the indices of the sampled observations.
prob
:Object of class "numeric"
giving the
inclusion probabilities.
control
:Object of class "VirtualSampleControl"
; the
control object used to set up the samples.
seed
:Object of class "list"
containing the seeds of
the random number generator before and after setting up the samples,
respectively (for replication purposes).
call
:Object of class "SimCall"
; the function call
used to set up the samples, or NULL
.
getIndices
signature(x = "SampleSetup")
: get slot
indices
.
getProb
signature(x = "SampleSetup")
: get slot
prob
.
getControl
signature(x = "SampleSetup")
: get slot
control
.
getSeed
signature(x = "SampleSetup")
: get slot
seed
.
getCall
signature(x = "SampleSetup")
: get slot
call
.
clusterRunSimulation
signature(cl = "ANY",
x = "data.frame", setup = "SampleSetup", nrep = "missing",
control = "SimControl")
: run a simulation experiment on a cluster.
draw
signature(x = "data.frame",
setup = "SampleSetup")
: draw a sample.
head
signature(x = "SampleSetup")
: returns the first
parts of set up samples.
length
signature(x = "SampleSetup")
: get the number of
set up samples.
runSimulation
signature(x = "data.frame",
setup = "SampleSetup", nrep = "missing", control = "SimControl")
: run a
simulation experiment.
show
signature(object = "SampleSetup")
: print set up
samples on the R console.
summary
signature(object = "SampleSetup")
: produce a
summary of set up samples.
tail
signature(x = "SampleSetup")
: returns the last
parts of set up samples.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
There are no mutator methods available since the slots are not supposed to be changed by the user.
Furthermore, the slot seed
was added in version 0.2, and the slot
control
was added in version 0.3. Since the control object used to
set up the samples is now stored, the redundant slots design
,
grouping
, collect
and fun
were removed. This has been
done as preparation for additional control classes for sampling, which will
be introduced in future versions.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"SampleControl"
, "TwoStageControl"
,
"VirtualSampleControl"
,
setup
, draw
showClass("SampleSetup")
Functions for random sampling.
srs(N, size, replace = FALSE)

ups(N, size, prob, replace = FALSE)

brewer(prob, eps = 1e-06)

midzuno(prob, eps = 1e-06)

tille(prob, eps = 1e-06)
N |
a non-negative integer giving the number of observations from which to sample. |
size |
a non-negative integer giving the number of observations to sample. |
prob |
for |
replace |
a logical indicating whether sampling should be performed with or without replacement. |
eps |
a numeric control value giving the desired accuracy. |
srs
and ups
are wrappers for simple random sampling and
unequal probability sampling, respectively. Both functions make use of
sample
.
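The kind of call these wrappers reduce to can be sketched in base R. The stand-ins below use `sample.int`, a close relative of `sample`; they are hypothetical and omit any argument checking the package functions may perform.

```r
# Hypothetical stand-ins showing the kind of sampling calls involved
srs_sketch <- function(N, size, replace = FALSE) {
  sample.int(N, size, replace = replace)
}
ups_sketch <- function(N, size, prob, replace = FALSE) {
  sample.int(N, size, replace = replace, prob = prob)
}

set.seed(1)
s1 <- srs_sketch(10, 5)                              # without replacement
s2 <- ups_sketch(5, 10, prob = 1:5, replace = TRUE)  # with replacement
print(s1)
print(s2)
```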
brewer
, midzuno
and tille
perform Brewer's, Midzuno's and
Tillé's method, respectively, for unequal probability sampling
without replacement and fixed sample size.
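For a design with fixed sample size, the first-order inclusion probabilities sum to the sample size, so a vector of inclusion probabilities implicitly determines how many observations are drawn. A quick base R check on a small vector of inclusion probabilities (the variable names are only illustrative):

```r
# Inclusion probabilities of six units
prob <- c(0.2, 0.7, 0.8, 0.5, 0.4, 0.4)

# Their sum is the implied sample size
n <- sum(prob)
print(n)
```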
An integer vector giving the indices of the sampled observations.
brewer, midzuno and tille are faster C++ implementations of UPbrewer, UPmidzuno and UPtille, respectively, from package sampling.
Andreas Alfons
Brewer, K. (1975) A simple procedure for sampling πpswor. Australian Journal of Statistics, 17(3), 166–172.
Midzuno, H. (1952) On the sampling system with probability proportional to sum of size. Annals of the Institute of Statistical Mathematics, 3(2), 99–107.
Tillé, Y. (1996) An elimination procedure of unequal probability sampling without replacement. Biometrika, 83(1), 238–241.
Deville, J.-C. and Tillé, Y. (1998) Unequal probability sampling without replacement through a splitting method. Biometrika, 85(1), 89–101.
"SampleControl", "TwoStageControl", setup, inclusionProb, sample
## simple random sampling
# without replacement
srs(10, 5)
# with replacement
srs(5, 10, replace = TRUE)

## unequal probability sampling
# without replacement
ups(10, 5, prob = 1:10)
# with replacement
ups(5, 10, prob = 1:5, replace = TRUE)

## Brewer, Midzuno and Tille sampling
# define inclusion probabilities
prob <- c(0.2,0.7,0.8,0.5,0.4,0.4)
# Brewer sampling
brewer(prob)
# Midzuno sampling
midzuno(prob)
# Tille sampling
tille(prob)
Generic function for inserting missing values into data.
setNA(x, control, ...)

## S4 method for signature 'data.frame,NAControl'
setNA(x, control, i)
x | the data in which missing values should be inserted. |
control | a control object inheriting from the virtual class |
i | an integer giving the element or row of the slot |
... | if |
In order to extend the framework by a user-defined control class "MyNAControl" (which must extend "VirtualNAControl"), a method setNA(x, control, i) with signature 'data.frame, MyNAControl' needs to be implemented.
A data.frame containing the data with missing values.
x = "data.frame", control = "character": set missing values using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "missing": set missing values using a control object of class "NAControl". Its slots may be supplied as additional arguments.
x = "data.frame", control = "NAControl": set missing values as defined by the control object control.
Since version 0.3, setNA no longer checks if auxiliary variable(s) with probability weights are numeric and contain only finite positive values (sample still throws an error in these cases). This has been removed to improve computational performance in simulation studies.
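To make the role of an auxiliary variable with probability weights more concrete, the following base-R sketch selects the rows to be set to NA either uniformly or with weights taken from an auxiliary column. The function insert_na_sketch and its arguments are hypothetical, and the assumption that the mechanism amounts to weighted sampling of rows is inferred from the note above, not taken from the package source.

```r
# Hypothetical illustration; not the implementation behind setNA().
insert_na_sketch <- function(x, target, rate, aux = NULL) {
  n <- nrow(x)
  n_na <- round(rate * n)                       # number of values to blank out
  # Without 'aux' every row is equally likely; with 'aux' the column
  # supplies the probability weights (they must be finite and positive).
  prob <- if (is.null(aux)) NULL else x[[aux]]
  idx <- sample.int(n, n_na, prob = prob)
  x[idx, target] <- NA
  x
}

set.seed(1)
d <- data.frame(age = c(20, 30, 40, 50, 60), inc = 1:5 * 1000)
out_uniform  <- insert_na_sketch(d, "inc", 0.4)               # equal weights
out_weighted <- insert_na_sketch(d, "inc", 0.4, aux = "age")  # weights from age
```

As in the note, sample.int itself raises an error if the weights are negative or non-finite, so the sketch performs no checks of its own.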
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"NAControl", "VirtualNAControl"
data(eusilcP)
eusilcP$age[eusilcP$age < 0] <- 0  # this actually occurs
sam <- draw(eusilcP[, c("id", "age", "eqIncome")], size = 20)

## using control objects
# missing completely at random
mcarc <- NAControl(target = "eqIncome", NArate = 0.2)
setNA(sam, mcarc)
# missing at random
marc <- NAControl(target = "eqIncome", NArate = 0.2, aux = "age")
setNA(sam, marc)
# missing not at random
mnarc <- NAControl(target = "eqIncome",
    NArate = 0.2, aux = "eqIncome")
setNA(sam, mnarc)

## supply slots of control object as arguments
# missing completely at random
setNA(sam, target = "eqIncome", NArate = 0.2)
# missing at random
setNA(sam, target = "eqIncome", NArate = 0.2, aux = "age")
# missing not at random
setNA(sam, target = "eqIncome", NArate = 0.2, aux = "eqIncome")
Generic function for setting up multiple samples.
setup(x, control, ...)

## S4 method for signature 'data.frame,SampleControl'
setup(x, control)
x | the data to sample from. |
control | a control object inheriting from the virtual class |
... | if |
A fundamental design principle of the framework in the case of design-based simulation studies is that the sampling procedure is separated from the simulation procedure. Two main advantages arise from setting up all samples in advance.
First, the repeated sampling reduces overall computation time dramatically in certain situations, since computer-intensive tasks like stratification need to be performed only once. This is particularly relevant for large population data. In close-to-reality simulation studies carried out in research projects in survey statistics, often up to 10000 samples are drawn from a population of millions of individuals with stratified sampling designs. For such large data sets, stratification takes a considerable amount of time and is a very memory-intensive task. If the samples are taken on-the-fly, i.e., in every simulation run one sample is drawn, the function to take the stratified sample would typically split the population into the different strata in each of the 10000 simulation runs. If all samples are drawn in advance, on the other hand, the population data need to be split only once and all 10000 samples can be taken from the respective strata together.
Second, the samples can be stored permanently, which simplifies the reproduction of simulation results and may help to maximize comparability of results obtained by different partners in a research project. In particular, this is useful for large population data, when complex sampling techniques may be very time-consuming. In research projects involving different partners, usually different groups investigate different kinds of estimators. If these groups use not only the same population data, but also the same previously set up samples, their results are highly comparable.
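The first advantage can be sketched in base R: the population is split into strata once, and all k samples are then drawn from the precomputed strata. The toy population, the stratum sizes and the helper draw_one are illustrative assumptions and do not reproduce how setup works internally.

```r
set.seed(42)

# Toy population with a stratification variable (illustrative only).
pop <- data.frame(id = 1:1000,
                  region = sample(c("A", "B", "C"), 1000, replace = TRUE))
sizes <- c(A = 5, B = 3, C = 2)   # sample size per stratum
k <- 4                            # number of samples to set up

# The expensive step, performed only once: split the indices by stratum.
strata <- split(pop$id, pop$region)

# Draw one stratified sample from the precomputed strata.
draw_one <- function() {
  picked <- mapply(function(ids, n) ids[sample.int(length(ids), n)],
                   strata[names(sizes)], sizes, SIMPLIFY = FALSE)
  unlist(picked, use.names = FALSE)
}

# All k samples reuse the same split and can be stored for later runs.
samples <- replicate(k, draw_one(), simplify = FALSE)
```

Drawing on the fly would instead repeat the split inside every simulation run, which is the cost the paragraph above describes for large populations.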
The control class "SampleControl" is highly flexible and allows stratified sampling as well as sampling of whole groups rather than individuals with a specified sampling method. Hence it is often sufficient to implement the desired sampling method for the simple non-stratified case to extend the existing framework. See "SampleControl" for some restrictions on the argument names of such a function, which should return a vector containing the indices of the sampled observations.
Nevertheless, for very complex sampling procedures, it is possible to define a control class "MySampleControl" extending "VirtualSampleControl", and the corresponding method setup(x, control) with signature 'data.frame, MySampleControl'. In order to optimize computational performance, it is necessary to efficiently set up multiple samples. Thereby the slot k of "VirtualSampleControl" needs to be used to control the number of samples, and the resulting object must be of class "SampleSetup".
An object of class "SampleSetup".
x = "data.frame", control = "character": set up multiple samples using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.
x = "data.frame", control = "missing": set up multiple samples using a control object of class "SampleControl". Its slots may be supplied as additional arguments.
x = "data.frame", control = "SampleControl": set up multiple samples as defined by the control object control.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
simSample, draw, "SampleControl", "TwoStageControl", "VirtualSampleControl", "SampleSetup"
set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## simple random sampling
srss <- setup(eusilcP, size = 20, k = 4)
summary(srss)
draw(eusilcP[, c("id", "eqIncome")], srss, i = 1)

## group sampling
gss <- setup(eusilcP, grouping = "hid", size = 10, k = 4)
summary(gss)
draw(eusilcP[, c("hid", "id", "eqIncome")], gss, i = 2)

## stratified simple random sampling
ssrss <- setup(eusilcP, design = "region",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(ssrss)
draw(eusilcP[, c("id", "region", "eqIncome")], ssrss, i = 3)

## stratified group sampling
sgss <- setup(eusilcP, design = "region",
    grouping = "hid", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(sgss)
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgss, i = 4)
Generic functions for applying a function to subsets of a data set.
simApply(x, design, fun, ...)
simSapply(x, design, fun, ..., simplify = TRUE)
x | the |
design | a character, logical or numeric vector specifying the variables (columns) used for subsetting. |
fun | a function to be applied to the subsets. |
simplify | a logical indicating whether the results should be simplified to a vector or matrix (if possible). |
... | additional arguments to be passed to |
For simApply, a data.frame.
For simSapply, a list, vector or matrix (see sapply).
For simApply:
x = "data.frame", design = "BasicVector", fun = "function": apply a function to subsets given by the variables (columns) in design.
x = "data.frame", design = "Strata", fun = "function": apply a function to subsets given by design.
For simSapply:
x = "data.frame", design = "BasicVector", fun = "function": apply a function to subsets given by the variables (columns) in design.
x = "data.frame", design = "Strata", fun = "function": apply a function to subsets given by design.
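For readers who want to see the idea of applying a function to subsets defined by design variables without loading the package, here is a base-R analogue built on split and sapply. It is only meant to convey the concept; the toy data are made up, and the package's own generics may order and label their results differently.

```r
# Toy data with two design variables and one variable of interest.
df <- data.frame(region   = c("N", "N", "S", "S", "S"),
                 gender   = c("f", "m", "f", "m", "m"),
                 eqIncome = c(10, 20, 30, 40, 60))

# Split the rows by every observed combination of the design variables ...
subsets <- split(df, df[c("region", "gender")], drop = TRUE)

# ... and apply a function to each subset, simplifying to a named vector.
medians <- sapply(subsets, function(x) median(x$eqIncome))
```

Here medians is a named numeric vector with one entry per combination of region and gender, in the spirit of the simplified result of simSapply.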
Andreas Alfons
data(eusilcP)
eusilcP <- eusilcP[, c("region", "gender", "eqIncome")]

## returns data.frame
simApply(eusilcP, c("region", "gender"),
    function(x) median(x$eqIncome))

## returns vector
simSapply(eusilcP, c("region", "gender"),
    function(x) median(x$eqIncome))
Generic function for producing box-and-whisker plots.
simBwplot(x, ...)

## S4 method for signature 'SimResults'
simBwplot(x, true = NULL, epsilon, NArate, select, ...)
x | the object to be plotted. For plotting simulation results, this must be an object of class |
true | a numeric vector giving the true values. If supplied, reference lines are drawn in the corresponding panels. |
epsilon | a numeric vector specifying contamination levels. If supplied, the values corresponding to these contamination levels are extracted from the simulation results and plotted. |
NArate | a numeric vector specifying missing value rates. If supplied, the values corresponding to these missing value rates are extracted from the simulation results and plotted. |
select | a character vector specifying the columns to be plotted. It must be a subset of the |
... | additional arguments to be passed down to methods and eventually to |
For simulation results with multiple contamination levels or missing value rates, conditional box-and-whisker plots are produced.
An object of class "trellis". The update method can be used to update components of the object and the print method (usually called by default) will plot it on an appropriate plotting device.
x = "SimResults": produce box-and-whisker plots of simulation results.
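The following sketch shows what a conditional box-and-whisker plot with a reference line looks like when built directly with lattice, and how the resulting "trellis" object can be modified with update. The data frame res and its columns are made up for illustration, and the panel function is an assumption about one way to draw a reference line, not the code used by simBwplot.

```r
library(lattice)

set.seed(1)
# Made-up long-format results: two estimators at two contamination levels.
res <- data.frame(
  epsilon = rep(c(0, 0.02), each = 100),
  method  = factor(rep(rep(c("mean", "trimmed"), each = 50), times = 2)),
  value   = rnorm(200)
)

# One panel per contamination level, one box per estimator,
# with a dashed reference line at the (here known) true value 0.
p <- bwplot(value ~ method | factor(epsilon), data = res,
            panel = function(...) {
              panel.bwplot(...)
              panel.abline(h = 0, lty = 2)
            })

# The object is only drawn when printed; update() returns a modified copy.
p2 <- update(p, main = "Simulation results")
```

Calling print(p2) would then render the plot on the active graphics device.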
Functionality for producing conditional box-and-whisker plots was added in version 0.2. Prior to that, the function gave an error message if simulation results with multiple contamination levels or missing value rates were supplied.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
simDensityplot, simXyplot, bwplot, "SimResults"
#### design-based simulation

set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}

## run simulation
results <- runSimulation(eusilcP,
    sc, contControl = cc, fun = sim)

## plot results
tv <- mean(eusilcP$eqIncome)  # true population mean
simBwplot(results, true = tv)


#### model-based simulation

set.seed(12345)  # for reproducibility

## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}

## run simulation
results <- runSimulation(dc, nrep = 50,
    contControl = cc, design = "group", fun = sim)

## plot results
simBwplot(results, true = means)
Class for controlling how simulation runs are performed.
Objects can be created by calls of the form new("SimControl", ...) or SimControl(...).
contControl: Object of class "OptContControl"; a control object for contamination, or NULL.
NAControl: Object of class "OptNAControl"; a control object for inserting missing values, or NULL.
design: Object of class "character" specifying variables (columns) to be used for splitting the data into domains. The simulations, including contamination and the insertion of missing values (unless SAE=TRUE), are then performed on every domain.
fun: Object of class "function" to be applied in each simulation run.
dots: Object of class "list" containing additional arguments to be passed to fun.
SAE: Object of class "logical" indicating whether small area estimation will be used in the simulation experiment.
There are some requirements for fun. It must return a numeric vector, or a list with the two components values (a numeric vector) and add (additional results of any class, e.g., statistical models). Note that the latter is computationally slightly more expensive. A data.frame is passed to fun in every simulation run. The corresponding argument must be called x. If comparisons with the original data need to be made, e.g., for evaluating the quality of imputation methods, the function should have an argument called orig. If different domains are used in the simulation, the indices of the current domain can be passed to the function via an argument called domain.
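The requirements on fun can be illustrated with a few small functions in plain R. The function names and the column value are hypothetical, and treating domain as a vector of row indices is an assumption based on the description above.

```r
# Variant 1: return a named numeric vector.
sim_vec <- function(x) {
  c(mean = mean(x$value), median = median(x$value))
}

# Variant 2: return a list with 'values' (numeric vector) and
# 'add' (any additional result, here a fitted model).
sim_list <- function(x) {
  fit <- lm(value ~ 1, data = x)
  list(values = c(intercept = unname(coef(fit))), add = fit)
}

# Variant 3: compare with the original data through an argument 'orig'.
sim_orig <- function(x, orig) {
  c(bias = mean(x$value, na.rm = TRUE) - mean(orig$value))
}

# Variant 4: restrict the computation to the current domain,
# assuming 'domain' holds row indices into x.
sim_domain <- function(x, domain) {
  c(mean = mean(x$value[domain]))
}

# Quick check on toy data, outside of any simulation.
toy <- data.frame(value = c(1, 2, 3, 10))
r_vec  <- sim_vec(toy)
r_list <- sim_list(toy)
r_dom  <- sim_domain(toy, domain = 1:2)
```

In each variant the first argument is called x, which is the name the framework expects for the data passed in every run.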
For small area estimation, the following points have to be kept in mind. The design for splitting the data must be supplied and SAE must be set to TRUE. However, the data are not actually split into the specified domains. Instead, the whole data set (sample) is passed to fun. Also contamination and missing values are added to the whole data (sample). Last, but not least, the function must have a domain argument so that the current domain can be extracted from the whole data (sample).
In every simulation run, fun is evaluated using try. Hence no results are lost if computations fail in any of the simulation runs.
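The effect of evaluating fun with try can be sketched as follows. The helper run_safely is hypothetical, and returning NULL for a failed run is an assumption made for this illustration; how the package records failed runs is not described here.

```r
# Hypothetical helper: evaluate 'fun' on each data set, surviving errors.
run_safely <- function(fun, data_list) {
  lapply(data_list, function(x) {
    res <- try(fun(x), silent = TRUE)
    # A failed run yields an object of class "try-error" instead of stopping.
    if (inherits(res, "try-error")) NULL else res
  })
}

# A function that fails whenever the data contain missing values.
strict_mean <- function(x) {
  if (anyNA(x$value)) stop("missing values")
  c(mean = mean(x$value))
}

runs <- list(data.frame(value = 1:3),
             data.frame(value = c(1, NA)),   # this run fails
             data.frame(value = 4:6))
out <- run_safely(strict_mean, runs)
```

The second run fails, but the results of the first and third runs are still available in out.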
getContControl signature(x = "SimControl"): get slot contControl.
setContControl signature(x = "SimControl"): set slot contControl.
getNAControl signature(x = "SimControl"): get slot NAControl.
setNAControl signature(x = "SimControl"): set slot NAControl.
getDesign signature(x = "SimControl"): get slot design.
setDesign signature(x = "SimControl"): set slot design.
getFun signature(x = "SimControl"): get slot fun.
setFun signature(x = "SimControl"): set slot fun.
getDots signature(x = "SimControl"): get slot dots.
setDots signature(x = "SimControl"): set slot dots.
getSAE signature(x = "SimControl"): get slot SAE.
setSAE signature(x = "SimControl"): set slot SAE.
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
head signature(x = "SimControl"): currently returns the object itself.
runSimulation signature(x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "data.frame", setup = "SampleSetup", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "data.frame", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "data.frame", setup = "missing", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "missing", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "missing", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
show signature(object = "SimControl"): print the object on the R console.
summary signature(object = "SimControl"): currently returns the object itself.
tail signature(x = "SimControl"): currently returns the object itself.
A slightly simplified UML class diagram of the framework can be found in Figure 1 of the package vignette An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Use vignette("simFrame-intro") to view this vignette.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
#### design-based simulation

set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}

## combine these to "SimControl" object and run simulation
ctrl <- SimControl(contControl = cc, fun = sim)
results <- runSimulation(eusilcP, sc, control = ctrl)

## explore results
head(results)
aggregate(results)
tv <- mean(eusilcP$eqIncome)  # true population mean
plot(results, true = tv)


#### model-based simulation

set.seed(12345)  # for reproducibility

## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}

## combine these to "SimControl" object and run simulation
ctrl <- SimControl(contControl = cc, design = "group", fun = sim)
results <- runSimulation(dc, nrep = 50, control = ctrl)

## explore results
head(results)
aggregate(results)
plot(results, true = means)
Generic function for producing kernel density plots.
simDensityplot(x, ...)

## S4 method for signature 'SimResults'
simDensityplot(x, true = NULL, epsilon, NArate, select, ...)
x | the object to be plotted. For plotting simulation results, this must be an object of class |
true | a numeric vector giving the true values. If supplied, reference lines are drawn in the corresponding panels. |
epsilon | a numeric vector specifying contamination levels. If supplied, the values corresponding to these contamination levels are extracted from the simulation results and plotted. |
NArate | a numeric vector specifying missing value rates. If supplied, the values corresponding to these missing value rates are extracted from the simulation results and plotted. |
select | a character vector specifying the columns to be plotted. It must be a subset of the |
... | additional arguments to be passed down to methods and eventually to |
For simulation results with multiple contamination levels or missing value rates, conditional kernel density plots are produced.
An object of class "trellis". The update method can be used to update components of the object and the print method (usually called by default) will plot it on an appropriate plotting device.
x = "SimResults": produce kernel density plots of simulation results.
Functionality for producing conditional kernel density plots was added in version 0.2. Prior to that, the function gave an error message if simulation results with multiple contamination levels or missing value rates were supplied.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
simBwplot, simXyplot, densityplot, "SimResults"
#### design-based simulation

set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome", epsilon = 0.02,
    fun = function(x) x * 25)

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.02))
}

## run simulation
results <- runSimulation(eusilcP,
    sc, contControl = cc, fun = sim)

## plot results
tv <- mean(eusilcP$eqIncome)  # true population mean
simDensityplot(results, true = tv)


#### model-based simulation

set.seed(12345)  # for reproducibility

## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = 0.02, dots = list(mean = 15))

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.02),
        median = median(x$value))
}

## run simulation
results <- runSimulation(dc, nrep = 50,
    contControl = cc, design = "group", fun = sim)

## plot results
simDensityplot(results, true = means)
Class for simulation results.
Objects can be created by calls of the form new("SimResults", ...) or SimResults(...).
However, objects are expected to be created by the function runSimulation or clusterRunSimulation; these constructor functions are not supposed to be called by the user.
values: Object of class "data.frame" containing the simulation results.
add: Object of class "list" containing additional simulation results, e.g., statistical models.
design: Object of class "character" giving the variables (columns) defining the domains used in the simulation experiment.
colnames: Object of class "character" giving the names of the columns of values that contain the actual simulation results.
epsilon: Object of class "numeric" containing the contamination levels used in the simulation experiment.
NArate: Object of class "NumericMatrix" containing the missing value rates used in the simulation experiment.
dataControl: Object of class "OptDataControl"; the control object used for data generation in model-based simulation, or NULL.
sampleControl: Object of class "OptSampleControl"; the control object used for sampling in design-based simulation, or NULL.
nrep: Object of class "numeric" giving the number of repetitions of the simulation experiment (for model-based simulation or simulation based on real data).
control: Object of class "SimControl"; the control object used for running the simulations.
seed: Object of class "list" containing the seeds of the random number generator before and after the simulation experiment, respectively (for replication of the results).
call: Object of class "SimCall"; the function call used to run the simulation experiment, or NULL.
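The purpose of storing the random number generator state before and after an experiment can be shown in plain R. This is a sketch of the general idea using .Random.seed from the global environment; how the package records and uses the seed slot internally is not shown here and is not assumed.

```r
set.seed(12345)

# State of the random number generator before the "experiment".
seed_before <- get(".Random.seed", envir = globalenv())
first_run <- rnorm(3)
# State afterwards, e.g. for continuing the random number stream later.
seed_after <- get(".Random.seed", envir = globalenv())

# Keep both states together, mirroring the before/after idea.
seed <- list(seed_before, seed_after)

# Restoring the first state reproduces the experiment exactly.
assign(".Random.seed", seed[[1]], envir = globalenv())
second_run <- rnorm(3)
```

After the replay, first_run and second_run are identical, and the generator is back in the state stored as the second list element.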
getValues
signature(x = "SimResults")
: get slot
values
.
getAdd
signature(x = "SimResults")
: get slot
add
.
getDesign
signature(x = "SimResults")
: get slot
design
.
getColnames
signature(x = "SimResults")
: get slot
colnames
.
getEpsilon
signature(x = "SimResults")
: get slot
epsilon
.
getNArate
signature(x = "SimResults")
: get slot
NArate
.
getDataControl
signature(x = "SimResults")
: get slot
dataControl
.
getSampleControl
signature(x = "SimResults")
: get slot
sampleControl
.
getNrep
signature(x = "SimResults")
: get slot
nrep
.
getControl
signature(x = "SimResults")
: get slot
control
.
getSeed
signature(x = "SimResults")
: get slot
seed
.
getCall
signature(x = "SimResults")
: get slot
call
.
aggregate
signature(x = "SimResults")
: aggregate
simulation results.
head
signature(x = "SimResults")
: returns the first
parts of simulation results.
plot
signature(x = "SimResults", y = "missing")
:
selects a suitable graphical representation of the simulation results
automatically.
show
signature(object = "SimResults")
: print
simulation results on the R console.
simBwplot
signature(x = "SimResults")
: conditional
box-and-whisker plot of simulation results.
simDensityplot
signature(x = "SimResults")
:
conditional kernel density plot of simulation results.
simXyplot
signature(x = "SimResults")
: conditional x-y
plot of simulation results.
summary
signature(object = "SimResults")
: produce a summary
of simulation results.
tail
signature(x = "SimResults")
: returns the last
parts of simulation results.
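As an illustration of the read-only accessors listed above, the following sketch runs a small design-based simulation modelled on the package's own examples and then reads a few slots. The sample size, number of samples and contamination levels are arbitrary choices made here for speed, and the whole block is guarded so that it does nothing if simFrame is not installed.

```r
# Sketch only: guarded so that it is a no-op without simFrame.
if (requireNamespace("simFrame", quietly = TRUE)) {
  library(simFrame)
  set.seed(12345)
  data(eusilcP)

  # Small design-based simulation (sizes chosen arbitrarily for speed).
  sc <- SampleControl(size = 100, k = 10)
  cc <- DARContControl(target = "eqIncome", epsilon = c(0, 0.05),
                       fun = function(x) x * 25)
  sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, trim = 0.05))
  }
  results <- runSimulation(eusilcP, sc, contControl = cc, fun = sim)

  # Slots are read through accessors; there are no setters for this class.
  print(head(getValues(results)))  # data.frame with the simulation results
  print(getColnames(results))      # columns holding the actual results
  print(getEpsilon(results))       # contamination levels that were used
}
```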
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
There are no mutator methods available since the slots are not supposed to be changed by the user.
Furthermore, the slots dataControl
, sampleControl
, nrep
and control
were added in version 0.3.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
runSimulation
, simBwplot
,
simDensityplot
, simXyplot
showClass("SimResults")
A convenience wrapper for setting up multiple samples using setup
with control class SampleControl
.
simSample(x, design = character(), grouping = character(), collect = FALSE, fun = srs, size = NULL, prob = NULL, ..., k = 1)
x | the |
design | a character, logical or numeric vector specifying variables (columns) to be used for stratified sampling. |
grouping | a character string, single integer or logical vector specifying a grouping variable (column) to be used for sampling whole groups rather than individual observations. |
collect | logical; if a grouping variable is specified and this is |
fun | a function to be used for sampling (defaults to |
size | an optional non-negative integer giving the number of items (observations or groups) to sample. For stratified sampling, a vector of non-negative integers, each giving the number of items to sample from the corresponding stratum. |
prob | an optional numeric vector giving the probability weights, or a character string or logical vector specifying a variable (column) that contains the probability weights. |
... | additional arguments to be passed to |
k | a single positive integer giving the number of samples to be set up. |
There are some restrictions on the argument names of the function
supplied to fun
. If it needs population data as input,
the corresponding argument should be called x
and should expect
a data.frame
. If the sampling method only needs the population size
as input, the argument should be called N
. Note that fun
is
not expected to have both x
and N
as arguments, and that the
latter is much faster for stratified sampling or group sampling.
Furthermore, if the function has arguments for sample size and probability
weights, they should be called size
and prob
, respectively.
Note that a function with prob
as its only argument is perfectly valid
(for probability proportional to size sampling). Further arguments of
fun
may be passed directly via the ... argument.
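The naming rules above can be illustrated with two user-written samplers. Both functions (`mySrs`, `mySys`) are hypothetical helpers written for this sketch, not part of simFrame: the first only needs the population size and therefore uses `N`, the second needs the population data and therefore uses `x`. The call to `simSample` is guarded and only runs if the package is installed.

```r
# Hypothetical sampler that only needs the population size:
# arguments are named N and size, as required above.
mySrs <- function(N, size) sample.int(N, size)

# Hypothetical systematic sampler that needs the population data:
# the data argument is named x and is expected to be a data.frame.
mySys <- function(x, size) {
  N <- nrow(x)
  step <- N / size
  start <- runif(1, 0, step)
  idx <- floor(start + step * (seq_len(size) - 1)) + 1
  pmin(idx, N)  # guard against floating-point overshoot
}

# Passing a custom sampler via the fun argument (runs only with simFrame).
if (requireNamespace("simFrame", quietly = TRUE)) {
  library(simFrame)
  data(eusilcP)
  set1 <- simSample(eusilcP, fun = mySrs, size = 20, k = 2)
  print(summary(set1))
}
```

A function defining both `x` and `N` would be ambiguous under these rules, which is why each sampler declares only one of them.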
An object of class "SampleSetup"
.
Andreas Alfons
setup
, "SampleControl"
,
"SampleSetup"
data(eusilcP)

## simple random sampling
srss <- simSample(eusilcP, size = 20, k = 4)
summary(srss)
draw(eusilcP[, c("id", "eqIncome")], srss, i = 1)

## group sampling
gss <- simSample(eusilcP, grouping = "hid", size = 10, k = 4)
summary(gss)
draw(eusilcP[, c("hid", "id", "eqIncome")], gss, i = 2)

## stratified simple random sampling
ssrss <- simSample(eusilcP, design = "region",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(ssrss)
draw(eusilcP[, c("id", "region", "eqIncome")], ssrss, i = 3)

## stratified group sampling
sgss <- simSample(eusilcP, design = "region", grouping = "hid",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(sgss)
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgss, i = 4)
Generic function for producing x-y plots. For simulation results, the average results are plotted against the corresponding contamination levels or missing value rates.
simXyplot(x, ...)

## S4 method for signature 'SimResults'
simXyplot(x, true = NULL, epsilon, NArate, select,
    cond = c("Epsilon", "NArate"), average = c("mean", "median"), ...)
x | the object to be plotted. For plotting simulation results, this must be an object of class |
true | a numeric vector giving the true values. If supplied, reference lines are drawn in the corresponding panels. |
epsilon | a numeric vector specifying contamination levels. If supplied, the values corresponding to these contamination levels are extracted from the simulation results and plotted. |
NArate | a numeric vector specifying missing value rates. If supplied, the values corresponding to these missing value rates are extracted from the simulation results and plotted. |
select | a character vector specifying the columns to be plotted. It must be a subset of the |
cond | a character string; for simulation results with multiple contamination levels and multiple missing value rates, this specifies the column of the simulation results to be used for producing conditional x-y plots. If |
average | a character string specifying how the averages should be computed. Possible values are |
... | additional arguments to be passed down to methods and eventually to |
For simulation results with multiple contamination levels and multiple
missing value rates, conditional x-y plots are produced, as specified by
cond
.
An object of class "trellis"
. The
update
method can be used to update
components of the object and the print
method (usually called by default) will plot it on an appropriate plotting
device.
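Because the return value is a "trellis" object, it can be modified after creation with the lattice `update` method. The sketch below uses a plain lattice plot on a small made-up data frame (all names and numbers are placeholders for illustration) rather than actual simulation results; the same pattern should apply to an object returned by `simXyplot`.

```r
library(lattice)

# Toy data standing in for averaged simulation results (values are arbitrary).
d <- data.frame(
  eps    = rep(c(0, 0.01, 0.02), times = 2),
  value  = c(1.00, 1.30, 1.70, 1.00, 1.05, 1.10),
  method = rep(c("mean", "trimmed"), each = 3)
)

# Creating the plot returns a "trellis" object without drawing it.
p <- xyplot(value ~ eps, groups = method, data = d, type = "b")

# update() returns a modified copy; print() draws it on the current device.
p2 <- update(p, xlab = "Contamination level", main = "Average results")
print(p2)
```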
x = "SimResults"
produce x-y plots of simulation results.
Functionality for producing conditional x-y plots (including the argument
cond
) was added in version 0.2. Prior to that, the function gave an
error message if simulation results with multiple contamination levels and
multiple missing value rates were supplied.
The argument average
that specifies how the averages are computed
was added in version 0.1.2. Prior to that, the mean was always used.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
simBwplot
, simDensityplot
,
xyplot
, "SimResults"
#### design-based simulation
set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## control objects for sampling and contamination
sc <- SampleControl(size = 500, k = 50)
cc <- DARContControl(target = "eqIncome",
    epsilon = seq(0, 0.05, by = 0.01),
    fun = function(x) x * 25)

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$eqIncome), trimmed = mean(x$eqIncome, 0.05))
}

## run simulation
results <- runSimulation(eusilcP, sc, contControl = cc, fun = sim)

## plot results
tv <- mean(eusilcP$eqIncome)  # true population mean
simXyplot(results, true = tv)


#### model-based simulation
set.seed(12345)  # for reproducibility

## function for generating data
rgnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

## control objects for data generation and contamination
means <- c(0, 0.25)
dc <- DataControl(size = 500, distribution = rgnorm,
    dots = list(means = means))
cc <- DCARContControl(target = "value",
    epsilon = seq(0, 0.05, by = 0.01),
    dots = list(mean = 15))

## function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value),
        trimmed = mean(x$value, trim = 0.05),
        median = median(x$value))
}

## run simulation
results <- runSimulation(dc, nrep = 50, contControl = cc,
    design = "group", fun = sim)

## plot results
simXyplot(results, true = means)
Class containing strata information for a data set.
Objects can be created by calls of the form new("Strata", ...)
or
Strata(...)
.
However, objects are expected to be created by the function
stratify
; these constructor functions are not supposed to be
called by the user.
values
:Object of class "integer"
giving the stratum
number for each observation.
split
:Object of class "list"
; each list element
contains the indices of the observations belonging to the corresponding
stratum.
design
:Object of class "character"
giving the
variables (columns) defining the strata.
nr
:Object of class "integer"
giving the stratum
numbers.
legend
:Object of class "data.frame"
describing the
strata.
size
:Object of class "numeric"
giving the stratum
sizes.
call
:Object of class "OptCall"
; the function call
used to stratify the data, or NULL
.
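To make the relationship between the slots concrete, the following base R sketch computes analogous quantities by hand for a tiny made-up data frame with a single design variable. It is only meant to illustrate what `values`, `split`, `size`, `nr` and `legend` describe; the stratum numbering used by the package itself may differ, and the object names here are hypothetical.

```r
# Tiny made-up data set with one stratification variable.
df <- data.frame(region = c("A", "B", "A", "C", "B", "A"))
f <- factor(df$region)

values   <- as.integer(f)                 # stratum number of each observation
splitIdx <- split(seq_len(nrow(df)), f)   # observation indices per stratum
size     <- lengths(splitIdx)             # number of observations per stratum
nr       <- seq_along(splitIdx)           # stratum numbers
legend   <- data.frame(region = levels(f))  # description of the strata

print(values)
print(splitIdx)
print(size)
```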
getValues
signature(x = "Strata")
: get slot
values
.
getSplit
signature(x = "Strata")
: get slot
split
.
getDesign
signature(x = "Strata")
: get slot
design
.
getNr
signature(x = "Strata")
: get slot nr
.
getLegend
signature(x = "Strata")
: get slot
legend
.
getSize
signature(x = "Strata")
: get slot size
.
getCall
signature(x = "Strata")
: get slot call
.
head
signature(x = "Strata")
: returns the first parts
of strata information.
show
signature(object = "Strata")
: print strata
information on the R console.
simApply
signature(x = "data.frame", design = "Strata",
fun = "function")
: apply a function to subsets.
simSapply
signature(x = "data.frame", design = "Strata",
fun = "function")
: apply a function to subsets.
summary
signature(object = "Strata")
: produce a
summary of strata information.
tail
signature(x = "Strata")
: returns the last parts
of strata information.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
There are no mutator methods available since the slots are not supposed to be changed by the user.
Andreas Alfons
showClass("Strata")
Generic function for stratifying data.
stratify(x, design)
x | the |
design | a character, logical or numeric vector specifying the variables (columns) to be used for stratification. |
An object of class "Strata"
.
x = "data.frame", design = "BasicVector"
stratify data
according to the variables (columns) given by design
.
Andreas Alfons
"Strata"
data(eusilcP)
strata <- stratify(eusilcP, c("region", "gender"))
summary(strata)
Generic utility functions for stratifying data. These are useful if not all the
information of class "Strata"
is necessary.
getStrataLegend(x, design)

getStrataSplit(x, design, USE.NAMES = TRUE)

getStrataTable(x, design)

getStratumSizes(x, design, USE.NAMES = TRUE)

getStratumValues(x, design, split)
x | the |
design | a character, logical or numeric vector specifying the variables (columns) to be used for stratification. |
USE.NAMES | a logical indicating whether information about the strata should be used as |
split | an optional list in which each list element contains the indices of the observations belonging to the corresponding stratum (as returned by |
For getStrataLegend
, a data.frame
describing the strata.
For getStrataSplit
, a list in which each element contains the
indices of the observations belonging to the corresponding stratum.
For getStrataTable
, a data.frame
describing the strata
and containing the stratum sizes.
For getStratumSizes
, a numeric vector of the stratum sizes.
For getStratumValues
, a numeric vector giving the stratum number for
each observation.
get a data.frame
describing the strata, according to the variables specified by
design
.
get a list in which each
element contains the indices of the observations belonging to the
corresponding stratum, according to the variables specified by
design
.
get a data.frame
describing the strata and containing the stratum sizes, according to the
variables specified by design
.
get the stratum sizes for a list in
which each list element contains the indices of the observations belonging
to the corresponding stratum (as returned by getStrataSplit
).
get the stratum sizes of a
data set, according to the variables specified by design
.
get the
stratum number for each observation, according to the variables specified
by design
. A previously computed list in which each list element
contains the indices of the observations belonging to the corresponding
stratum (as returned by getStrataSplit
) speeds things up a bit.
get the
stratum number for each observation, according to the variables specified
by design
.
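The methods above suggest computing the split once and reusing it. A minimal sketch of that pattern, assuming simFrame is installed and that the list returned by `getStrataSplit` can be passed on as described (the block is guarded and does nothing otherwise):

```r
if (requireNamespace("simFrame", quietly = TRUE)) {
  library(simFrame)
  data(eusilcP)
  design <- c("region", "gender")

  # Compute the list of observation indices per stratum once ...
  spl <- getStrataSplit(eusilcP, design)

  # ... then reuse it for the stratum sizes and the stratum numbers.
  sizes <- getStratumSizes(spl)
  strataNr <- getStratumValues(eusilcP, design, spl)

  print(sizes)
  print(head(strataNr))
}
```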
Andreas Alfons
data(eusilcP)

## all data
getStrataLegend(eusilcP, c("region", "gender"))
getStrataTable(eusilcP, c("region", "gender"))
getStratumSizes(eusilcP, c("region", "gender"))

## small sample
sam <- draw(eusilcP, size = 25)
getStrataSplit(sam, "gender")
getStratumValues(sam, "gender")
Produce a summary of an object.
## S4 method for signature 'SampleSetup'
summary(object)

## S4 method for signature 'SimControl'
summary(object)

## S4 method for signature 'SimResults'
summary(object, ...)

## S4 method for signature 'Strata'
summary(object)

## S4 method for signature 'VirtualContControl'
summary(object)

## S4 method for signature 'VirtualDataControl'
summary(object)

## S4 method for signature 'VirtualNAControl'
summary(object)

## S4 method for signature 'VirtualSampleControl'
summary(object)
object | an object. |
... | additional arguments to be passed down to methods. |
The form of the resulting object depends on the class of the argument
object
. See the “Methods” section below for details.
signature(object = "SampleSetup")
returns an object of class
SummarySampleSetup
, which contains information on the size of each
of the set up samples.
signature(object = "SimControl")
currently returns the object itself.
signature(object = "SimResults")
produces a summary of the
simulation results by calling the method of summary
for the data.frame
in slot values
.
signature(object = "Strata")
returns a data.frame
containing
the size of each stratum.
signature(object = "VirtualContControl")
currently returns the object itself.
signature(object = "VirtualDataControl")
currently returns the object itself.
signature(object = "VirtualNAControl")
currently returns the object itself.
signature(object = "VirtualSampleControl")
currently returns the object itself.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
summary
, "SampleSetup"
,
"SummarySampleSetup"
, "SimResults"
,
"Strata"
## load data
data(eusilcP)

## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)

## class "Strata"
# stratify the data by region
strata <- stratify(eusilcP, "region")
summary(strata)
Class containing a summary of set up samples.
Objects can be created by calls of the form
new("SummarySampleSetup", ...)
or SummarySampleSetup(...)
.
However, objects are expected to be created by the summary
method for
class "SampleSetup"
; these constructor functions are not
supposed to be called by the user.
size
:Object of class "numeric"
giving the size of
each of the set up samples.
getSize
signature(x = "SummarySampleSetup")
: get slot
size
.
show
signature(object = "SummarySampleSetup")
: print a
summary of set up samples on the R console.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
There are no mutator methods available since the slots are not supposed to be changed by the user.
Andreas Alfons
showClass("SummarySampleSetup")
Return the last parts of an object.
## S4 method for signature 'SampleSetup'
tail(x, k = 6, n = 6, ...)

## S4 method for signature 'SimControl'
tail(x)

## S4 method for signature 'SimResults'
tail(x, ...)

## S4 method for signature 'Strata'
tail(x, ...)

## S4 method for signature 'VirtualContControl'
tail(x)

## S4 method for signature 'VirtualDataControl'
tail(x)

## S4 method for signature 'VirtualNAControl'
tail(x)

## S4 method for signature 'VirtualSampleControl'
tail(x)
x | an object. |
k | for objects of class |
n | for objects of class |
... | additional arguments to be passed down to methods. |
An object of the same class as x
, but in general smaller. See the
“Methods” section below for details.
signature(x = "SampleSetup")
returns the last parts of set up
samples. The last n
indices of each of the last k
set up
samples are kept.
signature(x = "SimControl")
currently returns the object itself.
signature(x = "SimResults")
returns the last parts of
simulation results. The method of tail
for the
data.frame
in slot values
is thereby called.
signature(x = "Strata")
returns the last parts of strata
information. The method of tail
for the vector in
slot values
is thereby called and the slots split
and
size
are adapted accordingly.
signature(x = "VirtualContControl")
currently returns the object itself.
signature(x = "VirtualDataControl")
currently returns the object itself.
signature(x = "VirtualNAControl")
currently returns the object itself.
signature(x = "VirtualSampleControl")
currently returns the object itself.
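The behaviour described for set up samples (the last `n` indices of each of the last `k` samples) can be mimicked in base R on a plain list of index vectors. This is only a sketch of the semantics, not the package's implementation; `indices` is a made-up stand-in for the indices stored in a "SampleSetup" object.

```r
set.seed(1)

# Stand-in for 8 set up samples, each holding 12 sampled indices.
indices <- replicate(8, sample.int(100, 12), simplify = FALSE)

k <- 3  # keep the last 3 samples
n <- 5  # keep the last 5 indices of each kept sample

# First take the last k list elements, then the last n entries of each.
lastParts <- lapply(utils::tail(indices, k), utils::tail, n)

print(lengths(lastParts))
```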
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
tail
, "SampleSetup"
,
"SimResults"
, "Strata"
## load data
data(eusilcP)

## class "SampleSetup"
# set up samples using group sampling
set <- setup(eusilcP, grouping = "hid", size = 1000, k = 50)
summary(set)
# get the last 10 indices of each of the last 5 samples
tail(set, k = 5, n = 10)

## class "Strata"
# stratify the data by region
strata <- stratify(eusilcP, "region")
summary(strata)
# get strata information for the last 10 observations
tail(strata, 10)
Class for controlling the setup of samples using a two-stage procedure.
TwoStageControl(..., fun1 = srs, fun2 = srs, size1 = NULL,
    size2 = NULL, prob1 = NULL, prob2 = NULL, dots1 = list(),
    dots2 = list())
... | the slots for the new object (see below). |
fun1 | the function to be used for sampling in the first stage (the first list component of slot |
fun2 | the function to be used for sampling in the second stage (the second list component of slot |
size1 | the number of PSUs to sample in the first stage (the first list component of slot |
size2 | the number of items to sample in the second stage (the second list component of slot |
prob1 | the probability weights for the first stage (the first list component of slot |
prob2 | the probability weights for the second stage (the second list component of slot |
dots1 | additional arguments to be passed to the function for sampling in the first stage (the first list component of slot |
dots2 | additional arguments to be passed to the function for sampling in the second stage (the second list component of slot |
Objects can be created by calls of the form new("TwoStageControl", ...)
or via the constructor TwoStageControl
.
design
:Object of class "BasicVector"
specifying
variables (columns) to be used for stratified sampling in the first
stage.
grouping
:Object of class "BasicVector"
specifying
grouping variables (columns) to be used for sampling primary sampling
units (PSUs) and secondary sampling units (SSUs), respectively.
fun
:Object of class "list"
; a list of length two
containing the functions to be used for sampling in the first and second
stage, respectively (defaults to srs
for both stages). The
functions should return a vector containing the indices of the sampled
items.
size
:Object of class "list"
; a list of length two,
where each component contains an optional non-negative integer giving the
number of items to sample in the first and second stage, respectively.
In case of stratified sampling in the first stage, a vector of
non-negative integers, each giving the number of PSUs to sample from the
corresponding stratum, may be supplied. For the second stage, a vector
of non-negative integers giving the number of items to sample from each
PSU may be used.
prob
:Object of class "list"
; a list of length two,
where each component gives optional probability weights for the first and
second stage, respectively. Each component may thereby be a numerical
vector, or a character string or integer vector specifying a variable
(column) that contains the probability weights.
dots
:Object of class "list"
; a list of length two,
where each component is again a list containing additional arguments to
be passed to the corresponding function for sampling in fun
.
k
:Object of class "numeric"
; a single positive
integer giving the number of samples to be set up.
There are some restrictions on the argument names of the functions for
sampling in fun
. If the sampling method needs population data as
input, the corresponding argument should be called x
and should expect
a data.frame
. If it only needs the population size as input, the
argument should be called N
. Note that the function is not expected
to have both x
and N
as arguments, and that the latter is
typically much faster. Furthermore, if the function has arguments for sample
size and probability weights, they should be called size
and
prob
, respectively. Note that a function with prob
as its only
argument is perfectly valid (for probability proportional to size sampling).
Further arguments may be supplied as a list via the slot dots
.
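For readers unfamiliar with the two-stage procedure that this class controls, here is a base R sketch of the general idea: first sample primary sampling units (PSUs), then sample items within each selected PSU. It does not use the package; the population, the column names and the sizes are made up, and simple random sampling is assumed in both stages.

```r
set.seed(42)

# Made-up population: 10 PSUs with 6 items each.
pop <- data.frame(psu = rep(1:10, each = 6), id = 1:60)

size1 <- 3  # number of PSUs to sample in the first stage
size2 <- 2  # number of items to sample from each selected PSU

# Stage 1: simple random sample of PSUs.
psus <- sample(unique(pop$psu), size1)

# Stage 2: simple random sample of items within each selected PSU.
idx <- unlist(lapply(psus, function(p) {
  rows <- which(pop$psu == p)
  rows[sample.int(length(rows), size2)]
}))

sam <- pop[idx, ]
print(sam)
```

Indexing `rows` with `sample.int` avoids the pitfall of `sample()` treating a length-one vector as a range.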
Class "VirtualSampleControl"
, directly.
Class "OptSampleControl"
, by class "VirtualSampleControl", distance 2.
In addition to the accessor and mutator methods for the slots inherited from
"VirtualSampleControl"
, the following are available:
getDesign
signature(x = "TwoStageControl")
: get slot
design
.
setDesign
signature(x = "TwoStageControl")
: set slot
design
.
getGrouping
signature(x = "TwoStageControl")
: get slot
grouping
.
setGrouping
signature(x = "TwoStageControl")
: set slot
grouping
.
getCollect
signature(x = "TwoStageControl")
: get slot
collect
.
setCollect
signature(x = "TwoStageControl")
: set slot
collect
.
getFun
signature(x = "TwoStageControl")
: get slot
fun
.
setFun
signature(x = "TwoStageControl")
: set slot
fun
.
getSize
signature(x = "TwoStageControl")
: get slot
size
.
setSize
signature(x = "TwoStageControl")
: set slot
size
.
getProb
signature(x = "TwoStageControl")
: get slot
prob
.
setProb
signature(x = "TwoStageControl")
: set slot
prob
.
getDots
signature(x = "TwoStageControl")
: get slot
dots
.
setDots
signature(x = "TwoStageControl")
: set slot
dots
.
In addition to the methods inherited from
"VirtualSampleControl"
, the following are available:
clusterSetup
signature(cl = "ANY", x = "data.frame",
control = "TwoStageControl")
: set up multiple samples on a cluster.
setup
signature(x = "data.frame",
control = "TwoStageControl")
: set up multiple samples.
show
signature(object = "TwoStageControl")
: print the
object on the R console.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
"VirtualSampleControl"
,
"SampleControl"
, "SampleSetup"
,
setup
, draw
showClass("TwoStageControl")
Virtual superclass for controlling contamination in a simulation experiment.
A virtual Class: No objects may be created from it.
target
:Object of class "OptCharacter"
; a character
vector specifying the variables (columns) to be contaminated,
or NULL
to contaminate all variables (except the additional ones
generated internally).
epsilon
:Object of class "numeric"
giving the
contamination levels.
Class "OptContControl"
, directly.
getTarget
signature(x = "VirtualContControl")
: get
slot target
.
setTarget
signature(x = "VirtualContControl")
: set
slot target
.
getEpsilon
signature(x = "VirtualContControl")
: get
slot epsilon
.
setEpsilon
signature(x = "VirtualContControl")
: set
slot epsilon
.
head
signature(x = "VirtualContControl")
: currently
returns the object itself.
length
signature(x = "VirtualContControl")
: get the
number of contamination levels to be used.
show
signature(object = "VirtualContControl")
: print
the object on the R console.
summary
signature(object = "VirtualContControl")
:
currently returns the object itself.
tail
signature(x = "VirtualContControl")
: currently
returns the object itself.
A slightly simplified UML class diagram of the framework can be found in
Figure 1 of the package vignette An Object-Oriented Framework for
Statistical Simulation: The R Package simFrame
. Use
vignette("simFrame-intro")
to view this vignette.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"DCARContControl"
, "DARContControl"
,
"ContControl"
, contaminate
showClass("VirtualContControl")
Virtual superclass for controlling model-based generation of data.
A virtual Class: No objects may be created from it.
Class "OptDataControl"
, directly.
clusterRunSimulation
signature(cl = "ANY",
x = "VirtualDataControl", setup = "missing", nrep = "numeric",
control = "SimControl")
: run a simulation experiment on a cluster.
clusterRunSimulation
signature(cl = "ANY",
x = "VirtualDataControl", setup = "VirtualSampleControl",
nrep = "numeric", control = "SimControl")
: run a simulation experiment
on a cluster.
head
signature(x = "VirtualDataControl")
: currently
returns the object itself.
runSimulation
signature(x = "VirtualDataControl",
setup = "missing", nrep = "numeric", control = "SimControl")
: run a
simulation experiment.
runSimulation
signature(x = "VirtualDataControl",
setup = "missing", nrep = "missing", control = "SimControl")
: run a
simulation experiment.
runSimulation
signature(x = "VirtualDataControl",
setup = "VirtualSampleControl", nrep = "numeric",
control = "SimControl")
: run a simulation experiment.
runSimulation
signature(x = "VirtualDataControl",
setup = "VirtualSampleControl", nrep = "missing",
control = "SimControl")
: run a simulation experiment.
summary
signature(object = "VirtualDataControl")
:
currently returns the object itself.
tail
signature(x = "VirtualDataControl")
: currently
returns the object itself.
A slightly simplified UML class diagram of the framework can be found in Figure 1 of the package vignette An Object-Oriented Framework for Statistical Simulation: The R Package simFrame.
Use vignette("simFrame-intro") to view this vignette.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
showClass("VirtualDataControl")
Virtual superclass for controlling the insertion of missing values in a simulation experiment.
A virtual Class: No objects may be created from it.
target: Object of class "OptCharacter"; a character vector specifying the variables (columns) in which missing values should be inserted, or NULL to insert missing values in all variables (except the additional ones generated internally).
NArate: Object of class "NumericMatrix" giving the missing value rates, which may be selected individually for the target variables. In case of a vector, the same missing value rates are used for all target variables. In case of a matrix, on the other hand, the missing value rates to be used for each target variable are given by the respective column.
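The difference between supplying the rates as a vector and as a matrix can be sketched in base R. The helpers n_settings and rates_for and the column names are hypothetical; they only mirror the rules described in this section (one column per target variable, and the number of rates being the vector length or the number of matrix rows) and are not the package's internal code.

```r
# Target variables (hypothetical column names).
target <- c("income", "age")

# Vector: each rate is applied to every target variable.
rate_vec <- c(0.05, 0.10)

# Matrix: one column per target variable, one row per set of rates.
rate_mat <- cbind(income = c(0.05, 0.10, 0.20),
                  age    = c(0.01, 0.02, 0.04))

# Hypothetical helper: number of missing value rates to be used.
n_settings <- function(rate) {
  if (is.matrix(rate)) nrow(rate) else length(rate)
}

# Hypothetical helper: rates per target variable for the i-th setting.
rates_for <- function(rate, target, i) {
  if (is.matrix(rate)) {
    rate[i, target]
  } else {
    setNames(rep(rate[i], length(target)), target)
  }
}

n_settings(rate_vec)            # 2
n_settings(rate_mat)            # 3
rates_for(rate_vec, target, 2)  # 0.10 for both variables
rates_for(rate_mat, target, 2)  # 0.10 for income, 0.02 for age
```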
Class "OptNAControl", directly.
getTarget signature(x = "VirtualNAControl"): get slot target.
setTarget signature(x = "VirtualNAControl"): set slot target.
getNArate signature(x = "VirtualNAControl"): get slot NArate.
setNArate signature(x = "VirtualNAControl"): set slot NArate.
head signature(x = "VirtualNAControl"): currently returns the object itself.
length signature(x = "VirtualNAControl"): get the number of missing value rates to be used (the length in case of a vector or the number of rows in case of a matrix).
show signature(object = "VirtualNAControl"): print the object on the R console.
summary signature(object = "VirtualNAControl"): currently returns the object itself.
tail signature(x = "VirtualNAControl"): currently returns the object itself.
A slightly simplified UML class diagram of the framework can be found in Figure 1 of the package vignette An Object-Oriented Framework for Statistical Simulation: The R Package simFrame.
Use vignette("simFrame-intro") to view this vignette.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
showClass("VirtualNAControl")
Virtual superclass for controlling the setup of samples.
A virtual Class: No objects may be created from it.
k: Object of class "numeric", a single positive integer giving the number of samples to be set up.
Class "OptSampleControl", directly.
getK signature(x = "VirtualSampleControl"): get slot k.
setK signature(x = "VirtualSampleControl"): set slot k.
clusterRunSimulation signature(cl = "ANY", x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment on a cluster.
clusterRunSimulation signature(cl = "ANY", x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment on a cluster.
draw signature(x = "data.frame", setup = "VirtualSampleControl"): draw a sample.
head signature(x = "VirtualSampleControl"): currently returns the object itself.
length signature(x = "VirtualSampleControl"): get the number of samples to be set up.
runSimulation signature(x = "data.frame", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "numeric", control = "SimControl"): run a simulation experiment.
runSimulation signature(x = "VirtualDataControl", setup = "VirtualSampleControl", nrep = "missing", control = "SimControl"): run a simulation experiment.
show signature(object = "VirtualSampleControl"): print the object on the R console.
summary signature(object = "VirtualSampleControl"): currently returns the object itself.
tail signature(x = "VirtualSampleControl"): currently returns the object itself.
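The methods listed above might be used as in the following sketch, which works with the concrete subclass "SampleControl" since the virtual class cannot be instantiated. It assumes that simFrame is installed and that the SampleControl() constructor accepts size and k as arguments; the population pop is hypothetical toy data. The code is skipped when the package is not available.

```r
# Runs only if simFrame is installed; the constructor arguments are assumed
# to match the installed version of the package.
if (requireNamespace("simFrame", quietly = TRUE)) {
  library(simFrame)

  # Hypothetical toy population of 100 units.
  set.seed(1)
  pop <- data.frame(id = 1:100, y = rnorm(100))

  # Control object for simple random sampling: 4 samples of size 10.
  sc <- SampleControl(size = 10, k = 4)

  k_slot <- getK(sc)    # number of samples, read from slot k
  k_len  <- length(sc)  # the same quantity through the length() method

  # Draw a single sample directly from the control object.
  smp <- draw(pop, sc)
  nrow(smp)
}
```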
A slightly simplified UML class diagram of the framework can be found in Figure 1 of the package vignette An Object-Oriented Framework for Statistical Simulation: The R Package simFrame.
Use vignette("simFrame-intro") to view this vignette.
Andreas Alfons
Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1–36. doi:10.18637/jss.v037.i03.
"SampleControl", "TwoStageControl", "SampleSetup", setup, draw
showClass("VirtualSampleControl")