Title: | One-Dimensional Probability Distribution Support for the 'spatstat' Family |
---|---|
Description: | Estimation of one-dimensional probability distributions including kernel density estimation, weighted empirical cumulative distribution functions, Kaplan-Meier and reduced-sample estimators for right-censored data, heat kernels, kernel properties, quantiles and integration. |
Authors: | Adrian Baddeley [aut, cre, cph] , Tilman M. Davies [aut, ctb, cph] , Martin L. Hazelton [aut, ctb, cph] , Ege Rubak [aut, cph] , Rolf Turner [aut, cph] , Greg McSwiggan [ctb, cph] |
Maintainer: | Adrian Baddeley <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.1-1 |
Built: | 2025-01-04 04:10:06 UTC |
Source: | https://github.com/spatstat/spatstat.univar |
The spatstat.univar package belongs to the spatstat family of packages. It provides utilities for estimating the probability distribution of one-dimensional (real-valued) data.
This package is a member of the spatstat family of packages. It provides utilities for estimation of the probability distribution of one-dimensional (i.e. numerical, real-valued) data. The utilities include:
including variable-bandwidth kernels, boundary correction, bandwidth selection, unnormalised weighted densities, and cumulative distribution functions of density estimates.
including weighted empirical cumulative distributions, weighted median, weighted quantiles, calculating the CDF from a density estimate
including Kaplan-Meier, reduced-sample and other estimators of the cumulative distribution function and hazard function from right-censored data
including calculation of quantiles from an empirical cumulative distribution or a kernel density estimate
including calculation of the probability density, cumulative distribution function, quantiles, random generation, moments and partial moments of the standard smoothing kernels
calculation of the one-dimensional heat kernel in an interval
Numerical integration including Stieltjes integrals and indefinite integrals.
The facilities are described in more detail below.
Kernel density estimation
The package supports fixed-bandwidth and variable-bandwidth kernel estimation of probability densities from numerical data. It provides boundary corrections for kernel estimates of densities on the positive half-line (applicable when the original observations are positive numbers) for both fixed-bandwidth and variable-bandwidth estimates.
If the observations have numerical weights associated with them,
these weights will not be automatically normalised, and indeed
the weights may be negative or zero. This is
unlike the standard R method density.default
.
The main functions are:
unnormdensity
|
extension of density.default
allowing weights to be negative or zero.
|
densityBC
|
fixed-bandwidth kernel estimate with optional boundary correction |
densityAdaptiveKernel
|
adaptive (variable-bandwidth) kernel estimation (generic) |
densityAdaptiveKernel.default
|
adaptive (variable-bandwidth) kernel estimate (method for numeric data, with optional boundary correction) |
bw.abram.default
|
calculate data-dependent bandwidths using Abramson rule |
CDF.density |
cumulative distribution function from kernel density estimate |
Weighted distributions and weighted statistics
Weighted versions of standard operations such as the histogram and empirical distribution function are provided:
whist |
weighted histogram |
ewcdf |
weighted empirical cumulative distribution function |
mean.ewcdf |
mean of weighted ecdf |
quantile.ewcdf |
quantiles of weighted ecdf |
knots.ewcdf |
jump points of weighted ecdf |
weighted.median |
weighted median of numeric values |
weighted.quantile |
weighted quantile of numeric values |
Estimation for right-censored data
Facilities are provided for estimating the probability distribution of right-censored lifetimes (non-negative real random variables).
kaplan.meier |
Kaplan-Meier estimator of cumulative distribution function and hazard rate, from right-censored data |
reduced.sample |
reduced-sample estimator of cumulative distribution function, from right-censored data |
Quantiles
Facilities are provided for computing the quantiles of a probability distribution, given estimates of the probability density or the cumulative distribution function and so on.
CDF.density |
cumulative distribution function from kernel density estimate |
quantile.density |
quantiles of kernel density estimate |
quantile.ewcdf |
quantiles of weighted ecdf |
quantilefun |
quantiles as a function |
quantilefun.ewcdf |
quantiles as a function |
weighted.quantile |
weighted quantile of numeric values |
transformquantiles |
transform the quantiles of a dataset |
Kernels
The standard R function density.default
recognises a list of smoothing kernels by name:
"gaussian"
, "rectangular"
, "triangular"
,
"epanechnikov"
, "biweight"
, "cosine"
and "optcosine"
. For these kernels, spatstat.univar
provides various characteristics:
dkernel
|
probability density of the kernel |
pkernel
|
cumulative distribution function of the kernel |
qkernel
|
quantiles of the kernel |
rkernel
|
generate simulated realisations from the kernel |
kernel.factor
|
scale factor relating bandwidth to half-width of kernel |
kernel.moment
|
partial moment of kernel |
kernel.squint
|
integral of squared kernel |
dkernelBC
|
evaluate the kernel with boundary correction |
Heat kernels
The heat kernel in an interval can be calculated.
hotrod
|
calculate the heat kernel in an interval |
Integration
A few facilities are provided for calculating integrals of real functions.
indefinteg |
indefinite integral |
integral.density |
integral of a kernel density estimate |
stieltjes |
Stieltjes integral |
Utilities
A few utilities for numerical data are also provided.
uniquemap.default |
map duplicates to unique entries |
rounding.default |
determine whether values have been rounded |
firstdigit |
leading digit in decimal representation |
lastdigit |
least significant digit in decimal representation |
ndigits |
number of digits in decimal representation |
This library and its documentation are usable under the terms of the "GNU General Public License", a copy of which is distributed with the package.
Adrian Baddeley [email protected], Tilman Davies [email protected], Martin Hazelton [email protected], Ege Rubak [email protected], Rolf Turner [email protected] and Greg McSwiggan.
Computes adaptive smoothing bandwidths according to the inverse-square-root rule of Abramson (1982).
bw.abram(X, h0, ...)
bw.abram(X, h0, ...)
X |
Data to be smoothed. |
h0 |
Global smoothing bandwidth. A numeric value. |
... |
Additional arguments passed to methods. |
This function computes adaptive smoothing bandwidths for a dataset, using the methods of Abramson (1982) and Hall and Marron (1988).
The function bw.abram
is generic.
There is a default method bw.abram.default
.
The spatstat package family includes methods for spatial objects.
See the documentation for the particular method.
Adrian Baddeley [email protected].
Abramson, I. (1982) On bandwidth variation in kernel estimates — a square root law. Annals of Statistics, 10(4), 1217-1223.
Hall, P. and Marron, J.S. (1988) Variable window width kernel density estimates of probability densities. Probability Theory and Related Fields, 80, 37-49.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
Computes adaptive smoothing bandwidths for numeric data, according to the inverse-square-root rule of Abramson (1982).
## Default S3 method: bw.abram(X, h0, ..., at = c("data", "grid"), pilot = NULL, hp = h0, trim = 5, smoother = density.default)
## Default S3 method: bw.abram(X, h0, ..., at = c("data", "grid"), pilot = NULL, hp = h0, trim = 5, smoother = density.default)
X |
Data for which bandwidths should be calculated. A numeric vector. |
h0 |
A scalar value giving the global smoothing bandwidth
in the same units as |
... |
Arguments passed to |
at |
Character string (partially matched) specifying whether to
compute bandwidth values only at the data points of |
pilot |
Optional. Specification of a pilot density (possibly unnormalised).
Either a numeric vector giving the pilot density for each
data point of |
hp |
Optional. A scalar pilot bandwidth, used for estimation
of the pilot density, if |
trim |
A trimming value required to curb excessively large bandwidths. See Details. The default is sensible in most cases. |
smoother |
Smoother for the pilot.
A function or character string, specifying the function
to be used to compute the pilot estimate when
|
This function computes adaptive smoothing bandwidths using the methods of Abramson (1982) and Hall and Marron (1988).
The function bw.abram
is generic. The function
bw.abram.default
documented here is the default method
which is designed for numeric data.
If at="data"
(the default) a smoothing bandwidth is
computed for each data point in X
. Alternatively if
at="grid"
a smoothing bandwidth is computed for
a grid of x
values.
Under the Abramson-Hall-Marron rule, the bandwidth at location is
where is a pilot estimate of the
probability density. The variable bandwidths are rescaled by
, the
geometric mean of the
terms evaluated at the
data; this allows the global bandwidth
h0
to be considered on
the same scale as a corresponding fixed bandwidth. The trimming value
trim
has the same interpretation as the required ‘clipping’ of
the pilot density at some small nominal value (see Hall and Marron,
1988), to necessarily prevent extreme bandwidths (which
can occur at very isolated observations).
The pilot density or intensity is determined as follows:
If pilot
is a function
in the R language,
this is taken as the pilot density.
If pilot
is a probability density estimate
(object of class "density"
produced by
density.default
) then this is taken as the
pilot density.
If pilot
is NULL
, then the pilot intensity is
computed as a fixed-bandwidth kernel
intensity estimate using density.default
applied to
the data X
using
the pilot bandwidth hp
.
In each case the pilot density is renormalised to become a probability density, and then the Abramson rule is applied.
Instead of calculating the pilot as a fixed-bandwidth density
estimate, the user can specify another density estimation procedure
using the argument smoother
. This should be either a function
or the character string name of a function. It will replace
density.default
as the function used to calculate the
pilot estimate. The pilot estimate will be computed as
smoother(X, sigma=hp, ...)
if pilot
is NULL
,
or smoother(pilot, sigma=hp, ...)
if pilot
is a point
pattern. If smoother
does not recognise the argument name
sigma
for the smoothing bandwidth, then hp
is effectively
ignored.
Either a numeric vector of the same length as X
giving the Abramson bandwidth for each point
(when at = "data"
, the default),
or a function
giving the Abramson bandwidths
as a function of location.
Tilman Davies [email protected]. Adapted by Adrian Baddeley [email protected].
Abramson, I. (1982) On bandwidth variation in kernel estimates — a square root law. Annals of Statistics, 10(4), 1217-1223.
Hall, P. and Marron, J.S. (1988) Variable window width kernel density estimates of probability densities. Probability Theory and Related Fields, 80, 37-49.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
xx <- rexp(20) bw.abram(xx)
xx <- rexp(20) bw.abram(xx)
Computes variable smoothing bandwidths intended to be proportional to the observed data values, raised to a given power.
bw.pow(X, h0, POW = 0.75, trim = 5, ...)
bw.pow(X, h0, POW = 0.75, trim = 5, ...)
X |
Data for which bandwidths should be calculated. A numeric vector of positive values. |
h0 |
A scalar value giving the global smoothing bandwidth
in the same units as |
POW |
Numeric value. The exponent of the power transformation to
be applied to |
trim |
A trimming value required to curb excessively large bandwidths. See Details. The default is sensible in most cases. |
... |
Ignored. |
This function computes adaptive smoothing bandwidths
for the data values in X
.
Larger data values are assigned larger bandwidths.
Bandwidths are proportional to X^POW
.
The bandwidth at location is
where is the geometric mean of the values
.
This allows the global bandwidth
h0
to be considered on
the same scale as a corresponding fixed bandwidth.
A numeric vector of the same length as X
.
Tilman Davies [email protected]. Adapted by Adrian Baddeley [email protected].
xx <- sort(rexp(10)) bb <- bw.pow(xx) signif(rbind(xx, bb), 3)
xx <- sort(rexp(10)) bb <- bw.pow(xx) signif(rbind(xx, bb), 3)
Use Taylor's non-random bootstrap technique to select the bandwidth for kernel density estimation on the real line.
bw.taylor(x, ..., srange = NULL, useC = TRUE)
bw.taylor(x, ..., srange = NULL, useC = TRUE)
x |
Numeric vector. |
... |
Ignored. |
srange |
Range of bandwidths to be considered. A numeric vector of length 2. |
useC |
Logical value specifying whether to use faster C code. |
This function selects a bandwidth for kernel density estimation
of a probability density on the real line,
using the numeric data x
and assuming a Gaussian kernel. The result
is the numeric value of the standard deviation of the Gaussian kernel.
The function uses the method of Taylor (1989) who showed that, when using the Gaussian kernel, the optimisation criterion can be computed rapidly from the data without any randomised resampling.
The domain of the probability density is assumed to be the entire real line. Boundary correction is not currently implemented.
The result of bw.taylor
is a single numeric value
giving the selected bandwidth.
A single numeric value.
Tilman Davies [email protected] and Adrian Baddeley [email protected].
Taylor, C.C. (1989) Choice of the Smoothing Parameter in Kernel Density Estimation, Biometrika 76 4, 705–712.
bw.nrd
in the stats package
for standard bandwidth selectors.
x <- rnorm(30) bw.taylor(x)
x <- rnorm(30) bw.taylor(x)
Given a kernel estimate of a probability density, compute the corresponding cumulative distribution function.
CDF(f, ...) ## S3 method for class 'density' CDF(f, ..., warn = TRUE)
CDF(f, ...) ## S3 method for class 'density' CDF(f, ..., warn = TRUE)
f |
Density estimate (object of class |
... |
Ignored. |
warn |
Logical value indicating whether to issue a warning if the
density estimate |
CDF
is generic, with a method for class "density"
.
This calculates the cumulative distribution function
whose probability density has been estimated and stored in the object
f
. The object f
must belong to the class "density"
,
and would typically have been obtained from a call to the function
density
.
A function, which can be applied to any numeric value or vector of values.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected]
b <- density(runif(10)) f <- CDF(b) f(0.5) plot(f)
b <- density(runif(10)) f <- CDF(b) f(0.5) plot(f)
Computes an adaptive estimate of probability density or intensity using a variable-bandwidth smoothing kernel.
densityAdaptiveKernel(X, ...)
densityAdaptiveKernel(X, ...)
X |
Data to be smoothed. |
... |
Additional arguments passed to methods. |
This generic function computes an adaptive kernel estimate of probability density or intensity.
The function densityAdaptiveKernel
is generic.
The spatstat package family includes methods for spatial objects.
See documentation for each method.
Adrian Baddeley [email protected] and Tilman Davies [email protected].
Computes an adaptive estimate of probability density from numeric data, using a variable-bandwidth smoothing kernel.
## Default S3 method: densityAdaptiveKernel(X, bw, ..., weights = NULL, zerocor=c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), at = c("grid", "data"), ngroups=Inf, fast=TRUE)
## Default S3 method: densityAdaptiveKernel(X, bw, ..., weights = NULL, zerocor=c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), at = c("grid", "data"), ngroups=Inf, fast=TRUE)
X |
Data to be smoothed. A numeric vector. |
bw |
Smoothing bandwidths. Either a numeric vector of the same length as
|
... |
Additional arguments passed to |
weights |
Optional. Numeric vector of weights attached to each value in |
zerocor |
Character string (partially matched) specifying a boundary
correction. This is appropriate when |
at |
String (partially matched) specifying whether to evaluate the
probability density only at the data points ( |
ngroups |
Integer, |
fast |
Logical value specifying whether to use the Fast Fourier Transform to accelerate computations, when appropriate. |
This function computes an adaptive kernel estimate of
probability density on the real line (if zerocor="none"
)
or on the positive real line (if zerocor
is another value).
The argument bw
specifies the smoothing bandwidths to be
applied to each of the points in X
. It may be a numeric vector
of bandwidth values, or a function yielding the
bandwidth values.
If the values in X
are
and the corresponding bandwidths are
then the adaptive kernel estimate of intensity at a location
is
where is the value at
of the (possibly edge-corrected) smoothing kernel with bandwidth
induced by a data point at
.
Exact computation of the estimate above can be time-consuming:
it takes times longer than fixed-bandwidth smoothing.
The partitioning method of Davies and Baddeley (2018)
accelerates this computation by partitioning the range of
bandwidths into ngroups
intervals,
correspondingly subdividing X
into
ngroups
subsets according to bandwidth,
and applying fixed-bandwidth smoothing to each subset.
If ngroups=NULL
then we use a default rule where ngroups
is the integer part of the square root of
the number of points in X
, so that the computation time is
only about times slower than fixed-bandwidth
smoothing. Any positive value of
ngroups
can be specified by the user. Specifying ngroups=Inf
enforces exact
computation of the estimate without partitioning. Specifying
ngroups=1
is the same as fixed-bandwidth smoothing with
bandwidth sigma=median(bw)
.
If at="data"
, a numeric vector of the same length as X
.
If at="grid"
, a probability density object of class "density"
.
The function densityAdaptiveKernel.default
computes one adaptive estimate of probability density,
determined by the smoothing bandwidth values bw
.
Typically the bandwidth values are computed by first computing
a pilot estimate of the intensity, then using bw.abram.default
to compute the vector of bandwidths according to Abramson's rule.
This involves specifying a global bandwidth h0
.
Adrian Baddeley [email protected] and Tilman Davies [email protected].
Davies, T.M. and Baddeley, A. (2018)
Fast computation of spatially adaptive kernel estimates.
Statistics and Computing, 28(4), 937-956.
Hall, P. and Marron, J.S. (1988)
Variable window width kernel density estimates of probability
densities.
Probability Theory and Related Fields, 80, 37-49.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
xx <- rexp(100, rate=5) plot(density(xx)) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="w")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="c")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="r")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="b")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="J")) curve(5 * exp(-5 * x), add=TRUE, col=3)
xx <- rexp(100, rate=5) plot(density(xx)) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="w")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="c")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="r")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="b")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="J")) curve(5 * exp(-5 * x), add=TRUE, col=3)
A simple implementation of fixed-bandwidth kernel density estimation on the real line, or the positive real half-line, including optional corrections for a boundary at zero.
densityBC(x, kernel = "epanechnikov", bw=NULL, ..., h=NULL, adjust = 1, weights = rep(1, length(x))/length(x), from, to = max(x), n = 256, zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), fast=FALSE, internal=list())
densityBC(x, kernel = "epanechnikov", bw=NULL, ..., h=NULL, adjust = 1, weights = rep(1, length(x))/length(x), from, to = max(x), n = 256, zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), fast=FALSE, internal=list())
x |
Numeric vector. |
kernel |
String specifying kernel.
Options are
|
bw , h
|
Alternative specifications of the scale factor for the kernel.
The bandwidth |
adjust |
Numeric value used to rescale the bandwidth |
weights |
Numeric vector of weights associated with |
from , to
|
Lower and upper limits of interval on which density should be
computed.
The default value of |
n |
Number of |
zerocor |
String (partially matched) specifying a correction for the boundary effect
bias at |
fast |
Logical value specifying whether to perform the calculation rapidly
using the Fast Fourier Transform ( |
internal |
Internal use only. |
... |
Additional arguments are ignored. |
If zerocor
is absent or given as "none"
,
this function computes the fixed bandwidth kernel estimator of the
probability density on the real line.
If zerocor
is given, it is assumed that the density
is confined to the positive half-line, and a boundary correction is
applied:
The contribution from each point
is weighted by the factor
where
is the total mass of the kernel centred on
that lies in the positive half-line, and
is the
cumulative distribution function of the kernel
The estimate of the density is
weighted by the factor
where
is given above.
if the kernel centred at data point
has a tail that lies on the negative half-line, this tail is
reflected onto the positive half-line.
The density estimate is computed using the
Boundary Kernel associated with the chosen kernel
(Wand and Jones, 1995, page 47).
That is, when estimating the density for values of
close to zero (defined as
for all kernels
except the Gaussian), the kernel contribution
is multiplied by a
term that is a linear function of
.
The modification of the Boundary Kernel estimate
proposed by Jones and Foster (1996), equal to
where
is the convolution estimator
and
is the boundary kernel estimator.
If fast=TRUE
, the calculations are performed rapidly using
density.default
which employs the Fast Fourier
Transform. If fast=FALSE
(the default), the calculations are
performed exactly using slower C code.
An object of class "density"
as described in the help file
for density.default
. It contains at least the entries
x |
Vector of |
y |
Vector of density values |
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
Baddeley, A., Chang, Y-M., Davies, T.M. and Hazelton, M. (2024) In preparation.
Jones, M.C. and Foster, P.J. (1996) A simple nonnegative boundary correction method for kernel density estimation. Statistica Sinica, 6 (4) 1005–1013.
Wand, M.P. and Jones, M.C. (1995) Kernel Smoothing. Chapman and Hall.
sim.dat <- rexp(500) fhatN <- densityBC(sim.dat, "biweight", h=0.4) fhatB <- densityBC(sim.dat, "biweight", h=0.4, zerocor="bdrykern") plot(fhatN, ylim=c(0,1.1), main="density estimates") lines(fhatB, col=2) curve(dexp(x), add=TRUE, from=0, col=3) legend(2, 0.8, legend=c("fixed bandwidth", "boundary kernel", "true density"), col=1:3, lty=rep(1,3))
sim.dat <- rexp(500) fhatN <- densityBC(sim.dat, "biweight", h=0.4) fhatB <- densityBC(sim.dat, "biweight", h=0.4, zerocor="bdrykern") plot(fhatN, ylim=c(0,1.1), main="density estimates") lines(fhatB, col=2) curve(dexp(x), add=TRUE, from=0, col=3) legend(2, 0.8, legend=c("fixed bandwidth", "boundary kernel", "true density"), col=1:3, lty=rep(1,3))
Density, distribution function, quantile function and random generation for several distributions used in kernel estimation for numerical data.
dkernel(x, kernel = "gaussian", mean = 0, sd = 1) pkernel(q, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) qkernel(p, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) rkernel(n, kernel = "gaussian", mean = 0, sd = 1)
dkernel(x, kernel = "gaussian", mean = 0, sd = 1) pkernel(q, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) qkernel(p, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) rkernel(n, kernel = "gaussian", mean = 0, sd = 1)
x , q
|
Vector of quantiles. |
p |
Vector of probabilities. |
kernel |
String name of the kernel.
Options are
|
n |
Number of observations. |
mean |
Mean of distribution. |
sd |
Standard deviation of distribution. |
lower.tail |
logical; if |
These functions give the probability density, cumulative distribution function, quantile function and random generation for several distributions used in kernel estimation for one-dimensional (numerical) data.
The available kernels are those used in density.default
,
namely "gaussian"
, "rectangular"
,
"triangular"
,
"epanechnikov"
,
"biweight"
,
"cosine"
and "optcosine"
.
For more information about these kernels,
see density.default
.
dkernel
gives the probability density,
pkernel
gives the cumulative distribution function,
qkernel
gives the quantile function,
and rkernel
generates random deviates.
A numeric vector.
For dkernel
, a vector of the same length as x
containing the corresponding values of the probability density.
For pkernel
, a vector of the same length as x
containing the corresponding values of the cumulative distribution function.
For qkernel
, a vector of the same length as p
containing the corresponding quantiles.
For rkernel
, a vector of length n
containing randomly generated values.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default
,
kernel.factor
,
kernel.moment
,
kernel.squint
.
x <- seq(-3,3,length=100) plot(x, dkernel(x, "epa"), type="l", main=c("Epanechnikov kernel", "probability density")) plot(x, pkernel(x, "opt"), type="l", main=c("OptCosine kernel", "cumulative distribution function")) p <- seq(0,1, length=256) plot(p, qkernel(p, "biw"), type="l", main=c("Biweight kernel", "cumulative distribution function")) y <- rkernel(100, "tri") hist(y, main="Random variates from triangular density") rug(y)
x <- seq(-3,3,length=100) plot(x, dkernel(x, "epa"), type="l", main=c("Epanechnikov kernel", "probability density")) plot(x, pkernel(x, "opt"), type="l", main=c("OptCosine kernel", "cumulative distribution function")) p <- seq(0,1, length=256) plot(p, qkernel(p, "biw"), type="l", main=c("Biweight kernel", "cumulative distribution function")) y <- rkernel(100, "tri") hist(y, main="Random variates from triangular density") rug(y)
Computes the boundary-corrected version of a smoothing kernel density function.
dkernelBC(x, mean, sd = 1, kernel = "gaussian", zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern"))
dkernelBC(x, mean, sd = 1, kernel = "gaussian", zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern"))
x |
Numeric. Values of the function argument, at which the function should be evaluated. |
mean |
Numeric. The mean of the uncorrected kernel. |
sd |
Numeric value. The standard deviation of the uncorrected kernel. |
kernel |
Character string giving the name of the kernel
as recognised by |
zerocor |
String (partially matched) specifying a correction for the boundary effect
bias at |
The kernel density function identified by kernel
with standard deviation sd
and mean mean
will be computed, and truncated onto the positive half-line.
The boundary correction specified by zerocor
will then
be applied. The result is the vector of corrected density values.
Numeric value or numeric vector.
Adrian Baddeley [email protected].
densityBC
to compute a density estimate using
the boundary-corrected kernel.
dkernel
to compute the un-corrected
kernel density function, and density.default
to compute
an uncorrected density estimate.
match.kernel
for the list of
recognised names of kernels.
curve(dkernelBC(x, mean=1, zerocor="none"), to=5) curve(dkernelBC(x, mean=1, zerocor="weighted"), to=5) curve(dkernelBC(x, mean=1, zerocor="reflection"), to=5) curve(dkernelBC(x, mean=1, zerocor="convolution"), to=5) curve(dkernelBC(x, mean=1, zerocor="bdrykern"), to=5)
curve(dkernelBC(x, mean=1, zerocor="none"), to=5) curve(dkernelBC(x, mean=1, zerocor="weighted"), to=5) curve(dkernelBC(x, mean=1, zerocor="reflection"), to=5) curve(dkernelBC(x, mean=1, zerocor="convolution"), to=5) curve(dkernelBC(x, mean=1, zerocor="bdrykern"), to=5)
Compute a weighted version of the empirical cumulative distribution function.
ewcdf(x, weights = NULL, normalise=TRUE, adjust=1)
ewcdf(x, weights = NULL, normalise=TRUE, adjust=1)
x |
Numeric vector of observations. |
weights |
Optional. Numeric vector of non-negative weights for |
normalise |
Logical value indicating whether the weights should be rescaled so that they sum to 1. |
adjust |
Numeric value. Adjustment factor.
The weights will be multiplied by |
This is a modification of the standard function ecdf
allowing the observations x
to have weights.
The weighted e.c.d.f. (empirical cumulative distribution function)
Fn
is defined so that, for any real number y
, the value of
Fn(y)
is equal to the total weight of all entries of
x
that are less than or equal to y
. That is
Fn(y) = sum(weights[x <= y])
.
Thus Fn
is a step function which jumps at the
values of x
. The height of the jump at a point y
is the total weight of all entries in x
number of tied observations at that value. Missing values are
ignored.
If weights
is omitted, the default is equivalent to
ecdf(x)
except for the class membership.
The result of ewcdf
is a function, of class "ewcdf"
,
inheriting from the classes "ecdf"
(only if normalise=TRUE
)
and "stepfun"
.
The class ewcdf
has methods for
print
,
quantile
and mean
.
The inherited classes ecdf
and stepfun
have methods for plot
and summary
.
A function, of class "ewcdf"
, inheriting from
"ecdf"
(if normalise=TRUE
) and "stepfun"
.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
ecdf
.
Integrals with respect to the weighted cumulative distribution function
can be computed using stieltjes
.
x <- rnorm(100) w <- runif(100) plot(e <- ewcdf(x,w)) e
x <- rnorm(100) w <- runif(100) plot(e <- ewcdf(x,w)) e
Find the first or last digit in the decimal representation of a number.
firstdigit(x) lastdigit(x) ndigits(x)
firstdigit(x) lastdigit(x) ndigits(x)
x |
A numeric value or numeric vector. |
firstdigit(x)
finds the first (most significant) digit,
lastdigit(x)
finds the last (least significant) digit,
and ndigits(x)
finds the number of digits,
in the decimal representation of each entry of x
.
The decimal representation is truncated at the number of digits
available for double precision numbers on the hardware, usually 15.
An integer or integer vector of the same length as x
.
Adrian Baddeley [email protected]
firstdigit(42) lastdigit(42) ndigits(42) firstdigit(-0.1234) lastdigit(-0.1234) ndigits(-0.1234) firstdigit(0) lastdigit(0) ndigits(0)
firstdigit(42) lastdigit(42) ndigits(42) firstdigit(-0.1234) lastdigit(-0.1234) ndigits(-0.1234) firstdigit(0) lastdigit(0) ndigits(0)
Calculate values of the heat kernel on a one-dimensional rod. The ends of the rod may be assumed to be insulated, or absorbing.
hotrod(len, xsource, xquery, sigma, ends=c("insulated", "absorbing"), nmax=20)
hotrod(len, xsource, xquery, sigma, ends=c("insulated", "absorbing"), nmax=20)
len |
Length of the rod. A single number or numeric vector. |
xsource |
Positions of the source points, from the left end of the rod
(in the same distance units as |
xquery |
Positions of the query points, from the left end of the rod
(in the same distance units as |
sigma |
Bandwidth for kernel. A single number or a numeric vector. |
ends |
Character string (partially matched) specifying whether the ends of the rod are assumed to be insulated or absorbing. |
nmax |
Number of terms in the infinite sum to use. A single integer or an integer vector. |
Computes the heat kernel as an infinite sum.
Number or numeric vector.
Greg McSwiggan and Adrian Baddeley [email protected].
curve(hotrod(1, 0.1, x, 0.7)) # check it's a probability density f <- function(x) hotrod(1, 0.1, x, 0.7) integrate(f, 0, 1) ## absorbing ends curve(hotrod(1, 0.1, x, 0.7, ends="a"))
curve(hotrod(1, 0.1, x, 0.7)) # check it's a probability density f <- function(x) hotrod(1, 0.1, x, 0.7) integrate(f, 0, 1) ## absorbing ends curve(hotrod(1, 0.1, x, 0.7, ends="a"))
Computes the indefinite integral of the given function.
indefinteg(f, x, ..., method=c("trapezoid", "quadrature"), lower=min(x), nfine=8192)
indefinteg(f, x, ..., method=c("trapezoid", "quadrature"), lower=min(x), nfine=8192)
f |
an R function taking a numeric first argument and returning a numeric vector of the same length. |
x |
Vector of values of the argument for which the indefinite integral should be evaluated. |
... |
additional arguments to be passed to |
method |
String (partially matched) specifying how to compute the integrals. |
lower |
Lower limit of integration. A single number. |
nfine |
Number of sub-intervals to use for computation
if |
The indefinite integral of the given function f
is computed numerically at each of the desired values x
.
The lower limit of integration is taken to be min(x)
.
The result is a numeric vector y
of the same length as
x
, with entries
If method='trapezoid'
(the default),
the integrals are computed rapidly using the trapezoid rule.
If method='quadrature'
the integrals are computed
accurately but much more slowly, using the numerical quadrature routine
integrate
.
If method='trapezoid'
the function f
is first evaluated
on a finer grid of values of the function argument.
The fine grid contains nfine
sample points.
The values of the indefinite integral on the fine grid
are computed using the trapezoidal approximation.
Finally the values of the indefinite integral are extracted at
the desired argument values x
.
Numeric vector of the same length as x
.
Adrian Baddeley [email protected].
curve(indefinteg(sin, x), to=pi)
curve(indefinteg(sin, x), to=pi)
Computes the integral of a function or spatial object.
integral(f, domain=NULL, ...)
integral(f, domain=NULL, ...)
f |
A function, or a spatial object that can be treated as a function. |
domain |
Optional. Data specifying the domain of integration. |
... |
Arguments passed to methods. |
The function integral
is generic.
It calculates the integral of a function, or
the integral of a spatial object that can be treated as a function.
It has methods
for one-dimensional functions ("density"
, "fv"
)
and for spatial objects
("im"
, "msr"
, "linim"
, "linfun"
).
A single numeric or complex value, or a vector of such values.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
integral.im
in package spatstat.geom
.
Compute the integral of a kernel density estimate over a specified range.
## S3 method for class 'density' integral(f, domain = NULL, weight=NULL, ...)
## S3 method for class 'density' integral(f, domain = NULL, weight=NULL, ...)
f |
A one-dimensional probability density estimate
(object of class |
domain |
Optional. Range of values of the argument |
weight |
Optional. A |
... |
Ignored. |
This is a method for the generic function integral
.
It computes the numerical integral
of the density estimate f
.
If weight
is specified, then the weighted integral
is computed, where is the function specified by
weight
.
This function must return finite numerical values.
If domain
is specified, the integral is restricted to the
interval of values given by the
domain
.
Integrals are calculated numerically using the trapezoidal rule restricted to the domain given.
A single numerical value.
Adrian Baddeley [email protected].
x <- runif(10) d <- density(x, bw=0.1) integral(d) # should be approximately 1 integral(d, domain=c(-Inf, 0)) # mass on negative half-line ## mean of density integral(d, weight=function(x) x)
x <- runif(10) d <- density(x, bw=0.1) integral(d) # should be approximately 1 integral(d, domain=c(-Inf, 0)) # mass on negative half-line ## mean of density integral(d, weight=function(x) x)
Compute the Kaplan-Meier estimator of a survival time distribution function, from histogram data
kaplan.meier(obs, nco, breaks, upperobs=0)
kaplan.meier(obs, nco, breaks, upperobs=0)
obs |
vector of |
nco |
vector of |
breaks |
Vector of |
upperobs |
Number of observations beyond the rightmost breakpoint, if any. |
This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a huge dataset.
Suppose are the survival times of individuals
with unknown distribution function
which we wish to estimate. Suppose these times are right-censored
by random censoring times
.
Thus the observations consist of right-censored survival times
and non-censoring indicators
for each
.
If the number of observations is large, it is efficient to
use histograms.
Form the histogram
obs
of all observed times .
That is,
obs[k]
counts the number of values
in the interval
(breaks[k],breaks[k+1]]
for
and
[breaks[1],breaks[2]]
for .
Also form the histogram
nco
of all uncensored times,
i.e. those such that
.
These two histograms are the arguments passed to
kaplan.meier
.
The vectors km
and lambda
returned by kaplan.meier
are (histogram approximations to) the Kaplan-Meier estimator
of and its hazard rate
.
Specifically,
km[k]
is an estimate of
F(breaks[k+1])
, and lambda[k]
is an estimate of
the average of over the interval
(breaks[k],breaks[k+1])
.
The histogram breaks must include .
If the histogram breaks do not span the range of the observations,
it is important to count how many survival times
exceed the rightmost breakpoint,
and give this as the value
upperobs
.
A list with two elements:
km |
Kaplan-Meier estimate of the survival time c.d.f. |
lambda |
corresponding Nelson-Aalen estimate of the
hazard rate |
These are numeric vectors of length .
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
Returns a scale factor for the kernels used in density estimation for numerical data.
kernel.factor(kernel = "gaussian")
kernel.factor(kernel = "gaussian")
kernel |
String name of the kernel.
Options are
|
Kernel estimation of a probability density in one dimension
is performed by density.default
using a kernel function selected from the list above.
This function computes a scale constant for the kernel.
For the Gaussian kernel, this constant is equal to 1.
Otherwise, the constant is such that the kernel
with standard deviation
is supported on the interval
.
For more information about these kernels,
see density.default
.
A single number.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default
,
dkernel
,
kernel.moment
,
kernel.squint
kernel.factor("rect") # bandwidth for Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa")
kernel.factor("rect") # bandwidth for Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa")
Computes the complete or incomplete th moment of a
smoothing kernel.
kernel.moment(m, r, kernel = "gaussian", mean=0, sd=1/kernel.factor(kernel))
kernel.moment(m, r, kernel = "gaussian", mean=0, sd=1/kernel.factor(kernel))
m |
Exponent (order of moment). An integer. |
r |
Upper limit of integration for the incomplete moment.
A numeric value or numeric vector.
Set |
kernel |
String name of the kernel.
Options are
|
mean , sd
|
Optional numerical values giving the mean and standard deviation of the kernel. |
Kernel estimation of a probability density in one dimension
is performed by density.default
using a kernel function selected from the list above.
For more information about these kernels,
see density.default
.
The function kernel.moment
computes the integral
where is the selected kernel,
is the upper limit of
integration, and
is the exponent or order.
Note that, if mean
and sd
are not specified, the
calculations assume that is the standard form of the kernel,
which has support
and
standard deviation
where
c = kernel.factor(kernel)
.
The code uses the explicit analytic expressions when
m = 0, 1, 2
and numerical integration otherwise.
A single number, or a numeric vector of the same length as r
.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default
,
dkernel
,
kernel.factor
,
kernel.squint
kernel.moment(1, 0.1, "epa") curve(kernel.moment(2, x, "epa"), from=-1, to=1)
kernel.moment(1, 0.1, "epa") curve(kernel.moment(2, x, "epa"), from=-1, to=1)
Computes the integral of the squared kernel, for the kernels used in density estimation for numerical data.
kernel.squint(kernel = "gaussian", bw=1)
kernel.squint(kernel = "gaussian", bw=1)
kernel |
String name of the kernel.
Options are
|
bw |
Bandwidth (standard deviation) of the kernel. |
Kernel estimation of a probability density in one dimension
is performed by density.default
using a kernel function selected from the list above.
This function computes the integral of the squared kernel,
where is the kernel with bandwidth
bw
.
A single number.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default
,
dkernel
,
kernel.moment
,
kernel.factor
kernel.squint("gaussian", 3) # integral of squared Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa") kernel.squint("epa", bw)
kernel.squint("gaussian", 3) # integral of squared Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa") kernel.squint("epa", bw)
Compute the Kaplan-Meier and Reduced Sample estimators of a survival time distribution function, using histogram techniques
km.rs(o, cc, d, breaks)
km.rs(o, cc, d, breaks)
o |
vector of observed survival times |
cc |
vector of censoring times |
d |
vector of non-censoring indicators |
breaks |
Vector of breakpoints to be used to form histograms. |
This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a huge dataset.
Suppose are the survival times of individuals
with unknown distribution function
which we wish to estimate. Suppose these times are right-censored
by random censoring times
.
Thus the observations consist of right-censored survival times
and non-censoring indicators
for each
.
The arguments to this function are
vectors o
, cc
, d
of observed values of ,
and
respectively.
The function computes histograms and forms the reduced-sample
and Kaplan-Meier estimates of
by
invoking the functions
kaplan.meier
and reduced.sample
.
This is efficient if the lengths of o
, cc
, d
(i.e. the number of observations) is large.
The vectors km
and hazard
returned by kaplan.meier
are (histogram approximations to) the Kaplan-Meier estimator
of and its hazard rate
.
Specifically,
km[k]
is an estimate of
F(breaks[k+1])
, and lambda[k]
is an estimate of
the average of over the interval
(breaks[k],breaks[k+1])
. This approximation is exact only if the
survival times are discrete and the
histogram breaks are fine enough to ensure that each interval
(breaks[k],breaks[k+1])
contains only one possible value of
the survival time.
The vector rs
is the reduced-sample estimator,
rs[k]
being the reduced sample estimate of F(breaks[k+1])
.
This value is exact, i.e. the use of histograms does not introduce any
approximation error in the reduced-sample estimator.
A list with five elements
rs |
Reduced-sample estimate of the survival time c.d.f. |
km |
Kaplan-Meier estimate of the survival time c.d.f. |
hazard |
corresponding Nelson-Aalen estimate of the
hazard rate |
r |
values of |
breaks |
the breakpoints vector |
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
Extract the knots (jump points) of an empirical weighted cumulative distribution function.
## S3 method for class 'ewcdf' knots(Fn, ...)
## S3 method for class 'ewcdf' knots(Fn, ...)
Fn |
An empirical weighted cumulative
distribution function (object of class |
... |
Ignored. |
The function knots
is generic. This
function knots.ewcdf
is the method for
the class "ewcdf"
of empirical weighted cumulative
distribution functions. Objects of class "ewcdf"
are created by ewcdf
.
The jump points (locations of increments) of the function will be returned as a numeric vector.
Numeric vector.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
x <- c(1, 2, 5) w <- runif(3) e <- ewcdf(x,w) knots(e)
x <- c(1, 2, 5) w <- runif(3) e <- ewcdf(x,w) knots(e)
Calculates the mean of a (weighted or unweighted) empirical cumulative distribution function.
## S3 method for class 'ecdf' mean(x, trim=0, ...) ## S3 method for class 'ewcdf' mean(x, trim=0, ...)
## S3 method for class 'ecdf' mean(x, trim=0, ...) ## S3 method for class 'ewcdf' mean(x, trim=0, ...)
x |
An empirical cumulative distribution function
(object of class |
trim |
The fraction (0 to 0.5) of data values to be trimmed from each end of their range, before the mean is computed. |
... |
Ignored. |
These functions are methods for the generic
mean
for the classes "ecdf"
and "ewcdf"
.
They calculate the mean of the probability distribution
corresponding to the cumulative distribution function x
.
This is equivalent to calculating the (weighted or unweighted)
mean of the original data values.
For weighted empirical cumulative distribution functions
(class "ewcdf"
) the weights will first be normalised so that they
sum to 1. The result of mean.ewcdf
is always an average or weighted average or the original data values.
The argument trim
is interpreted as a probability
under this normalised distribution; the corresponding
quantiles are computed, and data outside these quantiles is deleted
before calculating the weighted mean.
A single number.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
Generic mean
and
weighted.mean
.
ecdf
, ewcdf
to create the cumulative distribution functions.
stieltjes
for integration with respect to
a cumulative distribution function.
x <- 1:5 mean(x) mean(ecdf(x)) w <- 1:5 mean(ewcdf(x, w))
x <- 1:5 mean(x) mean(ecdf(x)) w <- 1:5 mean(ewcdf(x, w))
Given a kernel estimate of a probability density, compute quantiles.
## S3 method for class 'density' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., warn = TRUE)
## S3 method for class 'density' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., warn = TRUE)
x |
Object of class |
probs |
Numeric vector of probabilities for which the quantiles are required. |
names |
Logical value indicating whether to attach names (based on
|
... |
Ignored. |
warn |
Logical value indicating whether to issue a warning if the
density estimate |
This function calculates quantiles of the probability distribution
whose probability density has been estimated and stored in the object
x
. The object x
must belong to the class "density"
,
and would typically have been obtained from a call to the function
density
.
The probability density is first normalised so that the total
probability is equal to 1. A warning is issued if the density
estimate was restricted to an interval (i.e. if x
was created by a call to density
which
included either of the arguments from
and to
).
Next, the density estimate is numerically integrated to obtain an estimate
of the cumulative distribution function . Then
for each desired probability
, the algorithm finds the
corresponding quantile
.
The quantile corresponding to probability
satisfies
up to
the resolution of the grid of values contained in
x
.
The quantile is computed from the right, that is,
is the smallest available value of
such that
.
A numeric vector containing the quantiles.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
quantile
,
quantile.ewcdf
,
CDF
.
dd <- density(runif(10)) quantile(dd)
dd <- density(runif(10)) quantile(dd)
Compute quantiles of a weighted empirical cumulative distribution function.
## S3 method for class 'ewcdf' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., normalise = TRUE, type=1)
## S3 method for class 'ewcdf' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., normalise = TRUE, type=1)
x |
A weighted empirical cumulative distribution function
(object of class |
probs |
probabilities for which the quantiles are desired. A numeric vector of values between 0 and 1. |
names |
Logical. If |
... |
Ignored. |
normalise |
Logical value indicating whether |
type |
Integer specifying the type of quantile to be calculated,
as explained in |
This is a method for the generic quantile
function for the class ewcdf
of empirical weighted cumulative
distribution functions.
The quantile for a probability p
is computed
as the right-continuous inverse of the cumulative
distribution function x
(assuming type=1
, the default).
If normalise=TRUE
(the default),
the weighted cumulative function x
is first normalised to
have total mass 1
so that it can be interpreted as a
cumulative probability distribution function.
Numeric vector of quantiles, of the same length as probs
.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected] and Kevin Ummel.
z <- rnorm(50) w <- runif(50) Fun <- ewcdf(z, w) quantile(Fun, c(0.95,0.99))
z <- rnorm(50) w <- runif(50) Fun <- ewcdf(z, w) quantile(Fun, c(0.95,0.99))
Return the inverse function of a cumulative distribution function.
quantilefun(x, ...) ## S3 method for class 'ecdf' quantilefun(x, ..., type=1) ## S3 method for class 'ewcdf' quantilefun(x, ..., type=1)
quantilefun(x, ...) ## S3 method for class 'ecdf' quantilefun(x, ..., type=1) ## S3 method for class 'ewcdf' quantilefun(x, ..., type=1)
x |
Data for which the quantile function should be calculated.
Either an object containing data (such as a pixel image)
or an object representing a cumulative distribution function
(of class |
... |
Other arguments passed to methods. |
type |
Integer specifying the type of quantiles,
as explained in |
Whereas the command quantile
calculates
the quantiles of a dataset corresponding to desired probabilities
, the command
quantilefun
returns a function which can be used to compute any quantiles of the
dataset.
If f <- quantilefun(x)
then f
is a function such that
f(p)
is the quantile associated with any given probability p
.
For example f(0.5)
is the median of the original data, and
f(0.99)
is the 99th percentile of the original data.
If x
is a pixel image (object of class "im"
)
then the pixel values of x
will be extracted
and the quantile function of the pixel values is constructed.
If x
is an object representing a cumulative distribution
function (object of class "ecdf"
or "ewcdf"
) then the
quantile function of the original data is constructed.
A function in the R language.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
ewcdf
,
quantile.ewcdf
,
ecdf
,
quantile
## numeric data z <- rnorm(50) FZ <- ecdf(z) QZ <- quantilefun(FZ) QZ(0.5) # median value of z if(interactive()) plot(QZ,xlim=c(0,1),xlab="probability",ylab="quantile of z")
## numeric data z <- rnorm(50) FZ <- ecdf(z) QZ <- quantilefun(FZ) QZ(0.5) # median value of z if(interactive()) plot(QZ,xlim=c(0,1),xlab="probability",ylab="quantile of z")
Compute the Reduced Sample estimator of a survival time distribution function, from histogram data
reduced.sample(nco, cen, ncc, show=FALSE, uppercen=0)
reduced.sample(nco, cen, ncc, show=FALSE, uppercen=0)
nco |
vector of counts giving the histogram of uncensored observations (those survival times that are less than or equal to the censoring time) |
cen |
vector of counts giving the histogram of censoring times |
ncc |
vector of counts giving the histogram of censoring times for the uncensored observations only |
uppercen |
number of censoring times greater than the rightmost histogram breakpoint (if there are any) |
show |
Logical value controlling the amount of detail returned by the function value (see below) |
This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the reduced sample estimator from a huge dataset.
Suppose are the survival times of individuals
with unknown distribution function
which we wish to estimate. Suppose these times are right-censored
by random censoring times
.
Thus the observations consist of right-censored survival times
and non-censoring indicators
for each
.
If the number of observations is large, it is efficient to
use histograms.
Form the histogram
cen
of all censoring times .
That is,
obs[k]
counts the number of values
in the interval
(breaks[k],breaks[k+1]]
for
and
[breaks[1],breaks[2]]
for .
Also form the histogram
nco
of all uncensored times,
i.e. those such that
,
and the histogram of all censoring times for which the survival time
is uncensored,
i.e. those
such that
.
These three histograms are the arguments passed to
kaplan.meier
.
The return value rs
is the reduced-sample estimator
of the distribution function . Specifically,
rs[k]
is the reduced sample estimate of F(breaks[k+1])
.
The value is exact, i.e. the use of histograms does not introduce any
approximation error.
Note that, for the results to be valid, either the histogram breaks
must span the censoring times, or the number of censoring times
that do not fall in a histogram cell must have been counted in
uppercen
.
If show = FALSE
, a numeric vector giving the values of
the reduced sample estimator.
If show=TRUE
, a list with three components which are
vectors of equal length,
rs |
Reduced sample estimate of the survival time c.d.f. |
numerator |
numerator of the reduced sample estimator |
denominator |
denominator of the reduced sample estimator |
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
Given a numeric vector, determine whether the values have been rounded to a certain number of decimal places.
rounding(x) ## Default S3 method: rounding(x)
rounding(x) ## Default S3 method: rounding(x)
x |
A numeric vector, or an object containing numeric spatial coordinates. |
The function rounding
is generic.
Its purpose is to determine whether numerical values have been rounded
to a certain number of decimal places.
The spatstat family of packages provides methods for
rounding
for various spatial objects.
For a numeric vector x
, the default method rounding.default
determines whether the values in x
have been rounded
to a certain number of decimal places.
If the entries of x
are not all integers, then
rounding(x)
returns the smallest number of digits d
after the decimal point
such that round(x, digits=d)
is identical to
x
.
For example if rounding(x) = 2
then the entries of
x
are rounded to 2 decimal places, and are multiples of 0.01.
If all the entries of x
are integers, then
rounding(x)
returns -d
, where
d
is the smallest number of digits before the decimal point
such that round(x, digits=-d)
is identical to
x
.
For example if rounding(x) = -3
then the entries of
x
are multiples of 1000.
If rounding(x) = 0
then the entries of x
are integers
but not multiples of 10.
If all entries of x
are equal to 0, a value of 0 is returned.
An integer.
Adrian Baddeley [email protected] and Rolf Turner [email protected]
round.ppp
in package spatstat.geom
.
rounding(c(0.1, 0.3, 1.2)) rounding(c(1940, 1880, 2010)) rounding(0)
rounding(c(0.1, 0.3, 1.2)) rounding(c(1940, 1880, 2010)) rounding(0)
Computes the Stieltjes integral
of a function with respect to a function
.
stieltjes(f, M, ...)
stieltjes(f, M, ...)
f |
The integrand. A function in the R language. |
M |
The cumulative function against which |
... |
Additional arguments passed to |
This command computes the Stieltjes integral
of a real-valued function
with respect to a nondecreasing function
.
One common use of the Stieltjes integral is
to find the mean value of a random variable from its
cumulative distribution function . The mean value is
the Stieltjes integral of
with respect to
.
The argument f
should be a function
in the R language.
It should accept a numeric vector argument x
and should return
a numeric vector of the same length.
The argument M
should be either a step function
(object of class "stepfun"
) or a function value table
(object of class "fv"
).
Objects of class "stepfun"
are returned by
ecdf
, ewcdf
,
and other utilities.
A list containing the value of the Stieltjes integral
computed using each of the versions of the function M
.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
x <- runif(100) w <- runif(100) H <- ewcdf(x, w) stieltjes(function(x) { x^2 }, H)
x <- runif(100) w <- runif(100) H <- ewcdf(x, w) stieltjes(function(x) { x^2 }, H)
Apply a transformation to the quantiles of a vector, or to the quantiles of the pixel values in a pixel image.
transformquantiles(X, uniform = FALSE, reverse = FALSE, ...)
transformquantiles(X, uniform = FALSE, reverse = FALSE, ...)
X |
A numeric vector, matrix, array, or a pixel image
(object of class |
uniform |
Logical value specifying whether each quantile value should be replaced by the corresponding cumulative probability (called histogram equalisation, transformation to uniformity or probability integral transformation). |
reverse |
Logical value specifying whether to swap the upper and lower quantiles. |
... |
Ignored. |
The argument X
may be a vector, matrix,
array, or a pixel image (object of class "im"
).
The algorithm will first extract the entries or pixel values of
X
as a vector, and sort the values into ascending order.
If uniform=TRUE
, the entries in this vector will be replaced by the
corresponding cumulative probabilities (the k
th
smallest value will be replaced by the
number (k-0.5)/n
where n
is the total number of values).
If reverse=TRUE
, the resulting vector will be reversed
so that it is in descending order (so that the k
th smallest
value will be swapped with the k
th largest value).
Finally the transformed values will be replaced into the original positions in the vector, matrix, array, or pixel image.
The case uniform=TRUE, reverse=FALSE
is called
transformation to uniformity, the
probability integral transformation,
histogram equalisation, or quantile transformation.
The resulting values are uniformly distributed between 0 and 1;
a histogram of the values in X
is flat.
Another object of the same type as X
.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
To apply an arbitrary function f
to the pixel values in an image,
use the idiom X[] <- f(X[])
.
X <- c(3, 5, 1, 2, 4) transformquantiles(X, reverse=TRUE) transformquantiles(X, uniform=TRUE) transformquantiles(X, uniform=TRUE, reverse=TRUE)
X <- c(3, 5, 1, 2, 4) transformquantiles(X, reverse=TRUE) transformquantiles(X, uniform=TRUE) transformquantiles(X, uniform=TRUE, reverse=TRUE)
Determine whether entries in a vector (or rows in a matrix or data frame) are duplicated, choose a unique representative for each set of duplicates, and map the duplicates to the unique representative.
uniquemap(x) ## Default S3 method: uniquemap(x) ## S3 method for class 'data.frame' uniquemap(x) ## S3 method for class 'matrix' uniquemap(x)
uniquemap(x) ## Default S3 method: uniquemap(x) ## S3 method for class 'data.frame' uniquemap(x) ## S3 method for class 'matrix' uniquemap(x)
x |
A vector, data frame or matrix, or another type of data. |
The function uniquemap
is generic, with methods
for point patterns, data frames, and a default method.
The default method expects a vector. It determines whether any entries
of the vector x
are duplicated,
and constructs a mapping of the indices of x
so that all duplicates are mapped to a unique representative index.
The result is an integer vector u
such that
u[j] = i
if
the entries x[i]
and x[j]
are identical and
point i
has been chosen as the unique representative.
The entry u[i] = i
means either that point i
is
unique, or that it has been chosen as the unique representative
of its equivalence class.
The method for data.frame
determines whether any rows of the
data frame x
are duplicated, and constructs a mapping of the
row indices so that all duplicate rows are mapped to a unique
representative row.
An integer vector.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
uniquemap.ppp
in spatstat.geom
x <- c(3, 5, 2, 4, 2, 3) uniquemap(x) df <- data.frame(A=x, B=42) uniquemap(df) z <- cbind(x, 10-x) uniquemap(z)
x <- c(3, 5, 2, 4, 2, 3) uniquemap(x) df <- data.frame(A=x, B=42) uniquemap(df) z <- cbind(x, 10-x) uniquemap(z)
An unnormalised version of kernel density estimation where the weights are not required to sum to 1. The weights may be positive, negative or zero.
unnormdensity(x, ..., weights = NULL, defaults)
unnormdensity(x, ..., weights = NULL, defaults)
x |
Numeric vector of data |
... |
Optional arguments passed to |
'
weights |
Optional numeric vector of weights for the data. The default is equivalent to assuming a weight of 1 for each observation. |
defaults |
Optional, named list of arguments passed to
|
This is an alternative to the standard R kernel density estimation function
density.default
.
The standard density.default
requires the weights
to be nonnegative numbers that add up to 1,
and returns a probability density (a function that integrates to 1).
This function unnormdensity
does not impose any requirement
on the weights
except that they be finite. Individual weights may be
positive, negative or zero. The result is a function that does not
necessarily integrate to 1 and may be negative. The result is
the convolution of the kernel with the weighted data,
where are the data points and
are the
weights.
The argument weights
should be a numeric vector of the same
length as x
, or a single numeric value. The default is to
assume a weight of 1 for each observation in x
.
The algorithm first selects the kernel bandwidth by
applying density.default
to the data
x
with normalised, positive weight vector
w = abs(weights)/sum(abs(weights))
and
extracting the selected bandwidth.
Then the result is computed by applying
applying density.default
to x
twice
using the normalised positive and negative parts of the weights.
Note that the arguments ...
must be passed by name,
i.e. in the form (name=value
). Arguments that do not match
an argument of density.default
will be ignored
silently.
Object of class "density"
as described in
density.default
.
If weights
is not specified,
the default is to assign a weight to each
observation
.
This is not the same behaviour as in density.default
which
effectively assumes a weight of for each observation
where
n=length(x)
.
Adrian Baddeley [email protected] and Rolf Turner [email protected]
d <- unnormdensity(1:3, weights=c(-1,0,1), bw=0.3) if(interactive()) plot(d)
d <- unnormdensity(1:3, weights=c(-1,0,1), bw=0.3) if(interactive()) plot(d)
Compute the median, quantiles or variance of a set of numbers which have weights associated with them.
weighted.median(x, w, na.rm = TRUE, type=2, collapse=TRUE) weighted.quantile(x, w, probs=seq(0,1,0.25), na.rm = TRUE, type=4, collapse=TRUE) weighted.var(x, w, na.rm = TRUE)
weighted.median(x, w, na.rm = TRUE, type=2, collapse=TRUE) weighted.quantile(x, w, probs=seq(0,1,0.25), na.rm = TRUE, type=4, collapse=TRUE) weighted.var(x, w, na.rm = TRUE)
x |
Data values. A vector of numeric values, for which the median or quantiles are required. |
w |
Weights.
A vector of nonnegative numbers, of the same length as |
probs |
Probabilities for which the quantiles should be computed. A numeric vector of values between 0 and 1. |
na.rm |
Logical. Whether to ignore |
type |
Integer specifying the rule for calculating the median or quantile,
corresponding to the rules available for
|
collapse |
Research use only. |
The i
th observation x[i]
is treated as having
a weight proportional to w[i]
.
The weighted median is a value m
such that the total weight of data less than or equal to m
is equal to half the total weight. More generally, the weighted quantile with
probability p
is a value q
such that the total weight of data less than or equal to q
is equal to p
times the total weight.
If there is no such value, then
if type=1
, the next largest value is returned
(this is the right-continuous inverse of the left-continuous
cumulative distribution function);
if type=2
, the average of the two surrounding values is
returned (the average of the right-continuous and left-continuous
inverses);
if type=4
, linear interpolation is performed.
Note that the default rule for weighted.median
is
type=2
, consistent with the traditional definition of the median,
while the default for weighted.quantile
is type=4
.
A numeric value or vector.
Adrian Baddeley [email protected].
x <- 1:20 w <- runif(20) weighted.median(x, w) weighted.quantile(x, w) weighted.var(x, w)
x <- 1:20 w <- runif(20) weighted.median(x, w) weighted.quantile(x, w) weighted.var(x, w)
Computes the weighted histogram of a set of observations with a given set of weights.
whist(x, breaks, weights = NULL, method=c("C", "interpreted"))
whist(x, breaks, weights = NULL, method=c("C", "interpreted"))
x |
Numeric vector of observed values. |
breaks |
Vector of breakpoints for the histogram. |
weights |
Numeric vector of weights for the observed values. |
method |
Developer use only.
A character string specifying whether to use internal C code
( |
This low-level function computes (but does not plot) the weighted
histogram of a vector of observations x
using a given
vector of weights
.
The arguments x
and weights
should be numeric vectors of
equal length. They may include NA
or infinite values.
The argument breaks
should be a numeric vector whose entries
are strictly increasing. These values define the boundaries between the
successive histogram cells.
The breaks do not have to span the range
of the observations.
There are N-1
histogram cells, where N = length(breaks)
.
An observation x[i]
falls in the j
th cell if
breaks[j] <= x[i] < breaks[j+1]
(for j < N-1
)
or
breaks[j] <= x[i] <= breaks[j+1]
(for j = N-1
).
The weighted histogram value h[j]
for the j
th cell is
the sum of weights[i]
for all observations x[i]
that
fall in the cell.
Note that, in contrast to the function hist
,
the function whist
does not require the breakpoints to span the
range of the observations x
. Values of x
that fall outside the
range of breaks
are handled separately; their total weight
is returned as an attribute of the histogram.
A numeric vector of length N-1
containing the
histogram values, where N = length(breaks)
.
The return value also has attributes "low"
and "high"
giving the total weight of all observations that are less than
the lowest breakpoint, or greater than the highest breakpoint,
respectively.
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
with thanks to Peter Dalgaard.
x <- rnorm(100) b <- seq(-1,1,length=21) w <- runif(100) whist(x,b,w)
x <- rnorm(100) b <- seq(-1,1,length=21) w <- runif(100) whist(x,b,w)