| Title: | One-Dimensional Probability Distribution Support for the 'spatstat' Family |
|---|---|
| Description: | Estimation of one-dimensional probability distributions including kernel density estimation, weighted empirical cumulative distribution functions, Kaplan-Meier and reduced-sample estimators for right-censored data, heat kernels, special distributions, kernel properties, quantiles and integration. |
| Authors: | Adrian Baddeley [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-9499-8382>), Tilman M. Davies [aut, ctb, cph] (ORCID: <https://orcid.org/0000-0003-0565-1825>), Martin L. Hazelton [aut, ctb, cph] (ORCID: <https://orcid.org/0000-0001-7831-725X>), Ege Rubak [aut, cph] (ORCID: <https://orcid.org/0000-0002-6675-533X>), Rolf Turner [aut, cph] (ORCID: <https://orcid.org/0000-0001-5521-5218>), Greg McSwiggan [ctb, cph] |
| Maintainer: | Adrian Baddeley <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 3.2-0 |
| Built: | 2026-05-18 06:58:22 UTC |
| Source: | https://github.com/spatstat/spatstat.univar |
The spatstat.univar package belongs to the spatstat family of packages. It provides utilities for estimating the probability distribution of one-dimensional (real-valued) data.
This package is a member of the spatstat family of packages. It provides utilities for estimation of the probability distribution of one-dimensional (i.e. numerical, real-valued) data. The utilities include:
including variable-bandwidth kernels, boundary correction, bandwidth selection, unnormalised weighted densities, and cumulative distribution functions of density estimates.
including weighted empirical cumulative distributions, weighted median, weighted quantiles, calculating the CDF from a density estimate
including Kaplan-Meier, reduced-sample and other estimators of the cumulative distribution function and hazard function from right-censored data
including calculation of quantiles from an empirical cumulative distribution or a kernel density estimate
including calculation of the probability density, cumulative distribution function, quantiles, random generation, moments and partial moments of the standard smoothing kernels
including calculation of the one-dimensional heat kernel in an interval, and the distribution of a weighted sum of chi-squared variables
Numerical integration including Stieltjes integrals and indefinite integrals.
The facilities are described in more detail below.
Kernel density estimation
The package supports fixed-bandwidth and variable-bandwidth kernel estimation of probability densities from numerical data. It provides boundary corrections for kernel estimates of densities on the positive half-line (applicable when the original observations are positive numbers) for both fixed-bandwidth and variable-bandwidth estimates.
If the observations have numerical weights associated with them,
these weights will not be automatically normalised, and indeed
the weights may be negative or zero. This is
unlike the standard R method density.default.
The main functions are:
unnormdensity
|
extension of density.default
allowing weights to be negative or zero.
|
densityBC
|
fixed-bandwidth kernel estimate with optional boundary correction |
densityAdaptiveKernel
|
adaptive (variable-bandwidth) kernel estimation (generic) |
densityAdaptiveKernel.default
|
adaptive (variable-bandwidth) kernel estimate (method for numeric data, with optional boundary correction) |
bw.abram.default
|
calculate data-dependent bandwidths using Abramson rule |
CDF.density |
cumulative distribution function from kernel density estimate |
Weighted distributions and weighted statistics
Weighted versions of standard operations such as the histogram and empirical distribution function are provided:
whist |
weighted histogram |
ewcdf |
weighted empirical cumulative distribution function |
mean.ewcdf |
mean of weighted ecdf |
quantile.ewcdf |
quantiles of weighted ecdf |
knots.ewcdf |
jump points of weighted ecdf |
weighted.median |
weighted median of numeric values |
weighted.quantile |
weighted quantile of numeric values |
Estimation for right-censored data
Facilities are provided for estimating the probability distribution of right-censored lifetimes (non-negative real random variables).
kaplan.meier |
Kaplan-Meier estimator of cumulative distribution function and hazard rate, from right-censored data |
reduced.sample |
reduced-sample estimator of cumulative distribution function, from right-censored data |
Quantiles
Facilities are provided for computing the quantiles of a probability distribution, given estimates of the probability density or the cumulative distribution function and so on.
CDF.density |
cumulative distribution function from kernel density estimate |
quantile.density |
quantiles of kernel density estimate |
quantile.ewcdf |
quantiles of weighted ecdf |
quantilefun |
quantiles as a function |
quantilefun.ewcdf |
quantiles as a function |
weighted.quantile |
weighted quantile of numeric values |
transformquantiles |
transform the quantiles of a dataset |
Kernels
The standard R function density.default
recognises a list of smoothing kernels by name:
"gaussian", "rectangular", "triangular",
"epanechnikov", "biweight", "cosine"
and "optcosine". For these kernels, spatstat.univar
provides various characteristics:
dkernel
|
probability density of the kernel |
pkernel
|
cumulative distribution function of the kernel |
qkernel
|
quantiles of the kernel |
rkernel
|
generate simulated realisations from the kernel |
kernel.factor
|
scale factor relating bandwidth to half-width of kernel |
kernel.moment
|
partial moment of kernel |
kernel.squint
|
integral of squared kernel |
dkernelBC
|
evaluate the kernel with boundary correction |
Special distributions
Support is provided for some special probability distributions:
hotrod
|
heat kernel in a one-dimensional interval |
dsocs
|
distribution of a weighted sum of chi-squared variables |
Integration
A few facilities are provided for calculating integrals of real functions.
indefinteg |
indefinite integral |
integral.density |
integral of a kernel density estimate |
stieltjes |
Stieltjes integral |
Utilities
A few utilities for numerical data are also provided.
uniquemap.default |
map duplicates to unique entries |
rounding.default |
determine whether values have been rounded |
firstdigit |
leading digit in decimal representation |
lastdigit |
least significant digit in decimal representation |
ndigits |
number of digits in decimal representation |
This library and its documentation are usable under the terms of the "GNU General Public License", a copy of which is distributed with the package.
Adrian Baddeley [email protected], Tilman Davies [email protected], Martin Hazelton [email protected], Ege Rubak [email protected], Rolf Turner [email protected] and Greg McSwiggan.
Computes adaptive smoothing bandwidths according to the inverse-square-root rule of Abramson (1982).
bw.abram(X, h0, ...)bw.abram(X, h0, ...)
X |
Data to be smoothed. |
h0 |
Global smoothing bandwidth. A numeric value. |
... |
Additional arguments passed to methods. |
This function computes adaptive smoothing bandwidths for a dataset, using the methods of Abramson (1982) and Hall and Marron (1988).
The function bw.abram is generic.
There is a default method bw.abram.default.
The spatstat package family includes methods for spatial objects.
See the documentation for the particular method.
Adrian Baddeley [email protected].
Abramson, I. (1982) On bandwidth variation in kernel estimates — a square root law. Annals of Statistics, 10(4), 1217-1223.
Hall, P. and Marron, J.S. (1988) Variable window width kernel density estimates of probability densities. Probability Theory and Related Fields, 80, 37-49.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
Computes adaptive smoothing bandwidths for numeric data, according to the inverse-square-root rule of Abramson (1982).
## Default S3 method: bw.abram(X, h0, ..., at = c("data", "grid"), pilot = NULL, hp = h0, trim = 5, smoother = density.default)## Default S3 method: bw.abram(X, h0, ..., at = c("data", "grid"), pilot = NULL, hp = h0, trim = 5, smoother = density.default)
X |
Data for which bandwidths should be calculated. A numeric vector. |
h0 |
A scalar value giving the global smoothing bandwidth
in the same units as |
... |
Arguments passed to |
at |
Character string (partially matched) specifying whether to
compute bandwidth values only at the data points of |
pilot |
Optional. Specification of a pilot density (possibly unnormalised).
Either a numeric vector giving the pilot density for each
data point of |
hp |
Optional. A scalar pilot bandwidth, used for estimation
of the pilot density, if |
trim |
A trimming value required to curb excessively large bandwidths. See Details. The default is sensible in most cases. |
smoother |
Smoother for the pilot.
A function or character string, specifying the function
to be used to compute the pilot estimate when
|
This function computes adaptive smoothing bandwidths using the methods of Abramson (1982) and Hall and Marron (1988).
The function bw.abram is generic. The function
bw.abram.default documented here is the default method
which is designed for numeric data.
If at="data" (the default) a smoothing bandwidth is
computed for each data point in X. Alternatively if
at="grid" a smoothing bandwidth is computed for
a grid of x values.
Under the Abramson-Hall-Marron rule, the bandwidth at location is
where is a pilot estimate of the
probability density. The variable bandwidths are rescaled by , the
geometric mean of the terms evaluated at the
data; this allows the global bandwidth h0 to be considered on
the same scale as a corresponding fixed bandwidth. The trimming value
trim has the same interpretation as the required ‘clipping’ of
the pilot density at some small nominal value (see Hall and Marron,
1988), to necessarily prevent extreme bandwidths (which
can occur at very isolated observations).
The pilot density or intensity is determined as follows:
If pilot is a function in the R language,
this is taken as the pilot density.
If pilot is a probability density estimate
(object of class "density" produced by
density.default) then this is taken as the
pilot density.
If pilot is NULL, then the pilot intensity is
computed as a fixed-bandwidth kernel
intensity estimate using density.default applied to
the data X using
the pilot bandwidth hp.
In each case the pilot density is renormalised to become a probability density, and then the Abramson rule is applied.
Instead of calculating the pilot as a fixed-bandwidth density
estimate, the user can specify another density estimation procedure
using the argument smoother. This should be either a function
or the character string name of a function. It will replace
density.default as the function used to calculate the
pilot estimate. The pilot estimate will be computed as
smoother(X, sigma=hp, ...) if pilot is NULL,
or smoother(pilot, sigma=hp, ...) if pilot is a point
pattern. If smoother does not recognise the argument name
sigma for the smoothing bandwidth, then hp is effectively
ignored.
Either a numeric vector of the same length as X
giving the Abramson bandwidth for each point
(when at = "data", the default),
or a function giving the Abramson bandwidths
as a function of location.
Tilman Davies [email protected]. Adapted by Adrian Baddeley [email protected].
Abramson, I. (1982) On bandwidth variation in kernel estimates — a square root law. Annals of Statistics, 10(4), 1217-1223.
Hall, P. and Marron, J.S. (1988) Variable window width kernel density estimates of probability densities. Probability Theory and Related Fields, 80, 37-49.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
xx <- rexp(20) bw.abram(xx)xx <- rexp(20) bw.abram(xx)
Computes variable smoothing bandwidths intended to be proportional to the observed data values, raised to a given power.
bw.pow(X, h0, POW = 0.75, trim = 5, ...)bw.pow(X, h0, POW = 0.75, trim = 5, ...)
X |
Data for which bandwidths should be calculated. A numeric vector of positive values. |
h0 |
A scalar value giving the global smoothing bandwidth
in the same units as |
POW |
Numeric value. The exponent of the power transformation to
be applied to |
trim |
A trimming value required to curb excessively large bandwidths. See Details. The default is sensible in most cases. |
... |
Ignored. |
This function computes adaptive smoothing bandwidths
for the data values in X.
Larger data values are assigned larger bandwidths.
Bandwidths are proportional to X^POW.
The bandwidth at location is
where is the geometric mean of the values
.
This allows the global bandwidth h0 to be considered on
the same scale as a corresponding fixed bandwidth.
A numeric vector of the same length as X.
Tilman Davies [email protected]. Adapted by Adrian Baddeley [email protected].
xx <- sort(rexp(10)) bb <- bw.pow(xx) signif(rbind(xx, bb), 3)xx <- sort(rexp(10)) bb <- bw.pow(xx) signif(rbind(xx, bb), 3)
Use Taylor's non-random bootstrap technique to select the bandwidth for kernel density estimation on the real line.
bw.taylor(x, ..., srange = NULL, useC = TRUE)bw.taylor(x, ..., srange = NULL, useC = TRUE)
x |
Numeric vector. |
... |
Ignored. |
srange |
Range of bandwidths to be considered. A numeric vector of length 2. |
useC |
Logical value specifying whether to use faster C code. |
This function selects a bandwidth for kernel density estimation
of a probability density on the real line,
using the numeric data x and assuming a Gaussian kernel. The result
is the numeric value of the standard deviation of the Gaussian kernel.
The function uses the method of Taylor (1989) who showed that, when using the Gaussian kernel, the optimisation criterion can be computed rapidly from the data without any randomised resampling.
The domain of the probability density is assumed to be the entire real line. Boundary correction is not currently implemented.
The result of bw.taylor is a single numeric value
giving the selected bandwidth.
A single numeric value.
Tilman Davies [email protected] and Adrian Baddeley [email protected].
Taylor, C.C. (1989) Choice of the Smoothing Parameter in Kernel Density Estimation, Biometrika 76 4, 705–712.
bw.nrd in the stats package
for standard bandwidth selectors.
x <- rnorm(30) bw.taylor(x)x <- rnorm(30) bw.taylor(x)
Given a kernel estimate of a probability density, compute the corresponding cumulative distribution function.
CDF(f, ...) ## S3 method for class 'density' CDF(f, ..., warn = TRUE)CDF(f, ...) ## S3 method for class 'density' CDF(f, ..., warn = TRUE)
f |
Density estimate (object of class |
... |
Ignored. |
warn |
Logical value indicating whether to issue a warning if the
density estimate |
CDF is generic, with a method for class "density".
This calculates the cumulative distribution function
whose probability density has been estimated and stored in the object
f. The object f must belong to the class "density",
and would typically have been obtained from a call to the function
density.
A function, which can be applied to any numeric value or vector of values.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected]
b <- density(runif(10)) f <- CDF(b) f(0.5) plot(f)b <- density(runif(10)) f <- CDF(b) f(0.5) plot(f)
Computes an adaptive estimate of probability density or intensity using a variable-bandwidth smoothing kernel.
densityAdaptiveKernel(X, ...)densityAdaptiveKernel(X, ...)
X |
Data to be smoothed. |
... |
Additional arguments passed to methods. |
This generic function computes an adaptive kernel estimate of probability density or intensity.
The function densityAdaptiveKernel is generic.
The spatstat package family includes methods for spatial objects.
See documentation for each method.
Adrian Baddeley [email protected] and Tilman Davies [email protected].
Computes an adaptive estimate of probability density from numeric data, using a variable-bandwidth smoothing kernel.
## Default S3 method: densityAdaptiveKernel(X, bw, ..., weights = NULL, zerocor=c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), at = c("grid", "data"), ngroups=Inf, fast=TRUE)## Default S3 method: densityAdaptiveKernel(X, bw, ..., weights = NULL, zerocor=c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), at = c("grid", "data"), ngroups=Inf, fast=TRUE)
X |
Data to be smoothed. A numeric vector. |
bw |
Smoothing bandwidths. Either a numeric vector of the same length as
|
... |
Additional arguments passed to |
weights |
Optional. Numeric vector of weights attached to each value in |
zerocor |
Character string (partially matched) specifying a boundary
correction. This is appropriate when |
at |
String (partially matched) specifying whether to evaluate the
probability density only at the data points ( |
ngroups |
Integer, |
fast |
Logical value specifying whether to use the Fast Fourier Transform to accelerate computations, when appropriate. |
This function computes an adaptive kernel estimate of
probability density on the real line (if zerocor="none")
or on the positive real line (if zerocor is another value).
The argument bw specifies the smoothing bandwidths to be
applied to each of the points in X. It may be a numeric vector
of bandwidth values, or a function yielding the
bandwidth values.
If the values in X are
and the corresponding bandwidths are
then the adaptive kernel estimate of intensity at a location is
where is the value at
of the (possibly edge-corrected) smoothing kernel with bandwidth
induced by a data point at .
Exact computation of the estimate above can be time-consuming:
it takes times longer than fixed-bandwidth smoothing.
The partitioning method of Davies and Baddeley (2018)
accelerates this computation by partitioning the range of
bandwidths into ngroups intervals,
correspondingly subdividing X into
ngroups subsets according to bandwidth,
and applying fixed-bandwidth smoothing to each subset.
If ngroups=NULL then we use a default rule where ngroups
is the integer part of the square root of
the number of points in X, so that the computation time is
only about times slower than fixed-bandwidth
smoothing. Any positive value of ngroups
can be specified by the user. Specifying ngroups=Inf enforces exact
computation of the estimate without partitioning. Specifying
ngroups=1 is the same as fixed-bandwidth smoothing with
bandwidth sigma=median(bw).
If at="data", a numeric vector of the same length as X.
If at="grid", a probability density object of class "density".
The function densityAdaptiveKernel.default
computes one adaptive estimate of probability density,
determined by the smoothing bandwidth values bw.
Typically the bandwidth values are computed by first computing
a pilot estimate of the intensity, then using bw.abram.default
to compute the vector of bandwidths according to Abramson's rule.
This involves specifying a global bandwidth h0.
Adrian Baddeley [email protected] and Tilman Davies [email protected].
Davies, T.M. and Baddeley, A. (2018)
Fast computation of spatially adaptive kernel estimates.
Statistics and Computing, 28(4), 937-956.
Hall, P. and Marron, J.S. (1988)
Variable window width kernel density estimates of probability
densities.
Probability Theory and Related Fields, 80, 37-49.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
xx <- rexp(100, rate=5) plot(density(xx)) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="w")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="c")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="r")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="b")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="J")) curve(5 * exp(-5 * x), add=TRUE, col=3)xx <- rexp(100, rate=5) plot(density(xx)) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="w")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="c")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="r")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="b")) curve(5 * exp(-5 * x), add=TRUE, col=3) plot(densityAdaptiveKernel(xx, at="grid", zerocor="J")) curve(5 * exp(-5 * x), add=TRUE, col=3)
Fixed-bandwidth kernel density estimation on the real line, or the positive real half-line, including optional corrections for a boundary at zero.
densityBC(x, kernel = "epanechnikov", bw=NULL, ..., h=NULL, adjust = 1, weights = rep(1, length(x))/length(x), from, to = max(x), n = 256, zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), fast=FALSE, xout=NULL, internal=list())densityBC(x, kernel = "epanechnikov", bw=NULL, ..., h=NULL, adjust = 1, weights = rep(1, length(x))/length(x), from, to = max(x), n = 256, zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern", "JonesFoster"), fast=FALSE, xout=NULL, internal=list())
x |
Numeric vector of observed values. |
kernel |
String specifying kernel.
Options are
|
bw, h
|
Alternative specifications of the scale factor for the kernel.
The bandwidth |
adjust |
Numeric value used to rescale the bandwidth |
weights |
Numeric vector of weights associated with |
from, to
|
Lower and upper limits of interval on which density should be
computed.
The default value of |
n |
Number of |
zerocor |
String (partially matched) specifying a correction for the boundary effect
bias at |
fast |
Logical value specifying whether to perform the calculation rapidly
using the Fast Fourier Transform ( |
xout |
Optional. Numeric vector of values of |
internal |
Internal use only. |
... |
Additional arguments are ignored. |
This function computes a fixed-bandwidth kernel estimate of a probability density on the real line, or the positive half-line, including optional boundary corrections for truncation of the density onto the positive half line.
Weighted estimates are supported, including negative weights. Weights are not renormalised to sum to 1. The resulting probability density estimate is not renormalised to integrate to 1.
Options for the smoothing kernel
are described in the help for density.default.
The default is the Epanechnikov (truncated quadratic) kernel.
If zerocor is missing, or given as "none",
this function computes the fixed-bandwidth kernel estimator of the
probability density on the real line,
using density.default.
The estimated probability density (unnormalised) is
where are the data values,
are the weights (defaulting
to ),
and is the kernel, a probability density on the
real line.
If zerocor is given, the probability density is assumed to be
confined to the positive half-line; the numerical values in x
must all be non-negative; and a boundary correction is
applied to compensate for bias arising due to truncation at the origin:
zerocor="weighted":The contribution from each data point
is weighted by the factor
where is the total mass of the kernel centred on
that lies in the positive half-line, and is the
cumulative distribution function of the kernel.
The corrected estimate is
This is the “cut-and-normalization” method of Gasser and Mueller (1979). Effectively the kernel is renormalized so that it integrates to 1, and the adjusted kernel conserves mass.
zerocor="convolution":The estimate of the density is
weighted by the factor where
is given above.
The corrected estimate is
This is the “convolution”, “uniform” or “zero-order” boundary correction method often attributed to Diggle (1985). This correction does not conserve mass. It is faster to compute than the weighted correction.
zerocor="reflection":if the kernel centred at data point
has a tail that lies on the negative half-line, this tail is
reflected onto the positive half-line.
The corrected estimate is
This is the “reflection” method first proposed by
Boneva et al (1971). This correction conserves mass.
The estimated density always has zero derivative at the origin,
, which may or
may not be desirable.
zerocor="bdrykern":The density estimate is computed using the Linear Boundary Kernel associated with the chosen kernel (Wand and Jones, 1995, page 47). The estimated (unnormalised) probability density is
where and
with
where
That is, when estimating the density for values of
close to zero (defined as for all kernels
except the Gaussian), the kernel contribution
is multiplied by a
term that is a linear function of ,
with coefficients depending on .
This correction does not conserve mass and may result in
negative values, but is asymptotically optimal.
Computation time for this estimate is greater than for
the options above.
zerocor="JonesFoster":The modification of the Boundary Kernel estimate proposed by Jones and Foster (1996) is computed:
where is the convolution estimator
and is the linear boundary kernel estimator.
This ensures that the estimate is always nonnegative
and retains the asymptotic optimality of the linear boundary
kernel.
Computation time for this estimate
is greater than for all the options above.
If fast=TRUE, the calculations are performed rapidly using
density.default which employs the Fast Fourier
Transform. If fast=FALSE (the default), the calculations are
performed exactly using slower C code.
An object of class "density" as described in the help file
for density.default. It contains at least the entries
x |
Vector of |
y |
Vector of density values |
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
Baddeley, A., Davies, T.M. and Hazelton, M.L. (2025) An improved estimator of the pair correlation function of a spatial point process. Biometrika 111, 2, article asaf021.
Boneva, L.I., Kendall, D.G. and Stefanov, I. (1971) Spline transformations: three new diagnostic aids for the statistical data-analyst (with discussion). Journal of the Royal Statistical Society, Series B, 33, 1–70.
Diggle, P.J. (1985) A kernel method for smoothing point process data. Journal of the Royal Statistical Society, Series C (Applied Statistics), 34 138–147.
Gasser, Th. and Mueller, H.-G. (1979). Kernel estimation of regression functions. In Th. Gasser and M. Rosenblatt (editors) Smoothing Techniques for Curve Estimation, pages 23–68. Springer, Berlin.
Jones, M.C. and Foster, P.J. (1996) A simple nonnegative boundary correction method for kernel density estimation. Statistica Sinica, 6 (4) 1005–1013.
Wand, M.P. and Jones, M.C. (1995) Kernel Smoothing. Chapman and Hall.
dkernel for the kernel itself.
densityAdaptiveKernel.default for adaptive
(variable-bandwidth) estimation.
sim.dat <- rexp(500) fhatN <- densityBC(sim.dat, "biweight", h=0.4) fhatB <- densityBC(sim.dat, "biweight", h=0.4, zerocor="bdrykern") plot(fhatN, ylim=c(0,1.1), main="density estimates") lines(fhatB, col=2) curve(dexp(x), add=TRUE, from=0, col=3) legend(2, 0.8, legend=c("fixed bandwidth", "boundary kernel", "true density"), col=1:3, lty=rep(1,3))sim.dat <- rexp(500) fhatN <- densityBC(sim.dat, "biweight", h=0.4) fhatB <- densityBC(sim.dat, "biweight", h=0.4, zerocor="bdrykern") plot(fhatN, ylim=c(0,1.1), main="density estimates") lines(fhatB, col=2) curve(dexp(x), add=TRUE, from=0, col=3) legend(2, 0.8, legend=c("fixed bandwidth", "boundary kernel", "true density"), col=1:3, lty=rep(1,3))
Density, distribution function, quantile function and random generation for several distributions used in kernel estimation for numerical data.
dkernel(x, kernel = "gaussian", mean = 0, sd = 1) pkernel(q, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) qkernel(p, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) rkernel(n, kernel = "gaussian", mean = 0, sd = 1)dkernel(x, kernel = "gaussian", mean = 0, sd = 1) pkernel(q, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) qkernel(p, kernel = "gaussian", mean = 0, sd = 1, lower.tail = TRUE) rkernel(n, kernel = "gaussian", mean = 0, sd = 1)
x, q
|
Vector of quantiles. |
p |
Vector of probabilities. |
kernel |
String name of the kernel.
Options are
|
n |
Number of observations. |
mean |
Mean of distribution. |
sd |
Standard deviation of distribution. |
lower.tail |
logical; if |
These functions give the probability density, cumulative distribution function, quantile function and random generation for several distributions used in kernel estimation for one-dimensional (numerical) data.
The available kernels are those used in density.default,
namely "gaussian", "rectangular",
"triangular",
"epanechnikov",
"biweight",
"cosine" and "optcosine".
For more information about these kernels,
see density.default.
dkernel gives the probability density,
pkernel gives the cumulative distribution function,
qkernel gives the quantile function,
and rkernel generates random deviates.
A numeric vector.
For dkernel, a vector of the same length as x
containing the corresponding values of the probability density.
For pkernel, a vector of the same length as x
containing the corresponding values of the cumulative distribution function.
For qkernel, a vector of the same length as p
containing the corresponding quantiles.
For rkernel, a vector of length n
containing randomly generated values.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default,
kernel.factor,
kernel.moment,
kernel.squint.
x <- seq(-3,3,length=100) plot(x, dkernel(x, "epa"), type="l", main=c("Epanechnikov kernel", "probability density")) plot(x, pkernel(x, "opt"), type="l", main=c("OptCosine kernel", "cumulative distribution function")) p <- seq(0,1, length=256) plot(p, qkernel(p, "biw"), type="l", main=c("Biweight kernel", "cumulative distribution function")) y <- rkernel(100, "tri") hist(y, main="Random variates from triangular density") rug(y)x <- seq(-3,3,length=100) plot(x, dkernel(x, "epa"), type="l", main=c("Epanechnikov kernel", "probability density")) plot(x, pkernel(x, "opt"), type="l", main=c("OptCosine kernel", "cumulative distribution function")) p <- seq(0,1, length=256) plot(p, qkernel(p, "biw"), type="l", main=c("Biweight kernel", "cumulative distribution function")) y <- rkernel(100, "tri") hist(y, main="Random variates from triangular density") rug(y)
Computes the boundary-corrected version of a smoothing kernel density function.
dkernelBC(x, mean, sd = 1, kernel = "gaussian", zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern"))dkernelBC(x, mean, sd = 1, kernel = "gaussian", zerocor = c("none", "weighted", "convolution", "reflection", "bdrykern"))
x |
Numeric. Values of the function argument, at which the function should be evaluated. |
mean |
Numeric. The mean of the uncorrected kernel. |
sd |
Numeric value. The standard deviation of the uncorrected kernel. |
kernel |
Character string giving the name of the kernel
as recognised by |
zerocor |
String (partially matched) specifying a correction for the boundary effect
bias at |
The kernel density function identified by kernel
with standard deviation sd and mean mean
will be computed, and truncated onto the positive half-line.
The boundary correction specified by zerocor will then
be applied. The result is the vector of corrected density values.
Numeric value or numeric vector.
Adrian Baddeley [email protected].
densityBC to compute a density estimate using
the boundary-corrected kernel.
dkernel to compute the un-corrected
kernel density function, and density.default to compute
an uncorrected density estimate.
match.kernel for the list of
recognised names of kernels.
curve(dkernelBC(x, mean=1, zerocor="none"), to=5) curve(dkernelBC(x, mean=1, zerocor="weighted"), to=5) curve(dkernelBC(x, mean=1, zerocor="reflection"), to=5) curve(dkernelBC(x, mean=1, zerocor="convolution"), to=5) curve(dkernelBC(x, mean=1, zerocor="bdrykern"), to=5)curve(dkernelBC(x, mean=1, zerocor="none"), to=5) curve(dkernelBC(x, mean=1, zerocor="weighted"), to=5) curve(dkernelBC(x, mean=1, zerocor="reflection"), to=5) curve(dkernelBC(x, mean=1, zerocor="convolution"), to=5) curve(dkernelBC(x, mean=1, zerocor="bdrykern"), to=5)
Probability density, cumulative probability distribution function,
quantile function, and random generation for a linear combination of
random variables.
dsocs(x, lambda, log= FALSE, method=c("Wood", "Farebrother")) psocs(q, lambda, lower.tail=TRUE, log.p=FALSE, method=c("Wood", "Farebrother")) qsocs(p, lambda, lower.tail=TRUE, log.p=FALSE, method=c("Wood", "Farebrother")) rsocs(n, lambda, approx=FALSE)dsocs(x, lambda, log= FALSE, method=c("Wood", "Farebrother")) psocs(q, lambda, lower.tail=TRUE, log.p=FALSE, method=c("Wood", "Farebrother")) qsocs(p, lambda, lower.tail=TRUE, log.p=FALSE, method=c("Wood", "Farebrother")) rsocs(n, lambda, approx=FALSE)
x, q
|
Numeric vector of quantiles |
p |
Numeric vector of probabilities |
n |
Integer. Number of random values to generate. |
lambda |
Numeric vector of coefficients of the linear combination. |
lower.tail |
Logical value. If |
log, log.p
|
Logical value. If |
approx |
Logical value specifying whether to generate random values
from the approximate distribution ( |
method |
Character string (partially matched) specifying
whether to use Wood's approximation ( |
These functions calculate the probability density (dsocs),
cumulative distribution function (psocs) and
quantile function (qsocs), and generate random realisations
(rsocs) of, a random variable such that
where are independent, identically
distributed standard normal random variables, and
are
nonnegative coefficients. Thus, is a linear combination of
independent random variables
each having 1 degree of freedom.
The argument lambda contains the coefficients
.
if method="Wood" (the default),
the approximation of Wood (1989) is used:
the distribution of is approximated by an
distribution, except in marginal cases when it is approximated
by a gamma distribution or inverse-gamma distribution.
This method is faster.
if method="Farebrother", the
numerical algorithm of Farebrother (1984) is used.
The cumulative distribution function and the probability density
are evaluated to a high level of precision
using a truncated infinite series. The quantile function is
evaluated by root-finding.
For the random generator rsocs, the default behaviour is to
generate realisations of according to its exact distribution
by constructing random variables and summing.
If approx=TRUE, then the realisations are generated from the
(Wood's method) approximate distribution instead.
A numeric vector.
For the probability density dsocs, distribution function psocs
and quantile function qsocs:
if method="Wood" (the default),
the probability density, distribution function and quantiles are
computed using the relevant functions from the stats package:
df, pf, qf
or dgamma, pgamma,
qgamma, using the parameters calculated
by Wood's approximation.
This method is faster, and the stats package functions have
been thoroughly checked to ensure logically consistent behaviour
for extreme values of the arguments.
if method="Farebrother",
the probability density and distribution function are calculated
to high precision (such that the error in the probability value
is less than 1e-6) by Farebrother's (1984) algorithm
as implemented in farebro.
Quantiles are evaluated by root-finding using
uniroot with a tolerance of 1e-6
for the discrepancy (the difference between the desired probability
and the probability calculated by Farebrother's algorithm)
so that the true error in the probability value is less than 2e-6.
If either the Farebrother algorithm or uniroot
reports an error, the computation falls back on Wood's approximation.
For the random generator rsocs,
if approx=FALSE (the default),
exact realisations of are generated
by generating standard normal random variables using
rnorm, squaring,
and summing with the weights lambda.
if approx=TRUE,
realisations of the approximate distribution
are generated using the relevant function
rf or rgamma
with the parameters calculated by Wood's approximation.
Adrian Baddeley [email protected].
Farebrother, R.W. (1984)
Algorithm AS 204:
The distribution of a positive linear combination of
random variables. Applied Statistics 33 (3) 332–339.
Wood, A.T.A (1989) An F approximation to the distribution of a linear combination of chi-squared variables. Communications in Statistics – Simulation and Computation 18:4, 1439–1456.
pf,
pgamma,
farebro, uniroot
for underlying algorithms.
online <- interactive() N <- if(online) 1000 else 50 co <- c(1, 0.5, 0.3) xx <- rsocs(N, co) plot(density(xx)) curve(dsocs(x, co), add=TRUE, col=8, lwd=5) curve(dsocs(x, co, method="F"), add=TRUE, col=2) qqplot(xx, qsocs(ppoints(N), co)) abline(0,1) plot(ppoints(N), psocs(sort(xx), co)) abline(0,1) ## accuracy of quantiles: Wood (1989) table II, case I lam <- c(0.5, rep(0.1, 5)) pp <- if(online) { c(0.01, 0.025, 0.05, 0.1, 0.25, 0.75, 0.9, 0.95, 0.975, 0.99) } else { c(0.05, 0.95) } qe <- qsocs(pp, lam, method="F") qw <- qsocs(pp, lam, method="W") ta <- rbind(exact=qe, Wood=qw) ta <- round(ta, 3) colnames(ta) <- paste0(round(100 * pp, 1), "%") taonline <- interactive() N <- if(online) 1000 else 50 co <- c(1, 0.5, 0.3) xx <- rsocs(N, co) plot(density(xx)) curve(dsocs(x, co), add=TRUE, col=8, lwd=5) curve(dsocs(x, co, method="F"), add=TRUE, col=2) qqplot(xx, qsocs(ppoints(N), co)) abline(0,1) plot(ppoints(N), psocs(sort(xx), co)) abline(0,1) ## accuracy of quantiles: Wood (1989) table II, case I lam <- c(0.5, rep(0.1, 5)) pp <- if(online) { c(0.01, 0.025, 0.05, 0.1, 0.25, 0.75, 0.9, 0.95, 0.975, 0.99) } else { c(0.05, 0.95) } qe <- qsocs(pp, lam, method="F") qw <- qsocs(pp, lam, method="W") ta <- rbind(exact=qe, Wood=qw) ta <- round(ta, 3) colnames(ta) <- paste0(round(100 * pp, 1), "%") ta
Compute a weighted version of the empirical cumulative distribution function.
ewcdf(x, weights = NULL, normalise=TRUE, adjust=1)ewcdf(x, weights = NULL, normalise=TRUE, adjust=1)
x |
Numeric vector of observations. |
weights |
Optional. Numeric vector of non-negative weights for |
normalise |
Logical value indicating whether the weights should be rescaled so that they sum to 1. |
adjust |
Numeric value. Adjustment factor.
The weights will be multiplied by |
This is a modification of the standard function ecdf
allowing the observations x to have weights.
The weighted e.c.d.f. (empirical cumulative distribution function)
Fn is defined so that, for any real number y, the value of
Fn(y) is equal to the total weight of all entries of
x that are less than or equal to y. That is
Fn(y) = sum(weights[x <= y]).
Thus Fn is a step function which jumps at the
values of x. The height of the jump at a point y
is the total weight of all entries in x
number of tied observations at that value. Missing values are
ignored.
If weights is omitted, the default is equivalent to
ecdf(x) except for the class membership.
The result of ewcdf is a function, of class "ewcdf",
inheriting from the classes "ecdf" (only if normalise=TRUE)
and "stepfun".
The class ewcdf has methods for
print,
quantile
and mean.
The inherited classes ecdf and stepfun
have methods for plot and summary.
A function, of class "ewcdf", inheriting from
"ecdf" (if normalise=TRUE) and "stepfun".
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
ecdf.
Integrals with respect to the weighted cumulative distribution function
can be computed using stieltjes.
x <- rnorm(100) w <- runif(100) plot(e <- ewcdf(x,w)) ex <- rnorm(100) w <- runif(100) plot(e <- ewcdf(x,w)) e
Calculates the cumulative probability distribution function and the probability density function of a weighted sum of noncentral chi-squared variables.
farebro(lambda, mult = 1, delta = 0, x, ..., maxit = 1e+06, eps = 1e-06, warn = TRUE)farebro(lambda, mult = 1, delta = 0, x, ..., maxit = 1e+06, eps = 1e-06, warn = TRUE)
lambda |
Numeric vector of positive weights. |
mult |
Integer value or integer vector of multiplicities (degrees of freedom). |
delta |
Numeric value or numeric vector of noncentrality parameters. |
x |
Numeric value or vector giving the function argument (quantiles for which the cumulative probability and density are desired). |
... |
Ignored. |
maxit |
Integer. Maximum number of iterations. |
eps |
Positive numeric value. Maximum error in the probability value. |
warn |
Logical value specifying whether to issue a warning if an error code is returned. |
This function implements the algorithm of
Farebrother (1984) for computing the cumulative distribution
function
and probability density function
of the random variable
where are independent
random variables and has a noncentral
distribution with degrees of
freedom and noncentrality parameter .
The arguments lambda, mult and delta
should be vectors of equal length, or will be recycled so that
they have equal length.
The algorithm uses an exact mathematical representation of
as an infinite series, and a mathematical upper bound on the error
of the truncated series, to compute an approximation to
which is guaranteed to be accurate to within the specified error
eps. The algorithm involves calculating the probability density
as well. It returns the values of both and
.
An error code ifault is returned for each entry of x.
Nonzero error codes are translated to text warnings if
warn=TRUE.
A data frame, with one row for each entry in x,
and columns x (input value),
p (cumulative probability),
d (probability density)
and ifault (error code).
R.W. Farebrother, adapted by Adrian Baddeley [email protected]. Original Algol code of Farebrother (1984) translated to C by Adrian Baddeley and modified to handle a vector of quantile values. R interface by Adrian Baddeley.
Farebrother, R.W. (1984)
Algorithm AS 204:
The distribution of a positive linear combination of
random variables. Applied Statistics 33 (3) 332–339.
## Farebrother (1984) Table 1, example Q_5 (rows 13-15) farebro(c(7,3), c(6,2), c(6,2), c(20, 100, 200))## Farebrother (1984) Table 1, example Q_5 (rows 13-15) farebro(c(7,3), c(6,2), c(6,2), c(20, 100, 200))
Find the first or last digit in the decimal representation of a number.
firstdigit(x) lastdigit(x) ndigits(x)firstdigit(x) lastdigit(x) ndigits(x)
x |
A numeric value or numeric vector. |
firstdigit(x) finds the first (most significant) digit,
lastdigit(x) finds the last (least significant) digit,
and ndigits(x) finds the number of digits,
in the decimal representation of each entry of x.
The decimal representation is truncated at the number of digits
available for double precision numbers on the hardware, usually 15.
An integer or integer vector of the same length as x.
Adrian Baddeley [email protected]
firstdigit(42) lastdigit(42) ndigits(42) firstdigit(-0.1234) lastdigit(-0.1234) ndigits(-0.1234) firstdigit(0) lastdigit(0) ndigits(0)firstdigit(42) lastdigit(42) ndigits(42) firstdigit(-0.1234) lastdigit(-0.1234) ndigits(-0.1234) firstdigit(0) lastdigit(0) ndigits(0)
Calculate values of the heat kernel on a one-dimensional rod. The ends of the rod may be assumed to be insulated, or absorbing.
hotrod(len, xsource, xquery, sigma, ends=c("insulated", "absorbing"), nmax=20)hotrod(len, xsource, xquery, sigma, ends=c("insulated", "absorbing"), nmax=20)
len |
Length of the rod. A single number or numeric vector. |
xsource |
Positions of the source points, from the left end of the rod
(in the same distance units as |
xquery |
Positions of the query points, from the left end of the rod
(in the same distance units as |
sigma |
Bandwidth for kernel. A single number or a numeric vector. |
ends |
Character string (partially matched) specifying whether the ends of the rod are assumed to be insulated or absorbing. |
nmax |
Number of terms in the infinite sum to use. A single integer or an integer vector. |
Computes the heat kernel as an infinite sum.
Number or numeric vector.
Greg McSwiggan and Adrian Baddeley [email protected].
curve(hotrod(1, 0.1, x, 0.7)) # check it's a probability density f <- function(x) hotrod(1, 0.1, x, 0.7) integrate(f, 0, 1) ## absorbing ends curve(hotrod(1, 0.1, x, 0.7, ends="a"))curve(hotrod(1, 0.1, x, 0.7)) # check it's a probability density f <- function(x) hotrod(1, 0.1, x, 0.7) integrate(f, 0, 1) ## absorbing ends curve(hotrod(1, 0.1, x, 0.7, ends="a"))
Computes the indefinite integral of the given function.
indefinteg(f, x, ..., method=c("trapezoid", "quadrature"), lower=min(x), nfine=8192)indefinteg(f, x, ..., method=c("trapezoid", "quadrature"), lower=min(x), nfine=8192)
f |
an R function taking a numeric first argument and returning a numeric vector of the same length. |
x |
Vector of values of the argument for which the indefinite integral should be evaluated. |
... |
additional arguments to be passed to |
method |
String (partially matched) specifying how to compute the integrals. |
lower |
Lower limit of integration. A single number. |
nfine |
Number of sub-intervals to use for computation
if |
The indefinite integral of the given function f
is computed numerically at each of the desired values x.
The lower limit of integration is taken to be min(x).
The result is a numeric vector y of the same length as
x, with entries
If method='trapezoid' (the default),
the integrals are computed rapidly using the trapezoid rule.
If method='quadrature' the integrals are computed
accurately but much more slowly, using the numerical quadrature routine
integrate.
If method='trapezoid'
the function f is first evaluated
on a finer grid of values of the function argument.
The fine grid contains nfine sample points.
The values of the indefinite integral on the fine grid
are computed using the trapezoidal approximation.
Finally the values of the indefinite integral are extracted at
the desired argument values x.
Numeric vector of the same length as x.
Adrian Baddeley [email protected].
curve(indefinteg(sin, x), to=pi)curve(indefinteg(sin, x), to=pi)
Computes the integral of a function or spatial object.
integral(f, domain=NULL, ...)integral(f, domain=NULL, ...)
f |
A function, or a spatial object that can be treated as a function. |
domain |
Optional. Data specifying the domain of integration. |
... |
Arguments passed to methods. |
The function integral is generic.
It calculates the integral of a function, or
the integral of a spatial object that can be treated as a function.
It has methods
for one-dimensional functions ("density", "fv")
and for spatial objects
("im", "msr", "linim", "linfun").
A single numeric or complex value, or a vector of such values.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
integral.im in package spatstat.geom.
Compute the integral of a kernel density estimate over a specified range.
## S3 method for class 'density' integral(f, domain = NULL, weight=NULL, ...)## S3 method for class 'density' integral(f, domain = NULL, weight=NULL, ...)
f |
A one-dimensional probability density estimate
(object of class |
domain |
Optional. Range of values of the argument |
weight |
Optional. A |
... |
Ignored. |
This is a method for the generic function integral.
It computes the numerical integral
of the density estimate f.
If weight is specified, then the weighted integral
is computed, where is the function specified by weight.
This function must return finite numerical values.
If domain is specified, the integral is restricted to the
interval of values given by the domain.
Integrals are calculated numerically using the trapezoidal rule restricted to the domain given.
A single numerical value.
Adrian Baddeley [email protected].
x <- runif(10) d <- density(x, bw=0.1) integral(d) # should be approximately 1 integral(d, domain=c(-Inf, 0)) # mass on negative half-line ## mean of density integral(d, weight=function(x) x)x <- runif(10) d <- density(x, bw=0.1) integral(d) # should be approximately 1 integral(d, domain=c(-Inf, 0)) # mass on negative half-line ## mean of density integral(d, weight=function(x) x)
Compute the Kaplan-Meier estimator of a survival time distribution function, from histogram data
kaplan.meier(obs, nco, breaks, upperobs=0)kaplan.meier(obs, nco, breaks, upperobs=0)
obs |
vector of |
nco |
vector of |
breaks |
Vector of |
upperobs |
Number of observations beyond the rightmost breakpoint, if any. |
This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a huge dataset.
Suppose are the survival times of individuals
with unknown distribution function
which we wish to estimate. Suppose these times are right-censored
by random censoring times .
Thus the observations consist of right-censored survival times
and non-censoring indicators
for each .
If the number of observations is large, it is efficient to
use histograms.
Form the histogram obs of all observed times .
That is, obs[k] counts the number of values
in the interval
(breaks[k],breaks[k+1]] for
and [breaks[1],breaks[2]] for .
Also form the histogram nco of all uncensored times,
i.e. those such that .
These two histograms are the arguments passed to kaplan.meier.
The vectors km and lambda returned by kaplan.meier
are (histogram approximations to) the Kaplan-Meier estimator
of and its hazard rate .
Specifically, km[k] is an estimate of
F(breaks[k+1]), and lambda[k] is an estimate of
the average of over the interval
(breaks[k],breaks[k+1]).
The histogram breaks must include .
If the histogram breaks do not span the range of the observations,
it is important to count how many survival times
exceed the rightmost breakpoint,
and give this as the value upperobs.
A list with two elements:
km |
Kaplan-Meier estimate of the survival time c.d.f. |
lambda |
corresponding Nelson-Aalen estimate of the
hazard rate |
These are numeric vectors of length .
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
Returns a scale factor for the kernels used in density estimation for numerical data.
kernel.factor(kernel = "gaussian")kernel.factor(kernel = "gaussian")
kernel |
String name of the kernel.
Options are
|
Kernel estimation of a probability density in one dimension
is performed by density.default
using a kernel function selected from the list above.
This function computes a scale constant for the kernel.
For the Gaussian kernel, this constant is equal to 1.
Otherwise, the constant is such that the kernel
with standard deviation is supported on the interval
.
For more information about these kernels,
see density.default.
A single number.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default,
dkernel,
kernel.moment,
kernel.squint
kernel.factor("rect") # bandwidth for Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa")kernel.factor("rect") # bandwidth for Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa")
Computes the complete or incomplete th moment of a
smoothing kernel.
kernel.moment(m, r, kernel = "gaussian", mean=0, sd=1/kernel.factor(kernel))kernel.moment(m, r, kernel = "gaussian", mean=0, sd=1/kernel.factor(kernel))
m |
Exponent (order of moment). An integer. |
r |
Upper limit of integration for the incomplete moment.
A numeric value or numeric vector.
Set |
kernel |
String name of the kernel.
Options are
|
mean, sd
|
Optional numerical values giving the mean and standard deviation of the kernel. |
Kernel estimation of a probability density in one dimension
is performed by density.default
using a kernel function selected from the list above.
For more information about these kernels,
see density.default.
The function kernel.moment computes the integral
where is the selected kernel, is the upper limit of
integration, and is the exponent or order.
Note that, if mean and sd are not specified, the
calculations assume that is the standard form of the kernel,
which has support and
standard deviation where c = kernel.factor(kernel).
The code uses the explicit analytic expressions when
m = 0, 1, 2 and numerical integration otherwise.
A single number, or a numeric vector of the same length as r.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default,
dkernel,
kernel.factor,
kernel.squint
kernel.moment(1, 0.1, "epa") curve(kernel.moment(2, x, "epa"), from=-1, to=1)kernel.moment(1, 0.1, "epa") curve(kernel.moment(2, x, "epa"), from=-1, to=1)
Computes the integral of the squared kernel, for the kernels used in density estimation for numerical data.
kernel.squint(kernel = "gaussian", bw=1)kernel.squint(kernel = "gaussian", bw=1)
kernel |
String name of the kernel.
Options are
|
bw |
Bandwidth (standard deviation) of the kernel. |
Kernel estimation of a probability density in one dimension
is performed by density.default
using a kernel function selected from the list above.
This function computes the integral of the squared kernel,
where is the kernel with bandwidth bw.
A single number.
Adrian Baddeley [email protected] and Martin Hazelton [email protected].
density.default,
dkernel,
kernel.moment,
kernel.factor
kernel.squint("gaussian", 3) # integral of squared Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa") kernel.squint("epa", bw)kernel.squint("gaussian", 3) # integral of squared Epanechnikov kernel with half-width h=1 h <- 1 bw <- h/kernel.factor("epa") kernel.squint("epa", bw)
Compute the Kaplan-Meier and Reduced Sample estimators of a survival time distribution function, using histogram techniques
km.rs(o, cc, d, breaks)km.rs(o, cc, d, breaks)
o |
vector of observed survival times |
cc |
vector of censoring times |
d |
vector of non-censoring indicators |
breaks |
Vector of breakpoints to be used to form histograms. |
This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the Kaplan-Meier estimator from a huge dataset.
Suppose are the survival times of individuals
with unknown distribution function
which we wish to estimate. Suppose these times are right-censored
by random censoring times .
Thus the observations consist of right-censored survival times
and non-censoring indicators
for each .
The arguments to this function are
vectors o, cc, d
of observed values of ,
and respectively.
The function computes histograms and forms the reduced-sample
and Kaplan-Meier estimates of by
invoking the functions kaplan.meier
and reduced.sample.
This is efficient if the lengths of o, cc, d
(i.e. the number of observations) is large.
The vectors km and hazard returned by kaplan.meier
are (histogram approximations to) the Kaplan-Meier estimator
of and its hazard rate .
Specifically, km[k] is an estimate of
F(breaks[k+1]), and lambda[k] is an estimate of
the average of over the interval
(breaks[k],breaks[k+1]). This approximation is exact only if the
survival times are discrete and the
histogram breaks are fine enough to ensure that each interval
(breaks[k],breaks[k+1]) contains only one possible value of
the survival time.
The vector rs is the reduced-sample estimator,
rs[k] being the reduced sample estimate of F(breaks[k+1]).
This value is exact, i.e. the use of histograms does not introduce any
approximation error in the reduced-sample estimator.
A list with five elements
rs |
Reduced-sample estimate of the survival time c.d.f. |
km |
Kaplan-Meier estimate of the survival time c.d.f. |
hazard |
corresponding Nelson-Aalen estimate of the
hazard rate |
r |
values of |
breaks |
the breakpoints vector |
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
Extract the knots (jump points) of an empirical cumulative distribution function.
## S3 method for class 'ewcdf' knots(Fn, ...) ## S3 method for class 'ecdf' knots(Fn, ...)## S3 method for class 'ewcdf' knots(Fn, ...) ## S3 method for class 'ecdf' knots(Fn, ...)
Fn |
An empirical cumulative distribution function
(object of class |
... |
Ignored. |
The function knots is generic.
The function knots.ecdf is the method for the class "ecdf"
of empirical cumulative distribution functions; objects of this class
are created by ecdf).
The function knots.ewcdf is the method for
the class "ewcdf" of empirical weighted cumulative
distribution functions. Objects of class "ewcdf"
are created by ewcdf.
The jump points (locations of increments) of the function Fn
will be returned as a numeric vector.
Numeric vector.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
x <- c(1, 2, 5) w <- runif(3) e <- ewcdf(x,w) knots(e)x <- c(1, 2, 5) w <- runif(3) e <- ewcdf(x,w) knots(e)
Calculates the mean of a (weighted or unweighted) empirical cumulative distribution function.
## S3 method for class 'ecdf' mean(x, trim=0, ...) ## S3 method for class 'ewcdf' mean(x, trim=0, ...)## S3 method for class 'ecdf' mean(x, trim=0, ...) ## S3 method for class 'ewcdf' mean(x, trim=0, ...)
x |
An empirical cumulative distribution function
(object of class |
trim |
The fraction (0 to 0.5) of data values to be trimmed from each end of their range, before the mean is computed. |
... |
Ignored. |
These functions are methods for the generic
mean
for the classes "ecdf" and "ewcdf".
They calculate the mean of the probability distribution
corresponding to the cumulative distribution function x.
This is equivalent to calculating the (weighted or unweighted)
mean of the original data values.
For weighted empirical cumulative distribution functions
(class "ewcdf") the weights will first be normalised so that they
sum to 1. The result of mean.ewcdf
is always an average or weighted average or the original data values.
The argument trim is interpreted as a probability
under this normalised distribution; the corresponding
quantiles are computed, and data outside these quantiles is deleted
before calculating the weighted mean.
A single number.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
Generic mean and
weighted.mean.
ecdf, ewcdf
to create the cumulative distribution functions.
stieltjes for integration with respect to
a cumulative distribution function.
x <- 1:5 mean(x) mean(ecdf(x)) w <- 1:5 mean(ewcdf(x, w))x <- 1:5 mean(x) mean(ecdf(x)) w <- 1:5 mean(ewcdf(x, w))
Given a kernel estimate of a probability density, compute quantiles.
## S3 method for class 'density' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., warn = TRUE)## S3 method for class 'density' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., warn = TRUE)
x |
Object of class |
probs |
Numeric vector of probabilities for which the quantiles are required. |
names |
Logical value indicating whether to attach names (based on
|
... |
Ignored. |
warn |
Logical value indicating whether to issue a warning if the
density estimate |
This function calculates quantiles of the probability distribution
whose probability density has been estimated and stored in the object
x. The object x must belong to the class "density",
and would typically have been obtained from a call to the function
density.
The probability density is first normalised so that the total
probability is equal to 1. A warning is issued if the density
estimate was restricted to an interval (i.e. if x
was created by a call to density which
included either of the arguments from and to).
Next, the density estimate is numerically integrated to obtain an estimate
of the cumulative distribution function . Then
for each desired probability , the algorithm finds the
corresponding quantile .
The quantile corresponding to probability
satisfies up to
the resolution of the grid of values contained in x.
The quantile is computed from the right, that is,
is the smallest available value of such that
.
A numeric vector containing the quantiles.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
quantile,
quantile.ewcdf,
CDF.
dd <- density(runif(10)) quantile(dd)dd <- density(runif(10)) quantile(dd)
Compute quantiles of a weighted empirical cumulative distribution function.
## S3 method for class 'ewcdf' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., normalise = TRUE, type=1)## S3 method for class 'ewcdf' quantile(x, probs = seq(0, 1, 0.25), names = TRUE, ..., normalise = TRUE, type=1)
x |
A weighted empirical cumulative distribution function
(object of class |
probs |
probabilities for which the quantiles are desired. A numeric vector of values between 0 and 1. |
names |
Logical. If |
... |
Ignored. |
normalise |
Logical value indicating whether |
type |
Integer specifying the type of quantile to be calculated,
as explained in |
This is a method for the generic quantile
function for the class ewcdf of empirical weighted cumulative
distribution functions.
The quantile for a probability p is computed
as the right-continuous inverse of the cumulative
distribution function x (assuming type=1, the default).
If normalise=TRUE (the default),
the weighted cumulative function x is first normalised to
have total mass 1 so that it can be interpreted as a
cumulative probability distribution function.
Numeric vector of quantiles, of the same length as probs.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected] and Kevin Ummel.
z <- rnorm(50) w <- runif(50) Fun <- ewcdf(z, w) quantile(Fun, c(0.95,0.99))z <- rnorm(50) w <- runif(50) Fun <- ewcdf(z, w) quantile(Fun, c(0.95,0.99))
Return the inverse function of a cumulative distribution function.
quantilefun(x, ...) ## S3 method for class 'ecdf' quantilefun(x, ..., type=1) ## S3 method for class 'ewcdf' quantilefun(x, ..., type=1)quantilefun(x, ...) ## S3 method for class 'ecdf' quantilefun(x, ..., type=1) ## S3 method for class 'ewcdf' quantilefun(x, ..., type=1)
x |
Data for which the quantile function should be calculated.
Either an object containing data (such as a pixel image)
or an object representing a cumulative distribution function
(of class |
... |
Other arguments passed to methods. |
type |
Integer specifying the type of quantiles,
as explained in |
Whereas the command quantile calculates
the quantiles of a dataset corresponding to desired probabilities
, the command quantilefun
returns a function which can be used to compute any quantiles of the
dataset.
If f <- quantilefun(x) then f is a function such that
f(p) is the quantile associated with any given probability p.
For example f(0.5) is the median of the original data, and
f(0.99) is the 99th percentile of the original data.
If x is a pixel image (object of class "im")
then the pixel values of x will be extracted
and the quantile function of the pixel values is constructed.
If x is an object representing a cumulative distribution
function (object of class "ecdf" or "ewcdf") then the
quantile function of the original data is constructed.
A function in the R language.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
ewcdf,
quantile.ewcdf,
ecdf,
quantile
## numeric data z <- rnorm(50) FZ <- ecdf(z) QZ <- quantilefun(FZ) QZ(0.5) # median value of z if(interactive()) plot(QZ,xlim=c(0,1),xlab="probability",ylab="quantile of z")## numeric data z <- rnorm(50) FZ <- ecdf(z) QZ <- quantilefun(FZ) QZ(0.5) # median value of z if(interactive()) plot(QZ,xlim=c(0,1),xlab="probability",ylab="quantile of z")
Compute the Reduced Sample estimator of a survival time distribution function, from histogram data
reduced.sample(nco, cen, ncc, show=FALSE, uppercen=0)reduced.sample(nco, cen, ncc, show=FALSE, uppercen=0)
nco |
vector of counts giving the histogram of uncensored observations (those survival times that are less than or equal to the censoring time) |
cen |
vector of counts giving the histogram of censoring times |
ncc |
vector of counts giving the histogram of censoring times for the uncensored observations only |
uppercen |
number of censoring times greater than the rightmost histogram breakpoint (if there are any) |
show |
Logical value controlling the amount of detail returned by the function value (see below) |
This function is needed mainly for internal use in spatstat, but may be useful in other applications where you want to form the reduced sample estimator from a huge dataset.
Suppose are the survival times of individuals
with unknown distribution function
which we wish to estimate. Suppose these times are right-censored
by random censoring times .
Thus the observations consist of right-censored survival times
and non-censoring indicators
for each .
If the number of observations is large, it is efficient to
use histograms.
Form the histogram cen of all censoring times .
That is, obs[k] counts the number of values
in the interval
(breaks[k],breaks[k+1]] for
and [breaks[1],breaks[2]] for .
Also form the histogram nco of all uncensored times,
i.e. those such that ,
and the histogram of all censoring times for which the survival time
is uncensored,
i.e. those such that .
These three histograms are the arguments passed to kaplan.meier.
The return value rs is the reduced-sample estimator
of the distribution function . Specifically,
rs[k] is the reduced sample estimate of F(breaks[k+1]).
The value is exact, i.e. the use of histograms does not introduce any
approximation error.
Note that, for the results to be valid, either the histogram breaks
must span the censoring times, or the number of censoring times
that do not fall in a histogram cell must have been counted in
uppercen.
If show = FALSE, a numeric vector giving the values of
the reduced sample estimator.
If show=TRUE, a list with three components which are
vectors of equal length,
rs |
Reduced sample estimate of the survival time c.d.f. |
numerator |
numerator of the reduced sample estimator |
denominator |
denominator of the reduced sample estimator |
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
Given a numeric vector, determine whether the values have been rounded to a certain number of decimal places.
rounding(x) ## Default S3 method: rounding(x)rounding(x) ## Default S3 method: rounding(x)
x |
A numeric vector, or an object containing numeric spatial coordinates. |
The function rounding is generic.
Its purpose is to determine whether numerical values have been rounded
to a certain number of decimal places.
The spatstat family of packages provides methods for
rounding for various spatial objects.
For a numeric vector x, the default method rounding.default
determines whether the values in x have been rounded
to a certain number of decimal places.
If the entries of x are not all integers, then
rounding(x) returns the smallest number of digits d
after the decimal point
such that round(x, digits=d) is identical to
x.
For example if rounding(x) = 2 then the entries of
x are rounded to 2 decimal places, and are multiples of 0.01.
If all the entries of x are integers, then
rounding(x) returns -d, where
d is the smallest number of digits before the decimal point
such that round(x, digits=-d) is identical to
x.
For example if rounding(x) = -3 then the entries of
x are multiples of 1000.
If rounding(x) = 0 then the entries of x are integers
but not multiples of 10.
If all entries of x are equal to 0, a value of 0 is returned.
An integer.
Adrian Baddeley [email protected] and Rolf Turner [email protected]
round.ppp in package spatstat.geom.
rounding(c(0.1, 0.3, 1.2)) rounding(c(1940, 1880, 2010)) rounding(0)rounding(c(0.1, 0.3, 1.2)) rounding(c(1940, 1880, 2010)) rounding(0)
Computes the Stieltjes integral
of a function with respect to a function .
stieltjes(f, M, ...)stieltjes(f, M, ...)
f |
The integrand. A function in the R language. |
M |
The cumulative function against which |
... |
Additional arguments passed to |
This command computes the Stieltjes integral
of a real-valued function
with respect to a nondecreasing function .
One common use of the Stieltjes integral is
to find the mean value of a random variable from its
cumulative distribution function . The mean value is
the Stieltjes integral of with respect to .
The argument f should be a function in the R language.
It should accept a numeric vector argument x and should return
a numeric vector of the same length.
The argument M should be either a step function
(object of class "stepfun") or a function value table
(object of class "fv"
).
Objects of class "stepfun" are returned by
ecdf, ewcdf,
and other utilities.
A list containing the value of the Stieltjes integral
computed using each of the versions of the function M.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
x <- runif(100) w <- runif(100) H <- ewcdf(x, w) stieltjes(function(x) { x^2 }, H)x <- runif(100) w <- runif(100) H <- ewcdf(x, w) stieltjes(function(x) { x^2 }, H)
Apply a transformation to the quantiles of a vector, or to the quantiles of the pixel values in a pixel image.
transformquantiles(X, uniform = FALSE, reverse = FALSE, ...)transformquantiles(X, uniform = FALSE, reverse = FALSE, ...)
X |
A numeric vector, matrix, array, or a pixel image
(object of class |
uniform |
Logical value specifying whether each quantile value should be replaced by the corresponding cumulative probability (called histogram equalisation, transformation to uniformity or probability integral transformation). |
reverse |
Logical value specifying whether to swap the upper and lower quantiles. |
... |
Ignored. |
The argument X may be a vector, matrix,
array, or a pixel image (object of class "im").
The algorithm will first extract the entries or pixel values of
X as a vector, and sort the values into ascending order.
If uniform=TRUE, the entries in this vector will be replaced by the
corresponding cumulative probabilities (the kth
smallest value will be replaced by the
number (k-0.5)/n where n is the total number of values).
If reverse=TRUE, the resulting vector will be reversed
so that it is in descending order (so that the kth smallest
value will be swapped with the kth largest value).
Finally the transformed values will be replaced into the original positions in the vector, matrix, array, or pixel image.
The case uniform=TRUE, reverse=FALSE is called
transformation to uniformity, the
probability integral transformation,
histogram equalisation, or quantile transformation.
The resulting values are uniformly distributed between 0 and 1;
a histogram of the values in X is flat.
Another object of the same type as X.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
To apply an arbitrary function f to the pixel values in an image,
use the idiom X[] <- f(X[]).
X <- c(3, 5, 1, 2, 4) transformquantiles(X, reverse=TRUE) transformquantiles(X, uniform=TRUE) transformquantiles(X, uniform=TRUE, reverse=TRUE)X <- c(3, 5, 1, 2, 4) transformquantiles(X, reverse=TRUE) transformquantiles(X, uniform=TRUE) transformquantiles(X, uniform=TRUE, reverse=TRUE)
Determine whether entries in a vector (or rows in a matrix or data frame) are duplicated, choose a unique representative for each set of duplicates, and map the duplicates to the unique representative.
uniquemap(x) ## Default S3 method: uniquemap(x) ## S3 method for class 'data.frame' uniquemap(x) ## S3 method for class 'matrix' uniquemap(x)uniquemap(x) ## Default S3 method: uniquemap(x) ## S3 method for class 'data.frame' uniquemap(x) ## S3 method for class 'matrix' uniquemap(x)
x |
A vector, data frame or matrix, or another type of data. |
The function uniquemap is generic, with methods
for point patterns, data frames, and a default method.
The default method expects a vector. It determines whether any entries
of the vector x are duplicated,
and constructs a mapping of the indices of x
so that all duplicates are mapped to a unique representative index.
The result is an integer vector u such that
u[j] = i if
the entries x[i] and x[j] are identical and
point i has been chosen as the unique representative.
The entry u[i] = i means either that point i is
unique, or that it has been chosen as the unique representative
of its equivalence class.
The method for data.frame determines whether any rows of the
data frame x are duplicated, and constructs a mapping of the
row indices so that all duplicate rows are mapped to a unique
representative row.
An integer vector.
Adrian Baddeley [email protected], Rolf Turner [email protected] and Ege Rubak [email protected].
uniquemap.ppp in spatstat.geom
x <- c(3, 5, 2, 4, 2, 3) uniquemap(x) df <- data.frame(A=x, B=42) uniquemap(df) z <- cbind(x, 10-x) uniquemap(z)x <- c(3, 5, 2, 4, 2, 3) uniquemap(x) df <- data.frame(A=x, B=42) uniquemap(df) z <- cbind(x, 10-x) uniquemap(z)
An unnormalised version of kernel density estimation where the weights are not required to sum to 1. The weights may be positive, negative or zero.
unnormdensity(x, ..., weights = NULL, defaults)unnormdensity(x, ..., weights = NULL, defaults)
x |
Numeric vector of data |
... |
Optional arguments passed to |
'
weights |
Optional numeric vector of weights for the data. The default is equivalent to assuming a weight of 1 for each observation. |
defaults |
Optional, named list of arguments passed to
|
This is an alternative to the standard R kernel density estimation function
density.default.
The standard density.default
requires the weights to be nonnegative numbers that add up to 1,
and returns a probability density (a function that integrates to 1).
This function unnormdensity does not impose any requirement
on the weights except that they be finite. Individual weights may be
positive, negative or zero. The result is a function that does not
necessarily integrate to 1 and may be negative. The result is
the convolution of the kernel with the weighted data,
where are the data points and are the
weights.
The argument weights should be a numeric vector of the same
length as x, or a single numeric value. The default is to
assume a weight of 1 for each observation in x.
The algorithm first selects the kernel bandwidth by
applying density.default to the data
x with normalised, positive weight vector
w = abs(weights)/sum(abs(weights)) and
extracting the selected bandwidth.
Then the result is computed by applying
applying density.default to x twice
using the normalised positive and negative parts of the weights.
Note that the arguments ... must be passed by name,
i.e. in the form (name=value). Arguments that do not match
an argument of density.default will be ignored
silently.
Object of class "density" as described in
density.default.
The result contains an additional component $kernel giving the name
of the smoothing kernel that was used.
If weights is not specified,
the default is to assign a weight to each
observation .
This is not the same behaviour as in density.default which
effectively assumes a weight of for each observation
where n=length(x).
Adrian Baddeley [email protected] and Rolf Turner [email protected]
d <- unnormdensity(1:3, weights=c(-1,0,1), bw=0.3) if(interactive()) plot(d)d <- unnormdensity(1:3, weights=c(-1,0,1), bw=0.3) if(interactive()) plot(d)
Compute the median, quantiles or variance of a set of numbers which have weights associated with them.
weighted.median(x, w, na.rm = TRUE, type=2, collapse=FALSE) weighted.quantile(x, w, probs=seq(0,1,0.25), na.rm = TRUE, type=4, collapse=FALSE) weighted.var(x, w, na.rm = TRUE)weighted.median(x, w, na.rm = TRUE, type=2, collapse=FALSE) weighted.quantile(x, w, probs=seq(0,1,0.25), na.rm = TRUE, type=4, collapse=FALSE) weighted.var(x, w, na.rm = TRUE)
x |
Data values. A vector of numeric values, for which the median or quantiles are required. |
w |
Weights.
A vector of nonnegative numbers, of the same length as |
probs |
Probabilities for which the quantiles should be computed. A numeric vector of values between 0 and 1. |
na.rm |
Logical. Whether to ignore |
type |
Integer specifying the rule for calculating the median or quantile,
corresponding to the rules available for
|
collapse |
Logical value specifying whether
duplicated values in |
The ith observation x[i] is treated as having
a weight proportional to w[i].
The weighted sample median is a value m
such that the total weight of data less than or equal to m
is equal to half the total weight. More generally,
the weighted sample quantile with
probability p is a value q
such that the total weight of data less than or equal to q
is equal to p times the total weight.
Define the weighted empirical cumulative distribution function
That is, is the fraction of total weight that is
associated with data values
less than or equal to the value .
The weighted quantile for probability is a number
defined so that wherever possible.
There are different definitions of the quantile depending on
how this should be achieved.
For unweighted data,
there are 9 different definitions of the sample median and sample
quantile, which enjoy slightly different properties. These 9 different
definitions are explained in the help for
quantile.default. The user's choice of algorithm is
selected using the argument type.
For weighted data, the first
4
of the 9 definitions of sample quantile
have been generalised to weighted quantiles.
The functions weighted.median and weighted.quantile
documented here provide these definitions of weighted sample
quantile. The user's choice of algorithm is again
selected using the argument type.
Suppose the data values have been arranged in increasing order
as
.
If one of the data values satisfies
exactly, then
if type=1, type=3 or type=4,
the code returns this value, ;
if type=2,
the code returns
the average of this value and the next largest value,
.
If there is no data value satisfying
exactly, then
the code finds the two adjacent values
and which satisfy
and
, and defines the quantile
as follows:
if type=1 or type=2, the code returns
the larger value ;
if type=3, the code returns the value
which minimises the discrepancy, that is,
if we define
and
,
then
if ,
the code returns ;
if ,
the code returns ;
if ,
then the even-numbered value
is returned, that is, the code returns where
equals if is even, and equals
if is even.
if type=4, the code returns
a value obtained by linearly interpolating between the
two adjacent values and ,
that is, it returns the value
where
.
For very small probabilities the value
is returned.
For very large probabilities the value
is returned.
Type 1 is the left-continuous quantile function.
Type 2 is consistent with the traditional definition of the sample median.
Types 1 and 3 always return a value selected from the input data x,
while types 2 and 4 sometimes return values that are interpolated
between the input data values.
Note that the default settings are different for weighted.median
and weighted.quantile.
The implementation of type 3 is experimental and may be changed.
weighted.median returns a numeric value.
weighted.quantile returns a numeric vector
of the same length as probs.
Adrian Baddeley [email protected].
x <- 1:20 w <- runif(20) weighted.median(x, w) weighted.quantile(x, w, probs=(0:4)/4) weighted.var(x, w)x <- 1:20 w <- runif(20) weighted.median(x, w) weighted.quantile(x, w, probs=(0:4)/4) weighted.var(x, w)
Computes the weighted histogram of a set of observations with a given set of weights.
whist(x, breaks, weights = NULL, method=c("C", "interpreted"))whist(x, breaks, weights = NULL, method=c("C", "interpreted"))
x |
Numeric vector of observed values. |
breaks |
Vector of breakpoints for the histogram. |
weights |
Numeric vector of weights for the observed values. |
method |
Developer use only.
A character string specifying whether to use internal C code
( |
This low-level function computes (but does not plot) the weighted
histogram of a vector of observations x using a given
vector of weights.
The arguments x and weights should be numeric vectors of
equal length. They may include NA or infinite values.
The argument breaks should be a numeric vector whose entries
are strictly increasing. These values define the boundaries between the
successive histogram cells.
The breaks do not have to span the range
of the observations.
There are N-1 histogram cells, where N = length(breaks).
An observation x[i] falls in the jth cell if
breaks[j] <= x[i] < breaks[j+1] (for j < N-1)
or
breaks[j] <= x[i] <= breaks[j+1] (for j = N-1).
The weighted histogram value h[j] for the jth cell is
the sum of weights[i] for all observations x[i] that
fall in the cell.
Note that, in contrast to the function hist,
the function whist does not require the breakpoints to span the
range of the observations x. Values of x that fall outside the
range of breaks are handled separately; their total weight
is returned as an attribute of the histogram.
A numeric vector of length N-1 containing the
histogram values, where N = length(breaks).
The return value also has attributes "low" and "high"
giving the total weight of all observations that are less than
the lowest breakpoint, or greater than the highest breakpoint,
respectively.
Adrian Baddeley [email protected]
and Rolf Turner [email protected]
with thanks to Peter Dalgaard.
x <- rnorm(100) b <- seq(-1,1,length=21) w <- runif(100) whist(x,b,w)x <- rnorm(100) b <- seq(-1,1,length=21) w <- runif(100) whist(x,b,w)