Matematiska institutionen Stockholms universitet

# RESEARCH REPORTS FROM THE INSTITUTE

## Series A

212
Rolf Sundberg: Comparison of confidence procedures for type I censored exponential lifetimes (June -99) (Abstract)
211
Mårten Vågerö: A remark on small-sample properties of logistic regression in three-point designs (May 1999) (Abstract (pdf))
210
Esbjörn Ohlsson: Comparison of PRN Techniques for Small Size PPS Sample Coordination (March 1999) (Abstract) (Abstract (dvi)) (Full report (pdf))
209
Mårten Vågerö, Rolf Sundberg: The distribution of the maximum likelihood-estimator in up-and-down experiments for quantal dose-response data. (February 1999) (Abstract) To appear in J. Biopharmaceutical Statistics 1999
208
Olivier Guilbaud: Exact nonparametric confidence intervals for quantiles with progressive Type-II censoring (December 1998)

207
Rolf Sundberg: Aspects of statistical regression in sensometrics (August -98) (Abstract) To appear in Food Quality and Preference 1999

206
Marie Linder, Rolf Sundberg: Precision of prediction in second order calibration, with special reference to calibration by BLLS or SVD (June -98) (Abstract)

205
Bertil Matérn: Cramérs "Mathematical methods". Några glimtar från bokens historia (June -98) (Abstract)

204
Håkan Andersson: Epidemic models on graphs and lattices: A short survey. (March -98) (Abstract)

203
Rolf Sundberg: Statistical aspects on fitting the Arrhenius equation. (February -98) (Abstract) Published in Chemometrics and Intell. Lab. Systems 1998

202
Grazyna Wolczynska: An explicit formula for option pricing in discrete incomplete markets. (Dec -97) (Abstract)

201
Tiina Orusild: Confidence intervals for quantiles under finite population sampling (Nov 1997) (Abstract)

200
Rolf Sundberg: Multivariate calibration - direct and indirect regression methodology (June -97) (Abstract) Published in Scand. J. Statist. 1999

199
Marie Linder, Rolf Sundberg: Regression for two-way data: Bilinear least squares and a simple alternative (June -97) (Abstract) Published in Chemometrics and Intell. Lab. Systems 1998

198
Håkan Andersson: Limit Theorems for a Random Graph Epidemic Model (June -97) (Abstract)
197
Mikael Andersson: The Final Size of Multitype Chain-Binomial Epidemic Processes (May -97)

196
Åke Svensson: Monotonicity of epidemics in closed populations (Feb. -97) (Abstract)

195
Staffan Wrigge: Contributions to the theory of large deviations for random sums (Jan. -97)
194
Esbjörn Ohlsson: Methods for PPS Size One Sample Coordination (Dec. -96) (Abstract)
193
Åke Svensson: On the distribution of the final size of spread in finite populations (Sept. -96) (Abstract)
192
Rolf Sundberg: Modelling variations in schistosome egg counts (Sept. -96)
191
Håkan Andersson: Epidemics in a Population with Social Structures (Aug. -96)
190
Håkan Andersson, Tom Britton: Heterogeneity in Epidemic Models and its Effect on the Spread of Infection (Aug. -96)
189
Anders Björkström, Rolf Sundberg: A generalized view on continuum regression. (July -96) Published in Scand. J. Statist. 1999
188
Rolf Sundberg: Conditional statistical inference and quantification of relevance. (July -96) (Abstract)
187
Sun Wanlong: A General way for obtaining confidence limits (June -96)
186
Louise af Klintberg: Estimation in an epidemic model with several stages of infection. (March -96)
185
Björn Johansson, Tomas Björk: Parameter estimation and reverse martingales. (Oct. -95)
184
Tom Britton: A test to detect within-family infectivity when the whole epidemic process is observed. (Oct. -95)
183
Anders Björkström, Rolf Sundberg: Continuum regression is not always continuous. (June -95) Published in J. Roy. Statist. Soc, Series B, 1996
182
Esbjörn Ohlsson: Sequential poisson sampling. (June -95). (Abstract)
181
Tom Britton: Tests to detect clustering of infected individuals within families. (April -95)

180
Björn Johansson: Unbiased estimation in exponential families. (Oct. -94)

## Abstracts of selected reports

212
Rolf Sundberg: Comparison of confidence procedures for type I censored exponential lifetimes (June -99)

In the model of type I censored exponential lifetimes, coverage probabilities are compared for a number of confidence interval constructions proposed in the literature. The coverage probabilities are calculated exactly for sample sizes up to 50 and for different degrees of censoring and different degrees of intended confidence. If not only a fair two-sided coverage is desired, but also fair one-sided coverages, only a few methods are quite satisfactory. A likelihood-based interval and a third root transformation to normality work almost perfectly, but the $\chi^2$-based method that is perfect under no censoring and under type II censoring can also be advocated.

Key words: Confidence interval, coverage probability, exponential distribution, failure times, fixed censoring.
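As a small illustration of the $\chi^2$-based method in the uncensored case only, the following sketch computes the exact two-sided interval for the exponential mean $\theta$ from the fact that $2\sum t_i/\theta \sim \chi^2(2n)$, or equivalently that $\sum t_i/\theta$ is Gamma$(n,1)$ distributed. The function names are hypothetical, and the censored-data constructions compared in the report are not reproduced here.

```python
import math

def erlang_cdf(x, n):
    """P(sum of n unit-mean exponentials <= x); same as the chi-square(2n) CDF at 2x."""
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= x / j          # term = x**j / j!
        total += term
    return 1.0 - math.exp(-x) * total

def erlang_quantile(p, n, tol=1e-12):
    """Invert erlang_cdf by bisection (assumes 0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, n) < p:   # expand until the quantile is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_mean_interval(lifetimes, conf=0.95):
    """Exact equal-tailed interval for the exponential mean, no censoring assumed."""
    n = len(lifetimes)
    s = sum(lifetimes)             # total time on test
    alpha = 1.0 - conf
    lower = s / erlang_quantile(1.0 - alpha / 2.0, n)
    upper = s / erlang_quantile(alpha / 2.0, n)
    return lower, upper

# Illustrative data only (not from the report)
lower_demo, upper_demo = exact_mean_interval([1.0], conf=0.95)
```

Under type I censoring the number of observed failures is itself random, so this pivotal argument no longer holds exactly, which is why coverage has to be examined case by case.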

210
Esbjörn Ohlsson: Comparison of PRN Techniques for Small Size PPS Sample Coordination. (Full report (pdf))

Consider multi-stage sampling from a stratified finite population, with a few primary sampling units selected in each stratum using probabilities proportional to size (pps). In a repeated survey, it is at times desired to redesign the sample with new size measures and new strata, while retaining as many units as possible from the old sample. In a former paper (Ohlsson, 1996) we considered the case with sample size $n=1$, for which we gave an overview of existing methods and proposed a new method based on the use of permanent random numbers (PRN). In the present paper we focus on the case with small stratum sample sizes ($2<n<4$). We discuss the properties of different PRN methods and present a simulation study of their achieved sample overlap.

Key words: overlap maximization; overlap control; sample redesign; permanent random numbers; probabilities proportional to size.

209
Mårten Vågerö, Rolf Sundberg: The distribution of the maximum likelihood-estimator in up-and-down experiments for quantal dose-response data.

Standard maximum likelihood logistic or probit regression has been used in biopharmaceutical practice for inference about tolerance threshold distributions in situations where subjects (patients) have been allocated doses according to an up-and-down design. For example, a steeper dose-response curve than expected was reported in one such study. The present paper demonstrates that the maximum likelihood estimator systematically and considerably exaggerates the regression parameter with moderately large sample sizes. Thus a probable explanation for finding a steeper curve than expected is the method used to analyse the experiment, i.e. the bias in the maximum likelihood estimator. An additional consequence of this bias is that the mean/median will be estimated with a misleading precision. In particular, confidence intervals will be much too narrow. As a conclusion, we warn against conventional logistic or probit regression in combination with up-and-down designs.

Key words: Bias, Tolerance threshold distribution, Logistic regression, Sequential design, Staircase method.

207
Rolf Sundberg: Aspects of statistical regression in sensometrics (August -98)

We discuss roles of regression analysis in sensometric studies, distinguishing description, interpretation and prediction purposes. We give a brief review of linear regression methods for prediction in situations with near-collinear explanatory variables, including for example ridge regression and partial least squares, and we discuss latent variable models. We finally discuss problems with statistical and causal inference from regression on covariates in designed experiments. Illustrations in the paper are based on a sensometric study of apple flavour under varied storage conditions (Brockhoff et al. in Food Quality and Preference 1993), with sensory response data and gas chromatography measurements as covariates.

Key words: Causality; covariates; designed experiments; multiple regression; near-collinearity; prediction; PCR; PLS; ridge regression.

205
Bertil Matérn: Cramérs "Mathematical methods". Några glimtar från bokens historia.

The article starts by listing some sources that can shed light on Harald Cramér's work with the famous textbook "Mathematical Methods of Statistics". Many written, mostly unpublished, documents exist. The following sections deal with the prehistory of the book, with various preparations, some facts about how the writing and publishing of the book proceeded, how the book was received, and its influence on statistical research and education.

204
Håkan Andersson: Epidemic models on graphs and lattices: A short survey.

This survey paper discusses a class of stochastic continuous time models for epidemic spread across a static or dynamic social network. Various simple graphs are proposed (Bernoulli random graphs; graphs with prescribed degrees; graphs with a certain amount of short loops; overlapping subgraphs representing the superposition of independent networks; dynamically changing graphs; the two-dimensional lattice), and for each of these structures expressions for important epidemiological quantities like the basic reproduction number, the final size of the epidemic and the time dynamics of the proportion of susceptible and infectious individuals are derived. The modelling assumptions are meaningful for finite populations, but the results are only valid asymptotically as the population size tends to infinity. The theoretical work is accompanied by computer simulations and numerical calculations.

203
Rolf Sundberg: Statistical aspects on fitting the Arrhenius equation.

Motivated by a recent mathematical paper we discuss statistical parameter estimation in the Arrhenius equation, which relates kinetic reaction rates to temperature. In opposition to the paper in question we argue theoretically for the appropriateness of using ordinary least squares on log-transformed data and supply some empirical support in this direction. [Published in Chemometrics and Intell. Lab. Systems, 1998, vol 41, 249-252]
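To make the log-transform idea concrete, here is a minimal sketch, assuming the usual form of the Arrhenius equation $k = A\exp(-E_a/(RT))$, so that $\ln k = \ln A - E_a/(RT)$ is linear in $1/T$. The function name, symbols and synthetic data are illustrative assumptions and are not taken from the report.

```python
import numpy as np

R = 8.314462618  # molar gas constant, J/(mol K)

def fit_arrhenius(T, k):
    """Ordinary least squares on log-transformed rates.

    Fits ln k = ln A - Ea / (R T) as a straight line in 1/T and
    returns the estimates (A, Ea).
    """
    T = np.asarray(T, dtype=float)
    k = np.asarray(k, dtype=float)
    x = 1.0 / T                 # regressor: inverse absolute temperature
    y = np.log(k)               # response: log reaction rate
    slope, intercept = np.polyfit(x, y, 1)
    return np.exp(intercept), -slope * R

# Synthetic, noise-free data (illustrative values only)
T_demo = np.array([300.0, 320.0, 340.0, 360.0])
A_true, Ea_true = 1.0e7, 5.0e4
k_demo = A_true * np.exp(-Ea_true / (R * T_demo))
A_hat, Ea_hat = fit_arrhenius(T_demo, k_demo)
```

Least squares on the log scale is most natural when the measurement errors in $k$ are roughly multiplicative, since taking logarithms then gives errors of about constant variance.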

202
Grazyna Wolczynska: An explicit formula for option pricing in discrete incomplete markets.

Some aspects of the pricing of European call options are discussed. We consider the simplest case of an incomplete market in the situation when the model of the market is discrete and increments of share prices have a trinomial distribution. We look for similarities between this model and the model of Cox, Ross and Rubinstein. In particular we consider the possibility of using backward induction and we look for an optimal price and strategy using the method of risk minimization step by step from the date of realization T to 0.

201
Tiina Orusild: Confidence intervals for quantiles under finite population sampling (Nov 1997)

A method for confidence intervals for finite population quantiles is presented. This is a large-sample confidence interval based on asymptotic considerations. The situation of interest is where the observations emanate from a probability sample drawn without replacement from a finite population. Results of simulation studies are presented where we compare the method proposed here with two other methods for large-sample confidence intervals. The advantage of our method is that it has a wider range of applicability than the other two methods.

200
Rolf Sundberg: Multivariate calibration - direct and indirect regression methodology

This paper tries first to introduce and motivate the methodology of multivariate calibration. Next a review is given, mostly avoiding technicalities, of the somewhat messy theory of the subject. Two approaches are distinguished: The estimation approach (controlled calibration) and the prediction approach (natural calibration). Among problems discussed are the choice of estimator, the choice of confidence region, methodology for handling situations with more variables than observations, near-collinearities (with counter-measures like ridge type regression, principal components regression, partial least squares regression and continuum regression), pretreatment of data, and cross-validation versus true prediction. Examples discussed in detail concern estimation of the age of a rhinoceros from its horn lengths (low-dimensional), and nitrate prediction in waste-water from high-dimensional spectroscopic measurements. [To appear (with discussion) in Scand. J. Statist. 1999, vol 26]

199
Marie Linder, Rolf Sundberg: Regression for two-way data: Bilinear least squares and a simple alternative (June -97)

We consider calibration of second-order, or hyphenated, instruments generating bilinear two-way data for each specimen. The bilinear regression model is to be estimated from a number of specimens of known compositions. We propose a simple estimator and illustrate how it works on simulated data. The estimator, which we call the SVD (singular value decomposition) estimator, is usually not much less efficient than bilinear least squares. The advantages of our method over bilinear least squares are that it is easier to compute, its standard errors are explicit, and it has a simpler correlation. [To appear in Chemometrics and Intell. Lab. Systems, 1998, vol 42]

198
Håkan Andersson: Limit Theorems for a Random Graph Epidemic Model

We consider a simple SIR epidemic in a large homogeneous population that is not necessarily homogeneously mixing. Rather each individual has a fixed circle of acquaintances and the epidemic spreads along this social network. In case the number of initially infective individuals stays small, a branching process approximation for the number of infectives is in force. Moreover, we provide a deterministic approximation of the bivariate process of susceptible and infective individuals, valid when the number of initially infective individuals is large. The basic reproduction number and the asymptotic final epidemic size of the process are also discussed. The model is described in the framework of random graphs.

196
Åke Svensson: Monotonicity of epidemics in closed populations.

Stochastic monotonicity properties of epidemics in closed populations are proved. Epidemic models with different parameter values are compared with respect to the final size of an epidemic and the processes counting the number of infected and the number of immune persons. The monotonicity is related to the size of the population and parameters describing sensitivity and infectiousness. The general results are illustrated for some well-known epidemic models. The proofs are based on couplings of the epidemic processes.

194
Esbjörn Ohlsson: Methods for PPS Size One Sample Coordination (Dec. -96)

Consider multi-stage sampling from a stratified finite population, with one primary sampling unit selected in each stratum, using probabilities proportional to size (pps). In a repeated survey, it is sometimes desired to redesign the sample with new size measures and new strata, while retaining as many units as possible from the old sample. In this paper we: (i) propose a new method for this problem, based on the use of permanent random numbers; (ii) give an overview of previous methods; and (iii) give a numerical study of expected overlap of the old and new sample, for different methods. One finding is that the relatively simple method by Kish and Scott (1971) is quite close to the optimal expected overlap. The new method also gives quite large expected overlap, and has the merits of being very simple to apply. As opposed to the Kish and Scott procedure, and others, it preserves the independence between strata.

193

Åke Svensson: On the distribution of the final size of spread in finite populations (Sept. -96)

Linear equations for (exact) calculation of the probability distribution for the final size of spread in finite populations are derived by taking expectations of Doléans-Dade exponentials. The results are applicable in a number of different stochastic models. The results relate both to models where the amount of infectivity is specified (such as the Reed-Frost and the General Epidemic models) and to models where the number of contacts is specified (such as the Martin-Löf and the Maki-Thompson models). A third kind of model, where the epidemic is supervised with the help of a control group, is also considered.

188
Rolf Sundberg: Conditional statistical inference and quantification of relevance. (July -96)

We argue that the precision of a point estimator and the confidence of an interval estimator represent random quantities to be predicted. This predictive approach has implications for conditional inference, because it immediately allows a natural quantification of the concept of relevance of conditioning. We discuss types of ancillary statistics, and Basu's pathological examples in particular. Conditioning on an ancillary that is a precision index makes inference more relevant. However, we gain in relevance even in many other conditioning situations, by trading bias for variance. We illustrate numerically by some examples which have been extensively discussed in the conditioning literature.

182
Esbjörn Ohlsson: Sequential poisson sampling. (June -95)

Poisson sampling is a well-known, simple procedure for sampling from a finite population with probabilities proportional to size (pps). In conjunction with permanent random numbers (PRN), Poisson sampling offers a simple solution to the problem of updating a pps sample while retaining as many units as possible and to the problem of minimizing (or maximizing) the overlap of several surveys with different designs that use the same frame. A severe drawback of Poisson sampling is that the realized sample size is random, with considerable variation. In this paper we suggest an alteration of Poisson sampling, called sequential Poisson sampling, which yields a fixed sample size n. Sequential Poisson sampling can be used for sample coordination with PRN in the same way as Poisson sampling. Sequential Poisson sampling is presently used to sample outlets for the Swedish Consumer Price Index.

In this paper we derive results on asymptotic normality of the sequential Poisson sampling estimator and the conventional Poisson sampling ratio-type estimator. In particular, these two estimators are shown to be asymptotically unbiased and asymptotically equally efficient. The accuracy of the corresponding approximate expressions for the mean and variance is investigated in a simulation study. Here these estimators are seen to be approximately unbiased and to have approximately the same variance (as suggested by the asymptotic results). The exception is in situations where Poisson sampling gives some samples with size 0, in which case sequential Poisson sampling is perforce more efficient. We also investigate the performance of some associated variance estimators. Our conclusion is that since the procedures give equally efficient estimation, and are equally simple to use for sample coordination, sequential Poisson sampling should be preferred to ordinary Poisson sampling, because of the fixed sample size.
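The abstract does not spell out the selection rule, so the following is only a sketch under a stated assumption: each unit's permanent random number is divided by its normed size measure and the n units with the smallest ratios are selected, whereas ordinary Poisson sampling includes a unit when its random number falls below n times its normed size. The function names and demo values are hypothetical.

```python
import numpy as np

def sequential_poisson_sample(sizes, n, prn):
    """Fixed-size pps selection (sketch): take the n smallest prn_i / p_i."""
    sizes = np.asarray(sizes, dtype=float)
    prn = np.asarray(prn, dtype=float)
    p = sizes / sizes.sum()            # normed size measures
    xi = prn / p                       # ranking variable (assumed form)
    return np.sort(np.argsort(xi, kind="stable")[:n])

def poisson_sample(sizes, n, prn):
    """Ordinary Poisson pps sampling with PRN; realized sample size is random.

    Assumes n * p_i < 1 for every unit, so n * p_i is an inclusion probability.
    """
    sizes = np.asarray(sizes, dtype=float)
    prn = np.asarray(prn, dtype=float)
    p = sizes / sizes.sum()
    return np.flatnonzero(prn < n * p)

# Illustrative frame of four units with fixed (permanent) random numbers
sizes_demo = [10, 20, 30, 40]
prn_demo = [0.05, 0.30, 0.90, 0.10]
seq_demo = sequential_poisson_sample(sizes_demo, 2, prn_demo)
pois_demo = poisson_sample(sizes_demo, 2, prn_demo)
```

With these demo values the sequential rule returns exactly two units, while the ordinary Poisson rule happens to return three, which illustrates the random sample size mentioned in the abstract. Because both rules reuse the same permanent random numbers, redrawing with updated size measures tends to retain many of the same units.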