We consider message-efficient continuous random sampling from a distributed stream, where the probability of including an item in the sample is proportional to a weight associated with the item. Reservoir sampling solves this by assigning each item from the stream wi...

Weighted Reservoir Sampling from Distributed Streams

The reservoir-based versions of Algorithm A, namely A-Res and A-ExpJ, need very little auxiliary storage (m keys organized as a heap). Throughout the sampling process, their reservoir contains a weighted random sample that is valid for the data processed so far.

For comparison, the classic uniform (unweighted) reservoir sampling procedure, in which every item ends up in the sample with the same probability k/n, is:

(* S has items to sample, R will contain the result *)
ReservoirSample(S[1..n], R[1..k])
  // fill the reservoir array
  for i = 1 to k
    R[i] := S[i]
  // replace elements with gradually decreasing probability
  for i = k+1 to n
    j := random(1, i)   // important: inclusive range
    if j <= k
      R[j] := S[i]
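The heap-based A-Res idea can be sketched in Python. This sketch assumes the Efraimidis and Spirakis key rule: each item with weight w receives the key u^(1/w), where u is uniform on (0, 1), and the reservoir keeps the m items with the largest keys in a min-heap. The function name `a_res_sample` and its signature are hypothetical, chosen for illustration only.

```python
import heapq
import random
from typing import Any, Iterable, List, Optional, Tuple


def a_res_sample(
    stream: Iterable[Tuple[Any, float]],
    m: int,
    rng: Optional[random.Random] = None,
) -> List[Any]:
    """Hypothetical A-Res-style weighted reservoir sample of up to m items.

    `stream` yields (item, weight) pairs; the reservoir holds the m largest keys.
    """
    if rng is None:
        rng = random.Random()
    # Min-heap of (key, arrival index, item); the index breaks ties so items
    # themselves never need to be comparable.
    heap: List[Tuple[float, int, Any]] = []
    for idx, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # non-positive weights are never selected
        u = 1.0 - rng.random()  # uniform on (0, 1]
        key = u ** (1.0 / weight)  # heavier items tend to get larger keys
        if len(heap) < m:
            heapq.heappush(heap, (key, idx, item))
        elif key > heap[0][0]:
            # The new key beats the smallest key in the reservoir: evict it.
            heapq.heapreplace(heap, (key, idx, item))
    return [item for _, _, item in heap]
```

For example, `a_res_sample([("a", 1.0), ("b", 3.0)], 1)` returns `["b"]` with probability 3/4, since with a single slot the item holding the largest key wins with probability proportional to its weight. At every point in the stream, the heap already holds a valid sample of the prefix seen so far, which matches the continuous-sample property described above.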
The sequential version of weighted reservoir sampling was studied by Efraimidis and Spirakis, who presented a one-pass O(s) algorithm for weighted SWOR (sampling without replacement). We present and analyze a fully distributed algorithm for both problems, based on the reservoir technique and on a weighted k-means algorithm that clusters a data sample augmented with weights.

Communication-Efficient (Weighted) Reservoir Sampling

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed streams presented as a sequence of mini-batches of items.

R's default sampling without replacement via sample.int appears to require quadratic run time, for example when the weights are drawn from a uniform distribution.

Rajesh Jayaram (Carnegie Mellon University), Gokarna Sharma (Kent State University), Srikanta Tirthapura (Iowa State University), David P. Woodruff (Carnegie Mellon University).

A parallel uniform random sampling algorithm is given in.

The weighted reservoir sampling algorithm exploits the following well-known properties of exponential random variates. When the $$X_i \sim \mathrm{Exponential}(w_i)$$ are independent, $$R = \operatorname{argmin}_i X_i$$, and $$T = \min_i X_i$$, then $$R \sim p$$ with $$p_i = w_i / \sum_j w_j$$, and $$T \sim \mathrm{Exponential}\left( \sum_i w_i \right)$$.

1 PROBLEM DEFINITION

The problem of random sampling without replacement (RS) calls for the selection of m distinct random items out of a population of size n. If all items have the same probability of being selected, the problem is known as uniform RS.
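The two exponential-variate properties can be checked numerically. The Python sketch below uses hypothetical helper names (`exponential_race`, `simulate`). It draws independent X_i ~ Exponential(w_i) with `random.expovariate`, whose argument is the rate, and records the index of the minimum and its value. The winning frequencies should approach w_i / sum_j w_j, and the mean minimum should approach 1 / sum_i w_i.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def exponential_race(weights: Sequence[float], rng: random.Random) -> Tuple[int, float]:
    """Draw X_i ~ Exponential(rate=w_i) independently; return (argmin index, min value)."""
    draws = [rng.expovariate(w) for w in weights]  # expovariate takes the rate
    winner = min(range(len(draws)), key=draws.__getitem__)
    return winner, draws[winner]


def simulate(weights: Sequence[float], trials: int, seed: int = 0) -> Tuple[List[float], float]:
    """Estimate P(R = i) for each index i and E[T] over many independent races."""
    rng = random.Random(seed)
    wins: Counter = Counter()
    total_min = 0.0
    for _ in range(trials):
        winner, t = exponential_race(weights, rng)
        wins[winner] += 1
        total_min += t
    freqs = [wins[i] / trials for i in range(len(weights))]
    return freqs, total_min / trials


if __name__ == "__main__":
    weights = [1.0, 2.0, 5.0]
    freqs, mean_min = simulate(weights, trials=100_000)
    print("empirical P(R=i):", [round(f, 3) for f in freqs])
    print("expected p_i:    ", [round(w / sum(weights), 3) for w in weights])
    print("mean of T:", round(mean_min, 4), "expected:", round(1 / sum(weights), 4))
```

Keeping the k smallest values of E_i / w_i, with E_i ~ Exponential(1), instead of only the minimum yields a weighted sample without replacement. Ordering items by -ln(u)/w ascending is the same as ordering them by u^(1/w) descending, so this is the A-Res key in another form.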