Poisson Disk Sampling
Some notes on Poisson disk sampling.
Put simply, it is a sampling algorithm that places samples randomly[1] while ensuring that no two samples are too close to each other.
[1] See source here: https://en.wikipedia.org/wiki/Supersampling#Poisson_disk

Observation 1: Random points end up forming clusters
In the following we will use numpy and vega-altair to show that uniform random sampling results in clusters.
import numpy as np
import polars as pl
import altair as alt

# random number generator
rng = np.random.default_rng()
# draw 100 integer coordinates per axis from [1, 100)
x = rng.integers(low=1, high=100, size=100)
y = rng.integers(low=1, high=100, size=100)
# create dataframe
df = pl.DataFrame({'x': x, 'y': y})
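Before looking at the plot, one rough way to put a number on the clustering is to measure how close each point is to its nearest neighbour. The snippet below is only a sketch: it draws its own float-valued points (rather than the integer coordinates used in these notes), and the seed is an arbitrary choice for repeatability.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability
points = rng.uniform(0, 100, size=(100, 2))  # 100 uniform points on a 100 x 100 square

# pairwise Euclidean distances between all points
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself

nearest = dist.min(axis=1)  # distance from each point to its closest neighbour
print(f"smallest gap: {nearest.min():.2f}, mean gap: {nearest.mean():.2f}")
```

If the smallest gap is much smaller than the mean gap, some points sit almost on top of each other, which is the clustering the chart shows visually.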
# plot chart
alt.Chart(data = df).mark_point().encode(
    x='x',
    y='y'
)

Observation 2: Bridson’s algorithm
Robert Bridson proposed an \(O(n)\) algorithm[2], a modification of dart throwing, that produces samples that are at least a given minimum distance apart from each other.
[2] … in this paper
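For context, plain dart throwing can be sketched as follows. This is an illustrative baseline, not Bridson's method and not code from the paper: the function name `dart_throwing`, the `max_attempts` budget, and the seed are all assumptions made for the example. Every candidate is compared against every accepted sample, which is the quadratic cost that Bridson's grid avoids.

```python
import numpy as np

def dart_throwing(width, height, r, max_attempts=2000, seed=None):
    """Naive dart throwing: keep a candidate only if it is at least r from all kept points."""
    rng = np.random.default_rng(seed)
    samples = np.empty((0, 2))
    for _ in range(max_attempts):
        candidate = rng.uniform((0, 0), (width, height))
        # compare against every accepted sample: this is the quadratic part
        if len(samples) == 0 or np.linalg.norm(samples - candidate, axis=1).min() >= r:
            samples = np.vstack([samples, candidate])
    return samples

points = dart_throwing(100, 100, r=10, seed=0)
print(len(points))
```

With a fixed attempt budget this approach tends to leave gaps, because late candidates are rejected more and more often as the domain fills up.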
The following code is borrowed from this blog post.
# choose up to k points around each reference point
# default is 30, as per the original paper
k = 30
# minimum distance between samples
r = 10
width, height = 100, 100
# cell side length: the cell diagonal a * sqrt(2) equals r, so a cell can hold at most one sample
a = r / np.sqrt(2)
# num of cells in the x- and y- directions of the grid
nx, ny = int(width / a) + 1, int(height / a) + 1
# a list of coordinates in the grid of cells
coords_list = [(ix, iy) for ix in range(nx) for iy in range(ny)]
# initialize the dictionary of cells
cells = {coords: None for coords in coords_list}

Define functions:
def get_cell_coords(pt):
    """Get the coordinates of the cell that pt = (x,y) falls in."""
    return int(pt[0] // a), int(pt[1] // a)
def get_neighbours(coords):
    """Return the indexes of points in cells neighbouring cell at coords.
    For the cell at coords = (x,y), return the indexes of points in the cells
    with neighbouring coordinates illustrated below: ie those cells that could
    contain points closer than r.
                                     ooo
                                    ooooo
                                    ooXoo
                                    ooooo
                                     ooo
    """
    dxdy = [(-1,-2),(0,-2),(1,-2),(-2,-1),(-1,-1),(0,-1),(1,-1),(2,-1),
            (-2,0),(-1,0),(1,0),(2,0),(-2,1),(-1,1),(0,1),(1,1),(2,1),
            (-1,2),(0,2),(1,2),(0,0)]
    neighbours = []
    for dx, dy in dxdy:
        neighbour_coords = coords[0] + dx, coords[1] + dy
        if not (0 <= neighbour_coords[0] < nx and
                0 <= neighbour_coords[1] < ny):
            # We're off the grid: no neighbours here.
            continue
        neighbour_cell = cells[neighbour_coords]
        if neighbour_cell is not None:
            # This cell is occupied: store this index of the contained point.
            neighbours.append(neighbour_cell)
    return neighbours
def point_valid(pt):
    '''Is pt a valid point to emit as a sample?
    It must be no closer than r from any other point: check the cells in its
    immediate neighbourhood.
    '''
    cell_coords = get_cell_coords(pt)
    for idx in get_neighbours(cell_coords):
        nearby_pt = samples[idx]
        # Squared distance between our candidate point, pt, and this nearby_pt.
        distance2 = (nearby_pt[0]-pt[0])**2 + (nearby_pt[1]-pt[1])**2
        if distance2 < r**2:
            # The points are too close, so pt is not a candidate.
            return False
    # All points tested: if we're here, pt is valid
    return True
def get_point(k, refpt):
    '''Try to find a candidate point relative to refpt to emit in the sample.
    We draw up to k points from the annulus of inner radius r and outer radius 2r
    around the reference point, refpt. If none of them are suitable (because
    they're too close to existing points in the sample), return False.
    Otherwise, return the pt.
    '''
    i = 0
    while i < k:
        i += 1
        rho = np.sqrt(np.random.uniform(r**2, 4 * r**2))
        theta = np.random.uniform(0, 2*np.pi)
        pt = refpt[0] + rho*np.cos(theta), refpt[1] + rho*np.sin(theta)
        if not (0 <= pt[0] < width and 0 <= pt[1] < height):
            # This point falls outside the domain, so try again.
            continue
        if point_valid(pt):
            return pt
    # We failed to find a suitable point in the vicinity of refpt.
    return False

Actual sample:
# Pick a random point to start with.
pt = (np.random.uniform(0, width), np.random.uniform(0, height))
samples = [pt]
# Our first sample is indexed at 0 in the samples list...
cells[get_cell_coords(pt)] = 0
# ... and it is active, in the sense that we're going to look for more points
# in its neighbourhood.
active = [0]
nsamples = 1
# As long as there are points in the active list, keep trying to find samples.
while active:
    # choose a random "reference" point from the active list.
    idx = np.random.choice(active)
    refpt = samples[idx]
    # Try to pick a new point relative to the reference point.
    pt = get_point(k, refpt)
    if pt:
        # Point pt is valid: add it to the samples list and mark it as active
        samples.append(pt)
        nsamples += 1
        active.append(len(samples)-1)
        cells[get_cell_coords(pt)] = len(samples) - 1
    else:
        # We had to give up looking for valid points near refpt, so remove it
        # from the list of "active" points.
        active.remove(idx)

Turn the list of samples into a data frame for plotting:
df = pl.DataFrame(np.round(samples, 2)).rename({'column_0':'x', 'column_1':'y'})
df

| x | y |
|---|---|
| f64 | f64 |
| 61.63 | 94.91 |
| 48.0 | 94.44 |
| 75.05 | 87.33 |
| 62.33 | 84.48 |
| 48.76 | 80.67 |
| … | … |
| 31.45 | 99.34 |
| 68.56 | 19.15 |
| 80.32 | 4.83 |
| 94.16 | 3.38 |
| 69.83 | 1.43 |
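A quick sanity check on the output is to confirm that no two samples are closer than r. The helper below is a sketch: `min_pairwise_distance` is a name made up for this example, and the toy input is hypothetical. In the notebook one would pass the unrounded `samples` list instead, since rounding to two decimals can shift a distance slightly.

```python
import numpy as np
from scipy.spatial.distance import pdist

def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two rows of points."""
    return pdist(np.asarray(points, dtype=float)).min()

# hypothetical toy input; in the notebook one would pass `samples` instead
toy = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
print(min_pairwise_distance(toy))  # 5.0
```

For the sampler above, the expectation is `min_pairwise_distance(samples) >= r`.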
Plot the samples after Poisson disk sampling:
# plot chart
alt.Chart(data = df).mark_point().encode(
    x='x',
    y='y'
)

Poisson Disk Sampling with scipy
Compare the results with the Poisson disk sampler implemented in scipy (scipy.stats.qmc.PoissonDisk), with documentation here.
from scipy.stats import qmc
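# create engine; the sampler draws from the unit square by default,
# so a radius of 10 on the 100 x 100 domain becomes 10 / 100
engine = qmc.PoissonDisk(d = 2, radius = 10 / 100, rng = None)
# draw 64 points
sample = engine.random(64)
# rescale from the unit square to [0, 100] x [0, 100]
df = pl.DataFrame(np.round(qmc.scale(sample, l_bounds = [0, 0], u_bounds = [100, 100]), 2)).rename({'column_0': 'x', 'column_1': 'y'})
df

| x | y |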
|---|---|
| f64 | f64 |
| 63.05 | 85.83 |
| 73.1 | 84.98 |
| 52.88 | 88.09 |
| 56.37 | 72.02 |
| 66.1 | 98.73 |
| … | … |
| 37.3 | 26.58 |
| 48.62 | 45.11 |
| 99.55 | 80.4 |
| 63.13 | 9.37 |
| 99.32 | 0.13 |
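One difference from the Bridson code earlier is that `engine.random(64)` stops after 64 points, whereas the hand-written loop runs until no more points fit. For a closer comparison, `PoissonDisk` also has a `fill_space()` method. The sketch below assumes the default unit-square bounds and leaves the engine unseeded, so the number of points varies from run to run.

```python
from scipy.spatial.distance import pdist
from scipy.stats import qmc

radius = 10 / 100
engine = qmc.PoissonDisk(d=2, radius=radius)  # unseeded, so results vary between runs
filled = engine.fill_space()  # keep adding points until no candidate fits

# number of points, and the smallest distance between any two of them
print(len(filled), pdist(filled).min())
```

The smallest distance printed should not fall below `radius`, up to floating-point rounding.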
# plot chart
alt.Chart(data = df).mark_point().encode(
    x='x',
    y='y'
)

Tangent
Let’s say that instead of trying to spread points over the entire surface, we only sample a few points, say 20, and also make the minimum distance between points smaller.
engine = qmc.PoissonDisk(d = 2, radius = 5 / 100, rng = None)
sample = engine.random(20)
df = pl.DataFrame(np.round(qmc.scale(sample, l_bounds = [0, 0], u_bounds = [100, 100]), 2)).rename({'column_0': 'x', 'column_1': 'y'})
# plot chart
alt.Chart(data = df).mark_point().encode(
    x='x',
    y='y'
)

Sometimes the points still appear quite tightly clustered. We can try to spread them out more by using the optimization parameter:
engine = qmc.PoissonDisk(d = 2, radius = 5 / 100, rng = None, optimization="lloyd")
sample = engine.random(20)
df = pl.DataFrame(np.round(qmc.scale(sample, l_bounds = [0, 0], u_bounds = [100, 100]), 2)).rename({'column_0': 'x', 'column_1': 'y'})

# plot chart
alt.Chart(data = df).mark_point().encode(
    x='x',
    y='y'
)

… which comes at the cost of sometimes violating the minimum distance between points.
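To see whether the minimum distance holds on a given run, the smallest pairwise distance can be measured with and without the optimization. This is a sketch on the unit square with unseeded engines, so the numbers change between runs; it only reports the distances and makes no claim about which way they move.

```python
from scipy.spatial.distance import pdist
from scipy.stats import qmc

radius = 5 / 100

# same settings as above, once without and once with Lloyd optimization
plain = qmc.PoissonDisk(d=2, radius=radius).random(20)
relaxed = qmc.PoissonDisk(d=2, radius=radius, optimization="lloyd").random(20)

print(f"radius: {radius}")
print(f"min distance, no optimization: {pdist(plain).min():.4f}")
print(f"min distance, lloyd:           {pdist(relaxed).min():.4f}")
```

A value below `radius` on the last line means the optimized sample no longer respects the minimum distance.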
Implementations in other languages
There are also JavaScript implementations of Bridson’s algorithm; see, for example, this GitHub repo, which lets you specify not only the minimum distance between points but also the maximum distance.
There is also this R implementation that comes with a nice illustration.