The network ecology of settlement persistence

K. Blake Vernon, Simon Brewer, Brian Codding, and Scott Ortman

Why do some human settlements last longer than others?

Inspiration

Figure 1

Is settlement persistence an inherent good?

No, its value derives from the contribution it makes to human life and well-being.

Figure 2

The Lord Baron model

The goal is to maximize \(F\) given the trade-off between \(P\)rimary and \(S\)ocial production, with \(A\) controlling the rate at which primary production declines as \(R\) increases (Renfrew and Poston 1979).

\[F(R) = P(R) + S(R)\]

Figure 3

This anticipates many of the ideas that have come to be known as settlement scaling theory (Ortman et al. 2014).
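A minimal sketch of the trade-off, assuming an exponential decline for primary production and a logistic curve for social production. These functional forms and the values of `A`, `k`, and `R0` are illustrative assumptions, not the forms used by Renfrew and Poston.

```r
# Hypothetical functional forms and parameter values; the slides do not specify them.
A  <- 2     # assumed rate at which primary production declines with R
k  <- 10    # assumed steepness of the social production curve
R0 <- 0.4   # assumed midpoint of the social production curve

P <- function(R) exp(-A * R)                    # primary production, declining in R
S <- function(R) 1 / (1 + exp(-k * (R - R0)))   # social production, sigmoid in R
F_total <- function(R) P(R) + S(R)              # F(R) = P(R) + S(R)

# Evaluate F on a grid of R values and pick the value that maximizes it.
R_grid <- seq(0, 1, by = 0.01)
F_grid <- F_total(R_grid)
R_best <- R_grid[which.max(F_grid)]
```

Under these assumed values, \(F\) first falls as \(R\) moves away from zero and then rises to a higher interior peak, so the curve has two local maxima. The function is named `F_total` because `F` is an alias for `FALSE` in R.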

Settlement dynamics

Lord Baron assumes that settlement is a dynamic system with multiple, discontinuous equilibrium states.

Figure 4

A1 is the “break point”, A2 the “sustain point” (Fujita, Krugman, and Venables 1999).

Scaling effects

\(R\) can be thought of as the per capita contribution of an individual to the “public good.” And, the sigmoid shape of \(S\) suggests that everyone gets slightly more out of the village than they put in, especially early on.

Figure 5

Collective Action Problem

Lord Baron assumes per capita costs and benefits, so it can’t account for asymmetric interactions (e.g., free-riders, Tories, the 1%). And, if you can’t get buy-in, the whole system unravels (like a GoFundMe).

Figure 6

This is a problem for Lord Baron as an explanation for the origins of urban agglomerations.

But, what about persistence?!

Read it from left to right, starting with the village equilibrium state.

Figure 7

“Agglomerations, once established, are usually able to survive even under conditions that would not cause them to form in the first place” (Fujita, Krugman, Venables 1999).

Expectations

  1. Agglomerated systems should persist longer than dispersed systems.
  2. Everyone should be “better off” in an agglomerated system, whether they are
    • profiting off that system or
    • trapped in it, having no viable alternative.

Unit of analysis

Discretized spatial and temporal units.

Figure 8

Population

Estimated for each grid cell using Uniform Probability Density Analysis (Ortman 2016).

Figure 9

Duration

Derived by applying a threshold to the population reconstruction.

Figure 10
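A minimal sketch of the thresholding step for a single grid cell. The population values, the threshold, and the rule that duration is the longest unbroken run above the threshold are all assumptions made for illustration.

```r
# Hypothetical population estimates for one grid cell across time steps.
population <- c(0, 3, 12, 25, 30, 18, 4, 0, 0)
threshold  <- 10   # assumed minimum population for a cell to count as occupied

occupied <- population >= threshold

# Assumed rule: duration is the longest unbroken run of occupied time steps.
runs <- rle(occupied)
duration <- if (any(runs$values)) max(runs$lengths[runs$values]) else 0
```

Here the cell is occupied for four consecutive time steps, so `duration` is 4.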

Agglomeration

Based on population distribution within travel time \(t\) of a focal grid cell.

Figure 11

If you squint, this looks like a proxy for spatial network centrality.
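One way to sketch the measure, assuming a precomputed matrix of pairwise travel times between grid cells. The cell names, populations, travel times, cutoff, and the choice to count the focal cell's own population are hypothetical.

```r
# Hypothetical inputs: population per grid cell and pairwise travel times (hours).
population <- c(a = 50, b = 20, c = 5, d = 10)
travel_time <- matrix(
  c(0, 1, 3, 6,
    1, 0, 2, 5,
    3, 2, 0, 4,
    6, 5, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(names(population), names(population))
)
t_max <- 3   # assumed travel-time cutoff t

# 1 where a cell is reachable from the focal cell within t_max, else 0.
# The focal cell itself is included (travel time 0), which is an assumption.
within_reach <- (travel_time <= t_max) * 1

# For each focal cell, sum the population of all reachable cells.
agglomeration <- as.vector(within_reach %*% population)
names(agglomeration) <- names(population)
```

Cells `a`, `b`, and `c` can all reach one another within the cutoff and each score 75, while the isolated cell `d` scores only its own population of 10.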

Climate

Reconstructed using paleoCAR (Bocinsky and Kohler 2014).

Figure 12

Survival analysis

What explains the amount of time \(T\) that passes before a settlement is abandoned?

\[ \begin{aligned} T &\sim f(t)\\ S(t) &= Pr(T > t) = \int_{t}^{\infty} f(u)du\\ h(t) &= \frac{f(t)}{S(t)} \end{aligned} \]

with

  • \(S(t)\) being the survival function and
  • \(h(t)\) the hazard rate: the instantaneous rate at which settlements are abandoned at \(t\) given that they persisted up to \(t\).
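These quantities can be sketched from a handful of made-up durations using a simple life table. The data are hypothetical, and the sketch uses the discrete-time analogue of the hazard, which divides by the share still at risk at the start of each step, \(S(t-1)\), rather than by \(S(t)\).

```r
# Hypothetical settlement durations, in time steps until abandonment.
durations <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6)
steps <- 1:max(durations)

# f(t): share of settlements abandoned exactly at step t.
f <- as.vector(table(factor(durations, levels = steps))) / length(durations)

# S(t) = Pr(T > t): share of settlements still occupied after step t.
S <- 1 - cumsum(f)

# Discrete hazard: Pr(T = t | T >= t) = f(t) / S(t - 1), with S(0) = 1.
h <- f / c(1, head(S, -1))
```

Because every settlement in this toy data set is eventually abandoned, the hazard reaches 1 in the final time step.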

Discrete-time proportional hazards

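In continuous time, the hazard is a rate. Continuous time is implausible in an archaeological context, so we switch to discrete time, where the hazard is the probability that a settlement is abandoned in time step \(t\) given that it persisted up to \(t\). That probability is the expectation of a binary abandonment indicator \(y_t\), so it can be modeled using ordinary logistic regression.

\[ \begin{aligned} h(t) &= Pr(T = t \mid T \geq t) = E(y_t \mid T \geq t)\\ logit(h(t)) &= \alpha + \beta X \end{aligned} \]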
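The logistic version can be sketched by expanding each settlement into one row per time step at risk and fitting a binomial GLM to a binary abandonment indicator. The simulated data, the single covariate, and the true coefficients are assumptions chosen for illustration; the column name `leave` follows the R code later in these slides.

```r
set.seed(42)

# Simulate hypothetical settlements whose per-step hazard falls with agglomeration.
n_settlements <- 200
agglomeration <- runif(n_settlements)
true_hazard   <- plogis(-1 - 2 * agglomeration)       # assumed true model
duration      <- rgeom(n_settlements, true_hazard) + 1  # steps until abandonment

# Expand to one row per settlement per time step at risk.
person_period <- data.frame(
  id            = rep(seq_len(n_settlements), times = duration),
  time_step     = sequence(duration),
  agglomeration = rep(agglomeration, times = duration)
)

# leave = 1 only in the final time step of each settlement, 0 otherwise.
person_period$leave <- as.integer(
  person_period$time_step == rep(duration, times = duration)
)

# Logistic regression on the expanded data models the discrete-time hazard.
hazard_fit <- glm(leave ~ time_step + agglomeration,
                  data = person_period, family = binomial)
```

Calling `summary(hazard_fit)` would then show the estimated effect of each covariate on the log-odds of abandonment in a given time step.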

In practice, we fit the hazard with a random forest rather than logistic regression because it does not require an assumption about the distribution of \(T\).

\(X\): maximum agglomeration, maximum population, maize growing degree days (GDD) per time step, precipitation (PPT) per time step, start date, and region.

To handle spatial autocorrelation, the model also includes the first two principal components derived by applying PCA to the full cost-distance matrix (similar to Moran eigenvector spatial filtering, MESF).
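A minimal sketch of deriving the two spatial covariates, assuming Euclidean distances between random points as a stand-in for the real cost-distance matrix. The coordinates and the centering choice are illustrative assumptions.

```r
set.seed(1)

# Hypothetical grid-cell coordinates; Euclidean distance stands in for cost distance.
coords <- cbind(x = runif(30), y = runif(30))
cost_distance <- as.matrix(dist(coords))

# PCA on the full distance matrix: each row is a cell, each column a distance profile.
pca <- prcomp(cost_distance, center = TRUE, scale. = FALSE)

# Keep the first two component scores as spatial covariates.
spatial_pcs <- pca$x[, 1:2]
colnames(spatial_pcs) <- c("PC1", "PC2")
```

The resulting `PC1` and `PC2` columns have one value per grid cell and can be joined to the model data by cell.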

Just for fun…

here’s the R code for this implementation of RF.

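library(randomForest)

# Assumption: n is the number of abandonment ("1") rows; the slide does not define it.
# leave must be a factor for randomForest to run in classification mode.
n <- sum(persistence$leave == "1")

persistence_model <- randomForest(
  leave ~ time_step
  + population + agglomeration
  + ppt + gdd
  + start + region
  + PC1 + PC2,
  data = as.data.frame(persistence),
  # stratified draw for each tree, one size per class of leave
  sampsize = c("0" = round(n/5), "1" = n),
  ntree = 2000,
  importance = TRUE
)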

The sampsize = c("0" = round(n/5), "1" = n) argument is a deliberate over-correction for the severe class imbalance in this case: each tree draws only one-fifth as many "0" rows as "1" rows.
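What a stratified draw of this kind does for a single tree can be sketched in base R. The class counts are made up, and the sketch assumes `n` is the number of "1" rows, which the slides do not state.

```r
set.seed(1)

# Hypothetical, heavily imbalanced outcome: 950 stay ("0") rows, 50 leave ("1") rows.
leave <- factor(c(rep("0", 950), rep("1", 50)))

n <- sum(leave == "1")   # assumption: n counts the minority ("1") class
sampsize <- c("0" = round(n / 5), "1" = n)

# Draw, with replacement, the requested number of rows from each class.
idx <- unlist(lapply(names(sampsize), function(cls) {
  sample(which(leave == cls), size = sampsize[[cls]], replace = TRUE)
}))

drawn <- table(leave[idx])
```

In this toy case each tree would see 10 "0" rows and 50 "1" rows, so the original majority class becomes the minority within every bootstrap sample.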

Hazard rate

For illustration purposes.

Figure 13

Lingering issues

Probably in order of importance…

  1. Need to build the population reconstruction using deep learning (Reese 2021) to generate estimates from the entire tree-ring record, rather than type sites.
  2. A better way of measuring agglomeration.
  3. Need a more fine-grained climate reconstruction. Currently, can only get to approximately 1-km resolution.
  4. Would like to use standard regression for inference.
  5. Might need to include lags.
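On the last point, a lagged climate covariate could be added to the person-period data along these lines. The data frame, its values, and the one-step lag are hypothetical; the column name `ppt` follows the model formula above.

```r
# Hypothetical person-period data, assumed sorted by id and then time_step.
person_period <- data.frame(
  id        = c(1, 1, 1, 2, 2),
  time_step = c(1, 2, 3, 1, 2),
  ppt       = c(300, 250, 400, 310, 280)
)

# Lag precipitation by one time step within each settlement.
# The first time step of each settlement has no previous value, so it gets NA.
person_period$ppt_lag1 <- ave(
  person_period$ppt, person_period$id,
  FUN = function(x) c(NA, head(x, -1))
)
```

Rows with a missing lag would need to be dropped or imputed before model fitting.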

Acknowledgments

  • Matt Peeples
  • Peter Yaworsky
  • Weston McCool
  • Josh Watts
  • The {extendr} crew

And thanks to Andreas and Eleftheria for organizing!