Table inference for combinatorial origin-destination choices in agent-based population synthesis

A key challenge in agent-based mobility simulations is the synthesis of individual agent socioeconomic profiles. Such profiles include the locations of agent activities, which dictate the quality of the simulated travel patterns. These locations are typically represented in origin-destination matrices that are estimated from coarse travel surveys. This is because fine-grained trip profiles are scarce and fragmented due to privacy and cost reasons. The discrepancy between data and sampling resolutions renders agent traits non-identifiable due to the combinatorial space of data-consistent individual attributes. This problem is pertinent to any agent-based inference setting where the latent state is discrete. Existing approaches have used continuous relaxations of the underlying location assignments and subsequent ad-hoc discretisation thereof. We propose a framework to efficiently navigate this space, offering improved reconstruction and coverage as well as linear-time sampling of the ground truth origin-destination table. This allows us to avoid the factorially growing rejection rates and poor summary statistic consistency inherent in discrete choice modelling. We achieve this by introducing joint sampling schemes for the continuous intensity and discrete table of agent trips, as well as Markov bases that can efficiently traverse this combinatorial space subject to summary statistic constraints. Our framework's benefits are demonstrated in multiple controlled experiments and a large-scale application to agent work trip reconstruction in Cambridge, UK.


Introduction
Agent-based models (ABMs) are becoming increasingly popular policy-making tools in areas such as epidemic and transportation modelling (Bonabeau, 2002).
The emergent structure arising from ABM simulations relies on the quality of the underlying agent population's demographic and socioeconomic attributes.
In transportation ABMs, such as MATSim (Axhausen and ETH Zürich, 2016), simulated travel patterns are predominantly governed by the locations where agent activities take place (e.g. working, shopping). The trips between activities are summarised in origin-destination matrices (ODMs), which are often either partially available or not available a priori. Therefore, population synthesis is performed to create artificial agents whose attributes (e.g. workplace location) have the same summary statistics as those described by population averages (e.g. regional job availability). Location choice synthesis translates to reconstructing integer-valued origin-destination matrices whose margins are summary statistics. To this end, coarse/aggregate agent activity surveys by geographical region and activity type are mainly leveraged (Fournier et al., 2021). This is because fine-grained individual/disaggregate profiles are scarce and fragmented due to privacy and/or data acquisition cost reasons. Therefore, a discrepancy arises between the spatial resolutions of the data and the latent states. Inferring individual agent trips subject to population summary statistics necessitates the exploration of a combinatorial choice space. The size of this space induces identifiability issues, since a unique set of agent location choices consistent with the data cannot be recovered.
A downsampling approach of sampling individual choices is computationally infeasible for any real-world application. Assuming that there are M agents with L available location choices, computing the likelihood of the aggregate data given individual model parameters requires summing over L^M possible location configurations, many of which are inconsistent with the data. Computational and identifiability issues can be alleviated by appropriately constraining the discrete latent space. The problem of exploring a constrained combinatorial agent state space is pertinent to any agent-based inference setting where the latent state is discrete.
Although discrete choice models (Train, 2009) are popular candidates for disaggregating agent location choices, they cannot encode aggregate statistic constraints.
Therefore, they either accrue errors when reconstructing ODMs or lead to factorially growing rejection rates (DeSalvo and Zhao, 2016) when forced to adhere to discrete constraints in a rejection-type scheme. A suite of greedy optimisation algorithms, such as iterative proportional fitting (Bishop, Fienberg, and Holland, 2007) and combinatorial optimisation (Voas and Williamson, 2000), has been employed to assimilate summary statistic constraints in continuous and discrete spaces, respectively. These methods suffer from poor convergence to local optima, yielding solutions heavily dependent on good initialisations. Moreover, operating in a continuous probability/intensity space requires an additional sampling step to discretise the ODM, such as stochastic rounding (Croci et al., 2022). This is an ad-hoc treatment of the problem and produces errors. The unidentifiable nature of disaggregating agent choices from aggregate data calls for uncertainty quantification, in order to give practitioners the ability to interrogate and rank the sampled ODMs according to their probability.
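To make the discretisation error concrete, the following sketch (our own illustration, not code from the paper) applies unbiased stochastic rounding to a small continuous intensity matrix: each cell is unbiased in expectation, yet the rounded table's margins generally disagree with the intensity's margins, which is precisely the ad-hoc step criticised above.

```python
import numpy as np

def stochastic_round(intensity, rng):
    """Round each cell up with probability equal to its fractional part,
    so the rounding is unbiased per cell: E[T_ij] = intensity_ij."""
    floor = np.floor(intensity)
    frac = intensity - floor
    return (floor + (rng.random(intensity.shape) < frac)).astype(int)

rng = np.random.default_rng(0)
Lam = np.array([[1.2, 3.7], [0.4, 2.6]])  # continuous intensity
T = stochastic_round(Lam, rng)            # integer table; its margins need not match Lam's
```

Even though each cell is rounded without bias, any margin of `T` deviates from the corresponding margin of `Lam` by up to one unit per free cell, so summary statistic constraints are not respected exactly.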
Probabilistic methods have overcome some of the aforementioned limitations (Farooq et al., 2013; Sun and Axhausen, 2016), but remain approximate since they operate in the continuous intensity/probability space. In the case of location choice synthesis, ODMs are equivalent to two-way contingency tables of two categorical variables (e.g. origin residential population and destination workforce population), and the joint distribution of the two variables is explored using Gibbs sampling. Table marginal probabilities are elicited by normalising the discrete summary statistics. This approximation incurs information loss and may cause marginal class imbalances in high-dimensional tables (Fournier et al., 2021), meaning a growing divergence between the ground truth and sampled marginal distributions. In addition, partially available data cannot be accommodated in a principled manner, and unreasonable conditional independence assumptions are imposed.

Table 1: Comparison of our method's capabilities against previous works, (Ellam et al., 2018) and (Gaskin, Pavliotis, and Girolami, 2023). Agent choices are described by a discrete table (T) or a continuous intensity (Λ). Subscripts define summary statistics: the row and column sums/margins are indexed by (•, +) and (+, •), respectively. The cell universe X ⊇ X′ contains the table/intensity indices of an I × J matrix. For example, the doubly and cell constrained model, whose table distribution is Fisher's non-central hypergeometric, is handled by this work (✓) but by neither previous approach (×).
The work of (Carvalho, 2014) endeavoured to address these two problems by adopting a Bayesian paradigm that operates directly on the discrete table space. However, neither the most efficient proposal mechanism nor the available intensity structure was exploited.
Instead, a Metropolis-Hastings (MH) scheme for sampling contingency tables was employed that proposes jumps of size at most one in O(#origins × #destinations) time, causing poor mixing in high-dimensional tables. Furthermore, the author argued for a hierarchical construction that jointly learns the constrained discrete ODM and the underlying intensity function.
In doing so, they attempted to leverage a family of log-non-linear intensity models known as spatial interaction models (SIMs) (Wilson, 1971). SIMs incorporate summary statistic constraints directly in the continuous intensity space. Despite this effort, a log-linearity assumption was imposed on the SIM to simplify parameter inference. Also, the known dynamics of competition between destination locations (Dearden and Wilson, 2015) were ignored, effectively stripping SIMs of all their embedded structure. Moreover, additional data were required to calibrate the intensity function, such as seed matrices, which are seldom available, as opposed to regularly observed data on the economic utility of travelling to destination locations. The works of (Ellam et al., 2018) and (Gaskin, Pavliotis, and Girolami, 2023) alleviated this problem by constructing a physics-driven log-non-linear SIM intensity prior. However, both approaches operated strictly in the continuous intensity space and could not explore the discrete table space where population synthesis is performed.

Contributions
Our paper focuses on reconstructing origin-destination agent trip matrices summarising residence-to-workplace location choices. We offer an upsampling Bayesian approach that jointly samples from the discrete table (T) and continuous intensity (Λ) spaces for agent location choice synthesis. Our framework seamlessly assimilates any type of aggregate summary statistic as a constraint, which in turn regularises the space of admissible disaggregate/individual agent choices. We demonstrate an improved reconstruction and coverage of a partially observed origin-destination matrix summarising agent trips from residential to workplace locations in Cambridge, UK.
The residential population at each origin (row sums) is denoted by Λ_{i+}, while the working population at each destination (column sums) is Λ_{+j}. We assume that the total origin and destination demand are both conserved:

Λ_{++} = Σ_{i=1}^{I} Λ_{i+} = Σ_{j=1}^{J} Λ_{+j}. (3)

This construction defines a totally constrained SIM. The demand for destination zones depends on the destination's attractiveness, denoted by w := (w_1, ..., w_J) ∈ R^J_{>0}. Let the log-attraction be x := log(w). Between two destinations of similar attractiveness, agents are assumed to prefer nearby zones. Therefore, a cost matrix C = (c_{i,j})_{i,j=1}^{I,J} is introduced to reflect travel impedance. These two assumptions are justified by economic arguments (Pooler, 1994). The maximum entropy distribution of agent trips subject to the total number of agents being conserved is derived by maximising

−Σ_{i,j} Λ_{ij} log Λ_{ij} + α Σ_{i,j} Λ_{ij} x_j − β Σ_{i,j} Λ_{ij} c_{ij} subject to Λ_{++} being fixed, (4)

which yields a closed-form expression for the trip intensity:

Λ_{ij} = Λ_{++} exp(α x_j − β c_{ij}) / Σ_{k=1}^{I} Σ_{m=1}^{J} exp(α x_m − β c_{km}),

where α, β control the two competing forces of attractiveness and deterrence. A higher α relative to β characterises a preference for destinations with higher job availability, while the contrary indicates a predilection for closer workplaces. The destination attractiveness w is governed by the Harris-Wilson (Harris and Wilson, 1978) system of J coupled ordinary differential equations (ODEs):

dw_j(t)/dt = ε w_j(t) (Λ_{+j}(t) − κ w_j(t) + δ), j = 1, ..., J,

where κ > 0 is the number of agents competing for one job, δ > 0 is the smallest number of jobs a destination can have, and Λ_{+j}(t) − κ w_j(t) is the net job capacity in destination j. A positive net job capacity translates to a higher economic activity (more travellers than jobs) and a boost in local employment, and vice versa. In equilibrium, the J stationary points of the above ODE can be computed using

κ w_j = Λ_{+j} + δ.

The value of κ can be elicited by summing the above equation over destinations, which yields

κ = (Λ_{++} + J δ) / w_{++},

while δ corresponds to the case when no agent travels to destination j′ (Λ_{+j′} = 0), i.e. w_{j′} = δ/κ.
In the presence of stochastic fluctuations in job capacity, the log-attraction is modelled by the Itô SDE

dx_t = −∇V(x_t) dt + sqrt(2 γ^{-1}) dB_t, (10)

where the potential function V(x) in the drift term is equal to

V(x) = −α^{-1} Λ_{++} log(Σ_{i,j} exp(α x_j − β c_{ij})) + κ Σ_{j=1}^{J} exp(x_j) − δ Σ_{j=1}^{J} x_j,

and θ = (α, β) is the free parameter vector. The steady-state distribution of (10) is shown in (Ellam et al., 2018) to be the Boltzmann-Gibbs measure

p(x | θ) ∝ exp(−γ V(x)). (12)

The observed data y are assumed to be noisy perturbations of x, where the error e between the two satisfies log(e) ∼ N(0, σ_d² I), that is

log(y) | x ∼ N(x, σ_d² I).

We introduce a data augmentation step to perform inference at the higher-resolution origin-destination table space of agent trips, as depicted in Figure 1. Assume that the I × J discrete contingency table T summarising the number of agents living in location i and working in location j is Poisson distributed:

T_{ij} | Λ_{ij} ∼ Poisson(Λ_{ij}), (15)

where the T_{ij}'s are conditionally independent given the Λ_{ij}'s. The contingency table inherits constraint (3).
These hard-coded constraints can be viewed as noise-free data on the discrete table space. We abbreviate the vector of row sums, the vector of column sums and the scalar total of T by T_{•+}, T_{+•} and T_{++}, respectively. Note that T uniquely determines the rest of the aforementioned random variables and T_{++} = Λ_{++}. Moreover, the distribution of x in (12), coupled with a prior on θ, jointly induces a prior over the intensity function Λ.
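As a minimal illustration of the model above (our own sketch, with arbitrary dimensions and parameter values), the totally constrained SIM intensity and the Poisson data augmentation of the table can be written as:

```python
import numpy as np

def sim_intensity(x, C, alpha, beta, total):
    """Totally constrained SIM intensity: Lambda_ij is proportional to
    exp(alpha * x_j - beta * c_ij), scaled so that Lambda_++ = total."""
    U = np.exp(alpha * x[None, :] - beta * C)  # (I, J) pairwise utilities
    return total * U / U.sum()

rng = np.random.default_rng(1)
I, J = 4, 3
C = rng.random((I, J))          # travel cost matrix
x = rng.normal(size=J)          # log destination attractions
Lam = sim_intensity(x, C, alpha=1.2, beta=0.8, total=500.0)
T = rng.poisson(Lam)            # data augmentation: T_ij | Lambda_ij ~ Poisson(Lambda_ij)
```

The conservation constraint holds by construction in the intensity space (`Lam.sum()` equals the prescribed total), while the sampled table `T` only matches it in expectation, which is what motivates the constrained table samplers of the next section.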
Performing inference in a discrete higher-resolution table space circumvents challenges associated with enforcing summary statistic constraints in the continuous intensity space. First, the doubly constrained intensity (see Table 1) admits solutions retrieved only through an iterative procedure that converges to poor local optima without any quantification of uncertainty, since the physical model in (10) becomes redundant. Second, maximising (4) subject to individual cell constraints induces discontinuities in the Λ space, prohibiting SIM parameter calibration. To avoid dealing with discontinuities, a fully observable table is required, which is seldom available and defeats the purpose of ODM reconstruction. Alternatively, more parameters can be introduced, which entails identifiability problems as the number of free parameters becomes O(I + J) instead of O(J). Moreover, augmenting C_Λ to match C_T strengthens the dependence between T | Λ, C and y | x, C. As a result, constraints are implicitly weighted (hard C_Λ and soft C_T constraints), which inflicts identifiability issues on Λ.
3 Discrete Table Inference

Let the set of table indices (cells) be X = {(i, j) : 1 ≤ i ≤ I, 1 ≤ j ≤ J}. For any subset X_k ⊆ X, let φ : X → {0, 1}^{I+J} be a bijective function that maps every cell x = (i, j) ∈ X_k to the (I + J)-dimensional binary vector with the i-th and (I + j)-th entries equal to one and the rest equal to zero. Define S_k : T → N^{I+J} to be the summary statistic operator applying a uniquely defined aggregation over the cells in X_k. The ordered collection of summary statistic operators S_1(T′), ..., S_K(T′) is abbreviated by S(T′). Define a collection of discrete summary statistics C_T = {s_1, ..., s_K}, expressed as constraints on the table space, where each s_k is a realisation of S_k.
We leverage the same convention to define continuous constraints C_Λ in the intensity space. The union of table and intensity constraints is summarised by C. We sometimes refer to C_T by C to avoid notational clutter. In Table 1, the singly constrained ODM model corresponds to a given C, as opposed to singly constrained tables and intensities that map to C_T and C_Λ, respectively. Equivalently, constrained models are defined by combinations of constrained tables and intensities.
We denote the function space of all C_T-admissible contingency tables (origin-destination matrices) of dimension dim(T) = I × J by T_{C_T} = {T ∈ T : S(T) = C_T}, and drop the dependence on T for notational convenience. This space contains all agent location choices consistent with the aggregate summary statistics. We achieve this by devising a Metropolis-Hastings-within-Gibbs scheme to sample from P(T | x, θ, C), p(x | θ, T, y) and p(θ | x, T, y). The conditional samplers for x and θ have acceptance ratios similar to those in (Ellam et al., 2018). Although a singly constrained intensity can be leveraged here, enforcing hard constraints through C_Λ and potentially different soft constraints through P(T | x, θ, C) would cause identifiability issues in x. We aim to provide a general construction for joint table and intensity inference and employ singly constrained SIMs only where necessary. In the following exposition, we show that the type of summary statistic data available determines whether the constrained table distribution P(T | x, θ, C) can be sampled directly or indirectly through Markov chain Monte Carlo (MCMC).

Tractable constrained table sampling
In this section, we offer closed-form contingency table sampling. Without loss of generality, assume that only one of the two table margins is known, namely the row margin T_{•+}. Then, any subset C_T of the universe of summary statistic constraints {T_{++}, T_{•+}} yields a tractable posterior table marginal, as shown in Table 1. By the construction in (15), the case C_T = ∅ is equivalent to unconstrained table sampling conditioned on an intensity model, which in our case is a SIM. Cell constraints can be incorporated without violating the posterior's tractability. Furthermore, leveraging the fact that T uniquely determines both T_{++} and T_{•+} and applying Bayes' rule, it follows that the models with C_T = {T_{++}} and C_T = {T_{•+}} yield Multinomial and product Multinomial distributions, respectively (see full derivations in Appendix A). Equivalently,

T | T_{++}, Λ ∼ Multinomial(T_{++}, Λ / Λ_{++}), (19)

and

T_{i,•} | T_{i+}, Λ ∼ Multinomial(T_{i+}, Λ_{i,•} / Λ_{i+}) independently for i = 1, ..., I. (20)

We obtain N independent samples T^{(1:N)} from (15), (19) and (20) in closed form. Samples from the Poisson and product Multinomial distributions can be drawn in parallel. We note that the space complexity of table sampling is O(IJ), while the time complexities for (15), (19) and (20) are O(IJ), O(1) and O(I), respectively.
Moreover, coupling either constraint T_{++} or T_{•+} with cell constraints leaves the target distribution unchanged but shrinks its support. Hence, the available table margin is updated by subtracting the values of the fixed cell constraints from the margin statistics, and inference is performed on the free cells. We present the joint intensity and table sampling algorithm for tractably constrained tables in Algorithm 1.
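The three tractable cases above can be sketched as follows (our own illustration, assuming a fixed intensity `Lam` and arbitrary margins): unconstrained sampling draws independent Poissons, conditioning on the grand total gives a single Multinomial over all cells, and conditioning on the row margins gives one independent Multinomial per row.

```python
import numpy as np

rng = np.random.default_rng(2)
Lam = rng.random((3, 4)) + 0.1          # fixed intensity matrix

# C_T = {}: independent Poissons per cell, as in (15).
T_unc = rng.poisson(Lam)

# C_T = {T_++}: one Multinomial over all I*J cells.
N = 1000
T_tot = rng.multinomial(N, (Lam / Lam.sum()).ravel()).reshape(Lam.shape)

# C_T = {row margins}: a product of I independent Multinomials (parallelisable).
rows = np.array([300, 400, 300])
T_rows = np.stack([rng.multinomial(rows[i], Lam[i] / Lam[i].sum())
                   for i in range(Lam.shape[0])])
```

By construction, `T_tot` matches the grand total exactly and each row of `T_rows` matches its prescribed margin exactly, whereas `T_unc` only matches the intensity margins in expectation.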

Intractable constrained table sampling
In this section, we introduce an MCMC scheme for sampling tables subject to any subset of the power set P(T), excluding those subsets contained in the constraint universe of the previous section. By conditioning on both table margins and leveraging the conditional distribution of the table given its margins (Agresti, 2002), we obtain Fisher's non-central hypergeometric distribution:

P(T | T_{•+}, T_{+•}, Λ) ∝ Π_{i,j} ω_{ij}^{T_{ij}} / T_{ij}!, (21)

where

ω_{ij} = (Λ_{ij} Λ_{IJ}) / (Λ_{iJ} Λ_{Ij})

is called the odds ratio and encodes the strength of dependence between row i and column j. Complete independence is achieved if and only if ω_{ij} = 1. Our choice of intensity model encodes this dependence in the travel cost matrix C. Origin-destination independence is achieved if and only if the travel cost's effect on destination choice is irrelevant (β = 0). Moreover, the normalising constant of (21) is a partition function defined over the support of all tables satisfying the conditioned margins, and it cannot be efficiently computed by direct enumeration. In Appendix B, we prove an extension of the Chu-Vandermonde convolution theorem for Multinomial coefficients (Belbachir, 2014) that facilitates computation of the normalising constant in O(1). In particular, we show that the following identity holds:

Σ_{T ∈ T_C} Π_{i=1}^{I} (T_{i+}; T_{i1}, ..., T_{iJ}) = (T_{++}; T_{+1}, ..., T_{+J}),

where (n; k_1, ..., k_J) = n! / (k_1! ⋯ k_J!) is the Multinomial coefficient. Shrinking the T_C space using elements of the constraint universe above requires a Markov Basis (MB) MCMC sampling scheme (Diaconis and Sturmfels, 1998) due to the intractability of the induced table posterior.
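The claim that β = 0 removes all origin-destination dependence can be checked numerically. In the sketch below (our own illustration, with odds ratios taken relative to the last row and column), a SIM intensity with β = 0 yields all odds ratios equal to one, while a positive β generically does not:

```python
import numpy as np

def odds_ratios(Lam):
    """Odds ratios of every cell relative to the last row and column:
    all equal to 1 iff rows and columns are independent."""
    return (Lam * Lam[-1, -1]) / (Lam[-1:, :] * Lam[:, -1:])

rng = np.random.default_rng(4)
I, J = 4, 5
C = rng.random((I, J))   # travel cost matrix
x = rng.normal(size=J)   # log destination attractions

def sim(beta, alpha=1.0):
    U = np.exp(alpha * x[None, :] - beta * C)
    return U / U.sum()

assert np.allclose(odds_ratios(sim(0.0)), 1.0)      # no cost effect: independence
assert not np.allclose(odds_ratios(sim(0.8)), 1.0)  # cost induces dependence
```

With β = 0 the intensity factorises into a product of row and column effects, so every odds ratio collapses to one, mirroring the statement above.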

Markov Basis MCMC
We construct a C_T-admissible table for initialising Markov Basis MCMC using a suite of greedy deterministic algorithms, such as iterative proportional fitting (Bishop, Fienberg, and Holland, 2007). We then devise a proposal mechanism on T_C as follows.
Condition (i) guarantees that all proposed moves do not modify the summary statistics in C_T, while condition (ii) ensures that there exists a path between any two tables such that every table in the path is C_T-admissible. The collection of constraints C_T generates a Markov basis M. When I × J tables satisfy both row and column margins, M consists of functions f_1, ..., f_L such that for every pair of cells x_1 = (i_1, j_1) and x_2 = (i_2, j_2) with i_1 ≠ i_2 and j_1 ≠ j_2,

f_l(i_1, j_1) = f_l(i_2, j_2) = 1, f_l(i_1, j_2) = f_l(i_2, j_1) = −1,

and f_l vanishes on all remaining cells. The case of coupling individual cell constraints with table margins requires a minor modification. Let X′ ⊆ X and C′_T be the individual cell admissibility criteria. Then, M is updated to exclude all basis functions f_l whose support intersects X′.

Proposition 3.1.
1. MB-MH: Let η ∈ {−1, 1} and choose η from this set with probability 1/2, independently of l. If the chain is at T ∈ T_C, it will move to T′ = T + η f_l with probability

min{1, µ(T + η f_l) / µ(T)},

provided T + η f_l has non-negative entries. In all other cases, the chain stays at T.

2. MB-Gibbs: Let the support of the step size be H = {η ∈ Z : T + η f_l ∈ T_C}, and move to T′ = T + η f_l with probability proportional to µ(T + η f_l) for η ∈ H.

In both cases, an aperiodic, reversible, connected Markov chain on T_C is constructed with stationary distribution proportional to µ(T). The proof of Proposition 3.1 is provided in (Diaconis and Sturmfels, 1998). Theoretical guarantees of Markov Basis MCMC convergence on T_C show that the MB-MH scheme in Proposition 3.1 mixes slowly and is not scalable to high-dimensional I × J tables with large T_{++}. Instead, a Gibbs sampler can be constructed as detailed in the same proposition (MB-Gibbs).
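A minimal MB-MH step under both margin constraints can be sketched as follows (our own illustration; `log_mu` is an unnormalised log-kernel with a fixed intensity, standing in for the Fisher non-central hypergeometric target):

```python
import numpy as np
from math import lgamma

def mb_mh_step(T, log_mu, rng):
    """One MB-MH step with the 2x2 swap basis: +1 at (i1,j1) and (i2,j2),
    -1 at (i1,j2) and (i2,j1), scaled by eta; both margins are preserved."""
    I, J = T.shape
    i1, i2 = rng.choice(I, size=2, replace=False)
    j1, j2 = rng.choice(J, size=2, replace=False)
    eta = rng.choice([-1, 1])
    Tp = T.copy()
    Tp[i1, j1] += eta; Tp[i2, j2] += eta
    Tp[i1, j2] -= eta; Tp[i2, j1] -= eta
    if Tp.min() < 0:                      # proposal leaves the support: stay put
        return T
    if np.log(rng.random()) < log_mu(Tp) - log_mu(T):
        return Tp
    return T

rng = np.random.default_rng(3)
Lam = rng.random((3, 4)) + 0.5
log_mu = lambda T: sum(t * np.log(l) - lgamma(t + 1)
                       for t, l in zip(T.ravel(), Lam.ravel()))
T = np.full((3, 4), 5)                    # rows sum to 20, columns to 15
for _ in range(200):
    T = mb_mh_step(T, log_mu, rng)
```

Every accepted move leaves both margins untouched, so the chain explores only the doubly constrained space; the MB-Gibbs variant instead samples η over its full valid support at each step.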
Algorithm 2 summarises the Metropolis-within-Gibbs Markov Basis MCMC sampling algorithm for intractably constrained tables. At each step, the sampler finds the valid η support yielding C_T-admissible tables and then uses MB-Gibbs (case 2 of Proposition 3.1) to sample a valid η under the specified µ. The curse of dimensionality prohibits the use of standard convergence diagnostics, such as the Gelman and Rubin criterion (Gelman and Rubin, n.d.).
Therefore, we employ the ℓ1 norm to empirically assess the convergence of sample summary statistics and establish convergence in probability. Furthermore, we assume the underlying intensity function is known a priori, which acts as a ground truth. In the case of Fisher's non-central hypergeometric distribution, exact moments are not available (McCullagh and Nelder, 2019). These are approximated by the moments of a product Multinomial kernel derived in Appendix A.

Experimental Results and Discussion
We showcase table sampling convergence results based on a fixed synthetic intensity, across different numbers of origins I, destinations J and agents M = T_{++}.
Figure 2 depicts empirical convergence rates based on a total of 10^3 chains, each run for 10^3 steps. Sparse tables induce multimodal distributions on T_C and mix slowly compared to their dense counterparts. Convergence is decelerated more by a larger number of agents than by higher table dimensionality. The number of agents grows as fast as the diameter of the chain's state space and bounds the number of MCMC steps required to reach the stationary distribution. This observation agrees with the theoretical bounds obtained in (Diaconis and Sturmfels, 1998), although the latter bounds are derived based on a uniform measure over T_C explored using MB-MH. Despite this discrepancy, the theoretical results provide an upper bound for our case of direct sampling, as evidenced by Figure 3. Direct sampling from the closed-form table posterior achieves the fastest convergence, and we use it to benchmark against Markov Basis MCMC. Any doubly constrained table can be explored using either MB-MH or MB-Gibbs. Encoding additional constraints in T to contract the posterior entails the overhead of using MCMC, introducing a tradeoff between convergence rate and distribution contraction in the presence of more summary statistic constraints C_T.
Furthermore, we present a large-scale application of discrete ODM reconstruction to Cambridge commuting patterns from residence to workplace locations, using the ODM models in Table 1. The precise experimental setup mimics that of (Ellam et al., 2018) and is provided in the Supporting information. In light of new summary statistics C_T, the table posterior contracts and its high-mass region concentrates around the ground truth table, as shown in Figure 4. The fact that the low-noise table samples lie near their high-noise counterparts indicates a more dominant effect of the table likelihood on the posterior relative to that of the intensity SDE prior, which reinforces the confidence in our reconstructed ODM. The intensity samples of (Gaskin, Pavliotis, and Girolami, 2023) have the highest variance amongst the sampled intensities due to the random initialisations of the Neural ODE solver therein. Despite this, the intensity distributions in (Ellam et al., 2018) and (Gaskin, Pavliotis, and Girolami, 2023) are subject to insufficient C_Λ constraints and exhibit a higher divergence from the ground truth table region than the table samples. Our intensity samples are also distant from the ground truth table because they are informed strongly by C_Λ and weakly by C_T (see Figure 1), where the former set is smaller than the latter.
Figure 4: Visualisation of the table (left) and intensity (right) samples projected in 2D using t-distributed stochastic neighbour embedding (Hinton and Roweis, 2002). Samples are coloured by their constraint sets. Intensity samples are weakly informed through C_Λ and P(T | x, θ, C) and lie more distant from the ground truth.
Table 2 reports reconstruction errors, data fit and coverage metrics, including the 99% highest posterior mass (HPM) region; we elucidate each of these metrics in the Supporting information. The best error-coverage tradeoff, lowest SRMSE, lowest MBD and highest SSI are attained by the doubly and 20% cell constrained model, due to it having the richest constraint set C. Our doubly constrained models account for an SRMSE reduction of 16% relative to the singly constrained model, while sustaining an acceptable ground truth coverage of approximately 80%. The apparent increase in the mean intensity SRMSE across all doubly constrained models potentially alludes to the SIM's lack of expressivity. This may be because C_T and y give rise to conflicting SIM parameter configurations in the limit of large C_T. The decrease of the MBD as C grows indicates that the expected upper bound on the number of Markov Basis moves required to exactly match T_D is reduced. In the totally and singly constrained models, our table posterior mean matches or outperforms the intensity mean of (Ellam et al., 2018) and (Gaskin, Pavliotis, and Girolami, 2023) in terms of data fit (SSI) and SRMSE. As the size of C increases, the ground truth table is better reconstructed (reduced SRMSE, increased SSI) and better covered by the table posterior. In the cases of the totally and singly constrained ODMs, we attain a similar SSI and coverage to (Gaskin, Pavliotis, and Girolami, 2023). We note that the last three ODM models in Table 1 cannot be handled by (Ellam et al., 2018) or (Gaskin, Pavliotis, and Girolami, 2023).

Table 2: Comparison of our method against (Ellam et al., 2018) and (Gaskin, Pavliotis, and Girolami, 2023) at the continuous intensity and discrete table levels, across noise regimes γ and constraint sets C. Our method achieves the best error-coverage tradeoff in the doubly and 20% cell constrained ODM, as well as the best reconstruction errors (MBD, SRMSE), data fit (SSI) and ground truth coverage (CP) across all ODM models, as indicated by *.

The highest ground truth cell coverage probability (94%) is achieved by the most relaxed table, namely the unconstrained table, but it entails a high bias. A lower SRMSE (0.67 instead of 0.85) is attained by the intensity field of the totally constrained model in (Gaskin, Pavliotis, and Girolami, 2023), at the expense of a coverage probability drop from 94% to 85% and a discretisation error accrued during population synthesis.
Our framework's benefits also extend to SIM parameter estimation. In Figure 5, we show that the log destination attraction prediction R² increases from 0.77 to 0.84 for larger constraint sets C_T. This allows us to explain the evolved destination employment by informing the data-generating process through C instead of increasing the diffusivity of the SDE prior in (12). Therefore, we mitigate the identifiability issues of the multimodal θ posterior emerging in the high-noise regime. The x predictions are further improved in the high-noise regime (R² = 0.99) compared to the low-noise counterpart (R² = 0.84), which favours the hypothesis of a stochastic growth in destination employment. In the high-noise regime, unbiased estimators of θ are devised based on a more dispersed SDE prior on Λ induced by (12). Increased prior diffusivity steers the x posterior marginal towards a larger region of plausible SDE solutions in the vicinity of y, which improves the quality of the x predictions. Additionally, we recover the x and θ posterior marginals obtained in (Ellam et al., 2018) with only a fraction of additional computational cost.
In conclusion, performing population synthesis directly on the discrete high-resolution space of agent attributes bears tangible empirical benefits. These include improved reconstruction and coverage of the ground truth ODM, as well as table posterior contraction in the limit of constraint data C_T. If population synthesis is not of interest, SIM parameters can be adequately estimated using competitive approaches such as (Gaskin, Pavliotis, and Girolami, 2023). Combining such optimisation methods with Markov Basis MCMC in a naive Bayes scheme can be promising, as it exploits the advantages of both optimisation and MCMC techniques. Regardless, the apparent shortcomings of SIMs call for a comparative study of various intensity model classes, such as discrete choice models (Train, 2009). Finally, the multi-faceted nature of population synthesis opens up future avenues of research beyond ODM reconstruction, where more convoluted dependency structures can be exploited.

Figure 5: The high-noise regime (right) yields a more plausible data fit than its low-noise counterpart (left) due to improved uncertainty quantification. This is attributed to the unbiased estimation of the normalising constant (13) obtained in the former case, compared to the biased estimation and subsequent collapse of the θ posterior to a Dirac mass in the latter case. Enriching C_T increases R², illustrating the added benefits of inference at a higher-resolution table level.

A Constrained table posteriors

Proof. Apply Bayes' rule and notice that T fully determines any member of C_T. It follows that the resulting expression is the kernel of a Multinomial distribution on a one-way contingency table, which resembles the kernel of a Multinomial distribution on the two-way contingency table T with the desired parameters.
which is proportional to the desired distribution.
Then, T | C, Λ is a random variable distributed according to Fisher's non-central hypergeometric law with the odds ratios ω, by virtue of Remark A.1. This resembles the kernel of Fisher's non-central hypergeometric distribution on a two-way contingency table T with the desired parameters.
Corollary A.4.1. In doubly constrained tables, the Markov Basis step size proposal distribution P(η) is proportional to Fisher's non-central hypergeometric distribution for 2 × 2 contingency tables with table margins T_{i_1+}, T_{i_2+}, T_{+j_1}, T_{+j_2} and odds ratio (ω_{i_1 j_1} ω_{i_2 j_2}) / (ω_{i_1 j_2} ω_{i_2 j_1}).

Proof. Without loss of generality, consider an arbitrary Markov Basis move illustrated in Figure 6. All entries of the table remain constant except for the entries at cells (i_1, j_1), (i_1, j_2), (i_2, j_1), (i_2, j_2). Let the updated table be T′. Expand the distribution in Lemma A.4 and simplify the terms depending only on T and Λ; the result is the kernel of Fisher's non-central hypergeometric distribution on a 2 × 2 table with the desired parameters.

B An extension of the Chu-Vandermonde theorem for Multinomial coefficients
Theorem B.1. Let T_C be the space of admissible tables satisfying both the row margins T_{•+} and the column margins T_{+•}. Then,

Σ_{T ∈ T_C} Π_{i=1}^{I} (T_{i+}; T_{i1}, ..., T_{iJ}) = (T_{++}; T_{+1}, ..., T_{+J}).

This is an extension of the Chu-Vandermonde theorem for Multinomial coefficients (Belbachir, 2014) to polynomials with Multinomial coefficients.
Proof. We proceed with an algebraic proof. Let [x]_J = 1 + x + ⋯ + x^J be a polynomial of order J. By writing the left-hand side in polynomial form and expanding it, we obtain the right-hand side, where the exchange of product and sum in the last line is permitted due to the grouping of terms in the sum by the columns T_{•j} for all j = 1, ..., J. The second and fourth equalities follow by direct application of the Multinomial theorem (Berge, 1971). This completes the proof. The theorem above allows us to compute the normalising constant of Fisher's non-central hypergeometric distribution in Lemma A.4. We note that summing over the support of T_{+•} yields the normalised Fisher's non-central hypergeometric distribution, which is similar to the kernel of a product Multinomial with T_{+•} numbers of trials and ω/ω_{+•} event probabilities. We adopt this approximation for computing ground truth moments in our synthetic doubly constrained model experiments of Figure 3.

C Table posterior gradients
The Hamiltonian Monte Carlo update of x necessitates the table log-likelihood gradients inside the leapfrog integrator. We list these gradients for the totally and singly constrained intensities with C_Λ = {Λ_{++}} and C_Λ = {Λ_{+•}}, respectively. The total derivative of the table log-likelihood is given in Remark C.1 for the totally constrained intensity and in Remark C.2 for the singly constrained intensity, while the second derivative term is calculated using Remark C.3.

D Experimental setup
In this section, we elucidate the experimental setup used for synthetic experiments and the large-scale application to commuting patterns in Cambridge, UK.

D.1 Synthetic experiments
Regarding synthetic experiments with fixed Λ, Figure 2 is produced by running 10^3 instances of line 7 of Algorithm 1 for 10^3 steps. Synthetic tables with dimensions 2 × 3 and 33 × 33 and total numbers of agents 10^2 and 5 × 10^3 are created. The constraint set is C_T = {T_{•+}}. Residence and workplace facilities and Cambridge's entire transportation network are extracted from (Geofabrik, 2023). Separate clusters of origin and destination facilities are created for each origin LSOA and destination MSOA, with each cluster limited to 20 facilities to contain computational cost.
Dijkstra's shortest path algorithm (Dijkstra, 2022) is used to compute the distance between all pairwise origin-destination facility pairs. Spatial effects arising from unobserved facilities near the spatial boundary cause overestimation of travel costs to and from boundary LSOAs and MSOAs. To mitigate this effect, Ripley's K function (Dixon, 2002) is computed to estimate the spatial density of home and work facilities within a 500-metre radius of the boundary. The cost matrix is then multiplied by the proportion of facilities lying both within the 500-metre radius and within the modelled LSOAs and MSOAs of interest, relative to the total number of facilities within the radius. The rationale behind this edge correction is that a larger proportion of facilities unaccounted for in the cost matrix computation should reduce the travel cost from/to that origin/destination.
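The distance computation and boundary correction can be sketched on a toy network; the graph, the facility counts, and the correction factors below are illustrative stand-ins for the Cambridge data, not values from the paper.

```python
import heapq

def dijkstra(graph, src):
    """Shortest-path distances from src on a weighted adjacency-dict graph."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy road network: nodes A..D with edge travel costs.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.5, "D": 3.0},
    "C": {"A": 4.0, "B": 1.5, "D": 1.0},
    "D": {"B": 3.0, "C": 1.0},
}
cost = {o: dijkstra(graph, o) for o in graph}

# Edge correction: scale each origin's costs by the share of nearby
# facilities that fall inside the modelled area (illustrative counts).
inside = {"A": 8, "B": 20, "C": 20, "D": 12}
total = {"A": 20, "B": 20, "C": 20, "D": 20}
corrected = {o: {d: c * inside[o] / total[o] for d, c in row.items()}
             for o, row in cost.items()}
print(corrected["A"]["D"])
```

Origins with many unobserved facilities outside the boundary (small inside/total ratio) get their travel costs shrunk, matching the rationale stated above.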
The spatial interaction model parameters are fixed to ϵ = 1, κ = 1.025, δ = 0.0128 and σ_d = 3% × log(13), in the same fashion as (Ellam et al., 2018), to ensure that our results are comparable. The joint MCMC samplers 1 and 2 are run for N = 10^5 steps, and their samples are used to generate Figures 4 and 5 and Table 2. In both samplers, the x-acceptance rate is at least 90%, ensuring numerical stability of the leapfrog integrator in HMC, while the θ-acceptance rate ranges from 30% to 70% depending on the size of C and the corresponding concentration of measure. In the high-noise regime, the importance sampling estimate of the normalising constant in (13) comprises 100 particles and a uniform temperature schedule of 50 inverse temperatures. We ensure that at least 75% of the signs are positive in order to obtain sufficiently representative samples of the normalising constant. The first 10^4 T, x, θ samples are discarded and a thinning of 90 is applied so that 10^3 independent samples from the joint posterior are obtained. We note that, in the case of Algorithm 1, independent samples of T are directly accessible by virtue of closed-form sampling. In all intractable table sampling experiments, MB-Gibbs is employed to explore the table posterior marginal, owing to its superior efficiency relative to MB-MH. The conditional expectations E[x|y, C] and E[θ|y, C] are computed using the unbiased sign-corrected estimator of (Ellam et al., 2018), where sign(θ^(n)) is the sign associated with sampled θ^(n).
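The sign-corrected estimator has the generic form E[f] ≈ Σ_n s_n f(θ^(n)) / Σ_n s_n; the sketch below illustrates that form only (names are ours, and the exact estimator of (Ellam et al., 2018) is given by their garbled formula above, which we do not reproduce).

```python
def sign_corrected_mean(values, signs):
    """Sign-corrected estimator: sum_n s_n * f_n / sum_n s_n."""
    num = sum(s * v for v, s in zip(values, signs))
    den = sum(signs)
    if den == 0:
        raise ZeroDivisionError("no net positive sign mass")
    return num / den

vals = [1.0, 2.0, 3.0, 4.0]
print(sign_corrected_mean(vals, [1, 1, 1, 1]))   # 2.5 (plain sample mean)
print(sign_corrected_mean(vals, [1, 1, 1, -1]))  # (1 + 2 + 3 - 4) / 2 = 1.0
```

When all signs are positive the estimator reduces to the plain sample mean, which is why the 75% positive-sign threshold above matters: a small net sign mass inflates the estimator's variance.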
The MCMC samples of the non-joint scheme in (Ellam et al., 2018) are retrieved using the same specification as above. To compare our method against the neural network approach of (Gaskin, Pavliotis, and Girolami, 2023), we initialise 10^2 random seeds and run the neural network for 10^3 epochs, giving a total of 10^5 pseudo-samples. As in the setup above, we apply the same burn-in and thinning to these pseudo-samples. In the free/variable noise regime, we learn the SDE noise together with x and θ. The SIM parameters are conditioned as depicted in Table 1, using the same values as above.
In Figure 4, table and intensity samples, as well as the ground truth ODM, are jointly projected to 2D using t-distributed stochastic neighbour embedding (Hinton and Roweis, 2002) with the l1 metric and a perplexity of 30. The validation statistics in Table 2 are defined as follows. The standardised root mean square error (SRMSE) (Oshan, 2016) measures the cell-wise reconstruction error, and the Sørensen similarity index (SSI) takes values in [0, 1], with values closer to 1 implying a better ODM fit. Moreover, we define the Markov basis distance, which by construction gives an upper bound on the number of Markov basis moves of step size η = 1 required for T to reach T_D while staying in T_D.
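The two validation statistics can be written down directly. The definitions below follow their standard forms (SRMSE as the RMSE divided by the mean ground-truth cell value; SSI as the cell-averaged Sørensen coefficient) and are our reading of the garbled formulas above, so treat them as a sketch rather than the paper's exact code.

```python
import math

def srmse(truth, pred):
    """Standardised RMSE: cell RMSE divided by the mean truth cell value."""
    cells = [(t, p) for rt, rp in zip(truth, pred) for t, p in zip(rt, rp)]
    n = len(cells)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in cells) / n)
    return rmse / (sum(t for t, _ in cells) / n)

def sorensen(truth, pred):
    """Sorensen similarity index averaged over cells; 1 is a perfect fit."""
    cells = [(t, p) for rt, rp in zip(truth, pred) for t, p in zip(rt, rp)]
    return sum(2 * min(t, p) / (t + p) for t, p in cells if t + p > 0) / len(cells)

T  = [[10, 5], [3, 2]]
T2 = [[9, 6], [3, 2]]
print(srmse(T, T), sorensen(T, T))  # 0.0 1.0
print(round(srmse(T, T2), 3), round(sorensen(T, T2), 3))
```

A perfect reconstruction gives SRMSE 0 and SSI 1; the perturbed table shows how both metrics move away from those extremes.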
Finally, the coverage probability of the ground truth table T_D is calculated by first identifying the lower L_q(T^(1:N))(x) and upper U_q(T^(1:N))(x) boundaries of the high probability region containing q% of the total mass for each table cell x ∈ X. The q% cell coverage probability is then the average, over cells, of the indicator of T_D(x) lying within these boundaries, where 1 is the indicator function and T^(1:N) is the tensor of all table samples.
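The cell-wise coverage computation can be sketched with central empirical quantiles standing in for the high probability region boundaries — an approximation, since HPR boundaries need not be central quantiles; the function name and the toy samples are ours.

```python
def cell_coverage(samples, truth, q=0.99):
    """Fraction of cells whose ground-truth value lies within the central
    q-mass empirical interval of that cell's posterior samples."""
    n_cells = len(truth)
    covered = 0
    for x in range(n_cells):
        vals = sorted(s[x] for s in samples)
        lo_i = int(((1 - q) / 2) * (len(vals) - 1))
        hi_i = int((1 - (1 - q) / 2) * (len(vals) - 1))
        if vals[lo_i] <= truth[x] <= vals[hi_i]:
            covered += 1
    return covered / n_cells

# Flattened-table samples over 3 cells (illustrative).
samples = [[4, 10, 1], [5, 11, 0], [6, 9, 2], [5, 10, 1]]
truth = [5, 10, 7]  # the third cell lies outside every sample
print(cell_coverage(samples, truth, q=0.99))  # 2/3
```

Each cell contributes 1 when its ground-truth count falls inside the interval; averaging over cells gives the reported coverage probability.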

E Auxiliary results for the Cambridge application
In this section, we append additional experimental results of the large-scale application to Cambridge commuting patterns. Figure 7 compares the smallest ODM reconstruction errors from each method in Table 1. The doubly and 20% cell constrained table significantly improves ground truth coverage of free cells (row 2) over the singly constrained intensity of (Ellam et al., 2018) (row 4), even though both use the same cost matrix. Therefore, augmenting C_T shrinks the support of free cells and diminishes their reconstruction error. Compared to the singly constrained intensity of (Gaskin, Pavliotis, and Girolami, 2023), the table posterior mean achieves a reduced SRMSE (0.51 versus 0.61) and does not overestimate the trips to popular destinations. Such destinations include 7 and 13, which correspond to areas where the city's university premises and hospital are located and are highly attractive. The neural network's low-noise predictions (row 3) are characterised by stronger attraction effects, which translate to over-estimation of trips to the aforementioned destinations. The higher noise regime has little effect on the neural network's trip prediction error, as opposed to the table posterior's error, which is substantially reduced in the high-noise regime. This suggests that the continuous intensity is more susceptible to ODM overfitting than the discrete table. A diffuse SDE smooths the marginal distributions of free table cells without changing their support and allows for more contributions of the P(T|x, θ, C) term to the posterior mean.

F Posterior mean convergence consistency
We obtain approximate results on the consistency between the discrete table and continuous intensity posterior mean estimators in Figure 8. A ground truth approximation of the mean of Fisher's non-central hypergeometric distribution is leveraged to check empirically that the table and intensity posterior means coincide. A pattern of faster convergence rates in the limit of more data C_T is depicted, although it is hardly visible.

Figure 7 caption (fragment): … (row 2), the low-noise neural network of (Gaskin, Pavliotis, and Girolami, 2023) (row 3), and the high-noise MCMC scheme of (Ellam et al., 2018) (row 4). Each table's rows resemble the number of trips by destination, and vice versa. Cells with a black boundary correspond to fixed cells, while ✓ cells cover the ground truth in the 99% HPM. The high-noise table predictions interpolate missing cell data in the vicinity of fixed cells and produce a more accurate trip ODM. The error profile similarities across all methods are attributed to the use of the same cost matrix, which captures the spatial covariance of trips.

Figure 1 :
Figure 1: Plate diagram of our modelling framework. Rectangular and circular nodes are deterministic and random variables, respectively. Shaded nodes correspond to conditioned quantities.
In other words, the constrained cell values are deducted from the rest of the summary statistic constraints in C_T. A Markov basis Markov chain (MBMC) can now be constructed.

Proposition 3.1 (adapted from (Diaconis and Sturmfels, 1998)). Let µ be a probability measure on T_C. Given a Markov basis M that satisfies (3.3), generate a Markov chain in T_C by sampling l uniformly at random from {1, …, L} and applying the Markov Basis Metropolis-Hastings (MB-MH) or Gibbs (MB-Gibbs) proposals.

Figure 2: l1 error norm of E[T|y, T_{•+}] across table sizes dim(T)
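The chain of Proposition 3.1 can be sketched for a doubly constrained table using the basic ±1 moves. For simplicity the sketch targets the uniform distribution on T_C, so every feasible proposal is accepted (the Metropolis-Hastings ratio is 1); this uniform target is our assumption, not the paper's posterior.

```python
import random

def mb_mh_step(T, rng):
    """One Markov-basis Metropolis-Hastings step on a doubly constrained
    table, targeting the uniform distribution on T_C."""
    I, J = len(T), len(T[0])
    i1, i2 = rng.sample(range(I), 2)
    j1, j2 = rng.sample(range(J), 2)
    eta = rng.choice([1, -1])
    # Basic move: +eta on (i1,j1),(i2,j2) and -eta on (i1,j2),(i2,j1);
    # row and column margins are unchanged by construction.
    if (T[i1][j1] + eta < 0 or T[i2][j2] + eta < 0 or
            T[i1][j2] - eta < 0 or T[i2][j1] - eta < 0):
        return T  # infeasible proposal: stay put
    T[i1][j1] += eta; T[i2][j2] += eta
    T[i1][j2] -= eta; T[i2][j1] -= eta
    return T

rng = random.Random(0)
T = [[3, 1, 2], [0, 4, 1], [2, 2, 2]]
rows0 = [sum(r) for r in T]
cols0 = [sum(c) for c in zip(*T)]
for _ in range(1000):
    mb_mh_step(T, rng)
print([sum(r) for r in T] == rows0, [sum(c) for c in zip(*T)] == cols0)  # True True
```

Running the chain for any number of steps leaves both margins intact and keeps all entries non-negative, which is exactly the admissibility the proposition requires.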

Figure 5 :
Figure 5: Posterior predictions of E[x|y, C] against observed log employment data y using Algorithms 1 and 2. The high noise regime (right) achieves a more

Figure 6 :
Figure 6: Markov basis move on the space of I × J contingency tables with C T = {T •+ , T +• }.
Figure 3 is run on the same synthetic datasets with C_T = {T_{•+}} and C_T = {T_{•+}, T_{+•}}, using lines 7 and 7-10 of Algorithms 1 and 2, respectively. Each one of the 10^3 MCMC instances is run for

Figure 8 :
Figure 8: Convergence of the l1 norm of E[T|y, C] − E[Λ|y, C] for different constraint sets C in the low and high noise regimes.

Table Inference for Combinatorial Origin-Destination Choices in Agent-Based Population Synthesis (A PREPRINT)

A stochastic perturbation of (6) incorporates uncertainty in the competition dynamics emerging from the randomness of agents' choice mechanisms. This gives rise to the Harris-Wilson stochastic differential equation (SDE) for the time evolution of the log destination attraction x.

Definition 3.1. Consider an ordered collection of constraints C_T and table summary statistic operators S with associated functions S. A table T′ is C_T-admissible if and only if its summary statistics satisfy all the constraints in C_T, i.e. S_k(T′) meets the k-th constraint of C_T for each k.

statistics C_T. The set T_{C_k} contains all tables that satisfy the k-th constraint of C_T when S_k is applied over the cells X_k. In the rest of the paper we set C_Λ = {Λ_{++}} unless otherwise stated. Our goal is to sample from P(T, x, θ|C, y), where

distributions of T_{•+}|T_{++}, Λ and T_{++}|Λ, the induced conditional distribution becomes Fisher's non-central multivariate hypergeometric

doubly constrained tables, η is distributed according to Fisher's non-central hypergeometric distribution for 2 × 2 tables. The derivation of this result is provided in Corollary A.4.1 of Appendix A. The overhead of generating M for any constrained table is at most O(I²J²) in both time and space. This overhead can easily be overcome by amortising the construction of M prior to sampling. The sampling procedure for a constrained model with an intractable table marginal distribution and underlying SIM intensity model is summarised in Algorithm 2. The time complexity of proposing a move in T_C is O(1) for MB-MH and O(max_{s∈C} max(s)) for MB-Gibbs. The corresponding space complexities are both O(IJ).
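The O(I²J²) construction overhead is easy to make concrete: for a doubly constrained I × J table the basic degree-one moves can be enumerated explicitly, giving C(I,2) · C(J,2) distinct supports (with ±η supplying the two signs of each move). A minimal sketch, with our own move representation:

```python
from itertools import combinations

def markov_basis(I, J):
    """All basic moves for an I x J doubly constrained table, each a dict
    mapping cells to +1/-1; every move has zero row and column sums."""
    moves = []
    for i1, i2 in combinations(range(I), 2):
        for j1, j2 in combinations(range(J), 2):
            moves.append({(i1, j1): 1, (i2, j2): 1,
                          (i1, j2): -1, (i2, j1): -1})
    return moves

M = markov_basis(3, 3)
print(len(M))  # C(3,2) * C(3,2) = 9
# Every move preserves the row margins:
ok = all(sum(v for (i, j), v in m.items() if i == r) == 0
         for m in M for r in range(3))
print(ok)  # True
```

Precomputing this list once before sampling amortises the quadratic-in-each-dimension cost, after which proposing a move is a constant-time lookup.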
Caption fragments: l1 error norm of E[T|y, T_{•+}] across table sizes dim(T) and number of agents T_{++} using Algorithm 1 (convergence is slower); l1 error norm of E[T|y, C_T] for a 33 × 33 table with 5000 agents; Table 2 results for the low, high and variable noise regimes. The ground truth table is better covered by the discrete table posterior regardless of C, and the table distribution becomes increasingly concentrated around the ground truth table in light of more data C_T.


Table 2: Ground truth table validation metrics comparing our method against


T_{•+} with the desired parameters.

Lemma A.2. Let C_T = {T_{++}}. Then T|C, Λ is a multinomially distributed random variable with T_{++} trials and event probabilities Λ/Λ_{++}.

Proof. Apply Bayes' rule, notice that T fully determines any member of C_T, and leverage Remark A.1. It follows that the stated form holds.

Apply Bayes' rule, Remark A.1, Corollary A.1.1, and exploit the conditional independence between T_{•+} and T_{+•} given Λ. It follows that P(T|C, Λ) ∝ P(T|C_Λ, Λ), with T_{•+}, T_{+•}, T_{++} table margins and odds ratios equal to ω = Λ_{++}Λ/(Λ_{•+}Λ_{+•}).

Proof.


where ∂ log P(T|C, Λ) … 10^4 steps using both MB-MH and MB-Gibbs to compare convergence rates. The ground truth table average g(Λ) is computed using the moments of the product multinomial distribution with known probabilities. In doubly constrained models, this ground truth constitutes an approximation, introduced in Appendix B, since the moments of Fisher's non-central hypergeometric distribution cannot be computed in closed form.

D.2 Real-world application to Cambridge, UK

The Cambridge home-to-work trip dataset contains 69 origins and 13 destinations at lower super output area (LSOA) and middle super output area (MSOA) spatial resolutions, respectively. The total number of agents whose work trips are recorded is 33,704. Job availability data from (Office for National Statistics, 2015; Office for National Statistics, 2014) is leveraged as the observed log destination attraction y. A cost matrix of travel deterrence is computed as follows.
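Returning to the ground truth table average g(Λ) above: under the product multinomial with fixed row margins, each row is multinomial with its own number of trials and row-normalised intensities, so E[T_ij] = T_{i+} Λ_ij / Λ_{i+}. A minimal sketch of this closed-form mean (the function name and the shapes are our assumptions, following the singly constrained case):

```python
def product_multinomial_mean(row_totals, lam):
    """Closed-form cell means under the product multinomial:
    E[T_ij] = T_{i+} * Lambda_ij / Lambda_{i+}."""
    means = []
    for Ti, row in zip(row_totals, lam):
        s = sum(row)  # Lambda_{i+}
        means.append([Ti * l / s for l in row])
    return means

lam = [[2.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
m = product_multinomial_mean([8, 4], lam)
print(m)  # [[4.0, 2.0, 2.0], [1.0, 1.0, 2.0]]
```

Each row of the mean table sums exactly to that row's fixed margin, which is the consistency check the synthetic experiments rely on; for doubly constrained models this serves only as the approximation discussed in Appendix B.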

