Probabilistic Neural Computing with Stochastic Devices

The brain has proven to be a powerful inspiration for the development of computing architectures in which processing is tightly integrated with memory, communication is event‐driven, and analog computation can be performed at scale. These neuromorphic systems increasingly show an ability to improve the efficiency and speed of scientific computing and artificial intelligence applications. Herein, it is proposed that the brain's ubiquitous stochasticity represents an additional source of inspiration for expanding the reach of neuromorphic computing to probabilistic applications. To date, many efforts exploring probabilistic computing have focused primarily on one scale of the microelectronics stack, such as implementing probabilistic algorithms on deterministic hardware or developing probabilistic devices and circuits with the expectation that they will be leveraged by eventual probabilistic architectures. A co‐design vision is described by which large numbers of devices, such as magnetic tunnel junctions and tunnel diodes, can be operated in a stochastic regime and incorporated into a scalable neuromorphic architecture that can impact a number of probabilistic computing applications, such as Monte Carlo simulations and Bayesian neural networks. Finally, a framework is presented to categorize increasingly advanced hardware‐based probabilistic computing technologies.


Introduction
The world is uncertain but, as a general rule, our computers are not. For decades, we have used explicit programming and built deterministic computers for a variety of purposes, such as automating mundane tasks and solving complex scientific problems that demand effective optimization across multiple scales. Nevertheless, there is a growing appreciation that not only is it increasingly expensive to enforce deterministic behavior in conventional microelectronics and computing technologies, but that it may be unnecessary to do so for applications in which incorporating stochastic behavior could prove to be beneficial. Accordingly, in recent years, more inherently probabilistic approaches to computing have begun to receive increased attention as an alternative to deterministic computing. [5] Indeed, many complex computational problems, such as modeling nuclear and high-energy physics events, understanding complex biological systems, simulating more precise climate models, optimization, and implementing more effective AI, require simulating probabilistic behaviors on existing deterministic hardware. We consider probabilistic computing as any computing process that calculates or approximates solutions to a model or task (or distributions of solutions) through random sampling or probabilistic manipulation. Probabilistic approaches are widely used when a problem is best modeled as a stochastic system, such as in quantum mechanics, but can also be used in lieu of complex deterministic models by sampling a different, ideally simpler, model. The software use of probabilistic methods on deterministic hardware has long been a major emphasis of the numerical methods community, and while there remain many open questions in this field (such as how to leverage the extreme parallelism of exascale systems [6] and whether ML can act as a surrogate for such tasks [7,8]), these are largely outside our scope here.
Rather, herein, we consider the implications of future hardware-based technologies for sampling applications, and thus our paper specifically focuses on those numerical methods for probabilistic computing that typically rely on repeatedly sampling application-relevant probabilistic and statistical distributions. In sampling tasks, the computational burden often falls squarely on the speed and efficiency of random number generators (RNGs) and their subsequent transformations. As we will discuss, it is an open question whether sampling provided by stochastic devices can be effectively used to produce suitable random numbers for numerical computing applications, [9] and it is also unknown how stochasticity can be leveraged in neuromorphic architectures. [10] At the same time, the availability of hardware that makes probabilistic computing more efficient creates an opportunity for these techniques to extend to application areas that have not traditionally been thought of as probabilistic in nature. [5] In today's computing, random numbers are generally produced using pseudo-random number generators (PRNGs). PRNGs are deterministic algorithms that produce a sequence of bits from an initial value (the "seed"), which both conform to the distribution of interest and arrive in sufficiently random order. Statistical measures that compare differences in distribution, like entropy, and rigorous randomness tests like those in the NIST package [11] provide the means of testing PRNGs. Algorithms satisfying these types of tests can be efficiently computed on hardware that is already optimized for serial arithmetic. Although the statistical implications of this determinism require care in the development of complex applications to ensure validity (with some famous failures, such as the RANDU generator [12]), PRNGs are used both due to their ease of generation and their utility in the verification of codes, whereby a set seed will produce repeatable behavior.
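As a concrete illustration of this determinism, the sketch below implements a minimal linear congruential generator in Python (the constants are the widely used "Numerical Recipes" parameters, chosen purely for illustration; such a generator is far too weak for cryptography): the same seed always reproduces the same sequence.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Minimal linear congruential generator: x_{k+1} = (a*x_k + c) mod m."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)  # map to a uniform value in [0, 1)
    return out

# The same seed reproduces the same "random" sequence, which is useful for
# verification of codes but is fundamentally deterministic.
assert lcg(seed=42, n=5) == lcg(seed=42, n=5)
print(lcg(seed=42, n=5))
```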
Despite their widespread use, there are limitations of PRNGs that make a hardware alternative, or a "true" random number generator (tRNG), appealing (Figure 1). First, applications that have stringent demands on the quality of random numbers, such as cryptography, often push the limits of today's PRNGs. Second, the serial operation of PRNGs introduces complexities in highly parallel architectures, which may need to generate a high quantity of random numbers in parallel. Finally, PRNGs typically produce random numbers from a uniform distribution, requiring additional computation to convert a sample to the type of random distribution required. To date, most tRNGs have focused primarily on this quality consideration, with tRNG circuits that are highly effective for cryptography applications but may not scale to large-scale numerical tasks. Herein, we consider that for broad computing applications quality, quantity, and type will all be important: would it be possible to generate a large number of the right type of true random numbers efficiently?
One example of a system that leverages probabilistic computing at large scale is the human brain, a complex system with 10^15 synaptic connections between 10^11 neural cells. The release of neurotransmitters at synapses is a probabilistic process, on the order of one release of neurotransmitter per second per synapse. [13][14][15] Despite its ubiquity, the brain's stochasticity remains an underexplored area of neuroscience. What is known is that the brain's stochasticity is tightly regulated within each region's specific neuron populations, and there is a growing appreciation of the computational implications of this widespread stochasticity. [16,17] Furthermore, the brain's apparent randomness is not limited to the synapse scale, but appears at other spatial scales as well, such as the reconfiguration of neural circuit architectures over time, [18] and probabilistic models are effective at explaining observations of large-scale recordings of neural populations. [19]

Figure 1. The quality, quantity, and types of random numbers produced are all important. a) The quality of a random number often has strict interpretations from information theory, such as whether it is possible to predict by knowing earlier random numbers in a sequence. b) The quantity of random numbers is determined not only by the speed by which a random number can be generated, but also the ability to effectively generate statistically distinct random numbers in parallel. c) The type of random number is an often overlooked feature of RNGs: while most RNGs produce uniformly distributed random numbers, applications often require numbers from complex distributions, requiring considerable computational resources to transform a uniform number to the appropriate distribution.

A Co-Design Vision for Probabilistic Computing
If we consider the brain's degree of randomness as a notional goal for a probabilistic computing system, it is worth noting how far today's deterministic microelectronics are from achieving that magnitude. Using today's conventional systems, the generation of 10^15 random numbers per second (RN s^-1) would require ≈1000 CPUs and 150 kW using software-based PRNGs. [20] Circuit-based tRNGs, such as ring oscillators, may improve energy efficiency, but would require over 100 000 circuits [21] and leave unsolved the communication of outputs to the computational logic. Upon recognizing that existing microelectronics approaches fail to deliver the necessary capabilities in probabilistic computing, while the brain provides widespread stochasticity tightly integrated into its computations, we present a new philosophy for embracing probabilistic computing. We start with the premise that a computational system with a brain-like stochastic capability of producing 10^15 RN s^-1 represents a fundamentally new computational opportunity. To accomplish this goal of ubiquitous stochasticity, we first recognize that there are several implications that must be addressed. First, achieving the targeted scale of tRNGs will require adapting our devices and circuits to the physics of materials, rather than the other way around. The continued scaling of transistors has made it possible to meet the high resource requirements of useful contemporary computations; for stochastic computing, we cannot assume that a similar scaling opportunity will exist. Meeting this challenge requires consideration of novel device types and materials, such that useful random number generation can be accomplished by a handful of nanoscale devices with a size and power footprint comparable to modern transistors. This tailoring of our devices and circuits to leverage non-trivial behaviors at the physics and materials scales will enable us to achieve dramatic efficiency gains.
Second, we must transform device-level randomness into useful statistical samples without resorting to time-consuming calculations. Meeting this challenge will require multiscale co-design for the algorithms to leverage the underlying physics of the devices. Further, by leveraging the stochasticity of individual devices, our resource will likely produce rather simple stochastic variables, such as a Bernoulli random variable ("1" with probability p, "0" otherwise). We refer to these devices as "coinflip" devices, and we will build up complexity from there (a minimal software sketch of this coinflip primitive appears below, after the remaining implications).
Third, there is the question of how we would use these random numbers and integrate them into numerical computations. Producing a billion random numbers is of little value if we are simply going to use them serially in a conventional von Neumann manner. Rather, we must ask what it entails to leverage an extremely large number of stochastic sources in parallel. Here, we recognize that neuromorphic architectures provide a path toward using stochastic resources in parallel, as well as a framework in which to consider novel materials and devices.
Finally, there is the question of how to build and program such a probabilistic computer. This is not simply an architectural question, but also a device and circuits question, and one that we propose will rely on increasingly sophisticated AI design tools in the future.
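To make the Bernoulli "coinflip" primitive from the second implication above concrete, the following toy Python model (the class name and interface are ours, for exposition only) emulates a tunable coinflip device in software; the later examples in this perspective can be read as building on exactly this primitive.

```python
import random

class CoinflipDevice:
    """Toy software model of a tunable Bernoulli ('coinflip') stochastic device."""
    def __init__(self, p=0.5):
        self.p = p  # probability of reading out a "1" (heads)

    def flip(self):
        return 1 if random.random() < self.p else 0

# A weighted coin: roughly 30% heads over many flips.
coin = CoinflipDevice(p=0.3)
samples = [coin.flip() for _ in range(10000)]
print(sum(samples) / len(samples))  # close to 0.3
```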
This perspective presents a neuromorphic strategy for a probabilistic computer that addresses these implications. This neuromorphic strategy arises from more than just the original brain inspiration for 10^15 RN s^-1. Neuromorphic computing has been shown to provide efficiencies by leveraging the benefits of both analog and digital computation, learning, processing-in-memory, event-driven communication, a high degree of parallelization, and a natural ability to program neurons to represent graphs. [22,23] As we will explore, these advantages of neuromorphic approaches are well positioned to make ubiquitous stochasticity a reality.

How Ubiquitous Stochasticity Can Impact Current Applications of Probabilistic Computing
To explore the value of ubiquitous stochasticity, it is useful to consider how more efficiently generated random numbers can impact applications for which random number generation is perceived as a limitation. While the value of probabilistic computing may well extend far beyond the applications discussed here, [24] as tasks that already rely on simulated probabilistic computing, we believe these are the applications most likely to drive the development of this technology.
It is important to note that while random numbers are widespread in computation, the relative requirements of random numbers for different applications vary considerably (Figure 1). For cryptographic uses, the quality of a random number is paramount: the value of any encryption method is limited by the ability of a random number source to uniformly sample, without bias, a source distribution. While quality is related to precision, the two are not one and the same; while a PRNG may produce a 128-bit random number, any biases can greatly limit the effective precision realized by the encryption.
The other major application of random numbers is in numerical sampling, which is the primary emphasis of our approach. Here, we consider two distinct cases: conventional modeling and simulation (Mod-Sim) and sampling AI algorithms, although sampling for stochastic optimization and randomized algorithms is a related and promising area of research. [24] While mathematically these two applications are related, in practice it is increasingly appreciated that many AI algorithms, such as neural networks, can be quite tolerant of low-precision calculation. [25] Offsetting this ability of neural networks to perform effectively in lower-precision regimes is their large size: neural networks often contain millions or billions of parameters that correspondingly increase the required volume of random numbers that would need to be generated for effective sampling. In contrast, for conventional Mod-Sim, random numbers are typically generated as part of well-defined numerical codes for Monte Carlo simulations of complex physics (see Appendix 1), with the complexity and quality of the random numbers commensurate with the overall task.
The availability of ubiquitous device-level stochasticity provides the potential to leverage tRNGs at high throughput for sampling applications. However, to accomplish this efficiently, it is important to consider what advantages can be achieved by producing random numbers in the right format where they are needed for the computation, as opposed to simply generating a stream of uniformly random bits that have to be converted to the desired format and subsequently delivered to where they are needed. To explore this, we will consider specific cases from AI and Mod-Sim applications, while acknowledging that every application will benefit from some degree of specialization.

In Situ Sampling of Neural Networks
For artificial neural networks (ANNs), there are two particular applications of random numbers that fall under the Bayesian neural networks umbrella (which simply encapsulates ANN approaches that are designed and interpreted through a Bayesian statistics perspective). Arguably, the most widespread form of ANN sampling today is in generative neural networks, whereby, by sampling neuron activities in a particular layer (such as the innermost layer of a variational autoencoder, or VAE [26]), the network can be used to produce a range of outputs. In generative networks, these often take the form of representative samples for a class, samples that were not part of the training data but are illustrative of the datasets used. [27] Less common are methods for sampling the parameterization of neural networks themselves. Such sampling is expensive, in large part because most ANNs push computational limits and Monte Carlo approaches require many samples. For Bayesian analysis, it is important to quantify the sensitivity of a model to its parameterization, allowing the determination of confidence in outputs; however, the extremely large number of trained variables in neural networks makes this challenging. To date, most techniques for sampling ANNs have focused on repurposing training regularization techniques, such as neuron dropout, for sampling a network during inference, [28] though because synaptic weights between neurons are the learned parameters there is growing interest in sampling them as well. [29,30]

Perhaps the most straightforward application of ubiquitous stochastic devices may be in sampling ANNs, as they are already increasingly recognized as well suited for processing-in-memory approaches. Sampling ANNs can be achieved by either inserting noise in the parameters of a trained network (i.e., the weights) or on the activations of neurons. Either sampling approach will require a very high number of parallel random numbers, but it is likely that the numbers can be relatively simple. For instance, dropout sampling can be implemented by a simple Bernoulli variable ("1" or "0" with some probability), either gating the use of that weight in the sample or by reflecting the weight as a probability. Similarly, neuron activations can be uniformly sampled using a dropout-like approach [28] or can include a probabilistic component. [31] Additionally, Bernoulli sampling will have the effect of increasing the sparsity of communication within the ANN, which is increasingly appreciated as important for gaining efficiency in sampling.
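A minimal sketch of this idea, assuming a toy two-layer network with stand-in weights (all names and sizes below are illustrative, not a specific published model): keeping Bernoulli dropout active at inference time and repeating the forward pass yields a distribution of predictions whose spread serves as a crude confidence estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" two-layer classifier; the weights are arbitrary stand-ins.
W1 = rng.normal(size=(16, 4)); b1 = np.zeros(16)
W2 = rng.normal(size=(3, 16)); b2 = np.zeros(3)

def stochastic_forward(x, keep_prob=0.8):
    """One forward pass with Bernoulli dropout on the hidden-layer activations,
    kept on at inference time so that repeated passes sample the network."""
    h = np.maximum(0.0, W1 @ x + b1)             # ReLU hidden layer
    mask = rng.random(h.shape) < keep_prob       # one coinflip per neuron
    h = h * mask / keep_prob                     # rescale to preserve the mean
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                           # softmax class probabilities

x = np.array([0.2, -1.0, 0.5, 0.3])
samples = np.stack([stochastic_forward(x) for _ in range(500)])
print("mean prediction:", samples.mean(axis=0))
print("per-class std  :", samples.std(axis=0))   # a crude uncertainty estimate
```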
In these cases, neuromorphic strategies for more efficient ANNs through processing-in-memory and event-driven spiking communication are ideally suited to incorporate the ubiquitous stochastic sampling capability we propose. By incorporating stochastic devices within the processing elements of an ANN, a system can gain the computational advantages of sampling without the high additional cost of communicating random numbers throughout the circuit. Furthermore, while in situ training of devices in neuromorphic architectures for ANN applications remains an area of active research, incorporating stochastic devices into the training itself would provide a means to program the stochasticity of the network. [32] Finally, an ANN can learn to account for any constraints of the stochastic devices while it is being trained for its application.

In Situ Generation of Random Numbers for Sampling
Despite the requirement for extensive sampling, Monte Carlo techniques remain the go-to solution for a number of computational physics and related applications, particularly when high-dimensional integrals are required. While there are cases for simple uniform random numbers, in general, most of the more computationally demanding sampling tasks require the generation of random numbers taken from particular physics-derived probability distributions (as illustrated by the probability density function (PDF) in Figure 2b). Monte Carlo simulations draw sample trajectories of a stochastic process defined on an underlying space, requiring many samples to estimate a solution on average. Very often the actual math of the stochastic process itself is not overly burdensome, but the generation of suitable random numbers, both in terms of volume and form, becomes a bottleneck. Monte Carlo simulation of stochastic processes was indeed one of the original motivations for the first computing systems, [33,34] and there exist several well-understood, though still computationally expensive and difficult-to-tune, sampling techniques, such as the Metropolis-Hastings and Gibbs sampling algorithms, used to convert uniform random numbers (which are relatively straightforward to generate) into a random sample from the desired PDF. [35,36] Simplifying somewhat, these methods often include a technique known as rejection sampling, which performs successive random draws to determine whether a proposed random number should be accepted as a sample from the desired distribution. Depending on the desired distribution, such approaches can become quite computationally expensive, and considerable care must be taken to avoid bias.
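For illustration, a minimal rejection sampler for a one-dimensional target density on [0, 1] (the target function and bound m below are arbitrary examples) shows why such conversion can be costly: every rejected proposal consumes random numbers without producing a sample.

```python
import math, random

def target_pdf(x):
    """An example unnormalized target density on [0, 1] (a bump near x = 0.7)."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)

def rejection_sample(pdf, m, n_samples):
    """Draw samples by proposing uniformly and accepting with probability pdf(x)/m.
    m must upper-bound the pdf on [0, 1]; efficiency drops as m exceeds the pdf."""
    out = []
    while len(out) < n_samples:
        x = random.random()              # uniform proposal
        if random.random() * m < pdf(x):
            out.append(x)                # accept
        # otherwise reject and try again; each attempt consumes random numbers
    return out

samples = rejection_sample(target_pdf, m=1.0, n_samples=1000)
print(sum(samples) / len(samples))       # close to 0.7
```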
Given the importance of sampling the right type of random number, our challenge is to identify a strategy by which to generate and use ubiquitous random numbers to produce samples from the requisite distributions. In Figure 2, we show a few techniques for converting Bernoulli samples to desired distributions. In each of these cases, we will consider that our goal is to produce a binary random number that represents some value corresponding to a range within the support of a PDF, and the probability of observing that binary number is equivalent to the integral of the PDF over the bin limits. We refer to the integral values across all bins as the discretized PDF.
Figure 2. Illustration of how coinflip devices can be used to sample a random number, x, from an arbitrary probability distribution. We illustrate a case where "x" is a random integer between 0 and 7. a) Envisioned coinflip devices will have two outputs, "heads" or "tails", which equate to different electrical properties, such as resistance, and will exist in one or the other state with some probability (illustrated by the pie chart). b) In sampling applications, particularly in simulation, it is often necessary to draw a number from a complex probability density function, or PDF. Conventionally, this is achieved by sampling a uniform random number and analytically or numerically converting that number to the desired distribution. c) One approach is to use a single coin that is repeatedly flipped with different probabilities to simulate the PDF. [1] d) With many weighted coins, it is possible to directly sample a desired random number by representing the PDF as a series of weighted binary decisions, or coinflips. e) Finally, it is possible to treat the stochastic coinflips as a resource that is converted to a desired random number using a neural network or similar transformation.

As one extreme, we show how a single coin that we can dynamically tune can approximate a distribution. In this case, the coin continues to be flipped, each time with a probability reflecting the residual probability of belonging to the next bin of the PDF. [1] The algorithm stops once the coin lands on heads, and the number of flips taken indicates the bin of the PDF to which the random number belongs. Naive sampling using such an approach would be computationally inefficient (in the worst case requiring a number of coinflips that is exponential in the precision of the random number), but it illustrates that a single coin with tunable probabilities can be used to approximate any distribution.
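A software sketch of this single-coin scheme (our own minimal rendering of the approach in ref. [1], with an arbitrary example PDF): the coin's bias is retuned before every flip so that stopping at bin i occurs with probability exactly p_i.

```python
import random

def sample_with_one_tunable_coin(discretized_pdf):
    """Sample a bin index using one coin whose bias is retuned before each flip.
    At bin i the heads probability is p_i divided by the remaining mass, so the
    walk stops at bin i with probability exactly p_i."""
    remaining = 1.0
    for i, p in enumerate(discretized_pdf):
        heads_prob = p / remaining if remaining > 0 else 1.0
        if random.random() < heads_prob:     # the coinflip
            return i
        remaining -= p
    return len(discretized_pdf) - 1          # guard against floating-point leftovers

pdf = [0.1, 0.2, 0.4, 0.2, 0.1]              # an example discretized PDF over 5 bins
counts = [0] * len(pdf)
for _ in range(100000):
    counts[sample_with_one_tunable_coin(pdf)] += 1
print([c / 100000 for c in counts])          # approximately [0.1, 0.2, 0.4, 0.2, 0.1]
```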
At the other extreme, we consider a brute-force direct sampling of the random number. If one considers the full discretized PDF, one can produce a probability tree by working backward from the probabilities of landing in any bin relative to its neighbors (the lower leaves of the tree) and computing what each Bernoulli coinflip probability should be. Such a naïve expansion is clearly inefficient and impractical at higher precision, but it provides an immediate illustration that random numbers can be produced directly and quickly if one can produce a large number of weighted coinflips in parallel. Furthermore, this approach also lends itself to optimization for smooth real-world probability distributions: dependent probabilities between branches present an opportunity to take advantage of device- or circuit-generated correlations, and repeated structure in the tree offers an opportunity to greatly reduce the number of required Bernoulli devices.
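A minimal sketch of the probability-tree idea, assuming the discretized PDF has a power-of-two number of bins (the construction below is ours, for illustration): an 8-bin distribution is then sampled with exactly three weighted coinflips, one per tree level, all of which could in principle be drawn in parallel.

```python
import random

def tree_probabilities(pdf):
    """Build the per-level heads probabilities of a binary probability tree.
    Leaves hold the discretized PDF; each internal node stores the probability
    of branching toward the upper half of the bins beneath it."""
    levels, masses = [], list(pdf)
    while len(masses) > 1:
        probs, merged = [], []
        for i in range(0, len(masses), 2):
            lo, hi = masses[i], masses[i + 1]
            probs.append(hi / (lo + hi) if (lo + hi) > 0 else 0.5)
            merged.append(lo + hi)
        levels.append(probs)
        masses = merged
    return levels[::-1]                       # root level first

def tree_sample(levels):
    """Descend the tree, spending one weighted coinflip per level."""
    index = 0
    for probs in levels:
        bit = 1 if random.random() < probs[index] else 0
        index = 2 * index + bit
    return index

pdf = [0.05, 0.05, 0.10, 0.20, 0.25, 0.20, 0.10, 0.05]   # an 8-bin example PDF
levels = tree_probabilities(pdf)              # 3 levels -> 3 coinflips per sample
counts = [0] * len(pdf)
for _ in range(100000):
    counts[tree_sample(levels)] += 1
print([round(c / 100000, 3) for c in counts]) # approximately the pdf above
```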
A third approach, which to date is not well explored, is to leverage the function-approximation abilities of neural networks to perform the desired sample transformation. There are several possible approaches to this. One is to approximate the mathematical inverse of a distribution's cumulative distribution function (CDF). If f is a CDF and x is a value from its distribution, then y = f(x) is the probability of drawing a sample from the distribution less than or equal to x. Hence, if y is a uniform random number and if f^-1 exists, then x = f^-1(y) is a sample from the desired distribution. A neural network could learn a direct transformation of a uniform random number to the desired distribution, or a network could learn how to best convert a set of randomly tuned and variable devices to achieve the necessary sampling. This latter approach would be of particular utility in leveraging perhaps otherwise non-ideal device-to-device variability to achieve sampling more effectively.
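For reference, the closed-form version of this inverse-CDF idea for an exponential distribution is sketched below; a learned network would play the role of f^-1 when no such closed form exists.

```python
import math, random

def sample_exponential(rate):
    """Inverse-CDF sampling: for F(x) = 1 - exp(-rate*x), F^-1(u) = -ln(1-u)/rate."""
    u = random.random()                 # uniform sample on [0, 1)
    return -math.log(1.0 - u) / rate

samples = [sample_exponential(rate=2.0) for _ in range(100000)]
print(sum(samples) / len(samples))      # close to the theoretical mean 1/rate = 0.5
```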
Importantly, as with the AI application, each of these methods for producing desired random numbers stands to benefit from having many random sources working in parallel. Furthermore, as we will illustrate next, these approaches can be realized more efficiently by having a neural circuit integrate over multiple coinflips to produce the desired outputs.

Linking Probabilistic Computing to Neural Architectures
Circuits and architectures serve as a necessary intermediary between the hardware/devices and the theory/algorithms; however, this area of research is largely underserved because circuits and architectures, by their nature, cannot be readily altered in isolation. At present, arithmetic logic circuits and processing unit architectures have long been established within a deterministic framework, and they are unlikely to be altered without radical changes first occurring on the hardware and theoretical fronts. Moreover, most current algorithms, particularly those used in AI, have been optimized for use in what are primarily deterministic architectures, [37] with PRNGs used to inject artificial stochasticity at the application level.
For example, Figure 3 shows that, although the programmatic advantages of using PRNGs are considerable, the benefit of specialized parallel architectures for probabilistic algorithms will likely always be limited if they have to rely on an embedded PRNG, since most PRNGs are ultimately software generated. In other words, the "von Neumann bottleneck" between processing and memory (which limits the efficiency of software) is also a random number bottleneck. Just as simply using a faster tRNG in lieu of a PRNG will have a limited upside because the overall computation will still be serial, maintaining a reliance on PRNGs in an otherwise parallel architecture will simply make the generation of random numbers a bottleneck. Thus, like the algorithmic motivations above, from an architectural perspective, it is likely critical that any ubiquitous source of randomness be tightly integrated with processing. For this reason, we will explore here the development of a stochastic processing-in-memory architecture. Figure 3d schematically shows one potential approach in which stochastic device tRNGs can be integrated at each intersection of a crossbar/crosspoint in-memory neural architecture. Similar layouts are used in analog neuromorphic processors for neural network inference and training, [38][39][40][41] and multivariate analyses indicate significant advantages in latency and energy consumption compared to conventional digital processors. [42]

Figure 3. In a conventional von Neumann system, a PRNG uses a numerical function and state stored in memory to draw the next pseudo-random number. A von Neumann tRNG (top right) would be accessed by the processor as any other specialized logic element, which can provide acceleration but within the context of the existing memory instruction bottleneck. For in-memory computing paradigms, such as neuromorphic, PRNGs (bottom left) still have to be accessed in a manner similar to von Neumann systems, wherein each individual processing element would still have to use conventional logic to update its PRNG state, in effect creating a random number bottleneck. In contrast, in-memory computing would allow tRNGs to be closely coupled to the processing and memory (bottom right), allowing random numbers to benefit from the same efficiency gains from co-locating processing and memory.
In the envisioned probabilistic neuromorphic paradigm shown here, the crossbar architecture ensures that stochastic devices are available at each synaptic connection between neurons, whereby the outputs of a number of synapses are integrated into each neuron's processing. Ultimately, each synapse may be a single stochastic device tuned to a particular probability, or it may consist of a small circuit that models the synapse. There are challenges with using crossbar architectures for computation, the most impactful of which is crosstalk between information-carrying lines. [43] By contrast, stochastic devices will generally not have a built-in memory function, and a careful choice of a three-terminal device to implement, for example, the local storage of weights obviates the crosstalk issue. However, implementation of a local form of memory significantly increases the complexity of the hardware and presents challenges for scaling up to relevant problems, following similar arguments for the incorporation of selector devices with analog crossbar architectures. [44,45] While Figure 3d illustrates a case where stochastic devices are placed at the intersections of a crossbar architecture, there are many other potential strategies to integrate stochasticity with neuromorphic processing. Neuromorphic architectures must account for at least two computing elements: neurons, which conceptually operate in parallel and can carry state forward in time through potentially sophisticated dynamics, and synapses, which are far more numerous while typically simpler in their calculations. Depending on the application, if the control of the stochasticity is of particular importance, it may be preferable to place the stochastic components within the neuron circuits as opposed to the synapses. This neuron-level stochasticity has already been shown to be useful in several regimes, such as learning probabilistic neural networks for simple arithmetic, [46] integer factorization, [47] and restricted Boltzmann machines, wherein neuron activity is modeled as stochastic. [48,49] Likewise, neuron-level stochasticity, though using PRNGs as in Figure 3c, is what is available on today's large-scale spiking neuromorphic platforms and has been shown to be useful for numerical sampling applications on platforms including Intel's Loihi, IBM TrueNorth, and SpiNNaker. [50][51][52] Nevertheless, the ability to effectively deploy stochasticity at the synaptic memories themselves (which the brain does), as opposed to just the neurons, will likely provide a more powerful probabilistic computing resource. Recently, a stochastic neural network was implemented with a crossbar array architecture with ferroelectric field-effect transistor synapse weights connected to Ag/HfO2 conducting bridge memory selector devices. [53] The stochastic nature of Ag filament formation/rupture in the selector device renders each synapse subject to Bernoulli sampling, thus setting a random selection of synapses to zero during operation. This in effect produces confidence intervals around neural network classification predictions, a task that is difficult to accomplish with conventional deterministic hardware. This brings us to the next topic, which is identifying circuits that can be effective for controlling stochastic devices.
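A toy software model of such a stochastic crossbar (the array sizes, keep probability, and function names below are ours, purely illustrative): gating each cross-point weight with its own Bernoulli "device" and repeating the matrix-vector product yields a spread of outputs, i.e., a confidence band, rather than a single deterministic value.

```python
import numpy as np

rng = np.random.default_rng(1)

weights = rng.normal(size=(8, 4))          # stand-in for programmed crossbar conductances
keep_prob = 0.9                            # per-synapse coinflip bias

def stochastic_crossbar(x, n_passes=100):
    """Each pass draws an independent Bernoulli mask per cross-point, emulating
    stochastic synapse devices; the spread across passes gives a confidence band."""
    outs = []
    for _ in range(n_passes):
        mask = rng.random(weights.shape) < keep_prob
        outs.append((weights * mask) @ x)  # masked matrix-vector product
    outs = np.stack(outs)
    return outs.mean(axis=0), outs.std(axis=0)

x = np.array([1.0, 0.5, -0.3, 0.2])
mean, std = stochastic_crossbar(x)
print("mean output:", mean)
print("spread     :", std)                 # per-output uncertainty estimate
```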

Identifying Circuits for Scalable Probabilistic Computing
The principal challenge at the circuit level is to identify potential mechanisms for mapping the inherent stochasticity of our devices to the required probability distributions for our algorithms. The simplest approach would be to configure the stochastic devices into well-established probabilistic logic elements and combine those to implement more complex functions. [54,55] There are various examples aiming to leverage stochasticity for computation. Stochastic computing (SC) was introduced in the 1960s as an alternative to digital binary computing. [56,57] SC represents numbers as bit-streams that are processed by digital logic circuits, with each number interpreted as the probability of a "1" (or equivalently a "0") appearing in the stream. [54] Despite the error tolerance and gains shown in low-cost computation (e.g., multiplication with a single AND gate), SC was deemed impractical due to long computational times and low accuracy. However, with increasing uncertainty in modern technology, there is an increasing need to better understand ways to exploit probability in computation. Therefore, alternative computing techniques to leverage stochasticity have been considered, such as using stochastic devices to build stochastic logic gates. For example, Maciel et al. demonstrated non-volatile logic gates leveraging magnetic tunnel junctions (MTJs). [58] However, while this approach would reproduce the advantages of deterministic compositional digital circuits, it would not leverage the RNGs across scales, because it would be unlikely to fully capture the unique physics provided to us by the stochastic devices, and it would only provide a least-common-denominator contribution to our probabilistic computing algorithms. Recently, Dutta et al. introduced the concept of probability bits (p-bits) as well as binary stochastic neurons. [5] The argument is to leverage p-bits to build p-circuits (probability circuits) that can address applications associated with quantum circuits. Invertible circuits and analog circuits are other avenues to leverage the device physics and stochasticity of MTJ and tunnel diode (TD) devices.
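The AND-gate multiplication mentioned above is easy to sketch, assuming independent bit-streams encoding values in [0, 1] (the stream length and values below are arbitrary): the fraction of 1s in the AND of two streams approximates the product of the encoded values, at the cost of sampling noise and long streams.

```python
import random

def to_bitstream(value, length):
    """Encode a value in [0, 1] as a random bit-stream with P(bit = 1) = value."""
    return [1 if random.random() < value else 0 for _ in range(length)]

def sc_multiply(a, b, length=10000):
    """Stochastic-computing multiplication: a single AND gate per bit pair."""
    sa, sb = to_bitstream(a, length), to_bitstream(b, length)
    product_stream = [x & y for x, y in zip(sa, sb)]
    return sum(product_stream) / length   # decode: fraction of 1s in the stream

print(sc_multiply(0.6, 0.5))              # approximately 0.30, with sampling noise
```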
If we specifically consider the scale of stochasticity needed in our driving sampling applications, it is clear that a processing-in-memory architecture, such as those explored within neuromorphic computing, is necessary for leveraging ubiquitous stochasticity. Less clear, though, is what the circuits for integrating stochasticity with computation should look like. There are several degrees of freedom that must be explored in this respect. The first is how the stochastic devices should be controlled. One option is that the circuits themselves regulate the stochastic behavior, something which has proven to be effective for the logical functions in p-bits and is potentially useful in algorithms such as the Gryzska approach to sampling random numbers. [1] Such dynamic control of a device has strong similarities to analog computation and confers many of the same benefits and challenges.
An alternative approach would be to treat the stochasticity of devices as a fixed resource and surround the devices with conventional logic that uses that stochasticity in the desired manner. Such circuits would likely be easier to construct and would place fewer demands on the device designs, but they may not confer the same degree of scalability and efficiency that a p-bit-like approach of dynamically using the stochastic devices would.
Regardless, by emphasizing the specific sampling applications above and recognizing the value of generating novel task-specific circuits, we can be direct about how we will map the specific behavior of stochastic devices to the necessary task. This introduces a new opportunity for designing novel circuits capable of leveraging the physics inherent in our devices to provide unique computational elements for our algorithmic requirements (e.g., samples from the desired complex distribution). The resultant architecture will depend on the application's needs as well as an understanding of the integration requirements and capabilities.

Nanoscale Coinflip Devices
In contrast to the daunting size and power challenge of generating 10^15 RN s^-1 using current approaches, if a transistor in a modern microprocessor (0.1 fJ, (100 nm)^2) could somehow be cajoled into sampling statistical distributions at 1 GHz instead of operating like a switch, our 10^15 random samples could be produced by a million devices with a footprint of ≈0.1 mm^2, drawing ≈100 mW of power. [59] What do we need the nanoscale device to do? By analogy to their digital and analog brethren, stochastic devices can be thought of as having an input signal that probes or influences some form of underlying stochasticity, and producing an output signal (Figure 4). For a survey of potential stochastic devices, see Appendix 2.
To be generally useful, the underlying stochasticity of the device must produce an intermediate representation that can be transformed into arbitrary statistical distributions efficiently. A purely analog device, one that takes an analog input to set an initial state, evolves stochastically, and has its analog end-state measured, may be extremely efficient where there is a close match between the underlying physics and the statistics of the problem being solved. In our view, this is unlikely to be sufficiently flexible to be generally useful for probabilistic computing, because it would require identifying a strategy to modify the underlying physics of a stochastic analog device to match an arbitrary statistical distribution required by a problem. This seems daunting, particularly at large scales and arbitrary precision; however, it is a potential future direction for techniques such as neural networks, for which training strategies may be able to take the physics of devices into account. [60] More practical is to use one of two other intermediate representations that can be used to sample arbitrary distributions. One transforms a uniformly distributed random number into other distributions and would benefit from a device that produces an analog output (Figure 4c). A second transforms a sequence of random bits to sample distributions, such as illustrated in Figure 2, and would benefit from a device with a digital output, which we term a coinflips device.
The physics of the underlying devices rarely produces a uniformly distributed random output, as evidenced by the significant amount of conditioning required for modern cryptographic and Monte Carlo applications, both of which require such outputs, and by the daunting power and space estimates quoted in the introduction. However, the discovery of a source of large-amplitude Gaussian noise, a power- and space-efficient digitization scheme to make it available to computation, and a simple way to transform Gaussian-distributed samples into samples of an arbitrary distribution could make devices producing an analog output quite useful. We are not going to focus on this class of devices here, even though they are entirely relevant to probabilistic computing, because there remain significant gaps in making that paradigm a reality.
We use the term coinflips device to refer to a device that produces a binary "heads-tails" output and either takes an analog input that corresponds to biasing the coinflip (Figure 4b) or takes no input and produces a coinflip with a fixed probability (Figure 4d). To enable probabilistic computing, the coinflip device must facilitate multiscale co-design at the other levels. Integration of coinflip devices alongside conventional logic is needed to realize architectures that circumvent the von Neumann bottleneck. This kind of fine-grained integration requires more than just process and materials compatibility: an analog signal may need to be provided to the input of the coinflip device, and the output signal from the device may need to be boosted to digital logic levels. Additional circuitry will also be needed to move from the intermediate representations the devices efficiently generate to stochasticity that is ultimately useful, whether it involves analog neurons in a neuromorphic circuit or digital circuits to sample useful distributions. All this needs to be accomplished using the area and power footprint of <100 transistors per coinflip device in order to keep the overall footprint to <1 mm^2 and the power to <10 W. Fortunately, coinflip devices which produce two distinct output signals denoting heads and tails can often be boosted to digital logic levels using only a handful of transistors. Thus, provided an appropriate source of randomness at the device level, the bulk of the size, power, and speed consideration can be focused on turning that source of randomness into sampling an application-specific distribution function. Importantly, there is a significant opportunity in understanding resource tradeoffs related to the statistical accuracy and precision of the samples the devices are used to generate.

Figure 4. We can access stochastic devices with either analog or digital inputs and outputs. An analog output device (left) can provide an output voltage from a distribution of potential values, with the input either changing that distribution (analog in) or simply sampling from a fixed distribution (digital in). In contrast, a digital output device (right) would provide one of two outputs (heads or tails, in our case), which we call a "coinflip". For such coinflips, an analog input could gradually shift the probability of getting heads or tails, whereas a digital input would simply sample from a fixed Bernoulli probability.

Materials are the Underlying Source of Randomness
The randomness that underlies probabilistic computing ultimately originates with fluctuations at the material level, while the other layers of abstraction transform and leverage this randomness. An important dichotomy here is between useful fluctuations that we are trying to control on the one hand and undesirable fluctuations on the other. The latter may result in two nominally identical devices producing different statistics, or the same device producing inconsistent statistics over time. Before considering material properties that may amplify desirable fluctuations or suppress undesirable ones, it is important to recognize that fluctuations commonly originate from three basic physical phenomena: quantum superposition, number fluctuations, and thermal (or quantum) fluctuations. However, as we are specifically considering the opportunities offered by weighting and readout of simple coinflips at large scales, it is unlikely that quantum superposition can be a useful source of fluctuations in the foreseeable future because of the significant limitations associated with the extreme environmental requirements for most quantum systems. [61] In practice, myriad sources of both number fluctuations and thermal fluctuations are active in any material system and will play the roles of both heroes and villains in probabilistic computing. Any average phenomenon having a discrete basis, whether it is current being carried by discrete electrons or the number of atoms in a 1 nm-thick oxide, will be subject to number fluctuations. To have a large fluctuation on a small background signal requires the total expected number of elements per unit time or length to be small. Unfortunately, most devices that produce or count single photons, electrons, etc., are energy inefficient. Thermal fluctuations at finite temperature are the other major source of stochasticity in a material. For continuous degrees of freedom, these fluctuations tend to be small compared to a large background signal and will likely require too much signal conditioning to be efficient. In general, we believe good coinflip devices will rely on thermal fluctuations in systems with discrete degrees of freedom.
A typical two-level system has activated kinetics back and forth over an energy barrier (Figure 5a) and can be used in two different ways to generate a coinflip. In the first, the system has a shallow enough barrier between the two states that thermal excitation over the barrier leads to fast transitions from one state to the other and vice versa (Figure 5a). Here, tuning the device to a weighted coinflip is accomplished by making one of the potential wells deeper than the other. In the second, the system has well-defined states separated by a tall barrier. The system is brought to the unstable point between the two states and released (Figure 5b), whereupon thermal fluctuations will tilt the system toward one state or the other. Tuning the weighting of the device is accomplished by releasing the device slightly to the left or right of the unstable point between the two wells. In a variation of this mode of operation, the potential well itself is distorted so as to have a single minimum at this location, which can be used to initialize the starting position of the particle when the barrier is re-established (Figure 5c).
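A minimal numerical illustration of the first mode of operation, under the simplifying assumption that the two wells fully equilibrate thermally before readout: the occupation probability then follows Boltzmann statistics, so a small energy asymmetry between the wells tunes the coinflip bias.

```python
import math, random

kT = 0.0259  # thermal energy at room temperature, in eV

def heads_probability(delta_e):
    """Equilibrium probability of the 'heads' well when it sits delta_e (eV)
    below the 'tails' well: p = 1 / (1 + exp(-delta_e / kT))."""
    return 1.0 / (1.0 + math.exp(-delta_e / kT))

def equilibrium_coinflip(delta_e):
    return 1 if random.random() < heads_probability(delta_e) else 0

for delta_e in (0.0, 0.01, 0.05):            # example well asymmetries, in eV
    flips = [equilibrium_coinflip(delta_e) for _ in range(100000)]
    print(delta_e, heads_probability(delta_e), sum(flips) / len(flips))
```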
Two concrete examples of materials and devices that are promising for generating weighted coinflips, the TD and the MTJ, are shown in Figure 5. A TD consists of adjacent strongly n-type and p-type doped regions in a semiconductor, where the resulting depletion region between the two is very narrow (Figure 5b). While large discrete TDs have historically been used in analog high-speed electronics, we envision nanoscale TDs integrated into front-end-of-line CMOS manufacturing for probabilistic computing. The TD can conduct the same amount of current either through tunneling or thermionic emission. Which branch the device takes depends on the detailed charge occupancy of the defects in the junction, [62] and is detected as a low (tunneling) or high (thermionic emission) voltage across the TD. Conceptually, it is easiest to think of the TD in terms of a double-well potential where the x-axis is the charge occupancy of a single defect. [63] Tuning this device is accomplished with a current pulse that gives the defect an average charge occupancy corresponding to the weight of the coinflip.
An MTJ is also a tunneling device but has a very different principle of operation (Figure 5c). It consists of two thin magnetic metal electrodes separated by a thin insulating tunnel barrier and can be readily integrated into back-end-of-line CMOS manufacturing. Devices take the form of a nanopillar with a diameter less than ≈50 nm, with one electrode having a fixed magnetic moment and the other a magnetic moment that is free to reorient. [64] The tunneling resistance depends on the relative alignment of the magnetic moments of the electrodes; anti-alignment produces a high-resistance state and parallel alignment a low-resistance state, with a resistance change of a factor of 2 or 3 commonly realized. The MTJ can also be thought of in terms of a double-well potential, with the x-axis being the direction of the magnetization of the free layer. In one mode of operation, thermal energy switches the orientation of the free layer, an effect known as superparamagnetism, producing two-level resistance fluctuations in the MTJ. [65,66] In the second mode of operation, applied current pulses are used to initialize the free layer into a known unstable magnetic state, which is read out after letting the device relax into one of the two stable states.
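As a rough illustration of the superparamagnetic mode, the sketch below assumes Néel-Arrhenius switching with exponentially distributed dwell times and an attempt time of 1 ns; the barrier values are arbitrary and the model ignores readout back-action, but it shows how unequal barriers bias the fraction of time spent in each resistance state.

```python
import math, random

tau0 = 1e-9            # assumed attempt time, in seconds (~1 ns is typical)
kT = 0.0259            # room-temperature thermal energy, in eV

def mean_dwell(barrier_ev):
    """Neel-Arrhenius mean dwell time behind a barrier of height barrier_ev."""
    return tau0 * math.exp(barrier_ev / kT)

def fraction_antiparallel(barrier_p, barrier_ap, total_time):
    """Simulate two-level resistance switching of a superparamagnetic MTJ.
    barrier_p / barrier_ap are the escape barriers out of the parallel (low-R)
    and antiparallel (high-R) states; unequal barriers bias the dwell fractions."""
    t, state = 0.0, 0               # 0 = parallel (low R), 1 = antiparallel (high R)
    time_in_state = [0.0, 0.0]
    while t < total_time:
        tau = mean_dwell(barrier_p if state == 0 else barrier_ap)
        dwell = random.expovariate(1.0 / tau)   # exponentially distributed dwell
        time_in_state[state] += dwell
        t += dwell
        state ^= 1                               # switch to the other state
    return time_in_state[1] / sum(time_in_state)

# Equal barriers give ~50/50 occupancy; lowering one barrier biases the "coin".
print(fraction_antiparallel(0.20, 0.20, total_time=1e-3))
print(fraction_antiparallel(0.20, 0.17, total_time=1e-3))
```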
In general, defects in the material will limit its usefulness as an RNG. These mostly originate from disorder and will alter the potential landscape of the main system. Fluctuations of these hidden two-level systems can uncontrollably change the weighting of the coinflip device. The ability to minimize the amplitude of this effect through materials and structure choices is expected to be a significant distinguishing feature of a given scheme. The other phenomenon that will limit the usefulness of a given material or structure is number fluctuations, particularly in the geometry of a device when a critical dimension contains a countable number of atoms. Although moving to nanoscale dimensions can provide speed benefits from improved parasitic resistance and capacitance, it makes number fluctuations worse. Both of these pathologies speak to a qualitative similarity between probabilistic devices and analog devices: there is no way to eliminate these problems with brute force. More than likely, probabilistic devices will need to be tuned at boot-up to compensate for device-to-device variation.

Riding the AI Wave to Future Probabilistic Architectures
As we consider how to co-design a probabilistic computing system, it is important to consider that a novel computing paradigm may yield impact in unexpected application domains if approached correctly. As an example, as their name implies, graphics processing units (GPUs) were originally a specialized architecture developed to optimize the rendering of graphics at a time when conventional CPUs were not powerful enough for such applications. However, their architecture was constructed in a way such that many general-purpose computations could be efficiently run on GPUs as well if organized appropriately. [67] The ability of GPUs to accelerate general linear algebra calculations enabled ANNs to overcome their prior computational limitations, [68] in a sense allowing ANNs to succeed after many decades of being perceived as a failed strategy for AI. For this reason, it is useful to consider that decisions made at the device, circuits, and architecture level may ultimately determine which algorithms will win out in "the hardware lottery" down the road. [37] From this perspective, it is apparent that probabilistic algorithms and applications, such as Monte Carlo simulations or Bayesian models, have perhaps lost out in the hardware lottery, as they have been developed within a microelectronics framework that has prioritized deterministic conventional computing systems and a growing emphasis on single-instruction, multiple-data parallelism, which is an awkward fit to the branching inherent in applications such as random walks. While specific accelerators for generating random numbers are useful for tasks such as cryptography, the emphasis has often been on the quality of random numbers over throughput. For this reason, it is critical to leverage a strategy for parallelism, such as neuromorphic architectures, which increasingly appear suitable for non-AI applications as well. [23,52,69] Doing so can enable general solutions to probabilistic computing, such as p-bits [5] and the coinflips approach we describe here, to be effective for known probabilistic applications and also offer opportunities to extend the impact of probabilistic computing to more diverse problems.
Computer architectures do not necessarily demand strict boundaries in how devices are used. Looking to the inspiration for neuromorphic computing, the brain does not consist of separate digital, analog, and stochastic components that interact with each other; rather, the brain is simultaneously digital, analog, and stochastic in its operation. This reality stands in stark contrast to how modern computing systems are designed and programmed (and perhaps explains much of the challenge in understanding the brain's computations), and this inherently integrated architecture likely represents a significant unexplored opportunity to leverage the brain's example in enabling future computing algorithms and efficiencies. [70] Therefore, rather than seeing such a tight coupling of digital, analog, and stochastic computation as a challenge, we see it as an opportunity. Recent advances in computational design tools provide us with a glimpse into ways of achieving new capabilities. [71] Furthermore, AI solutions for programming are increasingly becoming more viable; for instance, with the recently introduced AlphaCode approach, AI can program at a competitive level. [72] Indeed, AI-derived solutions for computing systems may prove compelling precisely because they need not be constrained to the intermediate composable representations that the current computing infrastructure is based on.

A Proposed Framework for Efficient Probabilistic Computing
Finally, it is worth returning to considering the utility of accelerating random number generation. Ultimately, if the generation of the right quality, quantity, and type of random numbers becomes highly efficient, if not effectively free, what can be accomplished? To this end, we conclude with a proposed framework to assess the value of probabilistic computing (Figure 6). The lowest level, which we term Level 0, is the simple draw of a random number, today typically through PRNGs in software. As we have discussed above, while this is an extremely well-studied area of computing, the development of tRNGs that suitably provide the necessary quantity, quality, and type of random numbers will enable impact on the applications at the higher levels we describe below.
We define Level 1 as sampling from an application-specific distribution, which is the primary use of RNGs in computing today. Many of these applications, be they stochastic search algorithms or Monte Carlo scientific computing algorithms, often embody a compromise between few, but often expensive, deterministic calculations and relatively simple, but numerically more numerous, stochastic calculations. Radically improving the cost of stochastic calculations, as shown in Figure 2, would allow hardware-accelerated probabilistic approaches to have significant impacts on Level 1, but these benefits would be tempered in part by the considerable existing software ecosystem that assumes sampling algorithms are inefficient.
We consider Level 2 as the ability to compute distributions directly through the sampling machinery. Enabling Level 2 would not only improve existing probabilistic applications, but a Level 2 hardware stochastic solution may offer something fundamentally advantageous to problems that are currently best approached deterministically. Here, we can consider two options. First, as work with p-bits has shown, stochastic logic circuits can be configured to very efficiently solve some problems that are typically considered hard deterministically, such as integer factorization and Ising problems. The second type of Level 2 application comprises those for which ubiquitous stochasticity enables probabilistic methods to be more extensively used within an application. This is best illustrated by uncertainty quantification tasks [73] and is particularly valuable in situations where ever-increasing amounts of available data surpass domain knowledge, such as in deep learning. [74] Both forward modeling (given a model and data, what can we predict?) and inverse problems (given this data, what model most likely is at play?) stand to benefit greatly from more tightly integrated uncertainty quantification, and these challenges represent a growing concern as computing moves toward data-centric applications.
While Level 1 applications would benefit from our proposed approach, they may not be sufficiently impactful to justify a substantial shift from the status quo. However, a technology that can successfully impact the applications illustrated within Level 2 would be sufficiently disruptive to encourage hardware development and associated architecture and algorithm shifts. Nonetheless, we would be remiss not to acknowledge the potential impact of applications beyond these, which we dub Level 3. In our framework, Level 3 applications are those that only really make sense to explore in the context of a probabilistic computing paradigm, ones for which the applications themselves are inherently stochastic and highly coupled, as we see in quantum mechanics or perhaps information processing within the brain. One can, in principle, approach these using deterministic hardware that emulates stochasticity, but such efforts will ultimately be limited to small scales that may not be informative. In cases for which abstractions are available, this may not prove prohibitive (for instance, a first-principles understanding of physics has enabled us to move to more computationally amenable abstractions to model chemical systems at larger scales); in other cases, such as biology and especially neuroscience, it may only be possible to understand complex systems by suitably exploring low-level stochasticity at large scales.

Figure 6. Illustration of Proposed Levels for Hardware-enabled Probabilistic Computing. The extent to which hardware is able to perform the necessary computations represents opportunities for acceleration and energy efficiency. Furthermore, in addition to the overall random number generation process being more efficient at higher levels, fewer samples will also likely be required as more direct sampling is used. In the illustration, s_u represents a sample from a uniform distribution, s_A represents a sample from distribution A, and s_f(A,B) is a sample from a distribution that is a function of A and B.
We do not introduce Level 3 to drive this technology, as it remains too ill-defined to motivate a full redesign of the microelectronics ecosystem. However, we believe that by addressing the applications in Levels 1 and 2 while embracing the co-design philosophy highlighted here, we can advance probabilistic computing in a manner that not only addresses established challenges in microelectronics but also potentially enables radical changes to computation that could have a considerable impact on society.

Concluding Remarks
Co-design between research and engineering communities has been widely discussed as an important, perhaps even necessary, approach toward advancing microelectronics as we enter a post-Moore's-law era. [75,76] While a co-design approach to future microelectronics is an attractive hypothesis, it has proven difficult to validate within the established ecosystem of deterministic von Neumann computing, because progress within more established computing paradigms is often restricted to incremental gains: the potential benefits of radical changes to one element of the technology stack must be weighed against the possible disruption of other parts of the stack. This rigidity has allowed deterministic computing to benefit from many decades of continuous improvement, but at the expense of exploring alternative paradigms. For this reason, we view probabilistic computing as a particularly attractive area in which to explore the value of co-design, particularly within the development of neuromorphic microelectronics.
The probabilistic neural computing approach proposed in this paper has the additional benefit of having a very well understood and highly optimized status quo to compare against. Today, we achieve probabilistic computing within the deterministic microelectronics stack by generating pseudo-random numbers at the software layer and then expending additional deterministic computation to suitably process those random numbers. This process has been highly optimized over the last seventy-five years; in a sense, it has been optimized to emulate stochasticity as effectively as possible under the constraint of deterministic hardware. That constraint has introduced unexpected tradeoffs, such as limiting the extent to which we can parallelize PRNGs while preserving quality. By shifting this burden to stochastic hardware, we have the opportunity to seek a more globally optimal solution for probabilistic computing. There will always be tradeoffs between the cost, quality, and type of random numbers we generate, but we can take advantage of today's modern AI tools and specific application requirements to better account for these tradeoffs in our co-design approach.
Although we have provided an expansive vision for probabilistic computing that is ambitious and requires innovation across technology scales, the specific approaches proposed here are demonstrable within the existing microelectronics ecosystem. We provide a device strategy that leverages well-understood device types (MTJs and TDs) suitable for incorporation into modern microelectronics pipelines. Similarly, we envision incorporating these devices into probabilistic circuits that will allow relatively straightforward integration of stochastic components into increasingly well-understood neuromorphic architectures. Likewise, there is an increasing appreciation of how both neuromorphic algorithms and randomized algorithms may provide advantages over more conventional approaches.
That is not to say that these challenges are easy, and there will be opportunities for innovation at the materials, device, circuit, architecture, and algorithm scales. We also recognize that there are opportunities beyond our focus on binary coinflip devices, such as the potential to use stochastic devices that provide analog outputs, which may prove particularly powerful for AI tasks. By taking the view that probabilistic neuromorphic computing represents a new paradigm full of new questions, we expect to see unexpected opportunities for long-term impact on computing and microelectronics.

Appendices

Appendix 1. Collider Physics Simulations and RNG
Our co-design goal is the generation of 10¹⁵ random bits per second from novel devices and the development of applications that use this new method of RNG. Particle-production experiments at high-energy colliders, such as the Large Hadron Collider, the Relativistic Heavy Ion Collider (RHIC), and the future Electron-Ion Collider, have been targeted as areas that would benefit from a ubiquitous stochastic approach to computing. Collider experiments have varied goals, including searches for physics beyond the Standard Model and ascertaining the partonic (quark, anti-quark, or gluon) content of the proton. Collider experiments rely heavily on complex simulation in the analysis of particle-production data. Similar methods are also used in the study of ultrahigh-energy cosmic rays, detected via their extensive air showers in the atmosphere, where the goals are to determine the origin and identity of the highest-energy cosmic rays. These simulations most often involve generating billions of events or more to understand experimental data. Such simulations are generally done in two steps: (1) event generators are used to model Quantum Chromodynamics (QCD) particle production; and (2) the detector response to the particles from events simulated in step (1) is then modeled. We focus here on step (1), since event generators are common tools used by multiple experiments. Detector response modeling is also a heavy consumer of RNG but involves experiment-specific geometries.
Models of event generation for high-energy particle collisions use PRNGs to select partons from measured parton distribution functions. QCD dictates how the selected partons scatter, and RNGs are then used to select the scattered parton directions. A tool common to multiple event-generator models is the Lund string model. [77] The scattered partons carry color charge, which is confined within hadrons, the strongly interacting particles we observe. To maintain overall color neutrality, QCD strings are drawn between the scattered partons and the spectator partons. The color-magnetic and color-electric fields are string-like because the gluon quanta of these fields carry color charge, causing the field lines to attract each other. The dynamics of the particle collision increase the tension in the QCD strings until it becomes energetically favorable to break a string through the creation of a quark-antiquark pair. This process continues until the kinetic energy of the scattered partons has been converted to the rest mass and kinetic energy of the produced hadrons, many of which are short-lived resonances. Event generators use PRNGs to determine the daughter particle identities, their emission angles given the known intrinsic spin of the resonance, and, particularly for long-lived unstable particles, their decay lengths, which specify the distance from the collision vertex at which the particle decay occurs. Event generators typically generate uniform pseudorandom numbers on the interval from 0 to 1 and then convert these to random values from specific distributions using either analytic or numerical methods.
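The last step is often the classic inverse-transform construction; a minimal sketch is below, assuming an exponentially distributed decay length with an arbitrary illustrative mean (not a value tied to any particular particle or to EPOS).

```python
import numpy as np

rng = np.random.default_rng(7)  # uniform PRNG, as in a typical event generator

# Inverse-transform sampling: a uniform variate u on [0, 1) maps to an exponential
# decay length via the analytic inverse of the CDF. The mean is illustrative only.
mean_decay_length = 0.1                                # meters, a placeholder value
u = rng.random(size=5)                                 # uniform samples on [0, 1)
decay_lengths = -mean_decay_length * np.log(1.0 - u)   # 1 - u avoids log(0)
print(decay_lengths)
```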
Simulations of particle production at a collider, or of particle production in extensive air showers, are most often performed on dedicated computer servers. To illustrate how leveraging coinflip devices could impact this research, the fraction of CPU time spent generating uniform pseudorandom numbers is shown in Figure 7. The event generator EPOS [78] is used to simulate collisions of a ⁵⁶Fe nucleus with a ¹⁴N nucleus. This example is most relevant to extensive air showers produced by cosmic rays, where ⁵⁶Fe would be the primary cosmic ray and the ¹⁴N nucleus represents a target in the atmosphere. Such collisions could be studied at RHIC but have not been to date. The collision is characterized by scaling the momentum of each nucleus by its number of nucleons and then computing the nucleon-nucleon center-of-mass energy (√s_NN).
When the goal of 10¹⁵ random bits per second is realized, it is clear from Figure 7 that significant savings in CPU time for event generation would follow. The total impact would be greater than estimated here, since weighted probabilities will be generated by coinflip devices, allowing random numbers to be drawn directly from a variety of distributions. In addition to the impact on particle-production simulations, investigations are also underway to ascertain whether probabilistic neuromorphic computing approaches are applicable to real-time pattern recognition, such as triggering on events containing a QCD jet.
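As a rough way to reason about the expected savings, the sketch below applies Amdahl's law to the fraction of CPU time spent on RNG; the fractions used are placeholders for illustration, not the values measured in Figure 7.

```python
# Amdahl's-law estimate of event-generation speedup if uniform RNG is offloaded
# to stochastic hardware. The fractions below are placeholders, not Figure 7 data.
def rng_offload_speedup(rng_fraction, residual_fraction=0.0):
    """Speedup when the RNG share of runtime shrinks to residual_fraction."""
    return 1.0 / (1.0 - rng_fraction + residual_fraction)

for f in (0.05, 0.10, 0.25):
    print(f"RNG fraction {f:.0%} -> speedup ~{rng_offload_speedup(f):.2f}x")
```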

Appendix 2. Stochastic Device Zoo
Given that every device exhibits some degree of stochasticity, there is a strong motivation to enumerate fitness criteria for probabilistic computing and categorize device candidates against system-level requirements like size, speed, and energy consumption. Extensive reviews have assessed the suitability of CMOS [79] and unconventional [80,81] devices for random number generation. What complicates a simple comparison is the range of devices, and the fact that a given device type often has different internal mechanisms that can be leveraged, each with its own tradespace for size, speed, and energy consumption. For example, CMOS implementations can leverage metastability, chaotic behavior, or clock jitter to generate random numbers from thermal fluctuations, while optical devices can leverage shot noise or field fluctuations to generate random numbers from quantum fluctuations. Similarly, a range of devices often leverages the same mechanism, for example, metastability due to charge fluctuations in CMOS, filament formation in memristors, and spin orientation in magnetic tunnel junctions. While this richness precludes making reductive statements, a quick survey of the range of size, speed, and energy consumption is useful for understanding the space of possibilities. Recent efforts using unconventional devices have exceeded the 3 pJ bit⁻¹ energy, 1000 µm² area, and 200 Mbps throughput of one recently proposed CMOS implementation [82] by 100× in energy, [83] 10 000× in area, [84] and 1 000 000× in speed. [85] These rapid advances indicate that generating 10¹⁵ random numbers per second at low energy and space cost is within reach, and that it is premature to winnow the field of candidate devices based on engineering considerations.
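To put these figures of merit in the context of the 10¹⁵ bits per second goal, a quick back-of-envelope calculation is given below; it simply combines the cited reference numbers with the quoted improvement factors, and actual system-level values would depend on integration overheads.

```python
# Back-of-envelope arithmetic for the 10^15 random bits per second goal, using the
# CMOS reference point [82] and the quoted 100x energy improvement [83] as anchors.
target_rate = 1e15            # random bits per second
energy_per_bit = 3e-12        # J/bit (3 pJ/bit CMOS reference)
rate_per_generator = 200e6    # bits/s per generator (200 Mbps reference)

power_reference = target_rate * energy_per_bit          # ~3 kW at 3 pJ/bit
power_improved = power_reference / 100                  # ~30 W at ~30 fJ/bit
generators_needed = target_rate / rate_per_generator    # ~5 million generators

print(f"Power at 3 pJ/bit:  {power_reference / 1e3:.1f} kW")
print(f"Power at 30 fJ/bit: {power_improved:.1f} W")
print(f"Generators at 200 Mbps each: {generators_needed:.1e}")
```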
Understanding how different design considerations at the device level influence opportunities at the other levels of the computing hierarchy is critical. The stochasticity produced by these devices varies widely, as does the stochasticity needed by algorithms, and the potential cost of transforming between the two can be significant. Recently, matching stochastic devices with ML applications has attracted interest based on the parallels between stochastic devices and analog devices [86] used as neurons. For example, artificial neurons based on stochastic magnetic tunnel junctions have been used to create reversible circuits that perform factorization [87] and Ising machines that perform combinatorial optimization. [88] Along similar lines, stochastic memristor-based neurons have been leveraged for many applications, including pattern matching, [89] where they provide a significant gain in performance compared to CMOS implementations. [90] Generally, just as matching the physics of devices to computation provides substantial gains for analog computation, so it will for probabilistic computation, albeit with a concomitant loss of generality. In this manuscript, we have identified a different approach to probabilistic computing that is conceptually more general, accomplished by tying into digital concepts. Perhaps one of these approaches will be the one that eventually reveals the more general opportunity presented by probabilistic computing broadly.
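To illustrate the transformation cost alluded to above, the sketch below builds one biased bit from a stream of unbiased coinflips, the kind of digital post-processing that a natively tunable stochastic device could avoid; the precision and bias values are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def fair_coinflips(n):
    # Stand-in for a stream of unbiased hardware coinflips.
    return rng.integers(0, 2, size=n)

def uniform_from_coinflips(bits):
    # Assemble a uniform variate in [0, 1) as a binary fraction of the raw bits;
    # this kind of transformation is where the cost of "generic" randomness hides.
    weights = 0.5 ** np.arange(1, len(bits) + 1)
    return float(np.dot(bits, weights))

def biased_bit(p, precision_bits=16):
    # One Bernoulli(p) sample built from many fair coinflips; a device producing
    # tunable-bias bits natively would skip this step entirely.
    u = uniform_from_coinflips(fair_coinflips(precision_bits))
    return int(u < p)

print([biased_bit(0.3) for _ in range(10)])
```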
Some generalizations can be made from these contemporary efforts. Despite the myriad possibilities, it remains unclear what the breakthrough application driver for probabilistic computing will be. At the same time, matching the nature of the stochasticity in a device to how it is used is necessary to obtain the kind of performance and efficiency improvements needed to achieve wider impact. We hope that creating a nomenclature to classify the space of stochastic devices, broad as it is, aids non-experts by making the discovery of application-to-device mappings conceptually simpler.