Historically, the application of phase-change materials and devices has been limited to the provision of non-volatile memories. Recently, however, the potential has been demonstrated for using phase-change devices as the basis for new forms of brain-like computing, by exploiting their multilevel resistance capability to provide electronic mimics of biological synapses. Here, a different and previously under-explored property that is also intrinsic to phase-change materials and devices, namely accumulation, is exploited to demonstrate that nanometer-scale electronic phase-change devices can also provide a powerful form of arithmetic computing. Complicated arithmetic operations are carried out, including parallel factorization and fractional division, using simple nanoscale phase-change cells that process and store data simultaneously and at the same physical location, promising a most efficient and effective means for implementing beyond von-Neumann computing. This same accumulation property can be used to provide a particularly simple form of phase-change integrate-and-fire “neuron”, which, by combining both phase-change synapse and neuron electronic mimics, potentially opens up a route to the realization of all-phase-change neuromorphic processing.
Electronic systems and devices have revolutionized almost every aspect of our daily life, impacting on all sectors of society. This revolution has been brought about by the seemingly inexorable improvement in the performance, and the reduction in cost, of silicon-based CMOS (complementary metal-oxide-semiconductor) technologies, in particular CMOS-based microprocessors, memories and logic. The progress of CMOS technology over the past twenty years has been driven by an aggressive downscaling of minimum feature sizes. However, as pointed out in the 2011 International Technology Roadmap for Semiconductors (ITRS), the continued scaling of CMOS is problematic and there is a pressing need for new device concepts, in particular for a “new beyond-CMOS information processing technology” in which “a nonbinary data representation may be required.”1 In this work we show that phase-change electronic materials and devices have the potential to meet such pressing needs. In particular we show that electronic phase-change devices are capable of performing complicated non-binary arithmetic processing and computation, including for example fast parallel factorization and fractional division. Furthermore, such computation is carried out simultaneously with storage, at the same physical location, of the computed result, leading to a particularly simple and effective form of non-von-Neumann computing.
The classical von-Neumann architecture,2 which physically separates processing and memory operations, is limited insofar as the processor cannot execute a program faster than instructions and data can be fetched from, and returned to, memory, leading to the well-known von-Neumann bottleneck.3 While scientists and engineers have been successful in reducing the impact of this bottleneck (by for example increasing use of cache memory and, more recently, multicore architectures), it has long been realized that a computer architecture in which processing and storage are carried out simultaneously and at the same physical location could offer very significant performance (speed and power) benefits. Thus, much effort has recently been expended in searching for materials, devices and systems capable of providing alternative computing paradigms, including memristor, e.g.,4–7 neuromorphic, e.g.,8, 9 and connectionist, e.g.,10–12 type approaches. Phase-change based systems, as we show here, also offer an attractive route to an alternative computing architecture.
Reversible switching in various types of disordered semiconductors that subsequently became known as phase-change materials was first reported by Ovshinsky in the 1960s.13 Of course many materials exist in an amorphous as well as a crystalline phase; however, only a very small subset have all the properties necessary (e.g., large electrical or optical contrast between phases; fast crystallization at programming temperatures but stability against spontaneous crystallization for many years at ambient temperature; rapid amorphization capability) to be considered as technological phase-change materials. Indeed, it is only very recently that a detailed understanding has been achieved of what makes a true phase-change material, and why some seemingly similar materials display technologically useful phase-change behavior but others do not, thus allowing the recent development of a set of design rules for what constitutes a phase-change material.14 One of the most extensively studied phase-change materials, and the one that we use in this work, is the ternary alloy Ge2Sb2Te5 which has been used for optical disk memories,15 scanning-probe based storage,16–18 the fabrication of synaptic mimics19–21 and, perhaps its most widely known recent application, the development of binary non-volatile electrical phase-change memories (PCMs).22–26 For binary storage, a PCM cell is switched between amorphous and crystalline phases (and back again) using single pulses. A relatively high amplitude and short duration (RESET) pulse is used to form the amorphous phase, a lower amplitude and longer duration (SET) pulse to form the crystalline phase. In the amorphous phase the cell exhibits a high resistance (typically 100s kΩ to MΩ), while in the crystalline state the cell has a low resistance (typically 10s kΩ). 
To configure a phase-change device as an arithmetic computer, we tailor the input pulse amplitude and/or duration such that the SET state is reached from the RESET state not with a single pulse (as for normal binary memory operation) but with a pre-determined number of pulses, thus providing a form of phase-change accumulator. Unlike the accumulators to be found in conventional processor architectures however, the phase-change accumulator is non-volatile, works directly in high-order bases, is capable of carrying out both basic (e.g., addition, subtraction, multiplication, division) and advanced (e.g., factorization, fractional division) arithmetic operations, can function as a simple neuronal mimic and can provide a form of rudimentary non-volatile logic.
The idea of using phase-change materials for arithmetic computation was originally suggested by Ovshinsky,27, 28 and we ourselves recently demonstrated in the optical domain and on the (tens of) micrometer length-scale the execution of basic arithmetic operations.29 Here we show that this arithmetic capability is also available in the electrical domain and, importantly, on the nanoscale. Furthermore, as well as demonstrating addition, subtraction, multiplication and division with single electrical phase-change cells of the 100 nm size scale, we also demonstrate explicitly the potential of nanoscale phase-change accumulators to i) work directly in a range of arbitrary bases (we demonstrate systems working from base-2 up to as high as base-512), ii) perform complex arithmetic operations, such as parallel factorization and fractional division, iii) process and store data simultaneously and at the same physical location, iv) implement a form of sequential multi-input logic, and v) provide a simple and efficient form of integrate-and-fire neuronal mimic.
2. Results and Discussion
The basic mechanism for implementing a phase-change accumulator is shown in Figure 1 using a conventional mushroom-type PCM cell. In this instance we have implemented (in simulation) a base-10 accumulator by choosing (or designing) the input excitation amplitude and duration such that 10 input pulses are required to convert the cell from the RESET state to the SET state. Each pulse applied was identical, with an amplitude of 1.085 V and a duration of 60 ns (by comparison a single pulse of 1.5 V and 60 ns duration completely switches the cell for normal binary operation, see Supporting Information). The cell accumulates energy from each input pulse, eventually acquiring enough energy to transform the active region from the fully amorphous starting phase to the crystalline phase. Also in Figure 1 we show the state of the active region of the mushroom cell for various cell states. It can be seen that in state-1 (i.e., after receipt of 1st input pulse) one or two crystal nuclei have formed in the amorphous dome, but they have little if any effect on the cell's resistance. After the input of five pulses (state-5) a few more nuclei have formed and some are beginning to grow, with further growth clearly evident by state-9, but not enough to significantly decrease the cell resistance. Between state-9 and state-10 however there is a large decrease in resistance, and by state-10 the amorphous dome is almost fully re-crystallized and the resistance is well below our chosen decision level of 300 kΩ.
It can be seen from Figure 1 that there exists a high resistance plateau region. Here states (corresponding to the input of pulses 1 to 7 in this case) lie below the percolation threshold (for conduction, see ref. 30), thus having substantially the same resistance values; nevertheless these states remain distinct since it takes different amounts of energy (i.e., different number of subsequent excitation pulses) to transform each different state in the plateau region to the SET state (or to a resistance below the decision level). Thus, we are not storing information in different resistance levels and are not using the same scheme as proposed for multi-level PCM memories31–34 or for phase-change based synaptic-like functionality.19–21 Indeed, a perfect accumulator would have only two-levels of resistance, one above the decision level for all states from state-0 to state-(n-1) of a base-n system, and one below the decision level for state-n. This distinction between the accumulation regime of phase-change devices and the multi-level regime is sometimes overlooked and the two are often conflated, but they are different modes of operation and it is the former, accumulation, that here bestows on phase-change cells their capability for arithmetic computing. We should also point out that, unlike in multi-level phase-change memory applications, arithmetic computing via accumulation should not require special methods to combat resistance drift. Indeed, if the low-resistance end-point state of the accumulator is the SET state, then this is essentially stable against resistance drift33, 34 whereas all other pre-endpoint states will over time drift upwards in resistance, actually increasing the decision window between resistances above and below the decision level.
It is easy to see that the system of Figure 1 provides a base-10 accumulator response using a single phase-change cell - each successive input pulse transforms the phase-change cell sequentially from state 1 to 9 (state 0 being the fully RESET state), and upon receipt of pulse 10 the cell switches into the low-resistance state, informing us by a simple resistance measurement (to detect when the resistance is below the decision level) that the count from 0 to 9 is complete. It should also be stressed that this process is non-volatile; if the power is removed from the phase-change system it will remain in its existing state, and processing can recommence from where it left off when power is re-supplied. Further details of the models used to produce the results of Figure 1 and crystallization patterns for all states-0 to state-10 are given in the Supporting Information. Once we have designed an accumulator response of the form shown in Figure 1, we can carry out a remarkable range of arithmetic computations in a most efficient (in terms of the number of cells required), fast (PCM cells have recently35 been switched in 500 ps) and straightforward manner (as we demonstrate experimentally below). Furthermore, the phase-change accumulator cell both calculates (processes) and stores the result of such computations simultaneously in the same cell, potentially providing a powerful form of beyond von-Neumann computing.
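The accumulator behaviour described above can be sketched as a simple counting model. This is only an idealized illustration, assuming the "perfect accumulator" limit in which the cell reads as high resistance for every state below the base and as low resistance once the base is reached; the class and method names are hypothetical and the model ignores all device physics (nucleation, growth, percolation).

```python
class PhaseChangeAccumulator:
    """Idealized base-n accumulator (hypothetical model, not the device simulator).

    State 0 is the fully RESET (amorphous) cell; the cell reads as
    'below the decision level' only once it has received n pulses.
    """

    def __init__(self, base):
        if base < 2:
            raise ValueError("base must be at least 2")
        self.base = base
        self.state = 0

    def pulse(self):
        """Apply one excitation pulse and return the read-out after it."""
        if self.state < self.base:
            self.state += 1
        return self.is_set()

    def is_set(self):
        """True when the resistance would be below the decision level."""
        return self.state == self.base

    def reset(self):
        """Return the cell to the fully RESET state."""
        self.state = 0


cell = PhaseChangeAccumulator(base=10)
readings = [cell.pulse() for _ in range(10)]
print(readings)  # nine False readings followed by True on the tenth pulse
```

Because the state is held in the object between calls, the sketch also mirrors the non-volatile character of the count: processing can continue from whatever state the cell was left in.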
We experimentally implemented phase-change accumulators working in various bases using the pseudo-device structure shown in Figure 2a. Here the top electrode is a lithographically defined Pt/Ti pad contacted by the tip of a conductive atomic force microscope (C-AFM), and the active layer is Ge2Sb2Te5. The amorphous carbon (a-C) layers in the device are electrically conductive sp2-rich material and allow for tailoring of the electrical and thermal properties of the stack while at the same time providing environmental protection of the Ge2Sb2Te5, such a-C layers having been used successfully in previous investigations of phase-change scanning probe storage.16–18, 36
A static I–V curve for our pseudo-devices, as measured by the C-AFM, is shown in Figure 2b (further details of our experimental procedures are given in the Supporting Information). In this case the top electrode was positive, but note that the I–V curve is essentially polarity independent in phase-change systems6 since phase-change materials undergo unipolar switching, unlike many bipolar switching memristive materials and devices.5–9 It can be seen that the threshold voltage for this particular configuration is around 3.7 V. When we apply input pulses with an amplitude greater than this threshold (and typical pulse durations of approximately a hundred nanoseconds), then the cell switches into the SET state with a single pulse as in normal binary memory operation. However, if we reduce the input pulse amplitude (while keeping the duration constant), then the number of pulses required to reach the SET state increases, as shown in Figure 2c. By appropriate choice of the input pulse amplitude (and duration; here we used a full-width-half-maximum (FWHM) pulse width of ≈100 ns, see Figure S2 in the Supporting Information) we can thus tailor the number of pulses required for complete switching (note that for sensing, i.e., reading, the resistance of a device a voltage is used that is far below the threshold, typically ≤1 V, so avoiding any read-induced changes in resistance). In Figure 3a for example we show a case for which 10 pulses are required to reach the low resistance state, yielding a base-10 accumulator suitable for performing arithmetic directly in base-10 using a single phase-change cell. In Figure 3b we show accumulator responses suitable for working in a variety of different bases, specifically in this case base-2, base-4 and base-6.
Experimentally we have been able to demonstrate accumulators working up to as high as base-512 (Figure S3 Supporting Information), although lower-order accumulators provided more controllable, reliable and repeatable results with our C-AFM based approach (we would expect superior controllability with real PCM-type devices). It will be noticed that the responses of Figure 3 show a more gradual change of resistance (with pulse number) than evidenced for the PCM cell results of Figure 1. This most likely reflects a more gradual change in crystallization in the experimental case, resulting from the differences in the structure of our pseudo-devices from those of real PCM cells (specifically the PCM cell has a physically-confined active (Ge2Sb2Te5) region and a bespoke TiN heater-electrode, whereas our pseudo-devices have a continuous Ge2Sb2Te5 layer and no heater electrode). Nonetheless, it can be seen from Figure 3 that the detection window (i.e., difference in resistance between states above and below the decision level) for our pseudo-devices is easily high enough for practical use, being typically around 100 kΩ or so.
We now demonstrate addition, directly in base-10 using the accumulator response of Figure 3a. For example, to perform the sum (1₁₀ + 3₁₀) we started with the cell of Figure 3a in the amorphous (RESET) state and applied (3.12 V, 100 ns FWHM) pulses equal in number to the first addend (one pulse in this case), thus leaving the phase-change accumulator in state-1 (as shown in Figure 3a). We then applied (identical) pulses equal in number to the second addend (three further pulses in this case), causing the base-10 accumulator to move on to state-4. The phase-change cell thus carried out the addition (1₁₀ + 3₁₀) and simultaneously stored the result, since the cell resides in state-4. To access the result of the sum we applied identical input pulses until the cell reached its low-resistance state (i.e., state-10), with the complement (to the base) of the number of pulses needed revealing the result; six pulses were needed in this case, so the answer to (1₁₀ + 3₁₀) is, as expected, 4₁₀. Should the result of an addition exceed 9₁₀, then the base-10 accumulator is reset, a carry forward recorded, and the process continued until the required number of pulses have been inputted to the accumulator (for example (7₁₀ + 6₁₀) would lead to the cell being reset once, so a carry forward of one, with the accumulator left in state-3). As already mentioned, this ability to work directly in high-order bases with a single phase-change cell is exceedingly efficient; by comparison a conventional binary 3-bit full adder requires five AND, five XOR and two OR gates, i.e., around 100 CMOS transistors, to perform the equivalent of the addition of two base-8 numbers. We should also mention that phase-change devices have the potential for excellent scalability, down to the single-nanometer scale,36–38 so that a phase-change based approach to computing could potentially offer very significant savings in terms of chip area as compared to conventional CMOS implementations.
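The addition procedure, including the carry and the complement-based read-out, might be sketched as follows. This is a minimal counting model under the assumption of an ideal accumulator; the function name `accumulate_add` is hypothetical and the pulse amplitude and duration are not modelled.

```python
def accumulate_add(addends, base=10):
    """Add non-negative integers by pulse counting in one idealized base-n cell.

    Returns (carry, digit). The digit is read out as described in the text:
    count the pulses needed to reach the SET state and take the complement
    to the base.
    """
    state = 0   # cell starts fully RESET
    carry = 0
    for addend in addends:
        for _ in range(addend):
            state += 1
            if state == base:   # cell switched: record a carry and reset it
                carry += 1
                state = 0

    # Read-out step: pulses needed to drive the cell to the SET state.
    pulses_to_set = base - state
    digit = base - pulses_to_set
    return carry, digit


print(accumulate_add([1, 3]))  # (0, 4): six read-out pulses, complement is 4
print(accumulate_add([7, 6]))  # (1, 3): one reset (carry), cell left in state-3
```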
We now demonstrate experimentally a simple and reliable way to perform electronic subtraction using nanoscale phase-change accumulator cells. Here we choose to work in base-6 (demonstrating the flexibility of the phase-change arithmetic computing approach), so that input pulse amplitude and duration are tuned to yield base-6 accumulator responses of the form shown in Figure 3b. To perform subtraction we use two cells (both working as base-6 accumulators) and the fact that the difference between two numbers is the same when a common number is added to each of the numbers in the subtraction. Specifically here we performed the subtraction (3₆ − 1₆), using the following steps: i) first we inputted pulses equal in number to the minuend (3) to Cell A; ii) then we inputted pulses equal in number to the subtrahend (1) to Cell B; iii) next we applied further pulses to Cell A until the resistance of Cell A fell below the decision level (three pulses were needed here, see Figure 4); iv) finally we inputted to Cell B an identical number of pulses to that needed in stage (iii) (i.e., three pulses). At the end of this process the result of the subtraction is stored in the final state of Cell B; here Cell B was in state-4. To access the result we count the number of additional (identical) pulses we need to apply to Cell B until its resistance falls below the decision level; here two pulses were necessary (see Figure 4), yielding the correct result. Thus, we have experimentally carried out the base-6 subtraction (3₆ − 1₆) = 2₆, using two base-6 phase-change accumulators and re-casting the subtraction as (3₆ − 1₆) = (3₆ + 3₆) − (1₆ + 3₆).
Note that if we swapped the roles of Cells A and B in the above subtraction method, re-casting the subtraction as (3₆ + 5₆) − (1₆ + 5₆), the final result would be stored in Cell A and accessed as the complement to the base of the number of pulses needed to take Cell A below the decision level; this provides algorithmic consistency with the addition method described earlier, which may be useful in practical systems.
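Steps (i) to (iv) of the two-cell subtraction, followed by the read-out of Cell B, can be sketched as below. The sketch assumes ideal accumulators and single-digit operands with the subtrahend no larger than the minuend; `accumulate_subtract` is a hypothetical name introduced only for illustration.

```python
def accumulate_subtract(minuend, subtrahend, base=6):
    """Two-cell subtraction with idealized base-n accumulators.

    Returns (shared_pulses, result), where shared_pulses is the number of
    pulses applied to both cells in steps (iii) and (iv).
    """
    if not 0 <= subtrahend <= minuend < base:
        raise ValueError("requires 0 <= subtrahend <= minuend < base")

    cell_a = minuend      # step (i): pulses equal to the minuend into Cell A
    cell_b = subtrahend   # step (ii): pulses equal to the subtrahend into Cell B

    shared_pulses = 0
    while cell_a < base:  # step (iii): pulse Cell A until it switches
        cell_a += 1
        shared_pulses += 1

    cell_b += shared_pulses  # step (iv): same number of pulses into Cell B

    result = 0
    while cell_b < base:  # read-out: pulses needed to switch Cell B
        cell_b += 1
        result += 1
    return shared_pulses, result


print(accumulate_subtract(3, 1))  # (3, 2): three shared pulses, answer 2 in base-6
```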
Note that since multiplication can be implemented by sequential-addition, and division by sequential-subtraction, it is clear that we can also carry out multiplication and division, directly in high-order bases, using nanoscale phase-change accumulators. Indeed, it is possible to carry out division directly using a single phase-change accumulator rather than by a two-cell sequential subtraction approach. In this alternative, single-cell division ‘algorithm’ we use the divisor to define the threshold, rather than the base, inputting to the accumulator pulses equal in number to the dividend, re-setting each time the threshold is passed to reveal the quotient, with the remainder stored in the cell end-state (see ref. 29 for details). While such an approach to division is attractive in terms of efficiency, with only a single cell being needed and division being performed directly rather than by successive subtractions, the necessity to re-configure the accumulator (in terms of the number of pulses required to switch the cell) for each divisor encountered is not attractive for general computation. However, this method of division is well suited to parallel factorization. In Figure 4b, for example, we show the process for parallel factorization of the number 6 (here, for demonstration purposes, we limit ourselves to finding if 2 and 4 are factors of 6). To do this we used two cells operating in parallel: Cell A was set-up so that it switched after two pulses were inputted, i.e., it operated as a base-2 accumulator, while Cell B was set-up as a base-4 accumulator. Next we inputted to each cell pulses equal in number to the number to be factored (6 in this case), resetting a cell (or in our case, since, as previously shown, re-amorphization in the C-AFM environment is difficult,16–18 we moved to a new cell) should it cross the decision threshold. Once this process has been completed, any cell whose end point is below the decision level has its base as a factor.
Here, as shown in Figure 4b, after the inputting of 6 pulses to both cells the end-point of Cell A was below the decision level, so 2 is a factor of 6, whereas the end-point of Cell B was above the decision level, so 4 is not a factor of 6. Note that in our C-AFM system it was not possible to apply pulses in parallel to multiple cells (since we only have one tip), however, fast parallel operation would be feasible with real PCM-type devices.
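The single-cell division and factorization scheme can be sketched with the same idealized counting model. In this sketch a cell that has switched is reset only when the next pulse arrives, so that its end state can still be read after the final pulse; that ordering, and the function names, are assumptions made for illustration. The loop over candidates runs sequentially here, whereas real PCM-type cells could be pulsed in parallel.

```python
def single_cell_divide(dividend, divisor):
    """Divide by pulse counting in one cell whose threshold equals the divisor.

    Returns (quotient, remainder, ends_set), where ends_set is True when the
    cell finishes below the decision level (i.e. in its switched state).
    """
    if divisor < 1 or dividend < 0:
        raise ValueError("requires divisor >= 1 and dividend >= 0")
    state = 0
    quotient = 0
    for _ in range(dividend):
        if state == divisor:   # cell switched on the previous pulse: reset it
            state = 0          # (or, as in the experiment, move to a new cell)
        state += 1
        if state == divisor:   # threshold passed once more
            quotient += 1
    ends_set = state == divisor
    remainder = 0 if ends_set else state
    return quotient, remainder, ends_set


def parallel_factorize(number, candidates):
    """Report, for each candidate base, whether it is a factor of number."""
    return {c: single_cell_divide(number, c)[2] for c in candidates}


print(parallel_factorize(6, [2, 4]))  # {2: True, 4: False}
print(single_cell_divide(6, 4))       # (1, 2, False): 6 / 4 = 1 remainder 2
```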
The natural accumulation process inherent to phase-change materials and devices, which we have used above to perform arithmetic computations, can also be used to provide a particularly simple form of integrate-and-fire neuronal mimic. For example, the phase-change cell in the neuron circuit of Figure 5 accumulates excitations from incoming pulses and fires (i.e., switches to a low resistance state), causing the comparator to switch, only after the receipt of a certain number of pulses. If the mushroom-type PCM cell and input pulses of Figure 1 were used, the series resistance RS (in Figure 5) chosen to be around half the PCM RESET resistance (here RRESET ≈ 750 kΩ, so let RS ≈ 375 kΩ), and a reference voltage (VREF) chosen for the comparator of VREAD/2, then for the first 9 pulses the read input to the comparator would be below VREF, but would rise above the reference with the 10th pulse, causing the comparator to switch and (via a pulse circuit) generate an output spike (note that VREAD could be applied separately from the excitation pulses, or more practicably could be incorporated into such pulses by using a simple DC offset, as shown in the figure). The number of pulses needed to make the phase-change neuron fire can of course be readily adjusted by changing the pulse amplitude and/or duration used. The circuit of Figure 5 is considerably simpler than conventional CMOS neurons, which can require, depending on their complexity, around 8 to 20 CMOS transistors to implement,39–42 although some of these transistors, typically four or five,43 are used to implement comparator-type amplifiers, which are also used in our design of Figure 5.
The fact that phase-change cells have also recently been shown to be capable, by using the multi-level resistance regime, of emulating a synaptic-like response19–21 means that it may well be possible to design and build systems in which both neuronal and synaptic-like responses are provided by phase-change devices (operating respectively in the accumulation and multi-level resistance regimes).
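The firing condition of the integrate-and-fire circuit can be checked numerically with a short sketch. Several things here are assumptions rather than details taken from the circuit: the comparator is taken to sense the voltage across RS in a simple series divider with the PCM cell, the cell is modelled with only two resistance levels, the SET resistance of 20 kΩ is an illustrative value within the "10s kΩ" range, and the cell is assumed to be returned to the RESET state after each output spike.

```python
R_RESET_OHM = 750e3    # RESET resistance quoted in the text
R_SET_OHM = 20e3       # assumed SET resistance (illustrative value only)
R_SERIES_OHM = 375e3   # series resistor, about half the RESET resistance


def comparator_input(r_pcm_ohm, v_read=1.0, r_series_ohm=R_SERIES_OHM):
    """Voltage across the series resistor in an assumed PCM/RS divider."""
    return v_read * r_series_ohm / (r_series_ohm + r_pcm_ohm)


def neuron_fire_times(n_pulses, threshold_pulses=10, v_read=1.0):
    """Return the 1-based input pulse indices at which the neuron fires."""
    v_ref = v_read / 2          # comparator reference, VREAD/2 as in the text
    state = 0
    fired = []
    for index in range(1, n_pulses + 1):
        state += 1
        # Two-level idealization: high resistance until the threshold pulse.
        r_pcm = R_SET_OHM if state >= threshold_pulses else R_RESET_OHM
        if comparator_input(r_pcm, v_read) > v_ref:
            fired.append(index)
            state = 0           # assumed: cell is RESET after each spike
    return fired


print(round(comparator_input(R_RESET_OHM), 3))  # 0.333, below VREF = 0.5
print(round(comparator_input(R_SET_OHM), 3))    # 0.949, above VREF = 0.5
print(neuron_fire_times(25))                    # [10, 20]
```

With these values the comparator switches whenever the cell resistance falls below RS, which is why choosing RS at roughly half the RESET resistance leaves a comfortable margin on both sides of VREF.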
Accumulation might also be used to provide a form of serial non-volatile logic, since accumulator responses, such as those shown in Figure 3, also perform a serial AND (or NAND) function. For example a base-2 accumulator provides a 2-input serial AND operation, a base-4 accumulator a 4-input AND etc. Such phase-change logic has the advantage of being non-volatile, but the requirement to enter data serially, to reset the cell after each logic operation and to have a separate read cycle means that such devices would not be logic gates in the classical sense. Nonetheless, their simplicity and efficiency of implementation may be attractive for certain specialized applications. Interestingly we note that 13-input AND/NAND gates are commonly available from many semiconductor manufacturers (see e.g., ref. 44) and typically need over 50 CMOS transistors to implement. A 13-input serial phase-change AND/NAND could however be realized with a single nanoscale phase-change cell, using the base-13 accumulator response shown in Figure S3 (Supporting Information).
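The serial AND/NAND behaviour can be illustrated with a very small sketch. It assumes that each logic 1 is presented as one excitation pulse and each logic 0 as no pulse, and that the cell is configured as a base-n accumulator for n inputs; the function names are hypothetical.

```python
def serial_and(inputs):
    """n-input serial AND using one idealized base-n accumulator."""
    base = len(inputs)
    state = 0                    # cell starts fully RESET
    for bit in inputs:           # inputs are entered one after another
        if bit:
            state += 1           # a logic 1 is applied as one pulse
    return state == base         # separate read cycle: has the cell switched?


def serial_nand(inputs):
    """NAND is obtained by inverting the read-out."""
    return not serial_and(inputs)


print(serial_and([1, 1, 1, 1]))  # True
print(serial_and([1, 0, 1, 1]))  # False
```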
So far we have considered only integer arithmetic. However, phase-change based computing with multi-digit numbers is readily implemented by using separate phase-change cells to represent powers of the base. For example, in base-10 eight phase-change cells could represent the integers 0 to 99999999 or the fixed-point numbers 0 to 9999.9999. Arithmetic computations using such multi-digit and fixed-point numbers would be implemented using the above algorithms on a digit by digit basis. The use of multi-digit numbers would also allow us to perform full fractional division using the simple and practicable two-cell subtraction method described above. For example, suppose we wish to compute the division 9₁₀ ÷ 2₁₀; we use two cells (A & B) both operating as base-10 accumulators and re-cast the division as successive subtraction, re-setting both cells each time a subtraction has been completed. The number of resets reveals the quotient, while the remainder will be stored in the end-state of Cell B (as the number of pulses needed to take Cell B from its end-state resistance to a resistance below the decision level). Thus for 9₁₀ ÷ 2₁₀ this two-cell successive subtraction algorithm yields the expected result of 4 remainder 1. Now, to attain a full fractional result we simply ‘move’ the remainder up to a cell representing the next higher power of the base, thus our remainder 1 becomes 10, and repeat the two-cell successive subtraction process. Thus, we perform 10₁₀ ÷ 2₁₀ by successive subtraction of 10 − 2, yielding the result 5 and so revealing that 9₁₀ ÷ 2₁₀ = 4.5.
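The fractional division procedure can be sketched at the level of the algorithm. For brevity each two-cell subtraction is represented here by an ordinary integer subtraction, which is a simplification of the cell-level process; the function names and the `max_digits` cut-off are assumptions introduced for illustration.

```python
def divide_by_successive_subtraction(dividend, divisor):
    """Return (quotient, remainder); each loop pass stands for one
    two-cell subtraction followed by a reset of both cells."""
    quotient = 0
    while dividend >= divisor:
        dividend -= divisor
        quotient += 1       # one reset counted per completed subtraction
    return quotient, dividend


def fractional_divide(dividend, divisor, base=10, max_digits=4):
    """Return (integer_part, fractional_digits) in the given base."""
    if divisor < 1 or dividend < 0:
        raise ValueError("requires divisor >= 1 and dividend >= 0")
    integer_part, remainder = divide_by_successive_subtraction(dividend, divisor)
    digits = []
    while remainder and len(digits) < max_digits:
        # 'Move' the remainder up one power of the base, then divide again.
        digit, remainder = divide_by_successive_subtraction(remainder * base, divisor)
        digits.append(digit)
    return integer_part, digits


print(fractional_divide(9, 2))  # (4, [5]), i.e. 4.5 in base-10
```

Since the remainder is always smaller than the divisor, each fractional digit produced this way is smaller than the base, so it fits in a single accumulator cell.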
Finally, in this section, we would like to point out that much of the related control hardware needed to implement a practicable arithmetic unit capable of implementing the above arithmetic processes in a self-contained way can also be provided by phase-change devices. For example, we showed above that the complement of a number was often required (e.g., to access the results of additions and multiplications). Such a complement could be generated and stored in a phase-change cell by a form of offset copying from the cell containing the number whose complement is desired. For example, to generate and store the complement of the (base-10) number 3 contained in a particular phase-change cell, we utilise a second cell initially in the reset state; we then apply input pulses to the first cell until it reaches the SET state, while also applying the same number of pulses to the second complement cell. To transform the first cell to the SET state requires 7 pulses, hence the number 7, the base-10 complement of 3, is generated and stored in the second complement cell in this example. Negative numbers could be handled by the equivalent of a sign-bit, for example by using the SET/RESET state of a sign-cell to represent positive and negative numbers (or vice-versa).
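The offset-copying route to the complement can be sketched in a few lines, again assuming ideal accumulators; the function name is hypothetical.

```python
def complement_by_offset_copy(stored, base=10):
    """Generate the complement (to the base) of the number held in one cell.

    Pulses are applied to the first cell until it reaches the SET state,
    and the same number of pulses is applied to a second cell that starts
    in the RESET state.
    """
    if not 0 <= stored <= base:
        raise ValueError("stored state must lie between 0 and base")
    first_cell = stored
    complement_cell = 0          # second cell starts in the RESET state
    while first_cell < base:     # pulse both cells until the first switches
        first_cell += 1
        complement_cell += 1
    return complement_cell


print(complement_by_offset_copy(3))  # 7, the base-10 complement of 3
```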
In summary, we have shown that the accumulation regime of nanoscale phase-change memory type devices can be used not only to perform the full range of arithmetic operations (addition, subtraction, multiplication, division), but also to carry out complicated non-binary processing and computation, including parallel factorization and fractional division. Such arithmetic computation is carried out in a non-von-Neumann architecture in which processing and non-volatile storage are carried out simultaneously by the same nanoscale phase-change cell. Furthermore, computations can be carried out directly in high-order bases, such as base-10. Thus phase-change accumulator-based arithmetic computing has the potential to be extremely efficient when compared to conventional CMOS-based arithmetic processors. The same accumulation property that endows an arithmetic capability can also be used to provide a simple form of non-volatile logic and to implement a simple and efficient form of nanoscale phase-change neuron. Thus, phase-change devices potentially offer a range of functionality that goes beyond simple binary memory to encompass new forms of phase-change based computing.
4. Experimental Section
Phase-change pseudo-devices, comprising e-beam lithographed top electrodes fabricated on top of a continuous 10 nm thin film of Ge2Sb2Te5 (GST) sandwiched between two conductive sp2-rich a-C layers, were used, as shown in Figure 2a (and in Figure S1a of the Supporting Information). The lower a-C layer was prepared to be highly conducting (σ ≥ 10² Ω⁻¹ m⁻¹) and in combination with a Ti layer provides the bottom electrode. The top a-C layer provided environmental protection for the GST and was also conducting, but less so than the bottom a-C layer (σ ≈ 50 Ω⁻¹ m⁻¹). The basic Ti/a-C/GST/a-C structure was similar to that used successfully in several studies to demonstrate scanning probe-based storage using phase-change materials16–18 and was prepared by DC/RF magnetron sputtering. Note that as a consequence of additional series resistances caused by the inclusion of the a-C layers, the resistance difference between amorphous (RESET) and crystalline (SET) states in the pseudo-devices was substantially smaller than that seen in commercial PCM devices. On top of the Ti/a-C/GST/a-C structure an array of evenly distributed, 15-nm-thick Pt dots was fabricated using standard e-beam (NanoBeam nB3 system) lithographic patterning and metal sputtering. A 5-nm Ti layer was used as an adhesion layer between the Pt dots and the a-C capping layer, and the resulting 20-nm-thick Pt/Ti combination gave an excellent Ohmic contact. The Pt/Ti dots served as a fixed top electrode and were typically 100 nm in size (diameter). The Pt/Ti electrodes were contacted using a conductive-diamond AFM tip (Bruker AFM probes) using a Bruker Innova SPM system with C-AFM capability. The conductive-diamond tips were durable and had a very high-current carrying capacity (unlike metal-coated C-AFM tips).
Further details of all experimental methods (and of the theoretical and computational model used for the simulation results of Figure 1) are given in the Supporting Information.
Supporting Information is available from the Wiley Online Library or from the author.
The authors gratefully acknowledge EPSRC for grant funding (EP/F015046/1). They also would like to thank Dr. A Pauza, formerly of Plasmon Data Systems Ltd, for help in preparation of the GST samples. Professor Peter Ashwin from the University of Exeter is also acknowledged for helpful discussions and guidance in the formulation of the GCA simulator. The authors are also very grateful to Mr. David Anderson of the University of Exeter for valuable assistance with the lithography of the pseudo-devices.