Bioinspired Robotic Vision with Online Learning Capability and Rotation‐Invariant Properties

Reliable image perception is critical for living organisms. Biologic sensory organs and nervous systems evolved interdependently to allow apprehension of visual information regardless of spatial orientation. By contrast, convolutional neural networks usually have limited tolerance to rotational transformations. There are software‐based approaches used to address this issue, such as artificial rotation of training data or preliminary image processing. However, these workarounds require a large computational effort and are mostly done offline. This work presents a bioinspired, robotic vision system with inherent rotation‐invariant properties that may be taught either offline or in real time by feeding back error indications. It is successfully trained to counter the move of a human player in a game of Paper Scissors Stone. The architecture and operation principles are first discussed alongside the experimental setup. This is followed by performance analysis of pattern recognition under misaligned and rotated conditions. Finally, the process of online, supervised learning is demonstrated and analyzed.


DOI: 10.1002/aisy.202100025
commonly used for tasks such as object identification. In these approaches, a feed-forward CNN implements two bracketed pairs of a convolution operator followed by a pooling layer. [21] They are constructed according to the two-visual-pathways theory, stating that a primate visual cortex can be split between dorsal (where) and ventral (what) streams originating from the primary visual cortex. [22] However, they tend to be oversimplistic and are unable to successfully account for important aspects of perception such as detail preservation. Bioinspired, artificial smart retinas based on a 2D array paradigm of photoelectric logic gates were also suggested. [23][24][25][26] More recently, a three-input material nonimplication and logic conjunction (NIMPLY-AND) gate, constructed out of two memristors [27] and a pull-down resistor, was demonstrated. [28] These gates may form building blocks for the design of a configurable array-matrix that effectively implements in situ image compression.
Herein, we present a system that was designed in an attempt to mimic the functionality of biologic visual systems while processing information contained in granular receptive fields by a CNN. First, the architectural concepts and algorithms are discussed, followed by an actual implementation using off-the-shelf components. This system was trained in two separate experiments to perform either pattern recognition or play Paper Scissors Stone (PSS) against a human player. Rotational invariance is demonstrated in the first experiment, while versatile, intelligent decision-making is shown in the second. Finally, the results of an online, supervised learning session are presented. This research presents an opportunity for the development of intelligent, visual perception apparatuses. Such systems could replace both bionic prostheses [29][30][31] and robotic eyes, [32] combining image sensors with artificially intelligent image processing in a single platform.

Architectural Concepts
The basis for image acquisition lies in the implementation of electro-optic, logical conjunction (AND) gates. These gates were arranged in the form of a 2D array. Visual information could then be directly imprinted onto this array as light inputs to corresponding gates to form a pixelated image. Figure 1a shows a schematic illustration of a single AND gate. It is constructed out of a light-dependent resistor (LDR) device S, having one terminal connected to an electric input A and the other terminal (which also forms the output) to a passive resistor R. LDRs were chosen due to their transfer characteristics under illumination. Photodiodes usually show a sharp exponential transition in forward-bias current once the illumination threshold is reached (more compatible with representing a binary state transition). When reverse-biased, they display a linear behavior with the irradiation power intensity; therefore, very large illumination variations would be required to generate equally proportionate current differences. LDRs, on the contrary, display a more gradual, exponential reduction in resistance as the light intensity increases. They are therefore more suitable for generating the wider range of analog levels used in this work. With LDRs, the overall sensitivity may be tuned to respond to a much wider range of illumination intensity, while producing a gradual, Goldilocks-zone response without saturating the gates (as seen in Figure 1).
The light input B is determined by the presence of illumination, and used to change the resistance of the LDR upon exposure. Both S and R form a current path from A to the common reference (Gnd), as the output voltage level V_o is determined by a voltage divider. When the LDR is exposed to light, its resistance drops, and the resistance ratio R/(R + S) changes. A change from ≈100 kΩ in darkness to about 500-100 Ω under illumination (10³-10⁴ lux) was measured for S. The resistance R was implemented using a potentiometer set to ≈1 kΩ. In this manner, as the gate was irradiated (B set to logic "1"), the analog output voltage level increased from δ to α. Taking the resistance ratio into account, this corresponded to an output logic transition from "0" to "1." This functionality is summarized in the truth table of Figure 1.
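The divider relation above can be checked numerically. The following is a minimal sketch, not a circuit simulation; the component values are taken from the measurements quoted in the text, and the function name is illustrative:

```python
# Sketch of the photoelectric AND gate as a voltage divider.
# Component values follow the text: R ~ 1 kOhm, S ~ 100 kOhm (dark)
# dropping to ~500 Ohm under 10^3-10^4 lux illumination.
def gate_output(v_dd, s_ohm, r_ohm=1_000):
    """Output of the LDR (S) / resistor (R) divider: V_o = V_dd * R / (R + S)."""
    return v_dd * r_ohm / (r_ohm + s_ohm)

# Dark LDR: output stays near the low level (delta, logic "0").
v_dark = gate_output(3.3, 100_000)
# Illuminated LDR: output rises toward the high level (alpha, logic "1").
v_lit = gate_output(3.3, 500)
```

With a 3.3 V supply, the dark output lands near 0.03 V and the illuminated output near 2.2 V, consistent with the δ and α ranges quoted in the next paragraphs.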
Biologic neurons are known to have an activation threshold. To account for a similar functionality in this work, a threshold level was determined based on the analog characteristics of the gate. This threshold was used to define a state of being exposed to illumination for the system. It therefore allowed canceling out environmental biasing throughout operation. Such a predefined threshold may also be adjusted during real-time operation to emulate sensory relaxation in living organisms. [33] The threshold level was determined based on the simulation results shown in Figure 1c.

Figure 1. Electro-optical logic AND operation based on an LDR-resistor construction. a) Schematic depiction of the logic gate serving as a building block in the light-sensory array. b) The truth table for an electric input A and light input B resulting in an analog output level V_o that can be translated into a logic AND function. c) Simulation results of the photoelectric gate with different input voltage levels (2.5, 3.0, and 3.3 V), operating conditions, and illumination intensities used to determine a sensory threshold of 1 V. Inset: circuit under simulation consisting of an LDR S (represented by a vertical rectangle inside a circle), a potentiometer set to 1 kΩ (zigzagged line), and a parasitic capacitance (two parallel horizontal lines). Both the resistance and capacitance are connected between the output node V_o and common ground (Gnd, upside-down triangle). The LDR is connected between V_o and the supply voltage V_dd.
The simulation shows the output voltage V_o under different illumination intensities and supply levels (2.5, 3.0, and 3.3 V). Three operating conditions (low, typical, and high) were defined, based on the variance in LDR-irradiated resistance (±50%) and potentiometer tuning resolution (±10%). The figure therefore depicts three curves for each of the supply levels, for a total of nine curves. The plots reveal that δ ≤ 0.1 V in darkness and α ≈ 1.3-2.1 V for levels above 10³ lux. A threshold level of V_TH = 1 V was thus defined (marked by a dashed line). This threshold is associated with an intensity that may vary from 10² to 5 × 10² lux, depending on the LDR's resistance variance (low, typical, and high operating corners). The inset in Figure 1c shows the circuit under simulation, including the lumped capacitance across the output node. It should be mentioned that biologic neuron activation thresholds are lower than the one defined in this work. Early models such as Hodgkin-Huxley [34,35] describe the neuronal action potential as having amplitudes of up to 100 mV. V_TH in this work is determined primarily by the supply voltage V_dd that, in turn, is dictated by the requirements of the peripheral circuitry. This circuitry was implemented using off-the-shelf, 3.3 V components to demonstrate a prototype. Alternatively, low-power controllers or advanced application-specific circuits (e.g., 1.8 V and below) may be used to implement the same circuits. In such a case, the threshold level would be reduced by the same ratio because the AND gates are based on passive devices.
The AND gate may be regarded as an artificial neuron that operates on two weights (a variable voltage AV_dd and resistance R) and the light input to give an output subject to fulfilling a threshold condition. This enables a sensitivity parameter to be defined, which may be used to determine when the system is exposed to illumination and thus to cancel out the environmental bias. For example, it can consist of a summation over the AND gates that cross the threshold and effectively flip to a high logic level. This number can be lowered to increase the sensitivity or raised to reduce it. Therefore, once a threshold is defined, the AND gates may be treated as digital entities, as reflected by the truth table in Figure 1b.
The implementation of an image sensor capable of capturing pixelated image data implies efficient and compact placement of light-sensitive devices. A schematic diagram of the system-level architecture is shown in Figure 2. An array of LDRs S_ij, each having one terminal connected to an analog supply AV_dd (effectively acting as the electric logic input A), is shown in Figure 2a. Exposing the array to light resulted in conductance changes in individual LDRs. In our chosen methodology, array information was evaluated on a row-by-row basis through the use of word-lines (WLs). For this purpose, selection devices M_ij were connected to individual LDRs, and each row was activated by its corresponding WL_i. As the AND gate functionality requires an additional series resistor, four potentiometers R_j were placed at the termination points of the bit-lines (BLs). In this manner, activation of WL_i led to the formation of an analog voltage level V_oj on BL_j. These voltages were then buffered, converted to digital representations by a set of analog-to-digital converters (ADCs, marked A/D_j), and latched by a controller. A set-vector was constructed once all the rows in the array were evaluated. This vector served as an input to a three-layer ANN. Figure 2b shows the top-level conception of the dataflow: a row of optoelectric AND gates is evaluated, their analog output levels are converted into digital format, and the result is input to the ANN. The computation result is displayed to the user after all four rows are latched and the ANN output is valid. As part of this work, the ANN was trained to either perform pattern recognition or play PSS, and the result was shown over a display module.

Figure 2. Schematic depiction of the architecture used to implement the vision system. a) Detailed schematics showing LDR devices (S_ij) configured in the form of an array, with each having a respective selection device (M_ij). The array was evaluated on a row-by-row basis through WL_i activation. AV_dd is an analog supply level serving as the electric input. A set of potentiometers (R_j) is connected to each BL_j to form an optoelectric logic gate. An analog output level (V_oj) is fed into ADC_j via a voltage buffer. The combined digital data from the ADCs form the input vector to the ANN, and the computation result is shown over a display module. b) Top-level conception showing the evaluation of a single row of photoelectric AND gates.
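The row-by-row evaluation can be sketched as follows. This is only a schematic illustration: `read_bitlines` is a hypothetical stand-in for the WL-driver and ADC hardware, and the test pattern is toy data, not a measured image:

```python
import numpy as np

def scan_array(read_bitlines, n_rows=4):
    """Activate word-lines one at a time and latch the four BL voltages,
    mimicking the row-by-row evaluation described above. `read_bitlines`
    stands in for the WL-driver + ADC hardware (an assumption of this sketch)."""
    rows = [read_bitlines(wl) for wl in range(n_rows)]
    return np.array(rows)  # 4x4 matrix of analog output levels (the set-vector source)

# Toy stand-in for the hardware: returns row `wl` of a fixed test pattern.
pattern = np.array([[0.0, 2.1, 2.0, 0.0],
                    [0.0, 1.9, 2.2, 0.0],
                    [0.0, 2.0, 1.8, 0.0],
                    [0.0, 2.1, 2.0, 0.0]])
matrix = scan_array(lambda wl: pattern[wl])
```

Once all four rows are latched, flattening this matrix (after the processing described in the following sections) yields the input set-vector to the ANN.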

Artificial Vision System and Test-Bench Implementation
The concepts discussed in the previous section were implemented as a simplified prototype using off-the-shelf components. A schematic drawing of the test-bench is given in Figure 3a. It consists of two parts: the player's move generator (shown on top) and the vision system (bottom). A player may therefore initiate any random move in a game of PSS by pressing a corresponding soft-button on the screen of the smartphone. The mobile device then communicates this move to a designated controller (CNT) through a Bluetooth channel. Once received, CNT manipulates the WLs and BLs of the LED array using a predetermined sequence. This results in an image being generated over the LEDs and imprinted as resistance changes in the LDRs.
The images in Figure 3b,c depict the vision system prototype. In the figure, both the front and back sides of the system are shown, either moved back (Figure 3b) or placed on top, in an aligned configuration with the LEDs (Figure 3c). Referring to Figure 3b, a 4 × 4 array of LDRs (1), with an area of about 20 × 25 mm², and four BL-terminating potentiometers (2) were placed over the front face. The display (7), buffers (8), and selection devices (9) were placed over the back side, as shown in Figure 3c. A microprocessor-based module (4) was programmed to implement the controller along with a three-layered, feed-forward, back-propagation ANN. The chosen configuration is considered a good, general-purpose architecture for either supervised or unsupervised learning. The WLs and the analog supply (3) were also driven by the controller, to allow acquisition of an image imprint over the LDRs. An 8 × 8 light-emitting diode (LED) array (5), having an area of about 20 × 20 mm², was controlled through a gamepad (6) Bluetooth application running on an Android smartphone. These LEDs were driven by a separate, unrelated processor module. No communication channels were formed between the two processors except for the LED-LDR interaction. Image patterns (i.e., Paper, Scissors, Stone, and All) were programmed into the second processor's memory and activated by pressing the corresponding keypads on the application, as shown in Figure 3c. Each decision made by the ANN, being either a pattern recognition or a PSS response, was presented over the display module (7).

Testing Methodology
In an attempt to imitate the manner in which retinal ganglion cells spike in response to their conjugated bipolar cells' polarization state, the LEDs were also triggered on a row-by-row basis. This was done using a pulse-width modulation (PWM) scheme that allowed control over their on-time through duty-cycle variations. It is worth mentioning that the chosen approach is by no means exclusive, and other ways may be used to produce a similar outcome (e.g., PWM of the WLs in the LDR array instead). However, the overall result should be similar because, as will be discussed later, the input set-vector values are determined by summation and root-mean-square (RMS) calculation of BL voltage levels. These RMS values determine whether the system is being exposed to illumination, by comparing them with the previously mentioned threshold level, much like the activation of biologic neurons. Figure 4a,c,e,g,i contains images that show the different patterns generated using the LED array. Those patterns were designed and preprogrammed into the second controller. Each image is accompanied by a green-black 8 × 8 bit representation alongside it, where green represents an on-LED and black an off one. Figure 4b,d,f,h,j shows the corresponding output for each image as produced by the ANN over the display module. The calculation outcome was defined to be either similar to the projected image (pattern recognition experiment) or a countermove to the player's move in PSS. LED patterns were intentionally designed to inherently contain different levels of symmetry: starting with Paper, which is asymmetric (Figure 4a), continuing with Scissors, having a single axis of symmetry (Figure 4e), and ending with Stone and All (Figure 4c,g,i), with multiple axes. Rotation-invariant functionality testing was performed while rotating the LDR array around its normal vector (z-axis).
The purpose was to demonstrate that multisymmetry shapes (Stone, All), the asymmetric shape (Paper), and the y-symmetric shape (Scissors) could all be identified by the ANN regardless of the relative LED-LDR array orientation.

Bioinspired Image Acquisition
This section details the bioinspired formulation and methodologies used to generate the input to the ANN. In biologic visual systems, off-bipolar cells display a hyperpolarization-like response under illumination that decays into a depolarized state in darkness. By contrast, on-bipolar cells produce a depolarized output under illumination and hyperpolarize in darkness. This construction is essential for the improved accuracy and sensitivity of retinal center-surround cell configurations. The pixelated image was processed in an attempt to mimic this biologic behavior. Essentially, this was done through an exponential activation of the divergence of the gradient in discrete space (i.e., the Laplace operator). Using the outcome of the second derivative over the array output effectively mimics the center-surround functionality. In this manner, the gradient between neighboring LDRs is amplified and fed into the ANN rather than the actual analog voltage level. This mimics event-based processing in biologic vision. The gradient may also be mapped to a binary number using a threshold level, to further reduce the number of input nodes to the ANN. In the biologic counterpart, bipolar cells respond based on the exposure of the central cell with reference to its neighboring cells, as discussed previously. ANN inputs are thus generated by small regions in the visual field, in an attempt to account for receptive-field stimulation. [14] Rotation-invariant pattern recognition implies correct image classification in a misaligned position, based on supervised training done using a dataset that corresponds only to an aligned placement. The approach chosen herein was to attempt generalization at the image acquisition stage while keeping the ANN simple.
As mentioned previously, analog voltage levels produced by the AND gate configurations were read on a row-by-row basis, converted to a digital format, and latched by the controller continuously throughout operation. These voltage levels were recorded over a predetermined number of sampling rounds, each starting with the activation of WL_0 and ending with WL_3, in a manner that latched the entire set of outputs V_o into a 4 × 4 matrix (Λ). One may think of this matrix as a potential over which the Laplacian operates to identify "sources" and "sinks." Instead of feeding image data directly into a CNN, as done in conventional approaches, the sources-sinks map is input. This abstraction allows for improved rotation-invariant properties. The matrix Q_ij was derived from these levels using the Arrhenius equation. A normalized activation parameter was generated by applying the Laplace operator with cyclic boundary conditions over this pixel analog information (V_o,ij), as detailed in Equation (1), where a is a unitless pre-exponential factor, Λ_ij^RMS is the root-mean-square (RMS) value of V_o over n samples, Δx = Δy = 1, and V_AVG is a moving average over all the measured outputs, over m ≥ n samples. Λ_ij^RMS allows the evaluation of the energy embedded within a time-alternating signal and establishes an activation threshold, as occurs in biologic neurons. The AND gates' analog outputs V_o thus facilitated the computation through latching and averaging of the 4 × 4 LDR array. V_AVG is calculated by averaging over the entire analog input span during one sampling cycle. Q may be viewed as a 1D vector q_i, with multiple rows reorganized into a single row. This vector was treated as a set-input to the ANN without any information loss, as shown in Equation (5).
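Since the body of Equation (1) is not reproduced here, the following is only an illustrative reconstruction of the described pipeline (RMS over sampling rounds, discrete Laplacian with cyclic boundary conditions, exponential activation with a pre-exponential factor a, and flattening into a set-vector). The exact functional form of the activation is an assumption of this sketch:

```python
import numpy as np

def input_vector(v_samples, a=1.0):
    """Illustrative sketch of the acquisition pipeline described in the text.
    v_samples: array of shape (n, 4, 4) holding n sampling rounds of the
    latched BL voltages V_o. The exponential (Arrhenius-like) activation
    below is an assumed form; the structure follows the description."""
    # RMS over the n sampling rounds gives the 4x4 "potential" matrix Lambda.
    lam = np.sqrt(np.mean(np.square(v_samples), axis=0))
    # Discrete Laplace operator with cyclic boundary conditions
    # (dx = dy = 1): sum of the four rolled neighbours minus 4x the centre.
    lap = (np.roll(lam, 1, axis=0) + np.roll(lam, -1, axis=0)
           + np.roll(lam, 1, axis=1) + np.roll(lam, -1, axis=1) - 4 * lam)
    # Exponential activation amplifies the "sources" and "sinks" map.
    q = a * np.exp(lap)
    # Reorganise the 4x4 matrix Q into a single-row set-vector for the ANN.
    return q.flatten()

samples = np.ones((8, 4, 4))  # eight identical toy sampling rounds
vec = input_vector(samples)
```

For a uniform input the Laplacian vanishes everywhere, so the activation is flat; gradients between neighboring LDRs are what drive the set-vector, consistent with the center-surround analogy above.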
The ANN will remain in a standby mode until the array is exposed to a predetermined, minimal amount of irradiation.
Once this limit is reached, the ANN is triggered to perform a calculation based on an analog input as detailed earlier. In this manner, the sensitivity of the system may be determined and modified in real time to cancel out any environmental bias. When in standby, the state of the array is periodically evaluated to determine whether the ANN should be triggered. During this process, the output of each AND gate is compared with V_TH and a binary digit (W_ij) is assigned to the output. These bits are added and, once the summation surpasses the sensory limit, a trigger occurs. For n = 1, Λ_ij^RMS in a single sampling cycle can be used to determine whether the array is exposed to illumination while overriding environmental effects. An illumination condition can thus be defined based on Λ_ij^RMS and the simulation-based threshold V_TH multiplied by an empirical factor b (V⁻¹). The sensory limit is compared with a summation over all matrix elements:

Σ_{i,j} W_ij ≥ sensory limit (7)

This approach may also be used to determine a saturation-dependent threshold and account for the sensory relaxation found in living organisms' retinas, by resetting the average levels after a predetermined time period during which no illumination was applied.

Figure 7. Demonstration of rotation invariance in pattern recognition for an ANN trained using an input-set corresponding to a fully aligned state. Correct pattern identification with ≈25° of rotation: a) All. b) Paper. c) Scissors. d) Stone. Correct pattern identification with ≈12° of rotation: e) All. f) Paper. g) Scissors. h) Stone. Correct pattern identification with zero rotation (aligned): i) All. j) Paper. k) Scissors. l) Stone.
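The standby/trigger logic can be sketched as below. The values of the empirical factor b and the sensory limit are placeholders, not the values used in the experiment:

```python
import numpy as np

def should_trigger(lam_rms, v_th=1.0, b=1.0, sensory_limit=3):
    """Sketch of the standby/trigger decision described above. `b` is the
    empirical factor (units 1/V) from the text; its value here is a guess.
    Each gate whose RMS output crosses b*V_TH contributes a binary W_ij,
    and the ANN is triggered once the sum of all W_ij reaches the sensory
    limit (Equation (7))."""
    w = (lam_rms >= b * v_th).astype(int)  # binary exposure map W_ij
    return int(w.sum()) >= sensory_limit

dark = np.full((4, 4), 0.05)               # all outputs near delta: stay in standby
lit = np.where(np.eye(4) > 0, 1.8, 0.05)   # four gates illuminated: trigger
```

Raising `sensory_limit` reduces the sensitivity and lowering it increases it, matching the tunable-sensitivity behavior described earlier.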

Paper Scissors Stone
The ANN was taught to correctly respond with a winning move in a game of PSS, as shown in Figure 5. Training the ANN to either play PSS or perform pattern recognition has essentially the same level of complexity, because the ANN's output is simply interchanged in response to a certain input. The purpose of this experiment was to highlight the versatility and flexibility of the artificial visual system presented in this work. The system was trained offline, and during the experiment it was able to counter the moves of a human player with a very high success rate (>95% over 50 rounds). The success rate may be improved even further by enlarging the training set, as will be shown in the next section.

Rotation-Invariant Pattern Recognition
This section summarizes the results of an image classification experiment based on misaligned placements between the vision system and the test-bench. Initially, the ANN was trained using a first input-set, up to a specific accuracy. The performance was then evaluated by counting the number of misidentified patterns from a series of random test images, for both angular and spatial misalignments. The analysis was then repeated for a second, larger training set, along with a higher accuracy. It should be emphasized that the input training sets used throughout this experiment were strictly based on an aligned situation (zero displacement). Correct identification under misplaced conditions would therefore indicate successful model generalization accounting for rotational transformations.
Figure 6a-c shows the definitions of misaligned placements: skewed and tilted placements along the y, x, and rotational axes, marked as dy, dx, and dt, respectively. For the first part of the experiment, a training set consisting of 45 input patterns was used. Figure 6d shows the cumulative training error as a function of time. Training was arrested once the error dropped below 10⁻³. It took roughly 120 s to complete the entire session, and the training consisted of about 35 epochs, containing ten iterations each. Figure 7 shows successful image recognition under different rotations. Figure 7a-d shows correct identification of All, Paper, Scissors, and Stone, respectively, for dt ≈ 25°. Figure 7e-h shows correct identification for dt ≈ 12°, and Figure 7i-l shows correct identification with dt ≈ 0°.
In the second part of this experiment, training was redone using a larger set containing 56 patterns. In addition, the exit-condition error was reduced to 0.5 × 10⁻³. The purpose was to evaluate the dependence of correct identification on training resolution and effort (i.e., larger sets imply larger efforts, and a smaller error implies higher resolution). As shown in Figure 8a, the training took ≈350 s (triple the effort). To evaluate the performance, 30 random images were tested in the first part and 50 in the second. A mismatch figure was calculated by taking the ratio of misidentified patterns over the total figure count. The resulting figures are shown in Figure 8b-d as a function of the misplacements dx, dy, and dt.
These plots show that the mismatch in the aligned cases (dx = dy = dt = 0) was below 5% for both experimental parts. This indicates that the size of the training set used in the first part is indeed sufficient to yield a low bias. The error bars in the plots correspond to a variance of ±1 in misidentification during the experiment. It is evident that increasing the set cardinality helped to reduce noise and improve performance, especially for the rotational misalignment (Figure 8d) and y-shifting (Figure 8c). As for misplacement along the x-axis, the larger cardinality helped to produce better results for a small displacement (the 2.5 mm point in Figure 8b). However, as the mismatch grew to 5 mm, the misidentification figure bounced back to over 50%. This should be regarded as an artifact caused by the physical construction of the LDR array prototype rather than an indication of a performance limit. It can be seen from Figure 3a that LDR devices were spaced roughly 5 mm apart in the x-direction (the board spacing is 2.54 mm and LDRs are placed two pitches apart). Once the LED-LDR overlapping field of vision was shifted by that amount, the LEDs were obscured from an entire column of LDRs, and the cyclic divergence calculation (Equation (1)) fell out of the generalization range of the model. By contrast, in the y-direction the LDRs were still sufficiently exposed even with a 5 mm shift, due to the smaller vertical spacing between LDRs.

Figure 8. Improved tolerance for spatial and rotational transformations after retraining using a larger dataset and higher accuracy (lower training error). Thirty input images were randomly presented during the first part of the experiment, increased to 50 in the second part (after retraining). a) Time evolution of the cumulative error after each training epoch for a training set of 56 patterns (the exit condition was defined as an error lower than 0.5 × 10⁻³). b) Mismatch ratio as a function of misalignment along the x-axis (note: the error bars correspond to a possible variation of ±1 in correct identification over the entire count). c) Mismatch ratio as a function of misalignment along the y-axis. d) Mismatch ratio as a function of angular misalignment.
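The mismatch figure and its ±1-count error bar reduce to a short helper; the function name is illustrative:

```python
def mismatch_ratio(misidentified, total):
    """Mismatch figure used above: misidentified patterns over the total
    count, with the +/-1-count variance quoted for the error bars."""
    ratio = misidentified / total
    err = 1 / total  # a +/-1 change in correct identification over the count
    return ratio, err
```

For example, one misidentification out of the 50 test images of the second part gives a 2% mismatch with a 2% error bar.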

Online Learning
The process of real-time, supervised learning implies a need to convey mistakes back to the ANN. Such errors may then be corrected on the fly, to tune and improve the overall performance. Here, we aimed to achieve classification by correctly labeling a growing training dataset while accounting for noisy inputs. Visual interaction was chosen as the main means of transmitting feedback to the ANN. This was done to keep the experimental methodology consistent and to avoid establishing new communication channels between the ANN and the test-bench (independent controllers). This approach served to eliminate any doubts regarding the methodology by which the ANN corrected itself. In other words, the ANN had no indication of the image generated by the test-bench controller apart from the visual error feedback. A human-machine handshake protocol was therefore defined in the following manner. Initially, the ANN was preset to recognize only the All pattern, and the training set (input vector bank) was kept empty. This pattern was then used to feed back mistakes to the ANN. Specifically, the shape All was used to indicate wrong responses, while the ANN was trained in real time to identify the other three patterns (PSS). The ANN was thus trained to perform image recognition. This experiment progressed in stages, where at each stage a random pattern was displayed over the LEDs by a human player. This pattern triggered the generation of an input vector to the ANN, as detailed in the previous sections. Each vector was either a new one or one already encountered. For a new vector, a random response was produced and the vector was added to the training set along with the response. For an existing vector, the ANN calculated the response based on its current state (i.e., weights). In either case, the human player then decided whether or not to indicate an error in the next move. If no error message was received, the state of the ANN was maintained.
Once an error was indicated, the ANN acknowledged it, and retraining was initiated by labeling this vector with a different random response. Each training session was terminated once the training error fell below 10⁻³ arbitrary units.
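One stage of the handshake protocol might be sketched as follows. This is a schematic illustration, not the authors' implementation: `_StubANN`, `handshake_round`, and the label names are assumptions, and the real system retrains by back-propagation until the error drops below 10⁻³:

```python
import random

class _StubANN:
    """Hypothetical stand-in for the real three-layer ANN (name assumed)."""
    def retrain(self, bank):
        pass  # real system: back-propagation over the bank until error < 1e-3

def handshake_round(ann, bank, vector, error_flagged,
                    labels=("Paper", "Scissors", "Stone")):
    """One stage of the human-machine handshake described above. A new
    vector gets a random label and joins the training bank; an error
    flagged by the player (the All pattern) relabels the vector and
    triggers retraining; otherwise the ANN's state is maintained."""
    key = tuple(vector)
    if key not in bank:                   # never-seen input: random response
        bank[key] = random.choice(labels)
    elif error_flagged:                   # player showed All: wrong answer
        wrong = bank[key]
        bank[key] = random.choice([l for l in labels if l != wrong])
        ann.retrain(bank)                 # retrain on the relabeled bank
    return bank[key]

bank = {}
first = handshake_round(_StubANN(), bank, (1, 0, 1), error_flagged=False)
second = handshake_round(_StubANN(), bank, (1, 0, 1), error_flagged=True)
```

Because a flagged error always forces a label different from the previous one, repeated rounds on the same vector converge to the correct response once the player stops signaling errors.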
Keeping these constraints in mind, no matter which approach was chosen to indicate errors, it is essential that the learning process converge to a successful outcome. For this purpose, a success rate (0-1) was defined and monitored throughout the training process. The reader is advised to consult the Supporting Information for more details. Once this rate stabilized near 1, the ANN was considered trained. The evolution of the learning process is shown in Figure 9, along with the success rate and timing characteristics. It is evident from the figure that the ANN successfully reached a trained state with high success after ≈800 s. As mentioned earlier, the training set was empty at the beginning of the experiment. Therefore, as shown in Figure 9a, the success rate was very low because virtually all the encountered vectors were new to the ANN. As training progressed and the pattern count increased, the success rate increased and fluctuated. Bumps were caused by the appearance of new patterns and mistaken decisions made by the ANN. However, the system was able to eventually converge to a high rate, indicating successful training. Further support for this conclusion is given by the overall training time shown in Figure 9b and the training time for each pattern in Figure 9c. The bars in the plots indicate the measured time it took to complete each retraining session. Contrary to the notion that the training process would lengthen as more patterns were successively added to the set, the overall time decreased, along with the time per pattern. This supports the assumption that a converging learning process took place.

Conclusions
In summary, this work presented an artificially intelligent, bioinspired robotic vision system with rotation-invariant properties. The concept was demonstrated using off-the-shelf components to fabricate an ANN-based system that was taught both offline and in real time to perform pattern recognition tasks. The architecture was based on optoelectronic AND gates, treated as building blocks to implement a sensory array. Analog voltage levels, produced by said gates in response to an illuminated pattern, are directly correlated to the pixel information of this image. In this manner, image data were collected on a row-by-row basis to construct an input set-vector to an ANN, which, in turn, calculated a decision according to predetermined criteria.
First, the architecture and operation principles were discussed along with the experimental setup and test procedures. This was followed by a functionality demonstration and performance analysis for a game of PSS. During this experiment, the ANN was trained to successfully identify patterns projected over the vision sensor by a white-light LED array construction. These projected images were based on moves made by a human player. The system then countered those plays with a move of its own, with a high success rate. Tolerance to spatial translation and rotation was also studied after training the ANN with a dataset that corresponded to an aligned orientation. The misidentification error during pattern recognition was then characterized for various degrees of misalignment. Finally, the process of online supervised learning was shown through real-time pattern recognition. The concepts presented in this work could help pave the way toward the implementation of light-responsive gate arrays for future bioinspired, intelligent vision platforms.

Experimental Section
Image pattern generation was done over an 8 × 8 white LED array, each LED having an irradiation power density of about 20 mW cm⁻². The array was operated by selecting designated BLs and cycling the rows to a high logic level. The WLs and BLs were driven by the pads of an ESP32 control module. Player instructions were supplied by the user to the controller over Bluetooth communication using an Android gamepad application. Each button on the gamepad was therefore configured to produce a different image over the LED array.
The 4 × 4 light-sensor array was fabricated with off-the-shelf light-dependent resistors, each connected to a VN0104 selection device. The array's bit-lines were terminated with four 3296 potentiometers, each set to a resistance of ≈1 kΩ, to form a logic AND function. Analog voltage levels at the gates' outputs were buffered through an LM324 operational-amplifier integrated circuit and converted to digital levels by four ADC channels of a second ESP32 module. The deep layers of the ANN and the system controller were implemented on this module as well. ANN decisions were displayed on a 0.91 inch, 128 × 32 monochrome unit over the I²C communication protocol.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.