FPGACam: An FPGA-based efficient camera interfacing architecture for real-time video processing

In most real-time video processing applications, cameras capture live video which embedded systems/Field Programmable Gate Arrays (FPGAs) process and convert into a format supported by display devices. In such cases, the interface between the camera and the display device plays a vital role in the quality of the captured and displayed video. In this paper, we propose an efficient FPGA-based low-cost Complementary Metal Oxide Semiconductor (CMOS) camera interfacing architecture for live video streaming and processing applications. The novelty of our work is the design of optimised architectures for the Controller, Converter, and several interfacing blocks to extract and process video frames efficiently in real time. The flexibility of parallelism has been exploited in the design of the Image Capture and Video Graphics Array (VGA) Generator blocks. The Display Data Channel Conversion block required for VGA to High Definition Multimedia Interface (HDMI) conversion has been modified to suit our objective by using an optimised Finite State Machine and a Transition Minimised Differential Signalling Encoder built from simple logic architectures. The hardware utilization of the entire architecture is compared with the existing one, which shows that the proposed architecture requires fewer hardware resources.


| INTRODUCTION
Vision is one of the most prominent human senses [1], due to which real-time vision-based systems, or parts of such systems, are commonly used in various real-time applications. In general, any real-time video/image processing technique can be split into four main processing blocks, namely Sensor, Memory, Processing Unit, and Display [1,2]. (i) Sensor: captures video sequences from the external environment and transforms them into corresponding electrical signals suitable for further processing. (ii) Memory: internal RAM where the captured video sequences are stored temporarily for further processing. This block also helps to synchronize data between the sensor and the processing unit when the two blocks operate at different frequencies. (iii) Processing Unit: the section where the required processing algorithm/architecture is implemented. This block accepts data from Memory. (iv) Display: accepts the processed data and converts it into the format required by the display device.
In this paper, we propose a new Very Large Scale Integration (VLSI) architecture to efficiently interface a low-cost Complementary Metal Oxide Semiconductor (CMOS) camera and a display device, together with processing elements, to an FPGA board, and to display the video directly on a display device such as a monitor or TV. The entire architecture is optimised for lower hardware utilization without affecting architectural accuracy. It is implemented using the Vivado 2018.3 tool, with the coding performed in the standard Very High-Speed Integrated Circuit Hardware Description Language (VHDL) [3]. The architecture is synthesized and tested in real time on a Digilent NexysVideo (xc7a200t-1sbg484c) FPGA board [4] and a Zybo Z7-10 (xc7z010-1clg400c) FPGA board [5] separately.

| Contributions
The novel concepts of this paper are listed as follows:

| RELATED WORKS
Normally, embedded systems with hardware/software co-simulation techniques are widely used to implement video processing systems due to their ease of implementation. Said et al. [6] proposed a video interface technique where the Xilinx EDK tool is used to interface a MicroBlaze embedded processor, and the architecture is implemented on the processor using embedded C. This system uses a Micron MT9V022 VGA camera to capture video at 60 frames per second (fps), which is then displayed using standard DVI interfaces. This architecture requires a large amount of hardware resources, and its overall operating speed is low. A similar video capture technique is presented by Abdaoui et al. [7], implemented on a Virtex-5 FPGA with a co-processor to control the overall operation. This approach increases the area requirements and decreases the overall operating frequency. Biren and Berry [8] presented a new camera interfacing architecture implemented on an Altera Cyclone-III FPGA using different IP cores provided by the FPGA manufacturer. The use of IP cores increases the total hardware utilization of the architecture. Along with camera interfaces, many processing algorithms have been implemented for real-time video processing. Stereo-vision-based video rectification is presented by Maldeniya et al. [9], where a dual camera is used to capture stereo images, interfaced with a Spartan-3E FPGA through the 100Base-T protocol using the embedded processor available in the Xilinx EDK tool. Similarly, real-time motion tracking is presented by Mosqueron et al. [10], where motion is detected from real-time video captured by a camera through an FPGA using embedded processing. The disadvantage of this architecture is its low overall frame rate. It is necessary to understand sensor design to interface a camera properly with any processing element. Zhao et al. [11] present a 64 × 64 array image sensor architecture designed in the UMC 0.18 μm technology with different user-defined operating modes depending on the application. The circuit used to read the sensor captures the rows of the array sequentially and generates an analog voltage, which is then digitized by an on-chip Analog to Digital Converter. This architecture is able to produce images of 64 × 64 resolution at 100 fps.

| PROPOSED ARCHITECTURE
The Digilent NexysVideo [4] and Zybo Z7-10 [5] FPGA boards are used separately to interface the OmniVision OV7670 [12] and OV9655 [13] cameras, which use the two-wire SCCB interface [14] for initializing their internal registers to generate specific user-defined video formats. The proposed architecture used to interface the camera with the FPGA is shown in Figure 1, where the camera is configured through the SCCB [14] protocol. The generated video is in the RGB565 [17] format, and for proper synchronization between the camera and the designed blocks, the FPGA should start accepting pixel values from the camera serially through the PMOD ports [18] only after the camera configuration through register setting has completed. The Optimised Image Capture block converts the camera output into the proper serial format using the corresponding synchronization signals (VSYNC and HSYNC) of the camera, which is then converted to the RGB444 [17] format by the Optimised RGB565 to RGB444 Conversion block. These converted pixel values are then temporarily stored in the Memory [19] block, which helps in synchronizing the pixel values between blocks operating at different frequencies [20]. After one row has been written into the Memory, the Optimised VGA Signal Generator block reads the stored pixel data through the Optimised RGB444 to RGB565 Conversion block and converts this data into the corresponding VGA signal with proper synchronization signals (i.e., vsync and hsync). Depending on the Mode switch value, the color or gray pixel format is selected by the Optimised Color to Gray Conversion and MUX blocks, and the selected stream is then used to generate the corresponding HDMI format by the Optimised VGA to HDMI Conversion block. The VGA and HDMI signals are connected to the inputs of the Port Selection block, which selects one particular display-formatted signal depending on the port_select value; its output is connected to a display device such as a monitor/TV.
In this architecture, both VGA and HDMI signals are used, as some FPGA boards provide a VGA port for display while others provide an HDMI port. As a result, with small user-defined modifications, our proposed architecture can be implemented on almost any FPGA board.

| Optimised Controller
The Optimised Controller unit is used to program the internal registers of the camera, depending upon the corresponding datasheets [12,13], to set the proper operating mode. For this implementation, an image size of 640 × 480 at 30 fps is considered, whose pixel values are represented in the RGB565 format [17]. The architecture of the Optimised Controller is shown in Figure 2, which consists of the Register Sets and Modified SCCB Format Generation blocks.

| Register sets
The specifications of the video streams generated by both cameras (OV7670 and OV9655), such as video size, color specifications, and frame rate, can be set by the designer depending on the application requirements with the help of the corresponding datasheets [12,13]. To generate a good-quality video, it is essential to assign the proper values to the corresponding addresses in the proper sequence; any mistake in assigning these register addresses and values will certainly affect the quality of the generated video streams. So, the correct register addresses and values are stored in the Register Sets block, which is then used by the architecture to specify the correct video format.

| Modified SCCB format generation
The OmniVision cameras use the SCCB interfacing [14] protocol to access their internal registers, which define the video specifications generated by the camera. SCCB stands for Serial Camera Control Bus and is a simplified version of the Philips Inter-Integrated Circuit (I²C) [21] protocol.
The Modified SCCB Format Generation block fetches the address and data from the Register Sets block and converts them into the corresponding SCCB format values using a serialized data transfer technique. The pseudocode used to implement this block is given in Algorithm 1.
From Algorithm 1, it can be seen that the whole block is designed using counters, comparators, and basic logical components. This algorithm is designed to optimize hardware parameters without affecting functionality. Extra signals (send and taken) are included for proper synchronization between the two corresponding blocks. Similarly, the signal id defines the operating mode of the camera registers (i.e., read or write) by taking a specific value from the corresponding camera datasheet [12,13]. Most existing two-wire I²C or SCCB interface architectures use the FSM model [22] or interdependent counters [23] to perform serialization, which increases the design complexity and hardware requirements.
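The serialization performed by this block can be sketched in software. The following Python model is an illustration only (not the paper's VHDL); the 0x42 write ID and the COM7 register reset value are taken from the OV7670 datasheet as we understand it.

```python
def sccb_write_bits(dev_id, reg_addr, value):
    """Serialize one 3-phase SCCB write transaction (device ID, register
    address, data byte) into the bit sequence driven on SIO_D, MSB first.
    Each byte is followed by a ninth don't-care bit ('X'), which occupies
    the slot used for ACK in I2C; start/stop conditions are generated at
    the wire level and are not modelled here."""
    frame = []
    for byte in (dev_id, reg_addr, value):
        frame.extend((byte >> i) & 1 for i in range(7, -1, -1))  # MSB first
        frame.append('X')  # SCCB ninth bit: don't-care
    return frame

# Example: write 0x80 (register reset) to register 0x12 (COM7)
# of an OV7670, whose SCCB write ID is 0x42
bits = sccb_write_bits(0x42, 0x12, 0x80)
```

Each transaction is therefore 27 bit slots long (three 9-bit phases), which is the sequence the Counters and Comparators of Algorithm 1 step through.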

| Optimised image capture
For proper synchronization between the different pixels of the video frames, the pixel data must be captured according to the various synchronization signals from the camera, that is, PCLK, HREF, and VSYNC. The main task of this block is to store pixel data into Memory using these signals, generating the proper address (wr_addr) along with the pixel values (data_out) and the write enable (wr) signal. The block diagram of the Optimised Image Capture block is shown in Figure 3.
The proposed architecture captures the pixel values from the camera and stores them in the Memory block depending on the VSYNC and HREF signals, which are checked by Comparator_1 and Comparator_2, respectively. These blocks generate the intermediate reset signal rst_temp and the enable signal en, which are used to reset the entire architecture and to hold the previous values, respectively. The rst_temp and en signals are made high by Comparator_1 and Comparator_2 only when HREF = 1 and VSYNC = 0. In this situation, one pixel value is sent by the camera periodically on every second, non-overlapping PCLK pulse. To store the pixel values, a feedback path is formed by interconnecting the AND Gate, Merger_1, DFF_1, and Concatenation blocks, which tracks every second PCLK under valid conditions and, by making wr = 1, forces the Memory block into write mode.
Simultaneously, DFF_2, Merger_2, and DFF_3 are used to merge the incoming data from the camera at each rising edge of PCLK to generate video pixel values in the RGB565 format [17].
The Counter and DFF_4 blocks generate the accurate write address values (wr_addr) to store the pixel values in the correct predefined locations. This architecture uses basic gates to reduce hardware complexity and logic utilization [3].
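The byte-merging performed by the Merger/DFF chain can be modelled in a few lines of Python (an illustration, not the paper's hardware; the high-byte-first ordering is the OV7670's documented RGB565 output order).

```python
def assemble_rgb565(byte_stream):
    """Merge successive byte pairs from the camera's 8-bit bus into 16-bit
    RGB565 pixels: the first byte of each pair carries R[4:0]G[5:3] and the
    second carries G[2:0]B[4:0]."""
    return [(hi << 8) | lo
            for hi, lo in zip(byte_stream[0::2], byte_stream[1::2])]

# Pure red, pure green, pure blue pixels as they arrive on the bus
pixels = assemble_rgb565([0xF8, 0x00, 0x07, 0xE0, 0x00, 0x1F])
```

This is why one pixel is completed only on every second PCLK: two bus bytes make one RGB565 value.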

| Colour plane conversions
The OV7670 [12] and OV9655 [13] cameras support the RGB565, RGB555, RGB444, Raw Bayer, and Processed Bayer colour formats, but the RGB565 format [17] is chosen to generate video frames from these cameras [12,13]. This is because it is closest to the RGB888 color space generally used by most modern display devices; as a result, the errors generated during the conversion (i.e., RGB565 to RGB888) by the display devices will be very small. However, temporarily storing the frame pixels in the Memory unit in this format requires a large amount of memory, which is not available in most low- and medium-range FPGAs. So, the RGB565 format is converted into the RGB444 format, reducing the memory requirements by 25%. But the RGB444 format does not produce good-quality color pixels and is not suitable for many applications. So, it is necessary to convert the stored RGB444 format back into the corresponding RGB565 format.

| Optimised RGB565 to RGB444 conversion
The conversion equation [1] is given in Equation (1). From Equation (1), it can be seen that multipliers and dividers are required for implementation, which increases the hardware requirements [24]. To overcome this problem, the multipliers and dividers are replaced by the corresponding shifter and adder blocks [20], as given in Equation (2), where LS_n denotes a left shift by n bit positions.
The architecture used to implement this conversion is shown in Figure 4, where the Q-format [25] is used to preserve data accuracy by keeping a fractional part in the intermediate stage, while the input and output signals remain in normal binary format. The Concatenation blocks on the input side separate the different color components (red, green, and blue) from the pixel value, and the required number of 0s is padded on the MSB side to make all three color planes 16-bit signals. Using shifters and adders, the intermediate values are generated and then concatenated to form the individual color planes in the RGB444 format. These planes are finally merged by the Merger block to generate the corresponding RGB444-formatted pixel value.
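Since the exact Equations (1) and (2) are not reproduced here, the following Python sketch illustrates the idea with a shift-only approximation (an assumption: it drops the Q-format fractional intermediates the paper uses, so it is cruder than the actual design).

```python
def rgb565_to_rgb444(pixel):
    """Convert a 16-bit RGB565 pixel to 12-bit RGB444 using only shifts,
    in the spirit of Equation (2). This drops the LSBs of each plane;
    the paper's Q-format intermediate scaling is not reproduced here."""
    r5 = (pixel >> 11) & 0x1F   # 5-bit red
    g6 = (pixel >> 5) & 0x3F    # 6-bit green
    b5 = pixel & 0x1F           # 5-bit blue
    r4, g4, b4 = r5 >> 1, g6 >> 2, b5 >> 1  # truncate to 4 bits per plane
    return (r4 << 8) | (g4 << 4) | b4
```

For example, full-scale white 0xFFFF maps to 0xFFF, and a pure-red 0xF800 maps to 0xF00; the memory saving is exactly the 16-to-12-bit reduction described in the text.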
The equation used to convert the stored RGB444 format into the corresponding RGB565 format [1] is given in Equation (3).
The hardware utilization, which is mainly due to the multipliers, is reduced by replacing them with shifters and adders [20]. As a result, Equation (3) is modified into Equation (4), where RS_n denotes a right shift by n bit positions.
The hardware architecture of the Optimised RGB444 to RGB565 Conversion block is designed from Equation (4) in the same way as the Optimised RGB565 to RGB444 Conversion block is designed from Equation (2).
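As an illustration of replacing scaling multipliers with shift-and-OR logic (an assumption: bit replication is a common substitute, not necessarily the paper's exact Equation (4) coefficients), the reverse conversion can be sketched as:

```python
def rgb444_to_rgb565(pixel):
    """Expand a 12-bit RGB444 pixel back to 16-bit RGB565 by bit
    replication: the top bits of each 4-bit plane are copied into the
    new low-order bits, so only shifts and ORs are needed."""
    r4 = (pixel >> 8) & 0xF
    g4 = (pixel >> 4) & 0xF
    b4 = pixel & 0xF
    r5 = (r4 << 1) | (r4 >> 3)   # replicate MSB into the new LSB
    g6 = (g4 << 2) | (g4 >> 2)   # replicate the top two bits
    b5 = (b4 << 1) | (b4 >> 3)
    return (r5 << 11) | (g6 << 5) | b5
```

Bit replication has the useful property that full-scale values stay full-scale (0xFFF expands to 0xFFFF rather than 0xFFDF), which truncation-only expansion would not guarantee.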

| Optimised VGA Signal Generator
The converted pixel values from the Optimised RGB444 to RGB565 Conversion block are used by the Optimised VGA Signal Generator block to generate the corresponding VGA signal. Reading the pixel values from the Memory block starts only after one frame of the video sequence has been stored. This avoids synchronization problems between Memory and the remaining processing blocks.
The CMOS camera [12,13] is programmed to generate video of 640 × 480 resolution, so the locally generated VGA signals must have the same resolution to regenerate the same video. Standard parameters [26] are used to generate VGA signals of 640 × 480 resolution, and a 25.175 MHz pixel clock is used to generate the local VGA signal to maintain compatibility with the camera modules. The proposed architecture used to generate the local VGA signal is given in Figure 5, where the constant values [27] are calculated using Equation (5) to Equation (12). This architecture requires fewer hardware resources than the existing one [28] due to the use of basic logic elements in an optimised way.
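The counter-and-comparator structure of Figure 5 can be modelled with the standard 640 × 480 @ 60 Hz timing parameters [26] (the code below is a software illustration of that timing, not the paper's VHDL):

```python
# Standard 640x480 @ 60 Hz VGA timing with a 25.175 MHz pixel clock
H_VISIBLE, H_FP, H_SYNC, H_BP = 640, 16, 96, 48   # pixels per line
V_VISIBLE, V_FP, V_SYNC, V_BP = 480, 10, 2, 33    # lines per frame
H_TOTAL = H_VISIBLE + H_FP + H_SYNC + H_BP        # 800 pixel clocks/line
V_TOTAL = V_VISIBLE + V_FP + V_SYNC + V_BP        # 525 lines/frame

def vga_signals(h, v):
    """Return (hsync, vsync, active) for pixel counter h and line counter v.
    Sync pulses are active low, as in the standard 640x480 timing."""
    hsync = 0 if H_VISIBLE + H_FP <= h < H_VISIBLE + H_FP + H_SYNC else 1
    vsync = 0 if V_VISIBLE + V_FP <= v < V_VISIBLE + V_FP + V_SYNC else 1
    active = (h < H_VISIBLE) and (v < V_VISIBLE)
    return hsync, vsync, active
```

In hardware, h and v are the two cascaded counters, and each comparison above is one comparator against a stored constant, which is why the block reduces to basic logic elements; 25,175,000 / (800 × 525) gives the familiar 59.94 Hz refresh rate.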

| Optimised Color to Gray Conversion
In many video processing applications, color images are converted into grey scale to reduce the processing complexity. The equation for this conversion is derived using the averaging method [1] and is given in Equation (13). The division is then replaced by shift and add operations in Equation (14) to generate an efficient hardware architecture [20]. The hardware architecture of the Optimised Color to Gray Conversion block is designed from Equation (14).
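A minimal software sketch of this shift-and-add substitution follows; the particular decomposition 1/3 ≈ 1/4 + 1/16 + 1/64 is an assumption for illustration, since the exact terms of Equation (14) are not reproduced in this text.

```python
def rgb_to_gray(r, g, b):
    """Approximate gray = (r + g + b) / 3 with shifts and adds, mirroring
    the move from Equation (13) to Equation (14). The divisor 1/3 is
    approximated as 1/4 + 1/16 + 1/64 ~= 0.3281, so only three shifters
    and two adders are needed in hardware."""
    s = r + g + b
    return (s >> 2) + (s >> 4) + (s >> 6)
```

With this decomposition the result is always slightly below the true average (a few LSBs at full scale), the kind of small truncation error the paper accepts in exchange for removing the divider.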

| Optimised VGA to HDMI conversion
Display devices such as TVs and monitors most commonly use interfacing standards such as VGA, HDMI, and DVI. Among these, the HDMI standard is the most commonly used in newly manufactured display devices because it supports uncompressed, high-quality audio and video interfaces through a single cable [29].
The generalized architecture of the VGA to HDMI Conversion block [30] consists of the EDID ROM, DDC Format Conversion, and TMDS Encoder blocks described in the following subsections.

| EDID ROM
In any HDMI protocol, the operational characteristics of the video, such as resolution, frame rate, and version, must be exchanged between the source and the sink at the beginning for proper synchronization. These values are normally constant for a specific resolution [32]. The EDID values corresponding to the 640 × 480 resolution are stored in the EDID ROM block in the proper order for further use. The standard EDID file structure is considered for this implementation. EDID versions 1.3 and above use a total of 256 bytes to define the EDID structure; in such cases, the Extension Flag part is defined by 128 bytes. For our implementation, the CEA-861 standard [33] is used to define the Extension Flag field.
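One detail worth noting when filling the EDID ROM is the checksum rule from the EDID standard: every 128-byte block must sum to zero modulo 256. A small Python helper illustrates it (the zero padding after the fixed header is for demonstration only, not real EDID content):

```python
def edid_checksum(first_127_bytes):
    """Compute the 128th byte of an EDID block: the checksum byte chosen so
    that all 128 bytes sum to 0 modulo 256, as the EDID standard requires."""
    return (-sum(first_127_bytes)) % 256

# Build a demo block: the fixed 8-byte EDID header, zero-padded body
block = [0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00] + [0] * 119
block.append(edid_checksum(block))
```

A sink that reads a block failing this check will reject the whole EDID, so the stored ROM contents must include correct checksums for both the base block and the CEA-861 extension block.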

| Modified DDC Format Conversion
The HDMI standard exchanges the EDID ROM data between its source and sink using the DDC protocol [33], which is a standard serial signalling protocol. It is almost similar to the Philips I²C [21] protocol and consists of three wires: Serial Data (SDA), Serial Clock (SCL), and a high logic level (+5 V). The DDC format [33] uses the Inter-Integrated Circuit (I²C) [21] specifications to transfer the TMDS-encoded data. To design the proposed DDC protocol, the NXP UM10204 I²C bus specifications [35] for single-master buses are considered; these specifications mainly give the details of the Start, Stop, and Acknowledgement signals for proper communication. To implement this, the FSM model shown in Figure 6 is used. The proposed state machine uses 8-bit data and addressing bits. Upon start-up, the state machine immediately enters the Ready state and stays there until Send = 1 and Restart = 0, which makes the machine go to the Start state, where the proper start conditions for data transfer are generated. The machine then goes to the Address state, which fetches the 8-bit address from the EDID ROM and serializes it using the PISO architecture [36], tracked by the bit_counter variable. The machine stays in this state until bit_counter = 0, at which point it reaches the Slave ACK1 state, where it waits for the acknowledgement from the slave device. If an acknowledgement is received, the machine performs a similar operation using the data from the EDID ROM; otherwise, it starts sending the address values once again. Once Slave ACK2 is received by the sender, the address value is incremented by '1' and the machine goes back to the Start state again.
In this way, when Slave ACK2 is received for the address_max value, the machine enters the Stop state and remains there until the process is restarted by Restart = 1, which forces the machine back to the Ready state to perform the entire operation once again.
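The transition logic described above can be summarized in a small behavioural model of Figure 6's FSM. This Python sketch is simplified (the exact NACK recovery path, returning directly to the Address state, is our assumption; the text only says the address is sent again):

```python
# Behavioural sketch of the modified DDC transfer FSM (Figure 6, simplified)
READY, START, ADDRESS, SLAVE_ACK1, DATA, SLAVE_ACK2, STOP = range(7)

def next_state(state, send=0, restart=0, bit_counter=0, ack=0, at_addr_max=0):
    if state == READY:
        return START if (send == 1 and restart == 0) else READY
    if state == START:
        return ADDRESS                         # start condition generated
    if state == ADDRESS:
        return SLAVE_ACK1 if bit_counter == 0 else ADDRESS
    if state == SLAVE_ACK1:
        return DATA if ack else ADDRESS        # resend address on missing ACK
    if state == DATA:
        return SLAVE_ACK2 if bit_counter == 0 else DATA
    if state == SLAVE_ACK2:
        return STOP if at_addr_max else START  # next EDID address, or finish
    if state == STOP:
        return READY if restart == 1 else STOP
    raise ValueError(f"unknown state {state}")
```

In hardware, each branch of this function is one comparator on the current-state register, which is what keeps the modified FSM cheap compared with a generic I²C master.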

| Modified TMDS encoder
Transition Minimized Differential Signaling (TMDS) is used to encode data at very high speed for various video interfaces. It uses a form of 8b/10b encoding [37] that reduces electromagnetic interference to provide faster signal transmission with reduced noise [38]. The TMDS encoder encodes the input data and sends it serially at high speed, minimizing transitions while retaining enough transitions for clock recovery. The process also keeps the numbers of 1s and 0s on the line nearly equal to improve the noise margin. The algorithm used to implement the Modified TMDS Encoding is given in Algorithm 2. The Modified TMDS Encoder unit is designed using basic gates and flip-flops with simple interconnections between them to produce a more optimised hardware architecture than the existing one [39].
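For reference, the two stages of the standard TMDS data encoding (transition minimization, then DC balancing via conditional inversion) can be modelled in Python. This follows the published DVI 1.0 algorithm rather than the paper's Algorithm 2, whose exact optimisations are not reproduced here; a decoder is included so the model can be checked by round-trip.

```python
def tmds_encode(d, cnt):
    """Encode one 8-bit byte into a 10-bit TMDS symbol (DVI 1.0 algorithm).
    cnt is the running disparity (ones minus zeros sent so far);
    returns (symbol, updated disparity)."""
    bits = [(d >> i) & 1 for i in range(8)]
    n1 = sum(bits)
    # Stage 1: transition minimization via an XOR or XNOR chain
    use_xnor = n1 > 4 or (n1 == 4 and bits[0] == 0)
    q_m = [bits[0]]
    for i in range(1, 8):
        q_m.append(1 - (q_m[-1] ^ bits[i]) if use_xnor else q_m[-1] ^ bits[i])
    q_m.append(0 if use_xnor else 1)           # q_m[8]: records the method
    n1q = sum(q_m[:8])
    n0q = 8 - n1q
    # Stage 2: DC balancing by conditionally inverting the 8 data bits
    if cnt == 0 or n1q == n0q:
        inv = q_m[8] == 0
        new_cnt = cnt + (n1q - n0q if q_m[8] else n0q - n1q)
    elif (cnt > 0 and n1q > n0q) or (cnt < 0 and n0q > n1q):
        inv = True
        new_cnt = cnt + 2 * q_m[8] + (n0q - n1q)
    else:
        inv = False
        new_cnt = cnt - 2 * (1 - q_m[8]) + (n1q - n0q)
    data = [1 - b for b in q_m[:8]] if inv else q_m[:8]
    symbol = (int(inv) << 9) | (q_m[8] << 8) | sum(b << i for i, b in enumerate(data))
    return symbol, new_cnt

def tmds_decode(symbol):
    """Invert tmds_encode: undo the conditional inversion (bit 9), then undo
    the XOR/XNOR chain selected by bit 8."""
    data = [(symbol >> i) & 1 for i in range(8)]
    if (symbol >> 9) & 1:
        data = [1 - b for b in data]
    xor_used = (symbol >> 8) & 1
    out = [data[0]]
    for i in range(1, 8):
        out.append(data[i] ^ data[i - 1] if xor_used else 1 - (data[i] ^ data[i - 1]))
    return sum(b << i for i, b in enumerate(out))
```

Because every operation here is a popcount comparison, an XOR/XNOR chain, and a conditional inversion, the whole encoder maps naturally onto the gates, comparators, adders, and subtractors mentioned in the text.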

| FPGA IMPLEMENTATION
The proposed architecture is coded in the standard VHDL language [3], synthesized, and implemented on the Digilent NexysVideo (xc7a200t-1sbg484c) [4] and Zybo Z7-10 (xc7z010-1clg400c) [5] FPGA boards separately through the bit file generated by the Xilinx Vivado 2018.3 tool, with the ports assigned in a .xdc file. The schematic diagram of the proposed architecture generated after the post-implementation step is shown in Figure 7 for the NexysVideo FPGA board.
The hardware utilization of the proposed interfacing architecture after the post-implementation stage, including most of the internal blocks, is given in Table 1. In both cases, the hardware utilization and power requirements of the entire camera interfacing architecture are less than the sum of those of all the individual internal components, which is mainly due to the Balanced Synthesis and Optimization [16] operation of the Xilinx Vivado tool.

| REAL TIME IMPLEMENTATION
The experimental setup using the proposed camera interfacing architecture is shown in Figure 8, where the OV7670 camera is mounted on a stand for better focusing and is connected to a Digilent NexysVideo FPGA board through the PMOD ports using general-purpose jumper wires. The architecture programs the camera to the correct mode and accepts the video sequences, which are processed and sent to the available HDMI/VGA port for display on the connected display device.
It is also possible to convert this architecture into a real-time product by replacing the general-purpose jumper wires used in Figure 8 with a custom-designed Printed Circuit Board (PCB). The architecture of the custom-designed PCB is shown in Figure 9; it is a simple two-layer PCB that provides a proper connection between the camera and the FPGA module.

| PERFORMANCE EVALUATION
The performance of the proposed architecture is compared with various existing architectures and techniques with respect to data accuracy, board compatibility, cost, and hardware utilization to demonstrate the superiority of the proposed architecture.

| Data accuracy
To check the accuracy of the Optimised RGB565 to RGB444 Conversion block, the full range of RGB565 values is considered for each color plane separately, and the corresponding RGB444 values are calculated using Equation (1). The same inputs are fed to the proposed conversion block, and the corresponding |Error| is calculated [40] separately for each color plane using Equation (15), where |Error| is the error of the proposed architecture, A is the actual value calculated from the conversion equation, and B is the output value from the proposed architecture.
The |Error| is calculated for both conversion processes (i.e., RGB565 to RGB444 and RGB444 to RGB565) over the corresponding full range of RGB values for the different color planes, where only integer values are considered. The |Error| in both conversion processes is always less than '1' due to the use of the Q-format for the intermediate calculations. As a result, when both conversion blocks are cascaded for a double conversion, by design the |Error| in the final result must also be equal to or less than '1'. These small errors are introduced by the truncation errors [3,24] that occur in binary arithmetic calculations.
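An exhaustive harness in the style of Equation (15) is easy to write. The converters below are the illustrative shift-only/bit-replication sketches (our assumption), not the paper's Q-format design, so their round-trip error is larger than the ≤ 1 the paper reports; the harness nonetheless shows how |Error| = |A − B| is swept over the full range per colour plane.

```python
def rgb565_to_rgb444(pixel):
    # shift-only sketch of the forward conversion (Q-format omitted)
    r5, g6, b5 = (pixel >> 11) & 0x1F, (pixel >> 5) & 0x3F, pixel & 0x1F
    return ((r5 >> 1) << 8) | ((g6 >> 2) << 4) | (b5 >> 1)

def rgb444_to_rgb565(pixel):
    # bit-replication sketch of the reverse conversion
    r4, g4, b4 = (pixel >> 8) & 0xF, (pixel >> 4) & 0xF, pixel & 0xF
    return ((((r4 << 1) | (r4 >> 3)) << 11) |
            (((g4 << 2) | (g4 >> 2)) << 5) |
            ((b4 << 1) | (b4 >> 3)))

# Worst-case |Error| = |A - B| per colour plane (R, G, B) over the full
# RGB565 range after a 565 -> 444 -> 565 round trip
max_err = [0, 0, 0]
for p in range(1 << 16):
    q = rgb444_to_rgb565(rgb565_to_rgb444(p))
    for i, (shift, mask) in enumerate(((11, 0x1F), (5, 0x3F), (0, 0x1F))):
        max_err[i] = max(max_err[i], abs(((p >> shift) & mask) - ((q >> shift) & mask)))
```

For these sketch converters the worst-case error is 1 LSB on the 5-bit red and blue planes but 3 LSBs on the 6-bit green plane, which illustrates why the paper's Q-format intermediates are needed to hold the cascaded |Error| at or below 1.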

| Board compatibility and approximated costs
FPGAs manufactured by Xilinx and Altera are most commonly used by PCB manufacturing companies to make FPGA boards [41]. These board manufacturers supply many extra add-on components as accessories for some specific FPGA boards to perform selected operations such as video and audio processing. Along with these accessories, interfacing programs are also supplied by the manufacturer in the form of IP cores. Due to various issues such as compatibility, availability, and ease of implementation, in most cases those IP cores are developed through embedded programming with the help of supporting embedded platforms, which are available on high-end FPGA boards only. On the other hand, the proposed architecture is designed through optimised architectures coded in the standard VHDL language [3] with a minimal number of IP cores [15,19,34] and connected to the FPGA board through two PMOD ports [18], which allows the architecture to be implemented on most available FPGA boards.
The board support comparison of the proposed and existing camera interface architectures is given in Table 2, along with the corresponding costs, where the cost of the required peripherals (i.e., the stand and the connecting PCB) is considered along with the camera cost. From Table 2, it can be seen that the camera interfaces [42-45] developed by the corresponding manufacturers support only a few of their own boards, whereas the proposed architecture supports any FPGA board with minimal interfacing peripherals. To prove this, the proposed architecture is implemented on two FPGA boards of different levels. Moreover, the cost of the entire architecture is very low and depends mainly on the CMOS camera cost, due to the absence of extra processing interfaces for the camera module [42-45].

| Hardware resources
The hardware resources utilised by most of the sub-blocks and by the total module are compared with those of the corresponding existing techniques to show that the proposed architecture is better in terms of hardware resource utilization as well. To obtain a valid hardware resource comparison, it is essential to use the utilization values of a particular block for the same or a similar kind of FPGA board.

| Camera controller
The hardware utilization of the proposed camera controller is compared with that of the existing controller presented by Xiaokun et al. [46] in Table 3. From the table, it can be seen that the proposed camera controller architecture requires fewer hardware resources than the existing one for the same FPGA family (Artix-7). This is because the proposed camera controller architecture is optimised for the specifications of the OmniVision camera series.

| VGA signal generator
The hardware utilization of the proposed VGA Signal Generator is compared with that of existing VGA signal generators in Table 4. The VGA generator architecture presented by Xiaokun et al. [46] is implemented on an Artix-7 FPGA; its unoptimised use of logical components to generate the VGA signal leads to larger hardware utilization than the proposed design. Similarly, Arun Babu [47] presented a graphic controller implemented on a Virtex-5 FPGA; the unoptimised use of generalized IP cores inside the architecture requires more hardware resources than the proposed design. In contrast, the proposed VGA generator architecture is optimised with respect to the camera architectural design, using basic gates in a highly optimised way without any IP cores.

| TMDS encoder
The hardware utilization of the presented TMDS Encoder is compared with that of the existing TMDS encoder presented by Roshan and Patil [48] in Table 5. The TMDS encoder proposed by Roshan and Patil [48] was implemented on a Spartan-6 FPGA; the proposed architecture requires far fewer hardware resources for the same (Spartan-6) FPGA. This is mainly due to the TMDS algorithm being optimised using basic gates, comparators, adders, and subtractors.

| Total module
The hardware resource comparison of the proposed camera interfacing architecture with similar existing architectures is given in Table 6. An effective CMOS camera interface architecture is presented by Xiaokun et al. [46], implemented on an Artix-7 FPGA board and coded in Verilog HDL; its larger hardware utilization compared with the proposed design is mainly due to the use of sub-blocks in the overall architecture without any optimization. A CPU- and FPGA-based camera interfacing architecture is presented by Bhowmik et al. [49], implemented on a Xilinx Zynq-7000 SoC FPGA board with the Vivado tool; its large hardware requirements compared with the proposed design arise from the use of generalized IP cores to implement the total architecture without proper optimization. A smart camera architecture is presented by Zhou et al. [50], implemented on a Zynq-7020 FPGA board using a Sum of Absolute Differences (SAD) based mosaic algorithm; the unoptimised use of the SAD-based mosaic algorithm increases the hardware requirements. Xilinx provides IP cores [51] to interface a specific camera with the Zynq-7000 SoC FPGA board using the on-board ARM Cortex-A9 through embedded programming techniques, which increases the hardware requirements drastically. A new video processing system is presented by Honegger et al. [52], which uses an FPGA for image acquisition and a mobile CPU for processing; this architecture uses an MT9V034 CMOS camera and an Artix-7 FPGA to acquire video frames, and its interfacing architecture is implemented using embedded techniques, which increases hardware utilization. In contrast, in the proposed design each sub-block is designed in an optimised way to achieve optimised hardware utilization for the entire architecture; this is achieved by replacing complex logic architectures with equivalent architectures built from basic gates.

| CONCLUSION
In this paper, an efficient hardware architecture to interface a low-cost digital camera with an FPGA is proposed, which can be used for real-time video capturing and processing. During the optimization of these architectures, data accuracy was preserved by considering sufficient intermediate bit sizes. As a result, the proposed method has lower hardware complexity and cost and is more accurate than existing techniques. In the future, interfaces for higher-resolution cameras (HD, 4K, etc.) with higher frame rates and more sophisticated processing algorithms to generate better-quality images will be implemented.

| ABBREVIATIONS AND APPENDICES
The abbreviations used in this paper are as follows: