A highly accurate and robust deep checkerboard corner detector

Checkerboard corners are routinely used in tasks such as camera calibration, and their accurate detection is essential. Traditional methods, such as those based on the Harris corner detector, are often affected by image artefacts such as noise and blur. The more recently proposed deep network approaches also suffer from a lack of accurate training data. In this article, we make the following contributions. First, we propose a synthetic training data generation method that simulates the real imaging process; most notably, the exact sub-pixel corner positions in the image become readily available. Second, we design a relatively simple deep network and train it on the synthetic data generated by the proposed method. Finally, because the exact sub-pixel corner positions are available in the proposed method, this article paves the way for objectively comparing different checkerboard corner detectors in terms of metrics such as the mean Euclidean location error. Experimental results on both synthetic and real data show that the proposed detector significantly outperforms typical methods, including the commonly used Matlab camera calibration toolbox, the OpenCV checkerboard corner detectors, and more recently proposed deep learning-based methods.

Introduction: Camera calibration is a classical and fundamental task in computer vision which aims at estimating the intrinsic parameters and the distortion coefficients of a camera. With its robustness to distortion bias and perspective bias [1], the checkerboard is often the default pattern for high-precision camera calibration. However, checkerboard images taken in practice may suffer from extreme poses, lens distortion and sensor noise. These problems can lead to inaccurate or even incorrect checkerboard corner detection.
In [2], a gradient covariance matrix is computed for every pixel and its eigenvalues are used to locate the corners. Geiger et al. [3] convolved the image with two corner prototypes to obtain corner candidates, and then scored the candidates by a correlation function of the gradient's direction and amplitude. Placht et al. [4] presented a full checkerboard detector which searches for the graph structure of the checkerboard in a skeletal binary edge image. Ha et al. [5] presented a corner detection method based on fitting a monkey saddle surface directly to the image intensity samples. Fourier transform-based methods such as [6] and [7] have also been proposed. The more recently proposed deep learning-based approaches [8-10] typically train deep networks on real checkerboard images with manually annotated ground-truth corner locations, which inevitably introduces additional labelling error.
We propose a synthetic data generation method that simulates the real imaging process and can produce exact sub-pixel corner point coordinates. We also design a simple and fast checkerboard detection network which is trained on the synthetic dataset. With no labelling error in the training data, the network, once trained, is very robust to extreme poses, lens distortion and noise. Experimental results on both synthetic and real data show its advantages over the state of the art.

Methods:
In the following, we present our method for generating synthetic data, the detailed architecture of the detection network, and a non-maximum suppression post-processing step.

Generating synthetic training data:
We generate synthetic checkerboard images with exact corner point coordinates under different lighting, noise, blur, and camera poses by simulating the real imaging process. We first place an ideal analog checkerboard image plane of unit-size black/white blocks parallel to the x-y plane at z = z_0, so that all the corner points take the form [m, n, z_0], where m and n are integers. We use Rt to denote the rotation-and-translation operator: for any point p, Rt(p) = Rp + t, where R is a rotation matrix and t is a translation vector. Next we describe the key steps of generating the training images.
i. To simulate different lighting conditions, the intensity of the checkerboard's surface is uniformly random in the ranges [0, 0.3] (black squares) and [0.7, 1] (white squares).
ii. Fixing z_0 = 6, we sample the analog checkerboard at intervals of 0.001 in both the x and y directions to obtain a dense image plane I_0.
iii. Apply a random operator Rt to I_0 to obtain image I_1. Note that translation along the z-axis results in different image resolutions.
iv. Apply the pinhole imaging operator Ph to I_1 to obtain the image plane I_2 at z = 1.
v. Form a less dense, finite image I_3 from I_2 as follows.
(a) The finite viewing window is at z = 1 and of unit size (x, y ∈ [−0.5, 0.5]). It is divided into N × N squares, each representing a square pixel of size w_p × w_p in image I_3, where w_p = 1/N. Label a general pixel in I_2 as p, with analog coordinates (x_p, y_p) and intensity I_2(p).
(b) As illustrated in Figure 1, for any pixel (i, j) in image I_3 (i, j = 1, ..., N), its centre position is (c_i, c_j) = (−0.5 + (i − 1/2) w_p, −0.5 + (j − 1/2) w_p), and its value is the Gaussian-weighted sum of all those points of I_2 that fall into its area A(i, j) of size w_p × w_p: I_3(i, j) = Σ_{p ∈ A(i, j)} G(x_p − c_i, y_p − c_j) I_2(p), where G is a two-dimensional Gaussian kernel. We denote this operator as Gi.
vi. To simulate various blurring effects and noise, we apply Gaussian blur (5 × 5 window) with random standard deviation ranging from 0 to 1.6, and zero-mean Gaussian noise with random standard deviation ranging from 0 to 8, to I_3, obtaining a final training image I.
The above procedure can be summarised as I = Noise(Blur(Gi(Ph(Rt(I_0))))), where Blur and Noise denote the step-vi blur and noise operators. We note that although this procedure is only an approximation of the real imaging process, it turns out to be very effective, as shown in the experiments later in the article. Two samples of the training images thus generated are shown in Figure 2.
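As an illustration, the pixel-integration operator Gi of step v can be sketched as follows. The kernel width sigma and the normalisation by the total weight are our assumptions (the article does not specify them), and the function name is ours.

```python
import numpy as np

def gi_downsample(xs, ys, vals, N, sigma=0.005):
    """Gaussian-weighted binning of dense analog samples (I_2) into an
    N x N pixel image (I_3) over the unit window x, y in [-0.5, 0.5].
    sigma is an assumed kernel width; the paper does not specify it."""
    wp = 1.0 / N                                             # pixel size w_p
    i = np.clip(((xs + 0.5) / wp).astype(int), 0, N - 1)     # column index
    j = np.clip(((ys + 0.5) / wp).astype(int), 0, N - 1)     # row index
    ci = -0.5 + (i + 0.5) * wp                               # pixel-centre x
    cj = -0.5 + (j + 0.5) * wp                               # pixel-centre y
    w = np.exp(-((xs - ci) ** 2 + (ys - cj) ** 2) / (2 * sigma ** 2))
    img = np.zeros((N, N))
    wsum = np.zeros((N, N))
    np.add.at(img, (j, i), w * vals)       # Gaussian-weighted sum per pixel
    np.add.at(wsum, (j, i), w)
    return img / np.maximum(wsum, 1e-12)   # normalised to keep intensity scale

# usage sketch: a constant 0.5 intensity field downsamples to a constant image
g = np.linspace(-0.499, 0.499, 200)
X, Y = np.meshgrid(g, g)
I3 = gi_downsample(X.ravel(), Y.ravel(), np.full(X.size, 0.5), N=10)
```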
Checkerboard corner detection network: As depicted in Figure 3, we propose a network consisting of a 64-channel 3 × 3 convolutional layer, a 64-channel 3 × 3 residual block, two linear layers with 255 and 1 nodes, respectively, and a sigmoid layer. The leaky ReLU units use a slope of 0.1. For low computational cost, we feed the network 15 × 15 image patches. The network outputs a floating-point number representing the confidence that the centre position of the input patch is a checkerboard corner. With a stride of 1, we slide the 15 × 15 window over the whole H × W checkerboard image to obtain a (H − 14) × (W − 14) confidence map.
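The sliding-window evaluation can be sketched with numpy stride tricks; the stand-in scoring function below replaces the trained network, whose weights are of course not reproduced here. Note that a 15 × 15 window slid with stride 1 over an H × W image yields H − 15 + 1 = H − 14 scores per dimension.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def confidence_map(image, score_fn, patch=15):
    """Score every patch x patch window (stride 1) of an H x W image;
    the result has shape (H - patch + 1) x (W - patch + 1)."""
    windows = sliding_window_view(image, (patch, patch))
    flat = windows.reshape(-1, patch * patch)
    return score_fn(flat).reshape(windows.shape[:2])

# stand-in scorer in [0, 1] (the trained network would go here)
mean_score = lambda flat: flat.mean(axis=1)

cmap = confidence_map(np.random.rand(100, 100), mean_score)
# a 100 x 100 image yields an 86 x 86 confidence map
```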

Non-maximum suppression post-processing:
To convert the confidence map into the final binary detection image, we apply a standard 7 × 7 non-maximum suppression procedure [11]: every surviving pixel with value greater than 0.2 is set to 1.0. Figure 4 shows the procedure. The synthetic dataset consists of 2000 checkerboard images of resolution N = 360. Among them, 1800 images are used for training, 100 for validation and 100 for testing. These images contain an average of about 160 corners each. Before being fed to the network, each 360 × 360 grey-scale checkerboard image I has its intensity range linearly normalised to [0, 1].
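A minimal sketch of the 7 × 7 non-maximum suppression with the 0.2 threshold, in pure numpy (the function name is ours):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def nms_binarise(conf, window=7, thresh=0.2):
    """Keep a pixel only if it is the maximum of its window x window
    neighbourhood and exceeds the threshold; output is binary (0.0/1.0)."""
    r = window // 2
    padded = np.pad(conf, r, constant_values=-np.inf)
    local_max = sliding_window_view(padded, (window, window)).max(axis=(2, 3))
    return ((conf == local_max) & (conf > thresh)).astype(float)

# usage: two nearby responses -- only the stronger peak survives
conf = np.zeros((20, 20))
conf[5, 5], conf[5, 6] = 0.9, 0.6
out = nms_binarise(conf)
```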
Because the proposed network takes a 15 × 15 grey-scale image as input, the final training dataset consists of 15 × 15 image patches rather than full 360 × 360 checkerboard images. For training-data balance, we choose four mutually exclusive types of patches, as shown in Figure 5, with each type accounting for 25% of the training dataset.
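The 25%-per-type balance can be sketched as follows, assuming per-patch type labels are already available; the function name and the label encoding are ours.

```python
import numpy as np

def balanced_batch(patch_type, n_per_type, rng):
    """patch_type[k] in {0: corner, 1: adjacent, 2: edge, 3: smooth}.
    Draw equally from each of the four mutually exclusive patch types,
    so every type makes up 25% of the sampled set."""
    picks = [rng.choice(np.flatnonzero(patch_type == t), n_per_type)
             for t in range(4)]
    idx = np.concatenate(picks)
    rng.shuffle(idx)
    return idx

# usage: draw 4 patches of each type, matching the batch size of 16
rng = np.random.default_rng(0)
types = rng.integers(0, 4, size=1000)   # hypothetical per-patch labels
idx = balanced_batch(types, 4, rng)
```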
During training, we use a batch size of 16 and a cosine-annealing learning-rate schedule decaying from 1 × 10^−3 to 1 × 10^−6, with warm restarts every 1 × 10^6 batches. We train the network with the Adam optimiser on an L2 loss for 4 × 10^6 batches.
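The learning-rate schedule can be written out explicitly; this is the standard cosine-annealing-with-warm-restarts formula with the endpoints and period stated above plugged in.

```python
import math

def lr_at(batch, lr_max=1e-3, lr_min=1e-6, period=1_000_000):
    """Cosine-annealed learning rate with a warm restart every `period` batches."""
    t = (batch % period) / period   # phase within the current cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```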
Experiments: OpenCV [12], Matlab [13], Ha et al. [5] and CCDN [9] are compared with the proposed method. The proposed detector aims at locating corners at integer pixel positions; if sub-pixel refinement is employed in a competing method, we round its result to the nearest integer pixel position. Both synthetic and real data are used in the experiments. In addition, image border regions seven pixels wide do not participate in the experiments. The Matlab Camera Calibration Toolbox is used to calibrate the camera [14] and to calculate the mean Euclidean reprojection error (MERE) [13]. For the OpenCV method, we use the functions goodFeaturesToTrack and cornerSubPix [12].

Synthetic data results:
The testing dataset has 100 images of 360 × 360 pixels, containing a total of 16,511 corner points. We directly compute the mean distance between the detected corner positions and their exact ground truths.
If a checkerboard image has N_0 ground-truth corners, and N_1 of them have detected corners located in their 3 × 3 neighbourhoods, we call N_1/N_0 the corner detection rate (CDR) [5]. The mean Euclidean location error (MELE) [9] is the mean distance between the N_1 detected corners and their ground truths. False positives [9] are detected corners whose 3 × 3 neighbourhoods contain no real corner point. We then average these criteria over all testing images. When the corner detection rate of an image is zero, we consider it a failed detection.
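The three criteria can be sketched as follows, using the 3 × 3 neighbourhood (Chebyshev distance ≤ 1 pixel) match criterion described above; the function name is ours.

```python
import numpy as np

def corner_metrics(detected, gt, radius=1):
    """CDR, MELE and false-positive count with the 3 x 3 neighbourhood
    (Chebyshev distance <= radius) match criterion.
    detected: (K, 2) integer detections; gt: (M, 2) ground-truth corners."""
    errors = []
    for g in gt:
        cheb = np.abs(detected - g).max(axis=1)   # Chebyshev distance to g
        hits = detected[cheb <= radius]
        if len(hits):                             # this ground-truth corner found
            errors.append(np.linalg.norm(hits - g, axis=1).min())
    # a detection is a false positive if no ground-truth corner lies
    # in its 3 x 3 neighbourhood
    fp = int(sum(np.abs(gt - p).max(axis=1).min() > radius for p in detected))
    cdr = len(errors) / len(gt)
    mele = float(np.mean(errors)) if errors else float('nan')
    return cdr, mele, fp

# usage on toy data: one true positive (1 px off) and one false positive
gt = np.array([[10.0, 10.0], [20.0, 20.0]])
detected = np.array([[10, 11], [5, 5]])
cdr, mele, fp = corner_metrics(detected, gt)
```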
Tables 1-5 list the results of the various methods at different noise levels (standard deviation σ_n). The proposed method successfully detects all the corners at every noise level, with the lowest MELE and MERE. This shows that our method is robust and accurate under different poses, resolutions and noise levels.

Real data results:
We evaluate our method on Mono (provided by the Matlab Calibration Toolbox [13]) and GoPro (provided by [4]). Mono is a 1072 × 712 checkerboard dataset with no lens distortion; GoPro is a 4000 × 3000 dataset with slight lens distortion. Using the Matlab function imresize, we resize GoPro to 1200 × 900, 800 × 600 and 400 × 300 to evaluate the methods at different resolutions.
With the known grid size of the checkerboard but no exact ground truth for real data, we compute only the MERE index, as done in [5, 10], and the CDR index.
Figure 6 shows two examples of results produced by the proposed detector. As shown in Tables 6 and 7, our method detects all the checkerboard corners with the lowest MERE and highest CDR. The results show that our method is quite robust to resolution change and lens distortion.

Conclusion:
We propose a new synthetic data generation method and a highly robust and accurate deep network for checkerboard corner detection. The data generation method simulates the real imaging process and produces training data with exact sub-pixel ground-truth corner locations. The proposed five-layer network is relatively simple and fast. Trained on the synthetic dataset, it shows excellent detection accuracy and robustness against resolution changes and different levels of noise. Experiments also demonstrate its improved performance over typical current methods on both synthetic and real test datasets.

Fig. 1 Illustration of the geometrical relationship in step v. The black dots in the background are points of I_2. A pixel of I_3 is drawn in red, with area A(i, j) and centre (c_i, c_j). With our choice of parameters, each A(i, j) contains several hundred such points when generating training images

Fig. 3 Architecture of the proposed corner detection network. The network takes a 15 × 15 greyscale image as input and outputs a value representing the confidence that the image centre is a checkerboard corner point

Fig. 5 Four types of 15 × 15 image patches in the training dataset. The centre pixel is marked with a red square. (a) The centre is a ground-truth corner. (b) The centre is adjacent to a ground-truth corner. (c) The centre lies on a border line between black and white blocks. (d) The centre is in a general non-edge smooth black or white area

Fig. 6 Example results of the proposed corner detection method on real data. (a) Mono data. (b) GoPro400 data

Table 1. MELE (px) of different methods on synthetic data at various noise levels. The best value(s) in each column is in bold.

Table 2. Results on synthetic data when σ_n = 0

Table 3. Results on synthetic data when σ_n = 4

Table 4. Results on synthetic data when σ_n = 8

Table 5. The best value(s) in each column is in bold.

Table 6. MERE (px) of different methods on real datasets

Table 7. CDR of different methods on real datasets