Machine learning assisted analysis of equivalent circuit usage in electrochemical impedance spectroscopy applications

Electrical equivalent circuits are a widely applied tool with which electrical processes can be rationalized. There is a wide‐ranging selection of fields from bioelectrochemistry to batteries to fuel cells making use of this tool. Enabling meta‐analysis on the similarities and differences in the used circuits will help to identify commonly used circuits and aid in evaluating the underlying physics. We present a method and an implementation that enables the conversion of circuits included in scientific publications into a machine‐readable form for generating machine learning datasets or circuit simulations.

Instead, circuits are, if given at all, provided in a variety of distribution methods and formats. As an example, the inclusion of electrochemical impedance spectroscopy (EIS) equivalent models in scientific literature is usually performed by providing the circuits as schematic images, with the use of circuit description strings such as Boukamp's circuit description code (CDC) [7] or the use of common equivalent circuit names playing a minor role. It is, of course, difficult to parse images into a machine-intelligible format that can be used in circuit simulation. This challenge is exacerbated by the inconsistency in the representation of schematics as drawn by authors outside the electrical engineering field. While basic circuit elements such as capacitors, resistors, and inductors are often, but not always, represented in a style approximating IEC 60617 [8], elements unique to fields such as electrochemistry, for instance, Warburg elements of different types, lack a discernible widespread standard or even an unambiguous representation, see Figure 1. A method that meets this challenge and is capable of successfully discerning a wide range of different drawing styles is thus required.
To meet the challenges described, methods and algorithms had to be devised to solve the three core problems impeding equivalent circuit dataset creation from EIS scientific publications:
• An algorithm that performs object detection on the pages of scientific publications to identify candidate equivalent circuits.
• An algorithm capable of parsing images of equivalent circuit schematics into model strings.
• A system capable of acquiring documents from many scientific publications as well as associated meta-data.
To aid understanding of the following subsections a flowchart of the system is provided in Appendix A.
Additional details on the used algorithms as well as the source code for them can be found in the following git repository: https://git-ce.rwth-aachen.de/carl_philipp.klemm/CircutExtractorYOLO or in a GitHub mirror: https://github.com/IMbackK/CircuitExtractorYolo.

| Model detection
Detecting candidate models from rendered pages of portable document format (PDF) documents corresponds to a computer vision problem called object detection. Fortunately, object detection is a problem that has garnered much attention from researchers and implementers alike, and thus a great wealth of theoretical approaches and practical implementations exists. Therefore, it is sufficient to choose a model architecture from the available implementations and train it on the present problem of circuit detection. Three implementations were evaluated: the cascade object detector shipped with OpenCV [9,10] and the convolutional deep neural network (CDNN) based classifiers YoloV5 [11] and PyTorch-YoloV4 [12,13]. All were trained on a dataset of 68 full pages from scientific publications, see Figure 2 for a representative example of such a page, as well as 10 synthesized pages containing a total of 326 equivalent circuits and tested on a dataset of 60 pages containing 80 equivalent circuits. Ultimately, all implementations examined were able to recognize equivalent circuits with sufficient accuracy.
The cascade object detector was tested only on a subset of the validation dataset and detected 32 out of 36 circuits with no false positives.
The results for the convolutional neural networks are tabulated in Table 1.
The occurrence of a reasonable number of false positives at this stage is much less concerning than false negatives, as the subsequent parsing systems are unlikely to succeed in parsing a false positive circuit, which will thus still be discarded at a later stage. Therefore, hyperparameters were selected for maximum recall at the expense of some loss of precision. As YoloV5 provided the best performance, if only marginally and without statistical significance, it was chosen and implemented into the system.
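The trade-off between recall and precision mentioned above can be made concrete with a minimal sketch. The function name `detection_scores` is hypothetical and not part of the published system; the example counts are the ones quoted for the cascade object detector (32 of 36 circuits found, no false positives).

```python
def detection_scores(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for a set of detection counts."""
    detected = true_pos + false_pos   # everything the detector reported
    present = true_pos + false_neg    # everything that was really there
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / present if present else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Cascade detector figures from the text: 32 hits, 4 misses, 0 false positives.
precision, recall, f1 = detection_scores(true_pos=32, false_pos=0, false_neg=4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Tuning for maximum recall means accepting a larger `false_pos` count in exchange for a smaller `false_neg` count, which is tolerable here because false positives are expected to fail in the later parsing stages.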

| Circuit parsing
The parsing of a detected equivalent circuit image is performed in four steps:
1. Object detection of circuit elements,
2. Detection of lines with clipping and post-processing thereof,
3. Sorting of elements and lines into nets,
4. Generation of the output string from nets.

FIGURE 1 From top to bottom: constant phase elements (CPEs), Warburg elements, resistors, and inductors as represented in a nonstandard style.
FIGURE 2 A representative example of a page used in the training and validation datasets [14].

| Circuit element detection
Object detection of circuit elements is performed analogously to the model detection described in Section 2.1, with another YoloV5 stage. In a dataset of 516 images of equivalent circuits detected from scientific publications by the first stage, 2398 elements were labeled by hand. The dataset was randomly split into a training dataset of 484 images and a validation dataset of 32 images. The 484 images of the training dataset contained 2234 elements, with two images containing no elements as they were misdetections by the first stage. The validation dataset of 32 images contained 164 elements. The system achieved an overall F1 score of 0.95 on the validation dataset, see Table 2.

| Circuit direction detection
For subsequent analysis, it is necessary to determine the direction of the depicted circuit. To this end, the smallest possible rectangle containing the center points of all elements is computed, and its aspect ratio is examined. Aspect ratios greater than or equal to one are ascribed to a horizontal circuit layout, and a vertical layout is defined as one with an aspect ratio of less than one.
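The direction rule can be illustrated with a short sketch. It assumes that the rectangle is axis-aligned, that the aspect ratio is width divided by height, and that element bounding boxes are given as `(x_min, y_min, x_max, y_max)` tuples; none of these details are stated in the text, and the function name `circuit_direction` is hypothetical.

```python
def circuit_direction(boxes):
    """Classify a circuit as 'horizontal' or 'vertical'.

    boxes: iterable of (x_min, y_min, x_max, y_max) element bounding boxes.
    """
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one element is required")
    # Center point of every detected element.
    xs = [(x0 + x1) / 2 for x0, _, x1, _ in boxes]
    ys = [(y0 + y1) / 2 for _, y0, _, y1 in boxes]
    # Extent of the smallest axis-aligned rectangle containing all centers.
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # width / height >= 1 is written as width >= height so that a
    # degenerate rectangle (zero height) does not divide by zero.
    return "horizontal" if width >= height else "vertical"


print(circuit_direction([(0, 0, 10, 10), (50, 2, 60, 12), (100, 0, 110, 10)]))
print(circuit_direction([(0, 0, 10, 10), (2, 50, 12, 60)]))
```

With a single element the rectangle collapses to a point; this sketch then reports a horizontal layout, which is an arbitrary choice rather than documented behavior.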

| Line detection
In order to determine how the elements are connected to each other, the lines between them are detected by first thresholding the image and then executing a thinning stage using the Zhang-Suen algorithm [15], see Figure 3. Subsequently, candidate lines are identified using a Hough transform [16] and are filtered using several custom stages. The lines are then clipped against the circuit elements detected in the previous stage to remove line segments that are not part of the connections between elements.
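To illustrate the thinning stage, the following is a minimal pure-Python sketch of Zhang-Suen thinning on an already thresholded image. It is not the implementation used in the system; it assumes the image is a list of rows of 0/1 values with at least a one-pixel background border, and the function name `zhang_suen_thin` is hypothetical.

```python
def zhang_suen_thin(image):
    """Thin the foreground (1) of a binary image to one-pixel-wide lines."""
    img = [list(row) for row in image]
    height, width = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise, starting with the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, height - 1):
                for c in range(1, width - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    filled = sum(p)
                    # Number of 0 -> 1 transitions around the pixel.
                    transitions = sum(
                        1 for a, b in zip(p, p[1:] + p[:1]) if a == 0 and b == 1
                    )
                    if not (2 <= filled <= 6 and transitions == 1):
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            # Pixels are cleared only after the whole sub-iteration.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# A three-pixel-thick horizontal bar is reduced to a single row of pixels.
bar = [[0] * 14 for _ in range(7)]
for row in (2, 3, 4):
    for col in range(2, 12):
        bar[row][col] = 1
for line in zhang_suen_thin(bar):
    print("".join("#" if v else "." for v in line))
```

Thinning before the Hough transform means that a thick drawn line yields one candidate line rather than a bundle of nearly parallel ones.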

| Net-list creation
Recursively, lines are sorted into connected groups of lines. These groups are then taken to be nets in the electrical engineering sense.
The elements are then added to these nets by checking for termination of the line in a net into the left and right edges of the elements, in the case of a horizontal circuit, or top and bottom edges, in the case of a vertical circuit, forming a net-list.
In contrast to the general electrical engineering case, each element in an equivalent circuit has exactly two terminals. Elements that are found in the circuit to be parsed with seemingly more nets are thus further examined to determine which nets are really connected and which connections are spurious. Assuming a horizontal circuit, nets with lines that terminate closer to the center of the left or right edges of the bounding box of the element are preferred to be kept. Excess connections to nets, which are lines that terminate further away from the center of the edges, are discarded. In a vertical circuit, the same procedure is performed, but here the centers of the top and bottom edges are used instead. If an element is found to be connected to fewer than two nets, a check for overlap with adjacent elements is made. If an overlapping adjacent element is found that also has fewer than two connections to nets and is not already directly connected to the first element by a net, it is assumed that a connecting net was clipped away in the previous stage. A new net is thus inserted, connecting the free terminals of the adjacent overlapping elements. Nets with no elements are discarded. A schematic of the above discussed steps can be found in Figure A1 in the Appendix.
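The first step above, sorting lines into connected groups, can be sketched as follows. This is an illustrative flood fill, not the system's code: it assumes lines are given as pairs of end points, treats two lines as connected when an end point of one lies within a tolerance `tol` of the other, and uses an explicit stack instead of recursion. All names are hypothetical.

```python
import math


def point_segment_distance(point, segment):
    """Shortest distance from a point to a line segment."""
    (x0, y0), (x1, y1) = segment
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(point[0] - x0, point[1] - y0)
    # Projection of the point onto the segment, clamped to its ends.
    t = ((point[0] - x0) * dx + (point[1] - y0) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(point[0] - (x0 + t * dx), point[1] - (y0 + t * dy))


def lines_touch(a, b, tol):
    """True if an end point of either line lies on (or near) the other."""
    return (any(point_segment_distance(p, b) <= tol for p in a)
            or any(point_segment_distance(p, a) <= tol for p in b))


def group_into_nets(lines, tol=3.0):
    """Return a list of nets, each a sorted list of indices into `lines`."""
    unassigned = list(range(len(lines)))
    nets = []
    while unassigned:
        stack = [unassigned.pop()]
        net = []
        while stack:
            current = stack.pop()
            net.append(current)
            touching = [j for j in unassigned
                        if lines_touch(lines[current], lines[j], tol)]
            for j in touching:
                unassigned.remove(j)
            stack.extend(touching)
        nets.append(sorted(net))
    return nets


lines = [((0, 0), (10, 0)), ((10, 0), (10, 10)), ((50, 50), (60, 50))]
print(sorted(group_into_nets(lines)))
```

Because only end points are tested, two lines that merely cross in their interiors are not joined in this sketch; whether the real system treats such crossings as junctions is not stated in the text.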

| String generation
To create a circuit description string, the input net and the output net must be found. Candidate input and output nets are determined by taking the leftmost and rightmost nets, in the case of a horizontal circuit, or the topmost and bottommost nets, in the case of a vertical circuit.

TABLE 1 Scores for the convolutional neural network methods in detecting the presence and location of the circuits on pages from scientific publications.

Commutative operations are then rearranged based on an arbitrary element type priority to increase the likelihood that mathematically identical equivalent circuits are assigned the same string.
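The rearrangement of commutative operations can be illustrated with a small sketch. It assumes a circuit is held as a nested structure of `("series", [...])` and `("parallel", [...])` nodes with single-letter element types as leaves. The priority table, the function names, and the output notation (`-` for series, `|` for parallel) are all hypothetical and are not the string format used by the system.

```python
# Arbitrary element type priority; any fixed order would serve.
PRIORITY = {"R": 0, "C": 1, "L": 2, "P": 3, "W": 4}


def sort_key(node):
    """Key that orders leaves by priority and places sub-circuits last."""
    if isinstance(node, str):
        return (0, PRIORITY.get(node, len(PRIORITY)), node)
    op, children = node
    return (1, op, tuple(sort_key(child) for child in children))


def canonical(node):
    """Recursively sort the operands of every series/parallel operation."""
    if isinstance(node, str):
        return node
    op, children = node
    ordered = sorted((canonical(child) for child in children), key=sort_key)
    return (op, tuple(ordered))


def to_string(node):
    """Render a circuit in the hypothetical notation described above."""
    if isinstance(node, str):
        return node
    op, children = node
    separator = "-" if op == "series" else "|"
    return "(" + separator.join(to_string(child) for child in children) + ")"


# Two drawings of the same circuit with operands in different order.
a = ("series", ["R", ("parallel", ["C", ("series", ["W", "R"])])])
b = ("series", [("parallel", [("series", ["R", "W"]), "C"]), "R"])
print(to_string(canonical(a)))  # (R-(C|(R-W)))
print(canonical(a) == canonical(b))  # True
```

Since series and parallel combinations are both commutative, sorting the operands at every level gives the same representation regardless of the order in which the elements were encountered in the image.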

| Evaluation
To test this system, 100 images of circuits were randomly sampled from the dataset created in Section 3.1 and converted into strings by the system (Figure 4). The same images were then also evaluated by hand. Of the 100 images, 74 were converted correctly, 24 were converted with errors, and conversion failed on two. Out of the 24 images converted incorrectly, it was impossible to determine the correct CDC by hand for four of them, see Figure 5 for an example. One of the circuits where conversion failed contained a transmission line. The system thus achieved an accuracy of 74% or, if the circuits which were impossible to convert by hand are removed, 77.1% in this test. A notable common cause of misparsings was circuit density and resolution, with very dense and low-resolution circuits constituting a large share of the misparsings, see Figure 6.
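The two accuracy figures follow directly from the counts given above, as the following short calculation shows. The variable names are chosen for illustration only.

```python
total_images = 100
converted_correctly = 74
undeterminable_by_hand = 4  # among the 24 images converted with errors

# Accuracy over all sampled images.
accuracy = converted_correctly / total_images

# Accuracy when images that could not be converted by hand are excluded.
adjusted_accuracy = converted_correctly / (total_images - undeterminable_by_hand)

print(f"{accuracy:.1%}")           # 74.0%
print(f"{adjusted_accuracy:.1%}")  # 77.1%
```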

| Limitations in circuit parsing
Aside from the aforementioned accuracy, the system also has some systemic limitations. No attempt was made to separate different kinds of Warburg elements, as it is almost never evident in the representations of equivalent circuits what type of Warburg element is in use. The system does not include the capability to detect transmission lines. Some articles use symbols in equivalent circuits that are not determinable using just the schematics themselves. This includes the use of the capacitor symbol for CPEs, the use of the resistor symbol for inductors, as well as the use of generic boxes with a Z label for any of the elements, see Figure 7 for an example. While the circuit finding algorithm has some ability to determine if a circuit can be understood as an EIS equivalent circuit, no specific attempt at this was made. Furthermore, no equivalent circuits that are embedded into a schematic of the wider measurement system are recognized. For performance reasons, at most the first ten pages of each document are examined.

| CREATION OF AN EQUIVALENT CIRCUIT DATASET USING THE SYSTEM
With the aforementioned challenges met and the limitations understood, the system was used to generate a large dataset of equivalent circuits. This dataset was created in order to thoroughly test the system in a production environment and to allow the generated dataset to be used in subsequent meta-analysis.

FIGURE 3 Top row, left to right: circuit as found by YoloV5 in [17], after thresholding, after Zhang-Suen [15] thinning. Bottom row, left to right: lines as detected using Hough transform [16], after custom filtering and adjustment stages, clipped against elements and overlaid over the original image.

| Input dataset
To acquire a sufficient input dataset of scholarly articles, a dataset acquisition shared library, dubbed libscipaper, was created that enables the usage of multiple backend sources in a modular manner. Using this new library, a search for "Impedance Spectroscopy" in the metadata as well as the full text, where available, of the articles was performed. This resulted in 43,380 hits, of which 19,224 were successfully grabbed in PDF form by libscipaper and passed a simple filter for language and length. Unfortunately, as the licenses of the papers acquired with this method vary and no information about the license per document is reliably obtained, the dataset cannot be made freely available. An equivalent dataset for reproduction can, however, easily be recreated using libscipaper. To this, 1255 articles from papers acquired by rhd instruments were also added by hand. These then constituted the input dataset for subsequent analysis.

FIGURE 5 Circuit with unclear conversion [18].
FIGURE 6 Misparsed circuit with high density and low resolution [19].
FIGURE 7 Circuits with Z elements as captured from Reference 20.
FIGURE 8 Papers included in analysis per field of study. Here, other refers to the sum of fields with a small number of papers.

| Dataset analysis
As the distribution of the fields of study represented in the dataset imparts a bias on the elements encountered as well as the way they are drawn, and thus is liable to introduce biases in the convolutional neural network (CNN)'s classification of elements, the dataset was analyzed with the help of a naive Bayes classifier and a keyword search.
As can be seen in Figure 8, corrosion, batteries and the biology field, which also contains articles of a medical nature, are the most prevalent users of EIS found in the dataset, and thus it can be assumed that these fields are major users of EIS. However, one must not immediately conclude that the relative prevalence of EIS usage in these fields is reflected in the ratios of EIS scholarly articles found above. As indicated in a meta-analysis by H. Piwowar et al. [21], the prevalence of open-access publication varies significantly across these fields. As the dataset contains mostly open-access articles, this discrepancy is expected to significantly skew the prevalences found in Figure 8 and is potentially liable to introduce biases in the classification of circuit elements.

FIGURE 9 Equivalent circuits across the whole dataset, circuits given in Boukamp CDCs [7]. In this case, other includes circuits with fewer than 10 examples and circuits which could not be parsed into a string.
Of all documents, 10.4% contained a parseable circuit, with the values per field of study ranging from 8% to 11% and the corrosion and fuel cell fields being outliers at 16% and 7%, respectively.

| Equivalent circuits
As can be seen in Figure 9, the system found 3722 equivalent circuits, 429 of which failed to parse into a string in the final stage of the system; these were noted as other. All results are presented as Boukamp CDCs [7]. Circuits with fewer than 10 examples within one field were also counted as other to reduce the number of included parsing errors; 362 circuits met this criterion. As the dataset used in Section 2.2.6 for evaluation was a randomly selected subset of the dataset used for Figure 9, the data presented can be trusted within the limits of the results in Section 2.2.6. Overall, the expected common equivalent circuits [22] can be identified. However, several artifacts of note can also be seen:
• The often found RC and RCL circuits are sometimes not used by the scientific article in materials investigation but are instead used for incidental circuit analysis of, for instance, measurement apparatus.
• CCC and PP are common misparsings of a battery series circuit, as the battery symbol is close to the C and P symbols and was not encountered in the training data. These were thus removed from the presented results.
• Based on the analysis in Section 2.2.6, we additionally expect Figure 9 to contain a similar ratio of total misparsings.

| SUMMARY AND CONCLUSION
The current system's quality and accuracy suffice to give a good idea about the usage of equivalent circuit models and to help in the task of creating a dataset for machine learning applications. Nevertheless, there are still possibilities to optimize the system. Chief among these would be a method of determining if a circuit in question is intended as an equivalent circuit. This could be performed by executing a text analysis stage on the text close to a candidate circuit, especially the figure description. Results could further be improved by also examining other sources of models, such as looking for models included as Boukamp coding. Exceptions could also be made in the system to reject well-known nonequivalent circuits, such as the RR …sequences or electrical probe models. The field of study classification system could also be improved by including keywords as well as training data of papers in languages other than English.
While the current use of the system targets simulation as the means to acquire impedance data from the circuits parsed, parsing information from the associated plots would also be of great interest. While extraction of data from charts in general has seen some promising study that could be adapted to the parsing of impedance plots [23,24], there is currently a lack of systems able to reliably determine which chart, if any, in a publication contains the data to which the authors of the publication applied the parsed equivalent circuit. As this information is often only available in prose, this problem would require the training of a language-image model. Potentially, a system adapted from the BLIP [25] architecture could be applied to this problem, but further investigation is required in this area.
It should also be noted that the barriers to the creation of datasets from scholarly articles for the creation of powerful machine-learning systems are still needlessly high. Simple but significant problems make the creation of datasets difficult:
• Paywalls inhibiting the extraction of published data are the first hurdle for such a system.
• The representation of the data in PDF-format alone and the varying quality of figures.
• The inclusion of meta-data on measurements only in the form of prose text.
• Image copyright restrictions hamper the ability to share machine-learning datasets.
• The nonunified ways to depict or describe models are a particular problem that would best be solved by the standardization of the depiction of electrochemical elements or the increased use of machine readable model descriptions such as Boukamp coding.
These are, nevertheless, well-known problems that call for a reform of publication strategies.
Additional applications outside of equivalent circuit conversion are also a possibility. It may be of value to remove the final stage of the system that parses the net-list into a Boukamp CDC and use the algorithms and neural networks up to this point to parse arbitrary circuits for general electrical engineering purposes.

The input and output nets are chosen out of the pool of candidates based on their location in the circuit relative to the circuit's orientation. The remaining nets not assigned as the input or output net are then examined for loose ends. A loose end is any point where a line ends without a connection to an element, except in the input or output net. If a loose end in the net under examination is found, nearby nets are examined to find a loose end within a small region around the first loose end. If a matching loose end in a nearby net is found, the two nets are joined into one, healing the loose ends. The nets are then further subdivided into connecting nets, those with just two elements, and branching nets, those with three or more elements. As can be seen in Figure B1, to create the final string, a list of nets is traversed via the branching nets to the point of the deepest branch, and all connecting nets at this level are then collapsed, forming n series circuits. The strings corresponding to these series circuits are saved. The branching nodes now connected only by collapsed connecting nodes are then themselves collapsed into parallel circuit constructions, with the previously saved strings of series circuits used as the parallelized elements. This operation is recursively repeated until all nets have been collapsed, forming the final string.

FIGURE 4 Left to right: Input, element detection and network detection, parsing.

According to H. Piwowar et al., the biomedical, biology and clinical medicine fields publish 58.49%, 32.70% and 47.81% of articles as open-access, respectively, whereas chemistry and engineering and technology researchers, who correspond most closely to the batteries and corrosion fields presented here, publish only 17.40% and 15.49% of their papers as open-access articles.

APPENDIX A: NET-LIST PARSING FLOWCHART
FIGURE A1 Abridged flowchart showing how circuit image to net-list parsing is performed.

APPENDIX B: CIRCUIT STRING PARSING
FIGURE B1 Steps used to parse a net-list into the string format of the eisgenerator simulation tool, see https://uvos.xyz/kiss/eisgeneratordoc/modelpage.html for information on the string format used.