The double‐tree method: An O(n) unsteady aerodynamic lifting surface method

Two new methods for reducing the computational cost of the unsteady vortex lattice method are developed. These methods use agglomeration to construct time‐saving tree structures by approximating the effect of either a group of vortex rings or query points. A case study shows that combining the two new O(n·log n) tree methods together results in an O(n) method, called the double‐tree method. Other case studies show that the trade‐off between accuracy and speed can be easily and reliably controlled by the agglomeration cutoff distance. For a flat plate with 5 × 200 panels analyzed over 20 time steps, the double‐tree method is 7 times faster than the unsteady vortex lattice method with a <5% difference in the force distribution and total lift coefficient. The case studies suggest that the computational benefit will increase for the same level of accuracy if the size of the problem is increased, making the method beneficial for full‐aircraft analysis within optimization or dynamic load analysis, where the computational cost of the unsteady vortex lattice method can be large.


INTRODUCTION
The unsteady vortex lattice method (UVLM) is a well-established and understood mid-fidelity aerodynamic analysis tool. One of the most useful descriptions of the method is given by Katz and Plotkin, 1 whom provide a detailed description of each necessary component of the UVLM.
An excellent review of the applications of the UVLM is provided by Murua et al. 2 They state that the UVLM is important in applications with a significant degree of geometric complexity. This can include, for example, rotating blades, such as in wind turbines, 3,4 or large-deflection wings, such as flapping wings, 5,6 morphing wings, [7][8][9] and highly flexible wings. [10][11][12] These applications are mainly focused on preliminary design and optimization studies that are made possible when the analysis of geometrically complex systems becomes feasible.
The computational cost of the UVLM is quite large despite its moderate level of fidelity. From the basic UVLM theory, the influence of every panel to every other panel is calculated; furthermore, the total number of panels increases each time step due to the requirement to shed a row of wake panels on the trailing edge of the wing. 1 This means that the order of computations is O(n 2 ), where n is the number of panels, and that the total number of panels over the course of an

UVLM METHOD
The UVLM is an incompressible time-domain 3D lifting surface method used to simulate the flow of a fluid around a solid object. 1 The solid object is discretized into a structured grid of aerodynamic panels, aligned with the leading edge of the object, placed at the boundary of the object and the fluid. Each panel is represented by a combination of a vortex ring and a control point. The leading edge of the vortex ring is placed at the quarter-chord location of the panel, while its sides are coincident with the panel sides. 1 The trailing edge of the ring is determined by the quarter-chord location of the panel immediately downstream of the current one. The control point is placed in the center, at the three-quarter-chord of the panel. A diagram of this layout is shown in Figure 1.
The vortex ring is comprised of four singular vortex filaments that each share the same strength value. Each vortex filament induces a tangential velocity on the entire flow field proportional to its strength and length, and inversely proportional to the distance from the filament, shown by the equation given by Bertin and Smith for the velocity at a point due to a vortex filament 19 : where Г represents the vortex strength, and each vector is defined in Figure 2. Each filament on the ring is then placed anti-clockwise around the center of the ring such that each filament induces velocity in the same direction inside of the ring (see Figure 1). Since each filament for the same ring has the same strength, this also means that a rectangular ring will induce velocity in the opposite direction outside of the ring.
As shown in Figure 3, the wing is discretized into a structured array of panels; in this case only the vortex rings are shown for clarity. At each time step, a row of wake panels are shed at the trailing edge of the wing according to the equation: where X r is the trailing edge vortex ring corner point; X w is the corresponding wake vortex ring corner point; V is the air velocity at the point X r relative to the wing (including the kinematics of the wing itself); and t is the time step. This means that the point X w is essentially at the point where an air particle starting at X r would end up after one time step relative to the wing. X y in Figure 3 is found using the same time integration scheme from X w . This process is repeated for every time step in the analysis, meaning that both the number of calculations required to determine the velocity at a point increases; and the number of points in the domain where the velocity must be known also increases. This gives rise to both the order of computations O(n 2 ) and a large n in the analysis, which is what leads to the method's large overall computational cost.
Given that the required number of time steps in an analysis will generally be much larger than the number of chordwise panels, most of the computational cost of the analysis is due to the wake calculations, which is the part of the analysis which the double-tree method and the Willis et al 14 method mainly address.
A more detailed description of the UVLM, including discretization of the wing and wake generation, can be found in Reference 1. The next section will discuss the construction of the filament tree, which uses agglomerated rings based on UVLM theory to solve the problem in O(n ⋅ logn) time.
F I G U R E 3 Diagram of wake shedding in the UVLM. Black lines represent the wing, whereas blue lines represent the wake. Г j t is the strength of ring j at time t [Colour figure can be viewed at wileyonlinelibrary.com] F I G U R E 4 Agglomeration of vortex rings. X 1 through X 4 are the mean coordinates of the points on their respective edges. The dimensions of the agglomerated ring are then found by dividing the vectors C and S by the number of rings in the respective directions. The center of the agglomerated ring X 5 is the mean of points X 1 through X 4

FILAMENT TREE
In order to reduce the order of computations, far-field vortex rings are agglomerated into one ring, constructed such that the difference in the velocity field reduces as the distance to the agglomerated ring increases. This can be done by constructing a ring whose size is the average of the agglomerated rings and whose strength is the sum of the agglomerated rings.
The method of agglomeration in the double-tree method is based upon the assumption that the rings are arranged into a regular two-dimensional structure. Points on the outside edges of an array of vortex rings are used to find the average dimensions of the rings, as illustrated in Figure 4.
The span of the agglomerated ring is defined as the mean span of the rings it agglomerates, and can be calculated by taking the mean edge positions of the rings; finding the difference between them; and dividing the result by the number of spanwise rings. The mean top edge position is calculated from where X top is the set of points along top edge and N X top are the number of points along the top edge. The same process can be followed to find points X 2 through X 4 ( Figure 4). The span vector of the agglomerated ring is then calculated using where N span rings is the number of vortex rings in the spanwise direction. The same process can be performed to calculate the chord vector, C. The center point of the agglomerated ring is The points X 6 through X 9 are used to define the agglomerated ring. They are found from the center point X 5 as follows: Finally, the strength of the agglomerated ring is where i is the index of a vortex ring, n the number of rings, and Г is the vortex strength defined in Equation (1). To verify the agglomeration method, the difference in the velocity field compared with the UVLM is plotted for four vortex rings agglomerated into one. Given that the difference is shown to drop off rapidly with distance ( Figure 5), this agglomeration method can be considered a valid representation of far-field vortex rings. When the number of agglomerated rings is squared, the width of the difference field profile increases by a factor slightly above 2, otherwise retaining the same shape. This suggests a strongly linked relationship between the difference field and the number of agglomerated rings.
In order for a tree structure to be built, there must also be a method of agglomerating a group of up to four agglomerated rings. First, the total strength of the agglomerated rings is summed according to (7). Then, both the center point and the dimensions of the new agglomerated ring are found by the weighted average of the corresponding child ring data, where i is the index of an agglomerated child vortex ring, n the number of child rings, and Г agg is the agglomerated vortex strength defined in Equation (7). X c is the center point of an agglomerated ring (X 5 in Figure 4) and S agg its span. A similar method is used for the agglomerated chord length. Using this agglomeration method, a tree structure can be generated that allows for different levels of agglomeration around any given point such that more vortex rings are agglomerated as the distance from the query point (a point in the domain whose flow velocity must be calculated) increases.
A recursive algorithm is used to construct the tree ( Figure 6), which passes sub-sectional information down to the leaves, and agglomeration information up to the root. To construct the tree, the root cell (number 1 in Figure 6) is given two lists containing the spanwise and chordwise identities of the panels, as well as the locations of all points that define the rings. The first step is to find the minimum and maximum values of the vortex ring locations contained within the cell in each orthogonal direction, which is later used to calculate whether the cell is near or far-field.
Next, the total number of rings N rings (Equation (10)) is checked to see if it exceeds the bucket size (Willis et al. recommend between 10 and 15). 14 The bucket size determines the maximum size of a leaf cell, in this case the maximum number of vortex rings in the cell. If the size is too small, the tree will be large and searching it will be inefficient; whereas if the bucket size is too large then the tree will not have enough levels of refinement for the algorithm to exploit. F I G U R E 5 A, Plot of the difference in the velocity field around an agglomerated vortex ring (red). The high yellow area has a difference of 10% or greater, which extends out to seven ring lengths from the center of the original four rings (blue). The difference can be seen to drop off rapidly as the distance to the rings increases. B, The same plot of the difference field, but with a 16-ring agglomeration instead of 4. The area with a 10% difference or greater extends to about 15 ring lengths from the center. C, Both A and B are plotted together, with a slice through the middle and viewed from the side F I G U R E 6 Example tree structure.
The cell (1) contains all vortex rings, and splits them into four child cells by proximity. Leaf cells (circled) apply the direct agglomeration method to the rings contained therein, whereas all other cells use the agglomerated-ring method Generally speaking, though, the performance is not sensitive to small differences in bucket size, so any value in the aforementioned range should be suitable.
where N chordwise and N spanwise are the number of chordwise and spanwise vortex rings, respectively. If (10) is false, the cell is a leaf cell. The leaf cell stores the identities of the rings it contains, as well as the agglomeration of the rings according F I G U R E 7 Diagram showing how d is calculated in order to determine the cutoff distance criterion. In this example, q is in between b 2 lower and b 2 upper , but greater than b 1 upper and b 3 upper to Equations (3) to (7). The identity of the rings within the leaf cell are calculated using the chordwise and spanwise lists, along with information regarding the original size of the list. If Equation (10) is true, the algorithm must recurse. It was found empirically that the algorithm performs more efficiently if the aspect ratio of each cell is close to 1; therefore, the recursion of the algorithm changes according to the following criterion: If case 1 is true, two child cells are created, each containing half of the list of chordwise panels. Similarly, if case 2 is true, the child cells will contain half of the list of spanwise panels each. If neither is true, four child cells are created by dividing the panels both chordwise and spanwise. After recursing, the cell agglomerates its children according to Equations (7) to (9).
To calculate the velocity induced by the filament tree on a point, another recursive algorithm is used. Starting with the root cell, the cutoff distance criterion is calculated with the distance from the query point to the boundary of the cell.
where q i is the ith coordinate (i = 1, 2, 3) of the query point; x i is a vector containing the ith coordinate of all vortex ring corner points in the cell; and u is the cutoff distance. Essentially, d is the shortest distance from the query point to the surface of a cuboid drawn around all of the vortex rings in the cell, except inside of the cuboid itself where d is zero. This is illustrated in Figure 7. If Equation (13) is true and the cell is not a leaf, each of the cell's children are searched in the same way. If the cell is a leaf, then the rings contained within the cell are calculated according to the basic UVLM theory without agglomeration. If Equation (13) is false, the cell's agglomerated ring is used to calculate the velocity.

QUERY POINT TREE
The query point tree agglomerates together multiple points in the domain whose velocity must be calculated, referred to in this paper as query points. If a vortex filament is sufficiently far away from a group of query points, its influence will be calculated once for the average position of the query points, and the result added to the total velocity of each point in the group.
The query point tree is constructed in exactly the same way as the filament tree, except that the only agglomeration information required for a leaf cell is the average position of the query points contained therein; all other cells take the average position of their child cells, where ∑ x q is the sum of the query point positions in the leaf cell; N x q is their number; and x C is their average position. x C leaf is then the position of the leaf cell. Similarly, ∑ x C child is the sum of the positions and N x C child the number of the children of a cell, which can also be leaves, and x C branch is the position of a branch cell.
This average position x C is used as the query point in the UVLM velocity calculations. Similar to the filament tree, a distance cutoff criterion is used to determine whether a cell should be searched when calculating the velocities at the query points, the only difference being that the distance is measured at both ends of the filament in question and the minimum value used.
During velocity calculation, two separate variables are required to be used to store the velocity at the query points in order for the operation to be efficient. The first variable stores velocity values for all non-agglomerated operations, and the second stores the velocity induced for each cell in the tree.
, n = number of query points, , c = number of cells in query tree, where [] i×j denotes a matrix of size i × j.
After the analysis, the tree cell velocity can be utilized by accessing the chordwise and spanwise lists for each cell in order to identify which query points the velocity should be summed over. This is measured to be a negligible part of the total processing time.

Implementation
The double-tree method utilizes both the filament and query point trees in tandem in order to calculate the velocity at the query points. Both trees are constructed in exactly the same way; the only difference in the double-tree method is how the velocity is calculated. Starting at the root of both trees, the cutoff distance criterion is calculated using where b i is defined in the same way as in each tree individually (Equation (15)).
The following logic is used to determine how the two trees should be searched. As can be seen in Figure 8, at the start of the algorithm, the Double Tree subprogram is called, with parameters [1,1]. These parameters indicate that cell 1, which is the root, of each tree should be compared. d from Equation (20) is calculated to determine if the tree cells are far enough apart to perform a double agglomeration, where the influence of an agglomerated vortex ring (Equations (6)- (9)) is calculated on an agglomerated query point (Equations (16)-(19)). Thus, the computational advantage of both methods is exploited simultaneously. If the cells are closer than the cutoff distance, several checks are performed to determine how the tree should be searched. If neither cell is a leaf cell, the tree with the least depth, that is, the closest to the root is searched. This is done by calling another subprogram, either Query Search or Filament Search, which in turn call Double Tree for each child cell of the tree cell to be searched. Otherwise, if one of the cells is a leaf and the other is not, the non-leaf cell is searched, and finally, if both cells are leaves, the basic UVLM is performed. Using this procedure, all query-filament interactions in the domain are calculated with either double agglomeration or non-agglomeration.

Order of computations
The double-tree algorithm is shown to be O(n) for a simple case in the following way. Suppose there is a regular grid of n = 4 x vortex rings, each containing one query point in its center. For a cutoff distance of 0, the root of each tree requires further refinement, since they overlap. The children of each cell create a 2 × 2 grid of overlapping filament tree and query tree cells requiring three computations per cell, resulting in 24 total computations, as shown in Figure 9. Following the query-filament cell pair (A) in Figure 9, the interactions with cell pairs (B) to (D) have been calculated with double agglomeration, meaning the only remaining interaction is (A)'s filament tree cell with its query tree cell. Since the two cells in (A) overlap, further refinement is required, but because the grid is regular and the trees are constructed indexically, (A) will have exactly four child cells, whose arrangement is equivalent to that given in Figure 9. When a leaf is reached in this recursive manner, it will contain four vortex rings and control points. Thus, for every cell in the trees, 24 computations are required to find the solution. Since the total number of cells in this case is ∑ x−1 i=0 4 i , where x = log 4 n, the total computations in this case is 24 ∑ (log 4 n)−1 i=0 4 i , which can be simplified to 8n − 8. Thus, the order of computations is O(n) for a cutoff distance of zero. Intuitively, as the cutoff distance is increased to infinity, the order of computations increases to the O(n 2 ) of the basic UVLM. However, based upon the example in this section and empirical evidence shown in Section 6.1, the authors conjecture the double-tree method is O(n) for all interactions beyond the cutoff distance.

RESULTS
All test cases are implemented in MATLAB and run on the same machine, which has a 2.6GHz i7 processor. All test-cases are run non-parallel in order to fairly measure the order of computations and the computation time of each method, due to the differing parallelization schemes between the double-tree method and the other methods. The parallelization test case is the exception, which is run in MATLAB on a high-performance computing cluster with a specified parallel number of cores.

Order of magnitude
The first test is performed to measure the scaling of each tree method for problems with increasing order of magnitude. An m × m regular grid of vortex rings is generated, and the velocity at the corners of every ring is then calculated. The wall clock time for the filament tree, query point tree, and double-tree methods to create the trees and calculate the velocity field are compared for a cutoff distance of 0 and a number of vortex rings N ≈ 10 x , x = [1, 2, 3, 4, 5]. The iteration time is averaged across three runs, each of which exhibited the same trend. These results are not compared with the base UVLM due to its excessive computation time for large N. As shown by Figure 10, the double-tree method's iteration time per N decreases as N increases. This is likely due to constant overhead (such as tree generation) contributing a lower fraction of the total time as N increases, while the velocity calculation time per N remains constant. For N ≈ 10 vortex rings, the overhead dominates the computational cost, leading to an exceptionally high computational cost per vortex ring; however, the total iteration time for an analysis of this size is negligible anyway. The other two individual tree methods can be seen to have an increasing iteration time per N despite the same diminishing overhead effect, due to the logn part of the velocity calculation scaling. Based upon this evidence, it is reasonable to conclude that the double-tree method is O(n) and that the individual-tree methods are O(n ⋅ logn). It can also be concluded that the double-tree method only provides a superior computational cost compared with the other tree methods for analyses with approximately 1000 or more vortex rings.

Accuracy vs speed
The next test measures the effectiveness of the tree methods by measuring computation time and accuracy. The original UVLM is compared with the filament tree method; the query point tree method; and the double-tree method. The UVLM is verified against the Katz and Plotkin 1 data for a flat plate undergoing sudden acceleration. The accuracy is measured as the difference in the distribution of the magnitude of the forces between the UVLM and the tree methods, using the formula: where F is a 3 × N matrix of force values calculated on the wing panels. 1 The setup is of a flat plate undergoing sudden acceleration. In order for the wake to grow sufficiently in size with each time step, the plate has 5 chordwise and 200 spanwise panels, and has an aspect ratio of 40. The tree bucket size is 15 in every case; the time step duration is 0.003 second; the air density is 1 kg/m 3 ; the UVLM shedding ratio1 is 0.25; and the ground speed is 20 m/s. Results are compared for the same cutoff distances; however, each tree method can have differing levels of agglomeration for the same cutoff distance. The computation time is measured as the "wall clock" time that passes while each time step completes, and is averaged across three separate runs of each method and test case.
As shown in Figure 11, for the same cutoff distance for this test case, the filament tree method has the lowest computational cost, followed by the double-tree method, and then the query tree method. For a cutoff distance of one ring length, a speedup of over one order of magnitude is achieved, with a force distribution difference of approximately 20% for the query tree and double-tree methods. This gives an idea of the maximum achievable speedup for a problem of this size. The speedup drops to one order of magnitude for a cutoff distance of two rings, but with the difference dropping to 10% for the query and double tree methods.
It can be seen that a cutoff of one ring for the query and double-tree methods provides both less difference and less computational cost than the filament tree with a two ring cutoff. This same trend continues as the cutoff distances are doubled. For a cutoff distance of eight rings, the filament tree overtakes the query point tree in terms of accuracy, but the double-tree method remains consistently high in accuracy across all runs. The difference appears to drop off faster than the computational efficiency as the cutoff distance is increased. Based upon these tests, it may be concluded that all tree methods perform similarly in terms of computational efficiency; however, the double-tree method has a consistently lower difference for the same cutoff distances compared with the other methods.

Lift coefficient difference
The difference in the total lift coefficient on the wing for each tree method is evaluated against the basic UVLM using the formula: where C L is the total lift coefficient on the wing. The test case used is the same as the accuracy vs speed test. As shown in Figure 12, the query point tree tends to overestimate the lift coefficient, whereas the other two trees tend to underestimate it. The filament tree has the most consistent difference profile, achieving 20%, 10%, 5%, and 1% difference at the respective increasing cutoff distances. At lower cutoff distances, the double-tree method experiences a large variation in the difference; however, it generally outperforms the filament tree at larger cutoff distances. At small cutoff distances, the query point tree has the lowest difference, however its difference appears to vary more and the method performs worse at larger cutoff distances than the other tree methods. Large difference variation could be caused by a combination of vortex stretching effects, singularity effects, and different rounding of elements into tree cells between iterations. For a cutoff distance of zero (where agglomeration is always used between non-overlapping cells), the absolute difference can reach well over 100% due to these effects.

F I G U R E 13
Plot of the geometry used for the swept tapered wing analysis. The planform is derived from the NASA common research model wing 20

Swept tapered wing
In order to verify the efficacy of the tree methods for more complex geometries, an analysis is performed on a swept tapered flat plate whose geometry is shown in Figure 13. The plate contains 20 chordwise and 200 spanwise panels. The non-dimensional number c U ∞ Δt is kept consistent with the other test cases (where c is the root chord) by increasing the time step to 0.03357 second. The aspect ratio can be calculated with AR = s 2 A , where s is the wingspan and A is the area of the wing. The span is measured as the distance from root to tip in the direction perpendicular to the root, in this case 26.14 m, which is doubled to 52.28 m for the total wingspan. The area of the wing is measured as the sum of the area of the panels, at 190.37 m 2 , making the aspect ratio of the wing 14.36. All other parameters are the same as in the accuracy vs speed test case (Section 6.2).
Comparing Figure 12 with Figure 14, all of the general trends are repeated, except for less variance in the difference and an increased magnitude of difference in this case. These trends are also observed in the force distribution difference (not shown). The computation time saved by the four and eight cutoff distance analyses are approximately 79% and 43%, respectively. The cutoff distance is measured in ring lengths; however, the rings at the root were chosen for this measurement, which are approximately four times longer than the rings at the tip.
Overall, the general trends in the difference are similar, although the magnitude of the difference and the computation time are worse in this case than in the other test cases. However, the time-saving for this example is still significant when comparing the tree methods to the basic UVLM.

Analysis size effects
The effect of different wing sizes is investigated. A flat plate with an aspect ratio (AR) of 40 (5 chordwise and 200 spanwise square panels) undergoing sudden acceleration is investigated for 20 time steps. It is compared against plates of AR 20, 10, and 5. The other AR plates are created by truncating the AR 40 plate, such that the number of spanwise panels is reduced. The differences are calculated in the same way as stated previously (Equations (21) and (22)), taking the maximum absolute value across the 20 time steps. The raw data values can be found in Appendix A.
As can be seen in Figure 15, the double-tree method has the smallest maximum force distribution difference compared with the other tree methods in every case. All tree methods show that the force distribution difference is reduced as the cutoff distance increases; however, the filament tree method has more sensitivity to the size of the problem than the other tree methods.  Figure 16 shows that the double tree has larger lift coefficient difference for small cutoff distances compared to the other tree methods; however for large cutoff distances it moves towards having the smallest difference. As in Figure 15, the maximum difference decreases as cutoff distance increases and the number of panels decreases. This is because a larger portion of the domain is non-agglomerated for larger cutoff distances in smaller problems, bringing the overall methods closer to the UVLM.
As can be seen in Figure 17, the computational speedup is much more strongly correlated with the size of the problem than the differences in force distribution or lift coefficient. This suggests that for larger problems the computational benefit will increase for the same level of difference between the tree method and the UVLM. In this case, a cutoff of 8 ring lengths and 200 spanwise panels leads to a speedup of 4.5 and 3.9 for the filament and double-tree methods, respectively, each of which also exhibit <1% difference in the force distribution and lift coefficient compared with the UVLM.
Taking into account the results from Section 6.4, a cutoff of eight ring lengths should reliably lead to an analysis with <5% difference with the basic UVLM for at least 20 time steps. This cutoff distance is shown to provide good speedup for large problems while keeping the solution reasonably accurate (compared with the UVLM).

Parallelization scaling
Parallelization of the code requires special consideration because of the way the double-tree method divides up the domain. If there are P cores, ideally there should be P jobs, with each job being assigned 1 P of the domain. Since the double-tree method divides the domain into cells, each job must be assigned to a cell in order to parallelize the algorithm. To do this, each job is assigned a different query tree cell to start the algorithm (instead of starting from the root cell). A problem arises, however, when the number of cores does not match any combination of cells that cover the entire domain. The proposed solution is to leave the "remainder" cores unused, which provides a balance between parallel scaling and eliminating unnecessary overhead. For example, if every cell has four children and there are eight cores, first the children of the root cell are considered as the parallel jobs. Since at this point there are more cores than jobs, one of the cells is replaced with its children, bringing the total number of jobs to 7. Replacing another of the cells with its children increases the number of jobs to 10. With seven jobs and eight cores, the computation time will be reduced to 1 7 . With 10 jobs and 8 cores, the first 8 jobs will complete in 1 10 time; however, the remaining 2 jobs are run afterwards, also in 1 10 time, bringing the total time to 2 10 . Since 1 7 < 1 5 , it is better to leave 1 core unused than it is to have 2 extra jobs in this case. Furthermore, the creation of each job has an associated overhead cost, meaning that using a larger number of jobs than is necessary is detrimental to the overall computation time.
Since the number of children a cell can have does not change, the number of remainder cores does not exceed 2. This means that as the number of cores increases, the ratio of remainder cores to used cores will decrease, meaning that parallel scaling is preserved in this way.
Since the parallelized analysis distributes query tree cells among the cores, if the number of cores matches the number of leaves in the query tree, then each core is assigned a leaf cell with which to start its double-tree solution procedure search. This means that any computations that would be saved by rootward query tree cells are lost, moving the order F I G U R E 18 Base-2 log graphs of speedup and computation time for each tree method for different analysis sizes [Colour figure can be viewed at wileyonlinelibrary.com] of computations toward O(n ⋅ logn). Furthermore, any additional cores added beyond the number of leaves cannot have a job assigned to them, and so are wasted. For these reasons, an individual tree method may have better parallel scaling than the double tree if a large amount of parallelization is used relative to the size of the problem.
In order to measure how each method scales as more parallel cores are introduced, the same setup as the order of magnitude analysis (Section 6.1) is used with different numbers of panels (N = 10 4 and 10 5 ). Again, the results are averaged across three runs, each of which has the same trend, as shown in Figure 18. The analysis is run on a high-performance computing cluster on 2 x cores; x = [0, 1,2,3,4].
The speedup is calculated as the ratio of the 1-core computation time over the x-core computation time. It is immediately obvious that the double-tree parallelization scheme developed is sensitive to the number of cores available. For 2 1 and 2 3 cores, it is likely that the parallelization method was unable to reduce the size of the largest job compared with 2 0 and 2 2 cores respectively, thus the computation time is similar. These trends are the same for N ≈ 10 3 . At 10 2 panels and below, the analysis is too small to gain benefit from parallelization.
Given an improved parallelization scheme, the double-tree method has the potential to be faster than the other tree methods for any number of cores. Although the double-tree method has the worst speedup out of the three methods, it remains competitive for larger problems due to its smaller order of computations.

CONCLUSIONS
The double-tree method provides a substantial reduction in the computational cost of the UVLM. The individual tree methods (filament and query point) are shown to be O(n ⋅ logn) on their own, however when combined into the double-tree method, solve the problem in O(n) time. All tree methods provide a similar level of cost reduction for the test cases investigated; however, the double-tree method performs better in terms of computational efficiency for larger problems.
The trade-off between accuracy and computational efficiency can be controlled by the cutoff distance. For the AR 40 test case with a cutoff distance of four ring lengths, the double-tree method achieves <5% difference in both force distribution and lift coefficient, while decreasing the total computation time by a factor of 7. A < 1% difference is achieved at a cutoff of eight ring lengths with a computational cost reduction factor of 4. Wake truncation is a simpler method to reduce the computational cost, however it requires about a 10-chord length cutoff for a < 1% difference in the solution. 13 Furthermore, no computational time is saved for the non-truncated panels, whereas the double-tree method reduces computations for all interactions beyond the cutoff distance, including wing-wing interactions. Finally, the double-tree method can easily be combined with wake truncation to further reduce computational cost, although further studies are required to understand the effect of the combined approach on the trade-off between accuracy and efficiency.
Results show that more complex geometry can cause a larger solution difference between the tree methods and the UVLM for the same cutoff distance; however, this difference still follows a predictable pattern within the same problem.
With an increase in problem size (number of panels), the speedup is shown to be more sensitive than the solution difference, meaning that larger problems will benefit from more speedup for the same level of accuracy. This means that applications of the UVLM where the computational cost is high, such as large-scale full-aircraft analysis in optimization and dynamic load case analysis, will benefit the most from the double-tree method.
The current double-tree parallelization algorithm is sensitive to the number of cores and has the worst speedup; however, it remains competitive in terms of computational cost for large parallelized problems.

RECOMMENDATIONS FOR FURTHER WORK
Different agglomeration methods may be directly compared with the double-tree method, such as the vortex particle multipole moment method developed by Willis et al. 14 Other methods may provide a better overall performance than the UVLM-based approach, with the downside of being potentially more difficult to implement. An improved parallelization scheme could be developed that divides the domain using both trees simultaneously, thus reducing the initial tree depth used by the parallel jobs. This could save computations by allowing for shallower portions of each tree to double-agglomerate. It could also raise the upper limit on the number of usable cores for parallelization.
In the method presented in this paper, the cutoff distance is a constant value throughout the simulation. However, it may be beneficial to develop an adaptive cutoff distance method, in order to improve the algorithm's performance. Investigations into how the cutoff distance should be chosen for a range of flight configurations would also be beneficial.