Detect clones in a cross-language setting. Companion to the paper "Cross-language Clone Detection for Mobile Apps"
Managing loops is an interesting case for universal nodes. Programming languages define loop abstractions using different keywords (e.g., `for`, `while`). Out of Step treats loop abstractions as semantically equivalent and classifies them as Type 3 clones. To evaluate this, we manually inject Type 1 and Type 3 clones into our tests.
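The idea of a universal node for loops can be sketched as a lookup from language-specific node kinds to a shared label. This is an illustration only: the node-kind strings, the `LOOP` label, and both function names are assumptions made for this sketch, not the names used by Out of Step.

```python
# Hypothetical mapping from (language, concrete node kind) to a universal label.
# The node-kind strings and the "LOOP" label are assumptions for illustration.
UNIVERSAL_KINDS = {
    ("dart", "for_statement"): "LOOP",
    ("dart", "while_statement"): "LOOP",
    ("kotlin", "for_statement"): "LOOP",
    ("kotlin", "while_statement"): "LOOP",
}


def universal_kind(language: str, node_kind: str) -> str:
    """Return the shared label for a node, or the original kind if unmapped."""
    return UNIVERSAL_KINDS.get((language, node_kind), node_kind)


def same_abstraction(a: tuple, b: tuple) -> bool:
    """Two (language, kind) nodes express the same abstraction if their labels match."""
    return universal_kind(*a) == universal_kind(*b)
```

With such a mapping, a Dart `for` and a Kotlin `while` resolve to the same label, which is what allows them to be compared at all.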
Snippet 1. Dart for loop - version A
int sum = 0;
for (int i = 1; i <= 100; i++)
sum = sum + i;
Snippet 2. Dart while loop - version B
int sum = 0;
int i = 1;
while (i <= 100) {
sum = sum + i;
i += 1;
}
Snippet 3. Kotlin for loop - version A
var sum = 0
for (i in 1..100)
sum = sum + i
Snippet 4. Kotlin while loop - version B
var sum = 0
var i = 1
while (i <= 100) {
sum = sum + i
i += 1
}
Analyzing each language separately, Dart (Snippets 1 and 2) and Kotlin (Snippets 3 and 4), results in two Type 3 clones. These match the `for` and `while` statements and the declaration of the `i` variable. The postfix statement `i++` and the variable assignment `i += 1` are not correctly detected as a Type 3 clone because of their node representations in the eCST: an `assignment_operator` unary node and a binary node, respectively. Nonetheless, we identify the complete `for` and `while` blocks as clones of each other, as the algorithm broadcasts clones inside the body of blocks. Additionally, Out of Step identifies the declarations and assignments of variables as Type 1 clones: Line 1 of both snippets, and Line 3 of version A with Line 4 of version B.
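One way to read the behaviour described above is as a pairwise comparison of eCST nodes. The sketch below is a deliberately simplified assumption, not Out of Step's actual algorithm: the `Node` class, its fields, the kind labels, and the classification rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    universal: str  # shared label, e.g. "LOOP" (assumed name)
    concrete: str   # language-level construct, e.g. "for" or "i++"
    arity: str      # "unary", "binary", or "n/a" (assumed attribute)


def classify(a: Node, b: Node) -> Optional[str]:
    """Return an assumed clone type for a node pair, or None if they do not match."""
    if a.universal != b.universal or a.arity != b.arity:
        return None      # nodes with different shapes are never paired
    if a.concrete == b.concrete:
        return "Type 1"  # same construct on both sides
    return "Type 3"      # same abstraction, different construct


for_loop = Node("LOOP", "for", "n/a")
while_loop = Node("LOOP", "while", "n/a")
postfix = Node("ASSIGNMENT", "i++", "unary")
compound = Node("ASSIGNMENT", "i += 1", "binary")

print(classify(for_loop, while_loop))  # Type 3
print(classify(postfix, compound))     # None
```

Under these assumptions the loops pair up while `i++` and `i += 1` do not, mirroring the missed clone discussed above.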
Algorithm | Total | Type 1 | Type 2 | Type 3 | FP | FN | Precision | Recall |
---|---|---|---|---|---|---|---|---|
Dart A | 26 | 26 | 0 | 0 | 0 | 0 | 1 | 1 |
Dart B | 27 | 27 | 0 | 0 | 0 | 0 | 1 | 1 |
Dart A v. B | 12 | 3 | 7 | 2 | 2 | 0 | 0.84 | 1 |
Kotlin A | 17 | 17 | 0 | 0 | 0 | 0 | 1 | 1 |
Kotlin B | 24 | 24 | 0 | 0 | 0 | 0 | 1 | 1 |
Kotlin A v. B | 8 | 3 | 5 | 0 | 0 | 2 | 1 | 0.8 |
Note that while the Dart analysis is effective in detecting the Type 3 clones between the increment of the `i` variable in the `for` loop and the `while` loop, and between the looping structures themselves, the Kotlin analysis fails to detect such Type 3 clones and therefore has a reduced recall. With respect to precision, the roles reverse: the Dart analysis reports a couple of FPs among the Type 2 clones between `LITERAL` and `TYPE` nodes, while the Kotlin analysis does not report them, reaching full precision.
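The Precision and Recall columns can be related to the counts with the standard definitions. The sketch below assumes that Total counts every reported clone, so that true positives equal Total minus FP. This reading is an assumption: it reproduces, for example, the Kotlin A v. B row, but the counting and rounding behind the table may differ for other rows.

```python
def precision_recall(total: int, fp: int, fn: int) -> tuple:
    """Compute (precision, recall), assuming `total` counts every reported clone."""
    tp = total - fp  # reported clones that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Kotlin A v. B row: Total = 8, FP = 0, FN = 2
print(precision_recall(8, 0, 2))  # (1.0, 0.8)
```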
The cross-language comparison between code versions A and B also succeeds in detecting the corresponding clones. This is because the node information and the eCST structure are the same for the two code snippets: every node is matched to its corresponding node in the other eCST, even though the programming languages differ. When comparing any A version against any B version, we find Type 3 clones for the `for` and `while` statements and for the definition of the variable `i`. Additionally, we find the Type 1 clones for the declaration and assignment of the `sum` variable, as before.
In the loop analysis, Out of Step reports a couple of Type 2 false positives, caused by the two assignments present in the B versions of the code. In this case, the detection algorithm flags the bodies of the `for` and `while` statements as clones because the intermediate type node is the same for both. The same behavior occurs with the assignment before the loop statement in both cases.
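The effect of matching on the intermediate type node alone can be illustrated with a small sketch. The `ASSIGNMENT` label, the normalized source strings, and the `type_only_matches` helper are assumptions for illustration and do not reflect Out of Step's internal representation.

```python
from itertools import product

# Hypothetical (type, normalized source text) pairs for the loop bodies of version B.
dart_b = [("ASSIGNMENT", "sum = sum + i"), ("ASSIGNMENT", "i += 1")]
kotlin_b = [("ASSIGNMENT", "sum = sum + i"), ("ASSIGNMENT", "i += 1")]


def type_only_matches(left, right):
    """Pair every two nodes whose intermediate type is equal, ignoring the text."""
    return [(a, b) for a, b in product(left, right) if a[0] == b[0]]


matches = type_only_matches(dart_b, kotlin_b)
# Pairs that share a type but come from different statements are spurious.
spurious = [(a, b) for a, b in matches if a[1] != b[1]]
print(len(matches), len(spurious))  # 4 2
```

Because both assignments carry the same type, each one also pairs with the other statement, which is one plausible way such extra matches arise.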
Algorithm | Total | Type 1 | Type 2 | Type 3 | FP | FN | Precision | Recall |
---|---|---|---|---|---|---|---|---|
Dart v. Kotlin A | 7 | 4 | 3 | 0 | 1 | 0 | 0.86 | 1 |
Dart v. Kotlin B | 17 | 6 | 9 | 2 | 1 | 3 | 0.93 | 0.82 |
Dart A v. Kotlin B | 9 | 2 | 6 | 1 | 1 | 1 | 0.89 | 0.89 |
Dart B v. Kotlin A | 10 | 2 | 7 | 1 | 1 | 0 | 0.9 | 1 |