CloneDetection

Detect clones in a cross-language setting. Companion to the paper "Cross-language Clone Detection for Mobile Apps"

View the Project on GitHub FLAGlab/CloneDetection

Function Definition Results

back to main results

In the case of function definition we evaluate multiple function interacting with each other as part of a program to check declaration and the call to a print function. The Snippet exemoplars for Dart and Kotlin are as follows

Snippet 1. Dart multiple function definitions and calls

List<int> calculateStatistics(List<int> scores) {
    var min = scores[0];
    var max = scores[0];
    var sum = 0;

    for(int score in scores) {
        if(score > max) {
            max = score;
        } else if (score < min) {
            min = score;
        }
        sum += score;
    }

    return [min, max, sum];
}

bool hasAnyMatches(List<int> list,  Function(int) condition) {
    for (int item in list) {
        if (condition(item)) {
            return true;
        }
    }
    return false;
}
bool lessThanTen(int number) {
    return number < 10;
}

void main() {
  var numbers = [20, 19, 7, 12];
  hasAnyMatches(numbers, lessThanTen);
  var statistics = calculateStatistics([5, 3, 100, 3, 9]);
  print(statistics[2]);
  print(statistics[1]);
}

Snippet 2. Kotlin multiple function definitions and calls

fun calculateStatistics(scores: Array<Int>) : Triple<Int, Int, Int> {
    var min = scores[0]
    var max = scores[0]
    var sum = 0

    for(score in scores) {
        if(score > max) {
            max = score
        } else if (score < min) {
            min = score
        }
        sum += score
    }

    return Triple(min, max, sum)
}

fun hasAnyMatches(list: List<Int>, condition: (Int) -> Boolean) : Boolean {
    for (item in list) {
        if (condition(item)) {
            return true
        }
    }
    return false
}
fun lessThanTen(number : Int) : Boolean {
   return number < 10
}

fun main() {
    var numbers = listOf(20, 19, 7, 12)
    hasAnyMatches(numbers, ::lessThanTen)
    val statistics = calculateStatistics(arrayOf(5, 3, 100, 3, 9))
    println(statistics.first)
    println(statistics.second)
}

Same-language evaluation

In this case we have a single version of the application for each language, effectively generating a Type 1 clone between the applications. The table beelow hows the summay of theobtain results fort each language, comparing the program ith itself.

Language Total Type 1 Type 2 Type 3 FP FN Precision Recall
Kotlin 195 135 50 10 33 0 0.83 1
Dart 156 114 36 6 5 0 0.96 1

Two things come to mind when evaluating the single language cases. First, the amount of detected clones is quite large; this is due to two main reasons. On the one hand, the comparison is two-by-two, detecting the clone type for each pair of code elements twice, yielding the double of clones than those present. For example, the function identifier calculateStatistics is detected as a clone of Type 1 with the identifier of function lessThanTen. However, when evaluating the identifier of function lessThanTen, this will be detected as a clone of identifier calculateStatistics, even though they are technically the same clone. On the other hand, the clone evaluation is fine-grained, taking place for every node on teh generated eCST, which for example, processes a clone for each element of the declaration of the func lessThanTen(number: Int) : Bool ... statement, but also for each of its parts, the parameters (variable types, and identifiers) for the function itself, yielding a larger nnumber of identified clones.

Second, the presence of False Positives (FP). These occur by the miss classification of Type 3 clones, for example detecting the sets of PARAMETER_LIST as Type 3 clones of each other, while theyr should be Type 2 clones, given that the node types are of the same type.

Note that in our manual inspection of the code (and detected nodes) there is no clone-pair that is not detected by the algorithm. The problems detected correspond to clone type misclassifications.

Cross-language evaluation

The analysis for Dart and Kotlin results in 4 correct Type 1 clones identifying the variables main function name, the number literal, and the main functions’ parameter lists. A Type 2 clone is detected between the type declaration of the Dart code and the val declaration in the Kotlin case. FInally, a Type 3 clone is detected between . However, this clone is not of Type 3 as, contituing a False Positive.

The table below shows the summary of the results.

Language Total Type 1 Type 2 Type 3 FP FN Precision Recall
Dart v. Kotlin 79 28 38 13 20 0 0.86 1

From the cross language analysis we observe a similar behavior to that in the single case analysis, where there is a larger class of False Positivies. We classified these as FPs as the clones identified are indeed clones, but are misclassified to the incorrect type. One source of imprecision is the inclusion of a node’s type in its similarity set. This causes OOS to sometimes classify the same node types as Type 3 clones (they should be Type 2). Another problem is the detection of LITERAL nodes (e.g., 42 or true) as IDENTIFIER nodes (i.e., variable names). These nodes are detected as the same type and therefore detected as Type 1 or Type 2 clones, while indeed they should be Type 3 clones.