CloneDetection

Detect clones in a cross-language setting. Companion to the paper "Cross-language Clone Detection for Mobile Apps".

View the Project on GitHub FLAGlab/CloneDetection

Out of Step

Cross-language Clone Detection for Mobile Apps

This online appendix presents the complete evaluation results for our work on cross-language clone detection for the mobile app programming languages Dart, Kotlin, and Swift.

The evaluation is structured in three parts:

  1. Clone detection for the basic language features.
  2. Clone detection using algorithms from a known domain (sorting algorithms).
  3. Clone detection on a corpus of mobile app repositories.

Clone detection: Language Features

The first part of our evaluation consists of ensuring that Out of Step detects clones effectively across the different language features and abstractions. To this end, we provide implementations demonstrating each feature and manually translate them to the other languages, which yields a high rate of Type-1 and Type-2 clones.
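To make the Type-1/Type-2 distinction concrete, the following Python sketch follows the common clone taxonomy: Type-1 clones are identical up to whitespace, layout, and comments, while Type-2 clones additionally allow renamed identifiers and changed literals. The tokenizer, the `KEPT_WORDS` set, and the functions `normalize` and `clone_type` are hypothetical illustrations for a single Kotlin-like syntax, not the detection algorithm used by Out of Step. Cross-language detection would additionally need to map each language's constructs to a shared representation.

```python
import re
from typing import List, Optional

# Hypothetical sketch for a Kotlin-like syntax; not Out of Step's algorithm.
# Keywords and built-in type names are kept verbatim; other words are identifiers.
KEPT_WORDS = {"fun", "val", "var", "return", "if", "else", "for", "while", "Int"}

# Numbers, double-quoted strings, words, or any single non-space character.
TOKEN_RE = re.compile(r'\d+(?:\.\d+)?|"[^"]*"|[A-Za-z_]\w*|\S')


def strip_comments(src: str) -> str:
    """Drop /* */ block comments and // line comments (naive: ignores strings)."""
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.S)
    return re.sub(r"//[^\n]*", " ", src)


def raw_tokens(src: str) -> List[str]:
    """Token stream without comments or whitespace: the Type-1 view."""
    return TOKEN_RE.findall(strip_comments(src))


def normalize(src: str) -> List[str]:
    """Replace identifiers with ID and literals with LIT: the Type-2 view."""
    out = []
    for tok in raw_tokens(src):
        if tok in KEPT_WORDS:
            out.append(tok)
        elif tok[0].isdigit() or tok[0] == '"':
            out.append("LIT")
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        else:
            out.append(tok)
    return out


def clone_type(a: str, b: str) -> Optional[int]:
    """Return 1 or 2 for a Type-1 or Type-2 clone pair, else None."""
    if raw_tokens(a) == raw_tokens(b):
        return 1
    if normalize(a) == normalize(b):
        return 2
    return None


area = "fun area(w: Int, h: Int): Int { return w * h } // rectangle"
surface = "fun surface(x: Int, y: Int): Int {\n    /* renamed */ return x * y\n}"
print(clone_type(area, surface))  # 2
```

The two functions differ only in names, comments, and layout, so they match after normalization but not before it, which is exactly what separates a Type-2 pair from a Type-1 pair.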

In all cases, this first part of the evaluation also compares programs written in the same base language (two versions of a program, or a program with itself) to verify that the algorithm detects clones correctly for the language feature at hand.
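Comparing a program with itself gives a simple correctness check: any reasonable similarity score must report a perfect match, while two different versions should score strictly between no and full similarity. The sketch below assumes a Jaccard score over token trigrams; the functions `ngrams` and `similarity`, the choice n = 3, and the sample programs are hypothetical and are not Out of Step's actual metric.

```python
import re
from typing import Set, Tuple

# Hypothetical similarity score for illustration; Out of Step may score clones differently.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def ngrams(src: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of contiguous token n-grams in the source text."""
    toks = TOKEN_RE.findall(src)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0  # programs too short to form any n-gram are treated as equal
    return len(ga & gb) / len(ga | gb)


v1 = "fun sign(x: Int): Int { if (x > 0) return 1 else return 0 }"
v2 = "fun sign(x: Int): Int { if (x >= 0) return 1 else return -1 }"

# Same-language sanity checks:
assert similarity(v1, v1) == 1.0       # a program is a perfect clone of itself
assert 0.0 < similarity(v1, v2) < 1.0  # two versions are similar but not identical
```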

Data set description

The language features dataset consists of 7 categories, as shown in the following table. For each category, we list the files and clones under test.

Note that, while we define the programs in several programming languages, we focus the results and analysis on the mobile app programming languages Dart and Kotlin.

Feature               Cases  Evaluation                    Languages
Variable declaration  1      Integer variable declaration  C++, Dart, Java, Kotlin, Swift
Function declaration  1      Stats calculation             C++, Dart, Java, Kotlin, Swift
Conditionals          1      Test variable and return      C++, Dart, Java, Kotlin, Swift
Loops                 2      for loop                      C++, Dart, Java, Kotlin, Swift
                             while loop                    C++, Dart, Java, Kotlin, Swift
Structures            2      Structures                    C++, Swift
                             Enumerations                  C++, Dart, Java, Kotlin, Swift
Classes               2      1-to-1 definition             C++, Dart, Java, Kotlin, Swift
                             Crossed definition            C++, Dart, Java, Kotlin, Swift
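For the variable declaration feature, a cross-language clone is the same declaration written in different syntax. The sketch below is a hypothetical illustration: the snippets, the regular expressions, and `to_common_form` are assumptions, not the dataset's files or Out of Step's parser. It maps a one-line integer declaration in each of the five languages to a shared `(type, name, value)` form, so all five translations compare equal.

```python
import re
from typing import Optional, Tuple

# Illustrative declarations of one integer variable; these are not the dataset's files.
SNIPPETS = {
    "C++": "int count = 5;",
    "Dart": "int count = 5;",
    "Java": "int count = 5;",
    "Kotlin": "var count: Int = 5",
    "Swift": "var count: Int = 5",
}

# C-style "int name = value;" (C++, Dart, Java).
C_STYLE = re.compile(r"^\s*int\s+(\w+)\s*=\s*(-?\d+)\s*;\s*$")
# Annotated "var|val|let name: Int = value" (Kotlin, Swift), optional semicolon.
ANNOTATED = re.compile(r"^\s*(?:var|val|let)\s+(\w+)\s*:\s*Int\s*=\s*(-?\d+)\s*;?\s*$")


def to_common_form(src: str) -> Optional[Tuple[str, str, int]]:
    """Map a one-line integer declaration to a language-neutral (type, name, value)."""
    for pattern in (C_STYLE, ANNOTATED):
        match = pattern.match(src)
        if match:
            return ("int", match.group(1), int(match.group(2)))
    return None  # not an integer declaration this sketch understands


forms = {lang: to_common_form(code) for lang, code in SNIPPETS.items()}
print(forms["Kotlin"])  # ('int', 'count', 5)
```

A real detector would need a parser per language rather than regular expressions, but the design idea is the same: translate each language into one shared representation before comparing.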

Clone detection: Sorting Algorithms

Our second dataset is composed of different algorithms from the well-known domain of sorting. The objective of this evaluation is to assess how effectively Out of Step detects clones in small programs found in the wild.

Using a known domain enables us to validate the algorithm's correctness and to easily determine whether the detected clones are true clones or false positives. This assessment is important before moving on to larger application domains, so that we can be confident in the validity of our results.
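Because every file in a known domain is labeled with the algorithm it implements, each reported clone pair can be checked automatically. The sketch below assumes, as a simplification, that two files are true clones exactly when they implement the same sorting algorithm. The file names and the helpers `ground_truth` and `precision_recall` are hypothetical; this is not the paper's evaluation script.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical file names, each labeled with the sorting algorithm it implements.
FILES = {
    "bubble.dart": "bubble",
    "bubble.kt": "bubble",
    "heap.dart": "heap",
    "heap.kt": "heap",
}


def ground_truth(files: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Every pair of files implementing the same algorithm counts as a true clone."""
    return {frozenset((a, b)) for a, b in combinations(files, 2) if files[a] == files[b]}


def precision_recall(reported: Iterable[Tuple[str, str]],
                     files: Dict[str, str]) -> Tuple[float, float]:
    """Score reported clone pairs against the known-domain ground truth."""
    truth = ground_truth(files)
    found = {frozenset(pair) for pair in reported}
    true_positives = len(found & truth)
    # Convention chosen for this sketch: an empty set scores 1.0 instead of dividing by zero.
    precision = true_positives / len(found) if found else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


reported = [("bubble.dart", "bubble.kt"), ("bubble.dart", "heap.kt")]
print(precision_recall(reported, FILES))  # (0.5, 0.5)
```

Here the pair `bubble.dart`/`heap.kt` is flagged as a false positive without any manual inspection, which is what makes a known domain convenient for validation.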

Data set description

The algorithm implementations are taken from the rosettacode.org website to avoid introducing our own implementation bias. The implementations on Rosetta Code are standard implementations submitted by different users and curated by the community.

Algorithm  Languages (with LoC)
Bubble     C++, Dart, Java, Kotlin, Swift
Heap       C++, Dart, Java, Kotlin, Swift
Insertion  C++, Dart, Java, Kotlin, Swift
Quick      C++, Dart, Java (2 versions), Kotlin (2 versions), Swift
Selection  C++, Dart, Java, Kotlin, Swift
Shell      C++, Dart, Java, Kotlin, Swift

Evaluation results

Clone detection: Mobile Apps

The final dataset in our evaluation consists of full-fledged mobile apps of medium and large size.

Data set description

Our evaluation dataset consists of 116 mobile apps, mined from GitHub and collected from student projects in a mobile app development (capstone) course. The following table shows the distribution of apps across language pairs.

Language pair  Quantity  Source of the repositories
Kotlin-Dart    50        GitHub (4 Dart, 4 Kotlin), Students (21 Dart, 21 Kotlin)
Kotlin-Swift   52        GitHub (12 Kotlin, 12 Swift), Students (14 Kotlin, 14 Swift)
Dart-Swift     14        GitHub (3 Dart, 3 Swift), Students (4 Dart, 4 Swift)
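The counts in this table can be cross-checked with a short tally. The three rows sum to the 116 apps stated above, and regrouping the same numbers by language gives 51 Kotlin, 33 Swift, and 32 Dart apps. The `DATASET` layout and the `tally` function are illustrative only; the numbers are copied from the table.

```python
from typing import Dict, Tuple

# Counts copied from the table: language pair -> {(source, language): apps}.
DATASET: Dict[str, Dict[Tuple[str, str], int]] = {
    "Kotlin-Dart": {("GitHub", "Dart"): 4, ("GitHub", "Kotlin"): 4,
                    ("Students", "Dart"): 21, ("Students", "Kotlin"): 21},
    "Kotlin-Swift": {("GitHub", "Kotlin"): 12, ("GitHub", "Swift"): 12,
                     ("Students", "Kotlin"): 14, ("Students", "Swift"): 14},
    "Dart-Swift": {("GitHub", "Dart"): 3, ("GitHub", "Swift"): 3,
                   ("Students", "Dart"): 4, ("Students", "Swift"): 4},
}


def tally(dataset):
    """Total apps per language pair and per language."""
    per_pair = {pair: sum(counts.values()) for pair, counts in dataset.items()}
    per_language: Dict[str, int] = {}
    for counts in dataset.values():
        for (_source, language), n in counts.items():
            per_language[language] = per_language.get(language, 0) + n
    return per_pair, per_language


per_pair, per_language = tally(DATASET)
print(per_pair)                # {'Kotlin-Dart': 50, 'Kotlin-Swift': 52, 'Dart-Swift': 14}
print(per_language)            # {'Dart': 32, 'Kotlin': 51, 'Swift': 33}
print(sum(per_pair.values()))  # 116
```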

We focus the evaluation on the 50 apps for Dart and Kotlin, as shown in the results section.

Evaluation results for Dart and Kotlin repositories.

Performance Evaluation

We evaluate the performance in detecting clones for the complete applications in our CloneCorp