Comparison of models’ performance on various datasets
| Model | Score | SARD | SeVC | Devign | D2A |
|---|---|---|---|---|---|
| VulBERTa | 88.7 | 84.2 | 80.5 | 81.8 | 79.9 |
| SySeVR | 81.5 | 82.6 | 78.3 | 80.2 | 72.7 |
| DistilVulBERT | 94.0 | 91.4 | 82.2 | 87.5 | 85.9 |
Fine-tuning time comparison
| Model | Dataset | Fine-tuning time (hours) |
|---|---|---|
| VulBERTa | SARD | 1.2 |
| SySeVR | SeVC | 1.1 |
| DistilVulBERT | SARD | 0.8 |
| DistilVulBERT | SeVC | 0.9 |
Model overhead analysis
| Model | Parameters (millions) | Training time (hours) |
|---|---|---|
| VulBERTa | 110 | 8.2 |
| SySeVR | 90 | 6.5 |
| DistilVulBERT | 66 | 5.0 |
Hyperparameters of the models
| Hyperparameter | GPT-2 | CodeBERT | LSTM |
|---|---|---|---|
| Learning rate | 0.001 | 0.0005 | 0.01 |
| Batch size | 32 | 64 | 128 |
| Epochs | 5 | 10 | 3 |
| Optimizer | Adam | AdamW | RMSprop |
| Dropout rate | 0.1 | 0.05 | 0.2 |
| Hidden units | 768 | 312 | 256 |
| Attention heads | 12 | 8 | – |
| Layers | 12 | 12 | 1 |
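
As a rough illustration of how the CodeBERT column of the table above could be wired into a fine-tuning run, the sketch below uses Hugging Face Transformers with PyTorch. The checkpoint name, the two-class output head, and the dataset handling are assumptions made for illustration; the text does not specify them.

```python
# Minimal sketch: fine-tuning setup using the CodeBERT hyperparameters from the table
# (learning rate 0.0005, batch size 64, 10 epochs, AdamW, dropout 0.05).
# The checkpoint name, two-class head, and dataset format are illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base",   # assumed checkpoint; the paper does not name one
    num_labels=2,                # assumed vulnerable / non-vulnerable classification head
    hidden_dropout_prob=0.05,    # dropout rate from the table
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # AdamW, learning rate 0.0005


def fine_tune(train_dataset, epochs=10, batch_size=64):
    """Fine-tune with the batch size and epoch count from the table.

    train_dataset is assumed to yield dicts with input_ids, attention_mask, and labels.
    """
    loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            outputs = model(**batch)   # loss is computed because labels are supplied
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```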
Knowledge distillation training algorithm
| Require: Set of labeled training data D = {(x_i, y_i)} |
| Require: Set of K teacher models T = {T_1, …, T_K} |
| Require: Student model S |
| Ensure: Trained student model S |
| 1: Initialize student model parameters θ_S randomly. |
| 2: for each teacher model T_k ∈ T do |
| 3: Compute teacher predictions p_k(x_i) for each x_i ∈ D. |
| 4: Initialize student model weights to match T_k. |
| 5: Train the student model on D by minimizing the distillation loss L_KD = α·L_CE(y_i, S(x_i)) + (1 − α)·τ²·D_KL(σ(p_k(x_i)/τ) ‖ σ(S(x_i)/τ)), where D_KL denotes the Kullback–Leibler divergence, σ is the softmax function, τ is the distillation temperature, and α balances the supervised and distillation terms. |
| 6: end for |
| 7: return Trained student model S |
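
A minimal PyTorch sketch of this distillation loop is given below, assuming the standard temperature-scaled KD loss combined with a cross-entropy term. The α and τ values, the model and data-loader objects, and the omission of the per-teacher weight initialization in step 4 (a distilled student typically has fewer parameters than its teachers) are illustrative assumptions rather than the paper's exact implementation.

```python
# Illustrative sketch of the multi-teacher distillation loop described above,
# using a temperature-scaled KL-divergence term plus cross-entropy on the labels.
# alpha, tau, and the model/dataloader objects are placeholders, not the paper's values.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=2.0):
    """Cross-entropy on the labels plus KL divergence to the teacher's soft outputs."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * (tau ** 2)
    return alpha * ce + (1.0 - alpha) * kl


def distill(student, teachers, loader, optimizer, device="cpu"):
    """Train the student against each teacher's predictions in turn (steps 2-6)."""
    student.train()
    for teacher in teachers:                      # step 2: iterate over the K teachers
        teacher.eval()
        for x, y in loader:                       # step 3: teacher predictions per sample
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                teacher_logits = teacher(x)
            student_logits = student(x)
            loss = kd_loss(student_logits, teacher_logits, y)   # step 5: KD loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student                                # step 7: trained student model
```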