Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.

Figure 10.

Figure 11.

Figure 12.

Dataset statistics
| Domain knowledge graph | | Dataset | |
|---|---|---|---|
| Number of objects | 5262 | Number of dataset objects | 233 |
| Relationship types | 48 | Number of algorithm objects | 1448 |
| Number of triples | 14774 | Number of interactions | 1485 |
| Average number of descriptive words | 50.5 | Sparsity | 0.00440 |
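
The sparsity figure in the table above can be reproduced from the other dataset counts. The following is a minimal sketch, assuming sparsity is defined as the number of observed interactions divided by the number of possible dataset-algorithm pairs; that definition and the variable names are assumptions, not stated in the source.

```python
# Counts taken from the "Dataset statistics" table.
num_datasets = 233
num_algorithms = 1448
num_interactions = 1485

# Assumed definition: observed interactions / all possible dataset-algorithm pairs.
possible_pairs = num_datasets * num_algorithms
sparsity = num_interactions / possible_pairs

print(f"{possible_pairs} possible pairs, sparsity = {sparsity:.5f}")
```

Under this assumed definition the result rounds to the reported 0.00440.
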

Cloud platform experimental environment information
| Name | Configuration information |
|---|---|
| Operating system | Ubuntu 20.04.5 LTS |
| Memory | 64 GB |
| Graphics card | NVIDIA A100 40GB |
| Development language | Python 3.8 |
| Deep learning platform | PyTorch 2.0.0 |

Q&A dataset statistics
| Attribute | Value |
|---|---|
| Source language | English |
| Target language | Python |
| Number of samples | 121 |
| Average number of words in the source language | 52 |
| Maximum number of words in the source language | 69 |
| Average number of words in the target language | 1365 |
| Maximum number of words in the target language | 1593 |

Comparative experiment (%)
| No. | Model | Parameters | CodeBLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|---|---|---|
| 1 | CodeT5 | 770M | 12.62 | 7.62 | 3.02 | 5.29 |
| 2 | CodeT5-EKG | 770M | 23.93 | 13.52 | 4.62 | 10.02 |
| 3 | CodeT5 | 2B | 32.83 | 20.04 | 6.43 | 14.32 |
| 4 | CodeT5-EKG | 2B | 47.94 | 24.30 | 9.22 | 17.60 |
| 5 | CodeT5 | 6B | 46.27 | 32.96 | 14.21 | 25.68 |
| 6 | CodeT5-EKG | 6B | 51.12 | 35.58 | 16.11 | 27.54 |
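
To make the effect of the EKG variant easier to read off the table above, the sketch below computes the absolute and relative CodeBLEU gain of CodeT5-EKG over CodeT5 at each parameter size. The scores are copied from the table; the dictionary layout and the choice of relative gain as a summary are illustrative assumptions, not part of the original evaluation.

```python
# (CodeT5, CodeT5-EKG) CodeBLEU scores per parameter size, copied from the table.
codebleu = {
    "770M": (12.62, 23.93),
    "2B": (32.83, 47.94),
    "6B": (46.27, 51.12),
}

gains = {}
for size, (baseline, with_ekg) in codebleu.items():
    # Absolute gain in CodeBLEU points, and gain relative to the baseline score.
    absolute = round(with_ekg - baseline, 2)
    relative = round(100 * (with_ekg - baseline) / baseline, 1)
    gains[size] = (absolute, relative)
    print(f"{size}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

Read this way, the absolute CodeBLEU gain is largest at 2B and smallest at 6B.
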

Pre-training dataset
| Language | Number of samples |
|---|---|
| Ruby | 2,119,741 |
| JavaScript | 5,856,984 |
| Go | 1,501,673 |
| Python | 3,418,376 |
| Java | 10,851,759 |
| PHP | 4,386,876 |
| C | 4,187,467 |
| C++ | 2,951,945 |
| C# | 4,119,796 |

CTR prediction comparison experiment (%)
| Model | AUC | Precision | Recall | F1-score |
|---|---|---|---|---|
| KGNN-LS | 80.01 | 71.63 | 76.10 | 73.80 |
| KGCN | 71.62 | 62.78 | 64.38 | 63.57 |
| RippleNet | 82.55 | 69.43 | 86.91 | 77.19 |
| TCF | 82.16 | 78.24 | 82.81 | 80.46 |
| AD-EKG | 88.20 | 83.80 | 86.82 | 85.28 |
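
The F1-score column above can be cross-checked against the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the source does not state which definition it uses, and the helper name `f1_score` is hypothetical.

```python
# (precision, recall, reported F1) in percent, copied from the CTR table.
rows = {
    "KGNN-LS": (71.63, 76.10, 73.80),
    "KGCN": (62.78, 64.38, 63.57),
    "RippleNet": (69.43, 86.91, 77.19),
    "TCF": (78.24, 82.81, 80.46),
    "AD-EKG": (83.80, 86.82, 85.28),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)


recomputed = {name: round(f1_score(p, r), 2) for name, (p, r, _) in rows.items()}

for name, (_, _, reported) in rows.items():
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed[name]:.2f}")
```

Under this assumption, all five reported F1 values agree with the recomputed ones to two decimal places.
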

Comparison with other models (%)
| No. | Model | Parameters | CodeBLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|---|---|---|
| 1 | CodeT5-EKG | 770M | 23.93 | 13.52 | 4.62 | 10.02 |
| 2 | CodeT5-EKG | 2B | 47.94 | 24.30 | 9.22 | 17.60 |
| 3 | CodeT5-EKG | 6B | 51.12 | 35.58 | 16.11 | 27.54 |
| 4 | CodeGen-Mono | 2B | 34.08 | 20.23 | 6.52 | 14.94 |
| 5 | GPT-Neo | 2.7B | 19.82 | 12.57 | 2.79 | 11.28 |
| 6 | InstructCodeT5 | 16B | 43.71 | 25.00 | 9.63 | 21.06 |

Experimental environment information
| Name | Configuration information |
|---|---|
| Operating system | Windows 11 |
| Memory | 16 GB |
| Graphics card | NVIDIA GeForce RTX 3070 8 GB |
| Development language | Python 3.7.8 |
| Deep learning platform | TensorFlow 2.2.0 |