
Figure 1
The QA process between ROCP and domain experts.

Figure 2
The overview of our approach.
Table 1
The algorithm for domain document word segmentation.
| Algorithm 1.1 Domain document word segmentation |
|---|
| Input: the list of domain documents D; |
| Output: the segmented words W (one word list per document); |
| 1. List W; |
| 2. for each i in D |
| 3. List W[i] = D[i].WordSegmentationByLucene(); |
| 4. for each j in W[i] |
| 5. W[i][j].stemming(); |
| 6. if W[i][j] in stopwordlist |
| 7. W[i].remove(W[i][j]); |
| 8. end if |
| 9. end for |
| 10. W.add(W[i]); |
| 11. end for |
| 12. return W; |

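
As a rough illustration, Algorithm 1.1 can be sketched in Python. Lucene is not available here, so the sketch swaps in hypothetical stand-ins: a lowercase regex tokenizer, a toy suffix-stripping `stem` function (not Porter or Lucene stemming), and a small illustrative `STOPWORDS` set rather than the stopword list used in the paper.

```python
import re

# Hypothetical stand-ins: the paper segments text with Lucene; this sketch
# uses a regex tokenizer, a toy stemmer and a tiny illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "by", "with"}
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def segment_documents(docs: list[str]) -> list[list[str]]:
    """Return W: one list of stemmed, stopword-free words per document."""
    segmented = []
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())  # stand-in for Lucene segmentation
        stems = [stem(w) for w in words]              # step 5: stemming
        # Steps 6-8: build a filtered list instead of removing while iterating.
        segmented.append([s for s in stems if s not in STOPWORDS])
    return segmented


print(segment_documents(["The orbiting satellites are tracked."]))
# → [['orbit', 'satellite', 'track']]
```

Building a new filtered list, rather than calling `remove` on the list being iterated (steps 6 to 8), avoids silently skipping the word that follows each removed one.
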
Table 2
The algorithm for constructing the vector space model (VSM).
| Algorithm 1.2 The construction of VSM |
|---|
| Input: the segmented words W; the number N of high-frequency words kept per document; |
| Output: the vector space model VSM of each document; |
| 1. List HFW; |
| 2. List VSM; |
| 3. for each i in W |
| 4. HFW[i] = W[i].findHighFrequencyWords(N); |
| 5. end for |
| 6. List WA = HFW.allHighFrequencyWords(); |
| 7. for each j in W |
| 8. for each k in WA |
| 9. VSM[j][k] = WA[k].appearedTimesIn(W[j]); |
| 10. end for |
| 11. end for |
| 12. return VSM; |
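
A minimal Python sketch of Algorithm 1.2, assuming W is the output of Algorithm 1.1 (one token list per document) and that `findHighFrequencyWords(N)` means "the N most frequent words of the document". The function name `build_vsm` and the use of `collections.Counter` are illustrative choices, not the paper's implementation.

```python
from collections import Counter


def build_vsm(segmented: list[list[str]], n: int) -> tuple[list[str], list[list[int]]]:
    """Build term-count vectors over the union of each document's top-n words.

    segmented: W, one token list per document.
    n: number of high-frequency words kept per document (N).
    Returns the shared vocabulary WA and one count vector per document.
    """
    # Steps 3-5: the n most frequent words of every document.
    high_freq = [[w for w, _ in Counter(doc).most_common(n)] for doc in segmented]
    # Step 6: union of all high-frequency words, in first-seen order.
    vocabulary = list(dict.fromkeys(w for words in high_freq for w in words))
    # Steps 7-11: how often each vocabulary word appears in each document.
    vsm = []
    for doc in segmented:
        counts = Counter(doc)
        vsm.append([counts[w] for w in vocabulary])
    return vocabulary, vsm


print(build_vsm([["orbit", "debri", "orbit"], ["debri", "launch", "debri"]], 1))
# → (['orbit', 'debri'], [[2, 1], [0, 2]])
```

`Counter.most_common` breaks ties in first-encountered order, so which words survive at the N-th position depends on word order within the document.
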

Figure 3
Document validation.

Figure 4
The cosine similarity algorithm for locating invalid documents.
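
The quantities behind Figure 4 and Algorithm 1.3 can be written as follows, assuming $\mathbf{v}_i$ is the VSM vector of document $d_i$, $m$ is the number of documents, and both averages run over all $m$ documents (the inner loop of Algorithm 1.3 also compares each document with itself):

$$
\begin{aligned}
\cos(\mathbf{v}_i,\mathbf{v}_j) &= \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert}, \\
\bar{s}_i &= \frac{1}{m}\sum_{j=1}^{m} \cos(\mathbf{v}_i,\mathbf{v}_j), \qquad
\bar{s} = \frac{1}{m}\sum_{i=1}^{m} \bar{s}_i, \\
d_i \text{ is removed} &\iff \lvert \bar{s} - \bar{s}_i \rvert > CT.
\end{aligned}
$$
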
Table 3
The algorithm to remove invalid documents.
| Algorithm 1.3 Remove invalid documents |
|---|
| Input: the vector space model VSM, the cosine similarity threshold CT, the domain documents D; |
| Output: the valid domain documents D; |
| 1. sumcos2 = 0; m = VSM.size(); |
| 2. for each i in VSM |
| 3. sumcos1 = 0; |
| 4. for each j in VSM |
| 5. cosSim[i][j] = VSM[i].computeCosSimilarityWith(VSM[j]); |
| 6. sumcos1 += cosSim[i][j]; |
| 7. end for |
| 8. avgCosSim[i] = sumcos1 / m; |
| 9. sumcos2 += avgCosSim[i]; |
| 10. end for |
| 11. totalAvgCosSim = sumcos2 / m; |
| 12. for each i in avgCosSim, from the last index to the first |
| 13. if Math.abs(totalAvgCosSim - avgCosSim[i]) > CT |
| 14. D.removeDocumentByItsVSMIndex(i); |
| 15. end if |
| 16. end for |
| 17. return D; |
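
A sketch of Algorithm 1.3 in Python, under the assumptions that each VSM row is a plain list of counts, that a zero vector has similarity 0 with everything, and that averages include each document's similarity with itself (as the loop over all j does). The names `cosine` and `remove_invalid` are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either one is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def remove_invalid(docs: list[str], vsm: list[list[float]], ct: float) -> list[str]:
    """Keep documents whose mean similarity stays within ct of the overall mean."""
    m = len(vsm)
    # Mean similarity of each document to all m documents (itself included).
    avg_sim = [sum(cosine(vi, vj) for vj in vsm) / m for vi in vsm]
    total_avg = sum(avg_sim) / m
    # Filtering (instead of deleting by index) keeps indices aligned with vsm.
    return [d for d, s in zip(docs, avg_sim) if abs(total_avg - s) <= ct]


print(remove_invalid(["a", "b", "c", "d"], [[1, 0], [1, 0], [1, 0], [0, 1]], 0.2))
# → ['a', 'b', 'c']
```

In the example, documents a, b and c have mean similarity 0.75, d has 0.25, and the overall mean is 0.625; only d deviates by more than 0.2.
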

Figure 5
The 3-layer taxonomy.

Figure 6
Part of the selection presented to domain experts for building the 3-layer taxonomy.

Figure 7
Object property creation.

Figure 8
Ontology assembly.
Table 4
The algorithm for constructing the hyponymy of ontology classes.
| Algorithm 2 The construction of ontology classes hyponymy |
|---|
| Input: list NodesPool; |
| Output: list OntTree; |
| 1. List rootNodes = SelectRootNodesByExperts(NodesPool); |
| 2. OntTree[0] = rootNodes; |
| 3. NodesPool.remove(rootNodes); |
| 4. int n = 0; |
| 5. while (NodesPool.hasElement()) |
| 6. tempnodes = SelectNodesByExperts(NodesPool); |
| 7. OntTree[n].addsubnodes(tempnodes); |
| 8. OntTree[n+1].add(tempnodes); |
| 9. NodesPool.remove(tempnodes); |
| 10. n++; |
| 11. end while |
| 12. return OntTree; |
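
Because Algorithm 2 relies on expert decisions at every layer, the Python sketch below models the expert as a callback. The hypothetical `choose` function receives the remaining pool and the current layer and returns a `{child: parent}` mapping for the nodes placed in the next layer; `simulated_expert` and `TAXONOMY` are made-up examples used only to exercise the loop.

```python
from typing import Callable

# Hypothetical expert interface: (remaining pool, current layer) -> {child: parent}.
ExpertChoice = Callable[[set[str], list[str]], dict[str, str]]


def build_hierarchy(pool: set[str], roots: list[str], choose: ExpertChoice):
    """Grow the class tree layer by layer until the node pool is empty.

    Returns (layers, parents): layers[n] plays the role of OntTree[n], and
    parents maps each hyponym to its hypernym in the layer above.
    """
    remaining = set(pool) - set(roots)
    layers = [list(roots)]          # OntTree[0] = root nodes
    parents: dict[str, str] = {}
    while remaining:
        picked = choose(remaining, layers[-1])
        if not picked:
            raise ValueError("no nodes selected; the pool would never empty")
        parents.update(picked)          # attach as sub-nodes of the current layer
        layers.append(list(picked))     # the picked nodes form the next layer
        remaining.difference_update(picked)
    return layers, parents


# Made-up hypernym table standing in for the expert's judgement.
TAXONOMY = {"debris": "space object", "satellite": "space object", "fragment": "debris"}


def simulated_expert(remaining: set[str], layer: list[str]) -> dict[str, str]:
    return {c: p for c, p in TAXONOMY.items() if c in remaining and p in layer}


print(build_hierarchy({"space object", "debris", "satellite", "fragment"},
                      ["space object"], simulated_expert))
```

The `ValueError` guard is an addition to the pseudocode: if the expert selects nothing in some round, the original `while` loop would never terminate.
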

Figure 9
The tag cloud of the space debris mitigation domain.

Figure 10
The main part of the ontology for the space debris mitigation domain.
Table 5
Details of the corpus and the experimental data.
| Documents | The corpus | Domain documents set 1 (DS1) | Domain documents set 2 (DS2) |
|---|---|---|---|
| Source | China Daily | Space debris mitigation | Astronautics fundamentals |
| Number of documents | 1000 | 20 | 50 |
| Total number of words | 1777763 | 54619 | 145628 |
| Average number of words per document | 1778 | 2731 | 2513 |

Table 6
Statistics of the extracted terminologies.
| Data set | Total valid words (TW) | Total terminologies (TT) | Number of extracted words (NE) | Number of correct words (NC) |
|---|---|---|---|---|
| DS1-MPVW | 2617 | 129 | 155 | 123 |
| DS1-TF-IDF | 2617 | 129 | 155 | 81 |
| DS2-MPVW | 4126 | 288 | 346 | 254 |
| DS2-TF-IDF | 4126 | 288 | 346 | 209 |

Table 7
Recall, precision and F1-measure results.
| Data set | Recall | Precision | F1-measure |
|---|---|---|---|
| DS1-MPVW | 95.3% | 79.4% | 86.6% |
| DS1-TF-IDF | 62.8% | 52.3% | 57.1% |
| DS2-MPVW | 88.1% | 73.4% | 80.1% |
| DS2-TF-IDF | 72.6% | 60.4% | 65.9% |
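
The figures in Table 7 are consistent with recall = NC/TT, precision = NC/NE and F1 as their harmonic mean, computed from the counts in Table 6. The short Python check below recomputes them under that reading; its results agree with Table 7 to within 0.1 percentage point of rounding. The helper name `prf` is illustrative.

```python
def prf(tt: int, ne: int, nc: int) -> tuple[float, float, float]:
    """Recall = NC/TT, precision = NC/NE, F1 = harmonic mean of the two."""
    recall = nc / tt
    precision = nc / ne
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


# (TT, NE, NC) for each run, copied from Table 6.
TABLE6 = {
    "DS1-MPVW": (129, 155, 123),
    "DS1-TF-IDF": (129, 155, 81),
    "DS2-MPVW": (288, 346, 254),
    "DS2-TF-IDF": (288, 346, 209),
}

for name, counts in TABLE6.items():
    r, p, f = prf(*counts)
    print(f"{name}: recall={r:.1%} precision={p:.1%} F1={f:.1%}")
```
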

Figure 11
Accuracy comparison of MPVW and TF-IDF.
Table 8
The time cost of each stage of the manual operations.
| Data sets | DS3 | DS4 | DS5 | DS6 |
|---|---|---|---|---|
| Number of terminologies | 85 | 123 | 171 | 254 |
| 3-layer taxonomy | 382 s | 579 s | 856 s | 1366 s |
| Hyponymy construction | 415 s | 695 s | 1056 s | 1690 s |
| Properties and instances link | 236 s | 346 s | 491 s | 747 s |
| ROCP total time | 1033 s | 1620 s | 2403 s | 3803 s |
| ROCP time per word | 12.15 s | 13.17 s | 14.05 s | 14.97 s |
| Protégé total time | 1787 s | 2867 s | 4602 s | 8708 s |
| Protégé time per word | 21.02 s | 23.31 s | 26.91 s | 30.28 s |
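
The ROCP totals in Table 8 are the sum of the three manual stages, and the per-word figures divide each total by the number of terminologies. The Python sketch below recomputes them; the dictionary layout and the `summarize` helper are illustrative choices. The recomputation matches every listed value except the Protégé figure for DS6, where 8708 s / 254 ≈ 34.28 s/word rather than the 30.28 s/word shown, so one of those two numbers appears to contain a typo.

```python
# Terminology counts, ROCP stage times (s) and Protégé totals (s), from Table 8.
TABLE8 = {
    "DS3": {"terms": 85, "rocp_stages": (382, 415, 236), "protege_total": 1787},
    "DS4": {"terms": 123, "rocp_stages": (579, 695, 346), "protege_total": 2867},
    "DS5": {"terms": 171, "rocp_stages": (856, 1056, 491), "protege_total": 4602},
    "DS6": {"terms": 254, "rocp_stages": (1366, 1690, 747), "protege_total": 8708},
}


def summarize(row: dict) -> tuple[int, float, float]:
    """Return (ROCP total s, ROCP s per word, Protégé s per word)."""
    rocp_total = sum(row["rocp_stages"])
    return rocp_total, rocp_total / row["terms"], row["protege_total"] / row["terms"]


for name, row in TABLE8.items():
    total, rocp_rate, protege_rate = summarize(row)
    print(f"{name}: ROCP {total} s ({rocp_rate:.2f} s/word), "
          f"Protégé {row['protege_total']} s ({protege_rate:.2f} s/word), "
          f"speed-up x{protege_rate / rocp_rate:.2f}")
```
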

Figure 12
Time cost of ontology construction with ROCP versus manual construction with Protégé.
