An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm

Open Access | Jul 2021

References

  1. Internet Live Stats. 2020. https://www.internetlivestats.com/total-number-of-websites/
  2. Hliaoutakis, A., G. Varelas, E. Voutsakis, E. G. M. Petrakis, E. Milios. Information Retrieval by Semantic Similarity. – Int. J. Semant. Web Inf. Syst., Vol. 2, 2006, No 3, pp. 55-73. https://doi.org/10.4018/jswis.2006070104
  3. Geng, Z., D. Shang, Q. Zhu, Q. Wu, Y. Han. Research on Improved Focused Crawler and Its Application in Food Safety Public Opinion Analysis. – In: Proc. of 2017 Chinese Autom. Congr., 2017, pp. 2847-2852. https://doi.org/10.1109/CAC.2017.8243261
  4. Liu, Z., Y. Du, Y. Zhao. Focused Crawler Based on Domain Ontology and FCA. – J. Inf. Comput. Sci., Vol. 8, 2011, No 10, pp. 1909-1917.
  5. Chakrabarti, S., M. van den Berg, B. Dom. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. – Comput. Networks, Vol. 31, 1999, No 11-16, pp. 1623-1640. https://doi.org/10.1016/S1389-1286(99)00052-3
  6. Menczer, F. Complementing Search Engines with Online Web Mining Agents. – Decis. Support Syst., Vol. 35, 2003, No 2, pp. 195-212. https://doi.org/10.1016/S0167-9236(02)00106-9
  7. Park, J. R., C. Yang, Y. Tosaka, Q. Ping, H. el Mimouni. Developing an Automatic Crawling System for Populating a Digital Repository of Professional Development Resources: A Pilot Study. – J. Electron. Resour. Librariansh., Vol. 28, 2016, No 2, pp. 63-72. https://doi.org/10.1080/1941126X.2016.1164549
  8. Agre, G. H., N. V. Mahajan. Keyword Focused Web Crawler. – In: Proc. of 2nd Int. Conf. Electron. Commun. Syst. ICECS’15, 2015, pp. 1089-1092. https://doi.org/10.1109/ECS.2015.7124749
  9. Liu, W. J., Y. J. Du. A Novel Focused Crawler Based on Cell-Like Membrane Computing Optimization Algorithm. – Neurocomputing, Vol. 123, 2014, pp. 266-280. https://doi.org/10.1016/j.neucom.2013.06.039
  10. Farag, M. M. G., S. Lee, E. A. Fox. Focused Crawler for Events. – Int. J. Digit. Libr., Vol. 19, 2018, No 1, pp. 3-19. https://doi.org/10.1007/s00799-016-0207-1
  11. Chen, Z., J. Ma, J. Lei, B. Yuan, L. Lian, L. Song. A Cross-Language Focused Crawling Algorithm Based on Multiple Relevance Prediction Strategies. – Comput. Math. with Appl., Vol. 57, 2009, No 6, pp. 1057-1072. https://doi.org/10.1016/j.camwa.2008.09.021
  12. Du, Y., W. Liu, X. Lv, G. Peng. An Improved Focused Crawler Based on Semantic Similarity Vector Space Model. – Appl. Soft Comput. J., Vol. 36, 2015, pp. 392-407. https://doi.org/10.1016/j.asoc.2015.07.026
  13. Dong, H., F. K. Hussain. Self-Adaptive Semantic Focused Crawler for Mining Services Information Discovery. – IEEE Trans. Ind. Informatics, Vol. 10, 2014, No 2, pp. 1616-1626. https://doi.org/10.1109/TII.2012.2234472
  14. Zheng, H. T., B. Y. Kang, H. G. Kim. An Ontology-Based Approach to Learnable Focused Crawling. – Inf. Sci. (Ny)., Vol. 178, 2008, No 23, pp. 4512-4522. https://doi.org/10.1016/j.ins.2008.07.030
  15. Najork, M., J. L. Wiener. Breadth-First Search Crawling Yields High-Quality Pages. – In: Proc. of 10th Int. Conf. World Wide Web, WWW’01, 2001, pp. 114-118. https://doi.org/10.1145/371920.371965
  16. Salton, G., A. Wong, C. Yang. Information Retrieval and Language Processing: A Vector Space Model for Automatic Indexing. – Commun. ACM, Vol. 18, 1975, No 11, pp. 613-620. https://doi.org/10.1145/361219.361220
  17. Princeton University. About WordNet. WordNet, Princeton University, 2010.
  18. Bird, S., E. Klein, E. Loper. Natural Language Processing with Python. O’Reilly Media Inc., 2009.
  19. Li, Y., Z. A. Bandar, D. McLean. An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. – IEEE Trans. Knowl. Data Eng., Vol. 15, 2003, No 4, pp. 871-882. https://doi.org/10.1109/TKDE.2003.1209005
  20. Lin, D. Definition of Similarity in Information Theory, 1989.
  21. Robertson, S. The Probabilistic Relevance Framework: BM25 and Beyond. – Foundations and Trends in Information Retrieval, Vol. 3, 2010, No 4. https://doi.org/10.1561/1500000019
  22. Dhanith, P. R. J., B. Surendiran. An Ontology Learning Based Approach for Focused Web Crawling Using Combined Normalized Pointwise Mutual Information and Resnik Algorithm. – Int. J. Comput. Appl., Vol. 0, 2019, No 0, pp. 1-7. https://doi.org/10.1080/1206212X.2019.1684023
  23. Dhanith, P. R. J., B. Surendiran, S. P. Raja. A Word Embedding Based Approach for Focused Web Crawling Using the Recurrent Neural Network. – International Journal of Interactive Multimedia and Artificial Intelligence, 2020, pp. 1-11. https://doi.org/10.9781/ijimai.2020.09.003

DOI: https://doi.org/10.2478/cait-2021-0022 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702
Language: English
Page range: 105 - 120
Submitted on: Dec 30, 2020
Accepted on: Apr 14, 2021
Published on: Jul 1, 2021
Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2021 K. S. Sakunthala Prabha, C. Mahesh, S. P. Raja, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.