
An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm

Open Access
Jul 2021

References

  1. Internet Live Stats. 2020. https://www.internetlivestats.com/total-number-of-websites/
  2. Hliaoutakis, A., G. Varelas, E. Voutsakis, E. G. M. Petrakis, E. Milios. Information Retrieval by Semantic Similarity. – Int. J. Semant. Web Inf. Syst., Vol. 2, 2006, No 3, pp. 55-73. DOI: https://doi.org/10.4018/jswis.2006070104
  3. Geng, Z., D. Shang, Q. Zhu, Q. Wu, Y. Han. Research on Improved Focused Crawler and Its Application in Food Safety Public Opinion Analysis. – In: Proc. of 2017 Chinese Autom. Congr., 2017, pp. 2847-2852. DOI: https://doi.org/10.1109/CAC.2017.8243261
  4. Liu, Z., Y. Du, Y. Zhao. Focused Crawler Based on Domain Ontology and FCA. – J. Inf. Comput. Sci., Vol. 8, 2011, No 10, pp. 1909-1917.
  5. Chakrabarti, S., M. van den Berg, B. Dom. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. – Comput. Networks, Vol. 31, 1999, No 11-16, pp. 1623-1640. DOI: https://doi.org/10.1016/S1389-1286(99)00052-3
  6. Menczer, F. Complementing Search Engines with Online Web Mining Agents. – Decis. Support Syst., Vol. 35, 2003, No 2, pp. 195-212. DOI: https://doi.org/10.1016/S0167-9236(02)00106-9
  7. Park, J. R., C. Yang, Y. Tosaka, Q. Ping, H. el Mimouni. Developing an Automatic Crawling System for Populating a Digital Repository of Professional Development Resources: A Pilot Study. – J. Electron. Resour. Librariansh., Vol. 28, 2016, No 2, pp. 63-72. DOI: https://doi.org/10.1080/1941126X.2016.1164549
  8. Agre, G. H., N. V. Mahajan. Keyword Focused Web Crawler. – In: Proc. of 2nd Int. Conf. Electron. Commun. Syst. ICECS’15, 2015, pp. 1089-1092. DOI: https://doi.org/10.1109/ECS.2015.7124749
  9. Liu, W. J., Y. J. Du. A Novel Focused Crawler Based on Cell-Like Membrane Computing Optimization Algorithm. – Neurocomputing, Vol. 123, 2014, pp. 266-280. DOI: https://doi.org/10.1016/j.neucom.2013.06.039
  10. Farag, M. M. G., S. Lee, E. A. Fox. Focused Crawler for Events. – Int. J. Digit. Libr., Vol. 19, 2018, No 1, pp. 3-19. DOI: https://doi.org/10.1007/s00799-016-0207-1
  11. Chen, Z., J. Ma, J. Lei, B. Yuan, L. Lian, L. Song. A Cross-Language Focused Crawling Algorithm Based on Multiple Relevance Prediction Strategies. – Comput. Math. with Appl., Vol. 57, 2009, No 6, pp. 1057-1072. DOI: https://doi.org/10.1016/j.camwa.2008.09.021
  12. Du, Y., W. Liu, X. Lv, G. Peng. An Improved Focused Crawler Based on Semantic Similarity Vector Space Model. – Appl. Soft Comput. J., Vol. 36, 2015, pp. 392-407. DOI: https://doi.org/10.1016/j.asoc.2015.07.026
  13. Dong, H., F. K. Hussain. Self-Adaptive Semantic Focused Crawler for Mining Services Information Discovery. – IEEE Trans. Ind. Informatics, Vol. 10, 2014, No 2, pp. 1616-1626. DOI: https://doi.org/10.1109/TII.2012.2234472
  14. Zheng, H. T., B. Y. Kang, H. G. Kim. An Ontology-Based Approach to Learnable Focused Crawling. – Inf. Sci., Vol. 178, 2008, No 23, pp. 4512-4522. DOI: https://doi.org/10.1016/j.ins.2008.07.030
  15. Najork, M., J. L. Wiener. Breadth-First Search Crawling Yields High-Quality Pages. – In: Proc. of 10th Int. Conf. World Wide Web, WWW’01, 2001, pp. 114-118. DOI: https://doi.org/10.1145/371920.371965
  16. Salton, G., A. Wong, C. Yang. Information Retrieval and Language Processing: A Vector Space Model for Automatic Indexing. – Commun. ACM, Vol. 18, 1975, No 11, pp. 613-620. DOI: https://doi.org/10.1145/361219.361220
  17. Princeton University. About WordNet. WordNet, Princeton University, 2010.
  18. Bird, S., E. Klein, E. Loper. Natural Language Processing with Python. O’Reilly Media Inc., 2009.
  19. Li, Y., Z. A. Bandar, D. McLean. An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. – IEEE Trans. Knowl. Data Eng., Vol. 15, 2003, No 4, pp. 871-882. DOI: https://doi.org/10.1109/TKDE.2003.1209005
  20. Lin, D. An Information-Theoretic Definition of Similarity. – In: Proc. of 15th Int. Conf. on Machine Learning, 1998, pp. 296-304.
  21. Robertson, S., H. Zaragoza. The Probabilistic Relevance Framework: BM25 and Beyond. – Foundations and Trends in Information Retrieval, Vol. 3, 2010, No 4. DOI: https://doi.org/10.1561/1500000019
  22. Dhanith, P. R. J., B. Surendiran. An Ontology Learning Based Approach for Focused Web Crawling Using Combined Normalized Pointwise Mutual Information and Resnik Algorithm. – Int. J. Comput. Appl., 2019 (published online), pp. 1-7. DOI: https://doi.org/10.1080/1206212X.2019.1684023
  23. Dhanith, P. R. J., B. Surendiran, S. P. Raja. A Word Embedding Based Approach for Focused Web Crawling Using the Recurrent Neural Network. – International Journal of Interactive Multimedia and Artificial Intelligence, 2020, pp. 1-11. DOI: https://doi.org/10.9781/ijimai.2020.09.003
DOI: https://doi.org/10.2478/cait-2021-0022 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702
Language: English
Page range: 105 - 120
Submitted on: Dec 30, 2020
Accepted on: Apr 14, 2021
Published on: Jul 1, 2021
Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2021 K. S. Sakunthala Prabha, C. Mahesh, S. P. Raja, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.