
Semantic Schema Extraction in NoSQL Databases using BERT Embeddings

Open Access | Dec 2024

References

  1. Abdelhedi, F., Rajhi, H. and Zurfluh, G. (2022) ‘Extraction process of the logical schema of a document-oriented NoSQL database’, in Proceedings of the 10th International Conference on Model-Driven Engineering and Software Development. 10th International Conference on Model-Driven Engineering and Software Development, SCITEPRESS – Science and Technology Publications, pp. 61–71. Available at: 10.5220/0010899000003119
  2. Ajarroud, O., Zellou, A. and Idri, A. (2018) ‘A new filtering-based query processing: improving semantic caching efficiency in mediation systems’, in Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications. New York, NY, USA: Association for Computing Machinery (SITA’18), pp. 1–6. Available at: 10.1145/3289402.3289512
  3. Baazizi, M.-A. et al. (2019) ‘Parametric schema inference for massive JSON datasets’, The VLDB Journal, 28(4), pp. 497–521. Available at: 10.1007/s00778-018-0532-7
  4. Bansal, N., Sachdeva, S. and Awasthi, L.K. (2023) ‘A workload-driven approach for automatic schema generation for document stores’, in Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD). New York, NY, USA: Association for Computing Machinery (CODS-COMAD ’23), p. 133. Available at: 10.1145/3570991.3570996
  5. Belefqih, S. (2023) ‘saadbelefqih/extractionSchemaNoSQLDb’. Available at: https://github.com/saadbelefqih/extractionSchemaNoSQLDb (Accessed: 17 November 2024).
  6. Belefqih, S., Zellou, A. and Berquedich, M. (2023) ‘Schema extraction in NoSQL databases: a systematic literature review’, Recent Advances in Computer Science and Communications, 17(8), pp. 92–104. Available at: 10.2174/0126662558273437231204061106
  7. Bouhamoum, R. et al. (2018) ‘Scaling up schema discovery for RDF datasets’, in 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW). 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), pp. 84–89. Available at: 10.1109/ICDEW.2018.00021
  8. Chowdhary, K.R. (2020) ‘Natural language processing’, in K.R. Chowdhary (ed.) Fundamentals of Artificial Intelligence. New Delhi: Springer India, pp. 603–649. Available at: 10.1007/978-81-322-3972-7_19
  9. Church, K.W. (2017) ‘Word2Vec’, Natural Language Engineering, 23(1), pp. 155–162. Available at: 10.1017/S1351324916000334
  10. Fabregat, A. et al. (2018) ‘Reactome graph database: efficient access to complex pathway data’, PLoS Computational Biology, 14(1). Available at: 10.1371/journal.pcbi.1005968
  11. Frozza, A.A., Defreyn, E.D. and Mello, R. dos S. (2020) ‘A process for inference of columnar NoSQL database schemas’, in Anais do Simpósio Brasileiro de Banco de Dados (SBBD). Anais do XXXV Simpósio Brasileiro de Bancos de Dados, SBC, pp. 175–180. Available at: 10.5753/sbbd.2020.13637
  12. Jose, B. and Abraham, S. (2019) ‘Performance analysis of NoSQL and relational databases with MongoDB and MySQL’, Materials Today: Proceedings, pp. 2036–2043. Available at: 10.1016/j.matpr.2020.03.634
  13. Klessinger, S. et al. (2023) ‘Extracting JSON schemas with tagged unions’. arXiv. Available at: 10.48550/arXiv.2306.07085
  14. Klettke, M. et al. (2017) ‘Uncovering the evolution history of data lakes’, in 2017 IEEE International Conference on Big Data (Big Data). 2017 IEEE International Conference on Big Data (Big Data), pp. 2462–2471. Available at: 10.1109/BigData.2017.8258204
  15. Koupil, P., Hricko, S. and Holubová, I. (2022) ‘A universal approach for multi-model schema inference’, Journal of Big Data, 9(1), p. 97. Available at: 10.1186/s40537-022-00645-9
  16. Liu, S. et al. (2015) ‘Quantitative Analysis of Consistency in NoSQL Key-Value Stores’, in J. Campos and B.R. Haverkort (eds.) Quantitative Evaluation of Systems. Cham: Springer International Publishing (Lecture Notes in Computer Science), pp. 228–243. Available at: 10.1007/978-3-319-22264-6_15
  17. Machado, F. et al. (2021) ‘A text similarity-based process for extracting JSON conceptual schemas’, in Proceedings of the 23rd International Conference on Enterprise Information Systems. 23rd International Conference on Enterprise Information Systems, SCITEPRESS – Science and Technology Publications, pp. 264–271. Available at: 10.5220/0010475102640271
  18. Möller, M.L., Klettke, M. and Störl, U. (2019) ‘Keeping NoSQL databases up to date – semantics of evolution operations and their impact on data quality’.
  19. Pennington, J., Socher, R. and Manning, C. (2014) ‘GloVe: global vectors for word representation’, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar: Association for Computational Linguistics, pp. 1532–1543. Available at: 10.3115/v1/D14-1162
  20. Reda, O. et al. (2020) ‘Towards a data quality assessment in big data’, in Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications. New York, NY, USA: Association for Computing Machinery (SITA’20), pp. 1–6. Available at: 10.1145/3419604.3419803
  21. Reimers, N. and Gurevych, I. (2019) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, arXiv.org. Available at: https://arxiv.org/abs/1908.10084v1 (Accessed: 30 December 2023).
  22. Semantic Textual Similarity Methods, Tools, and Applications: A Survey (2016). Available at: https://www.scielo.org.mx/scielo.php?pid=S1405-55462016000400647&script=sci_arttext&tlng=en (Accessed: 31 December 2023).
  23. Sevilla Ruiz, D., Morales, S.F. and García Molina, J. (2015) ‘Inferring versioned schemas from NoSQL databases and its applications’, in P. Johannesson et al. (eds.) Conceptual Modeling. Cham: Springer International Publishing (Lecture Notes in Computer Science), pp. 467–480. Available at: 10.1007/978-3-319-25264-3_35
  24. Souibgui, M. et al. (2022) ‘An embedding driven approach to automatically detect identifiers and references in document stores’, Data & Knowledge Engineering, 139, p. 102003. Available at: 10.1016/j.datak.2022.102003
  25. Störl, U. and Klettke, M. (2022) ‘Darwin: a data platform for NoSQL schema evolution management and data migration’.
  26. Vora, M.N. (2011) ‘Hadoop-HBase for large-scale data’, in Proceedings of 2011 International Conference on Computer Science and Network Technology. Proceedings of 2011 International Conference on Computer Science and Network Technology, pp. 601–605. Available at: 10.1109/ICCSNT.2011.6182030
  27. Yazidi, M.H.E., Zellou, A. and Idri, A. (2012) ‘Towards a fuzzy mapping for mediation systems’, in 2012 IEEE International Conference on Complex Systems (ICCS). 2012 IEEE International Conference on Complex Systems (ICCS), pp. 1–4. Available at: 10.1109/ICoCS.2012.6458573
  28. Yousfi, A., Elyazidi, M.H. and Zellou, A. (2018) ‘Assessing the performance of a new semantic similarity measure designed for schema matching for mediation systems’, in N.T. Nguyen et al. (eds.) Computational Collective Intelligence. Cham: Springer International Publishing (Lecture Notes in Computer Science), pp. 64–74. Available at: 10.1007/978-3-319-98443-8_7
Language: English
Submitted on: Jan 4, 2024
Accepted on: Nov 19, 2024
Published on: Dec 6, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Saad Belefqih, Ahmed Zellou, Mouna Berquedich, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.