Semantic Schema Extraction in NoSQL Databases using BERT Embeddings

Open Access | Dec 2024

Abstract

NoSQL databases are valued for their flexibility and scalability, but their schema-less nature makes analytics difficult. Automatic schema extraction is therefore crucial, yet existing techniques are limited in how well they handle nested structures. Drawing on advances in Natural Language Processing (NLP), this paper introduces a novel approach based on BERT embeddings for extracting schemas from NoSQL databases. The method analyzes semantic relationships among triplets extracted from JSON documents in four stages: triplet extraction, preprocessing, BERT embedding generation, and similarity analysis. Evaluation on real datasets demonstrates over 83% accuracy in extracting valid nested schema components. The study shows how NLP can reveal structure in settings that lack explicit schemas, and indicates significant potential for autonomous schema extraction from raw, unstructured data formats.
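The four stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the triplet format, so the (parent, key, value kind) shape used here is an assumption, as are all function names. The `embed` function is a bag-of-tokens stand-in so the example runs without a model download; a real pipeline would replace it with vectors from a BERT encoder.

```python
import json
import math
import re
from collections import Counter


def extract_triplets(node, parent="root"):
    """Stage 1 (assumed format): collect (parent, key, value kind) triplets."""
    triplets = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, dict):
                kind = "object"
            elif isinstance(value, list):
                kind = "array"
            else:
                kind = type(value).__name__
            triplets.append((parent, key, kind))
            # Recurse so nested objects contribute their own triplets.
            triplets.extend(extract_triplets(value, parent=key))
    elif isinstance(node, list):
        for item in node:
            triplets.extend(extract_triplets(item, parent=parent))
    return triplets


def preprocess(triplet):
    """Stage 2: split camelCase and snake_case names into lowercase tokens."""
    text = " ".join(triplet)
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    text = re.sub(r"[_\-.]+", " ", text)
    return text.lower().split()


def embed(tokens):
    """Stage 3 placeholder: token counts standing in for a BERT embedding."""
    return Counter(tokens)


def cosine(a, b):
    """Stage 4: cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sample document, invented for illustration only.
doc = json.loads(
    '{"user": {"firstName": "Ada", "address": {"city": "Paris"}}, "tags": ["a", "b"]}'
)
triplets = extract_triplets(doc)
vectors = [embed(preprocess(t)) for t in triplets]
```

Pairs of triplets with high `cosine` scores would then be candidates for grouping into the same schema component. How the paper thresholds or clusters those scores is not stated in the abstract, so that step is left out here.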

Language: English
Submitted on: Jan 4, 2024 | Accepted on: Nov 19, 2024 | Published on: Dec 6, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Saad Belefqih, Ahmed Zellou, Mouna Berquedich, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.