Repairing ETL Processes using Extended Relational Algebra

Open Access | Jun 2025

Abstract

In a data warehouse architecture, heterogeneous and distributed data sources (DSs) are integrated by means of an extract-transform-load (ETL) layer, which runs integration processes (a.k.a. ETL processes). This layer is not static, since the DSs being integrated change their schemas over time. A DS schema change impacts ETL processes, which typically stop working and need to be redesigned (i.e., repaired). Our overall goal is to automatically repair the ETL processes affected by DS schema changes. In this paper, we focus on ETL processes specified in extended relational algebra, since relational data warehouses are among the most popular for business applications. For such processes, we contribute a repair method. The method uses a rule engine that maps a possible DS schema change to: (1) an ETL operation on the changed schema element and (2) a repair rule applicable when that DS schema element is changed. Based on this mapping, when a DS schema change occurs, our solution allows the adequate repair rules to be applied to the affected ETL processes.
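
To make the mapping described in the abstract more concrete, the following is a minimal Python sketch of a rule table keyed by a (schema change, ETL operation) pair. It is an illustration under stated assumptions, not the authors' rule engine: the abstract does not list the actual rules, so the change kinds (`rename_column`, `drop_column`), the operation names, and every class and function name below are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SchemaChange:
    """A hypothetical description of one data source schema change."""
    kind: str            # assumed kinds: "rename_column", "drop_column"
    table: str
    column: str
    new_name: str = ""   # only used by "rename_column"


@dataclass(frozen=True)
class Operation:
    """A simplified relational-algebra step of an ETL process."""
    op: str                    # assumed names: "projection", "selection"
    table: str
    columns: Tuple[str, ...]   # attributes the step refers to


RepairRule = Callable[[Operation, SchemaChange], Operation]


def rename_in_columns(op: Operation, change: SchemaChange) -> Operation:
    # Replace every reference to the old attribute name with the new one.
    cols = tuple(change.new_name if c == change.column else c for c in op.columns)
    return replace(op, columns=cols)


def drop_from_projection(op: Operation, change: SchemaChange) -> Operation:
    # Remove the dropped attribute from the projection list.
    return replace(op, columns=tuple(c for c in op.columns if c != change.column))


# Hypothetical rule table: (schema change kind, ETL operation) -> repair rule.
RULES: Dict[Tuple[str, str], RepairRule] = {
    ("rename_column", "projection"): rename_in_columns,
    ("rename_column", "selection"): rename_in_columns,
    ("drop_column", "projection"): drop_from_projection,
}


def repair(process: List[Operation], change: SchemaChange) -> List[Operation]:
    """Return a copy of the process with each affected step repaired."""
    repaired = []
    for op in process:
        affected = op.table == change.table and change.column in op.columns
        if not affected:
            repaired.append(op)
            continue
        rule = RULES.get((change.kind, op.op))
        if rule is None:
            # No rule registered: this step would need manual redesign.
            raise LookupError(f"no repair rule for {change.kind} on {op.op}")
        repaired.append(rule(op, change))
    return repaired


process = [
    Operation("selection", "customers", ("country",)),
    Operation("projection", "customers", ("id", "name", "country")),
]
change = SchemaChange("rename_column", "customers", "name", "full_name")
repaired_process = repair(process, change)
```

In this sketch, the selection step is left untouched because it does not reference the renamed attribute, while the projection step is rewritten to use `full_name`. A change with no registered rule raises an error instead of silently passing through, which mirrors the case where a process cannot be repaired automatically.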

DOI: https://doi.org/10.2478/fcds-2025-0006 | Journal eISSN: 2300-3405 | Journal ISSN: 0867-6356
Language: English
Page range: 157 - 190
Submitted on: Oct 21, 2024 | Accepted on: Mar 20, 2025 | Published on: Jun 10, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Judith Awiti, Robert Wrembel, Esteban Zimányi, published by Poznan University of Technology
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.