Data Warehouse Hybrid Modeling Methodology

Viktor László Takács; Katalin Bubnó; Gergely Gábor Ráthonyi; Éva Bácsné Bába; Róbert Szilágyi

doi:10.5334/dsj-2020-038

Figures & Tables

Table 1

Steps of the six methodologies.

	GrHyMM	UMLDW	MDBE	PDM	GRAnD	GQM	VMQD*
Requirement Analysis	goals, tasks	goals, tasks	queries in SQL	queries in SQL	goals, decisions	goals, questions, metrics	visualized questions, metrics, dimensionality
Minimal Granularity							minimally detailed metrics
Ideal Schema						ideal facts, ideal dimensions	ideal facts, ideal dimensions
Source Analysis	independent, source system schema	independent, CWM	independent	independent	independent	independent, potential schema	potential transactions, attributes, partly dependent
Integration						potential schema vs. ideal schema	potential schema vs. ideal schema
Reconciliation	DB integrity	consistent UML multidimensional schema	DB integrity
Multidimensional Modeling	facts, attribute tree for facts, remodeling	cubes, dimensions, hierarchies, measures	dimensions and facts from tables	Date dimension and Attribute dimensions for facts, MeER	Derived from requirement analysis schemas		MeER
Schema Selection				MeER related to questions
Manual Refinement			modified automatically generated schema

Table 2

Management question analysis.

Indicator	$I_{af [, af]}^{u [, u]}$	the indicator I to be produced, with the unit(s) u in the upper right index and the aggregate function(s) af in the lower right index
unit(s)
aggregate function(s)
visualization	$(\begin{matrix} vt \\ [\begin{matrix} s \\ s \end{matrix}] \end{matrix})^{v}$	the visualization v of type vt (table, line diagram, bar graph, etc.) with optional slicers s (a slicer can be a dimensional attribute D_{a}, a subset of concrete values D_{v}, or a dimensional attribute D_{a} in the detail d of another indicator I on the same dashboard)
slicer(s)
detail(s)	$[(\begin{matrix} D_{\Sigma a} \\ [D_{\Sigma a}] \end{matrix})^{d}]$	details d with dimensional attribute(s) D_{a}, with optional Σ aggregation over a; possible values of d include row, column, category, and y (indicator)
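As an illustration only, the notation of Table 2 can be captured in a small data structure and rendered back into its LaTeX form. The class and field names (Indicator, Detail, Question, notation) are hypothetical and not part of the methodology; the sketch assumes that slicers are given as ready-made LaTeX terms and that a detail's optional Σ aggregation is a boolean flag.

```python
from dataclasses import dataclass, field


@dataclass
class Indicator:
    name: str                                           # I, e.g. "Activity"
    units: list[str]                                    # u: upper right index
    agg_funcs: list[str] = field(default_factory=list)  # af: lower right index


@dataclass
class Detail:
    dimension: str                 # dimension letter, e.g. "P" (person), "I" (intensity)
    attribute: str                 # attribute a, e.g. "gender"
    role: str                      # d: "row", "col", "cat" or "y"
    with_total: bool = False       # optional Σ aggregation over the attribute


@dataclass
class Question:
    indicator: Indicator
    vis_type: str                                        # vt: table, line diagram, ...
    slicers: list[str] = field(default_factory=list)     # s: LaTeX terms, e.g. "D_{March}"
    details: list[Detail] = field(default_factory=list)

    def notation(self) -> str:
        """Render the question as a LaTeX string in the notation of Table 2."""
        ind = self.indicator
        parts = [f"{ind.name}_{{{','.join(ind.agg_funcs)}}}^{{{','.join(ind.units)}}}"]
        # Visualization type stacked above the optional slicers
        stacked = r" \\ ".join([rf"\text{{{self.vis_type}}}"] + self.slicers)
        parts.append(rf"(\begin{{matrix}} {stacked} \end{{matrix}})^{{v}}")
        for d in self.details:
            sub = (r"\Sigma " if d.with_total else "") + d.attribute
            parts.append(f"({d.dimension}_{{{sub}}})^{{{d.role}}}")
        return " ".join(parts)


# Usage: the structure of Question 2 (average completed days by gender and category)
q2 = Question(
    Indicator("Activity", units=["day"], agg_funcs=["average"]),
    vis_type="table",
    slicers=["D_{March}"],
    details=[Detail("P", "gender", "row"), Detail("I", "dsc", "col")],
)
print(q2.notation())
```

Keeping the notation as data rather than as text would let a tool check, for example, that every detail refers to an existing dimensional attribute before any schema is designed.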

Table 3

Optimizations’ notations.

$(\begin{matrix} I_{1} \\ I_{2} \end{matrix})(D_{dk}) \equiv I_{1}(D_{dk}) \times I_{2}(D_{dk})$	Combining the indicators I₁ and I₂ with the same dimensionality: we create the Cartesian product of the two indicators.
	The value of the indicator I can be obtained by summing through dimension D (roll up) with the aggregate function in the lower left index of I. The aggregation is calculated from D_{dk}, at the bottom of the Summa symbol, to the level at the top of the Summa symbol (all, or the D_{dhk} hierarchy level), leaving the original key. This is referred to as .
$\begin{array}{l} I_{1}(A_{dk}) \subset I_{2}(A_{dk}, B_{dk}) \\ (\begin{matrix} I_{1}(A_{dk}) \\ I_{2}(A_{dk}, B_{dk}) \end{matrix}) \equiv (\begin{matrix} I_{1} \\ I_{2} \end{matrix})(A_{dk}, B_{dk}) \end{array}$	A and B are dimensions of the indicators I₁ and I₂, and the dimensionality of I₁ is a proper subset of that of I₂; the two indicators are therefore combined over the larger dimensionality (A_{dk}, B_{dk}).
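A minimal sketch of the two combination rules of Table 3, assuming an indicator is stored as a Python dict from a tuple of dimension keys to a value. We read the product over the same key D_{dk} as pairing the values that share a key (an inner join on D_{dk}); the function names and the data are hypothetical.

```python
# Hypothetical representation: an indicator maps a tuple of dimension keys to a value.
Indicator = dict[tuple, float]


def combine_same(i1: Indicator, i2: Indicator) -> dict:
    """Combine I1 and I2 with the same dimensionality: pair the values sharing a key D_dk."""
    return {k: (v, i2[k]) for k, v in i1.items() if k in i2}


def combine_subset(i1: Indicator, i2: Indicator) -> dict:
    """Combine I1(A_dk) with I2(A_dk, B_dk): reuse I1's value at A for every (A, B) key."""
    return {(a, b): (i1[(a,)], v) for (a, b), v in i2.items() if (a,) in i1}


steps = {("20200302",): 8000.0, ("20200303",): 5000.0}    # I1(D_dk)
active = {("20200302",): 1.0, ("20200303",): 0.0}         # I2(D_dk)
print(combine_same(steps, active))

avg_steps = {("S01",): 6500.0}                                            # I1(P_pk)
daily_steps = {("S01", "20200302"): 8000.0, ("S01", "20200303"): 5000.0}  # I2(P_pk, D_dk)
print(combine_subset(avg_steps, daily_steps))
```

Keys present in only one indicator are dropped here; an outer join would be an equally defensible reading if missing values should be kept.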

Table 4

Data loadings’ transformation notations.

	The value of the indicator I can be obtained by summing through dimension D (roll up) with the aggregate function in the lower left index of I. The aggregation runs from D_{dk}, at the bottom of the Summa symbol, to the level at the top of the Summa symbol (all, or the D_{dhk} hierarchy level), leaving the original key. This is referred to as .
$D(D_{dk} [, D_{a}] [, D_{i}]) = D(\sum_{D_{dk}}^{D_{dk}} [, D_{a}] [, {}_{af}D_{i}])$	Deduplicate the values of the key D_{dk} of dimension D. Summarize the indicator D_{i} with the aggregate function af in the lower left index, keeping the first element of the attribute values.
$I(D_{dk}) = I() \times D_{dk}$	Expand the dimensionality of the indicator I: the Cartesian product of the original indicator with the dimension to be expanded.
	Pivot the values of the indicator I along the dimensional attribute D_{a}: a new indicator is created for each occurring value of the attribute.
$(\begin{matrix} I_{1} \\ I_{2} \end{matrix})(D_{dk}) = I_{1}(D_{dk}) \times I_{2}(D_{dk})$	Combine the indicators I₁ and I₂ with the same dimensionality: the Cartesian product of the two indicators.
	Unpivot the indicators I₁ and I₂ with the same dimensionality into the indicator values V and an attribute set A holding the indicators' names.
$I = \sum_{D_{a}}^{all} I_{D_{a}}$	The sum of the pivoted indicator values $I_{D_{a}}$ over all occurring values of the attribute D_{a}.
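The pivot, unpivot and sum-of-pivoted rules of Table 4 can be sketched over plain Python rows. This is an illustration under assumptions: rows are dicts, the aggregate applied while pivoting is a sum, and the column names (date, dsc, steps) and values are hypothetical. The last call mirrors $I = \sum_{D_{a}}^{all} I_{D_{a}}$: summing the pivoted indicators gives back the original indicator per key.

```python
from collections import defaultdict


def pivot(rows, key, attr, value):
    """Pivot I along attribute D_a: one new indicator I_{D_a} per occurring value of attr."""
    out = defaultdict(dict)                      # attr value -> {key: summed value}
    for r in rows:
        ind = out[r[attr]]
        ind[r[key]] = ind.get(r[key], 0) + r[value]
    return dict(out)


def unpivot(indicators):
    """Unpivot same-dimensional indicators into (key, A, V) rows; A holds the indicator name."""
    return [(k, name, v) for name, ind in indicators.items() for k, v in ind.items()]


def sum_pivoted(pivoted):
    """I = sum over all D_a values of I_{D_a}, computed per key."""
    total = defaultdict(int)
    for ind in pivoted.values():
        for k, v in ind.items():
            total[k] += v
    return dict(total)


rows = [  # hypothetical daily steps split by daily step category
    {"date": "20200302", "dsc": "1-low", "steps": 3000},
    {"date": "20200302", "dsc": "2-high", "steps": 5000},
    {"date": "20200303", "dsc": "1-low", "steps": 4000},
]
by_dsc = pivot(rows, key="date", attr="dsc", value="steps")
print(by_dsc)
print(unpivot(by_dsc))
print(sum_pivoted(by_dsc))   # {'20200302': 8000, '20200303': 4000}
```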

Table 5

Question1 analysis.

Indicator	how many days completed (activity)	$I_{af [, af]}^{u [, u]}$	$Activity^{day}$
unit(s)	day
aggregate function(s)	how many (sum)
visualization	table	${(\begin{matrix} vt \\ [{\begin{matrix} s \\ s \end{matrix}}] \end{matrix})}^{v}$	${(\begin{matrix} table \\ D_{{March}} \end{matrix})}^{v}$
slicer(s)	March
detail(s)	student	$[{(\begin{matrix} D_{{a}} \\ [D_{{a}}] \end{matrix})}^{{d}}]$	${(P_{{stud}})}^{row}$
detail(s)	daily step category	$[{(\begin{matrix} D_{{a}} \\ [D_{{a}}] \end{matrix})}^{{d}}]$	${(I_{{dsc}})}^{col}$

$Activity^{day} (\begin{matrix} table \\ D_{March} \end{matrix})^{v} (P_{nid})^{row} (I_{dsc})^{col}$
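A minimal sketch of answering Question 1, assuming hypothetical fact rows with one row per student and day, where activity is 1 for a completed day, and assuming the March slicer is applied through the YYYYMM month key (the first six characters of the date key, as in the left(D_{DK}, 6) mapping of the date dimension).

```python
from collections import defaultdict

# Hypothetical fact rows: one per student (P_nid) and day; activity = 1 for a completed day.
facts = [
    {"nid": "S01", "date": "20200302", "dsc": "1-low", "activity": 1},
    {"nid": "S01", "date": "20200303", "dsc": "2-high", "activity": 1},
    {"nid": "S01", "date": "20200304", "dsc": "2-high", "activity": 1},
    {"nid": "S02", "date": "20200302", "dsc": "1-low", "activity": 1},
    {"nid": "S02", "date": "20200401", "dsc": "1-low", "activity": 1},  # April: sliced out
]


def question1(rows, month_key="202003"):
    """Sum Activity^day per student (rows) and daily step category (columns) for one month."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        if r["date"][:6] == month_key:          # slicer D_March via the YYYYMM month key
            table[r["nid"]][r["dsc"]] += r["activity"]
    return {student: dict(cols) for student, cols in table.items()}


print(question1(facts))
```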

Table 6

Question2 analysis.

Indicator	average number of completed days	$I_{af [, af]}^{u [, u]}$	$Activity_{average}^{day}$
unit(s)	day
aggregate function(s)	average
visualization	table	${(\begin{matrix} vt \\ [{\begin{matrix} s \\ s \end{matrix}}] \end{matrix})}^{v}$	${(\begin{matrix} table \\ D_{{March}} \end{matrix})}^{v}$
slicer(s)	March
detail(s)	gender	$[(\begin{matrix} D_{a} \\ [D_{a}] \end{matrix})^{d}]$	$(P_{gender})^{row}$
detail(s)	daily step category	$[(\begin{matrix} D_{a} \\ [D_{a}] \end{matrix})^{d}]$	$(I_{dsc})^{col}$

$Activity_{average}^{day} (\begin{matrix} table \\ D_{March} \end{matrix})^{v} (P_{gender})^{row} (I_{dsc})^{col}$
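A sketch of Question 2 under a stated assumption: the average is taken, within each gender, over the students' per-category counts of completed days, so a student with no day in a category is simply absent from that cell. The person genders and day rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

gender = {"S01": "female", "S02": "male", "S03": "female"}   # hypothetical P_pk -> P_gender
days = [                                                      # hypothetical (P_pk, D_DK, I_dsc)
    ("S01", "20200302", "1-low"), ("S01", "20200303", "1-low"),
    ("S02", "20200302", "1-low"),
    ("S03", "20200305", "1-low"), ("S03", "20200306", "1-low"),
    ("S03", "20200307", "1-low"), ("S03", "20200308", "2-high"),
]


def question2(facts, persons, month_key="202003"):
    """Average completed days per gender (rows) and daily step category (columns)."""
    per_student = defaultdict(int)              # step 1: completed days per (student, category)
    for nid, dk, dsc in facts:
        if dk[:6] == month_key:
            per_student[(nid, dsc)] += 1
    groups = defaultdict(list)                  # step 2: group the counts by (gender, category)
    for (nid, dsc), n in per_student.items():
        groups[(persons[nid], dsc)].append(n)
    return {key: mean(counts) for key, counts in groups.items()}


print(question2(days, gender))
```

Computing the per-student count first and averaging afterwards is what distinguishes this indicator from the plain sum of Question 1.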

Table 7

Question3 analysis.

Indicator	Daily steps	$I_{af [, af]}^{u [, u]}$	$DailySteps_{average}^{steps}$
unit(s)	steps
aggregate function(s)	average
visualization	radar chart	$(\begin{matrix} vt \\ [\begin{matrix} s \\ s \end{matrix}] \end{matrix})^{v}$	$(\begin{matrix} \text{radar chart} \\ D_{March} \end{matrix})^{v}$
slicer(s)	March
detail(s)	day of the week	$[{(\begin{matrix} D_{{a}} \\ [D_{{a}}] \end{matrix})}^{{d}}]$	${(D_{{DoW}})}^{cat}$
detail(s)	men, women, all	$[(\begin{matrix} D_{a} \\ [D_{a}] \end{matrix})^{d}]$	$(P_{\Sigma gender})^{y}$

$DailySteps_{average}^{steps} (\begin{matrix} \text{radar chart} \\ D_{March} \end{matrix})^{v} (D_{weekday})^{cat} (P_{\Sigma gender})^{y}$
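A minimal sketch of Question 3, assuming hypothetical (P_pk, D_DK, daily steps) rows with YYYYMMDD date keys. The weekday label imitates the D_{DoW}&“–”&D_{weekdayEn} concatenation with Monday = 1 (an assumed convention), and the "all" series, the Σ over gender, is averaged over all rows rather than over the two group means.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_label(dk: str) -> str:
    """Build a 'DoW–name' label from a YYYYMMDD key, with Monday = 1."""
    d = date(int(dk[:4]), int(dk[4:6]), int(dk[6:8]))
    return f"{d.isoweekday()}–{WEEKDAYS[d.weekday()]}"


def question3(rows, persons, month_key="202003"):
    """Average daily steps per weekday (cat) for each gender and for all (P_{Σ gender})."""
    groups = defaultdict(list)
    for nid, dk, n in rows:
        if dk[:6] != month_key:                  # slicer: March
            continue
        wd = weekday_label(dk)
        groups[(wd, persons[nid])].append(n)     # men / women series
        groups[(wd, "all")].append(n)            # Σ over gender: the "all" series
    return {key: mean(values) for key, values in groups.items()}


gender = {"S01": "women", "S02": "men"}          # hypothetical dimPerson genders
steps = [                                        # hypothetical daily steps facts
    ("S01", "20200302", 6000),
    ("S02", "20200302", 8000),
    ("S01", "20200309", 7000),
    ("S02", "20200303", 5000),
]
print(question3(steps, gender))
```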

Table 8

10-minute normalized steps’ property mapping.

OLTP system (extract)	transform	OLAP system (load)
S_{10mNS}	=>	10minNS^{step}
S_{DK}	=>	D_{DK}
S_{TK}	=>	T_{TK}
S_{PK}	=>	P_{PK}

$S(S_{10mNS}, S_{date}, S_{TK}, S_{NID}) \overset{etl}{\Rightarrow} 10mNS^{step}(P_{pk}, D_{dk}, T_{TK})$
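The property mappings in Tables 8 to 12 share one form: each target property is computed from the extracted source row. A small, hypothetical helper makes that form explicit; here it is applied to the 10-minute normalized steps fact, assuming every "=>" is a plain copy under a new name, that S_{NID} carries the person key and that S_{date} already holds the YYYYMMDD date key.

```python
from typing import Callable

Row = dict[str, object]
Mapping = list[tuple[str, Callable[[Row], object]]]   # (target property, transform of source row)


def etl(rows: list[Row], mapping: Mapping) -> list[Row]:
    """Extract-transform-load one table: build each target row from the mapping."""
    return [{target: transform(row) for target, transform in mapping} for row in rows]


# Assumed reading of the fact mapping: each "=>" copies a value under its OLAP name.
fact_mapping: Mapping = [
    ("P_pk", lambda r: r["S_NID"]),
    ("D_dk", lambda r: r["S_date"]),
    ("T_TK", lambda r: r["S_TK"]),
    ("10mNS_step", lambda r: r["S_10mNS"]),
]

source = [{"S_10mNS": 420, "S_date": "20200302", "S_TK": "0810", "S_NID": "S01"}]  # hypothetical
print(etl(source, fact_mapping))
```

Declaring the mapping as data keeps the extract, transform and load columns of each table in one place, so the dimension mappings can reuse the same helper with different lambdas.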

Table 9

Person dimension’s property mapping.

OLTP system (extract)	transform	OLAP system (load)
P_{PK}	=>	P_{PK}
P_{GenderEn}	=>	P_{gender}

$dimPerson(P_{pk}, P_{GenderEn}) \overset{etl}{\Rightarrow} dimPerson(P_{pk}, P_{gender})$

Table 10

Date dimension’s property mapping.

OLTP system (extract)	transform	OLAP system (load)
D_{DK}	=>	D_{DK}
D_{DK}	left(D_{DK}, 6)	D_{MK}
D_{DoW}, D_{weekdayEn}	D_{DoW}&“–”&D_{weekdayEn}	D_{weekday}

$dimDate(D_{dk}, D_{DoW}, D_{weekdayEn}) \overset{etl}{\Rightarrow} dimDate(D_{dk}, D_{weekday}, D_{MK})$
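A minimal sketch of the date dimension's transformations, assuming D_DK is a YYYYMMDD string (for an integer key, integer division by 100 would give the same month key). The source values are hypothetical.

```python
def transform_date(row: dict) -> dict:
    """dimDate(D_dk, D_DoW, D_weekdayEn) => dimDate(D_dk, D_weekday, D_MK)."""
    return {
        "D_dk": row["D_DK"],
        "D_weekday": f"{row['D_DoW']}–{row['D_weekdayEn']}",   # D_DoW & "–" & D_weekdayEn
        "D_MK": row["D_DK"][:6],                               # left(D_DK, 6): YYYYMM month key
    }


source_row = {"D_DK": "20200302", "D_DoW": 1, "D_weekdayEn": "Monday"}   # hypothetical values
print(transform_date(source_row))
```

Prefixing the day number presumably lets the text label D_weekday sort in week order instead of alphabetically, which matters for the weekday categories of Question 3.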

Table 11

Month dimension-hierarchy’s property mapping.

OLTP system (extract)	transform	OLAP system (load)
D_{DK}	left(D_{DK}, 6)	DM_{MK}
D_{monthStrEn}	=>	DM_{month}

$dimDate(D_{dk}, D_{monthStrEn}) \overset{etl}{\Rightarrow} dimDateMonth(DM_{MK}, DM_{month})$
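Because many days share one month key, building the month hierarchy level needs a deduplication step in the spirit of Table 4: one row per key, keeping the first attribute value. A minimal sketch, assuming string YYYYMMDD keys and hypothetical rows:

```python
def build_month_dimension(date_rows: list[dict]) -> list[dict]:
    """dimDate(D_dk, D_monthStrEn) => dimDateMonth(DM_MK, DM_month), one row per month key."""
    months: dict[str, str] = {}
    for r in date_rows:
        # left(D_DK, 6) gives the month key; setdefault keeps the first label seen per key
        months.setdefault(r["D_DK"][:6], r["D_monthStrEn"])
    return [{"DM_MK": mk, "DM_month": name} for mk, name in months.items()]


dates = [  # hypothetical date-dimension rows
    {"D_DK": "20200301", "D_monthStrEn": "March"},
    {"D_DK": "20200302", "D_monthStrEn": "March"},
    {"D_DK": "20200401", "D_monthStrEn": "April"},
]
print(build_month_dimension(dates))
```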

Table 12

Walk intensity dimension’s property mapping.

OLTP system (extract)	transform	OLAP system (load)
I_{IK}	=>	I_{IK}
I_{IK}, I_{sscEn}	I_{IK}&“–”&I_{sscEn}	I_{dsc}

$dimIntensity(I_{IK}, I_{sscEn}) \overset{etl}{\Rightarrow} dimIntensity(I_{IK}, I_{dsc})$

Figures & Tables

Table 1

Figure 1

Table 2

Table 3

Table 4

Table 5

Table 6

Table 7

Table 8

Table 9

Table 10

Table 11

Table 12

Figure 2

Figure 3

Figure 4

Figure 5
