multiviewstacking: A Python Package for Training Multi-View Stacking Classifiers

Enrique Garcia-Ceja

doi:10.5334/jors.712

(1) Overview

Introduction

In machine learning, multi-view learning models are a special case of data fusion models where the aim is to combine different sources of information to improve robustness and generalization performance [1]. In multi-view learning, the same object is represented with different sets of features. Each set provides additional information that complements the other sets. For example, a movie can be represented using three different views: 1) the sequence of images, 2) the audio, and 3) the subtitles. Multi-view learning algorithms aim to effectively combine the different views to build better performing models as opposed to merging all views into a single feature vector. Merging all views’ features into a single feature vector may not be optimal since each view has its own statistical properties [2]. Furthermore, specific classification models may perform better on a given view.

In this paper, we present an implementation of the Multi-View Stacking (MVS) algorithm [3] for classification tasks through the multiviewstacking Python package [4]. This algorithm is based on a set of first-level learners that are trained on each view and a meta-learner that learns from the predictions of each first-level learner. Unlike existing multi-view learning packages, multiviewstacking implements the MVS algorithm, which provides flexibility in the selection of first-level learners and the meta-learner. This is crucial when specific models perform better on specific views, since it allows a heterogeneous selection of models.

The package has been designed to facilitate flexibility in terms of the number of supported views, the selection of first-level and meta-learners, and support for scikit-learn [5] and custom models. The package also includes a pre-loaded dataset with two views for testing. The MVS method has been used in human-activity recognition [6] and image recognition [7], and can easily extend to other sensor-fusion problems.

The source code of the multiviewstacking package is available as a GitHub repository https://github.com/enriquegit/multiviewstacking. The package can also be installed from the PyPI repository https://pypi.org/project/multiviewstacking/.

Several packages for multi-view learning exist, implementing different methods, including classification, regression, clustering, autoencoders, and dimensionality reduction. However, the documentation of some of them is limited, and none of them implements the MVS algorithm. One advantage of MVS is the flexibility to use any type of model as the underlying classifier. That is, as opposed to other packages and algorithms, this multiviewstacking implementation allows the selection of any model as first-level learner and meta-learner. Furthermore, not all of those packages implement classification models, even though classification is the most common task in machine learning. Table 1 shows a summary of the packages’ characteristics.

Table 1

Characteristics of similar packages. The symbol ‘?’ means that the feature is not officially mentioned in the documentation but it may work. However, the results may not be accurate.

		PACKAGES
		multiviewstacking [4] (Python)	mvlearn [8] (Python)	multiview [9] (Python)	multiview [10] (R)	mvlearn R [11] (R)	multi-view-AE [12] (Python)	scikit-multimodallearn [13] (Python)
		Characteristics	Allows choosing any underlying model	✓	✓
Allows passing underlying custom models	✓		?					?
Allows heterogeneous underlying models	✓		✓
Supports supervised learning	✓				✓	✓		✓
Supports semi-supervised learning			✓
Supports unsupervised learning			✓	✓		✓	✓
Implements multi-view stacking	✓

Theoretical background

In multi-view learning, an object is represented by multiple independent views (sets of features). For example, a website can be characterized by its contents (text, images, etc.) but also by the links’ text pointing to it. In this case, the same object is represented by two views. Each view provides complementary information that can be used in conjunction to enhance the performance of a model (classification or regression). One way of combining the different views is by aggregating their features into a single feature vector. However, this may not be optimal since each view may be better represented in different formats. Furthermore, each view has its own statistical properties [2]. Multi-view learning is a field in which methods for combining the different views representing an object are developed.

One of the first multi-view algorithms was Co-Training [14]; however, it only supports two views. In recent years, multi-view learning methods for supervised deep learning have also been developed. For example, Piriyajitakonkij et al. [15] proposed a multi-view method for sleep analysis. Lupión et al. [16] proposed a multi-view method for pose estimation based on neural networks. Furthermore, Zhang et al. [17] developed a multi-view algorithm that is suitable when data from the different views may be missing. Even though those methods have proven to be successful in solving specific tasks, they are specific to neural network architectures, thus limiting their flexibility. Other methods that do not rely on specific underlying models have also been proposed. One of them is MVS [3], the algorithm implemented by the multiviewstacking package.

The MVS algorithm is based on the stacked generalization method [18]. It is a type of ensemble learning algorithm that aims to combine the outputs of several models (learners) to produce the final predictions. The procedure consists of training a set of models (called the first-level learners). Then, a meta-learner is trained using the outputs (predictions) of the first-level learners. That is, instead of using the original features of the training data, the predictions of the first-level learners are used as features to train the meta-learner. Below are the main steps of the algorithm:

1. Define a set $L$ of first-level learners and a meta-learner. The learners can be of any type, for example, Decision Trees, Logistic Regression, Random Forests, etc.
2. Train the first-level learners with the original training set $D$, which contains $n$ training examples.
3. Predict the labels of $D$ with each of the learners in $L$. Each learner $i$ in $L$ produces a prediction vector $p^{i}$ of $n$ elements.
4. Build a matrix $M_{n \times |L|}$ by column binding the predictions $p^{i}$.
5. Build a new training set $D'$ with the matrix $M$ and the true labels $y$.
6. Train the meta-learner using $D'$.
7. Return the trained stacked generalization model $\mathit{Model} = (L, \text{meta-learner})$.

After Step 3, the prediction vectors $p^{i}$ can be one-hot encoded (this is how it is implemented in the multiviewstacking package). It has also been shown that adding the class confidence scores from the first-level learners can increase the performance [19]. Thus, the averaged (across first-level learners) confidence scores for each class are added in the Python package implementation.
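The encoding described above can be illustrated with a minimal NumPy sketch. The helper names (one_hot, build_meta_features) and the column order are assumptions made for illustration; the package's internal layout may differ.

```python
import numpy as np

def one_hot(pred, n_classes):
    """One-hot encode a vector of integer class labels in 0..n_classes-1."""
    encoded = np.zeros((len(pred), n_classes))
    encoded[np.arange(len(pred)), pred] = 1.0
    return encoded

def build_meta_features(preds, probas, n_classes):
    """Column-bind one-hot predictions and the averaged class scores.

    preds:  list of integer prediction vectors, one per first-level learner
    probas: list of (n_samples, n_classes) confidence-score arrays
    """
    one_hot_blocks = [one_hot(np.asarray(p), n_classes) for p in preds]
    mean_scores = np.mean(np.stack(probas), axis=0)  # average across learners
    # Hypothetical layout: one-hot blocks first, averaged scores last.
    return np.hstack(one_hot_blocks + [mean_scores])

# Two learners, three samples, two classes (toy values).
preds = [np.array([0, 1, 1]), np.array([0, 0, 1])]
probas = [
    np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]),
    np.array([[0.7, 0.3], [0.6, 0.4], [0.0, 1.0]]),
]
M = build_meta_features(preds, probas, n_classes=2)
print(M.shape)  # (3, 6): 2 learners x 2 classes, plus 2 averaged scores
```

With this layout, the meta-learner sees both the hard decisions of each first-level learner and a summary of how confident the learners were on average.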

During Steps 2 and 3, over-fitting can occur because the predictions are made on the same training set that was used to train the first-level learners. To avoid over-fitting, Steps 2 and 3 are implemented with k-fold cross validation. After $D'$ has been built, the learners in $L$ are retrained using all the training data in $D$ to avoid wasting data points. For additional details, the reader is referred to the original MVS paper [3].

The MVS algorithm follows the same principle as stacked generalization. The key difference is that instead of training the first-level learners with all the features, each first-level learner is trained with only one of the views. Figure 1 depicts the overall process at test time. Each first-level learner generates a prediction. The individual predictions are used as inputs for the meta-learner, which in turn produces the final answer.
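As a rough illustration of the procedure, the following sketch reproduces the main ideas with plain scikit-learn on synthetic data. It is not the package's implementation: the view split, the choice of learners, and the use of raw predicted labels (rather than one-hot encodings plus confidence scores) are simplifying assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: columns 0-4 play the role of view 1, columns 5-9 of view 2.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
views = [list(range(0, 5)), list(range(5, 10))]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

learners = [KNeighborsClassifier(), GaussianNB()]  # one learner per view
meta = RandomForestClassifier(random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predictions of each view's learner form the columns of M.
M_tr = np.column_stack([
    cross_val_predict(clf, X_tr[:, cols], y_tr, cv=cv)
    for clf, cols in zip(learners, views)
])

# Retrain each first-level learner on all training rows of its own view.
for clf, cols in zip(learners, views):
    clf.fit(X_tr[:, cols], y_tr)

# Train the meta-learner on the new training set (M, y).
meta.fit(M_tr, y_tr)

# Prediction time: first-level predictions become the meta-learner's input.
M_te = np.column_stack([
    clf.predict(X_te[:, cols]) for clf, cols in zip(learners, views)
])
y_pred = meta.predict(M_te)
print("accuracy:", np.mean(y_pred == y_te))
```

Because cross_val_predict works on clones of the estimators, the first-level learners are only fitted in the explicit retraining loop, which mirrors the retraining on all of the training data described above.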

Figure 1. Overall process of Multi-View Stacking at prediction time. The predicted classes of each view are fed to the meta-learner, which produces the final prediction.

The running time of the MVS procedure in terms of the number of views is $O(k|V|)$, where $k$ is the number of internal cross-validation folds and $|V|$ is the number of views. That is, within each fold, the corresponding learner for each view is trained. At the end, all first-level learners are retrained with all the training data and the meta-learner is trained; thus, $k|V| + |V| + 1$ models need to be trained in total. Note that each first-level learner is trained with only the features of its corresponding view. This makes MVS slightly more efficient than the stacked generalization algorithm when the same learners are used. The running time as a function of the training set size depends on the running times of the selected first-level learners and the meta-learner. In terms of memory, MVS only requires one copy of the training data, which is passed sequentially to each fold and each view during the training procedure.
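As a quick worked example of the model count, the small helper below (a hypothetical function, not part of the package) evaluates $k|V| + |V| + 1$ for the default of 10 folds and two views.

```python
def n_models_trained(k, n_views):
    """Model fits in MVS: k per view during cross-validation,
    one refit per view on all the data, and one meta-learner."""
    return k * n_views + n_views + 1

print(n_models_trained(k=10, n_views=2))  # 23
```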

Implementation and architecture

The multiviewstacking package is organized in a modular and object-oriented architecture designed for flexibility, extensibility, and ease of integration with the scikit-learn ecosystem. The core design adheres to the scikit-learn estimator interface philosophy, ensuring that all models expose standard methods such as fit, predict, and predict_proba. This enables seamless interoperability with pipelines, grid search utilities, and cross-validation mechanisms available in scikit-learn. The package provides the following functionalities:

Allows building multi-view stacking classifiers.
Supports an arbitrary number of views.
Allows using any scikit-learn classifier as first-level learner and meta-learner.
Supports the use of user-defined models. The models only need to implement the fit, predict, and predict_proba functions.
Allows combining different types of first-level learners.
Includes a pre-loaded dataset with two views for testing.

The central component of the package is the MultiViewStacking class. This class encapsulates the full multi-view stacking process. It orchestrates the training of the base learners and meta-learner that combines their predictions. Another key component is the load_example_data function that loads an example dataset pre-split into training and test sets, along with the view indices and label encoders. This facilitates quick verification of package functionality and reproducible demonstrations.

The overall procedure (from the user point of view) for building a multi-view stacking model with the package consists of four steps: 1) define the first-level learners and the meta-learner, 2) create a MultiViewStacking object by passing the previously defined models and their corresponding views, 3) fit the model, 4) make predictions on new data. Figure 2 depicts this process.

Figure 2. Overall steps to build a multi-view stacking model. 1) Define the first-level learners and meta-learner. 2) Instantiate a MultiViewStacking object. 3) Fit the model. 4) Make predictions on new data.

The main entry point is the MultiViewStacking constructor, which initializes the model and its parameters. The following list describes each of its parameters.

views_indices: A list of tuples or a list of lists (default = None). The list of tuples specifies the column start/end indices for each view as a range. Indices start at 0. The list of tuples has the form:
$[(\mathit{start}_{v_1}, \mathit{end}_{v_1}), (\mathit{start}_{v_2}, \mathit{end}_{v_2}), \dots, (\mathit{start}_{v_n}, \mathit{end}_{v_n})]$
where $\mathit{start}_{v_1}$ is the column index where the features of the first view begin, $\mathit{end}_{v_1}$ is the index where the features of the first view end, and so on. If the features for a given view are not contiguous, it is also possible to explicitly specify the column indices using a list of lists of the form [[],[],…,[]]. The inner lists contain the column indices that belong to a given view. If there are three views, then the list should contain three sub-lists. If this parameter is not specified, all features will be mapped into a single view.
first_level_learners: A list of scikit-learn classifiers or custom models (default = None). The list must have n elements where n is the number of views. Custom models need to implement the fit, predict, and predict_proba functions. The classifier at position i is trained with the column indices specified at position i in the views_indices parameter. If this parameter is None, a RandomForestClassifier will be used for each of the views by default.
meta_learner: A scikit-learn classifier or a custom model. If no model is specified, the default will be a RandomForestClassifier.
k: An integer number (default = 10). The number of folds to be used during internal k-fold cross validation when building the training set $D'$ for the meta-learner.
random_state: An integer value (default = 123). Random state used for the internal k-fold cross validation.
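To make the two accepted forms of views_indices concrete, the sketch below expands range tuples into explicit column lists. The helper expand_view_indices is hypothetical and is not part of the package. It also assumes that the end index is inclusive, which the description above suggests but does not state outright.

```python
def expand_view_indices(views_indices):
    """Turn (start, end) tuples into explicit column lists; pass lists through."""
    expanded = []
    for view in views_indices:
        if isinstance(view, tuple):
            start, end = view
            # Assumption: the end index is inclusive.
            expanded.append(list(range(start, end + 1)))
        else:
            expanded.append(list(view))
    return expanded

# Two contiguous views given as ranges.
print(expand_view_indices([(0, 2), (3, 5)]))       # [[0, 1, 2], [3, 4, 5]]
# Two non-contiguous views given as explicit lists.
print(expand_view_indices([[0, 2, 4], [1, 3]]))    # [[0, 2, 4], [1, 3]]
```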

The multiviewstacking package can be installed with the following command:

pip install multiviewstacking

Since the package’s methods are documented using docstrings,¹ some code editors let the user browse the documentation while typing. Figure 3 shows this functionality within a Jupyter Notebook [20] editor.

Figure 3. Example documentation text. In a Jupyter Notebook (Windows), the documentation of a function can be displayed by pressing SHIFT+TAB.

Basic usage

The Home-Tasks Activities dataset (HTAD) [21] will be used to demonstrate how to train a multi-view stacking model. This dataset has two views: audio and accelerometer readings. It was collected by volunteers performing seven home-based activities: mopping the floor, sweeping the floor, typing on computer keyboard, eating chips, brushing teeth, washing hands, and watching television. The participants performed each activity for approximately three minutes. The participants wore a wrist-band from which accelerometer data was recorded. Audio data was collected with a phone that was placed nearby. The audio and accelerometer data correspond to each of the two views.

From the accelerometer view, 16 statistical features were extracted. From the audio view, 36 Mel Frequency Cepstral Coefficients (MFCCs) were computed as features. Figure 4 shows the first 10 rows and first 10 columns of the dataset.

Figure 4. First rows and columns of the HTAD dataset.

For convenience, this dataset is already pre-loaded in the multiviewstacking package and can be loaded as follows (Listing 1):

The function load_example_data returns a tuple of seven elements with the data already divided into training and test sets. The first element is a data frame containing the training set. The second element stores the corresponding labels (as integers). The third and fourth elements are the test set rows and labels, respectively. The fifth and sixth elements are the column indices that indicate which columns in the data frames correspond to the first and second views. The last element is the LabelEncoder used to convert the labels from strings to integers. It can be used to convert the integer labels back to strings.

Now that the data is loaded, the next step is to define the first-level learners and meta-learner (see Listing 2). The multiviewstacking package supports most of scikit-learn’s classifiers. A MultiViewStacking model supports a different type of model for each view. Furthermore, you can even specify a different model for the meta-learner.

The classifiers are instantiated the same way as you would normally do in scikit-learn. In this case, a K-Nearest Neighbors (KNN) classifier is used for the audio view, a Naive Bayes classifier for the accelerometer view, and a Random Forest for the meta-learner.

Now, the MultiViewStacking model can be instantiated as shown in Listing 3.

Listing 3. Instantiate the multi-view stacking model.

The views_indices argument specifies the column indices that will be used as features for each view. It is a list of lists where the first list corresponds to the indices of the first view, the second list contains the indices of the second view, and so on. Indices start at 0. In this example, ind_v1 is a list of integers specifying the column indices where the features related to the audio view are stored.

Likewise, ind_v2 has the indices for the second view. The first_level_learners parameter accepts a list of objects. Each object is a classifier; in this example, a KNN and a Naive Bayes classifier. The meta_learner parameter specifies the model to be used as the meta-learner. In this case, it is a Random Forest. The parameter k specifies the number of folds in the internal cross-validation of the multi-view stacking algorithm that is used to build the dataset $D'$.

Now that the MultiViewStacking model is instantiated, the next step is to fit its parameters with the fit function (see Listing 4). It receives as arguments the numpy arrays corresponding to the training set and the corresponding labels.

Now, the model is ready to make predictions on new data points (see Listing 5).

In this case, the accuracy on the test set was 0.86.

It is also useful to compare the performance of the multi-view stacking model with the individual first-level learners in order to understand how each individual view performs. To this end, the package allows the extraction of the trained first-level learners from the fitted MultiViewStacking model.

The attribute fitted_first_level_learners_ stores the trained first-level learners. This attribute is a list where each element corresponds to the first-level learner that was passed to the first_level_learners argument when initializing the MultiViewStacking object. The following code snippet (Listing 6) extracts the trained first-level learner that corresponds to the first view (audio). Then, the extracted learner is used to make predictions on the test set. Note that only the columns that correspond to the audio view are selected from the test set using the corresponding indices (ind_v1).

We can do the same for the second view (accelerometer) as shown in Listing 7.

In both cases, the accuracy of the individual model was significantly lower than that of the MVS model.

The multiviewstacking package also supports custom learners (that are not part of scikit-learn). Still, they need to comply with certain rules. A custom learner needs to implement three functions: fit, predict, and predict_proba.
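As an illustration of this interface, here is a minimal sketch of a custom learner that always predicts the most frequent training class. The class name MajorityClassLearner is hypothetical. Whether the package requires anything beyond these three methods (for example, scikit-learn's get_params for cloning) is not stated here, so this should be read as an assumption; the notebooks in the repository show the supported way of defining custom learners.

```python
import numpy as np

class MajorityClassLearner:
    """Toy custom learner: always predicts the most frequent training class."""

    def fit(self, X, y):
        # Store the observed classes and their relative frequencies.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.priors_ = counts / counts.sum()
        return self

    def predict(self, X):
        majority = self.classes_[np.argmax(self.priors_)]
        return np.full(len(X), majority)

    def predict_proba(self, X):
        # Same class-frequency row for every sample, one column per class.
        return np.tile(self.priors_, (len(X), 1))

X = np.zeros((4, 2))
y = np.array([0, 1, 1, 1])
learner = MajorityClassLearner().fit(X, y)
print(learner.predict(X))           # [1 1 1 1]
print(learner.predict_proba(X)[0])  # [0.25 0.75]
```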

The examples/ directory in the GitHub repository of the package includes Jupyter Notebooks with usage examples in more advanced tasks such as:

Using custom learners
Finding the best learner iteratively
Hyper-parameter optimization with grid search
A full example for digits classification with six views

Quality control

The multiviewstacking package has undergone extensive quality control to ensure correctness, stability, and ease of use. A comprehensive suite of unit tests has been implemented using the pytest framework [22]. These tests verify the behavior of all major components, including 1) data validation, 2) views’ consistency, 3) model fitting and prediction, 4) that the passed models implement the fit, predict, and predict_proba methods, 5) binary and multi-class classification tasks, and 6) heterogeneous base learners (e.g., combining decision trees, Naive Bayes, and neural networks). Integration tests confirm that the complete pipeline (from model definition to prediction) produces consistent and reproducible results. The tests can be run by issuing the following command in the root folder.

python -m pytest

Detailed warning messages are printed when attempting to use the methods in an uncommon way and default behaviors are implemented as a fallback and communicated to the user. For example:

When the views’ indices are not provided by the user, all the columns are assigned to a single view.
When one or more column indices appear in more than one view, the user is presented with the list of duplicate indices.
When no learners are passed as arguments, a RandomForestClassifier is used by default.
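The duplicate-index check mentioned above could look roughly like the following sketch. The helper duplicated_indices is hypothetical and only illustrates the idea; the package's own validation code may be organized differently.

```python
from collections import Counter

def duplicated_indices(views_indices):
    """Return the sorted column indices that appear in more than one view."""
    # Count each index once per view so repeats inside a view are ignored.
    counts = Counter(idx for view in views_indices for idx in set(view))
    return sorted(idx for idx, c in counts.items() if c > 1)

print(duplicated_indices([[0, 1, 2], [2, 3], [3, 4]]))  # [2, 3]
print(duplicated_indices([[0, 1], [2, 3]]))             # []
```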

Test coverage exceeds 90%. Continuous integration and automated testing are managed via GitHub Actions, which runs the full test suite on every commit and pull request. The GitHub repository page of the package includes a quick start example to check that the library was successfully installed. Furthermore, the examples/ directory contains Jupyter Notebooks detailing more advanced use cases.

(2) Availability

Operating system

The package can be used on any operating system (Windows, Linux, macOS) that supports Python ≥ 3.11.0.

Programming language

Python 3.11.0 or higher.

Additional system requirements

No extra requirements.

Dependencies

The package requires (i) numpy 1.26.4 or higher, (ii) pandas 2.0.0 or higher, (iii) scikit-learn 1.5.2 or higher, and (iv) pytest 9.0.2 or higher.

List of contributors

Enrique Garcia-Ceja

Software location

Code repository

Name: GitHub
Persistent identifier: https://github.com/enriquegit/multiviewstacking
Licence: MIT
Date published: 05/03/24

Language

English

(3) Reuse Potential

Multi-view learning is a very flexible framework since many types of examples of interest can be naturally characterized by several views. For instance, videos can be split into sequences of images, audio, and subtitles. Patients’ clinical data may include images (X-rays, tomography, etc.), text records, and prescriptions. In cybersecurity, network attacks can be characterized by flow, content, time, and general features, each representing a different view.

In recent years, there has been a growing interest from researchers in the application of multi-view learning algorithms to tackle problems in a wide range of fields, including cybersecurity [23], image recognition [7], activity recognition [24], chemistry [25], manufacturing [26], and medicine [27], to name a few.

The multiviewstacking package provides researchers with the ability to apply multi-view learning to a variety of research areas and practical use cases. It can also serve as a benchmark to compare against other multi-view algorithms.

Some suggestions on how the software can be extended to enhance its applicability to a broader range of problems include: 1) Adding support for passing pre-trained first-level learners. This would be useful in situations where training takes a lot of time, thus helping to reduce the internal cross-validation overhead. 2) Adding support for regression. Suggestions to improve the package or add new functionality are welcome through the issues website: https://github.com/enriquegit/multiviewstacking/issues.

Notes

¹ PEP 257 – Docstring conventions: https://peps.python.org/pep-0257/.
