Unveiling Rare Patterns: Anomaly Detection in CCTV Footage for Safeguarding Home Premises

Mintu Movi; Abdul Jabbar P

doi:10.2478/ias-2024-0002

1 Introduction

Rare pattern mining is a broad field dedicated to identifying and extracting infrequently occurring patterns within a data set. Unlike frequent pattern mining, which aims to uncover commonly recurring patterns, rare pattern mining targets unique and seldom-occurring patterns that may carry valuable insights or indicate anomalies [1]. Its significance lies in revealing exceptional occurrences or outliers within data sets. Despite their infrequency, these patterns carry pivotal information, which makes them invaluable in anomaly detection [2].

Over the past two decades, there has been a remarkable surge in the volume of audio-video information across various environments [3]. This surge, coupled with increased population, security threats, and criminal activities, has prompted the widespread deployment of city surveillance cameras [4]. The imperative for home safety is further intensified by the rising adoption of smart homes, where intelligent security systems are integral components [5].

In the face of escalating home invasion crimes and a surge in burglary activities, the necessity for a safe and secure residential space has become more pronounced than ever [6]. While many home premises are equipped with CCTV monitoring systems to tackle these challenges, the real-time monitoring of home environments poses inherent difficulties [7].

The challenge is exacerbated by the sheer volume of spatial-temporal and non-spatial data associated with multiple surveillance cameras, which makes developing an efficient and effective method for intruder detection a formidable task. Compounding the complexity, anomalies may manifest at various levels of abstraction and be associated with different granularities of time and location.

This research within the home security domain explores the pervasive use of Closed-Circuit Television (CCTV) systems, now a norm in contemporary residential settings. The focus is specifically on anomaly detection within CCTV footage, with emphasis on identifying and analyzing rare patterns indicative of potential threats to home premises. The study goes beyond a singular approach by comparing diverse machine-learning methodologies tailored for rare pattern mining.

The main contributions of this work are threefold. Firstly, it presents a pioneering effort in rare pattern mining within home premises, offering insights into an under-explored surveillance domain. Secondly, it investigates the efficiency of existing machine-learning approaches in capturing intruders within residential CCTV footage. Thirdly, it evaluates the potential of different machine-learning algorithms to mine rare patterns effectively, providing a comprehensive understanding of their strengths and limitations in the context of home security.

2 Background

Anomaly detection in surveillance systems has gained significant attention due to the increasing availability of CCTV footage and the need for effective monitoring and security measures [8]. Machine learning algorithms have been increasingly employed for this task [9]. Several established algorithms can separate rare frames from frequent frames in an image data set; the appropriate choice depends on the specific requirements of the problem at hand.

The One-Class Support Vector Machine (SVM) is an unsupervised learning algorithm that can be used for anomaly detection [10,11,12]. The algorithm estimates the support of a high-dimensional distribution [13,14,15]. In image data sets, One-Class SVMs can identify rare frames that do not belong to the normal class.
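
As an illustration of the idea above, the following is a minimal sketch using scikit-learn's `OneClassSVM`. The synthetic two-dimensional features, the `nu` value, and the variable names are assumptions made for the example; the original study does not state its feature representation or hyperparameters.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-frame feature vectors (not the study's data).
frequent = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
rare = rng.normal(loc=6.0, scale=0.5, size=(5, 2))
features = np.vstack([frequent, rare])

# nu upper-bounds the fraction of training points treated as outliers;
# 0.05 is an assumed value, not one taken from the paper.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
labels = model.fit_predict(features)  # +1 = inside the support, -1 = outside

# Frames falling outside the estimated support are candidate rare frames.
rare_indices = np.flatnonzero(labels == -1)
print(f"{len(rare_indices)} of {len(features)} frames flagged as rare")
```

In practice the `nu` parameter would need tuning, since it directly controls how many frames can end up outside the estimated support.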

Isolation Forest is an unsupervised learning algorithm that can be used for anomaly detection [16,12]. The algorithm repeatedly selects a random feature and splits the data set at a random threshold; points that become isolated after only a few splits are identified as outliers [17,15]. In the context of image data sets, Isolation Forests can identify rare frames whose pixel values or patterns differ significantly from those of most frames.

The Local Outlier Factor (LOF) algorithm, introduced by Breunig et al., offers a robust method for detecting outliers within high-dimensional datasets [18]. Unlike traditional anomaly detection algorithms, LOF does not assume a specific data distribution, making it particularly well-suited for identifying rare patterns within complex and heterogeneous CCTV footage [19]. By assessing the local density of each data point relative to that of its neighbors, LOF can effectively identify instances that deviate significantly from their surroundings, thus revealing potential anomalies or threats within the surveillance footage [20].
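
Isolation Forest and LOF can be sketched side by side with scikit-learn. The synthetic features, the `contamination` value, and `n_neighbors=20` are assumptions chosen for the illustration, not settings reported in the study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Hypothetical frame features: a dense "frequent" group plus three distant points.
frequent = rng.normal(0.0, 1.0, size=(200, 2))
rare = np.array([[8.0, 8.0], [-9.0, 7.5], [9.0, -8.0]])
features = np.vstack([frequent, rare])

# Isolation Forest: points isolated by few random splits get low scores.
# contamination is the assumed share of rare frames in the data.
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
iso_labels = iso.fit_predict(features)  # -1 = outlier, +1 = inlier

# LOF: compares each point's local density with that of its neighbors.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
lof_labels = lof.fit_predict(features)  # -1 = outlier, +1 = inlier

print("Isolation Forest flagged:", np.flatnonzero(iso_labels == -1))
print("LOF flagged:", np.flatnonzero(lof_labels == -1))
```

Both detectors use the same label convention, which makes it straightforward to compare which frames each one marks as rare.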

Autoencoders represent a class of artificial neural networks that are trained to reconstruct input data with minimal error [21]. In rare pattern mining within CCTV footage, autoencoders can be leveraged to learn meaningful data representations, effectively capturing the underlying structure and identifying subtle deviations indicative of rare events [22]. By compressing the input data into a lower-dimensional latent space and then reconstructing it, autoencoders can highlight anomalies that may not be apparent in the original high-dimensional data, thereby enhancing the efficacy of anomaly detection in surveillance applications [23].
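
The reconstruction-based criterion described above can be written compactly. The notation below is an assumption introduced for illustration (encoder $f_\theta$, decoder $g_\phi$, latent dimension $d$, threshold $\tau$); none of these symbols appear in the original text.

```latex
\begin{align}
  z &= f_\theta(x), \qquad \hat{x} = g_\phi(z),
      \qquad x \in \mathbb{R}^{D},\; z \in \mathbb{R}^{d},\; d < D \\
  e(x) &= \lVert x - \hat{x} \rVert_2^2 \\
  x \text{ is flagged as rare} &\iff e(x) > \tau
\end{align}
```

Because the network is trained mostly on frequent frames, it reconstructs them with small error $e(x)$, while frames unlike the training data tend to produce a larger error. The threshold $\tau$ would typically be chosen from the distribution of errors on known normal frames.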

Clustering is a form of unsupervised learning that aims to group similar data points, forming clusters or groups while distinguishing dissimilar points [24]. The primary objective is identifying inherent patterns or structures within the data without predefined labels [25].

Several clustering methods can be explored to separate rare patterns from frequent patterns in an image dataset. Each method has strengths and weaknesses; the choice depends on the data characteristics and the task’s requirements. Here are some commonly used clustering methods:

K-means Clustering: Partitions data into k clusters based on similarity [26].
Hierarchical Clustering: Builds a hierarchy of clusters, either bottom-up (agglomerative) [27] or top-down (divisive) [28].
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Identifies dense regions of data points, separating sparse regions as noise [29].
Spectral Clustering: Applies clustering to a low-dimensional data representation [30].
OPTICS (Ordering Points to Identify the Clustering Structure): A density-based clustering algorithm similar to DBSCAN, except that it produces a reachability plot [31].
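
One way a clustering method from the list above could separate rare from frequent frames is to treat the smallest cluster as the rare one. The sketch below uses scikit-learn's `KMeans` on synthetic data; the smallest-cluster rule, the features, and the choice of two clusters are assumptions for illustration, since the text does not specify how clusters are mapped to rare and frequent frames.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical frame features: many frequent frames and a few rare ones.
frequent = rng.normal(0.0, 1.0, size=(200, 2))
rare = rng.normal(10.0, 0.5, size=(8, 2))
features = np.vstack([frequent, rare])

# Partition the frames into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(features)

# Assumed rule: the cluster with the fewest members holds the rare frames.
sizes = np.bincount(cluster_ids, minlength=2)
rare_cluster = int(np.argmin(sizes))
is_rare = cluster_ids == rare_cluster

print("cluster sizes:", sizes.tolist())
print("frames marked rare:", int(is_rare.sum()))
```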

Rare pattern mining within residential CCTV surveillance requires a careful examination of clustering performance metrics to tune the chosen algorithm. The literature covers diverse metrics, each offering insight into the effectiveness of clustering algorithms, particularly K-Means, in rare event detection.

The Silhouette Method is a widely used technique for determining the optimal number of clusters in a dataset [32]. Candidate cluster counts are compared by their silhouette values, and the count with the best value is selected. This ensures that the clustering solution balances cohesion within clusters and separation between clusters, which is vital for effective rare pattern mining [33].
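
A minimal sketch of selecting the cluster count by silhouette value, using scikit-learn. The three synthetic blobs and the candidate range of 2 to 6 clusters are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Hypothetical features drawn around three well-separated centers.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([rng.normal(c, 0.5, size=(60, 2)) for c in centers])

# Fit K-means for each candidate k and record the mean silhouette value.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    scores[k] = silhouette_score(features, labels)

# The candidate with the highest mean silhouette is selected.
best_k = max(scores, key=scores.get)
print("selected number of clusters:", best_k)
```

The silhouette value is undefined for a single cluster, which is why the candidate range starts at two.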

The Average Silhouette metric is a pivotal tool for evaluating the cohesion and separation of clusters generated by clustering algorithms [34]. A high Average Silhouette score indicates well-defined and distinct clusters, providing a quantitative measure of the algorithm’s ability to group similar data points and separate dissimilar ones [35].
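
For reference, the standard definition of the silhouette value can be stated as follows. The symbols $a(i)$, $b(i)$, $C_i$, and $d(\cdot,\cdot)$ are notation introduced here for illustration.

```latex
\begin{align}
  a(i) &= \frac{1}{|C_i| - 1} \sum_{j \in C_i,\; j \neq i} d(i, j)
      && \text{(mean distance to the other points of the own cluster } C_i\text{)} \\
  b(i) &= \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j)
      && \text{(mean distance to the nearest other cluster)} \\
  s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1 \\
  \bar{s} &= \frac{1}{n} \sum_{i=1}^{n} s(i)
\end{align}
```

Values of $\bar{s}$ close to 1 correspond to compact, well-separated clusters; values near 0 indicate overlapping clusters.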

The Calinski-Harabasz Index provides a quantitative measure of cluster compactness and separation [36]. Its effectiveness in assessing the quality of clustering solutions makes it an essential metric for evaluating the performance of algorithms like K-Means [37]. A higher Calinski-Harabasz Index indicates well-defined clusters with minimal overlap, crucial for the success of rare pattern mining [38].
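
The index is commonly defined as a ratio of between-cluster to within-cluster dispersion. The notation below ($n$ points, $k$ clusters $C_j$ of size $n_j$ with centroids $c_j$, overall centroid $c$) is introduced here for illustration.

```latex
\begin{align}
  \operatorname{tr}(B_k) &= \sum_{j=1}^{k} n_j \, \lVert c_j - c \rVert_2^2,
  \qquad
  \operatorname{tr}(W_k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - c_j \rVert_2^2 \\
  \mathrm{CH}(k) &= \frac{\operatorname{tr}(B_k) / (k - 1)}{\operatorname{tr}(W_k) / (n - k)}
\end{align}
```

The numerator grows when cluster centroids lie far apart, and the denominator shrinks when points lie close to their own centroid, so both effects raise the index.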

The Elbow Method, a widely adopted technique for determining the optimal number of clusters, has been extensively explored in the clustering literature. It identifies the point at which adding further clusters no longer significantly reduces the within-cluster variance [39]. Its application in rare pattern mining ensures an optimal balance between granularity and simplicity in clustering solutions.
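
The elbow is usually read off a plot by eye. As a hedged illustration, the sketch below locates it automatically with a simple heuristic: the largest discrete second difference of the K-means inertia curve. The heuristic, the synthetic data, and the range of candidate counts are assumptions for the example, not the study's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical features drawn around three well-separated centers.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([rng.normal(c, 0.5, size=(60, 2)) for c in centers])

# Inertia = within-cluster sum of squared distances for each candidate k.
ks = list(range(1, 8))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(features).inertia_
    for k in ks
]

# Heuristic: the elbow is where the curve bends most sharply, i.e. where the
# second difference of the inertia curve is largest.
second_diff = np.diff(inertias, n=2)  # entry j is centered on ks[j + 1]
elbow_k = ks[int(np.argmax(second_diff)) + 1]
print("elbow at k =", elbow_k)
```

On real footage the curve is rarely this clean, so the automatic choice is best checked against the plotted inertia values.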

These metrics collectively contribute to a comprehensive evaluation framework, ensuring that the chosen clustering algorithm, particularly K-Means, is fine-tuned to meet the specific demands of detecting rare events in the intricate dynamics of home environments.

3 Methodology

The methodology comprised a multiphase investigation to comprehensively evaluate various techniques for rare pattern mining within CCTV footage of home premises.

3.1 Dataset Collection and Preprocessing

The study uses three distinct data sets of 3600 frames each (10,800 frames in total), representing diverse scenarios encountered in home environments. The data sets were selected to encompass a range of normal and abnormal frames, capturing various activities and potential security threats within residential spaces. In the preprocessing phase, frames are extracted from a one-hour .dav file at 10-second intervals using Google Colab, yielding a collection of distinct frames.
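
The interval-based sampling step can be sketched independently of any video library. The helper name `sample_frame_indices` and the frame rate of 25 fps are assumptions for illustration; decoding the .dav file itself is left to whatever video tool is used. Under these assumptions, a one-hour recording yields 360 frames at a 10-second interval and 3600 frames at a 1-second interval.

```python
def sample_frame_indices(duration_s: float, fps: float, interval_s: float) -> list[int]:
    """Return the indices of the frames kept when sampling every interval_s seconds."""
    total_frames = int(duration_s * fps)
    # Number of source frames between two kept frames (at least 1).
    step = max(1, round(interval_s * fps))
    return list(range(0, total_frames, step))


# One hour of footage at an assumed 25 frames per second.
every_10_s = sample_frame_indices(duration_s=3600, fps=25, interval_s=10)
every_1_s = sample_frame_indices(duration_s=3600, fps=25, interval_s=1)

print(len(every_10_s), "frames at a 10-second interval")
print(len(every_1_s), "frames at a 1-second interval")
```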

3.2 Feature Extraction

In this study, a Mask R-CNN model [40] with a ResNet50 backbone is used to segment and extract human subjects within each frame. This approach currently covers the recognition of family members and facilitates the potential identification of intruders within the captured surveillance footage.
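
The step of keeping only human subjects from a detector's output can be sketched as a simple filter. Everything here is an assumption for illustration: the `Detection` structure is hypothetical, the label id 1 for "person" follows the COCO convention used by common pretrained Mask R-CNN checkpoints, and the score threshold of 0.7 is arbitrary. The study does not state which checkpoint or threshold it used.

```python
from dataclasses import dataclass

# Assumed: COCO-style labelling, where class id 1 denotes "person".
PERSON_LABEL = 1


@dataclass
class Detection:
    """Hypothetical container for one instance predicted in a frame."""
    label: int
    score: float
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def people_in_frame(detections: list[Detection], min_score: float = 0.7) -> list[Detection]:
    """Keep confident detections whose class is 'person'."""
    return [d for d in detections if d.label == PERSON_LABEL and d.score >= min_score]


# Made-up detections for a single frame.
frame_detections = [
    Detection(label=1, score=0.95, box=(40.0, 30.0, 120.0, 260.0)),   # person, confident
    Detection(label=1, score=0.40, box=(300.0, 50.0, 340.0, 150.0)),  # person, low score
    Detection(label=18, score=0.90, box=(200.0, 180.0, 280.0, 240.0)),  # another class
]
people = people_in_frame(frame_detections)
print(len(people), "human subject(s) kept")
```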

3.3 Experiments and Results

The initial phase assessed five distinct techniques: One-Class SVM (SVM), Isolation Forest (IF), Autoencoders (AE), Local Outlier Factor (LOF), and Cluster-Based Method (CBM). The examination involved scrutinizing their accuracy in detecting rare patterns, laying the groundwork for understanding the comparative performance of these methods.

Support Vector Machines: Figures 1, 2, and 3 illustrate the application of the One-Class SVM [12,15] to datasets 1, 2, and 3, respectively. The scatter plots within these figures depict the distribution of rare and frequent frames. In the plots, yellow denotes frequent frames, while purple represents rare frames.

Isolation Forest: Figures 4, 5, and 6 show the application of the Isolation Forest algorithm to datasets 1, 2, and 3, respectively. The scatter plots in these figures illustrate the arrangement of rare and frequent frames: yellow signifies frequently occurring frames, whereas purple represents frames classified as rare.

Autoencoder-based Analysis: Figures 7, 8, and 9 show the application of Autoencoder-based Analysis to datasets 1, 2, and 3, respectively. The scatter plots in these figures represent the arrangement of rare and common frames: yellow signifies frames occurring more frequently, whereas purple depicts frames recognized as rare.

Local Outlier Factor: Figures 10, 11, and 12 illustrate the application of Local Outlier Factor (LOF) Analysis to datasets 1, 2, and 3, respectively. The scatter plots in these figures represent the arrangement of rare and common frames: yellow signifies frames occurring at higher frequencies, while purple characterizes frames identified as rare by the LOF algorithm.

Cluster-based Analysis: Figures 13, 14, and 15 show the application of Cluster-based Analysis to datasets 1, 2, and 3, respectively. The scatter plots in these figures depict the organization of rare and frequent frames: yellow indicates frames occurring at higher frequencies, while purple characterizes frames identified as rare.

The rationale behind the second phase of the investigation is rooted in the understanding that different clustering techniques possess unique characteristics. Identifying the optimal method among Agglomerative, Divisive, Spectral Clustering, DBSCAN, OPTICS, and K-means clustering techniques is crucial for enhancing the efficacy of rare pattern mining in home premises CCTV footage. This phase aims to comprehensively assess how each clustering technique handles the complexities of CCTV data, including the effective detection of rare patterns. By evaluating these techniques, the research aims to determine which approach or combination of approaches offers the highest accuracy, sensitivity, and specificity in identifying anomalies or rare events within the CCTV footage.

Agglomerative Clustering: Figures 16, 17, and 18 show the results of Agglomerative Clustering on datasets 1, 2, and 3, respectively. The scatter plots in these figures depict the organization of rare and frequent frames: blue indicates frames occurring at higher frequencies, while red characterizes frames identified as rare.

Divisive Clustering: Figures 19, 20, and 21 show the results of Divisive Clustering on datasets 1, 2, and 3, respectively. The scatter plots in these figures represent the arrangement of rare and frequent frames: shades of blue signify frames occurring more frequently, whereas red marks frames recognized as uncommon.

DBSCAN: Figures 22, 23, and 24 show the application of DBSCAN to datasets 1, 2, and 3, respectively. The scatter plots in these figures depict the organization of frames with varying frequencies: blue signifies higher-frequency frames, while red characterizes frames identified as rare occurrences.

OPTICS: Figures 25, 26, and 27 show the application of OPTICS to datasets 1, 2, and 3, respectively. The scatter plots in these figures portray the arrangement of frames with differing frequencies: blue conveys frames occurring at higher frequencies, while red delineates frames identified as rare instances.

Spectral Clustering: Figures 28, 29, and 30 show the application of Spectral Clustering to datasets 1, 2, and 3, respectively. The scatter plots illustrate how frames are organized based on their occurrence frequencies: blue signifies frames appearing more frequently, while red characterizes frames recognized as rare.

K-means Clustering: Figures 31, 32, and 33 show the K-means Clustering of datasets 1, 2, and 3, respectively. The scatter plots illustrate the arrangement of frames based on their varying frequencies: blue denotes frames occurring more frequently, while red delineates frames identified as rare occurrences.

Building on these findings, the investigation advanced into its third phase, which focused on enhancing the K-Means clustering approach. This phase used a range of techniques: the Average Silhouette Method, the Silhouette Method, the Calinski-Harabasz Index, the Elbow Method, and K-means clustering with a predetermined number of clusters. Evaluating them provided a comprehensive overview of cohesion, separation, compactness, and the optimal number of clusters within the framework of the K-Means clustering approach.

Average Silhouette: Figures 34, 35, and 36 show K-Means Clustering on datasets 1, 2, and 3, respectively, where the Average Silhouette method can provide a numerical measure of how effectively rare and frequent frames are organized. In the scatter plots, blue indicates higher-frequency frames and red characterizes rare frames, visually representing the clustering structure of the data.

Silhouette Method: Figures 37, 38, and 39 show K-Means Clustering on datasets 1, 2, and 3, respectively, with the Silhouette Method applied to assess the appropriateness of the cluster assignments. The method can provide a quantitative validation of the graphical separation of rare and frequent frames. In the scatter plots, blue denotes higher-frequency frames and red marks uncommon frames, consistent with the Silhouette Method’s objective of well-defined and distinct clusters.

Calinski-Harabasz Index: Figures 40, 41, and 42 show the application of K-Means to datasets 1, 2, and 3, respectively, and bring the Calinski-Harabasz Index into focus. The scatter plots depict the organization of frames with varying frequencies in blue and red tones, a qualitative representation that can be objectively assessed using the Calinski-Harabasz Index.

K-means clustering with a predetermined number of clusters: Figures 43, 44, and 45 show K-means with the cluster count defined before analysis. This method bypasses techniques such as the Elbow Method or silhouette analysis for ascertaining the optimal number of clusters. Instead, it relies on pre-existing knowledge or domain expertise to specify the appropriate number of clusters for the dataset under examination, so that analysts can use their understanding of the dataset’s underlying structure to guide the clustering process [11]. This can be particularly advantageous when there is confidence in the relevant features and their relationships within the data, enabling the cluster count to be determined with precision [14]. The scatter plots depict the distribution of frames with varying frequencies within the dataset in blue and red tones. With the specified cluster count and the inherent characteristics of the data taken into account, the clustering aims for a balanced partition that captures the patterns present within the dataset.

Elbow Method: Figures 46, 47, and 48 show the application of K-Means to datasets 1, 2, and 3, respectively, together with the Elbow Method. The Elbow Method identifies the point where the decrease in distortion slows down, indicating an optimal cluster count. In the scatter plots, blue and red tones distinguish frames with differing frequencies, in line with the Elbow Method’s goal of balancing cluster granularity against meaningful pattern capture.

3.4 Discussion

In the initial phase, five techniques were assessed for their accuracy in detecting rare patterns: One-Class SVM (SVM), Isolation Forest (IF), Autoencoders (AE), Local Outlier Factor (LOF), and Cluster-Based Method (CBM). The cluster-based approach exhibited superior performance compared to the other methods, as depicted in Figure 49. Notably, Isolation Forest in turn outperformed One-Class SVM.

During the second phase, the clustering algorithms were assessed in depth using the F1 Score on the specified datasets, as illustrated in Figure 50. K-Means emerged as the leading candidate, clustering the data most accurately. Agglomerative Clustering followed closely, showing proficiency in identifying patterns within the dataset. Spectral Clustering achieved comparatively lower performance but still displayed reasonable clustering capabilities. OPTICS, DBSCAN, and Divisive Clustering followed in decreasing order of performance, although their F1 Scores are not specified, indicating varying degrees of effectiveness at accurately clustering the data.

The third phase shows that the Elbow Method yields the highest values across all metrics, indicating a better overall performance than the other methods in this context, as shown in Figure 51. Higher accuracy, precision, recall, specificity, and F1 score values suggest that the Elbow Method provides a clustering solution that better captures the rare patterns and structures within the data.
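
The five metrics named above all derive from the counts of a binary confusion matrix, with rare frames as the positive class. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are made up for illustration and are not results from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard metrics from confusion-matrix counts (rare = positive)."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 when a metric is undefined because its denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts: 10 truly rare frames among 100, 8 of them detected.
example = binary_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because rare frames are by definition a small minority, accuracy alone can look high even when few rare frames are found, which is why recall, specificity, and F1 are reported alongside it.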

4 Conclusion and Future Scope

This research addresses anomaly detection within residential security, mainly through rare pattern mining in CCTV footage. Through evaluation and comparative analysis, various machine-learning approaches, including One-Class SVM, Isolation Forest, and clustering algorithms, were scrutinized to understand their efficacy in detecting anomalies.

The findings underscored the superiority of the Cluster-based Method (CBM) over Isolation Forest and Support Vector Machines (SVM) in terms of performance metrics such as accuracy, recall, and F1-score. Notably, Isolation Forest performed better than SVM, highlighting its potential for anomaly detection tasks.

The evaluation of various clustering algorithms, particularly focusing on K-Means clustering, provided valuable insights into their performances in residential CCTV footage analysis. K-means and Agglomerative Clustering emerged as top performers, showcasing the importance of selecting clustering methods aligned with the dataset characteristics.

Through a meticulous evaluation using diverse clustering performance metrics such as the Silhouette Method, the Calinski-Harabasz Index, and the Elbow Method, the study fine-tunes the K-Means approach for enhanced efficacy. The Elbow Method emerges as the top-performing technique, indicating superior overall performance.

This research contributes to the field of residential security by providing a nuanced solution that addresses the specific challenges posed by home environments, offering insights into rare event detection within the context of residential CCTV surveillance.

Integrating advanced feature extraction techniques or incorporating deep learning models for enhanced pattern recognition [41,42,43,44,45,46,47,48,49,50] can be explored in the future. Collaboration with industry experts and stakeholders can further validate the applicability and effectiveness of the proposed approach in real-world residential security applications. These future directions aim to advance the state of the art in rare pattern mining, making it more adaptable and robust for addressing the evolving security needs of residential environments.
