Deep Learning for Sign Language Recognition: A Comparative Review

Open Access
Jun 2024

Figures & Tables

Figure 1:

Paper Organization

Figure 2:

Samples of sign language datasets.

Figure 3:

Samples of gesture datasets

Figure 4:

The procedural stages of sign language recognition

Figure 5:

Sample images (class 9) from NUS hand posture dataset-II (data subset A), showing the variations in hand posture sizes and appearances.

Classifiers employed in related works on SLR using DL

Author | Year | Input modality | Classifier | Result
[129] | 2018 | Static | DCNN | 92.4%
[131] | 2018 | Static | DCNN | 99.85%
[133] | 2018 | Static | DCNN | 85.3%
[134] | 2018 | Static | Restricted Boltzmann machine | 98.13%
[135] | 2018 | Isolated | LRCNs and 3D CNNs | 99%
[136] | 2018 | Static | DAN | 73.4%
[137] | 2018 | Static | CNNs of variant depth sizes and stacked denoising autoencoders | 92.83%
[139] | 2018 | Static | DCNN | 82.5%
[142] | 2018 | Static | DCNN | 90.3%
[145] | 2018 | Isolated | DCNN | 88.59%
[146] | 2018 | Continuous | CNN-HMM hybrid | 7.4 error
[147] | 2018 | Static | DCNN | 98.05%
[151] | 2018 | Isolated | 3DCNN, and enhanced fully connected (FCRNN) | 69.2%
[155] | 2019 | Continuous | Deep Capsule networks and game theory | 92.50%
[156] | 2019 | Continuous | Hierarchical Attention Network (HAN) and Latent Space | 82.7%
[157] | 2019 | Static | DCNN | 93.667%
[160] | 2019 | Static | DCNN | 97%
[161] | 2019 | Continuous | DCNN | 2.80 WER
[162] | 2019 | Continuous; Isolated | Modified LSTM | 72.3%; 89%
[167] | 2019 | Isolated | DCNN based DenseNet | 90.3%
[168] | 2019 | Static | DCNN | 97.71%
[176] | 2020 | Static | DCNN | 90%
[181] | 2020 | Static | DCNN | 97.6%
[184] | 2020 | Static | Eight CNN layers + stochastic pooling, batch normalization and dropout | 89.32%
[185] | 2020 | Isolated | Cascaded model (SSD, CNN, LSTM) | 98.42%
[187] | 2020 | Static | Deep Elman recurrent neural network | 98.89%
[188] | 2020 | Static | DCNN | 93%
[190] | 2020 | Static | Enhanced AlexNet | 89.48%
[198] | 2020 | Static | Multimodality fine-tuned VGG16 CNN + Leap Motion network | 82.55%
[199] | 2020 | Continuous | Multi-channel CNN | 10.8 WER
[200] | 2020 | Static | Hybrid model based on Inception v3 + SVM | 99.90%
[201] | 2020 | Static | 11-layer CNN | 95%
[205] | 2021 | Static | Three-layered CNN model | 90.8%
[206] | 2021 | Isolated | Hybrid deep learning with convolutional LSTM and BiLSTM | 76.21%
[209] | 2021 | Isolated | DCNN + sentiment analysis | 99.63%
[211] | 2021 | Continuous | GRU + LSTM | 19.56 error
[214] | 2021 | Isolated | Generic temporal convolutional network | 77.42%
[215] | 2021 | Static | DCNN | 96.65%
[216] | 2021 | Static | DCNN | 99.7%
[220] | 2021 | Static | Pretrained InceptionV3 + mini-batch gradient descent optimizer | 85%
[221] | 2021 | Static | PSO algorithm applied to find the optimal parameters of the convolutional neural networks | 99.58%
[223] | 2021 | Continuous | Visual hierarchy to lexical sequence alignment network (H2SNet) | 91.72%
[227] | 2021 | Static | Novel lightweight deep learning model based on a bottleneck motivated by deep residual learning | 99.52%
[228] | 2021 | Continuous | Novel hyperparameter based optimized Generative Adversarial Networks (H-GANs) | 97%
[229] | 2021 | Isolated | 3DCNN | 88.24%
[232] | 2021 | Continuous | Bidirectional encoder representations from transformers (BERT) + ResNet | 23.30 WER
[234] | 2021 | Continuous | Generative Adversarial Network (SLRGAN) | 23.4 WER
[238] | 2021 | Static | DCNN | 97%
[239] | 2022 | Static | Optimized DCNN with a hybridization of Electric Fish Optimization (EFO) and Whale Optimization Algorithm (WOA), called Electric Fish based Whale Optimization Algorithm (E-WOA) | 98.7%
[241] | 2022 | Isolated | CNN + RNN | 98.8%
[242] | 2022 | Static | Modified CapsNet architecture (SLR-CapsNet) | 99.60%
[245] | 2022 | Static | DCNN | 99.52%
[247] | 2022 | Static | DCNN + diffGrad optimizer | 88.01%
[250] | 2022 | Static | DCNN | 92%
[251] | 2022 | Static | DCNN | 99.38%
[252] | 2022 | Static | Lightweight CNN | 94.30%
[254] | 2022 | Isolated | Hybrid model based on VGG16-BiLSTM | 83.36%

Related works on SLR using DL that address the overfitting problem

Author(s) | Year | Dataset | Model | Technique | Result
[129] | 2018 | NTU | DCNN | Augmentation | 92.4%
[130] | 2018 | Collected | Modified VGG net | Dropout | 84.68%
[132] | 2018 | Ishara-Lipi | DCNN | Dropout | 94.88%
[133] | 2018 | Collected | DCNN | Small convolutional filter sizes, dropout, and learning strategy | 85.3%
[136] | 2018 | HUST | Deep Attention Network (DAN) | Data augmentation | 73.4%
[142] | 2018 | ASL Finger Spelling A | DNN | DenseNet | 90.3%
[143] | 2018 | Collected | 3DCNN | SGD | 88.7%
[146] | 2018 | SIGNUM | CNN-HMM hybrid | Augmentation | 7.4 error
[157] | 2019 | Collected | DCNN | Augmentation | 93.667%
[79] | 2019 | Collected | ResNet-152 | Batch size, augmentation | 55.28%
[163] | 2019 | Collected | VGG-16 | Dropout | 95%
[166] | 2019 | Collected | DCNN | Augmentation | 95.83%
[167] | 2019 | Collected | DCNN | DenseNet | 90.3%
[171] | 2019 | Collected | LSTM | Increase hidden state number | 94.7%
[172] | 2019 | NVIDIA | Squeeze-net | Augmentation | 83.29%
[173] | 2019 | G3D | Four stream CNN | Sharing of multimodal features with RGB spatial features during training, and dropout | 86.87%
[175] | 2019 | Collected | DCNN | Augmentation | 98.9%
[176] | 2020 | Collected | DCNN | Pooling layer | 90%
[181] | 2020 | Collected | DCNN | Reduce epochs to 30, and dropout added after each max pooling | 97.6%
[184] | 2020 | Collected | CNN with 8 layers | Augmentation | 89.32%
[188] | 2020 | MNIST | CNN | Dropout | 93%
[190] | 2020 | Collected | Enhanced AlexNet | Augmentation | 89.48%
[191] | 2020 | Collected | SVM | Augmentation, and k-fold cross validation | 99.9%
[193] | 2020 | KETI | CNN + LSTM | New data augmentation | 96.2%
[194] | 2020 | Collected | VGG16, and ResNet152 with enhanced softmax layer | Augmentation | 99%
[196] | 2020 | Collected | RNN-LSTM | Dropout layer (DR) | 99.81%
[201] | 2020 | Collected | CNN | Dropout layer, and augmentation | 95%
[203] | 2020 | NTU | 2 stream CNN | Randomness in the features interlocking fusion with dropout | 93.01%
[207] | 2021 | Jochen-Triesch’s | DCNN | Two dropouts | 99.96%
[214] | 2021 | Collected | Generic temporal convolutional network (TCN) | Dropout | 77.42%
[215] | 2021 | Collected | DCNN | Dropout | 96.65%
[216] | 2021 | Collected | DCNN | Cyclical learning rate method | 99.7%
[217] | 2021 | MU | Modified AlexNet and VGG16 | Augmentation | 99.82%
[222] | 2021 | Collected | CNN | Dropout | 97.62%
[229] | 2021 | Collected | 3DCNN | Dropout and regularization | 88.24%
[236] | 2021 | Collected | ResNet-18 | Zero-patience stopping criteria | 93.4%
[238] | 2021 | Collected | DCNN | Synthetic Minority Oversampling Technique (SMOTE) | 97%
[240] | 2022 | Collected | DCNN | Augmentation | 99.67%
[253] | 2022 | Collected | ResNet50-BiLSTM | Augmentation | 99%
[256] | 2022 | Collected | LSTM, and GRU | Dropout | 97%
[263] | 2022 | BdSL | CNN | Augmentation | 99.91%

Public sign language datasets

Dataset | Language | Equipment | Modalities | Signers | Samples
ASL alphabets [45] | American | Webcam | RGB images | - | 87,000
MNIST [46] | American | Webcam | Grey images | - | 27,455
ASL Fingerspelling A [47] | American | Microsoft Kinect | RGB and depth images | 5 | 48,000
NYU [48] | American | Kinect | RGB and depth images | 36 | 81,009
ASL by Surrey [49] | American | Kinect | RGB and depth images | 23 | 130,000
Jochen-Triesch [50] | American | Cam | Grey images with different background | 24 | 720
MKLM [51] | American | Leap Motion device and a Kinect sensor | RGB and depth images | 14 | 1400
NTU-HD [52] | American | Kinect sensor | RGB and depth images | 10 | 1000
HUST [53] | American | Microsoft Kinect | RGB and depth images | 10 | 10880
RVL-SLLL [54] | American | Cam | RGB video | 14 | -
ChicagoFSWild [55] | American | Collected online from YouTube | RGB video | 160 | 7,304
ASLG-PC12 [56] | American | Cam | RGB video | - | 880
American Sign Language Lexicon Video (ASLLVD) [57] | American | Cam | RGB videos of different angles | 6 | 3,300
MU [58] | American | Cam | RGB images with illumination variations in five different angles | 5 | 2515
ASLID [59] | American | Webcam | RGB images | 6 | 809
KSU-SSL [60] | Arabic | Cam and Kinect | RGB videos with uncontrolled environment | 40 | 16000
KArSL [61] | Arabic | Kinect V2 | RGB video | 3 | 75,300
ArSL by University of Sharjah [62] | Arabic | Analog camcorder | RGB images | 3 | 3450
JTD [63] | Indian | Webcam | RGB images with 3 different backgrounds | 24 | 720
IISL2020 [64] | Indian | Webcam | RGB video with uncontrolled environment | 16 | 12100
RWTH-PHOENIX-Weather 2014 [65] | German | Webcam | RGB video | 9 | 8,257
SIGNUM [66] | German | Cam | RGB video | 25 | 33210
DEVISIGN-D [67] | Chinese | Cam | RGB videos | 8 | 6000
DEVISIGN-L [67] | Chinese | Cam | RGB videos | 8 | 24000
CSL-500 [68] | Chinese | Cam | RGB, depth and skeleton videos | 50 | 25,000
Chinese Sign Language [69] | Chinese | Kinect | RGB, depth and skeleton videos | 50 | 125000
38 BdSL [70] | Bengali | Cam | RGB images | 320 | 12,160
Ishara-Lipi [71] | Bengali | Cam | Greyscale images | - | 1800
ChaLearn14 [72] | Italian | Kinect | RGB and depth video | 940 | 940
Montalbano II [73] | Italian | Kinect | RGB and depth video | 20 | 940
UFOP–LIBRAS [74] | Brazilian | Kinect | RGB, depth and skeleton videos | 5 | 2800
AUTSL [75] | Turkish | Kinect v2 | RGB, depth and skeleton videos | 43 | 38,336
RKS-PERSIANSIGN [76] | Persian | Cam | RGB video | 10 | 10,000
LSA64 [77] | Argentine | Cam | RGB video | 10 | 3200
Polytropon (PGSL) [78] | Greek | Cam | RGB video | 6 | 840
KETI [79] | Korean | Cam | RGB video | 40 | 14,672

Public gesture datasets

Name | Modality | Device | Signers | Samples
LMDHG [82] | RGB and depth videos | Kinect and | 21 | 608
SHREC Shape Retrieval Contest (SHREC) [83] | RGB and depth videos | Intel RealSense short range depth camera | 28 | 2800
UTD–MHAD [84] | RGB, depth and skeleton videos | Kinect and wearable inertial sensor | 8 | 861
The Multicamera Human Action Video Data (MuHAVi) [85] | RGB video | 8 camera views | 14 | 1904
NUMA [86] | RGB, depth and skeleton videos | 10 Kinect with three different views | 10 | 1493
WEIZMANN [87] | Low resolution RGB video | Camera with 10 different viewpoints | 9 | 90
NTU RGB [88] | RGB, depth and skeleton videos | Kinect | 40 | 56,880
Cambridge hand gesture [89] | RGB video captured under five different illuminations | Cam | 9 | 900
VIVA [90] | RGB and depth videos | Kinect | 8 | 885
MSR [91] | RGB and depth videos | Kinect | 10 | 320
CAD-60 [92] | RGB and depth video in different environments, such as a kitchen, a living room, and an office | Kinect | 4 | 48
HDM05 MoCap (motion capture) [93] | RGB video | Cam | 5 | 2337
CMU [94] | RGB images | Cam | 25 | 204
isoGD [95] | RGB and depth videos | Kinect | 21 | 47,933
NVIDIA [96] | RGB and depth video | Kinect | 8 | 885
G3D [97] | RGB and depth video | Kinect | 16 | 1280
UT Kinect [98] | RGB and depth video | Kinect | 10 | 200
First-Person [99] | RGB and depth video | RealSense SR300 cam | 6 | 1,175
Jester [100] | RGB | Cam | 25 | 148,092
EgoGesture [101] | RGB and depth video | Kinect | 50 | 2,081
NUS II [102] | RGB images with complex backgrounds, and various hand shapes and sizes | Cam | 40 | 2000

Related works on SLR using DL that address movement, orientation, trajectory, and occlusion problems

Author(s) | Year | Type of variation | Language | Signing mode | Model | Accuracy | Error rate
[129] | 2018 | Similarities and occlusion | American | Static | DCNN | 92.4% | -
[135] | 2018 | Movement | Brazilian | Isolated | Long-term Recurrent Convolutional Networks | 99% | -
[138] | 2018 | Size, shape, and position of the fingers or hands | American | Static | CNN | 82% | -
[140] | 2018 | Hand movement | American | Isolated | VGG 16 | 99% | -
[144] | 2018 | Movement | American | Isolated | Leap Motion Controller | 88.79% | -
[145] | 2018 | 3D motion | Indian | Isolated | Joint Angular Displacement Maps (JADMs) | 92.14% | -
[150] | 2018 | Head and hand movements | Indian | Continuous | CNN | 92.88% | -
[155] | 2019 | Hand movement | Indian | Continuous | Wearable systems to measure muscle intensity, hand orientation, motion, and position | 92.50% | -
[156] | 2019 | Variant hand orientations | Chinese | Continuous | Hierarchical Attention Network (HAN) and Latent Space | 82.7% | -
[165] | 2019 | Similarity and trajectory | Chinese | Isolated | Deep 3-D Residual ConvNet + BiLSTM | 89.8% | -
[166] | 2019 | Orientation of camera, hand position and movement, inter-hand relation | Vietnamese | Isolated | DCNN | 95.83% | -
[173] | 2019 | Movement, self-occlusions, orientation, and angles | Indian | Continuous | Four stream CNN | 86.87% | -
[174] | 2019 | Movement at different distances from the camera | American | Static | Novel DNN | 97.29% | -
[176] | 2020 | Angles, distance, object size, and rotations | Arabic | Static | Image augmentation | 90% | 0.53
[180] | 2020 | Finger configuration, hand orientation, and hand position relative to the body | Arabic | Isolated | Multilayer perceptron + autoencoder | 87.69% | -
[185] | 2020 | Hand movement | Persian | Isolated | Single Shot Detector (SSD) + CNN + LSTM | 98.42% | -
[186] | 2020 | Shape, orientation, and trajectory | Greek | Isolated | Fully convolutional attention-based encoder-decoder | 95.31% | -
[192] | 2020 | Trajectory | Greek | Isolated | Incorporate the depth dimension in the coordinates of the hand joints | 93.56% | -
[195] | 2020 | Finger angles and multi-finger movements | Taiwanese | Continuous | Wristband with ten modified barometric sensors + dual DCNN | 97.5% | -
[196] | 2020 | Movement of fingers and hands | Chinese | Isolated | Motion data from IMU sensors | 99.81% | -
[197] | 2020 | Finger movement | Chinese | Isolated | Trigno Wireless sEMG acquisition system used to collect multichannel sEMG signals of forearm muscles | 93.33% | -
[199] | 2020 | Finger and arm motions, two-handed signs, and hand rotation | Chinese | Continuous | Two armbands embedded with an IMU sensor and multi-channel sEMG sensors, attached to the forearms to capture both arm and finger movements | - | 10.8%
[76] | 2020 | Hand occlusion | Persian | Isolated | Skeleton detection | 99.8% | -
[204] | 2020 | Trajectory | Brazilian | Isolated | Convert the trajectory information into spherical coordinates | 64.33% | -
[210] | 2021 | Trajectory | Arabic | Isolated | Multi-Sign Language Ontology (MSLO) | 94.5% | -
[213] | 2021 | Movement | Korean | Isolated | 3DCNN | 91% | -
[214] | 2021 | Finger movement | Chinese | Isolated | Low-cost data glove with a simple hardware structure that captures finger movement and bending simultaneously | 77.42% | -
[218] | 2021 | Skewing and angle rotation | Bengali | Static | DCNN | 99.57% | 0.56
[219] | 2021 | Hand motion | American | Continuous | Sensing gloves | 86.67% | -
[223] | 2021 | Spatial appearance and temporal motion | Chinese | Continuous | Lexical prediction network | 91.72% | 6.10
[226] | 2021 | Finger self-occlusions, view invariance | Indian | Continuous | Motion modelled deep attention network (M2DA-Net) | 84.95% | -
[228] | 2021 | Occlusions of hand/hand, hands/face, or hands/upper body postures | American | Continuous | Novel hyperparameter based optimized Generative Adversarial Networks (H-GANs), with deep Long Short-Term Memory (LSTM) as generator and LSTM with 3D Convolutional Neural Network (3D-CNN) as discriminator | 97% | 1.4
[230] | 2021 | Variant view | American | Isolated | Cascaded 3-D CNNs | 96% | -
[233] | 2021 | Hand occlusion | Italian | Isolated | LSTM + CNN | 99.08% | -
[237] | 2021 | Finger occlusion, motion blurring, variant signing styles | Chinese | Continuous | Dual network built upon a Graph Convolutional Network (GCN) | 98.08% | -
[239] | 2022 | Self-structural characteristics and occlusion | Indian | Continuous | Dynamic Time Warping (DTW) | 98.7% | -
[240] | 2022 | High similarity and complexity | American | Static | DCNN | 99.67% | 0.0016
[241] | 2022 | Movement | Arabic | Isolated | The difference function | 98.8% | -
[259] | 2022 | Hand occlusion | American | Static | Re-formation layer in the CNN | 91.40% | -
[260] | 2022 | Trajectory, hand shapes, and orientation | American | Isolated | MediaPipe landmarks with GRU | 99% | -
[261] | 2022 | Ambiguous and 3D double-hand motion trajectories | American | Isolated | 3D extended Kalman filter (EKF) tracking, and approximation of a probability density function over a time frame | 97.98% | -
[262] | 2022 | Movement | Turkish | Continuous | Motion History Images (MHI) generated from RGB video frames | 94.83% | -
[264] | 2022 | Movement | Argentine | Continuous | Accumulative video motion (AVM) technique | 91.8% | -
[269] | 2022 | Orientation angle, prosodic, and similarity | American | Continuous | Robust fast Fisher vector (FFV) in deep Bi-LSTM | 98.33% | -
[270] | 2022 | Variant length, sequential patterns | English | Isolated | Novel Residual-Multi Head model | 95.03% | -

Related works on SLR using DL that aim to achieve generalization

Author(s) | Year | Datasets | Technique | Result
[129] | 2018 | ASL finger spelling A; NTU | DCNN | 92.4%; 99.7%
[134] | 2018 | NYU; MU; ASL Fingerspelling A; ASL Surrey | Restricted Boltzmann Machine (RBM) | 90.01%; 99.31%; 98.13%; 97.56%
[136] | 2018 | NTU; HUST | DAN | 98.5%; 73.4%
[143] | 2018 | Collected CSL; ChaLearn14 | 3D-CNN | 88.7%; 95.3%
[145] | 2018 | Collected; HDM05; CMU | JADM + CNN | 88.59%; 87.92%; 87.27%
[146] | 2018 | RWTH 2012; RWTH 2014; SIGNUM | CNN-HMM hybrid | 30.0 WER; 32.5; 7.4
[156] | 2019 | Collected; RWTH-2014 | Hierarchical Attention Network (HAN) + Latent Space (LS-HAN) | 82.7%; 61.6%
[161] | 2019 | RWTH-2014; SIGNUM | DCNN | 22.86 WER; 2.80
[164] | 2019 | CSL; IsoGD | Proposed multimodal two-stream CNN | 96.7%; 63.78%
[165] | 2019 | DEVISIGN-D; Collected | Deep 3-D Residual ConvNet + BiLSTM | 89.8%; 86.9%
[170] | 2019 | KSU-SSL; ArSL; RVL-SLLL | 3D-CNN | 77.32%; 34.90%; 70%
[173] | 2019 | Collected RGB-D; MSR; UT Kinect; G3D | Four stream CNN | 86.87%; 86.98%; 85.23%; 88.68%
[174] | 2019 | Jochen-Triesch; MKLM; Novel SI-PSL | Novel DNN | 97.29%; 96.8%; 51.88%
[182] | 2020 | KSU-SSL; ArSL by University of Sharjah; RVL-SLLL | 3DCNN | 84.38%; 34.9%; 70%
[186] | 2020 | PGSL; ChicagoFSWild; RWTH 2014 | TDCNN | 95.31%; 92.63%; 76.30%
[187] | 2020 | ASL; MU | Deep Elman recurrent neural network | 98.89%; 97.5%
[192] | 2020 | GSL; ChicagoFSWild | CNN | 93.56%; 91.38%
[76] | 2020 | NYU; First-Person; RKS-PERSIANSIGN | CNN | 4.64 error; 91.12%; 99.8%
[202] | 2020 | NUS; American fingerspelling A | DCNN | 94.7%; 99.96%
[203] | 2020 | HDM05; CMU; NTU; Collected | 2 stream CNN | 93.42%; 92.67%; 94.42%; 93.01%
[204] | 2020 | UTD–MHAD; IsoGD; Collected | Linear SVM classifier | 94.81%; 67.36%; 64.33%
[207] | 2021 | Collected RGB images; Jochen-Triesch’s | DCNN | 99.96%; 100%
[210] | 2021 | LSA64; LSA; Collected | 3DCNN | 98.5%; 99.2%; 94.5%
[211] | 2021 | ASLG-PC12; RWTH-2014 | GRU and LSTM with Bahdanau and Luong’s attention mechanisms | 66.59%; 19.56% BLEU
[221] | 2021 | ASL alphabet; ASL MNIST; MSL | Optimized CNN based on PSO | 99.58%; 99.58%; 99.10%
[225] | 2021 | KSU-ArSL; Jester; NVIDIA | Inception-BiLSTM | 84.2%; 95.8%; 86.6%
[226] | 2021 | Collected; NTU; MuHAVi; WEIZMANN; NUMA | Motion modelled deep attention network (M2DA-Net) | 84.95%; 89.98%; 85.12%; 82.25%; 88.25%
[228] | 2021 | RWTH-2014; ASLLVD | Novel hyperparameter based optimized Generative Adversarial Networks (H-GANs) | 73.9%; 97%
[232] | 2021 | RWTH-2014; Collected | Bidirectional encoder representations from transformers (BERT) + ResNet | 20.1; 23.30 WER
[233] | 2021 | Montalbano II; isoGD; MSR; CAD-60 | LSTM + CNN | 99.08%; 86.10%; 98.40%; 95.50%
[234] | 2021 | RWTH-2014; CSL; GSL | GAN | 23.4; 2.1; 2.26
[237] | 2021 | CSL-500; DEVISIGN-L | Dual network built upon a Graph Convolutional Network (GCN) | 98.08%; 64.57%
[242] | 2022 | SLDD; MNIST | Modified CapsNet architecture (SLR-CapsNet) | 99.52%; 99.60%
[243] | 2022 | RKS-PERSIANSIGN; First-Person; ASVID; isoGD | Single shot detector, 2D convolutional neural network, singular value decomposition (SVD), and LSTM | 99.5%; 91%; 93%; 86.1%
[247] | 2022 | Collected; Collected; ASL finger spelling | DCNN + diffGrad optimizer | 92.43%; 88.01%; 99.52%
[248] | 2022 | 38 BdSL; Collected; Ishara-Lipi | BenSignNet | 94.00%; 99.60%; 99.60%
[251] | 2022 | Collected; Collected; Collected | DCNN | 99.41%; 99.48%; 99.38%
[254] | 2022 | Collected; Cambridge hand gesture | Hybrid model based on VGG16-BiLSTM | 83.36%; 97%
[255] | 2022 | Collected; MNIST; JTD; NUS | Hybrid Fist CNN | 97.89%; 95.68%; 94.90%; 95.87%
[256] | 2022 | ASL; GSL; AUTSL; IISL2020 | LSTM + GRU | 95.3%; 94%; 95.1%; 97.1%
[261] | 2022 | Collected; SHREC; LMDHG | DLSTM | 97.98%; 96.99%; 97.99%
[262] | 2022 | AUTSL; Collected | 3D-CNN | 93.53%; 94.83%
[265] | 2022 | CSL-500; Jester; EgoGesture | Deep R(2+1)D | 97.45%; 97.05%; 94%
[266] | 2022 | MU; HUST-ASL | End-to-end fine-tuning of a pre-trained CNN model with a score-level fusion technique | 98.14%; 64.55%
[269] | 2022 | SHREC; Collected; LMDHG | FFV-Bi-LSTM | 92.99%; 98.33%; 93.08%

Related works on SLR using DL that address the problem of varying environmental conditions

Author(s) | Year | Language | Modality | Type of condition | Technique | Result
[130] | 2018 | Bengali | RGB images | Variant background and skin colors | Modified VGG net | 84.68%
[134] | 2018 | American | RGB images | Noise and missing data | Augmentation | 98.13%
[150] | 2018 | Indian | RGB video | Different viewing angles, background lighting, and distance | Novel CNN | 92.88%
[158] | 2019 | American | Binary images | Noise | Erosion, closing, contour generation, and polygonal approximation | 96.83%
[159] | 2019 | American | Depth image | Variant illumination and background | Attain depth images | 88.7%
[164] | 2019 | Chinese | RGB and depth video | Variant illumination and background | Two-stream spatiotemporal network | 96.7%
[173] | 2019 | Indian | RGB and depth video | Variant illumination, background, and camera distance | Four stream CNN | 86.87%
[178] | 2020 | Arabic | RGB images | Variant illumination and skin color | DCNN | 94.31%
[179] | 2020 | Arabic | RGB videos | Variant illumination, background, pose, scale, shape, position, and clothes | Bi-directional Long Short-Term Memory (BiLSTM) | 89.59%
[180] | 2020 | Arabic | RGB videos | Variant illumination, clothes, position, scale, and speed | 3DCNN and SoftMax function | 87.69%
[182] | 2020 | Arabic | RGB videos | Variations in heights and distances from camera | Normalization | 84.3%
[194] | 2020 | Arabic | RGB images | Variant illumination and background | VGG16 and ResNet152 with enhanced softmax layer | 99%
[201] | 2020 | American | Grayscale images | Illumination and skin color | Set the hand histogram | 95%
[202] | 2020 | American | RGB images | Variant illumination and background | DCNN | 99.96%
[206] | 2021 | Indian | RGB video | Variant illuminations, camera positions, and orientations | GoogLeNet + BiLSTM | 76.21%
[207] | 2021 | Indian | RGB images | Light and dark backgrounds | DCNN with a small number of parameters | 99.96%
[209] | 2021 | American | RGB video | Noise | Gaussian blur | 99.63%
[213] | 2021 | Korean | Depth videos | Low resolution | Augmentation | 91%
[224] | 2021 | Bengali | RGB images | Variant backgrounds, camera angle, light contrast, and skin tone | Conventional deep learning + zero-shot learning (ZSL) | 93.68%
[225] | 2021 | Arabic | RGB video | Variant illumination, background, and clothes | Inception-BiLSTM | 84.2%
[227] | 2021 | American | Thermal images | Varying illumination | Adopt live images taken by a low-resolution thermal camera | 99.52%
[229] | 2021 | Indian | RGB video | Varying illumination | 3DCNN | 88.24%
[230] | 2021 | American | RGB video | Noise, varying illumination | Median filtering + histogram equalization | 96%
[236] | 2021 | Arabic | RGB images | Variant illumination and background | Region-based Convolutional Neural Network (R-CNN) | 93.4%
[239] | 2022 | Indian | RGB video | Variant illumination and views | Grey scale conversion and histogram equalization | 98.7%
[241] | 2022 | Arabic | RGB video | Variant illumination and background | CNN + RNN | 98.8%
[249] | 2022 | Arabic | Greyscale images | Variant illumination and background | Sobel filter | 97%
[253] | 2022 | Arabic | RGB and depth video | Variant background | ResNet50-BiLSTM | 99%
[259] | 2022 | American | RGB and depth images | Noise and illumination variation | Median filtering and histogram equalization | 91.4%
[261] | 2022 | American | Skeleton video | Noise in video frames | An innovative weighted least square (WLS) algorithm | 97.98%
[270] | 2022 | English | Wi-Fi signal | Noise and uncleaned Wi-Fi signals | Principal Component Analysis (PCA) | 95.03%

Related works on SLR using DL that address the feature extraction problem

Author(s) | Year | Dataset | Technique | Signing mode | Feature(s) | Result
[130] | 2018 | Collected | DCNN | Static | Hand shape | 84.6%
[135] | 2018 | Collected | 3D CNN | Isolated | Spatiotemporal | 99%
[138] | 2018 | ASL Finger Spelling | CNN | Static | Depth and intensity | 82%
[141] | 2018 | RWTH-2014 | 3D Residual Convolutional Network (3D-ResNet) | Continuous | Spatial information, and temporal connections across frames | 37.3 WER
[143] | 2018 | Collected | 3D-CNNs | Isolated | Spatiotemporal | 88.7%
[144] | 2018 | Collected | DCNN | Isolated | Hand palm sphere radius, and position of hand palm and fingertip | 88.79%
[149] | 2018 | ASL Finger Spelling | Histograms of oriented gradients, and Zernike moments | Static | Hand shape | 94.37%
[150] | 2018 | Collected | CNN | Continuous | Hand shape | 92.88%
[151] | 2018 | Collected | 3DRCNN | Continuous/Isolated | Motion, depth, and temporal | 69.2%
[152] | 2018 | SHREC | Leap Motion Controller (LMC) sensor | Isolated, Static | Finger bones of hands | 96.4%
[153] | 2018 | Collected | Hybrid Discrete Wavelet Transform, Gabor filter, and histogram of distances from Centre of Mass | Static | Hand shape | 76.25%
[154] | 2018 | Collected | DCNN | Static | Facial expressions | 89%
[156] | 2019 | Collected | Two-stream 3-D CNN | Continuous | Spatiotemporal | 82.7%
[158] | 2019 | Collected | CNN | Static | Hand shape | 96.83%
[79] | 2019 | Collected | OpenPose library | Continuous | Human key points (hand, face, body) | 55.2%
[159] | 2019 | ASL fingerspelling | PCANet | Static | Hand shape (corners, edges, blobs, or ridges) | 88.7%
[161] | 2019 | SIGNUM | Stacked temporal fusion layers in DCNN | Continuous | Spatiotemporal | 2.80 WER
[162] | 2019 | Collected | Leap motion device | Continuous; Isolated | 3D positions of the fingertips | 72.3%; 89%
[163] | 2019 | Collected | CNN | Static | Hand shape | 95%
[164] | 2019 | CSL | D-shift Net | Continuous | Spatial features, time features, and temporal | 96.7%
[165] | 2019 | DEVISIGN-D | B3D Res-Net | Isolated | Spatiotemporal | 89.8%
[166] | 2019 | Collected | Local and GIST descriptor | Isolated | Spatial and scene-based features | 95.83%
[169] | 2019 | Collected | Restricted Boltzmann Machine (RBM) | Isolated | Handshape, and network generated features | 88.2%
[170] | 2019 | KSU-SSL | 3D-CNN | Isolated | Hand shape, position, orientation, and temporal dependence in consecutive frames | 77.32%
[171] | 2019 | Collected | C3D, and Kinect device | Continuous | Temporal, and skeleton | 94.7%
[175] | 2019 | Collected | OpenPose library with Kinect V2 | Static | 3D skeleton | 98.9%
[177] | 2020 | Ishara-Lipi | MobileNet V1 | Isolated | Two hands shape | 95.71%
[178] | 2020 | Collected | DCNN | Static | Hand shape | 94.31%
[179] | 2020 | Collected | Single layer Convolutional Self-Organizing Map (CSOM) | Isolated | Hand shape | 89.59%
[180] | 2020 | KSU-SSL | Enhanced C3D architecture | Isolated | Spatiotemporal of hand and body | 87.69%
[182] | 2020 | KSU-SSL | 3DCNN | Isolated | Spatiotemporal | 84.3%
[185] | 2020 | Collected | ResNet50 model | Isolated | Hand shape, Extra Spatial hand Relation (ESHR) features, Hand Pose (HP), and temporal | 98.42%
[186] | 2020 | Polytropon (PGSL) | ResNet-18 | Isolated | Optical flow of skeletal, handshapes, and mouthing | 95.31%
[187] | 2020 | Collected | Discrete cosine transform, Zernike moment, scale-invariant feature transform, and social ski driver optimization algorithm | Static | Hand shape | 98.89%
[189] | 2020 | RWTH-2014 | Temporal convolution unit and dynamic hierarchical bidirectional GRU unit | Continuous | Spatiotemporal | 10.73% BLEU
[191] | 2020 | Collected | Standard score normalization on the raw Channel State Information (CSI) acquired from the Wi-Fi device, and MIFS algorithm | Static, and Continuous | The cross-cumulant features (unbiased estimates of covariance, normalized skewness, normalized kurtosis) | 99.9%
[192] | 2020 | GSL | OpenPose human joint detector | Isolated | 3D hand skeletal, and region of hand and mouth | 93.56%
[197] | 2020 | Collected | Four channel surface electromyography (sEMG) signals | Isolated | Time-frequency joint features | 93.33%
[199] | 2020 | Collected | Euler angle, quaternion from IMU signal | Continuous | Hand rotation | 10.8% WER
[76] | 2020 | RKS-PERSIANSIGN | 3DCNNs | Isolated | Spatiotemporal | 99.8%
[202] | 2020 | ASL fingerspelling A | DCNN | Static | Hand shape | 99.96%
[203] | 2020 | Collected | Color-coded topographical descriptor constructed from joint distances and angles, used in a 2 stream CNN | Isolated | Distance and angular | 93.01%
[204] | 2020 | Collected | Two CNN models and a descriptor based on histogram of cumulative magnitudes | Isolated | Two hands, skeleton, and body | 64.33%
[208] | 2021 | RWTH-2014T | Semantic Focus of Interest Network with Face Highlight Module (SFoI-Net-FHM) | Isolated | Body and facial expression | 10.89 BLEU
[210] | 2021 | Collected | ConvLSTM | Isolated | Spatiotemporal | 94.5%
[212] | 2021 | Collected | ResNet50 | Static | Hand area, the length of axis of first eigenvector, and hand position changes | 96.42%
[214] | 2021 | Collected | f-CNN (fusion of 1-D CNN and 2-D CNN) | Isolated | Time and spatial-domain features of finger resistance movement | 77.42%
[217] | 2021 | MU | Modified AlexNet and VGG16 | Static | Hand edges and shape | 99.82%
[222] | 2021 | Collected | VGG net of six convolutional layers | Static | Hand shape | 97.62%
[224] | 2021 | 38 BdSL | DenseNet201, and Linear Discriminant Analysis | Static | Hand shape | 93.68%
[225] | 2021 | KSU-ArSL | Bi-LSTM | Isolated | Spatiotemporal | 84.2%
[226] | 2021 | Collected | Paired pooling network in view pair pooling net (VPPN) | Isolated | Spatiotemporal | 84.95%
[228] | 2021 | ASLLVD | Bayesian Parallel Hidden Markov Model (BPaHMM) + stacked denoising variational autoencoders (SD-VAE) + PCA | Continuous | Shape of hand, palm, and face, along with their position, speed, and distance between them | 97%
[230] | 2021 | ASLLVD | Cascaded 3-D CNNs | Isolated | Spatiotemporal | 96.0%
[231] | 2021 | Collected | Leap motion controller | Static, and Isolated | Sphere radius, angles between fingers, and their distance | 91.82%
[232] | 2021 | RWTH-2014 | (3+2+1)D ResNet | Continuous | Height, motion of hand, and frame blurriness levels | 23.30 WER
[233] | 2021 | Montalbano II | AlexNet + Optical Flow (OF) + Scene Flow (SF) methods | Isolated | Pixel level, and hand pose | 99.08%
[234] | 2021 | RWTH-2014 | GAN | Continuous | Spatiotemporal | 23.4 WER
[235] | 2021 | MNIST | DCNN | Static | Hand shape | 98.58%
[236] | 2021 | Collected | R-CNN | Static | Hand shape | 93%
[237] | 2021 | CSL-500 | Multi-scale spatiotemporal attention network (MSSTA) | Isolated | Spatiotemporal | 98.08%
[242] | 2022 | MNIST | Modified CapsNet | Static | Spatial, and orientations | 99.60%
[243] | 2022 | RKS-PERSIANSIGN | Singular value decomposition (SVD) | Isolated | 3D hand key points between the segments of each finger, and their angles | 99.5%
[244] | 2022 | Collected | 2DCRNN + 3DCRNN | Continuous | Spatiotemporal out of small patches | 99%
[246] | 2022 | Collected | Atrous convolution mechanism, and semantic spatial multi-cue model | Static; Isolated | Pose, face, and hand, and spatial, full frame | 99.85%
[253] | 2022 | Collected | 4 DNN models using 2D and 3D CNN | Isolated | Spatiotemporal | 99%
[255] | 2022 | Collected | Scale-Invariant Feature Transformation (SIFT) | Static | Corner, edges, rotation, blurring, and illumination | 97.89%
[256] | 2022 | Collected | InceptionResNetV2 | Isolated | Hand shape | 97%
[257] | 2022 | Collected | AlexNet | Static | Hand shape | 94.81%
[258] | 2022 | Collected | Sensor + mathematical equations + CNN | Continuous | Mean, magnitude of mean, variance, correlation, covariance, and frequency domain features + spatiotemporal | 0.088 WER
[260] | 2022 | Collected | MediaPipe framework | Isolated | Hands, body, and face | 99%
[261] | 2022 | Collected | Bi-RNN network, maximal information correlation, and leap motion controller | Isolated | Hand shape, orientation, position, and motion of 3D skeletal videos | 97.98%
[264] | 2022 | LSA64 | Dynamic motion network (DMN) + accumulative motion network (AMN) | Isolated | Spatiotemporal | 91.8%
[265] | 2022 | CSL-500 | Spatial–temporal–channel attention (STCA) | Isolated | Spatiotemporal | 97.45%
[268] | 2022 | Collected | SURF (Speeded Up Robust Features) | Isolated | Distribution of the intensity material within the neighborhood of the interest point | 99%
[269] | 2022 | Collected | Thresholding and Fast Fisher Vector Encoding (FFV) | Isolated | Hand, palm, finger shape, and position and 3D skeletal hand characteristics | 98.33%

Related works on SLR using DL that address the segmentation problem

Author(s) | Year | Input modality | Segmentation method | Result
[131] | 2018 | RGB image | HSV color model | 99.85%
[148] | 2018 | RGB image | Skin segmentation algorithm based on color information | 94.7%
[149] | 2018 | RGB images | k-means-based algorithm | 94.37%
[158] | 2019 | RGB images | Color segmentation by MLP network | 96.83%
[159] | 2019 | Depth image | Wrist line localization by algorithm-based thresholding | 88.7%
[164] | 2019 | RGB and depth video | Aligned Random Sampling in Segments (ARSS) | 96.7%
[168] | 2019 | RGB and depth images | Depth based segmentation using data of Kinect RGB-D camera | 97.71%
[171] | 2019 | RGB video | Adaptive temporal encoder designed to capture crucial RGB visemes and skeleton signees | 94.7%
[179] | 2020 | RGB videos | Hand semantic segmentation with DeepLabv3+ | 89.59%
[180] | 2020 | RGB videos | Novel method based on OpenPose | 87.69%
[182] | 2020 | RGB videos | Viola and Jones, and human body part ratios | 84.3%
[183] | 2020 | RGB images | Robert edge detection method | 99.3%
[185] | 2020 | RGB video | SSD, a feed-forward convolutional network; a Non-Maximum Suppression (NMS) step is used in the final step to estimate the final detection | 98.42%
[187] | 2020 | RGB images | Sobel edge detector, and skin color by thresholding | 98.89%
[188] | 2020 | RGB images | OpenCV with a Region of Interest (ROI) box in the driver program | 93%
[189] | 2020 | RGB videos | Frame stream density compression (FSDC) algorithm | 10.73 error
[199] | 2020 | RGB videos | Attention-based encoder-decoder model designed to realize end-to-end continuous SLR without segmentation | 10.8% WER
[200] | 2020 | RGB images | Single Shot Multi Box Detection (SSD) | 99.90%
[209] | 2021 | RGB video | Canny | 99.63%
[216] | 2021 | RGB images | Erosion, dilation, and watershed segmentation | 99.7%
[219] | 2021 | RGB video | Data sliding window | 86.67%
[236] | 2021 | RGB images | R-CNN | 93%
[239] | 2022 | RGB videos | Novel Adaptive Hough Transform (AHT) | 98.7%
[246] | 2022 | RGB images, and video | Grad-CAM and CamShift algorithm | 99.85%
[248] | 2022 | Grey images | YCbCr, HSV and watershed algorithm | 99.60%
[249] | 2022 | RGB images | Sobel operator method | 97%
[263] | 2022 | RGB images | Semantic | 99.91%
[267] | 2022 | RGB images | R-CNN | 99.7%
[268] | 2022 | RGB video | Mask created by extracting the maximum connected region in the foreground, assuming it to be the hand, + Canny method | 99%

Language: English
Page range: 77 - 116
Submitted on: May 27, 2024
Accepted on: Jun 5, 2024
Published on: Jun 15, 2024
Published by: Future Sciences For Digital Publishing
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2024 Shahad Thamear Abd Al-Latief, Salman Yussof, Azhana Ahmad, Saif Khadim, published by Future Sciences For Digital Publishing
This work is licensed under the Creative Commons Attribution 4.0 License.