
Novel framework for dyslexia diagnosis in children in Al Kharj region with super-resolution generative adversarial network and transfer learning technique

Open Access | May 2025


I.
Introduction

Dyslexia is a common learning disability observed in children in the early years of their school education. Children diagnosed with dyslexia show symptoms of poor reading ability, confusion in understanding the shapes of letters, and writing letters as mirror images in their notes. They also struggle to follow a sequence of instructions and hence face difficulty in planning and organizing any task given to them [1]. Academic excellence is challenging for dyslexic children due to weak reading and cognitive skills. They have difficulty copying lessons written on the board. Some children with dyslexia also have issues such as attention deficit and speech delays [2]. Research shows that dyslexia occurs in children because of factors such as genetics, premature birth, or low birth weight [3]. Children with dyslexia have a high risk of suffering from attention-deficit/hyperactivity disorder (ADHD) [4]. In the early days, dyslexia was visualized as a reading blindness disorder.

Around 10% of the world’s population has dyslexia, and many are unaware of it. With a world population of 7.8 billion, around 780 million people are dyslexic [5]. More than 40 million adults in the United States, both men and women, are affected by dyslexia. Statistics show that in the United States dyslexia is prevalent in one out of every five students. In the United Kingdom, 1 out of every 10 individuals has some degree of dyslexia [6], and around 6.3 million people there live with dyslexia and reading disability [7]. The statistics indicate that 25%–40% of children with dyslexia also have ADHD [8]. In Saudi Arabia over 700 million individuals are at risk of dyslexia [9]. In 2015, a cross-sectional observational study carried out in public and private girls’ schools in the city of Riyadh revealed some interesting facts. The study indicates that 23.89% of the students had learning difficulties, of which dyslexia accounted for 31.4% and dysgraphia for 27.3%. In the target group of this study, around 68% of students with learning difficulties scored below-average marks in class, compared to 14.7% of non-dyslexic children. These statistical results indicate a direct correlation between academic performance and learning difficulty. The study shows that dyslexia had a radical effect on the academic achievements of the children [10].

Specialist tutors, qualified to teach students with specific learning difficulties (SpLD), render professional help to dyslexic children, enabling them to cope academically and perform on par with non-dyslexic children. Detecting dyslexia at an early age of schooling allows parents to provide special help to their children and improves their academic caliber in a significant way. Dyslexia can be detected by reading tests, cognitive tests, questionnaires for parents, and students’ handwritten images [11]. Dyslexic children write certain letters such as f, d, b, and e as mirror images of the original letters [12]. This is one of the significant biomarkers for dyslexia detection from handwritten images. The handwritten notes of such children indicate that the child is suffering from dyslexia and needs urgent attention and professional help. These children are often labeled as lazy and incapable by teachers as they cannot compete with their peers in the class. This has a drastic effect on the child’s confidence and mental health.
There are no benchmark methods for diagnosing dyslexia at an early age, as the written and cognitive tests conducted by schools are subjective rather than objective. There are no standard tests, since the test patterns, time duration, and questions in the test paper vary from school to school [13]. Tests for dyslexia detection are typically time-bound reading and cognitive tests. Children with a low intelligence quotient may be wrongly diagnosed as dyslexic if they are unable to meet school academic standards. Children with dyslexia need special teaching methods and motivation to compete with their peers and to excel in their studies [14]. Though children with dyslexia struggle in the early years of schooling, many are very successful at later stages, developing strong skills in science and technology. They also excel in the domains of art, sports, and entertainment. Many scientists and celebrities who are dyslexic are known for their creativity and innovative ideas. This research study focuses on developing an automated system for diagnosing dyslexia in children from handwritten images in the English language. An extensive exploratory data analysis of the common biomarkers for dyslexia is carried out as part of the research study. The results of the data analysis, supported by visualization graphs and charts, indicate the significant biomarkers for dyslexia and their direct correlation with dyslexia detection.

II.
Related Study

This section summarizes related research published on automated dyslexia detection tools and frameworks. Automated dyslexia detection models are designed and implemented with datasets of neuroimages, handwritten images, cognitive tests, and specially designed questionnaires for parents. The subjective approach is to examine the child for cognitive development and reading skills, while the more objective methodology is to perform neuroimaging medical tests. Several research studies that detect dyslexia using neuroimaging tests are discussed in this section. EEG data from 32 native Hebrew-speaking children in grades 6 and 7 were continuously recorded and sampled using the Biosemi ActiView software (Cortech Solutions, LLC). After preprocessing for noise removal, the ERP signals were analyzed for waveform shape in the time and frequency domains. The frequency and periodicity of the high-pass components of the signal obtained from the discrete wavelet transform are considered critical factors for the classification of dyslexic and non-dyslexic children [15]. In another study [16], EEG resting-state data from 15 normal readers and 29 dyslexic children were analyzed. Connectivity matrices for multiple frequency bands were used to calculate the weights in the weighted connectivity graphs. A set of 37 input features to the SVM and KNN classifiers achieved an accuracy of 95%. Neuroimaging tests are complex methods that require sophisticated equipment and signal analysis techniques for feature extraction in the classification model. The Ober-2TM (formerly Permobil Meditech, Inc., Woburn, MA, USA) is a goggle-based infrared corneal reflection system that tracks eye position over time. This device records the horizontal and vertical movements of the left eye and the horizontal movement of the right eye. The experiment was standardized across subjects by providing them with the same reading content. The framework was designed with the recursive feature elimination (RFE) method for feature selection and SVM as the classifier [17].

Medical imaging techniques for dyslexia detection are uncommon due to the lack of awareness about dyslexia among parents. Parents often view the issue as a social stigma and resist accepting the fact that the child suffers from a learning disability. In the early school days, parents often ignore this learning disability and seek help only in later years of schooling, when the child’s academic performance deteriorates. A simple and easy-to-use tool to detect dyslexia will encourage parents to detect the disorder at initial stages and give timely help to their children. Recently, researchers have been working toward user-friendly methodologies for dyslexia detection rather than complex neuroimaging techniques. In this landscape, the researchers decided to use handwriting, which is a simple yet effective biomarker for detecting dyslexia. Handwritten images with letters written as mirror images clearly indicate a high chance of the child being dyslexic. In 2019, a framework combining the optical character recognition (OCR) technique for feature extraction from handwritten images with a custom-built ANN model exhibited a classification accuracy of 73.33% [18]. In these earlier research studies, the sequence in which letters are written was not taken into consideration for detecting dyslexia.
A research study that combined transfer learning of a CNN with positional encoding of an LSTM to consider sequential letters achieved 83% accuracy on a Chinese dataset of 100,000 characters from 1,064 children [19]. The proposed research study aims at developing an automated system to detect dyslexia from handwritten images without the children undergoing neuroimaging tests. The proposed automated tool is a novel framework that includes a Super-Resolution Generative Adversarial Network (SRGAN) for generating super-resolution handwritten images and a transfer learning-based Artificial Intelligence (AI) model to classify each image as written by a dyslexic or non-dyslexic child.

III.
Proposed Methodology

Training deep learning models is time-consuming and tedious. The high computational cost incurred while training and testing a deep learning model is the main drawback in adopting this technique for new applications. Collection and compilation of training and testing data are also time-consuming tasks for any deep learning model. Every machine learning technique demands a voluminous and standard dataset for training, testing, and validation of the model [20]. The dataset and model architecture have a direct impact on the model's classification or prediction results. There are domains where data collection is a tedious task and can consume a lot of time and effort. Non-availability of open-source datasets is the main hurdle in developing application-specific AI models. Datasets are not always standard, balanced, and voluminous.

In this scenario, transfer learning allows a custom-built model to be developed by transferring the configuration and parameters learned from an initial model built with a standard dataset [21]. In transfer learning, the knowledge gained by training the prior model is reused for developing a new model. Pre-trained models form the basis for developing custom models for new applications. Transfer learning alleviates the hurdle of requiring large datasets for health-care applications; custom-built health-care AI models based on transfer learning do not require a voluminous dataset that takes a lot of time and effort to collect [22]. In a nutshell, transfer learning finds wide application in domains where the training data are limited. Transfer learning enhances the model's performance when the source and target tasks are compatible. The availability of preexisting training data reduces the model training time and the computational cost of developing the model from scratch. In transfer learning, models can be trained on simulations and then tested with real-time data.

In this proposed study, the collection of handwritten images is a challenging issue. The handwritten images are collected in low-illumination conditions with shadows and a noisy background. The dataset for this study is a collection of handwritten images with non-similar content from students. Such a dataset is a challenge for any custom-built deep learning model to classify accurately into one of the labeled classes. The task would be simpler if the dataset were a large, well-balanced collection of high-quality images with similar content; collecting such a large dataset is not feasible for this study. Hence, the researchers have leveraged the concept of transfer learning, which builds powerful models for self-collected datasets. With transfer learning, the new model retrained with a simpler dataset is on par with the original model in terms of performance. The proposed methodology includes the generation of super-resolution handwritten images by SRGAN and classification by a CNN with transfer learning. Figure 1 shows the block diagram of the proposed model.

Figure 1:

Block diagram for the classification of dyslexia from images.

a.
Dataset

The dataset includes normal handwriting images from the National Institute of Standards and Technology (NIST) Special Database 19 (2010) [23] and a self-collected dyslexia dataset. The self-collected dataset includes data collected from Cognito Academy, Coimbatore, India, comprising handwritten images of 36 students from the same school. The overall dataset has 78,275 images for the normal class, 52,196 for the reversal class, and 8,029 for the corrected class. A dataset of 150 handwriting images from students in schools in Al Kharj, Riyadh, and Coimbatore is used for prediction.

b.
Data augmentation

The performance of machine learning algorithms relies on the quality of the dataset that trains the model. The dataset has a direct impact on the performance of the model, and improving the quality of the dataset ensures the building of effective models. Data cleaning, preprocessing, and augmentation contribute to the enhancement of data quality; a low-quality dataset leads to unreliable predictions and misleading conclusions [24]. Training datasets must be versatile, diverse, and voluminous for any domain, and dyslexia detection from handwritten images is no exception. Data validity is the degree to which the training dataset is purposeful for the model to extract feature information and use that knowledge to predict new test samples. Data augmentation is the first step to ensure data validity. The images are resized to a standardized size before the data augmentation phase. Heterogeneity is added to the dataset through augmentation techniques such as rotation, shear, and translation: rotation introduces generalization, shear adds deformations, and translation helps replicate spatial changes [25]. The data augmentation phase increases not only the sample size of the dataset but also the versatility of the images in it.
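As an illustration, the following is a minimal augmentation sketch in Python using Keras' ImageDataGenerator; the specific parameter values and the 29 × 29 grayscale placeholder batch are assumptions for demonstration, not the settings used in this study.

```python
# Minimal augmentation sketch; parameter values are illustrative assumptions.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,       # rotation introduces generalization
    shear_range=0.2,         # shear adds deformations
    width_shift_range=0.1,   # translation replicates spatial changes
    height_shift_range=0.1,
    fill_mode="nearest",
)

# Example: augment a placeholder batch of resized grayscale images (29 x 29 x 1).
images = np.random.rand(8, 29, 29, 1).astype("float32")
batch = next(augmenter.flow(images, batch_size=8, shuffle=False))
print(batch.shape)  # (8, 29, 29, 1)
```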

c.
Super-resolution GAN

With the advent of deep learning techniques for image resolution enhancement, traditional noise removal methodologies have become obsolete. In this research study, a super-resolution GAN enhances the image resolution for further processing in the subsequent phases of the proposed model. SRGAN leverages the GAN framework to generate high-resolution images from noisy low-resolution images. The handwritten images are photographed under moderate illumination conditions in the classroom and may include shadows and slight blurring due to the manual acquisition procedure. In medical diagnosis, SRGAN generates high-resolution medical images that help diagnose diseases accurately. The interpolation methods initially adopted for single-image super-resolution (SISR) have the drawbacks of losing fine high-frequency details during upsampling, low convergence speed, and high computation cost [26]. The machine learning methods adopted for image processing demand robust optimization techniques to tune their parameters, and the drawbacks of manual parameter setting and optimal parameter selection motivated the move to deep learning technologies. A GAN is an AI framework that includes a generator and a discriminator: the generator generates samples from the given input, the discriminator validates whether the samples are real, and training terminates at the point where the generator and discriminator strike a balance. In this research study, the SRGAN model is implemented to improve the resolution of the images. SRGAN concurrently trains the generator and discriminator to produce synthetic data and to distinguish real data from synthetic data [27]. Figure 2 shows the architecture of SRGAN.

Figure 2:

Architecture of SRGAN [28]. SRGAN, Super-Resolution Generative Adversarial Network.

d.
Architecture of SRGAN

Real-time images are affected by factors such as background lighting, the position of the object, and noise induced by the camera [29]. Applications require the images to be clean and free of noise caused by lighting changes. A GAN can generate high-resolution images from low-resolution input images, and high-level information can be retrieved from the high-resolution images it generates. In conventional GANs and conditional GANs, noise and conditions are fed as inputs, whereas in the SRGAN the low-resolution images are fed as input to the generator neural network. A GAN works on the concept of a generator generating novel data. The training performed by SRGAN produces a new set of images with the same features as the input dataset but with higher resolution. In the SRGAN, the generator network and the discriminator network work against each other to learn the dataset. The architecture of the generator network includes a first block of convolution and PReLU, followed by a middle section of residual blocks, and finally the upscaling and output layers.

d.i
Convolution layer

The SRGAN has an initial block that contains a convolution layer and a PReLU activation function. The input images are passed through the convolution layer to extract the feature maps; the feature extraction is performed by the convolution layer followed by the activation layer. The convolution layer uses a 3 × 3 kernel for generating the feature map. PReLU is a variation of the ReLU activation function. ReLU mitigates the vanishing gradient problem but cannot deal with negative inputs: neurons receiving negative inputs lose their discriminating ability by giving zero as the output. In Leaky ReLU, neurons receiving negative inputs are preserved with a small gradient value instead of entering an inactive state. PReLU parameterizes the slope in the negative range, so that the slope alpha becomes a learnable parameter [30].
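A minimal sketch of this initial block in Keras is shown below; the 3 × 3 kernel follows the text, while the filter count of 64 and the grayscale input are assumptions.

```python
# Sketch of the generator's initial block: 3x3 convolution followed by PReLU.
import tensorflow as tf
from tensorflow.keras import layers

def initial_block(x):
    x = layers.Conv2D(64, kernel_size=3, padding="same")(x)  # 64 filters is an assumption
    x = layers.PReLU(shared_axes=[1, 2])(x)                   # learnable negative slope
    return x

inp = layers.Input(shape=(None, None, 1))  # low-resolution grayscale input (assumed)
out = initial_block(inp)
tf.keras.Model(inp, out).summary()
```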

d.ii
Residual blocks

The proposed SRGAN framework uses ResNet blocks to extract deep features from the handwritten images. The residual block retains the information from the previous layers and enables the network to use more features adaptively. Each residual block contains a repeated block with a convolution layer followed by batch normalization and a PReLU activation layer, and the features of each block are concatenated with the features of the previous blocks. The last residual block has no activation function and uses a kernel size of three to overcome the blurring effect. An adversarial strategy is used to train the model to improve the resolution of the image. The deep features extracted are then upscaled for further analysis.
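The following is a minimal Keras sketch of one residual block with the Conv2D, batch normalization, PReLU, Conv2D, batch normalization structure and a skip connection, matching the algorithm given later in this section; the 64 filters and 3 × 3 kernels are assumptions.

```python
# Sketch of one residual block with a skip connection.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    skip = x
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.PReLU(shared_axes=[1, 2])(x)
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Add()([x, skip])  # retain information from the previous layers

inp = layers.Input(shape=(None, None, 64))
out = residual_block(inp)
tf.keras.Model(inp, out).summary()
```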

d.iii
Upscaling of image with pixel shuffle

The feature maps are upscaled by a scale factor of 2 in each upscaling block. The first upscaling block has a convolution layer followed by PReLU; the second upscaling block again scales the image by a factor of 2 and applies PReLU. In the last unit, the output is scaled from [−1, 1] to [0, 255]. The generator model is assembled after the upscaling stage: the feature maps are upscaled by a factor of 4 overall, and a final convolution layer maps the 4× upscaled features back to the output image.
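A hedged sketch of the ×2 pixel-shuffle upscaling block is given below: a convolution expands the channel dimension, TensorFlow's depth_to_space operation rearranges the pixels, and PReLU follows; repeating the block twice yields the overall 4× upscaling, and a tanh output is rescaled from [−1, 1] to [0, 255]. The filter counts are assumptions.

```python
# Sketch of x2 pixel-shuffle upscaling and the final output rescaling.
import tensorflow as tf
from tensorflow.keras import layers

def upsample_block(x, filters=64, scale=2):
    x = layers.Conv2D(filters * scale ** 2, 3, padding="same")(x)
    x = layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)  # pixel shuffle
    return layers.PReLU(shared_axes=[1, 2])(x)

inp = layers.Input(shape=(None, None, 64))
x = upsample_block(inp)   # x2
x = upsample_block(x)     # x4 overall
# Map features back to an image; rescale the tanh output [-1, 1] to [0, 255].
x = layers.Conv2D(1, 3, padding="same", activation="tanh")(x)
out = layers.Lambda(lambda t: (t + 1.0) * 127.5)(x)
tf.keras.Model(inp, out).summary()
```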

d.iv
Discriminator network

In the proposed model, the discriminator network consists of four convolutional layers, with the number of filters increasing from 64 to 512, and a single dense layer at the tail end. Each convolutional layer is followed by a batch normalization layer combined with a leaky ReLU layer. Using skip connections strongly helps in avoiding the vanishing gradient problem [31]. The feature maps are processed with a global average pooling function followed by a leaky ReLU activation function, and the final fully connected layer is a dense layer with sigmoid as the activation function.
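A minimal Keras sketch of this discriminator follows; the filter progression (64 to 512), batch normalization with leaky ReLU, global average pooling, and the sigmoid dense output follow the description above, while the strides and kernel sizes are assumptions.

```python
# Sketch of the discriminator: four Conv2D layers (64 -> 512 filters),
# BN + LeakyReLU, global average pooling, and a sigmoid dense output.
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(None, None, 1)):
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (64, 128, 256, 512):
        x = layers.Conv2D(filters, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.LeakyReLU(0.2)(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # real vs. generated
    return tf.keras.Model(inp, out, name="discriminator")

build_discriminator().summary()
```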

d.v
Algorithm for predicting classes for dyslexia handwritten images
  • 1.

    Apply data augmentation on the low-resolution images in the dataset D.

  • 2.

    Define generator parameters—scaling factor, feature map, and residual blocks.

    • 2.1.

      Pass Input through convolution layer + PReLU activation layer

    • 2.2.

      Construct residual layer-1

      • 2.2.1.

        Convolution layer

      • 2.2.2.

        Batch normalization

      • 2.2.3.

        PReLU layer

      • 2.2.4.

        Conv2D

      • 2.2.5.

        Batch normalization

    • 2.3.

      Add input X to the residual block

    • 2.4.

      For i = 1 to n <where n is the number of residual blocks>

      • 2.4.1.

        Pass input X through the residual layer-2 with

        • 2.4.1.1.

          Conv2D

        • 2.4.1.2.

          Batch normalization

        • 2.4.1.3.

          PReLU layer

        • 2.4.1.4.

          Conv2D

        • 2.4.1.5.

          Batch normalization

    • 2.5.

      Pass through the last residual layer

      • 2.5.1.

        Add convolution layer

      • 2.5.2.

        Add batch normalization

      • 2.5.3.

        Add the input

    • 2.6.

      Upscale the image

      • 2.6.1.

        Conv2D layer with scaling factor specified as two

      • 2.6.2.

        Rearrange the pixels of the image with the TensorFlow pixel-shuffle function

      • 2.6.3.

        PReLU layer

    • 2.7.

      Scale the output to [0, 255]

      • 2.7.1.

        Conv2D with activation function tanh

      • 2.7.2.

        Perform rescaling

    • 2.8.

      Create the discriminator model

      • 2.8.1.

        Define the discriminator and initialize the input layer

      • 2.8.2.

        Conv2D

      • 2.8.3.

        Batch normalization

      • 2.8.4.

        Leaky ReLU as activation function

      • 2.8.5.

        for i = 1 to m <where m is the number of discriminator blocks>

        • 2.8.5.1.

          First discriminator layer

        • 2.8.5.2.

          Second discriminator layer

      • 2.8.6.

        GlobalAvgPool2D (x)

      • 2.8.7.

        Leaky ReLU as activation function

      • 2.8.8.

        Fully connected dense layer with sigmoid activation function

  • 3.

    Generate discriminator

  • 4.

    Train the model (a TensorFlow training-step sketch follows this algorithm)

  • 5.

    Implement the model

  • 6.

    Create dataset D’ with the enhanced SRGAN output images

  • 7.

    The dataset Ds contains all the letters of the English alphabet, classified as Normal, Corrected, and Reversed

  • 8.

    Train the model M on the dataset Ds with the custom-built model architecture given as Figure 3

  • 9.

    Test the model M with the test data of 150 images from the dataset D

  • 10.

    Record the model prediction scores as the results of experimental study.
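As a concrete illustration of step 4, the following is a minimal sketch of one adversarial training step in TensorFlow, assuming generator and discriminator are Keras models built as outlined above; the simple pixel-wise content loss and the adversarial loss weighting are assumptions (the original SRGAN uses a VGG-based perceptual loss).

```python
# Sketch of one adversarial training step for the SRGAN (assumed losses/weights).
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(generator, discriminator, low_res, high_res):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_hr = generator(low_res, training=True)
        real_pred = discriminator(high_res, training=True)
        fake_pred = discriminator(fake_hr, training=True)

        # Discriminator: label real high-resolution images 1, generated ones 0.
        d_loss = bce(tf.ones_like(real_pred), real_pred) + \
                 bce(tf.zeros_like(fake_pred), fake_pred)

        # Generator: pixel-wise content loss plus a weighted adversarial term.
        content_loss = tf.reduce_mean(tf.abs(high_res - fake_hr))
        adv_loss = bce(tf.ones_like(fake_pred), fake_pred)
        g_loss = content_loss + 1e-3 * adv_loss

    g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
    d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    return g_loss, d_loss
```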

Figure 3:

Flowchart for the proposed model to predict classes for the dyslexia handwriting images. SRGAN, Super-Resolution Generative Adversarial Network.

The algorithm is graphically represented in Figures 3 and 4. Figure 3 shows the flowchart for predicting classes for the dyslexia test data images, and Figure 4 exhibits the GAN model constructed for generating high-resolution images. There is a need to address the issue of data convergence in any processing environment; especially in an IoT environment, data convergence can be achieved by extracting versatile data from various data sources and making it available to high-level systems [32].

d.vi
Classification model

Character recognition is a classification task based on text features. Feature extraction from images, the vital phase in handwriting recognition, is performed by deep learning techniques. This study aims to design a learning model for classifying the images. The dataset built from handwritten images collected from the schools is very limited in size. Deep learning models perform well when there is a voluminous dataset for training, and the volume of the dataset directly impacts model performance [33]. The unavailability of a public dataset of handwritten images of dyslexic students is the main challenge faced in this research study, and collecting large databases of images is time-consuming and tedious. This hurdle was overcome with the concept of transfer learning. Transfer learning mitigates the computational cost and the requirement for a standard large dataset, and it paves the way to solve interesting AI problems because learning is a continuous process.

Deep transfer learning (DTL) allows the knowledge gained from a source model or dataset to be transferred to a target model or dataset that may be strongly correlated. DTL approaches are categorized as transductive, inductive, and unsupervised. When the source data alone are labeled and the target data are unlabeled, it is called transductive DTL; if both the source and target data are labeled, it is called inductive DTL; and when neither the source nor the target data are labeled, it is unsupervised DTL. In deep transfer learning, the source and target datasets can be different, whereas in supervised learning both the target and source data must be extracted from the same dataset. This research study adopts DTL for the classification of dyslexia handwriting data. DTL deals with heterogeneous and homogeneous datasets depending on the nature of the source and target datasets; in this study, both the source and target datasets are handwritten images, so the research falls under the homogeneous category.

Transfer learning is a machine-learning approach that allows preexisting training data to be used to solve the current problem. It utilizes existing pretrained models rather than building models from scratch, and it also transfers data between models; such data transfer is applicable only when the source and target domains are related [34]. Formally, the feature space χ of a domain D is {x1, x2, x3, ..., xn}. A supervised learning task on the label space Y is L = {Y, fS}, where the objective function fS can be learned from the training pairs {(xSi, ySi)}, with xSi ∈ χ and ySi ∈ Y. The predictive function maps a data value x to a label in {y1, y2, y3, ..., yn}. If the source domain is DS with learning task LS and the target domain is DT with learning task LT, then transfer learning is the process of improving the target predictive function in DT using the knowledge from the source domain DS and learning task LS [35]. This proposed methodology uses model-based transfer learning: the model has knowledge about both the source and target domains, and the model parameters are shared by both domains.
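As a generic, hedged illustration of model-based transfer learning (not the exact classifier reported in this study), the sketch below reuses an ImageNet-pretrained MobileNetV2 backbone, freezes its parameters, and attaches a new three-class head for the Normal, Corrected, and Reversed labels; the backbone choice and the 96 × 96 RGB input size are assumptions.

```python
# Generic model-based transfer learning sketch (assumed backbone and input size).
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
base.trainable = False                       # freeze the source-domain knowledge

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),   # Normal, Corrected, Reversed
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```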

IV.
Experimental Results

The proposed methodology was implemented in Python, and the model performance was experimentally evaluated and analyzed. The model was trained on the dataset available online [36]. The dataset has three classes: Normal, Corrected, and Reversed. The input to the model is images of size 29 × 29. The first layer is a Conv2D layer with 16 filters, each of kernel size 3 × 3, with ReLU as the activation function. The consecutive convolution layers are interleaved with MaxPooling2D layers. The second, third, and fourth convolution layers have 16, 32, and 64 filters, respectively, with ReLU as the activation function. The convolutional layers are followed by a flatten layer. The last block has a dense layer with 128 units and ReLU activation. In the final dense layer, the number of units is set to three, corresponding to the number of classes in the dataset. The optimizer used is Adam, and the loss function is Sparse Categorical Crossentropy, which computes the cross-entropy loss between the labels and predictions. This loss function is used when there are multiple classes and the target labels are integers; in this dataset, the target is one of the classes Normal, Corrected, or Reversed. The test data is a set of 120 images with 75 normal handwritten images and 45 dyslexia images. The mean prediction accuracy is 92.53%. The prediction scores for a sample set of 20 images are recorded in Table 1. This proposed methodology is novel in that the model is trained on a public dataset of alphabet letters but tested on both the public dataset and the handwritten images of students from regular and special schools. Figure 5 shows the proposed classifier model summary.
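A hedged reconstruction of this classifier in Keras is given below; the layer widths, kernel sizes, optimizer, and loss follow the description above, while the "same" padding, pooling placement, and grayscale input channel are assumptions not stated in the text.

```python
# Hedged reconstruction of the described classifier (padding/pooling assumed).
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(29, 29, 1)),                      # 29 x 29 input images
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3),                                        # Normal, Corrected, Reversed
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.summary()
```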

Figure 4:

Flowchart for SRGAN module generating high-resolution images. SRGAN, Super-Resolution Generative Adversarial Network.

Figure 5:

Proposed classifier model summary.

Table 1:

Prediction scores for randomly selected sample images from the test dataset

Image        Class      Prediction score, %
Img ID1001   Normal     99.81
Img ID1002   Normal     99.86
Img ID1003   Normal     99.82
Img ID1004   Normal     93.46
Img ID1005   Normal     98.72
Img ID1006   Normal     72.76
Img ID1007   Normal     95.85
Img ID1008   Normal     88.95
Img ID1009   Reversal   87.85
Img ID10010  Normal     93.72
Img ID10011  Reversal   89.20
Img ID10012  Normal     88.88
Img ID10013  Normal     80.93
Img ID10014  Normal     99.33
Img ID10015  Reversal   92.73
Img ID10016  Normal     94.19
Img ID10017  Reversal   89.24
Img ID10018  Normal     97.89
Img ID10019  Normal     95.63
Img ID10020  Corrected  98.03

The alphabet dataset shows high accuracy as the model is trained with similar images, whereas the self-collected handwritten image dataset shows reduced accuracy compared to the alphabet dataset. This reduction in accuracy is due to the real-time images being acquired under varying illumination conditions and with varied handwriting styles. Figure 6.1 shows the training versus validation accuracy graph for the alphabet dataset. The smooth learning curve shows that the learning of the training dataset is consistent. The validation curve shows a slight zig-zag pattern, but such slight zig-zag patterns are natural for ML algorithms when learning validation data; if the data are noisy, the zig-zag pattern shows sharp changes. Table 1 records the prediction scores of a randomly selected sample set of images from the dataset of handwritten images. Figure 6.2 shows a very smooth learning curve for the alphabet dataset, indicating that the model performs extremely well on the test dataset.

Figure 6.1

Training versus validation accuracy for alphabet dataset.

Figure 6.2

Testing accuracy for alphabet dataset.

The dataset includes normal handwriting images from the NIST 19 database [23] and a self-collected dyslexia dataset. A questionnaire was prepared and shared with parents of children studying in special schools and other schools in Al Kharj, Riyadh, and Coimbatore. The questionnaire was carefully designed after diligent analysis of the biomarkers of dyslexia. Several medical websites and learning-aid tools were studied to shortlist the questions in the questionnaire. The questionnaire has 20 questions divided into several categories. The first category is related to the child's details such as age, gender, and nationality. The second category includes questions on the child's cognitive skills and academic performance. The third category of questions is related to the fine and gross motor skills of the children. The last set of questions is about the mother's medical history. The questions q1 to q3 are about the child's age, gender, and nationality. The questions q4 to q8 are about the child's reading ability; q9 mainly focuses on logical skills; q10 and q11 are about the child's fine motor skills; q12 assesses whether the child has an attention deficit, which is closely related to dyslexia; q13 and q14 are intended to discover the child's talents; and q15 and q16 collect the mother's medical history. The question q17 is the response variable that indicates whether the child is suffering from dyslexia. These carefully chosen questions form a labeled dataset, with variables q1 to q16 marked as biomarkers and q17 as the target variable indicating the presence or absence of dyslexia. The dataset has "yes" or "no" values for the feature variables. The questions q1 to q17 are given in Table 2.

Table 2:

Questions in the questionnaire form collected for exploratory data analysis

S. No.  Question
Q1      Age
Q2      Gender
Q3      Nationality
Q4      Does your child frequently reverse letters or miss out some letters when writing, especially letters like C, D, E, and F?
Q5      Does your child have difficulty sounding out words phonetically, particularly new words?
Q6      Does your child struggle to copy notes from the board? Example: missing information, misspelling, or leaving out lines while copying
Q7      Do you feel your child is academically doing well?
Q8      Do you feel your child lacks reading ability?
Q9      Does your child struggle with solving mathematical word problems?
Q10     Does your child have difficulty following multi-step instructions given by anyone, such as tying a shoelace?
Q11     Does your child have difficulty with fine or gross motor skills (handwriting, holding a pencil, catching and throwing a ball, team sports)?
Q12     Does your child struggle to focus on one activity for a long time? Do you feel the child has an attention deficit or lacks attention?
Q13     Is the child creative compared to other children?
Q14     Does your child have an extraordinary memory?
Q15     Was the child born as a premature baby?
Q16     Was the mother under medication for any neurological or other disease/disorder?
Q17     Does your child suffer from dyslexia?

V.
Exploratory Data Analysis

The dataset QBDysData collected using the questionnaire is analyzed to unearth significant correlations between the features and the target variable. This study visualizes the data distributions using graphical methods. The best visualization tool for the data under study is the scatterplot [37]. The study analyzes the data distribution for one or more features against the response variable. In each graph, the grouping variables represent a subset of the features with unique distributions relative to the response variable and are carefully chosen to show the data distribution for the response variable. All the graphs show the response variable on the x-axis and the grouping variables along the other two axes. To maintain uniformity, all the graphs are plotted as scatterplots.

In Figure 7.1, the grouping variables are q1 and q4. The graph shows that in all age groups above the age of 3 years, the children with dyslexia are found to have issues with reversing letters like C, D, E, and F. Non-dyslexic children in the given sample set do not show symptoms of writing letters in reverse format. This indicates that children suffering from dyslexia tend to write letters as mirror images while copying or writing notes. In Figure 7.2, the grouping variables are q4 and q7, while the response variable is q17. Some children in the sample, irrespective of dyslexia condition, show poor academic performance, but the unique pattern of letter reversal is found only in dyslexic children. One of the main symptoms of dyslexia is difficulty in copying notes from the board correctly without spelling mistakes. Figure 7.3 indicates that children of all age groups above 5 years with dyslexia struggle to copy notes from the board, whereas it is a simple task for non-dyslexic children above the age of 5 years. In Figure 7.4, the grouping variables are q7 and q9 plotted against the variable q17. Dyslexic children who perform academically well still struggle with solving mathematical problems. They may excel in other streams like science, art, and history, but mathematics is a challenge for dyslexic children. Most of the non-dyslexic children perform well in mathematics and excel academically in other subjects too. Figure 7.4 indicates that dyslexic children need special help to perform well in school. The graph shows that there is an inevitable need to diagnose dyslexia at an early stage and help the children improve their problem-solving skills and, in turn, their academic performance.

In Figure 7.5, the ability to perform well academically and to follow multiple instructions are the grouping variables. The dyslexic children in the age group of 3–15 years find it hard to follow a sequence of instructions, whereas non-dyslexic children show no such difficulty. This is an important biomarker for children suffering from dyslexia. Figure 7.5 shows that almost all the dyslexic children in the sample set, irrespective of age group, find it a big challenge to tie a shoelace, which is a day-to-day task for a non-dyslexic child. In Figure 7.6, age and fine motor skills are the grouping variables. From this sample set, it is difficult to conclude that fine motor skills are poor in dyslexic children; however, the non-dyslexic children have no issues with fine or gross motor skills. In Figure 7.7, the grouping variables are reading ability and focusing issues. The figure indicates that dyslexic children not only have a short attention span but also lack reading ability. Focusing on a particular activity for a period is challenging for non-dyslexic as well as dyslexic children, irrespective of their reading ability. In Figure 7.8, the grouping variables are creativity and age group. This sample set does not indicate a strong correlation between creativity and dyslexia; very few children diagnosed with dyslexia are found to be creative. In Figure 7.9, the grouping variables are the age factor and possessing an extraordinary memory. The figure indicates a weak correlation between extraordinary memory and the response variable. Figure 7.10 shows the gender-wise distribution of dyslexic and non-dyslexic children in all age groups.

In a nutshell, the findings of the data exploration phase match the research findings about the behavior of dyslexic children. This phase shows a strong correlation between the dyslexia condition and writing letters as mirror images, difficulty tying shoelaces, poor fine motor skills in the younger age group, a short attention span, and a lack of reading ability. The data exploration phase also reveals a poor correlation between dyslexia and creativity, extraordinary memory, the mother's medical history, and premature birth of the child.
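As an illustration of this visualization step, the following is a minimal sketch assuming the questionnaire responses are loaded into a pandas DataFrame with columns named q1 to q17; the file name and column names are placeholders.

```python
# EDA sketch: scatter-style plot of age (q1) against the dyslexia label (q17),
# coloured by letter reversal (q4). File name and columns are placeholders.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("QBDysData.csv")   # placeholder path for the questionnaire data

sns.stripplot(data=df, x="q17", y="q1", hue="q4", dodge=True, jitter=True)
plt.xlabel("Dyslexia (q17)")
plt.ylabel("Age (q1)")
plt.title("Letter reversal vs. dyslexia across age groups")
plt.tight_layout()
plt.show()
```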

Figure 7.1

Visualization of correlation of letter reversal to dyslexia among all age groups.

Figure 7.2

Visualization of correlation of letter reversal and academic performance to dyslexia.

Figure 7.3

Visualization of correlation of difficulty in copying from board accurately to dyslexia in all age groups.

Figure 7.4

Visualization of correlation of mathematical skills and academic performance to dyslexia.

Figure 7.5

Visualization of correlation of following multiple instructions to dyslexia in all age groups.

Figure 7.6

Visualization of correlation of fine motor skills to dyslexia in all age groups.

Figure 7.7

Visualization of correlation of attention deficit and reading ability to dyslexia.

Figure 7.8

Visualization of correlation of creativity to dyslexia in all age groups.

Figure 7.9

Visualization of correlation of extraordinary memory to dyslexia in all age groups.

Figure 7.10

Distribution of gender and age groups for dyslexia and non-dyslexia children.

VI.
Conclusion

Children with dyslexia often experience behavioral issues stemming from frustration when they struggle with tasks that their peers can perform with ease. These challenges include difficulty identifying letters and numbers, confusion between left and right, a short attention span, and a tendency to reverse letters, all of which can lead to poor academic performance and behavioral problems. Early detection of dyslexia is crucial, as it allows professionals to intervene with targeted games and activities to alleviate these symptoms. The importance of early detection in improving academic performance cannot be overstated. The data show that most students with dyslexia in the sample set are above 10 years of age. This implies that parents often overlook the disorder in its early stages and seek specialized education only in later years of schooling. The computer-assisted diagnosis tool developed in this study offers an opportunity to identify dyslexia early, ensuring timely support. It detects dyslexia in children from images of handwritten notes using a customized deep learning model and a transfer learning technique, and it exhibits a mean accuracy of 92.53% in predicting the class of the images. This tool can be of great help to parents in detecting dyslexia at an early age and providing professional help to their children. Children are the future of any nation; it is important to empower them to excel in their domain of interest despite their disabilities.

Language: English
Submitted on: Sep 10, 2024
Published on: May 16, 2025

© 2025 Shabana Ziyad, May Altulyan, Munira Abdulaziz Al-Helal, Pradeep Kumar Singh, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.