Predicting AI job market dynamics: a data mining approach to machine learning career trends on glassdoor

Open Access | Jul 2025

Figures & Tables

Figure 1:

Methodology of proposed work. EDA, exploratory data analysis.

Figure 2:

Representation of the current dataset before and after outlier handling.

Figure 3:

Count plot of attribute “Job Title.”

Figure 4:

Count plot of attribute “Company revenue” (UN: unknown).

Figure 5:

Number of jobs posted on different domains.

Figure 6:

Type of ownership—before and after trimming.

Figure 7:

Top 15 job locations preferred by employees.

Figure 8:

Number of jobs in various sectors.

Figure 9:

Plot based on Company_size.

Figure 10:

Sample result obtained for predicting salary.

Figure 11:

Sample result obtained for predicting job title.

Figure 12:

Representation of performance metrics. MAE, mean absolute error; NRMSE, normalized root mean square error; RMSE, root mean squared error; SD, standard deviation.

Description of attributes present in dataset

| S. No. | Name of attribute | Description |
|---|---|---|
| 1 | Job title | The designation of the job being listed. E.g., data scientist, data engineer, other, manager, director, machine learning engineer |
| 2 | Salary estimate | The estimated salary range for the job, provided by Glassdoor or the employer |
| 3 | Job description | The full description of the job, including roles, responsibilities, and qualifications |
| 4 | Rating | Rating of the company from employee reviews on Glassdoor. Initial ratings range from −1 to 5 |
| 5 | Company name | The name of the company offering the job. E.g., IBM, New York (United States of America), Adobe, Microsoft, etc. |
| 6 | Location | The location of the job. E.g., Remote (San Jose, CA, USA), (Atlanta, GA, USA) |
| 7 | Size | The number of employees at the company. E.g., 1,001–5,000 employees, 10,000+ employees |
| 8 | Founded | The year the company was founded |
| 9 | Type of ownership | The ownership structure of the company. E.g., private, public, government |
| 10 | Industry | The specific industry the company operates in. E.g., Telecommunications Services, Chemical Manufacturing, Computer Hardware Development |
| 11 | Sector | The broader sector associated with the company’s operations. E.g., Education, Information Technology, Manufacturing |
| 12 | Revenue | The estimated annual revenue of the company in US$ |

Comparison of model performance

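| Model | Accuracy | RMSE | NRMSE | R² | MAE | SD |
|---|---|---|---|---|---|---|
| Random Forest | 0.9853 | 0.0646 | 0.1966 | 0.8133 | 0.0166 | 0.0646 |
| Lasso | 0.8750 | 0.2103 | 0.6402 | 0.0061 | 0.0888 | 0.2103 |
| LightGBM | 0.9559 | 0.4441 | 1.3520 | 0.5373 | 0.1160 | 0.4430 |
| XGBoost | 0.9963 | 0.1819 | 1.3108 | 0.9224 | 0.1113 | 0.1816 |
| Voting | 0.9963 | 0.0646 | 0.5501 | 0.9234 | 0.0117 | 0.1803 |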
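
The error columns in this table can be computed for any set of predictions with a few lines of Python. The sketch below is illustrative and is not the authors' code: the function name `regression_metrics` is hypothetical, NRMSE is assumed to be RMSE divided by the standard deviation of the observed values (this page does not state which normalization was used), and SD is assumed to be the standard deviation of the prediction errors.

```python
import math
from statistics import mean, pstdev


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, NRMSE, R2 and SD for paired observations and predictions.

    Assumptions (not confirmed by the source): NRMSE = RMSE / std(y_true),
    SD = population standard deviation of the errors.
    """
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = mean(abs(e) for e in errors)
    rmse = math.sqrt(mean(e * e for e in errors))

    # Normalize RMSE by the spread of the observed values (assumed convention).
    nrmse = rmse / pstdev(y_true)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {
        "MAE": mae,
        "RMSE": rmse,
        "NRMSE": nrmse,
        "R2": r2,
        "SD": pstdev(errors),
    }


if __name__ == "__main__":
    # Toy values for illustration only; they are not taken from the paper.
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

Under the normalization assumed here, NRMSE² equals 1 − R², which gives a quick consistency check when comparing the two columns.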

Work done by different researchers in similar domain

| Ref. No. | Methodology used | Domain | Dataset used | Performance/outcome |
|---|---|---|---|---|
| [1] | Linear regression, Lasso, random forest | Salary prediction for Data Science Job | Kaggle (Glassdoor) | MAE: random forest: 11.22, linear regression: 18.86, ridge regression: 19.67 |
| [2] | SVM | Skill based job recommendation system | Job portals, company websites, scraping data from other online sources | Accuracy, precision, recall, and F1 score were calculated |
| [3] | Bidirectional, decoder-encoder, stacked, Conv LSTM | Trend analysis system to predict future job markets using historical data | Web scraping, manually collecting data, government sources | Accuracy: bidirectional LSTM: 95.71%, decoder-encoder LSTM: 91.56%, stacked LSTM: 87.24%, Conv LSTM: 83.7% |
| [4] | NB, KNN, NBST | Predictive analysis | Student employment in the employment market of Chongqing S colleges and universities in the past 3 years | Mean value [test time (ms)]: NB: 18.607, KNN: 22.224, NBST: 49.026 |
| [5] | MNB, SVM, DT, KNN, RF | Job posting classification | Kaggle, titled by “[real or fake] fake job posting prediction” | For MNB 95.6%, for SVM 97.7%, for DT 97.4%, for KNN 97.8%, for 98.2%, for RF 98.2% |
| [6] | LR, SVM, KNN, DT, RF, AdaBoost(DT), GB, voting classifier soft & hard, XGBoost | Campus placement analyzer: Using supervised machine learning algorithms | Training and placement department of MIT which consists of all the students of Bachelor of Engineering (B.E) from three different colleges of their campus | Accuracy: logistic regression 58%, support vector machine 69%, KNN 63.22%, decision tree 69%, random forest 75.25%, AdaBoost(DT) 77%, gradient boosting 77%, voting classifier soft 69.11%, voting classifier hard 68.43%, XGBoost 78% |
| [7] | Voting classifier | Ensemble approach for classifying job positions | Glassdoor website | Voting classifier soft: 100% |
| [8] | NB, SGD, LR, KNN, RF classifier | Detecting and preventing fake job offers | Kaggle (real/fake job posting prediction) | Random forest classifier: 97.48% |
| [9] | NLP, KNN | Resume-based job recommendation system using NLP and deep learning | Combined from multiple sources | Improving the efficiency and success rate of the hiring process |

Normalization of column—salary estimate

| Original value | Value after normalization |
|---|---|
| –1 | 116.0 (median) |
| $100 K–$151 K (Glassdoor est.) | 125.5 |
| Employer provided salary: $100 K–$120 K | 110 |
| Employer provided salary:$107 K | 107 |
| Employer provided salary: $60.00 per hr | 140.4 |
| Employer provided salary: $53.62–$64.58 per hr | 138.3 |

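
The values in this table are consistent with a simple rule: take the midpoint of a salary range, express it in thousands of US dollars per year, convert hourly figures with a factor of 2,340 hours per year (60.00 × 2,340 = 140,400), and replace missing entries (–1) with the column median. The factor of 2,340 is inferred from the table rows and is not stated on this page. The Python sketch below shows one way such a normalization could be written; the names `parse_salary`, `normalize_column`, and `HOURS_PER_YEAR` are hypothetical and this is not the authors' implementation.

```python
import re
from statistics import median

# Assumption inferred from the table: $60.00 per hr maps to 140.4 (thousand USD/year).
HOURS_PER_YEAR = 2340


def parse_salary(raw):
    """Return the salary in thousands of USD per year, or None if missing.

    Entries without a dollar amount (such as "-1") are treated as missing.
    """
    text = raw.replace(",", "")
    amounts = [float(n) for n in re.findall(r"\$\s*(\d+(?:\.\d+)?)", text)]
    if not amounts:
        return None

    # A single figure is its own midpoint; a range is averaged.
    value = sum(amounts) / len(amounts)

    # Hourly figures are in dollars, so annualize and convert to thousands.
    if "per hr" in text.lower():
        value = value * HOURS_PER_YEAR / 1000

    return round(value, 1)


def normalize_column(values):
    """Parse every entry and fill missing ones with the median of the rest."""
    parsed = [parse_salary(v) for v in values]
    fill = median(p for p in parsed if p is not None)
    return [fill if p is None else p for p in parsed]


if __name__ == "__main__":
    rows = [
        "-1",
        "$100 K–$151 K (Glassdoor est.)",
        "Employer provided salary: $60.00 per hr",
    ]
    print(normalize_column(rows))
```

On a small sample like the one in the usage block, the fill value is the median of that sample only; the 116.0 shown in the table is the median of the full dataset.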
Language: English
Submitted on: Mar 17, 2025
Published on: Jul 11, 2025
Published by: Professor Subhas Chandra Mukhopadhyay
In partnership with: Paradigm Publishing Services
Publication frequency: 1 time per year

© 2025 Renuka Agrawal, Aditi Nayak, Preeti Hemnani, Barish Chetia, Ishaan Bhadrike, Jil Kapadia, Usha A. Jogalekar, Safa Hamdare, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.