
Hyper-parameter optimization in neural-based translation systems: A case study

Open Access | Sep 2023

Figures & Tables

Figure 1:

Schematic illustration of the computation of δj for hidden unit j via backpropagation.
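For reference, the error term δj that Figure 1 depicts is obtained by propagating the errors δk of downstream units back through the weights; a standard formulation, assuming pre-activation a_j and a hidden activation h with derivative h′:

```latex
\delta_j = h'(a_j) \sum_{k} w_{kj}\, \delta_k
```

where the sum runs over all units k that receive input from unit j.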

Figure 2:

The accuracy of deep learning models is higher, but their interpretability is lower than that of other ML models.

Figure 3:

Schematic representation of bidirectional RNN (Source: Bahdanau et al. [26]).
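In Bahdanau et al. [26], the bidirectional encoder of Figure 3 represents each source position j by concatenating the forward and backward hidden states:

```latex
h_j = \left[ \overrightarrow{h}_j^{\top} ;\; \overleftarrow{h}_j^{\top} \right]^{\top}
```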

Figure 4:

RNN with global attention.

Figure 5:

RNN with local attention.
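The two attention variants in Figures 4 and 5 differ only in which encoder states feed the context vector c_t. In Luong-style notation (an assumed formulation, since the figures' exact equations are not reproduced here), global attention sums over all S source states, while local attention restricts the sum to a window of width 2D + 1 around an aligned position p_t:

```latex
c_t^{\text{global}} = \sum_{s=1}^{S} \alpha_t(s)\, \bar{h}_s,
\qquad
c_t^{\text{local}} = \sum_{s = p_t - D}^{p_t + D} \alpha_t(s)\, \bar{h}_s
```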

Figure 6:

Transformer model.
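At the core of the Transformer in Figure 6 is scaled dot-product attention over query, key, and value matrices:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```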

Figure 7:

The generative adversarial network (GAN) model in the NMT case.
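In the adversarial NMT setup of Figure 7, a generator G (the translation model) and a discriminator D (which tries to tell human translations from machine output) typically play a minimax game. A generic form of the objective, assuming source sentence x and reference translation y (the exact objective depends on the specific model):

```latex
\min_G \max_D \;
\mathbb{E}_{(x,y)}\big[\log D(x, y)\big]
+ \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]
```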

Figure 8:

Typical machine learning model building steps.

Figure 9:

Schematic representation of the NMT model.

Figure 10:

Encoder–decoder-based NMT model.
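A minimal Keras sketch of the encoder–decoder layout in Figure 10, using a bidirectional LSTM encoder and the Adam optimizer with learning rate 0.001 as in the BiLSTM configuration reported below (vocabulary sizes and the 256-unit latent size are illustrative assumptions, not values from the paper):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, units = 8000, 8000, 256  # illustrative sizes

# Encoder: embed the source sentence and run a bidirectional LSTM over it.
enc_in = layers.Input(shape=(None,), name="source_tokens")
enc_emb = layers.Embedding(src_vocab, units)(enc_in)
enc_out, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(units, return_sequences=True, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])  # merge forward/backward states
state_c = layers.Concatenate()([fc, bc])

# Decoder: condition an LSTM on the encoder's final states.
dec_in = layers.Input(shape=(None,), name="target_tokens")
dec_emb = layers.Embedding(tgt_vocab, units)(dec_in)
dec_out, _, _ = layers.LSTM(
    2 * units, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])
probs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```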

Figure 11:

Training and validation accuracy with five epochs.

Figure 12:

Training and validation accuracy improve slightly as the number of epochs increases to 10.

Figure 13:

Graphical representation of training and validation accuracy with reduced units in different layers and up to five epochs.

Figure 14:

Graphical representation of training and validation accuracy with reduced units per layer and the number of epochs increased to 10.

Figure 15:

Snapshots of the English–Bangla parallel corpus collected from TDIL.

Figure 16:

BLEU scores produced by different NMT models for the first test data.

Figure 17:

BLEU scores generated by different NMT models for the second test data.

Figure 18:

BLEU scores produced by various NMT models for the third test data.

WMT-14 English–German test results show that ADMIN outperforms the default 6L-6L base model on different automatic metrics (Liu et al. [35]).

| Model | Param | TER | METEOR | BLEU |
| --- | --- | --- | --- | --- |
| 6L-6L Default | 61M | 54.4 | 46.6 | 27.6 |
| 6L-6L ADMIN | 61M | 54.1 | 46.7 | 27.7 |
| 60L-12L Default | 256M | Diverge | Diverge | Diverge |
| 60L-12L ADMIN | 256M | 51.8 | 48.3 | 30.1 |

Statistics of the English-to-Bangla tourism corpus (text) collected from TDIL.

| Corpus (English to Bangla) | Size (sentence pairs) |
| --- | --- |
| Tourism | 11,976 |

NMT models over another range of learning rates (hyper-parameter), with BLEU scores and training times t per language pair and GPU (Lim et al. [11]).

| Cell | Learning rate | ro→en P100 | t | ro→en V100 | t | de→en P100 | t | de→en V100 | t |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GRU | 0.0 | 34.47 | 6:29 | 34.47 | 4:43 | 32.29 | 9:48 | 31.61 | 6:15 |
| GRU | 0.2 | 35.53 | 8:48 | 35.43 | 6:21 | 33.03 | 18:47 | 32.55 | 19:40 |
| GRU | 0.3 | 35.36 | 12:21 | 35.15 | 7:28 | 31.36 | 10:14 | 31.50 | 9:33 |
| GRU | 0.5 | 34.50 | 12:20 | 34.67 | 17:18 | 29.64 | 11:09 | 30.21 | 11:09 |
| LSTM | 0.0 | 34.84 | 6:29 | 34.65 | 4:46 | 32.84 | 12:17 | 32.88 | 7:37 |
| LSTM | 0.2 | 34.27 | 8:10 | 35.61 | 6:34 | 33.10 | 16:33 | 33.89 | 13:39 |
| LSTM | 0.3 | 35.67 | 9:56 | 35.37 | 11:29 | 33.45 | 20:02 | 33.51 | 15:51 |
| LSTM | 0.5 | 34.50 | 15:13 | 34.33 | 12:45 | 32.67 | 20:02 | 32.20 | 13:03 |

Training and validation accuracy of our model with five epochs.

| Epochs | Training accuracy | Validation accuracy |
| --- | --- | --- |
| 1 | 0.9426 | 0.9698 |
| 2 | 0.9730 | 0.9708 |
| 3 | 0.9792 | 0.9776 |
| 4 | 0.9829 | 0.9726 |
| 5 | 0.9859 | 0.9762 |
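Per-epoch accuracies like those above come straight out of the Keras training history; a minimal sketch, assuming the `model` from the earlier snippet and placeholder tokenized arrays `enc_inputs`, `dec_inputs`, and `dec_targets`:

```python
# Train for five epochs with a held-out validation split; Keras records
# per-epoch training and validation accuracy in the history object.
history = model.fit([enc_inputs, dec_inputs], dec_targets,
                    batch_size=64, epochs=5, validation_split=0.2)

for epoch, (acc, val_acc) in enumerate(
        zip(history.history["accuracy"], history.history["val_accuracy"]),
        start=1):
    print(f"epoch {epoch}: train={acc:.4f} val={val_acc:.4f}")
```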

Translations generated by Google and Bing.

| Translator | Language pair | BLEU |
| --- | --- | --- |
| Google | English ⇒ Bangla (1st sentence) | 36.84 |
| Google | English ⇒ Bangla (2nd sentence) | 6.42 |
| Google | English ⇒ Bangla (3rd sentence) | 4.52 |
| Bing | English ⇒ Bangla (1st sentence) | 36.11 |
| Bing | English ⇒ Bangla (2nd sentence) | 6.01 |
| Bing | English ⇒ Bangla (3rd sentence) | 4.05 |

Training and validation accuracy with 100 units in different layers with 10 epochs.

| Epochs | Training accuracy | Validation accuracy |
| --- | --- | --- |
| 1 | 0.9293 | 0.9631 |
| 2 | 0.9674 | 0.9730 |
| 3 | 0.9763 | 0.9751 |
| 4 | 0.9807 | 0.9729 |
| 5 | 0.9829 | 0.9724 |
| 6 | 0.9852 | 0.9780 |
| 7 | 0.9882 | 0.9773 |
| 8 | 0.9890 | 0.9756 |
| 9 | 0.9908 | 0.9784 |
| 10 | 0.9913 | 0.9793 |

Generic hyper-parameters in NMT-based models.

| Model | Type of MT | Hyper-parameters |
| --- | --- | --- |
| Deep learning models | NMT | Hidden layers, learning rate, activation function, epochs, batch size, dropout, regularization |
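In practice these hyper-parameters are usually gathered into a single configuration object before training; an illustrative sketch (all values are placeholders, not the paper's settings):

```python
# Illustrative hyper-parameter configuration for an NMT model.
config = {
    "hidden_layers": 2,         # depth of encoder/decoder stacks
    "learning_rate": 1e-3,      # optimizer step size
    "activation": "tanh",       # hidden-unit activation function
    "epochs": 10,               # passes over the training corpus
    "batch_size": 64,           # sentence pairs per update
    "dropout": 0.2,             # dropout probability
    "l2_regularization": 1e-5,  # weight-decay strength
}
```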

Models per data set with their best BLEU scores and the corresponding hyper-parameter configurations (Zhang and Duh [36]).

| Data set | No. of models | Best BLEU | BPE | No. of layers | Embedding size | No. of hidden units | No. of attention heads | Init-lr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chinese–English | 118 | 14.66 | 30k | 4 | 512 | 1024 | 16 | 3e-4 |
| Russian–English | 176 | 20.23 | 10k | 4 | 256 | 2048 | 8 | 3e-4 |
| Japanese–English | 150 | 16.41 | 30k | 4 | 512 | 2048 | 8 | 3e-4 |
| English–Japanese | 168 | 20.74 | 10k | 4 | 1024 | 2048 | 8 | 3e-4 |
| Swahili–English | 767 | 26.09 | 1k | 2 | 256 | 1024 | 8 | 6e-4 |
| Somali–English | 604 | 11.23 | 8k | 2 | 512 | 1024 | 8 | 3e-4 |
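Results like these come from sweeping a grid of configurations and keeping the one with the best BLEU; a minimal sketch of such a sweep, where `train_and_score` is a hypothetical helper that trains one model and returns its validation BLEU (the grid values simply mirror the table above):

```python
import itertools

grid = {
    "bpe": [10_000, 30_000],
    "layers": [2, 4],
    "embedding": [256, 512, 1024],
    "hidden": [1024, 2048],
    "heads": [8, 16],
    "init_lr": [3e-4, 6e-4],
}

best_bleu, best_cfg = float("-inf"), None
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    bleu = train_and_score(cfg)  # hypothetical: train one NMT model, return dev BLEU
    if bleu > best_bleu:
        best_bleu, best_cfg = bleu, cfg

print(best_bleu, best_cfg)
```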

Training and validation accuracy of our model with a higher number of epochs.

| Epochs | Training accuracy | Validation accuracy |
| --- | --- | --- |
| 1 | 0.9431 | 0.9606 |
| 2 | 0.9742 | 0.9729 |
| 3 | 0.9796 | 0.9777 |
| 4 | 0.9835 | 0.9748 |
| 5 | 0.9865 | 0.9794 |
| 6 | 0.9872 | 0.9802 |
| 7 | 0.9896 | 0.9830 |
| 8 | 0.9898 | 0.9782 |
| 9 | 0.9916 | 0.9764 |
| 10 | 0.9924 | 0.9799 |

WMT-14 English–French test results show that 60L-12L ADMIN outperforms the default 6L-6L base model on different automatic metrics (Liu et al. [35]).

| Model | Param | TER | METEOR | BLEU |
| --- | --- | --- | --- | --- |
| 6L-6L Default | 67M | 42.2 | 60.5 | 41.3 |
| 6L-6L ADMIN | 67M | 41.8 | 60.7 | 41.5 |
| 60L-12L Default | 262M | Diverge | Diverge | Diverge |
| 60L-12L ADMIN | 262M | 40.3 | 62.4 | 43.8 |

MT models for different language pairs in GPU-based single-node and multiple-node environments, with a wider range of hyper-parameters and their BLEU scores (Lim et al. [11]).

| Cell | Learning rate | ro→en P100 | ro→en V100 | en→ro P100 | en→ro V100 | de→en P100 | de→en V100 | en→de P100 | en→de V100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GRU | 1e-3 | 35.53 | 35.43 | 19.19 | 19.28 | 28.00 | 27.84 | 20.43 | 20.61 |
| GRU | 5e-3 | 34.37 | 34.05 | 19.07 | 19.16 | 26.05 | 22.16 | N/A | 19.01 |
| GRU | 1e-4 | 35.47 | 35.46 | 19.45 | 19.49 | 27.37 | 27.81 | Dnf | 21.41 |
| LSTM | 1e-3 | 34.27 | 35.61 | 19.29 | 19.64 | 28.62 | 28.83 | 21.70 | 21.69 |
| LSTM | 5e-3 | 35.05 | 34.99 | 19.48 | 19.43 | N/A | 24.36 | 18.53 | 18.01 |
| LSTM | 1e-4 | 35.41 | 35.28 | 19.43 | 19.48 | N/A | 28.50 | Dnf | Dnf |
| GRU | 1e-3 | 34.22 | 34.17 | 19.42 | 19.43 | 33.03 | 32.55 | 26.55 | 26.85 |
| GRU | 5e-3 | 33.13 | 32.74 | 19.31 | 18.97 | 31.04 | 26.76 | N/A | 26.02 |
| GRU | 1e-4 | 33.67 | 34.44 | 18.98 | 19.69 | 33.15 | 33.12 | Dnf | 28.43 |
| LSTM | 1e-3 | 33.10 | 33.95 | 19.56 | 19.08 | 33.10 | 33.89 | 28.79 | 28.84 |
| LSTM | 5e-3 | 33.10 | 33.52 | 19.13 | 19.51 | N/A | 29.16 | 24.12 | 24.12 |
| LSTM | 1e-4 | 33.29 | 32.92 | 19.14 | 19.23 | N/A | 33.44 | Dnf | Dnf |

Training and validation accuracy with 100 units in different layers with five epochs.

| Epochs | Training accuracy | Validation accuracy |
| --- | --- | --- |
| 1 | 0.9289 | 0.9584 |
| 2 | 0.9674 | 0.9671 |
| 3 | 0.9758 | 0.9734 |
| 4 | 0.9800 | 0.9739 |
| 5 | 0.9836 | 0.9772 |

Performance of BiLSTM, Google Translate, and Bing in terms of the automatic metric BLEU.

| Model | Hyper-parameters | BLEU score |
| --- | --- | --- |
| BiLSTM (English ⇒ Bangla; 1st sentence) | Optimizer = Adam; learning rate = 0.001; no. of encoder and decoder layers = 6 | 4.1 |
| BiLSTM (English ⇒ Bangla; 2nd sentence) | (same configuration) | 3.2 |
| BiLSTM (English ⇒ Bangla; 3rd sentence) | (same configuration) | 3.01 |
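Sentence-level BLEU scores like those reported above can be computed with standard tooling; a sketch using NLTK (the reference and hypothesis tokens are placeholders, not the paper's test data):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["this", "is", "a", "test", "sentence"]]  # placeholder tokenized reference
hypothesis = ["this", "is", "the", "test", "sentence"]  # placeholder system output

# Smoothing avoids zero scores when short sentences have no 4-gram matches.
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, hypothesis, smoothing_function=smooth)
print(f"BLEU = {100 * score:.2f}")
```
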
Language: English
Submitted on: Feb 22, 2023
Published on: Sep 25, 2023
Published by: Professor Subhas Chandra Mukhopadhyay
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Goutam Datta, Nisheeth Joshi, Kusum Gupta, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.