
Hyper-parameter optimization in neural-based translation systems: A case study

Open Access | Sep 2023

Figures & Tables

Figure 1:

Schematic illustration of the computation of δj for hidden unit j using backpropagation.

Figure 2:

The accuracy of deep learning models is higher, but their interpretability is low compared to other ML models.

Figure 3:

Schematic representation of bidirectional RNN (Source: Bahdanau et al. [26]).

Figure 4:

RNN with global attention.

Figure 5:

RNN with local attention.

Figure 6:

Transformer model.

Figure 7:

The generative adversarial network (GAN) model in the NMT case.

Figure 8:

Typical machine learning model building steps.

Figure 9:

Schematic representation of the NMT model.

Figure 10:

Encoder–decoder-based NMT model.

Figure 11:

Training and validation accuracy with five epochs.

Figure 12:

Training and validation accuracy improve slightly when the number of epochs is increased to 10.

Figure 13:

Graphical representation of training and validation accuracy with reduced units in different layers and up to five epochs.

Figure 14:

Graphical representation of training and validation accuracy with reduced units per layer and the number of epochs increased to 10.

Figure 15:

Snapshots of the English–Bangla parallel corpus collected from TDIL.

Figure 16:

BLEU scores produced by different NMT models for the first test data.

Figure 17:

BLEU scores generated by different NMT models for the second test data.

Figure 18:

BLEU scores produced by various NMT models for the third test data.

WMT-14 English–German test results show that ADMIN outperforms the default base model 6L-6L in different automatic metrics (Liu et al. [35]).

Model           | Param | TER     | METEOR  | BLEU
6L-6L Default   | 61M   | 54.4    | 46.6    | 27.6
6L-6L ADMIN     | 61M   | 54.1    | 46.7    | 27.7
60L-12L Default | 256M  | Diverge | Diverge | Diverge
60L-12L ADMIN   | 256M  | 51.8    | 48.3    | 30.1

Statistics of the English-to-Bangla tourism corpus (text) collected from TDIL.

Corpus (English to Bangla) | Size (sentence pairs)
Tourism                    | 11,976

BLEU scores and training times (t) of NMT models over a further range of learning-rate values (Lim et al. [11]).

Cell | Learning rate | ro→en P100 | t     | ro→en V100 | t     | de→en P100 | t     | de→en V100 | t
GRU  | 0.0           | 34.47      | 6:29  | 34.47      | 4:43  | 32.29      | 9:48  | 31.61      | 6:15
     | 0.2           | 35.53      | 8:48  | 35.43      | 6:21  | 33.03      | 18:47 | 32.55      | 19:40
     | 0.3           | 35.36      | 12:21 | 35.15      | 7:28  | 31.36      | 10:14 | 31.50      | 9:33
     | 0.5           | 34.50      | 12:20 | 34.67      | 17:18 | 29.64      | 11:09 | 30.21      | 11:09
LSTM | 0.0           | 34.84      | 6:29  | 34.65      | 4:46  | 32.84      | 12:17 | 32.88      | 7:37
     | 0.2           | 34.27      | 8:10  | 35.61      | 6:34  | 33.10      | 16:33 | 33.89      | 13:39
     | 0.3           | 35.67      | 9:56  | 35.37      | 11:29 | 33.45      | 20:02 | 33.51      | 15:51
     | 0.5           | 34.50      | 15:13 | 34.33      | 12:45 | 32.67      | 20:02 | 32.20      | 13:03

Training and validation accuracy of our model with five epochs.

Epochs | Training accuracy | Validation accuracy
1      | 0.9426            | 0.9698
2      | 0.9730            | 0.9708
3      | 0.9792            | 0.9776
4      | 0.9829            | 0.9726
5      | 0.9859            | 0.9762

Translations generated by Google and Bing.

Translator | Language pair                   | BLEU
Google     | English ⇒ Bangla (1st sentence) | 36.84
           | English ⇒ Bangla (2nd sentence) | 6.42
           | English ⇒ Bangla (3rd sentence) | 4.52
Bing       | English ⇒ Bangla (1st sentence) | 36.11
           | English ⇒ Bangla (2nd sentence) | 6.01
           | English ⇒ Bangla (3rd sentence) | 4.05

Training and validation accuracy with 100 units in different layers with 10 epochs.

Epochs | Training accuracy | Validation accuracy
1      | 0.9293            | 0.9631
2      | 0.9674            | 0.9730
3      | 0.9763            | 0.9751
4      | 0.9807            | 0.9729
5      | 0.9829            | 0.9724
6      | 0.9852            | 0.9780
7      | 0.9882            | 0.9773
8      | 0.9890            | 0.9756
9      | 0.9908            | 0.9784
10     | 0.9913            | 0.9793

Generic hyper-parameters in NMT-based models.

Model                | Type of MT | Hyper-parameters
Deep learning models | NMT        | Hidden layers, learning rate, activation function, epochs, batch size, dropout, regularization
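The generic hyper-parameters listed above define a search space. A minimal sketch of enumerating such a grid in Python follows; the parameter names and values are illustrative placeholders, not the paper's actual search space.

```python
from itertools import product

# Hypothetical search grid over a few of the generic NMT
# hyper-parameters named in the table above (illustrative values).
grid = {
    "hidden_layers": [2, 4, 6],
    "learning_rate": [1e-3, 1e-4],
    "epochs": [5, 10],
    "batch_size": [64, 128],
    "dropout": [0.1, 0.3],
}

def configurations(grid):
    """Yield every combination in the grid as a plain dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(configurations(grid))
# 3 * 2 * 2 * 2 * 2 = 48 candidate configurations
```

Exhaustive grids grow multiplicatively, which is why studies such as Zhang and Duh [36] train hundreds of models per data set; random or Bayesian search over the same dictionary is a common alternative.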

Models per data set and their best BLEU scores and respective hyper-parameter configurations (Zhang and Duh [36]).

Data set         | No. of models | Best BLEU | BPE | No. of layers | Embedding size | Hidden size | Attention heads | Init-lr
Chinese–English  | 118           | 14.66     | 30k | 4             | 512            | 1024        | 16              | 3e-4
Russian–English  | 176           | 20.23     | 10k | 4             | 256            | 2048        | 8               | 3e-4
Japanese–English | 150           | 16.41     | 30k | 4             | 512            | 2048        | 8               | 3e-4
English–Japanese | 168           | 20.74     | 10k | 4             | 1024           | 2048        | 8               | 3e-4
Swahili–English  | 767           | 26.09     | 1k  | 2             | 256            | 1024        | 8               | 6e-4
Somali–English   | 604           | 11.23     | 8k  | 2             | 512            | 1024        | 8               | 3e-4

Training and validation accuracy of our model with a higher number of epochs.

Epochs | Training accuracy | Validation accuracy
1      | 0.9431            | 0.9606
2      | 0.9742            | 0.9729
3      | 0.9796            | 0.9777
4      | 0.9835            | 0.9748
5      | 0.9865            | 0.9794
6      | 0.9872            | 0.9802
7      | 0.9896            | 0.9830
8      | 0.9898             | 0.9782
9      | 0.9916            | 0.9764
10     | 0.9924            | 0.9799
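Validation accuracy in the 10-epoch run above peaks at epoch 7 and then fluctuates. A minimal sketch of early-stopping-style checkpoint selection on those reported numbers (the paper does not state whether early stopping was actually used):

```python
# Validation accuracies per epoch, taken from the 10-epoch run above.
val_acc = [0.9606, 0.9729, 0.9777, 0.9748, 0.9794,
           0.9802, 0.9830, 0.9782, 0.9764, 0.9799]

# Keep the checkpoint from the epoch with the highest validation
# accuracy rather than simply the last epoch trained.
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
best_acc = val_acc[best_epoch - 1]
# best_epoch == 7, best_acc == 0.9830
```

This is the usual argument for monitoring a validation metric per epoch: more training epochs raise training accuracy monotonically here, but the generalization signal peaks earlier.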

WMT-14 English–French test results show that 60L-12L ADMIN outperforms the default base model 6L-6L in different automatic metrics (Liu et al. [35]).

Model           | Param | TER     | METEOR  | BLEU
6L-6L Default   | 67M   | 42.2    | 60.5    | 41.3
6L-6L ADMIN     | 67M   | 41.8    | 60.7    | 41.5
60L-12L Default | 262M  | Diverge | Diverge | Diverge
60L-12L ADMIN   | 262M  | 40.3    | 62.4    | 43.8

MT models for different language pairs in a GPU-based single-node and multiple-node environment with a wider range of hyper-parameters and their BLEU scores (Lim et al. [11]).

Cell | Learning rate | ro→en P100 | ro→en V100 | en→ro P100 | en→ro V100 | de→en P100 | de→en V100 | en→de P100 | en→de V100
GRU  | 1e-3          | 35.53      | 35.43      | 19.19      | 19.28      | 28.00      | 27.84      | 20.43      | 20.61
     | 5e-3          | 34.37      | 34.05      | 19.07      | 19.16      | 26.05      | 22.16      | N/A        | 19.01
     | 1e-4          | 35.47      | 35.46      | 19.45      | 19.49      | 27.37      | 27.81      | Dnf        | 21.41
LSTM | 1e-3          | 34.27      | 35.61      | 19.29      | 19.64      | 28.62      | 28.83      | 21.70      | 21.69
     | 5e-3          | 35.05      | 34.99      | 19.48      | 19.43      | N/A        | 24.36      | 18.53      | 18.01
     | 1e-4          | 35.41      | 35.28      | 19.43      | 19.48      | N/A        | 28.50      | Dnf        | Dnf
GRU  | 1e-3          | 34.22      | 34.17      | 19.42      | 19.43      | 33.03      | 32.55      | 26.55      | 26.85
     | 5e-3          | 33.13      | 32.74      | 19.31      | 18.97      | 31.04      | 26.76      | N/A        | 26.02
     | 1e-4          | 33.67      | 34.44      | 18.98      | 19.69      | 33.15      | 33.12      | Dnf        | 28.43
LSTM | 1e-3          | 33.10      | 33.95      | 19.56      | 19.08      | 33.10      | 33.89      | 28.79      | 28.84
     | 5e-3          | 33.10      | 33.52      | 19.13      | 19.51      | N/A        | 29.16      | 24.12      | 24.12
     | 1e-4          | 33.29      | 32.92      | 19.14      | 19.23      | N/A        | 33.44      | Dnf        | Dnf

Training and validation accuracy with 100 units in different layers with five epochs.

Epochs | Training accuracy | Validation accuracy
1      | 0.9289            | 0.9584
2      | 0.9674            | 0.9671
3      | 0.9758            | 0.9734
4      | 0.9800            | 0.9739
5      | 0.9836            | 0.9772

Performance of BiLSTM, Google Translate, and Bing in terms of the automatic metric BLEU.

Model                                        | Hyper-parameters                      | BLEU score
BiLSTM (for English to Bangla; 1st sentence) | Optimizer = Adam                      | 4.1
BiLSTM (for English to Bangla; 2nd sentence) | Learning rate = 0.001                 | 3.2
BiLSTM (for English to Bangla; 3rd sentence) | No. of encoder and decoder layers = 6 | 3.01
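The BLEU scores reported throughout these tables combine modified n-gram precision with a brevity penalty. A minimal single-reference sketch in pure Python follows, using simple whitespace tokenization and add-one smoothing; the papers cited here may use different tokenization and smoothing, so exact values will differ from theirs.

```python
from collections import Counter
import math

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) multiplied by a brevity penalty."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its reference count.
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        # Add-one smoothing so one empty n-gram order does not
        # zero out the whole geometric mean.
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)
```

For corpus-level scores such as those above, n-gram statistics are accumulated over all sentence pairs before the precisions are combined, rather than averaging per-sentence scores.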
Language: English
Submitted on: Feb 22, 2023
Published on: Sep 25, 2023
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2023 Goutam Datta, Nisheeth Joshi, Kusum Gupta, published by Macquarie University, Australia
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.