Early Warning System for Debt Group Migration: The Case of One Commercial Bank in Vietnam

Figures & Tables

-	The event occurred	The event did not occur
There is a warning signal	A	B
There is no warning signal	C	D
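
The four cells of the warning-signal table can be counted directly from binary signal and event series. Cell A is a correctly signalled event, B a false alarm, C a missed event and D a correct silence, so the table has the same layout as a confusion matrix (A = TP, B = FP, C = FN, D = TN). The sketch below uses hypothetical toy data. The hit rate, false-alarm rate and noise-to-signal ratio are summaries commonly used in the signal-extraction literature; it is an assumption that the paper uses these particular summaries.

```python
import numpy as np

def signal_table(signal, event):
    """Count cells A, B, C, D of the warning-signal contingency table.

    signal: 1 if a warning signal was issued, else 0
    event:  1 if the event (e.g. a debt-group migration) occurred, else 0
    """
    signal = np.asarray(signal, dtype=bool)
    event = np.asarray(event, dtype=bool)
    a = int(np.sum(signal & event))    # signal issued, event occurred
    b = int(np.sum(signal & ~event))   # signal issued, no event (false alarm)
    c = int(np.sum(~signal & event))   # no signal, event occurred (missed)
    d = int(np.sum(~signal & ~event))  # no signal, no event
    return a, b, c, d

# Hypothetical toy series, for illustration only
signal = [1, 1, 0, 0, 1, 0, 0, 1]
event = [1, 0, 1, 0, 1, 0, 0, 0]
a, b, c, d = signal_table(signal, event)

hit_rate = a / (a + c)                       # share of events preceded by a signal
false_alarm_rate = b / (b + d)               # share of non-events with a signal
noise_to_signal = false_alarm_rate / hit_rate
print(a, b, c, d, hit_rate, false_alarm_rate, noise_to_signal)
```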

Model	Parameter	Description
LG	None	The baseline model is logistic regression, which applies the sigmoid (logistic) function to a linear combination of the features; no hyperparameters are tuned
SVM	Kernel function	The kernel function that implicitly maps the data into a different feature space where the classes can be separated linearly; the options are Linear, Polynomial, Sigmoid, and RBF
	C	The regularization parameter that balances a wide margin against the penalty for misclassified (noisy) training points
	d	The degree parameter when using the Polynomial kernel, which takes a natural number value
	γ	The gamma parameter for Polynomial, Sigmoid, and RBF kernels, which takes a non-negative value
	r	The intercept for the Polynomial and Sigmoid kernels
DT	Depth	It is necessary to limit the depth of the DT to avoid overfitting and reduce computational cost
	Number of leaf nodes	It is necessary to limit the number of leaf nodes of the DT to avoid overfitting and reduce computational cost
RF	Depth	It is necessary to limit the depth of each DT to avoid overfitting and reduce computational cost
	Number of leaf nodes	It is necessary to limit the number of leaf nodes of each DT to avoid overfitting and reduce computational cost
	Number of DTs	The number of DTs in the Random Forest Classifier (RF); too many trees increase computational cost
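
The roles of the SVM parameters γ, r and d are easiest to see in the kernel formulas themselves: linear ⟨x, z⟩, polynomial (γ⟨x, z⟩ + r)^d, RBF exp(−γ‖x − z‖²) and sigmoid tanh(γ⟨x, z⟩ + r). The sketch below evaluates these formulas by hand and checks them against scikit-learn's pairwise kernel functions, where r is called coef0. The vectors and parameter values are arbitrary illustrations, not values from the study.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel)

def kernels(x, z, gamma=0.1, r=1.0, d=3):
    """Evaluate the four SVM kernels for two feature vectors x and z."""
    dot = float(np.dot(x, z))
    return {
        "linear": dot,                                         # <x, z>
        "poly": (gamma * dot + r) ** d,                        # (gamma<x, z> + r)^d
        "rbf": float(np.exp(-gamma * np.sum((x - z) ** 2))),   # exp(-gamma||x - z||^2)
        "sigmoid": float(np.tanh(gamma * dot + r)),            # tanh(gamma<x, z> + r)
    }

# Arbitrary example vectors (illustrative only)
x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
manual = kernels(x, z, gamma=0.1, r=1.0, d=3)

# Cross-check against scikit-learn (coef0 plays the role of r)
X, Z = x.reshape(1, -1), z.reshape(1, -1)
library = {
    "linear": linear_kernel(X, Z)[0, 0],
    "poly": polynomial_kernel(X, Z, degree=3, gamma=0.1, coef0=1.0)[0, 0],
    "rbf": rbf_kernel(X, Z, gamma=0.1)[0, 0],
    "sigmoid": sigmoid_kernel(X, Z, gamma=0.1, coef0=1.0)[0, 0],
}
for name in manual:
    print(name, manual[name], library[name])
```

Larger γ makes the RBF kernel decay faster with distance, so each training point influences a smaller neighbourhood. This is one reason γ and C are usually tuned together.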

Target variable	Predicted: 1	Predicted: 0	Total
Actual: 1	TP: True positives	FN: False negatives	P
Actual: 0	FP: False positives	TN: True negatives	N
Total	$\hat{P}$	$\hat{N}$	P + N
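
The cells and margins of the confusion matrix can be computed with scikit-learn. Passing labels=[1, 0] puts the positive class first, matching the layout above. The sketch also evaluates the Matthews correlation coefficient (MCC), MCC = (TP·TN − FP·FN) / sqrt(P·N·P̂·N̂), from these margins and compares it with scikit-learn's matthews_corrcoef. The label vectors are hypothetical toy data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical true and predicted labels (1 = positive class)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 0])

# labels=[1, 0] orders rows (actual) and columns (predicted) as in the table
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn, fp, tn = (int(v) for v in cm.ravel())

p, n = tp + fn, fp + tn          # actual positives / negatives (row totals)
p_hat, n_hat = tp + fp, fn + tn  # predicted positives / negatives (column totals)

recall = tp / p
precision = tp / p_hat
mcc = (tp * tn - fp * fn) / np.sqrt(p * n * p_hat * n_hat)
print(tp, fn, fp, tn, recall, precision, mcc, matthews_corrcoef(y_true, y_pred))
```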

Customer	Rank	Model and tuning criterion	Parameters
B Score	Best	RF tuned by MCC	n_estimators = 100; max_depth = 20; max_leaf_nodes = None
	Best	SVM tuned by F-Recall	kernel = 'sigmoid'; C = 0.1; gamma = 0.01
	Second best	RF tuned by F-Recall	n_estimators = 100; max_depth = 16; max_leaf_nodes = None
C Score	Best	SVM tuned by MCC	kernel = 'poly'; degree = 4; C = 0.01; gamma = 0.1
	Best	SVM tuned by F-Precision	kernel = 'poly'; degree = 4; C = 0.1; gamma = 0.01
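
The selected configurations map directly onto scikit-learn estimators. The sketch below is a reproduction aid only. The dictionary keys are hypothetical labels. Settings that the table does not report, such as random_state, class_weight or feature scaling, are left at their defaults, and that choice is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Hyperparameters copied from the table; unreported settings keep scikit-learn defaults
selected = {
    ("B Score", "RF tuned by MCC"): RandomForestClassifier(
        n_estimators=100, max_depth=20, max_leaf_nodes=None),
    ("B Score", "SVM tuned by F-Recall"): SVC(kernel="sigmoid", C=0.1, gamma=0.01),
    ("B Score", "RF tuned by F-Recall"): RandomForestClassifier(
        n_estimators=100, max_depth=16, max_leaf_nodes=None),
    ("C Score", "SVM tuned by MCC"): SVC(kernel="poly", degree=4, C=0.01, gamma=0.1),
    ("C Score", "SVM tuned by F-Precision"): SVC(kernel="poly", degree=4, C=0.1, gamma=0.01),
}

for key, model in selected.items():
    print(key, model)
```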

Model	Parameter	Parameters in Scikit-learn	Range of values for tuning
LG	None	None	None
SVM	Kernel function	kernel: a string such as 'linear', 'poly', 'rbf', or 'sigmoid'. The default value is 'rbf'	'linear', 'poly', 'rbf', 'sigmoid'
	C	C: data type is float; the default value is 1	0.01, 0.1, 1, 10
	d	degree: data type is integer; the default value is 3	2, 3, 4, 5 (applicable only when kernel is 'poly')
	γ	gamma: accepts 'scale', 'auto', or a non-negative float. The default value is 'scale'	0.01, 0.1, 1, 10 (not applicable when kernel is 'linear')
DT	Depth	max_depth: data type is integer or None. The default value is None, which means nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples	2, 4, ..., 20 (from 2 to 21 with a step size of 2) and None
	Number of leaf nodes	max_leaf_nodes: data type is integer or None. The default value is None, which places no limit on the number of leaf nodes (growth can still be limited by max_depth)	2, 4, ..., 20 (from 2 to 21 with a step size of 2) and None
RF	Depth	Similar to DT	Similar to DT
	Number of leaf nodes	Similar to DT	Similar to DT
	Number of DTs	n_estimators: data type is integer. The default value is 100	10, 50, 100
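
The tuning ranges above can be written as scikit-learn parameter grids. Because degree applies only to the 'poly' kernel and gamma is not used by 'linear', the SVM grid is split into one dictionary per kernel group. This is a minimal sketch that assumes an exhaustive GridSearchCV. The grid variable names, the synthetic data from make_classification, cv=3 and the MCC scorer built with make_scorer are illustrative assumptions, not the paper's reported setup.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.tree import DecisionTreeClassifier

C_VALUES = [0.01, 0.1, 1, 10]
GAMMA_VALUES = [0.01, 0.1, 1, 10]
TREE_SIZES = list(range(2, 21, 2)) + [None]  # 2, 4, ..., 20 and None

# degree applies only to 'poly'; gamma is not used by 'linear'
svm_grid = [
    {"kernel": ["linear"], "C": C_VALUES},
    {"kernel": ["poly"], "C": C_VALUES, "degree": [2, 3, 4, 5], "gamma": GAMMA_VALUES},
    {"kernel": ["rbf", "sigmoid"], "C": C_VALUES, "gamma": GAMMA_VALUES},
]
dt_grid = {"max_depth": TREE_SIZES, "max_leaf_nodes": TREE_SIZES}
rf_grid = {**dt_grid, "n_estimators": [10, 50, 100]}

# Number of candidate configurations per model
print(len(ParameterGrid(svm_grid)), len(ParameterGrid(dt_grid)), len(ParameterGrid(rf_grid)))

# Illustration: tune the DT on synthetic imbalanced data, selecting by MCC
X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    dt_grid,
    scoring=make_scorer(matthews_corrcoef),
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

The same call works for SVC with svm_grid and for RandomForestClassifier with rf_grid. Those grids have 100 and 363 candidates respectively, so they cost considerably more than the 121-candidate DT grid.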