5.2.2 Notebook
The notebook below is an example that uses the models and methods from the suggestion lists. The score tables and radar charts are the results produced by the evaluation system. The source code of the notebook is shown below.
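The outputs on this page were produced with the `h2o` Python package (version 3.24, per the cluster summary). A minimal sketch of the setup cell, assuming a default local cluster:

```python
# Start (or connect to) a local H2O cluster; h2o.init() prints
# the cluster summary shown below.
import h2o

h2o.init()
```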
| Property | Value |
| --- | --- |
| H2O cluster uptime: | 02 secs |
| H2O cluster timezone: | America/New_York |
| H2O data parsing timezone: | UTC |
| H2O cluster version: | 3.24.0.2 |
| H2O cluster version age: | 1 year, 3 months and 23 days !!! |
| H2O cluster name: | H2O_from_python_guruisi_r1jxcs |
| H2O cluster total nodes: | 1 |
| H2O cluster free memory: | 7.111 Gb |
| H2O cluster total cores: | 4 |
| H2O cluster allowed cores: | 4 |
| H2O cluster status: | accepting new members, healthy |
| H2O connection url: | http://127.0.0.1:54321 |
| H2O connection proxy: | None |
| H2O internal security: | False |
| H2O API Extensions: | Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4 |
| Python version: | 3.7.4 final |
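The Telco customer churn dataset is then imported as an H2OFrame. A sketch of the loading cell; the file name is an assumption, not the path from the original run:

```python
# Import the dataset and preview the first rows (output shown below).
telco = h2o.import_file("WA_Fn-UseC_-Telco-Customer-Churn.csv")  # hypothetical path
telco.head(rows=5)
```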
| CustomerID | Gender | Senior Citizen | Partner | Dependents | Tenure | Phone Service | Multiple Lines | Internet Service | Online Security | Online Backup | Device Protection | Tech Support | Streaming TV | Streaming Movies | Contract | Paperless Billing | Payment Method | Monthly Charges | Total Charges | Churn | Id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No | 0 |
| 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No | 1 |
| 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes | 2 |
| 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No | 3 |
| 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes | 4 |
Finding the missing data. Missing values account for only 0.14% of the whole dataset, so dropping the affected rows is our solution.
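A sketch of the missing-data check, assuming the `telco` frame from above; `nacnt()` returns the number of NAs per column and `na_omit()` drops the affected rows:

```python
# Count missing values per column, then drop the ~0.14% of rows that contain any.
na_counts = dict(zip(telco.columns, telco.nacnt()))
print({col: n for col, n in na_counts.items() if n > 0})
telco = telco.na_omit()
```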
Separating the target column from the feature columns. Target: 'Churn'. Non-features: 'CustomerID' and 'Id'.
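In code, this separation might look like the following sketch (column names as in the preview above):

```python
# Target and non-feature columns.
y = "Churn"
non_features = ["CustomerID", "Id"]
x = [col for col in telco.columns if col not in non_features + [y]]

# Encode the target as a factor so H2O treats this as classification.
telco[y] = telco[y].asfactor()
```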
What is churn rate in the business world? The churn rate, also known as the rate of attrition or customer churn, is the rate at which customers stop doing business with an entity. It is most commonly expressed as the percentage of service subscribers who discontinue their subscriptions within a given time period; it can also refer to the rate at which employees leave their jobs within a certain period. For example, a provider that starts a quarter with 1,000 subscribers and loses 50 of them has a quarterly churn rate of 50/1,000 = 5%. For a company to expand its client base, its growth rate (measured by the number of new customers) must exceed its churn rate.
Running H2O AutoML to get the model leaderboard. Since the target 'Churn' is binomial, the problem we are facing here is a binary classification problem.
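A sketch of the AutoML run; the split ratios, runtime budget, and seed are assumptions, not the original settings:

```python
from h2o.automl import H2OAutoML

# Hypothetical train/validation/test split.
train, valid, test = telco.split_frame(ratios=[0.6, 0.2], seed=42)

aml = H2OAutoML(max_runtime_secs=3600, seed=42)
aml.train(x=x, y=y, training_frame=train)
aml.leaderboard  # the leaderboard shown below
```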
|   | model_id | auc | logloss | mean_per_class_error | rmse | mse |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | StackedEnsemble_BestOfFamily_AutoML_20200809_1... | 0.850180 | 0.417073 | 0.226777 | 0.366226 | 0.134121 |
| 1 | GBM_grid_1_AutoML_20200809_184844_model_18 | 0.850156 | 0.409682 | 0.230198 | 0.364476 | 0.132843 |
| 2 | GBM_grid_1_AutoML_20200809_184844_model_20 | 0.849776 | 0.410381 | 0.234533 | 0.364628 | 0.132953 |
| 3 | StackedEnsemble_AllModels_AutoML_20200809_184844 | 0.849452 | 0.417161 | 0.228774 | 0.366490 | 0.134315 |
| 4 | GBM_grid_1_AutoML_20200809_184844_model_16 | 0.849246 | 0.411673 | 0.238235 | 0.365142 | 0.133329 |
| 5 | GBM_grid_1_AutoML_20200809_184844_model_12 | 0.848980 | 0.452791 | 0.234555 | 0.381081 | 0.145222 |
| 6 | GBM_grid_1_AutoML_20200809_184844_model_14 | 0.848974 | 0.412404 | 0.225886 | 0.365270 | 0.133422 |
| 7 | GBM_grid_1_AutoML_20200809_224838_model_8 | 0.848855 | 0.431964 | 0.240782 | 0.372267 | 0.138582 |
| 8 | GBM_grid_1_AutoML_20200809_184844_model_3 | 0.848788 | 0.432952 | 0.237229 | 0.372787 | 0.138970 |
| 9 | GBM_grid_1_AutoML_20200809_184844_model_1 | 0.848785 | 0.434713 | 0.228669 | 0.373369 | 0.139405 |
| 10 | XGBoost_grid_1_AutoML_20200809_224838_model_9 | 0.848772 | 0.411119 | 0.232333 | 0.365420 | 0.133531 |
| 11 | GBM_grid_1_AutoML_20200809_224838_model_12 | 0.848654 | 0.411532 | 0.234209 | 0.365369 | 0.133494 |
| 12 | GBM_grid_1_AutoML_20200809_224838_model_7 | 0.848639 | 0.414071 | 0.236514 | 0.366018 | 0.133970 |
| 13 | GBM_grid_1_AutoML_20200809_224838_model_5 | 0.848601 | 0.431544 | 0.240111 | 0.371907 | 0.138315 |
| 14 | GBM_grid_1_AutoML_20200809_184844_model_13 | 0.848597 | 0.412046 | 0.233466 | 0.365329 | 0.133465 |
| 15 | XGBoost_grid_1_AutoML_20200809_224838_model_5 | 0.848483 | 0.415327 | 0.232322 | 0.365961 | 0.133928 |
| 16 | GBM_5_AutoML_20200809_184844 | 0.848407 | 0.411846 | 0.235760 | 0.365449 | 0.133553 |
| 17 | XGBoost_grid_1_AutoML_20200809_224838_model_2 | 0.848375 | 0.411610 | 0.229764 | 0.365684 | 0.133724 |
| 18 | XGBoost_3_AutoML_20200809_224838 | 0.848356 | 0.412230 | 0.229621 | 0.365567 | 0.133639 |
| 19 | GBM_grid_1_AutoML_20200809_224838_model_11 | 0.848199 | 0.413048 | 0.234907 | 0.365655 | 0.133704 |
| 20 | GBM_grid_1_AutoML_20200809_224838_model_15 | 0.848197 | 0.435540 | 0.229049 | 0.373651 | 0.139615 |
| 21 | GBM_grid_1_AutoML_20200809_184844_model_8 | 0.848169 | 0.441859 | 0.234264 | 0.376513 | 0.141762 |
| 22 | XGBoost_3_AutoML_20200809_184844 | 0.848065 | 0.412465 | 0.237157 | 0.365700 | 0.133736 |
| 23 | XGBoost_2_AutoML_20200809_184844 | 0.847559 | 0.413327 | 0.234434 | 0.366205 | 0.134106 |
| 24 | XGBoost_grid_1_AutoML_20200809_184844_model_10 | 0.847499 | 0.415598 | 0.237966 | 0.366557 | 0.134364 |
| 25 | GBM_grid_1_AutoML_20200809_184844_model_19 | 0.847337 | 0.436593 | 0.235980 | 0.374232 | 0.140050 |
| 26 | GBM_5_AutoML_20200809_224838 | 0.847264 | 0.412953 | 0.233433 | 0.366141 | 0.134059 |
| 27 | XGBoost_2_AutoML_20200809_224838 | 0.847197 | 0.413653 | 0.231425 | 0.366346 | 0.134210 |
| 28 | GBM_grid_1_AutoML_20200809_224838_model_4 | 0.846974 | 0.439954 | 0.240557 | 0.376069 | 0.141428 |
| 29 | XGBoost_grid_1_AutoML_20200809_224838_model_1 | 0.846642 | 0.412812 | 0.232162 | 0.366473 | 0.134303 |
| ... | ... | ... | ... | ... | ... | ... |
| 70 | GBM_grid_1_AutoML_20200809_224838_model_14 | 0.831422 | 0.432501 | 0.243522 | 0.375203 | 0.140777 |
| 71 | GBM_grid_1_AutoML_20200809_224838_model_13 | 0.830757 | 0.435942 | 0.248588 | 0.376414 | 0.141687 |
| 72 | GBM_4_AutoML_20200809_184844 | 0.830160 | 0.438904 | 0.251052 | 0.377504 | 0.142509 |
| 73 | DeepLearning_1_AutoML_20200809_224838 | 0.829825 | 0.436108 | 0.254523 | 0.378159 | 0.143004 |
| 74 | XGBoost_grid_1_AutoML_20200809_184844_model_3 | 0.829003 | 0.447995 | 0.250662 | 0.381721 | 0.145711 |
| 75 | DeepLearning_grid_1_AutoML_20200809_184844_mod... | 0.828073 | 0.505879 | 0.251965 | 0.388789 | 0.151157 |
| 76 | DeepLearning_grid_1_AutoML_20200809_224838_mod... | 0.827414 | 0.485080 | 0.249715 | 0.381528 | 0.145563 |
| 77 | DeepLearning_grid_1_AutoML_20200809_224838_mod... | 0.826863 | 0.463806 | 0.250040 | 0.382114 | 0.146011 |
| 78 | XGBoost_grid_1_AutoML_20200809_184844_model_2 | 0.826246 | 0.447182 | 0.254606 | 0.382063 | 0.145972 |
| 79 | XGBoost_grid_1_AutoML_20200809_224838_model_6 | 0.825827 | 0.453619 | 0.252576 | 0.382699 | 0.146458 |
| 80 | DeepLearning_grid_1_AutoML_20200809_224838_mod... | 0.824306 | 0.460778 | 0.256933 | 0.383795 | 0.147299 |
| 81 | XRT_1_AutoML_20200809_184844 | 0.822605 | 0.512764 | 0.259865 | 0.379563 | 0.144068 |
| 82 | XRT_1_AutoML_20200809_224838 | 0.821975 | 0.498749 | 0.259430 | 0.379496 | 0.144018 |
| 83 | GBM_grid_1_AutoML_20200809_224838_model_17 | 0.821147 | 0.569075 | 0.251833 | 0.437177 | 0.191124 |
| 84 | DRF_1_AutoML_20200809_224838 | 0.820474 | 0.522540 | 0.256855 | 0.380883 | 0.145072 |
| 85 | GBM_grid_1_AutoML_20200809_184844_model_5 | 0.819377 | 0.465534 | 0.260789 | 0.387906 | 0.150471 |
| 86 | DRF_1_AutoML_20200809_184844 | 0.819231 | 0.538513 | 0.255728 | 0.380744 | 0.144966 |
| 87 | GBM_grid_1_AutoML_20200809_184844_model_7 | 0.816504 | 0.570223 | 0.255827 | 0.437693 | 0.191575 |
| 88 | DeepLearning_grid_1_AutoML_20200809_184844_mod... | 0.814457 | 0.464657 | 0.256316 | 0.387104 | 0.149850 |
| 89 | XGBoost_grid_1_AutoML_20200809_184844_model_9 | 0.812922 | 0.497491 | 0.262802 | 0.398574 | 0.158861 |
| 90 | GBM_grid_1_AutoML_20200809_184844_model_11 | 0.810966 | 0.568592 | 0.257857 | 0.436954 | 0.190928 |
| 91 | GBM_grid_1_AutoML_20200809_184844_model_10 | 0.810071 | 0.483470 | 0.262444 | 0.391754 | 0.153472 |
| 92 | DeepLearning_grid_1_AutoML_20200809_184844_mod... | 0.809415 | 0.681625 | 0.257185 | 0.417600 | 0.174389 |
| 93 | GBM_grid_1_AutoML_20200809_184844_model_17 | 0.807155 | 0.671564 | 0.261080 | 0.419363 | 0.175865 |
| 94 | DeepLearning_grid_1_AutoML_20200809_184844_mod... | 0.806609 | 0.665037 | 0.270382 | 0.397929 | 0.158348 |
| 95 | GBM_grid_1_AutoML_20200809_224838_model_6 | 0.805700 | 0.500964 | 0.264710 | 0.399209 | 0.159368 |
| 96 | GBM_grid_1_AutoML_20200809_184844_model_9 | 0.805206 | 0.568771 | 0.268710 | 0.437034 | 0.190999 |
| 97 | DeepLearning_grid_1_AutoML_20200809_184844_mod... | 0.803046 | 0.550576 | 0.271191 | 0.405248 | 0.164226 |
| 98 | XGBoost_grid_1_AutoML_20200809_184844_model_7 | 0.792067 | 0.559789 | 0.277797 | 0.417189 | 0.174046 |
| 99 | GBM_grid_1_AutoML_20200809_224838_model_2 | 0.789137 | 0.606154 | 0.282159 | 0.420276 | 0.176632 |

100 rows × 6 columns
Following the suggestion list generated by the interpretability evaluation system, there are three models to build: Logistic Regression, Gradient Boosting Machine, and XGBoost. By taking advantage of the H2O AutoML leaderboard, we can obtain the best model of each type with the most suitable hyperparameters.
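For the Logistic Regression, H2O's binomial GLM can be trained directly. A sketch, with `lambda_search` as an assumed setting; the tables that follow appear to be the training-data report:

```python
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

glm_model = H2OGeneralizedLinearEstimator(family="binomial", lambda_search=True)
glm_model.train(x=x, y=y, training_frame=train, validation_frame=valid)

# Training metrics (confusion matrix, maximum metrics, gains/lift) shown below.
glm_model.confusion_matrix()
```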
Confusion matrix (actual rows vs. predicted columns):

|   | No | Yes | Error | Rate |
| --- | --- | --- | --- | --- |
| No | 2088.0 | 745.0 | 0.263 | (745.0/2833.0) |
| Yes | 181.0 | 850.0 | 0.1756 | (181.0/1031.0) |
| Total | 2269.0 | 1595.0 | 0.2396 | (926.0/3864.0) |
Maximum metrics at their respective thresholds:

| metric | threshold | value | idx |
| --- | --- | --- | --- |
| max f1 | 0.2585920 | 0.6473724 | 241.0 |
| max f2 | 0.1563713 | 0.7615521 | 288.0 |
| max f0point5 | 0.5398616 | 0.6524909 | 122.0 |
| max accuracy | 0.5398616 | 0.8115942 | 122.0 |
| max precision | 0.8750044 | 1.0 | 0.0 |
| max recall | 0.0031143 | 1.0 | 397.0 |
| max specificity | 0.8750044 | 1.0 | 0.0 |
| max absolute_mcc | 0.2585920 | 0.5044077 | 241.0 |
| max min_per_class_accuracy | 0.2989662 | 0.7720660 | 223.0 |
| max mean_per_class_accuracy | 0.2585920 | 0.7807351 | 241.0 |
Gains/Lift table:

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.0100932 | 0.8072080 | 3.2673282 | 3.2673282 | 0.8717949 | 0.8318661 | 0.8717949 | 0.8318661 | 0.0329777 | 0.0329777 | 226.7328210 | 226.7328210 |
| 2 | 0.0201863 | 0.7867289 | 3.0751324 | 3.1712303 | 0.8205128 | 0.7973382 | 0.8461538 | 0.8146022 | 0.0310378 | 0.0640155 | 207.5132433 | 217.1230322 |
| 3 | 0.0300207 | 0.7708700 | 3.2546838 | 3.1985685 | 0.8684211 | 0.7784711 | 0.8534483 | 0.8027661 | 0.0320078 | 0.0960233 | 225.4683751 | 219.8568514 |
| 4 | 0.0401139 | 0.7558323 | 3.2673282 | 3.2158693 | 0.8717949 | 0.7653157 | 0.8580645 | 0.7933431 | 0.0329777 | 0.1290010 | 226.7328210 | 221.5869341 |
| 5 | 0.0502070 | 0.7380001 | 2.7868388 | 3.1296209 | 0.7435897 | 0.7467696 | 0.8350515 | 0.7839804 | 0.0281280 | 0.1571290 | 178.6838767 | 212.9620927 |
| 6 | 0.1001553 | 0.6729028 | 2.8351367 | 2.9827593 | 0.7564767 | 0.7046722 | 0.7958656 | 0.7444288 | 0.1416101 | 0.2987391 | 183.5136670 | 198.2759269 |
| 7 | 0.1501035 | 0.6071369 | 2.3885056 | 2.7850162 | 0.6373057 | 0.6387067 | 0.7431034 | 0.7092489 | 0.1193016 | 0.4180407 | 138.8505551 | 178.5016221 |
| 8 | 0.2000518 | 0.5381941 | 2.0778056 | 2.6084423 | 0.5544041 | 0.5722402 | 0.6959897 | 0.6750410 | 0.1037827 | 0.5218235 | 107.7805642 | 160.8442299 |
| 9 | 0.2999482 | 0.3962338 | 1.5437902 | 2.2538645 | 0.4119171 | 0.4683588 | 0.6013805 | 0.6062064 | 0.1542192 | 0.6760427 | 54.3790173 | 125.3864456 |
| 10 | 0.4001035 | 0.2677629 | 1.2976940 | 2.0145126 | 0.3462532 | 0.3286738 | 0.5375162 | 0.5367335 | 0.1299709 | 0.8060136 | 29.7693968 | 101.4512593 |
| 11 | 0.5 | 0.1837581 | 0.7961685 | 1.7710960 | 0.2124352 | 0.2207226 | 0.4725673 | 0.4735967 | 0.0795344 | 0.8855480 | -20.3831483 | 77.1096023 |
| 12 | 0.5998965 | 0.1104158 | 0.5922717 | 1.5747948 | 0.1580311 | 0.1484992 | 0.4201898 | 0.4194606 | 0.0591659 | 0.9447139 | -40.7728298 | 57.4794820 |
| 13 | 0.7000518 | 0.0617657 | 0.2808442 | 1.3896714 | 0.0749354 | 0.0847926 | 0.3707948 | 0.3715802 | 0.0281280 | 0.9728419 | -71.9155783 | 38.9671388 |
| 14 | 0.7999482 | 0.0286901 | 0.1262218 | 1.2318935 | 0.0336788 | 0.0437199 | 0.3286962 | 0.3306374 | 0.0126091 | 0.9854510 | -87.3778162 | 23.1893476 |
| 15 | 0.8998447 | 0.0107940 | 0.0970937 | 1.1059134 | 0.0259067 | 0.0186504 | 0.2950820 | 0.2960021 | 0.0096993 | 0.9951503 | -90.2906278 | 10.5913406 |
| 16 | 1.0 | 0.0013001 | 0.0484214 | 1.0 | 0.0129199 | 0.0050078 | 0.2668219 | 0.2668575 | 0.0048497 | 1.0 | -95.1578583 | 0.0 |
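The same report can be computed on held-out data with `model_performance`. A sketch using the frames defined above; that the next set of tables comes from the validation frame is an assumption based on their row counts:

```python
# Score the GLM on the validation frame; the confusion matrix,
# maximum metrics, and gains/lift table below come from this report.
perf_valid = glm_model.model_performance(valid)
perf_valid.confusion_matrix()
```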
Confusion matrix (actual rows vs. predicted columns):

|   | No | Yes | Error | Rate |
| --- | --- | --- | --- | --- |
| No | 1189.0 | 263.0 | 0.1811 | (263.0/1452.0) |
| Yes | 163.0 | 349.0 | 0.3184 | (163.0/512.0) |
| Total | 1352.0 | 612.0 | 0.2169 | (426.0/1964.0) |
Maximum metrics at their respective thresholds:

| metric | threshold | value | idx |
| --- | --- | --- | --- |
| max f1 | 0.3933891 | 0.6209964 | 177.0 |
| max f2 | 0.1467309 | 0.7441787 | 296.0 |
| max f0point5 | 0.5139588 | 0.6147913 | 124.0 |
| max accuracy | 0.5208544 | 0.7998982 | 121.0 |
| max precision | 0.8712309 | 1.0 | 0.0 |
| max recall | 0.0066205 | 1.0 | 393.0 |
| max specificity | 0.8712309 | 1.0 | 0.0 |
| max absolute_mcc | 0.3933891 | 0.4744255 | 177.0 |
| max min_per_class_accuracy | 0.3241675 | 0.7519531 | 210.0 |
| max mean_per_class_accuracy | 0.2878493 | 0.7594885 | 226.0 |
Gains/Lift table:

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.0101833 | 0.8083745 | 3.0687500 | 3.0687500 | 0.8 | 0.8366588 | 0.8 | 0.8366588 | 0.03125 | 0.03125 | 206.8750000 | 206.8750000 |
| 2 | 0.0203666 | 0.7814955 | 2.4933594 | 2.7810547 | 0.65 | 0.7940115 | 0.725 | 0.8153352 | 0.0253906 | 0.0566406 | 149.3359375 | 178.1054688 |
| 3 | 0.0300407 | 0.7625985 | 2.8264803 | 2.7956833 | 0.7368421 | 0.7728231 | 0.7288136 | 0.8016448 | 0.0273438 | 0.0839844 | 182.6480263 | 179.5683263 |
| 4 | 0.0402240 | 0.7433646 | 2.8769531 | 2.8162579 | 0.75 | 0.7519586 | 0.7341772 | 0.7890660 | 0.0292969 | 0.1132812 | 187.6953125 | 181.6257911 |
| 5 | 0.0504073 | 0.7278637 | 2.8769531 | 2.8285196 | 0.75 | 0.7359333 | 0.7373737 | 0.7783321 | 0.0292969 | 0.1425781 | 187.6953125 | 182.8519571 |
| 6 | 0.1003055 | 0.6504495 | 2.5051020 | 2.6676317 | 0.6530612 | 0.6902004 | 0.6954315 | 0.7344900 | 0.125 | 0.2675781 | 150.5102041 | 166.7631662 |
| 7 | 0.1502037 | 0.5958767 | 2.3485332 | 2.5616261 | 0.6122449 | 0.6207043 | 0.6677966 | 0.6966900 | 0.1171875 | 0.3847656 | 134.8533163 | 156.1626059 |
| 8 | 0.2001018 | 0.5365627 | 2.0745376 | 2.4401638 | 0.5408163 | 0.5687102 | 0.6361323 | 0.6647764 | 0.1035156 | 0.4882812 | 107.4537628 | 144.0163804 |
| 9 | 0.2998982 | 0.4021287 | 1.7222577 | 2.2012680 | 0.4489796 | 0.4687213 | 0.5738540 | 0.5995357 | 0.171875 | 0.6601562 | 72.2257653 | 120.1268039 |
| 10 | 0.4002037 | 0.2980178 | 1.2072494 | 1.9521310 | 0.3147208 | 0.3501028 | 0.5089059 | 0.5370188 | 0.1210938 | 0.78125 | 20.7249365 | 95.2131043 |
| 11 | 0.5 | 0.2041451 | 0.7632733 | 1.7148438 | 0.1989796 | 0.2493834 | 0.4470468 | 0.4796089 | 0.0761719 | 0.8574219 | -23.6726722 | 71.4843750 |
| 12 | 0.5997963 | 0.1297163 | 0.7241311 | 1.5500053 | 0.1887755 | 0.1641780 | 0.4040747 | 0.4271263 | 0.0722656 | 0.9296875 | -27.5868941 | 55.0005306 |
| 13 | 0.7001018 | 0.0717965 | 0.3504918 | 1.3781477 | 0.0913706 | 0.1003014 | 0.3592727 | 0.3803012 | 0.0351562 | 0.9648438 | -64.9508249 | 37.8147727 |
| 14 | 0.7998982 | 0.0354904 | 0.1957111 | 1.2306254 | 0.0510204 | 0.0536183 | 0.3208148 | 0.3395438 | 0.0195312 | 0.984375 | -80.4288903 | 23.0625398 |
| 15 | 0.8996945 | 0.0127866 | 0.1369978 | 1.1093175 | 0.0357143 | 0.0222299 | 0.2891907 | 0.3043466 | 0.0136719 | 0.9980469 | -86.3002232 | 10.9317523 |
| 16 | 1.0 | 0.0013697 | 0.0194718 | 1.0 | 0.0050761 | 0.0067909 | 0.2606925 | 0.2745001 | 0.0019531 | 1.0 | -98.0528236 | 0.0 |
Scoring history:

| timestamp | duration | iterations | negative_log_likelihood | objective |
| --- | --- | --- | --- | --- |
| 2020-08-10 08:10:16 | 0.000 sec | 0 | 2241.3989447 | 0.5800722 |
| 2020-08-10 08:10:16 | 0.017 sec | 1 | 1654.2086472 | 0.4289863 |
| 2020-08-10 08:10:16 | 0.028 sec | 2 | 1585.7817876 | 0.4114282 |
| 2020-08-10 08:10:16 | 0.031 sec | 3 | 1571.8949395 | 0.4081494 |
| 2020-08-10 08:10:16 | 0.035 sec | 4 | 1570.1412044 | 0.4078462 |
| 2020-08-10 08:10:16 | 0.039 sec | 5 | 1570.0635782 | 0.4078434 |
Confusion matrix (actual rows vs. predicted columns):

|   | No | Yes | Error | Rate |
| --- | --- | --- | --- | --- |
| No | 369.0 | 121.0 | 0.2469 | (121.0/490.0) |
| Yes | 37.0 | 135.0 | 0.2151 | (37.0/172.0) |
| Total | 406.0 | 256.0 | 0.2387 | (158.0/662.0) |
Maximum metrics at their respective thresholds:

| metric | threshold | value | idx |
| --- | --- | --- | --- |
| max f1 | 0.3024392 | 0.6308411 | 193.0 |
| max f2 | 0.1422109 | 0.7490637 | 271.0 |
| max f0point5 | 0.5192027 | 0.6445313 | 110.0 |
| max accuracy | 0.5192027 | 0.8141994 | 110.0 |
| max precision | 0.8750186 | 1.0 | 0.0 |
| max recall | 0.0137343 | 1.0 | 379.0 |
| max specificity | 0.8750186 | 1.0 | 0.0 |
| max absolute_mcc | 0.5110350 | 0.4988469 | 111.0 |
| max min_per_class_accuracy | 0.3130038 | 0.7591837 | 187.0 |
| max mean_per_class_accuracy | 0.3024392 | 0.7689725 | 193.0 |
Gains/Lift table:

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.0105740 | 0.8028264 | 3.8488372 | 3.8488372 | 1.0 | 0.8344546 | 1.0 | 0.8344546 | 0.0406977 | 0.0406977 | 284.8837209 | 284.8837209 |
| 2 | 0.0211480 | 0.7855657 | 3.2990033 | 3.5739203 | 0.8571429 | 0.7949672 | 0.9285714 | 0.8147109 | 0.0348837 | 0.0755814 | 229.9003322 | 257.3920266 |
| 3 | 0.0302115 | 0.7630627 | 2.5658915 | 3.2715116 | 0.6666667 | 0.7721759 | 0.85 | 0.8019504 | 0.0232558 | 0.0988372 | 156.5891473 | 227.1511628 |
| 4 | 0.0407855 | 0.7516464 | 3.2990033 | 3.2786391 | 0.8571429 | 0.7580396 | 0.8518519 | 0.7905661 | 0.0348837 | 0.1337209 | 229.9003322 | 227.8639104 |
| 5 | 0.0513595 | 0.7319520 | 3.2990033 | 3.2828317 | 0.8571429 | 0.7393147 | 0.8529412 | 0.7800144 | 0.0348837 | 0.1686047 | 229.9003322 | 228.2831737 |
| 6 | 0.1012085 | 0.6708461 | 2.2159972 | 2.7573759 | 0.5757576 | 0.6986010 | 0.7164179 | 0.7399152 | 0.1104651 | 0.2790698 | 121.5997181 | 175.7375911 |
| 7 | 0.1510574 | 0.6110156 | 2.0993658 | 2.5402326 | 0.5454545 | 0.6368209 | 0.66 | 0.7058941 | 0.1046512 | 0.3837209 | 109.9365751 | 154.0232558 |
| 8 | 0.2009063 | 0.5535949 | 2.6825229 | 2.5755377 | 0.6969697 | 0.5807483 | 0.6691729 | 0.6748429 | 0.1337209 | 0.5174419 | 168.2522903 | 157.5537681 |
| 9 | 0.3006042 | 0.3961013 | 1.3412615 | 2.1661797 | 0.3484848 | 0.4753281 | 0.5628141 | 0.6086721 | 0.1337209 | 0.6511628 | 34.1261452 | 116.6179736 |
| 10 | 0.4003021 | 0.2801609 | 1.3995772 | 1.9752523 | 0.3636364 | 0.3394054 | 0.5132075 | 0.5416095 | 0.1395349 | 0.7906977 | 39.9577167 | 97.5252304 |
| 11 | 0.5 | 0.1838036 | 0.8164200 | 1.7441860 | 0.2121212 | 0.2310811 | 0.4531722 | 0.4796914 | 0.0813953 | 0.8720930 | -18.3579986 | 74.4186047 |
| 12 | 0.5996979 | 0.1218909 | 0.5831572 | 1.5511686 | 0.1515152 | 0.1538460 | 0.4030227 | 0.4255207 | 0.0581395 | 0.9302326 | -41.6842847 | 55.1168649 |
| 13 | 0.6993958 | 0.0686841 | 0.4082100 | 1.3882415 | 0.1060606 | 0.0948750 | 0.3606911 | 0.3783876 | 0.0406977 | 0.9709302 | -59.1789993 | 38.8241499 |
| 14 | 0.7990937 | 0.0313631 | 0.1166314 | 1.2295907 | 0.0303030 | 0.0485601 | 0.3194707 | 0.3372371 | 0.0116279 | 0.9825581 | -88.3368569 | 22.9590715 |
| 15 | 0.8987915 | 0.0107587 | 0.1749471 | 1.1126050 | 0.0454545 | 0.0193192 | 0.2890756 | 0.3019723 | 0.0174419 | 1.0 | -82.5052854 | 11.2605042 |
| 16 | 1.0 | 0.0014169 | 0.0 | 1.0 | 0.0 | 0.0056128 | 0.2598187 | 0.2719782 | 0.0 | 1.0 | -100.0 | 0.0 |
'Tenure', the number of months the customer has stayed with the company, is the most important feature in this dataset. The longer a customer has stayed with the company, the smaller the probability that the customer will stop doing business with it.
The above PDP is for the column 'Tenure', which turned out to be one of the most important columns by variable importance. As the graph shows, the predicted probability of 'Churn' (whether a customer will leave) decreases as tenure increases. We can infer that if a customer stays for a longer duration, the chance of churn is reduced, so the curve levels down as tenure increases.
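H2O can generate such a PDP directly from a trained model. A sketch; the frame and the snake_case column name follow the cleaned frame used in the PDP tables below:

```python
# Partial dependence of the churn prediction on tenure.
glm_model.partial_plot(data=train, cols=["tenure"])
```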
| total_charges | mean_response | stddev_response | std_error_mean_response |
| --- | --- | --- | --- |
| 18.8 | 0.2137070 | 0.2401651 | 0.0038636 |
| 474.9052632 | 0.2284907 | 0.2505882 | 0.0040313 |
| 931.0105263 | 0.2436702 | 0.2606659 | 0.0041934 |
| 1387.1157895 | 0.2592092 | 0.2703500 | 0.0043492 |
| 1843.2210526 | 0.2750709 | 0.2795973 | 0.0044979 |
| 2299.3263158 | 0.2912182 | 0.2883697 | 0.0046391 |
| 2755.4315789 | 0.3076140 | 0.2966338 | 0.0047720 |
| 3211.5368421 | 0.3242221 | 0.3043611 | 0.0048963 |
| 3667.6421053 | 0.3410069 | 0.3115273 | 0.0050116 |
| 4123.7473684 | 0.3579340 | 0.3181119 | 0.0051175 |
| 4579.8526316 | 0.3749705 | 0.3240982 | 0.0052138 |
| 5035.9578947 | 0.3920849 | 0.3294726 | 0.0053003 |
| 5492.0631579 | 0.4092474 | 0.3342245 | 0.0053767 |
| 5948.1684211 | 0.4264302 | 0.3383460 | 0.0054431 |
| 6404.2736842 | 0.4436072 | 0.3418315 | 0.0054991 |
| 6860.3789474 | 0.4607542 | 0.3446780 | 0.0055449 |
| 7316.4842105 | 0.4778489 | 0.3468842 | 0.0055804 |
| 7772.5894737 | 0.4948709 | 0.3484511 | 0.0056056 |
| 8228.6947368 | 0.5118011 | 0.3493812 | 0.0056206 |
| 8684.8 | 0.5286223 | 0.3496788 | 0.0056254 |
Similar to 'Tenure', another important feature is 'total_charges'. Its PDP helps us understand that as a customer's total charges increase, the chance of churn rises; this makes complete sense, since high charges could be the reason customers drop the subscription.
|   | Tenure | partial_dependence | Percentile_0 | Percentile_10 | Percentile_20 | Percentile_30 | Percentile_40 | Percentile_50 | Percentile_60 | Percentile_70 | Percentile_80 | Percentile_90 | Percentile_99 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.00 | 0.655008 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 4.55 | 0.608783 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 2 | 8.10 | 0.559630 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 3 | 11.65 | 0.513713 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 4 | 15.20 | 0.469954 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 5 | 18.75 | 0.426502 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 6 | 22.30 | 0.378582 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 25.85 | 0.320493 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 29.40 | 0.269337 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 32.95 | 0.212481 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 10 | 36.50 | 0.158243 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 11 | 40.05 | 0.115100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 12 | 43.60 | 0.082589 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 13 | 47.15 | 0.054083 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 14 | 50.70 | 0.033436 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15 | 54.25 | 0.021109 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 16 | 57.80 | 0.011864 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 17 | 61.35 | 0.005393 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 18 | 64.90 | 0.002311 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 19 | 68.45 | 0.000616 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
The ICE plot for 'Tenure' shows, for each individual instance, how the prediction changes as 'Tenure' varies; averaging these per-instance curves yields the partial dependence plot shown above.
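Since `partial_plot` does not return the per-instance curves, here is a manual sketch of the ICE computation (the grid, sample size, and 'Yes' positive-class label are assumptions carried over from the cells above):

```python
import numpy as np

# Build ICE curves by sweeping tenure over a grid for a sample of rows
# and recording each row's predicted churn probability.
tenure_grid = np.linspace(1, 72, 20)
sample = train[:100, :]                       # small sample of instances

ice_curves = []
for value in tenure_grid:
    tmp = h2o.deep_copy(sample, "ice_tmp")
    tmp["tenure"] = float(value)              # override tenure for every row
    preds = glm_model.predict(tmp)
    ice_curves.append(preds["Yes"].as_data_frame()["Yes"].values)
```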
Confusion matrix (actual rows vs. predicted columns):

|   | No | Yes | Error | Rate |
| --- | --- | --- | --- | --- |
| No | 2495.0 | 354.0 | 0.1243 | (354.0/2849.0) |
| Yes | 214.0 | 822.0 | 0.2066 | (214.0/1036.0) |
| Total | 2709.0 | 1176.0 | 0.1462 | (568.0/3885.0) |
Maximum metrics at their respective thresholds:

| metric | threshold | value | idx |
| --- | --- | --- | --- |
| max f1 | 0.3760812 | 0.7432188 | 201.0 |
| max f2 | 0.2599200 | 0.8228181 | 249.0 |
| max f0point5 | 0.5358469 | 0.7384654 | 137.0 |
| max accuracy | 0.4226923 | 0.8550837 | 183.0 |
| max precision | 0.9288880 | 1.0 | 0.0 |
| max recall | 0.0296397 | 1.0 | 382.0 |
| max specificity | 0.9288880 | 1.0 | 0.0 |
| max absolute_mcc | 0.3760812 | 0.6441133 | 201.0 |
| max min_per_class_accuracy | 0.3300302 | 0.8378378 | 220.0 |
| max mean_per_class_accuracy | 0.2818741 | 0.8463057 | 240.0 |
Gains/Lift table:

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.0100386 | 0.8899302 | 3.75 | 3.75 | 1.0 | 0.9065725 | 1.0 | 0.9065725 | 0.0376448 | 0.0376448 | 275.0 | 275.0 |
| 2 | 0.0200772 | 0.8579185 | 3.5576923 | 3.6538462 | 0.9487179 | 0.8778309 | 0.9743590 | 0.8922017 | 0.0357143 | 0.0733591 | 255.7692308 | 265.3846154 |
| 3 | 0.0301158 | 0.8366076 | 3.6538462 | 3.6538462 | 0.9743590 | 0.8478174 | 0.9743590 | 0.8774069 | 0.0366795 | 0.1100386 | 265.3846154 | 265.3846154 |
| 4 | 0.0404118 | 0.8169945 | 3.28125 | 3.5589172 | 0.875 | 0.8268917 | 0.9490446 | 0.8645368 | 0.0337838 | 0.1438224 | 228.125 | 255.8917197 |
| 5 | 0.0501931 | 0.7901430 | 3.3552632 | 3.5192308 | 0.8947368 | 0.8036881 | 0.9384615 | 0.8526791 | 0.0328185 | 0.1766409 | 235.5263158 | 251.9230769 |
| 6 | 0.1001287 | 0.6838086 | 3.3827320 | 3.4511568 | 0.9020619 | 0.7367699 | 0.9203085 | 0.7948735 | 0.1689189 | 0.3455598 | 238.2731959 | 245.1156812 |
| 7 | 0.1500644 | 0.5917246 | 2.5322165 | 3.1453688 | 0.6752577 | 0.6386479 | 0.8387650 | 0.7428876 | 0.1264479 | 0.4720077 | 153.2216495 | 214.5368782 |
| 8 | 0.2 | 0.5229638 | 2.3002577 | 2.9343629 | 0.6134021 | 0.5565991 | 0.7824968 | 0.6963754 | 0.1148649 | 0.5868726 | 130.0257732 | 193.4362934 |
| 9 | 0.3001287 | 0.3784853 | 2.0051414 | 2.6243568 | 0.5347044 | 0.4510177 | 0.6998285 | 0.6145194 | 0.2007722 | 0.7876448 | 100.5141388 | 162.4356775 |
| 10 | 0.4 | 0.2616369 | 1.1501289 | 2.2562741 | 0.3067010 | 0.3181251 | 0.6016731 | 0.5405162 | 0.1148649 | 0.9025097 | 15.0128866 | 125.6274131 |
| 11 | 0.5003861 | 0.1649588 | 0.5 | 1.9039352 | 0.1333333 | 0.2125728 | 0.5077160 | 0.4747251 | 0.0501931 | 0.9527027 | -50.0 | 90.3935185 |
| 12 | 0.6 | 0.0988879 | 0.2422481 | 1.6280566 | 0.0645995 | 0.1315786 | 0.4341484 | 0.4177548 | 0.0241313 | 0.9768340 | -75.7751938 | 62.8056628 |
| 13 | 0.6998713 | 0.0552501 | 0.1449742 | 1.4164215 | 0.0386598 | 0.0741760 | 0.3777124 | 0.3687263 | 0.0144788 | 0.9913127 | -85.5025773 | 41.6421478 |
| 14 | 0.8 | 0.0339601 | 0.0674807 | 1.2475869 | 0.0179949 | 0.0430926 | 0.3326898 | 0.3279697 | 0.0067568 | 0.9980695 | -93.2519280 | 24.7586873 |
| 15 | 0.8998713 | 0.0212780 | 0.0193299 | 1.1112700 | 0.0051546 | 0.0276263 | 0.2963387 | 0.2946364 | 0.0019305 | 1.0 | -98.0670103 | 11.1270023 |
| 16 | 1.0 | 0.0109891 | 0.0 | 1.0 | 0.0 | 0.0176606 | 0.2666667 | 0.2669031 | 0.0 | 1.0 | -100.0 | 0.0 |
Different from the variable importance plot we generated from the Logistic Regression model, the most important variable in the Gradient Boosting model is 'Contract'. This is likely an effect of switching to a tree-based model, in which the 'Contract' column serves as a very important split variable.
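A sketch of pulling the best GBM from the AutoML leaderboard and plotting its variable importance (model ids as in the leaderboard above; the selection logic is an assumption):

```python
# Select the top-ranked GBM from the AutoML leaderboard.
lb = aml.leaderboard.as_data_frame()
best_gbm_id = lb.loc[lb["model_id"].str.startswith("GBM"), "model_id"].iloc[0]
gbm_model = h2o.get_model(best_gbm_id)

gbm_model.varimp_plot()   # 'Contract' ranks first here, per the text
```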
| tenure | mean_response | stddev_response | std_error_mean_response |
| --- | --- | --- | --- |
| 1.0 | 0.4034221 | 0.2908955 | 0.0046670 |
| 4.7368421 | 0.3325176 | 0.2527814 | 0.0040555 |
| 8.4736842 | 0.3245015 | 0.2471217 | 0.0039647 |
| 12.2105263 | 0.2921240 | 0.2306489 | 0.0037005 |
| 15.9473684 | 0.2706254 | 0.2223178 | 0.0035668 |
| 19.6842105 | 0.2605981 | 0.2229685 | 0.0035772 |
| 23.4210526 | 0.2130585 | 0.1886440 | 0.0030265 |
| 27.1578947 | 0.2157993 | 0.1934324 | 0.0031034 |
| 30.8947368 | 0.2145211 | 0.1928156 | 0.0030935 |
| 34.6315789 | 0.2145211 | 0.1928156 | 0.0030935 |
| 38.3684211 | 0.2105373 | 0.1890392 | 0.0030329 |
| 42.1052632 | 0.2054247 | 0.1833675 | 0.0029419 |
| 45.8421053 | 0.2019875 | 0.1820938 | 0.0029215 |
| 49.5789474 | 0.2037796 | 0.1868446 | 0.0029977 |
| 53.3157895 | 0.2021542 | 0.1739150 | 0.0027902 |
| 57.0526316 | 0.1782865 | 0.1582936 | 0.0025396 |
| 60.7894737 | 0.1482262 | 0.1329597 | 0.0021332 |
| 64.5263158 | 0.1517906 | 0.1362895 | 0.0021866 |
| 68.2631579 | 0.1517977 | 0.1362876 | 0.0021866 |
| 72.0 | 0.1191810 | 0.1090938 | 0.0017503 |
The PDP of 'Tenure' shown above is from the GBM model. It shows a trend similar to that of the Logistic Regression, but the curve is uneven because of the non-linearity that GBM possesses; the overall interpretation of the graph, however, is the same as for the Logistic Regression.
| total_charges | mean_response | stddev_response | std_error_mean_response |
| --- | --- | --- | --- |
| 18.8 | 0.2999703 | 0.2774126 | 0.0044507 |
| 467.6605263 | 0.2593645 | 0.2500084 | 0.0040111 |
| 916.5210526 | 0.2756536 | 0.2298681 | 0.0036879 |
| 1365.3815789 | 0.2670324 | 0.2321262 | 0.0037242 |
| 1814.2421053 | 0.2683494 | 0.2377486 | 0.0038144 |
| 2263.1026316 | 0.2771945 | 0.2405347 | 0.0038591 |
| 2711.9631579 | 0.2783816 | 0.2412736 | 0.0038709 |
| 3160.8236842 | 0.2531762 | 0.2227409 | 0.0035736 |
| 3609.6842105 | 0.2422233 | 0.2135560 | 0.0034262 |
| 4058.5447368 | 0.2508845 | 0.2205948 | 0.0035392 |
| 4507.4052632 | 0.2671597 | 0.2359473 | 0.0037855 |
| 4956.2657895 | 0.2468248 | 0.2171484 | 0.0034839 |
| 5405.1263158 | 0.2712422 | 0.2350661 | 0.0037713 |
| 5853.9868421 | 0.2627801 | 0.2348798 | 0.0037683 |
| 6302.8473684 | 0.2701906 | 0.2288810 | 0.0036721 |
| 6751.7078947 | 0.2591206 | 0.2288842 | 0.0036721 |
| 7200.5684211 | 0.2729574 | 0.2385147 | 0.0038267 |
| 7649.4289474 | 0.3157270 | 0.2462378 | 0.0039506 |
| 8098.2894737 | 0.3142197 | 0.2465738 | 0.0039560 |
| 8547.15 | 0.3142197 | 0.2465738 | 0.0039560 |
As an inference, the PDPs derived from the Logistic Regression and the GBM clearly show differences between the resulting graphs, which reflect the differences in the algorithms working within.
|   | Tenure | partial_dependence | Percentile_0 | Percentile_10 | Percentile_20 | Percentile_30 | Percentile_40 | Percentile_50 | Percentile_60 | Percentile_70 | Percentile_80 | Percentile_90 | Percentile_99 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.00 | 0.576888 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 4.55 | 0.507550 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 2 | 8.10 | 0.500000 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 3 | 11.65 | 0.456086 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 4 | 15.20 | 0.443143 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 5 | 18.75 | 0.420339 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 6 | 22.30 | 0.359322 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 7 | 25.85 | 0.330354 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 8 | 29.40 | 0.337442 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 9 | 32.95 | 0.336826 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 10 | 36.50 | 0.336826 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 11 | 40.05 | 0.322804 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 12 | 43.60 | 0.312481 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 13 | 47.15 | 0.318952 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 14 | 50.70 | 0.310015 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 15 | 54.25 | 0.306163 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 16 | 57.80 | 0.273960 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 17 | 61.35 | 0.192758 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 18 | 64.90 | 0.205239 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 19 | 68.45 | 0.205239 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
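The best XGBoost model is retrieved the same way; a sketch continuing from the leaderboard dataframe above:

```python
# Select the top-ranked XGBoost model and inspect its training metrics.
best_xgb_id = lb.loc[lb["model_id"].str.startswith("XGBoost"), "model_id"].iloc[0]
xgb_model = h2o.get_model(best_xgb_id)

xgb_model.confusion_matrix()   # shown below, with maximum metrics and gains/lift
```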
Confusion matrix (actual rows vs. predicted columns):

|   | No | Yes | Error | Rate |
| --- | --- | --- | --- | --- |
| No | 2682.0 | 167.0 | 0.0586 | (167.0/2849.0) |
| Yes | 91.0 | 945.0 | 0.0878 | (91.0/1036.0) |
| Total | 2773.0 | 1112.0 | 0.0664 | (258.0/3885.0) |
Maximum metrics at their respective thresholds:

| metric | threshold | value | idx |
| --- | --- | --- | --- |
| max f1 | 0.4160832 | 0.8798883 | 197.0 |
| max f2 | 0.3003411 | 0.9168793 | 234.0 |
| max f0point5 | 0.5387729 | 0.8942644 | 151.0 |
| max accuracy | 0.4264682 | 0.9341055 | 193.0 |
| max precision | 0.9776292 | 1.0 | 0.0 |
| max recall | 0.0642277 | 1.0 | 342.0 |
| max specificity | 0.9776292 | 1.0 | 0.0 |
| max absolute_mcc | 0.4160832 | 0.8350746 | 197.0 |
| max min_per_class_accuracy | 0.3879922 | 0.9285714 | 206.0 |
| max mean_per_class_accuracy | 0.3585542 | 0.9310284 | 215.0 |
Gains/Lift table:

| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.0100386 | 0.9424194 | 3.75 | 3.75 | 1.0 | 0.9552457 | 1.0 | 0.9552457 | 0.0376448 | 0.0376448 | 275.0 | 275.0 |
| 2 | 0.0200772 | 0.9212297 | 3.6538462 | 3.7019231 | 0.9743590 | 0.9316404 | 0.9871795 | 0.9434430 | 0.0366795 | 0.0743243 | 265.3846154 | 270.1923077 |
| 3 | 0.0301158 | 0.9050536 | 3.75 | 3.7179487 | 1.0 | 0.9124670 | 0.9914530 | 0.9331177 | 0.0376448 | 0.1119691 | 275.0 | 271.7948718 |
| 4 | 0.0401544 | 0.8860434 | 3.75 | 3.7259615 | 1.0 | 0.8956389 | 0.9935897 | 0.9237480 | 0.0376448 | 0.1496139 | 275.0 | 272.5961538 |
| 5 | 0.0501931 | 0.8648538 | 3.75 | 3.7307692 | 1.0 | 0.8763654 | 0.9948718 | 0.9142715 | 0.0376448 | 0.1872587 | 275.0 | 273.0769231 |
| 6 | 0.1001287 | 0.7814316 | 3.5953608 | 3.6632391 | 0.9587629 | 0.8228929 | 0.9768638 | 0.8686997 | 0.1795367 | 0.3667954 | 259.5360825 | 266.3239075 |
| 7 | 0.1500644 | 0.6925323 | 3.5180412 | 3.6149228 | 0.9381443 | 0.7361416 | 0.9639794 | 0.8245894 | 0.1756757 | 0.5424710 | 251.8041237 | 261.4922813 |
| 8 | 0.2 | 0.5959323 | 3.3247423 | 3.5424710 | 0.8865979 | 0.6410087 | 0.9446589 | 0.7787533 | 0.1660232 | 0.7084942 | 232.4742268 | 254.2471042 |
| 9 | 0.3001287 | 0.3844050 | 2.2172237 | 3.1003431 | 0.5912596 | 0.4889227 | 0.8267581 | 0.6820602 | 0.2220077 | 0.9305019 | 121.7223650 | 210.0343053 |
| 10 | 0.4 | 0.2166304 | 0.5605670 | 2.4662162 | 0.1494845 | 0.2954908 | 0.6576577 | 0.5855423 | 0.0559846 | 0.9864865 | -43.9432990 | 146.6216216 |
| 11 | 0.5001287 | 0.1206617 | 0.0867609 | 1.9898353 | 0.0231362 | 0.1632120 | 0.5306227 | 0.5009893 | 0.0086873 | 0.9951737 | -91.3239075 | 98.9835306 |
| 12 | 0.6 | 0.0609180 | 0.0483247 | 1.6666667 | 0.0128866 | 0.0882648 | 0.4444444 | 0.4322904 | 0.0048263 | 1.0 | -95.1675258 | 66.6666667 |
| 13 | 0.6998713 | 0.0307850 | 0.0 | 1.4288341 | 0.0 | 0.0440276 | 0.3810224 | 0.3768855 | 0.0 | 1.0 | -100.0 | 42.8834130 |
| 14 | 0.8 | 0.0131013 | 0.0 | 1.25 | 0.0 | 0.0209528 | 0.3333333 | 0.3323366 | 0.0 | 1.0 | -100.0 | 25.0 |
| 15 | 0.8998713 | 0.0049039 | 0.0 | 1.1112700 | 0.0 | 0.0084077 | 0.2963387 | 0.2963857 | 0.0 | 1.0 | -100.0 | 11.1270023 |
| 16 | 1.0 | 0.0003411 | 0.0 | 1.0 | 0.0 | 0.0024668 | 0.2666667 | 0.2669560 | 0.0 | 1.0 | -100.0 | 0.0 |
| tenure | mean_response | stddev_response | std_error_mean_response |
| --- | --- | --- | --- |
| 1.0 | 0.3390722 | 0.3175808 | 0.0039421 |
| 4.7368421 | 0.3265676 | 0.3092209 | 0.0038384 |
| 8.4736842 | 0.3189497 | 0.3025351 | 0.0037554 |
| 12.2105263 | 0.2732323 | 0.2627196 | 0.0032611 |
| 15.9473684 | 0.2541345 | 0.2545033 | 0.0031592 |
| 19.6842105 | 0.2689445 | 0.2609253 | 0.0032389 |
| 23.4210526 | 0.2041085 | 0.2114047 | 0.0026242 |
| 27.1578947 | 0.2120031 | 0.2326531 | 0.0028879 |
| 30.8947368 | 0.2089505 | 0.2294045 | 0.0028476 |
| 34.6315789 | 0.2079895 | 0.2272737 | 0.0028212 |
| 38.3684211 | 0.2144646 | 0.2273624 | 0.0028223 |
| 42.1052632 | 0.2050112 | 0.2244768 | 0.0027864 |
| 45.8421053 | 0.1991063 | 0.2210418 | 0.0027438 |
| 49.5789474 | 0.2045538 | 0.2324444 | 0.0028853 |
| 53.3157895 | 0.2221776 | 0.2186239 | 0.0027138 |
| 57.0526316 | 0.1401536 | 0.1683525 | 0.0020898 |
| 60.7894737 | 0.1114619 | 0.1456679 | 0.0018082 |
| 64.5263158 | 0.1023254 | 0.1369044 | 0.0016994 |
| 68.2631579 | 0.1255760 | 0.1615960 | 0.0020059 |
| 72.0 | 0.0795766 | 0.1187945 | 0.0014746 |
If observed closely, the PDP results for GBM and XGBoost are similar and very closely related. That is because both algorithms are tree-based models, and the resemblance reflects how the models work internally. This is a clear illustration of how models of similar origin help us interpret the explainable methods.
| total_charges | mean_response | stddev_response | std_error_mean_response |
| --- | --- | --- | --- |
| 18.8 | 0.4238615 | 0.3261315 | 0.0040483 |
| 474.9052632 | 0.2464518 | 0.2738153 | 0.0033989 |
| 931.0105263 | 0.2916773 | 0.2639648 | 0.0032766 |
| 1387.1157895 | 0.2316458 | 0.2615037 | 0.0032461 |
| 1843.2210526 | 0.2017253 | 0.2400597 | 0.0029799 |
| 2299.3263158 | 0.2147928 | 0.2491192 | 0.0030923 |
| 2755.4315789 | 0.2225582 | 0.2517619 | 0.0031251 |
| 3211.5368421 | 0.2479061 | 0.2703625 | 0.0033560 |
| 3667.6421053 | 0.1678940 | 0.1976482 | 0.0024534 |
| 4123.7473684 | 0.2350038 | 0.2451407 | 0.0030429 |
| 4579.8526316 | 0.1998015 | 0.2325371 | 0.0028865 |
| 5035.9578947 | 0.2053192 | 0.2169033 | 0.0026924 |
| 5492.0631579 | 0.2459624 | 0.2454963 | 0.0030474 |
| 5948.1684211 | 0.2257868 | 0.2376747 | 0.0029503 |
| 6404.2736842 | 0.2348899 | 0.2398839 | 0.0029777 |
| 6860.3789474 | 0.2202493 | 0.2335130 | 0.0028986 |
| 7316.4842105 | 0.2375569 | 0.2388472 | 0.0029648 |
| 7772.5894737 | 0.2341314 | 0.2315600 | 0.0028744 |
| 8228.6947368 | 0.2336091 | 0.2314393 | 0.0028729 |
| 8684.8 | 0.2336091 | 0.2314393 | 0.0028729 |
|   | Total Charges | partial_dependence | Percentile_0 | Percentile_10 | Percentile_20 | Percentile_30 | Percentile_40 | Percentile_50 | Percentile_60 | Percentile_70 | Percentile_80 | Percentile_90 | Percentile_99 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 18.8 | 0.574422 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 452.1 | 0.333128 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 2 | 885.4 | 0.338829 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 3 | 1318.7 | 0.310015 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 4 | 1752.0 | 0.310786 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 5 | 2185.3 | 0.326965 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 6 | 2618.6 | 0.306626 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 7 | 3051.9 | 0.359168 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 8 | 3485.2 | 0.290755 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 9 | 3918.5 | 0.200462 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 10 | 4351.8 | 0.452080 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 11 | 4785.1 | 0.441757 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 12 | 5218.4 | 0.252388 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 13 | 5651.7 | 0.271495 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 14 | 6085.0 | 0.417720 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 15 | 6518.3 | 0.323267 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 16 | 6951.6 | 0.354854 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 17 | 7384.9 | 0.274268 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 18 | 7818.2 | 0.329584 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 19 | 8251.5 | 0.328351 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |