kaggle_Bosch_Production_Line_Performance

Kaggle Bosch Production Line Performance, No. 74 / top 6% (post-competition analysis)


Bosch Production Line Performance, NO.74/top 6%

A post-competition analysis of the Kaggle Bosch Production Line Performance competition. E-mail: [email protected]


Summary: out of more than 4,000 variables, the fitted model uses only 50 and still reaches the top 6%.

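Note: the R code uses mclapply from the parallel package and was run on Linux. Where mclapply cannot run in parallel (for example on Windows), replace mclapply with sapply.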
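As a minimal sketch of that substitution, the wrapper below uses mclapply where forking is available and falls back to sapply otherwise. The name par_sapply and the cores argument are hypothetical and are not taken from the repository's code.

```r
library(parallel)

# Hypothetical helper: apply f to every element of xs,
# in parallel where forking is available, serially otherwise.
par_sapply <- function(xs, f, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    # mclapply cannot fork on Windows, so use the serial sapply instead
    return(sapply(xs, f))
  }
  # mclapply returns a list; unlist() assumes f returns length-one values
  unlist(mclapply(xs, f, mc.cores = cores))
}

squares <- par_sapply(1:4, function(i) i^2)
print(squares)
```

Keeping the choice inside one function means the rest of a script does not need to know which platform it runs on.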

ps :


1.

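About 90% of the values are NA.

With this many NAs, selection methods such as AIC/BIC/lasso cannot be applied directly, because they do not handle missing values, so XGB importance is used to select variables instead. The feature engineering follows the kernel code by Daniel FG.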

2.

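The Bosch Production Line Performance data contain more than 4,000 variables. The binary Response is heavily imbalanced:

Response   0         1
count      1176868   6879

Rate of Response 1 = 6879 / (1176868 + 6879) = 0.0058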

2.1

Data files provided by Kaggle:

data                size     n (×10^4 rows)   p (columns)   RAM in R
train_numeric       2.1GB    100              970           8.5 GB
train_date          2.9GB    100              1157          10.2 GB
train_categorical   2.7GB    100              2141
test_numeric        2.1GB    100              969
test_date           2.9GB    100              1157
test_categorical    2.7GB    100              2141

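Variables:

Response: the target; 0 = the part passed quality control, 1 = the part failed.
Id: identifier of each part.
Lx_Sx_Fx: L = line, S = station, F = feature number.

For example, L3_S36_F3939 is feature number 3939, measured at station 36 on line 3. The numeric, date, and categorical files all use this naming scheme.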
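Because the line, station, and feature number are encoded in each column name, they can be recovered with a regular expression. The sketch below assumes names of exactly the Lx_Sx_Fx form; parse_feature_name is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper: split names such as "L3_S36_F3939" into their parts.
# Assumes every element of x matches the Lx_Sx_Fx pattern.
parse_feature_name <- function(x) {
  parts <- regmatches(x, regexec("^L([0-9]+)_S([0-9]+)_F([0-9]+)$", x))
  mat <- do.call(rbind, parts)  # one row per name: full match plus 3 groups
  data.frame(
    name    = mat[, 1],
    line    = as.integer(mat[, 2]),
    station = as.integer(mat[, 3]),
    feature = as.integer(mat[, 4]),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_feature_name(c("L3_S36_F3939", "L0_S0_F0"))
print(parsed)
```

A table like this makes it easy to group columns by line, which is how the L0 to L3 features later in this document are organised.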

train_numeric :

Id L0_S0_F0 L0_S0_F2 L0_S0_F4 L0_S0_F6 Response
11 -0.055 -0.086 0.294 0.330 0
13 0.003 0.019 0.294 0.312 0
14 NA NA NA NA 0
16 NA NA NA NA 0
18 -0.016 -0.041 -0.179 -0.179 0

train_date

Id L0_S0_F0 L0_S0_F2 L0_S0_F4 L0_S0_F6
11 602.64 602.64 602.64 602.64
13 1331.66 1331.66 1331.66 1331.66
14 NA NA NA NA
16 NA NA NA NA
18 517.64 517.64 517.64 517.64

Evaluation metric: MCC (Matthews correlation coefficient)
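For reference, MCC is computed from the four cells of a binary confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function below is a small sketch of that formula in base R; the name mcc and the choice to return 0 for an empty row or column are assumptions for illustration.

```r
# Matthews correlation coefficient from confusion-matrix counts.
mcc <- function(tp, tn, fp, fn) {
  # Convert to double first: products of counts this large overflow R integers
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(0)  # assumed convention when a row or column is empty
  (tp * tn - fp * fn) / denom
}

# Counts taken from the 450-variable confusion matrix in section 3.4
mcc_450 <- mcc(tp = 2663, tn = 1176353, fp = 515, fn = 4216)
print(round(mcc_450, 3))
```

Applied to the 450-variable confusion matrix in section 3.4, this gives about 0.568, which agrees with the MCC reported there.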

3.

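3.1 feature engineering 1

Features computed from the date data, one value per Id:

feature        description
first          first recorded timestamp
min            smallest timestamp
last           last recorded timestamp
max            largest timestamp
class.amount   number of distinct timestamp values
na.amount      number of NA values

Each feature is computed over all lines and separately for L0, L1, L2, and L3, e.g. all_first, L0_first, L1_first, L2_first, L3_first. With feature engineering 1 alone, the Kaggle rank was only around the top 50%, which led to feature engineering 2.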
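The sketch below shows one way the features first, min, last, max, class.amount, and na.amount could be computed per row, for all columns and for a single line. The toy matrix, its column names, and the helper names are assumptions for illustration; the exact definitions in the author's code may differ.

```r
# Hypothetical helper: summarise one row of timestamps.
date_row_features <- function(x) {
  obs <- unname(x[!is.na(x)])  # observed timestamps, in column order
  if (length(obs) == 0) {
    return(c(first = NA_real_, min = NA_real_, last = NA_real_, max = NA_real_,
             class.amount = 0, na.amount = length(x)))
  }
  c(first = obs[1], min = min(obs), last = obs[length(obs)], max = max(obs),
    class.amount = length(unique(obs)),  # distinct timestamp values
    na.amount = sum(is.na(x)))
}

# Hypothetical helper: prefix the feature names, e.g. "all_first", "L0_first".
add_prefix <- function(feats, prefix) {
  colnames(feats) <- paste(prefix, colnames(feats), sep = "_")
  feats
}

# Toy date matrix (made-up values and column names)
toy <- rbind(c(602.64, 602.64, NA, 605.10),
             c(NA, NA, NA, NA),
             c(517.64, 517.64, 517.70, NA))
colnames(toy) <- c("L0_S0_D1", "L0_S0_D3", "L1_S24_D5", "L3_S32_D7")

all_feats <- add_prefix(t(apply(toy, 1, date_row_features)), "all")
l0_cols <- grepl("^L0_", colnames(toy))
l0_feats <- add_prefix(t(apply(toy[, l0_cols, drop = FALSE], 1, date_row_features)), "L0")
print(cbind(all_feats, l0_feats))
```

On the real files the same function would be applied in column blocks, since the full date table does not fit comfortably in memory on a small machine.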

3.2 feature engineering 2

feature           code
next
prev
total             next + prev
same.time         total > 0
order.same.time   (cumsum(prev) + 1) * same.time
group
group.length      table(group)
cost.time         max - min
prev.cost.time    cost.time - c(NA, cost.time[1:(length(cost.time) - 1)])
next.cost.time    cost.time - c(cost.time[2:length(cost.time)], NA)
prev.na.amount    na.amount - c(NA, na.amount[1:(length(na.amount) - 1)])
next.na.amount    na.amount - c(na.amount[2:length(na.amount)], NA)
prev.target       c(NA, target[1:(nrow(target) - 1)])
next.target       c(target[2:nrow(target)], NA)

These features are also computed separately for L0 to L3.
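The prev.* and next.* features are all built from the same shift-by-one pattern. The sketch below wraps that pattern in two hypothetical helpers, lag1 and lead1, and applies it to a toy table. It assumes the rows are already in the order the author used and that target is a plain vector; the values are made up.

```r
# Shift a vector down by one (previous row's value) or up by one (next row's value).
lag1  <- function(x) c(NA, x[-length(x)])
lead1 <- function(x) c(x[-1], NA)

# Toy rows, assumed to be already sorted in the intended order
parts <- data.frame(min    = c(10, 12, 12, 20),
                    max    = c(15, 12.5, 18, 21),
                    target = c(0, 1, 0, 0))

parts$cost.time      <- parts$max - parts$min
parts$prev.cost.time <- parts$cost.time - lag1(parts$cost.time)
parts$next.cost.time <- parts$cost.time - lead1(parts$cost.time)
parts$prev.target    <- lag1(parts$target)
parts$next.target    <- lead1(parts$target)
print(parts)
```

Compared with indexing by 1:(length(x) - 1), the negative-index form also behaves sensibly for a vector of length one, where it simply returns NA.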

3.3

train_numeric

100 100

100 100 0.058 0.045 ID L3_S32_F3850

train_numeric :

- res1.per var.name
1 0.0451 L3_S32_F3850
2 0.0093 L1_S24_F1768
3 0.0093 L1_S24_F1763
. ... ...
. ... ...
968 0.0003 L1_S25_F2512

3.4 feature selection

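Feature engineering 1 and feature engineering 2 leave about 450 candidate variables. XGBoost is fitted on them, with xgb.cv used to find the best nrounds.

xgb.importance then ranks the variables, and the top 50 are kept as the features of the fitted model.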
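The selection step amounts to sorting an importance table and keeping the first k names. The sketch below uses a toy data frame with Feature and Gain columns as a stand-in for the table that xgb.importance returns. The feature names and gain values are made up, and top_k_features is a hypothetical helper.

```r
# Toy stand-in for an importance table (made-up names and gains)
imp <- data.frame(Feature = c("f1", "f2", "f3", "f4", "f5"),
                  Gain    = c(0.12, 0.45, 0.05, 0.30, 0.08),
                  stringsAsFactors = FALSE)

# Hypothetical helper: names of the k variables with the largest Gain.
top_k_features <- function(imp, k) {
  ranked <- imp[order(imp$Gain, decreasing = TRUE), ]
  head(ranked$Feature, k)
}

top3 <- top_k_features(imp, 3)
print(top3)
```

In the analysis described here k would be 50, and the returned names would be used to subset the training matrix before refitting.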

Number of variables = 450

            pred
            0         1
real 0      1176353   515
     1      4216      2663

MCC = 0.568

Number of variables = 50

            pred
            0         1
real 0      1176304   564
     1      4360      2519

MCC = 0.545

( 0 )( 1 )( )

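xgb.cv is used to find the best nrounds. The xgb.cv results by nrounds:

nrounds   train-rmse   test-rmse
11        0.168337     0.168881
21        0.083219     0.085046
31        0.064824     0.067989
41        0.061830     0.065582
51        0.061191     0.065279
61        0.060756     0.065227
71        0.060327     0.065229

Best iteration : 67, train-rmse:0.060464 test-rmse:0.065220

nrounds = 50 is used: by round 51 the test-rmse has nearly stopped improving (0.065279, against 0.065220 at the best iteration), and this value is used when fitting the model for train and test.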
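One way to make a choice like nrounds = 50 reproducible is to take the smallest nrounds whose test-rmse is within a small tolerance of the best value in the log. The sketch below applies that rule to the numbers in the table above. The rule, the tolerance of 1e-4, and the name pick_nrounds are assumptions; the document does not state how the value was chosen.

```r
# Evaluation log copied from the xgb.cv table above
cv_log <- data.frame(
  nrounds    = c(11, 21, 31, 41, 51, 61, 71),
  train_rmse = c(0.168337, 0.083219, 0.064824, 0.061830, 0.061191, 0.060756, 0.060327),
  test_rmse  = c(0.168881, 0.085046, 0.067989, 0.065582, 0.065279, 0.065227, 0.065229)
)

# Hypothetical rule: smallest nrounds whose test-rmse is within tol of the best.
pick_nrounds <- function(log, tol = 1e-4) {
  best <- min(log$test_rmse)
  log$nrounds[which(log$test_rmse <= best + tol)[1]]
}

chosen <- pick_nrounds(cv_log)
print(chosen)
```

With these numbers the rule selects round 51, in line with the nrounds of about 50 used in this analysis; setting tol to 0 would select round 61, the lowest test-rmse among the logged rounds.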

3.5 other

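Two issues remain: the target is imbalanced, and the competition's evaluation metric, MCC, is not a built-in XGBoost evaluation metric.

  1. XGBoost is trained with rmse as its evaluation metric in place of MCC.
  2. For the imbalanced target, a cutoff of 0.25 is used:
    a prediction greater than 0.25 is labeled 1, and a prediction less than 0.25 is labeled 0.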
    0 0
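Assuming the 0.25 above is a cutoff applied to the predicted values, the labeling step and the resulting confusion matrix can be sketched as follows. The predictions and true labels are made-up toy values, classify is a hypothetical helper, and treating a prediction of exactly 0.25 as 0 is an assumption.

```r
# Hypothetical helper: turn continuous predictions into 0/1 labels.
# A prediction exactly equal to the cutoff is labeled 0 (assumed convention).
classify <- function(pred, cutoff = 0.25) as.integer(pred > cutoff)

pred <- c(0.02, 0.31, 0.10, 0.80, 0.25)  # toy model outputs
real <- c(0, 1, 0, 1, 1)                 # toy true labels

label <- classify(pred)

# Fix the factor levels so the table is always 2 x 2, even if a class is absent
conf <- table(real = factor(real, levels = 0:1),
              pred = factor(label, levels = 0:1))
print(conf)
```

Because so few parts have Response 1, a cutoff well below 0.5 lets more predicted failures through; the cutoff itself can be tuned by scanning several values and comparing the MCC of each.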

4. Fitted model

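The candidate variables come from feature engineering 1, feature engineering 2, and train_numeric. xgb.importance selects the top 50 features, and XGBoost is fitted on those features. Because MCC is not available as an XGBoost evaluation metric, rmse is used, and a cutoff of 0.25 handles the imbalance.

The fitted model ranks in the top 6%, and MCC more than doubles (0.18 -> 0.46) on the strength of the features.

The 50 features: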

train_numeric

L3_S32_F3850** L1_S24_F1723** L3_S33_F3859** L3_S33_F3855 L1_S24_F1846
L3_S33_F3865 L1_S24_F1632 L3_S33_F3857 L3_S38_F3956 L1_S24_F1498
L1_S24_F1604 L3_S41_F4014 L1_S24_F1695 L3_S38_F3952 L3_S33_F3873
L1_S24_F1844 L3_S38_F3960 L2_S26_F3036 L2_S26_F3040 L2_S26_F3047
L2_S26_F3073 L1_S24_F1672 L1_S24_F1609 L1_S24_F1685

feature of train_date

all_next** next.cost.time** next.target** L0_first**
all_prev** prev.target** group.amount** next.na.amount
all_na.amount L3_first total cost.time
L2_first L3_na.amount prev.cost.time group
L3_last prev.na.amount L3_min L0_min
all_first all_class.amount L1_first L3_max
order.same.time L0_last

feature plot :

ML 1. 2. ( ) 3. model: type 1 error vs type 2 error 4. data, model, 100, data, data vs ML, DATA/ 5. DL/ensemble 6.

Reference

Bosch Production Line Performance. ( 2016 )

Daniel FG. ( 2016 )