Bosch Production Line Performance, NO.74/top 6%

Competition: Bosch Production Line Performance (Kaggle). E-mail: [email protected]

Summary: the data have more than 4000 variables. The fitted model uses only 50 of them and ranks in the top 6%.

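Language: R. The code parallelizes with mclapply from the parallel package, which relies on process forking and was run on Linux. Where forking is unavailable (e.g. Windows), replace mclapply with sapply.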
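Here is a minimal sketch of that switch, for a script that should run on both Linux and Windows. `par_sapply` is a hypothetical helper that is not part of the original code, and the default of 2 cores is an assumption.

```r
library(parallel)

# Hypothetical helper: parallel map where forking is available, plain sapply otherwise.
# mclapply forks worker processes, which Windows does not support.
par_sapply <- function(X, FUN, cores = 2) {
  if (.Platform$OS.type == "unix") {
    # mclapply returns a list; simplify it the way sapply would
    simplify2array(mclapply(X, FUN, mc.cores = cores))
  } else {
    sapply(X, FUN)
  }
}

# Usage: count the NAs in each column of a small matrix
m <- matrix(c(1, NA, NA, NA, 5, 6), nrow = 2)
na_per_col <- par_sapply(seq_len(ncol(m)), function(j) sum(is.na(m[, j])))
print(na_per_col)
```

Both branches return the same vector, so the rest of the script does not need to know which one ran.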


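1. Challenges

The data are large (see 2.1), and roughly 90% of the values are NA.

Because of the NAs, standard variable-selection methods such as AIC/BIC or the lasso cannot be applied directly, because they do not handle missing values. XGBoost feature importance is used instead. Daniel FG's kernel code was the main reference for the feature engineering.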

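2. Competition overview

Bosch Production Line Performance: predict which parts fail quality control (Response = 1) from measurements recorded along the production line.

The data contain more than 4000 variables, and the classes are highly imbalanced: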

Response	0	1
count	1176868	6879

Rate of Response = 1: 6879 / (1176868 + 6879) ≈ 0.0058

2.1 Data

Kaggle provides six files:

file	size	n (rows, ×10,000)	p (columns)	RAM used in R
train_numeric	2.1GB	100	970	8.5 GB
train_date	2.9GB	100	1157	10.2 GB
train_categorical	2.7GB	100	2141
test_numeric	2.1GB	100	969
test_date	2.9GB	100	1157
test_categorical	2.7GB	100	2141


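Response	target: 0 = pass, 1 = fail (the part fails quality control)
Id	part ID
Lx_Sx_Fx	L: line, S: station, F: feature number

Example: L3_S36_F3939 is feature 3939 measured at station 36 on line 3. The variables are split across the numeric, date and categorical files.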
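Every column name follows the Lx_Sx_Fx pattern, so the line, station and feature number can be recovered from the name itself. The sketch below is a minimal base-R version. `parse_feature_name` is a hypothetical helper, and it assumes every name matches `L<digits>_S<digits>_F<digits>` exactly.

```r
# Split names of the form L<line>_S<station>_F<feature> into their numeric parts
parse_feature_name <- function(x) {
  pattern <- "^L([0-9]+)_S([0-9]+)_F([0-9]+)$"
  stopifnot(all(grepl(pattern, x)))  # fail loudly on names that do not fit
  data.frame(
    name    = x,
    line    = as.integer(sub(pattern, "\\1", x)),
    station = as.integer(sub(pattern, "\\2", x)),
    feature = as.integer(sub(pattern, "\\3", x)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_feature_name(c("L3_S36_F3939", "L0_S0_F2"))
print(parsed)
```

Grouping columns by the `line` field gives per-line subsets such as the L0 to L3 features used in section 3.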

Sample of train_numeric:

Id	L0_S0_F0	L0_S0_F2	L0_S0_F4	L0_S0_F6
11	-0.055	-0.086	0.294	0.330
13	0.003	0.019	0.294	0.312
14	NA	NA	NA	NA
16	NA	NA	NA	NA
18	-0.016	-0.041	-0.179	-0.179

Sample of train_date:

Id	L0_S0_F0	L0_S0_F2	L0_S0_F4	L0_S0_F6
11	602.64	602.64	602.64	602.64
13	1331.66	1331.66	1331.66	1331.66
14	NA	NA	NA	NA
16	NA	NA	NA	NA
18	517.64	517.64	517.64	517.64

Evaluation metric: MCC (Matthews correlation coefficient).
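MCC uses all four cells of the confusion matrix: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Below is a minimal R sketch. `mcc` is a hypothetical helper name, and returning 0 when a row or column of the matrix is empty is a convention assumed here. The example uses the 450-variable confusion matrix from section 3.4.

```r
# Matthews correlation coefficient from confusion-matrix counts.
# Counts are converted to double so the products cannot overflow integer range.
mcc <- function(tp, tn, fp, fn) {
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)
  num <- tp * tn - fp * fn
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(0)  # assumed convention for an empty margin
  num / den
}

# 450-variable model: real 1 / pred 1 = TP, real 0 / pred 1 = FP, and so on
mcc(tp = 2663, tn = 1176353, fp = 515, fn = 4216)  # about 0.568
```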

3. Method

3.1 Feature engineering 1

Features computed from the date data, per part:

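feature	description
first	first non-NA timestamp (in column order)
min	earliest timestamp
last	last non-NA timestamp (in column order)
max	latest timestamp
class.amount	number of distinct timestamps
na.amount	number of NA values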

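Each feature is computed over all lines and separately for each line L0, L1, L2 and L3 (e.g. all_first, L0_first, L1_first, L2_first, L3_first) for each of the roughly 1 million parts. With feature engineering 1 alone, the Kaggle rank was only around the top 50%, which led to feature engineering 2.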
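Below is a minimal sketch of feature engineering 1. It assumes the date data form a numeric matrix with one row per part and columns named Lx_Sx_Fx. Two readings are assumptions: first/last are taken as the first and last non-NA values in column order, and class.amount as the number of distinct timestamps. `row_date_features` and `date_features_by_line` are hypothetical names.

```r
# Per-row date features for one block of columns (rows = parts, columns = date features)
row_date_features <- function(d) {
  first_non_na <- function(v) { v <- v[!is.na(v)]; if (length(v) > 0) v[1] else NA_real_ }
  last_non_na  <- function(v) { v <- v[!is.na(v)]; if (length(v) > 0) v[length(v)] else NA_real_ }
  safe_min <- function(v) if (all(is.na(v))) NA_real_ else min(v, na.rm = TRUE)
  safe_max <- function(v) if (all(is.na(v))) NA_real_ else max(v, na.rm = TRUE)
  data.frame(
    first        = apply(d, 1, first_non_na),
    min          = apply(d, 1, safe_min),
    last         = apply(d, 1, last_non_na),
    max          = apply(d, 1, safe_max),
    class.amount = apply(d, 1, function(v) length(unique(v[!is.na(v)]))),
    na.amount    = rowSums(is.na(d))
  )
}

# Features over all columns ("all_") and per line ("L0_" .. "L3_");
# lines with no columns in `d` are skipped
date_features_by_line <- function(d, lines = paste0("L", 0:3)) {
  all_f <- row_date_features(d)
  names(all_f) <- paste0("all_", names(all_f))
  per_line <- lapply(lines, function(l) {
    cols <- startsWith(colnames(d), paste0(l, "_"))
    if (!any(cols)) return(NULL)
    f <- row_date_features(d[, cols, drop = FALSE])
    names(f) <- paste0(l, "_", names(f))
    f
  })
  do.call(cbind, c(list(all_f), Filter(Negate(is.null), per_line)))
}

# Toy data: 3 parts, 4 illustrative date columns (no L2 columns)
d <- matrix(c( 5,  6, NA,  9,
              NA, NA, NA, NA,
               3,  3,  4, NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("L0_S0_F0", "L0_S0_F2", "L1_S24_F1", "L3_S32_F3")))
feats <- date_features_by_line(d)
print(feats)
```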

3.2 Feature engineering 2

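feature	description	code
next	1 if the next part (in sorted order) has the same timestamp, else 0
prev	1 if the previous part has the same timestamp, else 0
total	next + prev	next+prev
same.time	the part shares its timestamp with a neighbour	total>0
order.same.time	position within a same-time run (0 if none)	(cumsum(prev)+1) * same.time
group	id of the same-time run
group.length	number of parts in the group	table(group)
cost.time	processing time	max-min
prev.cost.time	change in cost.time from the previous part	cost.time-c(NA,cost.time[1:(length(cost.time)-1)])
next.cost.time	change in cost.time relative to the next part	cost.time-c(cost.time[2:length(cost.time)],NA)
prev.na.amount	change in na.amount from the previous part	na.amount-c(NA,na.amount[1:(length(na.amount)-1)])
next.na.amount	change in na.amount relative to the next part	na.amount-c(na.amount[2:length(na.amount)],NA)
prev.target	Response of the previous part	c(NA,target[1:(nrow(target)-1)])
next.target	Response of the next part	c(target[2:nrow(target)],NA)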

As in feature engineering 1, these features are also computed for lines L0~L3.
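The sketch below runs feature engineering 2 on a toy table. It rests on assumptions that the original does not state. Parts are sorted by their first timestamp and then by Id. next/prev flag whether the neighbouring part in that order has the same first timestamp. The order.same.time formula is applied within each same-time group. `fe2`, `lag1`, `lead1` and `same_val` are hypothetical helpers. prev.target/next.target read the neighbours' Response, so they are only defined where those labels are known.

```r
lag1  <- function(x) c(NA, x[-length(x)])  # value of the previous row
lead1 <- function(x) c(x[-1], NA)          # value of the next row
same_val <- function(a, b) !is.na(a) & !is.na(b) & a == b  # NA-safe equality

fe2 <- function(df) {
  df <- df[order(df$first, df$Id), ]
  # `next` is a reserved word in R, so that column is accessed with [[ ]]
  df[["next"]] <- as.integer(same_val(df$first, lead1(df$first)))
  df[["prev"]] <- as.integer(same_val(df$first, lag1(df$first)))
  df$total <- df[["next"]] + df[["prev"]]
  df$same.time <- as.integer(df$total > 0)
  df$group <- cumsum(df$prev == 0)  # a new group starts wherever prev == 0
  df$order.same.time <- (ave(df$prev, df$group, FUN = cumsum) + 1) * df$same.time
  df$group.length <- as.vector(table(df$group)[as.character(df$group)])
  df$cost.time <- df$max - df$min
  df$prev.cost.time <- df$cost.time - lag1(df$cost.time)
  df$next.cost.time <- df$cost.time - lead1(df$cost.time)
  # single minus: `na.amount--x` would parse as na.amount + x
  df$prev.na.amount <- df$na.amount - lag1(df$na.amount)
  df$next.na.amount <- df$na.amount - lead1(df$na.amount)
  df$prev.target <- lag1(df$Response)
  df$next.target <- lead1(df$Response)
  rownames(df) <- NULL
  df
}

toy <- data.frame(
  Id        = c(4, 6, 7, 9, 11),
  first     = c(10, 10, 20, 10, 30),
  min       = c(10, 10, 20, 10, 30),
  max       = c(15, 12, 26, 11, 30),
  na.amount = c(5, 7, 3, 5, 9),
  Response  = c(0, 1, 0, 0, 0)
)
out <- fe2(toy)
print(out[, c("Id", "next", "prev", "order.same.time", "group.length", "cost.time")])
```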

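3.3 Response rate per numeric variable

For each variable in train_numeric, res1.per is the proportion of Response = 1 among the parts (Ids) for which that variable is not NA.

L3_S32_F3850 has the highest rate, 0.0451, far above the overall rate of 0.0058.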

Ranking of the train_numeric variables by res1.per:

rank	res1.per	var.name
1	0.0451	L3_S32_F3850
2	0.0093	L1_S24_F1768
3	0.0093	L1_S24_F1763
.	...	...
.	...	...
968	0.0003	L1_S25_F2512
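Below is a minimal sketch of the res1.per ranking. It assumes res1.per is the share of Response = 1 among the parts where the variable is not NA. `response_rate_by_var` is a hypothetical helper, and the toy data are made up.

```r
# Share of Response == 1 among rows where each variable is observed, sorted descending
response_rate_by_var <- function(x, response) {
  rates <- vapply(names(x), function(v) {
    seen <- !is.na(x[[v]])
    if (!any(seen)) NA_real_ else mean(response[seen] == 1)
  }, numeric(1))
  out <- data.frame(res1.per = unname(rates), var.name = names(rates),
                    stringsAsFactors = FALSE)
  out <- out[order(-out$res1.per), ]  # all-NA variables end up last
  rownames(out) <- NULL
  out
}

x <- data.frame(A = c(1, NA, NA, 2, NA, NA),  # observed for 2 parts, one of them failing
                B = c(1, 2, 3, 4, 5, 6))      # observed for every part
y <- c(1, 0, 0, 0, 0, 0)
res <- response_rate_by_var(x, y)
print(res)
```

A variable that is observed only for a small, failure-prone subset of parts ranks high, which is the pattern the table shows for L3_S32_F3850.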

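3.4 Feature selection

Feature engineering 1 and feature engineering 2 give 450 variables in total. An XGBoost model is fitted on them: xgb.cv finds the best nrounds, and the model is refit with that best nrounds.

xgb.importance is then used to keep the 50 most important variables, and the fitted model uses these 50 features.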
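Below is a minimal sketch of the top-50 cut. It assumes an importance table with `Feature` and `Gain` columns, like the one xgb.importance returns. Ranking by Gain is itself an assumption. The rows are toy values rather than the competition's importances, and `top_k_features` is a hypothetical helper.

```r
# Keep the k features with the largest Gain
top_k_features <- function(imp, k = 50) {
  imp <- imp[order(-imp$Gain), ]
  head(imp$Feature, k)
}

imp <- data.frame(
  Feature = c("L3_S32_F3850", "all_next", "L0_first", "cost.time"),
  Gain    = c(0.10, 0.30, 0.05, 0.20),
  stringsAsFactors = FALSE
)
selected <- top_k_features(imp, k = 2)
print(selected)
# The reduced training set would then be e.g. train[, c(selected, "Response")]
```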

amount of var = 450

	pred 0	pred 1
real 0	1176353	515
real 1	4216	2663

MCC = 0.568

amount of var = 50

	pred 0	pred 1
real 0	1176304	564
real 1	4360	2519

MCC = 0.545

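Rows are the true Response and columns the predicted one. Both matrices are computed on the training data. Cutting the variables from 450 to 50 lowers MCC only slightly (0.568 -> 0.545).

xgb.cv reports train and test rmse for each nrounds, and the best nrounds is the one with the lowest test-rmse. Part of the xgb.cv output: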

nrounds	train-rmse	test-rmse
11	0.168337	0.168881
21	0.083219	0.085046
31	0.064824	0.067989
41	0.061830	0.065582
51	0.061191	0.065279
61	0.060756	0.065227
71	0.060327	0.065229

Best iteration : 67, train-rmse:0.060464 test-rmse:0.065220

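After about nrounds = 50 the test-rmse barely improves (0.065279 at 51 vs. 0.065220 at the best iteration, 67). nround = 50 is therefore used to fit the model on train and predict test.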
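One way to make the choice of nround explicit is to take the smallest logged nrounds whose test-rmse is within a tolerance of the best. `pick_nrounds` and the tolerance of 1e-4 are assumptions. The log values are copied from the xgb.cv table above.

```r
# xgb.cv log from section 3.4 (every 10th round)
cv_log <- data.frame(
  nrounds   = c(11, 21, 31, 41, 51, 61, 71),
  test_rmse = c(0.168881, 0.085046, 0.067989, 0.065582, 0.065279, 0.065227, 0.065229)
)

# Hypothetical rule: smallest nrounds whose test-rmse is within `tol` of the best one
pick_nrounds <- function(log, tol = 1e-4) {
  best <- min(log$test_rmse)
  min(log$nrounds[log$test_rmse <= best + tol])
}

pick_nrounds(cv_log)
```

On the logged rounds this picks 51, which is the logged round closest to the nround = 50 used above. With `tol = 0` it falls back to the best logged round, 61.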

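3.5 Other

The main difficulties are the class imbalance and the evaluation metric: MCC is not one of XGBoost's built-in evaluation metrics.

rmse is therefore used as the XGBoost evaluation metric in place of MCC. For the imbalanced target, a cut-off of 0.25 is applied to the predictions: a prediction above 0.25 is classified as 1, otherwise as 0.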
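Below is a minimal sketch of the 0.25 cut-off and of the real-by-pred confusion matrix used in section 3.4. `classify` and `confusion` are hypothetical helpers, and the scores are toy values. Treating a score exactly equal to the cut-off as 0 is an assumption.

```r
# Turn predicted scores into classes with a cut-off
classify <- function(score, cutoff = 0.25) as.integer(score > cutoff)

# Confusion matrix with rows = real, columns = pred (both levels always present)
confusion <- function(real, pred) {
  table(real = factor(real, levels = 0:1), pred = factor(pred, levels = 0:1))
}

score <- c(0.02, 0.30, 0.10, 0.60, 0.24, 0.26)
real  <- c(0,    0,    0,    1,    1,    1)
pred  <- classify(score)
cm <- confusion(real, pred)
print(cm)
```

With scores skewed toward 0 by the imbalance, a lower cut-off than 0.5 recovers more of the rare Response = 1 cases. Here 0.5 would flag only one of the three positives.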

4. Fitted model

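Features from feature engineering 1, feature engineering 2 and train_numeric are ranked with xgb.importance, and the top 50 are used to fit an XGBoost model. Evaluation uses rmse in place of MCC, and the imbalance is handled with a cut-off (imbalance rate) of 0.25.

The fitted model ranked in the top 6%. The feature engineering more than doubled the MCC (0.18 -> 0.46).

The 50 features

From train_numeric: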


L3_S32_F3850**	L1_S24_F1723**	L3_S33_F3859**	L3_S33_F3855	L1_S24_F1846
L3_S33_F3865	L1_S24_F1632	L3_S33_F3857	L3_S38_F3956	L1_S24_F1498
L1_S24_F1604	L3_S41_F4014	L1_S24_F1695	L3_S38_F3952	L3_S33_F3873
L1_S24_F1844	L3_S38_F3960	L2_S26_F3036	L2_S26_F3040	L2_S26_F3047
L2_S26_F3073	L1_S24_F1672	L1_S24_F1609	L1_S24_F1685

Features derived from train_date:


all_next**	next.cost.time**	next.target**	L0_first**
all_prev**	prev.target**	group.amount**	next.na.amount
all_na.amount	L3_first	total	cost.time
L2_first	L3_na.amount	prev.cost.time	group
L3_last	prev.na.amount	L3_min	L0_min
all_first	all_class.amount	L1_first	L3_max
order.same.time	L0_last

[Figure: feature plot]

Notes on ML:
1. Modeling involves a trade-off between type 1 error and type 2 error.
2. Data vs. model (data vs. ML).
3. Deep learning and ensembles.

Reference

Bosch Production Line Performance. Kaggle competition (2016).

Daniel FG. Kaggle kernel (2016).
