Predicting the Seoul Mosquito Activity Index

Attention

Predict the daily mosquito index for 2016 through 2019 from temperature and precipitation data.
The evaluation metric is the R² score.
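For readers unfamiliar with the metric, here is a minimal sketch of how R² is computed: one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r2_score_manual` and the sample values are made up for illustration. The baseline itself uses `sklearn.metrics.r2_score`, and this sketch assumes the true values are not all identical (otherwise the denominator is zero).

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not taken from the dataset.
y_true = [5.5, 10.0, 20.0, 40.0]
y_pred = [6.0, 9.0, 22.0, 38.0]

score = r2_score_manual(y_true, y_pred)
perfect = r2_score_manual(y_true, y_true)          # perfect predictions give 1.0
mean_only = r2_score_manual(y_true, [18.875] * 4)  # predicting the mean gives 0.0
print(score, perfect, mean_only)
```

A score of 1.0 means perfect predictions, 0.0 means no better than predicting the mean, and negative values mean worse than the mean.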

DataLoad

Load the data.

import pandas as pd
train_x = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/train_x.csv', encoding='euc-kr')
train_y = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/train_y.csv', encoding='euc-kr')
test_x = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/test_x.csv', encoding='euc-kr')
sub = pd.read_csv('https://raw.githubusercontent.com/Datamanim/mosquito/main/sub.csv')

DATA

Inspect the dataset.

train_x.head()
         date  강수량(mm)  평균기온(℃)  최저기온(℃)  최고기온(℃)
0  2019-12-31      0.0     -7.9    -10.9     -4.5
1  2019-12-30      0.4      2.7     -5.7      6.8
2  2019-12-29      1.4      3.8      1.1      6.2
3  2019-12-27      0.0     -1.7     -4.6      2.6
4  2019-12-25      0.0      2.0     -2.7      6.6
train_y.head()
         date  mosquito_ratio
0  2019-12-31             5.5
1  2019-12-30             5.5
2  2019-12-29             5.5
3  2019-12-27             5.5
4  2019-12-25             5.5
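The baseline drops the date column from both frames and relies on row order to pair features with targets. It may therefore be worth confirming that the two frames line up by date and contain no missing values. The sketch below does this on small demo frames rebuilt from the first rows of the head() output. `x_demo` and `y_demo` are hypothetical names, and the same checks would apply to `train_x` and `train_y`.

```python
import pandas as pd

# Demo frames copied from the first rows shown by head(); the real frames come from read_csv.
x_demo = pd.DataFrame({
    'date': ['2019-12-31', '2019-12-30', '2019-12-29'],
    '평균기온(℃)': [-7.9, 2.7, 3.8],
})
y_demo = pd.DataFrame({
    'date': ['2019-12-31', '2019-12-30', '2019-12-29'],
    'mosquito_ratio': [5.5, 5.5, 5.5],
})

# Features and targets should have the same number of rows ...
same_length = len(x_demo) == len(y_demo)
# ... and the same date in each row position.
same_dates = x_demo['date'].reset_index(drop=True).equals(
    y_demo['date'].reset_index(drop=True)
)
# Total count of missing cells across both frames.
n_missing = int(x_demo.isna().sum().sum() + y_demo.isna().sum().sum())

print(same_length, same_dates, n_missing)
```

If the dates did not match row for row, merging the two frames on `date` before splitting would be the safer way to pair them.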

baseLine

This is the baseline code.

from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRFRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def preprocessing(df):
    # Replace the date column with numeric year and month features.
    df = df.copy()  # avoid mutating the caller's DataFrame
    df['date'] = pd.to_datetime(df['date'])
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
    return df.drop(['date'], axis=1)

x = preprocessing(train_x)
y = train_y.drop(['date'], axis=1)

rf = RandomForestRegressor(random_state=12)
xg = XGBRFRegressor(random_state=12)

# Hold out 30% of the training data for validation.
xtr, xt, ytr, yt = train_test_split(x, y, test_size=0.3, random_state=24)

rf.fit(xtr, ytr.values.ravel())
xg.fit(xtr, ytr)
pred = rf.predict(xt)
predxg = xg.predict(xt)

Ans = 'randomforest r2 : ' + str(r2_score(yt, pred)) + ' \nxgboost r2 : ' + str(r2_score(yt, predxg))

# Submission: average the two models' predictions on the test set.
subDF = preprocessing(test_x)
pred = (rf.predict(subDF) + xg.predict(subDF)) / 2
sub['mosquito_ratio'] = pred
sub.to_csv('submission.csv', index=False)
print(Ans)
randomforest r2 : 0.8477788464778293 
xgboost r2 : 0.8494664636000008
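The baseline scores each model separately on the hold-out split but submits the average of the two predictions, so the blend itself is never validated. The sketch below shows one way to score such a blend. It runs on synthetic data so that it is self-contained, and it uses `GradientBoostingRegressor` as a stand-in for `XGBRFRegressor`. The data, the names `model_a` / `model_b`, and the choice of stand-in are all assumptions for illustration, not part of the original baseline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): three features, one noisy nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-10, 35, size=(400, 3))
y = np.maximum(X[:, 0], 0) * 2 + 0.5 * X[:, 1] + rng.normal(0, 2, size=400)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=24)

model_a = RandomForestRegressor(n_estimators=50, random_state=12).fit(X_tr, y_tr)
# GradientBoostingRegressor stands in for XGBRFRegressor here.
model_b = GradientBoostingRegressor(random_state=12).fit(X_tr, y_tr)

pred_a = model_a.predict(X_val)
pred_b = model_b.predict(X_val)
blend = (pred_a + pred_b) / 2  # the same simple average used for the submission

scores = {
    'model_a': r2_score(y_val, pred_a),
    'model_b': r2_score(y_val, pred_b),
    'blend': r2_score(y_val, blend),
}
print(scores)
```

With the baseline's own objects, the equivalent check would be `r2_score(yt, (pred + predxg) / 2)`, evaluated before `pred` is reassigned for the submission. Because squared error is convex, the blend's R² on a given split is never lower than the mean of the two individual scores, although it can be lower than the better of the two.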

Tip

Check the score of the submission file.

def FinalMseScore():
    # Despite the function name, this computes the R² score, the task's evaluation metric.
    import pandas as pd
    from sklearn.metrics import r2_score
    y_true = pd.read_csv("https://raw.githubusercontent.com/Datamanim/mosquito/main/result.csv")
    sub = pd.read_csv('./submission.csv')
    pred = sub.iloc[:, -1].values  # predictions are in the last column
    score = r2_score(y_true['mosquito_ratio'], pred)
    print('submission r2 score : ', score)
    return score
final_score = FinalMseScore()
submission r2 score :  0.8800627717083699