모의고사 1회차

모의고사 1회차#

광고 한번 눌러주시겠습니까

캐글 링크

데이터셋 링크
문제풀이 유튜브 링크

작업 1유형#

Attention

데이터 출처 : https://archive.ics.uci.edu/ml/datasets/Bank+Marketing (후처리 작업)
데이터 설명 : 은행의 전화 마케팅에 대해 고객의 반응 여부
dataurl : https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/train.csv

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/train.csv')

df.head()

	ID	age	job	marital	education	default	balance	housing	loan	contact	day	month	campaign	pdays	poutcome	y
0	13829	29	technician	single	tertiary	no	18254	no	no	cellular	11	may	2	-1	unknown	no
1	22677	26	services	single	secondary	no	512	yes	yes	unknown	5	jun	3	-1	unknown	no
2	10541	30	management	single	secondary	no	135	no	no	cellular	14	aug	2	-1	unknown	no
3	13689	41	technician	married	unknown	no	30	yes	no	cellular	10	jul	1	-1	unknown	no
4	11304	27	admin.	single	secondary	no	321	no	yes	unknown	2	sep	1	-1	unknown	no

Question1

마케팅 응답 고객들의 나이를 10살 단위로 변환 했을 때, 가장 많은 인원을 가진 나이대는? (0~9 : 0 , 10~19 : 10)

Question2

마케팅 응답 고객들의 나이를 10살 단위로 변환 했을 때, 가장 많은 나이대 구간의 인원은 몇명인가?

Question3

나이가 25살 이상 29살 미만인 응답 고객들중 housing컬럼의 값이 yes인 고객의 수는?

Question4

numeric한 값을 가지지 않은 컬럼들중 unique한 값을 가장 많이 가지는 컬럼은?

pd.DataFrame(lst).sort_values(1,ascending=False)

	0	1
0	job	12
7	month	12
2	education	4
8	poutcome	4
1	marital	3
6	contact	3
3	default	2
4	housing	2
5	loan	2
9	y	2

['job' 'month']

Question5

balance 컬럼값들의 평균값 이상을 가지는 데이터를 ID값을 기준으로 내림차순 정렬했을때 상위 100개 데이터의 balance값의 평균은?

3473.73

Question6

가장 많은 광고를 집행했던 날짜는 언제인가? (데이터 그대로 일(숫자),달(영문)으로 표기)

(15, 'may')

Question7

데이터의 job이 unknown 상태인 고객들의 age 컬럼 값의 정규성을 검정하고자 한다. 샤피로 검정의 p-value값을 구하여라

0.1961131989955902

Question8

age와 balance의 상관계수를 구하여라

0.10198734763981472

Question9

y 변수와 education 변수는 독립인지 카이제곱검정을 통해 확인하려한다. p-value값을 출력하라

df

	ID	age	job	marital	education	default	balance	housing	loan	contact	day	month	campaign	pdays	previous	poutcome	y
0	13829	29	technician	single	tertiary	no	18254	no	no	cellular	11	may	2	-1	0	unknown	no
1	22677	26	services	single	secondary	no	512	yes	yes	unknown	5	jun	3	-1	0	unknown	no
2	10541	30	management	single	secondary	no	135	no	no	cellular	14	aug	2	-1	0	unknown	no
3	13689	41	technician	married	unknown	no	30	yes	no	cellular	10	jul	1	-1	0	unknown	no
4	11304	27	admin.	single	secondary	no	321	no	yes	unknown	2	sep	1	-1	0	unknown	no
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
12865	14023	47	technician	married	secondary	no	1167	yes	no	cellular	30	apr	1	87	5	failure	yes
12866	17259	31	unknown	married	secondary	no	111	no	no	cellular	21	nov	2	93	2	failure	yes
12867	15200	37	unemployed	single	tertiary	no	1316	yes	no	cellular	18	nov	1	172	2	failure	no
12868	13775	42	management	married	tertiary	no	479	yes	no	unknown	28	may	2	-1	0	unknown	no
12869	20137	24	services	single	secondary	no	0	no	no	unknown	16	may	1	-1	0	unknown	no

12870 rows × 17 columns

education	primary	secondary	tertiary	unknown
y
no	1424	4555	2559	365
yes	456	1813	1516	182

7.901201277473551e-29

Question10

각 job에 따라 divorced/married 인원의 비율을 확인 했을 때 그 값이 가장 높은 값은?

0.2831050228310502

작업 2유형#

Attention

데이터 출처 : https://archive.ics.uci.edu/ml/datasets/Bank+Marketing (후처리 작업)
데이터 설명 : 은행의 전화 마케팅에 대해 고객의 반응 여부
train : https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/train.csv
test : https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/test.csv
submission : https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/submission.csv

import pandas as pd
train= pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/train.csv')
test= pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/test.csv')
submission= pd.read_csv('https://raw.githubusercontent.com/Datamanim/datarepo/main/bank/submission.csv')

display(train.head())
display(test.head())
display(submission.head())

	ID	age	job	marital	education	default	balance	housing	loan	contact	day	month	campaign	pdays	poutcome	y
0	13829	29	technician	single	tertiary	no	18254	no	no	cellular	11	may	2	-1	unknown	no
1	22677	26	services	single	secondary	no	512	yes	yes	unknown	5	jun	3	-1	unknown	no
2	10541	30	management	single	secondary	no	135	no	no	cellular	14	aug	2	-1	unknown	no
3	13689	41	technician	married	unknown	no	30	yes	no	cellular	10	jul	1	-1	unknown	no
4	11304	27	admin.	single	secondary	no	321	no	yes	unknown	2	sep	1	-1	unknown	no

	ID	age	job	marital	education	default	balance	housing	loan	contact	day	month	campaign	pdays	previous	poutcome
0	53608	32	management	single	tertiary	no	12569	no	no	cellular	1	jul	2	295	2	success
1	51055	25	services	single	secondary	no	801	no	no	cellular	5	jun	2	-1	0	unknown
2	52573	46	blue-collar	married	secondary	no	1728	yes	no	unknown	26	may	2	-1	0	unknown
3	50458	39	management	divorced	secondary	no	51	no	no	unknown	17	jun	2	-1	0	unknown
4	52272	31	services	single	tertiary	no	1626	no	no	unknown	31	jul	1	-1	0	unknown

	ID	predict
0	53608	0.0
1	51055	0.0
2	52573	0.0
3	50458	0.0
4	52272	0.0

간단한 모델링 작업

모델링 및 submission파일 생성까지

test roc score :  0.77612408703591
submission file

	ID	predict
0	53608	0.73
1	51055	0.80
2	52573	0.01
3	50458	0.23
4	52272	0.33

모의고사 1회차

Contents

모의고사 1회차#

작업 1유형#

작업 2유형#