
Week01 (March 5, 7)

ideas and concepts

Introduction to the class

Install R in your class computer
- Refer to this page

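A researcher believes that life satisfaction differs between younger adults and older adults, and plans a test to verify this. Preliminary research showed that an instrument for measuring life satisfaction already exists, and the researcher decided to use it.

A researcher believes that college students' grades are related to the amount of their monthly salary after graduation, and wants to demonstrate this.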

Week 01 materials

See Theories
See Research Questions and Hypotheses
Research question
Concept vs. concept
- concepts, ideas, etc.

Assignment

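This is an in-class activity. Describe your academic interests in relation to your future career (go to the "관심사 기술하기" assignment page).
1. Only one submission is allowed. Edit your work thoroughly before you finalize it.
2. Submissions will be checked for plagiarism with SafeAssign. Plagiarized work receives a score of 0.
Also, please read all the Week 2 materials before class.
Read the article 제3자 효과이론과 침묵의 나선이론 연계성 (on the link between third-person effect theory and spiral of silence theory) and summarize it in your own words. Submit it as an assignment by next Thursday's class (assignment submission page, item “제3자 효과이론과 침묵의 나선이론 연계성 논문 정리”).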
1. What are the research questions?
2. What are the hypotheses and what are the author's expectations?
3. What are the findings?

Week02 (March 12, 14)

1.1 Understand the importance of context
Difference vs. association, etc.
See the ideas and concepts section below.

ideas and concepts

Student A thinks that movie preferences differ by gender: for example, that men generally like action movies and women generally like romance movies. Let's find a way to test Student A's claim.

HOW?

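These days, purchases of cosmetics and accessories for self-care are growing rapidly among men and women alike, and the cost is considerable. Person A says they feel fine about themselves without grooming, while Person B says they cannot be satisfied with their life without self-care. How could we test whether self-satisfaction differs according to spending on self-care?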

ROSENBERG SELF-ESTEEM SCALE

See Research Questions and Hypotheses
You must read the document above.
Research design

Deduction vs. induction
Research question
- Conceptualization
- Operationalization
Hypothesis
- See difference and association hypotheses
- Difference
- Association
Variable
- Attributes: see levels of measurement, the four types of scales (p. 78).
- Kinds and numbers (see the textbook: discrete vs. continuous)
- NOIR (see the textbook: nominal, ordinal, interval, and ratio scales)
Types of variable
- Dependent
- Independent
- Control
- Moderating (Intervening)
- eg.,
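  - IV, DV: parents' education level -> child's CSAT (수능) score
  - moderating: the involvement of parents' income level
  - control: parents' education level (held constant at graduate school or above, 15+ years)
  - intervening: student's gender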

How to use data to make a hit TV show

Then which hypotheses and research questions are appropriate?
Sampling
- Sampling
- sampling frame
- ECOBS
- Probability sampling
- Non-probability sampling
- Sample frame
- Sample vs Population
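
As a rough illustration of probability sampling, the sketch below draws a simple random sample from a sampling frame and compares the sample statistic with the population parameter. The frame itself (1,000 student IDs with simulated scores) and every name in it are hypothetical, made up only for this example.

```r
set.seed(5)  # arbitrary seed so the example is reproducible

# Hypothetical sampling frame: one row per member of the population
sampling_frame <- data.frame(id = 1:1000,
                             score = rnorm(1000, mean = 70, sd = 10))

# Simple random sample: every id has the same chance, drawn without replacement
srs_ids <- sample(sampling_frame$id, size = 50)
srs <- sampling_frame[sampling_frame$id %in% srs_ids, ]

mean(sampling_frame$score)  # population parameter
mean(srs$score)             # sample statistic (an estimate of the parameter)
```

A non-probability sample (for example, taking the first 50 rows) would not give every member the same chance of selection.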

Assignment 2

Week03 (March 19, 21)

ideas and concepts

For the lecture content

Assignment

Week04 (March 26, 28)

ideas and concepts

Ch. 5, 6, 7, 8

Range
Interquartile range
Mean deviation
Variance
- Population variance
- Sample variance: why n-1, degrees of freedom (df)
- Textbook section: the mean and variance as estimates
Standard deviation
Computational formula for the variance

optical_illusion <- c(1.73, 1.06, 2.03, 
1.40, .95, 1.13, 1.41, 1.73, 1.63, 1.56)

> mean(optical_illusion)
[1] 1.463

> var(optical_illusion)
[1] 0.1160678

> sqrt(var(optical_illusion))
[1] 0.3406872
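
The dispersion measures listed above can also be computed step by step from their definitions. This is only a sketch; the variable names are chosen for illustration, and the last lines show that the hand calculation with n-1 agrees with R's built-in var() and sd().

```r
# Data from the example above
optical_illusion <- c(1.73, 1.06, 2.03, 1.40, .95,
                      1.13, 1.41, 1.73, 1.63, 1.56)
n <- length(optical_illusion)
deviations <- optical_illusion - mean(optical_illusion)

data_range <- max(optical_illusion) - min(optical_illusion)  # range
iqr_value  <- IQR(optical_illusion)                          # interquartile range
mean_dev   <- mean(abs(deviations))                          # mean absolute deviation
ss         <- sum(deviations^2)                              # sum of squared deviations
sample_var <- ss / (n - 1)                                   # sample variance (n - 1)
pop_var    <- ss / n                                         # variance if the data were a population
sample_sd  <- sqrt(sample_var)                               # standard deviation

all.equal(sample_var, var(optical_illusion))  # TRUE
all.equal(sample_sd, sd(optical_illusion))    # TRUE
```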

> stem(optical_illusion)

  The decimal point is at the |

  0 | 9
  1 | 1144
  1 | 6677
  2 | 0

> median(optical_illusion)
[1] 1.485

> mode(optical_illusion)
[1] "numeric"
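
Note that mode() in R reports how the object is stored ("numeric"), not the most frequent value. Base R has no built-in function for the statistical mode; one possible sketch uses table(). The name stat_mode is just an illustrative choice.

```r
optical_illusion <- c(1.73, 1.06, 2.03, 1.40, .95,
                      1.13, 1.41, 1.73, 1.63, 1.56)

counts <- table(optical_illusion)   # frequency of each distinct value
# Keep every value that reaches the highest frequency (there may be ties)
stat_mode <- as.numeric(names(counts)[counts == max(counts)])
stat_mode
```

Here 1.73 is the only value that occurs twice, so it is the mode.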

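Normal distribution
Standard score (z score)

The number of finger taps a healthy adult makes in 10 seconds is said to follow a normal distribution with a mean of 59 and a standard deviation of 7. A patient taps 45 times in 10 seconds. Is this patient within the normal range, or not?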

finger_tap <- rnorm(n=10000, mean=59, sd=7)
hist(finger_tap)

see normal distribution table
or use R (qnorm, pnorm)
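z score that cuts off the lowest .05 (5%) of the distribution? = -1.645 (qnorm(0.95) returns the upper-tail cutoff, +1.645; by symmetry the lower-tail cutoff is its negative)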

qnorm(0.95)
[1] 1.644854

(or, for .025 (2.5%), = -1.96, about -2)

qnorm(0.975)
[1] 1.959964

Then what is this patient's z score?
z-score = (45-59) / 7 = -2

Probability of a score at or below z = -2: 2.28% = 0.0228

> pnorm(-2)
[1] 0.02275013

How do we decide who counts as a “normal” person? Within .05? Or .01? This cutoff is the rejection level, or significance level.

Please note that this is a hypothetical test for one individual (not a sample) against population.
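
The finger-tapping question can be worked through in a few lines. This is a minimal sketch under the values given in the problem (mean 59, standard deviation 7, patient score 45); the choice between a one-tailed and a two-tailed probability, and between .05 and .01, is left to the analyst.

```r
mu <- 59; sigma <- 7; patient <- 45

z <- (patient - mu) / sigma                          # standard score: -2
p_lower  <- pnorm(z)                                 # P(Z <= -2), one-tailed
p_direct <- pnorm(patient, mean = mu, sd = sigma)    # same probability without standardizing
p_two    <- 2 * pnorm(-abs(z))                       # two-tailed probability

alpha <- c(0.05, 0.01)
unusual <- p_lower < alpha   # is the patient outside the "normal" range at each level?
unusual
```

With the one-tailed probability of about 0.0228, the patient falls outside the normal range at the .05 level but not at the .01 level.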

Hypothesis testing
- Null hypothesis
- Research hypothesis (alternative hypothesis)
types of error
- type I error
- type II error
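
A small simulation can make the two error types concrete. The sketch below assumes a hypothetical population (mean 50, standard deviation 10), samples of n = 25, and a two-tailed z-test at alpha = .05; none of these values come from the course materials. When the null hypothesis is true, the rejection rate estimates the type I error rate; when the true mean is actually 55, the rate of failing to reject estimates the type II error rate.

```r
set.seed(1)  # arbitrary seed so the example is reproducible
mu0 <- 50; sigma <- 10; n <- 25; alpha <- 0.05
reps <- 5000
z_crit <- qnorm(1 - alpha / 2)   # two-tailed critical value, about 1.96

# Null hypothesis TRUE: every rejection is a type I error
reject <- replicate(reps, {
  x <- rnorm(n, mean = mu0, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  abs(z) > z_crit
})
type1_rate <- mean(reject)

# Null hypothesis FALSE (true mean is 55): every non-rejection is a type II error
miss <- replicate(reps, {
  x <- rnorm(n, mean = 55, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  abs(z) <= z_crit
})
type2_rate <- mean(miss)

c(type1 = type1_rate, type2 = type2_rate)
```

The type I rate should land near alpha, while the type II rate depends on how far the true mean is from the null value and on the sample size.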

central limit theorem
- sampling distribution
- standard error
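
The sampling distribution and the standard error can be explored by simulation. The sketch below assumes a deliberately skewed, hypothetical population (exponential with mean 10 and standard deviation 10) and repeatedly draws samples of n = 30; the population and the sample size are illustrative choices only.

```r
set.seed(2)  # arbitrary seed so the example is reproducible
pop_mean <- 10; pop_sd <- 10   # an exponential distribution with rate 1/10 has mean 10 and sd 10
n <- 30

# Sampling distribution of the mean: 10,000 sample means
sample_means <- replicate(10000, mean(rexp(n, rate = 1 / pop_mean)))

mean(sample_means)    # close to the population mean
sd(sample_means)      # close to the standard error below
pop_sd / sqrt(n)      # standard error = sigma / sqrt(n)
hist(sample_means)    # roughly bell-shaped even though the population is skewed
```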

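Anxious/Depressed scale of the YSR (Youth Self-Report Inventory) ( $ \mu = 50, \sigma = 10 $ )
The mean of five children = 56; test the hypothesis about whether these children are within the normal range

Explain the procedure to the person next to you.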

> 10/sqrt(5)
[1] 4.472136
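
One way to carry the YSR example through to a decision is a z-test on the sample mean, using the standard error computed above. This is a sketch that assumes a two-tailed test at the .05 level, which the example itself does not specify.

```r
mu <- 50; sigma <- 10      # population values for the YSR scale
n <- 5; sample_mean <- 56  # the five children

se <- sigma / sqrt(n)                 # standard error of the mean
z  <- (sample_mean - mu) / se         # about 1.34
p_two <- 2 * pnorm(-abs(z))           # two-tailed probability, roughly 0.18
reject_null <- p_two < 0.05           # FALSE: the mean of 56 is not unusual for n = 5
c(se = se, z = z, p = p_two)
```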

Assignment

The more time people spend using social media, the less they read books.
Drinking energy drinks makes people more aggressive.
Taking a nap in the afternoon makes people more focused for the rest of the day.
Spending time with a family dog decreases the amount of stress someone is feeling.
Eating breakfast in the morning increases the ability to learn in school.

Week05 (April 2, 4)

ideas and concepts

First, review type I and type II errors again: types of error
z-test
t-test

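Q. The effect of alcohol during pregnancy
: A researcher studying the effects of alcohol on pregnant mothers is interested in how alcohol consumption during pregnancy affects birth weight. A random sample of n = 16 pregnant rats was obtained, and each mother was given a fixed daily dose of alcohol. The researcher selected one pup from each litter, giving a sample of n = 16, and found a mean weight of $\overline{X}$ = 15 grams. The researcher knows that for ordinary rats the mean birth weight is $\mu = 18$ grams with $\sigma = 4$. How should the researcher test the effect of alcohol?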
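
Because both the population mean and the population standard deviation are known in this problem, one reasonable approach is a z-test on the sample mean. The sketch below uses only the numbers given in the problem; the two-tailed test is an assumption.

```r
mu <- 18; sigma <- 4        # known population values
n <- 16; sample_mean <- 15  # the alcohol-exposed sample

se <- sigma / sqrt(n)                # 4 / 4 = 1
z  <- (sample_mean - mu) / se        # -3
p_two <- 2 * pnorm(-abs(z))          # two-tailed probability, about 0.0027
c(z = z, p = p_two)
```

A z of -3 is well beyond the two-tailed .05 cutoff of about 1.96 in absolute value, so the null hypothesis of no alcohol effect would be rejected.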

T dist. table

> # rnorm2: random normal values rescaled to have exactly the given mean and sd
> rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) }
> rat <- rnorm2(16, 15, 4)
> # t.test() has no sd argument; it always uses the sample sd (exactly 4 here)
> t.test(rat, mu=18)

	One Sample t-test

data:  rat
t = -3, df = 15, p-value = 0.008973
alternative hypothesis: true mean is not equal to 18
95 percent confidence interval:
 12.86855 17.13145
sample estimates:
mean of x 
       15 


SAT scores of 28 students: the effect of reasonable guessing.
Each item has five answer choices.
If the students answered by making reasonable guesses,
can we say that doing so had an effect?

58, 48, 48, 41, 34, 43, 38, 53, 41, 60, 55, 44, 43, 49, 47, 33, 47, 40, 46, 53, 40, 45, 39, 47, 50, 53, 46, 53

. . .

> sec12.9 <- c(58, 48, 48, 41, 34, 
43, 38, 53, 41, 60, 55, 44, 43, 49, 47, 
33, 47, 40, 46, 53, 40, 45, 39, 47, 50, 
53, 46, 53)

> mean(sec12.9)
[1] 46.21429

> sqrt(var(sec12.9))
[1] 6.729466

> length(sec12.9)
[1] 28

> t.test(sec12.9, mu=20)

	One Sample t-test

data:  sec12.9
t = 20.6128, df = 27, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
 43.60487 48.82370
sample estimates:
mean of x 
 46.21429 


> num <- mean(sec12.9)-20
> # num = difference
> denum <- sqrt(var(sec12.9))/sqrt(length(sec12.9))
> # denum <- std error 
> tvalue <- num/denum
> tvalue
[1] 20.61277
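
The hand calculation above stops at the t value. As a sketch of the remaining steps, the p-value, the critical value, and the 95% confidence interval can be obtained from the t distribution with pt() and qt(); the variable names here are illustrative.

```r
sec12.9 <- c(58, 48, 48, 41, 34, 43, 38, 53, 41, 60, 55, 44, 43, 49,
             47, 33, 47, 40, 46, 53, 40, 45, 39, 47, 50, 53, 46, 53)
n  <- length(sec12.9)
df <- n - 1

std_error <- sd(sec12.9) / sqrt(n)
t_value   <- (mean(sec12.9) - 20) / std_error   # same t as above

p_two  <- 2 * pt(-abs(t_value), df)             # two-tailed p-value
t_crit <- qt(0.975, df)                         # critical t for alpha = .05, two-tailed
ci <- mean(sec12.9) + c(-1, 1) * t_crit * std_error   # 95% confidence interval

t_value; t_crit; ci
```

The interval reproduces the one printed by t.test() above.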

t test summary

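Of the two kinds of hypotheses, difference and association, a t-test is used for a difference hypothesis
when the independent variable has two attributes (categories).
- remind: see hypothesis, types of variable, level of measurement
Situations in which we test a difference (between two groups): see t-test
- Difference between a population and a sample
  - population with known $\mu$ and $\sigma$
  - population with known $\mu$, but unknown $\sigma$
- Difference between two samples
  - Comparison between two groups
    - Difference in game-adaptation ability between men and women
- Difference within one sample over time
  - The effect that appears after taking a medication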
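
The three situations summarized above map onto three forms of the t.test() call. The sketch below uses simulated, entirely hypothetical data only to show the calls; the group sizes, means, and the equal-variance setting are assumptions for illustration.

```r
set.seed(3)  # arbitrary seed so the example is reproducible

# Hypothetical data, simulated only to demonstrate the function calls
one_sample <- rnorm(20, mean = 52, sd = 10)
group_a    <- rnorm(15, mean = 50, sd = 10)
group_b    <- rnorm(15, mean = 58, sd = 10)
before     <- rnorm(12, mean = 100, sd = 15)
after      <- before - rnorm(12, mean = 5, sd = 4)

# 1. Sample vs. population with known mu but unknown sigma
t_one <- t.test(one_sample, mu = 50)

# 2. Two independent samples (e.g., two groups of people)
t_two <- t.test(group_a, group_b, var.equal = TRUE)

# 3. One sample measured twice (e.g., before and after a treatment)
t_paired <- t.test(before, after, paired = TRUE)

c(t_one$parameter, t_two$parameter, t_paired$parameter)  # df: n-1, n1+n2-2, n-1
```

When the population sigma is also known, a z-test is used instead of the one-sample t-test.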

You should be familiar with Chapter 1.
Chapter 2.

?trees

will explain what the data set is.

Description

This data set provides measurements of the girth, 
height and volume of timber in 31 felled black 
cherry trees. Note that girth is the diameter of 
the tree (in inches) measured at 4 ft 6 in above 
the ground.

Usage

trees
Format

A data frame with 31 observations on 3 variables.

[,1]	Girth	 numeric	 Tree diameter in inches
[,2]	Height	 numeric	 Height in ft
[,3]	Volume	 numeric	 Volume of timber in cubic ft

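Mean

mean(trees$Volume)

Variance

var(trees$Volume)

The variance s² is computed by dividing the sum of squared deviations by n-1 rather than n, because mathematically the expected value of s² computed with n-1 equals the population variance $ \sigma^{2} $.¹⁾ Therefore, when the data are the entire population, multiply the variance by (n-1)/n to obtain the population variance.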

attach(trees)
n <- length(Volume)
var_as_population <- var(Volume) * (n-1) / n 
var_as_population
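
The claim that dividing by n-1 gives an estimate whose expected value equals the population variance can be checked by simulation. The sketch below assumes a hypothetical normal population with variance 4 and small samples of n = 5; these values are illustrative, not from the textbook.

```r
set.seed(4)  # arbitrary seed so the example is reproducible
sigma2 <- 4; n <- 5; reps <- 20000

estimates <- replicate(reps, {
  x  <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  ss <- sum((x - mean(x))^2)             # sum of squared deviations
  c(divide_by_n_minus_1 = ss / (n - 1),  # sample variance, as in var()
    divide_by_n         = ss / n)        # biased version
})

avg <- rowMeans(estimates)   # average of each estimator over all samples
avg
sigma2 * (n - 1) / n         # what dividing by n is expected to give: 3.2
```

On average the n-1 version is close to 4, while the n version is close to 3.2, that is, too small by the factor (n-1)/n.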

Standard Deviation

sd(Volume)

or

sqrt(var(Volume))

Standard Error

. . . . Mathematically, the standard deviation is $ \sqrt{n} $ times as large as the standard error.

# trees is already attached above; attaching it again would only mask the first copy
n <- length(Volume)
se_value <- sd(Volume)/sqrt(n)
se_value

Median, quartiles, boxplot

fivenum(Volume)

quantile(Volume)
  0%  25%  50%  75% 100% 
10.2 19.4 24.2 37.3 77.0

IQR = 75% value - 25% value

IQR(Volume)
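
As a quick check of the definition, the IQR can be computed directly from the quartiles. This sketch copies the column into a local variable so that it runs whether or not trees is attached.

```r
Volume <- trees$Volume

q <- quantile(Volume, c(0.25, 0.75))   # first and third quartiles
iqr_by_hand <- unname(q[2] - q[1])     # 75% value - 25% value

iqr_by_hand
all.equal(iqr_by_hand, IQR(Volume))    # TRUE
```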

Boxplot

boxplot(Volume, col="red")

colors()

histogram

hist(Volume, probability=TRUE)  # histogram on the density scale
lines(density(Volume), col="blue") # kernel density curve

stem(Volume)

  The decimal point is 1 digit(s) to the right of the |

  1 | 00066899
  2 | 00111234567
  3 | 24568
  4 | 3
  5 | 12568
  6 | 
  7 | 7

qqnorm(Volume)
qqline(Volume, col="red")

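In a Q-Q plot, the straight line shows where points from an exactly normal distribution would fall; if the observed points do not stray far from this line, we can say that Volume approximately follows a normal distribution.

The role of the qqnorm function is to check this roughly.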

> x <- rnorm(n=31, 0, 1)
> qqnorm(x)
> qqline(x)

Assignment

Quiz 1 on the next Wednesday

Week06 (April 9, 11)

ideas and concepts

Quiz during the midterm period
For the quiz two weeks after that, the coverage will be split and expanded

Confidence Interval in t-test and confidence interval

ANOVA
Repeated Measure ANOVA
Factorial ANOVA

Assignment

Week07 (April 16, 18)

Exam coverage

Midterm coverage:

Ch. 1, 2, 3, 4, 5, 6, 7, 8
+ z-test

Exam after the midterm:

Ch. 5, 6, 7, 8 +
Ch. 12, 13, 14, 15 (effect size, from 15.3 onward, excluded), 16

ideas and concepts

Lecture content

Assignment

Week08 (April 23, 25)

Mid-term period

Week09 (April 30, May 2)

ideas and concepts

Factorial ANOVA
~~Repeated Measure ANOVA~~ – in a future week

Assignment

Week10 (May 7, 9)

ideas and concepts

Children's Day
Buddha's Birthday

Assignment

Week11 (May 14, 16)

ideas and concepts

Correlation
Regression

Assignment

Week12 (May 21, 23)

ideas and concepts

correlation
Regression

Variance = SS_total / df
SS_tot = sum of squared errors when Y is predicted by its mean alone
SS_residual
- Regression line
- a and b in $ \hat{Y} = a + b X $
  - $b = {SP} / {SS_{X}}$
  - $a = \overline{Y} - b {\overline{X}}$
- sum of squared errors that remain when Y is predicted by the regression line
SS_regression = portion of the squared error removed (explained) by the regression line
SS_tot = SS_regression + SS_residual
If SS_regression is big enough, we can say
- X's contribution to explain y's variation is significant
- How to determine that? → F test
$\text{F test} = MS_{\text{regression}} / MS_{\text{residual}} $
- with $\text{df}_{\text{regression}} = k - 1$ ; and
- $\text{df}_{\text{residual}} = n - k$, where k = number of estimated coefficients including the intercept (k = p + 1 for p IVs)
$\text{R}^{\text{2}} = \text{SS}_{\text{reg}} / \text{SS}_{\text{tot}}$
will be clear with multiple regression
- R² adjusted for degrees of freedom = adjusted R²
  - adding IVs never decreases R².
  - so R² should be penalized (or adjusted) for the number of IVs
  - starting from R² = 1 - (SS_res/SS_tot), replace each SS with its mean square:
    - SS_res → SS_res/df_res
    - df_res = n - p - 1
    - p = number of IVs
    - as p increases, df_res shrinks, so SS_res/df_res grows unless the new IV reduces SS_res enough; this lowers the adjusted R² value.
    - SS_tot → SS_tot/df_tot
    - df_tot = n - 1
    - adjusted R² = 1 - (SS_res/df_res) / (SS_tot/df_tot)
- meaning of t test for slope b
  - Suppose that in $ \hat{Y} = a + b_{1} X_{1} + b_{2} X_{2} $, the Xs are not correlated with each other, and $X_{1}$ contributes nothing to Y's variance,
  - then we can say that $b_{1} = 0$.
  - This is the null hypothesis for testing b
  - The actual test for the contribution of each b is a t-test
    - $t = (b_{1} - 0) / \text{SE}_{b_{1}}$, with $\text{df} = n - p - 1$
    - for simple regression (one IV), $\displaystyle \text{SE}_{\text{b}} = \frac{s_{\text{est}}}{\sqrt{SS_{X}}}$
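
The quantities in this outline can be reproduced by hand and compared with lm(). The sketch below uses the built-in trees data with Girth predicting Volume as an illustrative choice (any single-IV model would do); the variable names are not from the course materials.

```r
fit <- lm(Volume ~ Girth, data = trees)
x <- trees$Girth
y <- trees$Volume
n <- length(y)
p <- 1                                   # number of IVs

# Sums of squares
ss_tot <- sum((y - mean(y))^2)           # error when predicting with the mean alone
ss_res <- sum(residuals(fit)^2)          # error left after the regression line
ss_reg <- ss_tot - ss_res                # error removed by the regression line

# Degrees of freedom, F, R-squared, adjusted R-squared
df_reg <- p; df_res <- n - p - 1; df_tot <- n - 1
f_value <- (ss_reg / df_reg) / (ss_res / df_res)
r2      <- ss_reg / ss_tot
adj_r2  <- 1 - (ss_res / df_res) / (ss_tot / df_tot)

# Slope, intercept, and the t test for the slope
ss_x  <- sum((x - mean(x))^2)
b     <- sum((x - mean(x)) * (y - mean(y))) / ss_x   # SP / SS_X
a     <- mean(y) - b * mean(x)
s_est <- sqrt(ss_res / df_res)                       # standard error of estimate
se_b  <- s_est / sqrt(ss_x)
t_b   <- (b - 0) / se_b

c(F = f_value, R2 = r2, adjR2 = adj_r2, a = a, b = b, t = t_b)
```

Running summary(fit) should show the same values; with a single IV, the squared t for the slope equals F.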

Multiple Regression
Sequential Regression
Using Dummy Variables

Assignment

Week13 (May 28, 30)

ideas and concepts

Assignment

Week14 (June 4, 6)

June 6: Memorial Day
In continuation with ANOVA, Factorial ANOVA
Repeated Measures ANOVA
post hoc test
Effect size for ANOVA

Quiz:

t-test
F-test

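These are the basic topics, but to understand them you also need to understand

standard deviation
variance
central limit theorem
- sampling distribution
- standard error
hypothesis testing
z-test
types of error
variable
types of variable, and so on.

The corresponding textbook coverage is

Ch 12: Confidence limits were not covered in class, so they are excluded. They will, however, be covered on the last quiz.
Ch 13, Ch 14
~~Ch 15:~~
Ch 16:
- Includes unequal sample sizes
- Includes multiple comparisons (post hoc or multiple comparison techniques); the mathematical details will not be on the quiz.
- Of the effect sizes, only the eta-squared part is included
- Includes reporting results (not covered in class, but please study it)
Ch 17 (factorial)
- Effect sizes other than eta squared (r-family, omega squared, etc.) are excluded
- 17.7 excluded
- 17.8, 17.9 included
Ch 18 (repeated measures anova)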
Ch 18 (repeated measures anova)

Week15 (June 11, 13)

June 13: local election day
week15

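Quiz: previous coverage + everything on regression

t-test
F-test
regression
- regression
- multiple regression
  - including the 무엇부터_라는_문제 (the "which comes first" question) and determining_ivs_role sections.
- using dummy variables: focus on understanding the basic logic.

These are the basic topics, but to understand them you also need to understand

standard deviation
variance
central limit theorem
- sampling distribution
- standard error
hypothesis testing
z-test
types of error
variable
types of variable, and so on.

The corresponding textbook coverage is

Ch 12: Confidence limits were not covered in class, so they are also excluded from the last quiz.
Ch 13, Ch 14
~~Ch 15:~~
Ch 16:
- Includes unequal sample sizes
- Includes multiple comparisons (post hoc or multiple comparison techniques); the mathematical details will not be on the quiz.
- Of the effect sizes, only the eta-squared part is included (eta squared, partial eta squared, ~~omega~~)
- Includes reporting results (not covered in class, but please study it)
Ch 17 (factorial)
- Effect sizes other than eta squared (r-family, omega squared, etc.) are excluded
- 17.7 excluded
- 17.8, 17.9 included
Ch 18 (repeated measures anova)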

Week16 (June 18, 20)

Final exam

¹⁾ See why n-1
