# RESEARCH, TESTING & STATISTICS

The research process is divided as follows:

• Research Problem
• Meaning and Characteristics of a Research Problem
• Usually in a question form and should be
• clear
• specific
• interconnected
• substantively relevant
• Two or more variables
• Must show some relationship
• Can be tested by empirical methods
• Hypothesis
• Any testable proposition to a research problem
• Based on literature
• conceptual clarity
• must be testable
• should be economical and parsimonious
• should be related to existing theory or fact
• logical unity
• general in scope
• available scientific tools and techniques
• must be consistent with other hypotheses
• 4 major types
• Universal – when the relationship holds true for all variables at any time and place
• Existential – relationship holds true for at least one case
• Based on goal and causation:
• Causal – causal influence in relationship
• Descriptive – shows some characteristic or goal for observation
• Other types are
• Simple – one or two variables
• Complex – more than two variables
• Research – derived from a theory, also known as working hypothesis
• (H0) Null – denial of a relationship, also known as the no-effect or no-difference hypothesis
• (H1) Statistical (alternate) – existence of a relationship, makes numerical expressions of null and research hypotheses, operational statement
• Variables
• Variables are the characteristics or conditions that are manipulated, controlled or observed by the experimenter
• Classification:
• Dependent – is the variable about which the experimenter makes a prediction
• Independent also known as stimulus variable – is the variable the experimenter manipulates, selects and measures for the purpose of producing observable change in dependent variable
• 2 types:
• Type E – directly manipulated
• Type S – manipulated through selection
• Qualitative
• categories that cannot be ordered in magnitude
• precise measurement in numerical terms is not possible
• Quantitative
• categories that can be ordered in magnitude
• Precise measurements in numerical form can be made
• 2 categories
• Continuous variable – categories that can be measured in any arbitrary degree of fineness or exactness
• E.g. marks in an exam, height
• Discrete, also known as categorical variable – cannot be measured to an arbitrary degree of fineness because clear gaps exist between values
• E.g. gender, sex, educational level
• Moderator/Mediating variable
• Mediating – it is affected by the IV and in turn affects the DV, so the IV influences the DV indirectly through it.
• Moderating – it changes the strength or direction of the relationship between the IV and the DV, and is not itself caused by the IV
• Active/Attribute
• Active- the variable manipulated by the experimenter
• Attribute- not manipulated by experimenter
• Reliability [Charles Spearman]
• consistency of scores obtained by the same person when they are examined on a test multiple times
• error of measurement
• It is measured by understanding error scores
• measures of reliability make it possible to estimate what proportion of total variance is error variance (irrelevant conditions)
• Correlational coefficient
• degree of consistency or agreement between two sets of scores
• Types of Reliability
• Test-Retest Reliability
• same test, given twice (correlated)
• interval between the two is kept short
• usually a fortnight
• practice will cause error variance
• due to recall
• temporal stability coefficient
• good for speed and power tests
• Sources of error variance (SOEV)
• Time sampling: errors will occur due to time differences
• Alternate form Reliability
• Also known as parallel form/equivalent form/comparative form
• two different, yet similar tests are administered
• interval is important
• more time and different test reduces coaching or teaching effects
• coefficient of equivalence
• SOEV- immediate – content sampling- dependent on the closeness of the two tests
• SOEV long term – time sampling
• Split Half Reliability
• Also known as coefficients of internal consistency
• one test, divided into parts, usually odd and even, which are then correlated
• coefficient of only half the test
• the longer the test, the higher the reliability
• source of error variance – content sampling
• Spearman-Brown formula corrects the half-test coefficient to estimate the reliability of the full-length test
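The split-half procedure can be sketched in a few lines. This is a minimal illustration rather than a standard routine: the function names and the layout of `item_scores` (one row per examinee, one score per item) are assumptions made for the example.

```python
def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half, factor=2):
    """Estimate reliability of a test `factor` times as long as the half-test."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(item_scores):
    """Correlate odd-item and even-item totals, then step up to full length."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# A half-test correlation of 0.60 corresponds to a full-test estimate of 0.75.
print(spearman_brown(0.6))
```

The same `spearman_brown` function with a different `factor` shows why a longer test tends to be more reliable.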
• Kuder Richardson and Coefficient Alpha
• Based on performance on each item (inter-item consistency)
• Unless the items are highly homogeneous, the KR coefficient is lower than the split-half coefficient
• For yes-no (dichotomously scored) items, KR-20 is used
• for multiple choice use
• coefficient alpha (Cronbach alpha)
• the difference between the KR and split-half coefficients gives a rough index of the heterogeneity of the test
• KR tends to underestimate the reliability coefficient
• SOEV-content sampling
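As a rough sketch of how these internal-consistency coefficients are computed, the following uses the usual textbook formulas. The function names and the tiny data set are illustrative assumptions; population variances are used throughout so that KR-20 and coefficient alpha agree on 0/1 items.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores has one row per person, one column per item."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(item_scores):
    """KR-20 for items scored 0/1: item variance is p * q."""
    n, k = len(item_scores), len(item_scores[0])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_pq / total_var)


# Hypothetical responses of four people to three yes-no items.
answers = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(answers), cronbach_alpha(answers))
```

For dichotomous items the two functions return the same value, which is why alpha is often described as the generalisation of KR-20 to multi-point items.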
• Inter-scorer Reliability
• two or more scorers review the same test
• the two sets of scores are correlated
• useful for subjective tests
• Validity
• what the test is meant to measure and how well it does so
• Types:
• Content validity
• the measurement of whether the test content covers a behaviour that is to be measured
• objectives of the test need to be broad and well covered
• not good for aptitude and personality tasks
• also referred to as intrinsic validity, relevance, circular validity and representativeness
• requires item and sampling validity
• Face validity
• the superficial appearance of what the test measures
• Criterion related validity
• effectiveness of the test in predicting an individual’s performance
• Concurrent – criterion data for measuring performance  are already available
• Predictive – criterion may not be presently available but will be available in the near future to make a comparison
• Empirical or Statistical- predicts future behaviour
• Predictive validity is lower than construct
• Construct Validity
• Also known as factorial or test validity
• It encompasses the entire test
• It is the extent to which a test can measure theoretical construct or trait
• 2 types
• Convergent
• should correlate with other related tests
• Divergent
• should not correlate with other unrelated tests
• Assessing the two together produces a multitrait-multimethod matrix
• Types of research designs
• Quantitative
• Hypothesis derived based on an existing theory,
• tested through data analysis
• Identify the cause-effect relationship
• Experimental: is lab based
• measures effects or results on the dependent variable by manipulating the independent variable
• pre-decided steps
• causality between independent variable and dependent variable
• 3 principles
• Replication
• helps revalidation
• identical procedures, place, irrespective of time
• avoids experimental error (based on faulty experimental design)
• Randomization
• it ensures independence of observation
• improves validity
• Local control- done in three ways:
• Grouping- refers to placing similar (homogenous) subjects into a group
• Blocking – creating different blocks for attainment of grouping
• Balancing- grouping and blocking should create designs that are balanced
• Non-experimental: No causation or effect but building relationships between many factors
• Methods of Collecting Data
• Survey: Questionnaires are sent to many people to gain information usually in a short space of time
• Diary Method: the same questionnaire is sent to the same participants at different times
• Quantitative Data Analysis
• Advanced statistical techniques and software such as IBM SPSS/AMOS
• Causal Modeling: relations between a given set of variables – helps to test specific hypotheses
• Mediation: the effect of the independent variable on an outcome is explained through a third factor known as the mediator.
• Sobel’s test: tests whether the indirect effect of the independent variable on the outcome through the mediator is significant
• Moderation: a third variable changes the strength or direction of the relationship between the independent and the dependent variable
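The Sobel test mentioned above reduces to a single z statistic. The sketch below assumes the usual notation, which is not defined in these notes: `a` is the coefficient for IV to mediator, `b` the coefficient for mediator to DV (controlling for the IV), and `se_a`, `se_b` are their standard errors. The numbers in the example are made up.

```python
import math


def sobel_z(a, se_a, b, se_b):
    """z statistic for the indirect effect a * b (first-order Sobel formula)."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


def two_tailed_p(z):
    """Two-tailed p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical path coefficients and standard errors.
z = sobel_z(a=3.0, se_a=1.0, b=4.0, se_b=1.0)
print(z, two_tailed_p(z))
```

In practice the coefficients would come from two regression models fitted in software such as SPSS; only the final step is shown here.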
• Qualitative
• Generate and analyse data which are not reducible to numbers
• Focus on meaning and interpretation
• Inductive – theory generating
• Sensitive to the context
• Recognize researcher’s perspective and subjectivity
• Data collection
• Interviews – interactions based on Q&A, structured or unstructured
• Focus Group – group discussion to get opinions
• Naturally occurring data – based on observing people in their natural day-to-day environment
• Observation – naturalistic, to understand a set of interrelated events
• Structured: Created by the researcher to fit a context
• Structured methods of data collection – open ended questionnaire
• Qualitative Data Analysis
• Narrative Analysis – understanding data from stories
• Discourse Analysis – is the understanding that different situations create different meaning
• Archival Research – Using past information such as written stories, past census, personal diaries etc.
• Ethological Research
• scientific and objective study of animal behaviour, focused on behaviour under natural conditions and viewed as an evolutionarily adaptive trait
• Mixed Method – combining both qualitative and quantitative methods
• Triangulation – multiple methods of data collection and analysis to arrive at conclusive results

Statistics

• Speed and Power tests – no perfect score
• Speed test
• Result is dependent on time
• low item difficulty
• single-trial reliability techniques (odd-even, KR) are not applicable
• Power test
• no time limit, or a limit generous enough for everyone to attempt all items
• steeply graded difficulty (from easy to hard, usually)
• may include items that are too difficult
• (Study Tip: Speed = how quickly, finite time; Power = how many, effectively unlimited time)
• Classical test theory
• any observed score (X) is equal to the true score (T) plus an error score (E): X = T + E
• errors of measurement are random
• errors of measurement are uncorrelated with true scores and with each other
• Item analysis
• A set of procedures that is applied to determine the indices of truthfulness (validity) of the items
• Item analysis shows
• which items are difficult, easy or moderate (index of difficulty)
• ability of the item to discriminate between inferior and superior
• Indicates how well multiple choices create distractions
• This can be done via:
• Structural Equation Modeling
• Hypothesized causal relations
• Item difficulty
• the method to differentiate the correct answer from the incorrect answer
• Value is discerned from the percentage of persons who answer correctly
• Maximum number of discriminations is 50 x 50 = 2500, which occurs when item difficulty (ID) is 50%
• Must have normal distribution of difficulty
• Variance should be 0.25, Standard Deviation 0.5
• Power test item difficulty
• Power test
• do not have a set time limit
• arranged from easy to hard questions
• Item difficulty
• the difficulty value is discerned by the percentage of individuals who answer the item correctly
• maximum number of discrimination is 50 x 50 = 2500
• this occurs when item difficulty is 50%
• must have a normal distribution of difficulty
• Empirical method
• P = R/N
• P = index of difficulty
• R = number of correct responses
• N = total number
• For speed test
• P=R/N, where R is number of people with similar attempts
• Method of judgement
• judgement by experts
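The empirical index P = R/N, together with the 0.25 variance and the 2500 discriminations quoted earlier, can be checked with a short sketch. The function names and the group of 100 hypothetical examinees are assumptions for illustration.

```python
def item_difficulty(responses):
    """P = R / N, where responses holds 1 (correct) or 0 (incorrect) per person."""
    return sum(responses) / len(responses)


def item_variance(p):
    """Variance of a 0/1 item with difficulty p; largest (0.25) at p = 0.5."""
    return p * (1 - p)


def discriminations(responses):
    """Number of pairs the item separates: those who pass times those who fail."""
    right = sum(responses)
    return right * (len(responses) - right)


# 100 hypothetical examinees, half of whom answer the item correctly.
item = [1] * 50 + [0] * 50
p = item_difficulty(item)
print(p, item_variance(p), item_variance(p) ** 0.5, discriminations(item))
```

Changing the split to 70/30 lowers the count to 2100, which is why items near 50% difficulty discriminate most.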
• Index of Discrimination also known as Item validity index
• ability of the item to divide between superiors and inferiors
• positively discriminating (correct answers higher in upper group)
• negatively discriminating (correct answer is lower in upper group)
• non-discriminating (equal in both groups)
• these items are usually dropped
• 2 methods of calculating Index of discrimination
• A test of significance of difference between two percentages/proportions
• top 27% and bottom 27%, N = 370 (normal curve)
• using critical ratio
• Guilford suggests using chi square when there are an equal number of people in each group
• Marshall and Hales
• Net D Index of Discrimination
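A simple upper-lower index of this kind can be sketched as below. It is a generic illustration using the 27% groups mentioned above, not necessarily the exact formula of any named author; the function name and the rounding of group size are assumptions.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Proportion correct in the upper group minus that in the lower group."""
    n_group = max(1, round(fraction * len(total_scores)))
    # Rank examinees by total test score, lowest first.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n_group], order[-n_group:]
    p_upper = sum(item_correct[i] for i in upper) / n_group
    p_lower = sum(item_correct[i] for i in lower) / n_group
    return p_upper - p_lower


# Ten hypothetical examinees; only the five highest scorers pass the item.
totals = list(range(1, 11))
item = [0] * 5 + [1] * 5
print(discrimination_index(totals, item))
```

A positive value marks a positively discriminating item, a negative value a negatively discriminating one, and zero a non-discriminating item.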
• Correlational Techniques
• each item is validated against internal criteria of total score called item total correlation
• closer relationship suggests better discrimination
• product moment, biserial, point biserial, tetrachoric, and phi coefficients are employed
• Item Characteristics Curve
• the graphic representation of the probability of giving the correct answer to an item as a function of the level of attribute assessed by the test
• used to illustrate discriminatory power and item difficulty
• slope = discrimination
• position = difficulty
• Item Response Theory
• Latent trait theory
• Item characteristic curve
• Each item on a test has its own item characteristic curve that describes the probability of getting that item right or wrong, given the ability level of the examinee
• IRT > CTT
• It can help in making predictions
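One common way to draw an item characteristic curve is the two-parameter logistic model. The notes do not name a specific model, so its use here is an assumption, as are the parameter names `a` (slope, discrimination) and `b` (position, difficulty).

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct answer at ability theta under a 2PL model."""
    return 1 / (1 + math.exp(-a * (theta - b)))


# When ability equals the item's difficulty, the probability is one half.
print(icc_2pl(theta=0.5, a=1.2, b=0.5))

# A steeper slope separates examinees just above and just below b more sharply.
for a in (0.5, 1.0, 2.0):
    print(a, round(icc_2pl(1.0, a, 0.0), 3), round(icc_2pl(-1.0, a, 0.0), 3))
```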

Descriptive Stats

• Central tendency –  measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data
• The mean – It can be used with both discrete and continuous data, although its use is most often with continuous data – Average
• Median – is the middle score for a set of data that has been arranged in order of magnitude
• The median is less affected by outliers and skewed data.
• The mode is the most frequent score in our data set.
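The three measures can be compared on a small, made-up data set using Python's standard `statistics` module. The scores are hypothetical and include one outlier to show why the median is preferred for skewed data.

```python
from statistics import mean, median, mode

# Hypothetical scores with one extreme value.
scores = [2, 3, 3, 4, 100]

print(mean(scores))    # pulled upward by the outlier
print(median(scores))  # middle value of the ordered scores
print(mode(scores))    # most frequent score
```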
• Normal distribution: The normal distribution describes how the values of a variable are distributed.
• It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions.
• An assessment of the normality of data is a prerequisite for many statistical tests because normal data is an underlying assumption in parametric testing.
• The main tests for the assessment of normality are Kolmogorov-Smirnov and Shapiro-Wilk test
•  Kolmogorov-Smirnov test : This test is used as a test of goodness of fit and is ideal when the size of the sample is larger  (above 2000).
• Shapiro-Wilk test : for smaller sample sizes less than 2000
• If p < 0.05, we reject the assumption that our variable follows a normal distribution in the population (5 percent significance level)

Inferential Stats

• Difference between two independent means
• The comparison of two independent population means is very common and provides a way to test the hypothesis that the two groups differ from each other.
• The Independent Samples t-Test is a statistical test used to determine if 2 groups are significantly different from each other on your variable of interest.
• The assumptions for the Independent Samples t-Test include:
• Continuous Data
• Normally Distributed
• Randomness ensured
• At least 5 observations
• Interpretation
• When we run the analysis, we get a t-statistic and a p-value.
• The t-statistic measures the size of the difference between the group means relative to the variability in the data
• A p-value less than or equal to 0.05 means that our result is statistically significant, i.e. the difference is unlikely to be due to chance alone.
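To make the t-statistic concrete, here is a minimal sketch of the pooled-variance (equal variances assumed) version. The function name and the data are illustrative; the p-value is not computed because it requires the t distribution, which statistical software or a table would supply.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n1 + n2 - 2


# Two hypothetical groups of five scores each.
t, df = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```

The sign of t only reflects which group was entered first; its absolute size is what is compared with the critical value for the given df.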
• Mann Whitney U Test
• The Mann-Whitney U test is used to compare whether there is a difference in the dependent variable for two independent groups. (Non parametric)
• Paired sample t test
• The dependent t-test (called the paired-samples t-test in SPSS Statistics) compares the means between two related groups on the same continuous, dependent variable.
• Assumptions
• Measured on a continuous scale (i.e., it is measured at the interval or ratio level)
• Your independent variable should consist of two categorical, “related groups” or “matched pairs”.
• There should be no significant outliers in the differences between the two related groups.
• Normally distributed
• The Wilcoxon signed-rank test is the nonparametric test equivalent to the dependent t-test.
• As the Wilcoxon signed-rank test does not assume normality in the data, it can be used when this assumption has been violated and the use of the dependent t-test is inappropriate.
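For comparison with the independent-samples case, the paired-samples t statistic works on the differences within each pair. This is a sketch with made-up before and after scores; the function name is an assumption and the p-value lookup is again left to software or tables.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and df for related samples, based on pairwise differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for the same four people measured twice.
t, df = paired_t(before=[10, 12, 14, 16], after=[11, 14, 15, 18])
print(round(t, 3), df)
```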
• Other Statistical Techniques
• Chi Square (Helmert; Karl Pearson)
• Often used for goodness of fit
• Is a test of significance
• (χ²) – used when data are frequencies or percentages, discrete, in categories, or nonparametric, or to test goodness of fit
• For one variable – the (χ²) distribution can be used to determine how well the experimentally obtained results fit the results expected theoretically
• Degrees of Freedom refers to the maximum number of logically independent values, which are values that have the freedom to vary, in the data sample.
• 3, 8, 5, and 4 are four of the five numbers in the set and the average of the entire data sample is revealed to be 6
• The next number can only be 10
• Df = (N-1), therefore Df = 4
• (Except chi square)
• Degrees of freedom for Chi Square = (r – 1)(c – 1)
• r = number of rows in the contingency table
• c = number of columns in the contingency table
• Same procedure for two independent variables
• First compute chi square by formula under the null hypothesis, then use the df to look up the critical value
• Contingency Coefficient – the measure of the correlation between two variables, each classified into two or more categories
• 2 x 2 tables with 1 df = Yates correction
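The chi square computation for a contingency table, including df = (r – 1)(c – 1) and the optional Yates correction for 2 x 2 tables, can be sketched as follows. The function name and the observed frequencies are illustrative assumptions.

```python
def chi_square(table, yates=False):
    """Chi square and df for a table of observed frequencies (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Continuity correction, intended for 2 x 2 tables (1 df).
                diff = max(0.0, diff - 0.5)
            chi2 += diff ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of observed frequencies.
observed = [[10, 20], [20, 10]]
print(chi_square(observed))
print(chi_square(observed, yates=True))
```

The resulting value is then compared with the critical chi square value for the given df.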
• Correlation or dependence is any statistical relationship
• Positive – moving in same direction
• Negative – moving in opposite directions
• Score is called correlation coefficient
• Represented by r
• CC is always between – 1 and +1
• -1 ———– 0 ————– +1
• Methods of Correlation
• Biserial and point-biserial correlation – used to relate a continuous variable to a dichotomous variable
• Dichotomous means a variable that is separated into two categories
• If the dichotomy is artificial (a continuous variable split at a cut-off, e.g. pass/fail), biserial correlation (rbis) should be used; if the dichotomy is natural, point biserial (rpbis) should be used
• Natural dichotomous variables can be divided into two categories only and not more, therefore point-biserial correlation is used for them.
• Point biserial (rpbis) is better than biserial (rbis) because
• Rpbis does not assume normality
• can be used for regression
• easy and convenient
• the standard error can be determined
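The point-biserial coefficient is numerically the product-moment correlation between a 0/1 variable and a continuous one. A minimal sketch using the usual textbook formula follows; the function name, the 0/1 coding and the data are assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def point_biserial(groups, scores):
    """r_pbis between a natural dichotomy (coded 0/1) and continuous scores."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / len(scores)  # proportion in the group coded 1
    q = 1 - p
    return (mean(ones) - mean(zeros)) / pstdev(scores) * math.sqrt(p * q)


# Hypothetical data: group membership and a continuous test score.
print(round(point_biserial([0, 0, 1, 1], [1, 2, 3, 4]), 4))
```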
• Tetrachoric correlation
• when both variables are dichotomous and cannot be expressed in scores
• Artificial dichotomy
• Phi correlation
• when both variables are naturally dichotomous
• useful for item analysis for item-item correlation
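For two naturally dichotomous variables arranged in a 2 x 2 table, phi can be computed directly from the four cell frequencies. The cell labels `a`, `b` (first row) and `c`, `d` (second row) are a common convention assumed here, and the counts are made up.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with rows (a, b) and (c, d)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical item-by-item table: pass/fail on item 1 against pass/fail on item 2.
print(phi_coefficient(10, 20, 20, 10))
```

Phi squared equals chi square divided by N for the same table, which links this coefficient to the chi square test.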
• Partial and Multiple correlations
• Partial correlation
• helps to estimate the independent reliable relationship between any two variables by eliminating and ruling out any undesirable influence of a third additional variable by controlling them
• 2nd order or 3rd order partial correlations control for two or three additional variables, respectively
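A first-order partial correlation, which removes the influence of a single third variable, can be sketched from the three pairwise correlations. The function and argument names are assumptions; `r12` is the correlation of interest and variable 3 is the one being controlled.

```python
import math


def partial_r(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# If variable 3 is unrelated to both, the correlation is unchanged.
print(partial_r(0.5, 0.0, 0.0))

# If variable 3 correlates with both, part of r12 is explained away.
print(round(partial_r(0.6, 0.6, 0.6), 3))
```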
• Multiple correlations
• to assess the relationship between the dependent variable and many independent variables
• T Scores [William A. McCall]
• A t score in psychometric (psychological) testing is a specialized term that is not the same thing as a t score that you get from a t-test.
• T scores in t-tests can be positive or negative. T scores in psychometric testing are always positive, with a mean of 50.
• A difference of 10 (positive or negative) from the mean is a difference of one standard deviation.
• HOW TO CALCULATE T SCORE?
• Step 1: subtract the mean from the personal score
• Step 2: Divide it by SD (now called z score)
• Step 3: (Multiply z score X 10) + 50
• Step 4: T Score!
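The steps translate directly into code. This is a small sketch; the function name and the example raw score, mean and SD are made up for illustration.

```python
def t_score(raw_score, mean, sd):
    """Convert a raw score to a psychometric T score (mean 50, SD 10)."""
    z = (raw_score - mean) / sd  # steps 1 and 2: the z score
    return z * 10 + 50           # step 3: rescale


# A raw score one SD above the mean gives T = 60; one SD below gives T = 40.
print(t_score(30, mean=25, sd=5))
print(t_score(20, mean=25, sd=5))
```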
• Analysis of variance (ANOVA)
• One Way ANOVA
• to test the significance of the difference between the means of three or more groups
• gives a composite score
• 2 types of variance
• Within group – the average variance of members of each group around their respective group means
• Between group – the variance of group means around the total or composite mean of all groups
• F ratio
• is the critical ratio for determining the significance of the difference between group mean at a given level of significance
• doesn’t tell which group is better, only that they are different
• Procedure:
• computation of the total sum of squares
• computation of between group sum of squares
• computation of within group sum of squares
• computation of the F ratio (between-group mean square divided by within-group mean square)
• use of t-test
• if required, when F is significant
• df = K – 1 between groups and N – K within groups
• (N = total number of participants in the sample; K = number of groups)
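The one-way ANOVA procedure listed above can be followed step by step in a short sketch. The function name and the three small groups are illustrative assumptions; the follow-up t-tests and the comparison with the critical F value are not shown.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the F ratio with between-group and within-group df."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    # Total, between-group and within-group sums of squares.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1               # K - 1
    df_within = len(all_scores) - len(groups)  # N - K
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Three hypothetical groups of three scores each.
print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```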
• Two-way ANOVA
• used when there are two experimental variables
• For example Total variance of effectiveness of teaching methods and school would be calculated in the following manner
• variance due to methods alone (1st IV)
• variance due to school alone (2nd IV)
• residual variance called interaction variance (MS)
• Chance
• uncontrollable variance
• merits of the methods
• If null hypothesis is true, variance due to methods is not very different from interaction variance
• same for school (2nd IV)
• this is analysed by F ratio