Daphne Academy
Daphne Academy is an intelligent tutoring system capable of adapting its content to a user's ability levels, where each ability level corresponds to a learning topic. In partnership with the DoD, Daphne Academy has developed into a mature prototype ready for more rigorous testing and, eventually, final deployment. Guest account credentials for testing can be found below (as available).
Location: https://academy.selva-research.com
Guest Accounts
None available…
Methods
Item Response Theory
IRT Parameter Estimation¶
Dependencies
- numpy
- scipy
- matplotlib
These can be installed with your local Python package manager:
pip3 install numpy scipy matplotlib
If you see any import errors, uncomment the sys.path.insert line in the snippet below, change the path to point to the repository containing the missing module, and re-run:
import sys
# sys.path.insert(1, '/path/to/repo/cost-estimator-ca')
Item Characteristic Curve¶
Here, a test item is modeled with a three-parameter logistic (3PL) model. The test item's parameters are the following:
- a = 10 (discrimination – how sharply the item separates examinees with ability just below b from those just above it)
- b = 0.6 (difficulty – the ability level at which the probability of a correct response is halfway between the guessing floor c and 1)
- c = 0.25 (guessing parameter – the probability that a very low-ability examinee answers the item correctly by guessing)
The characteristics of the MC question are the following:
- has four equally plausible answer choices (hence the guessing parameter c = 0.25)
- is slightly harder than average
- discriminates between novices and experts very well
The 3PL model allows us to determine the likelihood that a user at any given ability level will answer the question correctly.
$p(\theta) = c + (1 - c)\frac{\exp[a(\theta - b)]}{1 + \exp[a(\theta - b)]}$
where
- $a$: discrimination parameter
- $b$: difficulty parameter
- $c$: guessing parameter
- $\theta$: user skill level
- $p$: Probability user answers correctly
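The IIIPL class from the irt_estimation package (used throughout this notebook) encapsulates this model. For reference, a minimal standalone sketch of the 3PL probability could look like the following (the function name prob_correct_3pl is illustrative, not part of the package):
import numpy as np

def prob_correct_3pl(theta, a, b, c):
    # 3PL model: guessing floor c plus (1 - c) times the 2PL logistic term
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# e.g. prob_correct_3pl(0.7, a=10, b=0.6, c=0.25) is roughly 0.80 for the item above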
import sys
from irt_estimation.IIIPL import IIIPL
import numpy as np
import matplotlib.pyplot as plt
# Define test-item parameters
a = 10
b = 0.6
c = 0.25
# Instantiate 3PL model for test-item
item_model = IIIPL(a, b, c)
# Evaluate the model on a uniform grid of ability parameter values in [0, 1) to draw the Item Characteristic Curve
theta = np.arange(0, 1, 0.01)
prob_correct = item_model.prob_correct_arr(theta)
plt.plot(theta, prob_correct)
plt.xlabel("user ability")
plt.ylabel("probability of correctly answering")
_ = plt.title(f"Item Characteristic Curve a={a}, b={b}, c={c}")
Estimating an Examinee’s Ability¶
To estimate an examinee’s unknown ability parameter, it will be assumed that the numerical values of the parameters of the test items are known. Additionally, the initial guess of the examinee’s ability parameter will be some a priori value (e.g. $\theta = 1$).
Two methods for estimating an examinee's ability will be demonstrated:
- Maximum Likelihood Estimation
- Maximum a Posteriori Estimation
Maximum Likelihood Estimation¶
Consider a K-item test where item $j$ is scored $u_j=0$ (incorrect) or $u_j=1$ (correct) for $j=1,\dots,K$. Assuming local independence of the test items in the response vector $u=(u_1,\dots,u_K)$, the maximum likelihood estimate of an examinee's ability parameter $\theta$ is found by maximizing:
$L(\theta|u) = p(u|\theta) = \prod_{j = 1}^{K} p_j(\theta)^{u_j}[1-p_j(\theta)]^{1-u_j}$
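The MLE_Ability_Estimator used in the cell below performs this maximization internally. As a minimal sketch of the idea, assuming the illustrative prob_correct_3pl formula from earlier and an item response vector of (u_j, (a, b, c)) pairs (not necessarily the library's own format), the likelihood can be maximized by minimizing the negative log-likelihood with scipy:
import numpy as np
from scipy.optimize import minimize_scalar

def mle_ability_sketch(item_response_vector):
    # item_response_vector here is a list of (u_j, (a, b, c)) pairs -- an illustrative format
    def neg_log_likelihood(theta):
        nll = 0.0
        for u, (a, b, c) in item_response_vector:
            p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))  # 3PL probability of a correct answer
            nll -= u * np.log(p) + (1 - u) * np.log(1 - p)    # Bernoulli log-likelihood term
        return nll
    # Maximize the likelihood by minimizing its negative log over theta in (0, 1)
    return minimize_scalar(neg_log_likelihood, bounds=(0, 1), method='bounded').x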
from irt_estimation.IIIPL import MLE_Ability_Estimator
import matplotlib.pyplot as plt
# Exam results
answers = [
(1, IIIPL(10, .5, .25)),
(0, IIIPL(4, .4, .25)),
(1, IIIPL(6, .8, .25)),
(1, IIIPL(6, .9, .25)),
(1, IIIPL(3, .2, .25)),
(0, IIIPL(6, .4, .25)),
(1, IIIPL(4, .3, .25)),
(1, IIIPL(7, .5, .25)),
(1, IIIPL(7, .7, .25)),
(1, IIIPL(1.3, .5, .25))
]
# Instantiate estimator
estimator = MLE_Ability_Estimator(answers)
# Estimate the user's ability parameter by maximizing the likelihood function
estimate = estimator.estimate().x
print('--> ESTIMATE', estimate)
# Evaluate the likelihood on a uniform grid of ability parameter values in [0, 1) to draw the likelihood function
theta = np.arange(0, 1, 0.01)
likelihoods = estimator.likelihood_arr(theta)
plt.plot(theta, likelihoods)
plt.axvline(x=estimate, color='r', linestyle='dashed', linewidth=2, label='Estimate')
plt.legend()
plt.xlabel("user ability")
plt.ylabel("likelihood given exam results")
_ = plt.title(f"ML Estimation")
--> ESTIMATE 0.7338761709505345
Maximum a Posteriori Estimation¶
This estimation method is identical to the previous one except that the likelihood function is multiplied by the ability parameter's prior distribution. The equation for the posterior distribution of $\theta$ is found according to Bayes' theorem:
- $p(\theta|u) = \frac{p(u|\theta)p(\theta)}{p(u)}$
maximize over $\theta \in (0, 1)$: $p(\theta|u) \propto [\prod_{j = 1}^{K} p_j(\theta)^{u_j}[1-p_j(\theta)]^{1-u_j}] \cdot p(\theta)$, where
- $u$: item response vector of user answers: $u_j=0$ (incorrect) or $u_j=1$ (correct) for $j=1,\dots,K$
- $p_j(\theta)$: Probability user answered question $j$ correctly
Because the denominator is not a function of $\theta$, it has no bearing on the optimization and can be ignored. It is sufficient to find the value of $\theta$ that maximizes the numerator, giving:
- max $p(u|\theta)p(\theta)$
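The MAP_Ability_Estimator used in the cell below wraps this optimization. Relative to the MLE sketch above, the only change is adding the negative log of the prior density to the objective (again a hedged sketch under the same assumed input format, not the library implementation):
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def map_ability_sketch(item_response_vector, prior=norm(0.5, 0.2)):
    # item_response_vector is again a list of (u_j, (a, b, c)) pairs; prior is any scipy.stats distribution
    def neg_log_posterior(theta):
        nlp = -np.log(prior.pdf(theta))                       # negative log-prior term
        for u, (a, b, c) in item_response_vector:
            p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))  # 3PL probability of a correct answer
            nlp -= u * np.log(p) + (1 - u) * np.log(1 - p)    # Bernoulli log-likelihood term
        return nlp
    return minimize_scalar(neg_log_posterior, bounds=(0, 1), method='bounded').x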
from irt_estimation.IIIPL import MAP_Ability_Estimator
from scipy.stats import norm, uniform
# Exam results (same as above)
answers = [
(1, IIIPL(10, .5, .25)),
(0, IIIPL(4, .4, .25)),
(1, IIIPL(6, .8, .25)),
(1, IIIPL(6, .9, .25)),
(1, IIIPL(3, .2, .25)),
(0, IIIPL(6, .4, .25)),
(1, IIIPL(4, .3, .25)),
(1, IIIPL(7, .5, .25)),
(1, IIIPL(7, .7, .25)),
(1, IIIPL(1.3, .5, .25))
]
# Define ability parameter prior density
mean = 0.5
sd = 0.2
dist = norm(mean, sd)
# dist = uniform()
# Instantiate estimator, set prior distribution
estimator = MAP_Ability_Estimator(answers, theta_dist=dist)
# Estimate the user's ability parameter by maximizing the MAP function
estimate = estimator.estimate().x
print('--> ESTIMATE', estimate)
# Evaluate the posterior on a uniform grid of ability parameter values in [0, 1) to draw the MAP objective function
theta = np.arange(0, 1, 0.01)
probabilities = estimator.likelihood_arr(theta)
plt.plot(theta, probabilities)
plt.axvline(x=estimate, color='r', linestyle='dashed', linewidth=2, label='Estimate')
plt.legend()
plt.xlabel("user ability")
plt.ylabel("a posteriori probability given exam results")
_ = plt.title(f"MAP Estimation")
--> ESTIMATE 0.6160507721906785
Estimation vs Reality¶
To show how the ML estimate of $\theta$ compares with reality, an exam is simulated 100 times for a user with ability $\theta = 0.6$ (the simulated exam repeats a bank of 10 MC questions num_questions times, as defined in the code below). For each question in a given simulation, we evaluate the ICC to obtain the probability that the user answers correctly and draw a random variable to decide whether the user gets the question right. The item response vector for that simulation is then passed to the ML estimator to produce an estimate of the user's ability parameter $\theta$. This procedure is repeated 100 times, and a histogram of the $\theta$ estimates is displayed along with additional metrics.
- Increasing the number of exam questions (num_questions in the code below) reduces the standard deviation of the point estimates.
from irt_estimation.IIIPL import MLE_Ability_Estimator, IIIPL
import numpy as np
import random
import matplotlib.pyplot as plt
# Simulates an examination and estimates ability
def run_simulation(true_theta, questions):
    probs = [q.prob_correct(true_theta) for q in questions]
    results = np.random.rand(len(questions)) < probs
    answers = [int(i) for i in results]
    # Skip degenerate response patterns (all correct / all incorrect), where the MLE is unbounded
    if all(i == 1 for i in answers) or all(i == 0 for i in answers):
        return None
    item_response_vector = []
    for idx, question in enumerate(questions):
        item_response_vector.append((answers[idx], question))
    return MLE_Ability_Estimator(item_response_vector).estimate().x
# Define the examinee's true ability parameter value
true_theta = 0.6
num_questions = 3 # the 10-question bank below is repeated this many times
num_sims = 100
# Define exam questions
questions = [
IIIPL(10, .6, .25),
IIIPL(5, .3, .25),
IIIPL(8, .8, .25),
IIIPL(2, .4, .25),
IIIPL(3, .65, .25),
IIIPL(4, .45, .25),
IIIPL(6, .7, .25),
IIIPL(4, .3, .25),
IIIPL(7, .25, .25),
IIIPL(8, .9, .25)
] * num_questions
# Run simulation and record estimates
estimates = []
for x in range(num_sims):
    estimate = run_simulation(true_theta, questions)
    if estimate is not None:
        estimates.append(estimate)
print('------> NUM SAMPLES:', num_sims)
print('----> TRUE ABILITY:', round(true_theta, 5))
print('----> MEAN ESTIMATE:', round(np.mean(estimates), 5))
print('--> ESTIMATES STDEV:', round(np.std(estimates), 5))
print('----> ESTIMATES VAR:', round(np.var(estimates), 5))
# Plot histogram of estimates
bins = np.arange(0, 1.1, 0.1)
plt.hist(estimates, bins=bins)
plt.axvline(x=true_theta, color='r', linestyle='dashed', linewidth=2, label='True Ability')
plt.axvline(x=np.mean(estimates), color='g', linestyle='dashed', linewidth=2, label='Mean Estimate')
plt.legend()
plt.xlabel('Estimated User Ability')
plt.ylabel('Number of Occurrences')
_ = plt.title(f"Histogram of Ability Parameter Estimates: questions={len(questions)}")
------> NUM SAMPLES: 100
----> TRUE ABILITY: 0.6
----> MEAN ESTIMATE: 0.60393
--> ESTIMATES STDEV: 0.09845
----> ESTIMATES VAR: 0.00969
Adaptive Testing Methodology
import sys
# sys.path.insert(1, '/path/to/repo/cost-estimator-ca')
from irt_estimation.IIIPL import IIIPL, MLE_Ability_Estimator, MAP_Ability_Estimator, calculate_contribution
from scipy.stats import norm, uniform
import numpy as np
import matplotlib.pyplot as plt
from copy import deepcopy
Section 1: Student Ability Estimation¶
Scenario¶
- A new student has logged into Daphne Academy and is tasked with completing the Lifecycle Cost learning module. As the student is brand new, Daphne Academy has yet to estimate the student's ability level in the Lifecycle Cost topic.
- Once the user reaches the end of the Lifecycle Cost learning module, the user is assigned a short closing examination to get a baseline estimate for the user's ability.
Step 1¶
- Define test items for the Lifecycle Cost module exam
- Create item response vector modeling user answers (0 if incorrect, 1 if correct)
# --> 1. Each of the test items is modeled using the three-parameter logistic model
question_1 = IIIPL(4, .3, .5)
question_2 = IIIPL(4, .45, .25)
question_3 = IIIPL(4, .4, .5)
question_4 = IIIPL(5, .5, .25)
question_5 = IIIPL(5, .5, .5)
# --> 2. Say the user answered the first three questions correctly (1), and the last two incorrectly (0)
# we can build an item response vector linking user results to modeled questions
item_response_vector = [
(1, question_1), # Correct
(1, question_2), # Correct
(1, question_3), # Correct
(0, question_4), # Incorrect
(0, question_5) # Incorrect
]
Step 2¶
- Define ability parameter prior distribution for MAP estimation
- Estimate ability parameter using MAP estimator and item response vector
- Note: the prior distribution can be changed to impact the estimation
# --> 1. First, define the ability parameter prior distribution to be used in the estimate
prior_distribution_norm = norm(0.5, 0.2)
prior_distribution_unif = uniform()
# --> 2. Second, instantiate the MAP estimator and execute
estimate = MAP_Ability_Estimator(item_response_vector, theta_dist=prior_distribution_norm).estimate().x
print('--> ABILITY ESTIMATE', estimate, '(normal prior)')
--> ABILITY ESTIMATE 0.43174828924784403 (normal prior)
Section 2: Test Item Selection¶
Scenario (cont…)¶
- After completing the closing examination and viewing the estimated ability parameter ($\theta \approx 0.43$ from the estimate above), the user would like to take an adaptive exam to improve their ability.
- Assume Daphne Academy has a question bank $B$ containing 10 Lifecycle Cost topic questions.
Step 1¶
- Define question bank B
B = [
IIIPL(5, .6, .5), # Idx: 0
IIIPL(6, .4, .5), # Idx: 1
IIIPL(5, .5, .25), # Idx: 2
IIIPL(7, .4, .25), # Idx: 3
IIIPL(5, .45, .25), # Idx: 4
IIIPL(6, .6, .5), # Idx: 5
IIIPL(7, .75, .5), # Idx: 6
IIIPL(5, .55, .5), # Idx: 7
IIIPL(8, .6, .5), # Idx: 8
IIIPL(6, .67, .25), # Idx: 9
]
B_text = [
'The cost of a satellite\'s propulsion system is driven in part by propulsion system type',
'The operational and maintenance phase of a mission is typically more expensive for constellations and reusable systems',
'During the production phase, the cost of producing multiple similar units is estimated using a',
'Which of the following best describes a Work Breakdown Structure (WBS)?',
'Which of the following is NOT considered a main lifecycle cost phase?',
'Lifecycle Cost\'s Research, Development, Test, and Evaluation phase (RDT&E) includes design, analysis, and testing of breadboards, brassboards, prototypes, and qualification units',
'The Production phase of lifecycle cost incorporates the cost of producing flight units. However, it doesn\'t incorporate their launch costs.',
'Replacement satellites and launches after the space system final operating capability has been established are not considered production units.',
'The operations and maintenance phase consists of ongoing operations and maintenance costs, excluding software maintenance.',
'Which of the following best describes the Theoretical First Unit (TFU) for a space mission?'
]
B_num = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Step 2¶
- Calculate each question’s ability parameter contribution score
- Select the question with the highest contribution score
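The contribution score itself is computed by the repository's calculate_contribution helper. A common selection rule in adaptive testing is to pick the item with the maximum Fisher information at the current ability estimate; the sketch below illustrates that rule with standalone (a, b, c) tuples and is an assumption, not necessarily what calculate_contribution implements:
import numpy as np

def item_information_3pl(theta, a, b, c):
    # Fisher information of a 3PL item at ability theta (standard formula)
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    return a**2 * ((p - c)**2 / (1 - c)**2) * ((1 - p) / p)

# Hypothetical mini-bank of (a, b, c) tuples mirroring the first three items of B above
bank_params = [(5, .6, .5), (6, .4, .5), (5, .5, .25)]
most_informative_idx = int(np.argmax([item_information_3pl(0.43, a, b, c) for a, b, c in bank_params]))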
# --> 1. Iterate over each question and calculate its expected contribution
print('--------- Question Theta Contributions (',round(estimate, 2),') ---------')
contributions = []
for idx, question in enumerate(B):
    contribution = calculate_contribution(estimate, question, item_response_vector)
    print('--> Question ' + str(B_num[idx]) + ':', round(contribution, 4))
    contributions.append(contribution)
# --> 2. Determine which question gives the highest contribution
question_idx = contributions.index(max(contributions))
selected_question = deepcopy(B[question_idx])
print('\n--> SELECTED: Question', B_num[question_idx])
print('------> TEXT:', B_text[question_idx])
--------- Question Theta Contributions ( 0.43 ) ---------
--> Question 0: 0.0016
--> Question 1: 0.0023
--> Question 2: 0.0016
--> Question 3: 0.0028
--> Question 4: 0.0014
--> Question 5: 0.0025
--> Question 6: 0.0013
--> Question 7: 0.0017
--> Question 8: 0.0046
--> Question 9: 0.0031

--> SELECTED: Question 8
------> TEXT: The operations and maintenance phase consists of ongoing operations and maintenance costs, excluding software maintenance.
Step 3¶
- Model user answer (1 for correct || 0 for incorrect)
- Update ability parameter estimate based on answer
- Remove question from user question bank
# --> 1. Model if the user gets the adaptively assigned question correct or incorrect
user_answer = 0
# --> 2. Update user ability estimate with answer
item_response_vector.append((user_answer, selected_question))
updated_estimate = MAP_Ability_Estimator(item_response_vector, theta_dist=prior_distribution_norm).estimate().x
print('--> ABILITY ESTIMATE UPDATE:', round(estimate, 3), '->', round(updated_estimate, 3))
estimate = updated_estimate
# --> 3. Remove the selected question from the question bank
del B[question_idx]
del B_text[question_idx]
del B_num[question_idx]
--> ABILITY ESTIMATE UPDATE: 0.432 -> 0.398
Step 4¶
- Repeat step 2 (calculate ability parameter contributions and select next question)
# --> 1. Iterate over each question and calculate its expected contribution
print('--------- Question Theta Contributions (',round(estimate, 2),') ---------')
contributions = []
for idx, question in enumerate(B):
    contribution = calculate_contribution(estimate, question, item_response_vector)
    print('--> Question ' + str(B_num[idx]) + ':', round(contribution, 4))
    contributions.append(contribution)
# --> 2. Determine which question gives the highest contribution
question_idx = contributions.index(max(contributions))
selected_question = deepcopy(B[question_idx])
print('\n--> SELECTED: Question', B_num[question_idx])
print('------> TEXT:', B_text[question_idx])
--------- Question Theta Contributions ( 0.4 ) ---------
--> Question 0: 0.0009
--> Question 1: 0.0013
--> Question 2: 0.0007
--> Question 3: 0.0014
--> Question 4: 0.0005
--> Question 5: 0.0014
--> Question 6: 0.0006
--> Question 7: 0.0009
--> Question 9: 0.0016

--> SELECTED: Question 9
------> TEXT: Which of the following best describes the Theoretical First Unit (TFU) for a space mission?
Intelligent Tutoring Systems Literature
- Van Der Linden, W., & Ren, H. (2015). Optimal Bayesian Adaptive Design for Test-Item Calibration. Psychometrika, 80(2), 263–288. https://doi.org/10.1007/s11336-013-9391-8
- Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent Tutoring Goes to School in the Big City. International Journal of Artificial Intelligence in Education, 8, 30–43.
- Corbett, Albert T., Kenneth R. Koedinger, and John R. Anderson. “Intelligent tutoring systems.” Handbook of human-computer interaction. North-Holland, 1997. 849-874
- Kim, Woo-Hyun, and Jong-Hwan Kim. “Individualized AI tutor based on developmental learning networks.” IEEE Access 8 (2020): 27927-27937.
- Mayo, Michael, and Antonija Mitrovic. “Optimising ITS behaviour with Bayesian networks and decision theory.” (2001): 124-153.
- Folsom-Kovarik, Jeremiah T., Gita Sukthankar, and Sae Schatz. “Tractable POMDP representations for intelligent tutoring systems.” ACM Transactions on Intelligent Systems and Technology (TIST) 4.2 (2013): 1-22
- Graesser, Arthur C., et al. “AutoTutor: An intelligent tutoring system with mixed-initiative dialogue.” IEEE Transactions on Education 48.4 (2005): 612-618.
- D’mello, Sidney, and Art Graesser. “AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back.” ACM Transactions on Interactive Intelligent Systems (TiiS) 2.4 (2013): 1-39.
- Cheung, B., et al. “SmartTutor: An intelligent tutoring system in web-based adult education.” Journal of Systems and Software 68.1 (2003): 11-25.
- Ong, James, and Sowmya Ramachandran. “Intelligent tutoring systems: Using ai to improve training performance and roi.” Networker Newsletter 19.6 (2003): 1-6.