class: center, middle, inverse, title-slide

# Interactive AI and Machine Teaching of Active Sequential Learners

### Tomi Peltola

tomi@cai.fi

### FCAI Machine Learning Coffee Seminar

March 9, 2020
---

class: split50

# Outline

1. Interactive AI
2. Machine teaching of active sequential learners
3. Modelling users as boundedly rational teachers
4. Challenges and summary

--

.smaller[
.col_left[
<img src="figs/neurips_paper.png" style="width: 100%; max-width: 100%;" />

33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

https://papers.nips.cc/paper/9299-machine-teaching-of-active-sequential-learners

https://aaltopml.github.io/machine-teaching-of-active-sequential-learners/
]
.col_right[
<img src="figs/chiws_paper.png" style="width: 100%; max-width: 100%;" />

Computational Modeling in HCI: ACM CHI 2019 Workshop

https://arxiv.org/abs/1912.05284
]
]

---

# Outline

1. Interactive AI
2. Machine teaching of active sequential learners
3. Modelling users as boundedly rational teachers
4. Challenges and summary

<br />

.center[
<span style="font-size: 40px;">AI `\(\approx\)` machine learning based adaptive system</span>
]

---

class: inverse, middle, center

# Interactive AI

---

# Interactive AI

FCAI research program on Interactive AI: https://fcai.fi/interactive-ai

The goal is **natural human–AI collaboration**:

* "understand our goals and abilities"
* "infer human beliefs and abilities from observations"
* "predicting the consequences of its actions on humans"

--

* Recommendation systems, interactive search, adaptive user interfaces...
* Current systems mainly model users as passive data sources rather than active agents.
* Our work focuses on intentional human–AI collaboration.

???

"The goal of FCAI’s research program Interactive AI is to enable AI that people can naturally work and solve problems with, and which demonstrates the ability to better understand our goals and abilities, takes initiative more sensitively, aligns its objectives with us, and supports us. This research program contributes to FCAI research objective Understandability (objective III) by developing methods for collaborative forms of AI: the ability to infer human beliefs and abilities from observations and predicting the consequences of its actions on humans"

---

# Interactive AI

.center[
Theory of mind is essential for efficient human–human collaboration.

<img src="figs/humans_modelling_each_other_cropped.svg" style="width: 50%;" />
]

---

# Interactive AI

.center[
For efficient human–AI collaboration, both need to model each other.

<img src="figs/modelling_each_other_cropped.svg" style="width: 50%;" />
]

---

# Interactive AI

.center[
Multi-agent modelling provides the computational framework.

<img src="figs/machines_modelling_each_other_cropped.svg" style="width: 61%;" />
]

---

class: inverse, middle, center

# Machine Teaching of<br />Active Sequential Learners

---

# Machine teaching

Finding an optimal training dataset `\(D\)` to teach a machine learner:

`$$\begin{align}
\min_D \; & \text{teaching\_loss}(\hat\theta, \theta^*) + \text{teaching\_cost}(D) \\
& \text{s.t. } \hat\theta = \text{learner}(D)
\end{align}$$`

* `\(\theta^*\)` is known to the teacher (the teaching goal).

--

* Fundamentally interesting for machine learning.
* Applications: education, user modelling, adversarial attacks.

---

# Active learners

<br /><br /><br /><br />

1\. Ask a question towards learning something.

<br />

2\. Obtain an answer.

<br />

3\. Update your knowledge.

<br />

---

class: hilite-blue

# Active learners

.hilite-blue-color[
**Example: Logistic regression learner**

* **Pool of unlabeled data.**
* **Goal: learn the parameters of `\(p(y \mid x_k, \theta)\)` with few label queries.**
]

1\. Ask a question towards learning something. **Choose the unlabeled data point `\(x\)` with the most uncertain prediction.**

2\. Obtain an answer. **Obtain the label `\(y\)`.**

3\. Update your knowledge. **Add `\((x, y)\)` to the dataset and re-learn the parameters `\(\theta\)`.**
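
---

# Active learners: a code sketch

A minimal sketch of the pool-based loop above, assuming scikit-learn; the 2D pool, the noisy linear oracle, and the seeding shortcut are illustrative assumptions, not the exact setup of the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Unlabeled pool: 200 points in 2D; a hidden noisy linear rule stands in
# for the answer source (hypothetical, for illustration only).
pool = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])

def oracle(x):
    return int(x @ true_w + 0.1 * rng.normal() > 0)

# Seed with one clearly positive and one clearly negative point so the
# first fit sees both classes (uses true_w as an initialization shortcut).
proj = pool @ true_w
seed_pos, seed_neg = int(proj.argmax()), int(proj.argmin())
X, y = [pool[seed_pos], pool[seed_neg]], [1, 0]
unlabeled = [i for i in range(len(pool)) if i not in (seed_pos, seed_neg)]

model = LogisticRegression()
for _ in range(10):
    model.fit(np.array(X), np.array(y))
    # 1. Ask: choose the unlabeled point with the most uncertain prediction.
    probs = model.predict_proba(pool[unlabeled])[:, 1]
    query = unlabeled.pop(int(np.argmin(np.abs(probs - 0.5))))
    # 2. Obtain an answer (label); 3. add (x, y) and re-learn theta.
    X.append(pool[query])
    y.append(oracle(pool[query]))
```

Removing each queried point from the pool keeps the learner from asking the same question twice.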

---

class: hilite-red

# Active learners

.hilite-red-color[
**Example: Bayesian multi-armed bandit learner**

* **Set of `\(K\)` arms with features `\(x_k\)` and rewards `\(p(y \mid x_k, \theta)\)`.**
* **Goal: maximize the cumulative reward `\(\sum_t y_t\)`.**
]

1\. Ask a question towards learning something. **Choose an arm `\(x\)`, balancing exploration and exploitation.**

2\. Obtain an answer. **Obtain the reward `\(y\)`.**

3\. Update your knowledge. **Add `\((x, y)\)` to the dataset `\(\mathcal{D}\)` and update `\(p(\theta \mid \mathcal{D})\)`.**

---

# Teaching active learners

* The learner chooses the questions.
* The teacher provides the answers.

--

**The teacher aims to steer the learner towards a teaching goal.**

* <span class="hilite-blue">**Learn some logistic regression parameters `\(\theta^*\)` or attain high accuracy on some dataset.**</span>
* <span class="hilite-red">**Teach a relevance profile of the arms or accumulate as much reward as possible.**</span>

--

**The teacher should acknowledge the sequential nature of the problem.**

---

# Teaching active learners

**The teacher should acknowledge the sequential nature of the problem.**

The teacher models the learner to

1. understand the learner's state of knowledge, and
2. anticipate the learner's questions.

--

**To teach, answer the current question so as to steer the learner's future knowledge towards the teaching goal.**

--

Markov decision process / model-based reinforcement learning:<br />
`\(\Rightarrow\)` Solve or plan to find the optimal teaching policy.

---

# Teaching active learners

<img src="figs/plan_tree.svg" style="width: 45%; max-width: 45%; float: right; margin-right: -5%;" />

Markov decision process:

* state: the learner's knowledge and current question,
* transitions: knowledge update and next question,
* actions: answers,
* reward: the teaching goal.

`\(\Rightarrow\)` Solve or plan to find the optimal teaching policy.

---

class: graybg

# Logistic regression learner

<img src="figs/active_learning_animation_pool.gif" style="width: 100%; max-width: 100%;" />

---

class: graybg

# Logistic regression learner

<img data-gifffer="figs/active_learning_animation_noteacher.gif" style="width: 100%; max-width: 100%;" />

---

class: graybg

# Logistic regression learner

<img data-gifffer="figs/active_learning_animation.gif" style="width: 100%; max-width: 100%;" />

---

# Probabilistic teacher model

<img src="figs/plan_tree_qvals.svg" style="width: 35%; max-width: 35%; float: right; margin-right: -5%;" />

Probabilistic teacher:

`$$p(y \mid x_t, \theta^*) \propto \exp\left(Q^*(s_t, y \mid \theta^*)\right)$$`

* `\(Q^*(s_t, y \mid \theta^*)\)` is the optimal state-action value function of the MDP.
* `\(\theta^*\)` is known to the teacher (the teaching goal).

--

Compare to the naive teacher model (label distribution / environmental reward model):

`$$p(y \mid x_t, \theta^*)$$`

---

# Teacher-aware learning

Probabilistic teacher:

`$$p(y \mid x_t, \theta^*) \propto \exp\left(Q^*(s_t, y \mid \theta^*)\right)$$`

* `\(Q^*(s_t, y \mid \theta^*)\)` is the optimal state-action value function of the MDP.
* `\(\theta^*\)` is known to the teacher (the teaching goal).

**For the learner, `\(\theta^*\)` is unknown.**

* Infer `\(\theta^*\)` using the probabilistic teacher as the observation model.
* Probabilistic inverse reinforcement learning.

--

A model of an active user in interactive AI systems:

* user = teacher ― system = teacher-aware learner
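
---

# Teacher-aware learning: a toy sketch

A toy numerical sketch of the two models above, under illustrative assumptions not taken from the paper: binary rewards, a discrete grid of candidate `\(\theta\)` values, a greedy learner, and one-step lookahead standing in for full MDP planning.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 10.0                        # teacher rationality (softmax inverse temp.)
arms = rng.normal(size=(5, 2))     # K = 5 arms with 2D features x_k
cand = rng.normal(size=(20, 2))    # discrete grid of candidate thetas
goal = 3                           # index of theta*, known only to the teacher

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def next_arm(post):
    # The learner's question: greedy arm under its posterior mean reward.
    return int(np.argmax(post @ sigmoid(cand @ arms.T)))

def naive_update(post, k, y):
    # The learner model inside the teacher: plain reward-likelihood update.
    p = sigmoid(cand @ arms[k])
    post = post * (p if y == 1 else 1.0 - p)
    return post / post.sum()

post = np.full(len(cand), 1.0 / len(cand))
for _ in range(15):
    k = next_arm(post)                                   # learner asks
    # One-step lookahead Q: the arm the learner would pull after answer y.
    nxt = np.array([next_arm(naive_update(post, k, y)) for y in (0, 1)])
    q_star = sigmoid(arms[nxt] @ cand[goal])             # Q(s, y | theta*)
    y = int(rng.random() < sigmoid(beta * (q_star[1] - q_star[0])))
    # Teacher-aware update: likelihood of the answer under each candidate
    # theta via the same softmax-over-Q observation model.
    q_all = sigmoid(cand @ arms[nxt].T)                  # shape (20, 2)
    lik = np.exp(beta * q_all[:, y]) / np.exp(beta * q_all).sum(axis=1)
    post = post * lik
    post /= post.sum()

print("posterior mass on the teaching goal:", post[goal])
```

One simplification to note: the sketch reuses the learner's own posterior as the state inside the teacher's lookahead, glossing over the nesting of agent models in the full method.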

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

*"Naive" learner*
]
.col_right[
.center[
<img src="figs/model_0.svg" style="width: 120%; max-width: 120%;" />
]
]

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

2\. The user plans her actions, based on the model of the system, to achieve good future states.

<br /><br />

*Markov decision process, with the "naive" learner providing the transition dynamics*
]
.col_right[
.center[
<img src="figs/model_1.svg" style="width: 120%; max-width: 120%;" />
]
]

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

2\. The user plans her actions, based on the model of the system, to achieve good future states.

<br /><br />

3\. The system interprets the user's observed actions based on the user model and infers the user's intent/interests/goals.

<br /><br />

*"Sophisticated" learner, with the observation model defined via the state-action value function of the MDP*
]
.col_right[
.center[
<img src="figs/model_2.svg" style="width: 120%; max-width: 120%;" />
]
]

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

2\. The user plans her actions, based on the model of the system, to achieve good future states.

<br /><br />

3\. The system interprets the user's observed actions based on the user model and infers the user's intent/interests/goals.

<br /><br />

*"Sophisticated" learner, with the observation model defined via the state-action value function of the MDP*
]
.col_right[
.center[
<img src="figs/model_3.svg" style="width: 120%; max-width: 120%;" />
]
]
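
---

# Bayesian bandit learner: a minimal sketch

The simulations on the next slides use a Bayesian multi-armed bandit learner. Below is a generic Thompson-sampling sketch with a grid posterior; the sizes, features, and reward model are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(2)
arms = rng.normal(size=(8, 3))       # K = 8 arms with 3D features x_k
cand = rng.normal(size=(200, 3))     # grid approximation of p(theta | D)
weights = np.full(len(cand), 1.0 / len(cand))
theta_env = rng.normal(size=3)       # environment's true reward parameter

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

total = 0
for _ in range(100):
    # 1. Ask: Thompson sampling -- draw theta from the posterior and pick
    #    the arm that looks best under the draw (explore/exploit balance).
    theta = cand[rng.choice(len(cand), p=weights)]
    k = int(np.argmax(sigmoid(arms @ theta)))
    # 2. Obtain an answer: a Bernoulli reward y from the environment.
    y = int(rng.random() < sigmoid(arms[k] @ theta_env))
    total += y
    # 3. Update knowledge: reweight the grid points by the reward likelihood.
    p = sigmoid(cand @ arms[k])
    weights = weights * (p if y == 1 else 1.0 - p)
    weights /= weights.sum()

print("cumulative reward:", total)
```

The grid posterior is a stand-in for the conjugate or sampling-based updates one would use in practice.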

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_1.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_2.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_3.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_4.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_5.svg" style="width: 80%; max-width: 80%;" />
]

---

# User study

Word search task with 10 users teaching a Bayesian bandit learner

.center[
<img src="figs/user_study_cum_reward_NIPS.svg" style="margin-top: -3%; width: 70%; max-width: 70%;" />
]

---

class: inverse, middle, center

# Challenges and summary

---

# Challenges in adaptive AI systems

* **Choosing the right applications**
  * Where do we want adaptive AI? Ethics?
  * Where can it be realized with current or future computational methods and/or theories of human behaviour?

--

* **Non-stationarity**
  * A fundamental challenge in multi-agent learning.
  * Critical to the user experience with adaptive systems.
  * Predictability and understandability of the adaptive system.

--

* **Scarcity of training/interaction data**
  * How do we collect training data?
  * Strong priors, transfer and meta-learning.

---

class: split75

# Summary

<img src="figs/modelling_each_other_cropped.svg" style="width: 25%; max-width: 25%; float: right;" />

* A probabilistic teacher–learner two-agent model.
* Towards treating <b>users as active agents instead of passive data sources</b> in interactive AI systems.

--

Better human–AI collaboration through multi-agent and computational theory-of-mind models?

* A clear conceptual framework, but not easy to implement for realistic tasks?
* Challenges: theories/models of human behaviour in relation to adaptive systems; non-stationarity; computation in nested multi-agent models.

--

.center[
<div style="margin-top: 3%;"><b style="font-size: 30px;">Thanks!</b></div>

https://aaltopml.github.io/machine-teaching-of-active-sequential-learners/

<a href="mailto:tomi@cai.fi">tomi@cai.fi</a> - http://www.tmpl.fi - https://research.cs.aalto.fi/pml/
]