class: center, middle, inverse, title-slide

# Interactive AI and Machine Teaching of Active Sequential Learners

### Tomi Peltola

tomi@cai.fi

### FCAI Machine Learning Coffee Seminar

March 9, 2020
---

class: split50

# Outline

1. Interactive AI
2. Machine teaching of active sequential learners
3. Modelling users as boundedly rational teachers
4. Challenges and summary

--

.smaller[
.col_left[
<img src="figs/neurips_paper.png" style="width: 100%; max-width: 100%;" />

33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

https://papers.nips.cc/paper/9299-machine-teaching-of-active-sequential-learners

https://aaltopml.github.io/machine-teaching-of-active-sequential-learners/
]
.col_right[
<img src="figs/chiws_paper.png" style="width: 100%; max-width: 100%;" />

Computational Modeling in HCI: ACM CHI 2019 Workshop

https://arxiv.org/abs/1912.05284
]
]

---

# Outline

1. Interactive AI
2. Machine teaching of active sequential learners
3. Modelling users as boundedly rational teachers
4. Challenges and summary

<br />

.center[
<span style="font-size: 40px;">AI `\(\approx\)` machine learning based adaptive system</span>
]

---

class: inverse, middle, center

# Interactive AI

---

# Interactive AI

FCAI research program on Interactive AI: https://fcai.fi/interactive-ai

The goal is **natural human–AI collaboration**:

* "understand our goals and abilities"
* "infer human beliefs and abilities from observations"
* "predicting the consequences of its actions on humans"

--

* Recommendation systems, interactive search, adaptive user interfaces...
* Current systems mainly model users as passive data sources rather than active agents.
* Our work focuses on intentional human–AI collaboration.

???

"The goal of FCAI’s research program Interactive AI is to enable AI that people can naturally work and solve problems with, and which demonstrates the ability to better understand our goals and abilities, takes initiative more sensitively, aligns its objectives with us, and supports us. This research program contributes to FCAI research objective Understandability (objective III) by developing methods for collaborative forms of AI: the ability to infer human beliefs and abilities from observations and predicting the consequences of its actions on humans"

---

# Interactive AI

.center[
Theory of mind is essential for efficient human–human collaboration.

<img src="figs/humans_modelling_each_other_cropped.svg" style="width: 50%;" />
]

---

# Interactive AI

.center[
For efficient human–AI collaboration, both need to model each other.

<img src="figs/modelling_each_other_cropped.svg" style="width: 50%;" />
]

---

# Interactive AI

.center[
Multi-agent modelling provides the computational framework.

<img src="figs/machines_modelling_each_other_cropped.svg" style="width: 61%;" />
]

---

class: inverse, middle, center

# Machine Teaching of<br />Active Sequential Learners

---

# Machine teaching

Finding an optimal training dataset `\(D\)` to teach a machine learner:

`$$\begin{align}
\min_D \; & \text{teaching\_loss}(\hat\theta, \theta^*) + \text{teaching\_cost}(D) \\
& \text{s.t. } \hat\theta = \text{learner}(D)
\end{align}$$`

* `\(\theta^*\)` is known to the teacher (the teaching goal).

--

* Fundamentally interesting for machine learning.
* Applications: education, user modelling, adversarial attacks.

---

# Active learners

<br /><br /><br /><br />

1\. Ask a question towards learning something.

<br />

2\. Obtain an answer.

<br />

3\. Update your knowledge.

<br />

---

class: hilite-blue

# Active learners

.hilite-blue-color[
**Example: Logistic regression learner**

* **Pool of unlabeled data.**
* **Goal: learn the parameters of `\(p(y \mid x_k, \theta)\)` with few label queries.**
]

1\. Ask a question towards learning something. **Choose the unlabeled data point `\(x\)` with the most uncertain prediction.**

2\. Obtain an answer. **Obtain the label `\(y\)`.**

3\. Update your knowledge. **Add `\((x, y)\)` to the dataset and re-learn the parameters `\(\theta\)`.**
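
---

# Active learners: a code sketch

A minimal sketch of the pool-based loop above, assuming scikit-learn; the 2D pool, the noisy linear oracle, and the seeding shortcut are illustrative assumptions, not the exact setup of the talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Unlabeled pool: 200 points in 2D; a hidden noisy linear rule stands in
# for the answer source (hypothetical, for illustration only).
pool = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])

def oracle(x):
    return int(x @ true_w + 0.1 * rng.normal() > 0)

# Seed with one clearly positive and one clearly negative point so the
# first fit sees both classes (uses true_w as an initialization shortcut).
proj = pool @ true_w
seed_pos, seed_neg = int(proj.argmax()), int(proj.argmin())
X, y = [pool[seed_pos], pool[seed_neg]], [1, 0]
unlabeled = [i for i in range(len(pool)) if i not in (seed_pos, seed_neg)]

model = LogisticRegression()
for _ in range(10):
    model.fit(np.array(X), np.array(y))
    # 1. Ask: choose the unlabeled point with the most uncertain prediction.
    probs = model.predict_proba(pool[unlabeled])[:, 1]
    query = unlabeled.pop(int(np.argmin(np.abs(probs - 0.5))))
    # 2. Obtain an answer (label); 3. add (x, y) and re-learn theta.
    X.append(pool[query])
    y.append(oracle(pool[query]))
```

Removing each queried point from the pool keeps the learner from asking the same question twice.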

---

class: hilite-red

# Active learners

.hilite-red-color[
**Example: Bayesian multi-armed bandit learner**

* **Set of `\(K\)` arms with features `\(x_k\)` and rewards `\(p(y \mid x_k, \theta)\)`.**
* **Goal: maximize the cumulative reward `\(\sum_t y_t\)`.**
]

1\. Ask a question towards learning something. **Choose an arm `\(x\)`, balancing exploration and exploitation.**

2\. Obtain an answer. **Obtain the reward `\(y\)`.**

3\. Update your knowledge. **Add `\((x, y)\)` to the dataset `\(\mathcal{D}\)` and update `\(p(\theta \mid \mathcal{D})\)`.**

---

# Teaching active learners

* The learner chooses the questions.
* The teacher provides the answers.

--

**The teacher aims to steer the learner towards a teaching goal.**

* <span class="hilite-blue">**Learn some logistic regression parameters `\(\theta^*\)` or attain high accuracy on some dataset.**</span>
* <span class="hilite-red">**Teach a relevance profile of the arms or accumulate as much reward as possible.**</span>

--

**The teacher should acknowledge the sequential nature of the problem.**

---

# Teaching active learners

**The teacher should acknowledge the sequential nature of the problem.**

The teacher models the learner to

1. understand the learner's state of knowledge, and
2. anticipate the learner's questions.

--

**To teach, answer the current question so as to steer the learner's future knowledge towards the teaching goal.**

--

Markov decision process / model-based reinforcement learning:<br />
`\(\Rightarrow\)` Solve or plan to find the optimal teaching policy.

---

# Teaching active learners

<img src="figs/plan_tree.svg" style="width: 45%; max-width: 45%; float: right; margin-right: -5%;" />

Markov decision process:

* state: the learner's knowledge and current question,
* transitions: knowledge update and next question,
* actions: answers,
* reward: the teaching goal.

`\(\Rightarrow\)` Solve or plan to find the optimal teaching policy.

---

class: graybg

# Logistic regression learner

<img src="figs/active_learning_animation_pool.gif" style="width: 100%; max-width: 100%;" />

---

class: graybg

# Logistic regression learner

<img data-gifffer="figs/active_learning_animation_noteacher.gif" style="width: 100%; max-width: 100%;" />

---

class: graybg

# Logistic regression learner

<img data-gifffer="figs/active_learning_animation.gif" style="width: 100%; max-width: 100%;" />

---

# Probabilistic teacher model

<img src="figs/plan_tree_qvals.svg" style="width: 35%; max-width: 35%; float: right; margin-right: -5%;" />

Probabilistic teacher:

`$$p(y \mid x_t, \theta^*) \propto \exp\left(Q^*(s_t, y \mid \theta^*)\right)$$`

* `\(Q^*(s_t, y \mid \theta^*)\)` is the optimal state-action value function of the MDP.
* `\(\theta^*\)` is known to the teacher (the teaching goal).

--

Compare to the naive teacher model (label distribution / environmental reward model):

`$$p(y \mid x_t, \theta^*)$$`

---

# Teacher-aware learning

Probabilistic teacher:

`$$p(y \mid x_t, \theta^*) \propto \exp\left(Q^*(s_t, y \mid \theta^*)\right)$$`

* `\(Q^*(s_t, y \mid \theta^*)\)` is the optimal state-action value function of the MDP.
* `\(\theta^*\)` is known to the teacher (the teaching goal).

**For the learner, `\(\theta^*\)` is unknown.**

* Infer `\(\theta^*\)` using the probabilistic teacher as the observation model.
* Probabilistic inverse reinforcement learning.

--

A model of an active user in interactive AI systems:

* user = teacher ― system = teacher-aware learner
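
---

# Teacher-aware learning: a toy sketch

A toy numerical sketch of the two models above, under illustrative assumptions not taken from the paper: binary rewards, a discrete grid of candidate `\(\theta\)` values, a greedy learner, and one-step lookahead standing in for full MDP planning.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 10.0                        # teacher rationality (softmax inverse temp.)
arms = rng.normal(size=(5, 2))     # K = 5 arms with 2D features x_k
cand = rng.normal(size=(20, 2))    # discrete grid of candidate thetas
goal = 3                           # index of theta*, known only to the teacher

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def next_arm(post):
    # The learner's question: greedy arm under its posterior mean reward.
    return int(np.argmax(post @ sigmoid(cand @ arms.T)))

def naive_update(post, k, y):
    # The learner model inside the teacher: plain reward-likelihood update.
    p = sigmoid(cand @ arms[k])
    post = post * (p if y == 1 else 1.0 - p)
    return post / post.sum()

post = np.full(len(cand), 1.0 / len(cand))
for _ in range(15):
    k = next_arm(post)                                   # learner asks
    # One-step lookahead Q: the arm the learner would pull after answer y.
    nxt = np.array([next_arm(naive_update(post, k, y)) for y in (0, 1)])
    q_star = sigmoid(arms[nxt] @ cand[goal])             # Q(s, y | theta*)
    y = int(rng.random() < sigmoid(beta * (q_star[1] - q_star[0])))
    # Teacher-aware update: likelihood of the answer under each candidate
    # theta via the same softmax-over-Q observation model.
    q_all = sigmoid(cand @ arms[nxt].T)                  # shape (20, 2)
    lik = np.exp(beta * q_all[:, y]) / np.exp(beta * q_all).sum(axis=1)
    post = post * lik
    post /= post.sum()

print("posterior mass on the teaching goal:", post[goal])
```

One simplification to note: the sketch reuses the learner's own posterior as the state inside the teacher's lookahead, glossing over the nesting of agent models in the full method.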

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

*"Naive" learner*
]
.col_right[
.center[
<img src="figs/model_0.svg" style="width: 120%; max-width: 120%;" />
]
]

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

2\. The user plans her actions, based on the model of the system, to achieve good future states.

<br /><br />

*Markov decision process, with the "naive" learner providing the transition dynamics*
]
.col_right[
.center[
<img src="figs/model_1.svg" style="width: 120%; max-width: 120%;" />
]
]

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

2\. The user plans her actions, based on the model of the system, to achieve good future states.

<br /><br />

3\. The system interprets the user's observed actions based on the user model and infers the user's intent/interests/goals.

<br /><br />

*"Sophisticated" learner, with the observation model defined via the state-action value function of the MDP*
]
.col_right[
.center[
<img src="figs/model_2.svg" style="width: 120%; max-width: 120%;" />
]
]

---

class: split60

## Modelling the user as a (boundedly rational) teacher

.col_left[
1\. The user knows that the system has beliefs and/or state, and can anticipate how these change with her actions.

<br /><br />

2\. The user plans her actions, based on the model of the system, to achieve good future states.

<br /><br />

3\. The system interprets the user's observed actions based on the user model and infers the user's intent/interests/goals.

<br /><br />

*"Sophisticated" learner, with the observation model defined via the state-action value function of the MDP*
]
.col_right[
.center[
<img src="figs/model_3.svg" style="width: 120%; max-width: 120%;" />
]
]
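
---

# Bayesian bandit learner: a minimal sketch

The simulations on the next slides use a Bayesian multi-armed bandit learner. Below is a generic Thompson-sampling sketch with a grid posterior; the sizes, features, and reward model are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(2)
arms = rng.normal(size=(8, 3))       # K = 8 arms with 3D features x_k
cand = rng.normal(size=(200, 3))     # grid approximation of p(theta | D)
weights = np.full(len(cand), 1.0 / len(cand))
theta_env = rng.normal(size=3)       # environment's true reward parameter

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

total = 0
for _ in range(100):
    # 1. Ask: Thompson sampling -- draw theta from the posterior and pick
    #    the arm that looks best under the draw (explore/exploit balance).
    theta = cand[rng.choice(len(cand), p=weights)]
    k = int(np.argmax(sigmoid(arms @ theta)))
    # 2. Obtain an answer: a Bernoulli reward y from the environment.
    y = int(rng.random() < sigmoid(arms[k] @ theta_env))
    total += y
    # 3. Update knowledge: reweight the grid points by the reward likelihood.
    p = sigmoid(cand @ arms[k])
    weights = weights * (p if y == 1 else 1.0 - p)
    weights /= weights.sum()

print("cumulative reward:", total)
```

The grid posterior is a stand-in for the conjugate or sampling-based updates one would use in practice.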

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_1.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_2.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_3.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_4.svg" style="width: 80%; max-width: 80%;" />
]

---

# Simulation experiments

Bayesian multi-armed bandit learner

.center[
<img src="figs/sim_experiments_5.svg" style="width: 80%; max-width: 80%;" />
]

---

# User study

Word search task with 10 users teaching a Bayesian bandit learner

.center[
<img src="figs/user_study_cum_reward_NIPS.svg" style="margin-top: -3%; width: 70%; max-width: 70%;" />
]

---

class: inverse, middle, center

# Challenges and summary

---

# Challenges in adaptive AI systems

* **Choosing the right applications**
  * Where do we want adaptive AI? Ethics?
  * Where can it be realized with current or future computational methods and/or theories of human behaviour?

--

* **Non-stationarity**
  * A fundamental challenge in multi-agent learning.
  * Critical to the user experience with adaptive systems.
  * Predictability and understandability of the adaptive system.

--

* **Scarcity of training/interaction data**
  * How do we collect training data?
  * Strong priors, transfer and meta-learning.

---

class: split75

# Summary

<img src="figs/modelling_each_other_cropped.svg" style="width: 25%; max-width: 25%; float: right;" />

* A probabilistic teacher–learner two-agent model.
* Towards treating <b>users as active agents instead of passive data sources</b> in interactive AI systems.

--

Better human–AI collaboration through multi-agent and computational theory-of-mind models?

* A clear conceptual framework, but not easy to implement for realistic tasks?
* Challenges: theories/models of human behaviour in relation to adaptive systems; non-stationarity; computation in nested multi-agent models.

--

.center[
<div style="margin-top: 3%;"><b style="font-size: 30px;">Thanks!</b></div>

https://aaltopml.github.io/machine-teaching-of-active-sequential-learners/

<a href="mailto:tomi@cai.fi">tomi@cai.fi</a> - http://www.tmpl.fi - https://research.cs.aalto.fi/pml/
]