[AI Talk Summary] Adapting and Explaining Deep Learning for Autonomous Systems - Trevor Darrell
Problem definition
The talk was given by Prof. Trevor Darrell, who starts by comparing machine learning models to the human mind. He asks why ImageNet-trained models perform very well on static images but poorly on videos, even though a video is just a sequence of images. Possible reasons include dataset bias, which prevents the model from adapting to a different environment (such as lower resolution than it was trained on), alterations in the image (such as motion blur), and so forth. Prof. Trevor Darrell is concerned that machine learning models are trained only to do specific tasks, and he goes on to talk about his vision of “Beyond Supervised AI”. There are three themes:
- Adaptation
- How can we build models that can work across domains (or a change in environment)?
- Exploration
- How can we teach a model to explore instead of providing it very specific rewards/goals?
- Explanation
- Can we design models to tell us why they think the way they do?
Algorithm, Results and Discussion
In the talk, Prof. Trevor Darrell did not present detailed results for the concepts or papers he referenced, but he did give high-level takeaways for each.
Adaptation
In the topic of adaptation, Prof. Trevor Darrell talked about the domain transfer problem mentioned in the Problem Definition section. Specifically, he talked about domain adversarial optimization, which works similarly to GANs. Using cups as an example: given that we have data about cups in a particular domain (e.g. images taken from a particular angle), can our model identify cups in another domain (e.g. images taken from a different angle)? Using adversarial techniques, we can (see the sketch after this list):
- use adversarial optimization on the domain classifier
- take the loss from the domain classifier
- inversely optimize it, similar to how GANs work.
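The sketch below illustrates one common way to "inversely optimize" the domain classifier: a gradient reversal layer, as used in domain-adversarial training. This is a minimal sketch assuming PyTorch; the network sizes, the `lambda_` weight, and the two-domain setup are illustrative assumptions, not details taken from the talk.

```python
# Minimal sketch of domain-adversarial feature learning with a gradient
# reversal layer (assumed PyTorch; shapes and networks are placeholders).
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on backward."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None


def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)


feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU())
label_classifier = nn.Linear(128, 10)   # predicts the task label (e.g. "cup")
domain_classifier = nn.Linear(128, 2)   # predicts which domain an image came from


def total_loss(source_x, source_y, target_x):
    feats_src = feature_extractor(source_x)
    feats_tgt = feature_extractor(target_x)

    # Ordinary supervised loss on the labeled source domain.
    task_loss = nn.functional.cross_entropy(label_classifier(feats_src), source_y)

    # The domain classifier sees features through the gradient reversal layer,
    # so minimizing its loss pushes the features to become domain-invariant.
    feats_all = torch.cat([grad_reverse(feats_src), grad_reverse(feats_tgt)])
    domain_y = torch.cat([torch.zeros(len(feats_src), dtype=torch.long),
                          torch.ones(len(feats_tgt), dtype=torch.long)])
    domain_loss = nn.functional.cross_entropy(domain_classifier(feats_all), domain_y)

    return task_loss + domain_loss
```

Because of the reversed gradient, the feature extractor is trained to make the domain classifier fail, which is the adversarial "min-max" dynamic the talk compares to GANs.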
Exploration
In the topic of exploration, Prof. Trevor Darrell explains how reinforcement learning models often rely on dense rewards, meaning that rewards arrive very frequently and, in a way, lead the model to learn exactly what we want it to learn. However, this is very different from how humans learn. Humans don’t usually chase short-term goals; instead we are more ‘exploratory’, and this should be how we train our agents. To implement this, he explains how we can train the agent to pick actions that maximize ‘curiosity’. At every step, the agent tries to predict what happens next. When its prediction differs from the actual outcome of the environment interaction, it generates a positive curiosity signal that drives the agent to learn. In some sense, curiosity is an internal reward signal that motivates the agent to explore. He explains that this idea was initially implemented on the game Mario Bros. The agent started off not doing much, then began bouncing around as its curiosity signal kicked in, and eventually it learnt to play the game. A surprising result was that the agent learnt to go through several levels even though it was not trained to do that. It also tried not to die, which it was never explicitly rewarded for. A possible explanation is similar to the idea of FOMO, the fear of missing out: by staying alive longer, the agent can explore more, which is what the internal curiosity reward motivates. This generalized to other video games and virtualized environments as well.
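To make the "prediction error as reward" idea concrete, here is a minimal sketch of a curiosity-style intrinsic reward, assuming PyTorch. The `encoder` and `forward_model` networks and all dimensions are placeholder assumptions; the published curiosity work also trains an inverse dynamics model, which is omitted here for brevity.

```python
# Minimal sketch of a curiosity bonus: the agent's reward is the error of its
# own forward-model prediction of the next state (assumed PyTorch).
import torch
import torch.nn as nn

state_dim, action_dim, feat_dim = 16, 4, 32

encoder = nn.Sequential(nn.Linear(state_dim, feat_dim), nn.ReLU())
forward_model = nn.Linear(feat_dim + action_dim, feat_dim)


def curiosity_reward(state, action_onehot, next_state):
    """Intrinsic reward = squared error of the predicted next-state features."""
    phi = encoder(state)
    phi_next = encoder(next_state)
    phi_next_pred = forward_model(torch.cat([phi, action_onehot], dim=-1))
    # Larger prediction error -> more "surprise" -> larger curiosity bonus.
    return 0.5 * (phi_next_pred - phi_next.detach()).pow(2).sum(dim=-1)
```

This bonus is added to (or used in place of) the environment reward, so the policy is driven toward states whose outcomes it cannot yet predict.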
However, this concept is not flawless. It is prone to “fake news”, by which he means distractions from the original goal. For example, in one experiment there was a television in the virtualized environment, and the agent kept watching TV forever because it kept getting rewarded with curiosity. To some degree, it is arguable whether this is still similar to how humans learn, because it resembles gambling habits. Another flaw appears in “multi-agent exploration”, where two curious agents, instead of one, play Pong. The game never ended because the two agents were not trying to beat each other; instead they kept passing the ball back and forth, since that is what kept rewarding them with curiosity.
Explanation
In the topic of explanation, Prof. Trevor Darrell talks about the possibility of taking models, which are known to be black boxes, and deriving some meaning for the way they think. He stresses that it is important for a model to be learning for the right reasons. This area is known as XAI (Explainable AI) research.
There’s always a trade-off between accuracy and explainability. The question is: which one should you choose?
Prof. Trevor Darrell explains that even though it is tempting to pick accuracy, we should be choosing explainability, because we should be trying to build models that can perform in new environments and can be interactively trained by humans, rather than models that live in a closed world, away from a human in the loop.
He explains that recent work at Berkeley focuses on building XAI models that allow us to take a deep learning agent and relate its behavior to language. They started off by teaching the model to “learn to explain”, and the next step is to teach it to “explain to learn”. This way, a human can give advice to the system/model.
Another piece of research in this area is the RISE model from the DARPA XAI team at Boston University. The focus is to show which parts or pixels of an image were used to make a decision. It is useful for answering questions such as “if I was going to classify a sheep, which part of the image matters?”
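As a rough illustration of this style of pixel-level explanation, here is a sketch of a RISE-like saliency map built by randomly masking the input and weighting each mask by the classifier's score. It assumes a PyTorch image classifier `model` returning class scores; the mask count, grid size, and normalization are illustrative assumptions rather than the exact published procedure.

```python
# Minimal sketch of a RISE-style saliency map: probe the black-box model with
# randomly masked copies of the image and accumulate score-weighted masks.
import torch
import torch.nn.functional as F


def rise_saliency(model, image, target_class, n_masks=500, grid=7, p_keep=0.5):
    """image: tensor of shape (3, H, W). Returns an (H, W) saliency map."""
    _, H, W = image.shape
    # Small random binary grids, upsampled to image size to get smooth masks.
    small = (torch.rand(n_masks, 1, grid, grid) < p_keep).float()
    masks = F.interpolate(small, size=(H, W), mode="bilinear", align_corners=False)

    saliency = torch.zeros(H, W)
    with torch.no_grad():
        for m in masks:
            score = model((image * m).unsqueeze(0)).softmax(dim=-1)[0, target_class]
            # Pixels that keep the score high when visible accumulate weight.
            saliency += score * m[0]
    # Normalize by how often each pixel was left visible.
    return saliency / masks.sum(dim=0)[0]
```

The appeal of this approach is that it treats the model as a black box: it needs only the model's output scores, not its gradients or internals.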