connectionist world models
This site contains 3 simple demonstrations of a connectionist world
model. For details please see
- M. Toussaint (2004): Learning a world model and planning with
a self-organizing dynamic neural system [ps.gz]. In Advances in
Neural Information Processing Systems 17 (NIPS 2003), 929-936,
MIT Press, Cambridge. For a draft see nlin.AO/0306015.
The following 3 movies visualize the experiments with the connectionist
world model (CWM) on a maze problem. The movies play at the same speed as
the experiments ran online on a 2 GHz Pentium (the code, though, is not
optimized for speed).
- self-organization [avi]
- The movie displays the growth of the CWM during
self-organization. On the left, you find the agent exploring the maze
via a random walk. On the right, the central layer of the CWM is
displayed; the color of the connections corresponds to their weights
(red=1 and blue=0).
- planning [avi]
- The movie visualizes the planning process with the CWM. On the
left, you find the maze; the current goal is marked by a red spot. The
agent (the white spot) moves straight to the goal. On the right, the
value field on the central layer is visualized (red=1 and blue=0).
Whenever the agent reaches the goal, the goal is moved to a new random
position. The value field then rearranges quickly and relaxes to its
fixed point (which corresponds to the Bellman equation). Given this
stationary value field, the agent chooses actions that lead 'uphill'
towards the goal.
- learning [avi]
- The movie displays how the CWM is able to learn changes in the
world. On the right, the color of the connections visualizes their
weights (red=1 and blue=0).
As before, the goal is set randomly within the maze and changes whenever
the agent reaches it. However, at some point, a passage in the
upper left part of the maze is closed. When the agent then tries to
move through this passage, it re-adapts its world model (note how some
connections become blue!), and, if the re-adaptation of these weights
induces a sufficient change in the relaxed value field, the agent
moves around the blockade to the goal. Thereafter, the agent still has to
learn that the passage is also blocked when approached from the
left.
In the lower right part of the maze, another passage is blocked, and
the agent learns this blockade analogously (note that all connections
in this region turn blue). Once the two blockades have been learned,
the agent never explores them again.
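
The interplay of planning and re-learning described above can be
illustrated with a minimal Python sketch. It does not reproduce the neural
field dynamics of the CWM from the paper. As a simplifying assumption, it
uses a plain directed graph with connection weights in [0, 1], a
discounted value relaxation with the goal clamped to 1, and a simple
weight update after every attempted move. The ring-shaped maze, the names
make_ring, relax_values and run_episode, and the constants GAMMA and ETA
are all hypothetical choices made for this illustration.

```python
# Toy stand-in for a world model (NOT the CWM implementation):
# nodes are maze positions, w[(i, j)] in [0, 1] is the learned weight
# of the directed connection i -> j.
GAMMA = 0.9  # assumed discount factor
ETA = 0.5    # assumed learning rate for weight adaptation


def make_ring(n):
    """Build a ring-shaped maze of n nodes; all connection weights start at 1."""
    w = {}
    for i in range(n):
        j = (i + 1) % n
        w[(i, j)] = 1.0
        w[(j, i)] = 1.0
    return w


def relax_values(w, n, goal, sweeps=50):
    """Relax the value field to the fixed point of
    V(goal) = 1,  V(i) = GAMMA * max_j w[i, j] * V(j)  otherwise."""
    v = [0.0] * n
    for _ in range(sweeps):
        # Synchronous sweep: the new list is built entirely from the old v.
        v = [1.0 if i == goal else
             GAMMA * max(w[(a, b)] * v[b] for (a, b) in w if a == i)
             for i in range(n)]
    return v


def run_episode(w, n, start, goal, blocked, max_steps=50):
    """Move 'uphill' on the value field and adapt weights after each attempt.

    `blocked` is the set of directed moves (i, j) that fail in the real maze.
    """
    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            break
        v = relax_values(w, n, goal)
        neighbours = [j for (i, j) in w if i == pos]
        nxt = max(neighbours, key=lambda j: w[(pos, j)] * v[j])
        if (pos, nxt) in blocked:
            # Move failed: weaken the connection in the world model.
            w[(pos, nxt)] *= 1.0 - ETA
        else:
            # Move succeeded: pull the weight back towards 1.
            w[(pos, nxt)] += ETA * (1.0 - w[(pos, nxt)])
            pos = nxt
            path.append(pos)
    return path


weights = make_ring(6)
# The passage between nodes 1 and 2 is closed in both directions.
path = run_episode(weights, 6, start=0, goal=2, blocked={(1, 2), (2, 1)})
print(path)
```

In this toy setting, the agent first heads for the closed passage, fails
there once, and then takes the long way round the ring. Only the weight
for the direction it actually tried is reduced; the opposite direction
keeps its weight until the agent attempts it from the other side, which
mirrors the last learning step described above.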