First order versus variable order sequences and prediction


There is one more major topic to discuss before we end our discussion on the spatial and temporal poolers. It may not be of interest to all readers and it is not needed to understand Chapters 3 and 4.


What is the effect of having more or fewer cells per column? Specifically, what happens if we have only one cell per column?


In the example used earlier, we showed that a representation of an input comprised of 100 active columns with 4 cells per column can be encoded in 4^100 different ways. Therefore, the same input can appear in many contexts without confusion. For example, if input patterns represent words, then a region can remember many sentences that use the same words over and over again and not get confused. A word such as “dog” would have a unique representation in each context. This ability permits an HTM region to make what are called “variable order” predictions.

A variable order prediction is not based solely on what is currently happening, but on varying amounts of past context. An HTM region is a variable order memory.


If we increase to five cells per column, the available number of encodings of any particular input in our example would increase to 5^100, a huge increase over 4^100. But both these numbers are so large that for many practical problems the increase in capacity might not be useful.
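
To make the size of these numbers concrete, the short sketch below reproduces the counting used in the example, assuming each of the 100 active columns contributes exactly one of its cells to the encoding. The function name `encodings` is introduced here purely for illustration.

```python
def encodings(cells_per_column: int, active_columns: int = 100) -> int:
    """Count cell-level encodings of one input, assuming each active
    column is represented by exactly one of its cells."""
    return cells_per_column ** active_columns

four = encodings(4)   # 4^100
five = encodings(5)   # 5^100
one = encodings(1)    # a single cell per column leaves one encoding

print(len(str(four)))  # 61 decimal digits
print(len(str(five)))  # 70 decimal digits
print(one)             # 1
```

Both multi-cell counts are far beyond anything a region could use up, while the one-cell case collapses to a single encoding, which is the case the following paragraphs examine.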


However, making the number of cells per column much smaller does make a big difference.


If we go all the way to one cell per column, we lose the ability to include context in our representations. An input to a region always results in the same prediction, regardless of previous activity. With one cell per column, the memory of an HTM region is a “first order” memory; predictions are based only on the current input.
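
The difference between the two kinds of memory can be illustrated with a toy lookup table. This is not the HTM algorithm: it is a fixed-length context table, a much cruder stand-in for a region's variable order memory, and the names `train` and `predict` and the two example sequences are assumptions made for this sketch.

```python
from collections import defaultdict

def train(sequences, order):
    """Map each context (up to the last `order` symbols) to the set of
    symbols that followed it in the training sequences."""
    table = defaultdict(set)
    for seq in sequences:
        for i in range(1, len(seq)):
            context = tuple(seq[max(0, i - order):i])
            table[context].add(seq[i])
    return table

def predict(table, history, order):
    """Return the symbols predicted after `history`, using `order` symbols of context."""
    return table.get(tuple(history[-order:]), set())

# Two sequences that share the middle symbols "B" and "C".
sequences = ["ABCD", "XBCY"]

first_order = train(sequences, order=1)   # prediction sees only the current input
with_context = train(sequences, order=3)  # prediction also sees what came before

ambiguous = predict(first_order, "ABC", order=1)
after_a = predict(with_context, "ABC", order=3)
after_x = predict(with_context, "XBC", order=3)
```

With one symbol of context, the input "C" always yields the same prediction (both "D" and "Y"), whatever came before it. With three symbols of context, the same input "C" predicts "D" after "A, B" and "Y" after "X, B".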


First order prediction is ideally suited for one type of problem that brains solve: static spatial inference. As stated earlier, a human exposed to a brief visual image can recognize what the object is even if the exposure is too short for the eyes to move. With hearing, you always need to hear a sequence of patterns to recognize what it is. Vision is usually like that too: you typically process a stream of visual images. But under certain conditions you can recognize an image with a single exposure.


Temporal and static recognition might appear to require different inference mechanisms. One requires recognizing sequences of patterns and making predictions based on variable length context. The other requires recognizing a static spatial pattern without using temporal context. An HTM region with multiple cells per column is ideally suited for recognizing time-based sequences, and an HTM region with one cell per column is ideally suited to recognizing spatial patterns. At Numenta, we have performed many experiments using one-cell-per-column regions applied to vision problems. The details of these experiments are beyond the scope of this chapter; however, we will cover the important concepts.


If we expose an HTM region to images, the columns in the region learn to represent common spatial arrangements of pixels. The kinds of patterns learned are similar to those observed in region V1 of the neocortex (a neocortical region extensively studied in biology), typically lines and corners at different orientations. By training on moving images, the HTM region learns transitions of these basic shapes. For example, a vertical line at one position is often followed by a vertical line shifted to the left or right. All the commonly observed transitions of patterns are remembered by the HTM region.


Now what happens if we expose a region to an image of a vertical line moving to the right? If our region has only one cell per column, it will predict the line might next appear to the left or to the right. It cannot use the context of where the line was in the past, and therefore cannot know whether the line is moving left or right. What you find is that these one-cell-per-column cells behave like “complex cells” in the neocortex. The predictive output of such a cell will be active for a visible line in different positions, regardless of whether the line is moving left or right or not at all. We have further observed that a region like this exhibits stability to translation, changes in scale, etc. while maintaining the ability to distinguish between different images. This behavior is what is needed for spatial invariance (recognizing the same pattern in different locations of an image).


If we now do the same experiment on an HTM region with multiple cells per column, we find that the cells behave like “directionally-tuned complex cells” in the neocortex. The predictive output of a cell will be active for a line moving to the left or a line moving to the right, but not both.


Putting this all together, we make the following hypothesis. The neocortex has to do both first order and variable order inference and prediction. There are four or five layers of cells in each region of the neocortex. The layers differ in several ways but they all have shared columnar response properties and large horizontal connectivity within the layer. We speculate that each layer of cells in neocortex is performing a variation of the HTM inference and learning rules described in this chapter. The different layers of cells play different roles. For example, it is known from anatomical studies that layer 6 creates feedback in the hierarchy and layer 5 is involved in motor behavior. The two primary feed-forward layers of cells are layers 4 and 3. We speculate that one of the differences between layers 4 and 3 is that the cells in layer 4 are acting independently, i.e. one cell per column, whereas the cells in layer 3 are acting as multiple cells per column. Thus regions in the neocortex near sensory input have both first order and variable order memory. The first order sequence memory (roughly corresponding to layer 4 neurons) is useful in forming representations that are invariant to spatial changes. The variable order sequence memory (roughly corresponding to layer 3 neurons) is useful for inference and prediction of moving images.


In summary, we hypothesize that algorithms similar to those described in this chapter are at work in all layers of neurons in the neocortex. The layers in the neocortex vary in significant details, which make them play different roles related to feed-forward vs. feedback, attention, and motor behavior. In regions close to sensory input, it is useful to have a layer of neurons performing first order memory, as this leads to spatial invariance.


At Numenta, we have experimented with first order (single cell per column) HTM regions for image recognition problems. We also have experimented with variable order (multiple cells per column) HTM regions for recognizing and predicting variable order sequences. In the future, it would be logical to try to combine these in a single region and to extend the algorithms to other purposes. However, we believe many interesting problems can be addressed with the equivalent of single-layer, multiple-cell-per-column regions, either alone or in a hierarchy.


Chapter 3: Spatial Pooling Implementation and Pseudocode


This chapter contains the detailed pseudocode for a first implementation of the spatial pooler function. The input to this code is an array of bottom-up binary inputs from sensory data or the previous level. The code computes activeColumns(t) - the list of columns that win due to the bottom-up input at time t. This list is then sent as input to the temporal pooler routine described in the next chapter, i.e. activeColumns(t) is the output of the spatial pooling routine.

The pseudocode is split into three distinct phases that occur in sequence:

Phase 1: compute the overlap with the current input for each column

Phase 2: compute the winning columns after inhibition

Phase 3: update synapse permanence and internal variables


Although spatial pooler learning is inherently online, you can turn off learning by simply skipping Phase 3.


The rest of the chapter contains the pseudocode for each of the three phases. The various data structures and supporting routines used in the code are defined at the end.
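
As orientation before the detailed pseudocode, the three phases can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the chapter's pseudocode: every name and constant below (`CONNECTED_PERM`, `PERM_INC`, `PERM_DEC`, `NUM_WINNERS`, `make_columns`, `spatial_pooler_step`) is hypothetical, and inhibition is approximated by simply keeping the columns with the highest overlap.

```python
import random

CONNECTED_PERM = 0.2  # hypothetical: permanence at which a synapse counts as connected
PERM_INC = 0.05       # hypothetical learning increment
PERM_DEC = 0.05       # hypothetical learning decrement
NUM_WINNERS = 2       # hypothetical: top-k selection stands in for inhibition

def make_columns(num_columns, input_size, synapses_per_column, seed=0):
    """Give each column a random set of potential synapses: {input index: permanence}."""
    rng = random.Random(seed)
    columns = []
    for _ in range(num_columns):
        inputs = rng.sample(range(input_size), synapses_per_column)
        columns.append({i: rng.uniform(0.1, 0.3) for i in inputs})
    return columns

def spatial_pooler_step(columns, input_bits, learn=True):
    # Phase 1: overlap = number of connected synapses attached to active input bits.
    overlaps = [
        sum(1 for i, perm in col.items() if perm >= CONNECTED_PERM and input_bits[i])
        for col in columns
    ]

    # Phase 2: winning columns after inhibition (here: highest overlaps, nonzero only).
    ranked = sorted(range(len(columns)), key=lambda c: overlaps[c], reverse=True)
    active_columns = [c for c in ranked[:NUM_WINNERS] if overlaps[c] > 0]

    # Phase 3: update synapse permanences of the winners; skipped when learning is off.
    if learn:
        for c in active_columns:
            col = columns[c]
            for i in col:
                if input_bits[i]:
                    col[i] = min(1.0, col[i] + PERM_INC)
                else:
                    col[i] = max(0.0, col[i] - PERM_DEC)

    return active_columns

columns = make_columns(num_columns=8, input_size=16, synapses_per_column=6)
input_bits = [1] * 8 + [0] * 8
active_no_learning = spatial_pooler_step(columns, input_bits, learn=False)
```

Calling the function with `learn=False` returns the active columns without touching any permanence value, which mirrors the remark above that learning can be turned off by skipping Phase 3.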


