MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY
and
CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING
DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES
A.I. Memo No. 1679 December 14, 1999
C.B.C.L. Paper No. 183
A note on object class representation
and categorical perception
Maximilian Riesenhuber and Tomaso Poggio
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu.
Abstract
We present a novel scheme (“Categorical Basis Functions”, CBF) for object class representation in the brain and
contrast it to the “Chorus of Prototypes” scheme recently proposed by Edelman [4]. The power and flexibility of CBF
is demonstrated in two examples. CBF is then applied to investigate the phenomenon of Categorical Perception, in
particular the finding by Bülthoff et al. [2] of categorization of faces by gender without corresponding Categorical
Perception. Here, CBF makes predictions that can be tested in a psychophysical experiment. Finally, experiments are
suggested to further test CBF.
Copyright © Massachusetts Institute of Technology, 1999
This report describes research done within the Center for Biological and Computational Learning in the Department of Brain and Cognitive
Sciences and in the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. This research is sponsored by a grant from
Office of Naval Research under contract No. N00014-93-1-3085, Office of Naval Research under contract No. N00014-95-1-0600, National
Science Foundation under contract No. IIS-9800032, and National Science Foundation under contract No. DMS-9872936. Additional support
is provided by: AT&T, Central Research Institute of Electric Power Industry, Eastman Kodak Company, Daimler-Benz AG, Digital Equipment
Corporation, Honda R&D Co., Ltd., NEC Fund, Nippon Telegraph & Telephone, and Siemens Corporate Research, Inc. M.R. is supported
by a Merck/MIT Fellowship in Bioinformatics.
1 Introduction
Object categorization is a central yet computationally diffi-
cult cognitive task. For instance, visually similar objects can
belong to different classes, and conversely, objects that ap-
pear rather different can belong to the same class. Categoriza-
tion schemes may be based on shape similarity (e.g., “human
faces”), on conceptual similarity (e.g., “chairs”), or on more
abstract features (e.g., “Japanese cars”, “green cars”). What
are possible computational mechanisms underlying catego-
rization in the brain?
Edelman has recently presented an object representation
scheme called “Chorus of Prototypes” (COP) [4] where ob-
jects are categorized by their similarities to reference shapes,
or “prototypes”. While this categorization scheme is of ap-
pealing simplicity, the reliance on a single metric in a global
shape space imposes severe limitations on the kinds of cate-
gories that can be represented. We will discuss these short-
comings and present a more general model of object cate-
gorization along with a computational implementation that
demonstrates the scheme’s capabilities, relate the model to
recent psychophysical observations on categorical perception
(CP), and discuss some of the model’s predictions.
2 Chorus of Prototypes (COP)
In COP, “the stimulus is first projected into a high-
dimensional measurement space, spanned by a bank of
[Gaussian] receptive fields. Second, it is represented by
its similarities to reference shapes” ([4], p. 112, caption to
Fig. 5.1).
The categorization of novel objects in COP proceeds as fol-
lows (ibid., p. 118):
1. A category label is assigned to each of the training ob-
jects (“reference objects”), for each of which an RBF
network is trained to respond to the object from every
viewpoint;
2. a test object is represented by the activity pattern it
evokes over all the output units of the reference object
RBF networks (i.e., the “similarity to reference shapes”
above);
3. categorization is performed using the activity pattern
and the labels associated with the output units of the ref-
erence object RBF networks. Categorization procedures
explored were winner-take-all and k-nearest-neighbor
using the training views (this time taking the prototypes
to be not the objects but the object views), i.e., the cen-
ters of individual RBF units in each network, with the
class label in this case based on the label of the major-
ity of the closest stored views to the test stimulus.
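The three steps above can be sketched in miniature. Everything below (dimensions, the Gaussian width, the toy view sets) is our own illustrative choice, not Edelman's implementation:

```python
import numpy as np

# Toy sketch of the three COP steps (our illustrative construction, not
# Edelman's code): each reference object is an RBF network whose centers
# are that object's stored training views; a test stimulus is represented
# by its similarity to every reference object, and a winner-take-all rule
# assigns the category.

rng = np.random.default_rng(0)

def rbf_similarity(x, views, sigma=1.0):
    """Output of one reference-object RBF network: summed Gaussian
    activations of the units centered on the stored views."""
    d2 = ((views - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2)).sum()

# Step 1: training views for two labelled reference objects.
ref_views = {
    "cat": rng.normal(0.0, 0.3, size=(5, 4)),
    "dog": rng.normal(2.0, 0.3, size=(5, 4)),
}

def categorize(x):
    # Step 2: the activity pattern over all reference-object networks.
    pattern = {label: rbf_similarity(x, v) for label, v in ref_views.items()}
    # Step 3: winner-take-all over the pattern.
    return max(pattern, key=pattern.get)

print(categorize(np.zeros(4)))       # a stimulus near the "cat" views
print(categorize(np.full(4, 2.0)))   # a stimulus near the "dog" views
```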
The appealingly simple design of COP also seems to be its
most serious limitation: While a representation based solely
on shape similarities seems to be suited for the taxonomy of
some novel objects (cf. Edelman’s example of the descrip-
tion of a giraffe as a “cameleopard” [4]), such a representa-
tion appears too impoverished when confronted with objects
that can be described on a variety of levels: A car, for in-
stance, can look like several other cars (and also unlike many
other objects), but it could also be described as a “cheap” car,
a “green” car, a “Japanese” car, an “old” car, etc. — dif-
ferent qualities that are not simply or naturally summarized
by shape similarities to individual prototypes but nevertheless
provide useful information to classify or discriminate the ob-
ject in question from other objects of similar shape. The fact
that an object can be described in such abstract categories,
and that this information appears to be used in recognition
and discrimination, as indicated by the findings on categori-
cal perception (see below), calls for an extension of Chorus
that permits several categorization schemes to be used in
parallel, allowing an object to be represented within a whole
dictionary of categorization schemes, a more natural
description than one global shape space provides.
While Edelman ([4], p. 244) suggests a refinement of Cho-
rus where weights are assigned to different dimensions driven
by task demands, it is not clear how this can happen in one
global shape space if two objects can be judged as very sim-
ilar under one categorization scheme but as rather different
under another (as, for instance, a chili pepper and a candy
apple in terms of color and taste, resp.). Use of different cat-
egorization schemes appears to require reversible temporary
warping of shape space depending on which categorization
scheme is to be used, which runs counter to the notion of one
general representational space.
3 A Novel Scheme: Categorical Basis
Functions (CBF)
In CBF, the receptive fields of stimulus-coding units in mea-
surement space are not constrained to lie in any specific
class — unlike in COP, there are no class labels associated
with these units. The input ensemble drives the unsupervised,
i.e., task-independent, learning of receptive fields. The
only requirement is that the receptive fields of these stim-
ulus space-coding units (SSCUs) cover the stimulus space
sufficiently to allow the definition of arbitrary classifica-
tion schemes on the stimulus space (in the simplest version,
“learning” just consists in storing all the training examples by
allocating an SSCU to each training stimulus).
These SSCUs in turn serve as inputs to units that are trained
on categorization tasks in a supervised way — in fact, if each
training stimulus is represented by one SSCU, then the net-
work would be identical to a standard radial basis function
(RBF) network. Figure 1 illustrates the CBF scheme.
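A minimal sketch of this architecture, in the simplest version from the text where one SSCU is allocated per training stimulus (all sizes and the Gaussian width are illustrative assumptions, not the paper's parameters):

```python
import numpy as np

# Minimal sketch of CBF in its simplest version (our toy construction; all
# sizes and the Gaussian width are illustrative): one SSCU is allocated per
# training stimulus, with no class information at this first stage, and a
# supervised Gaussian RBF readout over the SSCU activities forms the
# category unit -- making the whole network a standard RBF network.

rng = np.random.default_rng(1)

def sscu_activity(X, centers, sigma=1.0):
    """Gaussian activation of every SSCU for every stimulus in X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

# Toy two-class stimulus space ("cats" near 0, "dogs" near 2).
X_train = np.vstack([rng.normal(0.0, 0.4, (20, 3)),
                     rng.normal(2.0, 0.4, (20, 3))])
y_train = np.hstack([np.ones(20), -np.ones(20)])   # +1 = cat, -1 = dog

centers = X_train                 # unsupervised stage: store every exemplar
K = sscu_activity(X_train, centers)
# Supervised stage: solve the (regularized) RBF interpolation for weights.
w = np.linalg.solve(K + 1e-6 * np.eye(len(K)), y_train)

def category_response(X):
    """Continuous output of the category unit."""
    return sscu_activity(X, centers) @ w

# The sign of the response gives the class label.
print(np.sign(category_response(np.array([[0.1, 0.0, 0.2]]))))   # [1.]
```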
Novel stimuli in this framework evoke a characteristic ac-
tivation pattern over the existing categorization units (as well
as over the SSCUs). In fact, CBF can be seen as an extension
of COP: instead of a representation based on similarity in a
global shape space alone (as in “the object looks like xyz”,
where x, y, z can be objects for which individual units have
been learned), abstract features, which are the result of prior
category learning, are equally valid for the description of an
object (as in “the object looks expensive/old/pink”). Hence,
an object is not only represented by expressing its similarity
to learned shapes but also by its membership in learned cate-
gories, providing a natural basis for object description.
(Footnote: The idea of representing color through similarities
to prototype objects seems especially awkward considering that
it first requires the build-up of a library of objects of a
certain color with the sole purpose of allowing one to “average
out” object shape.)
Figure 2: Illustration of the cat/dog stimulus space. The stimulus
space is spanned by six objects, three “cats” and three “dogs”. Our
morphing software [13] allows us to generate 3D objects that are ar-
bitrary combinations of the six prototypes. The lines show possible
morph directions between two prototypes each, as used in the test
set.
In the proof-of-concept implementation described in the
following, SSCUs are identical to the view-tuned units from
the model by Riesenhuber and Poggio [12] (in reality, when
objects can appear from different views, they could also be
view-invariant — note that the view-tuned units are already
invariant to changes in scale and position [12]). For simplic-
ity, the unsupervised learning step is done using k-means, or
just by storing all the training exemplars, but more refined
unsupervised learning schemes, which better reflect the struc-
ture of the input space, such as mixture-of-Gaussians or other
probability density estimation schemes, or learning rules that
provide invariance to object transformations [15] are likely
to improve performance. Similarly, the supervised learning
scheme used (Gaussian RBF) can be replaced by more bio-
logically plausible or more sophisticated algorithms (see dis-
cussion).
3.1 An Example: Cat/Dog Classification
To illustrate the capabilities of CBF, the following simulation
was performed: We presented the hierarchical object recog-
nition system (up to the C2 layer) of Riesenhuber & Poggio
[12] with 144 randomly selected morphed animal stimuli, as
used in a very recent monkey physiology experiment [6] (see
Fig. 2).
A view-tuned model unit was allocated for each training
stimulus, yielding 144 view-tuned units (results were similar
Figure 3: Response of the categorization unit (based on 144 SSCUs,
256 afferents to each SSCU) along the nine class boundary-crossing
morph lines (x-axis: position on the morph line, from 0 (cat) to 1
(dog); y-axis: response). All stimuli in the left half of the
plot are “cat” stimuli, all on the right-hand side are “dogs” (the class
boundary is at 0.5). The network was trained to output 1 for a cat and
-1 for a dog stimulus. The thick dashed line shows the average over
all morph lines. The solid horizontal line shows the class boundary
in response space.
if the 144 stimuli were clustered into 30 units using k-means,
see appendix). The activity patterns over the 144 units to each
of the 144 stimuli were used as inputs to train a Gaussian
RBF output unit, using the class labels 1 for cat and -1 for
dog as the desired outputs. The categorization performance
of this unit was then tested with the same test stimuli as in the
physiology experiment (which were not part of the training
set). More precisely, the testing set consisted of the 15 lines
through morph space connecting each pair of prototypes, each
subdivided into 10 intervals, with the exclusion of the stimu-
lus at the mid-points (which in the case of lines crossing the
class boundary would lie right on the class boundary, with
an undefined label), yielding a total of 126 stimuli. Figure
3 shows the response of the categorization unit to the stimuli
on the category boundary-crossing morph lines, together with
the desired label. A categorization was counted as correct if
the sign of the network output was identical to the sign of the
class label.
Performance on the training set was 100% correct, perfor-
mance on the test set was 97%, comparable to monkey per-
formance, which was over 90% [6]. The four categorization
errors the model makes lie right at the class boundary.
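The correctness criterion used here is simply sign agreement between network output and class label; as a one-function sketch (the function name and example outputs are ours):

```python
import numpy as np

# The paper's correctness criterion as a one-liner: a response counts as
# correct when the network output and the class label agree in sign.

def sign_accuracy(outputs, labels):
    outputs, labels = np.asarray(outputs), np.asarray(labels)
    return float(np.mean(np.sign(outputs) == np.sign(labels)))

# Hypothetical outputs for three cats (+1) and one dog (-1):
print(sign_accuracy([0.8, -0.2, 0.1, -0.9], [1, 1, 1, -1]))   # 0.75
```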
3.2 Introduction of parallel categorization schemes
To demonstrate how different classification schemes can be
used in parallel within CBF, we also trained a second net-
work to perform a different categorization task on the same
stimuli. The stimuli were resorted into three classes, each
based on one cat and one dog prototype. For this categoriza-
tion task, three category units were trained (on a training set
of 180 animal morphs, taken from training sets of an ongoing
Figure 1: Cartoon of the CBF categorization scheme, illustrated with the example domain of cars. Stimulus space-covering units (SSCUs)
are the view-tuned units from the model by Riesenhuber & Poggio [12]. They self-organize to respond to representatives of the stimulus
space so that they “cover” the whole input space, with no explicit information about class boundaries. These units then serve as inputs to
task-related units that are trained in a supervised way to perform the categorization task (e.g., to distinguish American-built cars from imports,
or compacts from sedans, etc.). In the proof-of-concept implementation described in this paper, the unsupervised learning stage is done via
k-means clustering, or just by storing all the training exemplars, and the supervised stage consists of an RBF network.
physiology project), each one to respond at a level of 1 for
stimuli belonging to “its” class and a level of -1 for stimuli
from the other two classes. Each category unit received in-
put from the same 144 SSCUs as the cat/dog category unit
described above.
As mentioned, it is an open question how to best perform
multi-class classification. We evaluated two strategies: i) cat-
egorization is said to be correct if the maximally activated
category unit corresponds to the true class (“max” case); ii)
categorization is correct if the signs of the three category units
are equal to the correct answer (“sign” case).
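The two decision rules can be sketched as follows, for a toy three-unit output in which each category unit was trained to +1 for "its" class and -1 for the others (the function names are ours):

```python
import numpy as np

# The two multi-class decision rules compared in the text, sketched for a
# three-unit output (each unit trained to +1 for its class, -1 otherwise).

def max_rule(outputs):
    """'Max' case: the answer is the maximally activated category unit."""
    return int(np.argmax(outputs))

def sign_rule(outputs, true_class):
    """'Sign' case: correct only if every unit's sign matches the target."""
    target = -np.ones(len(outputs))
    target[true_class] = 1.0
    return bool(np.all(np.sign(outputs) == target))

outputs = np.array([0.7, -0.4, 0.2])   # unit 2 has the wrong sign
print(max_rule(outputs))               # 0: the "max" rule still answers correctly
print(sign_rule(outputs, 0))           # False: the "sign" rule counts an error
```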
Performance on the training set in the “max” as well as in
the “sign” case was 100% correct. On the testing set, per-
formance using the “max” rule was 74%, whereas the perfor-
mance for the “sign” rule was 61% correct, the lower numbers
on the test set as compared to the cat/dog task reflecting the
increased difficulty of the three-way categorization. We are
currently training a monkey on the same categorization task,
and it will be very interesting to compare the animal’s perfor-
mance on the test set to the model’s performance.
4 Interactions between categorization and
discrimination: Categorical Perception
When discriminating objects, we commonly do not only rely
on simple shape cues but also take more complex features
into account. For example, we can describe a face in terms
of its expression, its age, gender etc. to provide additional
information that can be used to discriminate this face from
other faces. This suggests that training on categorization
tasks could be of use also for object discrimination.
The influence of categories on perception is expected to be
especially strong for stimuli in the vicinity of a class bound-
ary: In the cat/dog categorization task described in the pre-
vious paragraph, the goal was to classify all members of one
class the same way, irrespective of their shape. Hence, when
presented with two stimuli from the same class, the catego-
rization result will ideally not allow one to discriminate between
the two stimuli. On the other hand, two stimuli from differ-
ent classes are labelled differently. Thus, one would expect
greater accuracy in discriminating stimulus pairs from differ-
ent classes than pairs belonging to the same class (note that in
this paper we are not dealing with the discrimination process
itself — while several mechanisms have been proposed, such
as a representation based directly on the SSCU activation pat-
tern, or one based on the activity pattern over prototypes such
as view-invariant RBF units [4, 9], we in this section only dis-
cuss how prior training on categorization tasks can provide
additional information to the discrimination process, without
regard to how the latter might be implemented computation-
ally).
This phenomenon, called Categorical Perception [8],
where linear changes in a stimulus dimension are associated
with nonlinear perceptual effects, has been observed in nu-
merous experiments, for instance in color or phoneme dis-
crimination.
(Footnote: Multi-class classification is a challenging and as
yet unsolved computational problem; the scheme employed here
was chosen for its simplicity.)
A recent experiment by Goldstone [7] investigated Cate-
gorical Perception (CP) in a task involving training subjects
on a novel categorization. In particular, subjects were trained
on a combined task that first required them to categorize stim-
uli (rectangles) according to size or brightness or both and
then to discriminate stimuli from the same set in a same-
different design.
The study found evidence for acquired distinctiveness,
i.e., cues (size and brightness, resp.) that were task-relevant
became perceptually salient even during other tasks. The
task-relevant interval of the task-relevant dimension became
selectively sensitized, i.e., discrimination of stimuli in this
range improved (local sensitization at the class-boundary —
the classical Categorical Perception effect), but dimension-
wide sensitization was, to a lesser degree, also found (global
sensitization). Less sensitization occurred when subjects had
to categorize according to size and brightness, indicating
competition between those dimensions.
4.1 Categorical Perception in CBF
The CBF scheme suggests a simple explanation for category-
related influences on perception: When confronted with two
stimuli differing along the stimulus dimension relevant for
categorization, the different respective activation levels of the
categorization unit provide additional information to base the
discrimination on, and thus discrimination across the cate-
gory boundary is facilitated, as compared to the case where no
categorization network has been trained. Fig. 4 illustrates this
idea: The (continuous) output of the categorization unit(s)
provides additional input to the discrimination network in a
discrimination task. In a categorization task, the output of the
category unit is thresholded to arrive at a binary decision, as
is the output of the discrimination network in a yes/no dis-
crimination task.
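The idea can be caricatured as follows; this is a hypothetical sketch, not the paper's implementation. The category unit's continuous output is appended as one extra channel to whatever pattern the discrimination process compares:

```python
import numpy as np

# Hypothetical caricature of the Fig. 4 idea (our own construction): the
# category unit's continuous output is appended as one extra channel to the
# pattern that the discrimination process compares, so two stimuli on
# opposite sides of the class boundary become easier to tell apart.

def augmented_pattern(sscu_pattern, category_output, gain=1.0):
    """SSCU activities plus the (scaled) category-unit response."""
    return np.append(sscu_pattern, gain * category_output)

# Two stimuli with identical SSCU patterns but opposite category responses:
# only the category channel separates them.
p1 = augmented_pattern(np.array([0.5, 0.2]), category_output=+0.8)
p2 = augmented_pattern(np.array([0.5, 0.2]), category_output=-0.8)
print(np.linalg.norm(p1 - p2))   # ~1.6, contributed entirely by the category unit
```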
In particular, global sensitization would be expected as a
side effect of training the categorization unit if its response
is not constant within the classes, which is just what was ob-
served in the simulations shown above (Fig. 3): The “catness”
response level of the categorization unit decreases as stim-
uli are morphed from the cat prototypes to cats at the class
boundary and beyond. Its output is then thresholded to arrive
at the categorization rule, which determines the class by the
sign of the response (cf. above). Local sensitization (Cate-
gorical Perception) occurs as a result of a stronger response
difference of the categorization unit for stimulus pairs cross-
ing the class boundary than for pairs where both members
belong to the same class.
In agreement with the experiment by Goldstone [7], we
would expect competition between different dimensions in
CBF when class boundaries run along more than one dimen-
sion (e.g., two, as in the experiment), as compared to a class
boundary along one dimension only: For the same physi-
cal change in one stimulus property (one dimension), the re-
Figure 4: Sketch of the model to explain the influence of experience with categorization tasks on object discrimination, leading to global
and local (Categorical Perception) sensitization. Key is the input of the category-tuned unit(s) to the discrimination network (which is shown
here for illustrative purposes as receiving input from the SSCU layer, but this is just one of several alternatives), shown by the thick horizontal
arrow.
sponse of the categorization unit should change more in the
one-dimensional than in the two-dimensional case since in
the latter case crossing the class boundary requires change of
the input in both dimensions.
4.2 Categorization with and without Categorical
Perception
Bülthoff et al. have recently reported [2] that discrimination
between faces is not better near the male/female boundary,
i.e., they did not find evidence for CP in their study, even
though subjects could clearly categorize face images by gen-
der.
Such categorization without CP can be understood within
CBF: Following the simulations described above, CP in CBF
is expected if the response of the category unit shows a
stronger drop across the class boundary than within a class,
for the same distance in morph space. Suppose now the slope
of the categorization unit’s response is uniform across the
stimulus space, from the prototypical exemplars for one class
(e.g., the “masculine men”) to the prototypical exemplars of
the other class (e.g., the “feminine women”). If the subject
is forced to make a category decision, e.g., using the sign
of the category unit’s response, as above, the stimulus en-
semble would be clearly divided into two classes (noise in
the category unit’s response would lead to a smoothed out
sigmoidal categorization curve). However, in a discrimina-
tion task, the difference of response values of the category
unit for two stimuli across the boundary would not be differ-
ent from the difference for two stimuli within the same class
(if the within-pair distance for both pairs with respect to the
category-relevant dimension is the same). Hence, no Cate-
gorical Perception, or, more precisely, no local sensitization,
would be expected.
In CBF, the slope of a category unit’s response curve is in-
fluenced by the extent of the training set with respect to the
class boundary. To demonstrate this, we trained a cat/dog
category unit as described above using four training sets dif-
fering in how close the representatives of each class were al-
lowed to get to the class boundary (which was again defined
by an equality in the sum over the cat and dog coefficients).
Introducing the “crossbreed coefficient”, c, of a stimulus be-
longing to a certain class (cat or dog) as the coefficient sum of
its corresponding vector in morph space over all prototypes of
the other class (dog or cat, resp.), training sets differed in the
maximum value of c, ranging from 0.1 to 0.4 in steps of 0.1 (c
values of stimuli in each training set were chosen uniformly
within the permissible interval, and training sets contained an
equal number of stimuli, i.e., 200). The first case, maximum c = 0.1,
thus contained stimuli that were very close to the prototypical
representatives of each class, whereas the maximum c = 0.4 set con-
tained cats with strong dog components and dogs with strong
cat components, resp.
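Under this definition, the crossbreed coefficient can be sketched as follows; the six-prototype layout (first three entries cats, last three dogs) and the example coefficients are illustrative assumptions:

```python
import numpy as np

# The crossbreed coefficient c, sketched for morph vectors over the six
# prototypes of Fig. 2 (coefficients summing to 1; we assume, for
# illustration, that the first three entries are the cat prototypes and
# the last three the dogs).

CAT, DOG = slice(0, 3), slice(3, 6)

def crossbreed(coeffs, own_class="cat"):
    """Summed morph weight on the prototypes of the *other* class."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(coeffs[DOG if own_class == "cat" else CAT].sum())

stim = [0.5, 0.2, 0.1, 0.1, 0.1, 0.0]   # a cat with some dog admixture
print(crossbreed(stim, "cat"))          # 0.2

def in_training_set(coeffs, own_class, c_max):
    """Training sets differ only in the largest admissible c."""
    return crossbreed(coeffs, own_class) <= c_max

print(in_training_set(stim, "cat", 0.1))   # False: too close to the boundary
print(in_training_set(stim, "cat", 0.4))   # True
```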
Fig. 5 shows how the average response along the morph
lines differs for the two cases, maximum c = 0.1 and maximum c = 0.4.
The legend shows in parentheses the performance on the training set
and on the test set, resp.; the number after the colon shows the
Figure 5: Average responses over all morph lines for the two net-
works (parameters as in Fig. 3) trained on data sets with maximum
c = 0.1 and maximum c = 0.4, respectively (x-axis: position on the
morph line, from 0 (cat) to 1 (dog); y-axis: response). Legend:
0.1 (100, 93): 0.9; 0.4 (100, 94): 1.6. The legend shows in
parentheses the performance (on the training set and on the test set,
resp.); the number after the colon shows the average change of response
across the morph line (absolute value of response difference at
positions 0.4 and 0.6) divided by the response difference for that
morph line averaged over all other stimulus pairs 0.2 units apart.
average change of response across the morph line (absolute
value of response difference at positions 0.4 and 0.6) relative
to the response difference for that morph line averaged over
all other stimulus pairs 0.2 units apart. While categorization
performance in both cases is very similar (93% vs. 94% cor-
rect on the test set), the relative change across the class border
is much greater for the maximum c = 0.4 case than in the maximum
c = 0.1 case, where the response drops almost linearly from position
0.2 to position 0.9 on the morph line (incidentally, the relative drop
of 1.6 in the maximum c = 0.4 case is very similar to the drop observed
in prefrontal cortical neurons of a monkey trained on the same
task [6] with the same maximum c value).
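The boundary-sharpness measure quoted from the Fig. 5 caption can be sketched as follows, using made-up response values on one illustrative morph line (the 0.5 midpoint is excluded, as in the test set):

```python
import numpy as np

# The boundary-sharpness ratio from the Fig. 5 caption, sketched with
# made-up responses: the response drop between positions 0.4 and 0.6 is
# divided by the average drop over all other position pairs 0.2 units
# apart. A ratio near 1 means a uniform slope (no CP); well above 1 means
# a sharp drop at the class boundary (CP expected).

def boundary_ratio(positions, responses):
    lookup = {round(p, 3): r for p, r in zip(positions, responses)}
    cross = abs(lookup[0.4] - lookup[0.6])          # drop across the boundary
    others = [abs(lookup[p] - lookup[round(p + 0.2, 3)])
              for p in lookup
              if round(p + 0.2, 3) in lookup and p != 0.4]
    return cross / np.mean(others)

pos = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0]
resp = [1.0, 1.0, 0.9, 0.8, 0.6, -0.6, -0.8, -0.9, -1.0, -1.0]
print(round(boundary_ratio(pos, resp), 2))   # 6.0: a sharp categorical drop
```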
Thus, CBF predicts that the amount of categorical percep-
tion is related to the extent of the training set with respect
to the class boundary: If the training set for a categorization
task is sparse around the class boundary (as is the case for
face gender classification where usually most of the training
exemplars clearly belong to one or the other category with a
comparatively lower number of androgynous faces), a lower
degree of CP would be expected than in the case of a training
set that extends to the class boundary.
It will be interesting to test this hypothesis experimentally
by training subjects on a categorization task where differ-
ent groups of subjects are exposed to subsets of the stimu-
lus space differing in how close the training stimuli come to
the boundary. Category judgment can then be tested for (ran-
domly chosen) stimuli lying on lines in morph space passing
through the class boundary. In a second step, subjects would
be switched to a discrimination task to look for evidence of
CP. The prediction would be that while subjects in all groups
would divide the stimulus space into categories (not neces-
sarily in the same way or with the same degree of certainty,
as there would be uncertainty regarding the exact location of
the class boundary that increases for groups that were only
trained on stimuli far away from the boundary), the degree of
CP should increase with the closeness of the training stim-
uli to the true class boundary. Naturally, the categorization
scheme used in this task should be novel for the subjects to
avoid confounding influences of prior experience. Hence,
a possible alternative to the cat/dog categorization task de-
scribed above would be to group car prototypes (randomly)
into two classes and then train subjects on this categorization
task.
One issue to be addressed is whether the fact that subjects
are trained on different stimulus sets will influence discrimi-
nation performance (even in the absence of any categorization
task). For the present case, simulations indicate only a small
effect of the different training sets on discrimination perfor-
mance (see Fig. 6), but it is unclear whether this transfers
to other stimulus sets. However, while the different train-
ing groups might differ in their performance on the untrained
part of the stimulus space due to the different SSCUs learned,
the prediction is still that the area of improved discriminabil-
ity should coincide with the subjects’ location of the class
boundary rather than with the extent of the training set. To
avoid range and anchor effects [3] (see footnote below), stim-
uli should be chosen from a continuum in morph space, e.g., a
loop.
Why has no CP been found for gender classification while
other studies have found evidence for CP in emotion classi-
fication using line drawings [5] as well as photographic im-
ages of faces [3]? For the case of emotions, subjects are
likely to have had experience with not just the “prototypical”
facial expression of an emotion but also with varying combi-
nations and degrees of expressions and have learned to cate-
gorize them appropriately, corresponding to the case of high
maximum c values in the cat/dog case described above, where CP would
be expected.
5 COP or CBF? — Suggestion for
Experimental Tests
It appears straightforward to design a physiological experi-
ment to elucidate whether COP or CBF better model actual
category learning: A monkey is trained on two different cat-
egorization tasks using the same stimuli (for example, the
cat/dog stimuli used in the simulations above). The responses
of prefrontal cortical neurons (which have been shown in a
preliminary study using these stimuli [6] to carry category in-
formation) to the test stimuli are then recorded while
the monkey is passively viewing the test stimuli (e.g., dur-
ing a fixation task). In CBF, we would expect to find neu-
rons showing tuning to either categorization scheme, whereas
COP would predict that cell tuning reflects a single metric in
shape space. In the former case, it will be interesting to com-
pare neural responses to the same stimuli while the monkey
is performing the two different categorization tasks to look
(Footnote: CP has also been claimed to occur for facial identity [1],
but the experimental design appears flawed: stimuli in the middle of the
continuum were presented more often than the ones at the extremes,
and prototypes were easily extracted from the discrimination task,
biasing subjects' discrimination responses towards the middle of the
continuum [11].)
Figure 6: Comparison of Euclidean distances of activation patterns (over 144 SSCUs, as used in the previous simulations) for stimuli lying at
two different positions on morph lines, for the cases of maximum c = 0.1 and maximum c = 0.4. The left panel shows the average Euclidean
distance between the activity pattern for a stimulus at one position on a morph line (y-axis) and a stimulus on the same morph line at a second
position (x-axis), for the network trained on the data set with maximum c = 0.1 (note that there were no stimuli at the 0.5 position). The middle
panel shows the corresponding plot for the network trained on maximum c = 0.4, while the right panel shows the difference between the two
plots: Differences between the two networks are usually quite low in magnitude (note the different scaling on the z-axes), suggesting that
discrimination performance in the maximum c = 0.1 case should be close to that in the maximum c = 0.4 case.
at response enhancement/suppression of neurons involved in
the different categorization tasks.
6 Conclusions
We have described a novel model of object representation
that is based on the concurrent use of different categorization
schemes using arbitrary class definitions. This scheme pro-
vides a more natural basis for classification than the “Chorus
of Prototypes” with its notion of one global shape space. In
our framework, called “Categorical Basis Functions” (CBF),
the stimulus space is represented by units whose receptive
fields self-organize without regard to any class boundary. In
a second, supervised stage, categorization units receiving in-
put from the stimulus space-covering units (SSCUs) come to
learn different categorization task(s). Note that this just de-
scribes the basic framework — one could imagine, for in-
stance, the addition of slow time-scale top-down feedback to
the SSCU layer, analogous to the GRBF networks of Pog-
gio and Girosi [10], that could enhance categorization per-
formance by optimizing the receptive fields of SSCUs. Simi-
larly, the algorithms used to learn SSCUs (k-means clustering
or simple storage of all training examples) and the catego-
rization units (RBF) should just be taken as examples. For
instance, (a less biological version of) CBF could also be im-
plemented using Support Vector Machines [14]. In this case,
a categorization unit would only be connected to a sparse sub-
set of SSCUs, paralleling the sparse connectivity observed in
cortex.
A final note concerns the advantages of CBF for the learn-
ing and representation of class hierarchies: While the simulations presented in this paper limited themselves to one level of
categorization, it is easily possible to add additional layers of
sub- or superordinate level units receiving inputs from other
categorization units. For instance, a unit learning to classify
a certain breed of dog could receive input not only from the
SSCUs but also from a “generic dog” unit, or a “quadruped”
unit could be trained receiving inputs from units selective for
different classes of four-legged animals, in both cases greatly
simplifying the overall learning task.
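A minimal sketch of such a hierarchical wiring follows; all unit names ("dog", "poodle") and weights are purely illustrative assumptions, chosen only to show a subordinate unit receiving both SSCU input and the output of a superordinate categorization unit.

```python
import numpy as np

rng = np.random.default_rng(1)

sscu = rng.random(30)                 # SSCU activations for one stimulus
w_dog = rng.normal(0, 0.3, 30)        # weights of a trained "generic dog" unit
dog_response = np.tanh(w_dog @ sscu)  # superordinate categorization unit

# The subordinate ("poodle") unit sees an augmented input vector:
# the SSCU activations plus the generic-dog unit's output.
augmented = np.append(sscu, dog_response)
w_poodle = rng.normal(0, 0.3, 31)     # weights learned for the breed task
poodle_response = np.tanh(w_poodle @ augmented)
print(poodle_response)
```

The extra input dimension lets the breed unit inherit the coarse dog/non-dog distinction instead of relearning it from the SSCUs alone.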
Acknowledgements
Thanks to Christian Shelton for k-means and RBF MATLAB
code, and for the morphing programs used to generate the
stimuli [13]. Special thanks to Prof. Eric Grimson for help
with taming the “AI Lab Publications Approval Form”.
References
[1] Beale, J. and Keil, F. (1995). Categorical effects in the
perception of faces. Cognition 57, 217–239.
[2] Bülthoff, I., Newell, F., Vetter, T., and Bülthoff, H.
(1998). Is the gender of a face categorically perceived?
Invest. Ophthal. and Vis. Sci. 39(4), 812.
[3] Calder, A., Young, A., Perrett, D., Etcoff, N., and Row-
land, D. (1996). Categorical perception of morphed fa-
cial expressions. Vis. Cognition 3, 81–117.
[4] Edelman, S. (1999). Representation and Recognition in
Vision. MIT Press, Cambridge, MA.
[5] Etcoff, N. and Magee, J. (1992). Categorical perception
of facial expressions. Cognition, 227–240.
[6] Freedman, D., Riesenhuber, M., Shelton, C., Poggio,
T., and Miller, E. (1999). Categorical representation of
visual stimuli in the monkey prefrontal (PF) cortex. In
Soc. Neurosci. Abs., volume 29, 884.
[7] Goldstone, R. (1994). Influences of categorization on
perceptual discrimination. J. Exp. Psych.: General 123,
178–200.
[8] Harnad, S. (1987). Categorical perception: The ground-
work of cognition. Cambridge University Press, Cam-
bridge.
[9] Poggio, T. and Edelman, S. (1990). A network that
learns to recognize 3D objects. Nature 343, 263–266.
[10] Poggio, T. and Girosi, F. (1989). A theory of networks
for approximation and learning. Technical Report AI
Memo 1140, CBIP paper 31, MIT AI Lab and CBIP,
Cambridge, MA.
[11] Poulton, E. (1975). Range effects in experiments on
people. Am. J. Psychol. 88, 3–32.
[12] Riesenhuber, M. and Poggio, T. (1999). Hierarchical
models of object recognition in cortex. Nature Neurosci.
2, 1019–1025.
[13] Shelton, C. (1996). Three-dimensional correspondence.
Master’s thesis, MIT.
[14] Vapnik, V. (1995). The Nature of Statistical Learning
Theory. Springer, New York.
[15] Wallis, G. and Rolls, E. (1997). A model of invariant
object recognition in the visual system. Prog. Neuro-
biol. 51, 167–194.
Appendix: Parameter Dependence of
Categorization Performance for the Cat/Dog
Task
[Figure 7: four panels plotting the categorization unit’s output along the morph axis from 0 (cat) to 1 (dog). Panel titles: n=144 a=40 sig=0.2 (100,66): 0.48; n=144 a=40 sig=0.7 (100,58): 0.85; n=144 a=256 sig=0.2 (100,94): 1; n=144 a=256 sig=0.7 (100,97): 2.1.]
Figure 7: Output of the categorization unit trained on the cat/dog
categorization task from section 3.1, for 144 SSCUs (where each
SSCU was centered at a training example) and two different values
for the σ of the SSCUs and for the number of afferents to each SSCU
(choosing either all 256 C2 units or just the 40 strongest afferents,
cf. [12]). The numbers in parentheses in each plot title refer to the
unit’s categorization performance on the training and on the test set,
respectively. The number on the right-hand side is the average response
drop over the category boundary relative to the average drop over
the same distance in morph space within each class (cf. section 4.2).
Note the poor performance on the test set for a low number of afferents
to each unit, which is due to overtraining. The plot in the lower
right shows the unit from Fig. 3.
Figure 8: Same as the above figure, but for an SSCU representation
based on just 30 units, chosen by a k-means algorithm from the 144
centers in the previous example.