Simple Brain Simulation Resources 

DanceFloor by dynamic artist Jenny James. Copyright 2006 (used with permission)

History and Principles of Neural Networks From 1960 to 1990

by David D. Olmsted (Copyright - 1998, 1999, 2006. Free to use for personal and educational purposes)
Last Revised August 27, 2006

The Classic Perceptrons (1962)

Figure 1
Classic Perceptron: Normalized Inputs

In 1962 Frank Rosenblatt published a book which combined the concepts of his original perceptron with those of ADALINE to come up with the classic perceptron design shown in figure 1. In contrast to ADALINE, perceptrons are based on repulsive learning in which only the weights on the non-active lines are changed in response to an error. In other words the weights change only in response to a misclassification. Thus the weight values are not pulled towards some defined goal but are pushed away from non-goals. Consequently each subcircuit can represent a whole class of patterns.

The adaptive multiplication factors (weights) are now placed before the summation node like ADALINE instead of after the node as in the original perceptron. In addition all convergent subcircuits now share a common set of inputs instead of having randomly connected inputs (although the initial values of the weights may be randomized which would effectively accomplish the same thing). These changes allowed the input pattern to dispense with the binary line signal requirement in favor of analog signals which could represent the frequency of an action potential pulse or the ionic charge on a neuron. Yet, in order for patterns to be reliably discriminated by perceptrons the pattern inputs had to be normalized, that is the numbers in each pattern had to add up to the same value - usually one. Using analog values (and thus analog equations) also required that the binary threshold be replaced with a subtractive threshold.

Figure 2
Classic Perceptron: Non-Normalized Inputs

Figure 2 shows the effect of non-normalized input patterns. The values of pattern one add up to 1.0 yet the values of pattern two add up to 1.2. No combination of weight values or threshold values will allow each of these patterns to have its own unique convergent subcircuit output. However, if an additional weight is placed after the summation operation then classification of non-normalized patterns is possible. Yet this does not seem to have been done, for the manipulation of a post-summation node value is not easily incorporated (mathematically) into the learning procedures used to find the pre-summation weight values.

As was seen with ADALINE, changing convergent subcircuit weights only shifts the angle and height of the equal value lines, but since the perceptron uses repulsive learning that equal value line now becomes the basis for defining the border between pattern classes. The equal value graph for the figure 1 example is shown in figure 3. The axes of the graph list the values of the pattern input lines, which will only produce an output from the subcircuit's threshold if they are above or to the right of their equal value line. Thus the value on input "B" must be above 0.625 in order for the top subcircuit (represented by the red line) to produce an output. Since the top subcircuit has a zero valued weight on the "A" line, that input can be any value. In contrast the input values for the bottom subcircuit must be above the blue line for it to produce an output. Since the greatest valued output is the one selected, the input with the greatest effective distance from its equal value line is selected. Consequently, the perceptron has the same linear limitation as the ADALINE although in this case it is called linear separability.
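The equal value line can be made concrete with a small sketch. The weights and thresholds below are hypothetical (the values in figure 1 are not reproduced in the text); they were chosen only so that the top subcircuit, with a zero weight on line A, has its equal value line at B = 0.625. Clipping the output at zero is also an assumption about how the subtractive threshold behaves.

```python
def subcircuit_output(weights, inputs, threshold):
    """Weighted sum minus a subtractive threshold, clipped at zero."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return max(0.0, total - threshold)


def select_class(subcircuits, inputs):
    """Index of the subcircuit with the greatest output, or None if all are silent."""
    outputs = [subcircuit_output(w, inputs, t) for w, t in subcircuits]
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return best if outputs[best] > 0.0 else None


# Hypothetical (weights, threshold) pairs, not the values from figure 1.
TOP = ([0.0, 1.6], 1.0)     # equal value line at B = 1.0 / 1.6 = 0.625
BOTTOM = ([2.0, 0.5], 1.0)  # sloped equal value line: 2.0*A + 0.5*B = 1.0

if __name__ == "__main__":
    for pattern in ([0.3, 0.7], [0.7, 0.3]):
        print(pattern, "->", select_class([TOP, BOTTOM], pattern))
```

A pattern falling below both equal value lines produces no output at all, which is why `select_class` can return `None`.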

Figure 3
Pattern Separation Space for Fig. 1

During learning the weights are changed according to an increment rule such that the weights on a misclassified pattern are increased by an amount proportional to the total error. This has the effect of moving the equal value line further upward and rightward. The process of finding the weight values giving the greatest pattern separation is complex and requires the use of various optimization procedures. These optimization procedures work best when the changes they command all have the same effect on the process being optimized. Since the subtractive threshold has a different effect from the weights, it was generally set to some fixed value and ignored. Full optimization, as opposed to piecewise optimization, requires that the whole problem be considered with all possible inputs. Consequently, all the perceptron optimization procedures in the literature require that all the patterns be known before the optimization procedure begins, which severely limits their use.

These optimization procedures work by initializing the weights to some low value, not necessarily random. All the patterns are presented and, for those patterns which are incorrectly classified, the degree of mismatch between the response of the correct convergent subcircuit and the incorrectly responding convergent subcircuit is noted. All these mismatches are added together in some fashion (depending on the type of optimization procedure) to give the total error which is to be minimized, and all the weights on the incorrectly responding templates are incremented upwards in proportion to the total error. For a comprehensive mathematical review of all perceptron improvements centered around linear discriminant functions, see chapter 5 of Duda and Hart (1973).
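As a rough illustration of error-driven training over a fixed, fully known pattern set, the sketch below uses the textbook fixed-increment multiclass perceptron rule rather than the exact increment rule described above: on a misclassification the correct subcircuit's weights are raised and the wrongly winning subcircuit's weights are lowered. The threshold is ignored, as the text says was common. The pattern set, learning rate, and function names are assumptions made for the example.

```python
def classify(weights, pattern):
    """Index of the convergent subcircuit with the greatest weighted sum."""
    sums = [sum(w * x for w, x in zip(row, pattern)) for row in weights]
    return max(range(len(sums)), key=sums.__getitem__)


def train(patterns, labels, n_classes, rate=0.1, max_epochs=100):
    """Repeat passes over the whole pattern set until no pattern is misclassified."""
    weights = [[0.0] * len(patterns[0]) for _ in range(n_classes)]
    for _ in range(max_epochs):
        mistakes = 0
        for pattern, target in zip(patterns, labels):
            winner = classify(weights, pattern)
            if winner != target:
                mistakes += 1
                for i, x in enumerate(pattern):
                    weights[target][i] += rate * x   # pull the correct subcircuit up
                    weights[winner][i] -= rate * x   # push the wrong winner away
        if mistakes == 0:
            break
    return weights


# Hypothetical normalized patterns (each sums to 1) in two classes.
PATTERNS = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]]
LABELS = [0, 0, 1, 1]

if __name__ == "__main__":
    learned = train(PATTERNS, LABELS, n_classes=2)
    print([classify(learned, p) for p in PATTERNS])
```

Because every pass revisits the full pattern set, the sketch shares the limitation noted above: all patterns must be available before training starts.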

Rosenblatt summed up perceptrons in this passage from his 1962 book (page 28):

"Perceptrons are not intended to serve as detailed copies of any actual nervous system. They're simplified networks, designed to permit the study of lawful relationships between the organization of a nerve net, the organization of its environment, and the 'psychological' performances of which it is capable. Perceptrons might actually correspond to parts of more extended networks and biological systems; in this case, the results obtained will be directly applicable. More likely they represent extreme simplifications of the central nervous system, in which some properties are exaggerated and others suppressed. In this case, successive perturbation and refinements of the system may yield a closer approximation."

The 1960s also saw the growth of artificial intelligence techniques based mostly on net search techniques and higher order logic languages known as propositional and predicate calculus. The aim of artificial intelligence (A.I.) researchers was to simulate intelligent processes at a level more abstract than that of the direct neural level, and they were having good success at the time. They could see what was required of any intelligent machine, so when the supporters of perceptrons began to oversell their potential, one of A.I.'s founders, Marvin Minsky (who had started out with neural networks), together with Seymour Papert was inspired in 1969 to write a book describing the perceptron's inherent limitations. Since the perceptron was the most sophisticated neural network idea at the time, that book essentially ended neural network research in the United States for a period of time.

Duda, R.O. & Hart, P.E. (1973). Pattern Classification and Scene Analysis, John Wiley & Sons, New York

Minsky, M. & Papert, S. (1969). Perceptrons, An Introduction to Computational Geometry, MIT Press, Cambridge, MA

Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington D.C.

The Association Networks of Kohonen and Anderson (1972)

Still enamored by the idea that brain memory is based upon distributed pathways, the next stage in neural network research was to avoid the pattern classification problems as represented by the perceptron and instead concentrate on how these memories might be formed. This stage in neural network research began in 1972 with the publication of two papers on the subject by different authors working independently of each other. One was by James Anderson, who was inspired by the William James - Donald Hebb model of memory, for he states in his paper:

"If a group of neurons projects to another we shall show that strengthening or weakening the synaptic connection between the two groups according to a simple multiplicative function of activity in pre and postsynaptic cells automatically generates an interactive memory ..." (page 182)

Figure 4
An Association Network

In a similar way, Teuvo Kohonen from the University of Helsinki in Finland was inspired by an idea prevalent at the time, which was that memory may be holographic in nature. The result was a network identical to that proposed by Anderson. Interestingly, both used matrix mathematics to describe their ideas and apparently because of this did not realize that what they produced was an array of analog ADALINE circuits.

Figure 4 shows such a network, the purpose of which is to associate or correlate an input pattern (vector) with some desired output pattern (vector). The weights for each convergent subcircuit represent a row in a matrix such that the calculation done by each convergent subcircuit is the inner (dot) product between the input vector and the weight vector. Consequently, figure 4 represents a square 3 row by 3 column matrix. An association network is different from perceptron-like networks in that a presented pattern is supposed to activate a set of output lines instead of just one. Consequently, a single convergence for a decision is not required in association networks.

Producing an output pattern which exactly matches the desired output pattern requires that the input patterns don't overlap, thus avoiding the subset problem. In vector terminology this is the same thing as saying the input pattern vectors need to be orthogonal. Alternatively, successful matches can also occur if all input patterns have differing overall magnitudes (distances) that are proportional to their desired output patterns. Figure 4 shows the failure of pattern association with overlapping and normalized (adding up to 1.8 in this case and thus having the same overall magnitude) patterns using only positive inputs. Yet such a correlation is possible using a mix of negative inputs (represented by negative valued weights) and positive inputs, and the Widrow-Hoff procedure for learning the weight values will work in most cases.
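The orthogonality requirement can be checked with a minimal sketch of a correlation (outer product) memory, in which each weight is the sum over stored pairs of the desired output value times the input value. The patterns are made up for the example and are not those of figure 4.

```python
def outer_product_memory(inputs, outputs):
    """Build W with W[i][j] = sum over stored pairs of output[i] * input[j]."""
    n_out, n_in = len(outputs[0]), len(inputs[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    for x, y in zip(inputs, outputs):
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] += y[i] * x[j]
    return W


def recall(W, x):
    """Each convergent subcircuit computes the dot product of its row with x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


DESIRED = [[1, 0, 1], [0, 1, 1]]            # hypothetical output patterns
ORTHOGONAL = [[1, 0, 0], [0, 1, 0]]         # non-overlapping inputs
OVERLAPPING = [[1, 1, 0], [0, 1, 1]]        # inputs sharing the middle line

if __name__ == "__main__":
    print(recall(outer_product_memory(ORTHOGONAL, DESIRED), ORTHOGONAL[0]))
    print(recall(outer_product_memory(OVERLAPPING, DESIRED), OVERLAPPING[0]))
```

With the non-overlapping inputs the first stored output is recalled exactly; with the overlapping inputs the recalled values are contaminated by the second stored pair.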

The Widrow-Hoff procedure has trouble when high valued patterns are mixed with low valued patterns. In this case the weight values for higher valued inputs will tend to oscillate wildly while the weights associated with the low valued inputs will change very little, thus preventing any convergence. In order to minimize this problem the Widrow-Hoff procedure is often modified for use in these association networks by making the weight changes proportional to the product of the input values and the desired output values instead of just proportional to the input values. This tends to average out the value variations at the cost of longer learning times. Still, to be absolutely sure of convergence where such solutions are possible, other more complex algorithmic optimization procedures (not likely to be realizable in a biological neural setting) are needed. Solutions are even less likely to occur if the input patterns are not normalized, which results in more situations in which the convergent subcircuits have identical values for most of their input values, leaving less "room" for the remaining weights to adjust themselves to achieve the desired output values. For an extensive mathematical review of this type of neural network with its many variations and weight change rules, see the 1984 book by Kohonen.
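A minimal sketch of the unmodified Widrow-Hoff (delta) rule applied to an association network follows: each weight changes in proportion to the output error times the input value. The overlapping input patterns, learning rate, and epoch count are assumptions chosen so that a solution exists and the iteration settles; the learned weights include negative values, as the text anticipates.

```python
def recall(W, x):
    """Each convergent subcircuit computes the dot product of its row with x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def widrow_hoff(inputs, targets, rate=0.1, epochs=500):
    """Delta rule: weight change = rate * (desired - actual) * input value."""
    W = [[0.0] * len(inputs[0]) for _ in targets[0]]
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            actual = recall(W, x)
            for i, row in enumerate(W):
                error = y[i] - actual[i]
                for j, xj in enumerate(x):
                    row[j] += rate * error * xj
    return W


# Hypothetical overlapping (non-orthogonal) inputs and desired outputs.
INPUTS = [[1, 1, 0], [0, 1, 1]]
TARGETS = [[1, 0], [0, 1]]

if __name__ == "__main__":
    W = widrow_hoff(INPUTS, TARGETS)
    for row in W:
        print([round(w, 3) for w in row])
```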

Anderson, James A (1972). A Simple Neural Network Generating an Interactive Memory, Mathematical Biosciences 14:197-220

Kohonen, Teuvo (1972). Correlation Matrix Memories, IEEE Transactions on Computers, C-21:353-359

Kohonen, Teuvo (1984). Self-Organization and Associative Memory, Springer-Verlag, Berlin

The Cognitron - First Multilayered Network (1975)

Figure 5
The Repeatable Unit of the Cognitron

In 1975, inspired by the self-organization ability of the brain, Kunihiko Fukushima from Japan introduced the Cognitron network as an extension of the Perceptrons. Like the Original Perceptron the Cognitron is a pattern regularity detector meaning it is able to learn patterns without some mechanism (a teacher) to indicate the success or non-success of a pattern match. Unlike the original Perceptron the Cognitron is better able to handle (but not perfectly) the pattern subset problem in which one pattern is completely contained within the other. It does this by using a special inhibitory input to the convergent subcircuit node which tends to counteract the effects of larger patterns. Also unlike the original Perceptron the Cognitron can discriminate to some degree between analog patterns although binary patterns are usually presented to the first layer.

A basic unit (section) of the Cognitron having two convergent subcircuits is shown in figure 5. It has four input lines labeled A through D. Notice lines B and C are common to both convergent subcircuits. It learns by increasing the weights on the active lines of the convergent subcircuit selected by the gate comparator as having the greatest output. Like the original perceptron, the best match is simply strengthened. The rule for adjusting each positive line weight in a selected convergent subcircuit is:

Facilitory Weight Increment = (Proportionality Constant) * (Positive Line Value) / (Number of Subcircuit Inputs)

In the Cognitron the weights can increase without limit but this is balanced by increasing the weights on the inhibitory inputs at the same time. The rule for adjusting the inhibitory weight is:

Inhibitory Weight Increment = (Pattern Generality Constant) * [(Sum of all positive inputs into the subcircuit node) / (Total Pattern Value)]

The Pattern Generality Constant in the original paper was 1/2 and it is needed to help define the dynamic equilibrium of the network between the positive and negative line values feeding into the subcircuit node. This dynamic equilibrium in turn defines the degree of pattern discrimination versus pattern generality.
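The two increment rules translate directly into code. The sketch below is only a transcription of the formulas as stated here; the function and parameter names are made up, and treating "sum of all positive inputs" as a list supplied by the caller is an assumption, since the text does not say whether those inputs are taken before or after weighting.

```python
def facilitory_increment(proportionality, line_value, n_inputs):
    """(Proportionality Constant) * (Positive Line Value) / (Number of Subcircuit Inputs)."""
    return proportionality * line_value / n_inputs


def inhibitory_increment(generality, positive_inputs, total_pattern_value):
    """(Pattern Generality Constant) * (sum of positive inputs) / (Total Pattern Value)."""
    return generality * sum(positive_inputs) / total_pattern_value


if __name__ == "__main__":
    # Hypothetical numbers: generality constant 1/2 as in the original paper,
    # everything else chosen for illustration.
    print(facilitory_increment(0.5, 1.0, 3))
    print(inhibitory_increment(0.5, [1.0, 1.0], 4.0))
```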

Dynamic equilibrium effects are best seen in the example shown in figure 5, which represents a Cognitron section at a particular moment in time in which the positive weights have a value of 1 and the top inhibitory weight has a value of 0.75 while the bottom subcircuit has an inhibitory weight value of 0.6. These weights allow a subset pattern discrimination (such as 1,1,1 versus 0,1,1). Yet this discrimination is only possible if the bottom inhibitory weight has a value between 0.75 and 0.45. Any weight value outside that range forces the two patterns to be classified as belonging to the same general class. The choice of the Pattern Generality Constant is whatever works, for no analytical derivation of its value for any degree of generalization has yet been devised. Also one would think that it would be a prime candidate to be adaptively determined itself, but no method has yet been devised for that either.

With such a narrow range for subset pattern discrimination the number of input lines of a convergent subcircuit needs to be rather small in order to preserve resolution. For example, the percentage difference between patterns having 20 and 21 binary values is not as great as the difference between patterns having 3 and 4 binary values. Consequently, Fukushima divided the Cognitron into repeatable sections and to connect the sections he was forced to use several layers. This use of multiple layers was to inspire other multilayered yet quite different networks in the future (such as the hybrid network below).

Figure 6
The NeoCognitron's Position Independence Strategy

In order to combat the ever increasing line values due to ever increasing weight values, Fukushima did not use simple summation and subtraction operations for the convergent subcircuit node. Instead he combined the positive and subtractive nodal inputs with a formula which slows the growth of the output value. The exact equation is (e - h)/(1 + h) where e is the excitatory or additive input and h is the inhibitory or subtractive input.
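The effect of the (e - h)/(1 + h) formula is easy to see numerically. In the sketch below the excitatory input is assumed, purely for illustration, to stay at twice the inhibitory input while both grow; a plain difference grows without limit while the Cognitron formula levels off below 1.

```python
def cognitron_node(e, h):
    """Combine excitatory input e and inhibitory input h as (e - h) / (1 + h)."""
    return (e - h) / (1.0 + h)


def plain_difference(e, h):
    """Simple summation and subtraction, shown only for comparison."""
    return e - h


if __name__ == "__main__":
    for h in (1.0, 10.0, 100.0, 1000.0):
        e = 2.0 * h  # hypothetical fixed ratio between the two inputs
        print(h, plain_difference(e, h), round(cognitron_node(e, h), 4))
```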

The many layers and sections of the Cognitron allowed it to be modified so that it could respond in the same way (having the same final output) if the same object moved around in a visual field. This modification was called the Neocognitron by Fukushima, who published it in 1980. All that was done was to add a final set of summation nodes after a layer's gate comparator which summed all the outputs from all the convergent subcircuits in the same location of each section (see figure 6). If a feature pattern was moved it would be in the same location in some new section as it had been in its old previously learned section. Consequently it would activate the same final summation node as before to effect position independence, which is limited only by the degree of overlap between sections (if the sections do not overlap very much then the pattern would have a low probability of being in its exact relative location in the new section).
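The position independence strategy amounts to summing, across sections, the outputs of the convergent subcircuits that occupy the same relative location. A minimal sketch, with made-up section outputs standing in for a feature that moves from the first section to the third:

```python
def final_summation(section_outputs):
    """Sum the outputs of same-location convergent subcircuits across all sections."""
    return [sum(same_location) for same_location in zip(*section_outputs)]


# Hypothetical outputs: rows are sections, columns are subcircuit locations.
BEFORE_MOVE = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
AFTER_MOVE = [[0, 0, 0], [0, 0, 0], [0, 1, 0]]

if __name__ == "__main__":
    print(final_summation(BEFORE_MOVE), final_summation(AFTER_MOVE))
```

Both cases activate the same final summation node, provided the moved feature lands in the same relative location of its new section.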

Fukushima, K (1975). Cognitron: A Self-organizing Multilayered Neural Network, Biological Cybernetics, 20:121-136

Fukushima, K (1980). Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position, Biological Cybernetics 36, 193-202

The Hopfield Association Network - Revival of the Reverberatory Networks (1982)

Figure 7
Hopfield Network Examples

In 1982, John Hopfield revived interest in neural networks in the United States with the introduction of a new type of reverberatory network of the association type. It differed from the earlier versions by using bi-directional lines (equivalent to two reciprocal unidirectional lines) between summation nodes instead of unidirectional lines and emphasized individual cells (nodes) instead of cell assemblies. The summation nodes have a threshold of zero and only produce an output value of one if that threshold is met or exceeded. The weight values between the nodes can be any number between -1 and 1 so both excitatory and inhibitory operations are represented. The general learning idea behind the network is that the weights between the active nodes (those producing an output of 1) will increment while those between all other nodes will decrease. This is usually accomplished by normalizing all the weights. In the original paper by Hopfield all the weights added up to zero.

So in order to learn a correlation (association) between some input and output pattern a separate training session must be held in which the input pattern is presented along with the desired output pattern. In the original paper the weights were assigned initial values at random. The weights between active nodes are then incremented by some amount and the normalization process then decrements the remaining weights. The process is repeated for all patterns until the weights stop changing (or nearly so).

Figure 7 shows some of the characteristics of the Hopfield association network. Both subset discrimination and generalization are possible. A use for this generalization ability emphasized by Hopfield is this network's use as a content addressable memory in which the full memory (pattern) would be retrievable by providing only partial information. As can be seen from the example this mostly amounts to creating a pathway for the common elements in the input patterns. Yet not all associations are possible. In practice the number of different associations which can be learned and recalled is only about 15% of the number of summation nodes. This lack of association formation is shown in the bottom example in figure 7. The small four node network cannot learn the associations presented in the example. The network needs to be larger or the normalization value must be larger than zero. Yet increasing the normalization value will decrease the pattern discrimination ability of the network.

As learning proceeds in a Hopfield network with initially random weight values the weights will change in a manner so that fewer paths will be active. So the fewer paths which are active the closer is the network to fully learning its pattern set. This general tendency towards minimizing active pathways can be measured by summing together the output values of all the weights. Hopfield calls this the energy level of the network which is minimized as the network learns.
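Content addressable recall and the falling energy level can be sketched with the outer product storage rule and energy function usually given for Hopfield networks, which is a different procedure from the increment-and-normalize training described above. Nodes output 0 or 1 with a threshold of zero; the two stored patterns and the partial probe are made up for the example.

```python
def store(patterns):
    """Outer product storage on 0/1 patterns mapped to -1/+1, with no self-connections."""
    n = len(patterns[0])
    T = [[0.0] * n for _ in range(n)]
    for p in patterns:
        s = [2 * v - 1 for v in p]
        for i in range(n):
            for j in range(n):
                if i != j:
                    T[i][j] += s[i] * s[j]
    return T


def energy(T, state):
    """Energy level: -1/2 times the sum of weights between simultaneously active nodes."""
    n = len(state)
    return -0.5 * sum(T[i][j] * state[i] * state[j] for i in range(n) for j in range(n))


def recall(T, probe, sweeps=5):
    """Update nodes one at a time; a node outputs 1 if its summed input meets the zero threshold."""
    state = list(probe)
    for _ in range(sweeps):
        for i in range(len(state)):
            total = sum(T[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if total >= 0 else 0
    return state


STORED = [[1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 1, 0, 0]]
PARTIAL = [1, 0, 1, 1, 0, 0, 0, 0]  # first stored pattern with one active line missing

if __name__ == "__main__":
    T = store(STORED)
    result = recall(T, PARTIAL)
    print(result, energy(T, PARTIAL), energy(T, result))
```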

While the Hopfield association network can discriminate between almost any patterns given sufficient size, the number of connections grows with the square of the number of nodes, meaning it will get very complicated very fast. The requirements for normalization and binary inputs are further limitations.

Hopfield, J.J. (1982). Neural Networks and Physical Systems with Emergent Collective Computational Abilities, Proceedings of the National Academy of Sciences, 79:2554-2558

Reilly & Cooper's Hybrid Network - The First Hybrid Network (1982)

While the Hopfield network got most of the press during the neural network revival, another significant type of network, called at this site a hybrid network, was presented by Doug Reilly, Leon Cooper and Charles Elbaum. This is a two layered network with each layer accomplishing (although imperfectly) a different strategy (thus the hybrid name). The first (input) layer of convergent subcircuits is responsible for generalization while the second (output) layer of convergent subcircuits is responsible for specification. This turns out to be an important and powerful pattern classification strategy.

Figure 8
Two Layer Hybrid Network

Figure 8 shows the basic workings of a section of Reilly and Cooper's hybrid network having two pattern classification outputs and three pattern feature inputs. A network would consist of many such sections. The top output line of this section is activated for class "A" patterns while the bottom output line is activated for class "B" patterns.

If a pattern is presented which does not produce an output then the weights on the active lines of the front layer convergent subcircuits are incremented to some value greater than one. Since the threshold has a value of one this ensures that the pattern will produce an output from the front layer. Next the weight on a branch of that output line (which is now active) in the second layer which has been assigned to represent that class of patterns is incremented to one, resulting in an output.

If a pattern (the blue "A" class pattern) is then presented which is supposed to belong to the same class as the red pattern yet it is so different as to not produce an output then the above procedure is repeated. Another possibility is that a pattern belonging to a different class (the blue "B" class pattern) will falsely produce an output from the "A" class output. In such a case, called confusion, the front layer of the "A" class convergent subcircuit must be trimmed back by reducing the weight values on the active lines until an output is no longer produced. These learning procedures will work as long as not too much overlap occurs between the presented patterns. The greater the pattern feature overlap the more front layer convergent subcircuits must be used and each must have fewer input lines.
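The commit-and-trim procedure can be sketched roughly as follows. This is a simplified reading of the description above, not Reilly, Cooper and Elbaum's published algorithm: the class name, the commit value of 1.2, the shrink factor of 0.9, and the example patterns are all assumptions, and inputs are assumed to be normalized to add up to 1 so that a committed subcircuit is guaranteed to exceed the threshold of one.

```python
THRESHOLD = 1.0


def fires(weights, pattern):
    """A front layer subcircuit produces an output when its weighted sum exceeds the threshold."""
    return sum(w * x for w, x in zip(weights, pattern)) > THRESHOLD


class HybridSection:
    def __init__(self):
        # Each entry pairs front layer weights with the class whose output line
        # it feeds (standing in for a second layer weight of one).
        self.units = []

    def classify(self, pattern):
        """Set of class outputs activated by the pattern."""
        return {label for weights, label in self.units if fires(weights, pattern)}

    def train(self, pattern, label, commit=1.2, shrink=0.9):
        # Confusion: trim other-class subcircuits on the active lines until silent.
        for weights, unit_label in self.units:
            if unit_label != label:
                while fires(weights, pattern):
                    for i, x in enumerate(pattern):
                        if x > 0:
                            weights[i] *= shrink
        # No output for the correct class: commit weights greater than one on active lines.
        if label not in self.classify(pattern):
            self.units.append(([commit if x > 0 else 0.0 for x in pattern], label))


if __name__ == "__main__":
    section = HybridSection()
    section.train([0.4, 0.4, 0.2], "A")
    section.train([0.0, 0.2, 0.8], "B")   # initially confused with class A
    print(section.classify([0.4, 0.4, 0.2]), section.classify([0.0, 0.2, 0.8]))
```

With patterns that overlap more heavily than these, trimming one class can silence a pattern of the other, which is the overlap limitation noted above.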

The main limitation of Reilly's hybrid network is that it has all the limitations found in the Adaline type of convergent subcircuit with its multiplication factor weights and summation node. Consequently, the input patterns need to be normalized (adding up to 1 here). Its great strength is that non-linear patterns can be classified and discriminated.

Reilly, Douglas L.; Cooper, Leon N.; Elbaum, Charles (1982). A Neural Model for Category Learning, Biological Cybernetics 45, 35-41

The Multilayered Back-Propagation Association Networks (1986)

Figure 9
A Back Propagation Convergent Subcircuit

With multiple layered neural networks in the news the question was what was the best way to extend the Widrow-Hoff (Delta) rule to multiple layers. In 1986 three independent groups of researchers: 1) Y. Le Cun 2) D. Parker 3) D. Rumelhart, G. Hinton, & R. Williams came up with essentially the same idea which came to be called the Back Propagation network for the way it distributes pattern recognition errors throughout the network. Yet their ideas turned out to be the neural analog of a steepest descent algorithm discovered by Paul Werbos in 1974. (I have not seen this paper so I am taking Stephen Grossberg's word on this).

The basic repeatable unit (convergent subcircuit) used in the Back-Propagation network as described by Rumelhart, Hinton, and Williams is shown in figure 9. The weights, represented by w(i,j), can be any positive or negative value but they start out as small, randomly chosen numbers (-0.3 and 0.3 were used). The squashing function (funct.) is generally used to keep the output of the convergent circuit 1 or less. The error is the difference between the desired output and the actual output.

The strategy used in back-propagation is to determine and use the error contribution made by each layered pathway through the network. In a three layered network the learning rule for each layer is (red indicates the additions relative to the last layer):

Weight value change for last layer = (proportionality constant)(error)(change in output value from last layer’s function)(pre-weight line value) (weight of last layer)

Weight value change for middle layer = (proportionality constant)(error)(change in output value from last layer’s function)(value change from middle layer function)(pre-weight line value)(weight of last layer) (weight of middle layer)

Weight value change for front layer = (proportionality constant)(error)(change in output value from last layer’s function)(value change from middle layer function)(value change from first layer function) (pre-weight line value)(weight of last layer) (weight of middle layer)(weight of first layer)

Figure 10
Learning Rate Example

Notice that the rule for changing the weights in the last layer is the variation of the Widrow-Hoff (Delta) rule used first in their Adaline network but making use of the more accurate but time consuming post-weight line value given by (pre-weight line value) (weight value). The other change is the incorporation of the value change due to the function used to keep the line values between layers less than or equal to 1. If the network is strictly connected layer to layer without any convergent subcircuits bypassing a layer the function component can be left out of the weight change rule for they would no longer produce any possible difference in the error contribution.

So as one moves forward in the network towards the inputs the weight change rules include all the weight values of the more rearward layers in its error path. This produces smaller weight changes nearer to the front. The assumption here is that the weights near the back of the layer are more responsible for the error than those near the front in accordance with The Fundamental Pattern Classification Strategy. Whereas Hybrid networks use just two layers to accomplish this strategy the Back-Propagation networks spread it over several. The result is slow learning often needing as many as thousands of iterations to learn a set of patterns.

An example of how including weight values in the weight change rule affects learning is given in figure 10. Weights less than 1 simply produce small weight changes.
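The shrinking of weight changes toward the front of the network can be checked with a minimal sketch: a chain of three single-input units with a logistic squashing function, trained by the standard chain rule for steepest descent on squared error. This differs in detail from the verbal rules above (in the standard derivation a layer's own weight does not appear in its own change), and the starting weights, input, target, and proportionality constant are made up.

```python
import math


def squash(z):
    """Logistic squashing function keeping each output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, x):
    """Line values: the input followed by the output of each layer."""
    values = [x]
    for w in weights:
        values.append(squash(w * values[-1]))
    return values


def weight_changes(weights, x, target, rate=0.5):
    """Steepest descent changes for every layer, computed from the back to the front."""
    values = forward(weights, x)
    delta = target - values[-1]                   # the output error
    changes = [0.0] * len(weights)
    for layer in reversed(range(len(weights))):
        out = values[layer + 1]
        delta *= out * (1.0 - out)                # slope of the squashing function
        changes[layer] = rate * delta * values[layer]   # times the pre-weight line value
        delta *= weights[layer]                   # carry the error back through this weight
    return changes


if __name__ == "__main__":
    # Front, middle, last layer weights inside the -0.3 to 0.3 starting range.
    print(weight_changes([0.3, -0.2, 0.25], x=1.0, target=1.0))
```

With weights smaller than 1, each step backward multiplies the error signal by another small weight, so the front layer's change is the smallest of the three.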

Parker, D.B. (1986). A Comparison of Algorithms for Neuron-Like Cells. In J.S. Denker (Ed.), Neural Networks for Computing (pp 327-332) New York: American Institute of Physics.

Rumelhart, David E., Hinton, Geoffrey E., and Williams, Ronald J. (1986). Learning Representations by Back-Propagating Errors, Nature 323:533-536

Werbos, Paul (1974) Beyond regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Unpublished doctoral thesis, Harvard University, Cambridge, MA

The First Regularity Detector with Dynamic Allocation - The Adaptive Resonance (ART) Networks (1987)

Figure 11
The Problem of Detecting a Pattern Mismatch

The first generation of pattern regularity detectors for unsupervised learning was based upon using convergent subcircuits having random input connections. Those input connections which happened to match some presented pattern were then strengthened. Those networks ran into the combinatorial explosion problem as the number and size of the presented patterns increased, since the average number of random connections had to increase even more.

The second generation of regularity detectors was introduced in 1987 by Gail Carpenter and Stephen Grossberg of Boston University. Their network using binary input patterns was called ART 1 (1987a) while their network using analog inputs was called ART 2 (1987b). They were the first networks to use the convergent subcircuit dynamic allocation strategy which assigns a new convergent subcircuit to the presented pattern if no other convergent subcircuit has a sufficient match.

The problem here is determining when a convergent subcircuit does not match its pattern. The easiest approach is simply to assume that the greater the output of the subcircuit the better the match, so one can simply use a gate comparator subcircuit to pick the greatest output value. Yet this is not always the case, as shown in the top subcircuit of figure 11 in which its matching pattern (the superset pattern) produces a lower output value than a mismatched subset pattern. Yet the bottom convergent subcircuit does produce an output which is larger for its matching pattern when the matching pattern is the subset pattern. If these two subcircuits were in the same network and the superset pattern was presented, the bottom subcircuit would still produce the greater output even though the pattern is not matched to it! This competitive mismatch problem is inherent in all convergent subcircuits based on summation nodes even when they use normalized inputs as is done in figure 11.
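The competitive mismatch can be reproduced with a few lines of arithmetic. The sketch assumes each subcircuit's weights equal the pattern it is tuned to, using the superset pattern (0.6, 0.4) and subset pattern (1, 0) that the text gives for the ART 2 example; whether figure 11 uses exactly these weights is an assumption.

```python
def summation_node(weights, pattern):
    """Output of a convergent subcircuit based on a plain summation node."""
    return sum(w * x for w, x in zip(weights, pattern))


SUPERSET = [0.6, 0.4]   # both lines active
SUBSET = [1.0, 0.0]     # only the first line active

TOP = SUPERSET          # subcircuit tuned to the superset pattern
BOTTOM = SUBSET         # subcircuit tuned to the subset pattern

if __name__ == "__main__":
    for name, pattern in (("superset", SUPERSET), ("subset", SUBSET)):
        print(name, summation_node(TOP, pattern), summation_node(BOTTOM, pattern))
```

Presenting the superset pattern gives roughly 0.52 from its own (top) subcircuit but 0.6 from the bottom one, so a comparator picking the greatest output selects the wrong subcircuit.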

The competitive mismatch problem of summation node convergent subcircuits can be mostly solved by using the positive feedback approach of the ART networks. The positive feedback is used to enhance the superset signals in a process called Adaptive Resonance, a term which was first coined by Stephen Grossberg in 1976 as he investigated how stable reverberation circuits might be formed (which he called a short term memory) so that weight changes could take place (long term memory). The limitation of adaptive resonance is that during learning the superset patterns must always be presented first so that the weights can be set to their proper value for the resonance to work.

Figure 12
The Core of the ART 2 Network

Figure 12 shows the core circuitry of the ART 2 network (analog inputs). The F2 layer contains two convergent subcircuits tuned to the same patterns which were used in figure 11 except here the weight values are 10 times the pattern values. At time = 0 and in accordance with the competitive mismatch problem, the presented superset pattern of 0.6 and 0.4 produces the largest output out of the bottom subset subcircuit (tuned to the pattern 1, 0). This value is selected by a gate comparator subcircuit (which passes the largest value among its inputs), is then converted to a constant "d" (here 0.9), and fed back to all the inputs of the F1 layer. But before the feedback signal can pass through the network again the network is reset by a large inhibition of the active node in the F2 layer.

This immediately causes the gate comparator to output the signal of the top convergent subcircuit which manages to surge through the network before the reset circuit can react again. So the workings of the ART 2 network require precise signal timing which is why it is often described as a real time network. The addition of the feedback signal now forces the F1 nodes to output their maximum value of one which prevents any reset.

The key to making the ART 2 work is the reset rule. The simplest and most intuitive reset rule (and one which allows superset patterns to be presented first) would be to calculate the average of the non-zero inputs into the F2 layer. Since these lines have a maximum value of 1 a perfect match would have an average of 1. A vigilance parameter between 0 and 1 (0.9 used in the example) can then be defined to allow less than perfect matches to be averaged together by the convergent subcircuit (a vigilance parameter of 1 requires a perfect match by each convergent subcircuit). Yet this rule is an abrupt and discontinuous rule which makes an all-or-nothing cut-off in not considering 0 valued input lines. What about lines having a 0.1 value? So what ART 2 really does is use a directional average taken from vector mathematics in the form U / ||U|| which relative to a baseline coordinate equals the cosine of the angle between the vector U and its norm or length ||U||. But with this formula the smaller line values actually have a greater impact so the vigilance parameter is divided by the result so that the small values will not be "weighted" more than the larger values. In either case both rules would be difficult to accomplish in real neural systems.
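The simple averaging rule, and the discontinuity it suffers from, can be sketched in a few lines. This illustrates only the intuitive rule described first, not the rule ART 2 actually uses; the function name and the example line values are assumptions.

```python
def needs_reset(f2_inputs, vigilance=0.9):
    """Reset when the average of the non-zero input lines falls below the vigilance parameter."""
    active = [value for value in f2_inputs if value != 0.0]
    if not active:
        return True  # nothing active, so nothing matched
    return sum(active) / len(active) < vigilance


if __name__ == "__main__":
    print(needs_reset([1.0, 1.0, 0.0]))   # the zero line is ignored: average is 1
    print(needs_reset([1.0, 1.0, 0.1]))   # a line at 0.1 is counted: average is 0.7
```

Raising one line from 0 to 0.1 flips the decision from no reset to reset, which is the abrupt all-or-nothing behavior the text objects to.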

Learning must take place when no convergent subcircuit matches some input pattern. When no match occurs the network cycles through several resets testing the most likely convergent subcircuits. This resetting will continue unless some other circuit eventually brings it to a halt, allowing the weights to change on the final selected convergent subcircuit. The extra circuit in the ART 2 network is a slow positive feedback loop on the front end which increases in value after a period of time and in so doing shuts down the resetting. Repeated normalizations of the signals keep the values in bounds and a three level circuit assures that when the input pattern is removed the positive feedback signals stop as well.

Carpenter, G.A. & Grossberg, Stephen (1987a). A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine, Comput. Vision Graphics Image Processing 37:54

Carpenter, G.A. & Grossberg, Stephen (1987b). ART 2: Self-Organization of Stable Category Recognition codes for Analog Input Patterns, Applied Optics 26:4919-4930

Grossberg, Stephen (1976). Adaptive Pattern Classification and Universal Recoding: II Feedback, Expectation, Olfaction, Illusions, Biological Cybernetics 23:187-202

The First Multivalued Logic Neural Network (1990)

Given the limitations of summation based neural networks and the need for logic-like operations for higher level intelligence as shown by the artificial intelligence field, some sort of analog based logic network seemed worth investigating (the next logical step :) ). The first multivalued logic neural network was a simple adaptive switching network meant to represent the reticular formation of the brain (Olmsted, 1990). It was later incorporated as the second stage in a very successful hybrid network (not published).

Olmsted, David (1990). The Reticular Formation as a Multi-Valued Logic Neural Network, International Joint Conference on Neural Networks - Vol. 1, pp 619 - 624, IEEE Neural Networks Council


Web site by David D. Olmsted. He can be contacted at brainsim1-contact at yahoo dot com (this is an anti-spam tactic. Type the address as normal). Original site established August 21, 1998 by David D. Olmsted. New home page published August 25, 2006
