## Abstract

This paper presents a novel clustering technique known as adaptive double self-organizing map (ADSOM). ADSOM has a flexible topology and performs clustering and cluster visualization simultaneously, thereby requiring no a priori knowledge about the number of clusters. ADSOM is developed based on a recently introduced technique known as double self-organizing map (DSOM). DSOM combines features of the popular self-organizing map (SOM) with two-dimensional position vectors, which serve as a visualization tool to decide how many clusters are needed. Although DSOM addresses the problem of identifying an unknown number of clusters, its free parameters are difficult to control to guarantee correct results and convergence. ADSOM updates its free parameters during training, and it allows convergence of its position vectors to a fairly consistent number of clusters provided that its initial number of nodes is greater than the expected number of clusters. The number of clusters can be identified by visually counting the clusters formed by the position vectors after training. A novel index is introduced based on hierarchical clustering of the final locations of position vectors. The index allows automated detection of the number of clusters, thereby reducing the human error that could be incurred from counting clusters visually. The reliability of ADSOM in identifying the number of clusters is demonstrated by applying it to publicly available gene expression data from multiple biological systems such as yeast, human, and mouse. ADSOM's performance in detecting the number of clusters is compared with a model-based clustering method.

- microarray
- detecting number of clusters
- unsupervised learning
- cluster visualization

Recent technological advances allow us to measure expression levels for thousands of genes simultaneously (24). Efficient techniques are urgently needed to effectively analyze the generated gene expression data. The availability of reliable and accurate analysis tools will help the research community to identify the genetic makeup of diseases and lead to suitable medical interventions.

A key early step in the analysis of gene expression data is clustering genes that manifest similar expression patterns. Genes with similar expression profiles might be transcriptionally regulated through the same transduction pathway. Thus the rationale for clustering gene expression data is to identify new transduction pathways or novel genes that may be coregulated through the same known pathway.

A wide range of techniques has been applied for clustering gene expression data. Examples include hierarchical clustering (4), adaptive resonance theory (ART) (21), self-organizing map (SOM) (28), *k*-means (29), graph-theoretic approaches (1, 9), fuzzy ART (31), fuzzy *c*-means (8), fuzzy Kohonen (7), and growing cell structures network (7). However, most of the above-mentioned clustering algorithms are heuristically motivated, and the issues of determining the “correct” number of clusters and choosing a “good” clustering algorithm are not yet rigorously solved. Clustering gene expression data using hierarchical clustering (4) and SOM has been very popular among the bioinformatics research community.

Hierarchical clustering organizes the expression profiles in a hierarchical tree structure, which allows detecting higher order relationships between clusters of profiles. Although hierarchical clustering has proven valuable for describing gene expression (4, 25), it has several shortcomings. Hierarchical trees do not reflect the multiple ways in which expression patterns of genes can be similar. The deterministic nature of hierarchical clustering can cause data points to be clustered on the basis of local decisions, with no opportunity to reevaluate; as the amount of data increases, this problem is exacerbated. Mangiameli et al. (19) compared SOM with the hierarchical clustering method and found that SOM is superior in both robustness and accuracy.

The design of SOM starts with defining a geometric configuration for the partitions in a one- or two-dimensional grid. Then, random weight vectors are assigned to each partition. During training, a gene expression profile (input vector) is picked randomly. The weight vector closest to the expression profile is identified. The identified weight vector and its neighbors are adjusted to look similar to the expression profile. This process is repeated until the weight vectors converge. During operation, SOM maps gene expression profiles to the relevant partitions based on the weight vectors to which they are most similar. However, in circumstances where the expected number of partitions (clusters) available in gene expression data is unknown, the validation of the SOM clustering result becomes a critical issue. One may heuristically validate the clustering results to identify a reasonable number of clusters.
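For concreteness, the training loop described above can be sketched in a few lines. The decay schedules, neighborhood width, and names (`train_som`, `assign`) below are illustrative assumptions, not the settings used in the paper:

```python
import math
import random

def train_som(data, n_nodes, epochs=50, eta0=0.5, sigma0=None):
    """Minimal 1-D SOM sketch of the training loop described above.
    The decay schedules and defaults are illustrative assumptions."""
    random.seed(0)                      # reproducible for this sketch
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = n_nodes / 2.0
    # random weight vector assigned to each grid partition
    w = [[random.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        eta = eta0 * frac               # decaying learning rate
        sigma = sigma0 * frac + 0.1     # shrinking neighborhood radius
        for x in random.sample(data, len(data)):   # pick profiles randomly
            # winner: weight vector closest to this expression profile
            win = min(range(n_nodes), key=lambda j: math.dist(w[j], x))
            # adjust the winner and its grid neighbors toward the profile
            for j in range(n_nodes):
                h = math.exp(-((j - win) ** 2) / (2.0 * sigma ** 2))
                w[j] = [wj + eta * h * (xi - wj) for wj, xi in zip(w[j], x)]
    return w

def assign(data, w):
    # operation step: map each profile to its most similar weight vector
    return [min(range(len(w)), key=lambda j: math.dist(w[j], x)) for x in data]
```

With the weight vectors trained, `assign` performs the operation step: each expression profile is mapped to the partition whose weight vector it is most similar to.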

Many validation techniques have been implemented to evaluate clustering results. Yeung et al. (35) introduced the figure of merit (FOM); Tibshirani (30) applied the "gap statistic"; Jain and Dubes (14) referred to the "Hubert and Jaccard index"; Hubert and Arabie (12) addressed the use of the "adjusted Rand index"; Lubovac et al. (18) proposed an "entropy measure"; and Musavi et al. (21) used the "change in internode distance per cluster." All of these approaches have proven valuable in experiments. However, the use of heuristic evaluation techniques makes the clustering process extremely time-consuming and complicated, given the large volume and high dimensionality of gene expression data.

A number of methods have been proposed in the literature to accomplish data partitioning and cluster validation/visualization either simultaneously or independently. Fraley and Raftery (5, 6) developed model based clustering that provides functionality for displaying/visualizing cluster results. Ramoni et al. (23) introduced the Bayesian method of model-based clustering of gene expression dynamics. Kaski et al. (15, 16) introduced methods for detecting, visualizing, and interpreting clusters generated by SOM. They improved the SOM-based method of U-matrix (32) for visualization of cluster information. Nikkilä et al. (22) and Kaski (17) have applied this improved method for the analysis and visualization of gene expression data. Herrero et al. (10, 11) have proposed a combination of SOM and hierarchical method for clustering gene expression data.

Su and Chang (26) developed a new technique known as the double self-organizing map (DSOM). In DSOM, each center in the network has an *N*-dimensional weight vector and a two-dimensional position vector. The position vectors are a projection of the weight vectors into a two-dimensional space and serve as a visualization tool for deciding how many clusters are needed, thus combining clustering and cluster visualization in one computational procedure. In other words, with the help of position vectors, DSOM adjusts its network structure during the learning phase so that neurons that respond to similar stimuli will not only have similar weight vectors but also move spatially nearer to each other.

Although DSOM provides a visualization tool for cluster evaluation, the selection of its free parameters is vital for a proper projection of the position vectors in a two-dimensional space. Some combinations of these parameters make all the position vectors converge too quickly into a small dense area. Some other combinations lead the updating process to “get stuck” after several epochs and result in a wrong number of clusters. Thus the regulation of these parameters is very important.

In this paper, we propose an adaptive double self-organizing map (ADSOM), which updates the free parameters involved in DSOM during the training process. This is achieved by carefully analyzing the mathematical relationships between the parameters and the updating processes. This new approach improves the performance of the original DSOM by reducing its unreliability. Unlike DSOM, ADSOM gives a fairly consistent number of clusters provided that the initial number of nodes is greater than the expected number of clusters.

To demonstrate its effectiveness, ADSOM is applied to cluster gene expression data from multiple biological systems such as yeast, human, and mouse. The results show that ADSOM is a reliable technique for clustering gene expression data. ADSOM addresses the issue of identifying an unknown number of clusters while simultaneously partitioning the data.

## SYSTEMS AND METHODS

In addition to updating its weights and position vectors, ADSOM offers a systematic method of adapting its free parameters during the training process, thereby providing improved convergence. Let *w*_{j} be the *N*-dimensional synaptic weight vector of node *j*; let *p*_{j} be a two-dimensional position vector corresponding to node *j*; and let *p*_{w}(*k*) and *w*_{w}(*k*) represent the position and weight vectors of the winning neuron when the *k*th input vector **x**(*k*) is applied. The weight and position vectors are updated according to *Eqs. 1*–*8* (derived in the appendix), where ||·|| denotes the Euclidean distance; η_{w} and η_{p} are initial learning rates; *s* is a scalar parameter that regulates how fast the function Λ_{w}(*k*, *j*) changes; *k*_{max} is the maximum number of epochs; *s*_{p}(*k*) and *s*_{x}(*k*) are the values of the scalar parameters at epoch *k* that regulate the speed of the movement of the position vectors; *t*_{i}(*k*) specifies the ratio of *s*_{x}(*k*) to *s*_{p}(*k*) within a given boundary; *f* is a continuous function of α; and α(*k*) is the maximum change of the position vectors at epoch *k*. *Equations 1*–*8* are repeated until α is steadily less than a prespecified threshold β.

Detailed mathematical derivations of the above updating equations are provided in the appendix. Copies of source code implementing ADSOM are available from the authors of this manuscript upon request. A scheme for selecting the initial weight vectors is outlined below.

### Initialization scheme.

Proper initialization of the weight vectors of ADSOM helps the parameter updating process converge quickly. The weight initialization scheme provided by Su et al. (27) is used in ADSOM. Assuming the size of the weight array to be *K* × *L* (Fig. 1), the following steps are used for initialization.

First, we select the pair of input patterns separated by the largest distance in the whole input data set and use them to initialize the weight nodes on the lower left and upper right corners (i.e., *w*_{K,1} and *w*_{1,L}), respectively. From the remaining input data, the pattern farthest from the two chosen patterns is used to initialize the weight vector on the upper left corner (i.e., *w*_{1,1}). The weight on the lower right corner (i.e., *w*_{K,L}) is set to the input pattern farthest from the three previously selected patterns.

Second, we proceed to initialize the weights of the neurons on the four edges by uniformly partitioning each edge into *L* − 1 or *K* − 1 segments.

Third, the remaining weights are initialized from left to right and from top to bottom by interpolating between the initialized edge weights.
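The three steps above can be sketched as follows. Reading "farthest from the chosen patterns" as the largest sum of distances, and the interior rule as row-wise linear interpolation, are both assumptions, since the exact formulas are not reproduced here:

```python
import math

def init_weights(data, K, L):
    """Sketch of the corner/edge initialization scheme described above.
    The tie-breaking and interior-interpolation rules are assumptions."""
    d = math.dist
    n = len(data)
    # 1) Corners: farthest pair -> lower-left w_{K,1} and upper-right w_{1,L}
    _, i1, i2 = max((d(data[i], data[j]), i, j)
                    for i in range(n) for j in range(i + 1, n))
    chosen = [i1, i2]
    i3 = max((i for i in range(n) if i not in chosen),           # upper-left
             key=lambda i: sum(d(data[i], data[c]) for c in chosen))
    chosen.append(i3)
    i4 = max((i for i in range(n) if i not in chosen),           # lower-right
             key=lambda i: sum(d(data[i], data[c]) for c in chosen))
    W = [[None] * L for _ in range(K)]
    W[K - 1][0] = list(data[i1]); W[0][L - 1] = list(data[i2])
    W[0][0] = list(data[i3]);     W[K - 1][L - 1] = list(data[i4])
    lerp = lambda a, b, t: [x + t * (y - x) for x, y in zip(a, b)]
    # 2) Edges: uniformly partition each edge into K-1 or L-1 segments
    for r in range(K):
        W[r][0] = lerp(W[0][0], W[K - 1][0], r / (K - 1))
        W[r][L - 1] = lerp(W[0][L - 1], W[K - 1][L - 1], r / (K - 1))
    for c in range(L):
        W[0][c] = lerp(W[0][0], W[0][L - 1], c / (L - 1))
        W[K - 1][c] = lerp(W[K - 1][0], W[K - 1][L - 1], c / (L - 1))
    # 3) Interior: left-to-right interpolation along each row (assumed rule)
    for r in range(1, K - 1):
        for c in range(1, L - 1):
            W[r][c] = lerp(W[r][0], W[r][L - 1], c / (L - 1))
    return W
```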

### Identifying number of clusters from final position vectors.

Once the final locations of the position vectors are obtained, one can visually determine the number of clusters. To minimize possible human error resulting from counting the number of clusters visually, we developed a novel technique that provides an index for each candidate number of clusters. The index is calculated from the outcome of clustering the position vectors themselves with a hierarchical method. Consider a 2-by-*N* matrix **T** that contains the final locations of all two-dimensional position vectors, where *N* denotes the total number of position vectors. We calculate the Euclidean distances between each pair of position vectors in **T**. Based on these distances and the subsequent distances between merged clusters, we create a hierarchical cluster tree that grows vertically upward. During clustering, we store the distance between the two clusters merged by each branch, using the single linkage method; hence, the height of each branch corresponds to the distance assigned to it. Consider a horizontal line that cuts *c* branches of the hierarchical cluster tree perpendicularly; this line picks out the *c* clusters embraced by the branches it cuts. The index corresponding to this horizontal line is designated γ(*c*) and is defined as the sum of the heights of the branches it cuts, γ(*c*) = Σ_{i=1}^{c} *z*_{c}(*i*), where *z*_{c}(*i*) is the height contributed by the *i*th branch when a horizontal line cuts *c* branches of the tree. The index, referred to as the tree-based index, is calculated for *c* = 2, …, *N*. Note that γ(1) = 0, since no horizontal line can cut the tree exactly once. The value of *c* at which the index reaches its highest peak indicates the maximum separation between clusters in the two-dimensional space; hence, it is used to detect the number of clusters.
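Under one plausible reading of this definition (the height of a branch is the merge distance at its top, and the cut line for *c* clusters sits just above the merge that leaves exactly *c* clusters), the tree-based index can be sketched as follows; single linkage is computed naively for clarity:

```python
import math

def tree_based_index(points):
    """gamma(c) for c = 2..N from single-linkage clustering of the final
    two-dimensional position vectors. One plausible reading of the index."""
    N = len(points)
    d = math.dist
    clusters = {i: [i] for i in range(N)}   # active cluster id -> members
    tops = {}                               # node id -> height where it merges upward
    form = {i: 0.0 for i in range(N)}       # node id -> height where it is created
    merge_h = []
    next_id = N
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for ai in range(len(ids)):
            for bi in range(ai + 1, len(ids)):
                a, b = ids[ai], ids[bi]
                # single-linkage distance: closest pair across the two clusters
                dist_ab = min(d(points[i], points[j])
                              for i in clusters[a] for j in clusters[b])
                if best is None or dist_ab < best[0]:
                    best = (dist_ab, a, b)
        h, a, b = best
        tops[a] = tops[b] = h               # branch height = merge distance
        form[next_id] = h
        merge_h.append(h)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    tops[next_id - 1] = merge_h[-1]         # root branch ends at final merge
    gamma = {}
    for c in range(2, N + 1):
        # cut level just after the merge that leaves exactly c clusters
        t = merge_h[N - c - 1] if c < N else 0.0
        gamma[c] = sum(h for j, h in tops.items() if form[j] <= t < h)
    return gamma
```

The returned dictionary maps each candidate *c* to γ(*c*); the *c* with the largest value is taken as the detected number of clusters.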

## EXPERIMENTAL TESTS AND RESULTS

Experimental tests were conducted using gene expression data from multiple biological systems. The performance of ADSOM is compared with a model-based clustering method introduced by Fraley and Raftery (6) and implemented in a computer program called MCLUST (Model-Based Clustering Software). The software is publicly available at **http://www.stat.washington.edu/fraley/mclust/**.

We also investigated other algorithms that would detect the number of clusters in the gene expression data we used for our experiments. These algorithms include a Bayesian method for model-based clustering introduced by Ramoni et al. (23) and the adaptive quality-based clustering method proposed by De Smet et al. (3). These two algorithms are implemented in computer programs called CAGED ("cluster analysis of gene expression dynamics") and Adap_Cluster (adaptive quality-based clustering), publicly available at **http://genomethods.org/caged/** and **http://www.esat.kuleuven.ac.be/~thijs/Work/Clustering.html**, respectively. In both algorithms, the number of clusters detected depends on input parameter values specified by the user. In CAGED, these parameters are the Markov order, prior precision, and Bayes factor. In Adap_Cluster, the results were sensitive to a user-defined estimate of the minimal probability of a gene belonging to a cluster and to the minimum number of genes in a cluster. In this paper, we compared the results obtained using ADSOM and MCLUST, because neither method requires the user to define sensitive parameters that significantly affect the outcome.

To test the consistency of ADSOM in data partitioning, we counted the number of common profiles found in the clusters that were obtained by varying the initial number of nodes. We present the percentage of common profiles for the data sets where ADSOM resulted in constant number of clusters. This provides a confidence measure for the clusters formed by ADSOM and illustrates the effect of different initial number of nodes on data partitioning.

### Yeast cell cycle data with the 5-phase criterion.

We used a subset of the gene expression data provided by Cho et al. (2) that shows the fluctuation of expression levels of ∼6,000 genes over two cell cycles (17 time points) corresponding to mitotic cell cycle of *Saccharomyces cerevisiae*. As in Yeung et al. (34–36), a subset of 384 genes whose expression levels peak at different time points corresponding to the five phases of cell cycle was used for our experiments. These data were obtained from supplementary material provided by Yeung et al. at **http://staff.washington.edu/kayee/cluster/raw_cellcycle_384_17.txt**. Prior to clustering, the data were normalized to have zero mean and unit variance. ADSOM was applied to cluster this dataset with different numbers of initial weights such as 9, 12, 15, and 20. As depicted in Fig. 2, the final locations of position vectors converge into fewer groups in a two-dimensional space regardless of the initial number of nodes. We applied the tree-based validation index to estimate the optimal number of clusters from the final locations of the position vectors. As described before, the number of clusters at which the tree-based index reaches its highest peak can be used as an indicator of optimal number of clusters. The results in Fig. 2 indicated that the optimal number of clusters is five, regardless of a different number of initial nodes. This result agrees with that of Yeung et al. (34) and was as expected from a data set corresponding to five phases of the yeast cell cycle.

Table 1 shows the number of genes in each cluster as well as the number of common genes found when clustering the yeast cell cycle data with different numbers of initial nodes (*N*), i.e., *N* = 9, 12, 15, and 20. The percentage of common genes between clusters formed by *N* = 9 and 12 was 88%; by *N* = 9, 12, and 15 it was 79.7%; and by *N* = 9, 12, 15, and 20 it was 70.8%.

To compare the performance of ADSOM with the model-based clustering method, we ran MCLUST for several models. The models are characterized by their covariance matrices, such as equal-volume spherical (EI); unequal-volume spherical (VI); ellipsoidal with equal volume, shape, and orientation (EEE); ellipsoidal with varying volume, shape, and orientation (VVV); and diagonal with varying volume and shape (VVI). The Bayesian information criterion (BIC), as used by Yeung et al. (34), was applied to compare models with different numbers of clusters and different covariance matrix parameterizations. A large BIC score indicates strong preference for the corresponding model. Figure 3 shows the BIC scores obtained by clustering the yeast cell cycle data for cluster numbers ranging from 2 to 32 using five different models. As can be seen from Fig. 3, the model-based approach favors the EEE model, which reached its maximum BIC score at 6 clusters.
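For reference, the BIC score used in this kind of comparison takes the standard form below, where *L̂* is the maximized likelihood of the data under the model, *m* is the number of independently estimated parameters (which grows with the number of clusters and with the complexity of the covariance parameterization), and *n* is the number of expression profiles; this is the usual definition, not a formula reproduced from this paper:

```latex
\mathrm{BIC} = 2\,\log \hat{L} \;-\; m \,\log n
```

Models and cluster numbers with larger BIC are preferred, matching the statement above that a large BIC score indicates strong preference for the corresponding model.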

### Yeast sporulation data.

Gene expression data in the budding yeast *Saccharomyces cerevisiae* from spotted cDNA microarrays studied during the diauxic shift, the mitotic cell division cycle, sporulation, and temperature and reducing shocks (**http://genome-www.stanford.edu/clustering/**) were considered. In particular, the data corresponding to sporulation were used for this experiment. From the yeast sporulation data, we extracted 218 × 11 gene expression data belonging to four classes, and variance normalization was applied. The four classes are ribosomal proteins, respiration, mitochondrial organization, and tricarboxylic acid pathway. This functional classification was made using information from the Munich Information Center for Protein Sequences (MIPS) (20). ADSOM clustered these yeast data into their four distinct classes without knowledge of the actual number of clusters. The beauty of the ADSOM algorithm is that it converges to the actual number of clusters without tedious cluster validation. To demonstrate this, two experiments were conducted with this data set: once with 9 initial nodes and once with 20. The position vectors assigned to these initial nodes converged to four final clusters regardless of the initial number of nodes chosen. Figure 4 shows the locations of the final position vectors for 9 initial nodes (Fig. 4*A*) and for 20 initial nodes (Fig. 4*B*), as well as the results obtained after applying the tree-based validation index (Fig. 4*C*). The final four clusters, which represent the actual number of clusters in the data, are shown by enclosed boundaries; the figure also shows which of the initial 9 or 20 nodes were grouped together to form the final four classes. As can be seen from Fig. 4*C*, the optimal number of clusters is 4, regardless of the initial number of nodes (9 or 20).

It was observed that ADSOM resulted in partitioning the data with 70% accuracy. This accuracy was calculated based on the biological classification made by MIPS. In gene expression data, one does not normally have a priori knowledge about the true number of classes, and what distinguishes ADSOM from the currently available techniques is that one does not need to have this knowledge to be able to find the true number of classes accurately.

To evaluate the consistency of the clustering, the number of common genes in corresponding clusters for different numbers of initial nodes was investigated. Table 2 illustrates the number of genes in the four clusters formed by ADSOM and the number of common genes between the cluster pairs formed with 9 and 20 initial nodes. The robustness and consistency of ADSOM in clustering is evident from the large percentage of common genes (90.4%) in the clusters.

In a separate experiment, the performance of ADSOM for different cluster sizes was investigated. To accomplish this, yeast sporulation data from two, three, and four MIPS classes were extracted and clustered independently. As shown in Fig. 5, the number of clusters was correctly identified by ADSOM using nine initial nodes.

Figure 6 shows the BIC scores obtained using MCLUST for four different models at cluster numbers ranging from 1 to 20. As can be seen from Fig. 6, the model-based approach favors the EEE model, whose first maximum BIC score is reached at 10 clusters and whose second maximum is reached at 5 clusters. Note that the expected number of clusters for the yeast sporulation data is 4.

### Yeast cdc15 and elu.

The spotted cDNA microarray data used for this experiment were those corresponding to cdc15 and elu (**http://genome-www.stanford.edu/clustering/**); cdc15 data consisted of 15 time points following arrest of a cdc15 temperature-sensitive mutant, and elu data contained 14 time points following elutriation. After removing genes with missing points, there were 1,882 genes in the cdc15 data and 2,020 genes in the elu data. Before applying ADSOM, both data sets were normalized to zero mean and unit variance. ADSOM was used to cluster each data set with different numbers of initial weights such as 25, 30, 36, and 42. The tree-based indices corresponding to the final locations of the position vectors were used to identify the number of clusters in both data sets. ADSOM gave a fairly consistent number of clusters in these data sets with unknown structure, demonstrating its effectiveness and reliability. As can be seen from Table 3, the results of this experiment show that there are 16 or 17 clusters in both data sets.

Figure 7 shows the BIC scores obtained using MCLUST for five different models in clustering the yeast elu (Fig. 7*A*) and yeast cdc15 (Fig. 7*B*) data. As can be seen from Fig. 7, the maximum BIC score is reached at 21 clusters for the EEE model when clustering the yeast elu data. For the yeast cdc15 data, the EEE model reached its first, second, and third maximum BIC scores at 27, 21, and 19 clusters, respectively.

### UNC 9 mouse tumor data.

UNC 9 mouse tumor data were provided by the Statistical Genetics Group at the Jackson Laboratory (**http://www.jax.org/staff/churchill/labsite/index.html**). The data contained nine tumor samples from different mouse strains. The 9 independent tumor samples were assayed with 18 cDNA microarrays using dye-swap reference design. There were 15,488 rows of genes and 36 columns representing the samples. After removing the rows that had less than threefold change, there were 12,866 rows left to which range normalization (i.e., between 0 and 1) was applied. We used the proposed ADSOM to cluster these tumor samples. As shown in Fig. 8*B*, ADSOM identified three distinct clusters. The first cluster contained *samples 1*, *2*, *3*, *8*, and *9*; second cluster contained *samples 4*, *5*, and *6*; third cluster contained *sample 7*. As shown in Fig. 8*C*, the tree-based index cluster validation result also identifies three clusters. Interpreting the results, we observed that the first cluster contained all mammary tumor tissue samples; the second cluster contained all normal mammary tissue samples; and the third cluster contained Waptag liver control samples. An interesting observation made is that tumors tend to cluster together more than different tissues of the same strain.

The ADSOM results were compared with hierarchical clustering in partitioning these tumor samples. MAANOVA (33) was used to perform hierarchical clustering. The results obtained using the hierarchical method are shown in Fig. 8*A*. Three clusters were identified which were similar to those obtained using ADSOM. However, interpretation of the number of clusters is user dependent. This can be a problem when there are a large number of clusters.

### Human fibroblast data.

A subset of the gene expression data provided by Iyer et al. (13) was used for this experiment. The full data show the response of human fibroblast to serum, using cDNA microarrays representing about 8,600 distinct human genes to observe the temporal program of transcription that underlies this response. DNA microarray hybridization was used to measure the temporal changes in mRNA levels of 8,613 human genes at 12 times, ranging from 15 min to 24 h after serum simulation. The subset of 517 genes that were chosen on the basis of substantial change in the expression by Iyer et al. was considered. The data containing normalized R/G ratios were obtained from **http://genome-www.stanford.edu/serum/data.html**.

ADSOM was applied to cluster this data set with different initial nodes such as 16, 20, 25, and 30. Repeated trials resulted in either 10 or 11 final clusters. The location of the final position vectors and the results of tree-based index for 16 and 20 initial nodes are shown in the *top* (*A* and *C*) and *bottom* (*B* and *D*) of Fig. 9, respectively. Figure 10 depicts the BIC scores for five different models using MCLUST for the human fibroblast data. As can be seen from Fig. 10, model-based clustering favors the EEE model, whose maximum BIC score is reached at 11 clusters. Note that ADSOM also detected 11 clusters in most cases. Similar number of clusters was obtained using hierarchical clustering by Iyer et al. (13). Interpretation of the correct number of clusters using hierarchical clustering is, however, user dependent.

## CONCLUSION

This paper shows that ADSOM gives a consistent final topology regardless of the initial topology. This topological flexibility provides credible information about the number of clusters in the data. ADSOM accomplishes cluster visualization and data partitioning simultaneously without compromising clustering accuracy; this feature is achieved by adapting its free parameters during the training process. In addition, combined with hierarchical clustering through the tree-based index, ADSOM accomplishes cluster validation reliably. Comparisons between ADSOM and a model-based clustering method showed that ADSOM has comparable, and in some cases better, performance.

In summary, the methods described in this paper offer two main advantages. First, ADSOM converges to a consistent number of groups regardless of the initial number of nodes. This is demonstrated by testing ADSOM on real-world gene expression data. Second, ADSOM can visually present the clusters while clustering the data. With the help of a tree-based index, the number of clusters is easily identified. The approaches eliminate the trial-and-error process as well as the heuristic validating process, thus saving significant computational time.

## APPENDIX

### Derivation of Updating Equations for ADSOMs

The nodes of DSOM are represented not only by their weight vectors but also by two-dimensional position vectors. Let node *j* be represented by an *N*-dimensional synaptic weight vector *w*_{j} and a two-dimensional position vector *p*_{j}. As described in Ref. 26, the DSOM updating formulas for both vectors at epoch *k* are given by *Eqs. A1*–*A6*.

In *Eqs. A5* and *A6*, there are several parameters, such as *s*_{x}, *s*_{p}, η_{p}, and *k*_{max}, that mainly affect the movement of the position vectors. In *Eq. A2*, *p*_{w}(*k*) is the position vector corresponding to the winning weight at epoch *k*; *p*_{j}(*k*) and *p*_{j}(*k* + 1) are the old and new *j*th position vectors, respectively; and η_{2}(*k*, *j*) and *h*_{w}(*k*, *j*) are both positive scalars that are always less than 1. Let us define a new positive scalar Φ_{j}(*k*) = η_{2}(*k*, *j*)*h*_{w}(*k*, *j*) (*Eq. A7*).

In Fig. 11, OA, OB, and OC represent the vectors *p*_{j}(*k*), *p*_{j}(*k* + 1), and *p*_{w}(*k*), respectively. According to *Eq. A2*, one can easily prove that points *A*, *B*, and *C* are located on a straight line. As shown in Fig. 11, the ratio between the lengths of *AB* and *AC*, which represent the vectors *p*_{j}(*k* + 1) − *p*_{j}(*k*) and *p*_{w}(*k*) − *p*_{j}(*k*), respectively, is Φ_{j}(*k*). We can therefore conclude that the larger Φ_{j}(*k*) is, the faster the *j*th position vector *p*_{j} moves toward *p*_{w}.
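The colinearity claim can be made explicit. Writing *Eq. A2* in the form implied by the surrounding definitions (the display equation itself is not reproduced here, so this form is an inference), with Φ_{j}(*k*) = η_{2}(*k*, *j*)*h*_{w}(*k*, *j*):

```latex
p_j(k+1) = p_j(k) + \eta_2(k,j)\,h_w(k,j)\bigl[p_w(k) - p_j(k)\bigr]
         = \bigl(1-\Phi_j(k)\bigr)\,p_j(k) + \Phi_j(k)\,p_w(k)
```

Since 0 < Φ_{j}(*k*) < 1, *p*_{j}(*k* + 1) is a convex combination of *p*_{j}(*k*) and *p*_{w}(*k*); hence *B* lies on the segment *AC* with *AB*/*AC* = Φ_{j}(*k*).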

Let us assume that there are two nodes with weight vectors *w*_{1} and *w*_{2} and position vectors *p*_{1} and *p*_{2}, respectively. At epoch *k* and for a given input pattern **x**(*k*), if *w*_{1}(*k*) is closer to **x**(*k*) than *w*_{2}(*k*), i.e., ||*w*_{1}(*k*) − **x**(*k*)|| is less than ||*w*_{2}(*k*) − **x**(*k*)||, one requires that *p*_{1}(*k*) move toward *p*_{w}(*k*) faster than *p*_{2}(*k*). However, *Eqs. A1*–*A6* (the original DSOM algorithm) cannot guarantee this, which is one of the reasons why the updating process sometimes gets stuck with unexpected results. Therefore, the relationship given in *Eq. A8* ought to hold to keep the position vectors from "getting stuck."

Instead of using constant values for *s*_{p} and *s*_{x}, we suggest updating them at every epoch; thus we define *s*_{p}(*k*) and *s*_{x}(*k*). This improves the model's adaptability. Moreover, to analyze the proposed algorithm in a more general way, we replace the power of ||*w*_{i}(*k*) − **x**(*k*)|| − ||*w*_{w}(*k*) − **x**(*k*)|| in *Eq. A6* from 2 to a variable *n* (*n* > 0) and replace the power of ||*p*_{i}(*k*) − *p*_{w}(*k*)|| in *Eq. A5* from 1 to a variable *m* (*m* > 0).

Defining *A*_{i}(*k*) = ||*w*_{i}(*k*) − **x**(*k*)|| − ||*w*_{w}(*k*) − **x**(*k*)||, *Eq. A8* can be rewritten as *Eq. A9*. If *A*_{j}(*k*) = *A*_{i}(*k*), then *Eq. A9* is always valid regardless of the relationship between Φ_{i}(*k*) and Φ_{j}(*k*). In the following, we therefore analyze the cases when *A*_{j}(*k*) ≠ *A*_{i}(*k*).

*Case 1*: if *A*_{j}(*k*) > *A*_{i}(*k*), then Φ_{j}(*k*) ≤ Φ_{i}(*k*). Since *s*_{p′}(*k*) and *s*_{x′}(*k*) have the same positive coefficient, i.e., 1 + (*k*/*k*_{max}), we obtain *Eq. A10*. Since *A*_{j}(*k*) > *A*_{i}(*k*) > 0, the inequality *A*_{j}^{n}(*k*) > *A*_{i}^{n}(*k*) > 0 is valid. Also, since *s*_{x}(*k*) > 0 and *s*_{p}(*k*) > 0, *Eq. A10* can be rewritten as *Eq. A11*.

*Case 2*: if *A*_{j}(*k*) < *A*_{i}(*k*), then Φ_{j}(*k*) ≥ Φ_{i}(*k*). Following the same steps used in *case 1*, we can prove *Eq. A12*. From *Eqs. A11* and *A12*, we conclude that *Eq. A8* holds if *Eq. A13* is satisfied.

*Equation A13* gives one end of the boundary for the ratio *s*_{x}(*k*)/*s*_{p}(*k*); the other end is not specified. Closing both ends of the boundary provides a more complete range for this ratio and further improves the accuracy and effectiveness of ADSOM. In this paper, we use a particular choice of this ratio: we specify the relationship given in *Eq. A14*, which allows *Eq. A13* to be satisfied and which we found to be effective in our experiments.

Defining a stopping criterion is important: overtraining could worsen the results, whereas undertraining yields an uncertain outcome. So, instead of directly devising a way to intelligently select the number of epochs, we define a stopping criterion. Letting α be the maximum change of the position vectors at epoch *k* and defining a threshold β as 0.1% of the largest distance among all the original position vectors, we cease the training process when α remains less than β. This definition is not general, but it is suitable for our experiments.
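The stopping rule translates directly into code; treating "remains less than" as "below β for several consecutive epochs" (the `consecutive` parameter) is an assumed interpretation:

```python
import math

def max_position_change(old_positions, new_positions):
    # alpha(k): largest Euclidean displacement of any position vector this epoch
    return max(math.dist(o, n) for o, n in zip(old_positions, new_positions))

def beta_threshold(initial_positions):
    # beta: 0.1% of the largest distance among all original position vectors
    return 0.001 * max(math.dist(a, b)
                       for a in initial_positions for b in initial_positions)

def should_stop(alpha_history, beta, consecutive=5):
    # stop once alpha has stayed below beta for `consecutive` epochs in a row
    tail = alpha_history[-consecutive:]
    return len(tail) == consecutive and all(a < beta for a in tail)
```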

If α becomes larger, the position vectors are moving faster, and one needs to reduce their speed of movement; this means that *s*_{p} needs to be increased. Similarly, if α becomes smaller, then *s*_{p} needs to be decreased. So, at epoch *k*, the relationship between α and *s*_{p} can be represented by a continuous function *f* (*Eq. A15*). Once *s*_{p}(*k*) is known, *Eq. A14* can be used to adjust the parameter *s*_{x}(*k*). We suggest using any function *f*(*y*) that satisfies the following three requirements: *1*) for *y* ∈ (−1, +∞), *f*(*y*) ∈ (0, *C*), where *C* is a positive constant; *2*) *f*(*y*) is a continuous, increasing, and nonlinear function; and *3*) *f*(0) = 1. In fact, many functions have all of the properties mentioned above; one such function was used in our experiments.

## Footnotes

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

Address for reprint requests and other correspondence: H. Ressom, Intelligent Systems Laboratory, Dept. of Electrical and Computer Engineering, Univ. of Maine, Orono, ME 04469 (E-mail: ressom{at}eece.maine.edu).

10.1152/physiolgenomics.00138.2002.

- Copyright © 2003 the American Physiological Society