Support Vector Machine Kernel Similarity Space for Categorization

Instead of working with the original sample representation in the original 6500-dimensional peptide space, SVM and GP classification methods operate directly on what is called the KERNEL MATRIX (see Glossary). The elements of this matrix correspond to the similarities between data points in the original space; in other words, each element of the matrix gives a measure of how similar the measured peptide profiles are between a pair of samples (red indicates highly similar and blue indicates dissimilarity). The figure above therefore shows how similar the peptide profiles are for every pair of samples available, where each axis runs over all available samples (100 in total). The first thing to note is that we have now drastically reduced the dimensionality of the representation of each sample, from a point in a 6500-dimensional peptide profile space to a point in a 100-dimensional ‘peptide profile similarity space’. The second thing to note is that the matrix identifies two groups of samples which all share highly similar peptide profiles (samples 1–45 and samples 46–86), with some evidence of a smaller group (samples 86–100). This data representation is now employed in SVM and GP classifiers rather than the original multidimensional space.

SVMs are often reported as achieving superior classification performance compared to other methods across most applications and tasks. Importantly, they are fairly insensitive to the ‘Curse of Dimensionality’, primarily because of the use of the ‘kernel’ matrix [26]. This matrix is derived from the similarities between samples and thus operates in a number of dimensions equal to the number of data points available. (Figure 3 gives an example derived from 100 samples, each of dimension 6500.) SVMs are therefore reasonably computationally efficient and can cope with classification problems that are large in both the number of samples and the number of variables. In clinical bioinformatics, they have the potential to provide powerful experimental disease diagnostic models based on gene or protein expression data with thousands of features and a small number of samples, as few as a few dozen [27,28].
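As a rough illustration of why the kernel matrix scales with the number of samples rather than the number of features, the sketch below builds a similarity matrix for a handful of high-dimensional samples. The Gaussian (RBF) kernel, the `gamma` value, the sample count and the synthetic data are all assumptions made for the example; the text does not say which kernel produced Figure 3.

```python
import math
import random

def rbf_kernel_matrix(samples, gamma):
    """Return the n x n matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * sq_dist)
    return K

random.seed(0)
n_samples, n_features = 10, 6500  # few samples, many features (illustrative sizes)
data = [[random.gauss(0.0, 1.0) for _ in range(n_features)]
        for _ in range(n_samples)]

# gamma = 1 / n_features is an arbitrary scaling chosen for this sketch.
K = rbf_kernel_matrix(data, gamma=1.0 / n_features)
print(len(K), "x", len(K[0]))  # the size is set by n_samples, not n_features
```

Each sample is now described by one row of `K`, that is, by its similarity to every other sample.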

Overview of Process of Algorithmic Classification

Methods for classification based on classical multivariate statistical theory should not be applied to data where the number of dimensions exceeds the number of examples. For instance, data from a microarray study that aims to establish a classifier discriminating between diseased and healthy tissue might well describe the expression levels of over 20,000 genes for each tissue sample, with generally far fewer than 1000 examples of each type of tissue. The problem encountered here is the ‘Curse of Dimensionality’ [14], which tells us that when the number of samples is small compared to the dimensionality of the samples, a perfect classification can be achieved in the dataset by chance. The classifier will then make decisions which have little to do with the information content of the data, and it will subsequently make poor predictions on new datasets. Several strategies have been developed to solve this problem. One such strategy is to select the individual features which show high discriminatory power between the classes. This can be achieved by applying one of several statistical tests to assess, for example, significant differences of mean or median values across classes. This form of feature ranking and selection reduces the number of features subsequently used to build the classifier and helps to reduce the variability in the eventual classifier performance.
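The feature-ranking strategy can be sketched as follows. This is a minimal example that assumes Welch's t-statistic as the test for a difference of means; the toy measurements and the names `healthy` and `diseased` are made up for illustration and do not come from any study.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic for the difference of means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b))

def rank_features(class_a, class_b):
    """Return feature indices sorted by decreasing |t| between the classes."""
    n_features = len(class_a[0])
    scores = []
    for f in range(n_features):
        col_a = [row[f] for row in class_a]
        col_b = [row[f] for row in class_b]
        scores.append((abs(welch_t(col_a, col_b)), f))
    return [f for _, f in sorted(scores, reverse=True)]

# Toy data: feature 1 separates the classes, features 0 and 2 do not.
healthy = [[1.0, 5.1, 0.3], [1.2, 4.9, 0.1], [0.9, 5.0, 0.4], [1.1, 5.2, 0.2]]
diseased = [[1.1, 8.0, 0.2], [0.8, 7.9, 0.5], [1.0, 8.2, 0.3], [1.2, 8.1, 0.1]]

ranking = rank_features(healthy, diseased)
top_features = ranking[:1]  # keep only the most discriminative feature
```

Only the columns listed in `top_features` would then be passed on to the classifier.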

Principal Component Analysis Cigar, Egg, Cloud Projections

The figure shows some clouds of 200 points each, along with ellipsoids containing 50% of each cloud and axes aligned with the principal directions. In the first row the clouds have essentially one principal component, comprising 95% of all the variance: these are the cigar shapes. In the second row the clouds have essentially two principal components, one about twice the size of the other, together comprising 95% of all the variance: these are the pancake shapes. In the third row all three principal components are sizable: these are the egg shapes.

PCA fits an ellipsoid to the data. An ellipsoid is a multidimensional generalization of distorted spherical shapes like cigars, pancakes, and eggs. These are all neatly described by the directions and lengths of their principal (semi-)axes, such as the axis of the cigar or egg or the plane of the pancake. No matter how the ellipsoid is turned, the eigenvectors point in those principal directions and the eigenvalues give you the lengths. The smallest eigenvalues correspond to the thinnest directions having the least variation, so ignoring them (which collapses them flat) loses relatively little information.

Being able to reduce dimensions is a good thing: it makes it easier to describe the data and, if we’re lucky enough to reduce them to three or fewer, lets us draw a picture.

Principal Component Analysis 2D-dimensions

PCA will find the “best” line according to two different criteria of what is “best”. First, the variation of values along this line should be maximal: the “spread” (variance) of the red dots changes as the line rotates, so at which angle does it reach its maximum? Second, if we reconstruct the original two characteristics (position of a blue dot) from the new one (position of a red dot), the reconstruction error will be given by the length of the connecting red line, and this error should be minimal.
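The two criteria can be checked numerically by rotating a line through the centroid and scoring each angle. The sketch below is a brute-force illustration rather than how PCA is normally computed, and the five points are invented. For every angle the variance along the line and the mean squared distance to the line add up to the same total (Pythagoras), so the angle that maximizes one minimizes the other.

```python
import math

def line_scores(points, angle):
    """Variance along a line through the centroid, and mean squared distance to it."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    ux, uy = math.cos(angle), math.sin(angle)  # unit vector along the line
    variance = error = 0.0
    for x, y in points:
        dx, dy = x - mx, y - my
        along = dx * ux + dy * uy    # position of the projected (red) dot
        across = -dx * uy + dy * ux  # length of the connecting (red) line
        variance += along ** 2 / n
        error += across ** 2 / n
    return variance, error

points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
angles = [math.radians(d) for d in range(180)]  # try one-degree steps

best_by_variance = max(angles, key=lambda a: line_scores(points, a)[0])
best_by_error = min(angles, key=lambda a: line_scores(points, a)[1])
print(math.degrees(best_by_variance), math.degrees(best_by_error))
```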

Choosing K via Elbowpoint Method

Often it is uncertain how many clusters it is best to choose. In the elbow point method, you run the clustering for a range of cluster counts and plot, for each count, the within-cluster distance to the centroid.

From this graph we can infer that the curve bends at k=4. Beyond this point the within-cluster distance keeps decreasing, but only slightly, while each extra cluster means more computation. Therefore, we choose 4 as the optimum number of clusters.
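One simple way to locate the bend automatically is to look for the k at which the improvement drops off most sharply. The sketch below assumes that heuristic (the largest second difference of the curve) and uses invented within-cluster distances for consecutive values of k; it is not the only way to define the elbow.

```python
def elbow_point(inertia_by_k):
    """Pick the k where the curve bends most sharply (largest second difference)."""
    ks = sorted(inertia_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = inertia_by_k[prev_k] - inertia_by_k[k]
        drop_after = inertia_by_k[k] - inertia_by_k[next_k]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical within-cluster distances for k = 1..7 (illustrative values only).
inertia_by_k = {1: 1000.0, 2: 620.0, 3: 380.0, 4: 150.0,
                5: 130.0, 6: 118.0, 7: 110.0}
chosen_k = elbow_point(inertia_by_k)
```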

Curse of Dimensionality

Cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
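One concrete way to see the volume effect is to compare the largest ball that fits inside a unit cube with the cube itself. The sketch below uses the standard formula for the volume of a d-dimensional ball; the choice of this particular comparison, and of the dimensions printed, is only an illustration.

```python
import math

def ball_to_cube_volume_ratio(d):
    """Volume of the radius-0.5 ball divided by the volume of the unit cube in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

for d in (1, 2, 3, 10, 20):
    print(d, ball_to_cube_volume_ratio(d))
```

The ratio shrinks rapidly as d grows, so points scattered uniformly in the cube almost never fall near its centre and end up far from one another.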

Leiden, the Netherlands, the anatomy theatre, interior of a church, town weighing establishment and portraits.
Nerve networks in the body.
Parts of a monster and instruments.
Woodcut, instruments for dissection.
The Dutch anatomist Steven Blankaart (1650-1704) performing a dissection in an anatomy theatre, with seven observers. Engraving, 1687.
Thesaurus botanical preparations.
Collections of Perfection

For the Leiden anatomists Rau, Albinus, Van Doeveren, Bonn and Brugmans perfection was at the core of their decisions. Aesthetically, the objects had to be presented according to fixed proportions, perspectives and other aesthetic conventions. Technologically and scientifically, the anatomical collections were aimed at showing ever more perfect methods of revealing and preserving nature. Ethically, the collections functioned like mirrors and helped in the educational and therefore ethical perfectibility of man. There was even a theological meaning of perfection as some collectors sought to represent the perfect order of creation. The exhibition of so-called ‘monsters’, tumours and other malformations was also meant to enhance (ex-negativo) the image of the perfect body.

Tableau des proportions du corps humain à l'usage des artists, 1830
Stomach channel of leg yangming, C17/18 Chinese book art
Skulls, 1824
Palm measurement method, Japanese woodcut, 1807
Nerves of the human body. Watercolour drawing by a Persian artist.
Myotomia reformata or an anatomical treatise on the muscles of the human body, 1724
Muscles of the human body. Watercolour drawing by a Persian artist.
Observations on the muscles, Blane, Gilbert, Sir, 1749-1834.
Acu-moxa aid, landmark measurements, Chinese MS, late Qing
Human skin hanging in frame.
Human proportions established through mythological figures
Human anatomy planes, 2014
Euclidean Distance in R2
Body measurements, back view, Chinese woodcut, 1443
Proportional Body Measurement - Maijing Tuzhu - Middle Finger
L'Homme anatomique ou L'Homme zodiacal. Limbourg brothers (fl. 1402–1416)
Mystical body of tantric meditation, flow of the life force.
Pinax Microcosmographicus.
Horizontal section through brain.
Encephalitis, in Spicilegium anatomicum.
Chinese diagram of human body, handwritten letter in English.
Introduction to Graphology - Howard
Chiromancy - 4 classifications of hand shapes
Analysis of Woman's Handwriting - Howard
Origins of Writing - Early-stage Cuneiform Token with Counting Inscription
Handwriting Sample Form from NIST Dataset
MNIST Database Examples - Size Normalized
LeNet-5 Convolutional Neural Network Architecture
Generating Handwriting Sequences with Recurrent Neural Networks
Efficient and Inefficient Handwriting Grips
COLD Feature Based Handwriting Analysis for Ethnicity Identification
AIM Online Handwriting Dataset Sample
Metafont – Calligraphic or Skeleton Approach

The PostScript format, for example, describes letterforms by their contour instead of their skeleton, while other, lesser-known file formats take the inverse approach. This is the case with Metafont, which originated from linear drawing, writing and calligraphy, where different kinds of pens (pointed, broad-nib, etc.) are applied to a skeleton, resulting in different kinds of characters depending on the pen.

Gesture and Speech (Tools and Gesture)
Gesture and Speech (Graphism)
Understanding Clusters - Range and Shape Keys in Measure of Concordance

Following is a short summary of the different graphical shapes and descriptor labels used to characterise a measure’s general behaviour for each given test.

• Constant functions are indicated by the symbol C.
• Linear functions are indicated by the symbol L, with a subscript of either F or R to indicate whether their value respectively falls or rises over the interval.
• Non-linear functions are described in terms of whether the absolute value of their derivative over the interval is increasing or decreasing, i.e. whether the function value is accelerating or decelerating. Decelerating functions are indicated by the symbol D, and accelerating functions by the letter A, with subscripts F and R indicating falling and rising behaviour as before.
• Sigmoid functions can be described as piecewise functions assembled from a pair of falling or a pair of rising functions, with the two members of the pair exhibiting opposite accelerating/decelerating behaviour. So for instance, the shape labelled as SADF in Fig. 6g is constructed from an initial AF function followed by a DF function. The other three sigmoid functions are defined similarly.
• Functions which are undefined over the interval are indicated by a U.
• Other functions are indicated by an X; however, there is only one such function (Forbes in the Incremental Independence test).
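The non-sigmoid part of this key can be approximated on sampled data. The sketch below is an assumption-laden illustration, not the authors' procedure: it takes evenly spaced samples of a function, uses first differences in place of the derivative, and writes subscripts as `_F` and `_R`. Sigmoid (S) and undefined (U) shapes are not handled and fall through to X.

```python
def shape_label(values, tol=1e-9):
    """Label evenly spaced samples as C, L_F/L_R, A_F/A_R, D_F/D_R or X."""
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(abs(s) <= tol for s in steps):
        return "C"                       # constant
    if all(s > tol for s in steps):
        direction = "R"                  # rising over the whole interval
    elif all(s < -tol for s in steps):
        direction = "F"                  # falling over the whole interval
    else:
        return "X"                       # not monotonic: outside this simple key
    sizes = [abs(s) for s in steps]
    changes = [b - a for a, b in zip(sizes, sizes[1:])]
    if all(abs(c) <= tol for c in changes):
        return "L_" + direction          # constant slope: linear
    if all(c > tol for c in changes):
        return "A_" + direction          # |slope| increasing: accelerating
    if all(c < -tol for c in changes):
        return "D_" + direction          # |slope| decreasing: decelerating
    return "X"

labels = [shape_label(v) for v in ([2, 2, 2, 2], [0, 1, 2, 3],
                                   [0, 1, 4, 9], [9, 4, 1, 0])]
```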
Trivial and Non-trivial Machines
Three Theories of Concepts - Classical, Probabilistic, Exemplar
Perceptron and Biological Brain Organization
Organization of the MARK I Perceptron
Ontology Building

Ontology building is usually performed manually, but researchers try to build ontologies automatically or semi-automatically to save time and effort.

Clerkin et al. used a concept clustering algorithm (COBWEB) to discover and generate an ontology automatically. They argued that such an approach is highly appropriate to domains where no expert knowledge exists, and they proposed how software agents might collaborate, in the place of human beings, on the construction of shared ontologies [6].

Neural Computation in the Retina of Vertebrates

VII. Computation. The retina of vertebrates with its associated nervous tissue is a typical case of neural computation. Fig. 15 is a schematic representation of a mammalian retina and its post-retinal network. The layer labeled #1 represents the array of rods and cones, and layer #2 the bodies and nuclei of these cells. Layer #3 identifies the general region where the axons of the receptors synapse with the dendritic ramifications of the bipolar cells (#4) which, in turn, synapse in layer #5 with the dendrites of the ganglion cells (#6) whose activity is transmitted to deeper regions of the brain via their axons which are bundled together to form the optic nerve (#7). Computation takes place within the two layers labeled #3 and #5, that is, where the synapses are located.

Form computation: take the two-layered periodic network of Fig. 16, the upper layer representing receptor cells sensitive to, say, ‘light’. Each of these receptors is connected to three neurons in the lower (computing) layer, with two excitatory synapses on the neuron directly below (symbolized by buttons attached to the body), and with one inhibitory synapse (symbolized by a loop around the tip) attached to each of the two neurons, one to the left and one to the right. It is clear that the computing layer will not respond to uniform light projected on the receptive layer, for the two excitatory stimuli on a computer neuron will be exactly compensated by the inhibitory signals coming from the two lateral receptors. This zero-response will prevail under strongest and weakest stimulation as well as to slow or rapid changes of the illumination. The legitimate question may now arise: why this complex apparatus that doesn’t do a thing? Consider now Fig. 17 in which an obstruction is placed in the light path illuminating the layer of receptors. Again all neurons of the lower layer will remain silent, except the one at the edge of the obstruction, for it receives two excitatory signals from the receptor above, but only one inhibitory signal from the sensor to the left. We now understand the important function of this net, for it computes any spatial variation in the visual field of this eye, independent of intensity of the ambient light and its temporal variations, and independent of place and extension of the obstruction.
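The wiring described here can be written out in a few lines. The sketch below is a simplified reading of the network, with two stated assumptions: a neuron stays silent unless its net input is positive, and only interior neurons are computed, so the ends of the row do not produce spurious edges (the network in the text is periodic).

```python
def computing_layer(light):
    """Responses of the interior computing neurons to a row of receptor activities."""
    responses = []
    for i in range(1, len(light) - 1):
        # Two excitatory synapses from the receptor directly above,
        # one inhibitory synapse from each of the two lateral receptors.
        net = 2 * light[i] - light[i - 1] - light[i + 1]
        # Assumption: a neuron fires only when its net input is positive.
        responses.append(max(net, 0))
    return responses

dim = computing_layer([1] * 8)      # weak uniform light
bright = computing_layer([9] * 8)   # strong uniform light
shadow = computing_layer([1, 1, 1, 1, 0, 0, 0, 0])  # obstruction over the right half

print(dim, bright, shadow)
```

Both uniform patterns leave every neuron silent, while the obstructed pattern activates only the neuron under the last lit receptor, which is the edge of the obstruction.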

K-means Initialization Strategies

## 3.2. Clustering Performance Evaluation

### 3.2.1. Inertia

Inertia, or the within-cluster sum of squared distances, is a key measure of the internal coherence of a clustering. It is the sum of the squared distances between each point and its nearest centroid.
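A minimal sketch of this calculation, assuming squared Euclidean distance and using made-up points and centroids:

```python
def inertia(points, centroids):
    """Sum over all points of the squared distance to the nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
value = inertia(points, centroids)  # every point lies at distance 1 from its centroid
```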

### 3.2.2. Homogeneity (Shorthand as Homo)

A clustering result should satisfy homogeneity: each cluster should contain only points that share a single class label. The score does not depend on which names are given to the labels, and it is standardized to the range 0.0 to 1.0.

### 3.2.3. Completeness (Shorthand as Compl)

Completeness measures how well the K-means algorithm assigns all the data points with a given label to the same group. This score is also standardized to the range 0.0 to 1.0.

### 3.2.4. V-Measure (Shorthand as V-Meas)

The V-measure is the harmonic mean of homogeneity and completeness, so it is high only when both criteria are satisfied. The score again ranges from 0.0 to 1.0.
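The three scores can be computed from label counts alone. The sketch below assumes the usual entropy-based definitions (homogeneity as 1 − H(class | cluster) / H(class), completeness as 1 − H(cluster | class) / H(cluster)); the text itself does not give formulas, and the toy labels are invented.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(labels, given):
    """H(labels | given), computed from two paired label lists."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum(c / n * math.log(c / given_counts[g])
                for (g, _), c in joint.items())

def homogeneity_completeness_v(classes, clusters):
    h_c, h_k = entropy(classes), entropy(clusters)
    homo = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    compl = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    v = 0.0 if homo + compl == 0 else 2 * homo * compl / (homo + compl)
    return homo, compl, v

classes = ["a", "a", "b", "b"]
split = homogeneity_completeness_v(classes, [0, 1, 2, 3])   # every point alone
merged = homogeneity_completeness_v(classes, [0, 0, 0, 0])  # one big cluster
```

Putting every point in its own cluster is perfectly homogeneous but incomplete, while one big cluster is complete but not homogeneous; the V-measure penalizes both.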

### 3.2.5. Silhouette Coefficient (Shorthand as Silhouette)

The Silhouette Coefficient for a sample is defined as:

$silhouette = \frac{b - a}{\max(a, b)}$

where a is the mean intra-cluster distance and b is the mean distance to the nearest other cluster. The coefficient ranges from −1 to 1, where 1 is the best result and −1 is the worst. The higher the Silhouette Coefficient, the better the model matches well-defined clusters.
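A minimal per-sample implementation, using the standard form (b − a) / max(a, b) with a the mean intra-cluster distance and b the mean distance to the nearest other cluster. It assumes Euclidean distance, at least two clusters and no duplicate points; scoring a single-point cluster as 0 is a common convention adopted here as an assumption.

```python
import math

def silhouette_samples(points, labels):
    """Per-sample silhouette (b - a) / max(a, b) using Euclidean distance."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lab, []).append(math.dist(p, q))
        if own not in by_cluster:  # single-point cluster: scored as 0 by convention
            scores.append(0.0)
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette_samples(points, [0, 0, 1, 1])  # clusters match the two groups
bad = silhouette_samples(points, [0, 1, 0, 1])   # clusters cut across the groups
```

The labelling that matches the two visible groups gives positive scores, and the labelling that cuts across them gives negative ones.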

Incremental Concept Induction

## Incremental concept induction

Many concept learning systems, whether they carry out learning from examples or conceptual clustering, are non incremental - all objects must be present at the outset of system execution.

In contrast, incremental methods accept a stream of objects that are assimilated one at a time. A primary motivation for using incremental systems is that knowledge may be rapidly updated with each new observation, thus sustaining a continual basis for reacting to new stimuli. This is an important property of systems that are used under real-world constraints (Carbonell & Hood, 1986; Langley, Kibler, & Granger, 1986; Sammut & Hume, 1986).

Search-intensive methods may be appropriate in a non-incremental system, but may be too costly for incremental processing, since they require updating a frontier of concept hypotheses and/or examining a list of previously seen objects. Schlimmer and Fisher (1986) imply that incremental processes are profitably viewed as strategies operating under diminished search control. Specifically, they use a hill-climbing strategy (with no backtracking) to implement and test incremental variants of Quinlan’s (1983) ID3 program. Schlimmer and Fisher demonstrate that the cost of object incorporation can be significantly reduced, while preserving the ability of the learning system to converge on concept descriptions of high quality. The ability to achieve high quality concept descriptions, despite the limitations of hill climbing, is maintained by extending the set of available operators. Rather than restricting search to be unidirectional, both generalization and specialization operators are supplied. Bidirectional mobility allows an incremental system to recover from a bad learning path.

In learning from examples, Winston’s (1975) ‘ARCH’ program fits this view of incremental processing; it employs a hill-climbing strategy with operators for both generalization and specialization. This view can also be extended to conceptual clustering. For instance, Fisher and Langley (1985, 1986) view Lebowitz’ (1982, 1986a) UNIMEM as an incremental conceptual clustering system. Given a new object and an existing hierarchy that was built from previous observations, the program incorporates the object into the hierarchy. This results in a classification hierarchy that covers the new object as well as previously seen objects. Since UNIMEM maintains only one hierarchy following each observation, it can be viewed as hill climbing through a space of classification hierarchies. Second, UNIMEM does not build its hierarchies in an entirely top-down or bottom-up fashion. Instead, it has operators for merging nodes in an agglomerative manner and deleting nodes and associated subtrees. Node deletion selectively undoes the effects of past learning and thus approximates backtracking.

While existing descriptions of UNIMEM and similar systems like CYRUS (Kolodner, 1983) are not framed as search, desirable search properties can be abstracted from them. These systems use diminished search control and greater operator flexibility to navigate through hierarchy space, and thus employ a practical strategy for incremental learning. The advantage of viewing these systems in terms of search is that it requires explicit consideration of the ‘goal’ of learning and of the system’s ability to achieve or approximate this goal. The search framework forces analysis to move beyond anecdotal characterizations of system behavior.

## 3. COBWEB: Incremental conceptual clustering

UNIMEM and CYRUS, along with the conceptual clustering work of Michalski and Stepp, have inspired the COBWEB system. COBWEB is an incremental system for hierarchical conceptual clustering. The system carries out a hill-climbing search through a space of hierarchical classification schemes using operators that enable bidirectional travel through this space. This section describes COBWEB, filling in the details of the general incremental strategy. Specifically, the section gives

• the heuristic evaluation measure used to guide search,
• the state representation, including the structure of hierarchies and the representation of concepts,
• the operators used to build classification schemes, and
• the control strategy, including a high level description of the system.

### 3.1 Category utility: A heuristic evaluation measure

COBWEB uses a heuristic measure called category utility to guide search. Gluck and Corter (1985) originally developed this metric as a means of predicting the basic level in human classification hierarchies. Briefly, basic level categories (e.g., bird) are retrieved more quickly than either more general (e.g., animal) or more specific (e.g., robin) classes during object recognition. More generally, basic level categories are hypothesized to be where a number of inference-related abilities are maximized in humans (Mervis & Rosch, 1981). Identifying preferred concepts in humans is important from a cognitive modeling standpoint, but it also provides a basis for developing principled criteria for evaluating concept quality in AI systems. Category utility can be viewed as a function that rewards traditional virtues held in clustering generally - similarity of objects within the same class and dissimilarity of objects in different classes. In particular, category utility is a tradeoff between intra-class similarity and inter-class dissimilarity of objects, where objects are described in terms of (nominal) attribute-value pairs…
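The excerpt breaks off before giving a formula. As an illustration only, the sketch below assumes the commonly cited form of category utility for nominal attributes: the average over clusters of P(cluster) times the gain in the sum of squared attribute-value probabilities, relative to the whole collection. The animal data, the attribute names and the assumption that every cluster is non-empty are hypothetical.

```python
from collections import Counter

def category_utility(partition):
    """partition: list of non-empty clusters; each object is a dict attribute -> value."""
    everything = [obj for cluster in partition for obj in cluster]
    n = len(everything)

    def expected_correct(objs):
        # Sum over attributes and values of P(attribute = value)^2 within objs.
        counts = Counter((attr, val) for obj in objs for attr, val in obj.items())
        return sum((c / len(objs)) ** 2 for c in counts.values())

    baseline = expected_correct(everything)
    gain = sum(len(c) / n * (expected_correct(c) - baseline) for c in partition)
    return gain / len(partition)

bird = {"cover": "feathers", "flies": "yes"}
mammal = {"cover": "fur", "flies": "no"}

by_kind = category_utility([[bird, bird], [mammal, mammal]])
mixed = category_utility([[bird, mammal], [bird, mammal]])
```

Grouping like with like scores higher than the mixed grouping, which is the tradeoff between intra-class similarity and inter-class dissimilarity described above.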

Conceptual Clustering

Creating a classification is typically the first step in developing a theory about a collection of observations or phenomena. This process is a form of learning from observation (learning without a teacher), and its goal is to structure given observations into a hierarchy of meaningful categories. The problem of automatically creating such a hierarchy has so far received little attention in AI.

Yet creating classifications is a very basic and widely practiced intellectual process. Past work on this problem was done mostly outside AI under the headings of numerical taxonomy and cluster analysis (Anderberg, 1973). Those methods are based on the application of a mathematical measure of similarity between objects, defined over a finite, a priori given set of object attributes. Classes of objects are taken as collections of objects with high intraclass and low interclass similarity. The methods assume that objects are characterized by sequences of attribute/value pairs and that this information is sufficient for creating a classification. The methods do not take into consideration any background knowledge about the semantic relationships among object attributes or global concepts that could be used for characterizing object configurations. Nor do they take into consideration possible goals of classification that might be indicated by background knowledge.

As a result, classifications obtained by traditional methods are often difficult to interpret conceptually. The problem of interpreting the results has remained a challenging task for the data analyst. In addition, traditional classification-building methods describe objects by attribute value sequences and therefore are inadequate for creating classifications of structured objects. The description of such objects must involve not only attributes of objects as a whole but also attributes of object components and relationships among these components.

What is Ontology?

## What Is the Ontology

The word “ontology” is known in philosophy as the study of existence. In the Artificial Intelligence community, ontology means a formal, explicit specification of a shared conceptualization.

Conceptualization refers to an abstract model of some world phenomena. Ontology concepts and the relationship among those concepts should be explicitly defined. Further, ontology should be machine-readable and the ontology should capture consensual knowledge accepted by the community [13].

Ontology is used for knowledge sharing and reuse. It improves information organization, management and understanding. Ontology has a significant role in the areas dealing with vast amounts of distributed and heterogeneous computer-based information, such as the World Wide Web, Intranet information systems, and electronic commerce. Ontology will play a key role in the second generation of the web, which Tim Berners-Lee called the “Semantic Web”, in which information is given well-defined meaning, and is machine-readable. Search engines will use ontology to find pages with words that are syntactically different but semantically similar.

Adding Mammal and Bird to an existing Classification Tree
Preserving Monstrosities

By the late eighteenth century, the making of anatomical preparations was still an act of selection, accentuation, and seeking beauty, even in the deformed and the ugly. By preserving monstrosities like these in preparations, and averting the immediate danger implied by the visceral disgust such specimens would provoke unpreserved, their makers shaped them into didactic instruments and purveyors of meaning.

Poetic Rhetoric

Yet some controversies could not be solved through making preparations and illustrations; in the case of smallpox inoculation, anatomists turned to poetic rhetoric.

Investigation of the Senses - Naturalia and Artificialia
Freakish Incidents or Devil's Work or Unknown Categories

More importantly, from the way Van Doeveren dealt with monsters it appears he tacitly aimed to change the disgusting monsters and severed, decaying body parts into tasteless, odorless preparations through aesthesis. He used his own observations and sensory perceptions in as many cases as possible, both human and animal, to seek beauty, purpose, harmony, and regularity in what had long been viewed as freakish incidents or even the devil’s work. In presenting the monsters as plain, natural, purposeful, inoffensive objects, Van Doeveren implicitly showed that they were not so much ugly, horrible, or disgusting but rather representatives of thus-far unknown categories.

By using aesthesis as an analytical category and the materiality of anatomical preparations to understand the epistemic culture of which the eighteenth century Leiden anatomical collections are the result, I have put the actual objects that constitute these collections centre-stage without losing sight of the actors, work and social structures from which they emerged, thus transcending traditional historiographical categories.

Ear
An Aesthesis ex-Negativo Emerged

So far, the preparations from the eighteenth-century Leiden anatomical collections appear to be predominantly examples of normal - or even perfect - human anatomy, prepared in such a way that they convey the aesthesis of anatomy. The result of particular yet tacit ideas of beauty, perfection and elegance, most of the body parts were chosen for these preparations because they were already in themselves perfect specimens: there are no obvious pathologies, and most of them were (part of) young, lean, healthy bodies. (…) Yet ultimately, severed body parts on the brink of decay - the ultimate emblem of disgust — are the constituents of this collection, and as in the course of the eighteenth-century research into pathology and abnormalities became increasingly important, an aesthesis ex negativo emerged: an aesthesis of the ugly and the imperfect.

Brain by Vesalius' De humani corporis fabrica
Elegant Anatomy book as training set input
Human body series by Vesalius' De humani corporis fabrica
Tacit knowledge

Our experience affirmed that the task of making anatomical preparations relies largely on tacit knowledge. (…) Learning how to make a well-injected preparation is an art that can only be learned through endless practice, through trial and error, with hands-on work. It is truly about gaining knowledge from all your sensory perceptions—we used our touch, smell, sight and hearing, and it takes little imagination to envision the inclusion of taste.

Categorizing skulls

…by the end of the eighteenth century, to simply claim that a skull was from a particular country or area was no longer enough for some anatomists to prove the existence of this region and its inhabitants, and they resorted to taking and filing endless measurements of skulls to distinguish certain categories into which particular ‘species’ of humans could be made to fit.

Hacking the body

Hacking advocates: the need to examine the ideas we use to organize knowledge and inquiry, and to propose, advocate, or refute theories of knowledge. Footnote: Hacking. Historical Ontology

Tactile processes of commodification

…aesthesis in anatomy is inevitably characterized by the very tactile processes of commodification, domestication and objectification: it involves the creation of lasting, transferable anatomical preparations that both represent and are made of parts of the human body, as well as the domestication of the (exotic) other. Footnote: Latour, Science in Action, 223.

Commodification of the human body

…unlike most other forms of commodification of the human body, the creation of anatomical preparations was not aimed at the consumption of the body or its services, nor at stabilizing it for primarily symbolic, ritual, spiritual or decorative reasons (although these could also play a role). Making an anatomical preparation is first and foremost aimed at rendering knowledge, gained through sensory perception and experimentation with the body, stable and tradable. Sometimes this tradability was only of secondary importance, as for the many anatomists and collectors who