Program System LOREG
LOgical REGularities
(software for classification, recognition, data mining)
CONTENTS
1. INTRODUCTION
1.1. "LOREG" - What is it?
1.2. Information requirements
1.3. Application areas
1.4. System requirements
1.5. Loreg Setup
1.6. Bibliography
2. THEORY
2.1. Loreg - new approach for data analysis and pattern
recognition
2.2. Standard information
2.3. Logical regularities
2.4. Logical descriptions
2.5. Pattern recognition algorithms
2.6. New information for features, objects and classes
2.7. Collective solutions
3. FORMAT OF INITIAL INFORMATION
4. LOGICAL REGULARITIES DETERMINATION AND
PATTERN RECOGNITION PROBLEM SOLVING
4.1. Pattern recognition algorithms based upon voting over
sets of logical regularities
4.2. MCL application
4.3. Collective rule recognition algorithm (Colw application)
4.4. Table Transformation (Tran application)
5. VISUALISATION
5.1. Why is it necessary
5.2. How to start Loreg Visual
5.3. What and how is displaying
5.4. How to change active items
5.5. How to get information
5.6. Menu commands
5.7. Getting help
6. LOREG IN DECISION OF PRACTICAL PROBLEMS
( EXAMPLE OF SOME MEDICINE DIAGNOSTICAL PROBLEM)
7. HOW TO DECIDE RECOGNIZING TASK - FIRST
STEPS
7.1. Original information
7.2. Learning
7.3. Recognition for objects from file
7.4. Recognition in on-line mode
7.5. What else?
1. INTRODUCTION
1.1. "LOREG" -- What is it?
The program system LOREG has been designed for logical regularity search, data analysis, and pattern recognition. Objects are represented by feature descriptions. There is a finite number of classes (patterns), and the training information is given as a sample of object descriptions.
The following problems can be solved by system LOREG:
The System provides extensive and useful tools for adapting it to the user's demands or to the restrictions of real data. The user has several alternative implementations of the different stages of the regularity search process and can obtain the results most adequate to the size and quality of the training data (optimal pattern recognition algorithms, collectives of recognition algorithms, sets of the same parameters calculated by various methods). The theoretical background of the pattern recognition methods, based upon a combination of new logical, optimization, and statistical techniques, has been elaborated in the Computing Center of the Russian Academy of Sciences.
1.2. Information Requirements
The objects to be recognized and the training objects are described in terms of numerical features. Each class of the training table must contain two or more objects (they may be equal). Partial contradiction between class descriptions, up to their intersection, is allowed, as is incompleteness of data (absence of some feature values).
NO additional restrictions are imposed on the information.
From your experience with LOREG you will obtain answers to questions about the quality of the training information, the informativity of the feature space, the existence of logical interrelations between features and classes, the exactness of recognition problem solutions, and the correctness of the pattern recognition statement itself.
1.3. Application Areas
The system LOREG can be used for solving diagnostics, recognition, forecasting, and classification problems in medicine, politics, sociology, physics, chemistry, technology, economics, geology, business, and finance.
These algorithms have been applied to the solution of practical problems in the following areas:
and others (more than 60 applications).
1.4. System requirements
80386 processor or higher, Windows 95, 4 MB RAM, 10 MB HDD.
1.5. Loreg Setup
To install LOREG, insert the installation disk into the floppy drive and run Setup.exe from Windows, then follow the instructions displayed on the screen.
1.6. Bibliography
2. THEORY
2.1. "Loreg" - new approach for data analysis and pattern recognition.
Various "classical" methods for data analysis, classification, and pattern recognition are based upon statistical approaches, structural analysis, neural network modelling, and fuzzy sets.
The mathematical background of the program system LOREG is a combination of logical, optimization, and statistical techniques for data analysis.
Logical analysis consists in searching the training table for special fragments and fragment neighbourhoods of training objects described in terms of features. Such fragment neighbourhoods are "typical" for some classes and "non-typical" for the others. Parametric and nonparametric methods have been elaborated for their description and search.
The optimization approach consists in introducing various numerical criteria for estimating fragment neighbourhoods and solving the corresponding mathematical programming problems in order to find optimal feature neighbourhoods. The optimal solutions are interpreted as logical regularities.
Statistical ideas are used both in introducing various optimization criteria (functionals) and in creating decision rules in pattern recognition algorithms.
In practice, everything is much more complicated and interrelated. This technique, however, is the "interior part" of LOREG and does not require any special knowledge from the user.
The major advantage of LOREG is the automatic calculation, from the training data, of new useful quantities and knowledge that can be tested in a simple way.
2.2. Standard Information
The following terminology is used to describe the initial information.
A set M = {K1, K2, ..., Kl} of objects, phenomena, or processes is considered that can be represented as the union of l non-intersecting subsets K1, K2, ..., Kl called classes or patterns.
Each object S is represented as a numerical row s1, s2, ..., sn of values of the features x1, x2, ..., xn, which indirectly characterize the affiliation of the object S = {s1, s2, ..., sn} to a class Kj.
The initial information I0 about the classes is given as a sample of object descriptions S1, S2, ..., Sm containing representatives of all classes. The initial information I0 is presented as a numerical training table (table of templates) together with additional information: the number of features n, the number of classes l, the distribution of training objects over classes (vector m = (m0, m1, m2, ..., ml)), and the code of feature value absence (some integer value r) in object descriptions. Here m0 = 0, m1 is the number of training objects from the first class, m2 is the number of training objects from the first and the second classes together, and so on. The total number of training objects is ml.
2.3. Logical Regularities
The predicate Pj(S) is called a logical regularity for class Kj if the following conditions are satisfied:
The functional f(Pj) = <number of training objects Si from class Kj such that Pj(Si) = 1> is the optimality criterion in the system LOREG.
The predicate Pj(S) is called a partial logical regularity for class Kj if conditions 1 and 3 are satisfied.
The logical regularities determined in the system LOREG are of the following type: Pj(Si) = & (sit - eit <= st <= sit + eit), where the conjunction is taken over some feature subspace and the training object Si = {si1, si2, ..., sin} belongs to class Kj. The automatically calculated vector Ei = {ei1, ei2, ..., ein} characterizes the class Kj neighbourhood of object Si in the feature description space.
The geometrical interpretation of the predicate Pj(Si) is a hyperparallelepiped in some feature description subspace with object Si as its central element. This hyperparallelepiped contains training objects from the same class only, and some optimality condition is additionally satisfied. In the case of a partial logical regularity, the hyperparallelepiped contains training objects predominantly from the same class. Such neighbourhoods are called the optimal feature neighbourhoods.
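The hyperparallelepiped test above can be illustrated with a short Python sketch. It only evaluates a predicate of the stated form; it does not search for optimal neighbourhoods. The function name, the 0-based feature indexes, and the sample numbers are assumptions made for illustration and are not part of LOREG. Missing feature values are not handled here.

```python
from typing import Dict, Sequence


def in_neighbourhood(s: Sequence[float], center: Sequence[float],
                     eps: Dict[int, float]) -> bool:
    """Return True if s[t] lies in [center[t] - e, center[t] + e]
    for every feature index t listed in eps (the conjunction "&")."""
    return all(center[t] - e <= s[t] <= center[t] + e for t, e in eps.items())


# Hypothetical training object Si and half-widths for features x1 and x3
# (0-based indexes 0 and 2); feature x2 does not take part in the conjunction.
center = [2.5, 8.4, 6.86]
eps = {0: 0.5, 2: 1.0}

inside = in_neighbourhood([2.3, 100.0, 7.2], center, eps)   # x2 is ignored
outside = in_neighbourhood([3.2, 8.4, 6.86], center, eps)   # x1 is too far from 2.5
print(inside, outside)
```

Storing only the participating features in eps reflects the fact that the conjunction is taken over a feature subspace rather than over all n features.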
2.4. Logical Descriptions
The disjunctive form Dj(S) = Pj1(S) V Pj2(S) V ... V Pjh(S) is called a logical description of class Kj if the disjunction is taken over the set of logical regularities Pji(S) for class Kj. Evidently, Dj(Si) = 1 for all training objects Si from class Kj, and Dj(Si) = 0 for all training objects Si that do not belong to class Kj. So Dj(S) can be considered as a characteristic function for class Kj.
The disjunctive form Dj(S) with the minimal number of conjunctions Pji(S) is called the shortest description of class Kj.
The disjunctive form Dj(S) containing the minimal number of variables is called the minimal description of class Kj.
2.5. Pattern Recognition Algorithms
The pattern recognition problem is formulated as the calculation of the predicates aj(S), j = 1, 2, ..., l, for the object S to be recognized.
The condition aj(S) = 1 denotes that the pattern recognition algorithm assigns object S to class Kj.
The condition aj(S) = 0 denotes that, in the opinion of the pattern recognition algorithm, object S does not belong to class Kj.
A row a(S) = (a1(S), a2(S), ..., al(S)) containing a single unit denotes a single-valued solution of the pattern recognition problem for object S.
We have a many-valued solution of the pattern recognition problem if the row a(S) contains two or more units. In this case the pattern recognition algorithm indicates several classes that probably contain the object S.
The zero row a(S) is interpreted as the absence of similarity of S to any class.
The pattern recognition algorithms presented in the system LOREG are based upon the determination of logical regularities and the use of voting procedures. The basic recognition scheme is a successive three-step procedure.
1. Logical regularity determination.
From the training information I0, the sets {Pji(S)} of logical regularities for classes Kj are determined.
2. Proximity measure calculation.
The proximity measure Gj = Sum(bi Pji(S)) of object S to class Kj is calculated. The coefficients bi can be introduced in various ways; they define the voting mode. Particular ways of determining them are considered in chapter 4. The measure is interpreted as a weighted sum of votes for class Kj.
3. Recognition.
The object S is assigned to class Kr if Gr = max {Gj, j = 1, 2, ..., l}.
The first step is the training step. The other two are the recognition steps.
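Steps 2 and 3 of the scheme above can be sketched in Python as follows. This is an illustration under stated assumptions, not the LOREG implementation: the regularities and weights bi are invented rather than found by training, all names (box, proximity, recognize) are hypothetical, and ties between classes are resolved by taking the first maximum.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predicate = Callable[[Sequence[float]], bool]
Rule = Tuple[float, Predicate]          # (weight bi, regularity Pji)


def box(center: Sequence[float], eps: Dict[int, float]) -> Predicate:
    """Build a hyperparallelepiped predicate around a training object."""
    return lambda s: all(center[t] - e <= s[t] <= center[t] + e
                         for t, e in eps.items())


def proximity(s: Sequence[float], rules: List[Rule]) -> float:
    """Step 2: Gj = Sum(bi * Pji(S)), a weighted sum of votes."""
    return sum(b for b, p in rules if p(s))


def recognize(s: Sequence[float], rules_by_class: Dict[int, List[Rule]]):
    """Step 3: return the class with the maximal Gj and all Gj values."""
    g = {j: proximity(s, rules) for j, rules in rules_by_class.items()}
    return max(g, key=g.get), g


# Step 1 (training) is assumed to be done already; these rules are made up.
rules_by_class = {
    1: [(2.0, box([1.0, 4.5], {0: 1.5, 1: 4.0}))],
    2: [(3.0, box([2.5, 8.4], {1: 0.5})),
        (1.0, box([3.21, 4.46], {0: 0.5}))],
}
label, g = recognize([3.0, 4.0], rules_by_class)
print(label, g)
```

In this example only the second regularity of class 2 votes for the object, so the object is assigned to class 2.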
2.6. New information for features, objects and classes
The sets {Pji(S)} of logical regularities for classes Kj can be used not only for constructing logical descriptions of classes, estimating proximity measures of objects to classes, and solving pattern recognition problems, but also for determining some important informational characteristics of classes, training objects, and features.
The parameter pi = Ni / N is called the informativity measure of feature xi (or the weight of the feature), where Ni is the number of logical regularities containing feature xi and N is the total number of logical regularities. Informativity measures for objects, associated with particular classes, may be introduced in a similar way.
Other important characteristics of training objects and classes are the average length of logical regularities for objects (for classes) and the number of logical regularities for objects (for classes).
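As a small worked illustration of pi = Ni / N and of the average regularity length, the Python sketch below represents each regularity simply as the set of (0-based) feature indexes it contains. This representation and the function names are assumptions made for the example.

```python
from collections import Counter
from typing import List, Set


def feature_informativity(regularities: List[Set[int]],
                          n_features: int) -> List[float]:
    """pi = Ni / N: share of regularities that contain feature i."""
    total = len(regularities)
    counts = Counter(t for reg in regularities for t in reg)
    return [counts[t] / total if total else 0.0 for t in range(n_features)]


def average_length(regularities: List[Set[int]]) -> float:
    """Average number of features per regularity."""
    return sum(len(reg) for reg in regularities) / len(regularities)


# Four hypothetical regularities over three features.
regs = [{0, 2}, {0}, {1, 2}, {0, 1}]
weights = feature_informativity(regs, 3)
print(weights)                 # [0.75, 0.5, 0.5]
print(average_length(regs))    # 1.75
```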
2.7. Collective solutions
The collective rule recognition algorithm builds a collective classification by taking into account the recognition results of individual rules weighted by given values. As a result, for a given object its relative proximity measure to each class is calculated. The object is assigned to the class with the maximal value of the relative proximity measure.
3. FORMAT OF INITIAL INFORMATION
The initial information I0 about classes (see 2.2) must be presented as a Standard Format Table in the following form:
<header>
<M x N numeric table>
Header: N L m0 m1 ... mL b
N - number of features (columns)
L - number of classes
m0 m1 ... mL - end of class object (row) indexes (mL=M)
b - numeric value denoting blanks
Example:
3 2 0 2 5 -1 - header
1 4.5 -1
2.3 8.14 2.15 - the end of the 1st class
1.1 2.9 7
2.5 8.4 6.86
3.21 4.46 12 - the end of the 2nd class
The table can be transformed for the purposes of investigating the recognition problem with the TRAN application included in LOREG.
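A reader for the Standard Format Table described above might look like the following Python sketch. It is written only from the format description in this chapter; the function name is hypothetical, the input is assumed to contain numbers only (the "- header" and "- the end of ..." remarks in the example are treated as annotations and omitted), and blanks are returned as None.

```python
from typing import List, Optional, Tuple


def parse_standard_table(text: str) -> Tuple[List[List[Optional[float]]], List[int]]:
    """Parse 'N L m0 ... mL b' followed by M x N numbers.

    Returns the rows (blank code replaced by None) and a 1-based class
    label for every row.
    """
    tokens = text.split()
    n, l = int(tokens[0]), int(tokens[1])
    bounds = [int(t) for t in tokens[2:3 + l]]   # m0, m1, ..., mL
    blank = float(tokens[3 + l])                 # b
    values = [float(t) for t in tokens[4 + l:]]
    m = bounds[-1]                               # mL = M
    if bounds[0] != 0 or len(values) != m * n:
        raise ValueError("header does not match the table body")
    rows = [[None if v == blank else v for v in values[i * n:(i + 1) * n]]
            for i in range(m)]
    labels = [j for j in range(1, l + 1)
              for _ in range(bounds[j] - bounds[j - 1])]
    return rows, labels


example = """3 2 0 2 5 -1
1 4.5 -1
2.3 8.14 2.15
1.1 2.9 7
2.5 8.4 6.86
3.21 4.46 12
"""
rows, labels = parse_standard_table(example)
print(rows[0], labels)
```

Note that a genuine feature value equal to the blank code cannot be distinguished from a missing value, so the code should be chosen outside the range of real data.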
4. LOGICAL REGULARITIES DETERMINATION AND PATTERN RECOGNITION PROBLEM SOLVING
4.1. Pattern recognition algorithms based upon voting over sets of logical regularities
The pattern recognition algorithms based upon voting over sets of logical regularities are constructed from the training data by the program MCL. The program MCL has the following six control parameters: <Y1, Y2, Y3, Y4, Y5, Y6>. Their definitions are given below.
Y1 = "Weights of neighbourhoods for features". The determination of logical regularities (optimal feature neighbourhoods) is based upon the solution of special integer-valued linear programming problems. The coefficients of the goal functional and of the constraint matrix are calculated from the initial information I0.
The value "constant" of parameter Y1 corresponds to the choice of the functional F(z1, z2, ..., zn) = Sum(zi), where zi is 0 or 1. The value zi = 1 in the optimal solution means that feature xi participates in the found logical regularity (optimal feature neighbourhood), and zi = 0 means that it does not.
The value "functional" of parameter Y1 corresponds to the choice of a functional F of a more complex type that explicitly takes into account the results of comparing the feature values of objects from the same and from different classes.
Y2 = "Neighbourhood size". Usually, there is a continuum of feature neighbourhoods equivalent to the found optimal feature neighbourhood. Feature neighbourhoods are called equivalent if they contain training subsets of the same composition.
The values "Max", "Min", "Norm" of parameter Y2 give geometrically similar optimal feature neighbourhoods of maximal, minimal, and middle size (with respect to the inclusion relation). The corresponding recognition algorithms realise the "greedy", "careful", and "compromise" approaches in voting procedures.
Y3 = "Vote method". There are various approaches to choosing the parameters bi in the calculation of the proximity measure Gj = Sum(bi Pji(S)).
The choice bi = f(Pj) (see 2.3) corresponds to the value "Proportional" of parameter Y3.
With the value "Statistical", the parameters bi are calculated according to a statistical weighting procedure.
Y4 = "Affiliation of neighbourhood to class". The existence of optimal feature neighbourhoods becomes problematic when the training data are partially contradictory or incomplete. The obtained values f(Pj) can be small, so the optimal feature neighbourhoods will be "statistically unjustified" and the recognition algorithms will be unstable.
The parameter 50 <= Y4 <= 100 is introduced for optimal application of the program MCL in cases of bad class separability in the training data (partial contradictions, incompleteness, large random noise in the training data, and the like).
The values Y4 < 100 usually correspond to the case of partial logical regularities Pj(S). As a rule, the number of violations of condition 2) increases as the value of Y4 decreases. So this parameter gives a lower bound for the proximity measure of a feature neighbourhood to its class or, in other words, the degree to which condition 2) actually holds over the training data under consideration.
The value Y4 = 100 corresponds to the case when the predicates Pj(S) are logical regularities.
Y5 = "Exactness". The parameter characterises the exactness of the solution of an integer-valued linear programming problem, which is the most important step of the logical regularities search. The parameter Y5 is a natural number from 1 to 5. The value 1 corresponds to the minimal exactness of the solution (minimal exactness does not mean "bad"). The value 5 corresponds to the maximal level of solution exactness. Naturally, both the program's speed and the possible dimensions of the processed data are correlated with the value of Y5 (directly and inversely, respectively).
Y6 = "Minimal representativeness". For each found logical regularity, the number of objects from the regularity's own class whose coordinates satisfy the corresponding predicate is calculated. If the ratio (in per cent) of this number to the total number of objects in the class is less than this parameter, the regularity is excluded from further consideration and use.
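The Y6 rule can be shown with a few lines of Python. The sketch assumes that the number of own-class objects covered by each regularity is already known; the function name and the (name, covered) representation are hypothetical.

```python
from typing import List, Tuple


def filter_by_representativeness(regularities: List[Tuple[str, int]],
                                 class_size: int,
                                 y6: float) -> List[Tuple[str, int]]:
    """Keep regularities covering at least y6 per cent of their own class.

    Each regularity is given as (name, covered), where covered is the number
    of training objects of its own class that satisfy the predicate.
    """
    return [(name, covered) for name, covered in regularities
            if 100.0 * covered / class_size >= y6]


# A class of 20 training objects and three hypothetical regularities.
regs = [("P1", 1), ("P2", 4), ("P3", 10)]
kept = filter_by_representativeness(regs, class_size=20, y6=10)
print(kept)   # P1 covers only 5 per cent of the class and is excluded
```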
The ability to use the control parameters is important for the following reasons.
First, the user can select the parameter values that are most appropriate to the particular practical data.
Second, the user can construct various recognition algorithms in order to solve the recognition problem by a collective of algorithms.
4.2. MCL application
The program MCL is used for constructing optimal pattern recognition algorithms belonging to the logical regularities model and for recognizing new objects. The program also presents, in a form convenient for the user, some practically useful information about the structure of the constructed algorithm.
Program initialization. Project file.
Windows 3.1 is required to run the program. After initialization, the main program window appears on the screen. Before starting calculations, the user must create a project file or open an existing one with the command Create (Open) of the menu Project. The project file contains the names of the data files used for calculations and the values of the algorithm parameters.
After creating or opening the project file, the user opens the dialogue Work with project with the command Work from the menu Project. Here it is possible to choose the values of the algorithm parameters, to choose the data file names, to start the training procedure, to start the recognition procedure, and to load the files for print (see the data files description) into the MCL editor.
The values of the algorithm parameters can be changed in the dialogue Parameters (command Parameters from the dialogue Work with project).
After finishing work with the dialogue Parameters, the user can save the changes in the project file (command Save) or cancel them (command Cancel).
The command Change from the dialogue Work with project is used to change the data file under consideration.
Training procedure.
In the training mode (command Training from the dialogue Work), the program constructs, from the table of templates and in accordance with the control parameters, the optimal recognizing algorithm belonging to the logical regularities model. The found recognizing algorithm is written to a file whose name coincides with the name of the project file and has the extension .mcl. The results of training are saved as a text file for print.
The most important results of training are written to a file of special format ("Video file"), which is used by the program Loreg Visual.
The recognition procedure.
The recognition mode solves the task of recognition (classification) of objects from the table of recognized objects by applying the recognizing algorithm found in the training mode. The results of the classification are written to the File for print (recognition) and to the "Video file".
Examining the results.
The results of the program's work are saved in the files for print and in the Video file. Files for print are plain text files. They can be examined in any text editor and in the built-in MCL editor. To load a file for print into the MCL editor, the command Load from the dialogue Work with project and the command Open of the menu File are used. A special visualization program works with the Video file (see LOREG VIDEO).
The files containing tables of templates and tables of recognized objects must be in the standard format.
File for print (training).
The file created at the training stage contains a heading with information about the project and various useful information.
Informational characteristics of objects:
Informational characteristics of classes:
Informational characteristics of task:
File for print (recognition).
The file is created at the recognition stage. It contains a heading with information about the project and the recognition results.
The file also contains information about the coincidence of the initial and the received classifications.
4.3. Collective Rule Recognition Algorithm.
The collective rule recognition algorithm builds a collective classification by taking into account the recognition results of individual rules weighted by given values. As a result, for a given object its relative proximity measure Gi to each class (i = 1, 2, ..., l, where l is the number of classes) is calculated. The object is assigned to the class Kc for which the relative proximity measure is maximal.
The recognition threshold D (0 < D <= 1) sets the caution level of recognition. If Gi/Gc > D for some class Ki =/= Kc, the object is assigned to the class K0 (not recognized). The lower the value of D, the more cautious the classifier.
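The decision rule with the threshold D can be sketched in Python as shown below. The text does not state how the individual results are combined into Gi, so the weighted average used here is an assumption made for illustration, as are the function name and the sample numbers. Only the maximum rule and the Gi/Gc > D test are taken from the description above.

```python
from typing import List, Sequence, Tuple


def collective_recognize(estimates: Sequence[Sequence[float]],
                         weights: Sequence[float],
                         threshold: float) -> Tuple[int, List[float]]:
    """Return (class number, G); class 0 means "not recognized".

    estimates[r][i] is the proximity of the object to class i + 1
    according to individual rule r; weights[r] >= 0 is the rule weight.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one rule weight must be positive")
    n_classes = len(estimates[0])
    # Assumed combination: weighted average of the individual estimates.
    g = [sum(w * e[i] for w, e in zip(weights, estimates)) / total
         for i in range(n_classes)]
    c = max(range(n_classes), key=lambda i: g[i])
    if g[c] <= 0:
        return 0, g                      # no similarity to any class
    if any(i != c and g[i] / g[c] > threshold for i in range(n_classes)):
        return 0, g                      # a rival class is too close
    return c + 1, g


estimates = [[0.8, 0.2], [0.6, 0.4]]     # two rules, two classes (made up)
weights = [1.0, 1.0]
confident, _ = collective_recognize(estimates, weights, threshold=0.5)
cautious, _ = collective_recognize(estimates, weights, threshold=0.4)
print(confident, cautious)
```

Here the ratio of the second class to the first is about 0.43, so the object is recognized as class 1 with D = 0.5 and left unrecognized with the more cautious D = 0.4.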
Input and Output
Input
1. VIDEO.* files in the working directory, containing individual rule training and recognition results and feature weights. The application relies on the correct internal format of the files and on all of them belonging to the same problem. Otherwise, the behaviour of the application is unpredictable.
2. Parameters (see below).
Output
1. VIDEO.COL file in the working directory, containing collective rule training and recognition results and feature weights. The old version of the file is deleted when the application starts.
2. Listing file - a text file in the working directory containing the collective rule recognition results. On successful completion, the application opens this file in MS Windows NotePad, where you can view, edit, or print it.
Parameters
Listing File Name | The name of the text file to which the collective recognition results are written (input). |
Threshold | Sets the caution level of recognition (0<th<=1). The lower the threshold, the more cautious the classifier (input). |
Individual Rules | List of individual recognition rules for which recognition results are available (output). |
Individual Rule Weights | Relative weights of individual recognition rules (w>=0) (input). |
4.4. Table Transformation (Tran application)
The Table Transformation (Tran) application provides some simple operations on Standard Format Tables. By giving ranges for New Classes and New Features, the user constructs a new view of the recognition problem under consideration.
Parameters
Input Table | Source for transformation. |
Output Table | Result of transformation. |
New Classes | Defines new classes (1, 2, ...) in terms of input table row indexes (leave empty if no changes). |
New Features | Defines new feature subset in terms of input table column indexes (leave empty if no changes). |
Standard Format Table
<header>
<M x N numeric table>
Header: N L m0 m1 ... mL b
N - number of features (columns)
L - number of classes
m0 m1 ... mL - end of class object (row) indexes (mL=M)
b - numeric value denoting blanks
Example:
3 2 0 2 5 -1 - header
1 4.5 -1
2.3 8.14 2.15 - the end of the 1st class
1.1 2.9 7
2.5 8.4 6.86
3.21 4.46 12 - the end of the 2nd class
New Classes
Class Number Ranges
1 Lo11 Up11
1 Lo12 Up12
.........................................................
1 Lo1k1 Up1k1
2 Lo21 Up21
.........................................................
(Up >= Lo)
New Features
Ranges
Lo1 Up1
Lo2 Up2
......................
(Up >= Lo)
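The effect of the New Classes and New Features ranges can be illustrated with the Python sketch below. It is not the Tran implementation: it assumes that Lo and Up are 1-based inclusive indexes, that output rows are grouped by new class number, and that an empty list means "no changes". The function name and argument layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def transform(rows: Sequence[Sequence[float]],
              bounds: Sequence[int],
              new_classes: Sequence[Tuple[int, int, int]],
              new_features: Sequence[Tuple[int, int]]):
    """Return (new_rows, new_bounds) for a Standard Format Table.

    new_classes holds (class number, Lo, Up) row ranges;
    new_features holds (Lo, Up) column ranges.
    """
    if new_features:
        cols = [c for lo, up in new_features for c in range(lo - 1, up)]
    else:
        cols = list(range(len(rows[0])))
    if new_classes:
        picked: Dict[int, List[int]] = {}
        for k, lo, up in new_classes:
            picked.setdefault(k, []).extend(range(lo - 1, up))
        order: List[int] = []
        new_bounds = [0]
        for k in sorted(picked):            # classes 1, 2, ... in order
            order.extend(picked[k])
            new_bounds.append(len(order))   # end-of-class row index
    else:
        order, new_bounds = list(range(len(rows))), list(bounds)
    return [[rows[r][c] for c in cols] for r in order], new_bounds


rows = [[1, 4.5, -1], [2.3, 8.14, 2.15], [1.1, 2.9, 7],
        [2.5, 8.4, 6.86], [3.21, 4.46, 12]]
# New class 1 = rows 1 and 3, new class 2 = rows 4-5; keep features 1-2.
new_rows, new_bounds = transform(rows, [0, 2, 5],
                                 [(1, 1, 1), (1, 3, 3), (2, 4, 5)],
                                 [(1, 2)])
print(new_rows, new_bounds)
```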
5. VISUALISATION
This chapter explains how to use Loreg Visual for the analysis of training information and recognition results.
5.1. Why is it necessary?
Training and recognition produce a large amount of useful information in the form of print files. However, the user is usually interested in observing the most important results. This task is performed by the program Loreg Visual.
Loreg Visual can be used for visual analysis of training information, class descriptions, logical regularities, and recognition results against the background of the learning data in different feature subspaces. It helps you study your task as thoroughly as possible.
5.2. How to start Loreg Visual?
To start this application you need information organized as a standard table or, preferably, a video-file. Such a file is produced by the MCL application.
Loreg Visual is a part of LOREG, and you can start it in two ways:
The start window of Loreg Visual appears.
To load your information for visual analysis, choose File / Training Table. The Open dialog box appears. Choose one of the following file types from the box <List types of file>:
ADVICE The most interesting analysis is possible when you have a video-file, so run the MCL application before starting Loreg Visual.
Choose a file from the file list. If the file is correct, the main window of Loreg Visual appears.
5.3. What and How is displaying?
The main window of Loreg Visual has the following components:
Example 1 of the tools panel.
The main component for visual analysis is the visual area. Visual areas are intended for the representation of different types of information. The following types of information are displayed.
Features
One pair of features is active at any time. One of the features is displayed on the X axis, the other on the Y axis. The numbers of the active features are displayed as Xi near the axes. The representative interval is defined as the range of feature values over the set of training objects. The origin of coordinates corresponds to the pair of minimal values of the active features.
Objects
There are three types of objects:
Training objects
Each training object belongs to one of the classes. Projections of the objects onto the plane of the active pair of features are displayed as circles of different colours for different classes or as figures, where each figure corresponds to some class. To change the display mode of the objects, choose Object / Mode from the menu.
If an object is inside the active logical regularity, a white circle of smaller radius is displayed inside it.
One of the training objects may be active. The active object blinks.
Recognizing objects
A recognizing object may belong to one of the classes if recognition has been performed previously and the results have been saved in the video-file.
Recognizing objects are displayed as small rectangles of different colours for different classes or as question marks. If a recognizing object is not recognized, its colour is grey. The display mode is switched by choosing Object / Mode from the menu.
If an object is inside the active logical regularity, a white circle of smaller radius is displayed inside it.
One of the recognizing objects may be active. The active object blinks.
New objects
LOREG VISUAL can recognize a new object in interactive mode. To do so, use the menu item Object / New. A dialogue appears. Enter the values of the features. If the value of some feature is unknown, the unknown-value code is used. Choose the button Recognize. After the calculation, the following information appears: the class, a table of estimates, and an estimates diagram for the object being recognized. The radius of the circle depends on the number of logical regularities that took part in the recognition.
Example 2 of the tools panel.
Class logical descriptions
When you work with a video-file, one of the class descriptions (full, shortest, or minimal) is active and one of the classes is active. To display the class logical description, choose Description / Output from the menu. The distribution of the logical regularities of the active class appears:
Example 3 of the tools panel.
Logical regularities
When you work with a video-file, one of the class descriptions (full, shortest, or minimal) is active and one of the regularities is active.
The active regularity is displayed as a rectangle if the displayed features are present in the regularity. The colour of the regularity corresponds to the colour of its class.
If an object is inside the active regularity, a white circle of smaller radius is displayed in the centre of its representation.
5.4. How to change active items
This chapter explains how to change the active features, object, logical regularity, and class description displayed in the visual area of the main window of Loreg Visual.
Features
To change the active pair of features, use one of the following methods:
ADVICE If you analyse a logical regularity, use method 4. It lets you consider the pairs of features present in the active logical regularity. If you analyse a class description, use method 3. In that case, you consider pairs of features in order of informativity.
Training objects
To change the active training object, use one of the following methods:
Recognizing objects
To change the active recognizing object, use one of the following methods:
Class logical descriptions
To change the type of class description, choose Description / (Full, Shortest or Minimal) from the menu. The active class description is checked. The active logical regularity changes too when you change the active class description.
To display the class logical description, choose Description / Output from the menu.
To change the active class, choose Regularity / Next class from the menu.
Logical regularities
To change the active logical regularity, use one of the following methods:
5.5. How to get information
This chapter explains how to get information about the features, objects, logical regularities, and class descriptions displayed in the visual area of the main window of Loreg Visual.
Information dialogue boxes are intended for obtaining different kinds of information. To open such dialogue boxes, use the menu. To return to the main mode, use the button <Exit>.
Common information
To get common information about the task, choose Information / Common from the menu. Information about the file names of the training and recognizing tables and the numbers of classes, training objects, and features appears.
Information about training object
To get information about a training object, choose Object / Find and info from the menu. The following information appears:
This dialogue box is intended for displaying and changing the active training object. To change the active object, choose the button <Next> or <Previous>. When you press the button <Exit>, the last displayed object becomes active.
Information about recognizing object
To get information about a recognizing object, choose Object / Recognizing from the menu. The following information appears:
This dialogue box is intended for displaying and changing the active recognizing object. To change the active object, choose the button <Next> or <Previous>. When you press the button <Exit>, the last displayed object becomes active.
Information about classes
To get information about classes, choose Information / Classes from the menu. Information about the distribution of objects over classes appears.
Information about features
To get information about features, choose Information / Features from the menu. Information about the minimal and maximal values and the informativity of the features appears. The informativity of features may be displayed as a diagram: choose the button <Diagram>. You can change the order (by number or by informativity). The green line corresponds to single level of informativity.
Information about class description
To get information about class descriptions, choose Information / Description from the menu. Information about the number of logical regularities for each description appears.
Information about logical regularity
To get information about a logical regularity, choose Regularity / Find and Info from the menu.
The following information appears:
The dialogue box is intended for displaying and changing the active regularity and the active class. To change them, choose the button <Next>, <Previous>, or <Next Class>. After the button <Exit> is pressed, the last displayed regularity and class become active.
5.6. Menu commands |
This chapter explains purpose of menu commands. Description has the following structure: menu item and items of submenu with descriptions and references.
File
Training table - choose the file with information for visual analysis, see 5.2;
Recognition table - choose the file with the objects to be recognized. Use this item if you did not use the MCL application and have no video-file;
Exit - exit from Loreg Visual;
Regularity
Next - make the next regularity of the active class active, see 5.4;
Previous - make the previous regularity of the active class active, see 5.4;
Next class - make the next class active, with its first regularity as the active regularity, see 5.4;
Find and info - get information about a logical regularity, see 5.5;
Object
Next - make the next object active, see 5.4;
Previous - make the previous object active, see 5.4;
Mode - change the display mode of objects (circles/figures), see 5.3;
Recognizing - get information about an object being recognized, see 5.5;
Find and info - get information about a training object, see 5.5;
Feature
on Axes - define the active pair of features, see 5.4;
on Weights - view the features in order of their informativity, see 5.4;
on Regularity - view the features that occur in the active regularity, see 5.4;
Find and info - get information about features, see 5.5;
Description
Full, Shortest, Minimal - choose the active class description, see 5.4;
Output - change the display mode of the class description (single regularity / class description), see 5.3.
Information
Common, Classes, Regularities, Descriptions, Features - get information, see 5.5.
Help
Contents - open the help contents, see 5.7;
Language - change the interface language, see 5.7;
Tools Panel - switch the tools panel on or off;
About - get information about Loreg Visual;
5.7. Getting help |
Help is available for every dialogue box and menu item of Loreg Visual:
press <F1> at any time while working with the program. Help also contains
the main definitions (such as object, feature, video-file, etc.), descriptions
of the working modes, and answers to the most common questions. To open the
contents of the online help, choose Help / Contents from the menu.
Loreg Visual allows you to change the interface language. To change it,
choose Help / Language from the menu. The texts of menu items and dialogue
boxes are stored in the files lang1.txt and lang2.txt, so the interface can
be translated into any language.
6. LOREG IN DECISION OF PRACTICAL PROBLEMS (EXAMPLE OF SOME MEDICINE DIAGNOSTICAL PROBLEM) |
The proceedings of the 9th Scandinavian conference (Uppsala, Sweden,
June 6-9, 1995) presented preliminary results on a problem of melanoma
recognition [1] from 32 features: the first 17 describe the geometrical
form of the tumour, and the last 15 describe its radiological
characteristics. The initial information was a sample of numerical rows,
each of which is a 32-dimensional description of a malignant lesion
(class 1), a benign formation (class 3), or an "intermediate" dysplastic
object (class 2). The melanoma recognition problem consisted in automatically
assigning a row of 32 numbers to one of these three classes.
The initial information was randomly split into two tables, each containing
representatives of all classes: the training table (17 objects of the first
class, 20 of the second and 20 of the third) and
the test table (12, 10 and 10 objects of the respective classes).
Our task is therefore to investigate the training table TLMEL.TAB
with the MCL application and to solve the recognition problem for the rows
of the table TRMEL.TAB, including the construction of individual and
collective rule algorithms and the analysis of the results. In doing so we
use no assumptions or knowledge about the content of the problem as a
problem of medical diagnostics; we merely process the given numerical
tables.
We begin the computing process by running the MCL application.
We create a project file m1.mcp, using the options <PROJECT> and <CREATE>
of the main menu of the program.
Through the options <PROJECT> and <WORK> we set the file names
and parameter values: tlmel.tab - reference table (training table),
trmel.tab - table of objects to be recognized, m1.inf - file of results
of the training information analysis, m1.ans - file of recognition
results, video.m1 - special-format file of the main results for visualization.
We choose the parameters in the option <CHOOSE>: "Weight of neighbourhoods
for features" (Y1) = "functional", "The
size of a neighbourhood" (Y2) = "Max",
"A method of voting" (Y3) = "proportional",
"Affiliation of neighbourhood to a class" (Y4) = 100,
"Exactness" (Y5) = 1, "Minimal representativity"
(Y6) = 3.
After the commands <TRAINING> and <RECOGNITION> have been executed in
sequence, the files of training and recognition results can be viewed with
the command <LOADING> (or the option <FILE> of the main menu of the MCL
program). For each sample the files give the number of logical regularities
(number of features in their record), the length of the best regularity, its
weight (the number of samples of the same class that satisfy the best
regularity), and the share of these samples in the total number of samples.
Finally, the best logical regularities are presented: feature numbers and
the intervals of their variation.
For a new applied problem we do not know a priori which control
parameters are best, so it is worth carrying out a series of calculations
with various parameter values. We present results of test information
recognition obtained by various algorithms of voting over logical
regularities. The table below gives the results of 13 calculations: the
values of the control parameters and the percentage of true answers on the
test information:
N of calc. | Neighb. weights | Neighb. size | Voting method | Affiliat. to class | Min repr. | Exactness | % of true answers |
1 | func. | max | prop. | 100 | 1 | 3 | 68.7 |
2 | func. | max | prop. | 100 | 1 | 1 | 71.8 |
3 | func. | max | stat. | 100 | 1 | 1 | 59.3 |
4 | func. | min | prop. | 100 | 1 | 1 | 71.8 |
5 | func. | min | prop. | 80 | 1 | 1 | 75.0 |
6 | func. | min | stat. | 80 | 10 | 1 | 65.6 |
7 | func. | min | prop. | 80 | 10 | 1 | 65.6 |
8 | func. | min | stat. | 80 | 20 | 1 | 71.8 |
9 | func. | min | stat. | 80 | 30 | 1 | 68.7 |
10 | func. | min | stat. | 80 | 40 | 1 | 65.6 |
11 | func. | min | stat. | 100 | 20 | 1 | 71.8 |
12 | const. | norm. | stat. | 100 | 20 | 1 | 53.1 |
13 | const. | norm. | prop. | 100 | 20 | 1 | 59.3 |
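The percentages in the table can be translated back into numbers of correctly recognized test objects. The sketch below assumes that each percentage refers to the 32 objects of the test table (12 + 10 + 10) and was truncated to one decimal place; both points are inferred from the values and are not stated in the text, and the function name is hypothetical.

```python
TEST_SIZE = 12 + 10 + 10  # objects in the test table, summed over the classes

def correct_count(percent, total=TEST_SIZE):
    """Return the number of correct answers whose truncated percentage
    equals `percent`, or None if no count fits (assumed convention)."""
    target = round(percent * 10)          # tenths of a percent, as an integer
    for k in range(total + 1):
        if 1000 * k // total == target:   # 100*k/total truncated to one decimal
            return k
    return None

# Distinct percentages that occur in the table of 13 calculations.
for p in (75.0, 71.8, 68.7, 65.6, 59.3, 53.1):
    print(p, correct_count(p))
```

Under these assumptions, the best calculation (N 5, 75.0 %) corresponds to 24 of 32 test objects.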
When unknown new objects are recognized, a natural question arises:
how accurate is the decision? The table shows that the accuracy can vary
within rather wide limits. The LOREG system makes it possible to evaluate
recognition results from the results of training
(lengths of logical regularities, number of informative features, estimates
(votes) of objects, etc.). For example, very small values of the estimates,
of the lengths of regularities, or of the number of informative features
usually indicate poor quality of the training information, an essential
difference between the recognition data and the training data, or an
unsuccessful choice of parameter values for the MCL application. In that
case it is recommended to repeat the training process with other parameter
values (for example, as in calculation N 1 or N 2). The preferable choices
of parameters usually become clear after a little experience with the system.
Moreover, stable results can be obtained automatically with the
adjusting module COLW: several calculations are carried out with various
parameter values of the MCL program, and a collective decision rule is then
computed with the help of the COLW module. In this case the errors of the
individual calculations usually "absorb" one another, and as a rule we
obtain the best decision or one close to the best.
The collective decision constructed on the basis of all 13 algorithms
mentioned above gave 71.8 % of true answers, i.e. only slightly below the
best of them. We then select those algorithms, together with their decisions,
that gave less than 70 % of true answers. These eight algorithms are marked
by an information line on Fig. 12. The collective decision constructed on
their basis gave 68.7 % of true answers and matched the best decisions from
this list.
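The idea of a collective decision can be illustrated with a plain majority vote over the answers of several algorithms. This is only a sketch: the actual rule used by COLW is not described in this section, and the function name, the tie-breaking rule and the sample answers below are all hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    """Return the class chosen by most algorithms.
    Ties go to the smallest class number (an arbitrary assumption)."""
    counts = Counter(answers)
    best = max(counts.values())
    return min(cls for cls, n in counts.items() if n == best)

# Hypothetical answers of five algorithms for three objects (one row per object).
answers_per_object = [
    [1, 1, 2, 1, 3],
    [2, 3, 3, 3, 2],
    [1, 2, 2, 1, 3],
]
collective = [majority_vote(a) for a in answers_per_object]
print(collective)
```

An error made by one algorithm on an object is outvoted when most of the other algorithms answer correctly, which is the "absorbing" effect mentioned in the text.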
7. HOW TO DECIDE RECOGNIZING TASK - FIRST STEPS |
We consider a simple pattern recognition example as an illustration. What is the minimal set of steps needed to solve a recognition task?
7.1. Original information. |
Consider a simple example with four classes. They are named "Normal", "Easy", "Hard", "Catastrophe". The objects to be recognized are described in terms of the following 7 features: "width", "depth", "colour", "weight", "temperature", "length", "size". We have the following 3 files in text format.
1. table_l.opi
#CLASSES Normal Easy Hard Catastrophy
#FEATURES width depth color weight temperature length size
2. table_l.tab
7 4 0 3 6 9 12 -1
11 15 23 12 12 19 -1
23 36 23 17 18 26 34
19 42 22 31 15 73 11
17 77 33 43 38 88 95
15 75 38 -1 64 86 75
10 86 30 21 89 90 20
55 34 34 18 70 59 23
59 37 38 26 72 62 36
70 36 37 -1 81 70 40
10 45 56 92 32 66 40
17 42 59 38 23 67 43
14 -1 56 38 27 69 48
3. table_r.tab
7 4 0 1 2 3 4 -1
15 37 18 22 -1 -1 34
16 77 39 28 50 99 80
60 36 -1 27 77 60 35
15 40 51 40 25 60 50
Table_l.opi contains the names of the features and classes. These names
are labels for the user.
Table_l.tab (learning table) contains the learning (training) information
about objects and classes. The first line is a header: number of features (7),
number of classes (4), 0, number of the last object of the first class (3),
number of the last object of the second class (6), ... , number of the last
object of the last class (12), code for an unknown value (-1). The other
lines give the feature descriptions of the objects.
Table_r.tab (recognizing table) has a format similar to that of the
learning table and contains descriptions of unknown objects (testing
objects). If it is a testing table, you may describe the classes in its
header. In our example, the header entries are <0, 1, 2, 3, 4>.
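The header layout described above can be checked with a few lines of code. The sketch below parses the text of table_l.tab from this example. It assumes integer values separated by whitespace and reads them as a flat stream, so line breaks do not matter; the function name and the returned structure are hypothetical and are not part of LOREG.

```python
TABLE_L = """\
7 4 0 3 6 9 12 -1
11 15 23 12 12 19 -1
23 36 23 17 18 26 34
19 42 22 31 15 73 11
17 77 33 43 38 88 95
15 75 38 -1 64 86 75
10 86 30 21 89 90 20
55 34 34 18 70 59 23
59 37 38 26 72 62 36
70 36 37 -1 81 70 40
10 45 56 92 32 66 40
17 42 59 38 23 67 43
14 -1 56 38 27 69 48
"""

def parse_tab(text):
    """Parse a table in the layout of this example into (objects, labels).
    Unknown values are returned as None."""
    tokens = [int(t) for t in text.split()]
    n_features, n_classes = tokens[0], tokens[1]
    # Boundaries: 0, then the number of the last object of each class.
    bounds = tokens[2:3 + n_classes]
    unknown = tokens[3 + n_classes]        # code for an unknown value
    data = tokens[4 + n_classes:]
    n_objects = bounds[-1]
    if len(data) != n_objects * n_features:
        raise ValueError("table body does not match the header")
    objects = []
    for i in range(n_objects):
        row = data[i * n_features:(i + 1) * n_features]
        objects.append([None if v == unknown else v for v in row])
    labels = []
    for cls in range(1, n_classes + 1):
        labels += [cls] * (bounds[cls] - bounds[cls - 1])
    return objects, labels

objects, labels = parse_tab(TABLE_L)
print(len(objects), labels)
print(objects[0])
```

The same function accepts the header of table_r.tab, where the boundaries <0, 1, 2, 3, 4> place one object in each class.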
7.2. Learning. |
Run MCL from the LOREG group. The main window of MCL appears.
Choose Project/Create. The dialogue "Create project FILE"
appears, where you must enter a project name with the .mcp extension.
Let the name be m1.mcp. Press OK. Choose Project/Work.
Set the files with the learning and recognizing objects.
Run Training. The results of training are placed in the special file m1.mcl
and the ASCII file m1.inf.
7.3. Recognition for objects from file. |
Press Recognize if you have a recognizing table (table_r.tab). The results will be in the ASCII file m1.ans.
For each object you can see the result of its recognition (second column). For example, object 1 is assigned to class number 1, and its estimates (affiliation measures) are 0.379 (for class 1), 0.000 (for class 2), 0.030 (for class 3), and 0.049 (for class 4). This means that the object is classified as an object of the first class.
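The decision in this example corresponds to taking the class with the largest estimate. A minimal sketch, assuming this is how the answer is derived from the estimates (the text shows only the example, and the function name is hypothetical):

```python
def classify(estimates):
    """Return the class number with the largest estimate (assumed decision rule)."""
    return max(estimates, key=estimates.get)

# Estimates (affiliation measures) of object 1 from the example, by class number.
object_1 = {1: 0.379, 2: 0.000, 3: 0.030, 4: 0.049}
print(classify(object_1))
```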
7.4. Recognition in on-line mode. |
You can enter feature values for an unknown object and recognize it. This
mode is available only after step 2 (training).
Run Loreg Visual and execute File/Learning table.
Choose video.mcl.
Choose Object/New.
Example 4 of the tools panel. |
Enter the feature values of the object to be recognized (for example, <21,72,39,59,41,68,25>) and press Recognize.
Example 5 of the tools panel. |
The decision of the recognition task appears (class number, estimates and diagram).
7.5. What else? |
After that you can change the parameters and train again. To evaluate the results you can use the print files of MCL, visual analysis with Loreg Visual, and the collective (Colw) decision.
1 The authors sincerely thank Prof. H. Ganster of the Technical University Graz (Austria) for placing the melanoma data at our disposal. (See H. Ganster, M. Gelautz, A. Pinz, Initial Results of Automated Melanoma Recognition. Proceedings of the 9th SCIA, June 1995, Uppsala, Sweden, pp. 209-218.)