Ranked modelling on feature vectors with missing values

Abstract: Ranked models can reflect regularities in a given set of feature vectors enriched by a priori knowledge in the form of ranked relations between selected objects or events represented by these vectors. Ranked regression models have the form of linear transformations of multivariate feature vectors onto a line which preserve a given set of ranked relations in the best possible way. We pay attention to situations in which particular objects or events are represented by feature vectors of different dimensionality. Different dimensionality may appear when values are missing or when the feature space is changed successively. Linear ranked transformations can be designed on the basis of feature vectors of different dimensionality via minimization of convex and piecewise linear (CPL) criterion functions.

Key words: feature vectors with missing values, ranked relations, ranked linear transformations, convex and piecewise linear criterion functions

1. Introduction

Exploratory data analysis or pattern recognition methods can be aimed at discovering regularities or trends in multivariate data sets [1], [2]. In a standard data representation, objects or events are represented in the form of feature vectors with the same number of numerical components (features), or as points in a feature space of fixed dimensionality. The assumption of equal dimensionality of feature vectors may be too restrictive in many practical tasks. For example, missing data often undermine the assumption of fixed dimensionality. As another example, let us take the causal sequence of liver diseases formulated by medical doctors [3]. It is natural to assume that the more serious cases in this sequence should be examined in a more comprehensive manner than patients with light diseases. As a result, the dimensionality of the feature vectors can increase successively in accordance with the causal sequence of diseases.

Procedures originating from regression analysis play a prominent role in exploratory data analysis. Ranked regression models have the form of linear transformations of multivariate feature vectors onto a line which preserve a given set of ranked relations between selected objects or events in the best possible way [4]. We consider designing ranked regression models on the basis of a given set of ranked relations between selected objects or events represented by feature vectors of different dimensionality.

The ranked regression model can be induced from a given set of feature vectors enriched by a set of ranked relations between some of these vectors through minimization of convex and piecewise linear (CPL) criterion functions defined on differential vectors [5]. Theoretical properties of the CPL approach with varied dimensionality of feature vectors are analyzed in this paper.

2. Ranked relations

Let us consider a family of m objects (events, patients) O_j (j = 1, ..., m). We assume that each object O_j may be completely represented by the n-dimensional feature vector x_j[n] = [x_j1, ..., x_jn]^T. In this case, the vectors x_j[n] belong to the n-dimensional feature space F[n] (x_j[n] ∈ F[n]), and the indices i of the features x_i belong to the set I₀ = {1, ..., n}. The component (feature) x_ji of the vector x_j[n] is the numerical result of the i-th examination (i = 1, ..., n) of a given object O_j. The feature vectors x_j[n] can be of a mixed type and can represent different types of measurements of a given object O_j (for example x_i ∈ {0, 1} or x_i ∈ R).

Let us assume that for some reason, for example due to missing values, particular objects O_j are not fully represented. This means that the object O_j is represented not by the n-dimensional feature vector x_j[n] but by the n_j-dimensional reduced vector x_j[n_j] (n_j ≤ n):

$$(\forall j \in \{1, \ldots, m\}) \quad x_j[n_j] = [x_{j,i(1)}, \ldots, x_{j,i(n_j)}]^T \tag{1}$$

where i(k) ∈ I_j and I_j ⊆ I₀.

In accordance with the above relation, each feature vector x_j[n_j] is characterized by its own set I_j of feature indices i(k). The reduced vector x_j[n_j] can be obtained from the n-dimensional feature vector x_j[n] by neglecting the n - n_j features x_j,i whose indices i do not belong to the set I_j. Geometrically, this means that the vector x_j[n_j] (x_j[n_j] ∈ F_j[n_j]) results from a projection of the vector x_j[n] onto the feature subspace F_j[n_j].
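The projection described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the paper: a feature vector is stored as a dict mapping the feature index i to its value, and the name `reduce_vector` is hypothetical.

```python
def reduce_vector(x_full, index_set):
    """Project the full vector x_j[n] onto the features listed in I_j.

    x_full    -- dict {feature index i: value x_ji}, indices 1..n (assumed storage)
    index_set -- the set I_j of indices actually measured for object O_j
    """
    return {i: x_full[i] for i in sorted(index_set)}


# Full vector with n = 4 features, indexed 1..4 as in the text.
x_full = {1: 0.5, 2: 1.0, 3: 7.2, 4: 0.0}
I_j = {1, 3}                       # only features 1 and 3 were measured
x_reduced = reduce_vector(x_full, I_j)
```

Storing the index set together with the values keeps the information about which features are missing, which a plain list of n_j numbers would lose.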

In some cases, the family of objects O_j (j = 1, ..., m) can be characterised not only by the feature vectors x_j[n_j] but also by ranked relations 'O_j ≺ O_k' between some of these objects. The ranked relation 'O_j ≺ O_k' ('O_k follows O_j') is fulfilled within pairs of objects with the indices (j, k) from some set J_p:

$$(\forall (j,k) \in J_p) \quad O_j \prec O_k \;\Leftrightarrow\; O_k \text{ follows } O_j \tag{2}$$

The family (2) of ranked relations 'O_j ≺ O_k', where (j, k) ∈ J_p, represents additional knowledge about some objects O_j. Let us assume that the ranked relation is transitive. This means that the following implication is fulfilled:

$$\text{if } O_j \prec O_k \text{ and } O_k \prec O_l, \text{ then } O_j \prec O_l \tag{3}$$

The ranked relation 'O_j ≺ O_k' can be represented by the transitive ranked relation 'x_j ≺ x_k' between the feature vectors x_j[n_j] and x_k[n_k]:

$$(\forall (j,k) \in J_p) \quad x_j \prec x_k \;\Leftrightarrow\; x_k \text{ follows } x_j \tag{4}$$

For example, additional knowledge about m = 5 objects O_j is represented by the family of three ranked relations between feature vectors: 'x x x x', and 'x x'. We allow for the situation in which some of the feature vectors x_j[n_j] (1) appearing in the ranked relations have missing values.
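Because the ranked relation is assumed to be transitive, a set J_p of explicitly given pairs implies further pairs. The sketch below computes these implied pairs. The function name `transitive_closure` and the pairs used are hypothetical and are not the example from the text.

```python
def transitive_closure(pairs):
    """Return all pairs (j, l) implied by transitivity of the ranked relation."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # O_a precedes O_b and O_b precedes O_d, so O_a precedes O_d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical index pairs (j, k) meaning "O_k follows O_j".
J_p = {(1, 2), (2, 3), (4, 5)}
closed = transitive_closure(J_p)
```

This quadratic loop is adequate for small relation sets; for large ones a graph-based closure would be a better choice.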

3. Ranked linear transformations

Let us consider a linear transformation of the feature vectors x_j[n] (x_j[n] ∈ F[n]) onto a line:

$$(\forall j \in \{1, \ldots, m\}) \quad y_j = w[n]^T x_j[n] \tag{5}$$

where w[n] = [w₁, ..., w_n]^T is the parameter (weight) vector.

We consider the problem of designing a transformation (the ranked line y = w^T x (5)) which preserves the relations 'x_j ≺ x_k' (4) as well as possible.

Definition 1: The family of the ranked relations 'x_j ≺ x_k' (4) defines the sequential pattern P of the vectors x_j in the feature space F[n] (x_j ∈ F[n]) if and only if there exists an n-dimensional weight vector w such that the following implication holds:

$$(\forall (j,k) \in J_p) \quad x_j \prec x_k \;\Rightarrow\; w^T x_j < w^T x_k \tag{6}$$

The procedure of discovering the sequential pattern P and designing the ranked line y = w^T x can be based on the concept of linear separability of the set R of the differential vectors r_jk:

$$R = \{\, r_{jk} = x_k - x_j : (j,k) \in J_p \,\} \tag{7}$$

Definition 2: The set R (7) is linearly separable in the n-dimensional feature space F[n] if and only if there exists a weight vector w such that the following inequalities hold:

$$(\exists w) \; (\forall r_{jk} \in R) \quad w^T r_{jk} > 0 \tag{8}$$

The weight vector w defines the hyperplane H(w) in the feature space:

$$H(w) = \{\, x : w^T x = 0 \,\} \tag{9}$$

The hyperplane H(w) passes through the point 0 in the feature space. If the inequalities (8) hold, then the hyperplane H(w) separates the set R (7). This means that all the elements r_jk of the set R are located on the positive side of the hyperplane H(w).

Lemma 1: The family of the ranked relations (4) defines the sequential pattern P (6) in the n-dimensional feature space F[n] (Def. 1) if and only if the set R (7) is linearly separable (8) in this space.

Proof: If the hyperplane H(w) (9) separates (8) the set R (7), then the following inequalities hold (4):

$$(\forall r_{jk} \in R) \quad w^T x_k > w^T x_j \tag{10}$$

As a result, the implication (6) is true. On the other hand, if the implication (6) holds, then (8) follows:

$$\big( (\forall (j,k) \in J_p) \;\; w^T x_j < w^T x_k \big) \;\Rightarrow\; \big( (\forall r_{jk} \in R) \;\; w^T r_{jk} > 0 \big) \tag{11}$$
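The equivalence stated in Lemma 1 can be checked numerically on a toy data set. The sketch below is an illustration only: the vectors, the index pairs and the two weight vectors are made-up values, and the function names are hypothetical.

```python
def dot(w, x):
    """Inner product w^T x of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def differential_vectors(X, J_p):
    """Build r_jk = x_k - x_j for every pair (j, k) in J_p."""
    return {(j, k): [xk - xj for xj, xk in zip(X[j], X[k])] for (j, k) in J_p}


def separates(w, R):
    """Linear separability condition: w^T r_jk > 0 for all r_jk."""
    return all(dot(w, r) > 0 for r in R.values())


def preserves(w, X, J_p):
    """Ranked-line condition: w^T x_j < w^T x_k for all (j, k) in J_p."""
    return all(dot(w, X[j]) < dot(w, X[k]) for (j, k) in J_p)


# Made-up complete feature vectors (n = 2) and ranked pairs.
X = {1: [0.0, 1.0], 2: [1.0, 1.0], 3: [2.0, 3.0]}
J_p = [(1, 2), (2, 3)]
R = differential_vectors(X, J_p)

w_good = [1.0, 0.0]    # orders the three points correctly along the line
w_bad = [-1.0, 0.0]    # reverses the order
```

For any weight vector the two predicates return the same answer, because w^T r_jk = w^T x_k - w^T x_j.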

4. Dimensions equalization in pairs of feature vectors with missing values

In accordance with Lemma 1, the linear separability (8) of the set R (7) in the complete feature space F[n] allows for designing an entirely ranked (6) transformation y = w^T x. Now we will explore the possibility of designing ranked transformations on the basis of the relations 'x_j[n_j] ≺ x_k[n_k]' (4) between feature vectors x_j[n_j] (x_j[n_j] ∈ F_j[n_j]) and x_k[n_k] (x_k[n_k] ∈ F_k[n_k]) from different feature spaces F_j[n_j] and F_k[n_k].

For this purpose, it is useful to define the set R (7) differently. The differences r_jk = (x_k - x_j) of the feature vectors x_j[n_j] (x_j[n_j] ∈ F_j[n_j]) and x_k[n_k] (x_k[n_k] ∈ F_k[n_k]) can be defined for vectors belonging to the same feature space. We will distinguish two types, E⁻ and E⁺, of feature space equalization:

Type E⁻: The equalized feature space F_j,k[n_j,k] for the relation 'x_j[n_j] ≺ x_k[n_k]' is defined as the intersection of the spaces F_j[n_j] and F_k[n_k]:

$$F_{j,k}[n_{j,k}] = F_j[n_j] \cap F_k[n_k] \tag{12}$$

If the space F_j[n_j] is equal to F_k[n_k], then the equalized feature space F_j,k[n_j,k] is also equal to F_k[n_k]. If the space F_j[n_j] is disjoint from F_k[n_k], then the equalized space F_j,k[n_j,k] is empty.

Type E⁺: The equalized feature space F_j,k[n_j,k] is defined as the sum (union) of the spaces F_j[n_j] and F_k[n_k]:

$$F_{j,k}[n_{j,k}] = F_j[n_j] \cup F_k[n_k] \tag{13}$$

It is necessary in this case to define the missing values x_j,i or x_k,i for those features x_i which belong to only one of the feature spaces F_j[n_j] or F_k[n_k]:

$$\begin{aligned}
&\text{if } (i \in I_j) \text{ and } (i \notin I_k), \text{ then } x_{k,i} = c_{k,i} \\
&\text{if } (i \notin I_j) \text{ and } (i \in I_k), \text{ then } x_{j,i} = c_{j,i}
\end{aligned} \tag{14}$$

where I_j is the set of feature indices i of the vector x_j[n_j] (1) and c_k,i is the value assigned to the missing value x_k,i. In particular, all missing values can be set to zero:

$$\begin{aligned}
&\text{if } (i \in I_j) \text{ and } (i \notin I_k), \text{ then } x_{k,i} = 0 \\
&\text{if } (i \notin I_j) \text{ and } (i \in I_k), \text{ then } x_{j,i} = 0
\end{aligned} \tag{15}$$

The rules (13) or (14) and (15) allow one to equalize the feature spaces F_j[n_j] and F_k[n_k] related to each ranked relation 'x_j[n_j] ≺ x_k[n_k]' (4), where x_j[n_j] ∈ F_j[n_j] and x_k[n_k] ∈ F_k[n_k]. Equalization of the feature spaces makes it possible to compute the differential vectors r_j,k[n_j,k] = (x_k^~[n_j,k] - x_j^~[n_j,k]) (7), where the equalized vectors x_k^~[n_j,k] and x_j^~[n_j,k] both belong to the space F_j,k[n_j,k] defined by (12) or by (13).

Let us remark that the equalization of Type E⁻ (12) related to the relation 'x_j[n_j] ≺ x_k[n_k]' means dropping some features x_i, but without introducing artificial values for any feature. During the equalization of Type E⁺ (13), no value of any feature x_i is lost, but artificial values may be introduced for some features. Generally, introducing artificial values could be a source of bias in the ranked models.
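The two equalization types can be sketched as follows, assuming (as an illustration, not as the paper's implementation) that each vector is a dict from feature index to value. The function names and the `fill` parameter are hypothetical; `fill=0.0` corresponds to setting all missing values to zero.

```python
def equalize_intersection(x_j, x_k):
    """Intersection-type equalization: keep only features present in both vectors."""
    common = sorted(set(x_j) & set(x_k))
    return {i: x_j[i] for i in common}, {i: x_k[i] for i in common}


def equalize_union(x_j, x_k, fill=0.0):
    """Union-type equalization: keep all features, filling the missing ones."""
    union = sorted(set(x_j) | set(x_k))
    return ({i: x_j.get(i, fill) for i in union},
            {i: x_k.get(i, fill) for i in union})


# Made-up vectors: x_j has features 1 and 2, x_k has features 2 and 3.
x_j = {1: 2.0, 2: 5.0}
x_k = {2: 7.0, 3: 1.0}

xj_minus, xk_minus = equalize_intersection(x_j, x_k)   # only feature 2 survives
xj_plus, xk_plus = equalize_union(x_j, x_k)            # features 1, 2, 3, zero-filled
```

The example makes the trade-off visible: the intersection discards the measured values of features 1 and 3, while the union invents a zero for each of them in the vector where it was not measured.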

5. Ranked transformations of equalized and enlarged feature vectors

The difference r_j,k[n_j,k] of the equalized feature vectors x_k^~[n_j,k] and x_j^~[n_j,k] can be defined for each ranked relation 'O_j ≺ O_k' (2) by using the rule (12) or (13):

$$(\forall (j,k) \in J_p) \quad O_j \prec O_k \;\Rightarrow\; r_{j,k}[n_{j,k}] = \tilde{x}_k[n_{j,k}] - \tilde{x}_j[n_{j,k}] \tag{16}$$

The differential vector r_j,k[n_j,k] (16) belongs to the feature space F_j,k[n_j,k] defined by (12) or by (13). The dimension n_j,k of the vector r_j,k[n_j,k] (16) depends on the type of equalization of the feature spaces F_j[n_j] and F_k[n_k].

In order to design the ranked transformation (6), each differential vector r_j,k[n_j,k] (16) is enlarged to the full n-dimensional vector r_j,k[n], where r_j,k[n] ∈ F[n] and the equalized space F_j,k[n_j,k] of either type is a subspace of F[n]. The enlargement of the vector r_j,k[n_j,k] (16) to r_j,k[n] is done by putting zero for all those components of the vector r_j,k[n] which are not represented in the vector r_j,k[n_j,k] (zero-enlargement).

The set R (7) is now defined on the enlarged vectors r_j,k[n]:

$$R[n] = \{\, r_{j,k}[n] : (j,k) \in J_p \,\} \tag{17}$$

where J_p is the set of those pairs of indices (j, k) for which the ranked relation 'O_j ≺ O_k' (2) holds for the objects O_j and O_k.
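The construction of the set of enlarged differential vectors can be sketched as one small pipeline: equalize each pair, take the difference, and zero-enlarge the result to n components. The sketch assumes the intersection type of equalization and a dict-based storage of vectors. The names `differential_vector` and `build_R` and all numbers are hypothetical.

```python
def differential_vector(x_j, x_k, n):
    """Zero-enlarged difference of two vectors equalized by intersection.

    Components outside the common feature set are set to zero.
    """
    common = set(x_j) & set(x_k)
    return [x_k[i] - x_j[i] if i in common else 0.0 for i in range(1, n + 1)]


def build_R(X, J_p, n):
    """Collect the enlarged differential vectors for all ranked pairs."""
    return {(j, k): differential_vector(X[j], X[k], n) for (j, k) in J_p}


# Made-up vectors with missing values in a feature space of dimension n = 3.
X = {
    1: {1: 1.0, 2: 2.0},            # feature 3 missing
    2: {2: 5.0, 3: 1.0},            # feature 1 missing
    3: {1: 4.0, 2: 6.0, 3: 2.0},    # complete
}
J_p = [(1, 2), (2, 3)]
R = build_R(X, J_p, n=3)
```

Every vector in R then has the same length n, so a single weight vector w[n] can be tested against all of them.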

We will examine the possibility of representing (6) on the line (5) the ranked relations 'x_j^{^}[n] ≺ x_k^{^}[n]' between the enlarged vectors x_j^{^}[n] and x_k^{^}[n]:

$$(\exists w[n]) \; (\forall (j,k) \in J_p) \quad \hat{x}_j[n] \prec \hat{x}_k[n] \;\Rightarrow\; w[n]^T \hat{x}_j[n] < w[n]^T \hat{x}_k[n] \tag{18}$$

In accordance with Lemma 1, linear separability (8) of the set R[n] (17) is the necessary and sufficient condition for the implication (18).

Lemma 2: If the set R[n] (17) of the enlarged vectors r_j,k[n] is linearly separable (8) in the feature space F[n], then the implication (18) also holds for the vectors x_j^~[n_j,k] and x_k^~[n_j,k] with the equalized dimension n_j,k:

$$(\exists w[n]) \; (\forall (j,k) \in J_p) \quad \hat{x}_j[n] \prec \hat{x}_k[n] \;\Rightarrow\; w_{j,k}[n_{j,k}]^T \tilde{x}_j[n_{j,k}] < w_{j,k}[n_{j,k}]^T \tilde{x}_k[n_{j,k}] \tag{19}$$

where x_j^~[n_j,k] ∈ F_j,k[n_j,k] ((12) or (13)), and the parameter vector w_j,k[n_j,k] is obtained from the vector w[n] = [w₁, ..., w_n]^T (18) by removing those components w_i which are not represented by features x_i in the equalized vector x_j^~[n_j,k].

Proof: For each ranked relation 'O_j ≺ O_k' (2), the equalized feature vectors x_j^~[n_j,k] and x_k^~[n_j,k] are enlarged to the vectors x_j^{^}[n] and x_k^{^}[n] by including components equal to zero. As a result, the following equalities hold:

$$(\forall (j,k) \in J_p) \quad \begin{aligned}
w[n]^T \hat{x}_j[n] &= w_{j,k}[n_{j,k}]^T \tilde{x}_j[n_{j,k}] \\
w[n]^T \hat{x}_k[n] &= w_{j,k}[n_{j,k}]^T \tilde{x}_k[n_{j,k}]
\end{aligned} \tag{20}$$

The implication stated in Lemma 2 follows from these equalities.
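The equalities used in the proof say that zero-enlarged components contribute nothing to the inner product. A small numerical check, with made-up numbers and hypothetical helper names, could look like this:

```python
def enlarge(x_sparse, n):
    """Zero-enlargement: components not present in x_sparse become 0.0."""
    return [x_sparse.get(i, 0.0) for i in range(1, n + 1)]


def reduce_weights(w, index_set):
    """Keep only the weights w_i whose features appear in the equalized vector."""
    return {i: w[i - 1] for i in index_set}


def dot_sparse(w_sparse, x_sparse):
    """Inner product over the features of the equalized vector."""
    return sum(w_sparse[i] * x_sparse[i] for i in x_sparse)


n = 4
w = [0.5, -1.0, 2.0, 3.0]            # full weight vector w[n]
x_tilde = {2: 4.0, 3: 1.5}           # equalized vector with features 2 and 3

x_hat = enlarge(x_tilde, n)          # enlarged vector in the full space
w_jk = reduce_weights(w, x_tilde.keys())

lhs = sum(wi * xi for wi, xi in zip(w, x_hat))   # w[n]^T x_hat[n]
rhs = dot_sparse(w_jk, x_tilde)                  # reduced weights times equalized vector
```

Both sides evaluate to the same number, which is why ordering the enlarged vectors on the line also orders the equalized vectors.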

In accordance with Lemma 2, if the ranked relations 'x_j^{^}[n] ≺ x_k^{^}[n]' ((j, k) ∈ J_p) (4) form the sequential pattern P (6) of the enlarged vectors x_j^{^}[n], then the relations 'x_j^~[n_j,k] ≺ x_k^~[n_j,k]' form the pattern P of the equalized vectors x_j^~[n_j,k].

Let us remark that a given feature vector x_j[n_j] (x_j[n_j] ∈ F_j[n_j]) can be equalized and enlarged in different ways, depending on the ranked relation 'O_j ≺ O_k' (2) considered and on the type of equalization (Type E⁻ (12) or Type E⁺ (13)).

The implication (18) in the thesis of Lemma 2 is fulfilled both for the Type E⁻ (12) and for the Type E⁺ (13) equalization. The Type E⁻ (12) equalization of the vectors x_j[n_j] and x_k[n_k] (O_j ≺ O_k) does not introduce bias into the ranked model, but reducing the features x_i can cause a significant loss of information. The Type E⁺ (13) equalization preserves the information contained in all measured features x_i, but bias is introduced as a result of the artificial values c_k,i (14).

6. Convex and piecewise linear (CPL) criterion function defined on feature vectors with varied dimensionality

Let us define the penalty function φ_jk(w[n]) for each element (j, k) of the set J_p (2):

$$(\forall (j,k) \in J_p) \quad \varphi_{jk}(w[n]) = \begin{cases}
1 - w_{j,k}[n_{j,k}]^T r_{jk}[n_{j,k}] & \text{if } w_{j,k}[n_{j,k}]^T r_{jk}[n_{j,k}] \le 1 \\
0 & \text{if } w_{j,k}[n_{j,k}]^T r_{jk}[n_{j,k}] > 1
\end{cases} \tag{21}$$

where r_j,k[n_j,k] = (x_k^~[n_j,k] - x_j^~[n_j,k]) is the difference of the equalized vectors x_k^~[n_j,k] and x_j^~[n_j,k], and w_j,k[n_j,k] is the parameter vector obtained from the vector w[n] = [w₁, ..., w_n]^T by removing those components w_i which are not represented by features x_i in the vector r_j,k[n_j,k].

The criterion function Φ(w[n]) is the weighted sum of the penalty functions φ_jk(w[n]):

$$\Phi(w[n]) = \sum_{(j,k) \in J_p} \gamma_{jk} \, \varphi_{jk}(w[n]) \tag{22}$$

where γ_jk (γ_jk > 0) is a positive parameter (price) related to the ranked relation 'O_j ≺ O_k'.

The function Φ(w[n]) (22) has a structure similar to the perceptron criterion function used in the theory of neural networks and pattern recognition [2], [5]. The criterion function Φ(w[n]) (22) is convex and piecewise linear (CPL) as a sum of penalty functions φ_jk(w[n]) (21) of this type. Basis exchange algorithms, similar to linear programming, allow one to find the minimum of such a function efficiently, even in the case of large multidimensional data sets [6]:

$$\Phi^* = \Phi(w^*[n]) = \min_{w[n]} \Phi(w[n]) \ge 0 \tag{23}$$
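A minimal sketch of the penalty and criterion functions follows, working directly on zero-enlarged differential vectors (which gives the same inner products as the reduced vectors). The minimizer shown is a plain fixed-step subgradient descent, swapped in here only for illustration. It is not the basis exchange algorithm of [6], and a fixed step does not guarantee convergence in general. All names and numbers are assumptions.

```python
def dot(w, r):
    return sum(wi * ri for wi, ri in zip(w, r))


def penalty(w, r):
    """CPL penalty: 1 - w^T r when w^T r <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - dot(w, r))


def criterion(w, R, gamma=None):
    """Weighted sum of the penalties over all differential vectors in R."""
    gamma = gamma or [1.0] * len(R)
    return sum(g * penalty(w, r) for g, r in zip(gamma, R))


def minimize_subgradient(R, n, gamma=None, step=0.5, max_iter=1000):
    """Illustrative fixed-step subgradient descent (not basis exchange)."""
    gamma = gamma or [1.0] * len(R)
    w = [0.0] * n
    for _ in range(max_iter):
        if criterion(w, R, gamma) == 0.0:
            break
        # Each active term (w^T r < 1) has subgradient -gamma * r.
        active = [(g, r) for g, r in zip(gamma, R) if dot(w, r) < 1.0]
        w = [wi + step * sum(g * r[i] for g, r in active)
             for i, wi in enumerate(w)]
    return w


# Made-up enlarged differential vectors in a 2-dimensional feature space.
R = [[1.0, 0.0], [1.0, 2.0]]
w_star = minimize_subgradient(R, n=2)
```

On this toy set one update from the zero vector already reaches a point where both penalties vanish. For real data an exact method such as linear programming or the basis exchange technique cited in the text is the appropriate tool.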

Lemma 3: The minimal value Φ(w*[n]) (23) of the criterion function Φ(w[n]) (22) is equal to zero if and only if the linear transformation y = (w[n])^T x[n] preserves (18) all the ranked relations 'x_j^~[n_j,k] ≺ x_k^~[n_j,k]' between the equalized vectors x_j^~[n_j,k] and x_k^~[n_j,k].

Proof: In accordance with Lemma 2, the ranked relations 'x_j^~[n_j,k] ≺ x_k^~[n_j,k]' between the equalized vectors are equivalent (20) to the ranked relations 'x_j^{^}[n] ≺ x_k^{^}[n]' between the enlarged vectors x_j^{^}[n] and x_k^{^}[n]. We infer on the basis of Lemma 1 that the ranked relations 'x_j^{^}[n] ≺ x_k^{^}[n]' ((j, k) ∈ J_p (2)) (4) form the sequential pattern P (6) in the n-dimensional feature space F[n] (Def. 1) if and only if the set R^{^}[n] (17) is linearly separable (8). In this case there exists a weight vector w[n] such that the hyperplane H(w[n]) (9) separates (8) the set R^{^}[n] (17), and all the ranked relations 'x_j^{^}[n] ≺ x_k^{^}[n]' ((j, k) ∈ J_p (2)) are preserved (18) by the model y_j = (w[n])^T x_j^{^}[n]. By taking a sufficiently large constant c (c > 0), we can ensure that all the inequalities (c w_j,k[n_j,k])^T r_jk[n_j,k] > 1 (21) are fulfilled. As a result, all the penalty functions φ_jk(c w[n]) (21) are equal to zero at the point c w[n], and the value of the criterion function Φ(c w[n]) (22) is also equal to zero.

On the other hand, if the minimal value Φ(w*[n]) (23) is greater than zero (Φ(w*[n]) > 0), then at least one penalty function (21) has a value φ_jk(w*[n]) greater than zero at the optimal point w*[n] (23). This means that the set R^{^}[n] (17) is not linearly separable (8) and that not all the ranked relations 'x_j^{^}[n] ≺ x_k^{^}[n]' are preserved by the model y_j = (w[n])^T x_j^{^}[n].

In accordance with Lemma 3, if no linear transformation y = w^T x can preserve all the ranked relations 'x_j^~[n_j,k] ≺ x_k^~[n_j,k]' (18), then Φ* > 0 (23). The linear transformation y = (w*)^T x defined by the optimal vector w* (23) is called the ranked model.
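The scaling step in the proof of Lemma 3 can be illustrated numerically: a weight vector that separates the differential vectors with small margins still has a positive criterion value, and multiplying it by a large enough constant c drives the criterion to zero. The choice c = 2 / (smallest margin) below is just one convenient constant satisfying the requirement. The names and numbers are assumptions.

```python
def dot(w, r):
    return sum(wi * ri for wi, ri in zip(w, r))


def criterion(w, R):
    """Unweighted CPL criterion: sum of max(0, 1 - w^T r) over R."""
    return sum(max(0.0, 1.0 - dot(w, r)) for r in R)


def scale_to_zero_penalty(w, R):
    """Return (c, c*w) such that (c*w)^T r > 1 for every r in R."""
    smallest = min(dot(w, r) for r in R)
    if smallest <= 0:
        raise ValueError("w does not separate R, so no positive c can help")
    c = 2.0 / smallest          # any c > 1 / smallest would do
    return c, [c * wi for wi in w]


# Made-up enlarged differential vectors and a separating but "short" w.
R = [[1.0, 0.0], [1.0, 2.0]]
w = [0.25, 0.0]                 # margins are 0.25 and 0.25, both positive

before = criterion(w, R)        # penalties 0.75 + 0.75
c, cw = scale_to_zero_penalty(w, R)
after = criterion(cw, R)
```

Scaling changes neither the direction of the ranked line nor the order of the projected points, only the unit in which the margins are measured.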

7. Example: gradual enlargement of the feature space related to the causal sequence of liver diseases

A causal sequence of events can also provide the basis for ranked modeling. An example is the modeling of the causal sequence of chronic liver diseases ω_k (k = 1, ..., K):

$$\omega_1 \to \omega_2 \to \ldots \to \omega_K \tag{24}$$

The symbol 'ω_i → ω_i+1' in the above sequence means that the disease ω_i+1 of a given patient O_j resulted from his or her earlier disease ω_i, or that ω_i+1 is a consequence of the disease ω_i (i = 1, ..., K-1). The sequence (24) should be formed in accordance with medical knowledge [3].

The ranked model of the causal sequence (24) was built with the use of the Hepar system database [3]. About 800 feature vectors x_j(k) describing particular patients O_j(k) related to one of seven (K = 7) chronic liver diseases ω_k have been extracted from this database: ω₁ - non hepatitis patients; ω₂ - hepatitis acuta; ω₃ - hepatitis persistens; ω₄ - hepatitis chronica activa; ω₅ - cirrhosis hepatis compensata; ω₆ - cirrhosis decompensata; ω₇ - carcinoma hepatis. The sets of feature vectors (examples, prototypes) x_j(k) related to particular diseases ω_k formed the so-called learning sets C_k:

$$(\forall k \in \{1, \ldots, K\}) \quad C_k = \{\, x_j(k) : j \in J_k \,\} \tag{25}$$

where J_k is the set of m_k indices j of those feature vectors x_j(k) which are related to the disease ω_k: m₁ = 16; m₂ = 8; m₃ = 44; m₄ = 95; m₅ = 38; m₆ = 60; m₇ = 11.

The feature vectors x_j(k) in the database of the Hepar system are of the mixed, qualitative-quantitative type. They contain both symptoms and signs (x_i ∈ {0, 1}) and the numerical results of laboratory tests (x_i ∈ R). About 200 different features x_i describe one patient's case in this system. For the purpose of preliminary computations, each patient has been described by the feature vector x_j(k) composed of about 40 features chosen as a standard by medical doctors.

The causal sequence (24) also allows one to determine the ranked relation 'x_j(k) ≺ x_j'(k')' between the feature vectors x_j(k) (x_j(k) ∈ C_k) representing patients O_j(k) assigned to particular diseases ω_k:

$$(\forall k, k' \in \{1, \ldots, K\}) \; (\forall x_j(k) \in C_k) \text{ and } (\forall x_{j'}(k') \in C_{k'}): \quad \text{if } \omega_k \to \omega_{k'}, \text{ then } x_j(k) \prec x_{j'}(k') \tag{26}$$

Let us remark that, in accordance with the above rule, there are no ranked relations 'x_j(k) ≺ x_j'(k)' between patients O_j(k) and O_j'(k) assigned to the same disease ω_k.

The causal sequence (24) represents the process of development and transformation of liver diseases from the lightest to the most serious state. It is natural to assume that patients with more serious diseases should be examined more extensively than patients with light diseases. On this basis, the following scheme of gradual enlargement of the feature spaces F_k[n_k], consistent with the sequence (24), is assumed here:

$$F_1[n_1] \subset F_2[n_2] \subset \ldots \subset F_K[n_K] = F[n] \tag{27}$$

where F_k[n_k] is the n_k-dimensional feature space appropriate (standard) for the k-th disease ω_k.

Let us remark that in all the ranked relations 'x_j(k) ≺ x_j'(k')' consistent with the rules (26) and (27), the vector x_j'(k') is represented by all the features x_i of the vector x_j(k) and possibly also by some other features:

$$x_j(k) \prec x_{j'}(k') \;\Rightarrow\; F_k[n_k] \subset F_{k'}[n_{k'}] \tag{28}$$

The Type E⁻ (12) equalization of the vectors x_j(k) and x_j'(k') is recommended for the case (28). It allows one to define the differential vectors r_j,j'[n_k] for all the relations 'x_j(k) ≺ x_j'(k')' (26) in the following manner:

$$\text{if } x_j(k) \prec x_{j'}(k'), \text{ then } r_{j,j'}[n_k] = x_{j'}[n_k] - x_j[n_k] \tag{29}$$

where x_j[n_k] = x_j(k) and x_j'[n_k] is obtained from the vector x_j'(k') by neglecting those features x_i which are not represented in the vector x_j(k) (i ∉ I_k). In the case (29) the equalized feature space is equal to F_k[n_k] (28) with the dimension n_k.

The rule (29) allows one to define the positive penalty function φ_j,j'(w[n]) (21) for each ranked relation 'x_j(k) ≺ x_j'(k')' (26). The penalty functions φ_j,j'(w[n]) (21) are defined by the enlargement of the vector r_j,j'[n_k] (29) to the vector r_j,j'[n] in the n-dimensional space F[n]. The enlargement of the vector r_j,j'[n_k] (29) to r_j,j'[n] is done by putting zero for all those components of the vector r_j,j'[n] which are not represented in the vector r_j,j'[n_k]. The vector r_j,j'[n_k'] is enlarged to r_j,j'[n] in a similar manner. As a consequence, the ranked linear model can be defined by the optimal vector w*[n] (23) constituting the minimum of the criterion function Φ(w[n]) (22) in the n-dimensional feature space F[n]:

$$y_j(k) = (w^*[n])^T \hat{x}_j(k) \tag{30}$$

where the enlarged vector x_j^{^}(k) (x_j^{^}(k) ∈ F[n]) is obtained from the n_k-dimensional feature vector x_j(k) (x_j(k) ∈ F_k[n_k]) by putting zero (x_i = 0) for all those components x_i of the vector x_j^{^}(k) which are not represented in the vector x_j(k).

The ranked model (30) can be used, among others, for prognosis or classification purposes. Let us consider the n₀-dimensional vector x₀ (x₀ ∈ F[n₀]) representing a patient O₀ with an unknown disease ω_k (k = 1, ..., K). The ranked model (30) allows one to assign the point y₀ to the vector x₀:

$$y_0 = (w^*[n])^T \hat{x}_0 \tag{31}$$

where x_0^{^} is the enlarged vector obtained from the n₀-dimensional vector x₀.

The K-nearest neighbors (K-NN) decision rule can be used on the ranked model (30) for assigning some disease ω_k(0) to the patient O₀ represented by the vector x₀. For this purpose we select the disease ω_k(0) which is most frequently represented among the K points y_j(k) (30) nearest to y₀. The points y_j(k) (30) nearest to y₀ (31) are characterized by the smallest absolute values |y_j(k) - y₀|.
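The decision rule above can be sketched as follows. The weight vector, the projected learning points and the new patient's vector are made-up values, and the function names are hypothetical. To avoid confusion with the number of diseases K, the number of neighbours is called `k` in the code.

```python
from collections import Counter


def project(w, x_sparse):
    """Point on the ranked line for a zero-enlarged vector.

    x_sparse maps feature index (1-based) to value, and unmeasured features
    are simply absent, which is equivalent to enlarging with zeros.
    """
    return sum(w[i - 1] * value for i, value in x_sparse.items())


def knn_on_line(y0, labelled_points, k):
    """Majority label among the k learning points closest to y0 on the line."""
    nearest = sorted(labelled_points, key=lambda p: abs(p[0] - y0))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical optimal weights and projected learning points (y, disease index).
w_star = [1.0, 0.5, 2.0]
points = [(0.1, 1), (0.3, 1), (1.2, 2), (1.4, 2), (1.5, 2), (3.0, 3)]

x0 = {1: 0.5, 2: 1.0}             # third feature was not measured
y0 = project(w_star, x0)
disease = knn_on_line(y0, points, k=3)
```

Because all points lie on a line, the nearest neighbours could also be found by binary search in a sorted list, which matters only for large learning sets.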

The K-nearest neighbors (K-NN) decision rule based on the points y_j(k) (30) and y₀ (31) can be applied generally to feature vectors x_j[n_j] (1) with missing values, and not only to the vectors x_j(k) (28) from the successively enlarged feature spaces F_k[n_k] (27).

8. Concluding remarks

The ranked linear models (30) can be designed on the basis of additional knowledge in the form of the ranked relations 'O_j ≺ O_k' (2), which are determined within selected pairs of objects (patients) O_j. The objects O_j can be represented by feature vectors x_j[n_j] (1) taken from varied feature spaces F_j[n_j] (x_j[n_j] ∈ F_j[n_j]). The variability of the feature spaces F_j[n_j] may result, first of all, from missing values in the feature vectors x_j[n_j]. The ranked relations 'O_j ≺ O_k' between the objects O_j and O_k are then represented by relations between the feature vectors x_j[n_j] and x_k[n_k]. In some cases the relations 'x_j[n_j] ≺ x_k[n_k]' can be well preserved (6) by the ranked model in the form of the linear transformation y = w[n]^T x[n] (5) from the n-dimensional feature space F[n] (x[n] ∈ F[n]) onto the line (y ∈ R). In accordance with Lemma 1, the linear transformation y = w[n]^T x[n] (5) preserves (6) all the ranked relations 'x_j[n] ≺ x_k[n]' if and only if the set R (7) of the differential vectors r_jk[n] = x_k[n] - x_j[n] is separated (8) by the hyperplane H(w[n]) (9) in the n-dimensional feature space F[n]. The ranked linear model (5) can be designed by minimizing the CPL criterion function Φ(w[n]) (22).

The ranked relations 'O_j ≺ O_k' (2) can be represented in some cases by the relations 'x_j[n_j] ≺ x_k[n_k]' (4) between feature vectors from different feature spaces F_j[n_j] (x_j[n_j] ∈ F_j[n_j]) and F_k[n_k] (x_k[n_k] ∈ F_k[n_k]). A two-stage procedure has been proposed in the paper for designing ranked models in such a case. During the first stage, the equalization of the feature spaces F_j[n_j] and F_k[n_k] is carried out separately for each relation 'O_j ≺ O_k' ((j, k) ∈ J_p (2)). This stage results in the common feature space F_j,k[n_j,k], defined by (12) or by (13), with the equalized feature vectors x_j^~[n_j,k]. The common feature space allows one to define the differential vectors r_j,k[n_j,k] = x_k^~[n_j,k] - x_j^~[n_j,k] (16). During the second stage, each differential vector r_j,k[n_j,k] (16) is enlarged to the full n-dimensional vector r_j,k[n]. The enlargement of the vector r_j,k[n_j,k] (16) to r_j,k[n] is done by putting zero for all those components of the vector r_j,k[n] which are not represented in the vector r_j,k[n_j,k] (zero-enlargement). Preservation (6) of the relations 'x_j^~[n_j,k] ≺ x_k^~[n_j,k]' between the equalized feature vectors by the ranked model y_j(k) = (w*[n])^T x_j^{^}(k) (30) has been linked to the linear separability (8) of the set R[n] (17) composed of the enlarged vectors r_j,k[n].

The ranked models (30) designed on the basis of the feature vectors x_j[n_j] with varied dimensionality n_j can be used, among others, for decision (diagnosis) support. The K-NN decision rule based on the point y₀ (31) of unknown origin and the nearest points y_j(k) (30) assigned to particular learning sets C_k (25) can be used for this purpose. Statistical properties of the ranked models (30) based on the ranked relations 'x_j[n_j] ≺ x_k[n_k]' (4) between feature vectors x_j[n_j] with varied dimensionality n_j need further study.

Acknowledgement: The author would like to thank Professor Jan Bemmel from the Erasmus University in Rotterdam for his intriguing question.

Bibliography

1. Johnson R. A., Wichern D. W.: Applied Multivariate Statistical Analysis, Prentice-Hall, Inc., Englewood Cliffs, New York, 1991

2. Duda R. O., Hart P. E., Stork D. G.: Pattern Classification, J. Wiley, New

3. Bobrowski L., Łukaszuk T., Wasyluk H.: Ranked modeling of liver diseases sequence, European Journal of Biomedical Informatics

4. Bobrowski L.: Ranked modelling with feature selection based on the CPL criterion functions, in: Machine Learning and Data Mining in Pattern Recognition, eds. P. Perner et al., Lecture Notes in Computer Science vol. 3587, Springer Verlag, Berlin, 2005

5. Bobrowski L.: Eksploracja danych oparta na wypukłych i odcinkowo-liniowych funkcjach kryterialnych (Data mining based on convex and piecewise linear (CPL) criterion functions) (in Polish), Białystok Technical University, 2005

6. Bobrowski L.: Design of piecewise linear classifiers from formal neurons by some basis exchange technique, Pattern Recognition, 24(9), pp. 863-870, 1991

This work is a part of the Polish - Romanian agreement on Scientific Cooperation between Romanian Academy and Polish Academy of Sciences. The work was partially financed by the KBN grant 3T11F01130, by the grant 16/St/2008 from the Institute of Biocybernetics and Biomedical Engineering PAS, and by the grant W/II/1/2008 from the Białystok University of Technology.
