Conceptual Vectors

Operators and Functions

Generality

Vector. A vector V is an n-tuple. Each value is mathematically defined between 0 and 1.

Practically, each component vi is defined between 0 and encoding-size. Note that the choice of the encoding-size is theoretically without consequence, as there is a bijection between any interval of R and [0, 1]. From a practical point of view, however, there is a tradeoff between the cardinal of the discrete set and the representation size. In our prototype, we set the encoding-size to 2^15.

When a vector has a mathematical norm equal to 1, its implementation norm is equal to encoding-size.

n is the dimension of V, i.e. the number of concepts over which it is defined.

Star metaphor. When meaningful, we will use the star metaphor to illustrate our concepts and functions. This metaphor often guided us when naming functions. Basically, the metaphor is the following:

A meaning is a point in the sense space, and a vector is the coordinate set of this meaning.

A star is a known word as found in dictionaries. It is a visible point that "attracts" attention. Its position is defined by its meanings, which are not visible (but can be guessed).

The space contains constellations of words. Some parts of the space are dense with stars, others are quite empty.

The origin point (all coordinates equal to 0) is the origin of the space. Unless otherwise stated, stars are viewed from this point, and as such can be compared by angular distance (but also by Euclidean or other distance functions).

Basic Properties

Norm. The implementation norm of vector V.

Norm(V) = sqrt(v1^2 + … + vn^2).

Mass. The Mass is the mathematical norm of V usually noted |V|.

Mass(V) = Norm(V)/encoding-size

If the mass is equal to 1, then V is normalized. It may be valued otherwise, especially with the term-to-term product. A vector with a high mass "weighs" more in vector operations and tends to attract other meanings.

Dim. The dimension of vector V, i.e. the value of n.

Basic Vector operations

Normalization. V = N(X) is the normalization function of a vector X. The resulting vector V is normalized (its implementation norm equals 1 * encoding-size). Each vi of V is equal to (xi / Norm(X)) * encoding-size, i.e. xi / |X|.
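As a minimal sketch of the two norms and the normalization function, assuming vectors are plain Python lists (the constant and function names are ours, not the prototype's):

```python
import math

ENCODING_SIZE = 2 ** 15  # the encoding-size used in the prototype

def norm(v):
    """Implementation norm: sqrt(v1^2 + ... + vn^2)."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """N(X): scale X so that its implementation norm equals encoding-size."""
    n = norm(v)
    return [x / n * ENCODING_SIZE for x in v]
```

After normalization, the mass Norm(V)/encoding-size of the result is 1.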

Sum. V = X1 + … + Xm is the generalized sum of m vectors X1, … , Xm. Component vi of V is equal to x1,i + … + xm,i. Generally the result V is normalized, so we have:

V = N(X1 + … + Xm)

Product. V = X1 * … * Xm is the generalized term-to-term product of m vectors. Component vi of V is equal to x1,i * … * xm,i. Generally the result V is NOT normalized, as Mass(sqrtm(V)) is generally a good indicator of similarity.

n-Root. V = sqrtn(X) is the nth root applied to each component of X. This is the usual normalizing function applied to the result of a term-to-term product. The resulting mass may be < 1. Component vi of V is equal to the nth root of xi.

n-Power. V = pown(X) is the nth power applied to each component of X, a generalization of the function above. Obviously, we have pow1/n(X) = sqrtn(X). By definition, if n = 0, all components are set to 1 (even when a component equals 0, although 0^0 is not defined).

Amplification. V = Ampn(X) is the signed nth power applied to each component of X. Component vi of V is equal to |xi|^n * sign(xi). [Note that sign(x) = -1 iff x < 0, sign(x) = 1 iff x > 0, and sign(x) = 0 iff x = 0.]
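The component-wise operations above can be sketched as follows; the function names are ours, and vectors are assumed to be Python lists:

```python
import math

def tt_product(*vectors):
    """X1 * ... * Xm: generalized term-to-term product."""
    return [math.prod(comps) for comps in zip(*vectors)]

def nth_root(v, n):
    """sqrt_n(X): nth root of each component."""
    return [x ** (1.0 / n) for x in v]

def nth_power(v, n):
    """pow_n(X): nth power of each component; pow_0 sets every term to 1."""
    return [1.0 if n == 0 else x ** n for x in v]

def amplify(v, n):
    """Amp_n(X): signed nth power, |x|^n * sign(x)."""
    return [math.copysign(abs(x) ** n, x) if x != 0 else 0.0 for x in v]
```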

Diff. Difference V = X - Y is the difference between two vectors. Component vi of V is equal to xi - yi. The resulting V is generally normalized, so in practice we have:

V = N(X - Y)

DDiff. Dotted Difference V = X \ Y is the dotted difference between two vectors. Component vi of V is equal to xi - yi if xi - yi > 0, and 0 otherwise. The resulting V is generally normalized, so in practice we have:

V = N(X \ Y)

Comp. Complementary V = C(X) is the complementary function of a vector. Component vi of V is equal to (encoding-size - xi). The resulting vector V is generally normalized, so in practice we have:

V = N(C(X))
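A minimal sketch of Diff, DDiff, and Comp (before the final normalization step), again assuming list vectors and the prototype's encoding-size of 2^15:

```python
ENCODING_SIZE = 2 ** 15  # assumed encoding-size, as in the prototype

def diff(x, y):
    """X - Y: component-wise difference."""
    return [xi - yi for xi, yi in zip(x, y)]

def ddiff(x, y):
    """X \\ Y: keep a component difference only when it is positive."""
    return [xi - yi if xi - yi > 0 else 0.0 for xi, yi in zip(x, y)]

def comp(x):
    """C(X): complement of each component with respect to encoding-size."""
    return [ENCODING_SIZE - xi for xi in x]
```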

Statistical functions

Mean. The arithmetic mean of the vi. Mean(V) = (v1 + … + vn) / n.

Var. Variance of the vi. Var(V) = ((v1 - Mean(V))^2 + … + (vn - Mean(V))^2) / n. With a flat vector V (all components equal), we have Var(V) = 0.

SD. Standard deviation between vi. SD(V) = sqrt(Var(V)).

VC. Variation Coefficient = SD/Mean. Meaningful when all vi are > 0.
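The three indicators can be sketched directly from their definitions (population variance, i.e. division by n):

```python
import math

def mean(v):
    """Arithmetic mean of the components of V."""
    return sum(v) / len(v)

def var(v):
    """Population variance of the components of V."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def vc(v):
    """Variation coefficient SD/Mean; meaningful when all vi > 0."""
    return math.sqrt(var(v)) / mean(v)
```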

An interesting threshold value is 1 (?). The higher the value, the more "conceptual" the vector is. It is a good indicator of hyperonymy. Precisely, we have:

If X is a hyperonym of Y, then we (statistically) have VC(X) > VC(Y).

Naturally, the reverse also holds: if X is a hyponym of Y, then we (statistically) have VC(X) < VC(Y).

We DO NOT HAVE the converse: if VC(X) > VC(Y) then …


Other Vector functions

Inter. (Intersection function) V = Inter(X, Y):

V = sqrt2(X * Y)

Gamma. (Contextualization function) V = C(X, Y):

V = N(X + Inter(X, Y))

Anti-Gamma. (Anti-Contextualization function) V = AntiC(X, Y):

V = N(X - Inter(X, Y))

Note: in both cases (C and AntiC), the value of V is basically X plus (or minus) the intersection of X and Y. The value Mass(Inter(X, Y)) is the determining factor (this mass is always between 0 and 1).
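These three functions compose directly from the earlier operations; a minimal sketch, assuming list vectors, non-negative components, and encoding-size 2^15 (function names are ours):

```python
import math

ENCODING_SIZE = 2 ** 15  # assumed encoding-size, as in the prototype

def normalize(v):
    """N(X): scale to an implementation norm of encoding-size."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n * ENCODING_SIZE for c in v]

def inter(x, y):
    """Inter(X, Y) = sqrt_2(X * Y): geometric mean of each component pair."""
    return [math.sqrt(xi * yi) for xi, yi in zip(x, y)]

def gamma(x, y):
    """C(X, Y) = N(X + Inter(X, Y)): X pulled towards what it shares with Y."""
    return normalize([xi + ii for xi, ii in zip(x, inter(x, y))])

def anti_gamma(x, y):
    """AntiC(X, Y) = N(X - Inter(X, Y)): X pushed away from what it shares with Y."""
    return normalize([xi - ii for xi, ii in zip(x, inter(x, y))])
```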


Request computing methods

A request is a set of words w1, … , wp. To each word wi is associated a vector Xi as found in the lexicon.

Sum. The resulting vector is the center of gravity of all given vectors.

V = Sum(X1, … , Xp)
= N(X1 + … + Xp)

Sum-product. The resulting vector is the center of gravity plus the common intersection.

V = SumP(X1, … , Xp)
= N( N(X1 + … + Xp) + N(sqrtp(X1 * … * Xp)) )
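The two request vectors can be sketched as follows (list vectors, encoding-size 2^15; names are ours). Note that SumP normalizes the sum and the p-th root of the product separately before combining them:

```python
import math

ENCODING_SIZE = 2 ** 15  # assumed encoding-size, as in the prototype

def normalize(v):
    """N(X): scale to an implementation norm of encoding-size."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n * ENCODING_SIZE for c in v]

def request_sum(vectors):
    """Sum: normalized centre of gravity of the request vectors."""
    return normalize([sum(comps) for comps in zip(*vectors)])

def request_sum_product(vectors):
    """SumP: centre of gravity reinforced by the common intersection."""
    p = len(vectors)
    s = normalize([sum(comps) for comps in zip(*vectors)])
    i = normalize([math.prod(comps) ** (1.0 / p) for comps in zip(*vectors)])
    return normalize([a + b for a, b in zip(s, i)])
```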

Static Resonance. The request vector is the normalized sum of the applications of the resonance function between each vector and the others.

V = StaticResm(X1, … , Xp)
= N( Res(X1, X2 + … + Xp) + … + Res(Xp, X1 + … + Xp-1) )

This function is static (as opposed to dynamic) because only V is adjusted and the Xi are fixed. The static resonance function Res, as computed for step n, is defined recursively as follows:

step 0: Res0(Y, X1, … , Xp) = Y

step n+1: Resn+1(Y, X1, … , Xp) = Resn(Y, X1, … , Xp) + sqrt2(Resn(Y, X1, … , Xp) * X1) + … + sqrt2(Resn(Y, X1, … , Xp) * Xp).

Dynamic Resonance. Static resonance (m steps) is applied n times to each Xi.

V = DynResm(X1, … , Xp)

step 0: Xi(0) = StaticResm(Xi, X1, … , Xi-1, Xi+1, … , Xp)

step n+1: Xi(n+1) = StaticResm(Xi(n), X1(n), … , Xi-1(n), Xi+1(n), … , Xp(n))

Both Resonance functions are deemed to be convergent.
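Both resonance loops can be sketched as below. This is our reading of the recursive definition (step 0 starting from Y, components assumed non-negative); the function names and the interpretation of the argument lists are ours:

```python
import math

def static_resonance(y, others, m):
    """Res_m(Y, X1, ..., Xp): m reinforcement steps of Y by the Xk,
    each step adding sqrt_2(Res_n * Xk) term to term for every Xk."""
    v = list(y)
    for _ in range(m):
        nxt = list(v)
        for x in others:
            nxt = [a + math.sqrt(vi * xi) for a, vi, xi in zip(nxt, v, x)]
        v = nxt
    return v

def dynamic_resonance(vectors, m, n):
    """DynRes: apply static resonance (m steps) to every vector, n times;
    each round uses the vectors from the previous round."""
    vs = [list(v) for v in vectors]
    for _ in range(n):
        vs = [static_resonance(vs[i], vs[:i] + vs[i + 1:], m)
              for i in range(len(vs))]
    return vs
```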

Last update: 17 April 2001
Mathieu Lafourcade, LIRMM - 161, rue ADA - 34392 Montpellier Cedex 5 - France - Tel: (33) 04 67 41 85 71 - Fax: (33) 04 67 41 85 00 - email: lafourca@lirmm.fr