Objectives
Semantic Analysis
Word Sense Disambiguation
Text Indexing in IR
Lexical Transfer in MT
Conceptual vector
Reminiscent of Vector Models (Salton, Sowa, LSI)
Applied to pre-selected concepts (not terms)
Concepts are not independent
Propagation
over the morpho-syntactic tree (no surface analysis)

Conceptual vectors
An idea
= a combination of concepts = a vector
The Idea space
= vector space
A concept
= an idea = a vector
= combination of itself + neighborhood
Sense space
= vector space + vector set

Conceptual vectors
Annotations
Help in building vectors
Can take the form of vectors
Set of k basic concepts: example
Larousse thesaurus = 873 concepts
A vector = an 873-tuple
Encoding for each dimension: C = 2^15

Vector construction
Concept vectors
H: thesaurus hierarchy
V(Ci) = <a1, …, aj, …, an>
aj = 1 / 2^Dum(H, i)
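The slide leaves Dum implicit; a minimal sketch, assuming Dum gives, for each dimension j, the distance in the hierarchy H between concept i and concept j, so that components decay geometrically with hierarchy distance. The helper hierarchy_distance and the toy 4-concept space are hypothetical:

```python
import numpy as np

def concept_vector(i, k, hierarchy_distance):
    """Build V(Ci): component a_j = 1 / 2^d, with d the assumed hierarchy
    distance between concept i and the concept of dimension j."""
    return np.array([1.0 / 2 ** hierarchy_distance(i, j) for j in range(k)])

# Toy hierarchy where distance is |i - j| (illustration only).
print(concept_vector(0, 4, lambda i, j: abs(i - j)))  # [1. 0.5 0.25 0.125]
```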

Vector construction
Concept vectors
C: mammals
L4: zoology, mammals, birds, fish, …
L3: animals, plants, living beings
L2: …, time, movement, matter, life, …
L1: society, mankind, the world

Vector construction
Concept vectors
Vector construction
Term vectors
Example: cat
Kernel
c:mammal, c:stroke
v(cat) = v(mammal) + v(stroke)
Augmented with weights
c:mammal, c:stroke, 0.75*c:zoology, 0.75*c:love, …
v(cat) = v(mammal) + v(stroke) + 0.75 v(zoology) + 0.75 v(love) + …
Iteration for neighborhood augmentation
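A hedged sketch of this kernel construction: the term vector is a weighted sum of concept vectors, which the neighborhood iteration then extends. The 3-dimensional concept vectors below are toy values, not the 873-dimensional Larousse ones:

```python
import numpy as np

# Toy 3-dimensional concept vectors (the real space has 873 dimensions).
concepts = {
    "mammal":  np.array([1.0, 0.2, 0.0]),
    "stroke":  np.array([0.0, 1.0, 0.3]),
    "zoology": np.array([0.8, 0.1, 0.1]),
    "love":    np.array([0.1, 0.9, 0.2]),
}

def term_vector(kernel):
    """Weighted sum of concept vectors, e.g. v(cat) = v(mammal) + v(stroke) + ..."""
    return sum(w * concepts[c] for c, w in kernel)

v_cat = term_vector([("mammal", 1.0), ("stroke", 1.0),
                     ("zoology", 0.75), ("love", 0.75)])
```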

Vector construction
Term vectors
Vector space
Basic concepts are not independent
Sense space
= generator space of a real k′ vector space (unknown)
= dim k′ ≤ k
Relative position of points

Conceptual vector distance
Angular distance DA(x, y) = angle(x, y)
0 ≤ DA(x, y) ≤ π
if 0, then collinear: same idea
if π/2, then nothing in common
if π, then DA(x, −x) with −x the anti-idea of x

Conceptual vector distance
Distance = acos(similarity)
DA(x, y) = acos(x·y / (|x| |y|))
DA(x, x) = 0
DA(x, y) = DA(y, x)
DA(x, y) + DA(y, z) ≥ DA(x, z)
DA(0, 0) = 0 and DA(x, 0) = π/2 by definition
DA(ax, by) = DA(x, y) with ab > 0
DA(ax, by) = π − DA(x, y) with ab < 0
DA(x+x, x+y) = DA(x, x+y) ≤ DA(x, y)
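A minimal implementation of DA, including the two conventions from the slide (DA(0,0) = 0 and DA(x,0) = π/2):

```python
import numpy as np

def DA(x, y):
    """Angular distance DA(x, y) = acos(x.y / (|x||y|)), in [0, pi]."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0.0 and ny == 0.0:
        return 0.0               # DA(0, 0) = 0 by definition
    if nx == 0.0 or ny == 0.0:
        return np.pi / 2         # DA(x, 0) = pi/2 by definition
    c = np.dot(x, y) / (nx * ny)
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

x, y = np.array([1.0, 0.0]), np.array([1.0, 1.0])
assert DA(x, x) == 0.0 and abs(DA(x, y) - np.pi / 4) < 1e-9
assert abs(DA(x, -x) - np.pi) < 1e-9   # -x is the "anti-idea" of x
```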

Conceptual vector distance
Example
DA(tit, tit) = 0
DA(tit, passerine) = 0.4
DA(tit, bird) = 0.7
DA(tit, train) = 1.14
DA(tit, insect) = 0.62
tit = a kind of insectivorous passerine …

Conceptual lexicon
Set of (word, vector) pairs = (w, v)*
Monosemy
word
→ 1 meaning
→ 1 vector
(w, v)

Conceptual lexicon
Polyseme building
Polysemy
word
→ n meanings
→ n vectors
{(w, v), (w.1, v.1), …, (w.n, v.n)}

Conceptual lexicon
 Polyseme building
v(w) = Σ v(w.i) = Σ v.i
bank:
bank.1: mound
bank.2: river border, …
bank.3: money institution
bank.4: organ keyboard
…

Conceptual lexicon
 Polyseme building
v(w) = classification(w.i)

Lexical scope
LS(w) = LSt(t(w))
LSt(t(w)) = 1 if t is a leaf
LSt(t(w)) = (LS(t1) + LS(t2)) / (2 − sin(D(t(w)))) otherwise
v(w) = vt(t(w))
vt(t(w)) = v(w) if t is a leaf
vt(t(w)) = LS(t1)·vt(t1) + LS(t2)·vt(t2) otherwise

Vector Statistics
Norm (N)
in [0, 1] · C (C = 2^15 = 32768)
Intensity (I)
I = Norm / C
Usually I = 1
Standard deviation (SD)
SD² = variance
variance = (1/n) Σ (xi − m)², with m the arithmetic mean

Vector Statistics
Variation coefficient  (CV)
CV = SD / mean
No unit; norm-independent
Pseudo conceptual strength
If A is a hypernym of B ⇒ CV(A) > CV(B)
(the converse ⇐ does not hold)
vector "fruit juice" (N)
mean = 527, SD = 973, CV = 1.88
vector "drink" (N)
mean = 443, SD = 1014, CV = 2.28
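These statistics are direct to compute; a sketch with toy component values (C = 2^15 as on the slides):

```python
import numpy as np

C = 2 ** 15  # encoding bound per dimension (32768)

def statistics(v):
    mean, sd = v.mean(), v.std()   # variance = 1/n * sum((xi - mean)^2)
    return {"norm": np.linalg.norm(v),
            "intensity": np.linalg.norm(v) / C,   # I = Norm / C
            "sd": sd,
            "cv": sd / mean}       # variation coefficient: unit-free

print(statistics(np.array([527.0, 100.0, 1500.0, 30.0])))  # toy components
```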

Vector operations
Sum
V = X + Y ⇒ vi = xi + yi
Neutral element: 0
Generalized to n terms: V = Σ Vi
Normalization of the sum: vi ← vi / |V| · C

Vector operations
Term-to-term product
V = X ⊗ Y ⇒ vi = xi · yi
Neutral element: 1
Generalized to n terms: V = Π Vi

Vector operations
Amplification
V = X^n ⇒ vi = sg(xi) · |xi|^n
√V = V^(1/2) and ⁿ√V = V^(1/n)
V ⊗ V = V^2 if ∀i, vi ≥ 0
Normalization of the term-to-term product of n terms:
V = ⁿ√(Π Vi)

Vector operations
Product + sum
V = (X ⊗ Y) + X + Y
Generalized to n terms: V = ⁿ√(Π Vi) + Σ Vi
Simplest query-vector computation in IR

Vector operations
Subtraction
V = X − Y ⇒ vi = xi − yi
Dot subtraction
V = X ∸ Y ⇒ vi = max(xi − yi, 0)
Complement
V = C(X) ⇒ vi = (1 − xi/C) · C
 etc.
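A compact sketch of the component-wise operations from the preceding slides (sum, term-to-term product, amplification, subtraction, complement), with C = 2^15:

```python
import numpy as np

C = 2 ** 15

def vsum(*vs):          return np.sum(vs, axis=0)              # V = sum(Vi)
def normalize(v):       return v / np.linalg.norm(v) * C       # vi / |V| * C
def ttm(*vs):           return np.prod(vs, axis=0)             # term-to-term product
def amplify(v, n):      return np.sign(v) * np.abs(v) ** n     # V^n, sign-preserving
def ttm_norm(*vs):      return amplify(ttm(*vs), 1 / len(vs))  # nth root of the product
def prod_plus_sum(*vs): return ttm_norm(*vs) + vsum(*vs)       # generalized (X (x) Y) + X + Y
def subtract(x, y):     return x - y
def dot_subtract(x, y): return np.maximum(x - y, 0)            # vi = max(xi - yi, 0)
def complement(x):      return (1 - x / C) * C                 # C(X)
```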

Intensity Distance
Intensity of the normalized term-to-term product
0 ≤ I(√(X ⊗ Y)) ≤ 1 if |X| = |Y| = 1
DI(X, Y) = acos(I(√(X ⊗ Y)))
DI(X, X) = 0 and DI(X, 0) = π/2
DI(tit, tit) = 0 (DA = 0)
DI(tit, passerine) = 0.25 (DA = 0.4)
DI(tit, bird) = 0.58 (DA = 0.7)
DI(tit, train) = 0.89 (DA = 1.14)
DI(tit, insect) = 0.50 (DA = 0.62)
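A sketch of DI under the slide's assumption |X| = |Y| = 1 (plus nonnegative components, so the square root is real); note that ‖√(X ⊗ Y)‖² = X·Y, so DI reduces to acos(√(X·Y)):

```python
import numpy as np

def DI(x, y):
    """Intensity distance: acos of the intensity of sqrt(X (x) Y).
    Assumes unit norms and nonnegative components."""
    return float(np.arccos(np.clip(np.linalg.norm(np.sqrt(x * y)), 0.0, 1.0)))

x, y = np.array([0.6, 0.8]), np.array([0.8, 0.6])
print(DI(x, x), DI(x, y))   # 0.0, then acos(sqrt(0.96)) ~= 0.20
```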

Relative synonymy
SynR(A, B, C): C as reference feature
SynR(A, B, C) = DA(A ⊗ C, B ⊗ C)
DA(coal,night) = 0.9
SynR(coal, night, color) = 0.4
SynR(coal, night, black) = 0.35

Relative synonymy
SynR(A, B, C) = SynR(B, A, C)
SynR(A, A, C) = DA(A ⊗ C, A ⊗ C) = 0
SynR(A, B, 0) = DA(0, 0) = 0
SynR(A, 0, C) = π/2
Absolute synonymy: SynA(A, B) = SynR(A, B, 1)
= DA(A ⊗ 1, B ⊗ 1)
= DA(A, B)
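A sketch of relative synonymy, with the DA conventions from the earlier sketch inlined:

```python
import numpy as np

def DA(x, y):
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0.0 and ny == 0.0:
        return 0.0
    if nx == 0.0 or ny == 0.0:
        return np.pi / 2
    return float(np.arccos(np.clip(np.dot(x, y) / (nx * ny), -1.0, 1.0)))

def syn_r(a, b, c):
    """SynR(A, B, C) = DA(A (x) C, B (x) C): C acts as reference feature."""
    return DA(a * c, b * c)

one = np.ones(3)                       # neutral element of the ttm product
a, b = np.array([1.0, 0.2, 0.0]), np.array([0.9, 0.4, 0.1])
assert syn_r(a, b, one) == DA(a, b)    # SynA(A, B) = SynR(A, B, 1)
```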

Subjective synonymy
SynS(A, B, C): C as point of view
SynS(A, B, C) = DA(C − A, C − B)
0 ≤ SynS(A, B, C) ≤ π
Normalization:
0 ≤ asin(sin(SynS(A, B, C))) ≤ π/2

Subjective synonymy
When |C| → ∞, SynS(A, B, C) → 0
SynS(A, B, 0) = DA(−B, −A) = DA(A, B)
SynS(A, A, C) = DA(C − A, C − A) = 0
SynS(A, B, B) = SynS(A, B, A) = 0
SynS(tit, swallow, animal) = 0.3
SynS(tit, swallow, bird) = 0.4
SynS(tit, swallow, passerine) = 1
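A sketch of subjective synonymy and its [0, π/2] normalization; the loop illustrates the limit behavior as |C| grows:

```python
import numpy as np

def DA(x, y):    # angular distance; nonzero vectors assumed here
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def syn_s(a, b, c):
    """SynS(A, B, C) = DA(C - A, C - B): C acts as point of view."""
    return DA(c - a, c - b)

def syn_s_normalized(a, b, c):
    return float(np.arcsin(np.sin(syn_s(a, b, c))))  # folds into [0, pi/2]

# As |C| grows, both difference vectors align and SynS tends to 0.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for s in (1.0, 10.0, 100.0):
    print(syn_s(a, b, s * np.ones(2)))
```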

Semantic analysis
Vectors propagate over the syntactic tree

Semantic analysis
Initialization - attach vectors to nodes

Semantic analysis
Propagation (up)

Semantic analysis
Back propagation (down)
v(Nij) = (v(Nij) ⊗ v(Ni)) + v(Nij)
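A one-line sketch of this down-propagation step, with ⊗ taken component-wise as defined earlier:

```python
import numpy as np

def back_propagate(v_child, v_parent):
    """v(Nij) <- (v(Nij) (x) v(Ni)) + v(Nij): the parent's contextual vector
    re-weights the child's components while keeping the child's own vector."""
    return v_child * v_parent + v_child
```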

Semantic analysis
Sense selection or sorting

Sense selection
Recursive descent
on t(w) as a decision tree
DA(v′, vi)
Stop on a leaf
Stop on an internal node

Vector syntactic schemas
S: NP(ART, N)
→ v(NP) = v(N)
S: NP1(NP2, N)
→ v(NP1) = α·v(NP2) + v(N), with 0 < α < 1
v(sail boat) = 1/2 v(sail) + v(boat)
v(boat sail) = 1/2 v(boat) + v(sail)

Vector syntactic schemas
Not necessarily linear
S: GA(GADV(ADV), ADJ)
→ v(GA) = v(ADJ)^p(ADV)
p(very) = 2
p(mildly) = 1/2
v(very happy) = v(happy)^2
v(mildly happy) = v(happy)^(1/2)
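A sketch of this non-linear adverb schema via sign-preserving amplification (toy adjective vector; the p values are the ones on the slide):

```python
import numpy as np

P = {"very": 2.0, "mildly": 0.5}   # adverb powers

def adjective_group(v_adj, adverb):
    """v(GA) = v(ADJ)^p(ADV), applied component-wise, sign-preserving."""
    return np.sign(v_adj) * np.abs(v_adj) ** P[adverb]

v_happy = np.array([0.9, 0.1, 0.4])
print(adjective_group(v_happy, "very"))     # squaring sharpens the profile
print(adjective_group(v_happy, "mildly"))   # square root flattens it
```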

Iteration & convergence
Iteration with convergence
Local
D(vi, vi+1) ≤ ε for the top node
Global
D(vi, vi+1) ≤ ε for all nodes

Lexicon construction
Manual kernel
Automatic definition analysis
Global infinite loop = learning
Manual adjustments

Application
Machine translation
Lexical transfer
v(source) → v(target)
k-NN search minimizing DA(v(source), v(target))
Submeaning selection
Direct
Transformation matrix

Application
Information Retrieval on Texts
Textual document indexing
Language dependent
Retrieval
Language independent: multilingual
Domain representation
horse ⊂ equitation
Granularity
Documents, paragraphs, etc.

Application
Information Retrieval on Texts
Index = lexicon = (di, vi)*
k-NN search minimizing DA(v(r), v(di))
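A minimal sketch of this retrieval step over a toy two-document index, reusing DA:

```python
import numpy as np

def DA(x, y):
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def knn(v_request, index, k=5):
    """Return the k documents minimizing DA(v(r), v(di))."""
    return sorted(index, key=lambda d: DA(v_request, index[d]))[:k]

index = {"doc1": np.array([1.0, 0.1]), "doc2": np.array([0.1, 1.0])}
print(knn(np.array([0.9, 0.2]), index, k=1))   # ['doc1']
```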

Search engine
Distance adjustments
Min DA(v(r), v(di)) may pose problems
Especially with small documents
Correlation between CV and conceptual richness
Pathological cases
"plane" vs. "plane plane plane plane …"
"inundation" vs. "blood": D = 0.85 (liquid)

Search engine
Distance adjustments
Correction with relative intensity
Query vs. retrieved document (vr and vd)
D = √(DA(vr, vd) · DI(vr, vd))
0 ≤ I(vr, vd) ≤ 1 → 0 ≤ DI(vr, vd) ≤ π/2
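A sketch of the corrected distance (unit-norm, nonnegative vectors assumed for DI, as before):

```python
import numpy as np

def DA(x, y):
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def DI(x, y):   # unit norms and nonnegative components assumed
    return float(np.arccos(np.clip(np.linalg.norm(np.sqrt(x * y)), 0.0, 1.0)))

def corrected_distance(vr, vd):
    """D = sqrt(DA(vr, vd) * DI(vr, vd)): the intensity term penalizes
    matches that are angularly close but conceptually thin."""
    return float(np.sqrt(DA(vr, vd) * DI(vr, vd)))
```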

Conclusion
Approach
statistical (but not probabilistic)
thema (and rhema?)
Combination of
Symbolic methods (AI)
Transformational systems
Similarity
Neural nets
With large dimension (> 50,000?)

Conclusion
Self-evaluation
Vector quality
Tests against corpora
Unknown words
Proper names of persons, products, etc.
Lionel Jospin, Danone, Air France
Automatic learning
Badly handled phenomena?
Negation & lexical functions (Mel'čuk)
