Linear Discriminative Learning

Linear Discriminative Learning (LDL) [1] was developed against the background of Naive Discriminative Learning (NDL) [2]. NDL is based on the Rescorla-Wagner learning rule [3], which incrementally updates associations between cues (e.g., word forms) and outcomes (e.g., meanings) based on their co-occurrences. Incrementally learned associations can approach an equilibrium, in which association strengths stay almost constant and receive (almost) no further updates. Such an equilibrium state can in theory be estimated without learning the associations incrementally. This “endstate of learning” of the Rescorla-Wagner learning rule is given by the Danks equation [4].
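
The incremental updating described above can be illustrated with a minimal pure-Python sketch. It is not the NDL implementation: the function name, the single learning rate `eta` (standing in for the product of the rule's rate parameters), the toy events, and the outcome labels `WALK` and `SG3` are all assumptions made for illustration.

```python
def rescorla_wagner(events, cues, outcomes, eta=0.1, lam=1.0, epochs=200):
    """Return weights[cue][outcome] after repeated passes over `events`.

    `events` is a list of (present_cues, present_outcomes) pairs.
    """
    weights = {c: {o: 0.0 for o in outcomes} for c in cues}
    for _ in range(epochs):
        for present_cues, present_outcomes in events:
            for o in outcomes:
                # Summed support for outcome `o` from the cues present now.
                support = sum(weights[c][o] for c in present_cues)
                # The target is `lam` if the outcome occurred, 0 otherwise.
                target = lam if o in present_outcomes else 0.0
                delta = eta * (target - support)
                # Only the cues that are present get updated.
                for c in present_cues:
                    weights[c][o] += delta
    return weights


def activation(weights, present_cues, outcome):
    """Summed association strength from `present_cues` to `outcome`."""
    return sum(weights[c][outcome] for c in present_cues)


# Toy learning events (hypothetical): triphone cues paired with outcomes.
cues = ["#wa", "wal", "alk", "lk#", "lks", "ks#"]
outcomes = ["WALK", "SG3"]
events = [
    (["#wa", "wal", "alk", "lk#"], ["WALK"]),
    (["#wa", "wal", "alk", "lks", "ks#"], ["WALK", "SG3"]),
]

weights = rescorla_wagner(events, cues, outcomes)
print(round(activation(weights, events[0][0], "WALK"), 3))
print(round(activation(weights, events[1][0], "SG3"), 3))
```

After enough passes, the summed support for each outcome matches its target for both events, so further updates are negligible. This is the equilibrium that can also be estimated directly.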

NDL only accepts binary inputs and outputs: cues and outcomes are either present (1) or absent (0). LDL loosens this constraint and generalizes NDL so that cues and outcomes can also take real values. In the current implementation, LDL adopts the real-valued counterpart of the Danks equation. In other words, LDL estimates the equilibrium state of associations between cues and outcomes at once, without incrementally learning the associations. This way of estimating the equilibrium associations is mathematically equivalent to multivariate regression, which accepts multiple continuous predictors and response variables. For more detail, see [1] and [5].

To estimate associations (or weight matrices) between cues and outcomes, LDL requires two matrices. One is a C-matrix (i.e., \(\mathbf{C}\)), which can also be called a form matrix or a cue matrix. \(\mathbf{C}\) has words as rows and sublexical units (e.g., triphones) as columns. Each row represents a form vector of a word. In the current implementation, each form vector is coded 1 where the triphone is contained in the word and 0 otherwise.

With discriminative_lexicon_model, you can create a \(\mathbf{C}\) from a list of words by using discriminative_lexicon_model.mapping.gen_cmat.

>>> import discriminative_lexicon_model.mapping as pmap
>>> words = ['walk','walked','walks']
>>> cmat  = pmap.gen_cmat(words)
>>> cmat
<xarray.DataArray (word: 3, cues: 9)>
array([[ True,  True, False, False, False,  True, False, False,  True],
       [ True,  True,  True,  True, False, False,  True, False,  True],
       [ True,  True, False, False,  True, False, False,  True,  True]])
Coordinates:
  * word     (word) <U6 'walk' 'walked' 'walks'
  * cues     (cues) <U3 '#wa' 'alk' 'ed#' 'ked' 'ks#' 'lk#' 'lke' 'lks' 'wal'
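
The coding scheme behind \(\mathbf{C}\) can be sketched in plain Python without the library. The helper names `triphones` and `cue_matrix` are hypothetical, and the use of `#` as a word-boundary marker is inferred from the cue labels in the output above.

```python
def triphones(word):
    """Return the set of triphones in `word`, with '#' marking word edges."""
    padded = f"#{word}#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def cue_matrix(words):
    """Return (sorted cue list, one boolean row per word)."""
    per_word = [triphones(w) for w in words]
    cues = sorted(set().union(*per_word))
    # A cell is True where the word contains the triphone.
    rows = [[cue in present for cue in cues] for present in per_word]
    return cues, rows


words = ["walk", "walked", "walks"]
cues, rows = cue_matrix(words)
print(cues)
for word, row in zip(words, rows):
    print(word, [int(v) for v in row])
```

For these three words the sketch yields the same nine cues and the same presence pattern as the matrix shown above.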

The other matrix LDL requires is an S-matrix (i.e., \(\mathbf{S}\)), which can also be called a meaning matrix or an outcome matrix. Like \(\mathbf{C}\), \(\mathbf{S}\) has words as rows, but its columns are semantic dimensions. The rows of \(\mathbf{S}\) can therefore be understood as semantic vectors of words.

While \(\mathbf{S}\) can be obtained with embedding techniques such as word2vec, discriminative_lexicon_model also offers a way of approximating words’ semantic vectors from those words’ inflectional information. The semantic vectors created with this method are called “simulated semantic vectors” [6].

>>> import pandas as pd
>>> infl = pd.DataFrame({'Word':['walk','walked','walks'], 'Lemma':['walk','walk','walk'], 'Tense':['PRES','PAST','PRES']})
>>> smat = pmap.gen_smat_sim(infl, dim_size=5)
>>> smat.round(2)
<xarray.DataArray (word: 3, semantics: 5)>
array([[ 0.75,  1.25,  0.39, -4.41, -0.12],
       [-1.68,  0.6 , -0.  , -3.55, -2.23],
       [-2.77,  0.71, -0.48, -2.76,  0.15]])
Coordinates:
  * word       (word) <U6 'walk' 'walked' 'walks'
  * semantics  (semantics) <U4 'S000' 'S001' 'S002' 'S003' 'S004'

Simulated semantic vectors are defined in [6] as the sum of the random normal vectors corresponding to a word’s lemma and its morphological features. For example, a semantic vector for “walks” is created by taking the sum of random normal vectors for “WALK” (lemma), third-person, and singular.

This method is implemented with two matrices: \(\mathbf{M}\) and \(\mathbf{J}\). The rows of \(\mathbf{M}\) are words and its columns are morphological features, so \(\mathbf{M}\) encodes which morphological features each word has. \(\mathbf{J}\) has morphological features as rows and semantic dimensions as columns, so each row of \(\mathbf{J}\) is a randomly generated semantic vector for one morphological feature. Simulated semantic vectors for words are obtained by multiplying the two matrices:

\[\mathbf{S}_{\text{sim}} = \mathbf{MJ}\]

\(\mathbf{M}\) and \(\mathbf{J}\) can be obtained in discriminative_lexicon_model with discriminative_lexicon_model.mapping.gen_mmat and discriminative_lexicon_model.mapping.gen_jmat. They are used internally in discriminative_lexicon_model.mapping.gen_smat_sim.
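
The product \(\mathbf{MJ}\) can be illustrated with a small NumPy sketch. It does not call gen_mmat or gen_jmat. The feature inventory, the feature assignments, and the fixed random seed are assumptions made for illustration only.

```python
import numpy as np

words = ["walk", "walked", "walks"]
# Hypothetical feature inventory: one lemma plus a few inflectional features.
features = ["WALK", "PRES", "PAST", "THIRD", "SG"]
word_features = {
    "walk": {"WALK", "PRES"},
    "walked": {"WALK", "PAST"},
    "walks": {"WALK", "PRES", "THIRD", "SG"},
}

# M: words x features, 1 where the word carries the feature, 0 otherwise.
M = np.array(
    [[1.0 if f in word_features[w] else 0.0 for f in features] for w in words]
)

# J: features x semantic dimensions, one random normal vector per feature.
rng = np.random.default_rng(0)  # seed fixed only to make the sketch repeatable
J = rng.normal(size=(len(features), 5))

# Each row of S_sim is the sum of the J rows selected by the word's features.
S_sim = M @ J
print(S_sim.round(2))
```

In this sketch the row for “walks” equals the sum of the rows of \(\mathbf{J}\) for WALK, PRES, THIRD, and SG, which is the summation described above.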

Now that we have \(\mathbf{C}\) and \(\mathbf{S}\), we can “learn” the associations between them. The associations, or weight matrices, between them are called \(\mathbf{F}\) and \(\mathbf{G}\). Mathematically, the two weight matrices are derived as follows [1]:

\[ \begin{align}\begin{aligned}\mathbf{CF} = \mathbf{S}\\\mathbf{C^{T}CF} = \mathbf{C^{T}S}\\\mathbf{(C^{T}C)^{-1}C^{T}CF} = \mathbf{(C^{T}C)^{-1}C^{T}S}\\\mathbf{IF} = \mathbf{(C^{T}C)^{-1}C^{T}S}\\\mathbf{F} = \mathbf{(C^{T}C)^{-1}C^{T}S}\end{aligned}\end{align} \]

\[ \begin{align}\begin{aligned}\mathbf{SG} = \mathbf{C}\\\mathbf{S^{T}SG} = \mathbf{S^{T}C}\\\mathbf{(S^{T}S)^{-1}S^{T}SG} = \mathbf{(S^{T}S)^{-1}S^{T}C}\\\mathbf{IG} = \mathbf{(S^{T}S)^{-1}S^{T}C}\\\mathbf{G} = \mathbf{(S^{T}S)^{-1}S^{T}C}\end{aligned}\end{align} \]
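
The derivations above presuppose that \(\mathbf{C^{T}C}\) and \(\mathbf{S^{T}S}\) are invertible. With three words and nine cues, \(\mathbf{C^{T}C}\) is not. The NumPy sketch below therefore uses the Moore-Penrose pseudoinverse as a stand-in. This is an assumption of the sketch, not a statement about how the library computes its weights. The matrices are copied from the examples earlier in this section, with \(\mathbf{S}\) rounded to two decimals.

```python
import numpy as np

# C: 3 words x 9 cues (walk, walked, walks), copied from the cue matrix above.
C = np.array([
    [1, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
], dtype=float)

# S: 3 words x 5 semantic dimensions, copied (rounded) from the example above.
S = np.array([
    [0.75, 1.25, 0.39, -4.41, -0.12],
    [-1.68, 0.60, 0.00, -3.55, -2.23],
    [-2.77, 0.71, -0.48, -2.76, 0.15],
])

# C^T C is 9 x 9 but only has rank 3, so it has no ordinary inverse.
rank_ctc = np.linalg.matrix_rank(C.T @ C)
print(rank_ctc)

# The pseudoinverse picks one least-squares solution of CF = S and of SG = C.
F = np.linalg.pinv(C) @ S  # 9 cues x 5 semantic dimensions
G = np.linalg.pinv(S) @ C  # 5 semantic dimensions x 9 cues

# Both solutions satisfy the second line of each derivation.
print(np.allclose(C.T @ C @ F, C.T @ S))
print(np.allclose(S.T @ S @ G, S.T @ C))
```

When \(\mathbf{C}\) has more columns than rows, \(\mathbf{CF} = \mathbf{S}\) has many solutions. The \(\mathbf{F}\) computed here is only one of them and need not coincide entry by entry with the output of a particular implementation.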

In discriminative_lexicon_model, \(\mathbf{F}\) and \(\mathbf{G}\) can be obtained with discriminative_lexicon_model.mapping.gen_fmat and discriminative_lexicon_model.mapping.gen_gmat:

>>> fmat = pmap.gen_fmat(cmat, smat)
>>> fmat.round(2)
<xarray.DataArray (cues: 9, semantics: 5)>
array([[-0.  , -0.  , -0.  , -0.  , -0.  ],
       [-0.  , -0.  , -0.  , -0.  , -0.  ],
       [-0.56,  0.2 , -0.  , -1.18, -0.74],
       [-0.56,  0.2 , -0.  , -1.18, -0.74],
       [-1.39,  0.35, -0.24, -1.38,  0.07],
       [ 0.75,  1.25,  0.39, -4.41, -0.12],
       [-0.56,  0.2 , -0.  , -1.18, -0.74],
       [-1.39,  0.35, -0.24, -1.38,  0.07],
       [-0.  , -0.  , -0.  , -0.  , -0.  ]])
Coordinates:
  * cues       (cues) <U3 '#wa' 'alk' 'ed#' 'ked' 'ks#' 'lk#' 'lke' 'lks' 'wal'
  * semantics  (semantics) <U4 'S000' 'S001' 'S002' 'S003' 'S004'

>>> gmat = pmap.gen_gmat(cmat, smat)
>>> gmat.round(2)
<xarray.DataArray (semantics: 5, cues: 9)>
array([[-0.11, -0.11, -0.03, -0.03, -0.27,  0.19, -0.03, -0.27, -0.11],
       [ 0.06,  0.06, -0.06, -0.06,  0.05,  0.08, -0.06,  0.05,  0.06],
       [-0.01, -0.01,  0.03,  0.03, -0.08,  0.04,  0.03, -0.08, -0.01],
       [-0.23, -0.23, -0.01, -0.01, -0.05, -0.17, -0.01, -0.05, -0.23],
       [ 0.02,  0.02, -0.43, -0.43,  0.29,  0.15, -0.43,  0.29,  0.02]])
Coordinates:
  * semantics  (semantics) <U4 'S000' 'S001' 'S002' 'S003' 'S004'
  * cues       (cues) <U3 '#wa' 'alk' 'ed#' 'ked' 'ks#' 'lk#' 'lke' 'lks' 'wal'

\(\mathbf{F}\) has cues as its rows and semantics as its columns. It can be used to predict words’ meanings based on the words’ forms. Namely:

\[\mathbf{CF} = \mathbf{\hat{S}}\]

\(\mathbf{\hat{S}}\) is a predicted semantic matrix, whose rows are predicted semantic vectors. Since this equation represents the process of inferring meanings from forms, it can be understood conceptually as the comprehension process of language.

In discriminative_lexicon_model, you can use discriminative_lexicon_model.mapping.gen_shat for this purpose:

>>> shat = pmap.gen_shat(cmat=cmat, fmat=fmat)
>>> shat.round(2)
<xarray.DataArray (word: 3, semantics: 5)>
array([[ 0.75,  1.25,  0.39, -4.41, -0.12],
       [-1.68,  0.6 , -0.  , -3.55, -2.23],
       [-2.77,  0.71, -0.48, -2.76,  0.15]])
Coordinates:
  * word       (word) <U6 'walk' 'walked' 'walks'
  * semantics  (semantics) <U4 'S000' 'S001' 'S002' 'S003' 'S004'

In fact, you do not have to produce \(\mathbf{F}\) if you are only interested in \(\mathbf{\hat{S}}\). You can estimate \(\mathbf{\hat{S}}\) directly from \(\mathbf{C}\) and \(\mathbf{S}\) with discriminative_lexicon_model.mapping.gen_shat:

>>> shat = pmap.gen_shat(cmat=cmat, smat=smat)
>>> shat.round(2)
<xarray.DataArray (word: 3, semantics: 5)>
array([[ 0.75,  1.25,  0.39, -4.41, -0.12],
       [-1.68,  0.6 , -0.  , -3.55, -2.23],
       [-2.77,  0.71, -0.48, -2.76,  0.15]])
Coordinates:
  * word       (word) <U6 'walk' 'walked' 'walks'
  * semantics  (semantics) <U4 'S000' 'S001' 'S002' 'S003' 'S004'
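
Why \(\mathbf{F}\) can be skipped follows from substituting the expression for \(\mathbf{F}\) derived earlier into \(\mathbf{CF}\), under the same assumption that \(\mathbf{C^{T}C}\) is invertible:

\[\mathbf{\hat{S}} = \mathbf{CF} = \mathbf{C(C^{T}C)^{-1}C^{T}S}\]

The predicted semantic matrix is therefore a function of \(\mathbf{C}\) and \(\mathbf{S}\) alone.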

Just as \(\mathbf{F}\) yields predicted semantic vectors, \(\mathbf{G}\) yields a predicted form matrix (or predicted form vectors), \(\mathbf{\hat{C}}\), as shown below. The equation can be understood conceptually as the production process of language.

\[\mathbf{SG} = \mathbf{\hat{C}}\]

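In discriminative_lexicon_model, \(\mathbf{\hat{C}}\) is obtained with discriminative_lexicon_model.mapping.gen_chat, either from \(\mathbf{S}\) and \(\mathbf{G}\) or directly from \(\mathbf{S}\) and \(\mathbf{C}\):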

>>> chat = pmap.gen_chat(smat=smat, gmat=gmat)
>>> chat.round(2)
<xarray.DataArray (word: 3, cues: 9)>
array([[ 1.,  1.,  0.,  0., -0.,  1.,  0., -0.,  1.],
       [ 1.,  1.,  1.,  1.,  0., -0.,  1.,  0.,  1.],
       [ 1.,  1., -0., -0.,  1., -0., -0.,  1.,  1.]])
Coordinates:
  * word       (word) <U6 'walk' 'walked' 'walks'
  * cues       (cues) <U3 '#wa' 'alk' 'ed#' 'ked' 'ks#' 'lk#' 'lke' 'lks' 'wal'

>>> chat = pmap.gen_chat(smat=smat, cmat=cmat)
>>> chat.round(2)
<xarray.DataArray (word: 3, cues: 9)>
array([[ 1.,  1.,  0.,  0., -0.,  1.,  0., -0.,  1.],
       [ 1.,  1.,  1.,  1.,  0., -0.,  1.,  0.,  1.],
       [ 1.,  1., -0., -0.,  1., -0., -0.,  1.,  1.]])
Coordinates:
  * word       (word) <U6 'walk' 'walked' 'walks'
  * cues       (cues) <U3 '#wa' 'alk' 'ed#' 'ked' 'ks#' 'lk#' 'lke' 'lks' 'wal'
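
The production mapping can also be sketched in NumPy, using the matrices from this section (with \(\mathbf{S}\) rounded to two decimals). The pseudoinverse and the 0.5 cutoff used to read cues off a predicted form vector are assumptions of this sketch, not part of the library.

```python
import numpy as np

cues = ["#wa", "alk", "ed#", "ked", "ks#", "lk#", "lke", "lks", "wal"]
words = ["walk", "walked", "walks"]

# C and S copied from the examples above (S rounded to two decimals).
C = np.array([
    [1, 1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
], dtype=float)
S = np.array([
    [0.75, 1.25, 0.39, -4.41, -0.12],
    [-1.68, 0.60, 0.00, -3.55, -2.23],
    [-2.77, 0.71, -0.48, -2.76, 0.15],
])

# Route 1: estimate G first, then predict forms from meanings.
G = np.linalg.pinv(S) @ C
C_hat_via_G = S @ G

# Route 2: predict forms directly from S and C without storing G.
C_hat_direct = S @ np.linalg.pinv(S) @ C

# Read off the cues whose predicted support exceeds a hypothetical 0.5 cutoff.
predicted = {
    w: [c for c, v in zip(cues, row) if v > 0.5]
    for w, row in zip(words, C_hat_via_G)
}
print(predicted["walks"])
```

Because the three semantic vectors are linearly independent, both routes reproduce the original cue matrix up to floating-point error in this toy example.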