Thursday, December 1, 2016

ECD Graphs? oh, no...

There's a common belief that everything that has been implemented once, in a given programming language X (say C++), can be more easily re-implemented later, in language Y (e.g Java).

Changing programming language for your system is a function of fitting in with a company, or with friends or with fashions. ECD was originally written in Prolog, ported to C++ and then to Java. But it can also be conceptualized using graphs. (there are many hits for "entailment graphs" on the web, should check some of them.)

An ECD Graph comprises three sub-graphs: a semantic graph for the text, a semantic graph for the hypothesis, and a semantic graph of term matches linking hypothesis and text terms. Other classes are constructed on top of an ECD Graph to perform actions like making initial term matches, updating the specificity relations on those matches, and figuring out the entailment relation between text and hypothesis. These operations will update the contents of the ECD Graph.

There is an  EcdGraph class,  initialized with lists of semantic analyses for the text and hypothesis sentences, which get added to the text and hypothesis subgraphs of the EcdGraph.  

On incorporating semantic representations, multiple text sentences are added to the single text graph. Likewise for hypotheses. 
At present it is assumed that the bulk of the lexical lookup will have been done in constructing the semantic representations. However a database of lexical knowledge can also be accessed at inference time and concept details for additional word senses can be introduced by  extra  semantics modules, if desired.

On constructing the initial graph a number of additional data structures are populated, to assist subsequent processing. This includes equivalence classes of coreferent text and hypothesis terms, text and hypothesis terms subject to relative clause restrictions, and determiner properties.

Initial Term Matching
The first operation on an ECD graph is to identify potential matches between hypothesis and text skolem terms. A variety of match types are possible, and these are searched for in a fixed order:
  1. H- and T-terms have the same stem (specificity: equals)
  2. H- and T-terms have the same surface form (occasionally the parser's stemming is messed up) (specificity: equals)
  3. H- and T-terms have at least one sense that belongs to the same cluster of senses (specificity: equals)
  4. H- and T-terms have at least one sense embedded under a concept that is in a sub/superclass of the other (specificity: depends on concept relation)
  5. H-term has a sense that matches the sense on a lexically derived T-term (specificity: equals)
  6. H-term has a sense that has a cluster-sense shared with  the sense on a lexically derived T-term (specificity: equals)
  7. H-term has a sense that has a concept related to a concept for  the sense on a lexically derived T-term (specificity: depends on concept relation)
  8. H-term has a lexically derived sense that is a sense or cluster match with a surface T-term.

hmm, it would be good to have examples here,  wouldn't it?

While multiple matches for a hypothesis term are allowed at any one level, if a hypothesis term is matched at an earlier level it is not considered for matching at later levels; e.g. a stem match for a hypothesis term precludes searching for cluster-sense matches for that term.

All initial term matches found lead to a new match edge being added to the ECD graph, with the specificity direction of the match (equals, sub-class, super-class, disjoint) being included on the edge.

Once the initial matches are found, they are closed under co-reference chains. That is, if H1 is matched with T1, and H1 is coreferent with H2 and T1 coreferent with T2, then new matches should be added for [H1,T2], [H2, T1], [H2, T2].
(don't you love commuting squares and don't you miss latex in blogger? I know I do!)

Updating Match Specificity

Having made initial term matches between lexical items, it remains to be seen how the specificity of these matches is affected by other modifying terms on both the text and hypothesis side. For example, while "dog" is a sub-class of "mammal", it is not a sub-class of "white mammal" (not all dogs are white). Specificity updates are recorded on the match edges, along with the justifications for those updates.  

Specificity updating considers each match in turn. 

Consider a match [H, T]. If terms H and T both have no further nodes dependent on them in their respective role graph, then the initial term match remains unchanged. If H has additional dependents but T does not, then the hypothesis side of the match becomes more specific. If the H term was already more specific than the T term (subclass), then the match remains as (even more) specific. If the original match was one of equality, it now becomes one of being more specific. If the original class had H as more general than T (superclass), making this match more specific leads to the specificity of the match becoming undetermined (none). Likewise, but in the opposite direction if T has further dependents but H does not.

What happens when both H and T have dependents, Hd and Td? 

First we need to see if all matches for Hd and/or Td are completely updated. If not, defer judgement until they all are. For all hypothesis dependents Hd, suppose that Hd has a (fully updated) match with Tm. 

First find out whether Tm is a direct or indirect dependent of T, or in fact is non dependent. If it is not, then H has an unmatched dependent, and the [H,T] match is made more specific. Otherwise the [h,T] match is updated with the specificity of the [Hd, Tm] match. Once all the Hd's have been considered, are there any Td's that have not been mopped up by one of the Hd matches?
If so, then the T term becomes more specific, so the [H,T] match becomes more general.

Sometimes there will be more than one possible match for an Hd term. In this case, we have to make copies of the [H,T] match, and separately explore the match updates according to the various Hd matches.

Scoring Role Paths
When looking for Tm's matched with Hd's, the Hd is generally a direct dependent of T. But Tm can be anything that is linked to T, either via a directed path (i.e. direct or indirect modifier of T), or via an undirected path (i.e. there is some kind of possibly distant relation between T and Tm). These undirected paths will not necessarily capture a coherent connection between T and Tm, still less one that parallels the connection between H and Hd. Nonetheless these paths are allowed to pass. The hypothesis and text paths pairs for each Hd match are recorded on the match edge when its specificity is updated. The eventual idea is to record these role path pairs, and try to learn which kinds of pairing are reasonable, and which are not.

To this end, there is the stub of a path score included in the specificity update. This assigns a cost to each path pairing, and defines a threshold cost beyond which pairs should be discounted. At present the cost is a combination of (a) the difference in length of hypothesis and text paths, and (b) whether an undirected text path jumps from one sense of a word into another.
Relative Clauses and Determiners
Once a match is completely updated, some more work still remains to be done. Part of this is dealing with relative clause like constructions. Relative clauses like "the dog that barked" are odd in that the noun is restricted by the head of the relative clause, but the noun is at the same time an argument/modifier of the relative clause. Up to this point, only the latter relation has been considered in updating specificity.

Suppose that on the hypothesis side some term has a relative clause restriction. If there is a matching relative clause restriction on the other side, all is well. Likewise if whatever matches the H-restriction is a direct or indirect modifier of the T-term (in which case the [H,T] match is updated with the specificity of the restriction match. If the H-restriction is not accounted for in this way, the [H,T] match is made more specific.

After all the H-restrictions have been dealt with, we look to see if there any remaining unmatched T-restrictions. If the T-restriction matches a hypothesis term that has H as a descendant, then update [H,T] with the specificity of this match. Otherwise there is an unmatched T-restriction, and so [H,T] is made more general.


Once the effects of relative clauses have been taken into account, we need to deal with the effect of determiners. 
Determiners can flip the specificity of a match. For example, while "a dog" is more general than "a black dog", "every dog" is more specific than "every black dog". Determiners can have different monotonicity properties on their restrictions and bodies:
  • + +       some, at least N:                some old men are keen gardeners  |= some men are gardeners
  • - +       all, every:                            all men are keen gardeners  |= all old men are keen gardeners  |= all old men are gardeners
  • - -        no, at most N:                     no men are gardeners |= no old men are keen gardeners
  • ? ?       exactly N:                           exactly 3 old men are keen gardeners  ?? exactly 3 old men are gardeners ?? exactly 3 men are gardeners
  • ? +       most,many                         most men are keen gardeners |= most men are gardeners
  • ? -        few:                                   few men are gardeners |= few men are keen gardeners
It turns out that all determiners that are downward monotone in their second argument ("no", "few") can be treated as a negated version of a determiner that is upward monotone in the second argument (no = not some, few = not many). The effects of this are captured by including a negation in the semantic graphs containing these determiners. Monotonicity entailments in the first argument are dealt with through a determiner table in Java that captures such facts as "all <subclass> |= some <superclass>", e.g. "all dirty water |= some water". This table is used to update the specificity on matches for terms with the corresponding determiners in the property graph.

An agenda of matches to be updated is constructed, prioritizing matches of terms that have least dependents. As matches become fully updated, they are pulled off the agenda. When an Hd has more than one match, new copies of the [H,T] match are added to the graph and to the agenda. The agenda is repeatedly processed until it either becomes empty or does not change. Also, the determiner entailments table is incomplete. Also, copying matches when an Hd has more than one match can sometimes lead to multiple [H,T] matches with the same specificity. When this happens, the higher scoring matches are removed.

Determining Entailment Relations
Once all the term matches have been fully updated according to the structure of the text and hypothesis graphs, the entailment relation between the two graphs can be determined. It is at this point that the context structure of the hypothesis and text graphs becomes important.

First up, we look for any contradictions. How does a contradiction occur? A contradiction requires one term in a positive context being matched with a term in a negative context. Specifically, the two terms must be the context heads of their respective contexts. If the hypothesis term is positive and more or equally specific than the negative text term (a (young) man :: no man), then a contradiction occurs. If it more general (a man :: no young man), then no determination can be made. Likewise, if the hypothesis term is in a negative context and the positive text term is equally or more specific, then a contradiction occurs.

If there are no contradictions, then a search is made for entailments. This looks for the highest level, non-vacuous hypothesis term(s). 
If it occurs in a positive context, then it needs to match a text term in a positive context that is as or more specific. If it is in a negative context, it must match a negative text term that is as or more general.

If an entailment is not found, the relation is neutral. In the case of neutrality, there are some weaker guesses at contradiction and entailment that consider the specificity of the initial term matches rather than the finally updated specificity.

Ideas for Further Work
A number of further things should be explored
  • (we use a readymade clustering) Using Brown clustering for another form of initial term match in ECD
  • Using word sense disambiguation to cut down on the number of ridiculous entailment derived from incorrect word sense choices
  • Explore running ECD off lightly modified (how much? for what?) syntactic dependencies (i.e. with context structure and lexical information included, but without going through all of the transfer semantics)
  • Adding ontological  data to the database server (SUMO, OpenCyc, ...)
  • Learning path scores for the path scorer from training on labeled data.

 The picture is from Liz Copock's project page
Channels of Meaning: How information can be expressed without being said