There's a common belief that everything that has been implemented once, in a given programming language X (say C++), can be more easily re-implemented later in language Y (e.g. Java).
Changing the programming language of your system is largely a function of fitting in with a company, or with friends, or with fashions. ECD was originally written in Prolog, ported to C++, and then to Java. But ECD can also be conceptualized in terms of graphs. (There are many hits for "entailment graphs" on the web; some of them would be worth checking out.)
An
ECD Graph comprises three sub-graphs: a semantic graph for the text, a semantic
graph for the hypothesis, and a semantic graph of term matches linking
hypothesis and text terms. Other classes are constructed on top of an ECD Graph
to perform actions like making initial term matches, updating the specificity
relations on those matches, and figuring out the entailment relation between
text and hypothesis. These operations will update the contents of the ECD Graph.
There is an
EcdGraph class,
initialized with lists of semantic analyses for the text and hypothesis
sentences, which get added to the text and hypothesis subgraphs of the
EcdGraph.
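As a rough sketch of this structure (every name here apart from EcdGraph itself is hypothetical, not the actual API):

    import java.util.ArrayList;
    import java.util.List;

    // Placeholder for one sentence's semantic analysis.
    class SemanticAnalysis { }

    // A semantic sub-graph that accumulates analyses.
    class SemanticGraph {
        final List<SemanticAnalysis> analyses = new ArrayList<>();
        void add(SemanticAnalysis a) { analyses.add(a); }
    }

    class EcdGraph {
        final SemanticGraph textGraph = new SemanticGraph();
        final SemanticGraph hypothesisGraph = new SemanticGraph();

        // Multiple text sentences all go into the single text graph,
        // and likewise for hypotheses.
        EcdGraph(List<SemanticAnalysis> textAnalyses,
                 List<SemanticAnalysis> hypothesisAnalyses) {
            textAnalyses.forEach(textGraph::add);
            hypothesisAnalyses.forEach(hypothesisGraph::add);
        }
    }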
On
incorporating semantic representations, multiple text sentences are added to
the single text graph. Likewise for hypotheses.
At present it is assumed that the bulk of the lexical lookup will have been done in constructing the semantic representations. However, a database of lexical knowledge can also be accessed at inference time, and concept details for additional word senses can be introduced by extra semantics modules, if desired.
On constructing the initial graph, a number of additional data structures are populated to assist subsequent processing. These include equivalence classes of coreferent text and hypothesis terms, text and hypothesis terms subject to relative clause restrictions, and determiner properties.
Initial Term Matching
The
first operation on an ECD graph is to identify potential matches between
hypothesis and text skolem terms. A variety of match types are possible, and
these are searched for in a fixed order:
- H- and T-terms have the same stem (specificity: equals)
- H- and T-terms have the same surface form (occasionally the parser's stemming is messed up) (specificity: equals)
- H- and T-terms have at least one sense that belongs to the same cluster of senses (specificity: equals)
- H- and T-terms have at least one sense embedded under a concept that is a sub- or superclass of a concept for the other (specificity: depends on the concept relation)
- H-term has a sense that matches the sense on a lexically derived T-term (specificity: equals)
- H-term has a sense that has a cluster-sense shared with the sense on a lexically derived T-term (specificity: equals)
- H-term has a sense that has a concept related to a concept for the sense on a lexically derived T-term (specificity: depends on concept relation)
- H-term has a lexically derived sense that is a sense or cluster match with a surface T-term (specificity: equals)
For example: "dogs" and "dog" match on the shared stem; "buy" and "purchase" can match via a shared sense cluster; and "poodle" can match "dog" via a subclass concept relation, making the poodle side the more specific one.
While
multiple matches for a hypothesis term are allowed at any one level, if a
hypothesis term is matched at an earlier level it is not considered for
matching at later levels; e.g. a stem match for a hypothesis term precludes
searching for cluster-sense matches for that term.
All
initial term matches found lead to a new match edge being added to the ECD
graph, with the specificity direction of the match (equals, sub-class,
super-class, disjoint) being included on the edge.
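A minimal sketch of this fixed-order search, with invented names (MatchLevel, MatchEdge) standing in for whatever the real classes are:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class InitialMatching {
        enum Specificity { EQUALS, SUBCLASS, SUPERCLASS, DISJOINT }

        // One of the match types listed above; returns null when the
        // two terms do not match at this level.
        interface MatchLevel {
            Specificity match(String hTerm, String tTerm);
        }

        record MatchEdge(String hTerm, String tTerm, Specificity spec) { }

        static List<MatchEdge> findMatches(List<String> hTerms,
                                           List<String> tTerms,
                                           List<MatchLevel> levels) {
            List<MatchEdge> edges = new ArrayList<>();
            Set<String> matchedEarlier = new HashSet<>();
            for (MatchLevel level : levels) {               // fixed order of levels
                for (String h : hTerms) {
                    if (matchedEarlier.contains(h)) continue; // earlier match wins
                    for (String t : tTerms) {
                        Specificity s = level.match(h, t);
                        if (s != null) edges.add(new MatchEdge(h, t, s)); // several per level OK
                    }
                }
                for (MatchEdge e : edges) matchedEarlier.add(e.hTerm()); // seal this level
            }
            return edges;
        }
    }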
Once
the initial matches are found, they are closed under co-reference chains. That
is, if H1 is matched with T1, and H1 is coreferent with H2 and T1 coreferent
with T2, then new matches should be added for [H1,T2], [H2, T1], [H2, T2].
(Don't you love commuting squares, and don't you miss LaTeX in Blogger? I know I do!)
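In code, the closure might look roughly like this (hypothetical types; each coreference map is assumed to send a term to its full equivalence class, itself included, and the new matches inherit the specificity of the original, which is elided here):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class CorefClosure {
        // matches is a set of [H,T] pairs; hCoref/tCoref map each term
        // to its coreference class (which contains the term itself).
        static Set<List<String>> close(Set<List<String>> matches,
                                       Map<String, Set<String>> hCoref,
                                       Map<String, Set<String>> tCoref) {
            Set<List<String>> closed = new HashSet<>(matches);
            for (List<String> m : matches) {
                String h = m.get(0), t = m.get(1);
                for (String h2 : hCoref.getOrDefault(h, Set.of(h)))
                    for (String t2 : tCoref.getOrDefault(t, Set.of(t)))
                        closed.add(List.of(h2, t2)); // [H1,T2], [H2,T1], [H2,T2], ...
            }
            return closed;
        }
    }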
Updating Match Specificity
Specificity
updating considers each match in turn.
Consider a match [H, T]. If terms H and
T both have no further nodes dependent on them in their respective role graph,
then the initial term match remains unchanged. If H has additional dependents
but T does not, then the hypothesis side of the match becomes more specific. If
the H term was already more specific than the T term (subclass), then the match
remains as (even more) specific. If the original match was one of equality, it now becomes one of being more specific. If the original match had H as more general than T (superclass), making the match more specific leads to its specificity becoming undetermined (none). The same applies, in the opposite direction, if T has further dependents but H does not.
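These rules amount to a small lookup table. Here is a sketch, with NONE standing for the undetermined specificity:

    class Tightening {
        enum Specificity { EQUALS, SUBCLASS, SUPERCLASS, NONE }

        // H has dependents that T lacks, so the hypothesis side of
        // the match becomes more specific.
        static Specificity hypothesisMoreSpecific(Specificity current) {
            switch (current) {
                case EQUALS:     return Specificity.SUBCLASS; // equal -> more specific
                case SUBCLASS:   return Specificity.SUBCLASS; // stays (even more) specific
                case SUPERCLASS: return Specificity.NONE;     // conflicting directions -> undetermined
                default:         return Specificity.NONE;
            }
        }

        // The mirror image, for when T has dependents that H lacks.
        static Specificity textMoreSpecific(Specificity current) {
            switch (current) {
                case EQUALS:     return Specificity.SUPERCLASS;
                case SUPERCLASS: return Specificity.SUPERCLASS;
                case SUBCLASS:   return Specificity.NONE;
                default:         return Specificity.NONE;
            }
        }
    }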
What
happens when both H and T have dependents, Hd and Td?
First we need to see if
all matches for Hd and/or Td are completely updated. If not, defer judgement
until they all are. For all hypothesis dependents Hd, suppose that Hd has a
(fully updated) match with Tm.
First find out whether Tm is a direct or indirect dependent of T, or is not a dependent at all. If it is not, then H has an unmatched dependent, and the [H,T] match is made more specific. Otherwise the [H,T] match is updated with the specificity of the [Hd, Tm] match. Once all the Hd's have
been considered, are there any Td's that have not been mopped up by one of the
Hd matches?
If so, then the T term becomes more specific, so the [H,T] match
becomes more general.
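The phrase "updated with the specificity of the [Hd, Tm] match" suggests folding the dependent's specificity into the parent match. A plausible combination rule, which is my assumption rather than anything lifted from the actual code:

    class Combine {
        enum Specificity { EQUALS, SUBCLASS, SUPERCLASS, NONE }

        // Fold a dependent match's specificity into the parent [H,T]
        // match: equality leaves it untouched, agreement keeps the
        // direction, conflicting directions make it undetermined.
        static Specificity combine(Specificity parent, Specificity dependent) {
            if (dependent == Specificity.EQUALS) return parent;
            if (parent == Specificity.EQUALS) return dependent;
            return parent == dependent ? parent : Specificity.NONE;
        }
    }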
Sometimes
there will be more than one possible match for an Hd term. In this case, we
have to make copies of the [H,T] match, and separately explore the match
updates according to the various Hd matches.
Scoring Role Paths
When looking for Tm's matched with Hd's, the Tm is generally a direct dependent of T. But Tm can be anything that is linked to T, either via a directed path (i.e.
direct or indirect modifier of T), or via an undirected path (i.e. there is
some kind of possibly distant relation between T and Tm). These undirected
paths will not necessarily capture a coherent connection between T and Tm,
still less one that parallels the connection between H and Hd. Nonetheless
these paths are allowed to pass. The hypothesis and text path pairs for each
Hd match are recorded on the match edge when its specificity is updated. The
eventual idea is to record these role path pairs, and try to learn which kinds
of pairing are reasonable, and which are not.
To
this end, there is the stub of a path score included in the specificity update.
This assigns a cost to each path pairing, and defines a threshold cost beyond
which pairs should be discounted. At present the cost is a combination of (a)
the difference in length of hypothesis and text paths, and (b) whether an
undirected text path jumps from one sense of a word into another.
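As a sketch of that stub (the weights and the threshold are invented placeholders):

    class PathScorer {
        static final double THRESHOLD = 2.0;          // hypothetical cut-off
        static final double SENSE_JUMP_PENALTY = 1.0; // hypothetical weight

        // Cost of pairing a hypothesis role path with a text role path:
        // (a) the difference in path lengths, plus (b) a penalty if the
        // undirected text path jumps from one word sense to another.
        static double cost(int hPathLength, int tPathLength, boolean senseJump) {
            double c = Math.abs(hPathLength - tPathLength);
            if (senseJump) c += SENSE_JUMP_PENALTY;
            return c;
        }

        static boolean discounted(double cost) { return cost > THRESHOLD; }
    }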
Relative Clauses and Determiners
Once
a match is completely updated, some more work still remains to be done. Part of this is dealing with relative-clause-like constructions. Relative clauses like
"the dog that barked" are odd in that the noun is restricted by the
head of the relative clause, but the noun is at the same time an
argument/modifier of the relative clause. Up to this point, only the latter
relation has been considered in updating specificity.
Suppose
that on the hypothesis side some term has a relative clause restriction. If
there is a matching relative clause restriction on the other side, all is well.
Likewise if whatever matches the H-restriction is a direct or indirect modifier of the T-term, in which case the [H,T] match is updated with the specificity of the restriction match. If the H-restriction is not accounted for in this way,
the [H,T] match is made more specific.
After
all the H-restrictions have been dealt with, we look to see if there are any remaining unmatched T-restrictions. If such a T-restriction matches a hypothesis
term that has H as a descendant, then update [H,T] with the specificity of this
match. Otherwise there is an unmatched T-restriction, and so [H,T] is made more
general.
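Putting the two passes together, here is a sketch, reusing the combination rule from earlier; every name is hypothetical:

    import java.util.List;

    class RestrictionUpdate {
        enum Specificity { EQUALS, SUBCLASS, SUPERCLASS, NONE }

        static Specificity combine(Specificity a, Specificity b) {
            if (b == Specificity.EQUALS) return a;
            if (a == Specificity.EQUALS) return b;
            return a == b ? a : Specificity.NONE;
        }

        // hRestrictionSpecs holds, per H-restriction, the specificity of
        // the match that accounts for it (null when nothing does);
        // unmatchedTRestrictions counts the T-restrictions left over.
        static Specificity fold(Specificity current,
                                List<Specificity> hRestrictionSpecs,
                                int unmatchedTRestrictions) {
            for (Specificity s : hRestrictionSpecs) {
                current = (s == null)
                    ? combine(current, Specificity.SUBCLASS)   // unaccounted H-restriction: more specific
                    : combine(current, s);
            }
            for (int i = 0; i < unmatchedTRestrictions; i++) {
                current = combine(current, Specificity.SUPERCLASS); // leftover T-restriction: more general
            }
            return current;
        }
    }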
Once
the effects of relative clauses have been taken into account, we need to deal
with the effect of determiners.
Determiners can flip the specificity of a match.
For example, while "a dog" is more general than "a black
dog", "every dog" is more specific than "every black
dog". Determiners can have different monotonicity properties on their restrictions
and bodies:
- + + some, at least N: some old men are keen gardeners |= some men are gardeners
- - + all, every: all men are keen gardeners |= all old men are keen gardeners |= all old men are gardeners
- - - no, at most N: no men are gardeners |= no old men are keen gardeners
- ? ? exactly N: exactly 3 old men are keen gardeners ?? exactly 3 old men are gardeners ?? exactly 3 men are gardeners
- ? + most, many: most men are keen gardeners |= most men are gardeners
- ? - few: few men are gardeners |= few men are keen gardeners
It
turns out that all determiners that are downward monotone in their second
argument ("no", "few") can be treated as a negated version
of a determiner that is upward monotone in the second argument (no = not some,
few = not many). The effects of this are captured by including a negation in
the semantic graphs containing these determiners. Monotonicity entailments in
the first argument are dealt with through a determiner table in Java that captures such facts as "all
<subclass> |= some <superclass>", e.g. "all dirty water
|= some water". This table is used to update the specificity on matches
for terms with the corresponding determiners in the property graph.
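A sketch of the specificity flip, with restriction monotonicities read off the list above (the representation is an assumption, and the real Java table also covers cross-determiner entailments like all |= some):

    import java.util.Map;

    class DeterminerFlip {
        enum Specificity { EQUALS, SUBCLASS, SUPERCLASS, NONE }
        enum Mono { UP, DOWN, UNKNOWN }

        // Monotonicity of each determiner in its restriction (the first
        // column of the list above).
        static final Map<String, Mono> RESTRICTION = Map.of(
            "some", Mono.UP, "all", Mono.DOWN, "every", Mono.DOWN,
            "no", Mono.DOWN, "most", Mono.UNKNOWN, "many", Mono.UNKNOWN,
            "few", Mono.UNKNOWN, "exactly", Mono.UNKNOWN);

        // "a black dog" is more specific than "a dog", but "every black
        // dog" is more general than "every dog": a downward-monotone
        // restriction flips the specificity of the match.
        static Specificity apply(String determiner, Specificity s) {
            Mono m = RESTRICTION.getOrDefault(determiner, Mono.UP);
            if (m == Mono.DOWN) {
                if (s == Specificity.SUBCLASS) return Specificity.SUPERCLASS;
                if (s == Specificity.SUPERCLASS) return Specificity.SUBCLASS;
            } else if (m == Mono.UNKNOWN && s != Specificity.EQUALS) {
                return Specificity.NONE;  // no monotonicity: direction is lost
            }
            return s;
        }
    }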
An agenda of matches to be updated is constructed, prioritizing matches of terms that have the fewest dependents. As matches become fully updated, they are pulled off the agenda. When an Hd has more than one match, new copies of the [H,T] match are added to the graph and to the agenda. The agenda is repeatedly processed until it either becomes empty or does not change. (The determiner entailments table is currently incomplete.) Copying matches when an Hd has more than one match can sometimes lead to multiple [H,T] matches with the same specificity; when this happens, the higher scoring matches are removed.
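A sketch of that agenda loop (the Match interface is invented, and the copying of [H,T] matches when an Hd has several matches is elided into tryUpdate):

    import java.util.ArrayDeque;
    import java.util.Comparator;
    import java.util.Deque;
    import java.util.List;

    class MatchAgenda {
        interface Match {
            int dependentCount();  // dependents of the matched terms
            boolean tryUpdate();   // true once the match is fully updated
        }

        static void process(List<Match> matches) {
            Deque<Match> agenda = new ArrayDeque<>();
            matches.stream()                                // fewest dependents first
                   .sorted(Comparator.comparingInt(Match::dependentCount))
                   .forEach(agenda::add);
            boolean changed = true;
            while (!agenda.isEmpty() && changed) {          // stop when empty or stuck
                changed = false;
                for (int i = agenda.size(); i > 0; i--) {
                    Match m = agenda.poll();
                    if (m.tryUpdate()) changed = true;      // fully updated: off the agenda
                    else agenda.add(m);                     // deferred: retry next pass
                }
            }
        }
    }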
Determining Entailment Relations
Once
all the term matches have been fully updated according to the structure of the
text and hypothesis graphs, the entailment relation between the two graphs can
be determined. It is at this point that the context structure of the hypothesis
and text graphs becomes important.
First
up, we look for any contradictions. How does a contradiction occur? A
contradiction requires one term in a positive context being matched with a term
in a negative context. Specifically, the two terms must be the context heads of
their respective contexts. If the hypothesis term is positive and equally or more specific than the negative text term (a (young) man :: no man), then a contradiction occurs. If it is more general (a man :: no young man), then no determination can be made. Likewise, if the hypothesis term is in a negative context and the positive text term is equally or more specific, then a contradiction occurs.
If
there are no contradictions, then a search is made for entailments. This looks
for the highest-level, non-vacuous hypothesis term(s).
If it occurs in a
positive context, then it needs to match a text term in a positive context that
is as or more specific. If it is in a negative context, it must match a
negative text term that is as or more general.
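Both checks reduce to small predicates over the final specificity of the relevant match. A sketch, where Specificity describes the H term relative to the T term:

    class VerdictChecks {
        enum Specificity { EQUALS, SUBCLASS, SUPERCLASS, NONE }

        // Context heads of opposite polarity: the term on the positive
        // side must be equally or more specific than the negated one.
        static boolean contradiction(boolean hPositive, Specificity hToT) {
            return hPositive
                ? hToT == Specificity.EQUALS || hToT == Specificity.SUBCLASS
                : hToT == Specificity.EQUALS || hToT == Specificity.SUPERCLASS;
        }

        // Top non-vacuous hypothesis term: in a positive context the
        // text term must be as or more specific (H equal or more
        // general); in a negative context, as or more general.
        static boolean entailment(boolean positiveContext, Specificity hToT) {
            return positiveContext
                ? hToT == Specificity.EQUALS || hToT == Specificity.SUPERCLASS
                : hToT == Specificity.EQUALS || hToT == Specificity.SUBCLASS;
        }
    }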
If
an entailment is not found, the relation is neutral. In the case of neutrality,
there are some weaker guesses at contradiction and entailment that consider the
specificity of the initial term matches rather than the finally updated
specificity.
Ideas for Further Work
A
number of further things should be explored:
- Using Brown clustering (we currently use a readymade clustering) as another form of initial term match in ECD
- Using word sense disambiguation to cut down on the number of ridiculous entailments derived from incorrect word sense choices
- Exploring running ECD off lightly modified (how much? for what?) syntactic dependencies (i.e. with context structure and lexical information included, but without going through all of the transfer semantics)
- Adding ontological data to the database server (SUMO, OpenCyc, ...)
- Learning path scores for the path scorer by training on labeled data.
The picture is from Liz Copock's project page, Channels of Meaning: How information can be expressed without being said.