Sunday, June 22, 2014

Whitehead's Lost Letter

Serendipity on the web or simply ADD of older people? I was trying to do something completely unrelated when I fell upon this very interesting link in the Library of the Congress (of 2011)
Symposium on Alfred North Whitehead Correspondence.
Now I feel like writing to the Library to get the transcript of the letter online. After all they say "The six-page letter was handwritten in 1936 by Whitehead to his former student and personal assistant, Henry S. Leonard. In the letter, Whitehead responds to and comments on an essay written by Leonard regarding the conflict between speculative philosophy and logical positivism. Whitehead comments on the relationship between philosophy and science, and also reveals his attitudes toward, and appraisals of, his collaborator Bertrand Russell, Ludwig Wittgenstein and the father of logical positivism, Rudolf Carnap."
Maybe the letter is online somewhere? Wish I knew.

Thursday, June 19, 2014

The pleasure of open source tools

Joel showed me this great website
It is quite incredible what it can do in terms of calculating bi-grams, 3-grams etc for you.

I also was told by my friend Hema about the voices in the NeoSpeech Lab

NeoSpeech Text-to-Speech voices | Interactive TTS Demo | TTS SDK

Besides being amazingly realistic they can be used as a proof reading tool for your paper,
if you have the patience to paste it into their  sample box, paragraph by paragraph...

There is also the XLE  online that the Norwegians made available here INESS :: XLE-Web.

But my favourites are  the OpenMultiLingual WordNet Multilingual Wordnet 1.0

and the Portuguese corpus Corpus do Português. Go NEH (National Endowment for the Humanities)!

Monday, June 2, 2014

Chocolate Boxes

Looking  at noun-noun compounds in the representation language associated to TIL (Textual Inference Logic, described in Contexts for Quantification)  and trying to decide which modifications one should make to the Abstract Knowledge Representation(AKR) treatment, if any.

1. AKR takes the view that given a noun-noun compound (like 'chocolate box') there is a relation between the two nouns, which we don't know what it is, (only using language), so we leave it unspecified.

2. So  whether we want to talk about :
{a Brad Pitt movie}
{a Tarantino movie}
{a James Bond movie}
{a tv program}
{a tv show}
{a birth certificate}
{a chocolate box}

We want to say that the representation of the noun-noun compound should be something like:
role(nn_element, HEAD, MOD)
instantiable(HEAD, cxt)
instantiable(MOD, cxt)
where MOD is the modifier noun.

3. Certain noun-noun compounds are lexicalized by WordNet such as {birth_certificate} or {tv_show}, so they correspond to a single concept as far as WordNet is concerned.
Clearly the line between noun-noun compounds that should be lexicalized in any generic ontology and the ones that should not is a very fluid one. Many people complain that WordNet does not have all the nn-compounds it should and the literature on nn-compounds (but more generally in multi-word expressions) is huge.

4. AKR embraces ambiguity, so it says that sometimes a lexicalization is a good idea and sometimes it isn't. ( the chocolate box above is not merely a box containing chocolates, it is also a box made of chocolate...) Hence AKR produces two solutions for compounds that are lexicalized like {a birth certificate}

a birth certificate
% Choices:
[choice([A1,A2], 1)
Conceptual Structure:
Contextual Structure:

The representation  above has two solutions: we have one concept in the solution that says that {birth_certificate} is a single entity,  and two concepts in the more generic interpretation of birth certificate that says that there is a noun "birth" and a noun "certificate" and we don't know what exactly is the relationship between the two.

5.  It's clear that if I say  {a Tarantino movie} I want to have two concepts, one concept for `Tarantino' which someone else should be able to say is a movie director and two, a movie that  hopefully is vaguely associated with Tarantino. and again hopefully someone else will  decide which one is the relationship between the two concepts.

When nn-compounds are lexicalized (like birth-certificate or tv-show) then maybe there is only a single concept, but it seems that there should be a range, some expressions really a single concept, others very weak relation between the nouns and others between the extremes.

6. This is   more complicated for "media genres", as sometimes one of the nouns stands in for the compound: thus {a documentary movie} maybe  should be equivalent to {a documentary}, and if documentary is a genre that applies to other media (e.g. a radio documentary), how should the mapping be that makes documentary the same as documentary movie, but independent enough to produce a sensible mapping for {radio documentary}?

 It seems to me that, in principle the semantic mapping should produce for  any nn-compound a pair of concepts: the concepts associated to the  HEAD noun,  and MOD(ifier) noun and an {underspecified relation between these} concepts, that gets resolved. Either:

A. It  transforms the two concepts into a single one (if the concept is lexicalized), e.g  {a tv show} should produce simply the ontological concept  TVShow and not an ambiguity between the generic relation  between nouns
(role(nn_element,show-7,TV-4)) and the lexicalized concept

B. Or it produces the right relationship in the ontology between the concepts  if such exists

C. Or it produces a clear relation (Rel X) between the conepts, so that we know that we need to try to find out what this relation is.

This is to be contrasted to a noun phrase like an adjective-noun compound such a {a French actress} or {a hungry boy} that should always produce only one concept (the one for the HEAD noun) while the concept for the modifier, well, it just modifies the head.

Thus  the mapping to the ontology for a phrase like {a French actress} should always go to a concept
for actress, suitably modified by whatever meaning we think the adjective brings in.