Wednesday, August 13, 2014

Digital Humanities? Sure, why not?

So we finally have a first paper written, submitted and accepted about our work on the DHBB -- the Brazilian Dictionary of Historical Biographies.

This is going to be presented in Guarujá, pretty soon, at the Workshop on Digital Humanities. This was really fun work to do: analyzing the results of text processing with FreeLing
and thinking of ways of improving the information extraction we could do. I prefer my original title for the work, but the reviewers didn't like it, so it now has a different one.
As is to be expected, named entities and their recognition play a main role in this project, as do time expression recognition and multiword expressions. All interesting stuff.

Tuesday, August 12, 2014

Maude and Isabelle...

Together with Vivek Nigam, I am working on proving textual entailment using the traditional rewriting system Maude.

This is an attempt to redo, with open source software, some of the work done at PARC using the Bridge system. What interests me most is the logic we (the PARC Natural Language Theory and Technology team) devised for this task, because it took the linguists several years to agree on the representations in this logic. So Vivek and I are taking the shortcut of 'assuming' a perfect NLP system. (Even explaining what a perfect NLP system means is already rather long and full of design choices.)

We have a short note explaining the project, to appear at LSFA 2014 in Brasília. The note builds on papers describing TIL (Textual Inference Logic), such as A Basic Logic for Textual Inference, Textual Inference Logic: Take Two and Contexts for Quantification.
The note is called Towards a Rewriting Framework for Textual Entailment.

What about Isabelle? Well, I've been asking people whether we could do a parallel implementation in Isabelle, and I hope I have convinced Sara Kalvala to do it with me. Yippee!

(I am a member of the Language and Natural Reasoning group at the Center for the Study of Language and Information (CSLI); the picture is from one of the workshops organized by the group, the CSLI Workshop on Natural Logic, Proof Theory, and Computational Semantics, 2011.)

Tuesday, July 8, 2014

BACAT? yes!

My friend Vlad Patryshev organizes the Bay Area Categories And Types meetup.

This is a splendid idea, I think. It means people who like the idea of Category Theory and Type Theory can meet and discuss issues, without the pressure of publication and/or competition for spots. I'd love to go more often. I have so far given two talks at BACAT, as we call it.

The first was about partial compilers and dialectica spaces, work of Mihai Budiu, Joel Galenson and Gordon Plotkin, available from Gordon's webpage. My slides are here: Partial Compilers and Dialectica Spaces. (At the talk I discovered that Mihai is actually the husband of an old friend from PARC days...)
The second talk was on even older work with Gavin Bierman, Nick Benton and Martin Hyland that we never published properly, only in conferences like TLCA and CSL. This is on deriving a term calculus, or type theory, for Intuitionistic Linear Logic, together with its categorical models. I find that the best version is still the technical report from Cambridge. My recent and over-simplified slides are here: Linear Type Theory Revisited.

Sunday, June 22, 2014

Whitehead's Lost Letter

Serendipity on the web, or simply the ADD of older people? I was trying to do something completely unrelated when I stumbled upon this very interesting link at the Library of Congress (from 2011):
Symposium on Alfred North Whitehead Correspondence.
Now I feel like writing to the Library to get the transcript of the letter online. After all, they say: "The six-page letter was handwritten in 1936 by Whitehead to his former student and personal assistant, Henry S. Leonard. In the letter, Whitehead responds to and comments on an essay written by Leonard regarding the conflict between speculative philosophy and logical positivism. Whitehead comments on the relationship between philosophy and science, and also reveals his attitudes toward, and appraisals of, his collaborator Bertrand Russell, Ludwig Wittgenstein and the father of logical positivism, Rudolf Carnap."
Maybe the letter is online somewhere? Wish I knew.

Thursday, June 19, 2014

The pleasure of open source tools

Joel showed me this great website.
It is quite incredible what it can do in terms of calculating bigrams, trigrams, etc. for you.
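For readers curious about what the n-gram counting itself involves, here is a minimal sketch in plain Python; this is my own toy illustration, not the website's implementation, and the sample sentence is made up:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "the chocolate box is on the chocolate table"
tokens = text.split()

bigram_counts = Counter(ngrams(tokens, 2))   # counts of adjacent word pairs
trigram_counts = Counter(ngrams(tokens, 3))  # counts of adjacent word triples

print(bigram_counts[("the", "chocolate")])   # this bigram occurs twice above
```

Real corpus tools do the same thing at scale, with tokenization, normalization, and smoothing layered on top.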

I was also told by my friend Hema about the voices in the NeoSpeech Lab

NeoSpeech Text-to-Speech voices | Interactive TTS Demo | TTS SDK

Besides being amazingly realistic, they can be used as a proofreading tool for your paper,
if you have the patience to paste it into their sample box, paragraph by paragraph...

There is also XLE online, which the Norwegians have made available here: INESS :: XLE-Web.

But my favourites are the Open Multilingual Wordnet (Multilingual Wordnet 1.0)

and the Portuguese corpus Corpus do Português. Go NEH (National Endowment for the Humanities)!

Monday, June 2, 2014

Chocolate Boxes

Looking at noun-noun compounds in the representation language associated with TIL (Textual Inference Logic, described in Contexts for Quantification) and trying to decide which modifications, if any, one should make to the Abstract Knowledge Representation (AKR) treatment.

1. AKR takes the view that given a noun-noun compound (like 'chocolate box') there is a relation between the two nouns which we cannot determine using language alone, so we leave it unspecified.

2. So whether we want to talk about:
{a Brad Pitt movie}
{a Tarantino movie}
{a James Bond movie}
{a tv program}
{a tv show}
{a birth certificate}
{a chocolate box}

We want to say that the representation of the noun-noun compound should be something like:
role(nn_element, HEAD, MOD)
instantiable(HEAD, cxt)
instantiable(MOD, cxt)
where HEAD is the head noun and MOD is the modifier noun.
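As a toy illustration, these facts can be generated mechanically from a head/modifier pair. The predicate names follow the AKR notation above, but the function and its string-based output format are my own sketch, not PARC's actual code:

```python
def nn_compound_facts(head, mod, cxt="t"):
    """Emit AKR-style facts for a noun-noun compound:
    an unspecified nn_element relation between HEAD and MOD,
    plus instantiability of both concepts in context cxt."""
    return [
        f"role(nn_element, {head}, {mod})",
        f"instantiable({head}, {cxt})",
        f"instantiable({mod}, {cxt})",
    ]

# e.g. for 'chocolate box': head noun 'box', modifier 'chocolate'
for fact in nn_compound_facts("box", "chocolate"):
    print(fact)
```

The point of the unspecified `nn_element` role is precisely that nothing in the sentence tells us whether the box contains chocolate, is made of chocolate, or something else.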

3. Certain noun-noun compounds are lexicalized by WordNet, such as {birth_certificate} or {tv_show}, so they correspond to a single concept as far as WordNet is concerned.
Clearly the line between noun-noun compounds that should be lexicalized in a generic ontology and those that should not is a very fluid one. Many people complain that WordNet does not have all the nn-compounds it should, and the literature on nn-compounds (and more generally on multiword expressions) is huge.

4. AKR embraces ambiguity, so it says that sometimes a lexicalization is a good idea and sometimes it isn't. (The chocolate box above is not merely a box containing chocolates; it may also be a box made of chocolate...) Hence AKR produces two solutions for compounds that are lexicalized, like {a birth certificate}:

a birth certificate
% Choices:
choice([A1, A2], 1)
Conceptual Structure: ...
Contextual Structure: ...
The representation above has two solutions: one in which {birth_certificate} is a single concept, and a more generic interpretation with two concepts, a noun "birth" and a noun "certificate", where we don't know exactly what the relationship between the two is.

5. It's clear that if I say {a Tarantino movie} I want two concepts: one for 'Tarantino', which someone else should be able to say is a movie director, and one for a movie that hopefully is vaguely associated with Tarantino. And again, hopefully, someone else will decide what the relationship between the two concepts is.

When nn-compounds are lexicalized (like birth-certificate or tv-show), maybe there is only a single concept, but it seems there should be a range: some expressions really are a single concept, others have a very weak relation between the nouns, and others fall between the extremes.

6. This is more complicated for "media genres", since sometimes one of the nouns stands in for the whole compound: {a documentary movie} maybe should be equivalent to {a documentary}. And if documentary is a genre that applies to other media (e.g. a radio documentary), what should the mapping be that makes documentary the same as documentary movie, yet independent enough to produce a sensible mapping for {radio documentary}?

It seems to me that, in principle, the semantic mapping should produce for any nn-compound a pair of concepts, the concept associated with the HEAD noun and the one associated with the MOD(ifier) noun, together with an {underspecified relation between these} concepts, which then gets resolved. Either:

A. It transforms the two concepts into a single one (if the compound is lexicalized); e.g. {a tv show} should produce simply the ontological concept TVShow, and not an ambiguity between the generic relation between nouns
(role(nn_element, show-7, TV-4)) and the lexicalized concept.

B. Or it produces the right relationship in the ontology between the concepts, if such a relationship exists.

C. Or it produces a clear relation (Rel X) between the concepts, so that we know we need to try to find out what this relation is.

This is to be contrasted with a noun phrase like an adjective-noun combination, such as {a French actress} or {a hungry boy}, which should always produce only one concept (the one for the HEAD noun), while the concept for the modifier, well, it just modifies the head.

Thus the mapping to the ontology for a phrase like {a French actress} should always go to a concept for actress, suitably modified by whatever meaning we think the adjective brings in.
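To make the contrast concrete, here is a small sketch of the resolution strategy described in A-C above, together with the adjective-noun case. The tiny lexicon, the ontology relation `directed_by`, and the dictionary output format are illustrative stand-ins of my own, not AKR's actual machinery:

```python
# Hypothetical stand-ins for a WordNet-like lexicon and an ontology.
LEXICALIZED = {
    ("tv", "show"): "TVShow",
    ("birth", "certificate"): "BirthCertificate",
}
ONTOLOGY_RELATIONS = {
    ("Tarantino", "movie"): "directed_by",  # invented relation name
}

def resolve_nn(head, mod):
    """Resolve a noun-noun compound, following cases A-C:
    A: lexicalized -> a single concept, no residual ambiguity;
    B: a known ontology relation between the two concepts;
    C: otherwise, an explicitly underspecified relation (Rel X)."""
    if (mod, head) in LEXICALIZED:                       # case A
        return {"concepts": [LEXICALIZED[(mod, head)]]}
    rel = ONTOLOGY_RELATIONS.get((mod, head), "Rel X")   # case B, else C
    return {"concepts": [head, mod], "relation": rel}

def resolve_adj_noun(head, adj):
    """An adjective-noun phrase always yields one concept, for the
    head noun, with the adjective recorded as a modifier of it."""
    return {"concepts": [head], "modifiers": [adj]}

print(resolve_nn("show", "tv"))            # case A: single concept TVShow
print(resolve_nn("movie", "Tarantino"))    # case B: known relation
print(resolve_nn("box", "chocolate"))      # case C: underspecified Rel X
print(resolve_adj_noun("actress", "French"))
```

The point of keeping Rel X explicit in case C is that a later processing stage (or a human) knows exactly which relations still need to be figured out.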

Saturday, May 24, 2014

The Perils of Propaganda

The pictures don't do justice to Boulder's Flatirons. They're extremely impressive "live".

My invited talk at the North American Meeting of the Association for Symbolic Logic in Boulder, CO wasn't recorded (no talks were recorded; mathematicians are usually low-tech).

So the propaganda will perhaps look even more 'jarring' in the slides.

But since I did work very hard trying to convey what I think young logicians ought to know about mathematics in the 20th century, here (Edwardian Proofs as Futuristic Programs for Personal Assistants, Boulder, CO) are the slides, warts and all.