Monday, August 31, 2020

What did you do in your Summer vacation?



This is kind of silly. It would make more sense to have this list hanging off the homepage of OWN-PT itself, and maybe it will get there soon, but for the time being I need a list of our publications about OpenWordnet-PT and so this is it. Mostly copied from Alexandre's list of publications (thanks for being so organized Alexandre!!!).

  1. de Paiva, Valeria, and Alexandre Rademaker. 2012. “Revisiting a Brazilian WordNet.” In Proceedings of Global Wordnet Conference. Matsue: Global Wordnet Association.
  2. de Paiva, Valeria, Alexandre Rademaker, and Gerard de Melo. 2012. “OpenWordNet-PT: An Open Brazilian Wordnet for Reasoning.” In Proceedings of COLING 2012: Demonstration Papers, 353–60. Mumbai, India: The COLING 2012 Organizing Committee. http://www.aclweb.org/anthology/C12-3044.
  3. Rademaker, Alexandre, Valeria de Paiva, Gerard de Melo, Livy Real, and Maira Gatti. 2014. “OpenWordNet-PT: A Project Report.” In Proceedings of the 7th Global WordNet Conference, edited by Heili Orav, Christiane Fellbaum, and Piek Vossen. Tartu, Estonia. http://globalwordnet.org/global-wordnet-conferences-2/
  4. Real, Livy, Alexandre Rademaker, Valeria de Paiva, and Gerard de Melo. 2014. “Embedding NomLex-BR Nominalizations into OpenWordnet-PT.” In Proceedings of the 7th Global WordNet Conference, edited by Heili Orav, Christiane Fellbaum, and Piek Vossen, 378–82. Tartu, Estonia. http://globalwordnet.org/global-wordnet-conferences-2/
  5. de Paiva, Valeria, Livy Real, Alexandre Rademaker, and Gerard de Melo. 2014. “NomLex-PT: A Lexicon of Portuguese Nominalizations.” In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), edited by Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis. Reykjavik, Iceland: European Language Resources Association (ELRA).
  6. Freitas, Cláudia, Valeria de Paiva, Alexandre Rademaker, Gerard de Melo, Livy Real, and Anne de Araujo Correia da Silva. 2014. “Extending a Lexicon of Portuguese Nominalizations with Data from Corpora.” In Computational Processing of the Portuguese Language, 11th International Conference, PROPOR 2014, edited by Jorge Baptista, Nuno Mamede, Sara Candeias, Ivandré Paraboni, Thiago A. S. Pardo, and Maria das Graças Volpe Nunes. São Carlos, Brazil: Springer.
  7. de Paiva, Valeria, Cláudia Freitas, Livy Real, and Alexandre Rademaker. 2014. “Improving the Verb Lexicon of OpenWordnet-PT.” In Proceedings of Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish (ToRPorEsp), edited by Laura Alonso Alemany, Muntsa Padró, Alexandre Rademaker, and Aline Villavicencio. São Carlos, Brazil: Biblioteca Digital Brasileira de Computação, UFMG, Brazil. http://www.lbd.dcc.ufmg.br/bdbcomp/servlet/Evento?id=755.
  8. de Paiva, Valeria, Dário Oliveira, Suemi Higuchi, Alexandre Rademaker, and Gerard de Melo. 2014. “Exploratory Information Extraction from a Historical Dictionary.” In IEEE 10th International Conference on e-Science (e-Science), 2:11–18. IEEE. https://doi.org/10.1109/eScience.2014.50.
  9. Oliveira, Hugo Gonçalo, Valeria de Paiva, Cláudia Freitas, Alexandre Rademaker, Livy Real, and Alberto Simões. 2015. “As Wordnets Do Português.” Oslo Studies in Language 7 (1): 397–424
  10. Rademaker, Alexandre, Dário Augusto Borges Oliveira, Valeria de Paiva, Suemi Higuchi, Asla Medeiros e Sá, and Moacyr Alvim. 2015. “A Linked Open Data Architecture for the Historical Archives of the Getulio Vargas Foundation.” International Journal on Digital Libraries 15 (2-4): 153–67. https://doi.org/10.1007/s00799-015-0147-1
  11. Real, Livy, Fabricio Chalub, Valeria de Paiva, Claudia Freitas, and Alexandre Rademaker. 2015. “Seeing Is Correcting: Curating Lexical Resources Using Social Interfaces.” In Proceedings of 53rd Annual Meeting of The Association for Computational Linguistics and The 7th International Joint Conference on Natural Language Processing of Asian Federation of Natural Language Processing - Fourth Workshop on Linked Data in Linguistics: Resources and Applications (LDL 2015). Beijing, China.
  12. de Paiva, Valeria, Livy Real, Hugo Gonçalo Oliveira, Alexandre Rademaker, Cláudia Freitas, and Alberto Simões. 2016. “An Overview of Portuguese WordNets.” In Global Wordnet Conference 2016. Bucharest, Romania.
  13. Real, Livy, Valeria de Paiva, Fabricio Chalub, and Alexandre Rademaker. 2016. “Gentle with Gentilics.” In Joint Second Workshop on Language and Ontologies (LangOnto2) and Terminology and Knowledge Structures (TermiKS) (Co-Located with LREC 2016). Slovenia.
  14. Chalub, Fabricio, Livy Real, Alexandre Rademaker, and Valeria de Paiva. 2016. “Semantic Links for Portuguese.” In 10th Edition of the Language Resources and Evaluation Conference (LREC). Portoroz, Slovenia.
  15. de Paiva, Valeria, Fabricio Chalub, Livy Real, and Alexandre Rademaker. 2016. “Making Virtue of Necessity: a Verb Lexicon.” In PROPOR – International Conference on the Computational Processing of Portuguese. Tomar, Portugal.
  16. Rademaker, Alexandre, Valeria de Paiva, Fabricio Chalub, Livy Real, and Claudia Freitas. 2016. “Introducing OpenWordnet-PT: an Open Portuguese Wordnet for Reasoning.” In International FrameNet Workshop Part of 9th International Conference on Construction Grammar (ICCG9), edited by Tiago Timponi Torrent. Juiz de Fora, Brazil: Universidade Federal de Juiz de Fora - UFJF.
  17. Rademaker, Alexandre, Fabricio Chalub, Livy Real, Cláudia Freitas, Eckhard Bick, and Valeria de Paiva. 2017. “Universal Dependencies for Portuguese.” In Proceedings of the Fourth International Conference on Dependency Linguistics (Depling), 197–206. Pisa, Italy.
  18. Muniz, Henrique, Fabricio Chalub, Alexandre Rademaker, and Valeria de Paiva. 2018. “Extending Wordnet to Geological Times.” In Global Wordnet Conference 2018. Singapore.
  19. Real, Livy, Alexandre Rademaker, Fabricio Chalub, and Valeria de Paiva. 2018. “Towards Temporal Reasoning in Portuguese.” In Proceedings of 6th Workshop on Linked Data in Linguistics. Miyazaki, Japan. http://lrec-conf.org/workshops/lrec2018/W23/summaries/8_W23.html.
  20. de Paiva, Valeria, Alexandre Rademaker, Livy Real, Fabricio Chalub, and Gerard de Melo. 2018. “OpenWordNet-PT: Taking Stock.” Proceedings of Fifth Workshop on Natural Language and Computer Science (Affiliated with Federated Logic Conference 2018). Oxford, UK. https://doi.org/10.29007/tvgw
  21. Cid, Alessandra, Alexandre Rademaker, Bruno Cuconato, and Valeria de Paiva. 2018. “Linguistic Legal Concept Extraction in Portuguese.” In Legal Knowledge and Information Systems, edited by Monica Palmirani. Vol. 313. Frontiers in Artificial Intelligence and Applications. http://ebooks.iospress.nl/volumearticle/50848
  22. de Paiva, Valeria, and Alexandre Rademaker. 2019. “Portuguese Manners of Speaking.” In Proceedings of the 10th Global Wordnet Conference. Global Wordnet Association.


Sunday, August 30, 2020

Wildfires near us

This has been a difficult week: heatwave, pandemic, possible power cuts and, to cap it all, wildfires! The possible need to evacuate, the need to prepare go-bags, to find documents and photos. Our bags are still sitting by the porch. The wildfires are only 40% contained as I type this. Very unsettling! And we're the lucky ones: the ones who did not have to evacuate, who only had to consider it.

It took quite a lot of determination to keep making jokes about it. Like the one I made on Facebook, stealing from someone I don't know on Twitter.

"It’s raining ash in California, forcing us to wear a different kind of mask than we wear for the pandemic when we go buy the generator we need for either rolling blackouts or preemptive outages so we can work from home if we haven’t been evacuated, if we have work or our house hasn't burned down. well, I don't need a generator, I just need some more wine to keep it going!"


But all is fine, thank you! The heatwave has gone away, the air is clear again, there's wine in the house and I've managed to chair sessions at AiML and WBL, and watch lots of interesting talks. I even managed to produce slides and talk for some 20 min on Friday. So we're back to the old worries about all the deadlines missed and all the work not delivered, yet. But hey, this is normal! Wildfires no, they're not normal.

This picture is from 2016, not now, but somehow it is even more frightening that I had no idea it was happening!


Sunday, August 9, 2020

Understanding Portuguese

(Illustration by Jana Walczyk)

It might be possible to find students and programmers to develop an old dream of mine, I am told. This dream is a project about producing logic from texts in Portuguese. I have been giving talks about this project since 2010 (paper from 2011). Thus I want to explain (to possible volunteers) what this project entails, what work we should be doing, and why.

Explaining why we should be doing this work is very easy. 

The amount of information published in scientific articles, preprints, news, blog posts and fiction, as well as unstructured data, has increased many-fold in the last few years. A major bottleneck in the discovery of relevant information, for business and researchers alike, arises when connecting new results with the previously established state of the art. A potential solution to this problem is to transform the unstructured raw text of the new information into structured database entries, which would allow us to reason with this new information in the same way that one already organizes and reasons with the previous content, using Knowledge Graphs. This would allow programmatic querying of the content, checking it for contradictions, checking for new changes, as well as all manner of analytics of this content. The fact that one can do most of this processing in English, but not in Portuguese (or for that matter in many other languages), should be a reason for concern. Brazilian science, as well as its industry, cannot progress as well as others if our native language is not processed as well as others.
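To make "structured database entries" a little more concrete, here is a minimal sketch in Python of a toy knowledge graph stored as subject-relation-object triples, with a pattern query and a naive contradiction check. Everything in it is an assumption made for illustration: the class name TinyGraph, the relation name "capital", and the idea that a contradiction is a relation declared single-valued that ends up with two values. A real Knowledge Graph would use a proper store and ontology.

```python
from collections import defaultdict


class TinyGraph:
    """A toy triple store; hypothetical, for illustration only."""

    def __init__(self):
        self.facts = set()

    def add(self, subject, relation, obj):
        self.facts.add((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return all triples matching the pattern; None matches anything."""
        return sorted(
            f for f in self.facts
            if (subject is None or f[0] == subject)
            and (relation is None or f[1] == relation)
            and (obj is None or f[2] == obj)
        )

    def conflicts(self, single_valued):
        """Find (subject, relation) pairs holding more than one value
        for a relation that we declared to be single-valued."""
        seen = defaultdict(set)
        for s, r, o in self.facts:
            if r in single_valued:
                seen[(s, r)].add(o)
        return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


g = TinyGraph()
g.add("Brasil", "capital", "Brasília")        # as a recent text would say
g.add("Brasil", "capital", "Rio de Janeiro")  # as a pre-1960 text would say
g.add("Brasil", "continente", "América do Sul")

print(g.query(relation="capital"))
print(g.conflicts({"capital"}))
```

The two "capital" facts are both things a text could legitimately say (Rio de Janeiro was the capital until 1960), which is exactly why extracted facts need checking against what the graph already holds.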

Semantic Parsing Portuguese

Now, explaining exactly what the work on a semantic parser for Portuguese amounts to is somewhat harder. The project of transforming unstructured text into knowledge is very hard: language is way too ambiguous and difficult to deal with. While many open-source tools and resources for processing English texts exist, very few can be used for Portuguese. So we describe in parallel what we do have for English and what we need to build for Portuguese.

The project of extracting semantic information from English sentences is very hard. Our best shot can be seen at the moment in the preliminary demo. This prototype, developed by Katerina Kalouli and Dick Crouch, goes over ideas developed when I worked with Crouch at Xerox PARC, but re-implements these ideas from scratch, replacing with new technologies all the software that is proprietary to either Xerox PARC or Microsoft. (There is a paper explaining the system and a version showing how this can be hybridized with machine learning systems.)

This new semantic parser project has a pipeline that depends on several other open-source projects: we discuss these several "steps" below. 

Steps for Semantic Parser in Portuguese

Semantic parsers for English abound, but we are following a specific line of work that starts with Daniel Bobrow and  Ronald Kaplan at PARC.

1. Grammatical parsing is improving every year. A recent development is the new Stanford system called "Stanza". Stanza is multilingual and includes Portuguese; it is written in Python and has a better (less restrictive) license than the previous Stanford CoreNLP systems. We need to fine-tune it for our experiments.

2. The semantic parser we have in English depends on the grammatical parsing of sentences using the Stanford-Google based project "Universal Dependencies". Actually, it uses "enhanced dependencies", and we need to check how they behave for Portuguese.

The Universal Dependencies project has been going on since 2016. It already has a Portuguese branch, with which I am associated through my work with Alexandre Rademaker and Livy Real, but the corpus we have in Portuguese is small and there are still many issues with the Portuguese Universal Dependencies. The annotation guidelines need expanding, and some annotation effort may be needed to increase the size of the corpus.

3. The semantic parser also depends essentially on Princeton WordNet. Building up the Portuguese version of the WordNet thesaurus and dictionary has been a much harder task than we had anticipated, but our system (for browsing and downloading) has been in operation since 2012; here's the original description. It is still being constructed, or is "in progress", but it is getting close to the end of its first (translation only) phase.

4. The semantic parser also depends on a tool for disambiguation. We have been using JIGSAW (available from GitHub), but this has not been updated since 2012, and it will not work for Portuguese. We need a tool for the disambiguation of Portuguese that can be plugged into this pipeline.

5. The system also depends on a generic upper ontology, for which we are using SUMO  in English. But an upper ontology is not enough to provide the world knowledge necessary for our applications. The project of expanding SUMO into an appropriate ontology for Brazilian culture, a Knowledge Graph for Brazil and its different facets (be they history, culture or geology or tourism, etc) is another major undertaking.

6. Finally, we need a reasoner on top of the representations that the semantic parser produces. This could be an off-the-shelf system like Lean or Isabelle, or it could be an NLI (Natural Language Inference) system like the ones produced via neural nets and/or hybrid methods described in this SEMEVAL meeting special issue proceedings.
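For step 2 above, it may help to see what the data looks like. Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line, ten tab-separated columns, with the basic head and relation in columns 7 and 8 and the enhanced dependencies in column 9 (DEPS). Below is a minimal sketch in Python of reading one sentence. The analysis of "O menino dorme." is hand-written for illustration (it is not taken from the Portuguese treebank), and the names Token, parse_sentence and triples are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hand-written, illustrative CoNLL-U for "O menino dorme." (an assumption,
# not copied from any treebank).
SAMPLE = "\n".join([
    "# text = O menino dorme.",
    "1\tO\to\tDET\t_\t_\t2\tdet\t2:det\t_",
    "2\tmenino\tmenino\tNOUN\t_\t_\t3\tnsubj\t3:nsubj\t_",
    "3\tdorme\tdormir\tVERB\t_\t_\t0\troot\t0:root\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t3:punct\t_",
])


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str
    deps: List[Tuple[str, str]]  # enhanced dependencies as (head, relation)


def parse_deps(field):
    """Split the DEPS column, e.g. '3:nsubj|5:conj', into (head, relation) pairs.
    Heads stay strings because enhanced graphs may point at empty nodes like '5.1'."""
    if field == "_":
        return []
    pairs = []
    for item in field.split("|"):
        head, _, rel = item.partition(":")  # relation itself may contain ':'
        pairs.append((head, rel))
    return pairs


def parse_sentence(block):
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (e.g. 1-2) and empty nodes
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            int(cols[6]), cols[7], parse_deps(cols[8])))
    return tokens


def triples(tokens):
    """Basic dependencies as (head form, relation, dependent form)."""
    forms = {t.id: t.form for t in tokens}
    return [(forms.get(t.head, "ROOT"), t.deprel, t.form) for t in tokens]


tokens = parse_sentence(SAMPLE)
for triple in triples(tokens):
    print(triple)
```

The multiword-token lines are skipped here for brevity, but they matter for Portuguese, where contractions such as "do" (de + o) are split into two syntactic words.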

I need to emphasize that these steps can be done in any scientific or commercial field that one is interested in. We could do it for History, Chemistry, or Mathematics, for example. We could do it to help integrate IoT (Internet of Things) appliances or to help design customer service automated systems. Of course, an application to dialogue will require a further module, a dialogue manager, which orchestrates the possible conversations and actions of the automated system. The different domains should correspond to different Knowledge Graphs.

However, each one of these steps is a considerable amount of work, possibly worth a master's thesis, or maybe even a PhD. Putting them all together should also be a major engineering feat. I hope we will find people willing to take up this challenge.

Sunday, July 26, 2020

Editing books

The picture above is what Amazon knows about the books I've edited. There are a few other things, (e.g. special issues of journals), but of course Amazon doesn't sell them, so they wouldn't know about those. Now anyone in the least competent would have their edited books as part of their curriculum and webpage, right? Oh well, I'm failing this test too, so far. Need to add them.
Actually, Amazon also knows about the cover of the last book above, first edition in 1993, re-issued as a paperback in 2006.

Now for special issues, I guess I need to create my own picture.

The curious incident of the dropped streaming

This is a picture of zoom minutes before Women in Logic 2020, the workshop associated with FSCD/IJCAR, started on 30 June 2020. This year I am *not* one of the organizers of "Women in Logic".
I had promised myself to try to do it for three years and then pass on the ball. I was thrilled to be able to pass the ball to the very competent hands of Sandra Alves, Sandra Kiefer and Ana Sokolova, the organizers this year! They did a splendid job and the workshop had 145 attendees during these trying pandemic times, a wonderful feat, if you ask me.

But we had a bit of an incident during Women in Logic 2020 this time. The workshop was going really well, when during Alexandra Silva's Invited talk ("An algebraic framework to reason about concurrency"), my chat started blipping with the organizers of FSCD/IJCAR asking "what's going on on your workshop? everything ok? YouTube took down the streaming!!! they say someone complained about the workshop". What?

I explained that there was nothing wrong happening, no zoom bombing, no glitches that we (me or the real organizers) could see, and urged them to complain to YT to get the stream back up again. YouTube eventually restarted the stream (the next day, although the workshop was one day only) and sent an unapologetic message, see below.

So yes, we don't know at all what happened: whether they thought there was a trademark infringement, or whether some human being triggered the complaints procedure to annoy us. (Some of our friends seem to think that the latter was the case!)

We have some reasons to believe that this was a childish act of sabotage, because Alexandra had finished the CS part of her presentation and had started the discussion on why we need meetings like "Women in Logic". Initially firmly convinced that it was some sort of glitch of automatic algorithms, I took to Twitter and asked:

OK, a small typo in "down", but nothing too controversial. Belnap's lattice, Kleene algebras and nominal type theory are perfectly good subjects in logic and computer science. The workshop was running on Zoom and was being streamed on YouTube, so the meeting carried on with further talks and a discussion at the end. But the reason for streaming the meeting was to also support people who didn't want to use Zoom, and these people could not participate then.

Quite a number of people responded to my tweet.  Ian Stark asked "Do you get any indication of what YouTube judge you've infringed?" and we were told that something similar happened with POPL2019, so I wrote to Fritz Henglein to ask for information. (there wasn't much info to be had)

Sara Kalvala commented "It is completely bizarre that anyone would feel threatened by a bunch of women having a workshop on logic and complain to @youtube. Even more bizarre that @youtube would delete the video. But it won't stop us having more meetings". To this I replied "yes, totally bizarre! a small correction is that YT didn't delete the video, they simply took down the streaming. Since stopping the streaming is immediate, but reinstatement takes lots of human intervention, they put it back the next day, but the workshop was one day only!".

Anyways a small consolation (for me) was to see the comments from colleagues in FSCD/IJCAR saying "I thought you were exaggerating, guess you're right and doing the right thing!! Keep doing it!!".

And yes, I think we are doing the right thing. To begin with I was a little skeptical. I am used to being in a very masculine world, a world of very few women. I "grew up" in research being treated like one of the "lads" and not worrying too much about it. I was expecting things to improve, as numbers of women improved. But the numbers of women in logic and Computer Science not only did not improve, some of them got decidedly much worse.

Many of the young women finishing PhDs in CS I talked to feel that a place like "Women in Logic" made them feel less attacked, more protected and better able to speak and be themselves. And the fact that many of our sisters have been doing these meetings for more than twelve years (e.g. Women in Machine Learning, Women in Machine Learning and Data Science, Women in Biology, etc.), with huge numbers in attendance, showed me that I was wrong, that meetings where only women present work are sensible and helpful and a "good thing" altogether.




Monday, July 20, 2020

Slideshare is my friend. Sometimes.

When I first had  issues with Google sites, Slideshare turned out to be an easy place to add pdfs of  slides to (btw I still have issues with Google sites, still need to find a few hours to try to debug what's wrong with my old webpage!)

 I wish I had made more use of Slideshare all along, as by now I have no idea where most of my PowerPoint talks are. I hope they are somewhere between my Dropbox, my Google Drive storage, my Apple storage, or any of the four hard drives sitting on my desk. But yes, trying to find anything at all in all these possible places is quite hard, while the Slideshare stuff is easy to find.


I can see that this year, so far, I have given five talks -- actually I have given six, as I have forgotten to add the slides for "Logicians in Quarantine", as they were very similar indeed to the ones for SRI. I can see that I need some change in my beamer style -- which is lovely and matches well my favorite PowerPoint template, so I can import between the platforms. But you can have too much of a good thing like pale purplish-blue! And I swear that it took me much more than 3 hours to get those slides uploaded, because one deck had a glitch and insisted on failing its upload over and over again.

Anyways some effort will be happening here to recover the talks and the writings associated with them. Because, yes, I am failing miserably this year on my "just ship it" approach to paper writing. We're at the end of July, so I needed to have 7 papers submitted (yes, we cannot guarantee acceptance, but we can make sure that submission is done!). Instead, I have one paper to appear with Paul Tarau and one submitted with Samuel Gomes da Silva. Definitely not good! need to change that asap.

Sunday, July 19, 2020

Logicians in Quarantine



 This post is a short shout-out to Bruno Lopes and Petrucio Viana for the brilliant idea of creating "Logicians in Quarantine/Logicos em Quarentena" the Brazilian Logic Society online seminar.  I was invited to help to bootstrap it and I gave the second talk,  "Between a Rock and a Hard Place: Structural and Distributional Meaning representations", mostly because I had just given this talk at SRI Menlo Park (on 5th March), so it was ready. But also because I wanted to show to my Brazilian friends how my work with language does connect to my work with logic. The transition is not so obvious. 

Joao Marcos gave the first talk "On classes of structures axiomatizable by universal d-Horn sentences and universal positive distinctions".  The seminar seems to be working extremely well, with all sorts of interesting talks. These are recorded, so if something happens, you can always watch it later on. 

The seminar is now part of the Logic SuperGroup, another splendid idea! Many kudos to Shay Logan, Shawn Standefer, and many others for the brilliant implementation of the idea of connecting all the logic seminars.