Saturday, December 24, 2022

Dialectica Categories in Computing


In the first half of the year we had an AMS Mathematical Research Community (MRC) on Applied Category Theory, in Beaver Hollow, NY. John Baez blogged about it in “Learn Applied Category Theory!” Here’s a poster advertising the whole program.

The AMS poster for MRCs

We also had a paper in the Notices of American Mathematical Society, describing our proposal:

Applied Category Theory in Chemistry, Computing, and Social Networks John Baez, Simon Cho, Daniel Cicala, Nina Otter, and Valeria de Paiva.

An MRC is a bit like a version “on steroids” of the ACT Adjoint School, as there are many more researchers in a research subgroup in the MRC than in a Adjoint School group. While my ACT Adjoint School group had four students and a TA, my MRC group had thirteen researchers. So from four to thirteen people you have more than three times the number of people to do interesting research with. Plenty of fun, but it can be unsettling to think about research problems for so many people you don’t know, especially for me (well, I haven’t had that much experience directing other people’s research).

I was very worried to begin with, but when we got going, it was exhilarating. Great fun to discuss serious math problems with people interested in similar problems, with similar tastes to yours and with plenty of ideas of their own. The program had three mentors (John, Nina and myself), and I ended up with a group of 13, subdivided into 4 smaller groups. Things were going very well (you can check some of the discussions in the Zulip channel [practice:Dialectica]) until two days before the meeting in upper New York state I got ill, and ended up in the ER in the Stanford hospital. Twice.

I couldn’t go to the research meeting in Beaver Hollow, but Jérémie Koenig, my trusted deputy, managed to get everyone working well together into the four groups we had established before:

  1. Dialectica and Polynomials,
  2. Dialectica and Games,
  3. Dialectica and Lenses,
  4. Dialectica Petri net Processes.

They sent me a lovely picture from Beaver Hollow.

The MRC group on ACT in Beaver Hollow

The neat thing about MRCs is that they give the participants some funds to make academic visits and I was very touched when my group decided to come and visit me in California, using their funds. This meant that we had another smaller meeting in Cupertino, in the middle of the Summer. Well, it was already September and usually the weather over here is good most of the time, but we had some bad luck, and this week was a heatwave, the worst one this summer. But we had some fun. Now the problem is reworking what happened in the many discussions.

A third component of the MRC program is still to come. Of course we all hope that the math we discussed during the summer will mature nicely into preprints and papers and programs, and these are supposed to be presented in the AMS Joint Mathematics Meetings (JMM) in Boston in early January. So one organizer was chosen from each mentor’s group and I am very happy that Charlotte Aten accepted to be the organizer for our group, despite being in the last weeks of her PhD and interviewing for jobs. Then sometime in September we were told everyone needed to write up some ideas that they hoped could make solid papers before January 2023. I’m glad to report that every one of our four subgroups was able to produce an abstract and I hope the papers are in the process of being written as I write this.

Now this post was meant to be about the four subgroups above and why I am so excited for the mathematics we’re doing. But I am a bit behind with my processing of the knowledge we shared when people visited here (and over Zulip!). So, while I could possibly wait for the JMM to get to grips with more of the new math that people are producing, I think I will try to explain, at least a little, how the themes above interact.

I started my research career working on Gödel’s Dialectica Interpretation. On bad days I think not only have I never left the dialectica, but even today I hardly understand the stuff, after all these many years. Then I console myself with the thought that the stuff is really fascinating and full of surprising consequences and applications!

Someone else who’s awed by the Dialectica Interpretation is Jules Hedges. His blog post Lenses for Philosophers shows some of the badly understood consequences and connections of the work on categorical dialectica construction. In some ways the MRC Dialectica categories for Computing is an attempt to clarify some of the interrelations described in Hedges’ post.

Gödel’s Dialectica Interpretation is a proof interpretation of intuitionistic arithmetic into a quantifier-free theory of functionals of finite type, called System T.

The interpretation’s original use was to show the consistency of Heyting (or intuitionistic) arithmetic. When combined with Gödel’s double-negation interpretation, which reduces classical arithmetic to intuitionistic arithmetic, the Dialectica interpretation yields a reduction of the classical theory as well.

As explained by Fernando Ferreira,

The stated purpose of Gödel’s Dialectica paper is to present a consistency proof of first-order Peano arithmetic PA by way of an extension of the finitary viewpoint according to which the requirement that constructions must be of and over “concrete” objects is relaxed to be of and over certain abstracta, namely computable functionals of finite type over the natural numbers.

Thus the Dialectica interpretation can be considered a relaxed (or liberalized) form of Hilbert’s program, where instead of insisting on the finitary viewpoint, we accept ‘computable functionals of finite type’ as (sort-of) concrete objects. To understand the interpretation itself, I recommend A.S. Troelstra’s “Introductory Note to the Dialectica paper” in Gödel’s Collected works, volume II, where we can read that “Gödel presents his results as a contribution to a liberalized version of Hilbert’s program: to justify classical systems, in particular arithmetic, in terms of notions as intuitively clear as possible.” From there one can graduate to more complete work of Troelstra himself (Metamathematical investigation of intuitionistic arithmetic and analysis (1973)), of Avigad and Feferman (Godel’s Functional (“Dialectica”) Interpretation) and of Kohlenbach and his school.

The categorical internal versions of the Dialectica construction, starting in de Paiva’s thesis in 1988 (out as a technical report in 1991), have spelled out some of the requirements and consequences of the interpretation, especially its connection to Linear Logic. The categorical dialectica construction was generalized to fibrations, by Hyland (2002), Biering (2008) and Hofstra (2011). Generalizations to dependent type theories appeared in van Glehn (2016) and Moss (2018) and then together (2018). More recently Trotta et al, building on this work introduced Gödel fibrations and Gödel doctrines, rekindling the connections of the categorical dialectica construction to logic and making sure that the three dialectica principles (Independence of Premise, Markov Principle and Axiom of Choice) get their categorical explanation.

Some members of the MRC group

The main connections of the Dialectica construction discussed in the MRC, as anticipated above, are between ‘lenses’ and dialectica categories, between ‘polynomials’ and dialectica categories (see Nelson Niu’s blog post and Sean Moss’ talk), between ‘games’ and dialectica categories (already hinted at in Blass’ original paper on games for linear logic) and between concurrency theories and dialectica categories in the shape of Dialectica Petri nets Processes and their implementation in Agda. I hope the abstracts for the extended discussions of these four sets of connections will be available soon.

But it is clear that we’re only scratching the surface of this rich story of interconnections. Hopefully much more will come to light in the near future.

Extracting Mathematical Concepts

by Valeria de Paiva , Jacob Collard originally posted Wednesday, 16 Nov 2022

Categories: [research]
Tags: [AI] , [NLP] , [MathFoldr]

Humans are very good at recognizing the words they do not know, the concepts they haven’t met yet. To help with these they invented dictionaries, glossaries, encyclopedias, wikipedias, crib sheets, etc. Initially, they did it via laborious manual work, more recently through automated construction techniques. Natural language processing (NLP) tools have improved impressively in the last decade. Most of this incredible improvement happened to newspaper text and hence named entities of the kind found in news text can be detected and classified (usually into people, organizations, places and dates) with much greater accuracy than in the recent past.

Samuel Johnson
Dr Johnson, the author of the 1755 A Dictionary of the English Language.

However, for domain specific text, like the academic literature in different sciences (e.g. Medicine, Biology, Chemistry), or in the Humanities (History, Sociology, Philosophy, etc.) things are more complicated. Thus extracting technical terms/concepts from papers is important, interesting and usually difficult.

How do you detect mathematical concepts in text? As a human mathematician (or not), you probably recognize that the ovals in the snippet in this picture correspond to mathematical concepts. But there is some uncertainty in this choice, and several metrics might be used to decide whether the automated system used is getting it right or not.

An example of term extraction on a single paragraph of text

We wanted to see how well ‘automatic term extractors’ (ATE) systems work for mathematical text, especially for Category Theory, if we simply use them out-the-box. It seems reasonable to establish baselines using available, open source systems and this is what we do in our paper “Extracting Mathematical Concepts from Text”. The paper was presented at the 8th Workshop on Noisy User-generated Text (W-NUT) at Coling 2022 and you can find a short, technical video describing the work below.

This work is part of a larger project on Networked Mathematics that we started describing in “Introducing the MathFoldr Project” and broadened the discussion in “The many facets of Networked Mathematics”. We hope to provide more information on the project soon.

Thursday, December 1, 2022

Networked Mathematics


The many facets of Networked Mathematics

by Valeria de PaivaMonday, 18 Apr 2022

Categories: [research]
Tags: [MathFoldr] , [AI] , [NLP]

Last year we started a discussion on the production of mathematics, the pre-industrial kind of process that we all still follow of solving problems and taking the solutions to the market of ideas (conferences, seminars, blog posts, twitter, coffee breaks in the department, etc.) as drafts/preprints/submissions and all the difficulties that this process involves. In particular we discussed the difficulties of checking whether the work had been done before or not, and the difficulties of searching for other, similar work that might help us with a given question.

As a group we settle on a path of trying to apply the new tools of NLP and knowledge representation (neural nets, word embeddings, transformers, ontologies, semantic web tools, etc.) to what might be called mathematical English. We reasoned that, yes eventually we will need to get to the semantics of equations and diagrams, but it would be useful to know how much we can learn simply from using the tools of NLP to extract mathematical concepts from abstracts of articles.

We also decided that Category Theory would be our prototype domain. First because category theory is about the organization of Mathematics in general, discovering the backbones of the subject and the hidden similarities between, in principle, unrelated sub-areas. But secondly because we are category theorists, after all. So this is the domain in which we are domain experts.

We realized early on that corpora of mathematical text were needed, and we hoped this wouldn’t be a problem. Category theory was, after all, one of the first areas of Mathematics to have a free open-source journal, namely TAC (Theory and Applications of Categories) in 1995. But we need much more data than simply one journal. While we also have both the nLab and the whole of the Wikipedia Math Portal, as beautiful mathematical open resources,  even the licensing of the arXiv is more problematic than we knew.

As we explained in the earlier blog post, we have chosen to work with a corpus of TAC abstracts and one (snapshot) of the nLab from the beginning of 2021. The TAC abstracts are clean, with not so much LaTex thrown in, and form a small corpus of 3.2K sentences. The nLab is a bit more noisy, one needs to remove the wiki markup and sections and headings get lost, a traditional nightmare of processing scientific text.

We now have spaCy processing of all this data, as well as a collection of experiments on term extraction, using different tools: TextRank, OpenTapioca, DyGIE++ and Parmenides, a special tool of our NIST collaborators. There are plenty of interesting observations to make about the processing and the comparison between these tools, and we are preparing have now a write-up about it. However, the major problem with domain specific NLP strikes again: we have no golden standard for mathematical concepts in mathematical texts, much less for category theory concepts. Our predicament is not only ours: many others have the same issue and have tried to make do with not guaranteed good results. 

Meanwhile, we have been working on the idea that formalized mathematics will eventually be the way of producing mathematics, which is a new kettle of fish altogether. To discuss the issues of formalization (somewhat an eventual goal) we have had in the Topos Colloquium talks by several people involved with automated deduction: Jeremy Avigad, Kevin Buzzard, Larry Paulson, Florian Rabe, for example. We hope to have more talks this year. Finally, we’re happy to be helping to organize a Hausdorff Institute of Mathematics trimester on “Prospects of Formal Mathematics” in 2024.

The poster for the Prospects of formal mathematics trimester program
The “Prospects of formal mathematics” trimester program will run at the Hausdorff Institute of Mathematics from the 6th of May to the 16th of August 2024.

Wednesday, November 30, 2022

Women in Logic website

 If there is anything the internet has shown me, it is that repetition is useful. More duplication of posts from the Topos Institute blog.

Women in Logic website launched

by Valeria de PaivaThursday, 17 Mar 2022

Categories: [announcement]
Tags: [logic]

World Logic Day is an international day proclaimed by UNESCO in association with the International Council for Philosophy and Human Sciences (CIPSH) in 2019. The idea is to celebrate it on the 14th of January every year.

Last year we celebrated Logic Day. Not this Logic:

Logic (the rapper)

but more like the Logic of the ideas described below:

Logic (mathematical)
Also logic.

This year we decided to celebrate Logic Day by launching the 

 Women in Logic website:

Women in Logic (WiL) is a collective of women working in logic, as well as a forum that brings together women conducting research in logic and closely related areas, with the goals of enhancing the experience of women in these communities, making their achievements known, and increasing the number of women in logic.

The WiL website
The new Women in Logic website!

The flagship activity of the collective is the Women in Logic workshop that has happened every year since 2017. This workshop is usually associated with one of the big conferences in logic for Computer Science, so either FLoC (Federated Logic conference), LiCS (Logic in Computer Science) or FSCD (Conference on Formal Structures for Computation and Deduction), so far. The workshop has happened in Reykjavík (Iceland), Oxford (UK), Vancouver (Canada), Paris (online due to covid) and Rome (online due to covid). The next workshop should be happening happened in Haifa, Israel, as part of FLoC, on July 31st, 2022.

Women are chronically underrepresented in the Logic communities in Computer Science, in Philosophy, in Cognitive Science and in Mathematics; consequently they sometimes feel both conspicuous and isolated, and hence there is a risk that the under-representation is self-perpetuating. The WiL workshops provide an opportunity for women to increase awareness of one another and one another’s work, to combat the feelings of isolation. It provides an environment where women can present to an audience comprising mostly women, replicating the experience that most men have at most CS meetings, and lowering the stress of the occasion; we hope that this will be particularly attractive to early-career women.

The workshop Women in Logic (WiL) is supported by small grants provided by ACM’s SIGLOG, the Vienna Center for Logic and Algorithms, the Institute of Logic, Language and Computation of the University of Amsterdam and the Topos Institute. Thank you friends!!


Sunday, November 27, 2022

Duplicating blog posts?


Content gets old and sometimes not accessible any more. Hence, I've decided to duplicate here my blogposts in the Topos blog. At Topos, I have to follow a style that sometimes clashes with mine, while here I can do things my own way. And this is a huge plus. 

Below you can see what was the blog post we wrote about our MathFoldr project 17 months ago. Much can happen in 17 months, but also much can not happen. Despite a very enthusiastic response from our ACT community, we were not able to convince the funding agencies that this is an important project. Thus, the work was carried on, but at a slow pace. I thought I'd update a little the original post with comments, after reproducing the original.

Introducing the MathFoldr Project

by Brendan Fong , Valeria de PaivaSunday, 11 Jul 2021

Categories: [research]
Tags: [MathFoldr] , [AI] , [NLP]

a telegram tape
How do we encode, connect, and share knowledge in the 21st Century?

At Topos we believe knowledge empowers people, and that our community’s expertise should be available to all who seek it out. This is why, for example, we broadcast most of our seminars live on YouTube, and actively support numerous open publishing projects, such as the journal Compositionality, the nLab community wiki, or simply making our books freely available online.

But simple availability is not enough. True access is more than an open door: it’s clear, legible street signs, elevators, and gently sloping on-ramps. And with modern AI and natural language processing tools, we believe it’s beyond time to build these accessibility tools for science and mathematics.

This blog post provides an overview of our nascent MathFoldr project, sharing our dreams and our approach so far—and, at the end, a way for you to help just by doing a concrete, 5 minute activity!

1How do we organise mathematics?

A cornerstone of accessibility is search, and math is not easy to search. A striking, recent example comes from Quanta Magazine, November 2019. A group of physicists discovered a useful identity relating eigenvectors and eigenvalues, and did not know if it was novel. To check, they emailed a number of mathematicians, including Fields Medallist Terence Tao. Despite believing the result was “so short and simple—it should have been in textbooks already”, Tao had not previously heard of it. This led to a paper submitted for publication and, soon after, the article in Quanta. In the weeks after the story emerged, more than three dozen previously published instances of the result were reported, dating back to 1934. How can it be that even eminent mathematicians cannot find a widely published, basic result within their field of expertise?

The simple answer is that the mathematical literature has grown far too vast for even an expert to keep track of it all. A recent analysis finds over 120,000 math papers published in 2017 alone, with this rate growing exponentially at 3% a year.

An image from 'Looking at the mathematics literature', by E. Dunne
E. Dunne, “Looking at the mathematics literature”, Notices of the American Mathematical Society, vol. 66, no. 2, pp. 227–230, 2019.

Our infrastructure for organizing and communicating these results has not kept up. The ramifications are significant: wasted search time, duplication of research, and missed connections between fields.

2The MathFoldr vision and path so far

We seek to address this with our MathFoldr project, part of our Networked Mathematics theme. MathFoldr will provide search and literature curation tools that will make mathematics more accessible, with the ultimate goal of transforming the way mathematics is created and navigated.

Today mathematics is rather artisanal: mathematicians craft pdfs of new knowledge, and share these via posting on websites and advertising them in talks. Many recent technologies, from Google, to GitHub, to materials discovered via NLP over materials science literature, show the potential for something much more efficient and effective.

Our strategy for improving this begins with improving the organization and dissemination of math via NLP-powered tools, such as search engines, knowledge graphs, and glossaries, as an entry way to shift publication practices towards ever more formal representations. So the first task is to build ontologies, and to get the community excited and involved with good UI/visualizations.

Right now, we’re doing pilot studies with two corpora, both available on our GitHub:

These pilot studies aim to create a synthesis of machine- and community-driven methods of extracting and curating an ontology of categorical concepts, which will then be maintained via WikiData. From this ontology, we will then build tools to do concept recognition and other semantic processing tasks. A prototype tool, led by Antonin Delpeuch, is nLab.OpenTapioca.

A screen capture from nLab.OpenTapioca
A screen capture from nLab.OpenTapioca. nLab.OpenTapioca takes text and annotates it by identifying categorical concepts. The concepts are WikiData entities, and each is linked to an nLab article.

To extract ontologies, we’ve been collaborating with Jacob Collard and Eswaran Subrahmanian at NIST (the US National Institute of Standards and Technology), who have built an exciting pipeline that preprocesses the text with spaCy, and then uses a root- and rule-based linguistics method (R&R) to extract concepts. You can navigate the results with their Parmesan interface:

A screen capture from Parmesan
A screen capture from Parmesan TAC, which displays subject-verb-object triples automatically extracted from the TAC corpus using the R&R method.

3What’s next?

At present, we’re thinking about how to further clean these corpora, and refine the R&R methods to extract more accurate and precise concepts.

Simultaneously, we’re also thinking about the word embedding models that are both used by these toolkits, and that we could use separately to refine search and other methods. A central problem is that mathematical text is quite different from standard newswire English, and so, as with any domain-specific text, we’re seeing a number of processing errors. Can we tune these better for mathematical text?

A visualisation of a word-embedding semantic model for category theory concepts, trained on the nLab corpus. The model extracts, from analysis of statistical patterns in text alone, the strong similarity between schemes, varieties, and manifolds.
A visualisation of a word-embedding semantic model for category theory concepts, trained on the nLab corpus. The model extracts, from analysis of statistical patterns in text alone, the strong similarity between schemes, varieties, and manifolds. (Click here for a larger version of this image.)

But ultimately, as with any data-driven enterprise, the quality of the output depends crucially on the quality of the input. And so throughout this all, we’re working to improve our corpora, to more accurately capture the expertise and intuitions of mathematicians. Here, we have a request of you: please contribute your expertise and intuitions.

More precisely, we’d love some help identifying concepts in abstracts from Theory and Applications of Categories, to produce what’s known as an “annotated corpus”, which we will share openly for NLP experiments to benefit the scientific community. To contribute, just choose your favorite TAC abstract, and click the button below!


well, I now would say

>This is why, for example, we broadcast and record most of our seminars live on YouTube.

because having the recordings, after the event, seems to me, more important than broadcasting the talks. our permantly available library of exciting talks

Tuesday, October 25, 2022

Some weeks are really hard

 Our friend Veronica left us last Tuesday. Veronica was a wonderful friend, a person one could count on, for real. One who did not complain about things in life, and it was always ready for a lunch, a book discussion, a woman's march and/or anything else she thought was important. 

You really had to know her very well to realize she was fighting a cancer for many years. She was a true engineer, very proud of her work on improving performance of systems (at Samsung), but she rarely talked to us about her work. We have a women's lunch, the BWITCH (something like `Brazilian Women in Tech' lunch), which, like so many of these meetings, sometimes increased, sometimes waned a bit, depending on how busy people were. Lately, we were in a not-so-active phase, after many zoom meetings over the pandemic. And I thought everything was getting better for Veronica's health, I really don't know where I find this misplaced optimism of mine. 

On Sunday there was a memorial for her, and many people discovered many facets of Veronica's personality that we knew nothing about, like I learned that she did martial arts! The hiker, the pottery artist, the devoted wife and mother, the woman committed to improving society I knew well, but there were many other sides that I knew nothing about. I used to tease her quite a bit about being an engineer thru and thru and in the last weeks I had been planning to talk to her about persimmons. We have a persimmon tree and we don't like the fruit, which Veronica likes, so I always write to her to come and collect some, which she organizes carefully to make sure they finish ripening at the right time, in the right way. This year the fruit is ready to be picked up much earlier than usual, and I had been meaning to write her for a while saying it, when the horrible news arrived that she was in a coma. Some days later she was gone.  I did not manage to say goodbye. So I'm saying it here: Rest in power, Veronica!

Saturday, October 22, 2022

Ada Lovelace Day 2022: Bertha Lutz

 This is a year of politics in Brazil. The election for President of the Republic is going on at the moment, we had the first turn, the second turn of voting is within a week. It occurred to me that I don't know much about the feminist movement in Brazil. I don't know enough about History, and I wish I did. So this blog post is about when voting for women started in Brazil.

The state of Rio Grande do Norte was a pioneer in providing in 1926, in its Electoral Law,  that "all citizens who meet the conditions required by law may vote and be voted on, without distinction of sex." The following year,  Celina Guimarães Viana became the first female voter in the country and, in April 1928, the first woman to vote.

In 1929, Alzira Soriano won 60% of the votes and on January 1 of the following year she was sworn in as mayor of Lajes, in Rio Grande do Norte. She was the first woman in Latin America to assume the government of a city.  It was only in 1932, during the government of Getúlio Vargas, that women gained the right to vote and were able to run for political office. In 1933, Carlota Pereira de Queirós became the first Brazilian federal deputy from São Paulo.

But the post is about Bertha Lutz because she was a scientist, a biologist.

Bertha Maria Júlia Lutz (August 2, 1894 – September 16, 1976) was a Brazilian feminist activist, biologist, educator, diplomat and politician.  She was the daughter of Adolfo Lutz, scientist and pioneer of tropical medicine. She was also one of the most significant figures of feminism and education in Brazil in the early 20th century. 

She specialized in amphibians and, in 1919, became secretary and researcher at the National Museum of Rio de Janeiro, being the second woman to be part of the public service in the country. She was later promoted to head of the Museum's Botany department, a position she held until her retirement in 1964. In August 1965, she received the title of professor emeritus at the Federal University of Rio de Janeiro (UFRJ).

 In 1922, she organized the First Feminist Congress of Brazil and represented Brazilian women at the General Assembly of the League of Women Voters, held in the United States, where she was elected vice president of the Pan-American Society of Women. After returning to Brazil, she helped found the Brazilian Federation for Feminine Progress (FBPF), of which she was president until 1942 and whose main banner was the demand for women's suffrage.

 In 1929, Bertha and other members of the FBPF created the União Universitária Feminina (Feminine University Union), which in 1961 was renamed the Brazilian Association of University Women. One of the primary goals of the organization was to encourage higher education by the female population. In 1937, the Union was formally invited to participate in the creation of the National Union of Students (UNE). The Brazilian suffragette movement had a great victory on February 24, 1932, the date on which President Getúlio Vargas, through Decree No. 21 076, installed the new Electoral Code and guaranteed the right of women to vote in the country. 

So yes, this jumble of dates and names above is just to say that women can vote and be voted in Brazil since 1932! But in Brazil the law tends to be good, but not sufficiently enforced. As the popular saying goes "leis são como vacinas, umas pegam, outras não". But the real history is always more complicated than what we hear: here (link) is an example.

And, it is true, in 2011, Brazil elected its first woman President, Dilma Roussef. But politics is still dominated by men, in absurd numbers. One of the new developments in which I am putting lots of hope in is the 'Bancada do Cocar' (the headress caucus. More about that, especially the numbers in Brazilian politics, later on.