## The Stacks project?

The Stacks project is an open source textbook on algebraic geometry, which “*aims to build up enough basic algebraic geometry as foundations for algebraic stacks*”, implying “*a good deal of theory on commutative algebra, schemes, varieties, algebraic spaces, has to be developed en route*”. Algebraic geometry is a branch of mathematics that studies **the geometry of zeros of a polynomial system**. Although the subject is complex and central to mathematics, various connections between machine learning and algebraic geometry have been shown in the literature: see S. Watanabe (2009), S. Arora et al. (2012), V. Chandrasekaran et al. (2012), G. Blekherman et al. (2013), R. Livni et al. (2013), F. J. Király et al. (2014), and the Wikipedia page on algebraic statistics for examples.

Hopefully you are now motivated to learn more about algebraic geometry through the Stacks project. Let’s first take a peek at the project through some statistics. Currently (HEAD: `a90a2ea`; feel free to check the statistics against the newest HEAD), the project has

- 221 contributors
- 5190 pages after compilation
- 96 chapters
- 550 references
- 15457 tags (labels in the .tex files)
- 18 MB of .tex files

This is clearly a daunting amount of text. We need visualizations to help us get an overview of the tome.

## The provided API for tags

Stacks provides an API for retrieving statements and graphs for a given query tag. A statement is the text content labeled with the query tag. A graph contains nodes and links. The nodes include the query tag and other related tags with metadata, while the links are relations between the involved tags. For each tag, graphs are described in a d3.js-oriented JSON format in three categories: *force-directed*, *cluster*, and *collapsible*. The differences are:

- *Force-directed*: if the query tag is a theorem (lemma/proposition/…, same below), the API returns the set of theorem tags it depends on as nodes, and the cross-references in the proofs as links (the source refers to the target). In other cases, a single node is returned and the links are empty (even if there are cross-references in the corresponding text).
- *Cluster*: similar to *force-directed*, but in the form of a tree instead of a general graph, with the query tag as the root, so nodes may be duplicated. If the query tag is not a theorem, the tree contains only the root.
- *Collapsible*: always a tree of four levels with the query tag as the root. The levels are root > chapter > section > related tags or again the query itself. If the query tag is a chapter or a section, intermediate nodes are duplicated to reach the depth of four. Since all the tags are at the leaves, the tag dependency is less clear than in *cluster*.
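To make the difference between the *force-directed* and *cluster* forms concrete, here is a minimal Python sketch that unfolds a small dependency graph into a tree. The payload is made up for illustration, and its field names (`nodes`, `links`, `source`, `target`, `tag`) are assumptions modeled on the usual d3.js force-layout shape, not the documented output of the Stacks API.

```python
import json

# Hypothetical payload: links point from the referring tag to the referred tag,
# using indices into the "nodes" array (an assumption, not the real API schema).
force_json = """
{
  "nodes": [{"tag": "A"}, {"tag": "B"}, {"tag": "C"}, {"tag": "D"}],
  "links": [{"source": 0, "target": 1},
            {"source": 0, "target": 2},
            {"source": 1, "target": 3},
            {"source": 2, "target": 3}]
}
"""

def to_adjacency(graph):
    """Map each tag to the list of tags it cross-refers to."""
    tags = [node["tag"] for node in graph["nodes"]]
    adjacency = {tag: [] for tag in tags}
    for link in graph["links"]:
        adjacency[tags[link["source"]]].append(tags[link["target"]])
    return adjacency

def unfold(adjacency, root):
    """Unfold an acyclic graph into a tree; shared dependencies get duplicated."""
    return {"tag": root,
            "children": [unfold(adjacency, child) for child in adjacency[root]]}

def count_nodes(tree):
    """Count the nodes of a nested {"tag", "children"} tree."""
    return 1 + sum(count_nodes(child) for child in tree["children"])

graph = json.loads(force_json)
adjacency = to_adjacency(graph)
tree = unfold(adjacency, "A")
print(len(graph["nodes"]), count_nodes(tree))
```

The graph has four nodes, while the unfolded tree has five, because the shared dependency `D` appears once under `B` and once under `C`. The recursion assumes the dependency graph has no cycles.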

The description above is hard to digest without examples: see Figure 1 for some possible visualizations for tag `015I` (*Lemma 13.21.2*), created directly from the JSON returned by the API.

Figure 1: (a) “Cluster” tree layout. (b) Force-directed graph; tags are draggable. (c) “Collapsible” tree layout; click the blue circles to expand or collapse the intermediate nodes. (d) “Collapsible” force-directed graph; tags are draggable, and clicking the blue rectangles expands or collapses the intermediate nodes.

Some visualizations in Figure 1 are interactive so you can play with them. The visualizations can also be enhanced in various ways, such as showing the tag statements from the statement API on mouseover, to help users understand why the tags are connected. (**Update**: the Stacks project actually has its own implementation of the graphs, which does show the statements when hovering over the tag nodes. Check the Extras part of a lemma’s explanation page, for example.)

## What else to visualize?

With the graph API as building blocks, we can certainly do more. But to be motivated, we need to first ask ourselves: *What do we want to see?* Or, *what do we care about in this data?*

Among all the aspects of the book, let’s focus on one specific question in this post: *which parts of the book are important to the subject?*

There are immediately two follow-ups:

- What are the “parts”?
- How to measure the importance quantitatively?

Different answers give different insights. I will show them by different visualizations in the following sections.

### Scatter plot for the theorem importance

Let’s start with the theorems. Intuitively a theorem is important if

- it is cross-referred to by many later theorems, or
- it cross-refers to many earlier theorems.

In the first case, we can say the theorem is fundamental since it acts as a basis for others, while in the second case, the theorem may be comprehensive since it requires many preliminaries. This motivates a scatter plot of “times of being cross-referred to” (“#cross-referred”) vs. “times of cross-referring to others” (“#cross-referring”) for each theorem. Figure 2 shows the plot. The 10 theorems to the top-right of the dashed line are assumed to be important. The slope is determined in an empirical way: sort the coordinates and take the 10 largest $x$ and $y$ values, say $(x_1,\cdots,x_{10})$ and $(y_1,\cdots,y_{10})$, then let $\mathrm{slope}=-(\sum_{i=1}^{10} y_i/x_i)/10$. I move the line until there are 10 (and only 10) points to its top-right. Table 1 shows the information of the selected tags.
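The selection rule can be sketched in a few lines of Python. This is only one reading of the procedure: the function name, the toy points, and `k=2` are made up for illustration, and “moving the line” is implemented by ranking points by the intercept of the parallel line passing through each of them.

```python
def select_top_right(points, k):
    """Pick the k points lying to the top-right of a line with an empirical slope.

    points maps a tag to its (x, y) counts; the k largest x values must be nonzero.
    """
    # Sort each coordinate independently and keep the k largest values.
    xs = sorted((x for x, _ in points.values()), reverse=True)[:k]
    ys = sorted((y for _, y in points.values()), reverse=True)[:k]
    slope = -sum(y / x for x, y in zip(xs, ys)) / k
    # A point is to the top-right of y = slope * x + b iff y - slope * x > b,
    # so sliding the line down until k points are above it keeps the k largest values.
    intercept = {tag: y - slope * x for tag, (x, y) in points.items()}
    ranked = sorted(intercept, key=intercept.get, reverse=True)
    return slope, ranked[:k]

# Toy data (hypothetical tags and counts).
toy_points = {"T1": (10, 1), "T2": (1, 8), "T3": (2, 2), "T4": (6, 6), "T5": (1, 1)}
slope, selected = select_top_right(toy_points, k=2)
print(slope, selected)
```

With ties at the boundary there may be no line position that separates exactly `k` points; the sketch then simply keeps the first `k` in sorted order.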

We can, for example, verify the fundamental role of `00DV`, Nakayama’s lemma, via Wikipedia:

> Informally, the lemma immediately gives a precise sense in which finitely generated modules over a commutative ring behave like vector spaces over a field. It is an important tool in algebraic geometry, because it allows local data on algebraic varieties, in the form of modules over local rings, to be studied pointwise as vector spaces over the residue field of the ring.

Well, as a non-expert in algebraic geometry, I can still sense from the wording how crucial it is to the subject.

We can also investigate the other theorems in the table. They are mostly lists of properties of some mathematical objects or long proofs, with `01UA` as something of an exception. We will investigate it further later.

### Chord diagram for chapter relation

We can create the same kind of scatter plot for sections and chapters by pooling the number of cross-references at the section or chapter level. But the scatter plot has at least one drawback: it does not tell where the cross-references come from and go to. For theorems, of course, we can only go back to the full graph, which may be monstrous. At the chapter level, however, we can resort to a chord diagram to track the cross-references in the proofs. A chord diagram is a tool for visualizing a square numerical matrix with non-negative entries, which can encode pairwise and self relations between entities. In a chord diagram, matrix entries are visualized as chords between two arcs on the same circle. Say we have a matrix $M=(m_{ij})_{n\times n}$; a chord diagram for $M$ can be constructed so that

- The angular span of arc $i$ is proportional to $\sum_j m_{ij}$.
- The chord for $m_{ij}$ has a gradually changing width between arc $i$ and arc $j$:
  - the width on arc $i$ is proportional to $m_{ij}/\sum_j m_{ij}$;
  - the width on arc $j$ is proportional to $m_{ij}/\sum_i m_{ij}$.

See also an illustrative example made with d3.js. For the Stacks project, we can let $m_{ij}$ be the number of cross-references in the proofs of chapter $i$ that refer to theorems in chapter $j$. Additionally, chapters are partitioned into seven “topics” in the project (*Preliminaries*, *Schemes*, etc.; see stacks/stacks-project/chapters.tex), so another chord diagram for the topics is also available. Figure 3 shows a combination of the two chord diagrams.
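As a rough illustration of how such a matrix could be assembled, the Python sketch below pools a list of cross-references at the chapter level and computes the share of the circle each arc would take. The function names and the reference counts are toy assumptions, not data extracted from the project.

```python
def build_matrix(refs, chapters):
    """m[i][j] counts the cross-references from chapter i to chapter j."""
    index = {chapter: i for i, chapter in enumerate(chapters)}
    n = len(chapters)
    m = [[0] * n for _ in range(n)]
    for source, target in refs:
        m[index[source]][index[target]] += 1
    return m

def arc_shares(m):
    """Fraction of the circle taken by each arc, proportional to its row sum."""
    total = sum(sum(row) for row in m)
    return [sum(row) / total for row in m]

# Toy cross-references as (referring chapter, referred chapter) pairs.
chapters = ["Algebra", "Schemes", "Morphisms"]
refs = [("Schemes", "Algebra"), ("Schemes", "Algebra"),
        ("Morphisms", "Algebra"), ("Morphisms", "Schemes"),
        ("Algebra", "Algebra")]

m = build_matrix(refs, chapters)
shares = arc_shares(m)
print(m, shares)
```

A matrix in this nested-list form is the kind of input a chord layout expects; the same pooling works for topics by mapping each chapter to its topic first.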

From Figure 3, we can again see in the chapter view that *Commutative Algebra* is an important chapter, as it is cross-referred to throughout the later chapters, which agrees with Table 1. Additionally, and this is hidden in the scatter plot, *Commutative Algebra* is more influential on *Morphisms of Schemes*, *Varieties*, and *Dualizing Complexes* than on most of the others (obviously related chapters like *More on Algebra* are ignored). In the topic view, we can see for example that, as the name suggests, *Topics in Scheme Theory* is an extension of *Schemes* and does not have much to do with the subsequent chapters. The proofs cross-referred to in later chapters are mostly from the topic *Schemes*. This suggests a tree-like structure in the content dependency.

As a case study, we can go back and try to find out why `01UA` (*Lemma 28.24.9*) is so popular, as it is neither a list of properties nor a long proof. We start from its chapter, *Morphisms of Schemes*. Following the chords, we see that the chapter is mostly cross-referred to by *More on Morphisms* and *Morphisms of Algebraic Spaces*, while the theorems in *Morphisms of Algebraic Spaces* are mainly cross-referred to by *More on Morphisms of Spaces* and *Morphisms of Algebraic Stacks*. So my guess is that references to `01UA` can be found in these four morphism-related chapters. Indeed, this can be verified in Figure 4, a tree layout of the theorems that cross-refer to `01UA`, including the chapters they are in.
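The lookup behind such a tree can be sketched as a grouping of referring tags by chapter. In the Python snippet below, only `01UA` and the chapter names come from the text; the tags `X1` to `X4`, the links, and the helper name are hypothetical, and only direct referrers are collected.

```python
from collections import defaultdict

def referrers_by_chapter(links, chapter_of, target):
    """Group the tags that directly cross-refer to `target` by their chapter."""
    groups = defaultdict(list)
    for source, referred in links:
        if referred == target:
            groups[chapter_of[source]].append(source)
    return {chapter: sorted(tags) for chapter, tags in groups.items()}

# Hypothetical (referring tag, referred tag) pairs and chapter assignments.
links = [("X1", "01UA"), ("X2", "01UA"), ("X3", "01UA"),
         ("X3", "X1"), ("X4", "X2")]
chapter_of = {"X1": "More on Morphisms",
              "X2": "Morphisms of Algebraic Spaces",
              "X3": "More on Morphisms",
              "X4": "Morphisms of Algebraic Stacks"}

groups = referrers_by_chapter(links, chapter_of, "01UA")
print(groups)
```

Here `X4` is left out because it reaches `01UA` only indirectly through `X2`; following such chains would require a traversal of the full link list.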

## Why the visualizations?

We have created some visualizations to answer a version of the original question, “which theorems/chapters of the book are important to the subject?”, *without* any knowledge of algebraic geometry. But a piece is missing: why do the visualizations look like this from the mathematical point of view? Without that answer, the figures are less helpful for practitioners, since they don’t help much to *understand* the subject from the mathematics side. For example, we don’t know why Nakayama’s lemma is so important before looking it up on Wikipedia. And we don’t understand why *Lemma 28.24.9* does not appear in the chapter *Descent and Algebraic Spaces*, though it appears in both *Descent* and *Algebraic Spaces*. The concerns here are

- What mathematical notions are handled in each chapter (section/theorem/…)?
- How are the notions related by theorems?
- Ultimately, which notions are important to the subject? And why?

Let’s try to answer the questions by further mining the texts and the tag graph of Stacks.

(to be continued)