# Probabilistic Circuits: Representations, Inference, Learning and Applications

## Abstract

## Duration

Half-day (3hrs and 45 mins plus coffee break)## Slides

## Outline

- Tractable inference (on the inherent trade-off of expressiveness and tractability)
- Probabilistic circuits (a unified framework for tractable probabilistic modeling)
- Learning circuits (learning their structure and parameters from data)
- Representations and theory (tracing the boundaries of tractability and connections to other formalisms)

### Tractable inference (40 mins)

**Summary**

In this section we motivate the need for tractable probabilistic inference, revisit past and present popular probabilistic models, and classify them w.r.t. the classes of probabilistic queries for which they guarantee tractable inference and their expressiveness.

**Outline**

We introduce probabilistic reasoning and modeling as the de facto frameworks to deal with real-world decision making scenarios where uncertainty arises at any step of the decision making process. To this end, we motivate the adoption of probabilistic generative models through examples of decision making applications in AI application domains such as robotics, computer vision and speech recognition. We argue in favor of exact inference as in many of the above scenarios approximations without guarantees would make the decision making process brittle.

We review commonly used probabilistic generative models such as Bayesian Networks, Markov Random Fields, and the more rencent Generative Adversarial Networks and Variational Autoencoders, as they enjoy a considerable amount of attention due to their expressiveness. However, their capability for performing exact probabilistic inference is limited to a small set of queries and otherwise requires resorting to approximation routines.

In parallel, we will introduce the notation required throughout the whole tutorial and classify probabilistic queries into classes sharing the same computational efforts to be answered. These query classes encompass arbitrary marginals, conditionals and MAP queries as well as more advanced ones, for example involving counting, thresholding and constraints as logical formulas.

While doing so, we will introduce tractable probabilistic models (TPMs) as representations guaranteeing exact and polytime inference for certain classes of probabilistic queries. Among the reviewed TPMs we will show treewidth-bounded probabilistic graphical models (PGMs), trees and mixtures thereof.

**Outcome**

At the end of this Section, the audience should have an understanding of the tension between tractability and expressiveness. Moreover, they should have familiarized with the two main modeling ideas: factorized and mixture models as they will be the building blocks of the next Section.

### Probabilistic circuits (PCs) (60 mins)

**Summary**

We introduce the framework of PCs by first principles, building in a bottom-up way a grammar for tractable probabilistic models. We show how structural properties cleanly map to tractable query classes and discuss PC expressiveness and applications.

**Outline**

PCs can be represented as computational graphs, out of which we will build a grammar for tractable models. We provide the production rules of this grammar, consisting of stacking factorizations (products), mixtures (sums) and input nodes (simple TPMs such as Bernoullis, Gaussians and trees).

We will define when query classes become tractable by the introduction of key structural properties (sufficient conditions) over the computational graphs of PCs. Namely, we will introduce decomposability and smoothness enabling tractable marginal inference and determinism allowing tractable MAP inference.

We discuss how the operational PC framework clearly differs from the declarative one of PGMs. At the same time we will show how PCs can be cast as deep neural networks with peculiar constraints. In the next Section it will be made clear in which ways these constraints require PCs additional "care" during learning w.r.t. classical neural networks.

**Outcome**

At the end of this section, the audience should be familiar with the syntax and semantics of PCs. Moreover, these ideas will provide them the principles behind learning the structure and parameters of PCs which will be the subject of next section.

### Learning circuits (60 mins)

**Summary**

We provide the audience the main algorithmic principles to learn PCs from data. We guide them through a high-level review of the burgeoning literature of parameter and structure learning by highlighting how the structural properties introduced in the previous section favor efficient learning.

**Outline**

In recent years, many different formalisms for TPMs have been proposed, and for each, several learning algorithms have been devised. We abstract from their different implementation and syntactical details and provide the audience with the algorithmic principles to learn TPMs as PCs from data.

We provide an introduction on parameter learning by starting from simple TPMs encoding parametric distributions such as Bernoullis and Gaussians, revising how to perform maximum likelihood estimation. We generalize this simple learning principle for the larger class of exponential families.

We discuss how to translate parameter learning for mixture models to that for PCs. We go through the explicit introduction of the mixture's latent variables in the circuit to derive the update rules of a general expectation maximization (EM) scheme for PCs. Then, we show how in presence of determinism EM update rules can be cast as closed-form updates performing maximum likelihood estimation.

We highlight several approaches to deal with structure learning: generating random or handcrafted structures as well as learning them from data. For the latter learning principle, we compare local search learning schemes when we consider circuits that are deterministic or not.

We complete the section by touching on discriminative learning approaches.

**Outcome**

By this point, we expect the audience to be familiar with the concept that structural properties enable certain tractable query classes and as such also dictate which learning scheme to adopt to induce PCs from data.

### Representations and theory (40 mins)

**Summary**

In this section we will familiarize the audience with the theoretical foundations behind tractable inference and the PC framework, while showing advanced query classes and how to formally cast the TPMs introduced so far as PCs.

**Outline**

We will zoom in on the current state-of-the-art for TPMs, disentangling and making sense of the “alphabet soup” of models (ACs, CNets, PSDDs, SPNs, etc.) that populate this landscape. We will show how these models can be represented as probabilistic circuits under the unifying framework introduced before. More specifically, we will discuss which structural properties of PCs delineate each model class and enable different kinds of tractability.

Furthermore, we will more formally discuss if the sufficient structural conditions introduced in the previous sections are also necessary conditions for tractable inference.

We will then provide a bridge between probabilistic circuits and their counterparts in propositional logics: logical circuits. Logical circuits have been extensively researched for automated reasoning and verification for enabling efficient computations over boolean formulas via analogous structural properties as those for PCs.

Inspired by the logical circuit literature, we will sketch a map of tractable representations with PCs w.r.t. different probabilistic query classes. We will then relate the model classes in the map to each other w.r.t. their expressive efficiency.

**Outcome**

At the end of this Section, we hope people in the audience will have acquired a general overview of the theoretical foundations of the PC framework, as well as of the open questions surrounding them. This bridges to the conclusion section, where we delineate some key challenges for tractable probabilistic inference.