Renato Lui Geh

I am a Computer Science PhD student at the University of California, Los Angeles. I'm fortunate to be advised by Prof. Guy Van den Broeck and to be part of the StarAI Lab.

I'm interested in the intersection of probabilistic machine learning and symbolic reasoning. Some specific interests of mine include (but are not limited to): tractable probabilistic models, logic and probabilistic circuits, knowledge representation and compilation, logic and probabilistic programming (and their intersection), weighted model counting, weighted model integration, and probabilistic graphical models.

In summary, I'm interested in studying combinatorial problems in a probabilistic setting.

Previously, I received my MSc and BSc in Computer Science under the supervision of Prof. Denis Deratani Mauá at the University of São Paulo in my home country Brazil 🇧🇷.

You can reach me at renatolg@cs.cs.ucla.ucla.edu.

Publications

Adversarial Tokenization
Renato Lui Geh*, Zilei Shao*, Guy Van den Broeck.
arXiv 2025.
[abs] [www] [code]
tl;dr
We show a previously unknown vulnerability of LLMs to tokenization attacks: simply retokenizing an unsafe request differently from the default tokenization elicits dangerous responses from state-of-the-art LLMs.
Where is the signal in tokenization space?
Renato Lui Geh, Honghua Zhang, Kareem Ahmed, Benjie Wang, Guy Van den Broeck.
EMNLP 2024.
Oral presentation.
[abs] [poster] [video] (starts at 01:12:30)
tl;dr
We show that for a given string, there is an exponential number of tokenizations other than the default one, and some of them retain a considerable amount of semantic signal. The LLM assigns low probability to some of these tokenizations, while it considers others even more likely than the default tokenization. By aggregating many tokenizations besides the default one, we show that (1) this problem can be formalized as a neurosymbolic task, and (2) it yields considerable improvements across multiple models and QA datasets. We also prove that computing the most likely tokenization, or the marginal probability of a string over all of its tokenizations, is NP-hard.
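To make the "exponential number of tokenizations" concrete, here is a minimal sketch that counts the ways to split a string into tokens from a fixed vocabulary using dynamic programming over prefixes. The function name `count_tokenizations` and the toy vocabularies are hypothetical and not taken from the paper. Counting is easy because it ignores the LLM entirely; the NP-hardness result concerns probabilities under the LLM, which this sketch does not compute.

```python
def count_tokenizations(s: str, vocab: set) -> int:
    """Count the ways to split s into a sequence of tokens drawn from vocab (non-empty)."""
    n = len(s)
    max_len = max(len(t) for t in vocab)
    # ways[i] = number of tokenizations of the prefix s[:i]
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) tokenization
    for i in range(1, n + 1):
        # Try every token that could end at position i.
        for k in range(1, min(max_len, i) + 1):
            if s[i - k:i] in vocab:
                ways[i] += ways[i - k]
    return ways[n]

print(count_tokenizations("cat", {"c", "a", "t", "ca", "at", "cat"}))  # → 4
# With tokens "a" and "aa", the count for "a" * n follows the Fibonacci numbers.
print(count_tokenizations("a" * 10, {"a", "aa"}))  # → 89
```

Even this two-token vocabulary gives a count that grows exponentially in the string length, which is why enumerating every tokenization quickly becomes infeasible.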
dPASP: A Probabilistic Logic Programming Environment For Neurosymbolic Learning and Reasoning
Renato Lui Geh, Jonas L. Gonçalves, Igor Cataneo Silveira, Denis D. Mauá, Fabio Cozman.
KR 2024.
[abs]
tl;dr
We implement a new neurosymbolic probabilistic logic programming framework that supports a wide range of semantics, including the stable semantics for non-stratified programs, the least-undefined stable semantics, which tolerates inconsistencies (contradictions) in the knowledge base, and the credal semantics for imprecise probabilities. Our framework features exact inference by enumeration, approximate inference by optimality, learning from missing data, and an easy-to-use interface for embedding neural components and differentiable functions into the program. We experimentally show that our careful C implementation is orders of magnitude faster than competitors on the MNIST Add benchmark.
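As a rough illustration of "exact inference by enumeration", the sketch below computes a query probability by enumerating every total choice of independent probabilistic facts in an MNIST Add-style setting. It assumes a stratified program (one model per total choice), encoded here as a plain Python predicate rather than logic program syntax. The function `query_probability` and the digit distributions are hypothetical; this is not dPASP's API.

```python
import math
from itertools import product

def query_probability(facts, query):
    """Exact P(query) by enumerating every total choice of the probabilistic facts.

    facts: list of dicts, each mapping a value to its probability (the values of
           one fact are mutually exclusive and their probabilities sum to 1).
    query: function from a tuple of chosen values to True/False; it stands in for
           an (assumed stratified) program that yields exactly one model per choice.
    """
    total = 0.0
    for choice in product(*(f.items() for f in facts)):
        values = tuple(v for v, _ in choice)
        weight = math.prod(p for _, p in choice)  # facts are assumed independent
        if query(values):
            total += weight
    return total

# Hypothetical digit-classifier outputs for two images (digits 0-2 only, for brevity).
digit1 = {0: 0.1, 1: 0.6, 2: 0.3}
digit2 = {0: 0.2, 1: 0.5, 2: 0.3}

# P(sum = 2) = 0.1*0.3 + 0.6*0.5 + 0.3*0.2
p = query_probability([digit1, digit2], lambda v: v[0] + v[1] == 2)
print(round(p, 4))  # → 0.39
```

Enumeration is exponential in the number of facts. Non-stratified programs, where a total choice may have zero or several stable models, need the richer semantics described above and are outside this sketch.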
LUNCH: an Answer Set Programming System for Course Scheduling
Ana Y. F. de Lima, Briza M. D. de Sousa, Daniel P. Cardeal, Jessica Y. N. Sato, Lorenzo B. Salvador, Renato L. Geh, Bruna Bazaluk.
ENIAC 2023.
[doi]
tl;dr
We implement a flexible course scheduling system based on logic programming, where the user can explicitly define their scheduling constraints as a logic program.
Scalable Learning of Probabilistic Circuits
Renato Lui Geh (advisor: Denis D. Mauá).
MSc thesis.
Best MSc in AI at CTDIAC@BRACIS 2022!
Best MSc in Computing at CTD@CSBC 2023!
[slides]
tl;dr
We gently introduce probabilistic circuits (PCs) as a tool for probabilistic modeling and provide a thorough survey of existing structure learning algorithms for PCs. We then propose two new algorithms for scalably learning the structure of PCs: one from data alone, and the other from both data and a logic formula.
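For readers new to PCs, the sketch below evaluates a tiny smooth and decomposable circuit (a mixture of two products of Bernoulli leaves) bottom-up. It shows the property that makes PCs tractable: a marginal is computed in the same single pass as a joint probability, simply by setting the leaves of marginalized variables to 1. The class names and parameters are hypothetical illustrations, not code from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

@dataclass
class Leaf:
    var: int
    p: float  # Bernoulli leaf: P(X_var = 1)

@dataclass
class Product:
    children: List["Node"]

@dataclass
class Sum:
    weights: List[float]  # non-negative, summing to 1
    children: List["Node"]

Node = Union[Leaf, Product, Sum]

def evaluate(node: Node, x: Dict[int, Optional[int]]) -> float:
    """Bottom-up evaluation; x[var] is 0, 1, or None (marginalize var out)."""
    if isinstance(node, Leaf):
        value = x[node.var]
        if value is None:
            return 1.0  # a normalized leaf sums to 1 over its variable
        return node.p if value == 1 else 1.0 - node.p
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= evaluate(child, x)
        return result
    return sum(w * evaluate(c, x) for w, c in zip(node.weights, node.children))

# A two-component mixture over binary X0, X1 (smooth and decomposable).
pc = Sum([0.4, 0.6], [
    Product([Leaf(0, 0.9), Leaf(1, 0.2)]),
    Product([Leaf(0, 0.1), Leaf(1, 0.7)]),
])

print(round(evaluate(pc, {0: 1, 1: 1}), 4))     # joint P(X0=1, X1=1) → 0.114
print(round(evaluate(pc, {0: 1, 1: None}), 4))  # marginal P(X0=1) → 0.42
```

The "set leaves to 1" trick is only valid because every sum node's children share the same variables (smoothness) and every product node's children have disjoint variables (decomposability); structure learning algorithms for PCs are designed to preserve these properties.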
Fast And Accurate Learning of Probabilistic Circuits by Random Projections
Renato L. Geh, Denis D. Mauá.
TPM Workshop 2021.
[talk] [slides] [poster]
tl;dr
A fast and simple algorithm for learning the structure of probabilistic circuits by random projections.
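The basic building block behind random-projection approaches is splitting a dataset with a random hyperplane. The sketch below shows one such split, in the style of random-projection trees: project each row onto a random Gaussian direction and threshold at the median. It is an assumption-laden illustration of the general idea, not the paper's algorithm; `random_projection_split` is a hypothetical name.

```python
import random
from typing import List, Tuple

Row = List[float]

def random_projection_split(data: List[Row], rng: random.Random) -> Tuple[List[Row], List[Row]]:
    """Split rows into two halves by a random hyperplane through the median projection."""
    dim = len(data[0])
    # Random direction with i.i.d. Gaussian components.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = [sum(d * v for d, v in zip(direction, row)) for row in data]
    threshold = sorted(proj)[len(proj) // 2]  # median projection value
    left = [row for row, p in zip(data, proj) if p < threshold]
    right = [row for row, p in zip(data, proj) if p >= threshold]
    return left, right

rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(100)]
left, right = random_projection_split(data, rng)
# In a PC learner, each side could seed a child of a sum node,
# weighted by its share of the data.
weights = [len(left) / len(data), len(right) / len(data)]
```

Such splits are cheap, since each costs one pass over the data, which helps explain why random projections are attractive for fast structure learning compared with running a full clustering algorithm at every sum node.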
Learning Probabilistic Sentential Decision Diagrams Under Logic Constraints by Sampling and Averaging
Renato L. Geh, Denis D. Mauá.
UAI 2021.
[pmlr] [talk] [slides] [poster]
tl;dr
We propose a new structure learning algorithm for a particular class of probabilistic circuits that allows for learning from both data and a logic formula.
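What "learning from both data and a logic formula" means can be illustrated on a toy problem: the learned distribution must assign zero probability to every assignment that violates the formula. The brute-force sketch below enumerates the formula's models explicitly and fits a Laplace-smoothed distribution over them. A PSDD represents such a distribution compactly without this enumeration. The sketch is neither a PSDD nor the paper's sampling-and-averaging algorithm, and `learn_constrained` is a hypothetical name.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Tuple[int, ...]

def learn_constrained(data: List[Assignment], formula: Callable[[Assignment], bool],
                      n_vars: int, alpha: float = 1.0) -> Dict[Assignment, float]:
    """Laplace-smoothed distribution whose support is exactly the models of `formula`.

    Brute force: enumerates all 2**n_vars assignments, so only usable for tiny n_vars.
    """
    models = [x for x in product((0, 1), repeat=n_vars) if formula(x)]
    counts = Counter(tuple(x) for x in data)
    total = sum(counts[m] for m in models) + alpha * len(models)
    return {m: (counts[m] + alpha) / total for m in models}

# Constraint: exactly one of three variables is true (e.g. a one-hot choice).
one_hot = lambda x: sum(x) == 1
dist = learn_constrained([(1, 0, 0), (1, 0, 0), (0, 1, 0)], one_hot, n_vars=3)
print(dist[(1, 0, 0)], dist.get((1, 1, 0), 0.0))  # → 0.5 0.0
```

Smoothing is applied only over the formula's models, so unseen but consistent assignments such as (0, 0, 1) still receive some probability, while inconsistent ones such as (1, 1, 0) receive none.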
Learning Probabilistic Sentential Decision Diagrams by Sampling
Renato Geh, Denis Mauá, Alessandro Antonucci.
KDMiLe 2020.
[doi] [talk] [slides]
tl;dr
Preliminary results on structure learning for probabilistic sentential decision diagrams.
End-To-End Imitation Learning of Lane Following Policies Using Sum-Product Networks
Renato L. Geh, Denis D. Mauá.
ENIAC 2019.
[doi] [poster]
tl;dr
An undergrad project paper in which we use probabilistic circuits as a self-driving policy in a compute-constrained mobile robot setting.