Publications

When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search

Recent studies developed jailbreaking attacks, which construct jailbreaking prompts to fool LLMs into responding to harmful questions. …

Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang

ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Backdoor attacks have emerged as a prominent threat to natural language processing (NLP) models, where the presence of specific …

Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang

BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning

Backdoor attacks pose a severe threat to the supply chain management of deep reinforcement learning (DRL) policies. Despite initial …

Xuan Chen, Wenbo Guo, Guanhong Tao, Xiangyu Zhang, Dawn Song

Towards Behavior-Level Explanation for Deep Reinforcement Learning

While Deep Neural Networks (DNNs) are becoming the state-of-the-art for many tasks including reinforcement learning (RL), they are …

Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta
