Xuan Chen
Publications
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Recent studies developed jailbreaking attacks, which construct jailbreaking prompts to fool LLMs into responding to harmful questions. …
Xuan Chen, Yuzhou Nie, Wenbo Guo, Xiangyu Zhang
PDF
Code
ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP
Backdoor attacks have emerged as a prominent threat to natural language processing (NLP) models, where the presence of specific …
Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang
PDF
Code
BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning
Backdoor attacks pose a severe threat to the supply chain management of deep reinforcement learning (DRL) policies. Despite initial …
Xuan Chen, Wenbo Guo, Guanhong Tao, Xiangyu Zhang, Dawn Song
PDF
Code
Towards Behavior-Level Explanation for Deep Reinforcement Learning
While Deep Neural Networks (DNNs) are becoming the state-of-the-art for many tasks including reinforcement learning (RL), they are …
Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta
Preprint
Project