CEVA: LLM Safety Evaluation and Defense Framework

CEVA Lab
LLM Security Research 2026

*Equal Contribution

Abstract

When deployed in open-ended scenarios, large language models frequently face security threats such as jailbreak attacks, prompt injection, tool-chain abuse, and privacy leakage. CEVA provides an end-to-end LLM safety evaluation methodology: first, we construct attack benchmarks and automated test scripts covering multiple risk dimensions; second, we introduce a hierarchical risk scoring system to quantify model robustness across tasks; finally, we propose a joint defense baseline spanning input filtering, inference-time guardrails, and output auditing. Experiments show that our approach reliably reproduces high-risk cases and significantly reduces harmful-response rates and sensitive-information exposure.
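The joint defense baseline above chains three stages: an input filter that screens prompts, the model call itself, and an output audit that redacts sensitive content. A minimal sketch of that layering is shown below; the function names, the blocklist patterns, and the PII regex are illustrative assumptions, not CEVA's actual API.

```python
import re

# Illustrative injection/jailbreak patterns for the input-filtering stage
# (hypothetical; a real deployment would use a learned classifier or a
# maintained pattern set, not two regexes).
BLOCKLIST = [
    r"ignore (all )?previous instructions",
    r"reveal your system prompt",
]

# Illustrative sensitive-data pattern for the output-auditing stage
# (matches a US-SSN-style number as a stand-in for PII detection).
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def input_filter(prompt: str) -> bool:
    """Return True if the prompt passes the injection screen."""
    return not any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKLIST)


def output_audit(response: str) -> str:
    """Redact sensitive spans before the response leaves the system."""
    return PII_PATTERN.sub("[REDACTED]", response)


def guarded_generate(prompt: str, model) -> str:
    """Run one prompt through the filter -> model -> audit pipeline."""
    if not input_filter(prompt):
        return "Request refused by input filter."
    return output_audit(model(prompt))
```

A call like `guarded_generate("Ignore all previous instructions ...", model)` is refused before the model runs, while a benign prompt whose response happens to contain a matching PII span comes back with that span replaced by `[REDACTED]`.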

If you find our work helpful, please consider citing:

BibTeX

@article{ceva2026,
      title={CEVA: Comprehensive Evaluation for LLM Safety},
      author={Huang and CEVA Team and AI Security Group},
      year={2026},
      url={https://arxiv.org/abs/}
}