Fundamental Knowledge


Published: October 09, 2025

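Perplexity (PPL) is a core metric for evaluating language models. Intuitively, it is the size of the equally likely vocabulary the model is effectively choosing from when it predicts the next word.

Low perplexity means the model is confident about the next word.

PPL is a real number >= 1; the lower, the better.

PPL = 1: the model knows the next word with 100% certainty (the ideal case).

PPL = 10: the model is as uncertain as if it were choosing uniformly among 10 words.

For example, GPT-4 has a PPL of roughly 10.2. That is not especially low, but it is already good for a language model, because natural language is inherently complex and diverse.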
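The "equally likely vocabulary" reading follows from the usual definition of perplexity as the exponential of the average negative log-probability the model assigns to the observed tokens. The sketch below is a minimal illustration under that definition; the function name `perplexity` and the toy probabilities are assumptions made for this example, not values from a real model.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) of the observed tokens.

    token_probs: probabilities the model assigned to each token that
    actually occurred (hypothetical inputs for illustration).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# A model that is certain about every token reaches the lower bound, PPL = 1.
certain = perplexity([1.0, 1.0, 1.0])

# A model that always spreads probability evenly over 10 candidates
# assigns 0.1 to the correct token each time, giving PPL = 10.
uniform_10 = perplexity([0.1] * 5)
```

Because the average is taken in log space, a single very unlikely token raises the perplexity much more than a single very likely token lowers it.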

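Nucleus sampling (top-p sampling): sort the token probabilities from largest to smallest and accumulate them. Once the running total reaches the threshold p (0.9 here), stop; the tokens collected so far form the nucleus, and the next token is sampled only from that set after renormalizing their probabilities.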
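The sort-and-accumulate procedure can be sketched in a few lines. This is a minimal illustration over a plain dictionary of token probabilities, not the implementation of any particular library; the names `nucleus_filter`, `nucleus_sample`, and the toy distribution are assumptions made for this example.

```python
import random


def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches top_p, and renormalize their probabilities to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def nucleus_sample(probs, top_p=0.9, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution.
toy = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "axolotl": 0.05}

# 0.5 + 0.3 = 0.8 is still below 0.9, so "fish" is added as well;
# the low-probability tail ("axolotl") is cut off.
nucleus = nucleus_filter(toy, top_p=0.9)
```

Unlike top-k sampling, the number of tokens kept here varies with the shape of the distribution: a peaked distribution yields a small nucleus, a flat one a large nucleus.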

Evaluation metrics for generated text: diversity of the generated text, and fluency.


(COLM 2024) DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion


Published: September 28, 2025

Advantages:

more lightweight than the baselines

using paired samples for generating steering vectors

First, we propose a novel approach that detoxifies LMs via representation engineering in activation spaces. It surpasses the previous SOTA methods in both detoxification performance and maintenance of generation quality with lower computational demands and acceptable inference time.
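To make the idea of building steering vectors from paired samples concrete, here is a generic mean-difference sketch of activation steering. It is an assumption-laden illustration of the general technique, not DESTEIN's actual procedure: it omits the head-wise activation fusion named in the title, and the function names, the scaling factor `alpha`, and the toy 3-dimensional activations are all hypothetical.

```python
def steering_vector(positive_acts, negative_acts):
    """Average the element-wise difference between paired activations
    (here: non-toxic minus toxic) to get one steering direction."""
    diffs = [
        [p - q for p, q in zip(pos, neg)]
        for pos, neg in zip(positive_acts, negative_acts)
    ]
    n = len(diffs)
    return [sum(col) / n for col in zip(*diffs)]


def steer(activation, vector, alpha=1.0):
    """Shift an activation along the steering direction by strength alpha."""
    return [a + alpha * v for a, v in zip(activation, vector)]


# Made-up activations for two (non-toxic, toxic) sample pairs.
non_toxic = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
toxic = [[0.0, 0.0, 1.0], [1.0, 0.0, 2.0]]

# Pair differences are [1, 0, 1] and [2, 0, 0]; their mean is the vector.
v = steering_vector(non_toxic, toxic)

# At inference time the vector is added to an activation; no weights change.
shifted = steer([0.0, 0.0, 0.0], v, alpha=2.0)
```

Because the model weights stay untouched and the vector is computed once from a small set of pairs, an approach of this kind needs no fine-tuning, which is in line with the lightweight advantage noted above.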

wardell-H
