Plurai: 3-billion-parameter small model surpasses 20-billion-parameter specialized guardrail model

CryptoWorld News reports that the barred framework proposed by Plurai automatically generates synthetic training data from a task description and a small number of unlabeled samples, then trains a customized content guardrail that reviews whether AI outputs violate policy. A qwen2.5-3b model (3 billion parameters) fine-tuned on this data comprehensively outperforms OpenAI's oss-safeguard-20b (20 billion parameters) on tasks including dialogue strategy, agent-output validation, and medical compliance, and it also exceeds GPT-4.1 used directly as a guardrail. The framework splits each task into multiple dimensions and deliberately generates samples in the boundary zones that are most prone to misclassification. Generated samples then pass through an "asymmetric debate" process to verify the accuracy of their labels. The evaluation code and datasets have been open-sourced on GitHub and Hugging Face.
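The pipeline described above (task decomposition, boundary-zone sample generation, and debate-based label verification) can be sketched roughly as follows. This is a hypothetical illustration only: the function names, the stubbed proposer/challenger "models", and the data shapes are assumptions, not Plurai's actual API.

```python
# Hypothetical sketch of the reported pipeline; names and stub models are
# illustrative, not from Plurai's released code.

def decompose_task(description):
    # Split a guardrail task into dimensions (assumed three for illustration).
    return [f"{description}: dimension {i}" for i in range(3)]

def generate_boundary_samples(dimension, n=2):
    # Emit unlabeled samples near the decision boundary for one dimension.
    return [{"text": f"{dimension} / borderline case {j}", "label": None}
            for j in range(n)]

def asymmetric_debate(sample, proposer, challenger):
    # A proposer model assigns a label; a challenger model tries to refute it.
    # The label is kept only if it survives the challenge.
    label = proposer(sample["text"])
    if challenger(sample["text"], label):
        sample["label"] = label
    return sample

def build_dataset(description, proposer, challenger):
    # Full loop: decompose, generate boundary samples, verify labels by debate.
    dataset = []
    for dim in decompose_task(description):
        for sample in generate_boundary_samples(dim):
            sample = asymmetric_debate(sample, proposer, challenger)
            if sample["label"] is not None:
                dataset.append(sample)
    return dataset

# Stub "models" standing in for real LLM calls, for illustration only.
proposer = lambda text: "violation" if "borderline" in text else "safe"
challenger = lambda text, label: True  # accepts every proposed label

data = build_dataset("medical compliance", proposer, challenger)
```

In the reported framework the resulting dataset is then used to fine-tune the small guardrail model; here `data` would simply contain the debate-verified, labeled boundary samples.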
