JetBrains’ open-source release this time is genuinely solid: the 12B model activates only about 2.5B parameters per token, which keeps inference costs tightly controlled. Using the MTP head as a draft model for acceleration is also interesting.

CoinNetwork
JetBrains Open-Sources Mellum-2 Code Model: Built-in MTP Head Supports Speculative Decoding Acceleration
JetBrains has open-sourced Mellum-2, a 12B-parameter mixture-of-experts (MoE) code model. To control inference costs, only about 2.5 billion parameters are activated per token. The weights have been released on Hugging Face under the Apache 2.0 license. The model adds a multi-token prediction (MTP) module whose head acts as a draft model to accelerate sampling during inference (speculative decoding). Three versions are provided: base, dialogue, and thinking; the thinking version can show an explicit reasoning chain before its final output. Reported benchmarks: HumanEval 41.46%, MMLU 70.87%.
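To make the "MTP head as draft model" idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not Mellum-2's implementation: `toy_target`, `toy_draft`, `speculative_step`, and `generate` are hypothetical names, and both "models" are tiny deterministic functions over integer tokens. The sketch only shows the general accept/reject loop, in which a cheap draft proposes several tokens and the full model keeps the longest prefix it agrees with.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "next token" function


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the full model (assumption): counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for a cheap draft head (assumption): wrong after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(ctx: List[Token], target: NextToken,
                     draft: NextToken, k: int) -> List[Token]:
    """Run one draft-then-verify step and return the accepted tokens."""
    # 1. The draft proposes k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(ctx + proposal))

    # 2. The target checks each proposed position. A real system would do
    #    this in one batched forward pass; this sketch loops for clarity.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(ctx + accepted)
        if tok != expected:
            accepted.append(expected)  # keep the target's token and stop
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one extra token.
    accepted.append(target(ctx + accepted))
    return accepted


def generate(prompt: List[Token], target: NextToken, draft: NextToken,
             max_new: int, k: int = 3) -> Tuple[List[Token], int]:
    """Generate max_new tokens; also return the number of verify steps."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, target, draft, k))
        steps += 1
    return out[:len(prompt) + max_new], steps


def greedy_generate(prompt: List[Token], target: NextToken,
                    max_new: int) -> List[Token]:
    """Plain one-token-at-a-time decoding with the target, for comparison."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


if __name__ == "__main__":
    tokens, steps = generate([0], toy_target, toy_draft, max_new=8, k=3)
    print(tokens, steps)
```

In this greedy variant the output is identical to decoding with the target alone, because every emitted token is one the target would have chosen. The gain comes from needing fewer verification steps than tokens whenever the draft guesses correctly. Sampling-based variants replace the exact-match check with a probabilistic acceptance rule, which this sketch does not cover.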