The time-timbre decoupling design is quite interesting. Finally, no more monotonous, canned-sounding AI voiceovers. Looking forward to trying it out.

CoinNetwork
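CoinNetwork reports that Xiaomi's large-model application team has released and open-sourced ControlFoley, a framework for generating video sound effects. The model's focus is "controllability": it can generate audio to match the picture, and it can also take a text description or a reference audio clip so that the sound follows the creator's intent. ControlFoley uses a spatio-temporal audio-visual encoder adapted from cav-mae and introduces a "time-timbre decoupling" strategy to keep the sound synchronized with the picture. The model reaches open-source SOTA on several standard video-to-audio benchmarks, and the project's technical report, code, model weights, and demo have all been made public.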