Microsoft's Researcher tops Perplexity's own benchmark: dual-model review arrives in Frontier, and Cowork launches a long-running task agent at the same time

CoinJie.com news: According to monitoring by 1M AI News, Microsoft has rolled out two new Microsoft 365 Copilot capabilities at the same time through Frontier (an early-access program for enterprise users, in which participants can test Copilot features that have not yet officially launched).

Researcher (the deep research agent inside Copilot) gains two new multi-model collaboration modes: Critique and Council. Critique pairs models from Anthropic and OpenAI: one handles planning, retrieval, and drafting, while the other focuses on reviewing and refining. It is enabled by default when Auto is selected. Council instead runs two models in parallel; each produces a complete report, and a separate evaluation model then summarizes where the reports agree and differ.

Microsoft tested Critique on the DRACO benchmark (100 complex research questions released by Perplexity researchers, covering 10 fields), using GPT-5.2 as the judge model (the strictest of the three judging methods in the original paper). On overall score, Critique beats the best system in the benchmark, Perplexity Deep Research (using Claude Opus 4.6), by 7.0 points, a relative improvement of 13.88%. The original DRACO paper did not include Critique; this figure comes from Microsoft's own testing under the same evaluation protocol.

Copilot Cowork is aimed at longer, multi-step work: it first generates a plan from the user's goal, then works through it step by step across tools and files, showing progress along the way, and users can step in at any time. Microsoft cited Capital Group as an early trial customer, saying Cowork has been used there for project planning, scheduling, creating deliverables, and preparing executive readouts.
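The difference between the two Researcher modes described above can be illustrated with a minimal Python sketch. This is only an assumption-laden illustration of the control flow as the article describes it, not Microsoft's implementation: the `Model` type, the `critique` and `council` functions, the prompt wording, and the `stub` helper are all hypothetical, and no real model API is called.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical abstraction: a "model" is any function from prompt text to response text.
Model = Callable[[str], str]


def critique(question: str, drafter: Model, reviewer: Model) -> str:
    """Sequential mode: one model drafts, the other reviews and refines the draft."""
    draft = drafter(f"Plan, research, and draft a report on: {question}")
    return reviewer(f"Review and refine this draft:\n{draft}")


def council(question: str, model_a: Model, model_b: Model, judge: Model) -> str:
    """Parallel mode: two independent full reports, then a third model compares them."""
    prompt = f"Write a complete report on: {question}"
    # Run both report writers concurrently; each sees only the question.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(model_a, prompt)
        future_b = pool.submit(model_b, prompt)
        report_a, report_b = future_a.result(), future_b.result()
    return judge(
        "Summarize where these reports agree and differ.\n"
        f"Report A:\n{report_a}\n\nReport B:\n{report_b}"
    )


def stub(name: str) -> Model:
    """Stand-in model (assumption): tags the prompt with its name and echoes it back."""
    return lambda prompt: f"[{name}] {prompt}"


if __name__ == "__main__":
    print(critique("What is DRACO?", stub("drafter"), stub("reviewer")))
    print(council("What is DRACO?", stub("A"), stub("B"), stub("judge")))
```

In this sketch, Critique yields a single report that has passed through both models in sequence, while Council yields a comparison of two reports written independently; swapping the stubs for real model clients would not change that structure.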
