Microsoft Turns Copilot Into a Multi-Model Research Assistant With Smarter Critique
Microsoft has rolled out a major upgrade to its Copilot research assistant, aiming to make AI responses more dependable by using more than one model at a time. The change is designed to improve output quality, reduce mistakes, and speed up how users conduct research and assemble reports, an area where many people still worry about AI hallucinations and inconsistent results.
The update centres on a new capability inside Copilot’s “Researcher” agent, which Microsoft is positioning as a more reliable partner for everyday work. Instead of leaning on a single AI engine for each response, Copilot will now coordinate multiple models for the same task.
Microsoft Copilot research now “critiques” across models
Dubbed “Critique”, the feature lets Copilot’s Researcher pull responses from both OpenAI’s GPT and Anthropic’s Claude. In practice, GPT will be used to generate a first draft or raw answer. Then, Claude steps in to evaluate that output—checking it for accuracy, clarity, and overall quality—before the user sees the final result.
Microsoft says this workflow is not just about getting different versions of the same answer. It is about combining strengths: GPT for drafting and Claude for review. That matters because users often don’t just want fast responses—they want information they can trust, particularly when they’re researching complex topics or preparing documents for work.
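The draft-then-review flow described above can be sketched in a few lines. Note this is a minimal illustration only: the functions below are hypothetical stand-ins, not real Copilot, OpenAI, or Anthropic APIs, and the "critique" logic is a placeholder for whatever checks the reviewing model actually performs.

```python
# Sketch of a draft-then-critique pipeline: one model drafts, a second reviews.
# All model calls here are hypothetical stubs, not real provider APIs.

def draft_with_gpt(prompt: str) -> str:
    # Stand-in for the drafting model (GPT in Microsoft's described flow).
    return f"DRAFT: answer to '{prompt}'"

def critique_with_claude(draft: str) -> dict:
    # Stand-in for the reviewing model (Claude in Microsoft's described flow).
    # Returns a verdict plus a revised version of the draft.
    issues = [] if draft.startswith("DRAFT:") else ["no draft produced"]
    return {
        "approved": not issues,
        "issues": issues,
        "revised": draft.replace("DRAFT:", "FINAL:", 1),
    }

def research(prompt: str) -> str:
    draft = draft_with_gpt(prompt)        # step 1: first model drafts
    review = critique_with_claude(draft)  # step 2: second model reviews
    return review["revised"] if review["approved"] else draft

print(research("What changed in Copilot?"))
```

The design point is that the two roles are separated: the reviewer never drafts, so its judgement is applied to output it did not produce, which is the property Microsoft is pitching as the source of the reliability gain.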
Nicole Herskowitz, corporate vice president of Microsoft 365 and Copilot, described the multi-vendor approach as especially valuable. Access to different models from different providers is attractive on its own, she said, but Microsoft's goal goes further: customers should benefit from models working together rather than having to switch between them manually.
Herskowitz also tied the upgrade to practical outcomes for users. Microsoft expects the multi-model approach to speed up workflows, make outputs more reliable, and keep AI hallucinations in check. Hallucinations, where a system confidently produces incorrect or fabricated information, remain a core concern for businesses and professionals trying to use AI responsibly.
Another important piece is that Microsoft plans to make the process more interactive over time. The company said it expects the workflow to become bi-directional in the future. That would mean GPT could also review Claude’s drafts, creating a back-and-forth checking loop rather than a one-way “draft then review” process.
So, the pitch is clear: fewer unverified answers, less rework for users, and a research assistant that behaves more like a team of specialists than a single chatbot.
In addition to Critique, Microsoft is rolling out another update aimed at transparency and user control, qualities the company knows matter as more organisations evaluate AI tools for productivity and governance.
“Council” lets users compare AI responses side-by-side
Microsoft is also introducing “Council”, a feature that allows users to compare responses from different AI models side-by-side. Rather than hiding model differences behind a single final answer, Council is meant to make it easier for users to judge which response is stronger, more accurate, or better suited to their needs.
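A Council-style comparison view can be sketched as below. Again, this is only an illustration under stated assumptions: the model names and responses are invented stand-ins, and a real system would call each provider's API rather than return canned strings.

```python
# Sketch of a "Council"-style side-by-side view: gather one answer per model,
# then lay them out under labelled columns. Responses are illustrative stubs.

def gather_responses(prompt: str) -> dict:
    # Hypothetical per-model answers; a real system would query each provider.
    return {
        "gpt": f"gpt's take on: {prompt}",
        "claude": f"claude's take on: {prompt}",
    }

def render_side_by_side(responses: dict) -> str:
    # Pad every cell to a common width so the columns line up.
    width = max(len(text) for text in responses.values()) + 2
    header = "".join(name.ljust(width) for name in responses)
    row = "".join(text.ljust(width) for text in responses.values())
    return header + "\n" + row

print(render_side_by_side(gather_responses("summarise the report")))
```

The point of surfacing both answers, rather than silently picking one, is that the user keeps the final judgement, which is exactly the control Council is meant to provide.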
This move aligns with how many teams are currently rolling out AI internally. Professionals want to see how the system arrived at its output, and they want an easy way to compare alternatives—especially when the stakes are higher, such as policy writing, customer-facing drafts, or competitive research.
The upgrades arrive as Microsoft expands availability of its agentic AI tools. The company is making more of its Copilot Cowork functionality available through its “Frontier” programme, which gives members earlier access to experimental or newly released Copilot features.
Copilot Cowork itself builds on a viral concept in the AI world: autonomous or semi-autonomous AI agents that can take action across tasks. Microsoft launched Cowork earlier this month in testing mode, tying it to Anthropic’s Claude Cowork and reflecting growing demand for AI systems that do more than chat—they help drive work forward.
The timing also underscores the competitive pressure Microsoft faces. Tech giants and AI platforms are racing to improve assistants, with Google’s Gemini and other autonomous agent tools increasingly competing for attention. As companies look for AI that can genuinely assist with tasks, Microsoft is clearly trying to differentiate its Copilot experience with multi-model workflows and agent-like behaviour.
Meanwhile, Microsoft’s market performance is sending mixed signals. Shares rose by about 1% on Monday, but the stock remains on track for its weakest quarter since the 2008 global financial crisis. That decline—roughly 25%—reflects investor concerns that AI excitement is cooling even as major firms keep announcing new tools.
Even so, Microsoft’s strategy is consistent: keep improving Copilot in ways that are visible to users. Multi-model generation, model-to-model review, and easier comparison tools like Council all point to one message—Copilot should feel more accurate, more helpful, and less risky to depend on.
Microsoft’s latest Copilot upgrades suggest the next phase of assistant AI is less about producing one perfect answer and more about combining different models to validate and refine outputs—a shift that could make AI research tools far more practical for everyday work, especially in environments where reliability matters.