
Perplexity AI Upgrades Deep Research with Claude Opus 4.5, Claims State-of-the-Art Performance

February 5, 2026


Perplexity has upgraded its Deep Research tool to run on Anthropic's Claude Opus 4.5 model, achieving top performance on a new open-source benchmark called DRACO. The upgrade is available to Max subscribers immediately and will roll out to Pro users soon.

Major Upgrade Brings Advanced Reasoning to AI Search

Perplexity has announced a significant upgrade to its Deep Research tool, now powered by Anthropic's Claude Opus 4.5 model. The upgrade combines advanced reasoning capabilities with Perplexity's proprietary search engine and sandbox infrastructure, positioning it as a leading research tool in the AI search space.

New Benchmark Shows Perplexity Leading the Pack

Alongside the upgrade, Perplexity has released DRACO, a new open-source benchmark designed to evaluate deep research agents based on real-world usage patterns rather than isolated skills. The Deep Research Accuracy, Completeness, and Objectivity benchmark includes 100 tasks across ten domains: Academic, Finance, Law, Medicine, Technology, General Knowledge, UX Design, Personal Assistant, Shopping, and Needle in a Haystack.
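To make the rubric-based setup concrete, here is a minimal sketch of how per-domain scores in a DRACO-style benchmark could be aggregated. The field names and the equal-weight scoring scheme are assumptions for illustration; the actual rubrics and methodology are published on Hugging Face.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    domain: str          # one of the ten DRACO domains, e.g. "Law"
    rubric_passed: int   # rubric criteria the agent satisfied
    rubric_total: int    # total rubric criteria for the task

def domain_scores(results: list[TaskResult]) -> dict[str, float]:
    """Normalised score per domain: mean fraction of rubric criteria met."""
    by_domain: dict[str, list[float]] = {}
    for r in results:
        by_domain.setdefault(r.domain, []).append(r.rubric_passed / r.rubric_total)
    return {d: sum(scores) / len(scores) for d, scores in by_domain.items()}

# Hypothetical results, not real DRACO data
results = [
    TaskResult("Law", 43, 50),
    TaskResult("Law", 40, 50),
    TaskResult("Academic", 8, 10),
]
print(domain_scores(results))
```

Averaging the fraction of rubric criteria met within each domain, then comparing domains, is one simple way such a benchmark can surface where a system is strong or weak.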

According to Perplexity's testing, its upgraded Deep Research tool achieved a normalised score of 67.15%, compared to 58.97% for Google Gemini Deep Research and 52.06% for OpenAI's o3 model. The company reported that rankings remained consistent across different judge models.

Strong Performance in Professional Domains

The largest performance gaps appeared in the Medicine, General Knowledge, and Technology domains, where Perplexity outperformed the second-best system by 9 to 12 percentage points. The company's highest absolute performance came in Law at 86% and Academic at 80.2%.

Built for Real-World Research

Unlike traditional benchmarks that test isolated skills like fact retrieval, DRACO was constructed from anonymised Perplexity Deep Research requests and designed to create complex, open-ended tasks that mirror actual research needs. The benchmark also measures efficiency, with Perplexity achieving the lowest average latency at just under eight minutes while maintaining top accuracy scores.

The DRACO benchmark, rubrics, and methodology have been made fully open-source and are available on Hugging Face. The upgrade is available immediately for Max subscribers and will roll out to Pro users in the coming days.

