⚡️ Microsoft tests its AI graders

Hey

Welcome to The AI Report — the #1 AI newsletter for business leaders.

You don’t need to be technical. Just informed.

1. 🤨 Run open-source LLMs in real production with Nebius
2. ⚡️ Microsoft tests its AI graders
3. 💼 Your Business Briefing
4. 📈 How Dan created a 6-figure business with Innovating with AI
5. ✍️ Today’s Policy Corner
6. 🗞️ The News Bulletin

Are you ready to launch AI transformation for your company?

Get Started

TOGETHER WITH NEBIUS

Run open-source LLMs in real production

Capture live traffic, fine-tune and optimize, then deploy your own checkpoints to dedicated GPU endpoints. Choose hardware, set scaling limits, and select region. Stable latency, predictable cost, clear data residency.

From LLM to production system, in one platform

Latest in AI

Microsoft tests its AI graders

🚨 Our Report

Microsoft has published a deep look at how its Copilot Studio team validates AI evaluation systems, addressing a growing enterprise concern: if you're using AI to grade your AI agents, how do you know the grader is right? The answer, according to Microsoft's data science team, involves treating evaluators like any other model that needs testing.

🔓 Key Points

The team uses controlled synthetic datasets with intentional degradations to test whether graders catch real problems. By introducing known flaws into high-quality responses, they can measure grader accuracy against ground truth.
Microsoft tracks two core metrics for grader quality: true positive rate and true negative rate, measuring how often graders catch real issues versus how often they avoid false alarms. Both must be high for evaluation to be trustworthy.
Generated datasets play a larger role than production data in most evaluation workflows, partly due to compliance restrictions but also because they let teams evaluate agents before exposing them to real users.
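
For readers who want to see the idea in miniature, here is a rough Python sketch of scoring a grader against a synthetic dataset with known, intentionally injected flaws. It is an illustration under stated assumptions, not Microsoft's implementation: the `Sample` type, the `toy_grader` rule, and the example responses are all hypothetical. "Positive" here means a response that was deliberately degraded.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    response: str
    degraded: bool  # ground truth: True if a flaw was injected on purpose


def toy_grader(response: str) -> bool:
    """Hypothetical stand-in grader: flags a response as flawed if it is
    very short or contains a leftover placeholder."""
    return len(response.split()) < 5 or "TODO" in response


def grader_rates(samples, grader):
    """Return (true positive rate, true negative rate) for a grader."""
    tp = fn = tn = fp = 0
    for s in samples:
        flagged = grader(s.response)
        if s.degraded and flagged:
            tp += 1  # real issue caught
        elif s.degraded:
            fn += 1  # real issue missed
        elif flagged:
            fp += 1  # false alarm on a clean response
        else:
            tn += 1  # clean response correctly passed
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Made-up examples: two clean responses and three degraded variants.
samples = [
    Sample("Your refund was issued today and should arrive within five business days.", False),
    Sample("You can reset your password from the account settings page at any time.", False),
    Sample("Refund issued.", True),  # degraded by truncation
    Sample("You can reset your password from the TODO page at any time.", True),  # placeholder left in
    Sample("Your refund was issued today and should arrive within fifty business years.", True),  # factual error
]

tpr, tnr = grader_rates(samples, toy_grader)
print(f"TPR={tpr:.2f} TNR={tnr:.2f}")
```

In this toy run the grader passes both clean responses but misses the factual error, so it raises no false alarms while catching only two of the three injected flaws. That is the kind of gap the two metrics are meant to expose.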

🔐 Relevance

As enterprises move from experimental AI deployments to production-scale agents, evaluation becomes a foundational capability. Teams can't just trust their agents; they need to trust the systems measuring those agents. Microsoft's framework offers a template for organizations building internal evaluation pipelines.

FULL STORY

The AI Report Podcast

Microsoft and Mayo Clinic Build an AI for Healthcare + Europe Pushes Back on US Tech | AI News in 5

WATCH NOW

THE BUSINESS BRIEFING: TALENT ACQUISITION (powered by Upscaile)

Emirates NBD (UAE's largest bank, $170B in assets) was burning 8,000 recruiter hours per year on manual screening for high-volume roles. Call center positions alone drew 10,000 applicants each. Phone screens were inconsistent, time-to-offer sat at 45 days, and top candidates accepted competing offers while waiting for feedback. They deployed AI-powered video assessments integrated directly into Oracle ATS in January 2025.

Tool used: AI video assessment platform — asynchronous, skills-based candidate evaluation with automated scoring against competency models.

Result: Time-to-offer dropped from 45 days to 9 days (80% reduction). $400,000 saved in 11 months. 8,000 recruiter hours reclaimed. Quality of hire up 20%+. Candidate NPS improved by over 100%.

The lesson: The win came from eliminating the queue before human review, not from automating the final hiring decision. Recruiters still made offers. AI scored competencies (communication clarity, problem-solving) the moment candidates applied, producing ranked shortlists before recruiters logged in.

Steal this: Audit where your candidates sit longest between application and first human contact. If it's initial screening, test AI-scored video assessments for your next high-volume role. Pick one competency model, integrate with your ATS, and measure days-to-shortlist before and after.

FIND OUT MORE

TOGETHER WITH INNOVATING WITH AI

How Dan Built a 6-Figure AI Consultancy and Quit His 9-to-5

Dan had no tech background and no business experience – just a love for AI and a hunch it could become something more.

Through The AI Consultancy Project, he landed his first clients, found a niche, and built a real business. This case study breaks down his journey – the early stumbles, the system that worked, and how he made the leap.

Read the Full Case Study — Free →

THE POLICY CORNER

New York mandates AI disclosure labels for ads—first state enforcement begins now.

Any advertiser creating film, TV, or digital ads with AI-generated synthetic performers for New York audiences must clearly disclose when a person on screen isn't real. Gov. Hochul signed Senate Bill S8420A in December 2025, and enforcement began June 10, 2026. The law applies to all ad producers and creators whose ads run in New York, regardless of where the company is located.

Deadline: In effect now. Violations open producers to legal challenges and consumer protection enforcement.

Risk: Beyond state penalties, failure to disclose synthetic performers exposes brands to false advertising claims, consumer lawsuits, and reputational damage as viewers increasingly question what's real.

Your move: Audit all active campaigns running in New York. If using AI-generated performers, add disclosure text or labels this week. Check with your creative and legal teams to ensure compliance with S8420A disclosure requirements.

FIND OUT MORE

AI News

🛡️ Anthropic apologizes for hidden Claude Fable guardrails: Company reverses course on covert distillation blocks after backlash, will now visibly route flagged queries through Opus 4.8 instead of silently degrading outputs. FULL STORY
🇹🇼 Taiwan considers criminalizing AI chip smuggling to China: Government weighs sweeping export controls beyond existing US blacklists as part of trade negotiations, risking backlash from Beijing while appeasing Washington. FULL STORY
⚡ Context compression breakthrough cuts LLM input 16x without accuracy hit: NYU-Columbia research team open-sources Latent Context Language Models achieving 8.8x faster inference on long contexts, solving production bottleneck standard KV cache methods can't touch. FULL STORY
💾 Xiaomi open-sources MiMo Code with persistent memory: AI coding agent maintains context across extended dev sessions through background subagent and automated weekly memory compression, outperforming Claude Code on SWE-Bench Pro. FULL STORY

Trending AI Tools (Sponsored by the AI Executive’s Pass)

A curated look at the AI tools quietly transforming how teams work.

Gamma* creates stunning presentations and websites from a single prompt, no design skills required
AlterMe is a DNA-based fitness system with personalized coaching and wearables
Tanka is an AI messaging platform with long-term memory for team collaboration

*Indicates a sponsored tool, if any.

⚡️ Why pay for 4–6 separate AI tools at full price when the AI Executive’s Pass fixes that?

Get the pass here →

The Money

Capital rushes to AI infrastructure layer

Investors bet AI's bottleneck won't be models; it'll be power, compute infrastructure, and manufacturing pipelines. Two deals this week signal capital chasing capacity over capability.

Deals to know:

KKR/Helix Digital Infrastructure (Launch, $10B) -- KKR partners with Nvidia, Vistra, and Kuwait Investment Authority to build AI data center infrastructure. First deployment targets U.S. power grid constraints. Investors: KKR, Nvidia, Kuwait Investment Authority, Vistra
Prometheus (Series B, $12B at $41B valuation) -- Bezos-backed "artificial general engineer" building AI tools to compress design-to-manufacturing timelines for physical goods. 150 employees across San Francisco, London, Zurich. Investors: JPMorgan, BlackRock, Goldman Sachs, DST Global, Arch Venture Partners

Signal: Smart money moves downstream. When JPMorgan, BlackRock, and sovereign wealth funds back infrastructure over frontier models, they're pricing in margin compression at the application layer and betting on picks-and-shovels plays through 2028.

Thoughts on today's edition?

Hit me up on LinkedIn, I read every message.

Latest episode: What AI Will Never Replace at Work → Listen here
Want to reach 400k+ decision-makers? → Sponsor us

Until next time, Arturo and Liam.
