The Agentic Wire
Archive

Every story we've curated, in one place. Type a phrase, a tool name, or a researcher's name. Wrap words in quotes to match them as a phrase; put a leading - on a term to exclude it.

20 matches for world models
  1. Physical

    Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

    arXiv:2605.00412v1. World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However,…

  2. Physical

    Latent State Design for World Models under Sufficiency Constraints

    This paper reframes world models as a latent-state design problem under sufficiency constraints, organizing methods by what the state is meant to preserve and support. That lens should help robotics and physical-AI…

  3. Agentic

    Medicare's new payment model is built for AI, and most of the tech world has no idea

    Medicare’s new ACCESS payment model creates a reimbursement path for AI agents that monitor patients between visits and coordinate follow-up care. For builders, the significance is less the policy headline than the fact…

  4. GenAI

    MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

    arXiv:2605.06897v1. The rise of Internet of Things (IoT) devices in the physical world necessitates voice-based interfaces capable of handling complex user experiences. While modern Large…

  5. Industry

    ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?

    arXiv:2605.00468v1. Plain Language Summaries (PLS) aim to make research accessible to lay readers, but they are typically written in a one-size-fits-all style that ignores differences in…

  6. Industry

    Impact of Task Phrasing on Presumptions in Large Language Models

    arXiv:2605.00436v1. Concerns with the safety and reliability of applying large-language models (LLMs) in unpredictable real-world applications motivate this study, which examines how task…

  7. GenAI

    Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus

    arXiv:2605.04576v1. This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language…

  8. GenAI

    FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

    arXiv:2605.00706v1. Large language models (LLMs) are increasingly applied in financial scenarios. However, they may produce harmful outputs, including facilitating illegal activities or…

  9. GenAI

    ReplaySCM: A Benchmark for Executable Causal Mechanism Induction from Interventions

    arXiv:2605.08197v1. Most causal benchmarks for language models score local answers or graph structure. We introduce ReplaySCM, a 1,300 item benchmark for executable causal mechanism induction…

  10. Industry

    FedACT: Concurrent Federated Intelligence across Heterogeneous Data Sources

    arXiv:2605.00011v1. Federated Learning (FL) enables collaborative intelligence across decentralized data source devices in a privacy-preserving way. While substantial research attention has…

  11. GenAI

    Reasoning Compression with Mixed-Policy Distillation

    arXiv:2605.08776v1. Reasoning-centric large language models (LLMs) achieve strong performance by generating intermediate reasoning trajectories, but often incur excessive token usage and high…

  12. Industry

    Microsoft World-R1 for 3D-Consistent Video Generation

    World-R1 applies reinforcement learning to video generation using 3D and vision-language feedback, aiming to improve spatial consistency without changing the base model architecture. It’s a useful signal for teams…

  13. Edge AI

    What's New This Month: Spring 2026 Edition

    This event is designed for engineers and developers who want to go deep into the tools, architecture, and MLOps workflows that can help you take your ideas and turn them into real-world solutions. Don't miss these…

  14. GenAI

    PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat

    arXiv:2605.07201v1. This paper describes our system for the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities. The task involves classifying World of Tanks chat…

  15. Infra

    FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis

    arXiv:2605.05499v1. The widespread adoption of camera-equipped mobile devices and wearables has enabled convenient capture of meal images, making food recognition a key component for real…

  16. Agentic

    MIND-Skill: Quality-Guaranteed Skill Generation via Multi-Agent Induction and Deduction

    arXiv:2605.08670v1. Large language model (LLM) powered AI agents have emerged as a promising paradigm for autonomous problem-solving, yet they continue to struggle with complex, multi-step…

  17. Agentic

    Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling

    This paper compares common test-time scaling strategies for language models through a compute-efficiency lens, including self-consistency, self-refinement, multi-agent debate, and mixture-of-agents. It is useful for…

  18. GenAI

    GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models

    GR-Ben introduces a benchmark for evaluating process reward models beyond math-heavy tasks, targeting general reasoning and decision-making failures in LLM intermediate steps. It matters for teams building test-time…

  19. GenAI

    NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models

    arXiv:2605.07051v1. Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and mathematics…

  20. GenAI

    A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks

    arXiv:2605.05476v1. Knowledge graphs automatically constructed from text are increasingly used in real-world applications. However, their inherent noise, fragmentation, and semantic…