This paper proposes Variational Linear Attention, an online least-squares formulation that stabilizes linear attention memory with an adaptive penalty matrix. It targets a core bottleneck in long-context transformers:…
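The paper's specific penalty construction is not shown here, but the general framing it builds on — linear attention memory as an online, regularized least-squares problem solved by a delta-rule update — can be sketched. Everything below (function name, `beta`, `lam`) is illustrative, not the paper's method: a plain ridge penalty stands in for the adaptive penalty matrix.

```python
import numpy as np

def delta_rule_memory(keys, values, queries, beta=0.5, lam=0.01):
    """Sketch of linear attention memory as online ridge-regularized
    least squares. At each step the memory matrix S takes a gradient
    step toward solving S @ k ~= v, with an L2 penalty (lam) damping S.
    beta and lam are illustrative hyperparameters, not from the paper."""
    d_k = keys.shape[1]
    d_v = values.shape[1]
    S = np.zeros((d_v, d_k))
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # gradient step on 0.5*||S k - v||^2 + 0.5*lam*||S||_F^2
        err = S @ k - v
        S = S - beta * (np.outer(err, k) + lam * S)
        outputs.append(S @ q)  # read out with the query
    return np.stack(outputs)
```

With `lam=0` this reduces to the standard delta-rule linear attention recurrence; the regularizer is where an adaptive penalty would plug in.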
This paper replaces standard diffusion denoising with conditional normalizing flows to get four-step image generation without giving up exact likelihood training. The self-distillation angle makes it especially relevant…
State space models are moving from a niche alternative to a credible transformer competitor, with tradeoffs that matter for long-context efficiency and scaling. The piece is a useful snapshot of where SSMs fit, and…
This paper tightens the evaluation of diffusion-based OOD detectors by controlling for backbone choice and test-time budget, then proposes sparse internal feature snapshots as a fairer detector family. It matters most…
AWS outlines how foundation model scaling is moving past pre-training into post-training and test-time compute, with infrastructure implications for each stage. The piece is useful for engineers tracking where compute,…
ASD-Bench introduces a four-axis benchmark for autism spectrum disorder screening across children, adolescents, and adults. It gives researchers a more structured way to compare classical ML, deep learning, and…
This paper reframes oversmoothing in neural sheaf diffusion as a representation-degeneracy problem and brings quiver/sheaf theory to bear on the dynamics. It is mathematically rich, but the practical payoff for GenAI…
arXiv:2605.06940v1 Announce Type: new Abstract: Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set instructions in…
arXiv:2605.06832v1 Announce Type: new Abstract: Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assistants. This paper…
arXiv:2605.08354v1 Announce Type: new Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment.…
arXiv:2605.08614v1 Announce Type: new Abstract: Monitoring complex industrial assets relies on engineer-authored symbolic rules that trigger based on sensor conditions and prompt technicians to perform corrective…
arXiv:2605.08776v1 Announce Type: new Abstract: Reasoning-centric large language models (LLMs) achieve strong performance by generating intermediate reasoning trajectories, but often incur excessive token usage and high…
arXiv:2605.07051v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown strong performance on various science-education benchmarks, demonstrating their potential for use in science and mathematics…
arXiv:2605.08545v1 Announce Type: new Abstract: Agent benchmarks typically report only final outcomes: pass or fail. This threatens evaluation credibility in three ways. First, scores may be inflated or deflated by…
arXiv:2605.08174v1 Announce Type: new Abstract: To mitigate the memory constraints associated with fine-tuning large pre-trained models, existing parameter-efficient fine-tuning (PEFT) methods, such as LoRA, rely on…
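For context on the baseline this abstract refers to: LoRA freezes the pre-trained weight and trains only a low-rank additive update, which is what keeps the optimizer state small. A minimal sketch (names and `alpha` default are illustrative, not a library API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Minimal LoRA sketch. The frozen base weight W (d_out x d_in) is
    augmented by a trainable low-rank update B @ A, with B (d_out x r)
    and A (r x d_in). Only A and B receive gradients in fine-tuning;
    the (alpha / r) factor is LoRA's standard scaling."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```

In the usual initialization B starts at zero, so the adapted model exactly matches the frozen base model before any training step.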
arXiv:2605.07084v1 Announce Type: new Abstract: Automatic speech recognition (ASR) evaluation compares system output to ground truth transcripts, with Word Error Rate (WER) quantifying the distance between them. But…
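The WER baseline the abstract critiques is the standard word-level edit distance: substitutions, deletions, and insertions against the reference, normalized by reference length. A self-contained version:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance (the standard definition)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    return dp[-1][-1] / len(ref)
```

Note that WER treats all word errors equally, which is exactly the limitation ("but…") this abstract appears to be probing.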
arXiv:2605.08538v1 Announce Type: new Abstract: Current LLM agents lack principled mechanisms for managing persistent memory across long interaction horizons. We present a biologically-grounded memory architecture…
arXiv:2605.07111v1 Announce Type: new Abstract: Recent literature on fine-tuning Large Language Models highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for…
arXiv:2605.08368v1 Announce Type: new Abstract: Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction…
arXiv:2605.07053v1 Announce Type: new Abstract: Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorization of fixed test sets. Most…
arXiv:2605.07366v1 Announce Type: new Abstract: Adaptive rank allocation for LoRA, allocating more parameters to important layers and fewer to unimportant ones, consistently improves efficiency under supervised…
arXiv:2605.07013v1 Announce Type: new Abstract: Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in…
arXiv:2605.08533v1 Announce Type: new Abstract: Clinical decision-making in emergency medicine demands rapid, accurate diagnoses under uncertainty. Despite benchmark progress, evidence for LLMs as interactive aids in…
arXiv:2605.08703v1 Announce Type: new Abstract: Evaluating instruction-guided image edits requires rewards that reflect subtle human preferences, yet current reward models typically depend on large-scale preference…
Medicare’s new ACCESS payment model creates a reimbursement path for AI agents that monitor patients between visits and coordinate follow-up care. For builders, the significance is less the policy headline than the fact…
arXiv:2605.08704v1 Announce Type: new Abstract: Multi-agent reasoning has shown promise for improving the problem-solving ability of large language models by allowing multiple agents to explore diverse reasoning paths.…
arXiv:2605.08518v1 Announce Type: new Abstract: Competition retrospectives are useful when they explain what a leaderboard measured, how hidden evaluation changed conclusions, and which design patterns were rewarded. We…
A reflective analysis of how open model ecosystems can reinforce themselves through participation, iteration, and distribution. The piece is most useful as a strategic read on why open-first AI communities can compound…
OpenAI outlines the safety controls behind Codex, including sandboxing, approval gates, and constrained execution environments. It’s a useful look at how agentic coding systems are being boxed in to reduce blast radius…
Foundation trends: The tectonic shift. Trend 1: The software development lifecycle changes dramatically. The way we interact with computers is undergoing one of its most significant changes since the graphical user…