GenAI

Foundation models, retrieval, evals, multimodal, voice — the building blocks of generative AI.

182 stories

GenAI
Backbone-Equated Diffusion OOD via Sparse Internal Snapshots
This paper tightens the evaluation of diffusion-based OOD detectors by controlling for backbone choice and test-time budget, then proposes sparse internal feature snapshots as a fairer detector family. It matters most…
cs.LG updates on arXiv.org | May 13 | Score 8.2
GenAI
Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
This paper proposes Variational Linear Attention, an online least-squares formulation that stabilizes linear attention memory with an adaptive penalty matrix. It targets a core bottleneck in long-context transformers:…
cs.LG updates on arXiv.org | May 13 | Score 9.2
GenAI
Oversmoothing as Representation Degeneracy in Neural Sheaf Diffusion
This paper reframes oversmoothing in neural sheaf diffusion as a representation-degeneracy problem and brings quiver/sheaf theory to bear on the dynamics. It is mathematically rich, but the practical payoff for GenAI…
cs.LG updates on arXiv.org | May 13 | Score 7.3
GenAI
ASD-Bench: A Four-Axis Comprehensive Benchmark of AI Models for Autism Spectrum Disorder
ASD-Bench introduces a four-axis benchmark for autism spectrum disorder screening across children, adolescents, and adults. It gives researchers a more structured way to compare classical ML, deep learning, and…
cs.LG updates on arXiv.org | May 13 | Score 7.6
Industry
How open model ecosystems compound
A reflective analysis of how open model ecosystems can reinforce themselves through participation, iteration, and distribution. The piece is most useful as a strategic read on why open-first AI communities can compound…
Interconnects AI | May 12 | Score 8.2
GenAI
The Sequence Knowledge #858: How State Space Models Went from Curiosity to Serious Transformer Competitor
State space models are moving from a niche alternative to a credible transformer competitor, with tradeoffs that matter for long-context efficiency and scaling. The piece is a useful snapshot of where SSMs fit, and…
TheSequence | May 12 | Score 8.8
GenAI
Mirror, Mirror on the Wall: Can VLM Agents Tell Who They Are at All?
arXiv:2605.08816v1. Abstract: In the animal kingdom, mirror self-recognition is a canonical probe of higher-order cognition, emerging only in some species. We ask whether an analogous functional…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection
arXiv:2605.08177v1. Abstract: Parameter-efficient fine-tuning (PEFT) has become a practical route for adapting large language models to downstream tasks, with LoRA-style methods being particularly…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
Reasoning Compression with Mixed-Policy Distillation
arXiv:2605.08776v1. Abstract: Reasoning-centric large language models (LLMs) achieve strong performance by generating intermediate reasoning trajectories, but often incur excessive token usage and high…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
Towards Customized Multimodal Role-Play
arXiv:2605.08129v1. Abstract: Unified multimodal understanding and generation models enable richer human-AI interaction. Yet jointly customizing a character's persona, dialogue style, and visual…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
RewardHarness: Self-Evolving Agentic Post-Training
arXiv:2605.08703v1. Abstract: Evaluating instruction-guided image edits requires rewards that reflect subtle human preferences, yet current reward models typically depend on large-scale preference…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care
arXiv:2605.08533v1. Abstract: Clinical decision-making in emergency medicine demands rapid, accurate diagnoses under uncertainty. Despite benchmark progress, evidence for LLMs as interactive aids in…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
Log analysis is necessary for credible evaluation of AI agents
arXiv:2605.08545v1. Abstract: Agent benchmarks typically report only final outcomes: pass or fail. This threatens evaluation credibility in three ways. First, scores may be inflated or deflated by…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
TTCD: Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data
arXiv:2605.08111v1. Abstract: The widespread availability of complex time series data in various domains such as environmental science, epidemiology, and economics demands robust causal discovery…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
CERSA: Cumulative Energy-Retaining Subspace Adaptation for Memory-Efficient Fine-Tuning
arXiv:2605.08174v1. Abstract: To mitigate the memory constraints associated with fine-tuning large pre-trained models, existing parameter-efficient fine-tuning (PEFT) methods, such as LoRA, rely on…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
Human-Inspired Memory Architecture for LLM Agents
arXiv:2605.08538v1. Abstract: Current LLM agents lack principled mechanisms for managing persistent memory across long interaction horizons. We present a biologically-grounded memory architecture…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification
arXiv:2605.08448v1. Abstract: Semi-supervised learning approaches have been investigated as a means to enhance the analysis of social media data in disaster management contexts. In this work, we…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction
arXiv:2605.08220v1. Abstract: The automated extraction of data from scientific charts is a critical task for large-scale literature analysis. While multimodal Large Language Models (LLMs) show promise,…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
DataArc-SynData-Toolkit: A Unified Closed-Loop Framework for Multi-Path, Multimodal, and Multilingual Data Synthesis
arXiv:2605.08138v1. Abstract: Synthetic data has emerged as a crucial solution to the data scarcity bottleneck in large language models (LLMs), particularly for specialized domains and low-resource…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
Do Foundation Model Embeddings Improve Cross-Country Crop Yield Generalisation? A Leave-One-Country-Out Evaluation in Sub-Saharan Africa
arXiv:2605.08113v1. Abstract: Accurate predictions of smallholder maize yields across national boundaries are critical for food security planning in sub-Saharan Africa, yet most published benchmarks…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
The Safety-Aware Denoiser for Text Diffusion Models
arXiv:2605.08116v1. Abstract: Recent work on text diffusion models offers a promising alternative to autoregressive generation, but controlling their safety remains underexplored. Existing safety…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits
arXiv:2605.08200v1. Abstract: A pervasive intuition holds that vision-language models (VLMs) are most trustworthy when their attention maps look sharp: concentrated attention on the queried region…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
arXiv:2605.08202v1. Abstract: Offline reinforcement learning (RL) faces a critical challenge of overestimating the value of out-of-distribution (OOD) actions. Existing methods mitigate this issue by…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
arXiv:2605.08354v1. Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment.…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training
arXiv:2605.08144v1. Abstract: Diffusion models have achieved remarkable success across a wide range of generative tasks, yet their training paradigm largely treats injected noise as uniformly…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
ReplaySCM: A Benchmark for Executable Causal Mechanism Induction from Interventions
arXiv:2605.08197v1. Abstract: Most causal benchmarks for language models score local answers or graph structure. We introduce ReplaySCM, a 1,300 item benchmark for executable causal mechanism induction…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams
arXiv:2605.08388v1. Abstract: Human-AI teams play a pivotal role in improving overall system performance when neither the human nor the model can achieve such performance on their own. With the advent…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare
arXiv:2605.08445v1. Abstract: AI models are increasingly deployed in live clinical environments where they must perform reliably across complex, high-stakes workflows that standard training and…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective
arXiv:2605.08368v1. Abstract: Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules
arXiv:2605.08614v1. Abstract: Monitoring complex industrial assets relies on engineer-authored symbolic rules that trigger based on sensor conditions and prompt technicians to perform corrective…
cs.AI updates on arXiv.org | May 12 | Score 7.0
GenAI
DARE: Diffusion Language Model Activation Reuse for Efficient Inference
arXiv:2605.08134v1. Abstract: Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to auto-regressive (AR) models, offering greater expressive capacity and potential for…
cs.LG updates on arXiv.org | May 12 | Score 7.0
GenAI
GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations
arXiv:2605.07053v1. Abstract: Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorization of fixed test sets. Most…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
arXiv:2605.06897v1. Abstract: The rise of Internet of Things (IoT) devices in the physical world necessitates voice-based interfaces capable of handling complex user experiences. While modern Large…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation
arXiv:2605.07084v1. Abstract: Automatic speech recognition (ASR) evaluation compares system output to ground truth transcripts, with Word Error Rate (WER) quantifying the distance between them. But…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion
arXiv:2605.07013v1. Abstract: Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
Structural Rationale Distillation via Reasoning Space Compression
arXiv:2605.07139v1. Abstract: When distilling reasoning from large language models (LLMs) into smaller ones, teacher rationales for similar problems often vary wildly in structure and strategy. Like a…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models
arXiv:2605.07051v1. Abstract: Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and mathematics…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning
arXiv:2605.07269v1. Abstract: Indirect prompt injection remains a persistent weakness in retrieval-augmented and tool-using LLM systems, and the problem becomes harder to characterise in multilingual…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
arXiv:2605.07201v1. Abstract: This paper describes our system for the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities. The task involves classifying World of Tanks chat…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
IntentGrasp: A Comprehensive Benchmark for Intent Understanding
arXiv:2605.06832v1. Abstract: Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assistants. This paper…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation
arXiv:2605.07111v1. Abstract: Recent literature on fine-tuning Large Language Models highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media
arXiv:2605.06940v1. Abstract: Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set instructions in…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
Gradient-Based LoRA Rank Allocation Under GRPO: An Empirical Study
arXiv:2605.07366v1. Abstract: Adaptive rank allocation for LoRA, allocating more parameters to important layers and fewer to unimportant ones, consistently improves efficiency under supervised…
cs.CL updates on arXiv.org | May 12 | Score 7.0
GenAI
Trajectory Models for Few-Step Diffusion (22 minute read)
This paper replaces standard diffusion denoising with conditional normalizing flows to get four-step image generation without giving up exact likelihood training. The self-distillation angle makes it especially relevant…
TLDR AI Feed | May 12 | Score 9.2
GenAI
Foundation Model Scaling (34 minute read)
AWS outlines how foundation model scaling is moving past pre-training into post-training and test-time compute, with infrastructure implications for each stage. The piece is useful for engineers tracking where compute,…
TLDR AI Feed | May 12 | Score 7.8
GenAI
EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation
arXiv:2605.07247v1. Abstract: Scalable AI agents training relies on interactive environments that faithfully simulate the consequences of agent actions. Manually crafted environments are expensive to…
cs.AI updates on arXiv.org | May 11 | Score 7.0
GenAI
The E-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality
arXiv:2605.06729v1. Abstract: We present the E-MHC-Geo Transformer, a novel architecture that unifies Manifold-Constrained Hyper-Connections (mHC), Deep Delta Learning (DDL), and the Cayley transform…
cs.LG updates on arXiv.org | May 11 | Score 7.0
GenAI
Mitigating Cognitive Bias in RLHF by Altering Rationality
arXiv:2605.06895v1. Abstract: How can we make models robust to even imperfect human feedback? In reinforcement learning from human feedback (RLHF), human preferences over model outputs are used to…
cs.AI updates on arXiv.org | May 11 | Score 7.0
GenAI
Agentick: A Unified Benchmark for General Sequential Decision-Making Agents
arXiv:2605.06869v1. Abstract: AI agent research spans a wide spectrum: from RL agents that learn from scratch to foundation model agents that leverage pre-trained knowledge, yet no unified benchmark…
cs.AI updates on arXiv.org | May 11 | Score 7.0
GenAI
Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion
arXiv:2605.06720v1. Abstract: Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains…
cs.LG updates on arXiv.org | May 11 | Score 7.0