This paper proposes Variational Linear Attention, an online least-squares formulation that stabilizes linear attention memory with an adaptive penalty matrix. It targets a core bottleneck in long-context transformers:…
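The paper's specific penalty construction is not shown here, but the general framing it builds on — linear attention memory as an online, regularized least-squares problem solved by a delta-rule update — can be sketched. Everything below (function name, `beta`, `lam`) is illustrative, not the paper's method: a plain ridge penalty stands in for the adaptive penalty matrix.

```python
import numpy as np

def delta_rule_memory(keys, values, queries, beta=0.5, lam=0.01):
    """Sketch of linear attention memory as online ridge-regularized
    least squares. At each step the memory matrix S takes a gradient
    step toward solving S @ k ~= v, with an L2 penalty (lam) damping S.
    beta and lam are illustrative hyperparameters, not from the paper."""
    d_k = keys.shape[1]
    d_v = values.shape[1]
    S = np.zeros((d_v, d_k))
    outputs = []
    for k, v, q in zip(keys, values, queries):
        # gradient step on 0.5*||S k - v||^2 + 0.5*lam*||S||_F^2
        err = S @ k - v
        S = S - beta * (np.outer(err, k) + lam * S)
        outputs.append(S @ q)  # read out with the query
    return np.stack(outputs)
```

With `lam=0` this reduces to the standard delta-rule linear attention recurrence; the regularizer is where an adaptive penalty would plug in.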
This paper replaces standard diffusion denoising with conditional normalizing flows to get four-step image generation without giving up exact likelihood training. The self-distillation angle makes it especially relevant…
State space models are moving from a niche alternative to a credible transformer competitor, with tradeoffs that matter for long-context efficiency and scaling. The piece is a useful snapshot of where SSMs fit, and…
This paper tightens the evaluation of diffusion-based OOD detectors by controlling for backbone choice and test-time budget, then proposes sparse internal feature snapshots as a fairer detector family. It matters most…
AWS outlines how foundation model scaling is moving past pre-training into post-training and test-time compute, with infrastructure implications for each stage. The piece is useful for engineers tracking where compute,…
ASD-Bench introduces a four-axis benchmark for autism spectrum disorder screening across children, adolescents, and adults. It gives researchers a more structured way to compare classical ML, deep learning, and…
This paper reframes oversmoothing in neural sheaf diffusion as a representation-degeneracy problem and brings quiver/sheaf theory to bear on the dynamics. It is mathematically rich, but the practical payoff for GenAI…
arXiv:2605.06940v1 Announce Type: new Abstract: Annotation automation via Large Language Models (LLMs) is the core approach for scaling NLP datasets; however, LLM behavior with respect to closed-set instructions in…
arXiv:2605.06832v1 Announce Type: new Abstract: Accurately understanding the intent behind speech, conversation, and writing is crucial to the development of helpful Large Language Model (LLM) assistants. This paper…
arXiv:2605.08354v1 Announce Type: new Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment.…
arXiv:2605.08614v1 Announce Type: new Abstract: Monitoring complex industrial assets relies on engineer-authored symbolic rules that trigger based on sensor conditions and prompt technicians to perform corrective…
arXiv:2605.08776v1 Announce Type: new Abstract: Reasoning-centric large language models (LLMs) achieve strong performance by generating intermediate reasoning trajectories, but often incur excessive token usage and high…
arXiv:2605.07051v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown strong performance on various science-education benchmarks, demonstrating their potential for use in science and mathematics…
arXiv:2605.08545v1 Announce Type: new Abstract: Agent benchmarks typically report only final outcomes: pass or fail. This threatens evaluation credibility in three ways. First, scores may be inflated or deflated by…
arXiv:2605.08174v1 Announce Type: new Abstract: To mitigate the memory constraints associated with fine-tuning large pre-trained models, existing parameter-efficient fine-tuning (PEFT) methods, such as LoRA, rely on…
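For context on the baseline this abstract refers to: LoRA freezes the pre-trained weight and trains only a low-rank additive update, which is what keeps the optimizer state small. A minimal sketch (names and `alpha` default are illustrative, not a library API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Minimal LoRA sketch. The frozen base weight W (d_out x d_in) is
    augmented by a trainable low-rank update B @ A, with B (d_out x r)
    and A (r x d_in). Only A and B receive gradients in fine-tuning;
    the (alpha / r) factor is LoRA's standard scaling."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```

In the usual initialization B starts at zero, so the adapted model exactly matches the frozen base model before any training step.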
arXiv:2605.07084v1 Announce Type: new Abstract: Automatic speech recognition (ASR) evaluation compares system output to ground truth transcripts, with Word Error Rate (WER) quantifying the distance between them. But…
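The WER baseline the abstract critiques is the standard word-level edit distance: substitutions, deletions, and insertions against the reference, normalized by reference length. A self-contained version:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance (the standard definition)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    return dp[-1][-1] / len(ref)
```

Note that WER treats all word errors equally, which is exactly the limitation ("but…") this abstract appears to be probing.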
arXiv:2605.08538v1 Announce Type: new Abstract: Current LLM agents lack principled mechanisms for managing persistent memory across long interaction horizons. We present a biologically-grounded memory architecture…
arXiv:2605.07111v1 Announce Type: new Abstract: Recent literature on fine-tuning Large Language Models highlights a fundamental debate. While Full Fine-Tuning (FFT) provides the representational plasticity required for…
arXiv:2605.08368v1 Announce Type: new Abstract: Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction…
arXiv:2605.07053v1 Announce Type: new Abstract: Benchmarks like GSM8K are popular measures of mathematical reasoning, but leaderboard gains can overstate true capability due to memorization of fixed test sets. Most…
arXiv:2605.07366v1 Announce Type: new Abstract: Adaptive rank allocation for LoRA, allocating more parameters to important layers and fewer to unimportant ones, consistently improves efficiency under supervised…
arXiv:2605.07013v1 Announce Type: new Abstract: Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in…
arXiv:2605.08533v1 Announce Type: new Abstract: Clinical decision-making in emergency medicine demands rapid, accurate diagnoses under uncertainty. Despite benchmark progress, evidence for LLMs as interactive aids in…
arXiv:2605.08703v1 Announce Type: new Abstract: Evaluating instruction-guided image edits requires rewards that reflect subtle human preferences, yet current reward models typically depend on large-scale preference…
Medicare’s new ACCESS payment model creates a reimbursement path for AI agents that monitor patients between visits and coordinate follow-up care. For builders, the significance is less the policy headline than the fact…
arXiv:2605.08704v1 Announce Type: new Abstract: Multi-agent reasoning has shown promise for improving the problem-solving ability of large language models by allowing multiple agents to explore diverse reasoning paths.…
arXiv:2605.08518v1 Announce Type: new Abstract: Competition retrospectives are useful when they explain what a leaderboard measured, how hidden evaluation changed conclusions, and which design patterns were rewarded. We…
A reflective analysis of how open model ecosystems can reinforce themselves through participation, iteration, and distribution. The piece is most useful as a strategic read on why open-first AI communities can compound…
OpenAI outlines the safety controls behind Codex, including sandboxing, approval gates, and constrained execution environments. It’s a useful look at how agentic coding systems are being boxed in to reduce blast radius…
Foundation trends: The tectonic shift. Trend 1: The software development lifecycle changes dramatically. The way we interact with computers is undergoing one of its most significant changes since the graphical user…