GenAI: AWS outlines how foundation model scaling is moving past pre-training into post-training and test-time compute, with infrastructure implications for each stage. The piece is useful for engineers tracking where compute,…
TLDR AI Feed·May 12·Score 7.8
GenAI: This paper studies how to reuse a pool of LoRA adapters for new tasks after retrieval, focusing on composition and auditing rather than training a fresh adapter. Its residual merging and view-reliability analysis should…
cs.AI updates on arXiv.org·May 6·Score 9.3
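For readers unfamiliar with adapter composition, the basic move is combining several low-rank updates into one weight delta. A minimal sketch (not the paper's specific residual-merging method; the mixing weights here are hypothetical):

```python
import numpy as np

def merge_lora_deltas(adapters, weights):
    """Combine retrieved LoRA adapters into one weight delta.

    Each adapter is an (A, B) pair of low-rank factors whose full-rank
    update is B @ A; `weights` are illustrative mixing coefficients.
    """
    A0, B0 = adapters[0]
    delta = np.zeros((B0.shape[0], A0.shape[1]))
    for (A, B), w in zip(adapters, weights):
        delta += w * (B @ A)  # each low-rank update contributes proportionally
    return delta

# Two rank-2 adapters for a 4x4 base weight matrix
rng = np.random.default_rng(0)
adapters = [(rng.normal(size=(2, 4)), rng.normal(size=(4, 2))) for _ in range(2)]
merged = merge_lora_deltas(adapters, weights=[0.7, 0.3])
```

The merged delta is then added to the frozen base weight, so no fresh adapter training is needed.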
GenAI: This paper studies RLHF for shaping an LLM’s feedback into a professor-like style while preserving diagnostic accuracy. It’s relevant for teams building personalized tutoring or critique systems, especially where tone…
cs.AI updates on arXiv.org·May 6·Score 9.0
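The tension the paper targets, style versus correctness, can be made concrete with a toy scalar reward. This is a generic sketch, not the paper's reward design; `style_weight` is a hypothetical knob:

```python
def shaped_reward(style_score, accuracy_score, style_weight=0.3):
    """Toy combined reward for RLHF: nudge tone without sacrificing accuracy.

    Both scores are assumed to lie in [0, 1]; a small `style_weight`
    keeps diagnostic accuracy dominant in the optimization signal.
    """
    return (1 - style_weight) * accuracy_score + style_weight * style_score

# An accurate-but-terse answer still outscores a polished-but-wrong one
terse_correct = shaped_reward(style_score=0.2, accuracy_score=0.9)
polished_wrong = shaped_reward(style_score=0.9, accuracy_score=0.2)
```

Setting the weights so that accuracy dominates is what keeps the tuned model from trading correctness for tone.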
GenAI: SciResearcher tackles a core bottleneck in deep research agents: how to scale them for frontier scientific reasoning without relying only on brittle web-browsing or knowledge-graph pipelines. It looks relevant for teams…
cs.AI updates on arXiv.org·May 6·Score 10.0
Agentic AI: A tutorial walks through parsing and visualizing an agent-reasoning-traces dataset, then sketches fine-tuning workflows on top of it. It may be useful as a hands-on notebook, but it reads more like a generic…
MarkTechPost·May 2·Score 3.5

GenAI: OpenAI traces a quirky GPT-5.1 behavior back to reward signals from personality tuning, illustrating how small optimization choices can steer model style. The piece is more of a model-behavior note than a builder-facing…
TLDR AI Feed·May 1·Score 7.0
GenAI: A vendor blog explains reinforcement fine-tuning with an LLM-as-a-judge for Amazon Nova models. It may be useful as an implementation overview, but it reads more like a product-oriented walkthrough than a substantive…
Artificial Intelligence·Apr 30·Score 4.8
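The LLM-as-a-judge pattern is simple in outline: sample candidate responses, score each with a judge, and feed the scores back as rewards. A minimal sketch with a keyword rubric standing in for the judge model (the rubric and function names are illustrative, not the blog's API):

```python
def judge_score(response):
    """Stand-in for an LLM judge: a toy keyword rubric, purely illustrative."""
    rubric = ["steps", "caveat", "evidence"]
    return sum(word in response.lower() for word in rubric) / len(rubric)

def rank_candidates(candidates):
    """Score sampled responses best-first; in reinforcement fine-tuning,
    these scores would serve as the reward signal."""
    return sorted(candidates, key=judge_score, reverse=True)

candidates = [
    "Answer laid out in steps, with a caveat about edge cases.",
    "Bare answer.",
]
best = rank_candidates(candidates)[0]
```

In a real pipeline the judge is itself an LLM call, and the scores feed a policy-gradient or rejection-sampling update rather than a simple sort.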

Industry: IBM’s Granite 4.1 family shifts to dense decoder-only models and a staged training recipe aimed at stronger instruction following and tool use. The write-up is useful for builders tracking how enterprise LLMs are being…
TLDR AI Feed·Apr 30·Score 7.9
GenAI: A partnership announcement ties Together Fine-Tuning into Adaption’s workflow for dataset optimization, evaluation, and deployment. The integration may streamline open-model tuning, but the post reads as a product…
Together.ai·Apr 30·Score 3.5

Agentic AI: A distribution-aware speculative decoding method targets rollout bottlenecks in RL post-training, with the team reporting up to 50% faster generation without reward loss. It is most relevant for builders optimizing…
Together.ai·Apr 24·Score 7.7
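For context, the standard speculative-sampling acceptance rule that such methods build on: a cheap draft model proposes a token, and the target model accepts it with probability min(1, p/q), resampling from the residual distribution on rejection, so the output distribution exactly matches the target model. A toy sketch of that rule (not the vendor's distribution-aware variant):

```python
import numpy as np

def accept_draft_token(p_target, q_draft, token, rng):
    """Standard speculative-sampling acceptance test for one draft token.

    Accept the draft token with prob min(1, p/q); on rejection, resample
    from the renormalized residual max(0, p - q). The marginal over
    outputs then equals the target distribution p exactly.
    """
    if rng.random() < min(1.0, p_target[token] / q_draft[token]):
        return token
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # target model distribution (toy)
q = np.array([0.2, 0.5, 0.3])   # draft model distribution (toy)
samples = [accept_draft_token(p, q, rng.choice(3, p=q), rng) for _ in range(20000)]
freq = np.bincount(samples, minlength=3) / len(samples)
```

The speedup comes from verifying several drafted tokens in one target-model forward pass; the acceptance rule is what keeps generation lossless.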