Steepish Descent

Unleash the chains of thought.

  • Modeling Training

    In a previous blog post I combined the Gibbs policy and Smoothed IGW to develop a…

  • Vibe Overthinking It

    A comedy of automated autism. My previous post about Softmax Muon is (as far as I can…

  • Softmax Muon

    Vibe-research is really productive: this idea would have taken me weeks to formalize myself, but instead…

  • Smoothed IGW for LLMs

    A connection between contextual bandit exploration in infinite action spaces and LLM-RL algorithms leveraging the Gibbs…

  • Note-taking: Continual Learning Shortcut?

    “Notes for your future self” is an effective zero-shot pattern for continual learning. Can it be…

  • Reducing Sample Complexity

    There’s no data like less data? The current paradigm appears sufficient to automate a large portion…

  • $\log^2$ Divergence

    A great divergence … or … the greatest divergence? In my last post I derived reward…

  • The Gibbs Policy

    The only thing better than running a learning algorithm is pretending to run it. In my…

  • Pertinacity and the Puke

    My opinions on why importance weighted estimators make bad objective functions.

  • Beyond Clipping

    Importance-weight clipping is used everywhere in machine learning. Can we get rid of it? Spoiler alert:…