-
Modeling Training
In a previous blog post I combined the Gibbs policy and Smoothed IGW to develop a…
-
Vibe Overthinking It
A comedy of automated autism. My previous post about Softmax Muon is (as far as I can…
-
Softmax Muon
Vibe-research is really productive: this idea would have taken me weeks to formalize myself, but instead…
-
Smoothed IGW for LLMs
A connection between contextual bandit exploration in infinite action spaces and LLM-RL algorithms leveraging the Gibbs…
-
Note-taking: Continual Learning Shortcut?
“Notes for your future self” is an effective zero-shot pattern for continual learning. Can it be…
-
Reducing Sample Complexity
There’s no data like less data? The current paradigm appears sufficient to automate a large portion…
-
$\log^2$ Divergence
A great divergence … or … the greatest divergence? In my last post I derived reward…
-
The Gibbs Policy
The only thing better than running a learning algorithm is pretending to run it. In my…
-
Pertinacity and the Puke
My opinions on why importance-weighted estimators make bad objective functions.
-
Beyond Clipping
Importance-weight clipping is used everywhere in machine learning. Can we get rid of it? Spoiler alert:…