Steepish Descent

Unleash the chains of thought.

  • Modeling Training

    In a previous blog post I combined the Gibbs policy and Smoothed IGW to develop a…

  • Vibe Overthinking It

    A comedy of automated autism. My previous post about Softmax Muon is (as far as I can…

  • Softmax Muon

    Vibe-research is really productive: this idea would have taken me weeks to formalize myself, but instead…

  • Smoothed IGW for LLMs

    A connection between contextual bandit exploration in infinite action spaces and LLM-RL algorithms leveraging the Gibbs…

  • Note-taking: Continual Learning Shortcut?

    “Notes for your future self” is an effective zero-shot pattern for continual learning. Can it be…

  • Reducing Sample Complexity

    There’s no data like less data? The current paradigm appears sufficient to automate a large portion…

  • $\log^2$ Divergence

    A great divergence … or … the greatest divergence? In my last post I derived reward…

  • The Gibbs Policy

    The only thing better than running a learning algorithm is pretending to run it. In my…

  • Pertinacity and the Puke

    My opinions on why importance weighted estimators make bad objective functions.

  • Beyond Clipping

    Importance-weight clipping is used everywhere in machine learning. Can we get rid of it? Spoiler alert:…