Tanul Singh

ML Engineer · 5+ years in NLP & LLMs

# trained from scratch. no pre-trained weights.

Initialized with a Mechanical Engineering degree, pre-trained on curiosity, and fine-tuned by an unreasonable number of late nights — 5+ years of gradient descent through NLP, LLMs, and generative AI. Currently inference-serving at Apple. I write about ML here so you don't have to train from scratch too.

Senior ML Engineer

Apple — multi-agentic systems & LLM research

Kaggle Grandmaster

Notebooks GM, Competitions Master

US Patent Holder

Dynamic intent detection system

Self-Taught

ME degree → ML through sheer will

model.fit(life, epochs=∞, lr=persistence)

My Training Curve

Each milestone is a neuron. Experiences flow forward. Lessons backpropagate. The loss is still decreasing.

[Interactive graph: each milestone is a neuron. Nodes: The Spark · The Hardest Year · Found Kaggle · Notebooks GM · MLE at LevelAI · Senior MLE · Patent & Paper · Lead MLE · Apple. Legend: forward pass (life moving forward), backprop (lessons learned), turning points.]

training_loss.plot()

[Interactive plot: loss vs. epochs (time); x-axis '17 through '25 plus a trailing "?", y-axis from 1.0 down to 0.0.]
“You don’t need a low initial loss. You need a good learning rate and the patience to keep training.”

— the philosophy that took me from Mechanical Engineering to Apple

latest

Paper Explanations & Articles

Transformers · NLP · Architecture

Beyond Attention: Anatomy of a Modern Transformer

Attention gets all the press, but the rest of the transformer matters just as much. This post traces how feed-forward networks, normalization, residual connections, activations, and embeddings evolved from the original 2017 design to the modern LLM recipe — and why seemingly small changes (removing a bias term, swapping an activation function) compound into models that train faster, scale further, and perform better.

May 15, 2026 · 41 min read
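A taste of the "seemingly small changes" the post traces: a minimal sketch of a modern feed-forward block with SwiGLU gating and bias terms removed. Illustrative only, not code from the article; the dimensions are made-up examples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """Modern-style transformer feed-forward block: gated SwiGLU
    activation, no bias terms. Hypothetical sketch, not the article's code."""
    def __init__(self, d_model: int = 512, d_hidden: int = 1365):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)  # gate branch
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)    # value branch
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)  # project back down

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(gate(x)) multiplied elementwise with up(x)
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```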
Transformers · NLP · Attention

Attention Part 4 — Flash Attention: Making GPUs Actually Work

The complete story of Flash Attention: why naive attention is memory-bound despite being 'just matrix multiplies', the GPU memory hierarchy that explains everything, the online softmax trick that makes tiling possible, and how it all composes into the most impactful systems optimization in modern transformers.

May 13, 2026 · 33 min read
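The online-softmax trick mentioned above, in miniature: keep a running max and a rescaled running sum so the full score vector never has to sit in fast memory at once, which is what makes tiling possible. A minimal NumPy sketch of the idea, not the post's code:

```python
import numpy as np

def online_softmax(scores):
    """One-pass softmax over a stream of scores: track a running max m and
    a running sum s of exp(x - m), rescaling s whenever m grows."""
    m, s = -np.inf, 0.0
    for x in scores:
        m_new = max(m, x)
        s = s * np.exp(m - m_new) + np.exp(x - m_new)  # rescale old sum, add new term
        m = m_new
    return np.exp(np.asarray(scores) - m) / s  # normalized weights

# Agrees with the usual two-pass softmax on e.g. online_softmax([1.0, 2.0, 3.0])
```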
Papers · Gemma · Google DeepMind

Dissecting Gemma 4: Architecture from the Ground Up

A complete architectural dissection of Gemma 4: how hybrid attention (sliding window + global layers), Mixture of Experts, and careful design choices compose into a model that handles 256K tokens efficiently.

May 13, 2026 · 2 min read
Papers · Sparse Attention · DeepSeek

Native Sparse Attention: Hardware-Aligned Learned Sparsity

A ground-up dissection of NSA: how compressed attention, content-aware routing, and sliding windows compose into one mechanism. From the paper's math to working PyTorch, with hardware-alignment insights.

May 13, 2026 · 3 min read

Want to know more?

Watch my conversations with people who shaped the Indian ML community: