
DeBERTa is the New King
Explore how DeBERTa revolutionizes NLP with its innovative Disentangled Attention Mechanism and Enhanced Mask Decoder, surpassing predecessors like BERT and RoBERTa and setting a new state of the art on benchmarks such as SuperGLUE.
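At its core, disentangled attention scores each token pair with three terms: content-to-content, content-to-position, and position-to-content, computed from separate content and relative-position projections. Here is a minimal single-head sketch of that scoring; the function name, tensor shapes, and random weights are illustrative stand-ins, and the real model adds multi-head splitting, softmax, masking, and the Enhanced Mask Decoder on top.

```python
import torch

def disentangled_scores(H, P, Wqc, Wkc, Wqr, Wkr, k=4):
    """Single-head sketch of DeBERTa's disentangled attention scores.

    H:  (n, d) content vectors for the sequence
    P:  (2k, d) embeddings for relative distances clipped to [-k, k)
    W*: (d, d) projection matrices (toy stand-ins, not trained weights)

    Each score sums content-to-content, content-to-position, and
    position-to-content terms, scaled by sqrt(3d) as in the paper.
    """
    n, d = H.shape
    Qc, Kc = H @ Wqc, H @ Wkc        # content queries / keys
    Qr, Kr = P @ Wqr, P @ Wkr        # relative-position queries / keys

    # delta[i, j] = clip(i - j, -k, k - 1) + k, an index into the 2k buckets
    idx = torch.arange(n)
    delta = (idx[:, None] - idx[None, :]).clamp(-k, k - 1) + k

    c2c = Qc @ Kc.T                          # content -> content
    c2p = (Qc @ Kr.T).gather(1, delta)       # content -> position, delta(i, j)
    p2c = (Kc @ Qr.T).gather(1, delta).T     # position -> content, delta(j, i)

    return (c2c + c2p + p2c) / (3 * d) ** 0.5


scores = disentangled_scores(
    torch.randn(6, 8), torch.randn(8, 8),
    *(torch.randn(8, 8) for _ in range(4)),
)
print(scores.shape)  # torch.Size([6, 6])
```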
ML Engineer · 5+ years in NLP & LLMs
# trained from scratch. no pre-trained weights.
Initialized with a Mechanical Engineering degree, pre-trained on curiosity, and fine-tuned by an unreasonable number of late nights — 5+ years of gradient descent through NLP, LLMs, and generative AI. Currently inference-serving at Apple. I write about ML here so you don't have to train from scratch too.
Apple — multi-agent systems & LLM research
Kaggle Notebooks Grandmaster, Competitions Master
Dynamic intent detection system
ME degree → ML through sheer will
model.fit(life, epochs=∞, lr=persistence)
Each milestone is a neuron. Experiences flow forward. Lessons backpropagate. The loss is still decreasing.
training_loss.plot()
“You don’t need a low initial loss. You need a good learning rate and the patience to keep training.”
— the philosophy that took me from Mechanical Engineering to Apple
latest

Discover how Longformer overcomes the quadratic self-attention cost of standard Transformers with an attention mechanism that scales linearly with sequence length, enabling efficient processing of long documents.
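The trick is a sliding window: each token attends only to its w nearest neighbors, so scored pairs grow as O(n·w) instead of O(n²). A hedged illustration of the pattern follows; the function name and shapes are mine, and Longformer's actual implementation uses custom banded kernels plus task-specific global attention rather than this dense mask.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(Q, K, V, w=2):
    """Toy dense-mask version of Longformer-style sliding-window attention.

    Each token attends only to neighbors within +/- w positions, so the
    number of useful score pairs grows as O(n * w) rather than O(n^2).
    (This sketch still materializes the full n x n matrix for clarity;
    the real Longformer computes only the band.)
    """
    n, d = Q.shape
    scores = Q @ K.T / d ** 0.5
    idx = torch.arange(n)
    local = (idx[:, None] - idx[None, :]).abs() <= w   # banded window mask
    scores = scores.masked_fill(~local, float('-inf'))
    return F.softmax(scores, dim=-1) @ V


x = torch.randn(10, 16)
out = sliding_window_attention(x, x, x, w=2)
print(out.shape)  # torch.Size([10, 16])
```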
Watch my conversations with people who shaped the Indian ML community: