Experience#


Microsoft R&D India#

Software Engineer II

What I worked on#

At Microsoft, I worked on internal infrastructure platforms used across 17,000+ microservices, with a focus on data integrity, reliability, and developer productivity at scale.

My primary responsibility was building and scaling a Distributed Metadata Correction Platform — a self-healing, event-driven system designed to maintain consistency across Microsoft’s service inventory during burst traffic and failure scenarios.


Distributed Metadata Correction Platform#

This platform continuously validated and corrected service metadata across Microsoft’s internal ecosystem. It was explicitly designed to handle retry storms, partial failures, and high-concurrency updates without human intervention.

Key architectural decisions & impact:

  • Event-driven architecture: Designed a 3-layer system — ingestion, compute, and persistence — using Azure Service Bus, Azure Functions, and Cosmos DB, capable of processing millions of metadata events.
  • Concurrency & data integrity: Introduced adaptive concurrency control after observing repeated retry storms in production. The system dynamically favored optimistic updates (≈95%), falling back to pessimistic locking (≈5%) for high-conflict paths using Redis, E-Tags, and Cosmos DB transactional semantics.
  • Resilience under burst traffic: Implemented traffic reshaping and backoff strategies to absorb sudden spikes without cascading failures, achieving 99.9%+ metadata consistency.
  • Observability & diagnostics: Integrated Azure Data Explorer (Kusto) to enable deep root-cause analysis across service boundaries, significantly reducing mean-time-to-diagnosis during incidents.

Developer Productivity & Platform Tooling#

Alongside core infrastructure work, I focused heavily on reducing developer toil:

  • Built internal tooling that cut environment setup time from 60+ minutes to under 20 minutes
  • Automated validation and correction workflows, reducing manual service audits by ~70%
  • Designed admin tools for enforcing data hygiene across thousands of services
  • Contributed to “test-in-production” telemetry patterns to improve observability for Windows platform teams
  • Mentored teams and ran office hours to drive adoption of reliability and telemetry best practices

Tech Stack: C#, Azure (Service Bus, Functions, Cosmos DB, Redis), Kusto, PowerApps, CI/CD, Event-driven systems


Amazon#

Software Development Engineer

What I worked on#

At Amazon, I was a core engineer on the Alexa Device Farm Infrastructure, responsible for enabling deterministic, repeatable testing across physical Alexa devices running 24/7.

The challenge was fundamentally different from pure software systems: physical devices fail unpredictably, and flaky tests were blocking releases.


Alexa Device Farm Infrastructure#

I helped design and build a platform that orchestrated, diagnosed, and tested 100+ physical devices (Echo, Fire Tablets, Fire devices) continuously.

Key contributions:

  • Device orchestration platform: Designed a multi-layer system managing device lifecycle, health checks, provisioning, and diagnostics for long-running physical test fleets.
  • Deterministic automation: Converted 80–85% of manual test cases (from sanity to E2E) into automated, deterministic test suites, significantly accelerating release cycles.
  • Reliability improvements: Introduced signature-based failure classification, reducing false positives from ~40% to ~15% and improving overall platform uptime to 98%+.
  • Operational efficiency: Automated device provisioning and Wi-Fi registration, saving 1,000+ QA hours annually and reducing daily setup overhead for engineers.
  • CI/CD integration: Integrated automated validation into pipelines to enable faster, more reliable build qualification for hardware-software integration.

Tech Stack: Java, Python, AWS (EC2, Lambda), Docker, Device Automation, CI/CD


OMG Labs#

Founding Engineer / Early Team Member

At OMG Labs, I worked in an early-stage startup environment wearing product, engineering, and execution hats.

  • Led end-to-end delivery from strategy and design to implementation and deployment
  • Helped drive a subscription-based customer acquisition model, increasing traffic by 15–20%
  • Integrated e-commerce tooling (search, email marketing), improving digital engagement by 25–35%
  • Played a key role in securing Amazon Launchpad selection, enabling international exposure
  • Coordinated across product, marketing, and engineering teams to ensure fast, agile execution