Sudhanshu Singh

DevOps / Site Reliability Engineer

@Sudhanshu069

I'm a DevOps / SRE engineer who owns production reliability end-to-end. At RegisterKaro I single-handedly migrated our platform from DigitalOcean to AWS and now run it solo for 50,000+ users: Infrastructure as Code with Terraform and Ansible, CI/CD with blue/green deploys, and full observability via Prometheus, Grafana, and Loki. A lot of the job is keeping production fast, reliable, and cheap: live diagnostics and incident response, slow-query and pipeline optimization that let us downsize the database by a full tier, and a ~20% cut in infrastructure cost. Underneath it all is a solid engineering foundation (Node.js / TypeScript services, RabbitMQ, Redis, and distributed-systems correctness), which is what lets me fix production at the code level, not just the infra level.

Skills

  • AWS
  • Terraform
  • Ansible
  • Docker
  • GitHub Actions
  • Nginx
  • Linux
  • DigitalOcean
  • Prometheus
  • Grafana
  • Loki
  • MongoDB
  • PostgreSQL
  • Redis
  • RabbitMQ
  • Node.js
  • TypeScript
  • Express.js
  • REST APIs
  • Webhooks
  • JWT/OAuth
  • Socket.IO
  • React.js

Experience

Mar 2025 - Present

RegisterKaro (Safe Ledger Pvt Ltd)

DevOps / Site Reliability Engineer

  • Single-handedly migrated production from DigitalOcean to AWS (ap-south-1) — provisioned the full stack with Terraform and Ansible (multi-AZ EC2 Auto Scaling Group, ALB, ECR, ElastiCache/Valkey, Secrets Manager, all in private subnets), ran a zero-downtime DNS cutover for 50,000+ customers, and retired the legacy stack 4 days early.
  • Drove a ~20% cut in infrastructure cost (~$11.6K/year) — relocated compute to the database's region, set up Atlas↔AWS VPC peering (eliminated ~$390/mo of NAT egress), and decommissioned DigitalOcean; authored the vendor-bill-backed cost brief prepared for the CEO.
  • Downsized the production MongoDB Atlas cluster a full tier (M50 → M40, ~$7.6K/year) — hand-analyzed slow-query logs and ran a pipeline and index optimization campaign: inverted the heaviest aggregation pipeline (23.5s → ~80ms) and dropped ~244 redundant indexes across 16 collections, reclaiming ~43% of slow-query time and ~2 TB/month of disk reads.
  • Cut ~8,000 redundant MongoDB ops/min (peak ~17K fleet-wide): built live production diagnostics (event-loop lag and per-endpoint counters across both PM2 workers), cached the auth, notification, and dashboard hot paths in Redis (auth: 6–7 DB ops/request → ~0), and moved per-request writes to cron, eliminating multi-second event-loop freezes (~46s → <0.2s).
  • Hardened reliability and security: set up multi-AZ Auto Scaling, decoupled API health checks from RabbitMQ, enforced TLS and AUTH on ElastiCache, and replaced the database's 0.0.0.0/0 access with VPC peering and an explicit allowlist; validated a ~5-minute recovery objective in a restore drill.
  • Fixed a long-standing webhook race condition under concurrent RabbitMQ ingestion (atomic findOneAndUpdate and prefetch tuning) and added a Redis distributed lock for idempotent deduplication that degrades gracefully when Redis is unavailable.
  • Built GitHub Actions CI/CD across the 4-service stack with blue/green deploys (ALB target-group swap) plus rolling ASG/SSM deploys and automated rollback, and stood up the Grafana + Loki + Promtail observability stack for incident response.
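
For illustration, the webhook deduplication pattern mentioned above (a distributed lock that degrades gracefully when the lock store is down) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the production code: `LockStore`, `handleWebhookOnce`, the `webhook:` key prefix, and the 60-second TTL are all hypothetical, and the in-memory store stands in for a Redis client issuing `SET key value NX PX ttl`.

```typescript
// Hypothetical lock interface; a real implementation would wrap a Redis client
// and issue SET key value NX PX ttl.
interface LockStore {
  // Resolves true if the key was newly set, false if it already existed.
  setIfAbsent(key: string, ttlMs: number): Promise<boolean>;
}

type Outcome = "processed" | "duplicate" | "processed-without-lock";

async function handleWebhookOnce(
  store: LockStore,
  eventId: string,
  handler: () => Promise<void>,
  ttlMs: number = 60000, // assumed TTL, chosen only for the example
): Promise<Outcome> {
  let acquired: boolean | "unavailable";
  try {
    acquired = await store.setIfAbsent(`webhook:${eventId}`, ttlMs);
  } catch (_err) {
    acquired = "unavailable";
  }

  if (acquired === "unavailable") {
    // Degrade gracefully: still process the event. This assumes the handler
    // is itself idempotent (for example, an atomic upsert in the database).
    await handler();
    return "processed-without-lock";
  }
  if (!acquired) {
    return "duplicate"; // another delivery already claimed this event
  }
  await handler();
  return "processed";
}

// In-memory stand-in for Redis, for local experimentation only.
class InMemoryLockStore implements LockStore {
  private expiries = new Map<string, number>();

  async setIfAbsent(key: string, ttlMs: number): Promise<boolean> {
    const now = Date.now();
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > now) return false;
    this.expiries.set(key, now + ttlMs);
    return true;
  }
}

// Simulates the lock store being unreachable.
class UnavailableLockStore implements LockStore {
  async setIfAbsent(): Promise<boolean> {
    throw new Error("lock store unavailable");
  }
}
```

With `InMemoryLockStore`, a second call for the same event ID returns `"duplicate"` without running the handler; with `UnavailableLockStore`, the handler still runs and the outcome is `"processed-without-lock"`. The sketch does not release the key if the handler throws, so a real implementation would need to decide how retries interact with the TTL.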

May 2024 - Jan 2025

Segwitz

Software Engineer (Frontend)

  • Built performant React dashboards using code splitting and lazy loading, collaborating with the design team to reduce initial load time by 10%.
  • Implemented OAuth 2.0 authentication with token refresh, protected routes, and secure session handling for SPA dashboards, coordinating with the backend team on API contracts.
  • Contributed to frontend architecture decisions and component library conventions, conducting code reviews on shared UI modules.
