Site Reliability Engineer (SRE) - 2

Bengaluru, Karnataka, India | SRE | Full-time


About MoEngage

MoEngage is an intelligent customer engagement platform, built for customer-obsessed marketers and product owners. We enable hyper-personalization at scale across multiple channels such as mobile push, email, in-app, web push, on-site messages, and SMS. With AI-powered automation and optimization, brands can analyze audience behavior and engage consumers with personalized communication at every touchpoint across their lifecycle.

Fortune 500 brands and enterprises across 35 countries, such as Deutsche Telekom, Samsung, Ally Financial, Vodafone, and McAfee, along with internet-first brands such as Flipkart, Ola, OYO, and Bigbasket, use MoEngage to orchestrate their cross-channel campaigns and engage efficiently with their customers, sending 50 billion messages to 500 million consumers every month!

Our vision is to build the world’s most trusted customer engagement platform for the mobile-first world.

We promise to care about your customers as much as you do, and that commitment is reflected in our top ratings for service and support in the Gartner Magic Quadrant, Gartner Peer Insights, and G2 Summer Reports. We have also been recognized as one of the 25 Highest Rated Private Cloud Computing Companies To Work For in a list released by Battery Ventures, a global investment firm. The list was based on employee feedback on Glassdoor, where our employees reported the highest levels of satisfaction at work during the first six months of the pandemic.

As part of the Engineering team at MoEngage, here are some things you can expect:

  • Take ownership and be responsible for what you build, with no micromanagement
  • Work with A players (some of the best talent in the country), and accelerate your learning and career growth
  • Make in India and build for the world at a scale of 500M active users, which no other internet company in the country has seen
  • Learn from different teams how they scale to millions of users and billions of messages
  • Explore the latest in topics such as data pipelines, MongoDB, Elasticsearch, Kafka, Spark, and Samza, and share what you learn with the team

Here are some of the challenging areas you can expect to work on as part of the SRE team:

  1. Work with one of the largest Elasticsearch cluster deployments
  2. Work on a large-scale MongoDB installation
  3. Work on a large-scale Kafka cluster
  4. Maintain services once they are live by measuring and monitoring availability, latency, and overall system reliability.
  5. Work closely with team members to ensure best practices and strategic goals are incorporated into development work.
  6. Collaborate with other engineering teams to identify and anticipate changing requirements and opportunities to improve the development environment. Define and iterate on team processes and collaboration, and focus on overall team velocity with different stakeholders, including product, design, etc.
  7. Monitor at scale with Prometheus and similar tools
  8. Dockerize services and orchestrate them with Kubernetes (K8s) and similar tools
  9. Implement best practices, challenge the status quo, and keep tabs on industry and technical trends, changes, and developments to ensure the team is always striving for best-in-class work.
  10. Manage capacity, build security into every layer, and reduce cost
  11. Implement secure networking, key management, user management, access management, process management, and image management.
  12. Effectively lead and manage team deliverables (short/long term) through project planning and coaching, quarterly reviews, participation in the selection process for new hires, and technical and non-technical guidance to the team.

Skill Requirements:

  • Proven experience in handling large infrastructure and distributed systems such as Kafka, YARN, Elasticsearch, etc.
  • Tech stack: Python, Falcon, Elasticsearch, MongoDB, AWS (SQS, S3), Linux, MapReduce.
  • Familiarity with Python-related technologies and frameworks such as Django or Pyramid.
  • Experience with Unix/Linux operating system internals and administration (e.g., filesystems, inodes, system calls, etc.) or networking (e.g., TCP/IP, routing, network topologies and hardware, SDN, etc.)
  • Familiarity with at least one cloud computing platform: GCP, Azure, or AWS
  • Familiarity with task queue frameworks such as Celery or Pika is a plus.
  • Experience with source code management and implementation of security best practices.
  • Familiarity with at least one container orchestration tool (K8s, ECS, Swarm) and with build, artifact, packaging, and service discovery management tools.
  • Know-how of gathering metrics across distributed systems (instances/containers) and generating automated notifications and reports.
  • Prowess in analysing app bottlenecks and performance degradation, and in implementing automated processes/tools to detect such anomalies.
  • Good understanding of, and implementation experience with, 12-factor app principles.

Mandatory Skills:

  • 3+ years of experience on the AWS platform.

  • Proficiency in Python or shell scripting.

  • An "automate anything" mindset.

  • Experience with AWS cost explorer, billing analysis and various cost optimization techniques.

Good to have:

  • AWS Certified Solutions Architect certification (preferred)

  • Certified Kubernetes Administrator (CKA) certification

  • Certified Kubernetes Application Developer (CKAD) certification

  • Experience with configuration management tools and strong code analysis skills in Python

At MoEngage, we are passionate about our team and technology. See below to learn more about us and our technology.

Tech @MoEngage | Scale @MoEngage | Life @MoEngage

We handle more than a billion messages every day. Rest assured, you will be surrounded by smart and passionate people as we continue to scale and build a world-class technology team.