Site Reliability Engineer at Vodacom Tanzania

Job Role Insights

  • Date posted

    2026-02-03

  • Closing date

    2026-02-16

  • Hiring location

    Dar es Salaam

  • Career level

    Middle

  • Qualification

    Bachelor's Degree

  • Experience

    2 Years

  • Quantity

    1 person

  • Gender

    Both

  • Job ID

    127501

Job Description

At Vodafone, we’re not just shaping the future of connectivity for our customers – we’re shaping the future for everyone who joins our team. When you work with us, you’re part of a global mission to connect people, solve complex challenges, and create a sustainable and more inclusive world. If you want to grow your career whilst finding the perfect balance between work and life, Vodafone offers the opportunities to help you belong and make a real impact.

What You’ll Do

The Site Reliability Engineer ensures the scalability, performance, and reliability of big data platforms (Hadoop, Spark, Flink, Kafka, etc.) by bridging software engineering and operations. The role focuses on automation, monitoring, incident management, fault tolerance, and disaster recovery to maintain high availability across data clusters. Additionally, it involves proactively resolving bottlenecks, enforcing SLAs, optimizing resources, and securing data pipelines, enabling efficient, continuous, and reliable delivery of analytics and data-driven services at scale.

Key Accountabilities And Decision Ownership

  • Platform Reliability and Performance: Ensure high availability, scalability, and optimal performance of big data platforms (e.g., Hadoop, Spark, Kafka, HDFS, Iceberg) through proactive monitoring, tuning, and capacity management
  • Automation and Infrastructure as Code: Design and implement automated deployment, configuration, and recovery processes using tools like Ansible, Terraform, or Kubernetes to improve operational efficiency and reduce human error
  • Incident Management and Root Cause Analysis: Lead incident response for critical big data systems, perform detailed post-incident reviews, and implement corrective actions to prevent recurrence
  • Observability and Monitoring: Develop and maintain comprehensive observability frameworks (metrics, logging, alerting) using tools such as Prometheus, Grafana, or ELK to ensure early detection of anomalies and service degradations
  • Security, Compliance, and Change Governance: Enforce data platform security controls, manage configuration changes, and ensure compliance with organizational and regulatory standards for data protection and access

Who You Are

Core competencies, knowledge, and experience

  • Technical Expertise: Strong hands-on experience with big data ecosystems (Hadoop, Spark, Kafka, HDFS, Hive, Flink, Iceberg, Trino) and distributed systems performance tuning, troubleshooting, and optimization
  • Reliability Engineering: Proficient in applying SRE principles—automation, monitoring, incident response, fault tolerance, and resilience engineering—to maintain system uptime and reliability
  • DevOps & Automation: Skilled in CI/CD pipelines, Infrastructure as Code (e.g. Terraform, Ansible), and container orchestration platforms such as Kubernetes for scalable data workloads
  • Monitoring & Observability: Deep understanding of metrics collection, alerting, and visualization tools (Prometheus, Grafana) to ensure proactive system health management
  • Security & Governance: Knowledge of authentication and authorization frameworks (Kerberos, Ranger, OAuth2), encryption standards, and compliance best practices for big data environments
  • Collaboration & Communication: Strong problem-solving, documentation, and cross-functional communication skills, enabling effective collaboration with data engineers, platform teams, security teams, and other key stakeholders

Must-have technical/professional qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, Computer Engineering, or equivalent
  • 2+ years of experience in Site Reliability Engineering, DevOps, or Big Data Platform Engineering within large-scale, distributed environments
  • Proven experience managing and optimizing Hadoop ecosystem components (HDFS, Hive, Spark, Flink, Iceberg, Trino, etc.)
  • Hands-on experience with Linux systems administration, network troubleshooting, and performance optimization in production clusters

Not a perfect fit?

Worried that you don’t meet all the desired criteria exactly? At Vodafone we are passionate about empowering people and creating a workplace where everyone can thrive, whatever their personal or professional background. If you’re excited about this role but your experience doesn’t align exactly with every part of the job description, we encourage you to still apply as you may be the right candidate for this role or another opportunity.

What's In It For You

Who we are

We are a leading international Telco, serving millions of customers. At Vodafone, we believe that connectivity is a force for good. If we use it for the things that really matter, it can improve people's lives and the world around us. Through our technology we empower people, connecting everyone regardless of who they are or where they live and we protect the planet, whilst helping our customers do the same.

Belonging at Vodafone isn't a concept; it's lived, breathed, and cultivated through everything we do. You'll be part of a global and diverse community, with many different minds, abilities, backgrounds and cultures. We're committed to increasing diversity, ensuring equal representation, and making Vodafone a place everyone feels safe, valued and included.

How to Apply

Apply now