Site Reliability Engineer

Description:

As a Site Reliability Engineer 1, you will work on the reliability, uptime, and overall performance of services and infrastructure.

You will collaborate with engineering and operations teams to design, deploy, and support highly reliable systems. This role requires a blend of software engineering skills, an understanding of cloud platforms, and a commitment to continuous improvement and automation in the systems you manage.

Key Responsibilities:

● System Monitoring & Performance: Monitor, troubleshoot, and resolve issues related to production systems, including availability, latency, and error rates.

● Incident Response: Respond to incidents and outages, investigate root causes, and implement solutions to prevent recurrence.

● Automation & Efficiency: Automate repetitive tasks, manage system configurations, and contribute to the development of monitoring and alerting systems.

● Collaboration: Work closely with software engineers to build scalable, reliable, and efficient systems. Participate in the design and implementation of new features with a focus on reliability.

● Capacity Planning: Assist in planning and scaling infrastructure to meet business requirements by analyzing capacity trends and ensuring sufficient resources.

● Documentation & Knowledge Sharing: Document system configurations, runbooks, and incident post-mortems. Share knowledge within the team to improve operational efficiency.

● Infrastructure as Code: Utilize tools like Terraform, Ansible, or Kubernetes for infrastructure provisioning and management.

● On-Call Support: Participate in the on-call rotation to provide support and troubleshooting assistance for production systems.

Requirements:

● Educational Background: Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent work experience).

● Technical Skills:

Basic knowledge of Linux/Unix systems and command-line tools.
Understanding of cloud platforms (AWS, GCP, Azure, OpenShift, etc.).
Familiarity with containerization technologies (Docker, Kubernetes).
Experience with monitoring and alerting tools (Prometheus, Grafana, Nagios, etc.).
Basic programming or scripting experience (Python, Bash, Go, etc.).
Knowledge of version control systems (e.g., Git).

● Problem-Solving Skills: Ability to troubleshoot and resolve technical issues in production environments.

● Communication Skills: Clear and concise written and verbal communication for working with cross-functional teams and documenting processes.

● Learning Attitude: Eagerness to learn and grow in the field of Site Reliability Engineering, with the ability to adapt to new technologies and tools.

Preferred (but not required):

Experience with automation tools (Ansible, Terraform, Chef, etc.).
Familiarity with CI/CD pipelines and tools (Jenkins, GitLab, CircleCI).
Experience in managing databases or distributed systems.
Knowledge of security best practices in the context of cloud and infrastructure management.

Organization	TechnoGenics SMC PVT LTD
Industry	Engineering Jobs
Occupational Category	Site Reliability Engineer
Job Location	Lahore, Pakistan
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Intermediate
Experience	2 Years
Posted at	2026-05-06 6:08 pm
Expires on	2026-07-25