We are looking for a Site Reliability Engineer (SRE). The ideal candidate is a self-motivated go-getter and out-of-the-box thinker who is ready to work in a high-energy start-up environment. They must demonstrate a high level of ownership, integrity, and leadership, and be flexible and adaptive with a strong desire to learn and excel.
- Advanced Degree in Computer Science or relevant engineering discipline
- Practical experience with containerization (Docker and Kubernetes), cloud technologies, tools (Jenkins, CodeDeploy), and practices (CI/CD patterns, automated provisioning and release, GitOps, IaC)
- Hands-on experience deploying and managing highly available, scalable, and resilient cloud applications on AWS or Azure
- Expertise in infrastructure automation tools like Terraform, Ansible, or CloudFormation
- Strong knowledge of at least one scripting language, preferably Python or Golang
- Strong experience with monitoring solutions like Prometheus, Grafana, Kibana, and ELK
- Outstanding interpersonal skills and the ability to innovate, inspire, and collaborate with cross-group and cross-functional teams with a high degree of independence and success
- Excellent written and verbal communication
Roles and Responsibilities:
- Support and help manage the entire AWS infrastructure for all production sites to achieve world-class uptime and resiliency.
- Help build, scale, and secure application cloud infrastructure using tools like Terraform, Kubernetes, and Docker.
- Build and maintain robust CI/CD pipelines with CodeDeploy and Bitbucket Pipelines.
- Advocate and implement industry best practices for configuration management and build/deployment automation.
- Work closely with developers, including during the deployment and testing phases, to provide insight into operational, security, and performance considerations.
- Participate in an on-call rotation to triage and analyze abnormalities in system operation, leveraging instrumentation like ELK.
- Debug complex problems across the whole stack with perseverance.
- Create tooling that works across cloud providers like AWS and Azure.
- Help optimize and define engineering processes.
Good to have:
- Passion for writing great, simple, clean, and efficient code.
- Should be a fast learner and have excellent problem-solving capabilities.
- Should have excellent written and verbal communication skills.
- Experience in working with large-scale distributed systems is a plus.
- Should be able to independently design and build components for the automation platform.
- Should assist in maintaining the tools and troubleshooting issues.
Why should you join Opcito?
We are a dynamic start-up that believes in designing transformation solutions for our customers with our ability to unify quality, reliability, and cost-effectiveness at any scale. Our core work culture focuses on adding material value to client products by leveraging best practices in DevOps like continuous integration, continuous delivery, and automation, coupled with disruptive technologies like cloud, containers, serverless computing, and microservice-based architectures.
Here are some of the perks of working with Opcito:
- Outstanding career development and learning opportunities.
- Competitive compensation depending on experience and skill.
- Friendly team and enjoyable work environment.
- Flexible working schedule.
- Corporate and social events.