isgSearch – Sr. Reliability Engineer – Toronto, ON

Company: isgSearch

Location: Toronto, ON

Expected salary:

Job date: Sun, 08 Dec 2024 02:25:48 GMT

Job description: Sr. Reliability Engineer – Fully Remote in Canada

Requirements:

  • Expertise in incident management and resolving live production issues.
  • Strong troubleshooting skills, with a focus on performance optimization for large-scale applications.
  • Proven experience in developing and maintaining reliable monitoring and alerting systems in high-demand environments.
  • 7+ years of experience with the .NET Framework (C#), ensuring production stability.
  • Proficiency in Kubernetes, Docker, and cloud platforms (GCP preferred).
  • Experience with monitoring tools like Prometheus, Grafana, and Kibana.
  • Familiarity with incident management tools such as FreshDesk and Confluence.
  • Strong critical thinking and problem-solving abilities.
  • Solid project management skills with a focus on scalability and system reliability.

VMWare

Our client is a leading fintech company with a strong presence across Canada, driving innovation in financial services.

Responsibilities:

  • Operational Support: Provide live support for client applications, monitoring services to detect critical failures, and ensuring fast recovery with minimal downtime.
  • Incident Resolution: Lead the response to production issues, ensuring resolution within SLA and SLO timelines. Conduct root cause analysis and implement permanent solutions.
  • Monitoring & Reporting: Enhance monitoring systems and alerting mechanisms to proactively detect issues. Prepare data-driven reports to present findings clearly.
  • System Stability & Scalability: Offer expert guidance on improving system stability and scalability across production environments.
  • Process Automation: Drive initiatives to automate operational processes, improving efficiency across LiveOps.
  • Postmortem & Continuous Improvement: Lead postmortem meetings, documenting findings and action items for future prevention.
  • Cross-Functional Collaboration: Work with engineering teams to quickly resolve issues and implement long-term fixes.
  • Team Leadership & Mentorship: Guide and mentor junior reliability engineers, ensuring high standards are maintained.
  • On-Call Support: Participate in after-hours on-call rotation for production support.
