• Collaborate with the engineering team on projects as the expert on reliability, performance, and efficiency.
• Automate away the process of managing capacity, safely deploying software, and mitigating system failures.
• Proactively identify and develop areas to improve monitoring/alerting, reliability, performance, and automation
• Root-cause and troubleshoot issues in a fast-paced environment, and implement solutions to prevent them from happening again
• Participate in a 24x7x365 on-call rotation to respond to alerts or outages
• Work closely with engineering teams to support them
• Look for areas to improve: remove bottlenecks, eliminate waste, improve performance, and reduce costs.
• Previous experience in an SRE or related role: DevOps, platform engineering, software engineering
• CS Degree (or related field) and/or a demonstrable, solid understanding of CS fundamentals.
• Proficient coder: strong with at least one programming language (Golang a plus).
• Deep understanding of Linux system internals / OS fundamentals.
• Experience with distributed / highly available systems architecture, theory and practice.
• Understanding of container and orchestration tools like Docker, Kubernetes, Mesos, etc.
• Experience with an infrastructure-as-code tool (Terraform, CloudFormation, etc.); Terraform preferred
• Previous experience building and maintaining production systems in the cloud (AWS preferred)
• Knowledge of security best practices for operating in the cloud
• Experience with configuration management tools like Chef, Puppet, or Ansible
• Working knowledge of networking and common internet protocols (HTTP, SSL, DNS, TCP/IP)
• Previous experience working on production, user-facing internet applications at scale
For more information on this job visit: https://vc5consulting.com/
VC5 Consulting has been named by business journals as one of the best places to work.
We offer benefits such as weekly pay, health insurance, 401k and even profit sharing to our consultants.