Site Reliability Engineer, Incident Response


Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other Chainlink production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments.

The experience of our team feeds back into other engineering groups within the company, as well as to Chainlink Node Operators.

Your Impact

  • Code infrastructure automation
  • Improve our Prometheus monitoring and build new metrics
  • Improve integrations with Slack, PagerDuty, and other tools
  • Join the PagerDuty rotation to respond to Chainlink's availability incidents and support service engineers with customer incidents
  • Use your on-call shifts to prevent incidents from ever happening
  • Make monitoring and alerting trigger on symptoms rather than on outages
  • Document every action so your findings turn into repeatable procedures, and then into automation
  • Debug production issues across services and levels of the stack

Requirements

  • Think about systems – edge cases, failure modes, behaviors, specific implementations
  • Know your way around PromQL (Prometheus Query Language), or are interested in learning it
  • Have strong programming/scripting skills in JavaScript, Python, Go, etc.
  • Are interested in collaborating and communicating asynchronously
  • Have an urge to document all the things so you don't need to learn the same thing twice
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
  • Have a mindset of delivering quickly and iterating

Our Stack

  • Golang, TypeScript, Solidity, Postgres, Terraform, AWS

Our Principles

At Chainlink Labs, we’re committed to the key operating principles of ownership, focus, and open dialogue. We practice complete ownership, where everyone goes the extra mile to own outcomes and drive them to success. We understand that unflinching focus is a superpower and is how we channel our activity into technological achievements for the benefit of our entire ecosystem. We embrace open dialogue and critical feedback to arrive at an accurate and truthful picture of reality that promotes both personal and organizational growth.

About Chainlink Labs

Chainlink is the industry standard oracle network for connecting smart contracts to the real world. With Chainlink, developers can build hybrid smart contracts that combine on-chain code with an extensive collection of secure off-chain services powered by Decentralized Oracle Networks. Managed by a global, decentralized community of hundreds of thousands of people, Chainlink is introducing a fairer model for contracts. Its network currently secures billions of dollars in value for smart contracts across the decentralized finance (DeFi), insurance, and gaming ecosystems, among others. The full vision of the Chainlink Network can be found in the Chainlink 2.0 whitepaper. Chainlink is trusted by hundreds of organizations—from global enterprises to projects at the forefront of the blockchain economy—to deliver definitive truth via secure, reliable data.

This role is location-agnostic and can be based anywhere in the world, but we ask that your working hours overlap for part of the day with Eastern Standard Time (EST).

We are a fully distributed team and have the tools and benefits to support you in your remote work environment.

Chainlink Labs is an Equal Opportunity Employer.