Senior SRE

Job description

About us

P2P.org is the largest institutional staking provider with a TVL of over $10 and a market share exceeding 20% in restaking.

We are continually focused on researching and improving our infrastructure to extract maximum APR while enhancing security. For instance, in ETH and SOL, our NRR is on average 10% higher than the market, and in DOT, it's 20% higher.

We also devote significant focus and resources to launching new networks such as TON, Avail, Monad, Babylon, Story, Berachain, and others, along with yield products that range from restaking, where we are the largest operator with a 20+% market share, to yield aggregators on stablecoins.

Our clients include BitGo, Copper, Crypto.com, Ledger, ByBit, Bitget, OKX, HTX, Bitvavo, SBI, and others, who choose us for our client-centric approach and extensive product line, from a unified API to widgets and custom dApps.

We are also actively expanding our product line, exploring RWA, data, yield, and service products for banks, exchanges, custodians, and wallets.

P2P.org unites talented individuals globally.

Despite our distributed team, we share a passion for decentralized finance - a fairer system for all. We code, learn, create, and connect to shape finance's future.

P2P.org boasts a strong reputation and network. We prioritize customer satisfaction and, as tech enthusiasts, develop innovative solutions that bolster our brand.

Who we are looking for

We’re looking for an experienced Site Reliability Engineer who excels in building scalable, secure, and automated infrastructure. You’ll collaborate with multiple engineering teams to solve complex reliability challenges and drive continuous improvement through automation, observability, and data-driven insights.

You will

Ensure high reliability and scalability of multi-regional infrastructure and shared platforms
Promote a DevOps enablement culture: support teams in working with CI/CD pipelines, observability systems, and secret management tools
Design, maintain, and evolve automation for deployment, monitoring, and incident response
Advance the technology stack through automation and innovation, using data-driven insights to improve performance, security, and cost-efficiency and to eliminate repetitive manual tasks
Proactively identify and mitigate system anomalies before they impact users or SLAs
Maintain clear documentation and create tooling to improve reliability and operational transparency
Collaborate cross-functionally with developers, data engineers, and platform teams to ensure smooth operations and fast incident recovery

You have

At least 4 years of experience as a Site Reliability Engineer
Kubernetes (advanced): 2+ years of deep hands-on experience managing production clusters
Mindset: Highly proactive, collaborative, and eager to help others succeed
CI/CD & GitOps: Proficiency with ArgoCD and GitHub Actions; strong understanding of automated delivery pipelines
Observability: Proven experience operating and troubleshooting the VictoriaMetrics (Prometheus), Loki, and OpenTelemetry stack
Security Mindset: Expert in security hardening and least-privilege principles; hands-on with HashiCorp Vault (cluster management, Secrets Operator, Vault Injector)
Programming / Scripting: Skilled in Shell and at least one of Python or Golang
English: B2 or higher

What we offer

Remote working in an international distributed team
Full-time Contractor (Indefinite-term Consultancy Agreement)
Competitive salary in USD (we can also pay in crypto)
Well-being program
Mental health care program
Compensation for education, including foreign language study programs & professional growth courses
Equipment & co-working reimbursement program
Overseas conferences, community immersion
Positive and friendly communication culture