Middle Site Reliability Engineer (SRE)

Tbilisi

Job description

Social Discovery Group (SDG) is the 3rd largest social discovery company in the world, uniting 60+ brands with 500 million users. We solve the problems of loneliness, isolation, and disconnection by transforming virtual intimacy into the new normal. Our portfolio includes online communication platforms focusing on AI, game mechanics, and video streaming - Dating.com, DateMyAge, Cupid Media, Dil Mil, Kiseki, and others.

SDG invests in IT startups around the world. Our investments include OpenAI, Patreon, Flo, Clubhouse, Woebot, Flure, Astry, Coursera, Academia.edu, and many others.

We bring together a team of like-minded people and IT professionals specializing in the creation and development of globally impactful social discovery products. Our international team of 1200 professionals and digital nomads works all over the world.

Our teams of digital nomads work remotely from Cyprus, Malta, the USA, Armenia, Georgia, Kazakhstan, Montenegro, Poland, Latvia, Serbia, Spain, Portugal, UAE, Israel, Turkey, Thailand, Indonesia, Japan, Hong Kong, Australia and many other locations.

In August 2024, we achieved Great Place to Work US Certification™! This achievement reflects our core belief that a truly exceptional workplace is built on trust, pride, and camaraderie—not just great perks.

We are looking for a Middle Site Reliability Engineer (SRE).

Your main tasks will be:

Deploy and manage updates to the company’s systems in a safe and controlled manner;
Manage and maintain infrastructure environments using infrastructure-as-code tools (Ansible, Terraform);
Support and improve containerised environments with Podman, Docker, and Kubernetes;
Develop and maintain automation scripts using Shell (Bash, PowerShell) and programming languages (Python or Go);
Manage source control systems and workflows (Git);
Monitor and maintain message queues, particularly RabbitMQ (AMQP);
Collaborate with DevOps, development, and QA teams to ensure smooth CI/CD workflows and system reliability;
Troubleshoot system and application issues, and proactively improve system performance;
Administer Cloudflare and Akamai platforms, including DNS management, WAF rules, and caching policies.

We expect from you:

Experience with source control systems such as Git;
Proficiency in shell scripting (Bash and/or PowerShell);
Practical experience in at least one programming language (Python or Go preferred);
Hands-on experience with infrastructure-as-code tools (e.g. Ansible, Terraform);
Experience with containerization (Docker, Podman) and container orchestration (Kubernetes);
Knowledge of messaging systems using AMQP (RabbitMQ in particular);
Familiarity with Linux and/or Windows server environments;
Experience administering Cloudflare and Akamai platforms, including DNS management, WAF rules, and caching policies.

Nice to have:

Knowledge of CI/CD pipelines and tools (e.g. GitLab CI);
Cloud platform experience (AWS, GCP);
Experience with monitoring tools (e.g. Zabbix, Prometheus, Grafana, VictoriaMetrics);
Skills in building and analysing service-to-service interaction maps (service-to-service traffic visualization) using tools like Kiali, Prometheus/Grafana, Jaeger, or other observability platforms.

What do we offer:

REMOTE OPPORTUNITY to work full-time;
28 calendar days of vacation per year;
7 wellness days per year (time off) that can be used to deal with household issues or to rest and recover without taking sick leave;
Bonuses of up to $5,000 for recommending successful applicants for positions in the company;
50% payment for professional training, international conferences and meetings;
Corporate discount for English lessons;
Health benefits. If you are not eligible for corporate medical insurance, the company will compensate you up to $1,000 gross per employee per year, according to the paychecks. This can be spent on purchasing health insurance yourself or on doctors’ fees for yourself and close relatives (spouse, children);
Workplace organisation. The company provides all employees with an equipped workspace and all necessary equipment (desk, chair, Wi-Fi, etc.) in our offices or co-working locations. In other locations, the company reimburses workplace costs up to $1,000 gross once every 3 years, according to the paychecks. This money can be spent on renting a co-working space or on equipping a home workplace (desk, chair, Internet, etc.) during those 3 years;
Internal gamified gratitude system: receive bonuses from colleagues and exchange them for our merchandise, team building activities, massage certificates, etc.

Sounds good? Join us now!
