MLOps Engineer

Impact Tech Ltd.

Abroad (Cyprus) or Remote

03.07.2021.

SQL Apache NoSQL ElasticSearch Python Docker Jenkins PostgreSQL Jira Confluence DevOps Golang Cloud MongoDB Agile Kafka Kubernetes intermediate

Impact Tech Ltd. is looking for an MLOps Engineer to take a lead role in the DevOps environment for our AI team, bringing experience, best practices, and a collaborative attitude to help drive DevOps initiatives. The responsibilities include managing and building automation processes as well as contributing to the development of internal tools to achieve operational efficiency.

Key Responsibilities

Maintain AI infrastructure clusters
Maintain model training infrastructure (GPU clusters)
Deploy and maintain Kubeflow infrastructure
Design and implement an alerting system for model quality and AI service availability
Deploy and maintain hyperparameter tuning infrastructure
Prioritize requests from the AI team fairly while demonstrating empathy
Maintain and enhance our CI/CD pipelines for AI
Collaborate with the data engineering team to support production-grade AI systems
Develop automation flows that speed up delivery and replace manual operating procedures wherever they exist, enabling self-service operations
Drive analysis, design, and development of automation tools for deployment, development, and operational tasks
Deploy and manage monitoring/observability infrastructure for staging and production
Collaborate with the DevOps team to enhance common infrastructure
Make sure new environments meet requirements and conform to best practices

Required Qualifications

2+ years of hands-on experience in DevOps/Cloud engineering
Good knowledge of Python or Golang
Experience with Kubernetes deployment patterns and tools such as Helm, Kustomize and Operators
Experience with DevOps toolchains including Jenkins, Docker, SonarQube, and GitHub
Experience with observability tools such as Elasticsearch, Kibana, Grafana, Prometheus, Jaeger, etc.
Experience with SQL and NoSQL databases such as PostgreSQL and MongoDB
Experience with event streaming tools (e.g. Apache Kafka) and architecture patterns
Exposure to Agile environments (use of Jira/Confluence, sprints, etc.)
Good understanding of the machine learning project life cycle
Great communication skills and a team-player mentality

Desirable

Experience with production-grade machine learning systems
Advanced knowledge of Fairing frameworks or Kubeflow
Experience developing custom Kubernetes operators
Experience with AutoML infrastructure
Infrastructure as Code experience (Terraform, CloudFormation, etc.)
Experience with the Azure public cloud is a plus
Understanding of network engineering and security principles (e.g. protocols, routing, switching, filtering, firewall rules, etc.)