GenAI Ops Engineer

  • Location:

    Washington

  • Job type:

    Temporary

  • Salary:

    Negotiable

  • Contact:

    Alyssa Hank

  • Contact email:

    a.hank@ioassociates.com

  • Job ref:

    BBBH146622_1726532982

  • Consultant:

    Alyssa Hank

Location: Remote
Contract Duration: 3 months (with possibility of extension)


Job Overview:
We are seeking a skilled GenAI Ops Engineer to join a platform support project expected to run for 3+ months. The ideal candidate will play a critical role in keeping AI/ML models and APIs running smoothly, providing both user-level and project-level support. You will work in an agile environment alongside a small, collaborative team to maintain and optimize platform operations on AWS SageMaker and Kubernetes.

Key Responsibilities:

  • User-Level Support:
    • Provide user support, including troubleshooting access issues, responding to user inquiries, and offering education and documentation to ensure effective usage of GenAI tools and platforms.
  • Project-Level Support:
    • Handle new requests and escalations related to GenAI models and APIs.
    • Provide hands-on maintenance of deployed AI/ML models and ensure the platform is functioning optimally.
  • Platform Maintenance and Engineering:
    • Oversee infrastructure, particularly AWS SageMaker, to ensure model deployments are efficient and reliable.
    • Collaborate with platform engineering teams to support SageMaker Inference and the Kubernetes service layer, and troubleshoot any issues that arise.
  • Model and API Management:
    • Maintain and optimize the API layer, ensuring fast and reliable access to deployed models.
    • Work with TensorRT, Text Generation Inference (TGI), and similar frameworks to manage inference for large language models (LLMs).

Required Skills and Experience:

  • AWS SageMaker Inference:
    Experience in deploying and managing AI models on AWS SageMaker or a similar ML platform.
  • Kubernetes Service Layer:
    Hands-on experience with Kubernetes, particularly in managing service layers implemented in Golang.
  • TGI or Other LLM Frameworks:
    Exposure to TensorRT, TGI, or other LLM inference frameworks is essential, especially for troubleshooting and optimizing model performance.
  • Golang:
    Experience with Golang is a plus, particularly if you've worked with proxies or backend services in Golang.

Nice to Have:

  • Experience working in an agile team environment.
  • Experience troubleshooting issues at both the platform and application levels.