Silverbee is a fast-moving, seed-stage startup backed by Tier 1 Silicon Valley VCs. After a successful funding round, we are growing our core engineering team to build a production-grade agentic AI platform. We are based in Tel Aviv, and our mission is to become the service delivery backbone for the next generation of marketing departments.
This is a senior individual contributor role with full ownership over production AI behavior, not a research or experimentation-only position.
Own the end-to-end behavior of a production agentic AI system built with LangGraph, from prompt and architecture design through evaluation, deployment, and production debugging
Design, implement, and evolve agent architectures (planning, execution, supervision, routing, tool usage) with a clear focus on quality, determinism, latency, and cost
Take full ownership of system prompt design and tuning, treating prompts as first-class production artifacts that are versioned, evaluated, tested, and deployed as part of the SDLC
Continuously improve agent performance by running controlled experiments with models, prompting strategies, memory, and tooling, always tied to measurable impact
Design, run, and evolve evaluation loops (offline evals, regression testing, and production signals) that directly gate releases and drive architectural decisions
Turn insights from evaluations, traces, and real production usage into concrete improvements in prompts, agents, and system design
Raise the bar on what “production-ready AI” means at Silverbee: reliability, observability, repeatability, and maintainability
Mentor other engineers and help establish best practices for building, evaluating, and operating production-grade AI systems
Strong hands-on experience owning LLM-powered systems that run in production, including diagnosing and fixing real-world failures
Practical experience designing and operating AI evaluation frameworks, including offline evals, regression testing, and qualitative and quantitative metrics
Deep understanding of prompt engineering as an engineering discipline: structure, constraints, iteration, debugging, and performance tuning
Experience with agent frameworks such as LangChain / LangGraph, or equivalent agentic architectures
Solid software engineering skills (Python and/or TypeScript), including system design, testing, and code quality
Strong product sense, with the ability to reason clearly about trade-offs between correctness, speed, cost, and flexibility
Proven ability to operate in ambiguous problem spaces and be accountable for outcomes, not just experimentation
Clear communication skills and the ability to work effectively with product managers, domain experts, and other non-AI stakeholders
This vacancy is written in English because we are looking for a candidate with English proficiency at the B1-B2 level.
If you would like to join our project, please apply for the vacancy. We look forward to meeting you!
Legacy Online School
Tbilisi
up to 7000 USD