Contract: Senior AI Engineer
Job Overview
Upwork ($UPWK) is the world’s work marketplace. We serve everyone from one-person startups to over 30% of the Fortune 100 with a powerful, trust-driven platform that enables companies and talent to work together in new ways that unlock their potential.
Last year, more than $3.8 billion of work was done through Upwork by skilled professionals who are gaining more control by finding work they are passionate about and innovating their careers.
This is an engagement through Upwork’s Hybrid Workforce Solutions (HWS) Team, a global group of professionals, located all over the world, who support Upwork’s business.
This hybrid engagement supports the development and production hardening of Natural Language Query (NLQ) systems, AI agents, and related applications powering talk-to-data experiences. The work focuses on improving NLQ accuracy and semantic understanding while building robust, scalable, and observable AI systems suitable for enterprise production environments.
This engagement requires senior-level software engineering depth, combined with AI evaluation and semantic modeling expertise, to translate research concepts into reliable, deployable systems.
Work/Project Scope:
- Design, implement, and maintain production-grade AI services supporting NLQ and AI agent workflows.
- Evaluate NLQ system accuracy using quantitative and qualitative methods (precision/recall, semantic correctness, result equivalence).
- Build automated evaluation pipelines and regression test suites integrated into CI/CD workflows.
- Design and validate ontology-driven semantic models and knowledge graphs to improve the accuracy and consistency of natural-language answers.
- Analyze NLQ failures, define structured error taxonomies, and build instrumentation and logging to drive continuous improvement.
- Develop and expose well-designed APIs for NLQ services, evaluation systems, and semantic layers.
- Contribute to system architecture decisions, including service boundaries and tradeoffs among scalability, latency, and reliability.
- Support cloud-native deployment, monitoring, and operational readiness of AI services.
Must Haves (Required Skills):
- Strong software engineering background building production systems (Python, APIs, distributed services).
- Experience evaluating NLQ or LLM-based systems, including precision/recall and semantic correctness.
- Hands-on expertise designing automated test frameworks and evaluation pipelines.
- Experience with cloud-native architectures, CI/CD, and production deployment of AI systems.
- Knowledge of semantic modeling, ontologies, or knowledge graphs in data-driven applications.
- Proven ability to design scalable, reliable systems bridging AI models, data platforms, and applications.
Upwork is proudly committed to fostering a diverse and inclusive workforce. We never discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical condition), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
Additionally, to the extent permitted under applicable law, a criminal background check may be required as a condition of engagement.
We use BrightHire, an AI-enabled tool, to record interviews and summarize interview transcripts. The tool lets the interviewer focus on the discussion; it does not score or evaluate talent or make recommendations. Humans review the interview transcripts and make all decisions. Anyone who prefers not to have their interview recorded through BrightHire can opt out when the interview is scheduled.
To learn more about how Upwork processes and protects your personal information as part of the application process, please review our Global Job Applicant Privacy Notice.