[VCK] Senior Data Engineer (AI Ingestion Platform)

Softwaremind

Argentina

Posted June 19, 2026

Full-time Mid-Senior Level

Job Overview

About the Project

Software Mind is building a private, tenant-isolated AI assistant for the real estate title and settlement industry. The platform is a retrieval-first (RAG) system that ingests historical email, documents, and structured metadata into a per-tenant vector index, and serves grounded, cited, expert-weighted answers through a chat-style Q&A interface with single sign-on and full audit logging.

The platform is AWS-native with a Python/FastAPI backend, a Vue.js frontend, an OpenSearch/Pinecone vector store, and OpenAI/Anthropic/Bedrock as LLM providers. You will join a senior, cross-functional, LATAM-based team where hands-on AI delivery experience, not just familiarity, is the baseline expectation.

You own the ingestion and processing backbone of the platform: the pipelines that transform raw email and document corpora into clean, PII-minimised, chunked, and indexed data in the per-tenant vector store. This is the foundational layer the AI extraction gateway depends on; quality here directly determines system accuracy.

Your Responsibilities

Build and own the historical email ingestion pipeline via Microsoft Graph API

Implement SharePoint / OneDrive document ingestion pipeline with scoped folder access

Design and implement the PII minimisation pre-processing layer

Build the vector store indexing workflow (OpenSearch/Pinecone) with per-tenant data isolation

Define and implement the data processing schema; produce and maintain schema documentation

Build the OCR routing orchestrator and integrate OCR service for scanned documents

Implement the raw text / content extraction layer for all supported document types

Define and prototype push vs. pull ingestion strategy, from one-time PoC through to incremental nightly pipeline

Ensure data lineage and audit traceability are built into pipeline outputs from the outset

Tech Stack: Python, Microsoft Graph API, AWS (S3, DynamoDB, Lambda), OpenSearch, Pinecone, OCR Tooling, PII Libraries, NER Libraries, Docker, Jira, Confluence
