We are building an AI layer on top of an existing fraud detection product used by banks. The goal is to let bank users describe fraud scenarios in natural language and have the system automatically generate performant, validated SQL detection rules, removing the dependency on deep SQL expertise on the client side.
The existing system runs Java services with SQL Server and Oracle backends. The new AI layer is Python-based; no Java integration is required from candidates. The core pattern is not direct natural-language-to-SQL generation. Instead, it uses an intermediate JSON layer: natural language input is parsed into a structured JSON representation (fields, aggregations, filters, thresholds), from which the final SQL is generated. This allows validation and performance checks before execution.
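As a rough illustration of the intermediate-layer idea, the sketch below shows a structured rule spec that is validated against a whitelisted grammar before any SQL is emitted. All names here (`RuleSpec`, field names, the allowed operator set) are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field

ALLOWED_AGGS = {"COUNT", "SUM", "AVG", "MAX", "MIN"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "<>"}

@dataclass
class RuleSpec:
    table: str
    agg: str                      # aggregation function, e.g. "SUM"
    agg_column: str               # column being aggregated
    group_by: str                 # entity key, e.g. "account_id"
    filters: list = field(default_factory=list)  # list of (col, op, value)
    threshold: float = 0.0        # alert when the aggregate exceeds this

def validate(spec: RuleSpec) -> None:
    # Reject anything outside the whitelisted grammar before SQL generation.
    if spec.agg not in ALLOWED_AGGS:
        raise ValueError(f"unsupported aggregation: {spec.agg}")
    for _col, op, _val in spec.filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")

def to_sql(spec: RuleSpec) -> str:
    validate(spec)
    where = " AND ".join(f"{c} {op} {v}" for c, op, v in spec.filters) or "1=1"
    return (
        f"SELECT {spec.group_by}, {spec.agg}({spec.agg_column}) AS metric "
        f"FROM {spec.table} WHERE {where} "
        f"GROUP BY {spec.group_by} "
        f"HAVING {spec.agg}({spec.agg_column}) > {spec.threshold}"
    )

rule = RuleSpec(
    table="transactions",
    agg="SUM",
    agg_column="amount",
    group_by="account_id",
    filters=[("channel", "=", "'WIRE'")],
    threshold=10000,
)
print(to_sql(rule))
```

Because the LLM only ever produces the JSON spec, validation and performance checks can be applied to the structured form before a single query reaches a customer database.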
LLM preference is self-hosted Llama for financial data privacy, with Azure OpenAI as an alternative. A model-agnostic interface is required. Fine-tuning is a last resort only.
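One way the model-agnostic requirement could be met is a small protocol that every backend implements, so self-hosted Llama and Azure OpenAI are swappable by configuration. The class and method names below are assumptions, not the existing codebase's; the Llama backend is stubbed rather than wired to a real endpoint.

```python
from typing import Protocol

class ChatModel(Protocol):
    """Minimal interface every LLM backend must satisfy."""
    def complete(self, system: str, user: str) -> str: ...

class LlamaBackend:
    """Would call a self-hosted Llama server; stubbed in this sketch."""
    def __init__(self, base_url: str):
        self.base_url = base_url

    def complete(self, system: str, user: str) -> str:
        # A real implementation would POST to the self-hosted server
        # (e.g. an OpenAI-compatible chat-completions endpoint).
        raise NotImplementedError

class EchoBackend:
    """Deterministic stand-in for tests and local development."""
    def complete(self, system: str, user: str) -> str:
        return f"[{system}] {user}"

def parse_scenario(model: ChatModel, scenario: str) -> str:
    # Calling code depends only on the ChatModel protocol, so the
    # backend can be chosen per deployment (privacy-sensitive tenants
    # get the self-hosted model, others may use Azure OpenAI).
    return model.complete("Extract a JSON rule spec.", scenario)

print(parse_scenario(EchoBackend(), "flag wires over 10k"))
```

Keeping the interface this narrow also keeps the door open for fine-tuning later without touching calling code, consistent with treating fine-tuning as a last resort.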
RAG is required: DB schemas, column mappings, and existing customer rules are injected as context. Evaluation agents call a REST API to run the generated SQL and return false-positive-rate and latency metrics.
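The evaluation loop might look like the sketch below: submit a candidate rule to the evaluation API, then gate acceptance on the two metrics the brief calls out. The endpoint URL, payload shape, metric field names, and thresholds are all assumptions; the real REST contract lives in the existing service.

```python
import json
from urllib import request

EVAL_URL = "http://rule-eval.internal/api/v1/evaluate"  # hypothetical endpoint

def evaluate_rule(sql: str, tenant: str) -> dict:
    """Submit a generated rule to the evaluation service (assumed contract)."""
    payload = json.dumps({"tenant": tenant, "sql": sql}).encode()
    req = request.Request(
        EVAL_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def accept(metrics: dict, max_fp_rate: float = 0.05,
           max_latency_ms: float = 500.0) -> bool:
    # Gate deployment on the two metrics returned by the evaluation agent:
    # false positive rate and query latency.
    return (metrics["false_positive_rate"] <= max_fp_rate
            and metrics["latency_ms"] <= max_latency_ms)

print(accept({"false_positive_rate": 0.02, "latency_ms": 310.0}))
```

A rejected rule can be fed back to the generation step with the metrics as additional context, which is where the agent loop earns its keep.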
The system must support multi-tenancy with tenant-isolated data. Deployment is cloud-first, with on-prem capability for security-sensitive clients in a later phase.
The data profile: ~300 columns, basic types only; billions of rows at ~4,000 transactions/second; a 19-day retention window. Thousands of real production rules are available as a training corpus. The key challenge is per-tenant column mappings: each client's physical schema names the same logical fields differently.
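A common way to handle this is to author rules against canonical field names and resolve them to physical columns per tenant at SQL-generation time. The tenant IDs and column names below are invented for illustration.

```python
# Canonical vocabulary the rule layer is written against (illustrative).
CANONICAL_FIELDS = {"amount", "account_id", "channel"}

# Per-tenant mapping from canonical field to physical column name.
# In practice this would be loaded from per-tenant config, not hardcoded.
TENANT_MAPPINGS = {
    "bank_a": {"amount": "TXN_AMT", "account_id": "ACCT_NO", "channel": "CHNL_CD"},
    "bank_b": {"amount": "amount_usd", "account_id": "acct_id", "channel": "channel"},
}

def resolve(tenant: str, canonical: str) -> str:
    """Translate a canonical field name into the tenant's physical column."""
    if canonical not in CANONICAL_FIELDS:
        raise KeyError(f"unknown canonical field: {canonical}")
    return TENANT_MAPPINGS[tenant][canonical]

print(resolve("bank_a", "amount"))   # TXN_AMT
print(resolve("bank_b", "amount"))   # amount_usd
```

This keeps the LLM and the rule corpus tenant-neutral: the same canonical rule compiles to different SQL per client, and the mapping table is the only per-tenant artifact.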
This is not greenfield. An initial version exists with some agents already defined and early training begun. We need an engineer who can review the existing design, provide architectural feedback, and accelerate delivery.
Critical Skills
Required Skills
Nice to Have