Wise Step Recruiting · Remote (Romania) · Contract · Senior · B2B · 6 months, expected to extend

ML Engineer — Fraud Detection Platform

Python · RAG · LLM · NLP · SQL · Databricks · Azure · Llama · Text-to-SQL

About the Role

We are building an AI layer on top of an existing fraud detection product used by banks. The goal is to allow bank users to describe fraud scenarios in natural language, and have the system automatically generate performant, validated SQL detection rules — removing the dependency on deep SQL expertise on the client side.

Project Context

The existing system runs Java services with SQL Server and Oracle backends. The new AI layer is Python-based — no Java integration is required from candidates. The core pattern is not direct NLP-to-SQL. Instead, it uses an intermediate JSON layer: natural language input is parsed into a structured JSON representation (fields, aggregations, filters, thresholds), from which the final SQL is generated. This allows validation and performance checks before execution.
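To illustrate the intermediate-JSON pattern, here is a minimal sketch. The `RuleSpec` fields, table name, and SQL shape are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass

# Hypothetical intermediate representation of a parsed fraud scenario.
# Field names are illustrative -- the real JSON schema is project-specific.
@dataclass
class RuleSpec:
    table: str
    group_by: list       # fields to aggregate over
    aggregation: str     # e.g. "SUM(amount)"
    filters: list        # predicate strings, validated before rendering
    threshold: float     # alert when the aggregate exceeds this value

def to_sql(spec: RuleSpec) -> str:
    """Render a validated spec into a detection query."""
    where = " AND ".join(spec.filters) or "1=1"
    keys = ", ".join(spec.group_by)
    return (
        f"SELECT {keys}, {spec.aggregation} AS metric "
        f"FROM {spec.table} WHERE {where} "
        f"GROUP BY {keys} HAVING {spec.aggregation} > {spec.threshold}"
    )

# e.g. "accounts whose hourly transaction volume exceeds 10,000"
spec = RuleSpec(
    table="transactions",
    group_by=["account_id"],
    aggregation="SUM(amount)",
    filters=["tx_time >= DATEADD(hour, -1, GETDATE())"],
    threshold=10000,
)
```

Because the LLM emits the structured spec rather than raw SQL, each field can be checked against the tenant's schema before any query runs.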

LLM preference is self-hosted Llama for financial data privacy, with Azure OpenAI as an alternative. A model-agnostic interface is required. Fine-tuning is a last resort only.

RAG is required — DB schemas, column mappings, and existing customer rules are injected as context. Evaluation agents call a REST API to run generated SQL and return false positive rate and latency metrics.
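An evaluation agent's gating logic might look like the sketch below. The metric keys and thresholds are illustrative assumptions; the actual REST API and acceptance criteria are project-specific:

```python
# Hypothetical acceptance gate applied to the metrics an evaluation agent
# gets back after running a generated rule via the REST API.
def passes_gates(metrics: dict,
                 max_fpr: float = 0.05,
                 max_latency_ms: float = 500) -> bool:
    """Accept a generated rule only if both metrics are within bounds.

    metrics -- e.g. {"false_positive_rate": 0.01, "latency_ms": 120}
    (key names are assumptions for this sketch).
    """
    return (metrics["false_positive_rate"] <= max_fpr
            and metrics["latency_ms"] <= max_latency_ms)
```

Rules failing either gate would be sent back through the generation loop rather than deployed.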

The system must support multi-tenancy with tenant-isolated data. Deployment is cloud-first, with on-prem capability for security-sensitive clients in a later phase.
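Tenant isolation in the RAG layer can be enforced at the retrieval query, not left to the prompt. A minimal sketch, with an in-memory store and document shape that are illustrative only:

```python
class InMemoryStore:
    """Toy document store standing in for the real vector/context store."""
    def __init__(self, docs):
        self.docs = docs  # each doc: {"tenant_id": ..., "text": ...}

    def search(self, query, k):
        # Placeholder relevance: substring match. The real system would
        # embed schemas, column mappings, and existing rules.
        return [d for d in self.docs if query in d["text"]][:k]

def retrieve_context(store, tenant_id, query, k=5):
    """Return at most k documents, never crossing tenant boundaries."""
    hits = store.search(query, k=k * 4)  # over-fetch, then filter
    return [d for d in hits if d["tenant_id"] == tenant_id][:k]

store = InMemoryStore([
    {"tenant_id": "bank_a", "text": "amount column maps to TX_AMT"},
    {"tenant_id": "bank_b", "text": "amount column maps to AMOUNT_EUR"},
])
```

Filtering by `tenant_id` inside the retrieval call means a prompt can never pull another client's column mappings into context.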

Data Scale

~300 columns, basic types only. Billions of rows, ~4,000 transactions/second, 19-day retention window. Thousands of real production rules available as training corpus. Key challenge: column mappings that vary per tenant.

Current State

This is not greenfield. An initial version exists with some agents already defined and early training begun. We need an engineer who can review the existing design, provide architectural feedback, and accelerate delivery.

What You’ll Do

  • Review and provide architectural feedback on the existing AI system design.
  • Design and implement the RAG pipeline, injecting DB schemas, column mappings, and existing rules as context.
  • Build and refine the NLP → structured JSON → SQL generation pipeline.
  • Develop a model-agnostic LLM interface supporting Llama and Azure OpenAI.
  • Implement evaluation agents that measure false positive rate and query latency.
  • Ensure multi-tenant data isolation across all AI components.

Requirements

Critical Skills

  • Strong RAG architecture experience — schema and rule context injection.
  • Hands-on NLP-to-SQL or text-to-SQL work in production environments.
  • Experience with Databricks Genie.

Required Skills

  • LLM evaluation: measuring false positive rate and query latency.
  • Model-agnostic LLM interface design (Llama / OpenAI / Azure OpenAI).
  • Multi-tenant AI system design.

Nice to Have

  • Java (for reading/understanding existing services).
  • Databricks platform experience beyond Genie.