Asking your business data questions in plain English

Every company has the same pattern. Someone needs a number. Revenue by product line. Budget variance by department. Top customers by margin.

If they are technical, they write a query. If they are not, they ask someone who can. Or they dig through a dashboard that almost answers their question but not quite. Or they export to Excel and spend an hour pivoting.

Every ad hoc question means waiting. And most business questions are ad hoc.

What we built

We built a system where you ask a question in plain language. The AI generates the appropriate database query. The database executes it. Only the results come back to the conversation. The AI then presents the answer in a readable format.

The AI never sees your raw data. It generates the query, the database runs it, and only the small result set returns. This matters for cost, for performance, and for data privacy.
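The flow above can be sketched in a few lines. This is a minimal illustration, not our production code: the LLM call is stubbed out with a hardcoded mapping, and the table and column names are invented for the example.

```python
import sqlite3

def generate_sql(question: str, context: str) -> str:
    """Stand-in for the AI call. In the real system, a model receives the
    question plus the context layer and returns SQL. Hardcoded here so the
    sketch runs on its own."""
    return (
        "SELECT department, SUM(amount) AS total "
        "FROM transactions GROUP BY department ORDER BY total DESC"
    )

def ask(question: str, conn: sqlite3.Connection, context: str = "") -> list:
    sql = generate_sql(question, context)  # the AI sees the question and schema context only
    return conn.execute(sql).fetchall()    # the database runs it; only results come back

# Toy data: the model never sees these rows, only the schema description.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (department TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("Marketing", 9640), ("Engineering", 25100), ("Marketing", 8960)])

print(ask("Total spend by department?", conn))
# → [('Engineering', 25100.0), ('Marketing', 18600.0)]
```

The key design point: the conversation only ever contains the question, the schema context, and the small result set, never the 70,000 underlying rows.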

Running in production

This is currently running internally at DigiDuo. Our dataset: 70,000 rows across 6 business tables — budget, transactions, orders, customers, revenue targets, and employees.

Here is what actual interactions look like:

Question: “What were our top 5 customers by margin last quarter?”

| Customer | Gross Margin | Margin % |
|---|---|---|
| Acme Solutions | 34,200 EUR | 42.1% |
| Nordic Partners | 28,750 EUR | 38.6% |
| Baltic Freight | 21,400 EUR | 35.2% |
| TechServe Group | 19,800 EUR | 33.9% |
| DataPoint Ltd | 17,600 EUR | 31.4% |

Question: “Show me departments that exceeded budget by more than 10% in any month this year.”

| Department | Month | Budget | Actual | Overspend |
|---|---|---|---|---|
| Marketing | February | 8,000 EUR | 9,640 EUR | +20.5% |
| Engineering | March | 22,000 EUR | 25,100 EUR | +14.1% |
| Marketing | March | 8,000 EUR | 8,960 EUR | +12.0% |

Question: “Compare actual revenue vs targets for Q1, broken down by product line.”

| Product Line | Target | Actual | Variance |
|---|---|---|---|
| Consulting | 120,000 EUR | 134,500 EUR | +12.1% |
| Managed Services | 85,000 EUR | 79,200 EUR | -6.8% |
| Training | 30,000 EUR | 28,400 EUR | -5.3% |
| Licensing | 45,000 EUR | 51,100 EUR | +13.6% |
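To make the first question concrete, here is the kind of SQL the system might generate for "top customers by margin", run against a toy two-customer dataset. The schema is hypothetical (the `revenue` and `cost` columns are invented for illustration); only the `cust_id` → `customer_id` join mirrors what the post describes.

```python
import sqlite3

# Toy schema loosely mirroring the orders/customers tables described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, revenue REAL, cost REAL);
INSERT INTO customers VALUES (1, 'Acme Solutions'), (2, 'Nordic Partners');
INSERT INTO orders VALUES (10, 1, 81200, 47000), (11, 2, 74500, 45750);
""")

# The kind of query the AI might emit for "top 5 customers by margin":
sql = """
SELECT c.name,
       SUM(o.revenue - o.cost) AS gross_margin,
       ROUND(100.0 * SUM(o.revenue - o.cost) / SUM(o.revenue), 1) AS margin_pct
FROM orders o
JOIN customers c ON o.cust_id = c.customer_id
GROUP BY c.name
ORDER BY gross_margin DESC
LIMIT 5
"""
for row in conn.execute(sql):
    print(row)
# → ('Acme Solutions', 34200.0, 42.1)
# → ('Nordic Partners', 28750.0, 38.6)
```

The user never sees this SQL unless they ask for it; they just see the formatted table.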

No SQL. No dashboard hunting. No waiting for the data team.

Why not just paste data into ChatGPT?

Fair question. Here is why that does not work at scale:

Data size. 70,000 rows do not fit in an AI context window. Even if they did, performance degrades badly with large inputs.

Cost. Sending large datasets as context with every query is enormously expensive. Our approach sends a small query description, not the data itself.

No live connection. Pasting data is a snapshot. By the time you finish your analysis, the data may be stale. Our system queries live data every time.

Privacy. With our approach, the AI generates the query but never sees the underlying dataset. Only the filtered, aggregated results enter the conversation.

The secret ingredient: a context layer

The system works because of a context layer — a markdown file that tells the AI what the data actually means.

Table relationships. Column naming conventions. Business logic rules. What “margin” means in your company. Which fiscal year definition to use. How departments map to cost centres.

Without this context layer, the AI generates wrong SQL. It guesses at column names. It misunderstands relationships. It applies the wrong filters.

With it, accuracy is high. The AI knows that cust_id in the orders table maps to customer_id in the customers table. It knows that your fiscal year starts in April. It knows that “margin” means gross margin, not net.

This context file takes a few hours to write. It saves hundreds of hours of wrong answers.
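An entry in the context layer might look like this. The content is illustrative, not our actual file:

```markdown
## Tables and joins
- orders.cust_id → customers.customer_id (many-to-one)
- transactions.dept_code → departments.cost_centre

## Business rules
- "Margin" always means gross margin (revenue minus direct cost), never net.
- Fiscal year starts 1 April; "Q1" means April-June unless the user says "calendar Q1".
- Budget figures are monthly, in EUR, excluding VAT.
```

Because it is plain markdown, anyone who knows the business can review and correct it, no code required.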

Three data modes

We built three modes depending on what the situation requires:

Direct file query. The AI queries the source file directly every time. Always current. Best for data that changes frequently.

Loaded tables. Data is loaded into a local database for faster querying. A stable snapshot. Best for analysis sessions where you need speed and consistency.

Scheduled refresh. An automated pipeline that refreshes the database on a set schedule — daily, hourly, whatever fits. Best for production use where you need both speed and freshness.
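The "loaded tables" mode is essentially a bulk load of an export into a local database; run the same load on a timer and you have "scheduled refresh". A minimal sketch using only the Python standard library, with hypothetical file and table names:

```python
import csv
import sqlite3

def load_csv(conn: sqlite3.Connection, path: str, table: str) -> int:
    """Load a CSV export into a local SQLite table (snapshot mode).
    Rerun on a schedule to get the 'scheduled refresh' mode.
    Returns the number of rows loaded."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')   # replace the old snapshot
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.commit()
    return len(data)
```

Direct file query skips this step entirely and reads the source file on every question, trading speed for freshness.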

What it takes to set up

For a standard setup: 1-2 weeks. That includes understanding your data, building the context layer, configuring the query system, and testing with real questions from your team.

It works with CSV files, Excel exports, ERP data dumps, and any structured data source. If your data lives in rows and columns, this approach works.

The bottom line

The question is not whether your team needs data. It is whether they can get it without asking someone else.

Ready to transform your business with AI?

Let's talk about what's possible for your specific situation.

Get in touch →