Natural Language Q&Ai with Oracle 26ai

Many organizations desire to have as real-time as possible analytic systems that give them the capability to analyse their data sourced from various core business systems, at quite a granular level with all the built in governance and security to keep that data secure.

I am being asked with increasing frequency about how users can ask questions like “With Whom did I spend most with last month and what was it on and how much was the split in Dollars and British Pounds” without employing any specific tools, and also have that optimized so it’s both as accurate as possible and also doesn’t use an excessive quantity of tokens.

A lot of organization source core data from Oracle Cloud ERP/HCM/EPM, and one way we can perform Natural Language queries with data stored in an Oracle database is to use Select AI, which uses a Large Language Model (aka LLM). In this example i’ll use OpenAI as my provider and an Oracle Autonomous 26ai database, but Select AI has been back-ported to the non-autonomous database too.

Setting It Up

Bit of admin first, as you’ll need to grant execute on DBMS_CLOUD_AI to your schema user first as well as appending a network Access Control Entry (ACE) so the database user/schema is allowed to make outbound HTTP calls from the Database to the host api.openapi.com (obviously this is set to whichever LLM provider you decide to use).

GRANT EXECUTE ON DBMS_CLOUD_AI TO BEYOND;

BEGIN
DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
host => ‘api.openai.com’,
ace => xs$ace_type(
privilege_list => xs$name_list(‘http’),
principal_name => ‘BEYOND’,
principal_type => xs_acl.ptype_db
)
);
END;
/

Create a credential so that we have it saved in encrypted form in the database so we are not storing the API key in readable format. I simply give it a name and then provide my OpenAI API username and password.

Begin
DBMS_CLOUD.CREATE_CREDENTIAL(
  credential_name => 'OPENAI_CRED_2025',
  username        => 'My_OpenAI_User',
  password        => 'MY_OpenAI_API'
);
End;

Create a profile in the database which refers to the credential and then sets a bunch of important things.

BEGIN
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'PO_NL2SQL',
    attributes   => '{
      "provider": "openai",
      "credential_name": "OPENAI_CRED_2025",
      "object_list": [
        { "owner": "BEYOND", "name":   "COSMOS_FUSION_PO_DATA" }
      ],
      "enforce_object_list": true,
      "conversation": "false",
      "comments": true,
      "constraints": true,
      "annotations": true
    }'
  );
END;

In the example above I specifically reference a single table to use, so only this is considered when the LLM creates queries. I also allow the passing of comments, annotations and constraints. Note I didn’t tell it which model to use – just to show you don’t need to – but you should ( as depending on the tasks you’re asking of it, certain models from the provider you choose will have different uses/trade-offs e.g. more tokens used but improved results, and i’d trial that and assess results.

We can change the various attributes of the profile, so for example I want to add additional tables I can do something like this :

BEGIN
  DBMS_CLOUD_AI.SET_ATTRIBUTE(
    profile_name    => 'PO_NL2SQL',
    attribute_name  => 'object_list',
    attribute_value => '[
      { "owner": "BEYOND", "name": "COSMOS_FUSION_PO_DATA" },
      { "owner": "BEYOND", "name": "COSMOS_SUPPLIER_FOCUS" }
    ]'
  );
END;
/

A quick word on Annotations. If you’re not familiar with Annotations then they are are new from 23ai (back ported to 19.28), but what’s the difference between annotations and comments? Ulrike Schwinn has a very focused blog on Annotations, where she points out that annotations are available for different kinds of database objects, i.e. not just Tables but also views, MVs etc. Also comments are only a “freeform value” whereas annotations can have name:value pairs and thus have multiple for each object, whereas you cannot assign multiple comments for the same object.

What Data Shall We Use?

For our example we will use the two tables we set in the profile about. Firstly one is a denormalized table of some Cloud ERP procurement data from a test database. This holds suppliers, PO numbers, dates, items, order quantity, etc, etc. This is called COSMOS_FUSION_PO_DATA

The next is a little new table created by marketing, highlighting suppliers with a flag of “Y” next to the ones we want to focus more attention on to improve supplier relationships.

I have created Primary and Foreign keys on the tables. These are hugely important bits of metadata ( and relational design! ) so take that as a given. Let us give the LLM even more to go on though, let’s provide a suite of annotations at both table and column level.

Cosmos_Fusion_Po_Data

ALTER table cosmos_fusion_po_data
  ANNOTATIONS (
    ADD grain         'One row per PO line',
    ADD measure       'Spend = SUM(line_amount)',
    ADD derived_rule  'line_amount = ordered_quantity * unit_price',
    ADD currencies    'GBP (UK pounds), USD (US dollars)',
    ADD join_hint     'Join to COSMOS_SUPPLIER_FOCUS on supplier_number',
    ADD category_text 'Use item_description for category-style analysis'
  );


ALTER TABLE cosmos_fusion_po_data
  MODIFY (
    supplier_number ANNOTATIONS ( ADD joins_to 'COSMOS_SUPPLIER_FOCUS.SUPPLIER_NUMBER' ),
    item_description ANNOTATIONS ( ADD synonyms 'category, item, description' ),
    line_amount ANNOTATIONS ( ADD business_name 'PO Line Spend' ),
    po_currency ANNOTATIONS ( ADD allowed_values 'GBP, USD' )
  );

Cosmos_Supplier_Focus

ALTER TABLE cosmos_supplier_focus
  ANNOTATIONS (
    ADD purpose    'Flag suppliers that are currently being focused on.  This table only has the supplier number, supplier name and a focus_flag to indicate Y if they are under focus on null if not',
    ADD grain      'One row per supplier_number',
    ADD join_hint  'Join to COSMOS_FUSION_PO_DATA on supplier_number',
    ADD semantics  'focus_flag=Y means supplier is in focus; NULL means not'
  );

ALTER TABLE cosmos_supplier_focus
  MODIFY (
    supplier_number ANNOTATIONS (
      ADD role 'primary_key',
      ADD joins_to 'COSMOS_FUSION_PO_DATA.SUPPLIER_NUMBER'
    ),
    focus_flag ANNOTATIONS (
      ADD allowed_values 'Y or NULL',
      ADD meaning 'Y = focus supplier; NULL = not in focus'
    )
  );

Note that the SQL Developer plug in for VSCode provides a rather nice interface for AI data enrichment.

Let’s ask some questions

Before we start, we just need to set the profile. We can of course have multiple different profiles if we want to use different LLMs or pass/hide parameters. In this example i’m in a SQL interface, if I was to do this in APEX for a “friendly front end” we need to use GENERATE (see here) and i’ll leave that for another article.

Begin
   DBMS_CLOUD_AI.SET_PROFILE('PO_NL2SQL');
End;

Explain The SQL

If we use explainsql then we can get some great feedback of the SQL that the LLM would run based on the metadata passed, which is dependent on the profile settings and also the metadata objects we have created. e.g. we define some annotations and we set that they are passed in the profile.

Rather than building up from some super simple cases, let’s dive in with something that covers a bit of ground.

This actually has some complexity to it.

We are asking for GBP purchase orders only

We are restricting on the last 2 months

We just want to have the data for suppliers we classed as “in focus”

“How much” implies an aggregation of the spend

The fantastic thing about explainsql is that is gives us some SQL and then explains why it was generated. Let’s take a look. I think this is key as if the results are not what we are expecting, we can sharpen our “prompting” and enhance our metadata ( or perhaps change our model ).


SELECT
    SUM(po."LINE_AMOUNT") AS "TOTAL_SPEND"
FROM
    "BEYOND"."COSMOS_FUSION_PO_DATA" po
JOIN
    "BEYOND"."COSMOS_SUPPLIER_FOCUS" sf
ON
    po."SUPPLIER_NUMBER" = sf."SUPPLIER_NUMBER"
WHERE
    po."PO_CURRENCY" = 'GBP'
    AND sf."FOCUS_FLAG" = 'Y'
    AND po."PO_RATE_MONTH" >= ADD_MONTHS(TRUNC(SYSDATE), -2);
```

Explanation:
- The query retrieves the total spend on in-focus suppliers for GBP purchase orders in the last 2 months.
- The tables are aliased as "po" for COSMOS_FUSION_PO_DATA and "sf" for COSMOS_SUPPLIER_FOCUS.
- The SELECT statement calculates the SUM of the "LINE_AMOUNT" column from the "po" table as the "TOTAL_SPEND".
- The JOIN operation links the "po" table with the "sf" table on the "SUPPLIER_NUMBER" column.
- The WHERE clause filters the results to consider only GBP purchase orders ("PO_CURRENCY" = 'GBP'), in-focus suppliers ("FOCUS_FLAG" = 'Y'), and purchase orders from the last 2 months (po."PO_RATE_MONTH" >= ADD_MONTHS(TRUNC(SYSDATE), -2)).

Show the SQL

Before we run anything, we can see the SQL that is actually generated. Checking this out, it looks in line with what we would expect. We are summing the line amount as we annotated that “spend” means the sum of line amount and that’s exactly what we have here. It also joins on the Supplier_Number as that’s a Foreign Key and we also annotated that. Looking at the code it’s also added in the three restrictions we needed, it gets the currency correct and the focus_flag and then also does the correct syntax restriction to only process the data for the prior 2 months.

Run The SQL

Finally we can run the SQL and see the result we really wanted.

Summary

Select AI, when enabled via a conformed data structure, with keys and metadata to assist the LLM, we have shown that we can ask natural language queries of our data and get those questions answered.

A vital aspect is having that trusted data to do that with, and if you’re sourcing data from Oracle Cloud ERP/HCM or EPM then Beyond Cosmos is a super easy way to get that near-real time data, aggregated for performance.