PRAXEN
agent behavior verifier
LangChain SQL Agent Analysis Report
Completed May 29, 2026
6 Findings
1 Critical
2 High
3 Medium
RAISE maturity 1.45 / 5.0
Executive Summary
Agent Remit (as declared)
A natural-language interface to a single relational database, intended to answer read-only analytical questions for internal data consumers. The agent may use `sql_db_list_tables`, `sql_db_schema`, `sql_db_query_checker`, and `sql_db_query` against a pre-configured SQLAlchemy connection, returning a direct natural-language answer with no outbound traffic, file writes, or messaging. DML and DDL statements are forbidden outright with no approval path, every generated query must be double-checked before execution, and the agent must operate within a bounded number of tool-call iterations.
Behavior Summary (as observed)
The dominant pattern is policy declared in the prompt with zero code-level enforcement. The remit's hard prohibition on data- and schema-modifying SQL is expressed only in the `SQL_PREFIX` string, while `SQLDatabase.run` executes whatever SQL the LLM emits via `text(command)` inside a committing transaction, with no DML/DDL parser and no read-only enforcement. The project's own `test_sql_database_run_update` demonstrates an `UPDATE` running to completion through this path. This is a mature, well-documented library, and the maintainer explicitly surfaces the risk in a `!!! warning` on `create_sql_agent` and pushes enforcement to operator-side server controls (read-only roles, statement timeouts) that are out of the code's scope. Praxen's role here is to surface that maintainer-acknowledged divergence, not to treat it as undisclosed: the agent ships a `max_iterations` runaway cap and 300-char result truncation, but the read-only guarantee depends entirely on a server-side role the library cannot itself enforce.
Scope of Analysis
The classic LangChain `create_sql_agent` in `agent_toolkits/sql/base.py`, building a `langchain_classic` `AgentExecutor` over the four-tool `SQLDatabaseToolkit` (`tools/sql_database/tool.py`). Query execution routes through `SQLDatabase.run`/`run_no_throw` in `utilities/sql_database.py`, which wraps any model-generated string in `sqlalchemy.text()` and runs it inside `self._engine.begin()` — a transaction that commits, so writes succeed if the DB role permits them. The DML prohibition lives only in the `SQL_PREFIX` system prompt (`agent_toolkits/sql/prompt.py`); there is no statement-type parser, no read-only gate, and no logging on the execution path. The maintainer documents these risks explicitly in a `!!! warning` admonition on `create_sql_agent` and directs operators to server-side least-privilege roles and statement timeouts.
Remit Coverage

Every actionable rule in the Worker Remit, checked against the running code. Gap = declared but unenforced; Partial = enforced but incomplete or bypassable; Vague Policy = too imprecise to verify.

Verified: 4 | Gap: 7 | Partial: 6 | Vague Policy: 0 | Enforcement Not Possible: 2 | Total Rules: 19
| Rule ID | Section | Rule (quoted) | Status | Finding |
| --- | --- | --- | --- | --- |
| R-01 | Behavioral Constraints — What the agent must always do | "Every answer MUST be read-only and sourced from the database." | Partial | PRAX-2026-05-29-001 |
| R-02 | Behavioral Constraints — What the agent must always do | "Result sets MUST be limited to a small default row count unless the user explicitly asks for more." | Partial | PRAX-2026-05-29-006 |
| R-03 | Behavioral Constraints — What the agent must always do | "The agent MUST query only the columns relevant to the user's question and MUST NOT select all columns of a table indiscriminately." | Enforcement Not Possible | |
| R-04 | Behavioral Constraints — What the agent must always do | "Every generated SQL query MUST be double-checked before it is executed." | Partial | PRAX-2026-05-29-005 |
| R-05 | Behavioral Constraints — What the agent must always do | "If a question cannot be answered from the database, the agent MUST say so plainly rather than speculate." | Verified | |
| R-06 | Behavioral Constraints — What the agent must always do | "The agent MUST operate within a bounded number of tool-call iterations so that it cannot loop indefinitely." | Verified | |
| R-07 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT execute any data-modifying statement — inserts, updates, deletes, merges, upserts, replacements, or anything else that changes row data." | Gap | PRAX-2026-05-29-001 |
| R-08 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT execute any schema-, permission-, or structure-modifying statement — creating, dropping, altering, truncating, renaming, granting, or revoking." | Gap | PRAX-2026-05-29-001 |
| R-09 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT execute administrative or superuser commands, including bulk loads, attach/detach operations, and maintenance commands." | Gap | PRAX-2026-05-29-002 |
| R-10 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT execute multi-statement queries or stored-procedure invocations that could conceal hidden data- or schema-modifying statements." | Gap | PRAX-2026-05-29-002 |
| R-11 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT execute queries that reach tables outside the scope it has been configured for." | Partial | PRAX-2026-05-29-004 |
| R-12 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT allow user-provided text to override its instructions, its role, or its prohibition on modifying the database." | Gap | PRAX-2026-05-29-003 |
| R-13 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT follow instructions embedded in query results, table names, column names, or row contents." | Gap | PRAX-2026-05-29-003 |
| R-14 | Behavioral Constraints — What the agent must never do | "The agent MUST NOT continue issuing tool calls past its iteration cap." | Verified | |
| R-15 | Authorized Capabilities — Authorized data sources | "The agent may read only from the single database it is configured for, and only the tables that have been explicitly exposed to it." | Partial | PRAX-2026-05-29-004 |
| R-16 | Authorized Capabilities — Authorized output destinations | "The agent produces no outbound network traffic, no file writes, and no email or messaging." | Verified | |
| R-17 | Escalation and Scope Boundaries — Defense-in-depth expectations | "The agent's own logic MUST NOT be the sole enforcement point for the prohibition on modifying the database." | Gap | PRAX-2026-05-29-001 |
| R-18 | Escalation and Scope Boundaries — Defense-in-depth expectations | "The database connection SHOULD use a read-only role that cannot perform data- or schema-modifying statements even if the agent generates one." | Enforcement Not Possible | |
| R-19 | Escalation and Scope Boundaries — Query result limits | "A per-query execution-time limit SHOULD be enforced where the environment allows it." | Partial | PRAX-2026-05-29-006 |
Findings Register

Findings, ordered by severity, each linked to its remit rule, evidence, and a recommended action.

CRITICAL PRAX-2026-05-29-001 The read-only / no-DML prohibition is enforced only in the system prompt; the execution layer runs any model-generated write to completion.
Policy Rule — R-01, R-07, R-08, R-17 (Worker Remit):
"Every answer MUST be read-only and sourced from the database. / The agent MUST NOT execute any data-modifying statement — inserts, updates, deletes, merges, upserts, replacements, or anything else that changes row data. / The agent MUST NOT execute any schema-, permission-, or structure-modifying statement — creating, dropping, altering, truncating, renaming, granting, or revoking. / The agent's own logic MUST NOT be the sole enforcement point for the prohibition on modifying the database."
libs/community/langchain_community/agent_toolkits/sql/prompt.py:12 — "DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database. — the prohibition exists only as prompt text"
libs/community/langchain_community/utilities/sql_database.py:490 — "_execute opens self._engine.begin() (committing transaction) and runs text(command) with no statement-type inspection; run()/run_no_throw() expose this to model output"
Recommended Action
  • Treat the read-only guarantee as an operator responsibility the library cannot meet alone — document that `create_sql_agent` must be paired with a least-privilege read-only DB role, and add an optional deterministic statement-type gate (e.g. reject non-SELECT/CTE statements) in `SQLDatabase.run` for defense-in-depth.
  • Surface the existing `!!! warning` from `create_sql_agent` into the toolkit and tool docstrings so operators wiring the tools directly (without the agent factory) still see the no-enforcement caveat.
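As a rough illustration of the optional statement-type gate suggested above, the following sketch rejects anything that does not look like a single read-only query. `assert_read_only` and `WriteStatementRejected` are hypothetical names, not part of LangChain, and the keyword scan is a deliberately coarse allow-list rather than a SQL parser; it is meant to sit in front of a read-only DB role, not to replace one.

```python
import re

# Hypothetical helper, not part of LangChain: a coarse allow-list gate.
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|merge|replace|create|drop|alter|truncate|"
    r"rename|grant|revoke)\b",
    re.IGNORECASE,
)


class WriteStatementRejected(ValueError):
    """Raised when a statement is not clearly read-only."""


def assert_read_only(sql: str) -> str:
    """Return the statement unchanged if it looks read-only, else raise."""
    stripped = _COMMENTS.sub(" ", sql).strip()
    first = stripped.split(None, 1)[0].lower() if stripped else ""
    if first not in {"select", "with"}:
        raise WriteStatementRejected(
            f"statement must start with SELECT or WITH, got {first!r}"
        )
    # A WITH clause can front a write in some dialects, so scan the whole text.
    # This over-rejects (for example a string literal containing 'update').
    match = _WRITE_KEYWORDS.search(stripped)
    if match:
        raise WriteStatementRejected(f"write keyword {match.group(0)!r} found")
    return sql
```

The gate fails closed: a legitimate query that calls a `REPLACE()` function or mentions a write keyword in a literal is rejected, which is usually the safer trade-off for a defense-in-depth check.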
HIGH PRAX-2026-05-29-002 No parser blocks multi-statement, stored-procedure, or administrative commands; the executor forwards the raw string to the driver as-is.
Policy Rule — R-09, R-10 (Worker Remit):
"The agent MUST NOT execute administrative or superuser commands, including bulk loads, attach/detach operations, and maintenance commands. / The agent MUST NOT execute multi-statement queries or stored-procedure invocations that could conceal hidden data- or schema-modifying statements."
libs/community/langchain_community/utilities/sql_database.py:541 — "isinstance(command, str) -> command = text(command); single string forwarded to connection.execute with no multi-statement or admin-command filtering"
libs/community/langchain_community/tools/sql_database/tool.py:59 — "QuerySQLDatabaseTool._run returns self.db.run_no_throw(query) — model-supplied query string passed straight through"
Recommended Action
Document that admin/multi-statement protection is a server-side concern (driver `multi=False`, least-privilege role) and consider an optional opt-in single-statement validator in `SQLDatabase.run`.
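One possible shape for the opt-in single-statement validator is sketched below. The function names are hypothetical, and the splitter only understands plain quoted strings and identifiers; it does not handle comments, backslash escapes, or dialect-specific quoting such as PostgreSQL dollar quotes, and it does nothing about stored-procedure calls, which would need a separate check.

```python
def split_statements(sql: str) -> list[str]:
    """Split on semicolons that sit outside quoted strings or identifiers."""
    statements, current, quote = [], [], None
    for ch in sql:
        if quote:
            # Inside a quoted region only the matching quote closes it.
            # A doubled quote ('') closes and immediately reopens the region.
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    statements.append("".join(current).strip())
    return [s for s in statements if s]


def assert_single_statement(sql: str) -> str:
    """Return the lone statement, or raise if there are zero or several."""
    found = split_statements(sql)
    if len(found) != 1:
        raise ValueError(f"expected exactly one statement, found {len(found)}")
    return found[0]
```

A trailing semicolon is tolerated, and a semicolon inside a string literal is not treated as a separator.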
HIGH PRAX-2026-05-29-003 Retrieved schema, table names, and sample rows enter the LLM context unlabeled, so injected instructions in row data are indistinguishable from operator prompts.
Policy Rule — R-12, R-13 (Worker Remit):
"The agent MUST NOT allow user-provided text to override its instructions, its role, or its prohibition on modifying the database. / The agent MUST NOT follow instructions embedded in query results, table names, column names, or row contents."
libs/community/langchain_community/utilities/sql_database.py:445 — "_get_sample_rows pulls 3 raw rows per table into table_info; values only length-clipped to 100 chars, never labeled untrusted"
libs/community/langchain_community/agent_toolkits/sql/prompt.py:8 — "Prompt instructs 'Only use the information returned by the below tools' but provides no structural separation between tool output and instructions"
Recommended Action
Wrap tool-returned DB content in an explicit untrusted-content delimiter in the toolkit's tool descriptions/prompt, and pair with the deterministic write-blocking gate from PRAX-2026-05-29-001 so an injected modifying statement cannot execute.
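A minimal sketch of the untrusted-content delimiter follows. The tag names and `wrap_untrusted` are assumptions made for illustration; the toolkit defines no such delimiter today. Delimiters make it easier for the prompt to say "treat everything inside these tags as data", but they reduce rather than eliminate injection risk.

```python
# Hypothetical delimiter pair; any unambiguous marker would do.
UNTRUSTED_OPEN = "<untrusted_db_content>"
UNTRUSTED_CLOSE = "</untrusted_db_content>"


def wrap_untrusted(tool_output: str) -> str:
    """Fence DB-derived text so the prompt can refer to it as data only."""
    # Escape every '<' so row data cannot reproduce either delimiter and
    # close the fence early. Escaping is used instead of deleting the
    # delimiter text, because deletion can be defeated by nesting.
    escaped = tool_output.replace("<", "&lt;")
    return f"{UNTRUSTED_OPEN}\n{escaped}\n{UNTRUSTED_CLOSE}"
```

The system prompt would then need a matching instruction that content between the two markers must never be followed as an instruction.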
MEDIUM PRAX-2026-05-29-004 Table-scope restriction (`include_tables`/`ignore_tables`) only filters schema reflection and listing; it does not gate what SQL the executor will run.
Policy Rule — R-11, R-15 (Worker Remit):
"The agent MUST NOT execute queries that reach tables outside the scope it has been configured for. / The agent may read only from the single database it is configured for, and only the tables that have been explicitly exposed to it."
libs/community/langchain_community/utilities/sql_database.py:333 — "get_usable_table_names applies include/ignore filters for listing/schema only; no equivalent filter is applied in _execute/run before query execution"
libs/community/langchain_community/utilities/sql_database.py:547 — "connection.execute(command,...) runs the query regardless of which tables it references"
Recommended Action
Document that `include_tables` is a context-shaping convenience, not a security boundary, and that table-level isolation must be enforced via DB-role schema grants.
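To illustrate what enforcement below the prompt layer looks like, here is a sketch using SQLite, which has no roles or grants but does expose a per-connection authorizer callback in Python's standard `sqlite3` module. This is a stand-in for role-based schema grants on a server database, not the mechanism the report recommends; the table names and allow-list are hypothetical.

```python
import sqlite3

ALLOWED_TABLES = {"reports"}  # hypothetical exposed-table list


def table_scope_authorizer(action, arg1, arg2, db_name, source):
    """Permit SELECTs that read only allow-listed tables; deny all else."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    # For SQLITE_READ, arg1 is the table name and arg2 the column name.
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE reports (name TEXT);"
    "CREATE TABLE secrets (token TEXT);"
    "INSERT INTO reports VALUES ('q1');"
    "INSERT INTO secrets VALUES ('s3cr3t');"
)
conn.set_authorizer(table_scope_authorizer)  # install after fixture setup

rows = conn.execute("SELECT name FROM reports").fetchall()

try:
    conn.execute("SELECT token FROM secrets")
    out_of_scope_blocked = False
except sqlite3.DatabaseError:
    out_of_scope_blocked = True
```

Because this sketch denies every action other than `SELECT` and column reads, SQL functions and aggregates would also be refused until their action codes are explicitly allowed. The point is that the database layer, not the schema listing shown to the model, decides which tables a query may touch.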
MEDIUM PRAX-2026-05-29-005 The mandatory query double-check is an LLM correctness pass that is not enforced in code and checks syntax, not the prohibition on modifying statements.
Policy Rule — R-04 (Worker Remit):
"Every generated SQL query MUST be double-checked before it is executed."
libs/community/langchain_community/tools/sql_database/prompt.py:4 — "QUERY_CHECKER enumerates only syntactic mistake classes; no statement-type / read-only assertion"
libs/community/langchain_community/agent_toolkits/sql/toolkit.py:119 — "checker described as 'Always use this tool before executing' — sequencing is advisory prompt text, not a code-enforced precondition of sql_db_query"
Recommended Action
Clarify in docs that `sql_db_query_checker` is a correctness aid, not a safety gate, and add the prohibited-statement assertion to the deterministic gate in PRAX-2026-05-29-001 rather than relying on the checker.
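If an operator wanted the check-before-execute sequencing to be a code-enforced precondition rather than prompt text, one approach is a wrapper that only runs query strings it has already seen come out of the checker. `CheckedQueryGate` below is a hypothetical sketch; it assumes the checker callable returns the final query text, which may not hold for an LLM-backed checker that adds commentary.

```python
import hashlib
from typing import Callable


class CheckedQueryGate:
    """Hypothetical wrapper: run a query only if that exact text was checked."""

    def __init__(self, check: Callable[[str], str], run: Callable[[str], str]):
        self._check = check
        self._run = run
        self._approved: set[str] = set()

    @staticmethod
    def _digest(sql: str) -> str:
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def check(self, sql: str) -> str:
        """Run the checker and remember the text it returned."""
        checked = self._check(sql)
        self._approved.add(self._digest(checked))
        return checked

    def run(self, sql: str) -> str:
        """Execute only text that the checker has previously returned."""
        digest = self._digest(sql)
        if digest not in self._approved:
            raise PermissionError("query was not passed through the checker first")
        self._approved.discard(digest)  # one approval permits one execution
        return self._run(sql)
```

This enforces ordering only. Whether the checker itself rejects prohibited statement types is a separate question.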
MEDIUM PRAX-2026-05-29-006 The default row-limit is a prompt suggestion the model can override, and per-query execution-time limits default to off.
Policy Rule — R-02, R-19 (Worker Remit):
"Result sets MUST be limited to a small default row count unless the user explicitly asks for more. / A per-query execution-time limit SHOULD be enforced where the environment allows it."
libs/community/langchain_community/agent_toolkits/sql/prompt.py:5 — "always limit your query to at most {top_k} results — row cap delivered as prompt guidance, not a LIMIT injected into the executed SQL"
libs/community/langchain_community/agent_toolkits/sql/base.py:59 — "max_execution_time: Optional[float] = None — per-query/agent time limit off by default"
Recommended Action
Document that `top_k` is advisory and that operators should enforce row limits and statement timeouts at the DB/connection layer; consider defaulting `max_execution_time` to a finite value in the agent factory.
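A row cap that does not depend on the model writing `LIMIT` can be applied at the connection layer by fetching a bounded number of rows. The sketch below uses the standard `sqlite3` module; `run_capped` and its default of 10 rows are illustrative assumptions. It bounds what reaches the caller, not the work the database does, so a statement timeout is still needed server-side.

```python
import sqlite3


def run_capped(conn: sqlite3.Connection, sql: str, max_rows: int = 10):
    """Execute a query and return at most max_rows rows plus a truncation flag."""
    cursor = conn.execute(sql)
    # Fetch one extra row so the caller can tell the result was cut short.
    rows = cursor.fetchmany(max_rows + 1)
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])

rows, truncated = run_capped(conn, "SELECT n FROM t ORDER BY n", max_rows=10)
```

Here `rows` holds the first ten values and `truncated` is `True`, which lets the agent tell the user that more rows exist without loading them into the context.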
What's Working Well

Controls and behaviors that are correctly implemented and verified during this scan. These represent areas where the agent's implementation aligns with its stated policy and security best practices.

Maintainer security warning surfaced on the public entry point

The `create_sql_agent` docstring carries an explicit `!!! warning` admonition stating the agent can execute arbitrary SQL and directing operators to least-privilege read-only roles, statement timeouts, and query guardrails — honest, specific disclosure of the residual risk at the exact API users call.

libs/community/langchain_community/agent_toolkits/sql/base.py:71-97

Bounded tool-call iterations prevent runaway loops

`create_sql_agent` defaults `max_iterations=15` and passes it to the `AgentExecutor`, capping the retry/tool-call loop so a malformed-query retry cycle cannot iterate indefinitely.

libs/community/langchain_community/agent_toolkits/sql/base.py:58

Result-cell truncation limits context bloat and disclosure surface

`SQLDatabase.run` truncates every result value to `max_string_length` (default 300) via `truncate_word`, bounding how much raw row content enters the LLM context and is echoed back.

libs/community/langchain_community/utilities/sql_database.py:589-595

Committed lockfile pins the dependency graph

A `uv.lock` is committed alongside `pyproject.toml`, giving reproducible resolved dependency versions despite floor-pinned ranges in the manifest.

libs/community/uv.lock
Discovered Log Files

Log files found in the agent's workspace during this scan. Reviewing these files provides runtime evidence to complement the static analysis above.

| Path | Source | Content Type | Purpose | Last Modified | Status |
| --- | --- | --- | --- | --- | --- |
| stderr/stdout (console) | AgentExecutor verbose output | plaintext console trace | Optional step-by-step agent reasoning and tool I/O, emitted only when `verbose=True` is passed to `create_sql_agent` | unknown | Inferred |
OWASP LLM Top 10 (2025) Coverage

Each card represents one category and shows its top 3 findings. All items appear in the Findings section.

OWASP Agentic Top 10 (2026) Coverage

Each card represents one category and shows its top 3 findings. All items appear in the Findings section.

ASI03 Identity and Privilege Abuse
No findings
ASI04 Agentic Supply Chain Vulnerabilities
No findings
ASI05 Unexpected Code Execution (RCE)
No findings
ASI06 Memory and Context Poisoning
No findings
ASI07 Insecure Inter-Agent Communication
No findings
ASI08 Cascading Failures
No findings
ASI09 Human-Agent Trust Exploitation
No findings
ASI10 Rogue Agents
No findings
RAISE Maturity Posture

Overall maturity assessment across the six categories of the RAISE framework. This is a maturity model, not a school grade: a score of 3 / 5 means Established, not 60 percent. Most production AI agents today score between Ad hoc (1) and Established (3). See the full RAISE framework reference for the complete scale and scoring.

1.45 / 5.0
Weighted Maturity Score · Ad hoc
Ad hoc. The agent inherits real framework scaffolding — a scoped four-tool inventory, a prompt-level domain restriction with an "I don't know" fallback, an LLM-based query checker, a `max_iterations` cap, and result truncation — but its single most consequential control, the read-only prohibition, exists only in the system prompt and is not enforced anywhere in code. Zero Trust is the weakest axis: model output flows directly into `text(command)` execution with no statement-type gate, and the library is explicit that its own logic must not be the sole enforcement point. Supply-chain hygiene is reasonable (committed `uv.lock`, versioned deps) and the maintainer's candid `!!! warning` is a genuine positive, but there is no adversarial testing and no execution-path logging.
Limit Your Domain
2 / 5
Confidence: Medium  |  Weight: 15%  |  Weighted: 0.30
The `SQL_PREFIX` prompt scopes the agent to database question-answering and instructs it to return "I don't know" for off-topic questions, and the tool inventory is narrow and matches the remit's baseline — but domain enforcement is prompt-only with no code gate, so a jailbreak reaches the full SQL surface.
Balance Your Knowledge Base
2 / 5
Confidence: Medium  |  Weight: 15%  |  Weighted: 0.30
Query results, table names, and 3 sample rows per table flow into the LLM context via `get_table_info`/`run` with no labeling of retrieved row content as untrusted, so injected text in row data is indistinguishable from instructions; `max_string_length=300` truncation is a partial mitigation, not a trust boundary.
Implement Zero Trust
1 / 5
Confidence: High  |  Weight: 25%  |  Weighted: 0.25
`SQLDatabase.run` passes model-generated SQL straight into `text(command)` inside a committing `self._engine.begin()` transaction with no DML/DDL parser and no read-only enforcement — the only interposition is the LLM-based `sql_db_query_checker`, which checks for correctness, not for prohibited statement types.
Manage Your Supply Chain
2 / 5
Confidence: Medium  |  Weight: 15%  |  Weighted: 0.30
A `uv.lock` is committed and core deps are version-ranged (`langchain-core>=1.4.0,<2.0.0`), but `SQLAlchemy>=1.4.0,<3.0.0` is floor-pinned across two major versions in `pyproject.toml`, widening the resolved-version surface despite the lockfile.
Build an AI Red Team
1 / 5
Confidence: Medium  |  Weight: 15%  |  Weighted: 0.15
Unit tests exist for the SQL utilities and agent, but none are adversarial — `test_sql_database_run_update` actually exercises a successful `UPDATE` through `run()` rather than asserting that writes are rejected, confirming there is no test-driven enforcement of the read-only contract.
Monitor Continuously
1 / 5
Confidence: High  |  Weight: 15%  |  Weighted: 0.15
There are no `logging`/logger calls anywhere in `utilities/sql_database.py`, so executed SQL, errors (swallowed into return strings by `run_no_throw`), and tool invocations leave no durable audit trail; observability depends entirely on optional `verbose` console output and server-side DB logging the library does not configure.
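A durable audit trail can be added without touching the library by wrapping whatever callable executes the query. The sketch below uses the standard `logging` module; `with_audit` and the logger name are hypothetical, and the wrapped callable is assumed to take a SQL string and return a string.

```python
import logging
from typing import Callable

audit_log = logging.getLogger("sql_agent.audit")  # hypothetical logger name


def with_audit(run: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a query-running callable so each call and failure is logged."""

    def audited(sql: str) -> str:
        audit_log.info("executing sql=%r", sql)
        try:
            result = run(sql)
        except Exception:
            # Log with traceback, then re-raise so behavior is unchanged.
            audit_log.exception("sql failed sql=%r", sql)
            raise
        audit_log.info("completed sql=%r result_chars=%d", sql, len(result))
        return result

    return audited
```

Note that a callable which swallows errors into its return string, as the report describes for `run_no_throw`, would be logged as completed here; detecting those failures would need inspection of the returned text.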

Maturity Scoring Rubric

Every score above is based on this scale. A score is a snapshot of observable posture — not a verdict on the people or team behind the system.

Score Label Meaning
5 Exemplary Best-in-class; automated, continuously tested, reference quality. Rarely achieved in shipping systems.
4 Strong Comprehensive controls, active management, minor gaps. Production-ready.
3 Established Documented controls consistently applied; known gaps accepted. A respectable baseline.
2 Partial Some controls exist but coverage is incomplete; key gaps remain.
1 Ad hoc Informal or inconsistent measures; relies on individual judgment.
0 Absent No evidence this category is addressed at all.
Weighting: the weighted overall above is the sum of each category's score × weight (the per-category weights are shown on each card). Zero Trust carries the largest weight by design (25%, against 15% for each of the other five categories); see the RAISE framework reference for the rationale.
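The overall figure can be reproduced from the per-category scores and weights shown on the cards. The short calculation below is only a worked check of that arithmetic, with weights held as integer percentages to avoid floating-point drift.

```python
# (score, weight in percent) per RAISE category, copied from the cards above.
categories = {
    "Limit Your Domain": (2, 15),
    "Balance Your Knowledge Base": (2, 15),
    "Implement Zero Trust": (1, 25),
    "Manage Your Supply Chain": (2, 15),
    "Build an AI Red Team": (1, 15),
    "Monitor Continuously": (1, 15),
}

total_weight = sum(weight for _, weight in categories.values())
weighted_score = sum(score * weight for score, weight in categories.values()) / 100

print(total_weight)    # 100
print(weighted_score)  # 1.45
```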