How we audit every skill

Every skill audited before it reaches your desk.

AI agents with the wrong instructions can leak data, make unauthorized calls, or act outside their stated purpose. We audit every skill in the catalog against a 5-dimension rubric with scanner evidence, semantic review, and a written rationale for every score.

Why this matters for CRE

A rent roll has tenant PII. A T-12 has a sponsor’s financials. An IC memo carries privileged investment views. When a skill touches these artifacts, you are extending your fiduciary surface to the agent that runs it.

A skill with the wrong instructions does not just produce bad output. It can leak data to undisclosed endpoints, read files outside its stated scope, or accept override phrases from untrusted input. The controls that catch these cases before a skill ships are the same controls your IC would ask about in diligence.

The rubric

Five weighted dimensions. Each scored 1 to 5 with a written rationale. Rolled up to a single overall score and a verdict.

Dimension	Weight	What it asks	Score 5 looks like
Purpose & Capability	3	Does the skill's actual behavior match its stated purpose?	Every network call, file operation, and tool request serves the stated task. No hidden capabilities.
Instruction Scope	3	Can the prompt be weaponized or redirected?	Tight scope, explicit guardrails, no hidden directives, no Unicode or base64 tricks.
Install Mechanism	2	Are the dependencies safe and reasonable?	Standard library only, or well-known tools installed from trusted sources.
Credentials	2

Overall score

Each dimension scores 1 to 5. The overall score is their weighted average, computed as:

Overall=

3 × Purpose & Capability

+3 × Instruction Scope

+2 × Install Mechanism

+2 × Credentials

+1 × Persistence & Privilege

Verdict thresholds

≥ 4.0Verified3.0 to 3.99Caution

What goes into an audit

Four layers per skill, running every time a skill is added or changed.

Scanner layer

Static code analysis, behavioral review of any bundled code, prompt-injection detection, and secrets scanning. All local, no outbound calls.

Code analysis via Cisco AI Skill Scanner, checking every bundled file for known-bad patterns and integrity issues.
Behavioral analysis of any bundled code, tracing how inputs flow through it.
Detection for prompt-injection patterns, multi-agent attack patterns, invisible payloads, PII harvesting, and provider secrets.
Secrets scanning via Trufflehog. Any findings are redacted before the audit record is stored.

Semantic review

What scanners cannot catch. An LLM auditor reads the skill in full and applies four checks the scanner cannot judge.

Purpose mismatch: any network call, subprocess, or file access that does not serve the stated workflow

What each verdict means

Three verdicts. The threshold determines what happens to the skill in the catalog.

Overall ≥ 4.0

Verified

All five dimensions pass. The skill ships in the catalog.

3.0 to 3.99

Caution

Has findings but not dangerous. Ships in the catalog. The audit trail notes what the reviewer found so a user can evaluate for themselves.

< 3.0

Flagged

Significant concerns. Excluded from the public catalog until remediated. Any Flagged count in the metrics below reflects audits that have not been cleared for listing.

The catalog by the numbers

Live from the audit table. Updated every time a skill is audited or re-audited.

Skills audited

296

Verified

94.3%

279 of 296

Caution

5.7% of catalog

Flagged

excluded from catalog

About Caution: Caution skills have findings that warrant user awareness but do not indicate danger. They ship in the catalog alongside Verified skills so you can evaluate the specifics in the audit trail.

Last audit: May 7, 2026

One audit, end to end

Every adjective in the rubric is backed by specific written text. Here is a real audit row rendered as it appears in the database.

Example

deal-quick-screen

Verified

Overall

5/5

high confidence

Purpose & Capability

The bundled Python script (quick_screen.py) imports only argparse, json, sys, and typing, reads JSON input via --json or stdin, and prints a JSON result to stdout. No network calls, no file I/O, no subprocess usage. Every computation directly serves the stated deal-screening workflow (cap rate, DSCR, IRR scenarios, KEEP/KILL verdict).

Instruction Scope

Instructions are tightly scoped to CRE deal triage. The SKILL.md includes an explicit When to Activate section with negative triggers (do NOT trigger for full underwriting, education, portfolio analysis), a detailed Red Flags and Failure Modes section, and documented Chain Notes. No override phrases, no hidden Unicode, no base64 payloads, no references to system files.

Scope and limitations

This program reviews every skill in the catalog against our rubric. It is not a substitute for your own security review.

Audits cover the exact copy of the skill shipped in our catalog. That copy is static and does not change after publication. If you pull a skill from its upstream repository instead, you bypass both the catalog copy and its audit.
How a skill behaves depends on the host agent you run it in, the inputs you give it, and the environment around it. Your operational context matters.
Skills are made available without warranty. You remain responsible for deciding which skills to install, how to isolate them, and how to handle sensitive client and portfolio data when using them.

Custom skills go through the same audit.

Anything MetaProp Labs builds for your firm is audited with the same rubric and the same pipeline. The trail lives in the same place the catalog audits live.

Learn about Custom Skills

How we audit every skill

Every skill audited before it reaches your desk.

Why this matters for CRE

The rubric

Five weighted dimensions. Each scored 1 to 5 with a written rationale. Rolled up to a single overall score and a verdict.

Dimension	Weight	What it asks	Score 5 looks like
Purpose & Capability	3	Does the skill's actual behavior match its stated purpose?	Every network call, file operation, and tool request serves the stated task. No hidden capabilities.
Instruction Scope	3	Can the prompt be weaponized or redirected?	Tight scope, explicit guardrails, no hidden directives, no Unicode or base64 tricks.
Install Mechanism	2	Are the dependencies safe and reasonable?	Standard library only, or well-known tools installed from trusted sources.
Credentials	2

Overall score

Each dimension scores 1 to 5. The overall score is their weighted average, computed as:

Overall=

3 × Purpose & Capability

+3 × Instruction Scope

+2 × Install Mechanism

+2 × Credentials

+1 × Persistence & Privilege

Verdict thresholds

≥ 4.0Verified3.0 to 3.99Caution

What goes into an audit

Four layers per skill, running every time a skill is added or changed.

Scanner layer

Static code analysis, behavioral review of any bundled code, prompt-injection detection, and secrets scanning. All local, no outbound calls.

Code analysis via Cisco AI Skill Scanner, checking every bundled file for known-bad patterns and integrity issues.
Behavioral analysis of any bundled code, tracing how inputs flow through it.
Detection for prompt-injection patterns, multi-agent attack patterns, invisible payloads, PII harvesting, and provider secrets.
Secrets scanning via Trufflehog. Any findings are redacted before the audit record is stored.

Semantic review

What scanners cannot catch. An LLM auditor reads the skill in full and applies four checks the scanner cannot judge.

Purpose mismatch: any network call, subprocess, or file access that does not serve the stated workflow

What each verdict means

Three verdicts. The threshold determines what happens to the skill in the catalog.

Overall ≥ 4.0

Verified

All five dimensions pass. The skill ships in the catalog.

3.0 to 3.99

Caution

Has findings but not dangerous. Ships in the catalog. The audit trail notes what the reviewer found so a user can evaluate for themselves.

< 3.0

Flagged

Significant concerns. Excluded from the public catalog until remediated. Any Flagged count in the metrics below reflects audits that have not been cleared for listing.

The catalog by the numbers

Live from the audit table. Updated every time a skill is audited or re-audited.

Skills audited

296

Verified

94.3%

279 of 296

Caution

5.7% of catalog

Flagged

excluded from catalog

Last audit: May 7, 2026

One audit, end to end

Every adjective in the rubric is backed by specific written text. Here is a real audit row rendered as it appears in the database.

Example

deal-quick-screen

Verified

Overall

5/5

high confidence

Purpose & Capability

Instruction Scope

Scope and limitations

This program reviews every skill in the catalog against our rubric. It is not a substitute for your own security review.

Audits cover the exact copy of the skill shipped in our catalog. That copy is static and does not change after publication. If you pull a skill from its upstream repository instead, you bypass both the catalog copy and its audit.
How a skill behaves depends on the host agent you run it in, the inputs you give it, and the environment around it. Your operational context matters.
Skills are made available without warranty. You remain responsible for deciding which skills to install, how to isolate them, and how to handle sensitive client and portfolio data when using them.

Custom skills go through the same audit.

Anything MetaProp Labs builds for your firm is audited with the same rubric and the same pipeline. The trail lives in the same place the catalog audits live.

Learn about Custom Skills

Every skill audited before it reaches your desk.

Why this matters for CRE

The rubric

What goes into an audit

Scanner layer

Semantic review

What each verdict means

Verified

Caution

Flagged

The catalog by the numbers

One audit, end to end

deal-quick-screen

Scope and limitations

Custom skills go through the same audit.

Every skill audited before it reaches your desk.

Why this matters for CRE

The rubric

What goes into an audit

Scanner layer

Semantic review

What each verdict means

Verified

Caution

Flagged

The catalog by the numbers

One audit, end to end

deal-quick-screen

Scope and limitations

Custom skills go through the same audit.

Scoring and narrative

Integrity controls