The AI supply chain: Your biggest unmanaged risk
Posted by Abhijit Kharat
We spend enormous energy securing application layers, APIs, and cloud infrastructure. But the AI supply chain (the models, datasets, weights, and dependencies sitting beneath your product) remains almost entirely unaudited. The JFrog 2026 Software Supply Chain Security State of the Union, a vendor report drawing on 18.2 billion artifacts and a survey of 1,508 security professionals, found that 97% of organisations claim some form of AI model governance. At the same time, 53% self-host models sourced from public registries where 495 malicious AI models were identified. Self-hosting is not inherently the problem. Doing it without scanning, vetting, or provenance tracking is, and that gap is where attackers are already operating.
Third-party models, open-source weights, and the provenance problem
Most teams pull models from Hugging Face or GitHub without questioning what they are inheriting. AI model provenance — the documented lineage of a model including where it was trained, on what data, and by whom — is the foundation of LLM supply chain security, and it is being skipped entirely.
What makes open-source weights risky?
Serialised model files can carry embedded malicious payloads that clear standard security scans and may activate only under specific conditions in production. The risk is most acute in pickle-based formats, because deserialising a pickle can execute arbitrary code: approximately 95% of malicious models identified on Hugging Face used PyTorch's pickle-based serialisation. Safer alternatives such as safetensors, which stores raw tensor data and does not execute code on load, significantly reduce this attack surface and should be the default where available. OWASP's Top 10 for LLM Applications covers this risk under LLM03:2025 Supply Chain, which includes untrusted model sourcing, compromised training data, and vulnerable third-party components among its core risk categories. Before deploying any third-party model, confirm:
- Who published this model and what is their verifiable identity?
- Is there a training data disclosure or model card?
- Has the model been evaluated for backdoor behaviours?
Without affirmative answers, you are absorbing unquantified AI supply chain risk into a system your customers trust.
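Python's own `pickle` documentation (the "Restricting Globals" section) describes one defence for pickle-based files: subclass `pickle.Unpickler` and override `find_class` so that only an allowlist of classes can ever be resolved. The sketch below applies that idea. The allowlist contents and the names `AllowlistUnpickler` and `restricted_loads` are illustrative assumptions, and a real PyTorch checkpoint would need a much larger allowlist plus handling for its tensor storage. It is not a substitute for dedicated scanners such as picklescan or ModelScan, or for moving to safetensors.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs this loader may resolve.
# Extend it only with classes you have reviewed; anything else is rejected.
SAFE_GLOBALS = {
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global not listed in SAFE_GLOBALS."""

    def find_class(self, module: str, name: str):
        # Called for every global the pickle stream references, so a payload
        # such as os.system is rejected before it can be invoked.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialise untrusted pickle bytes with the allowlist enforced."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    import os
    from collections import OrderedDict

    class Payload:
        # Mimics a malicious checkpoint: unpickling it would call os.system.
        def __reduce__(self):
            return (os.system, ("echo this should never run",))

    print(restricted_loads(pickle.dumps(OrderedDict(layer=1))))  # allowed
    try:
        restricted_loads(pickle.dumps(Payload()))
    except pickle.UnpicklingError as err:
        print("rejected:", err)
```

For PyTorch checkpoints specifically, `torch.load` accepts a `weights_only=True` argument built on the same allowlisting principle, which is usually the more practical option than maintaining your own list.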
Fine-tuning pipeline risks — data poisoning entering through training data sources
Fine-tuning is where many teams feel safest. You control the process — except you often do not control where the data came from, and that is the problem.
How data poisoning attacks work
Data poisoning attacks inject malicious samples into training data before or during fine-tuning. The model carries the corrupted behaviour into production and passes standard evaluations until an input containing the trigger arrives. Scraped datasets, third-party vendors, and public repositories are all common data sources, and each is a potential entry point. The minimum mitigation bar:
- A hash-verified, version-controlled dataset registry
- Anomaly detection across training batches, bearing in mind that no single control reliably catches subtle poisoning
- Red-team evaluations designed to surface trigger-based behaviours
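The first control, a hash-verified dataset registry, can start as small as a manifest of SHA-256 digests that is committed to version control alongside each dataset release and re-checked before every fine-tuning run. The sketch below assumes each dataset version lives as files under one directory; `build_manifest` and `verify_manifest` are hypothetical helpers, not part of any particular tool. A matching manifest proves the data has not changed since it was recorded, not that it was clean when recorded, so it complements anomaly detection and red-teaming rather than replacing them.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's relative POSIX-style path to its SHA-256 digest."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_file(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the data matches."""
    current = build_manifest(data_dir)
    problems = []
    for name, expected in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != expected:
            problems.append(f"modified: {name}")
    # Files added after the manifest was recorded are also suspicious.
    for name in sorted(current.keys() - manifest.keys()):
        problems.append(f"unexpected: {name}")
    return problems
```

In a pipeline, the output of `build_manifest` would be saved as JSON when a dataset version is approved, and the fine-tuning job would abort if `verify_manifest` returns anything.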
Embedding and vector database security — an overlooked attack surface
Vector databases power most RAG-based AI applications, yet they are rarely treated as security-critical infrastructure. That needs to change: this attack surface sits entirely outside the model layer, so controls such as scanning weights or vetting model publishers do nothing to protect it.
Why this is an infrastructure problem, not a model problem
If an attacker injects malicious content into documents being embedded, those poisoned embeddings are stored and retrieved at inference time. The model responds based on what is retrieved and never "knows" the data is compromised. Hardening this means:
- Validating all input before embedding
- Enforcing access controls on retrieval APIs
- Isolating pipelines that ingest public or unverified data
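Two of those controls can be sketched in a few lines, assuming each stored chunk carries metadata recorded at ingestion time: which groups may read it and whether its source was trusted. The `Chunk` fields, `validate_for_embedding`, and `authorised_results` are hypothetical names. Many vector databases support metadata filtering that can enforce the same rule server-side, which is preferable to filtering in application code. The character stripping targets invisible control and format characters sometimes used to hide injected instructions; it is a narrow hygiene step and will not catch instructions written in plain, visible text.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A stored chunk plus hypothetical metadata recorded at ingestion time."""
    text: str
    source: str                                   # origin of the document
    trusted: bool                                 # False for public or unverified feeds
    allowed_groups: frozenset[str] = frozenset()  # groups permitted to retrieve it


def validate_for_embedding(text: str, max_chars: int = 20_000) -> str:
    """Basic hygiene before embedding: drop invisible characters, enforce limits.

    Removes control (Cc) and format (Cf) characters except newline, tab and
    carriage return. Cf includes zero-width characters that can hide injected
    text; note it also removes zero-width joiners used in some scripts and emoji.
    """
    cleaned = "".join(
        ch for ch in text
        if ch in "\n\t\r" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    if not cleaned.strip():
        raise ValueError("document is empty after cleaning")
    if len(cleaned) > max_chars:
        raise ValueError(f"document exceeds {max_chars} characters")
    return cleaned


def authorised_results(
    candidates: list[Chunk], user_groups: set[str], include_untrusted: bool = False
) -> list[Chunk]:
    """Filter vector-search candidates before they are placed in a prompt.

    A chunk is kept only if the caller shares at least one group with it and,
    unless explicitly requested, only if it came from a trusted source.
    """
    return [
        c for c in candidates
        if c.allowed_groups & user_groups and (c.trusted or include_untrusted)
    ]
```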
Vector database security belongs in your threat model today — not after your first RAG-related incident.
Dependency risk — when your AI app inherits vulnerabilities from foundation model providers
Your AI application sits on top of a foundation model provider, an orchestration framework, and a web of third-party AI service dependencies. Every layer is a potential entry point, and most are outside your direct control.
The transitive risk problem in practice
This is AI dependency risk in its most consequential form: you did not introduce the vulnerability, but you own the consequence. When your provider pushes a model update, your application's behaviour shifts. When their infrastructure carries a CVE, you inherit the exposure. Cisco security research in 2026 specifically highlighted agentic AI and MCP server integrations as the sharpest emerging dependency risk — where agents operating with elevated permissions inherit trust from every connected provider. Treat foundation model providers — OpenAI, Anthropic, or any open-weight alternative — with the same third-party risk rigour as any critical vendor.
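One concrete way to apply that rigour is to refuse floating references, so that a provider update becomes a reviewed change rather than a silent one. The sketch below assumes a hypothetical inventory in which each model, framework, or MCP server is recorded with an explicit version or dated snapshot; the `AIDependency` type and the set of floating markers are illustrative, not any provider's naming scheme. A check like this could run in CI and fail the build whenever it returns a non-empty list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIDependency:
    """One entry in a hypothetical AI dependency inventory."""
    name: str               # provider model, framework, or MCP server
    kind: str               # illustrative categories: "model", "library", "service"
    version: Optional[str]  # pinned version or dated snapshot; None means floating


# Illustrative markers meaning "whatever the provider serves today".
FLOATING_MARKERS = {"", "latest", "stable", "*"}


def unpinned(deps: list[AIDependency]) -> list[str]:
    """Names of dependencies whose behaviour can change without a reviewed update."""
    return [
        d.name
        for d in deps
        if d.version is None or d.version.strip().lower() in FLOATING_MARKERS
    ]
```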
Building an AI Bill of Materials (AI-BOM) — what it should include
An AI Bill of Materials (AI-BOM) is a structured inventory of every component in your AI system — models, datasets, fine-tuning records, dependencies, and their provenance. It captures what a traditional SBOM was never designed to cover: the data and model layers that define how your AI actually behaves. A production-ready AI-BOM must cover:
- Model inventory — name, version, source, architecture, cryptographic hash
- AI model provenance — training data sources, lineage, data ownership
- Fine-tuning records — datasets used, modification history, validation results
- Dependency graph — frameworks, libraries, all third-party AI services
- Embedding components — vector database, embedding model, chunking strategy
- Known vulnerabilities — CVEs and patch status across all components
CycloneDX ML-BOM v1.7 and the SPDX 3.0 AI Profile are the two production-ready formats in 2026. Start with one, and start with your highest-risk system. A strategy deck will not satisfy an auditor; a concrete document will.
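A first AI-BOM does not need dedicated tooling. The sketch below emits a minimal CycloneDX-style JSON document using the `machine-learning-model` and `data` component types that CycloneDX introduced in version 1.5. The function name, the `bom-ref` naming scheme, and the choice to express training datasets as dependencies of the model are assumptions made for illustration. It covers only the model inventory and dataset hashes from the list above, and its output should be validated against the official CycloneDX schema for the spec version you target before anyone relies on it.

```python
import hashlib
import json


def ai_bom(model_name, model_version, model_sha256, datasets, spec_version="1.6"):
    """Build a minimal CycloneDX-style AI-BOM as a Python dict.

    datasets is a list of (name, sha256_hex) pairs for the training data.
    spec_version is a parameter: set it to the version your validator supports.
    """
    model_ref = f"model:{model_name}@{model_version}"  # hypothetical bom-ref scheme
    components = [{
        "type": "machine-learning-model",
        "bom-ref": model_ref,
        "name": model_name,
        "version": model_version,
        "hashes": [{"alg": "SHA-256", "content": model_sha256}],
    }]
    dataset_refs = []
    for ds_name, ds_sha256 in datasets:
        ref = f"data:{ds_name}"
        dataset_refs.append(ref)
        components.append({
            "type": "data",
            "bom-ref": ref,
            "name": ds_name,
            "hashes": [{"alg": "SHA-256", "content": ds_sha256}],
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": spec_version,
        "version": 1,
        "components": components,
        # Modelling choice for this sketch: the model "depends on" its training data.
        "dependencies": [{"ref": model_ref, "dependsOn": dataset_refs}],
    }


if __name__ == "__main__":
    # Placeholder digests computed from dummy bytes, for illustration only.
    bom = ai_bom(
        "support-classifier", "2.1.0",
        hashlib.sha256(b"model weights").hexdigest(),
        [("tickets-2025-q1", hashlib.sha256(b"dataset").hexdigest())],
    )
    print(json.dumps(bom, indent=2))
```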
Closing the AI governance gap starts now
AI supply chain security is not a future concern. Build your AI-BOM. Enforce AI model provenance. Audit your data sources. Treat every foundation model dependency as the privileged integration it is. The organisations that cannot answer where their models came from are running blind — in a threat landscape that has already mapped their exposure.
If you need support securing your AI supply chain — from model provenance and fine-tuning pipeline governance to AI-BOM implementation — reach out to Opcito's experts now.