
Sovereign AI in Australia: what actually changes.

"Sovereign AI" is a label that gets used loosely. Sometimes it means AU-region inference. Sometimes it means in-VPC open-weight models. Sometimes it means IRAP-assessed environments. The three are not the same. This piece is the architectural and procurement decisions an Australian operator actually has to make, ordered by classification and risk appetite. The companion sovereign AI Australia page has the structured comparison; this is the long-form context.

What "sovereign" actually means

Sovereign AI in the Australian context is a set of architectural and operational decisions. Three things have to be true simultaneously:

  1. Data residency. Source data, embeddings, vector indexes, prompt history, and model outputs are stored in AU regions only. Backups and replication targets also AU.
  2. Inference locality. Model inference is performed inside Australia. No cross-border routing for the LLM call.
  3. Operational control. Privileged access is subject to Australian law. Personnel with admin access are in AU jurisdiction or have appropriate clearances. Vendor support pathways are structured so foreign access cannot be invoked silently.

If any of those three are false, the deployment is not sovereign in the way an IRAP assessor or a serious procurement team would recognise.
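A minimal way to make the three conditions concrete is to treat them as a go/no-go checklist evaluated before launch. This is a sketch only: the SovereigntyCheck type and its field names are illustrative, not drawn from any framework.

```python
from dataclasses import dataclass

@dataclass
class SovereigntyCheck:
    """Illustrative checklist for the three sovereignty conditions."""
    data_in_au_regions: bool       # source data, embeddings, indexes, logs, backups
    inference_in_au: bool          # no cross-border routing for the LLM call
    au_operational_control: bool   # privileged access subject to Australian law

    def is_sovereign(self) -> bool:
        # All three must hold simultaneously; a single failure breaks the claim.
        return all((self.data_in_au_regions,
                    self.inference_in_au,
                    self.au_operational_control))
```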

The five frameworks that drive the work

You do not design sovereign AI against a single rule. You design against a stack of overlapping frameworks. The big five for AU operators:

1. ISM (Information Security Manual)

Published by the Australian Signals Directorate. Defines the controls that IRAP-endorsed assessors evaluate for systems handling OFFICIAL, OFFICIAL: Sensitive, PROTECTED, and above. Updated quarterly. If your deployment is going through IRAP assessment, the ISM is the rulebook.

2. PSPF (Protective Security Policy Framework)

Applies to non-corporate Commonwealth entities. Defines information classification handling (OFFICIAL through TOP SECRET), personnel security (Baseline, NV1, NV2, PV), and physical security. Where the ISM tells you the technical controls, the PSPF tells you the organisational ones.

3. Essential Eight

Mitigation strategies published by the ASD, assessed at Maturity Levels 1 through 3. Application control, patching, restriction of administrative privileges, MFA, daily backups, and so on. ML2 is the default sane baseline for any AU mid-market deployment. ML3 is appropriate for government, defence, regulated financial entities, and high-risk targets.

4. APRA CPS 234 and CPS 230

Applies to APRA-regulated financial entities (banks, insurers, super funds). CPS 234 is the information security prudential standard. CPS 230 is the broader operational risk standard. Both apply to AI deployments that touch material risk processes; both require supplier risk assessment and incident notification.

5. Australian Privacy Act and APP

Applies to all AU businesses with annual turnover above $3M (and many under, especially in regulated sectors). APP 8 specifically governs cross-border disclosure of personal information, which is the clause that makes the difference between a US-hosted LLM call and an AU-hosted one. APP 11 governs the security of personal information generally.

There are other frameworks (My Health Records Act for healthcare, SOCI Act for critical infrastructure, DISP for defence industry, TSSR for telco), but most sovereign AI engagements come down to which combination of these five applies.

The four architectural patterns and when each fits

Sovereign AI in practice resolves to one of four patterns. Picking the right one for your classification and risk appetite is the most important decision in the design.

Pattern A — AU-region commercial endpoint, no extras

Inference routed to Azure OpenAI Australia East, AWS Bedrock ap-southeast-2 (Claude, Llama, Mistral, Titan), or Google Vertex AI australia-southeast1. Data stays in AU during the inference call. Vendor's enterprise terms apply (no training on customer data, AU-region routing, audit logging).

Fits: OFFICIAL workloads, most APRA-regulated workloads, commercial deployments where vendor risk is acceptable.

Cost impact: Roughly 1.0x to 1.2x a non-sovereign deployment. Vendor commercial APIs are slightly more expensive than their consumer tiers, and audit logging adds some overhead.

Watch out for: The vendor is operating under foreign law. The US CLOUD Act, for example, allows US authorities to compel a US-headquartered provider to disclose data regardless of where it's stored. For OFFICIAL data this is usually an acceptable risk. For PROTECTED data it is not.
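As a sketch of what Pattern A looks like in code, the call below pins AWS Bedrock's Converse API to ap-southeast-2 so the inference request stays in the Sydney region. The model ID is illustrative; confirm what the AU-region catalogue actually offers, and remember that enterprise terms (no training on customer data, audit logging) are a contractual matter, not a code setting.

```python
import boto3

# Pinning the client to ap-southeast-2 keeps the inference call inside AU.
client = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative; confirm AU availability
    messages=[{"role": "user", "content": [{"text": "Summarise this policy document."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```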

Pattern B — Open-weight model in your VPC

Open-weight models (Llama 3.x and 4.x, Mistral, Mixtral, Qwen, GPT-OSS, Gemma) deployed on GPU instances inside your AU-region VPC. Inference served via Ollama, vLLM, or TensorRT-LLM. No third-party endpoint involved.

Fits: PROTECTED workloads, sensitive defence work, IP-sensitive engagements, any workload where the foreign-jurisdiction risk on Pattern A is unacceptable.

Cost impact: Roughly 1.5x to 2.5x a Pattern A deployment. Most of the cost is GPU capacity (commonly 4-8 A100 or H100 GPUs, depending on model size and concurrency), plus ongoing operations for model updates and inference reliability.

Watch out for: Capability gap. Open-weight models in 2026 are strong, but frontier models (GPT-5, Claude Opus 4.x) generally outperform them on complex reasoning, long-context tasks, and code generation. For most operational AI workloads this gap is acceptable. For frontier reasoning workloads it is not.
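A common Pattern B setup is vLLM serving an open-weight model behind its OpenAI-compatible API on a private address inside the VPC. A minimal client-side sketch, assuming vLLM is already running; the hostname and model name are illustrative:

```python
from openai import OpenAI

# A private endpoint inside the AU-region VPC; no third party sees the request.
client = OpenAI(
    base_url="http://llm.internal.example:8000/v1",  # illustrative internal address
    api_key="unused",  # vLLM ignores the key unless auth is configured
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # must match the model vLLM was launched with
    messages=[{"role": "user", "content": "Summarise this policy document."}],
)
print(response.choices[0].message.content)
```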

Pattern C — IRAP-assessed sovereign cloud

Deployment to an IRAP-assessed environment: AWS PROTECTED-assessed services in ap-southeast-2, Azure Australia Central (Canberra) with a current PROTECTED IRAP assessment, or Vault Cloud (a PROTECTED-certified sovereign cloud). Both the models and the surrounding infrastructure sit inside the assessment boundary.

Fits: PROTECTED-classified workloads where IRAP assessment is required. Government agencies, defence industry primes, contracts that mandate IRAP-assessed environments.

Cost impact: Roughly 2.0x to 3.5x a Pattern A deployment. Sovereign cloud capacity is more expensive than commercial cloud. The architecture, controls map, and evidence work add to the build cost.

Watch out for: Available services are a subset of commercial cloud. If your design depends on the latest commercial-only service, it won't be there. Plan the architecture against the assessed service catalogue, not against your existing AWS or Azure environment.

Pattern D — Hybrid, with data classification routing

Sensitive workloads on Pattern B or C, less sensitive workloads on Pattern A. Routing logic tied to data classification. Common where an operator has a mix of PROTECTED and OFFICIAL workloads and wants the cost-efficient pattern for each.

Fits: Organisations with mixed classifications. Federal agencies running both public-facing and internal workloads. Financial entities with both retail customer data (APP-bound) and institutional/wholesale data (less sensitive).

Cost impact: Depends on the mix. The routing layer itself adds 5-15%, but each workload pays its own pattern cost.

Watch out for: Classification creep. If sensitive data leaks into a less-sensitive routing path because of a bug or a classification mistake, you have a real incident. The routing logic needs strong access controls and audit trails.
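A sketch of the routing idea is below. The classification names come from the PSPF; everything else (the stub handlers, the enum) is illustrative. The points that matter are the fail-closed default and the audit log, not the dispatch itself.

```python
import logging
from enum import Enum

log = logging.getLogger("classification-router")

class Classification(Enum):
    OFFICIAL = 1
    OFFICIAL_SENSITIVE = 2
    PROTECTED = 3

def call_commercial_au_endpoint(prompt: str) -> str:
    ...  # Pattern A path: AU-region commercial endpoint (stub)

def call_in_vpc_model(prompt: str) -> str:
    ...  # Pattern B/C path: in-VPC or IRAP-assessed model (stub)

def route(classification: Classification, prompt: str) -> str:
    # Every routing decision is logged so classification creep is auditable.
    log.info("routing request classified %s", classification.name)
    # Fail closed: only plain OFFICIAL traffic touches the commercial endpoint.
    if classification is Classification.OFFICIAL:
        return call_commercial_au_endpoint(prompt)
    return call_in_vpc_model(prompt)
```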

Classification handling: what changes at each tier

  • OFFICIAL. Pattern A is commonly acceptable. AU-region commercial endpoints with vendor enterprise terms. Audit logging required. Verification through your internal security team.
  • OFFICIAL: Sensitive. Pattern A acceptable in many cases, Pattern B preferred where the data is particularly sensitive (PII, financial detail, legal privilege). IRAP-aware architecture starting here even if not formally assessed.
  • PROTECTED. Pattern B or Pattern C. Pattern A only where the specific endpoint has current PROTECTED IRAP assessment. Formal IRAP assessment commonly required. Evidence trail is the deliverable, not an afterthought.
  • SECRET and above. Pattern C, scoped on a case-by-case basis. Appropriately cleared personnel. Physical security review. Infrastructure on or connected to ASD-managed networks. Engaged through a DISP-cleared prime or in a cleared environment.

What an IRAP assessment is actually testing

Operators often arrive at sovereign AI conversations with a vague sense that "IRAP" means "approved by the government". It's more specific than that.

An IRAP-endorsed assessor evaluates a system against the ISM controls applicable to the classification level. The assessor produces a report. The report goes to your authorising officer, who decides whether to authorise the system to operate at the classification level. IRAP itself does not approve or deny systems. It produces the evidence on which authorising officers decide.

Practical implication: there's no "IRAP certified" stamp. The system gets assessed, the assessor describes residual risks, and the authorising officer makes a call. A well-prepared architecture and evidence trail make that call easy. A poorly prepared one means months of remediation work.

Model selection inside a sovereign deployment

The market for open-weight models that fit Pattern B has moved fast over the last 18 months. Current state of play for AU operators:

  • Llama 3.x and 4.x (Meta). Strong general capability, in sizes from 8B up to 405B parameters. Common default for in-VPC deployment.
  • Mistral and Mixtral. Mixture-of-experts variants are strong on efficiency. Useful for cost-sensitive deployments.
  • GPT-OSS (OpenAI). Open-weight models released by OpenAI. Strong default for in-VPC deployment, particularly the mid-size variants.
  • Qwen 2.5 and 3 (Alibaba). Strong multilingual capability. The sovereignty position rests on where inference runs (your VPC), not on the weights' origin. Some procurement teams flag Chinese-origin models regardless; have the conversation early.
  • Gemma (Google). Strong on small-to-mid sizes. Useful for constrained-resource and edge deployments.

Model choice is configuration, not architecture, in a well-designed deployment. Swapping the underlying model should take hours, not months. If your sovereign AI partner is locking you into a specific model in a way that makes future swaps expensive, that's a design flaw to push back on.
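One way to keep model choice at the configuration layer is to hold the pin in a single place the serving layer reads, so a swap is a config edit plus a redeploy rather than a rebuild. The entries below are illustrative:

```python
# model_config.py — the one place the deployment reads its model pin from (illustrative).
MODEL_CONFIG = {
    "model_id": "meta-llama/Llama-3.1-70B-Instruct",  # pinned version, never "latest"
    "weights_uri": "s3://models-au/llama-3.1-70b/",   # AU object storage, not a foreign hub
    "served_by": "vllm",
    "max_context": 32768,
}
```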

Evidence: what an IRAP assessor will ask for

An IRAP assessment is primarily an evidence exercise. The assessor wants to see, for each applicable ISM control, either evidence that the control is implemented or a documented justification for why it doesn't apply. Common evidence artefacts:

  • Architecture diagram showing every component, network boundary, and data flow.
  • Data flow inventory showing every system the AI reads from, every system it writes to, where embeddings and logs live, where backups go.
  • Identity model documentation: how authentication, authorisation, and access scoping work end-to-end.
  • Control mapping aligning the deployment to the applicable ISM controls.
  • Audit logs for every query, every retrieved record, every action, every admin activity. Retained for the period your framework requires.
  • Change history for the infrastructure-as-code, model version pins, prompt template changes, and configuration changes.
  • Incident response runbook for likely failure modes.

If these are produced during the build, the assessment is straightforward. If they're retrofitted before the assessment, the work doubles and the assessment surfaces gaps. Plan to produce them as the build runs.
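As an illustration of the audit-log artefact, one structured record per query is the usual shape: who asked, what was retrieved, and which model answered. The schema below is ours, not an ISM-mandated one.

```python
import json
import logging
from datetime import datetime, timezone

audit = logging.getLogger("ai-audit")

def log_query(user_id: str, query: str, retrieved_ids: list[str], model_id: str) -> None:
    # One append-only JSON record per query; ship to AU-region log storage.
    audit.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "query": query,
        "retrieved_records": retrieved_ids,
        "model": model_id,
    }))
```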

Cross-border transfers: where the actual risk lives

The legal exposure on sovereign AI in Australia comes mostly from APP 8 (cross-border disclosure of personal information). The technical exposure comes from telemetry, error reporting, and third-party libraries that may quietly route data to vendor regions outside Australia.

Practical checks for any sovereign deployment:

  • Vendor telemetry — disabled or routed to AU sinks.
  • Error reporting (Sentry, Datadog, etc.) — confirmed AU-region.
  • Container image registries — pulled from AU mirrors or local registries, not vendor regions.
  • Model weights — pinned and stored in AU object storage, not loaded from foreign hubs on each deployment.
  • Backup destinations — confirmed AU and confirmed they don't replicate cross-region.
  • DNS — using AU resolvers, particularly for internal service discovery.

Most cross-border leakage on AU deployments is unintentional and lives in one of these layers, not in the core architecture. Audit them.
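As one concrete example of these checks on AWS, the sketch below confirms each backup bucket sits in ap-southeast-2 and has no replication rules configured. Bucket names are illustrative; note that get_bucket_replication raising a not-found error is the pass condition here.

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="ap-southeast-2")
BACKUP_BUCKETS = ["backups-prod-au", "vector-index-snapshots-au"]  # illustrative names

for bucket in BACKUP_BUCKETS:
    # 1. The bucket itself must live in the Sydney region.
    region = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    assert region == "ap-southeast-2", f"{bucket} is in {region}, not AU"

    # 2. No replication configuration may exist; the not-found error is a pass.
    try:
        s3.get_bucket_replication(Bucket=bucket)
        raise RuntimeError(f"{bucket} has cross-region replication configured")
    except ClientError as err:
        if err.response["Error"]["Code"] != "ReplicationConfigurationNotFoundError":
            raise  # an unrelated API failure; surface it
```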

Where to start

If you're at the start of a sovereign AI conversation, the three questions to answer before talking to any delivery partner:

  1. What classification? OFFICIAL, OFFICIAL: Sensitive, PROTECTED, or higher. The honest answer to this question is the input that drives every architectural decision.
  2. What frameworks apply? PSPF, ISM (and IRAP), APRA CPS 234, Privacy Act, sector-specific. List them. Knowing the stack helps you size the evidence work realistically.
  3. What's the risk appetite on foreign-jurisdiction commercial endpoints? If you can use Pattern A, the engagement is cheaper and faster. If you can't, Pattern B or C is the path. Decide early because it changes the entire build.

The Bedstone sovereign AI Australia page lays out the engagement structure for the build itself. The pricing page covers cost. If your situation needs a real conversation, the call prep page tells you what to bring.
