# GoComply R&D Activity Register

## Financial Year: 2024-25 (1 July 2024 - 30 June 2025)
## Entity: GoComply Pty Ltd (ACN: [TO BE COMPLETED])
## ABN: [TO BE COMPLETED]

---

## CORE R&D ACTIVITIES

### Activity 1: RAG Pipeline for Regulatory Compliance Analysis

**Activity Title:** Development of a Retrieval-Augmented Generation pipeline for automated
Australian financial regulatory compliance assessment

**Start Date:** July 2024
**End Date:** Ongoing (expected completion: December 2025)
**ANZSRC Code:** 4602 - Artificial Intelligence (Primary), 3501 - Accounting, Auditing and Accountability (Secondary)

**Hypothesis:**
A retrieval-augmented generation (RAG) system combining full-text search (FTS5) over
pre-chunked regulatory text with a large language model (Claude Sonnet) can achieve
>85% accuracy in identifying regulatory non-compliance in uploaded financial services
documents, where accuracy is measured against expert compliance officer assessments.

**Technical Uncertainty:**
- No existing system automates compliance assessment across the breadth of Australian
  financial regulations (APRA, ASIC, AUSTRAC, RBA, Privacy Act, etc.)
- It was unknown whether FTS5 retrieval over 1,813 regulatory chunks would surface
  sufficiently relevant context for LLM analysis (retrieval quality vs. chunk granularity)
- It was uncertain whether a general-purpose LLM could correctly interpret the nuanced,
  cross-referencing nature of prudential standards (e.g., CPS 230 references to CPS 232,
  CPG 230, and SPS 220)
- The optimal chunk size, overlap, and indexing strategy for regulatory text were unknown
  and required systematic experimentation
- Hallucination rates for compliance-critical outputs were unpredictable — a false positive
  could cause unnecessary remediation; a false negative could expose a client to enforcement

**Systematic Progression of Work:**
1. Literature review of existing RegTech solutions (Diligent, OneSumX, Protiviti) — none
   offered automated document scanning against Australian regulations
2. Hypothesis: FTS5 with BM25 ranking can retrieve relevant regulatory chunks given
   compliance-oriented queries extracted from uploaded documents
3. Experiment 1: Chunk sizes tested — 512, 1024, 2048 tokens. Measured retrieval precision
   at each size against a manually annotated test set of 50 compliance findings
4. Experiment 2: Query generation strategies — direct extraction vs. LLM-reformulated queries.
   Measured recall of relevant regulations
5. Experiment 3: Verification layer — LLM self-verification of findings against source text
   to reduce hallucination. Measured false positive rate before/after
6. Experiment 4: Multi-regulation cross-referencing — testing whether the pipeline could
   identify gaps across interconnected standards (e.g., CPS 230 + CPS 234 + CPG 230)
7. Evaluation against expert compliance assessments on 10 real-world policy documents
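
The retrieval layer in step 2 can be sketched with SQLite's FTS5 and its built-in BM25 `rank`. This is a minimal illustration only: the `chunks` table, its columns, and the sample clauses are assumptions paraphrased for brevity (not GoComply's actual schema or corpus), and it assumes the local SQLite build was compiled with FTS5.

```python
import sqlite3

# In-memory FTS5 index over a few illustrative clause-level chunks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(ref, body)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("CPS 230 para 27", "An APRA-regulated entity must maintain a credible "
                            "business continuity plan for critical operations."),
        ("CPS 234 para 15", "An APRA-regulated entity must maintain an information "
                            "security capability commensurate with threats."),
        ("CPG 230 para 12", "Guidance on tolerance levels for disruptions to "
                            "critical operations."),
    ],
)

def retrieve(query: str, k: int = 2):
    """Return the top-k chunks by BM25 rank (lower rank = more relevant)."""
    return conn.execute(
        "SELECT ref, body FROM chunks WHERE chunks MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()

# FTS5 treats multiple bare terms as an implicit AND, so only the clause
# containing all four terms is returned.
hits = retrieve("business continuity critical operations")
print([ref for ref, _ in hits])  # ['CPS 230 para 27']
```

In the real pipeline the queries would be the compliance-oriented queries generated in step 4 rather than hand-written strings.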

**Outcome:** [TO BE COMPLETED AT YEAR END]
- Current accuracy: ~78% (below target, further experimentation needed)
- Key learning: chunk granularity is critical — clause-level chunks outperform section-level
- Ongoing: verification layer reduces false positives by ~40% but introduces latency

**New Knowledge Generated:**
- Optimal chunk granularity for Australian regulatory text is clause-level (avg 200-400 tokens)
  rather than section-level (1000+ tokens)
- Cross-referencing between APRA standards requires explicit relationship modelling —
  BM25/FTS5 alone cannot capture regulatory interdependencies
- LLM verification layer is essential for compliance-critical outputs but must be balanced
  against response time constraints

---

### Activity 2: Algorithmic Compliance Rule Engine

**Activity Title:** Development of a deterministic compliance rule engine capable of
evaluating documents against 1,975+ regulatory rules without AI dependency

**Start Date:** July 2024
**End Date:** Ongoing
**ANZSRC Code:** 4612 - Software Engineering (Primary), 3501 - Accounting, Auditing and Accountability (Secondary)

**Hypothesis:**
A keyword-and-pattern-based rule engine can achieve >70% detection rate for regulatory
non-compliance across 100+ Australian financial regulations, providing a reliable
fallback when AI-based scanning is unavailable and a baseline for validating AI results.

**Technical Uncertainty:**
- No prior art exists for a comprehensive keyword-pattern rule engine covering the breadth
  of Australian financial regulation (APRA + ASIC + AUSTRAC + RBA + Privacy Act + ESG)
- It was unknown whether keyword patterns could reliably detect nuanced compliance gaps
  (e.g., "adequate" risk appetite vs. "documented" risk appetite — both keyword matches
  but different regulatory requirements)
- The interaction between rules across different regulatory frameworks was unpredictable —
  a document might satisfy CPS 230 keywords but miss the CPG 230 guidance interpretation
- Scaling from 21 initial rules to 1,975+ required understanding whether rule density
  would increase false positive rates beyond acceptable thresholds
- The appropriate weighting and severity assignment for rules across different regulatory
  domains required experimentation

**Systematic Progression of Work:**
1. Initial hypothesis: keyword presence/absence can serve as a proxy for compliance assessment
2. Experiment 1: 21 rules across 4 APRA standards — tested against 5 real policy documents.
   Measured precision (68%) and recall (45%). Hypothesis partially validated.
3. Experiment 2: Expanded to 200 rules with regex patterns. Precision improved (72%) but
   false positive rate increased to 35%. Required severity weighting experiments.
4. Experiment 3: Rule interaction testing — 50 rule combinations across CPS 230/234/CPG 230.
   Discovered that 15% of rules conflicted or duplicated across standards.
5. Experiment 4: Scale to 1,975 rules. Developed automated conflict detection to identify
   overlapping patterns. Achieved <5% cross-rule conflict rate.
6. Experiment 5: Severity calibration — expert review of 100 findings to calibrate
   critical/high/medium/low severity assignments across regulation types.
7. Comparative evaluation: rule engine vs. RAG pipeline on same test documents

**Outcome:** [TO BE COMPLETED AT YEAR END]
- 1,975 rules operational across 100+ regulations
- False positive rate: ~22% (target: <15%, further refinement needed)
- Rule engine serves as validation baseline for RAG pipeline findings

**New Knowledge Generated:**
- Keyword-pattern approaches work well for prescriptive regulations (CPS 234 information
  security) but poorly for principles-based regulations (CPS 220 risk management)
- Automated conflict detection between rules is essential beyond ~500 rules
- Severity calibration requires regulation-domain-specific tuning, not universal thresholds
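
The automated conflict detection noted above can be approximated by treating each rule's pattern as a keyword set and flagging pairs where one set subsumes the other. This is a simplified sketch of one plausible implementation, with hypothetical rule IDs and patterns; real regex patterns would need a more careful overlap test.

```python
from itertools import combinations

def keyword_set(pattern: str) -> frozenset:
    """Treat a keyword pattern as an order-insensitive set of terms."""
    return frozenset(pattern.lower().split())

def find_conflicts(rules):
    """Flag rule pairs whose keyword patterns duplicate or subsume one another."""
    conflicts = []
    for a, b in combinations(rules, 2):
        ka, kb = keyword_set(a[1]), keyword_set(b[1])
        if ka <= kb or kb <= ka:
            conflicts.append((a[0], b[0]))
    return conflicts

rules = [
    ("CPS230-BCP-01", "business continuity plan"),
    ("CPG230-BCP-04", "business continuity plan testing"),  # subsumes BCP-01
    ("CPS234-SEC-02", "information security capability"),
]
print(find_conflicts(rules))  # [('CPS230-BCP-01', 'CPG230-BCP-04')]
```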

---

### Activity 3: AI Governance Back-Testing Methodology (Sentinel)

**Activity Title:** Development of a novel AI governance back-testing methodology that
retroactively analyses historical enforcement cases to validate compliance scanning accuracy

**Start Date:** March 2026 (falls within FY2025-26; this register documents the FY2024-25 preparatory work)
**End Date:** Ongoing
**ANZSRC Code:** 4602 - Artificial Intelligence (Primary), 3505 - Business Law and Taxation (Secondary)

**Hypothesis:**
An AI system can retroactively analyse publicly available enforcement case data (AUSTRAC,
APRA, ASIC proceedings) and, given the institution's pre-enforcement documentation profile,
accurately predict which regulatory breaches would have been detected — thereby validating
the compliance scanner's detection capability and generating a quantifiable "detection rate"
metric.

**Technical Uncertainty:**
- No prior methodology exists for back-testing AI compliance tools against historical
  enforcement actions
- It was unknown whether publicly available enforcement data contains sufficient detail
  to reconstruct a meaningful pre-enforcement compliance profile
- The causal relationship between document-level compliance gaps and actual enforcement
  outcomes is complex and potentially non-deterministic
- Whether the methodology would generalise across different types of enforcement
  (AML/CTF vs. prudential vs. consumer protection) was uncertain

**Systematic Progression of Work:**
1. Literature review: no academic or commercial precedent for compliance scanner back-testing
2. Case study 1: CBA AUSTRAC 2018 ($700M penalty) — reconstructed pre-enforcement compliance
   profile from public AUSTRAC proceedings, APRA findings, and CBA annual reports
3. Tested GoComply scanner against reconstructed CBA profile — measured detection rate of
   known AML/CTF failures
4. Case study 2: Westpac AUSTRAC 2020 ($1.3B penalty) — repeated methodology to test
   generalisability across institutions and enforcement types
5. Analysis: compared predicted vs. actual enforcement findings across both cases
6. Hypothesis refinement: adjusted scanner weights based on back-test results
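
The headline metric of the back-test reduces to recall over the failures named in the enforcement proceedings. A sketch using hypothetical failure labels (the actual failures are those described in the AUSTRAC proceedings):

```python
def detection_rate(detected: set, known_failures: set) -> float:
    """Fraction of enforcement-identified failures the scanner flagged (recall)."""
    return len(detected & known_failures) / len(known_failures)

# Illustrative labels only: 9 failures in the proceedings, 7 flagged.
known = {f"failure_{i}" for i in range(1, 10)}
detected = {f"failure_{i}" for i in range(1, 8)}
print(f"{detection_rate(detected, known):.0%}")  # 78%
```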

**Outcome:** [PRELIMINARY — ONGOING]
- CBA case: scanner detected 7 of 9 key compliance failures identified in AUSTRAC proceedings
- Westpac case: scanner detected 6 of 8 key failures
- Early results across the two institutions suggest the methodology generalises
- Published as interactive reports at gocomply.com.au/sentinel/

**New Knowledge Generated:**
- Back-testing methodology is viable and produces quantifiable detection metrics
- Public enforcement data is sufficient for meaningful compliance profile reconstruction
- Detection rates vary significantly by regulation type — AML/CTF (high) vs. governance (lower)

---

## SUPPORTING R&D ACTIVITIES

### Supporting Activity 1: Regulatory Knowledge Base Construction

**Relationship to Core Activities:** Directly supports Activities 1, 2, and 3 by providing
the regulatory text corpus required for both RAG retrieval and rule generation.

**Description:**
Construction of a comprehensive, structured knowledge base of Australian financial
regulations comprising 228 sources and 1,813 pre-processed chunks. This involved:
- Systematic collection and digitisation of regulatory instruments across 6 regulatory bodies
- Development of a chunking strategy optimised for compliance analysis (clause-level granularity)
- Metadata extraction (clause references, regulation identifiers, effective dates)
- Quality validation against source documents
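
The clause-level chunking strategy can be sketched as a split at numbered-clause boundaries. The regex and sample clauses below are illustrative assumptions; the real pipeline would also extract the metadata listed above (clause references, regulation identifiers, effective dates) per chunk.

```python
import re

def chunk_by_clause(text: str):
    """Split regulatory text at numbered-clause boundaries (e.g. '12.' at line start)."""
    parts = re.split(r"(?m)^(?=\d+\.\s)", text)
    return [p.strip() for p in parts if p.strip()]

sample = """1. An APRA-regulated entity must identify its critical operations.
2. The entity must set tolerance levels for each critical operation.
3. The Board is ultimately accountable for operational risk management."""
print(len(chunk_by_clause(sample)))  # 3 clause-level chunks
```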

**Dominant Purpose:** To enable core R&D activities (not for commercial distribution of
regulatory text)

---

### Supporting Activity 2: PDF Document Extraction Pipeline

**Relationship to Core Activities:** Directly supports Activities 1 and 2 by converting
uploaded documents into analysable text.

**Description:**
Development of a document extraction pipeline using the pdf-extract Rust crate, with
custom post-processing for compliance document structures (numbered clauses, tables,
appendices, cross-references). Experimentation was required to:
- Handle diverse PDF formats from different financial institutions
- Preserve document structure (headings, sections, clause numbering) for accurate scanning
- Extract tables and structured data that contain compliance-critical information
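
The extraction itself uses the pdf-extract Rust crate; the structure-preserving post-processing can be sketched independently (here in Python, with an assumed boundary heuristic) as rejoining hard-wrapped lines while keeping clause numbers, sub-clauses, and headings on their own lines.

```python
import re

def rejoin_wrapped_lines(raw: str) -> str:
    """Merge hard-wrapped lines from PDF extraction, keeping structural breaks.

    A new block is assumed to start at a clause number ('12.'), a lettered
    sub-clause ('(a)'), or an all-caps heading; everything else is rejoined
    onto the previous line.
    """
    boundary = re.compile(r"^(\d+\.\s|\([a-z]\)\s|[A-Z][A-Z ]{3,}$)")
    out = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if out and not boundary.match(line):
            out[-1] = out[-1] + " " + line
        else:
            out.append(line)
    return "\n".join(out)

raw = ("27. An APRA-regulated entity must\nmaintain a credible BCP.\n"
       "(a) covering critical\noperations.")
print(rejoin_wrapped_lines(raw))
```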

---

### Supporting Activity 3: Compliance Scoring Algorithm

**Relationship to Core Activities:** Directly supports Activities 1 and 2 by converting
raw findings into actionable compliance scores.

**Description:**
Development of a scoring methodology that aggregates individual findings into an overall
compliance score (0-100). This required experimentation with:
- Weighting schemes across different severity levels
- Normalisation across different regulation types (prescriptive vs. principles-based)
- Threshold calibration for compliance/non-compliance classification
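
One plausible shape for such an aggregation, with illustrative severity weights (the actual weighting scheme is the subject of the experimentation described above): each finding deducts a severity-weighted penalty from a perfect score, normalised against the worst case for the number of rules evaluated.

```python
# Illustrative weights only — not GoComply's calibrated values.
SEVERITY_WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}

def compliance_score(findings, rules_evaluated: int) -> int:
    """Aggregate finding severities into a 0-100 score, floored at zero."""
    max_penalty = rules_evaluated * SEVERITY_WEIGHTS["critical"]
    penalty = sum(SEVERITY_WEIGHTS[sev] for sev in findings)
    return max(0, round(100 * (1 - penalty / max_penalty)))

print(compliance_score(["critical", "medium"], rules_evaluated=10))  # 88
```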

---

## EXCLUDED ACTIVITIES (Not Claimed)

The following activities are explicitly NOT included in the R&D claim:
- Web application UI/UX development (standard CRUD, HTML templates)
- Stripe payment integration
- Docker containerisation and Cloud Run deployment
- Blog content creation
- Marketing and sales activities
- Standard authentication (JWT/bcrypt)
- Cloudflare Worker proxy configuration
