# GoComply Technical Overview for R&D Assessors

## What GoComply Does

GoComply is an AI-powered regulatory compliance scanner for Australian financial services
institutions. It analyses uploaded documents (risk management frameworks, board papers,
policies, and procedures) against 228 indexed Australian regulatory sources to identify
compliance gaps, missing obligations, and regulatory risks.

## Why This Required R&D

### The Problem
Australian financial institutions face regulatory requirements from at least six bodies
(APRA, ASIC, AUSTRAC, RBA, OAIC, ACCC), spanning hundreds of individual regulatory
instruments. Manual compliance assessment requires teams of specialists and takes weeks
per document. No automated solution existed that could:

1. Scan documents against the full breadth of Australian financial regulations
2. Provide verifiable, citation-backed compliance findings
3. Handle the semantic complexity of cross-referencing regulatory frameworks
4. Achieve accuracy sufficient for professional compliance use

### The Knowledge Gap
At the commencement of this project, the closest available RegTech offerings were:
- **Diligent/Galvanize**: Document management, not automated compliance scanning
- **OneSumX (Wolters Kluwer)**: Regulatory change management, not document analysis
- **Protiviti**: Consulting-led, not automated
- **Generic AI tools**: No Australian regulatory knowledge, no structured compliance output

No existing system could automatically analyse an uploaded document against Australian
regulations and produce structured, citation-backed compliance findings.

## Technical Architecture

```
                    ┌─────────────────────────────────────┐
                    │         Document Upload              │
                    │    PDF/TXT up to 50MB                │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │      Text Extraction Pipeline        │
                    │  pdf-extract → section splitting     │
                    │  (Supporting Activity 2)              │
                    └──────────────┬──────────────────────┘
                                   │
               ┌───────────────────┼───────────────────┐
               │                                       │
    ┌──────────▼──────────┐             ┌──────────────▼──────────┐
    │   RAG AI Scanner     │             │   Rule-Based Scanner     │
    │   (Core Activity 1)  │             │   (Core Activity 2)      │
    │                      │             │                          │
    │ 1. Query Generation  │             │ 1,975 keyword-pattern    │
    │ 2. FTS5 Retrieval    │             │    rules across 100+     │
    │    (1,813 chunks)    │             │    regulations           │
    │ 3. LLM Analysis      │             │                          │
    │    (Claude Sonnet)   │             │ Deterministic, no API    │
    │ 4. Verification      │             │    dependency            │
    │ 5. Citation Check    │             │                          │
    └──────────┬──────────┘             └──────────────┬──────────┘
               │                                       │
               └───────────────────┬───────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │     Scoring & Report Generation      │
                    │     (Supporting Activity 3)           │
                    │                                      │
                    │  Weighted aggregation → 0-100 score  │
                    │  Severity classification             │
                    │  Regulatory citation mapping         │
                    │  HTML report with print CSS          │
                    └──────────────┬──────────────────────┘
                                   │
                    ┌──────────────▼──────────────────────┐
                    │      Sentinel Back-Testing            │
                    │      (Core Activity 3)                │
                    │                                      │
                    │  Historical enforcement cases →      │
                    │  Reconstruct compliance profile →    │
                    │  Run scanner → Measure detection     │
                    │  rate → Validate methodology         │
                    └─────────────────────────────────────┘
```
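The "weighted aggregation → 0-100 score" step in the diagram can be illustrated with a minimal sketch. This is not the algorithm in `src/report.rs`, which this document does not describe. The `Severity` weights, the `CheckResult` type, and the choice to score an empty result set as 100 are all assumptions made for illustration.

```rust
/// Severity levels for a compliance check (names assumed for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

impl Severity {
    /// Hypothetical weights; the production weights are not stated in this document.
    pub fn weight(self) -> u32 {
        match self {
            Severity::Low => 1,
            Severity::Medium => 2,
            Severity::High => 5,
            Severity::Critical => 10,
        }
    }
}

/// Outcome of one check: its severity and whether the document passed it.
#[derive(Debug, Clone, Copy)]
pub struct CheckResult {
    pub severity: Severity,
    pub passed: bool,
}

/// Weighted share of passed checks, scaled to 0-100 and rounded to the nearest integer.
/// An empty result set scores 100 (an assumption, not a documented behaviour).
pub fn compliance_score(results: &[CheckResult]) -> u32 {
    let total: u32 = results.iter().map(|r| r.severity.weight()).sum();
    if total == 0 {
        return 100;
    }
    let passed: u32 = results
        .iter()
        .filter(|r| r.passed)
        .map(|r| r.severity.weight())
        .sum();
    // Integer rounding: add half the divisor before dividing.
    (passed * 100 + total / 2) / total
}
```

Under these assumed weights, a single failed critical check outweighs several passed low-severity checks, which is the usual reason for weighting by severity rather than counting findings.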

## Technology Stack

| Component | Technology | R&D Relevance |
|-----------|-----------|---------------|
| Language | Rust (2024 edition) | Performance-critical for real-time scanning |
| Web Framework | Axum 0.8 | Not R&D (standard web serving) |
| Rule Engine | Custom Rust (src/rules.rs) | **Core Activity 2** |
| RAG Pipeline | Custom (src/rag.rs + src/embed.rs) | **Core Activity 1** |
| Text Search | SQLite FTS5 | **Core Activity 1** (retrieval experiments) |
| AI Model | Anthropic Claude Sonnet | **Core Activity 1** (analysis + verification) |
| Knowledge Base | 1,813 JSON chunks, 228 sources | **Supporting Activity 1** |
| PDF Extraction | pdf-extract crate + custom | **Supporting Activity 2** |
| Scoring | Custom algorithm (src/report.rs) | **Supporting Activity 3** |
| Sentinel | Custom back-test engine | **Core Activity 3** |
| Database | SQLite (rusqlite) | Not R&D (standard data storage) |
| Auth | JWT + bcrypt | Not R&D (standard authentication) |
| Payments | Stripe Checkout | Not R&D (standard integration) |
| Deploy | Docker → Cloud Run | Not R&D (standard deployment) |
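To make the deterministic rule engine more concrete, the following is a rough sketch of how a keyword-pattern rule could flag a missing obligation. The `Rule` and `Finding` types, the `scan` function, and the two example rules are hypothetical. They are not taken from `src/rules.rs`, and the sketch omits conflict resolution entirely.

```rust
/// Hypothetical rule shape; the real types in `src/rules.rs` are not shown in this document.
#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub id: &'static str,
    pub citation: &'static str,
    /// The rule is satisfied if ANY of these phrases appears in the document.
    pub any_of: &'static [&'static str],
}

/// A gap reported against a document, carrying the citation of the rule that raised it.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub rule_id: &'static str,
    pub citation: &'static str,
}

/// Returns one finding per rule whose phrases are all absent from `text`.
/// Matching is a case-insensitive substring search, so the result is deterministic
/// and needs no external API call.
pub fn scan(text: &str, rules: &[Rule]) -> Vec<Finding> {
    let haystack = text.to_lowercase();
    rules
        .iter()
        .filter(|r| !r.any_of.iter().any(|kw| haystack.contains(&kw.to_lowercase())))
        .map(|r| Finding { rule_id: r.id, citation: r.citation })
        .collect()
}

/// Two illustrative rules (not drawn from the product's rule set).
pub fn example_rules() -> Vec<Rule> {
    vec![
        Rule {
            id: "EX-001",
            citation: "APRA CPS 220",
            any_of: &["risk appetite statement"],
        },
        Rule {
            id: "EX-002",
            citation: "AML/CTF Act 2006",
            any_of: &["aml/ctf program", "anti-money laundering program"],
        },
    ]
}
```

Calling `scan(document_text, &example_rules())` on a document that mentions a risk appetite statement but no AML/CTF program would return a single finding for `EX-002`.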

## Scale of R&D Effort

| Metric | Value |
|--------|-------|
| Total Rust source files | 35 |
| Lines of R&D-relevant code | ~8,000 (rag.rs, rules.rs, embed.rs, scan.rs, sentinel) |
| Regulatory rules developed | 1,975 |
| Regulatory sources indexed | 228 |
| Regulatory chunks processed | 1,813 |
| Regulations covered | 100+ individual instruments |
| Enforcement cases analysed | 2 (CBA AUSTRAC 2018, Westpac AUSTRAC 2020) |
| Blog posts (non-R&D) | 32 |
| Git commits (total) | 50+ |
| R&D-related commits | ~35 (an estimated 70% of total commits) |
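For the enforcement cases used in Sentinel back-testing, the "measure detection rate" step might look something like the sketch below. It assumes each case is reduced to a set of known breach labels and that the scanner's output is mapped onto the same labels. That representation, the `detection_rate` name, and the labels in the usage note are all hypothetical.

```rust
use std::collections::HashSet;

/// Fraction of known breaches that the scanner flagged, in the range 0.0..=1.0.
/// Returns `None` when the case has no known breaches to measure against.
/// Flagged labels that are not in `known` are ignored here; measuring false
/// positives would need a separate metric.
pub fn detection_rate<'a>(
    known: &HashSet<&'a str>,
    flagged: &HashSet<&'a str>,
) -> Option<f64> {
    if known.is_empty() {
        return None;
    }
    let hits = known.intersection(flagged).count();
    Some(hits as f64 / known.len() as f64)
}
```

With four known breach labels and a scanner run that flags two of them plus one unrelated label, this returns `Some(0.5)`.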

## Innovation vs. Routine Development

### What IS R&D (claimed):
- Designing and testing the RAG retrieval pipeline for regulatory text
- Experimenting with chunk sizes, query strategies, and verification layers
- Building and scaling the rule engine with conflict resolution
- Developing the Sentinel back-testing methodology
- Creating the regulatory knowledge base with optimised chunking
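As an illustration of the kind of parameter varied in the chunk-size experiments, here is a minimal sketch of fixed-size chunking with overlap. This document does not state the chunking strategy actually used for the knowledge base, so word-based windows, the `chunk_words` name, and the `size` and `overlap` parameters are assumptions.

```rust
/// Splits `text` into chunks of at most `size` words. Each chunk after the first
/// starts `size - overlap` words after the previous one, so consecutive chunks
/// share `overlap` words.
///
/// Panics if `size` is zero or `overlap` is not smaller than `size`.
pub fn chunk_words(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        // Stop once a chunk reaches the end of the text, to avoid a trailing
        // chunk that only repeats the overlap.
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

Overlap is a common way to keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of a larger index. Varying `size` and `overlap` is one way such an experiment could be parameterised.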

### What is NOT R&D (excluded from claim):
- Building the web application (HTML templates, routing, authentication)
- Integrating Stripe payments
- Writing blog posts
- Docker containerisation and Cloud Run deployment
- Cloudflare Worker proxy configuration
- Standard CRUD database operations
- Marketing, sales, and customer outreach