Gerrit van Wyk

Principal Engineer | Technical Architect

Building scalable data platforms that deliver business impact across Azure and Databricks ecosystems

About Me

I build data platforms that scale technically, commercially, and operationally. My experience spans data engineering, platform modernisation, and cloud-native architecture, primarily across Azure and Databricks ecosystems, with a rare blend of hands-on technical depth and strategic platform ownership.

As Staff Engineer and Data Platform Owner at First Quantum Minerals, I inherited a platform costing 1 pound per pipeline, with an average ingestion time of 23 minutes and a 67% success rate. Through a ground-up architectural overhaul, re-platforming onto Azure Synapse, Azure Functions, and Databricks with a unified governance layer via Unity Catalog, I transformed that baseline entirely. The platform now runs 130 pipelines for every 1 pound previously spent on one. Average pipeline execution sits below 5 minutes, with a success rate above 99%. That's 13x the pipeline volume at one-ninth of the cost, with better reliability and data quality across the board.

Beyond the numbers, I delivered end-to-end data solutions including scalable batch and streaming pipelines, custom APIs, real-time ingestion architectures, and an enterprise-grade data governance and lineage framework. Cost-optimisation was embedded into architectural decisions from the start, not bolted on afterward.

Prior to that, at Lewis Group, I led the modernisation of legacy systems, built custom RPA solutions, and automated critical financial processes including IFRS 9 & 16 reporting, multi-system reconciliation, and audit workflows, eliminating significant manual overhead and accelerating insight delivery across finance and operations.

I thrive where engineering rigour meets business impact, building systems that are modular, observable, well-documented, and built to last. I'm equally passionate about mentoring teams in platform best practices that improve reliability and, ultimately, data trust.

Experience

Data Platform Owner / Principal Engineer

First Quantum Minerals Ltd

Aug 2023 – Present
  • Independently architect, design, develop, and maintain end-to-end data solutions on the Azure and Databricks platforms
  • Architect and implement end-to-end data acquisition strategies using Azure Synapse, Azure Functions, and Azure Databricks, following medallion architecture principles
  • Design streaming ingestion using Azure Event Hubs, Azure Service Bus, and Databricks Structured Streaming
  • Design, develop, and deploy custom APIs using Azure Functions (Python & C#) with serverless, event-driven processing
  • Architect and implement reusable ETL/ELT boilerplate templates with custom transformation libraries (Rust & Python)
  • Architect enterprise-wide data quality frameworks with automated validation rules and custom monitoring solutions
  • Implement governance solutions using Unity Catalog with group policy enforcement, access controls, and compliance systems
  • Architect Databricks cluster optimization strategies, Delta Lake performance tuning, and cost optimization frameworks using serverless computing patterns

Manager - Technology Platform

Lewis Group

Jun 2020 – Jul 2023
  • Eliminated duplicate data functions across the Group and led migration from legacy platforms to modern data platforms
  • Designed and implemented a data platform and operating model supporting operational reporting and advanced analytics
  • Managed data warehouses, BI, and analytics initiatives while continuously evolving platform technologies
  • Led Revolving Credit implementation across business systems and reviewed/restructured head office databases
  • Established and built the operational team for data platform operations
  • Delivered technologies, tools, and methodologies to unlock value from data assets while ensuring regulatory and privacy compliance

Finance System & Data Analyst

Lewis Group

Jan 2019 – Dec 2019
  • Performed IT14SD reconciliations between taxes, VAT, imports, customs, and PAYE as part of a multi-year phased development
  • Revised and checked tax-related reports, and created additional reports and models for accounting and tax purposes
  • Completed the transfer pricing model under the guidance of the Group Technical Manager
  • Satisfied the BLNS revenue authorities' requirements for country-specific accounting records

IS_FBP

Vodacom

Oct 2013 – Dec 2018
  • Led deep-dive TCO investigations across the IT estate, engineering automated Python data extraction pipelines that replaced manual data wrangling
  • Owned OpEx and CapEx financial control processes, including demand vs actuals reconciliation, PO validation, and month-end variance commentary
  • Systematically rebuilt legacy Excel reporting stack using VBA automation, Python data pipelines, and programmatic executive presentation generation
  • Designed Qlik Sense dashboards with custom Angular interactive widgets for IT Key Metrics and cost efficiency benchmarking
  • Maintained IT Key Metrics aligned to industry-standard frameworks supporting IT Costing Models and global benchmarking

Key Projects

Kafka to Delta Lake Streaming Ingestion

Built a streaming ingestion service that consumes messages from Apache Kafka and writes them to Delta Lake tables on Azure Blob Storage (ABFSS). Supports JSON and Avro deserialization with JMESPath transforms for record enrichment. Buffers records in memory until a time or size threshold is reached, then flushes them to Parquet with ACID Delta transactions. Handles schema drift by reading the authoritative schema from the Delta log, coercing JSON types to match, and detecting external schema changes. Scales horizontally via Kafka consumer group rebalancing, with per-partition offset tracking using Delta txn actions. Includes a Dead Letter Queue for failed messages, a raw passthrough mode, and full configuration via config.toml or environment variables.
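The time-or-size buffering described above can be illustrated with a minimal Python sketch. This is not the service's actual code: the class name RecordBuffer, its parameters, and the flush callback are all hypothetical, and the callback stands in for the real Parquet write and Delta commit.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class RecordBuffer:
    """Collects records and flushes when a size or age limit is reached.

    Hypothetical illustration: `flush` stands in for the real write path.
    """

    def __init__(
        self,
        flush: Callable[[List[Record]], None],
        max_records: int = 1000,
        max_age_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock
        self._records: List[Record] = []
        self._first_seen: Optional[float] = None

    def add(self, record: Record) -> None:
        # The age of a batch is measured from its first record.
        if not self._records:
            self._first_seen = self._clock()
        self._records.append(record)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Flush if either limit is hit; return True when a flush happened."""
        if not self._records:
            return False
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._first_seen >= self._max_age
        if too_many or too_old:
            batch, self._records = self._records, []
            self._first_seen = None
            self._flush(batch)
            return True
        return False
```

In a consumer loop, maybe_flush would also be called on a timer or on each poll, so that a quiet partition still flushes once its oldest record exceeds the age limit.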

Custom RPA File Processing System (Built In-House)

Designed and developed a custom RPA platform (not off-the-shelf) using Python, PostgreSQL, RabbitMQ, and Django. Manages source/destination handling, archive destinations, file extensions, user notifications, headers, file size limits, cell formats/boundaries, and delimiter/fixed-width processing — with metadata and history fully stored in PostgreSQL.

Distributed Data Processing Engine

A centralised pipeline management platform (si_app) that coordinates data copy operations between SQL databases and Azure targets. Motivated by the limitations of Azure Data Factory's Self-hosted Integration Runtime (SHIR), this application provides full control over data extraction and transformation patterns. The same pipelines that ran on SHIR caused VMs to run at 80-100% CPU; after implementing si_app, CPU usage barely reached 20%, with ingestion eight times faster on average.

Lightweight Distributed Message Queue

A lightweight, distributed message queue broker written in Go with zero external dependencies. Features a simple TCP protocol with human-readable telnet commands, role-based authentication with queue permissions, optional file-based persistence, peer-to-peer replication for high availability, and HTTP monitoring endpoints. Designed as a development/testing alternative to Kafka or RabbitMQ for small to medium applications.
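To give a feel for a human-readable, telnet-friendly line protocol, here is a small parsing sketch. The broker itself is written in Go; Python is used here purely for illustration. The command names (AUTH, PUSH, POP) and their argument counts are assumptions, not the broker's real command set.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical command set mapping name to argument count.
# The real broker's commands are not listed in the text above.
COMMANDS = {"AUTH": 2, "PUSH": 2, "POP": 1}


@dataclass(frozen=True)
class Command:
    name: str
    args: Tuple[str, ...]


def parse_line(line: str) -> Command:
    """Parse one newline-terminated text command into a Command."""
    text = line.strip()
    if not text:
        raise ValueError("empty command")
    name, _, rest = text.partition(" ")
    name = name.upper()
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    arity = COMMANDS[name]
    # Split at most arity-1 times so the last argument may contain spaces.
    args = tuple(rest.split(" ", arity - 1)) if rest else ()
    if len(args) != arity:
        raise ValueError(f"{name} expects {arity} argument(s)")
    return Command(name, args)
```

Keeping each command on a single text line is what lets a plain telnet session act as a client, at the cost of needing an escaping or length-prefix rule for payloads that contain newlines.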

IFRS 16 Amortization System & IFRS 9 Redesign

Built a complete IFRS 16 amortization engine with Django UI, summaries, and graphs supporting full file uploads and individual lease editing. Replaced the slow SAS-based manual IFRS 9 process, rebuilding entirely in MSSQL with full automation — delivering the same calculation results with significantly improved performance.
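For readers unfamiliar with the mechanics, the core of a lease amortization schedule can be sketched as follows. This is a simplified illustration, not the engine described above: it assumes equal payments made in arrears, a constant per-period discount rate, and a right-of-use asset equal to the opening liability. The function name and inputs are hypothetical.

```python
from typing import Dict, List


def lease_schedule(payment: float, periods: int, rate: float) -> List[Dict[str, float]]:
    """Build a lease liability schedule for equal payments made in arrears.

    `rate` is the discount rate per period. All inputs are illustrative.
    """
    # Opening liability: present value of all future payments.
    opening = sum(payment / (1 + rate) ** t for t in range(1, periods + 1))
    # Simplification: the right-of-use asset equals the opening liability
    # and is depreciated straight-line over the lease term.
    depreciation = opening / periods

    rows: List[Dict[str, float]] = []
    for period in range(1, periods + 1):
        interest = opening * rate              # unwinding of the discount
        closing = opening + interest - payment
        rows.append(
            {
                "period": period,
                "opening": opening,
                "interest": interest,
                "payment": payment,
                "closing": closing,
                "depreciation": depreciation,
            }
        )
        opening = closing
    return rows
```

By construction the closing liability of the final period is zero (up to floating-point error), and total interest equals total payments minus the opening liability. A production engine would typically use decimal arithmetic and handle modifications, payments in advance, and variable terms.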

Monitoring System Architecture

A distributed monitoring solution designed to collect metrics from multiple agent types and route them to different Azure services based on message paths. The system consists of a central collector service and various specialized agents that can be deployed across different environments.
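Routing by message path can be sketched as a longest-prefix lookup. This is an assumption about how such routing might work, not a description of the collector's internals: the PathRouter class, the example paths, and the destination labels are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class PathRouter:
    """Maps a message path to a destination using the longest matching prefix."""

    def __init__(self, routes: Dict[str, str]) -> None:
        # Longest prefixes first so the most specific route wins.
        self._routes: List[Tuple[str, str]] = sorted(
            routes.items(), key=lambda kv: len(kv[0]), reverse=True
        )

    def resolve(self, path: str) -> Optional[str]:
        for prefix, destination in self._routes:
            # Match whole path segments only, so "/metricsx" does not
            # match the "/metrics" route.
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return destination
        return None


# Hypothetical route table; destination labels are placeholders.
router = PathRouter(
    {
        "/metrics": "destination_a",
        "/metrics/host": "destination_b",
        "/logs": "destination_c",
    }
)
```

With this table, a path such as /metrics/host/cpu resolves to the more specific /metrics/host route, while an unmatched path returns None so the collector can decide whether to drop or dead-letter the message.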

Skills & Competencies

Cloud & Big Data Platforms

Azure Synapse Analytics, Azure Databricks, Azure Data Factory, Azure Functions, Unity Catalog, Event Hubs, Service Bus

Programming & Scripting

Python, C#, Rust, Go, C, PowerShell, Bash, Scala, VBA

Databases & Message Queues

T-SQL (Sybase ASE & MSSQL), PgSQL, MySQL, Redis, Apache Kafka, RabbitMQ

Core Competencies

Azure & Databricks Architecture
End-to-End Data Platform Design
Data Governance & Lineage Frameworks
Cost-Optimized Cloud Architecture
Serverless & Event-Driven Computing
Team Mentoring & Engineering Best Practices

Education & Certifications

BComp (NQF Level 7)

Bachelor of Computing

NQF 7

IBM Applied Data Science with Python

IBM

Professional Certificate

IBM Big SQL: Analyze Big Data

IBM

Professional Certificate

Get in Touch

I'm currently available for new opportunities where I can contribute at the intersection of engineering rigour and business impact. Let's connect.