← Back to Projects
Kafka2Delta — Per-Message Processing Loop
Kafka to Delta Lake streaming ingestion service written in Rust.
Each iteration of the run loop fetches one Kafka message, checks for duplicates via offset tracking, deserialises the payload (JSON/Avro), applies JMESPath transforms, coerces field types against the Delta table schema, and buffers the result. When the buffer reaches its time or size limit, it flushes to Parquet and commits a Delta transaction. Failed messages at any stage are routed to the Dead Letter Queue with one-at-a-time commits. A rebalance handler resets state when partitions are reassigned.
Main Loop — Per-Message Processing
sequenceDiagram
participant LIB as lib
participant SER as serialization
participant TRN as transforms
participant COE as coercions
participant VB as value_buffers
participant WRT as writer
participant DLQ as dead_letters
participant MET as metrics
participant KAF as kafka
Note over LIB: Step 0 Check Rebalance
LIB->>LIB: handle_rebalance
Note over LIB: Step 1 Fetch Message
LIB->>KAF: consumer stream next
alt Message received
Note over LIB: Step 1a Duplicate Check
LIB->>LIB: should_process_offset
alt Already committed
LIB->>LIB: skip message
end
Note over LIB: Step 1b Deserialize
LIB->>SER: deserialize bytes
SER->>DLQ: DeadLetter on failure
SER-->>LIB: Ok or Err
alt Deser OK
LIB->>MET: message_deserialized
Note over LIB: Step 1c Transform
LIB->>TRN: transform
TRN-->>LIB: Ok or Err
alt Transform OK
LIB->>MET: message_transformed
Note over LIB: Step 1d Coerce
LIB->>COE: coerce value
Note over LIB: Step 1e Buffer
LIB->>VB: add partition offset value
VB-->>LIB: Ok or AlreadyProcessed
else Transform FAIL
LIB->>MET: message_transform_failed
LIB->>DLQ: write_dead_letter
DLQ->>TRN: DLQ transform
DLQ->>WRT: insert_all
DLQ-->>LIB: Ok
end
else Deser FAIL
LIB->>MET: message_deserialization_failed
LIB->>DLQ: write_dead_letter
DLQ-->>LIB: Ok
else Empty payload
LIB->>LIB: warn skip no DLQ
end
else Timeout
Note over LIB: Latency Timer Expired
LIB->>LIB: latency_timer_expired true
end
Processing Steps
- Rebalance check — Reset state if Kafka partitions were reassigned
- Fetch — Pull next message from the consumer stream
- Dedup — Skip if this offset was already committed via Delta
txn - Deserialize — Parse bytes as JSON, Avro, or raw passthrough
- Transform — Apply JMESPath expressions to enrich or reshape
- Coerce — Cast field types to match the authoritative Delta table schema
- Buffer — Stage the record until time/size threshold triggers a flush
- Dead Letter — Any failure above routes to DLQ with immediate commit