← Back to Projects

Kafka2Delta — Per-Message Processing Loop

Kafka to Delta Lake streaming ingestion service written in Rust.

Each iteration of the run loop fetches one Kafka message, checks for duplicates via offset tracking, deserialises the payload (JSON/Avro), applies JMESPath transforms, coerces field types against the Delta table schema, and buffers the result. When the buffer reaches its time or size limit, it flushes to Parquet and commits a Delta transaction. Failed messages at any stage are routed to the Dead Letter Queue with one-at-a-time commits. A rebalance handler resets state when partitions are reassigned.

Main Loop — Per-Message Processing

sequenceDiagram participant LIB as lib participant SER as serialization participant TRN as transforms participant COE as coercions participant VB as value_buffers participant WRT as writer participant DLQ as dead_letters participant MET as metrics participant KAF as kafka Note over LIB: Step 0 Check Rebalance LIB->>LIB: handle_rebalance Note over LIB: Step 1 Fetch Message LIB->>KAF: consumer stream next alt Message received Note over LIB: Step 1a Duplicate Check LIB->>LIB: should_process_offset alt Already committed LIB->>LIB: skip message end Note over LIB: Step 1b Deserialize LIB->>SER: deserialize bytes SER->>DLQ: DeadLetter on failure SER-->>LIB: Ok or Err alt Deser OK LIB->>MET: message_deserialized Note over LIB: Step 1c Transform LIB->>TRN: transform TRN-->>LIB: Ok or Err alt Transform OK LIB->>MET: message_transformed Note over LIB: Step 1d Coerce LIB->>COE: coerce value Note over LIB: Step 1e Buffer LIB->>VB: add partition offset value VB-->>LIB: Ok or AlreadyProcessed else Transform FAIL LIB->>MET: message_transform_failed LIB->>DLQ: write_dead_letter DLQ->>TRN: DLQ transform DLQ->>WRT: insert_all DLQ-->>LIB: Ok end else Deser FAIL LIB->>MET: message_deserialization_failed LIB->>DLQ: write_dead_letter DLQ-->>LIB: Ok else Empty payload LIB->>LIB: warn skip no DLQ end else Timeout Note over LIB: Latency Timer Expired LIB->>LIB: latency_timer_expired true end

Processing Steps

  1. Rebalance check — Reset state if Kafka partitions were reassigned
  2. Fetch — Pull next message from the consumer stream
  3. Dedup — Skip if this offset was already committed via Delta txn
  4. Deserialize — Parse bytes as JSON, Avro, or raw passthrough
  5. Transform — Apply JMESPath expressions to enrich or reshape
  6. Coerce — Cast field types to match the authoritative Delta table schema
  7. Buffer — Stage the record until time/size threshold triggers a flush
  8. Dead Letter — Any failure above routes to DLQ with immediate commit
← Back to Projects