feat: batch job history#150

Open
shydefoo wants to merge 17 commits into main from add-batch-job-history

Conversation


@shydefoo shydefoo commented Jun 6, 2025

Summary

  • This PR adds support for recording each ingestion job as a batch job record in the database.
  • The SDK is modified to return a list of BatchJobRecords so users can inspect the batch ingestion jobs triggered for each feature table.
  • This capability can be extended to support historical retrieval jobs as well.

Implementation

DB Schema changes

  • A new table is added; the changes can be found in db/migration/V3__BatchJobRecord.sql.
  • The difference between a BatchJobRecord and a Job is that a Job reflects real-time data in the Kubernetes cluster, whereas a BatchJobRecord refers to data persisted in the DB.
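To make the Job vs. BatchJobRecord distinction concrete, here is a rough sketch of what a persisted record holds. Field names are taken from the sample SDK response later in this description (nested `batch_ingestion` fields are flattened for brevity); the actual DB columns may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BatchJobRecord:
    # Field names mirror the sample SDK response in this PR description;
    # the real schema lives in db/migration/V3__BatchJobRecord.sql.
    id: str                        # value of the caraml.dev/record-id label
    job_id: str                    # SparkApplication name in the cluster
    type: str                      # e.g. "BATCH_INGESTION_JOB"
    status: str                    # e.g. "JOB_STATUS_PENDING"
    job_start_time: Optional[int]  # epoch seconds
    job_end_time: Optional[int]    # None while the job is still running
    project: str
    table_name: str
    spark_app_manifest: str

# A Job, by contrast, is read live from the Kubernetes API and disappears
# with the cluster object; this record persists in the DB.
record = BatchJobRecord(
    id="793df3fe-4c89-49ae-a9db-4b65ee1c8b85-1750832112447993441",
    job_id="caraml-0f5b6f4f4d69b7cb-1750832112447993441",
    type="BATCH_INGESTION_JOB",
    status="JOB_STATUS_PENDING",
    job_start_time=1750832115,
    job_end_time=None,
    project="sample",
    table_name="jaeger_driver_quality_hbase_2",
    spark_app_manifest="....",
)
```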

SparkApplication changes

  • A new label is added to each SparkApplication when it is created or updated: caraml.dev/record-id: <some-uuid>. This UUID is used as the BatchJobRecord ID in the DB.
  • The label is refreshed each time the SparkApplication is created or updated, ensuring that every triggered ingestion job gets a new UUID and therefore a new record in the DB.
  • This also applies to SparkApplications created from a ScheduledSparkApplication.
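The label-stamping step can be sketched as follows. This is a minimal illustration, not the PR's Java implementation; the function name is hypothetical, while the label key matches the RECORD_JOB_LABEL constant quoted below.

```python
import uuid

RECORD_LABEL = "caraml.dev/record-id"  # matches RECORD_JOB_LABEL in the Java code

def stamp_record_label(labels: dict) -> dict:
    """Return a copy of the SparkApplication labels with a fresh record id.

    Intended to run on every create/update, so each ingestion run maps to a
    new BatchJobRecord row. (Illustrative sketch, not the PR's code.)
    """
    stamped = dict(labels)
    stamped[RECORD_LABEL] = str(uuid.uuid4())
    return stamped

first = stamp_record_label({"caraml.dev/project": "sample"})
second = stamp_record_label({"caraml.dev/project": "sample"})
# Two triggers of the same job yield two distinct record ids.
```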

SparkApplication watcher

  • The watcher runs in a separate thread and opens a Watch to receive events from the K8s API server.
  • The watcher only receives events for SparkApplications that have the caraml.dev/record-id label.
  • Based on these events, the watcher updates the DB to reflect the SparkApplication state.
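The event-to-DB mapping can be sketched like this. The handler and in-memory store are illustrative stand-ins (the real implementation is Java talking to the K8s API), and the choice to keep records on DELETED events is an assumption consistent with the history-keeping goal.

```python
def handle_event(event_type: str, app_state: str, record_id: str, db: dict) -> None:
    """Upsert the latest SparkApplication state into the record store."""
    if event_type == "DELETED":
        # Assumption: job history should outlive the cluster object,
        # so the record is kept rather than removed.
        return
    db[record_id] = app_state  # ADDED and MODIFIED both update the record

db = {}
for event_type, state in [("ADDED", "SUBMITTED"),
                          ("MODIFIED", "RUNNING"),
                          ("MODIFIED", "COMPLETED"),
                          ("DELETED", "COMPLETED")]:
    handle_event(event_type, state, "rec-1", db)
```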

Python SDK

  • A new SDK method is added to list BatchJobRecords, with optional start and end parameters to filter records within a given time window.
  • Sample response:
from datetime import datetime, timedelta

start = datetime.now() - timedelta(days=1)
client = Client("localhost:6565")
response = client.list_batch_job_records("sample", "jaeger_driver_quality_hbase_2", start)
>>> response[0]
id: "793df3fe-4c89-49ae-a9db-4b65ee1c8b85-1750832112447993441"
job_id: "caraml-0f5b6f4f4d69b7cb-1750832112447993441"
type: BATCH_INGESTION_JOB
status: JOB_STATUS_PENDING
job_start_time {
  seconds: 1750832115
}
job_end_time {
  seconds: -1
}
batch_ingestion {
  table_name: "jaeger_driver_quality_hbase_2"
  project: "sample"
  start_time_param {
    seconds: 1750659325
  }
  end_time_param {
    seconds: 1750832125
  }
}
spark_app_manifest: "...."

Others

  • I'm no Java expert, so any feedback to improve code structure / best practices is welcome.

@shydefoo shydefoo changed the title Add batch job history feat: batch job history Jun 10, 2025
@shydefoo shydefoo self-assigned this Jun 24, 2025
@shydefoo shydefoo marked this pull request as ready for review June 25, 2025 02:07
static final String FEATURE_TABLE_LABEL = LABEL_PREFIX + "table";
static final String FEATURE_TABLE_HASH_LABEL = LABEL_PREFIX + "hash";
static final String PROJECT_LABEL = LABEL_PREFIX + "project";
static final String RECORD_JOB_LABEL = LABEL_PREFIX + "record-id";

This label stores the batch job record id, which is unique to each job.

.getMetadata()
.getLabels()
.containsKey("sparkoperator.k8s.io/scheduled-app-name")) {
// get sparkapplication name suffix

This step is performed because a SparkApplication created from a ScheduledSparkApplication carries the same set of labels that was passed to the ScheduledSparkApplication. Since the record-id is passed only once, we need another way to differentiate SparkApplications created from the same ScheduledSparkApplication in the DB.

This is where the suffix in the SparkApplication name is used: all SparkApplications created from a ScheduledSparkApplication have names of the form caraml-65d8c6f9340adee8-1750762810975636102, where 1750762810975636102 is unique.
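The disambiguation step can be sketched as below. The function name is hypothetical; the id shape is suggested by the sample response above, where the record id is the label's UUID followed by the SparkApplication name's trailing suffix.

```python
def record_id_for_scheduled_run(base_record_id: str, app_name: str) -> str:
    """Derive a per-run record id for a SparkApplication spawned by a
    ScheduledSparkApplication.

    Names like "caraml-65d8c6f9340adee8-1750762810975636102" end in a unique
    suffix; appending it to the label's UUID disambiguates runs that share
    the same ScheduledSparkApplication labels. (Illustrative sketch.)
    """
    suffix = app_name.rsplit("-", 1)[-1]
    return f"{base_record_id}-{suffix}"

rid = record_id_for_scheduled_run(
    "793df3fe-4c89-49ae-a9db-4b65ee1c8b85",
    "caraml-65d8c6f9340adee8-1750762810975636102",
)
```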

public void watchSparkApplications(String namespace, String labelSelector) {
  Watchable<SparkApplication> watch;
  try {
    watch = sparkOperatorApi.watch(namespace, labelSelector);

The label selector here ensures that only SparkApplications carrying these specific labels are recorded:

  • caraml.dev/type=BATCH_INGESTION_JOB
  • caraml.dev/record-id
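In standard Kubernetes label-selector syntax, those two conditions combine into a single selector string: an equality match on the type label plus a bare key, which means "label exists". The exact string is an assumption; the sketch just shows how it decomposes.

```python
# Equality match on the type label, existence check on the record-id label.
label_selector = "caraml.dev/type=BATCH_INGESTION_JOB,caraml.dev/record-id"

parts = label_selector.split(",")
key, value = parts[0].split("=")
# A bare key with no "=" selects any object that has that label set.
has_record_id = "=" not in parts[1]
```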
