Conversation
```java
static final String FEATURE_TABLE_LABEL = LABEL_PREFIX + "table";
static final String FEATURE_TABLE_HASH_LABEL = LABEL_PREFIX + "hash";
static final String PROJECT_LABEL = LABEL_PREFIX + "project";
static final String RECORD_JOB_LABEL = LABEL_PREFIX + "record-id";
```
This label stores the batch job record id, which is unique to each job.
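As a sketch of how this label might be attached, under the assumption that a fresh uuid is generated per job (the class and helper names below are hypothetical, not code from this PR):

```java
import java.util.Map;
import java.util.UUID;

public class RecordLabelSketch {
    static final String LABEL_PREFIX = "caraml.dev/";
    static final String RECORD_JOB_LABEL = LABEL_PREFIX + "record-id";

    // Hypothetical helper: build the label map attached to a batch ingestion
    // SparkApplication. The record id is a fresh uuid per job, so it can
    // double as the BatchJobRecord primary key in the DB.
    static Map<String, String> recordLabels(String recordId) {
        return Map.of(RECORD_JOB_LABEL, recordId);
    }

    public static void main(String[] args) {
        System.out.println(recordLabels(UUID.randomUUID().toString()));
    }
}
```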
```java
.getMetadata()
.getLabels()
.containsKey("sparkoperator.k8s.io/scheduled-app-name")) {
  // get sparkapplication name suffix
```
This step is performed because a SparkApplication created from a ScheduledSparkApplication carries the same set of labels that were passed to the ScheduledSparkApplication. Since the record-id is passed only once, we need another way to differentiate SparkApplications created from the same ScheduledSparkApplication in the DB.
This is where the suffix in the SparkApplication name is used: every SparkApplication created from a ScheduledSparkApplication has a name of the form `caraml-65d8c6f9340adee8-1750762810975636102`, where `1750762810975636102` is unique.
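A minimal sketch of extracting that unique suffix (the class and method names are assumptions, not code from this PR):

```java
public class SparkAppNameSuffix {
    // SparkApplication names generated from a ScheduledSparkApplication end
    // with a unique timestamp suffix, e.g.
    // "caraml-65d8c6f9340adee8-1750762810975636102".
    // Taking the substring after the last '-' recovers that suffix.
    static String suffixOf(String sparkAppName) {
        int lastDash = sparkAppName.lastIndexOf('-');
        return lastDash < 0 ? sparkAppName : sparkAppName.substring(lastDash + 1);
    }
}
```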
```java
public void watchSparkApplications(String namespace, String labelSelector) {
  Watchable<SparkApplication> watch;
  try {
    watch = sparkOperatorApi.watch(namespace, labelSelector);
```
The label selector here is used so that only SparkApplications carrying the specific labels are recorded:
`caraml.dev/type=Batch_INGESTION_JOB` and `caraml.dev/record-id`
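For illustration, the selector string could be composed like this (a sketch only; the class and method names are assumptions, and the label values mirror the ones quoted above):

```java
public class WatcherSelectorSketch {
    // Kubernetes label selectors support equality matches ("key=value") and
    // existence matches (just "key"), comma-separated with AND semantics.
    // This matches batch ingestion SparkApplications that carry a record-id.
    static String batchIngestionSelector() {
        return String.join(",",
                "caraml.dev/type=Batch_INGESTION_JOB",
                "caraml.dev/record-id");
    }
}
```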
Summary
- `BatchJobRecord` for users to inspect batch ingestion jobs triggered for each feature table

Implementation
DB Schema changes
- `db/migration/V3__BatchJobRecord.sql`
- The difference between a `BatchJobRecord` and a `Job` is that a `Job` reflects real-time data in the Kubernetes cluster, whereas a `BatchJobRecord` refers to data persisted in the DB.

SparkApplication changes
- `caraml.dev/record: <some-uuid>`. This uuid is used as the `BatchJobRecord` ID in the DB.

SparkApplication watcher
- Records SparkApplications carrying the `caraml.dev/record` label.

Python SDK
Others