aws-glue-datalake

Uses PySpark with AWS Glue to build an analytics data lake from the Ticketmaster Discovery API. A Glue workflow orchestrates the process, and the final data can be queried with Athena.

Glue workflow steps

1. Download raw data from Ticketmaster APIs
   a. events-raw job: gets event data
   b. venues-raw job: gets venue data
   c. both jobs run in parallel and write to JSON files partitioned by process_date
2. Clean raw data
   a. events-processed job: performs transforms on events data to make it more consumable
   b. venues-processed job: same as above for venues
   c. each job starts after successful completion of its "raw" predecessor
   d. jobs write to Parquet, partitioned by process_date
3. Crawl processed data
   a. Glue crawler refreshes schema and partitions for ticketmaster.events and ticketmaster.venues tables
   b. crawler is triggered after both "processed" jobs complete successfully
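The ordering described above can be sketched as a small dependency graph. This is only an illustration in plain Python, not the Glue workflow definition itself (which lives in Terraform). The job names come from the steps above; the "crawler" label is a placeholder, since the crawler's real name is not given here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps a workflow step to the steps that must succeed before it starts.
# "crawler" is a placeholder label, not the real crawler name.
WORKFLOW = {
    "events-raw": set(),
    "venues-raw": set(),
    "events-processed": {"events-raw"},
    "venues-processed": {"venues-raw"},
    "crawler": {"events-processed", "venues-processed"},
}


def run_stages(graph):
    """Group steps into stages; steps within one stage have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(run_stages(WORKFLOW), start=1):
        print(number, ", ".join(stage))
```

The stage grouping is a simplification: in the workflow described above, each "processed" job starts as soon as its own "raw" predecessor succeeds, without waiting for the other raw job.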

Deployment

The process of creating the supporting AWS resources and uploading code artifacts for jobs is managed through Terraform.

Pre-setup

Before running this, create an S3 bucket for Terraform state and a DynamoDB table for state locking, either manually or in a different repo. The DynamoDB table must have a partition key named "LockID" of type String.
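If you prefer to script the lock table rather than create it by hand, a minimal sketch with boto3 might look like the following. The function names and the on-demand billing mode are assumptions made for illustration; only the "LockID" string partition key is required by the text above.

```python
def lock_table_params(table_name):
    """Build the arguments for DynamoDB create_table for a Terraform lock table."""
    return {
        "TableName": table_name,
        # The partition key must be named LockID and be of type String ("S").
        "AttributeDefinitions": [{"AttributeName": "LockID", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "LockID", "KeyType": "HASH"}],
        # Assumption: on-demand billing, so no capacity settings are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_lock_table(table_name, region):
    """Create the table; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so lock_table_params works without boto3 installed

    client = boto3.client("dynamodb", region_name=region)
    return client.create_table(**lock_table_params(table_name))
```

Calling `create_lock_table("{name of your state lock DDB table}", "{AWS region}")` once is enough; the table name is then what goes into `dynamodb_table` in the backend file.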

Create a file called terraform/backend-demo.tfvars with the following lines:

bucket         = "{name of your state bucket}"
key            = "{s3 key for your Terraform state file}"
region         = "{AWS region for the bucket}"
dynamodb_table = "{name of your state lock DDB table}"

Ticketmaster API Key

To create a personal (free) Ticketmaster Discovery API key, follow the steps on https://developer.ticketmaster.com/products-and-docs/apis/getting-started/. After you've created your key, save the value so you can load it into AWS Secrets Manager later.
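To sanity-check a new key before deploying, you can build a Discovery API request URL and open it in a browser or HTTP client. The sketch below is not taken from the Glue jobs in this repo. It assumes the public Discovery v2 URL layout and that the consumer key is passed as the `apikey` query parameter; the helper name `discovery_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: public Discovery API v2 base URL.
BASE_URL = "https://app.ticketmaster.com/discovery/v2"


def discovery_url(resource, api_key, page=0, size=20):
    """Return a request URL for a Discovery resource such as "events" or "venues"."""
    query = urlencode({"apikey": api_key, "page": page, "size": size})
    return f"{BASE_URL}/{resource}.json?{query}"


if __name__ == "__main__":
    # Replace the placeholder with your real consumer key before requesting the URL.
    print(discovery_url("events", "YOUR_CONSUMER_KEY"))
```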

Terraform Steps

Before running, ensure that your terminal has valid AWS credentials for a role that is allowed to deploy all the resources defined in the Terraform configuration.

Initialize: run this only when you first set up:

./terraform/scripts/init.sh

Plan: re-run any time your configuration changes (including scripts):

./terraform/scripts/plan.sh

Apply: if the changes outlined in your plan look good, apply them:

./terraform/scripts/apply.sh

(First time only) Set secret value for API key: after creating your Ticketmaster API key and running terraform apply, set the value of the ticketmaster-api-key-{env} secret in AWS Secrets Manager as:

{
  "consumer_key": "your consumer key value",
  "consumer_secret": "your consumer secret value"
}

replacing the values with your real ones.
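The secret can be set in the AWS console, or scripted. The following is a minimal sketch using boto3, assuming the secret itself already exists after the apply step. The helper names are hypothetical; the secret name pattern and JSON keys are the ones given above.

```python
import json


def secret_name(env):
    """Name of the secret for a given environment, e.g. "dev"."""
    return f"ticketmaster-api-key-{env}"


def secret_string(consumer_key, consumer_secret):
    """Serialize the key pair into the JSON shape the secret is expected to hold."""
    return json.dumps(
        {"consumer_key": consumer_key, "consumer_secret": consumer_secret}
    )


def put_api_key(env, consumer_key, consumer_secret, region):
    """Store a new secret version; needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the helpers above work without boto3 installed

    client = boto3.client("secretsmanager", region_name=region)
    return client.put_secret_value(
        SecretId=secret_name(env),
        SecretString=secret_string(consumer_key, consumer_secret),
    )
```

Avoid hard-coding the real key values in a script that gets committed; reading them from environment variables or a prompt is safer.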

If the apply succeeded and you loaded the secret, your jobs should be ready to run.
