Description
Build a scraper for the Texas Advanced Computing Center events page at tacc.org/tacc/events. Lower volume than the other sources but valuable for research-oriented events, workshops, and training sessions.
Stack
- Language: TypeScript
- Runtime: Cloudflare Workers (scheduled cron trigger)
- Storage: Cloudflare D1
- Client surface: React Native app
Hi-Fi UI Requirements (what the design demands from the scraper)
Events from TACC render in the same flyer-first card system as all other sources. Many TACC events are multi-day workshops and often have no flyer, so the no-flyer fallback variant matters here.
- Flyer image is the hero when available. Cards support vertical, square, horizontal, and no-flyer variants. We need:
  - image_url at highest resolution available (prefer original upload over thumbnail).
  - image_width, image_height, and image_aspect_ratio (vertical / square / horizontal / none).
  - image_mime_type and image_alt_text when present.
- Expect a meaningful share of TACC events to resolve to image_aspect_ratio = "none"; the no-flyer card must still render cleanly with the data below.
- "Posted by [Org]" is shown. host_organization defaults to "TACC" but should capture the sub-program (e.g., "Frontera", "Stampede3", "TACC Institute") when listed. Populate host_organization_slug for routing.
- Card shows date + short time. Store start_datetime / end_datetime as ISO 8601 (America/Chicago). Multi-day workshops must preserve full start/end; the client will render a range (e.g., Mon 3/2 – Fri 3/6).
- Card shows a short location string (e.g., ACES 2.302 or Virtual). Store location_short (≤ 40 chars) and location_full.
- Interest grouping. TACC events map to research-oriented interests (training, symposium, workshop). Capture categories accurately so they flow into the interests system.
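Several of the fields above are derived rather than scraped directly. The sketch below shows one possible way to compute image_aspect_ratio, location_short, and host_organization_slug. The function names, the 10% tolerance for "square", and the ellipsis truncation are assumptions for illustration; the ticket does not specify them.

```typescript
type AspectRatio = "vertical" | "square" | "horizontal" | "none";

// Assumption: anything within 10% of 1:1 counts as square.
const SQUARE_TOLERANCE = 0.1;
// From the ticket: location_short must fit in 40 characters.
const LOCATION_SHORT_MAX = 40;

// Hypothetical helper: classify a flyer by its pixel dimensions.
function classifyAspectRatio(
  width: number | null | undefined,
  height: number | null | undefined,
): AspectRatio {
  // Missing, zero, negative, or NaN dimensions mean there is no usable flyer.
  if (!width || !height || width <= 0 || height <= 0) return "none";
  const ratio = width / height;
  if (Math.abs(ratio - 1) <= SQUARE_TOLERANCE) return "square";
  return ratio > 1 ? "horizontal" : "vertical";
}

// Hypothetical helper: derive location_short from location_full.
function toLocationShort(locationFull: string): string {
  // Collapse whitespace so multi-line addresses become a single line.
  const oneLine = locationFull.replace(/\s+/g, " ").trim();
  if (oneLine.length <= LOCATION_SHORT_MAX) return oneLine;
  // Reserve one character for the ellipsis so the result stays within the cap.
  return oneLine.slice(0, LOCATION_SHORT_MAX - 1).trimEnd() + "…";
}

// Hypothetical helper: derive host_organization_slug from host_organization.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // runs of non-alphanumerics become one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}
```

An event with no image would call classifyAspectRatio(null, null) and get "none", which is the value the no-flyer card variant keys on.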
Scope
- Crawl the TACC events listing and paginate through all upcoming events.
- Extract the following fields per event:
  - title
  - description
  - start_datetime / end_datetime (ISO 8601, America/Chicago; handle multi-day workshops)
  - location_short, location_full
  - host_organization (default: "TACC"; capture sub-program if listed), host_organization_slug
  - event_url
  - image_url, image_width, image_height, image_aspect_ratio, image_mime_type, image_alt_text
  - categories (e.g., training, symposium, workshop)
  - registration_url if separate
  - source = "tacc"
- Map into the shared D1 events schema.
- Deduplicate via a stable source_event_id.
- Upsert into D1.
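The dedupe and upsert steps could look roughly like the sketch below. It assumes each event keeps a stable URL path (so the path can serve as source_event_id) and that the events table has a unique index on (source, source_event_id). The row shape is a reduced, hypothetical subset of the field list, since the shared schema is not shown here, and the example URL path used with it is made up.

```typescript
// Assumed, reduced row shape; the real shared events schema has more columns.
interface TaccEventRow {
  source_event_id: string;
  title: string;
  event_url: string;
  start_datetime: string;
  end_datetime: string | null;
}

// Hypothetical helper: derive a stable ID from the event URL.
function toSourceEventId(eventUrl: string): string {
  const url = new URL(eventUrl);
  // Use only the path, without trailing slashes, so query strings and
  // fragments (for example tracking parameters) do not create new IDs.
  const path = url.pathname.replace(/\/+$/, "").toLowerCase();
  return `tacc:${path}`;
}

// Hypothetical helper: build a SQLite upsert keyed on (source, source_event_id).
function buildUpsert(row: TaccEventRow): { sql: string; params: (string | null)[] } {
  const sql = `INSERT INTO events
  (source, source_event_id, title, event_url, start_datetime, end_datetime)
VALUES ('tacc', ?, ?, ?, ?, ?)
ON CONFLICT (source, source_event_id) DO UPDATE SET
  title = excluded.title,
  event_url = excluded.event_url,
  start_datetime = excluded.start_datetime,
  end_datetime = excluded.end_datetime`;
  const params = [
    row.source_event_id,
    row.title,
    row.event_url,
    row.start_datetime,
    row.end_datetime,
  ];
  return { sql, params };
}
```

Inside the Worker, the statement would then be executed with something like env.DB.prepare(sql).bind(...params).run(), where DB is an assumed name for the D1 binding. Because the conflict target is the dedupe key, repeat runs update existing rows instead of inserting duplicates.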
Deliverables
- scrapers/tacc.ts worker module exporting a run() entrypoint.
- Unit tests with saved fixtures covering: single-day event, multi-day workshop, virtual event, event with no flyer (exercise no-flyer card variant).
- Dry-run script.
- Per-event error isolation.
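Per-event error isolation can be sketched as a loop that catches failures per item and keeps going. The version below covers only the synchronous parse step; the names parseEachIsolated and IsolatedResult are hypothetical, and the upsert step would need the same treatment (for async work, Promise.allSettled gives a similar effect).

```typescript
interface IsolatedResult<T> {
  ok: T[];
  failed: { index: number; error: string }[];
}

// Hypothetical helper: run parseOne on each raw event, collecting failures
// instead of letting one malformed card abort the whole run.
function parseEachIsolated<Raw, Parsed>(
  rawEvents: Raw[],
  parseOne: (raw: Raw) => Parsed,
): IsolatedResult<Parsed> {
  const result: IsolatedResult<Parsed> = { ok: [], failed: [] };
  rawEvents.forEach((raw, index) => {
    try {
      result.ok.push(parseOne(raw));
    } catch (err) {
      // Record which item failed and why, then continue with the next one.
      const message = err instanceof Error ? err.message : String(err);
      result.failed.push({ index, error: message });
    }
  });
  return result;
}
```

The failed list can be logged at the end of the run, which also makes it straightforward to check the capture rate against the number of listed events.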
Acceptance Criteria
- ≥ 95% of listed TACC events are captured per run.
- Multi-day events are stored with correct start/end datetimes and render as a date range on the client.
- image_aspect_ratio correctly classified (including "none") for ≥ 95% of events.
- location_short renders in ≤ 40 chars.
- No duplicates across repeat runs.
- D1 row count for source = "tacc" matches site listings (±5%).
- CI passes lint, typecheck, and tests.
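The ±5% row-count criterion can be checked mechanically, for example in the dry-run script. This sketch assumes the tolerance is measured relative to the number of events listed on the site; the ticket does not say which count is the denominator, and the function name is hypothetical.

```typescript
// Hypothetical check: is the D1 row count within ±tolerance of the site listing count?
function withinTolerance(dbCount: number, siteCount: number, tolerance = 0.05): boolean {
  // With nothing listed on the site, only an empty table counts as a match.
  if (siteCount === 0) return dbCount === 0;
  return Math.abs(dbCount - siteCount) / siteCount <= tolerance;
}
```

For instance, 96 rows against 100 listed events passes, while 94 rows does not.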
Out of Scope
- UI work in the React Native app.
- Integration with TACC account / allocation data.
- Cross-source deduplication.