feat: Add Twitter/X post support (daily-api) #3437
base: main
Conversation
Add infrastructure for PostgreSQL schema isolation to enable parallel Jest workers within CI jobs. Each worker gets its own schema to prevent data conflicts between tests.

Changes:
- Add TYPEORM_SCHEMA env var support and auto-schema selection based on JEST_WORKER_ID when ENABLE_SCHEMA_ISOLATION=true
- Set PostgreSQL search_path at connection level for raw SQL queries
- Add createWorkerSchema() to copy table structures, views, and migrations data from the public schema to worker schemas
- Use pg_get_serial_sequence() for sequence resets to handle different sequence naming conventions

Known limitation: database triggers are not copied, as they reference functions in the public schema.

Schema isolation is opt-in via the ENABLE_SCHEMA_ISOLATION=true environment variable.

Addresses ENG-283
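A minimal sketch of the opt-in schema selection this commit describes, assuming only the env vars named above (ENABLE_SCHEMA_ISOLATION, TYPEORM_SCHEMA, JEST_WORKER_ID); `resolveWorkerSchema` is an illustrative name, not necessarily the PR's helper:

```typescript
// Illustrative only: derive the schema for the current Jest worker.
export const resolveWorkerSchema = (): string => {
  // An explicit TYPEORM_SCHEMA override always wins.
  if (process.env.TYPEORM_SCHEMA) {
    return process.env.TYPEORM_SCHEMA;
  }
  // Opt-in isolation: one schema per Jest worker.
  if (
    process.env.ENABLE_SCHEMA_ISOLATION === 'true' &&
    process.env.JEST_WORKER_ID
  ) {
    return `test_worker_${process.env.JEST_WORKER_ID}`;
  }
  return 'public';
};
```

The resolved name would then be passed both as TypeORM's `schema` option and into the connection-level `search_path` (for example `extra: { options: '-c search_path=test_worker_1,public' }` on the Postgres driver) so raw SQL queries resolve to the same schema.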
Enable parallel test execution within CI jobs by giving each Jest worker its own PostgreSQL schema. This significantly improves test throughput.

Changes:
- Update CircleCI to use --maxWorkers=4 with ENABLE_SCHEMA_ISOLATION=true
- Add test:parallel npm script for local parallel test execution
- Enhance createWorkerSchema() to copy:
  - Table structures (LIKE ... INCLUDING ALL)
  - Views with schema references updated
  - Materialized views with schema references updated
  - All user-defined functions with schema references updated
  - Triggers with schema and function references updated

The schema isolation copies all database objects from the public schema to worker-specific schemas (test_worker_1, test_worker_2, etc.), allowing tests to run in parallel without data conflicts.

Addresses ENG-284
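As a rough illustration of the table-copy step inside `createWorkerSchema()` (the function name comes from the commit message; everything else is an assumption sketched against a TypeORM `DataSource`):

```typescript
import { DataSource } from 'typeorm';

// Sketch: copy every public table's structure into the worker schema.
export const copyTables = async (con: DataSource, schema: string): Promise<void> => {
  await con.query(`CREATE SCHEMA IF NOT EXISTS "${schema}"`);

  const tables: { tablename: string }[] = await con.query(
    `SELECT tablename FROM pg_tables WHERE schemaname = 'public'`,
  );

  for (const { tablename } of tables) {
    // LIKE ... INCLUDING ALL copies columns, defaults, indexes and checks,
    // but not FK constraints (handled in a later commit in this PR).
    await con.query(
      `CREATE TABLE IF NOT EXISTS "${schema}"."${tablename}"
         (LIKE public."${tablename}" INCLUDING ALL)`,
    );
  }
};
```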
Fixes several issues with PostgreSQL schema isolation for parallel Jest workers:

1. FK constraint copying: tables copied with INCLUDING ALL don't include FK constraints. Now explicitly copy FK constraints with correct schema references so CASCADE and SET NULL actions work properly.
2. Seed data copying: copy critical seed data (ghost user '404', system user, system sources, etc.) to worker schemas so tests don't fail when expecting these records.
3. Trigger function search_path: add a SET search_path clause to plpgsql functions so unqualified table names in trigger bodies resolve to the correct worker schema instead of defaulting to public.
4. Hardcoded schema references: remove explicit 'public.' references from cron jobs (updateViews, updateDiscussionScore, checkReferralReminder) so they work with schema isolation.
5. Increased the beforeAll timeout to 60s to accommodate FK constraint copying.

Test results with schema isolation: 180/198 test suites pass (3785/3916 tests).
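A hedged sketch of point 1 (FK constraint copying); the catalog query and regex are approximations, not the PR's exact logic:

```typescript
import { DataSource } from 'typeorm';

// Sketch: re-create public-schema FK constraints inside the worker schema,
// pointing them at the worker-schema copies of the referenced tables.
export const copyForeignKeys = async (con: DataSource, schema: string): Promise<void> => {
  const fks: { table: string; name: string; def: string }[] = await con.query(
    `SELECT conrelid::regclass::text AS "table",
            conname AS name,
            pg_get_constraintdef(oid) AS def
       FROM pg_constraint
      WHERE contype = 'f'
        AND connamespace = 'public'::regnamespace`,
  );

  for (const { table, name, def } of fks) {
    // pg_get_constraintdef() emits unqualified "REFERENCES tablename(...)",
    // so prefix the referenced table with the worker schema.
    const qualified = def.replace(/REFERENCES (\S+)/, `REFERENCES "${schema}".$1`);
    await con.query(
      `ALTER TABLE "${schema}".${table} ADD CONSTRAINT "${name}" ${qualified}`,
    );
  }
};
```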
Prevent deletion of predefined seed/reference data tables during test cleanup to maintain test stability and ensure critical data remains intact.
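One hedged way to express this in the test cleanup helper; the helper and the preserved table names are illustrative examples, and FK ordering concerns are glossed over:

```typescript
import { DataSource } from 'typeorm';

// Tables holding predefined seed/reference data that cleanup must not touch.
// The exact list is an assumption for illustration.
const PRESERVED_TABLES = new Set(['advanced_settings', 'source_category', 'prompt']);

export const cleanDatabase = async (con: DataSource): Promise<void> => {
  for (const entity of con.entityMetadatas) {
    if (PRESERVED_TABLES.has(entity.tableName)) {
      continue; // keep seed/reference data intact between tests
    }
    await con.query(`DELETE FROM "${entity.tableName}"`);
  }
};
```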
When CREATE TABLE ... LIKE ... INCLUDING ALL copies tables, column
defaults still reference the original public schema sequences. This
caused FK constraint violations when tests used TypeORM's save() with
@PrimaryGeneratedColumn('increment') - the database used the wrong
sequence position instead of starting at 1.
Changes:
- Create new sequences in worker schemas and update column defaults (see the sketch after this list)
- Remove seed data copying for tables where tests create their own fixtures
  (advanced_settings, source_category, prompt)
- Use schema-qualified table names in the sequence reset logic
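A sketch of that sequence fix, assuming a TypeORM `DataSource`; the helper name and signature are illustrative:

```typescript
import { DataSource } from 'typeorm';

// Sketch: give the copied column a fresh sequence inside the worker schema
// so generated ids start at 1 instead of following the public sequence.
export const fixSerialColumn = async (
  con: DataSource,
  schema: string,
  table: string,
  column: string,
): Promise<void> => {
  const seq = `${table}_${column}_seq`;

  await con.query(`CREATE SEQUENCE IF NOT EXISTS "${schema}"."${seq}"`);

  // Point the copied column's default at the new sequence instead of the
  // public-schema sequence that LIKE ... INCLUDING ALL carried over.
  await con.query(
    `ALTER TABLE "${schema}"."${table}"
       ALTER COLUMN "${column}"
       SET DEFAULT nextval('"${schema}"."${seq}"'::regclass)`,
  );

  // Mark ownership so pg_get_serial_sequence() keeps resolving the sequence.
  await con.query(
    `ALTER SEQUENCE "${schema}"."${seq}" OWNED BY "${schema}"."${table}"."${column}"`,
  );
};
```

Sequence resets can then stay schema-qualified via pg_get_serial_sequence(), as described in the earlier commit.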
…ions

PostgreSQL's pg_matviews.definition returns normalized SQL where table names appear unqualified, but internally retains OID references to the original tables. Simply setting search_path before CREATE VIEW didn't work - views still bound to public schema tables.

Solution: explicitly replace all FROM/JOIN table references with schema-qualified versions using regex patterns. This handles:
- FROM tablename
- JOIN tablename
- FROM (tablename alias - PostgreSQL's parenthesized JOIN format

This fixes materialized views like trending_post, trending_tag, and tag_recommendation to correctly query worker schema tables instead of public schema tables.

Test results: tags.ts now passes 15/15 (was 9/15 before the fix)
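A hedged sketch of the regex-based qualification described above; the patterns are approximate, not the PR's exact ones:

```typescript
// Sketch: rewrite FROM/JOIN references in a (materialized) view definition so
// they point at the worker schema. Handles "FROM t", "JOIN t" and "FROM (t alias".
export const qualifyViewDefinition = (
  definition: string,
  schema: string,
  tables: string[],
): string => {
  let sql = definition;
  for (const table of tables) {
    sql = sql.replace(
      new RegExp(`\\b(FROM|JOIN)(\\s+\\(?\\s*)(${table})\\b`, 'gi'),
      `$1$2"${schema}"."$3"`,
    );
  }
  return sql;
};
```

The rewritten definition would then back a `CREATE MATERIALIZED VIEW "test_worker_1".trending_post AS ...` statement instead of relying on search_path.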
Replace the approach of copying schema structure with running the actual migrations for worker schemas. This ensures exact parity with how the schema was built.

Key changes:
- Add replaceSchemaReferences() to transform public schema refs in migrations
- Add wrapQueryRunner() to intercept SQL queries during migration execution
- Fix migration ordering to use 13-digit timestamp extraction
- Reduce pool size to 10 for tests to avoid connection exhaustion
- Replace flushall() with targeted deleteKeysByPattern() in boot.ts
- Skip pub/sub test in parallel mode (channels can't be worker-isolated)

Results: 197/198 test suites pass consistently with 2 parallel workers
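Roughly, the interception idea might look like this (the helper names `replaceSchemaReferences` and `wrapQueryRunner` come from the commit message; their bodies here are assumptions):

```typescript
import { QueryRunner } from 'typeorm';

// Sketch: rewrite explicit public-schema references in migration SQL so the
// migrations build the worker schema instead.
export const replaceSchemaReferences = (sql: string, schema: string): string =>
  sql.replace(/"?public"?\./g, `"${schema}".`);

export const wrapQueryRunner = (runner: QueryRunner, schema: string): QueryRunner => {
  const originalQuery = runner.query.bind(runner);
  // Intercept every query issued while migrations run.
  runner.query = ((query: string, parameters?: unknown[]) =>
    originalQuery(replaceSchemaReferences(query, schema), parameters)) as QueryRunner['query'];
  return runner;
};
```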
…iency

- Create __tests__/globalSetup.ts to run migrations once before all workers
- Remove dead createWorkerSchema code from setup.ts (now in globalSetup)
- Add globalSetup to jest.config.js

This prevents each Jest worker from running migrations independently, reducing memory usage and avoiding SIGKILL/OOM issues in CI.
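An illustrative shape for that global setup (the connection details and paths are assumptions; the real file may build the DataSource differently):

```typescript
// __tests__/globalSetup.ts (sketch): run migrations once, before any worker starts.
import { DataSource } from 'typeorm';

export default async (): Promise<void> => {
  const con = new DataSource({
    type: 'postgres',
    url: process.env.TYPEORM_URL, // assumption: connection string comes from env
    migrations: [`${__dirname}/../src/migration/*.ts`], // assumed path
  });
  await con.initialize();
  await con.runMigrations();
  await con.destroy();
};
```

jest.config.js then only needs `globalSetup: '<rootDir>/__tests__/globalSetup.ts'`, so the workers skip migration work entirely.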
- Add TweetPost entity with tweet-specific columns
- Include tweetId, author info, content, media, thread support
- Export from posts index

ENG-301
- Add tweet-specific columns (tweetId, author info, content, media, thread)
- Add unique index on tweetId

ENG-306
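A hedged sketch of what such a migration could look like (class name, timestamp, and the trimmed column list are illustrative; the index name matches the reviewer's suggestion further down):

```typescript
import { MigrationInterface, QueryRunner } from 'typeorm';

// Sketch only: the real migration adds the full set of tweet columns.
export class AddTweetPostColumns1700000000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`ALTER TABLE "post" ADD "tweetId" text`);
    await queryRunner.query(`ALTER TABLE "post" ADD "tweetContent" text`);
    await queryRunner.query(
      `CREATE UNIQUE INDEX "IDX_post_tweetId" ON "post" ("tweetId")`,
    );
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`DROP INDEX "IDX_post_tweetId"`);
    await queryRunner.query(`ALTER TABLE "post" DROP COLUMN "tweetContent"`);
    await queryRunner.query(`ALTER TABLE "post" DROP COLUMN "tweetId"`);
  }
}
```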
- Add TweetMedia and TweetData GraphQL types
- Add tweet-specific fields to Post type (tweetId, author info, content, media, thread)

ENG-303
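A sketch of the SDL additions, assuming schema-first typeDefs; the Post fields mirror the TweetPost entity shown later in this PR, while the TweetMedia fields and the DateTime scalar are assumptions (the TweetData type from the commit is omitted):

```typescript
import gql from 'graphql-tag';

export const typeDefs = gql`
  type TweetMedia {
    type: String!
    url: String!
  }

  extend type Post {
    tweetId: String
    tweetAuthorUsername: String
    tweetAuthorName: String
    tweetAuthorAvatar: String
    tweetAuthorVerified: Boolean
    tweetContent: String
    tweetContentHtml: String
    tweetMedia: [TweetMedia]
    tweetCreatedAt: DateTime
  }
`;
```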
- Add Twitter URL pattern detection (twitter.com, x.com)
- Add extractTweetId and extractTweetInfo functions
- Route Twitter URLs to create TweetPost instead of ArticlePost

ENG-304
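A hedged sketch of the URL detection described above; the PR's actual extractTweetId/extractTweetInfo may accept more URL shapes than this single pattern:

```typescript
// Matches https://twitter.com/<user>/status/<id> and the x.com equivalent.
const TWEET_URL_REGEX =
  /^https?:\/\/(?:www\.)?(?:twitter\.com|x\.com)\/([A-Za-z0-9_]+)\/status\/(\d+)/i;

export const isTweetUrl = (url: string): boolean => TWEET_URL_REGEX.test(url);

export const extractTweetInfo = (
  url: string,
): { username: string; tweetId: string } | null => {
  const match = TWEET_URL_REGEX.exec(url);
  return match ? { username: match[1], tweetId: match[2] } : null;
};

export const extractTweetId = (url: string): string | null =>
  extractTweetInfo(url)?.tweetId ?? null;
```

A submit/crawl handler could then branch on `isTweetUrl(url)` to create a TweetPost instead of an ArticlePost.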
- Add TweetPost to contentTypeFromPostType mapping
- Add tweet-specific fields to Data interface
- Update fixData to populate tweet fields when content_type is tweet

ENG-305
🍹 The Update (preview) for dailydotdev/api/prod (at ba5f39b) was successful.

Resource changes (name, type, operation):
+ vpc-native-api-clickhouse-migration-2969532d kubernetes:batch/v1:Job create
~ vpc-native-hourly-notification-cron kubernetes:batch/v1:CronJob update
~ vpc-native-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-bg-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-personalized-digest-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-clean-zombie-images-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-trending-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tags-str-cron kubernetes:batch/v1:CronJob update
+ vpc-native-api-db-migration-2969532d kubernetes:batch/v1:Job create
~ vpc-native-clean-stale-user-transactions-cron kubernetes:batch/v1:CronJob update
~ vpc-native-post-analytics-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-source-public-threshold-cron kubernetes:batch/v1:CronJob update
~ vpc-native-private-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-ws-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-temporal-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-daily-digest-cron kubernetes:batch/v1:CronJob update
~ vpc-native-validate-active-users-cron kubernetes:batch/v1:CronJob update
- vpc-native-api-clickhouse-migration-ffae6b22 kubernetes:batch/v1:Job delete
~ vpc-native-calculate-top-readers-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-user-companies-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tag-recommendations-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-profile-updated-sync-cron kubernetes:batch/v1:CronJob update
+- vpc-native-k8s-secret kubernetes:core/v1:Secret create-replacement
~ vpc-native-generic-referral-reminder-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-current-streak-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-highlighted-views-cron kubernetes:batch/v1:CronJob update
- api-sub-api.parse-opportunity-feedback gcp:pubsub/subscription:Subscription delete
~ vpc-native-clean-gifted-plus-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-source-tag-view-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-opportunities-cron kubernetes:batch/v1:CronJob update
~ vpc-native-post-analytics-history-day-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-generate-search-invites-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-users-cron kubernetes:batch/v1:CronJob update
~ vpc-native-sync-subscription-with-cio-cron kubernetes:batch/v1:CronJob update
- vpc-native-api-db-migration-ffae6b22 kubernetes:batch/v1:Job delete
~ vpc-native-check-analytics-report-cron kubernetes:batch/v1:CronJob update
~ vpc-native-personalized-digest-cron kubernetes:batch/v1:CronJob update
otelExporterOtlpEndpoint: http://otel-collector.local.svc.cluster.local:4318/v1/traces
otelTracesSampler: always_on
paddleApiKey: topsecret
paddleApiKey: pdl_sdbx_apikey_01kdq5zxjqkw13cnqcfrx8zqf9_w4972CNdrYn296TxNRffP7_AII
For the future: you can add your local variables to .env inside .infra, and pulumi+tilt will automagically sync them to the adhoc environment.
@ChildEntity(PostType.Tweet)
export class TweetPost extends Post {
  @Column({ type: 'text' })
  @Index({ unique: true })
Suggested change:
- @Index({ unique: true })
+ @Index('IDX_post_tweetId', { unique: true })
Ensure the name matches the one in the migration.
  @Index({ unique: true })
  tweetId: string;

  @Column({ type: 'text' })
  tweetAuthorUsername: string;

  @Column({ type: 'text' })
  tweetAuthorName: string;

  @Column({ type: 'text', nullable: true })
  tweetAuthorAvatar?: string;

  @Column({ type: 'boolean', default: false })
  tweetAuthorVerified: boolean;

  @Column({ type: 'text' })
  tweetContent: string;

  @Column({ type: 'text', nullable: true })
  tweetContentHtml?: string;

  @Column({ type: 'jsonb', nullable: true })
  tweetMedia?: TweetMedia[];

  @Column({ type: 'timestamp', nullable: true })
  tweetCreatedAt?: Date;
Could this be a jsonb column? There are already 58 columns :lolsob:
Alternatively, a new table that this links to?
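For illustration, the reviewer's jsonb alternative could look roughly like this (the import path, the TweetData shape, and the field names are assumptions, not a proposal from the PR):

```typescript
import { ChildEntity, Column } from 'typeorm';
import { Post, PostType } from './Post'; // assumed import path

// Assumed shape: all tweet-specific data folded into one jsonb payload.
interface TweetData {
  tweetId: string;
  authorUsername: string;
  authorName: string;
  authorAvatar?: string;
  authorVerified?: boolean;
  content: string;
  contentHtml?: string;
  media?: { type: string; url: string }[];
  createdAt?: string;
}

@ChildEntity(PostType.Tweet)
export class TweetPost extends Post {
  // Single jsonb column instead of nine new columns on the shared post table
  @Column({ type: 'jsonb', nullable: true })
  tweetData?: TweetData;
}
```

The trade-off is that the unique tweetId index would then need an expression index on the jsonb payload, or the separate linked table the reviewer mentions.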
Summary
Test plan