A dbt project demonstrating agricultural data analytics using Trino with 1 million+ rows of synthetic agriscience data.
This project contains:
- 1 million rows of crop yield transaction data
- 5 source tables (crop_yields, farms, crops, seasons, regions)
- 5 staging models for data preparation
- 3 advanced marts models with complex transformation logic:
- Aggregated yield analytics
- Farm performance metrics with YoY trends
- Crop rotation and sustainability analysis
.
├── dbt_project.yml # Main dbt project configuration
├── profiles.yml # Trino connection profile (reference)
├── models/
│ ├── staging/ # Staging models (views)
│ │ ├── sources.yml # Source table definitions
│ │ ├── stg_crop_yields.sql
│ │ ├── stg_farms.sql
│ │ ├── stg_crops.sql
│ │ ├── stg_seasons.sql
│ │ └── stg_regions.sql
│ └── marts/ # Final analytical models (tables)
│ ├── schema.yml # Model documentation and tests
│ ├── yield_summary.sql
│ ├── farm_performance_metrics.sql
│ └── crop_rotation_analysis.sql
├── seeds/ # CSV seed files (if any)
├── tests/ # Custom data tests
├── macros/ # Custom SQL macros
└── README.md # This file
-
crop_yields (1,000,000 rows) - Transactional data
- yield_id, farm_id, crop_id, season_id, region_id
- harvest_date, acres_planted, actual_yield_per_acre, total_yield
- quality_grade, moisture_content, field_number
-
farms (500 rows) - Farm information
- farm_id, farm_name, region_id, total_acres, organic_certified
-
crops (15 rows) - Crop types
- crop_id, crop_name, crop_category, avg_yield_per_acre
-
seasons (15 rows) - Planting/harvest seasons (2020-2024)
- season_id, year, season_name, start_date, end_date
-
regions (50 rows) - Agricultural regions across 10 Midwest states
- region_id, region_name, state, avg_rainfall_inches, avg_temp_f
-
yield_summary - Aggregated yield analytics with:
- Regional breakdowns (state, region)
- Crop analysis (name, category)
- Temporal grouping (year, season)
- Farm type comparison (organic vs conventional)
- Key metrics: total acres, average yields, production totals
-
farm_performance_metrics - Comprehensive farm performance analysis featuring:
- Year-over-year growth rates (production & yield)
- 3-year rolling averages for trend analysis
- Regional rankings and percentile scores
- Performance classification (Top Performer, Above Average, etc.)
- Efficiency metrics vs regional benchmarks
- Organic vs conventional peer comparisons
- Land utilization and crop diversity metrics
-
crop_rotation_analysis - Advanced sustainability analytics including:
- Shannon and Simpson diversity indices
- Crop rotation pattern analysis (monoculture vs rotation)
- Sequential crop sequencing by field over time
- Rotation effectiveness scoring (1-5 scale)
- Yield impact analysis (rotation vs monoculture)
- Sustainability score (0-100) based on rotation practices
- Rotation strategy classification
- Soil health indicators derived from planting patterns
pip install dbt-trinoCopy profiles.yml to ~/.dbt/profiles.yml and update with your Trino connection details:
trino_demo:
target: dev
outputs:
dev:
type: trino
method: none
user: your_username
host: your_trino_host
port: 8080
database: your_catalog
schema: dbt
threads: 4
http_scheme: httpOr set environment variables:
export TRINO_USER=your_username
export TRINO_HOST=your_trino_host
export TRINO_PORT=8080
export TRINO_CATALOG=your_catalogcd /data/leo/demo
dbt debug# Compile models to check for errors
dbt compile
# Run all models
dbt run
# Run specific model
dbt run --select yield_summary
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs servedbt run --select staging.*dbt run --select marts.*dbt run --full-refreshdbt test --select yield_summarydbt build- Time Range: 2020-2024 (5 years, 15 seasons)
- Geographic Coverage: 50 regions across 10 Midwest states
- Crop Types: 15 varieties (Grains, Legumes, Oilseeds, Fiber, Vegetables, Forage, Roots)
- Farm Types: Organic certified and conventional farms
- Quality Grades: A through E
- Field Distribution: 1-50 fields per record
- Moisture Content: 8-20%
After running dbt run, you can query the output:
-- Top producing regions by crop
SELECT
region_name,
crop_name,
total_yield
FROM dbt_marts.yield_summary
ORDER BY total_yield DESC
LIMIT 10;
-- Organic vs conventional comparison
SELECT
organic_certified,
AVG(avg_yield_per_acre) as avg_yield,
SUM(total_acres_planted) as total_acres
FROM dbt_marts.yield_summary
GROUP BY organic_certified;
-- Seasonal trends
SELECT
year,
season_name,
SUM(total_yield) as total_production
FROM dbt_marts.yield_summary
GROUP BY year, season_name
ORDER BY year, season_name;-- Top performing farms by efficiency
SELECT
farm_name,
region_name,
year,
avg_yield_per_acre,
yield_percentile,
performance_tier
FROM dbt_marts.farm_performance_metrics
WHERE year = 2024
ORDER BY yield_percentile DESC
LIMIT 10;
-- Year-over-year growth analysis
SELECT
farm_name,
year,
total_production,
yoy_production_growth_pct,
rolling_3yr_avg_production
FROM dbt_marts.farm_performance_metrics
WHERE farm_id = 1
ORDER BY year;
-- Organic vs conventional performance comparison
SELECT
organic_certified,
AVG(avg_yield_per_acre) as avg_efficiency,
AVG(yield_vs_regional_avg_pct) as avg_vs_regional_pct,
COUNT(DISTINCT farm_id) as farm_count
FROM dbt_marts.farm_performance_metrics
WHERE year = 2024
GROUP BY organic_certified;-- Sustainability leaders
SELECT
farm_name,
region_name,
year,
sustainability_score,
diversity_classification,
rotation_strategy,
shannon_diversity_index
FROM dbt_marts.crop_rotation_analysis
WHERE year = 2024
ORDER BY sustainability_score DESC
LIMIT 10;
-- Rotation yield benefit analysis
SELECT
rotation_strategy,
AVG(avg_yield_monoculture) as avg_monoculture_yield,
AVG(avg_yield_rotated) as avg_rotation_yield,
AVG(rotation_yield_benefit_pct) as avg_benefit_pct,
COUNT(*) as farm_count
FROM dbt_marts.crop_rotation_analysis
WHERE year = 2024
GROUP BY rotation_strategy;
-- Diversity and performance correlation
SELECT
diversity_classification,
AVG(sustainability_score) as avg_sustainability,
AVG(unique_crops) as avg_crop_count,
AVG(rotation_pct) as avg_rotation_pct
FROM dbt_marts.crop_rotation_analysis
GROUP BY diversity_classification
ORDER BY avg_sustainability DESC;- Verify Trino is running and accessible
- Check credentials in profiles.yml
- Ensure catalog and schema exist
- Run
dbt compileto see detailed error messages - Check that all source tables exist in the
dbtschema - Verify Trino SQL syntax compatibility
- Consider adding indexes on foreign key columns
- Adjust thread count in profiles.yml
- Use
dbt run --selectfor specific models
- ✅ Sophisticated window functions (rolling averages, year-over-year comparisons, rankings)
- ✅ Complex transformation logic (diversity indices, rotation patterns)
- ✅ Comprehensive data quality tests (38 tests across sources and models)
- ✅ Performance tier classifications and percentile rankings
- ✅ Sustainability scoring algorithms
- Add incremental models for near real-time updates
- Implement custom macros for common agriscience calculations
- Create materialized views for frequently accessed aggregations
- Add snapshot models for slowly changing dimensions
- Build dashboard-ready models optimized for specific BI tools
For dbt documentation: https://docs.getdbt.com/ For Trino SQL reference: https://trino.io/docs/current/
Project: Trino dbt Demo 0.1 Data Context: Agriscience Source Type: Trino Models: 8 (5 staging + 3 marts) Tests: 38 Source Rows: 1,000,000+