Easily find and add relevant features to your ML & AI pipeline from hundreds of public, community and premium external data sources, including open & commercial LLMs
🚀 Awesome features of Upgini Python Library
⭐️ Automatically find only the relevant features that improve ML model accuracy, not just those correlated with the target variable
⭐️ Automated feature generation from the sources: feature generation with Large Language Model data augmentation, RNNs, and GraphNNs; ensembling of multiple data sources
⭐️ Automatic search key augmentation from all connected sources. If you do not have all search keys in your search request, such as postal/zip code, Upgini will try to add those keys based on the provided set of search keys. This will broaden the search across all available data sources
⭐️ Calculate accuracy metrics and uplifts after enriching an existing ML model with external features
⭐️ Check the stability of the accuracy gain from external data on out-of-time intervals and verification datasets. Mitigate the risk of unstable external data dependencies in the ML pipeline
⭐️ Easy to use - a single request to enrich the training dataset with all of the keys at once: date/datetime, country, postal/ZIP code, phone number, hashed email/HEM, IP address
⭐️ Simple Drag & Drop Search UI
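
To make the "uplift" and "out-of-time stability" items in the feature list concrete, here is a minimal sketch in plain Python. It does not use the Upgini API. The helper names (`rmse`, `uplift_by_period`), the RMSE metric, the row layout, and the sample numbers are all assumptions made for illustration: uplift is taken as baseline error minus enriched error, computed separately before and after a cutoff date.

```python
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5


def uplift_by_period(rows, cutoff):
    """Return RMSE uplift (baseline minus enriched) per time period.

    rows: (iso_date, y_true, baseline_pred, enriched_pred) tuples.
    cutoff: ISO date string; rows dated before it count as in-time.
    A positive uplift means the enriched model has the lower error.
    """
    periods = {"in_time": [], "out_of_time": []}
    for date, y, base, enriched in rows:
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        key = "in_time" if date < cutoff else "out_of_time"
        periods[key].append((y, base, enriched))

    result = {}
    for name, items in periods.items():
        if not items:
            continue  # skip periods with no rows
        y = [r[0] for r in items]
        base = [r[1] for r in items]
        enriched = [r[2] for r in items]
        result[name] = rmse(y, base) - rmse(y, enriched)
    return result


# Hypothetical predictions from a baseline and an enriched model.
rows = [
    ("2023-01-15", 10.0, 12.0, 11.0),
    ("2023-02-15", 20.0, 17.0, 19.0),
    ("2023-03-15", 30.0, 33.0, 31.0),
    ("2023-04-15", 40.0, 44.0, 42.0),
]
uplifts = uplift_by_period(rows, cutoff="2023-03-01")
print(uplifts)
```

If the out-of-time uplift stays positive and close to the in-time one, the gain from external features is more likely to hold up in production; a large drop would suggest an unstable data dependency.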

| Data source | Countries | History, years | # sources for ensemble | Update | Search keys | API Key required |
|---|---|---|---|---|---|---|
| Historical weather & Climate normals | 68 | 22 | - | Monthly | date, country, postal/ZIP code | No |
| Location/Places/POI/Area/Proximity information from OpenStreetMap | 221 | 2 | - | Monthly | date, country, postal/ZIP code | No |
| International holidays & events, Workweek calendar | 232 | 22 | - | Monthly | date, country | No |
| Consumer Confidence index | 44 | 22 | - | Monthly | date, country | No |
| World economic indicators | 191 | 41 | - | Monthly | date, country | No |
| Markets data | - | 17 | - | Monthly | date, datetime | No |
| World mobile & fixed broadband network coverage and performance | 167 | - | 3 | Monthly | country, postal/ZIP code | No |
| World demographic data | 90 | - | 2 | Annual | country, postal/ZIP code | No |
| World house prices | 44 | - | 3 | Annual | country, postal/ZIP code | No |
| Public social media profile data | 104 | - | - | Monthly | date, email/HEM, phone | Yes |
| Car ownership data and Parking statistics | 3 | - | - | Annual | country, postal/ZIP code, email/HEM, phone | Yes |
| Geolocation profile for phone & IPv4 & email | 239 | - | 6 | Monthly | date, email/HEM, phone, IPv4 | Yes |
👉 Details on datasets and features
We maintain a fork of MLE-bench that compares agent performance on tabular data. It uses exactly the same setup as OpenAI's original benchmark and differs only in the leaderboard view. We focus on tabular tasks and use a normalized score instead of medal percentage, so that scores on different scales can be compared. The leaderboard is recomputed whenever the submitted runs in the OpenAI repo are updated.
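
To illustrate why a normalized score helps when tasks use differently scaled metrics, here is a small sketch. The exact normalization used by the leaderboard is not stated here, so this example assumes simple per-task min-max scaling across agents as a stand-in; the function names, agent names, task names, and numbers are all hypothetical.

```python
from statistics import mean


def normalize_scores(scores, lower_is_better=False):
    """Min-max scale one task's scores to [0, 1], where 1 is the best agent.

    Assumed normalization for illustration only, not the leaderboard's
    documented formula.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All agents tie on this task.
        return {agent: 1.0 for agent in scores}
    scaled = {agent: (s - lo) / (hi - lo) for agent, s in scores.items()}
    if lower_is_better:
        # Flip error-style metrics so that higher is always better.
        scaled = {agent: 1.0 - v for agent, v in scaled.items()}
    return scaled


def leaderboard(tasks):
    """Average each agent's normalized score over all tasks."""
    per_agent = {}
    for scores, lower_is_better in tasks.values():
        normalized = normalize_scores(scores, lower_is_better)
        for agent, value in normalized.items():
            per_agent.setdefault(agent, []).append(value)
    return {agent: mean(values) for agent, values in per_agent.items()}


# Hypothetical raw scores: one AUC task (higher is better) and
# one RMSE task (lower is better) on very different scales.
tasks = {
    "task_auc": ({"agent_a": 0.80, "agent_b": 0.90}, False),
    "task_rmse": ({"agent_a": 100.0, "agent_b": 300.0}, True),
}
board = leaderboard(tasks)
print(board)
```

Averaging the raw numbers would let the RMSE task dominate purely because of its scale; after normalization each task contributes equally to an agent's overall score.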