Upgini • Data search & enrichment for Machine Learning and AI

🔍 Upgini • Intelligent data search & enrichment for Machine Learning and AI

Easily find and add relevant features to your ML & AI pipeline from hundreds of public, community, and premium external data sources, including open and commercial LLMs.

🚀 Awesome features of the Upgini Python Library

⭐️ Automatically find only the relevant features that improve ML model accuracy, not just features correlated with the target variable
⭐️ Automated feature generation from the sources: Large Language Model (LLM) data augmentation, RNNs, and graph neural networks (GraphNN); ensembling of multiple data sources
⭐️ Automatic search key augmentation from all connected sources. If your search request does not include all search keys, such as postal/ZIP code, Upgini will try to add them based on the keys you provided. This broadens the search across all available data sources
⭐️ Calculate accuracy metrics and uplifts after enriching an existing ML model with external features
⭐️ Check the stability of the accuracy gain from external data on out-of-time intervals and verification datasets, to mitigate the risk of unstable external data dependencies in the ML pipeline
⭐️ Easy to use: enrich a training dataset in a single request that uses all search keys at once (date/datetime, country, postal/ZIP code, phone number, hashed email/HEM, IP address)
⭐️ Simple Drag & Drop Search UI
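To make "accuracy uplift" concrete, here is a minimal sketch, not Upgini's implementation, assuming a binary classification task and ROC AUC as the metric: the uplift is the metric of a model trained on enriched features minus the metric of the baseline model, both scored on the same holdout labels. The helpers `roc_auc` and `accuracy_uplift` are hypothetical, and Upgini's own metric calculation may use other metrics and validation schemes.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_uplift(
    y_true: Sequence[int],
    baseline_scores: Sequence[float],
    enriched_scores: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical helper: compare a baseline model with a model trained on enriched features."""
    base = roc_auc(y_true, baseline_scores)
    enriched = roc_auc(y_true, enriched_scores)
    return {
        "baseline": base,
        "enriched": enriched,
        "uplift": enriched - base,                       # absolute gain in AUC
        "uplift_pct": 100.0 * (enriched - base) / base,  # relative gain, in percent
    }


if __name__ == "__main__":
    # Made-up holdout labels and predicted probabilities from two models.
    y = [0, 0, 1, 1, 0, 1]
    baseline = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]
    enriched = [0.1, 0.3, 0.8, 0.9, 0.2, 0.6]
    print(accuracy_uplift(y, baseline, enriched))
```

With these toy numbers the baseline AUC is 6/9 (about 0.667) and the enriched AUC is 1.0, an absolute uplift of about 0.333. Scoring both models on an out-of-time holdout, rather than on the training period, is what makes such an uplift meaningful for the stability check described above.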

📊 Total: 239 countries and up to 41 years of history

Data source	Countries	History, years	# sources for ensemble	Update	Search keys	API Key required
Historical weather & Climate normals	68	22	-	Monthly	date, country, postal/ZIP code	No
Location/Places/POI/Area/Proximity information from OpenStreetMap	221	2	-	Monthly	date, country, postal/ZIP code	No
International holidays & events, Workweek calendar	232	22	-	Monthly	date, country	No
Consumer Confidence index	44	22	-	Monthly	date, country	No
World economic indicators	191	41	-	Monthly	date, country	No
Markets data	-	17	-	Monthly	date, datetime	No
World mobile & fixed broadband network coverage and performance	167	-	3	Monthly	country, postal/ZIP code	No
World demographic data	90	-	2	Annual	country, postal/ZIP code	No
World house prices	44	-	3	Annual	country, postal/ZIP code	No
Public social media profile data	104	-	-	Monthly	date, email/HEM, phone	Yes
Car ownership data and Parking statistics	3	-	-	Annual	country, postal/ZIP code, email/HEM, phone	Yes
Geolocation profile for phone & IPv4 & email	239	-	6	Monthly	date, email/HEM, phone, IPv4	Yes
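Several sources above match on email/HEM, where a hashed email (HEM) lets you search without sending raw addresses. Below is a minimal sketch, assuming the common convention that a HEM is the SHA-256 hex digest of the trimmed, lowercased address. Both that normalization and the `to_hem` helper are assumptions, so check the Upgini documentation for the exact format it expects.

```python
import hashlib


def to_hem(email: str) -> str:
    """Hypothetical helper: normalize an email address and return its SHA-256 hex digest."""
    normalized = email.strip().lower()  # assumed normalization: trim whitespace, lowercase
    if "@" not in normalized:
        raise ValueError(f"not an email address: {email!r}")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    emails = ["  Jane.Doe@Example.com", "jane.doe@example.com"]
    hems = [to_hem(e) for e in emails]
    # Both spellings normalize to the same address, so they hash to the same HEM.
    print(hems[0] == hems[1], hems[0])
```

Normalizing before hashing is the important design choice: hashes of differently cased or padded strings differ completely, so un-normalized emails would fail to match on the data source side.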

👉 Details on datasets and features

🏆 MLE-Bench Tabular

We maintain a fork of MLE-bench that compares agent performance on tabular data. It uses exactly the same setup as the original benchmark from OpenAI and differs only in the leaderboard view. We focus on tabular tasks and use a normalized score instead of medal percentage, so that tasks with differently scaled scores can be compared. The leaderboard is recomputed whenever the submitted runs in the OpenAI repo are updated.
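The normalization formula is not spelled out here. One common choice, shown purely as an assumption, is min-max scaling against each competition's leaderboard, so the worst leaderboard score maps to 0 and the best to 1 regardless of whether the metric is higher- or lower-is-better. `normalized_score` is a hypothetical helper, not code from the MLE-bench fork.

```python
from statistics import mean
from typing import Sequence


def normalized_score(score: float, leaderboard: Sequence[float], higher_is_better: bool = True) -> float:
    """Min-max normalize `score` against a leaderboard (assumed scheme).

    Returns 0.0 at the worst leaderboard score and 1.0 at the best one; the same
    formula handles lower-is-better metrics because best and worst are swapped.
    Scores outside the leaderboard range fall below 0 or above 1.
    """
    best = max(leaderboard) if higher_is_better else min(leaderboard)
    worst = min(leaderboard) if higher_is_better else max(leaderboard)
    if best == worst:
        raise ValueError("leaderboard has no spread; cannot normalize")
    return (score - worst) / (best - worst)


if __name__ == "__main__":
    # Made-up tasks: an accuracy-like metric (higher is better) and an RMSE-like one (lower is better).
    tasks = {
        "task_a": (0.91, [0.80, 0.95, 0.85], True),
        "task_b": (12.0, [10.0, 20.0, 15.0], False),
    }
    per_task = {name: normalized_score(s, lb, hib) for name, (s, lb, hib) in tasks.items()}
    print(per_task, "mean:", mean(list(per_task.values())))
```

Averaging the per-task values gives one figure that stays comparable across tasks whose raw metrics live on very different scales, which a plain average of raw scores would not.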
