@upgini

Upgini • Data search & enrichment for Machine Learning and AI

Easily find and add relevant features to your ML & AI pipeline from hundreds of public, community and premium external data sources, including open & commercial LLMs

🚀 Awesome features of Upgini Python Library

⭐️ Automatically find only the relevant features that improve ML model accuracy, not just those correlated with the target variable
⭐️ Automated feature generation from the sources: feature generation with Large Language Models' data augmentation, RNNs, and GraphNNs; ensembling of multiple data sources
⭐️ Automatic search key augmentation from all connected sources. If you do not have all search keys in your search request, such as postal/zip code, Upgini will try to add those keys based on the provided set of search keys. This will broaden the search across all available data sources
⭐️ Calculate accuracy metrics and uplifts after enriching an existing ML model with external features
⭐️ Check the stability of the accuracy gain from external data on out-of-time intervals and verification datasets. Mitigate the risk of unstable external data dependencies in the ML pipeline
⭐️ Easy to use: a single request enriches the training dataset with all of the keys at once: date/datetime, country, postal/ZIP code, phone number, hashed email/HEM, IP address
⭐️ Simple Drag & Drop Search UI: upgini_data_search_for_ML
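
As a rough illustration of the "single request" idea in the list above, the sketch below fits one enricher on a training table using several search keys at once. It assumes the `FeaturesEnricher` class and `SearchKey` enum from the `upgini` package, with member names as recalled from the library's public README; the column names (`rep_date`, `country`, `zip_code`) and the helper `enrich_training_set` are hypothetical. Check the library documentation for exact signatures before relying on it.

```python
# Hypothetical column names mapped to the names of SearchKey enum members.
# The member names are an assumption taken from the upgini README.
SEARCH_KEY_COLUMNS = {
    "rep_date": "DATE",
    "country": "COUNTRY",
    "zip_code": "POSTAL_CODE",
}


def enrich_training_set(train_df, target_column):
    """Fit an enricher on a pandas DataFrame and return the enriched features."""
    # Imported lazily so this module loads even when upgini is not installed.
    from upgini import FeaturesEnricher, SearchKey

    search_keys = {
        column: getattr(SearchKey, member)
        for column, member in SEARCH_KEY_COLUMNS.items()
    }
    enricher = FeaturesEnricher(search_keys=search_keys)

    X = train_df.drop(columns=[target_column])
    y = train_df[target_column]
    enricher.fit(X, y)            # one search request covering all listed keys
    return enricher.transform(X)  # original columns plus the features found
```

Calling `enrich_training_set(train_df, "target")` would need network access to the Upgini service; some of the data sources in the table below also require an API key.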

📊 Total: 239 countries and up to 41 years of history

| Data source | Countries | History, years | # sources for ensemble | Update | Search keys | API Key required |
|---|---|---|---|---|---|---|
| Historical weather & Climate normals | 68 | 22 | - | Monthly | date, country, postal/ZIP code | No |
| Location/Places/POI/Area/Proximity information from OpenStreetMap | 221 | 2 | - | Monthly | date, country, postal/ZIP code | No |
| International holidays & events, Workweek calendar | 232 | 22 | - | Monthly | date, country | No |
| Consumer Confidence index | 44 | 22 | - | Monthly | date, country | No |
| World economic indicators | 191 | 41 | - | Monthly | date, country | No |
| Markets data | - | 17 | - | Monthly | date, datetime | No |
| World mobile & fixed broadband network coverage and performance | 167 | - | 3 | Monthly | country, postal/ZIP code | No |
| World demographic data | 90 | - | 2 | Annual | country, postal/ZIP code | No |
| World house prices | 44 | - | 3 | Annual | country, postal/ZIP code | No |
| Public social media profile data | 104 | - | - | Monthly | date, email/HEM, phone | Yes |
| Car ownership data and Parking statistics | 3 | - | - | Annual | country, postal/ZIP code, email/HEM, phone | Yes |
| Geolocation profile for phone & IPv4 & email | 239 | - | 6 | Monthly | date, email/HEM, phone, IPv4 | Yes |

👉 Details on datasets and features

We maintain a fork of MLE-bench that compares agent performance on tabular data. It uses exactly the same setup as the original OpenAI benchmark and differs only in the leaderboard view: we focus on tabular tasks and use a normalized score instead of medal percentage, so that scores on different scales can be compared. The leaderboard is recomputed whenever the submitted runs in the OpenAI repo are updated.
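
To show why a normalized score makes differently scaled metrics comparable, here is a minimal sketch. The exact formula used by the fork is not stated here, so this assumes a simple min-max style normalization between two reference points; the function name `normalized_score`, the reference points `baseline` and `best`, and all numbers are hypothetical.

```python
def normalized_score(score, baseline, best):
    """Map a raw task score onto a common scale (assumed formula).

    0.0 corresponds to `baseline` and 1.0 to `best`. The same expression
    handles higher-is-better and lower-is-better metrics, because the
    direction is implied by the order of the two reference points.
    """
    if best == baseline:
        raise ValueError("reference scores must differ")
    return (score - baseline) / (best - baseline)


# Two tasks with differently scaled metrics (made-up numbers).
auc_task = normalized_score(0.875, baseline=0.5, best=1.0)  # higher is better
rmse_task = normalized_score(3.0, baseline=5.0, best=1.0)   # lower is better

# Once on the same scale, per-task scores can be averaged.
mean_normalized = (auc_task + rmse_task) / 2
```

With these inputs the AUC task maps to 0.75 and the RMSE task to 0.5, giving a mean of 0.625, whereas averaging the raw values 0.875 and 3.0 would be meaningless.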

Pinned

  1. upgini upgini Public

    Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & co…

    Python 349 26

  2. mle-bench mle-bench Public

    Forked from openai/mle-bench

    MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering

    Python 1
