Skip to content

Releases: brightdata/haystack-brightdata

v0.1.0

04 Jan 14:25

Choose a tag to compare

🎉 Initial Release

We're excited to announce the first official release of haystack-brightdata - the official integration between Haystack and Bright Data's powerful web scraping and data extraction platform!

✨ Features

Components

This release includes three production-ready Haystack components:

BrightDataSERP

  • Execute search queries across multiple search engines (Google, Bing, Yahoo, etc.)
  • Geo-targeting support (200+ countries)
  • Multiple search types: web, images, news, shopping, jobs
  • Device type simulation (desktop, mobile, iOS, Android)
  • Parsed JSON results by default for easy integration
  • Configurable result count (up to 100)

BrightDataUnlocker

  • Access geo-restricted and bot-protected websites
  • Automatic CAPTCHA solving
  • Bypass anti-bot measures
  • Multiple output formats: HTML, Markdown, Screenshot
  • Country-specific access for geo-spoofing

BrightDataWebScraper

  • Extract structured data from 43+ supported websites
  • Categories include:
    • E-commerce (10 datasets): Amazon, Walmart, eBay, Home Depot, Zara, Etsy, Best Buy
    • LinkedIn (5 datasets): Person profiles, Company profiles, Jobs, Posts, People Search
    • Social Media (16 datasets): Instagram, Facebook, TikTok, YouTube, X/Twitter, Reddit
    • Business Intelligence (2 datasets): Crunchbase, ZoomInfo
    • Search & Commerce (6 datasets): Google Maps, Google Shopping, App Stores, Zillow, Booking.com
    • Other (4 datasets): GitHub, Yahoo Finance, Reuters
  • Automatic async polling for long-running scrape jobs
  • Dataset-specific parameter validation
  • Helper methods for dataset discovery

Developer Experience

  • Seamless Haystack Integration: All components work natively with Haystack 2.0+ pipelines
  • Flexible Authentication: API key via environment variable or direct parameter
  • Comprehensive Documentation: Detailed docstrings, README, and example scripts
  • Type Safety: Full type hints and mypy support
  • Production Ready: Formatted with Black, linted with Ruff, tested with pytest

📦 Installation

pip install haystack-brightdata

🚀 Quick Start

from haystack_brightdata import BrightDataSERP

# Initialize and run
serp = BrightDataSERP()
result = serp.run(query="Haystack AI framework", num_results=10)
print(result["results"])  # Parsed JSON

📚 Examples

The release includes 4 example scripts:

  • example_serp.py - SERP API usage examples
  • example_unlocker.py - Web Unlocker usage examples
  • example_scraper.py - Web Scraper for all 43+ datasets
  • example_pipeline.py - Haystack pipeline integration

📋 Requirements

  • Python >= 3.8
  • haystack-ai >= 2.0.0
  • pydantic >= 2.0.0
  • requests >= 2.28.0
  • aiohttp >= 3.8.0

Code Quality

  • Black formatted (line length: 100)
  • Ruff linted (zero warnings)
  • Mypy type checked
  • Full Pydantic validation

Highlights

  • Zero Breaking Changes: This is the first stable release
  • Production Tested: All components tested with live Bright Data API
  • Enterprise Ready: Supports corporate proxies and custom SSL certificates
  • Async Support: Built-in async methods for high-performance applications

Documentation

🙏 Acknowledgments

Built with ❤️ for Haystack by deepset, powered by Bright Data.

Special thanks to the Bright Data team for their excellent API and documentation.

Notes

  • Bright Data Subscription Required: You need a valid Bright Data account and API key to use this package
  • Rate Limits: Subject to your Bright Data subscription plan limits
  • API Compatibility: Tested with Bright Data API v3

Support


Full Changelog: Initial release (v0.1.0)

Install: pip install haystack-brightdata