Releases: brightdata/haystack-brightdata
Releases · brightdata/haystack-brightdata
v0.1.0
🎉 Initial Release
We're excited to announce the first official release of haystack-brightdata - the official integration between Haystack and Bright Data's powerful web scraping and data extraction platform!
✨ Features
Components
This release includes three production-ready Haystack components:
BrightDataSERP
- Execute search queries across multiple search engines (Google, Bing, Yahoo, etc.)
- Geo-targeting support (200+ countries)
- Multiple search types: web, images, news, shopping, jobs
- Device type simulation (desktop, mobile, iOS, Android)
- Parsed JSON results by default for easy integration
- Configurable result count (up to 100)
BrightDataUnlocker
- Access geo-restricted and bot-protected websites
- Automatic CAPTCHA solving
- Bypass anti-bot measures
- Multiple output formats: HTML, Markdown, Screenshot
- Country-specific access for geo-spoofing
BrightDataWebScraper
- Extract structured data from 43+ supported websites
- Categories include:
- E-commerce (10 datasets): Amazon, Walmart, eBay, Home Depot, Zara, Etsy, Best Buy
- LinkedIn (5 datasets): Person profiles, Company profiles, Jobs, Posts, People Search
- Social Media (16 datasets): Instagram, Facebook, TikTok, YouTube, X/Twitter, Reddit
- Business Intelligence (2 datasets): Crunchbase, ZoomInfo
- Search & Commerce (6 datasets): Google Maps, Google Shopping, App Stores, Zillow, Booking.com
- Other (4 datasets): GitHub, Yahoo Finance, Reuters
- Automatic async polling for long-running scrape jobs
- Dataset-specific parameter validation
- Helper methods for dataset discovery
Developer Experience
- Seamless Haystack Integration: All components work natively with Haystack 2.0+ pipelines
- Flexible Authentication: API key via environment variable or direct parameter
- Comprehensive Documentation: Detailed docstrings, README, and example scripts
- Type Safety: Full type hints and mypy support
- Production Ready: Formatted with Black, linted with Ruff, tested with pytest
📦 Installation
pip install haystack-brightdata🚀 Quick Start
from haystack_brightdata import BrightDataSERP
# Initialize and run
serp = BrightDataSERP()
result = serp.run(query="Haystack AI framework", num_results=10)
print(result["results"]) # Parsed JSON📚 Examples
The release includes 4 example scripts:
example_serp.py- SERP API usage examplesexample_unlocker.py- Web Unlocker usage examplesexample_scraper.py- Web Scraper for all 43+ datasetsexample_pipeline.py- Haystack pipeline integration
📋 Requirements
- Python >= 3.8
- haystack-ai >= 2.0.0
- pydantic >= 2.0.0
- requests >= 2.28.0
- aiohttp >= 3.8.0
Code Quality
- Black formatted (line length: 100)
- Ruff linted (zero warnings)
- Mypy type checked
- Full Pydantic validation
Highlights
- Zero Breaking Changes: This is the first stable release
- Production Tested: All components tested with live Bright Data API
- Enterprise Ready: Supports corporate proxies and custom SSL certificates
- Async Support: Built-in async methods for high-performance applications
Documentation
- README - Complete usage guide
- PLAN.md - Development roadmap (210 tasks completed!)
- Examples - 4 working example scripts
- API Reference - Full component documentation
🙏 Acknowledgments
Built with ❤️ for Haystack by deepset, powered by Bright Data.
Special thanks to the Bright Data team for their excellent API and documentation.
Notes
- Bright Data Subscription Required: You need a valid Bright Data account and API key to use this package
- Rate Limits: Subject to your Bright Data subscription plan limits
- API Compatibility: Tested with Bright Data API v3
Support
- Issues: GitHub Issues
- Documentation: README
- Bright Data Support: support@brightdata.com
Full Changelog: Initial release (v0.1.0)
Install: pip install haystack-brightdata