Releases: everruns/fetchkit
Release v0.3.0
Highlights
- Hardened outbound fetch policy across every built-in fetcher: SSRF safeguards, body-size caps, redirect re-validation, and symlink-escape protection
- Bot-auth headers are re-signed on every redirect hop so authenticated fetches survive policy-validated hops
- Enhanced fetchers: YouTube transcript extraction, Wikipedia redirect resolution, HackerNews timestamp display, ArXiv PDF binary indication, RSS content-type detection with HTML-to-Markdown
- Tool JSON contract now exposes conditional-fetch and quality fields (word count, redirect chain, paywall) in the output schema
- Dependency refresh including major bumps for the optional bot-auth feature (sha2 0.11, rand 0.10)
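As an illustration of the body-size caps mentioned above, here is a minimal sketch using only the Rust standard library. The helper name `read_capped` is hypothetical and is not FetchKit's API; only the `max_body_size` name is taken from the changelog entries below. The sketch reads one byte past the limit so that an oversized body is rejected rather than silently truncated.

```rust
use std::io::{self, Read};

/// Read at most `max_body_size` bytes from `reader`.
/// Hypothetical helper for illustration; not part of FetchKit.
fn read_capped<R: Read>(reader: R, max_body_size: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one extra byte so a body that is too large can be told apart
    // from a body that is exactly at the limit.
    let mut limited = reader.take(max_body_size.saturating_add(1));
    limited.read_to_end(&mut buf)?;
    if buf.len() as u64 > max_body_size {
        // Fail loudly instead of returning a truncated body.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "response body exceeds max_body_size",
        ));
    }
    Ok(buf)
}
```

Any `Read` implementation works here, so the same helper could wrap a file, a socket, or an in-memory slice in tests.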
What's Changed
- fix(cli): quote YAML frontmatter scalar values (#132)
- fix(fetchers): enforce policy for GitHub API subrequests (#131)
- fix(ci): avoid shell interpolation of release tags (#130)
- fix(tool): align JSON contract with conditional fetch fields (#129)
- fix(tool): include quality fields in output schema (#128)
- fix(ci): pin maturin version in python workflow (#127)
- fix(docs): avoid printing GITHUB_TOKEN in cloud quickcheck (#126)
- fix(fetchkit): tighten content-type checks for markdown and text (#125)
- fix(fetchers): enforce twitter fetch hardening limits (#124)
- fix(fetchkit): re-sign bot-auth headers on redirect hops (#123)
- fix(fetchers): enforce policy on GitHub API redirect target (#122)
- fix(fetchers): enforce max_body_size in GitHub issue fetcher (#121)
- fix(fetchers): enforce SSRF safeguards in StackOverflow API fetcher (#120)
- fix(fetchers): enforce body size limits for registry JSON (#119)
- fix(fetchers): enforce body size limits in wikipedia fetcher (#118)
- fix(fetchers): enforce fetch options on YouTube secondary requests (#117)
- fix(fetchers): harden arxiv fetcher input and body limits (#116)
- fix(fetchers): avoid utf-8 panic in hn html stripping (#115)
- fix(convert): avoid unicode offset panic in attribute extraction (#114)
- fix(client): cap batch fetch concurrency (#112)
- fix(ci): bind publish workflow to release tag (#111)
- fix(fetchers): harden youtube transcript handling (#110)
- fix(fetchers): bound HN timestamp formatting (#109)
- fix(ci): pin publish workflow actions in secret-bearing jobs (#108)
- fix(fetchers): enforce RSS body size and timeout limits (#107)
- chore(deps): apply available major bumps (sha2 0.11, rand 0.10) and tighten maintenance spec (#106)
- chore: periodic maintenance — deps refresh and spec/doc alignment (#105)
- fix(fetchers): surface malformed body errors (#104)
- fix(file-saver): block symlink escapes on save (#103)
- fix(python): preserve hardened redirect policy (#102)
- fix(fetchers): cap direct llms bodies (#101)
- fix(fetchers): enforce docs site outbound policy (#100)
- fix(fetchers): enforce rss feed outbound policy (#99)
- docs(readme): list built-in fetchers (#92)
- feat(fetchers): enhance RSSFeedFetcher with content-type detection and html_to_markdown (#91)
- feat(fetchers): enhance HackerNewsFetcher with timestamp display (#90)
- feat(fetchers): enhance ArXivFetcher with PDF binary indication (#89)
- feat(fetchers): enhance YouTubeFetcher with transcript extraction (#88)
- feat(fetchers): enhance WikipediaFetcher with redirect resolution (#87)
- fix(ci): trigger publish workflow explicitly from release (#86)
Full Changelog: v0.2.0...v0.3.0
Release v0.2.0
Highlights
- Pluggable fetchers for GitHub, Wikipedia, YouTube, ArXiv, StackOverflow, HackerNews, RSS, package registries, docs sites, and Twitter
- Batch fetching for concurrent multi-URL requests
- Content-focused extraction with boilerplate stripping and structured metadata
- Conditional fetching with ETag and If-Modified-Since support
- Improved HTML-to-Markdown conversion quality
- Content quality signals: word count, redirect chain, paywall detection
- Optional Web Bot Authentication support
- Hardened outbound fetch policy with proxy isolation and SSRF mitigations
- Live integration test suite behind feature flag
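The conditional fetching highlight relies on standard HTTP revalidation: a stored ETag is sent back in `If-None-Match`, a stored `Last-Modified` value is sent back in `If-Modified-Since`, and a `304 Not Modified` response means the cached copy can be reused. The sketch below shows that flow with the standard library only. The `CachedValidators` type and both function names are assumptions made for illustration, not FetchKit's actual types.

```rust
/// Validators remembered from a previous response.
/// Hypothetical type for illustration; not part of FetchKit.
struct CachedValidators {
    etag: Option<String>,
    last_modified: Option<String>,
}

/// Build the conditional request headers for a revalidation request.
fn conditional_headers(v: &CachedValidators) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(etag) = &v.etag {
        // The ETag value is echoed back unchanged, quotes included.
        headers.push(("If-None-Match", etag.clone()));
    }
    if let Some(last_modified) = &v.last_modified {
        headers.push(("If-Modified-Since", last_modified.clone()));
    }
    headers
}

/// A 304 status means the server confirmed the cached copy is still current.
fn is_not_modified(status: u16) -> bool {
    status == 304
}
```

When neither validator is available, the header list is empty and the request degrades to an ordinary unconditional fetch.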
Breaking Changes
- Ambient proxy environment variables are now ignored by default; set them explicitly if needed
What's Changed
- test(fetchers): add live integration tests behind feature flag (#84)
- chore: periodic maintenance — deps update and spec sync (#83)
- feat(fetch): add content quality signals (word_count, redirect_chain, is_paywall) (#82)
- feat(client): add batch_fetch for concurrent multi-URL fetching (#81)
- feat(fetch): add conditional fetching with ETag and If-Modified-Since (#80)
- feat(convert): improve HTML-to-Markdown conversion quality (#79)
- feat(convert): add content-focused extraction with boilerplate stripping (#78)
- feat(convert): add structured metadata extraction from HTML pages (#77)
- feat(fetchers): add RSSFeedFetcher for structured feed parsing (#70)
- feat(fetchers): add HackerNewsFetcher for structured thread extraction (#69)
- feat(fetchers): add ArXivFetcher for paper metadata and abstract (#68)
- feat(fetchers): add YouTubeFetcher for video metadata extraction (#67)
- feat(fetchers): add WikipediaFetcher for article extraction (#66)
- feat(fetchers): add PackageRegistryFetcher for PyPI, crates.io, npm (#65)
- feat(fetchers): add StackOverflowFetcher for clean Q&A extraction (#64)
- feat(fetchers): add DocsSiteFetcher with llms.txt support (#63)
- feat(fetchers): add GitHubCodeFetcher for source file fetching (#62)
- feat(fetchers): add GitHubIssueFetcher for structured issue/PR fetching (#61)
- feat: add process-issues skill for e2e GitHub issue resolution (#60)
- feat: add optional Web Bot Authentication support (#49)
- feat(fetchers): add TwitterFetcher for tweet URL handling (#47)
- feat: skip HTML conversion for non-HTML responses (#48)
- chore(deps): update workspace dependencies and fix flaky proxy tests (#46)
- feat(toolkit): align fetchkit with toolkit library contract (#45)
- fix(security): harden outbound fetch policy (#43)
- docs: clarify latest-main requirement for worktrees (#44)
- fix(security): isolate proxy env in shared runtimes (#42)
- fix(security): block IPv4-compatible and 6to4 IPv6 addresses in SSRF protection (#41)
- fix(security): sanitize reqwest error messages to prevent hostname leakage (#40)
- fix: resolve threat model issues (#37)
Full Changelog: v0.1.3...v0.2.0
Release v0.1.3
Highlights
- Hardened redirect handling to revalidate every hop against FetchKit's SSRF policy
- Tightened allow/block prefix matching to use parsed URL components instead of raw string prefixes
- Added FileSaver trait for saving fetched content to files
- Mitigated 6 open threats from threat model
- Added CLI integration tests and doc tests
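To show why matching on parsed URL components is safer than matching on raw string prefixes, the sketch below compares the two approaches. It is not FetchKit's implementation: all function names are hypothetical, and the hand-rolled host extraction is deliberately minimal (no IPv6 literals, no path matching). Real code would more plausibly rely on a dedicated URL parser.

```rust
/// Naive check: compares raw strings, so lookalike hosts slip through.
fn raw_prefix_allows(url: &str, allowed_prefix: &str) -> bool {
    url.starts_with(allowed_prefix)
}

/// Extract the lowercased scheme and host from an absolute URL.
/// Illustration only: ignores IPv6 literals and other edge cases.
fn scheme_and_host(url: &str) -> Option<(String, String)> {
    let (scheme, rest) = url.split_once("://")?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop any userinfo ("user:pass@") in front of the host.
    let host_port = authority.rsplit('@').next()?;
    // Drop the port, if present.
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        return None;
    }
    Some((scheme.to_ascii_lowercase(), host.to_ascii_lowercase()))
}

/// Component-based check: scheme and host must match exactly.
fn component_allows(url: &str, allowed_scheme: &str, allowed_host: &str) -> bool {
    match scheme_and_host(url) {
        Some((scheme, host)) => scheme == allowed_scheme && host == allowed_host,
        None => false,
    }
}
```

With an allow rule for `https://example.com`, the raw prefix check accepts both `https://example.com.evil.net/x` and `https://example.com@evil.net/`, while the component check rejects them because the parsed hosts are `example.com.evil.net` and `evil.net`.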
What's Changed
- fix(security): harden redirect validation and URL policy matching (#23)
- fix(security): mitigate 6 open threats from threat model (#24)
- fix(cli): disable bin rustdoc to avoid doc collision (#25)
- feat: add FileSaver trait for saving fetched content to files (#27)
- fix(ci): replace external HTTP calls with wiremock in fetch_urls example (#29)
- test: add CLI integration tests, doc tests, Python example, and CI improvements (#31)
- docs: add cargo install from crates.io to README (#22)
- docs: remove duplicate release-process from public docs (#30)
- docs: add git user config requirement to attribution section (#32)
- ci: adopt bashkit release process (#26)
- feat(skills): add /processing-issues skill (#28)
- feat: add /ship command and .agents symlinks (#21)
- chore: add Doppler secrets management and cloud init script (#20)
- chore: add attribution settings and agent attribution policy (#19)
Full Changelog: v0.1.2...v0.1.3
Release v0.1.2
Highlights
- Added SSRF protection with safe-by-default DNS resolution policy
- Private/reserved IP ranges are now blocked by default to prevent server-side request forgery
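A rough sketch of what blocking private and reserved ranges can look like, using only `std::net`. This is an assumption about the general technique, not FetchKit's actual policy code: the function names are hypothetical and the set of blocked ranges is illustrative rather than exhaustive. To be effective, such a check has to run on the addresses a hostname resolves to, not on the hostname string.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical helper: true when an IPv4 address should not be fetched.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16, includes cloud metadata endpoints
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
        || ip.is_documentation()
}

/// Hypothetical helper: true when an IPv6 address should not be fetched.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // IPv4-mapped addresses (::ffff:a.b.c.d) are judged by their IPv4 part.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}

/// Entry point: apply this to every resolved address before connecting.
fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

The IPv6 ranges are checked by bit mask rather than through helper methods so the sketch also compiles on older Rust toolchains.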
What's Changed
- feat(security)!: add SSRF protection with safe-by-default DNS policy (#17)
Full Changelog: v0.1.1...v0.1.2
Release v0.1.1
Highlights
- Updated dependencies to latest versions
- Added maintenance spec for periodic upkeep
- Documentation improvements
What's Changed
Full Changelog: v0.1.0...v0.1.1
Release v0.1.0
Highlights
- AI-friendly web content fetching with HTML-to-Markdown and HTML-to-Text conversion
- CLI and MCP server for AI tool integration
- Pluggable fetcher system for URL-specific handling
- Python bindings via PyO3
What's Changed
- feat: add pluggable fetcher system for URL-specific handling (#9) by @chaliy
- docs: add LangChain example for MCP integration (#8) by @chaliy
- refactor(cli): unified md-first output format (#7) by @chaliy
- docs: clarify test classification in AGENTS.md (#6) by @chaliy
- docs: add cloud agent env and complete AGENTS.md placeholders (#5) by @chaliy
- refactor: rename project from webfetch to fetchkit (#4) by @chaliy
- docs: add comprehensive README with installation and usage guide (#3) by @chaliy
- feat: implement webfetch library, CLI, MCP server, and Python bindings (#1) by @chaliy
- feat: add initial webfetch spec and guidance by @chaliy
Full Changelog: https://github.com/everruns/fetchkit/commits/v0.1.0