This project tries to make fetching and parsing RSS feeds easier. With Hera RSS you can discover, fetch and parse RSS feeds.
- simply run `composer require kaishiyoku/hera-rss-crawler`
- create a new crawler instance using `$heraRssCrawler = new HeraRssCrawler()`
- discover a feed, for example `$feedUrls = $heraRssCrawler->discoverFeedUrls('https://laravel-news.com/')`
- pick the feed you would like to use; if multiple feeds were discovered, pick one
- fetch the feed: `$feed = $heraRssCrawler->parseFeed($feedUrls->get(0))`
- fetch the articles: `$feedItems = $feed->getFeedItems()`
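
Put together, the steps above could look roughly like the following sketch. The helper `fetchFirstFeedItems` is hypothetical and not part of the package; it only relies on the methods named above and accepts any object that offers them. The fully qualified class name in the usage comment is an assumption inferred from the package's namespace layout.

```php
<?php

// Hypothetical helper (not part of the package): discover the feeds of a
// website and return the items of the first feed, or null if no feed was
// discovered or the feed could not be parsed.
function fetchFirstFeedItems(object $crawler, string $websiteUrl): ?iterable
{
    $feedUrls = $crawler->discoverFeedUrls($websiteUrl);

    if ($feedUrls->isEmpty()) {
        return null; // nothing discovered
    }

    // several feeds may have been discovered; this sketch simply takes the first
    $feed = $crawler->parseFeed($feedUrls->get(0));

    return $feed === null ? null : $feed->getFeedItems();
}

// Usage (needs the installed package and network access):
//   require __DIR__ . '/vendor/autoload.php';
//   $crawler = new \Kaishiyoku\HeraRssCrawler\HeraRssCrawler(); // assumed namespace
//   $feedItems = fetchFirstFeedItems($crawler, 'https://laravel-news.com/');
```

Checking for an empty result and for `null` first avoids calling `getFeedItems()` on a feed that could not be fetched.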
- dropped support for PHP 8.0
- dropped support for PHP 7.4
- dropped support for Laravel 8
- the `FeedItem` method `jsonSerialize` has been renamed to `toJson`; it no longer returns `null` but throws a `JsonException` if the serialized JSON is invalid
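
Code that relied on the old `null` return value now has to catch the exception instead. A minimal sketch of one way to do that: `toJsonOrNull` is a hypothetical helper, not part of the package, and PHP's own `json_encode` with `JSON_THROW_ON_ERROR` serves as a stand-in for a feed item here so the snippet runs without the library.

```php
<?php

// Hypothetical helper: restores the old "null on failure" behaviour around
// any serializer that throws a JsonException.
function toJsonOrNull(callable $serialize): ?string
{
    try {
        return $serialize();
    } catch (JsonException $e) {
        return null;
    }
}

// With a real feed item the call would look like this:
//   $json = toJsonOrNull(fn () => $feedItem->toJson());

// Stand-in: json_encode throws a JsonException for malformed UTF-8 input
$valid = toJsonOrNull(fn () => json_encode(['title' => 'Hello'], JSON_THROW_ON_ERROR));
$invalid = toJsonOrNull(fn () => json_encode("\xB1\x31", JSON_THROW_ON_ERROR));
```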
`setRetryCount(int $retryCount): void`
Determines how many times parsing or discovering a feed is retried when an exception occurs, e.g. if the feed is unreachable.

`setLogger(LoggerInterface $logger): void`
Sets your own logger instance, e.g. a simple file logger.

`setUrlReplacementMap(array $urlReplacementMap): void`
Useful for websites which redirect to another subdomain when visiting the site, e.g. Reddit.

`setFeedDiscoverers(Collection $feedDiscoverers): void`
Sets your own feed discoverers.
You can also write your own discoverer; just make sure to implement the `FeedDiscoverer` interface:

    <?php

    namespace Kaishiyoku\HeraRssCrawler\FeedDiscoverers;

    use GuzzleHttp\Client;
    use Illuminate\Support\Arr;
    use Illuminate\Support\Collection;
    use Illuminate\Support\Str;
    use Kaishiyoku\HeraRssCrawler\Models\ResponseContainer;

    /**
     * Discover feed URL by parsing a direct RSS feed url.
     */
    class FeedDiscovererByContentType implements FeedDiscoverer
    {
        public function discover(Client $httpClient, ResponseContainer $responseContainer): Collection
        {
            $contentTypeMixedValue = Arr::get($responseContainer->getResponse()->getHeaders(), 'Content-Type');
            $contentType = is_array($contentTypeMixedValue) ? Arr::first($contentTypeMixedValue) : $contentTypeMixedValue;

            // the given url is no valid RSS feed
            if (!$contentType || !Str::startsWith($contentType, ['application/rss+xml', 'application/atom+xml'])) {
                return new Collection();
            }

            return new Collection([$responseContainer->getRequestUrl()]);
        }
    }

The default feed discoverers are as follows:

    new Collection([
        new FeedDiscovererByContentType(),
        new FeedDiscovererByHtmlHeadElements(),
        new FeedDiscovererByHtmlAnchorElements(),
        new FeedDiscovererByFeedly(),
    ])

The ordering is important here because the discoverers are called sequentially until at least one feed URL has been found.
That means that once a discoverer has found a feed, the remaining discoverers won't be called.
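
The stop-at-first-result behaviour can be illustrated with a simplified stand-in. This is not the library's implementation: the "discoverers" below are plain closures returning arrays, and `discoverSequentially` is a hypothetical function written only to show the control flow described above.

```php
<?php

// Simplified stand-in: call each discoverer in order and stop as soon as
// one of them returns at least one feed URL.
function discoverSequentially(array $discoverers, string $url): array
{
    foreach ($discoverers as $discoverer) {
        $feedUrls = $discoverer($url);

        if (count($feedUrls) > 0) {
            return $feedUrls; // remaining discoverers are never called
        }
    }

    return [];
}

$called = [];

$discoverers = [
    // finds nothing, so the next discoverer gets a chance
    function (string $url) use (&$called): array {
        $called[] = 'contentType';
        return [];
    },
    // finds a feed, so the loop stops here
    function (string $url) use (&$called): array {
        $called[] = 'htmlHead';
        return [$url . 'feed'];
    },
    // never reached in this example
    function (string $url) use (&$called): array {
        $called[] = 'htmlAnchors';
        return [$url . 'rss'];
    },
];

$feedUrls = discoverSequentially($discoverers, 'https://example.com/');
```

Here `$called` ends up containing only the first two names, which is why a discoverer placed late in the collection may never run.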
If you want to mainly discover feeds by using HTML anchor elements,
the FeedDiscovererByHtmlAnchorElements discoverer should be the first discoverer
in the collection.
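
A configuration along those lines might look like the following sketch. It assumes that the crawler class lives in the `Kaishiyoku\HeraRssCrawler` namespace, that all four default discoverers share the `FeedDiscoverers` namespace shown above, and that `LoggerInterface` refers to the PSR-3 interface (so `Psr\Log\NullLogger` is accepted); none of this is confirmed here. The retry count of 3 is an arbitrary illustrative value.

```php
<?php

use Illuminate\Support\Collection;
use Kaishiyoku\HeraRssCrawler\FeedDiscoverers\FeedDiscovererByContentType;
use Kaishiyoku\HeraRssCrawler\FeedDiscoverers\FeedDiscovererByFeedly;
use Kaishiyoku\HeraRssCrawler\FeedDiscoverers\FeedDiscovererByHtmlAnchorElements;
use Kaishiyoku\HeraRssCrawler\FeedDiscoverers\FeedDiscovererByHtmlHeadElements;
use Kaishiyoku\HeraRssCrawler\HeraRssCrawler; // assumed namespace
use Psr\Log\NullLogger; // assumes a PSR-3 logger is expected

require __DIR__ . '/vendor/autoload.php';

$heraRssCrawler = new HeraRssCrawler();

// retry count chosen for illustration only
$heraRssCrawler->setRetryCount(3);

// a logger that discards everything; swap in a file logger as needed
$heraRssCrawler->setLogger(new NullLogger());

// HTML anchor elements first, the other default discoverers as fallbacks
$heraRssCrawler->setFeedDiscoverers(new Collection([
    new FeedDiscovererByHtmlAnchorElements(),
    new FeedDiscovererByContentType(),
    new FeedDiscovererByHtmlHeadElements(),
    new FeedDiscovererByFeedly(),
]));
```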
`parseFeed(string $url): ?Feed`
Fetches and parses the feed at a given feed url. If no consumable RSS feed is found, null is returned.

`discoverAndParseFeeds(string $url): Collection`
Discovers feeds from a website url and returns all parsed feeds in a collection.

`discoverFeedUrls(string $url): Collection`
Discovers feeds from a website url and returns all found feed urls in a collection. There are multiple ways the crawler tries to discover feeds. The order is as follows:
- discover feed urls by content type: if the given url is already a valid feed, return this url
- discover feed urls by HTML head elements: find all feed urls inside an HTML document
- discover feed urls by HTML anchor elements: get all anchor elements of an HTML document and return the urls of those which include `rss`
- discover feed urls by Feedly: fetch feed urls using the Feedly API

`discoverFavicon(string $url): ?string`
Fetches the favicon of the feed's website. If none is found, null is returned.

`checkIfConsumableFeed(string $url): bool`
Checks if a given url is a consumable RSS feed.
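
Since `parseFeed` and `discoverFavicon` may return null, calling code has to handle the missing values. A small sketch of one way to combine the methods above: `summarizeFeed` is a hypothetical helper, not part of the package, and accepts any object offering these three methods.

```php
<?php

// Hypothetical helper: collect what is known about a feed url while
// tolerating the nullable return values of the crawler methods.
function summarizeFeed(object $crawler, string $feedUrl): array
{
    if (!$crawler->checkIfConsumableFeed($feedUrl)) {
        // not a consumable feed, so skip parsing altogether
        return ['consumable' => false, 'feed' => null, 'favicon' => null];
    }

    return [
        'consumable' => true,
        'feed' => $crawler->parseFeed($feedUrl),          // ?Feed, may still be null
        'favicon' => $crawler->discoverFavicon($feedUrl), // ?string, null if none found
    ];
}

// Usage (needs the installed package and network access):
//   $summary = summarizeFeed($heraRssCrawler, 'https://laravel-news.com/feed');
//   if ($summary['feed'] !== null) { /* work with the feed */ }
```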
Found any issues or have an idea to improve the crawler? Feel free to open an issue or submit a pull request.
- add a Laravel facade
Email: dev@andreas-wiedel.de
Website: https://andreas-wiedel.de