This is a quick script that reads a standard RSS feed, requests the link for each feed item, and logs every URL that does not return a 200 status code (404 errors included). Useful for debugging sticky situations.
[code language="php"]
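<?php
// Usage: php script.php <feed-url>
$link_queue      = array();
$exception_queue = array();
$timestamp       = date( 'Y-m-d H:i:s' );
// Load the feed given as the first command-line argument.
$rss = simplexml_load_file( $argv[1] );
// Collect each item's link as a plain string.
foreach ( $rss->channel->item as $item ) {
    array_push( $link_queue, (string) $item->link );
}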
// Request each link and record any that do not return 200.
foreach ( $link_queue as $url ) {
    $handle = curl_init( $url );
    // Return the body from curl_exec() instead of printing it.
    curl_setopt( $handle, CURLOPT_RETURNTRANSFER, TRUE );
    $response      = curl_exec( $handle );
    $response_code = curl_getinfo( $handle, CURLINFO_HTTP_CODE );
    if ( $response_code !== 200 ) {
        array_push( $exception_queue, "{$url} returned status code {$response_code}" );
    }
    curl_close( $handle );
}
// Print one timestamped line per failure, or a single all-clear line.
if ( count( $exception_queue ) ) {
    foreach ( $exception_queue as $e ) {
        echo $timestamp . ' > ' . $e . "\n";
    }
} else {
    echo $timestamp . " > All URLs returned status code 200\n";
}
die();
// omit
[/code]
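The script above downloads each page in full, even though only the status code is needed. As a rough sketch, assuming the servers behind your feed answer HEAD requests sensibly, something like the following would ask for the headers only. The function names `extract_links()` and `head_status()` are made up for illustration, and the 10-second timeout is an arbitrary choice.

[code language="php"]
<?php
// Hypothetical helper: collect each item's <link> from an RSS 2.0 string.
function extract_links( string $xml ): array {
    $rss = simplexml_load_string( $xml );
    if ( $rss === false || ! isset( $rss->channel->item ) ) {
        return array();
    }
    $links = array();
    foreach ( $rss->channel->item as $item ) {
        // Cast to string so we keep plain URLs, not SimpleXMLElement objects.
        $links[] = trim( (string) $item->link );
    }
    return $links;
}

// Hypothetical helper: get the status code for a URL with a HEAD request.
function head_status( string $url ): int {
    $handle = curl_init( $url );
    curl_setopt( $handle, CURLOPT_NOBODY, true );         // HEAD request, no body
    curl_setopt( $handle, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $handle, CURLOPT_TIMEOUT, 10 );          // assumed limit, in seconds
    curl_exec( $handle );
    // A code of 0 means the request itself failed (DNS, timeout, etc.).
    return (int) curl_getinfo( $handle, CURLINFO_HTTP_CODE );
}
[/code]

Some servers treat HEAD differently from GET, so a non-200 result from `head_status()` is worth rechecking with a normal request before you trust it.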
Throw it in cron like this to check every 30 minutes and append the output to a log file:
*/30 * * * * php /path/to/script.php http://somesite.com/feed/ >> /path/to/log 2>&1
Or run it on the command line like this:
php /path/to/the/script.php http://somesite.com/feed/