This is a quick script that reads a standard RSS feed, requests each feed item's link, and logs every URL that returns a status code other than 200 (404s included). Useful for debugging sticky situations.
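<?php
// Usage: php script.php <feed-url>
if ( empty( $argv[1] ) ) {
    fwrite( STDERR, "Usage: php {$argv[0]} <feed-url>\n" );
    exit( 1 );
}

$link_queue      = array();
$exception_queue = array();
$timestamp       = date( 'Y-m-d H:i:s' );

// simplexml_load_file() returns false when the feed cannot be loaded or parsed.
$rss = simplexml_load_file( $argv[1] );
if ( $rss === false ) {
    echo $timestamp . " > Could not load feed {$argv[1]}\n";
    exit( 1 );
}

// Collect each item's link, cast from SimpleXMLElement to a plain string.
foreach ( $rss->channel->item as $item )
    array_push( $link_queue, (string) $item->link );

// Request each link and record any status code other than 200.
foreach ( $link_queue as $url ) {
    $handle = curl_init( $url );
    curl_setopt( $handle, CURLOPT_RETURNTRANSFER, TRUE );
    $response      = curl_exec( $handle );
    $response_code = curl_getinfo( $handle, CURLINFO_HTTP_CODE );
    if ( $response_code !== 200 )
        array_push( $exception_queue, "{$url} returned status code {$response_code}" );
    curl_close( $handle );
}

// Write one timestamped line per failing URL, or a single all-clear line.
if ( count( $exception_queue ) ) {
    foreach ( $exception_queue as $e )
        echo $timestamp . ' > ' . $e . "\n";
} else {
    echo $timestamp . " > All URLs returned status code 200\n";
}

die(); // omit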
Throw it in cron like this:
*/30 * * * * php /path/to/script.php http://somesite.com/feed/ >> /path/to/log 2>&1
Or run it on the command line like this:
php /path/to/the/script.php http://somesite.com/feed/
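The log records every status code other than 200, so if you only care about 404s you can filter them out afterwards. Here is a minimal sketch, assuming the log has one entry per line in the "URL returned status code NNN" format the script writes. The function name report_404s is made up for illustration and is not part of the script.

```shell
#!/bin/sh
# Hypothetical helper: print only the 404 entries from the script's log.
# Assumes one entry per line, e.g.
#   2024-01-01 00:00:00 > http://somesite.com/a returned status code 404
report_404s() {
    log_file="$1"
    # -F matches the text literally; -- stops option parsing before the pattern.
    grep -F -- ' returned status code 404' "$log_file"
}
```

Once the function is defined in your shell, call it as report_404s /path/to/log. Note that grep exits with a non-zero status when the log contains no 404 entries.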