A little PHP script to watch for 404 errors

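This is a quick script that reads a standard RSS feed, requests the link of each feed item, and checks the HTTP status code in the response. Every URL that doesn't return 200 (a 404, for instance) is printed with a timestamp, so redirecting the output to a file gives you a running log of broken links. Useful for debugging sticky situations.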

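<?php

// Usage: php script.php http://somesite.com/feed/
if ( $argc < 2 ) {
  fwrite( STDERR, "Usage: php {$argv[0]} <feed-url>\n" );
  exit( 1 );
}

$link_queue = array();
$exception_queue = array();

$timestamp = date( 'Y-m-d H:i:s' );

// Load and parse the RSS feed passed as the first argument.
$rss = simplexml_load_file( $argv[1] );
if ( $rss === false ) {
  echo $timestamp . " > Could not load feed {$argv[1]}\n";
  exit( 1 );
}

// Collect the link of every item; cast to string because each
// $item->link is a SimpleXMLElement, not a plain string.
foreach ( $rss->channel->item as $item )
  array_push( $link_queue, trim( (string) $item->link ) );

foreach ( $link_queue as $url ) {
  $handle = curl_init( $url );
  // Return the body instead of echoing it, so only the report is printed.
  curl_setopt( $handle, CURLOPT_RETURNTRANSFER, TRUE );
  curl_exec( $handle );
  $response_code = curl_getinfo( $handle, CURLINFO_HTTP_CODE );

  // Record anything other than 200; a code of 0 means the request itself failed.
  if ( $response_code !== 200 )
    array_push( $exception_queue, "{$url} returned status code {$response_code}" );

  curl_close( $handle );
}

if ( count( $exception_queue ) ) {
  foreach ( $exception_queue as $e )
    echo $timestamp . ' > ' . $e . "\n";
} else {
  echo $timestamp . " > All URLs returned status code 200\n";
}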

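The script leans on SimpleXML to walk the feed's channel > item > link structure. If you want to check that parsing step without hitting the network, here is a minimal sketch that reads the feed from a string instead of a URL. The function name extract_links and the sample feed are hypothetical, made up for illustration; it assumes an RSS 2.0 document with one link per item.

<?php

// Hypothetical helper: return every item link from an RSS 2.0 document held in a string.
function extract_links( string $xml ): array {
  $rss = simplexml_load_string( $xml );
  if ( $rss === false ) {
    return array(); // Not parseable as XML.
  }
  $links = array();
  foreach ( $rss->channel->item as $item ) {
    // $item->link is a SimpleXMLElement; cast it to get the text content.
    $links[] = trim( (string) $item->link );
  }
  return $links;
}

// Hypothetical sample feed with two items.
$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item><link>http://example.com/a/</link></item>
    <item><link>http://example.com/b/</link></item>
  </channel>
</rss>
XML;

print_r( extract_links( $sample ) );

Running it should print both sample links. Passing the same array to the cURL loop in the script gives you an offline way to test the feed-handling half before pointing it at a live site.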

Throw it in cron like this to check the feed every 30 minutes and append the results to a log file:
*/30 * * * * php /path/to/script.php http://somesite.com/feed/ >> /path/to/log 2>&1

Or run it on the command line like this:
php /path/to/the/script.php http://somesite.com/feed/

