From: Alan Stern <stern@rowland.harvard.edu>
Date: Wed, 30 Mar 2005 20:05:45 +0000 (-0500)
Subject: [SCSI] return success after retries in scsi_eh_tur
X-Git-Tag: v2.6.14-rc1~522^2^2~5
X-Git-Url: http://pilppa.com/gitweb/?a=commitdiff_plain;h=e47373ec1c9aab9ee134f4e2b8249957e9f4c7ef;p=linux-2.6-omap-h63xx.git

[SCSI] return success after retries in scsi_eh_tur

The problem lies in the way the error handler uses TEST UNIT READY to
tell whether error recovery has succeeded.  The scsi_eh_tur function
gives up after one round of retrying; after that it decides that more
error recovery is needed.

However TUR is liable to report sense data indicating a retry is needed
when in fact error recovery has succeeded.  A typical example might be
SK=2, ASC=4, ASCQ=1 (Logical unit in process of becoming ready).  The mere
fact that we were able to get a sensible reply to the TUR should indicate
that the device is working well enough to stop error recovery.

I ran across a case back in January where this happened.  A CD-ROM drive
timed out the INQUIRY command, and a device reset fixed the blockage.
But then the drive kept responding with 2/4/1 -- because it was spinning
up I suppose -- until the error handler gave up and placed it offline.
If the initial INQUIRY had received the 2/4/1 instead, everything would
have worked okay.  It doesn't seem reasonable for things to fail just
because the error handler had started running.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index e9c451ba71f..688bce74078 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -776,9 +776,11 @@ retry_tur:
 		__FUNCTION__, scmd, rtn));
 	if (rtn == SUCCESS)
 		return 0;
-	else if (rtn == NEEDS_RETRY)
+	else if (rtn == NEEDS_RETRY) {
 		if (retry_cnt--)
 			goto retry_tur;
+		return 0;
+	}
 	return 1;
 }