Separate ATA_EHI_DID_RESET into ATA_EHI_DID_SOFTRESET and
ATA_EHI_DID_HARDRESET. ATA_EHI_DID_RESET is redefined as OR of the
two flags. This patch doesn't introduce any behavior change. This
will be used later to determine whether _SDD is necessary or not.
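
As a rough illustration of the flag split described above, the sketch below uses made-up bit positions (the real values are defined in the libata headers). The helper names did_any_reset() and did_hardreset() are hypothetical.

```c
/* Illustrative bit values only; the real definitions live in libata. */
enum {
	ATA_EHI_DID_SOFTRESET	= 1u << 16,
	ATA_EHI_DID_HARDRESET	= 1u << 17,
	/* the old flag becomes the OR of the two new ones */
	ATA_EHI_DID_RESET	= ATA_EHI_DID_SOFTRESET | ATA_EHI_DID_HARDRESET,
};

/* Hypothetical helper: existing "was any reset done?" tests keep working. */
static int did_any_reset(unsigned int ehi_flags)
{
	return (ehi_flags & ATA_EHI_DID_RESET) != 0;
}

/* Hypothetical helper: later code can ask specifically about hardreset. */
static int did_hardreset(unsigned int ehi_flags)
{
	return (ehi_flags & ATA_EHI_DID_HARDRESET) != 0;
}
```

Because ATA_EHI_DID_RESET covers both bits, code that tests or clears it behaves as before.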
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
->cable_detect() used to be called only by the old ata_bus_probe() path.
Add invocation to ata_eh_revalidate_and_attach() right after IDENTIFYs
are done.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
->post_internal_cmd is simplified EH for internal commands. Its
primary mission is to stop the controller such that no rogue memory
access or other activities occur after the internal command is
released. It may provide error diagnostics by setting qc->err_mask
but this hasn't been a requirement.
To ignore SETXFER failure for CFA devices, libata needs to know
whether a command was failed by the device or for any other reason,
i.e. internal commands need to get AC_ERR_DEV right.
This patch makes the following changes to AC_ERR_DEV handling and
->post_internal_cmd semantics to accommodate this need and simplify
callback implementation.
1. As long as the correct bits in the result TF registers are set,
there is no need to set AC_ERR_DEV explicitly. libata EH core
takes care of that for both normal and internal commands.
2. The only requirement for ->post_internal_cmd() is to put the
controller into quiescent state. It need not set any err_mask.
3. ata_exec_internal_sg() performs minimal error analysis such that
AC_ERR_DEV is automatically set as long as result_tf is filled
correctly.
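
The minimal error analysis in point 3 could look roughly like the sketch below. ATA_ERR and ATA_DF are the standard ATA status register bits. The err_mask values, the simplified struct result_tf and the function name are assumptions made for illustration.

```c
#define ATA_ERR		(1u << 0)	/* status register: error */
#define ATA_DF		(1u << 5)	/* status register: device fault */

#define AC_ERR_DEV	(1u << 0)	/* assumed value */
#define AC_ERR_OTHER	(1u << 8)	/* assumed value */

/* Hypothetical, simplified result taskfile: only the status register. */
struct result_tf {
	unsigned char status;
};

/* Sketch: derive err_mask for a finished internal command. */
static unsigned int internal_cmd_err_mask(int failed, unsigned int err_mask,
					  const struct result_tf *tf)
{
	if (!failed)
		return err_mask;

	/* the device itself reported the failure */
	if (tf->status & (ATA_ERR | ATA_DF))
		err_mask |= AC_ERR_DEV;

	/* failed, but nothing more specific is known */
	if (!err_mask)
		err_mask |= AC_ERR_OTHER;

	return err_mask;
}
```

With this in place a caller can tell a device-rejected SETXFER from, say, a timeout by testing AC_ERR_DEV alone.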
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
There was a rare report where SB600 reported SERR_INTERNAL and SRST
couldn't get it out of the failure mode. Hardreset on SERR_INTERNAL.
As the problem is intermittent, whether this fixes the problem or not
hasn't been verified yet, but hardresetting the channel on internal
error is a good idea anyway.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
patch 2/4:
Clear tf before doing request sense.
This fixes the AOpen 56X/AKH timeout problem.
(http://bugzilla.kernel.org/show_bug.cgi?id=8244)
Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
For drive side cable detection to work correctly, drives need to be
identified backwards such that the slave device releases PDIAG- before
the master drive tries to detect cable type. ata_bus_probe() was fixed
by commit f31f0cc2f0 but the new EH path
wasn't fixed. This patch makes the new EH path do IDENTIFY backwards.
ata_dev_configure() for new devices is still performed master first.
This is to keep the detection messages in forward order.
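
The ordering described above can be sketched as two loops. The trace structure and function names below are hypothetical and only record which step touched which device. The real code issues IDENTIFY and calls ata_dev_configure().

```c
#define ATA_MAX_DEVICES 2

enum { OP_IDENTIFY, OP_CONFIGURE };

/* Hypothetical trace of (operation, device number) pairs. */
struct probe_trace {
	int op[2 * ATA_MAX_DEVICES];
	int devno[2 * ATA_MAX_DEVICES];
	int len;
};

static void trace_op(struct probe_trace *t, int op, int devno)
{
	t->op[t->len] = op;
	t->devno[t->len] = devno;
	t->len++;
}

static void revalidate_and_attach_sketch(struct probe_trace *t)
{
	int i;

	/* IDENTIFY backwards: slave (1) first so it releases PDIAG- */
	for (i = ATA_MAX_DEVICES - 1; i >= 0; i--)
		trace_op(t, OP_IDENTIFY, i);

	/* configure forwards so detection messages stay master first */
	for (i = 0; i < ATA_MAX_DEVICES; i++)
		trace_op(t, OP_CONFIGURE, i);
}
```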
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
->prereset() returns -ENOENT to tell libata that the port is empty and
reset sequencing should be stopped. This is not an error condition.
Update ata_eh_reset() such that it sets device classes to ATA_DEV_NONE
and returns success on -ENOENT. This makes the spurious error message
go away.
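
A minimal sketch of the -ENOENT handling, assuming a simplified ->prereset() signature and illustrative class values. eh_reset_sketch() and prereset_empty_port() are hypothetical names.

```c
#include <errno.h>

#define ATA_MAX_DEVICES 2

enum { ATA_DEV_UNKNOWN = 0, ATA_DEV_NONE = 1 };	/* illustrative values */

typedef int (*prereset_fn)(void);

/* Hypothetical ->prereset() stub that reports an empty port. */
static int prereset_empty_port(void)
{
	return -ENOENT;
}

static int eh_reset_sketch(prereset_fn prereset, unsigned int *classes)
{
	int i, rc = prereset ? prereset() : 0;

	if (rc == -ENOENT) {
		/* empty port: not an error, just nothing to reset */
		for (i = 0; i < ATA_MAX_DEVICES; i++)
			classes[i] = ATA_DEV_NONE;
		return 0;
	}
	if (rc)
		return rc;	/* genuine prereset failure */

	/* the actual reset and classification would follow here */
	return 0;
}
```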
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Conditionalize all PM related stuff in libata core layer using
CONFIG_PM.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_port has two different id fields - id and port_no. id is
system-wide 1-based unique id for the port while port_no is 0-based
host-wide port number. The former is primarily used to identify the
ATA port to the user in printk messages while the latter is used in
various places in libata core and LLDs to index the port inside the
host.
The two fields feel quite similar and sometimes ap->id is used in
place of ap->port_no, which is very difficult to spot. This patch
renames ap->id to ap->print_id to reduce the possibility of such bugs.
Some printk messages are adjusted such that id string (ata%u[.%u])
isn't printed twice and/or to use ata_*_printk() instead of hardcoded
id format.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The current EH speed down code is more of a proof that the EH
framework is capable of adjusting transfer speed in response to error.
This patch puts some intelligence into EH speed down sequence. The
rules are..
* If there have been more than three timeouts, HSM violations or
unclassified DEV errors for known supported commands during last 10
mins, NCQ is turned off.
* If there have been more than three timeouts or HSM violations for known
supported commands, transfer mode is slowed down. If DMA is active,
it is first slowed by one grade (e.g. UDMA133->100). If that
doesn't help, it's slowed to 40c limit (UDMA33). If PIO is
active, it's slowed by one grade first. If that doesn't help,
PIO0 is forced. Note that this rule does not change transfer mode.
DMA is never degraded into PIO by this rule.
* If there have been more than ten ATA bus, timeout, HSM violation or
unclassified device errors for known supported commands && speeding
down DMA mode didn't help, the device is forced into PIO mode. Note
that this rule is considered only for PATA devices and is pretty
difficult to trigger.
One error can only trigger one rule at a time. After a rule is
triggered, error history is cleared such that the next speed down
happens only after some number of errors are accumulated. This makes
sense because now speed down is done in bigger stride.
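
The rule selection above can be sketched as a single verdict function. This is only an illustration: the counters are assumed to already be restricted to the relevant time window and to known supported commands, and the order in which the rules are tried is an assumption. All names are hypothetical.

```c
enum spdn_verdict {
	SPDN_NONE,
	SPDN_NCQ_OFF,
	SPDN_SPEED_DOWN,
	SPDN_FALLBACK_TO_PIO,
};

/* Hypothetical per-category error counts from the recent history. */
struct err_stats {
	int timeout;		/* timeouts */
	int hsm;		/* HSM violations */
	int unclass_dev;	/* unclassified DEV errors */
	int ata_bus;		/* ATA bus errors */
};

/* At most one rule fires per call, mirroring "one rule at a time". */
static enum spdn_verdict speed_down_verdict(const struct err_stats *s,
					    int ncq_enabled, int is_pata,
					    int speed_down_exhausted)
{
	if (ncq_enabled && s->timeout + s->hsm + s->unclass_dev > 3)
		return SPDN_NCQ_OFF;

	if (!speed_down_exhausted && s->timeout + s->hsm > 3)
		return SPDN_SPEED_DOWN;

	if (is_pata && speed_down_exhausted &&
	    s->ata_bus + s->timeout + s->hsm + s->unclass_dev > 10)
		return SPDN_FALLBACK_TO_PIO;

	return SPDN_NONE;
}
```

After a non-NONE verdict the caller would clear the error history, so the next step down needs fresh errors to accumulate.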
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* Move forcing device to PIO0 on device disable into
ata_dev_disable(). This makes both old and new EHs act the same
way.
* Speed down only PIO mode on probe failure. All commands used during
probing are PIO commands. There's no point in speeding down DMA.
* Retry at least once after -ENODEV. Some devices report garbled
IDENTIFY data after certain events. This shouldn't cause device
detach and re-attach.
* Rearrange EH failure path for simplicity.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make ata_down_xfermask_limit() accept @sel instead of @force_pio0.
@sel selects how the xfermask limit will be adjusted. The following
selectors are defined.
* ATA_DNXFER_PIO : only speed down PIO
* ATA_DNXFER_DMA : only speed down DMA, don't cause transfer mode change
* ATA_DNXFER_40C : apply 40c cable limit
* ATA_DNXFER_FORCE_PIO : force PIO
* ATA_DNXFER_FORCE_PIO0 : force PIO0 (same as original with @force_pio0 == 1)
* ATA_DNXFER_ANY : same as original with @force_pio0 == 0
Currently, only ANY and FORCE_PIO0 are used to maintain the original
behavior. Other selectors will be used later to improve EH speed down
sequence.
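
To make the selectors concrete, here is a simplified sketch operating on three separate mode masks (bit n set means mode n is allowed). The selector names come from the list above. The struct, the helper, the enum values and the exact failure conditions are assumptions and may differ from the real ata_down_xfermask_limit().

```c
#include <errno.h>

enum {
	ATA_DNXFER_PIO,
	ATA_DNXFER_DMA,
	ATA_DNXFER_40C,
	ATA_DNXFER_FORCE_PIO,
	ATA_DNXFER_FORCE_PIO0,
	ATA_DNXFER_ANY,
};

/* Hypothetical container for the per-device limits. */
struct xfer_limits {
	unsigned int pio, mwdma, udma;
};

/* Clear the highest set bit, i.e. drop the fastest allowed mode. */
static unsigned int drop_highest(unsigned int mask)
{
	unsigned int bit = 1u << 31;

	while (bit && !(mask & bit))
		bit >>= 1;
	return mask & ~bit;
}

static int down_xfermask_limit_sketch(struct xfer_limits *l, int sel)
{
	struct xfer_limits n = *l;

	switch (sel) {
	case ATA_DNXFER_PIO:
		n.pio = drop_highest(n.pio);
		break;
	case ATA_DNXFER_DMA:
		/* never drop the last DMA mode: no switch to PIO here */
		if (n.udma) {
			n.udma = drop_highest(n.udma);
			if (!n.udma)
				return -ENOENT;
		} else if (n.mwdma) {
			n.mwdma = drop_highest(n.mwdma);
			if (!n.mwdma)
				return -ENOENT;
		}
		break;
	case ATA_DNXFER_40C:
		n.udma &= 0x07;		/* UDMA0-2, i.e. up to UDMA33 */
		break;
	case ATA_DNXFER_FORCE_PIO0:
		n.pio &= 1;
		/* fall through */
	case ATA_DNXFER_FORCE_PIO:
		n.mwdma = 0;
		n.udma = 0;
		break;
	case ATA_DNXFER_ANY:
		if (n.udma)
			n.udma = drop_highest(n.udma);
		else if (n.mwdma)
			n.mwdma = drop_highest(n.mwdma);
		else
			n.pio = drop_highest(n.pio);
		break;
	default:
		return -EINVAL;
	}

	/* fail if no PIO mode would remain or nothing changed */
	if (!n.pio || (n.pio == l->pio && n.mwdma == l->mwdma &&
		       n.udma == l->udma))
		return -ENOENT;

	*l = n;
	return 0;
}
```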
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata used two separate sets of variables to record request size and
current offset for ATA and ATAPI. This is confusing and fragile.
This patch replaces qc->nsect/cursect with qc->nbytes/curbytes and
removes the former. Also, ata_pio_sector() is updated to use bytes for
qc->cursg_ofs instead of sectors. The field used to be used in bytes
for ATAPI and in sectors for ATA.
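
A small sketch of byte-based progress tracking for PIO sector transfers. The struct is a hypothetical cut-down stand-in for the queued command, and scatter/gather element boundaries are ignored for brevity.

```c
#define ATA_SECT_SIZE 512u

/* Hypothetical subset of the queued command's bookkeeping fields. */
struct qc_progress {
	unsigned int nbytes;	/* total request size in bytes */
	unsigned int curbytes;	/* bytes transferred so far */
	unsigned int cursg_ofs;	/* byte offset into the current sg entry */
};

/* Account for one transferred sector; returns 1 if it was the last one. */
static int pio_sector_done(struct qc_progress *qc)
{
	qc->curbytes += ATA_SECT_SIZE;
	qc->cursg_ofs += ATA_SECT_SIZE;
	return qc->curbytes == qc->nbytes;
}
```

Keeping every counter in bytes lets ATA and ATAPI paths share the same fields without unit conversions.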
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_eh_suspend() was returning 0 regardless of failure. This bug has
potential to lose data on suspend. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata EH ignores port-wide actions in per-dev action mask. However,
device resume requests EH_SOFTRESET using per-dev action mask. Under
certain circumstances, this results in not resetting frozen port after
resuming which causes failure of all commands.
This patch allows port-wide actions to be requested in per-dev action
mask. Before EH recovery starts, port-wide actions will be collected.
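
The collection step might look like the following sketch. The flag values and the split between per-dev and port-wide masks are assumptions, and struct eh_info_sketch is a hypothetical stand-in for the real EH info structure.

```c
#define ATA_MAX_DEVICES 2

enum {
	ATA_EH_REVALIDATE	= 1u << 0,	/* illustrative values */
	ATA_EH_SOFTRESET	= 1u << 1,
	ATA_EH_HARDRESET	= 1u << 2,

	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE,
	ATA_EH_PORTWIDE_MASK	= ATA_EH_SOFTRESET | ATA_EH_HARDRESET,
};

struct eh_info_sketch {
	unsigned int action;				/* port-level mask */
	unsigned int dev_action[ATA_MAX_DEVICES];	/* per-dev masks */
};

/* Move port-wide bits out of the per-dev masks before recovery starts. */
static void collect_portwide_actions(struct eh_info_sketch *ehi)
{
	int i;

	for (i = 0; i < ATA_MAX_DEVICES; i++) {
		ehi->action |= ehi->dev_action[i] & ATA_EH_PORTWIDE_MASK;
		ehi->dev_action[i] &= ATA_EH_PERDEV_MASK;
	}
}
```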
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata switched to IRQ-driven IDENTIFY when IRQ-driven PIO was
introduced. This has caused a lot of problems including device
misdetection and phantom devices.
ATA_FLAG_DETECT_POLLING was added recently to selectively use polling
IDENTIFY on problematic drivers but many controllers and devices are
affected by this problem and trying to add ATA_FLAG_DETECT_POLLING
for each such case is difficult and not very rewarding.
This patch makes libata always use polling IDENTIFY. This is
consistent with libata's original behavior and drivers/ide's behavior.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
If EH command is issued to a frozen port, it fails with AC_ERR_SYSTEM.
libata used to request sense even when the port is frozen, needlessly
adding AC_ERR_SYSTEM to err_mask. Don't do it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Improve failed qc reporting. The original message didn't include the
actual command nor full error status and it was necessary to
temporarily patch the code to find out exactly which command is
causing problem. This patch makes EH report full command and result
TFs along with data direction and length. This change will make bug
reports more useful.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
On some controllers (ICHs in piix mode), there is *NO* reliable way to
determine device presence other than issuing IDENTIFY and seeing how the
transaction proceeds by watching the TF status register.
libata acted this way before irq-pio and phantom devices caused very
few problems, but now that IDENTIFY is performed using IRQ-driven PIO,
such phantom devices now result in multiple 30sec timeouts during
boot.
This patch implements ATA_FLAG_DETECT_POLLING. If a LLD sets this
flag, libata core issues the initial IDENTIFY in polling mode and if
the initial data transfer fails w/ HSM violation, the port is
considered to be empty thus replicating the old libata and IDE
behavior.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Make ata_dev_read_id() take @flags instead of @post_reset. Currently
there is only one flag defined - ATA_READID_POSTRESET, which is
equivalent to @post_reset. This is preparation for polling presence
detection.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata EH used to perform ata_set_mode() iff the EH session performed
reset as indicated by ATA_EHI_DID_RESET. This is incorrect because
->dev_config() called by revalidation is allowed to modify transfer
mode which ata_set_mode() should take care of. This patch implements
the following two flags.
* ATA_EHI_SETMODE: set during EH to schedule ata_set_mode(). Both new
device attachment and revalidation set this flag.
* ATA_EHI_POST_SETMODE: set while the device is revalidated after
ata_set_mode(). Post-setmode revalidation is different from initial
configuration and EH revalidation in that ->dev_config() is not
allowed to tune transfer mode. LLD can use this flag to determine
whether it's allowed to tune transfer mode. Note that POST_SETMODE
->dev_config() is guaranteed to be preceded by non-POST_SETMODE
->dev_config().
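
From an LLD's point of view, honoring ATA_EHI_POST_SETMODE could look like the sketch below. The flag value, the example_dev structure, the UDMA cap and the callback signature are all hypothetical. A real ->dev_config() takes libata's own types.

```c
enum {
	ATA_EHI_POST_SETMODE = 1u << 1,	/* illustrative value */
};

/* Hypothetical controller quirk: cap the device at UDMA4. */
#define EXAMPLE_UDMA_LIMIT 0x1fu

struct example_dev {
	unsigned int udma_mask;
};

/* Sketch of a ->dev_config()-style callback. */
static void example_dev_config(unsigned int ehi_flags, struct example_dev *dev)
{
	/* after ata_set_mode(), transfer mode must be left alone */
	if (ehi_flags & ATA_EHI_POST_SETMODE)
		return;

	dev->udma_mask &= EXAMPLE_UDMA_LIMIT;
}
```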
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement ehi flag ATA_EHI_PRINTINFO. This flag is set when device
configuration needs to print out device info. This used to be handled
by @print_info argument to ata_dev_configure() but LLDs also need to
know about it in ->dev_config() callback.
This patch replaces @print_info w/ ATA_EHI_PRINTINFO and makes sata_sil
print workaround messages only on the initial configuration.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Separate delayable work items from non-delayable work items by splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.
The work_struct struct is huge, and this limits its usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.
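
The resulting layout can be pictured with the user-space stand-ins below. The field lists are simplified assumptions, not the kernel's actual definitions, and to_delayed_work_sketch() is a hypothetical helper showing how a delayable item can be recovered from its embedded work_struct.

```c
#include <stddef.h>

/* Simplified stand-ins; the kernel structures carry more fields. */
struct timer_list {
	unsigned long expires;
	void (*function)(unsigned long);
	unsigned long data;
};

/* Non-delayable work item: no timer embedded any more. */
struct work_struct {
	unsigned long pending;
	void (*func)(void *);
	void *data;
};

/* Delayable work item: a work_struct plus the timer it used to contain. */
struct delayed_work {
	struct work_struct work;
	struct timer_list timer;
};

/* Hypothetical helper: get the container back from the embedded member. */
static struct delayed_work *to_delayed_work_sketch(struct work_struct *work)
{
	return (struct delayed_work *)
		((char *)work - offsetof(struct delayed_work, work));
}
```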
Signed-Off-By: David Howells <dhowells@redhat.com>
This removes the layering violation where drivers have to fiddle
directly with EH flags. Instead we now recognize -ENOENT means "no port"
and do the handling in the core code.
This also removes an instance of a call to disable the port, and an
identical printk from each driver doing this. Even better - future rule
changes will be in one place only.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The biggest change is that ata_host_set is renamed to ata_host.
* ata_host_set => ata_host
* ata_probe_ent->host_flags => ata_probe_ent->port_flags
* ata_probe_ent->host_set_flags => ata_probe_ent->_host_flags
* ata_host_stats => ata_port_stats
* ata_port->host => ata_port->scsi_host
* ata_port->host_set => ata_port->host
* ata_port_info->host_flags => ata_port_info->flags
* ata_(.*)host_set(.*)\(\) => ata_\1host\2()
The leading underscore in ata_probe_ent->_host_flags is to avoid
reusing ->host_flags for different purpose. Currently, the only user
of the field is libata-bmdma.c and probe_ent itself is scheduled to be
removed.
ata_port->host is reused for different purpose but this field is used
inside libata core proper and of different type.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Update ata_eh_about_to_do() and ata_eh_done() to improve EH action and
EHI flag handling.
* There are two types of EHI flags - one which expires on successful
EH and the other which expires on a successful reset. Make this
distinction clear.
* Unlike other EH actions, reset actions are represented by two EH
action masks and an EHI modifier. Implement correct about_to_do/done
semantics for resets. That is, prior to reset, related EH info is
sucked in from ehi and cleared, and after reset is complete, related
EH info in ehc is cleared.
These changes improve consistency and remove unnecessary EH actions
caused by stale EH action masks and EHI flags.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* (ata_dev_absent() || ata_dev_ready()) test doesn't indicate
SUSPENDED state properly. Fix it.
* Link resuming resets shouldn't be skipped. Don't skip recovery on
EHI_RESUME_LINK. This doesn't matter for host ports as EHI_RESUME
always coincides with EHI_HOTPLUGGED which makes attached disabled
devices vacant. However, PMP reset causes non-hotplug link-resuming
resets which shouldn't be skipped.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Commit 0662c58b32 updated
ata_eh_autopsy() to OR determined action to ehc->i.action to preserve
action mask set directly into ehc->i.action by nested functions. This
broke action mask clearing on SENSE_VALID case causing revalidation
and EH complete message on successful ATAPI CC.
This patch removes two local variables - action and failed_dev - which
cache ehc->i.action and ehc->i.dev respectively, and makes the function
directly modify ehc->i.* fields to remove aliasing issues.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Reimplement controller-wide PM. ata_host_set_suspend/resume() are
defined to suspend and resume a host_set. While suspended, EHs for
all ports in the host_set are pegged using ATA_FLAG_SUSPENDED and
frozen.
Because SCSI device hotplug is done asynchronously against the rest of
libata EH and the same mutex is used when adding new device, suspend
cannot wait for hotplug to complete. So, if SCSI device hotplug is in
progress, suspend fails with -EBUSY.
In most cases, host_set resume is followed by device resume. As each
resume operation requires a reset, a single host_set-wide resume
operation may result in multiple resets. To avoid this, resume waits
up to 1 second, giving PM a chance to request resume for devices.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement two PM per-dev EH actions - ATA_EH_SUSPEND and
ATA_EH_RESUME. Each action puts the target device into suspended mode
and resumes from it respectively.
Once a device is put to suspended mode, no EH operation other than
RESUME is allowed on the device. The device will stay suspended till
it gets resumed and thus reset and revalidated. To implement this, a
new device state helper - ata_dev_ready() - is implemented and used in
EH action implementations to make them operate only on attached &
running devices.
If all possible devices on a port are suspended, reset is skipped too.
This prevents spurious events including hotplug events from disrupting
suspended devices.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement ATA_EHI_NO_AUTOPSY and QUIET. These used to be implied by
ATA_PFLAG_LOADING, but new power management and PMP support need to
use these separately. e.g. Suspend/resume operations shouldn't print
full EH messages and resume shouldn't be recorded as an error.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ap_lock was used because &ap->host_set->lock was too long and used a
lot. Now that &ap->host_set->lock is replaced with ap->lock, there's
no reason to keep ap_lock.
[ed. note: that's not entirely true. ap_lock is a local variable,
caching the results of a de-ref. In theory, if the compiler is smart
enough, this patch is cosmetic. However, since this is not a fast
path (it is the error path), this patch is nonetheless acceptable,
even though it _may_ introduce a performance regression.]
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ata_eh_autopsy() used to directly assign determined action mask to
ehc->i.action thus overriding actions set by some of nested analyze
functions. This patch makes ata_eh_autopsy() add action masks just as
it's done in other places.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ap->flags is way too cramped. Separate out core dynamic flags to
ap->pflags. ATA_FLAG_DISABLED is a dynamic flag but left alone as
it's referenced by a lot of LLDs and it's gonna be removed once all
LLDs are converted to new EH.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Clear related EH action on device detach such that new device doesn't
receive EH actions scheduled for the old one.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Implement and use ata_eh_dev_action() which returns EH action mask for
a device.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Prepare for changes required to support SATA devices
attached to SAS HBAs. For these devices we don't want to
use host_set at all, since libata will not be the owner
of struct scsi_host.
Signed-off-by: Brian King <brking@us.ibm.com>
(with slight merge modifications made by...)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Currently, the only per-dev EH action is REVALIDATE. EH used to
exploit ehi->dev to do selective revalidation on an ATA bus. However,
this is a bit hacky and makes it impossible to request selective
revalidation from outside of EH or add another per-dev EH action.
This patch adds per-dev EH action mask eh_info->dev_action[] and
updates EH to use this field for REVALIDATE. Note that per-dev actions
can still be specified at port-level and it has the same effect of
specifying the action for all devices on the port.
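
The combination of port-level and per-dev masks can be sketched as follows. The structure and flag value are illustrative assumptions and eh_dev_action_sketch() is a hypothetical name.

```c
#define ATA_MAX_DEVICES 2

enum {
	ATA_EH_REVALIDATE = 1u << 0,	/* illustrative value */
};

struct eh_info_sketch {
	unsigned int action;				/* port-level mask */
	unsigned int dev_action[ATA_MAX_DEVICES];	/* per-dev masks */
};

/* Effective action mask for one device: port-level OR per-dev bits. */
static unsigned int eh_dev_action_sketch(const struct eh_info_sketch *ehi,
					 int devno)
{
	return ehi->action | ehi->dev_action[devno];
}
```

A REVALIDATE bit set at port level therefore shows up for every device, while one set in dev_action[] affects only that device.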
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ATA_EH_REVALIDATE should be cleared after all devices on the target
port have been revalidated. Fix ata_eh_revalidate_and_attach()
accordingly.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
With ops->probe_init() gone, no user is left in libata-core.c. Move
ata_do_reset() to libata-eh.c and make it static.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Update unload unplug - driver unloading / PCI removal. This is done
by ata_port_detach() which short-circuits EH, disables all devices and
freezes the port. With this patch, EH and unloading/unplugging are
properly synchronized.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement bootplug - boot probing via hotplug path. While loading,
ata_host_add() simply schedules probing and invokes EH. After EH
completes, ata_host_add() scans and associates them with SCSI devices.
EH path is slightly modified to handle this (e.g. no autopsy during
bootplug). The SCSI part is left in ata_host_add() because it's
shared with legacy path and to keep probing order as before (ATA scan
all ports in host_set then attach all).
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement SCSI part of hotplug.
This must be done in a separate context as SCSI makes use of EH during
probing. SCSI scan fails silently if EH is in progress. In such
cases, libata pauses briefly and retries until every device is
attached.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Implement ATA part of hotplug. To avoid probing broken devices over
and over again, disabled devices are not automatically detached. They
are detached only if probing is requested for the device or the
associated port is offline. Also, to avoid infinite probing loop,
each device is probed only once per EH run.
As SATA PHY status is fragile, a device is detached only after it has
used up its recovery chances unless explicitly requested by LLDD or
user (LLDD may request direct detach if, for example, it supports cold
presence detection).
Signed-off-by: Tejun Heo <htejun@gmail.com>