Reliability and availability

Subclass of:

714 - Error detection/correction and fault detection/recovery

714100000 - DATA PROCESSING SYSTEM ERROR OR FAULT HANDLING

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class	Description	Number of patent applications
714002000	Fault recovery	4063
714025000	Fault locating (i.e., diagnosis or testing)	3110
714048000	Error detection or notification	1162
714470100	Performance monitoring for fault avoidance	329
714047000	Performance monitoring for fault avoidance	108

Patent application number	Title and abstract	Date published
20080201600	DATA PROTECTION METHOD OF STORAGE DEVICE - A data protection method of a storage device is provided. In the method, a system management interrupt program orders a hardware control unit to obtain a type and an address message of an error in a block in a first storage device, and stores the type and address message in a second storage device. An interrupt service routine (ISR) reads the type and address message of the error from the second storage device. The ISR orders an operating system to search for a block that may be accessed normally and not damaged in the first storage device, and sets the block as a reserved block. The ISR transmits the address message of the error to the OS, such that the OS copies the data in the block having the error to the reserved block, thereby increasing the available capacity of the storage device and improving the reliability of the computer.	08-21-2008
20080209253	SELECTION OF DATA ARRAYS - Provided are a method, system, and article of manufacture, wherein a plurality of data arrays coupled to a storage controller is maintained. Data arrays are selected from the plurality of data arrays based on predetermined selection rules. Data is stored redundantly in the selected data arrays by writing the data to the selected data arrays.	08-28-2008
20080222446	STATUS DISPLAY CONTROL APPARATUS - An apparatus comprises a data display unit which causes a display device to output display data that indicates a drawing screen complying with the display request, a reliability decision unit which decides a legality of a transmission source of the display request, and which makes an output request for information capable of confirming a reliability of the display data that the data display unit causes the display device to output, on the basis of a result of the decision, and an output unit which outputs the information capable of confirming the reliability of the display data as complies with the output request from the reliability decision unit, separately from the display data that is caused to be outputted by the data display unit.	09-11-2008
20080229139	Space- and Time-Adaptive Nonblocking Algorithms - We explore techniques for designing nonblocking algorithms that do not require advance knowledge of the number of processes that participate, whose time complexity and space consumption both adapt to various measures, rather than being based on predefined worst-case scenarios, and that cannot be prevented from future memory reclamation by process failures. These techniques can be implemented using widely available hardware synchronization primitives. We present our techniques in the context of solutions to the well-known Collect problem. We also explain how our techniques can be exploited to achieve other results with similar properties; these include long-lived renaming and dynamic memory management for nonblocking data structures.	09-18-2008
20080256383	METHOD AND SYSTEM OF PREDICTING MICROPROCESSOR LIFETIME - A method of predicting the lifetime reliability of an integrated circuit device with respect to one or more failure mechanisms includes breaking down the integrated circuit device into structures; breaking down each structure into elements and devices; evaluating each device to determine whether the device is vulnerable to the failure mechanisms and eliminating devices determined not to be vulnerable; estimating, for each determined vulnerable device, the impact of a failure of the device on the functionality of the specific element associated therewith, and classifying the failure into a fatal failure or a non-fatal failure, wherein a fatal failure causes the element employing the given device to fail; determining, for those devices whose failures are fatal, an effective stress degree and/or time; determining one or more of a failure rate and a probability of fatal failure for the devices, and aggregating the same across the structures and the failure mechanisms.	10-16-2008
20080294931	Assisted Problem Remediation - A method (which can be computer implemented) for assisted remediation of at least one problem with a computer system includes the steps of obtaining data from the computer system, the data being indicative of the at least one problem; hypothesizing at least a first candidate remediation process for the problem from among a plurality of annotated remediation process descriptions, based at least in part on the data; associating at least a first attribute with the at least first candidate remediation process; and facilitating presentation of the at least first candidate remediation process with the associated attribute to a remediation agent.	11-27-2008
20090006883	Software error report analysis - Described herein is technology for, among other things, accessing error report information. It involves various techniques and tools for analyzing and interrelating failure data contained in error reports and thereby facilitating developers to more easily and quickly solve programming bugs. Numerous parameters may also be specified for selecting and searching error reports. Several reliability metrics are provided to better track software reliability situations. The reliability metrics facilitate the tracking of the overall situation of failures that happen in the real world by providing metrics based on error reports (e.g., failure occurrence trends, failure distributions across different languages).	01-01-2009
20090013207	PREDICTING MICROPROCESSOR LIFETIME RELIABILITY USING ARCHITECTURE-LEVEL STRUCTURE-AWARE TECHNIQUES - A method of predicting the lifetime reliability of an integrated circuit device with respect to one or more failure mechanisms includes breaking down the integrated circuit device into structures; breaking down each structure into elements and devices; evaluating each device to determine whether the device is vulnerable to the failure mechanisms and eliminating devices determined not to be vulnerable; estimating, for each determined vulnerable device, the impact of a failure of the device on the functionality of the specific element associated therewith, and classifying the failure into a fatal failure or a non-fatal failure, wherein a fatal failure causes the element employing the given device to fail; determining, for those devices whose failures are fatal, an effective stress degree and/or time; determining one or more of a failure rate and a probability of fatal failure for the devices, and aggregating the same across the structures and the failure mechanisms.	01-08-2009
20090049328	Storage system and method of designing disaster recovery constitution - The present invention detects patterns that conform to the user conditions in cases where a disaster recovery constitution is constructed by connecting a plurality of sites. The design system is used in cases where the disaster recovery constitution is provided in a storage system. The site information acquisition section acquires information relating to the constitution in the sites and information relating to the connections between the sites, and stores the information in the site information table. The candidate pattern generation section generates candidate patterns for each of the parameters on the basis of the site information table and a basic pattern table. The candidate pattern evaluation section evaluates the respective candidate patterns by using the user condition table and presents patterns which conform to the user conditions to the user. The document output section generates a construction procedure and operating procedure on the basis of patterns selected by the user.	02-19-2009
20090150711	INFORMATION PROCESSING DEVICE, PROGRAM THEREOF, MODULAR TYPE SYSTEM OPERATION MANAGEMENT SYSTEM, AND COMPONENT SELECTION METHOD - An information processing device includes: storage means containing component information on the components constituting a system having a predetermined function; and processing means for calculating a combination of components necessary for constituting a system required for a service according to the component information, calculating risk information as information on the risk that a physical failure affects the service request for the combination of the components and/or fragment information as information on the deflection degree of the use condition of the components, and ranking the selected component combinations according to a predetermined policy, calculated list information and/or the fragment information.	06-11-2009
20090150712	METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR DISASTER RECOVERY PLANNING - Formulating an integrated disaster recovery (DR) plan based upon a plurality of DR requirements for an application by receiving a first set of inputs identifying one or more entity types for which the plan is to be formulated, such as an enterprise, one or more sites of the enterprise, the application, or a particular data type for the application. At least one data container representing a subset of data for an application is identified. A second set of inputs is received identifying at least one disaster type for which the plan is to be formulated. A third set of inputs is received identifying a DR requirement for the application as a category of DR Quality of Service (QoS) class to be applied to the disaster type. A composition model is generated specifying one or more respective DR QoS parameters as a function of a corresponding set of one or more QoS parameters representative of a replication technology solution. The replication technology solution encompasses a plurality of storage stack levels. A solution template library is generated for mapping the application to each of a plurality of candidate replication technology solutions. The template library is used to select a DR plan in the form of a replication technology solution for the application.	06-11-2009
20090164832	METHODS AND SYSTEMS FOR GENERATING AVAILABILITY MANAGEMENT FRAMEWORK (AMF) CONFIGURATIONS - Techniques for generating a system model for use by an availability management framework (AMF) are described. Inputs are received, processed and mapped into outputs which are further processed into a configuration file in an Information Model Management (IMM) Service Extensible Markup Language (XML) format which can be used as a system model by an AMF.	06-25-2009
20090276654	SYSTEMS AND METHODS FOR IMPLEMENTING FAULT TOLERANT DATA PROCESSING SERVICES - Systems and methods are provided to implement fault tolerant data processing services based on active replication and, in particular, systems and methods for implementing actively replicated, fault tolerant database systems in which database servers and data storage servers are run as isolated processes co-located within the same replicated fault tolerant context to provide increased database performance.	11-05-2009
20100088538	METHODS AND SYSTEMS FOR COMPUTATION OF PROBABILISTIC LOSS OF FUNCTION FROM FAILURE MODE - A method for determining a probabilistic loss of function of a system includes the steps of determining a plurality of failure mode probabilities, ranking a plurality of functions pertaining to the failure mode probabilities, and identifying a likely function at least substantially lost by the system based at least in part on the plurality of failure mode probabilities and the ranking.	04-08-2010
20100106998	Robust Generative Features - Disclosed are systems and methods for developing robust features for representing data. In embodiments, a linear generative model is computed using data. In embodiments, based upon a robustness measure, a set of features is selected. In embodiments, the set of features may be evaluated to gauge the capacity of the set of features to represent the data. Responsive to the set of features not satisfying an evaluation criterion or criteria, the set of features may be refined until the selected set of features complies with the evaluation criterion or criteria.	04-29-2010
20100125745	METHOD AND APPARATUS FOR MEASURING CUSTOMER IMPACTING FAILURE RATE IN COMMUNICATION NETWORKS - A method and system for measuring a customer impacting failure rate in a communication network are disclosed. For example, the method collects a plurality of customer impacting network failure events, where the plurality of customer impacting network failure events comprises both hardware failure events and software failure events associated with a particular type of router or switch, or a particular type of component of the router or the switch. The method computes a Mean Time Between Outage (MTBO) metric from the plurality of customer impacting network failure events and compares the MTBO metric with a MTBO goal metric, wherein the MTBO goal metric is calculated in accordance with a predicted Mean Time Between Failure (MTBF) metric.	05-20-2010
20100125746	METHOD AND SYSTEM FOR DETERMINING RELIABILITY PARAMETERS OF A TECHNICAL INSTALLATION - A method calculating reliability parameters of a technical installation is provided. The reliability parameters are calculated using a modified Markov minimum cut method in which probabilities of a plurality of components failing on account of a common cause and the property of a component or subassembly with self-diagnosis are concomitantly included in the calculation of the reliability parameters. The input parameters for the calculation model are determined from messages and/or subsystems in the technical installation or from the overall installation. The failure and repair rates calculated may be used to predict the reliability, availability, maintainability and safety of the technical installation.	05-20-2010
20100162027	HEALTH CAPABILITY DETERMINATION SYSTEM AND METHOD - A system and method are provided for the determining of the potential effect(s) that a degraded system, subsystem, or component may have on the overall capabilities of a vehicle or other system, and any mitigating actions that may need to be taken. Mission-related capabilities of the system are decomposed into a plurality of lower-level capabilities that have an impact on the mission-related capabilities. One or more faults that have an impact on at least one of the lower-level capabilities are mapped to appropriate lower-level capabilities. The lower-level capabilities to which the one or more vehicle faults is mapped are computed, and values of the mission-related capabilities are computed from each of the lower-level capabilities.	06-24-2010
20100169703	SYSTEM AND METHOD FOR DETERMINING AVAILABILITY PARAMETERS OF RESOURCE IN HETEROGENEOUS COMPUTING ENVIRONMENT - A system and associated method for determining an incident of a resource in a computing environment. An event pertaining to the resource is processed by a system automation module. The event is represented as an associated event data having parameters of a target state, a target state prior to the event, a current state, and a current state prior to the event. First, the target state is compared to the target state prior to the event to assure that the target state is steady. Wherein a determination that the event is an incident cannot be made after comparing the target state and the current state, the system automation module compares the current state to the current state prior to the event. Upon determining that the event is an incident, the event data is marked and stored in a repository.	07-01-2010
20100205474	REDUNDANT, DISTRIBUTED COMPUTER SYSTEM HAVING SERVER FUNCTIONALITIES - A distributed computer system is disclosed which contains at least two physical computers and at least two services installed in the system. The computers function as servers for at least one of the services. To make the system redundant, at least one of the physical computers, in addition to containing the server functionality of a first service, also contains a virtual machine having the server functionality of a second service. A method is also disclosed for implementing redundant server functionalities in a distributed computer system.	08-12-2010
20100241891	System and method of predicting and avoiding network downtime - The invention teaches using human factors to monitor and manage computer networks. It is emphasized that this abstract is provided to comply with the rules requiring an abstract that will allow a searcher or other reader to quickly ascertain the subject matter of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.	09-23-2010
20110022879	AUTOMATED DISASTER RECOVERY PLANNING - A system and associated method for automated disaster recovery (DR) planning. A DR planning process receives disaster recovery requirements and a target environment configuration from a user to design DR plans for the target environment configuration that meet the disaster recovery requirements. The DR planning process accesses a knowledgebase containing information on replication technologies, best practice recipes, and past deployment instances. The DR planning process creates the DR plans by analyzing the disaster recovery requirements into element risks, associating replication technologies to protect each element risk, combining associated replication technologies based on the best practice recipes, and selecting highly evaluated combinations based on the past deployment instances. The DR planning process presents the DR plans as classified by replication strategy-architecture combination for each DR plan and marks how strongly each DR plan is recommended.	01-27-2011
20110029804	FLEET MISSION MANAGEMENT SYSTEM AND METHOD USING HEALTH CAPABILITY DETERMINATION - A system and method are provided for planning and controlling a plurality of machines. A mission is assigned to each machine of the plurality of machines. A plurality of system capabilities is computed in each machine and, from the plurality of computed system capabilities, a machine mission capability is computed for each machine. The mission of one or more of the machines may be selectively reassigned based on the computed machine mission capability of each machine.	02-03-2011
20110060936	METHOD AND APPARATUS FOR CORRECTION OF DIGITALLY TRANSMITTED INFORMATION - A method and correction apparatus for correction of at least one digital information item which is transmitted by at least one information source to at least one information sink is provided. The information source can be connected both to an information sink and also to a correction apparatus by means of a data transmission medium. The information processed by the information source for the information sink comprises a first variable name, with the following process steps: a) provision of listed variable names in a second memory area of the correction apparatus, b) then the information source transmits an information item which contains the first variable name, c) extraction of the first variable name from the transmitted information item and saving thereof in a first memory area, d) comparison of the first variable name with a listed variable name and determination of a discrimination criterion on the basis of this comparison, e) a decision is made using the discrimination criterion as to whether the first variable name remains unchanged or is replaced by the listed variable name from method step c) or whether method steps d) and e) are to be repeated by application of an additional, listed variable name and by determining of an additional discrimination criterion.	03-10-2011
20110138218	METHOD AND APPARATUS TO SIMPLIFY HA SOLUTION CONFIGURATION IN DEPLOYMENT MODEL - A method, device and system for generating an HA group according to a user's HA requirement by retrieving an applicable HA pattern, according to a result of HA requirement analysis on the user's HA requirement; generating an initial HA group based on the retrieved HA pattern; performing context rebuilding on a member unit in the initial HA group to obtain a preliminarily configured HA group; generating a member unit based HA group variant for the preliminarily configured HA group according to an HA group redundancy obtained from the user's HA requirement; and performing structure configuration and attribute configuration on a member unit in the generated HA group variant to obtain an HA group that meets the user's HA requirement.	06-09-2011
20110214005	OPTIMIZED PLACEMENT OF VIRTUAL MACHINES IN A NETWORK ENVIRONMENT - Systems and methods for reducing risk of service interruptions for one or more virtual machines (VMs) in a computing environment are provided. The method comprises computing a placement scheme for placing at least one VM on one or more hosts according to a set of placement constraints defined for the VM, wherein the set of placement constraints comprises at least one availability constraint defined for the VM, wherein the availability constraint designates an N resiliency level, wherein N corresponds to the number of host failures that may occur before the services provided by the VM are interrupted.	09-01-2011
20110246811	METHOD FOR ESTIMATING THE RELIABILITY OF AN ELECTRONIC CIRCUIT, CORRESPONDING COMPUTERIZED SYSTEM AND COMPUTER PROGRAM PRODUCT - The determination of a reliability guideline of an electronic circuit having a nodal network of components including at least one reconvergence path between a correlation source and a sink, involves at the level of each component of the path, a computation of a conditional probability matrix whose conditioning is related to at least one node of the path situated upstream of the component.	10-06-2011
20110258478	FACADE FOR BUSINESS RISK MINIMIZATION IN CHANGE ADMINISTRATION VIA RISK ESTIMATION AND MISTAKE IDENTIFICATION BY TICKET ANALYSIS - A system and method of employing a façade to intercept change action commands to be carried out on a target IT endpoint resource. The intercepted commands are compared to information on a corresponding change ticket and any differences, along with the information such as target history, are used to compute a risk assessment of the risk in allowing the intercepted change action commands to be executed. Where the risk exceeds a predetermined threshold, the intercepted change action commands may be modified or eventually aborted.	10-20-2011
20110258479	SERVER-TO-SERVER INTEGRITY CHECKING - A method performed by a primary server includes receiving integrity criteria and sending a health check request to a secondary server based on the received integrity criteria. The method also includes receiving integrity information from the secondary server and checking the integrity information against the integrity criteria. The method further includes initiating a non-compliance action if the integrity information does not comply with the integrity criteria.	10-20-2011
20120089859	Method and Device for Exception Handling in Embedded System - A method and a device for handling exceptions in an embedded system are disclosed. The method comprises: establishing an exception callback linked list for an application program when the application program is running; registering an exception handling function and the corresponding relation between the exception handling function and the exception information into the exception callback linked list by the application program; when the exception is captured, searching the corresponding relation between the exception handling function and the exception information to locate an exception handling function matching the captured exception, according to the exception information of the captured exception; after a matched exception handling function is located, calling and executing the matched exception handling function to perform the exception handling. By adopting the method and the device, the direct operation of the exception handling function to the bottom layer hardware is avoided, and the portability and robustness of the software are improved.	04-12-2012
20120192005	SHARING A FAULT-STATUS REGISTER WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition. However, the processor does not update bit positions in the architectural FSR upon encountering a fault condition for the first vector instruction and the subsequent vector instructions.	07-26-2012
20120266011	RELIABILITY BASED DATA ALLOCATION AND RECOVERY IN A STORAGE SYSTEM - A storage system provides highly flexible data layouts that can be tailored based on reliability considerations. The system allocates reliability values to logical containers at an upper logical level of the system based, for example, on objectives established by reliability SLOs. Based on the reliability value, the system identifies a specific parity group from a lower physical storage level of the system for storing data corresponding to the logical container. After selecting a parity group, the system allocates the data to physical storage blocks within the parity group. In embodiments, the system attaches the reliability value information to the parity group and the physical storage units storing the data. In this manner, the underlying physical layer has a semantic understanding of reliability considerations related to the data stored at the logical level. Based on this semantic understanding, the system has the capability to prioritize data operations on the physical storage units according to the reliability values attached to the parity groups.	10-18-2012
20120290867	MATRIX COMPUTATION FRAMEWORK - Described herein are technologies pertaining to matrix computation. A computer-executable algorithm that is configured to perform a sequence of computations over a matrix tile is received and translated into a global directed acyclic graph that includes vertices that perform a sequence of matrix computations and edges that represent data dependencies amongst vertices. A vertex in the global directed acyclic graph is represented by a local directed acyclic graph that includes vertices that perform a sequence of matrix computations at the block level, thereby facilitating pipelined, data-driven matrix computation.	11-15-2012
20120317436	OPERATOR MESSAGE COMMANDS FOR TESTING A COUPLING FACILITY - A facility is provided to enable operator message commands from multiple, distinct sources to be provided to a coupling facility of a computing environment for processing. These commands are used, for instance, to perform actions on the coupling facility, and may be received from consoles coupled to the coupling facility, as well as logical partitions or other systems coupled thereto. Responsive to performing the commands, responses are returned to the initiators of the commands.	12-13-2012
20130097454	ELECTRONIC DEVICE AND METHOD FOR PROTECTING SERVERS AGAINST VIBRATION DAMAGE - An electronic device capable of communicating with a plurality of servers includes a storage unit, a vibration sensor, a control unit, and a communication unit. The storage unit stores a vibration threshold value. The vibration sensor senses a vibration magnitude of the electronic device. The control unit generates control signals and transmits the control signals to the servers via the communication unit to direct the servers to take certain actions to protect data when the vibration magnitude sensed by the vibration sensor is equal to or greater than the vibration threshold value.	04-18-2013
20130117600	MEMORY MANAGEMENT IN A NON-VOLATILE SOLID STATE MEMORY DEVICE - A computer-implemented method of managing a memory of a non-volatile solid state memory device by balancing write/erase cycles among blocks to level block usage. The method includes monitoring an occurrence of an error during a read operation in a memory unit of the device, where the error is correctable by error-correcting code, and programming the memory unit according to the monitored occurrence of the error, where the step of monitoring the occurrence of an error is carried out for at least one block, and wherein said step of programming includes wear-leveling the monitored block according to the error monitored for the monitored block.	05-09-2013
20130138992	Preventing Disturbance Induced Failure In A Computer System - A system and a computer program product for executing a method to prevent failure on a server computer due to internally and/or externally induced shock and/or vibration. The method includes acquiring, by at least one sensor, analog acceleration data of components in a server computer. The data is then converted to digital format and stored within a motor drive assembly processor memory unit. The processor analyzes the stored data for existence of machine degradation. In response to detecting the existence of machine degradation, the motor drive assembly processor initiates remediation procedures. The remediation procedures include controlling rotating speed of moving devices or performing a complete system shut down.	05-30-2013
20130297963	EXPOSING APPLICATION PERFORMANCE COUNTERS FOR APPLICATIONS THROUGH CODE INSTRUMENTATION - Disclosed is a method for adding performance counters to an application after compilation of the application to Common Intermediate Language code without a requirement for code changes to the original application code or application recompilation from the development side. With regard to a further aspect of a particularly preferred embodiment, the invention may provide a method for adding the performance counters by declarative instrumentation of an application at runtime or compile time, without the need for an application developer to hardcode instrumentation logic into the application. An instrumentation configuration file provides declarative definition for performance counters that are to be added to a particular application, and particularly includes a complete list of performance counters that need to be added and settings for each performance counter.	11-07-2013
20140215255	MITIGATING RISKS DURING A HIGH AVAILABILITY AND DISASTER RECOVERY (HA/DR) REHEARSAL - A method of mitigating risks during a high availability and disaster recovery (HA/DR) rehearsal comprises, with a processor, performing a number of checks on a number of applications to determine the operational performance of the applications, and with the processor, determining if the applications comprise design patterns that indicate potential HA/DR risks.	07-31-2014
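
Several abstracts in this class (for example 20100125745 and 20100125746) refer to Mean Time Between Outage (MTBO), Mean Time Between Failure (MTBF), and failure and repair rates without giving formulas. The Python sketch below shows one common textbook reading of these quantities; it is not the method claimed in any listed application. The function names, the unit-hours definition of MTBO, and the `goal_fraction` parameter are assumptions made for illustration. The availability expression is the standard steady-state result for a single repairable component with constant failure and repair rates.

```python
def mean_time_between_outages(observed_hours: float, unit_count: int,
                              outage_count: int) -> float:
    """Assumed definition: total unit-hours in service divided by outages seen."""
    if outage_count == 0:
        # No outages observed in the window, so the estimate is unbounded.
        return float("inf")
    return observed_hours * unit_count / outage_count


def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """A = mu / (lambda + mu), equivalently MTBF / (MTBF + MTTR)."""
    return repair_rate / (failure_rate + repair_rate)


def meets_goal(mtbo: float, predicted_mtbf: float,
               goal_fraction: float = 1.0) -> bool:
    """Hypothetical check: is measured MTBO at least a fraction of predicted MTBF?"""
    return mtbo >= goal_fraction * predicted_mtbf


if __name__ == "__main__":
    # 100 routers observed for one year (8760 h) with 4 customer-impacting outages.
    mtbo = mean_time_between_outages(8760, 100, 4)
    print(mtbo)
    print(meets_goal(mtbo, predicted_mtbf=200_000))
    # One failure per 1000 h, mean repair time of 10 h.
    print(round(steady_state_availability(0.001, 0.1), 6))
```

With the sample inputs, 876,000 unit-hours spread over 4 outages gives an MTBO of 219,000 hours, which exceeds the assumed 200,000-hour MTBF prediction.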
