IT Infrastructure Availability and Reliability Monitoring Systems



The engineering infrastructure of a data processing center (DPC) consists of tens of devices, many of which have built-in microprocessor control systems. A modern DPC may need to track thousands of parameters (climate readings for individual racks, server current draw, security sensor states, etc.). All of these parameters must be collected and analyzed in real time, 24x7. This is possible only with a unified control and monitoring system that regularly collects information from all data sources (sensors, equipment state) and controls the equipment to maintain optimal operating conditions.
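
The idea of a single system that regularly collects readings from all data sources can be illustrated with a minimal sketch. It is not a description of any particular product: the Collector class, the Reading record and the source names such as "rack-12/intake-temp-c" are hypothetical, and the lambda functions stand in for real sensor or equipment drivers.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Reading:
    source: str       # hypothetical naming scheme, e.g. "rack-12/intake-temp-c"
    value: float
    timestamp: float  # seconds since the epoch, taken at poll time


class Collector:
    """Polls every registered data source and returns one unified snapshot."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], float]] = {}

    def register(self, name: str, read: Callable[[], float]) -> None:
        # 'read' is any zero-argument function that returns the current value.
        self._sources[name] = read

    def poll(self) -> List[Reading]:
        now = time.time()
        return [Reading(name, read(), now) for name, read in self._sources.items()]


# Stand-in sources; a real deployment would query sensors or device controllers.
collector = Collector()
collector.register("rack-12/intake-temp-c", lambda: 24.5)
collector.register("server-7/current-a", lambda: 3.2)
collector.register("door-1/open", lambda: 0.0)

snapshot = collector.poll()
for reading in snapshot:
    print(reading.source, reading.value)
```

In practice poll() would be called on a fixed schedule, and the most critical sources could be polled more often than the rest.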

Monitoring is generally divided into the following levels:

  • failure monitoring;
  • performance monitoring;
  • service level monitoring.

A multilevel approach to DPC resource monitoring (covering every layer, from network infrastructure to the services and the applications that implement them) makes it possible to reach the required level of IT service provision and to keep IT resources controllable and observable.
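
As a rough illustration of how the three levels differ, the sketch below derives one indicator per level from the same series of service checks. The Sample record, the function names and the 300 ms response limit are assumptions made for the example only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    up: bool            # did the check succeed?
    response_ms: float  # measured response time of the check


def failure_detected(samples: List[Sample]) -> bool:
    """Failure monitoring: the most recent check failed."""
    return bool(samples) and not samples[-1].up


def slow_fraction(samples: List[Sample], limit_ms: float) -> float:
    """Performance monitoring: share of successful checks slower than limit_ms."""
    ok = [s for s in samples if s.up]
    if not ok:
        return 0.0
    return sum(s.response_ms > limit_ms for s in ok) / len(ok)


def availability(samples: List[Sample]) -> float:
    """Service level monitoring: fraction of checks in which the service was up."""
    if not samples:
        return 0.0
    return sum(s.up for s in samples) / len(samples)


samples = [
    Sample(True, 120.0),
    Sample(True, 480.0),
    Sample(False, 0.0),
    Sample(True, 150.0),
]

print(failure_detected(samples))      # False: the latest check succeeded
print(slow_fraction(samples, 300.0))  # one of three successful checks was slow
print(availability(samples))          # 0.75: three of four checks succeeded
```

The availability figure is the one that would typically be compared against an agreed service level target.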

Introducing an IT infrastructure availability and reliability monitoring system provides:

  • almost immediate response to events - data from the most critical points can be received several times per second, which ensures a fast system response and minimizes the negative consequences of emergency situations;
  • no human involvement in emergency decision-making - most threats are known in advance, as are the procedures for handling them; combining the engineering systems into a single logical network, with decision-making mechanisms built into the automated control systems, ensures a preset response of the whole DPC engineering infrastructure to an emergency;
  • reduced influence of the human factor on engineering infrastructure operation;
  • automated control of operating parameters - the system allows the required operating parameters of the engineering systems to be set, checks compliance with them, and continuously monitors environmental, power and process equipment parameters using independent control devices. If parameters go out of limits or failures are detected, the personnel on duty receive warnings and recommendations for resolving the emergency;
  • extended capabilities of the engineering systems through additional control points, which makes it possible to guarantee compliance with the required environmental and equipment operating parameters (for example, regulating the climate control system based not only on the temperature of the hot air entering the air conditioner but also on data from additional sensors installed directly at the equipment air intakes);
  • unified monitoring and control interface - the personnel on duty have a single graphical monitoring and control interface for all DPC engineering systems, which makes information easier to take in and allows real-time control of all parameters, timely detection of changes in operation, and timely maintenance and repairs;
  • the ability to forecast failures of engineering infrastructure elements - connecting additional sensors and measuring nonstandard infrastructure parameters helps prevent possible emergencies (for example, sag sensors report that the load on a supporting structure has reached the maximum permissible value and that any further increase will lead to structural failure).
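
Two of the capabilities listed above, out-of-limit warnings with recommendations and failure forecasting, can be sketched in a few lines. This is a simplified illustration, not the logic of any specific monitoring product: the Limit record, the 18-27 degree range and the recommendation text are hypothetical, and the forecast is an ordinary least-squares trend line, which is only one of many possible forecasting methods.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Limit:
    name: str
    low: float
    high: float
    recommendation: str  # text shown to the personnel on duty


def check(limit: Limit, value: float) -> Optional[str]:
    """Return a warning with a recommendation if value is out of limits."""
    if limit.low <= value <= limit.high:
        return None
    return (f"{limit.name}={value} outside [{limit.low}, {limit.high}]: "
            f"{limit.recommendation}")


def steps_until_limit(values: Sequence[float], high: float) -> Optional[float]:
    """Fit a least-squares line to equally spaced readings and estimate how many
    further sampling steps remain before the line reaches 'high'.
    Returns None for fewer than two readings or a flat or falling trend."""
    n = len(values)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (high - intercept) / slope  # sample index where the line hits 'high'
    return max(0.0, crossing - (n - 1))


# Hypothetical limit for a rack air-intake temperature sensor.
intake = Limit("intake_temp_c", 18.0, 27.0, "inspect the cooling unit")
print(check(intake, 22.0))  # None: the value is within limits
print(check(intake, 29.5))  # warning text with the recommendation

# Hypothetical load readings that rise by 2 units per sample.
print(steps_until_limit([10, 12, 14, 16], high=20))  # 2.0
```

A forecast of this kind lets the personnel on duty act while the parameter is still within limits, instead of reacting after the limit has been crossed.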