Data Archiving Systems



The growth of electronic communication, the rapid increase in the volume of information stored in electronic form, and the rising value of that information make corporate digital archives increasingly important. Archiving systems account for a steadily growing share of total data storage capacity, and new requirements for secure storage keep emerging. All of this leads to a review of the criteria applied when designing data archiving systems.

Open Technologies offers services for designing and building data archiving systems.

A state-of-the-art data archiving solution should provide:

  • fast access;
  • a high level of protection against information loss;
  • control over the retention period of securely stored data;
  • authenticity of the stored information;
  • deduplication of archived data;
  • efficient content search;
  • scalability that preserves the existing archive infrastructure.

Today's archiving systems, designed to store static unstructured data, focus on optimal data placement and efficient user access. As a rule, such a solution includes a specialized archive storage system, software that implements client interaction and storage management, and components that integrate with adjacent systems (for example, a data backup and recovery system).

Features of archive storage system design

The absence of structure complicates the logical organization of data in the repository. To make better use of archived material and to meet the requirements listed above, archiving systems use the so-called content-oriented approach to storage, or content-addressed storage (CAS). Most of today's archive storage systems implement this approach. With this method, data is stored not as files or blocks (as in NAS and SAN systems, respectively) but as data objects with unique identifiers (a content address, or fingerprint).

The identifier assigned to an object is returned to the client and used in all further operations on the data. Because the identifier is calculated from the content, identical objects are not duplicated in the repository. The absence of any logical structure in the repository (file system, database, and so on) makes the archive independent of the physical characteristics of the storage and makes the system easy to scale. Data placed in the archive by a user or application can be deleted or modified only by the archive administrator; otherwise it is deleted automatically according to the applicable retention policy.
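The content-addressing idea can be illustrated with a minimal Python sketch. It is not the Centera implementation: the class name, the choice of SHA-256 as the fingerprint function, and the in-memory dictionaries are all assumptions made for illustration. It shows how deriving the address from the content gives deduplication and an integrity check for free, and how a retention deadline can block early deletion.

```python
import hashlib
import time
from typing import Optional


class ContentAddressedStore:
    """Toy in-memory archive keyed by a digest of the object content (hypothetical)."""

    def __init__(self):
        self._objects = {}     # content address -> stored bytes
        self._expires_at = {}  # content address -> retention deadline (epoch seconds)

    def put(self, data: bytes, retention_seconds: float = 0.0) -> str:
        # The address is computed from the content (SHA-256 is an assumption).
        address = hashlib.sha256(data).hexdigest()
        # Identical content yields the same address, so it is stored only once.
        self._objects.setdefault(address, data)
        deadline = time.time() + retention_seconds
        # Keep the longest retention deadline requested for this object.
        self._expires_at[address] = max(self._expires_at.get(address, 0.0), deadline)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Recomputing the digest confirms the stored bytes are unaltered.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object does not match its content address")
        return data

    def delete(self, address: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # Deletion is refused until the retention period has expired.
        if now < self._expires_at[address]:
            raise PermissionError("retention period has not expired")
        del self._objects[address]
        del self._expires_at[address]

    def __len__(self) -> int:
        return len(self._objects)
```

Storing the same bytes twice returns the same address and leaves a single object in the store, while `delete` raises `PermissionError` until the retention deadline has passed.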


Notably, organizing a system on the CAS principle both achieves the required technical parameters and yields a tangible economic benefit. The total cost of storing 1 TB of data is significantly lower than in conventional online storage systems and is comparable to the cost of tape storage.

EMC archive storage system implementation

The first commercially successful CAS-based storage system came to market in 2002. This content-addressed archiving platform was developed by EMC Corporation and released under the name “EMC Centera”.

EMC Centera is a content-addressed storage system built from independent nodes with added redundant nodes (RAIN architecture, a redundant array of independent nodes). Each node is a standard server with internal disks that hold the files of the CentraStar operating environment and the archived data. Protection against data loss when a disk or an entire node fails is provided by combining nodes into groups that use either a mirroring scheme or a parity (checksum) scheme. A key feature is that when one node fails, the data remains protected and the solution as a whole keeps working.
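The difference between the two protection schemes can be sketched in Python. This is an illustrative model only, not CentraStar code: the function names are hypothetical, and simple XOR parity is assumed as the checksum scheme. An object is split into fragments intended for different nodes, one parity fragment is added, and any single lost fragment is rebuilt from the survivors.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, fragments: int) -> List[bytes]:
    """Split data into equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // fragments)  # ceiling division
    padded = data.ljust(size * fragments, b"\0")
    parts = [padded[i * size:(i + 1) * size] for i in range(fragments)]
    parts.append(reduce(xor_bytes, parts))  # parity fragment goes to its own node
    return parts


def rebuild(parts: List[bytes], lost: int) -> bytes:
    """Recover the fragment at index `lost` by XOR-ing all surviving fragments."""
    survivors = [p for i, p in enumerate(parts) if i != lost]
    return reduce(xor_bytes, survivors)
```

Under these assumptions, mirroring keeps two full copies (2x raw capacity), whereas four data fragments plus one parity fragment need 1.25x raw capacity; both schemes tolerate the loss of a single node.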

All system nodes are identical and operate in one of three modes: storage, access to archived data, or a combined mode that reduces the number of nodes required.

Clients communicate with the archiving system over TCP/IP through Gigabit Ethernet interfaces. The same interfaces carry internode traffic inside the archiving platform. Every node has redundant connections.

Protection against disasters is provided by asynchronous replication between two or more EMC Centera systems. Both replication and data access run over IP, and replication can be unidirectional or bidirectional. If necessary, several EMC Centera systems can be arranged in a replication chain.
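The replication topologies described here can be modeled with a short Python sketch. The `ArchiveSite` class and its methods are hypothetical and do not reflect the actual Centera replication protocol. The sketch shows the asynchronous property (a write completes locally and is shipped to the target later) and how the same mechanism covers one-way, two-way and chained setups.

```python
from collections import deque


class ArchiveSite:
    """Toy model of one archive system with asynchronous outbound replication."""

    def __init__(self, name):
        self.name = name
        self.objects = {}       # content address -> data held at this site
        self.targets = []       # sites this site replicates to
        self.pending = deque()  # (target, address) pairs not yet shipped

    def replicate_to(self, other):
        self.targets.append(other)

    def write(self, address, data):
        if address in self.objects:
            return  # already stored; this also stops loops in two-way setups
        self.objects[address] = data
        # The write returns now; replication happens later in drain().
        for target in self.targets:
            self.pending.append((target, address))

    def drain(self):
        """Ship all queued objects to their targets and return how many were sent."""
        sent = 0
        while self.pending:
            target, address = self.pending.popleft()
            target.write(address, self.objects[address])
            sent += 1
        return sent
```

For a chain A → B → C, an object written at A reaches B only after `A.drain()` runs and reaches C only after `B.drain()` runs, which models the lag inherent in asynchronous replication.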

Software for data archiving solutions

Because of the CAS architecture, conventional file-level and block-level access to data is not possible. This simplifies the enforcement of information security policies and helps ensure data integrity, but it requires a special application programming interface (API) for exchanging information with the archive.

The market offers a wide range of software from leading vendors for building archive storage solutions. The main function of these products is to connect the archive storage system (in particular, EMC Centera) with applications and clients.

For a single-vendor data archiving solution, EMC offers a range of market-leading, high-performance software.

First of all, there is the EMC SourceOne product family, which archives files, e-mail messages and application data, provides eDiscovery functions, and makes it possible to build a full-fledged enterprise-level data storage solution.

Applications that are not directly compatible with EMC Centera are integrated through EMC Centera Universal Access, a gateway between the HTTP, FTP, NFS or CIFS protocols and the Centera API. EMC Centera Seek and EMC Centera Chargeback Reporter provide fast, accurate search and reporting for large archives.
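The general idea of a file-protocol gateway in front of a content-addressed archive can be sketched in Python. This is a conceptual model only and does not describe how EMC Centera Universal Access works internally: the `FileGateway` class, its catalog, and the use of SHA-256 are assumptions. The gateway keeps a catalog that maps file paths, which file-oriented clients understand, to content addresses, which the archive understands.

```python
import hashlib


class FileGateway:
    """Toy gateway that maps file paths to content addresses (hypothetical)."""

    def __init__(self):
        self._blobs = {}    # stands in for the CAS back end: address -> data
        self._catalog = {}  # file path -> content address

    def write_file(self, path: str, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)  # identical content is stored once
        self._catalog[path] = address
        return address

    def read_file(self, path: str) -> bytes:
        # Resolve the path to an address, then fetch the object by address.
        return self._blobs[self._catalog[path]]

    def stored_objects(self) -> int:
        return len(self._blobs)
```

Two paths with identical content resolve to the same address, so the back end holds a single object while clients still see two files.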

Backup is an additional level of protection for data held in any type of storage system, and the archiving platform is no exception. The EMC Centera Backup and Recovery Module connects an EMC Centera-based archiving solution with backup solutions. The module bridges the NDMP protocol and the Centera API, which allows the electronic archive to be integrated with backup platforms from leading vendors. One such platform is EMC NetWorker.

Open Technologies has extensive experience in complex integration projects, including the construction of data storage systems, archiving platforms, and data backup and recovery systems. Open Technologies is a long-standing partner of EMC and other vendors in the data storage market. We are always ready to offer an optimal solution based on products from the industry leaders.