Protecting and managing distributed data is a classic storage challenge for most environments. Distributed data is commonly perceived as data that resides at remote locations, outside the data center, or not on a mainframe. Distributed storage includes network-based file servers (LAN servers) in a building, campus, or metropolitan area, and it supports a wide range of applications such as e-mail, databases, and e-commerce. With the proliferation of distributed and duplicated data, the complexity and cost of protecting all this data are compounding. This article examines various techniques, including emerging trends and technologies, for accessing, protecting, and managing distributed data.
Distributed Data Protection Challenges
Network Attached Storage (NAS) over TCP/IP networks for file serving and sharing has become a popular choice for remote storage. NAS typically supports Network File System (NFS), Common Internet File System (CIFS), or both. Some environments use both NFS and CIFS, and some also use Novell NetWare and Apple file sharing. NAS has been a popular choice for distributed data because it is easy to deploy and use, whether through dedicated file servers (also known as appliances or filers) or through general-purpose servers. Entry-level and small-to-medium-size business (SMB) storage subsystems attached to distributed servers are also popular for distributed storage, either directly attached or attached via a storage network interface. Storage networking interfaces and protocols include Internet SCSI (iSCSI) over Ethernet, NAS over IP-based networks, and Fibre Channel.
Potential threats to data and storage resources increase as more data is created and used outside traditional data center environments. The list of possible threats, and corresponding techniques to protect data, can be extensive. You should avoid simply treating all data the same. A tiered storage protection strategy aligns applications and server tiers to appropriate storage resources and protection categories. For example, some storage may be protected only by a Redundant Array of Inexpensive Disks (RAID); other storage may use local RAID with a synchronous mirror copy offsite and Point-in-Time (PIT) copies for backups. Infrequently accessed or old data may be archived to some other medium until needed. Other storage may be backed up regularly to an offline medium such as disk or tape. Tiered storage is much more than simply deciding what type of disk drive to allocate to an application based on performance and space capacity. A challenge with distributed data, and storage in general, is that snapshots, RAID, and the improved reliability of disk drives can create a false sense of security. Data may not be backed up regularly or, perhaps worse, it may be perceived as protected by a combination of RAID and PIT snapshots when it isn't actually being backed up.
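The false sense of security described above can be checked for mechanically. The following Python fragment is only a minimal sketch: the `Volume` type, the protection labels ("raid", "snapshot", "sync_mirror", "backup"), and the sample volume names are hypothetical and would need to be mapped onto whatever inventory your environment actually keeps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """A storage volume and the protection techniques applied to it.

    The protection labels are illustrative assumptions, not a standard.
    """
    name: str
    protections: frozenset


def volumes_without_backup(volumes):
    """Return the names of volumes that have no actual backup copy.

    RAID and PIT snapshots alone do not count: they protect against
    drive failure and allow quick rollback, but they are not a backup.
    """
    return [v.name for v in volumes if "backup" not in v.protections]


# Hypothetical inventory for a remote office.
inventory = [
    Volume("mail", frozenset({"raid", "snapshot"})),
    Volume("orders_db", frozenset({"raid", "sync_mirror", "snapshot", "backup"})),
    Volume("file_share", frozenset({"raid"})),
]

at_risk = volumes_without_backup(inventory)
```

In this sample inventory, `at_risk` lists the mail and file share volumes, which are exactly the cases where RAID or snapshots might be mistaken for full protection.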
Protecting Distributed Data
A first step in data protection is to understand your application needs and business objectives. This involves understanding the value of your data and how and where the data is generated and used. Understanding data needs helps you classify and determine data availability and retention requirements.
Defining Recovery Time Objectives (RTOs) for your environment is also imperative. You may have multiple RTOs for different applications. A well-defined RTO is important because it determines how quickly data must be accessible again, including restoration, recovery, and restart time. For example, an RTO of zero would require continuous availability, while an RTO of 24 hours would require that data be available again within 24 hours.
A Recovery Point Objective (RPO) is important because it determines the point in time to which your data is protected. RPO is an indicator of data value, how the data is protected, and how much of it can be lost. An RPO of zero means no data can be lost, which requires real-time protection. An RPO of 30 minutes indicates that up to 30 minutes of data could be lost. Different data and applications may have different RPOs.
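One way to make RTO and RPO concrete is to compare them with what a given protection scheme can deliver. The sketch below is illustrative only: the `Objectives` and `ProtectionPlan` types and the sample figures are assumptions. It treats the interval between protection copies as the worst-case data loss, and the estimated restore, recovery, and restart time as the figure to compare with the RTO.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # maximum tolerable time until data is usable again
    rpo: timedelta  # maximum tolerable amount of lost data, as time


@dataclass(frozen=True)
class ProtectionPlan:
    name: str
    copy_interval: timedelta       # time between protection copies (0 = real time)
    estimated_recovery: timedelta  # restoration + recovery + restart


def meets_objectives(plan, objectives):
    """Check a plan against the RPO and the RTO.

    Worst-case data loss is assumed to equal the interval between
    copies; recovery time is taken from the plan's own estimate.
    """
    rpo_ok = plan.copy_interval <= objectives.rpo
    rto_ok = plan.estimated_recovery <= objectives.rto
    return rpo_ok and rto_ok


# Hypothetical example: RPO of 30 minutes, RTO of 24 hours.
target = Objectives(rto=timedelta(hours=24), rpo=timedelta(minutes=30))

nightly_tape = ProtectionPlan("nightly tape", timedelta(hours=24), timedelta(hours=6))
pit_copies = ProtectionPlan("15-minute PIT copies", timedelta(minutes=15), timedelta(hours=2))

nightly_ok = meets_objectives(nightly_tape, target)
pit_ok = meets_objectives(pit_copies, target)
```

With these sample figures, the nightly backup can be restored well inside the RTO but could lose up to 24 hours of data, so it misses the 30-minute RPO. The more frequent PIT copies satisfy both objectives.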
Data Risks and Threats
Backup and Recovery
There are many methods of data backup, including Disk-to-Tape (D2T), Disk-to-Disk (D2D), Disk-to-Disk-to-Tape (D2D2T), and Disk-to-Disk-to-Optical (D2D2O). The benefits of using disk-based backup solutions include enhanced performance of backups, faster recovery, and enhanced reliability of backup and data protection. Many remote and distributed environments that perform data backups tend to use localized backup as opposed to a centralized backup and data protection scheme. Reasons for localized backup include keeping the data protection and management near where the data is used and attempting to avoid high network bandwidth costs and requirements. Another common reason for localized backup, though not necessarily a good one, is the internal politics of an organization.
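The bandwidth concern behind localized backup can be estimated with simple arithmetic. The sketch below is a rough approximation under stated assumptions: decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second), a hypothetical `efficiency` factor for protocol overhead and link sharing, and no compression or de-duplication. The function name and sample figures are illustrative.

```python
def transfer_hours(data_gb, link_mbps, efficiency=1.0):
    """Estimate the hours needed to move data_gb over a network link.

    data_gb    -- amount of data to copy, in decimal gigabytes
    link_mbps  -- nominal link speed, in megabits per second
    efficiency -- assumed usable fraction of the link (0 < efficiency <= 1)
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data size, link speed, or efficiency")
    bits = data_gb * 1e9 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


# Hypothetical remote office: 500 GB full backup over a 10 Mbps WAN link.
full_backup_hours = transfer_hours(500, 10)
```

For this hypothetical office, a full backup to a central site would take more than four days of continuous transfer even on an ideal link. This is one reason sites keep backups local or send only changed data over the network.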