Does File Sync and Share Create A Data Deduplication Problem?

File sync and share is here to stay, and enterprise file sync and share (EFSS) is how most enterprises add this functionality to their environment. The question is whether this new service introduces new problems of its own. For example, many EFSS solutions require data to be synced to another destination (either the cloud or a specialized appliance), creating yet another copy of data that IT needs to manage. Is there a way to deliver the same service without these problems?

Duplicate Copies Are Already a Problem

With unstructured data, we tend to create duplicates whenever we share files with other users, because we send them via email, Skype, Slack, or similar mechanisms that by their very nature create multiple copies of each file. Every one of these duplicates increases storage costs and complexity, while also giving hackers another place to attack the files. Duplication caused by file sharing is one of the problems most EFSS systems are trying to solve, yet the fact is that today most of those systems require users to make duplicates just to use the service. Duplication increases the threat surface of the organization and complicates the storage environment, creating a significant risk management and security problem for many companies.
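
To see how quickly those copies accumulate, here is a minimal sketch that walks a directory tree, groups files by content hash, and totals the space the extra copies waste. The /shared path is an assumption for illustration, not a reference to any real system:

    import hashlib
    import os
    from collections import defaultdict

    def find_duplicates(root):
        """Group files under root by the SHA-256 hash of their contents."""
        by_hash = defaultdict(list)
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    # Hash in 1 MB chunks so large files do not exhaust memory.
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                by_hash[h.hexdigest()].append(path)
        # Keep only digests that appear more than once, i.e. true duplicates.
        return {d: p for d, p in by_hash.items() if len(p) > 1}

    # Report each duplicate set and the space the extra copies waste.
    for digest, paths in find_duplicates("/shared").items():
        wasted = (len(paths) - 1) * os.path.getsize(paths[0])
        print(f"{len(paths)} copies, {wasted} bytes wasted: {paths}")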

EFSS Compounds the Duplication Problem

For an EFSS system, even a hybrid version, to begin sharing files, each user must first synchronize their entire directory to a secondary source. Once that synchronization is complete, the files must be synchronized again to a special directory on the device of each user who will access them, including the originating user. So the first copy data challenge with EFSS systems is that before a customer even begins sharing files, they must create several duplicates of each file. In addition, EFSS providers maintain multiple redundant servers in different geographies, and each of those servers typically has a near-line backup and perhaps an off-site backup as well. EFSS systems are therefore adding to the duplication problem, not solving it. Again, this increases the organization's storage costs and the security risks to the data.
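
A back-of-the-envelope tally makes the multiplication concrete. Every figure below is an illustrative assumption, not a measurement from any particular EFSS product:

    # Rough copy count for one shared file under the EFSS model described above.
    # All figures here are illustrative assumptions.
    sharing_users = 5      # users whose devices sync the file
    geo_replicas = 3       # redundant provider servers in different geographies
    backups_per_geo = 2    # near-line backup plus off-site backup per geography

    original = 1
    provider_copies = geo_replicas * (1 + backups_per_geo)  # live replica + its backups
    device_copies = sharing_users                           # one synced copy per user

    total = original + provider_copies + device_copies
    print(f"One shared file becomes {total} copies")  # -> One shared file becomes 15 copies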

The EFSS Transfer Problem

The next challenge with this model is the physics of getting everyone's data to each location. If the central location is a cloud storage system, a significant amount of data must be transferred from its current location to the cloud over the Internet. As mentioned previously, that data must then be downloaded, also over the Internet, to each user who will share it. The bandwidth such transfers require, and their impact on the environment while they run, can be significant, which is both expensive and unproductive.
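
A quick estimate shows why. The dataset size, link speed, and efficiency factor below are assumptions chosen only to illustrate the arithmetic:

    def transfer_hours(data_tb, link_mbps, efficiency=0.8):
        """Hours to move data_tb terabytes over a link_mbps connection,
        derated by an efficiency factor for protocol overhead and contention."""
        bits = data_tb * 8 * 10**12                      # decimal terabytes -> bits
        seconds = bits / (link_mbps * 10**6 * efficiency)
        return seconds / 3600

    # Example: the initial sync of 2 TB of user data over a 100 Mbps uplink.
    print(f"{transfer_hours(2, 100):.0f} hours")         # about 56 hours

And that transfer happens at least twice: once up to the central copy, and again down to each user who will share the data.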

Duplication Compounds EFSS Security Concerns

If a company is concerned about the security ramifications of storing data in the cloud and chooses some type of on-premises hybrid storage for EFSS, it faces another challenge. Since the copy stored on the on-premises system will be considered the copy of record, it must live on reliable storage; in many cases, this means purchasing an additional storage system just for this requirement. And although duplicating everyone's data to a local file server has less impact than duplicating it to a public cloud server, there is still an impact on the environment, especially during the initial migration. Data must also still be duplicated from the on-premises storage system to each user device (laptops, tablets, smartphones).

One of the other challenges of keeping duplicates in multiple places is version control. Without aggressive file locking techniques, it is very easy to end up with multiple versions of the same file, each containing changes from different people.
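
The usual defense is to lock a file while someone is editing it. On POSIX systems, an advisory lock looks roughly like the sketch below (the filename is hypothetical, and Windows and network filesystems need different mechanisms). The catch is that a lock only works when every writer touches the same file; it cannot coordinate edits to independent duplicates scattered across devices:

    import fcntl

    # Take an exclusive advisory lock before editing; a second writer's
    # flock() call blocks until the first releases the lock.
    with open("quarterly-report.docx", "rb+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)    # acquire exclusive lock
        # ... modify the file ...
        fcntl.flock(f, fcntl.LOCK_UN)    # release (also released on close)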

The model of synchronizing everything to a central location and then synchronizing it again to other remote locations works well for consumers sharing small amounts of data across varying Internet connections. Using that same model within an enterprise has different ramifications, including the cost and complexity of storing the additional duplicates and the security risk of constantly creating copies on local or remote machines. Duplication quite simply increases your security risk.

In-Place EFSS

What if enterprise users could share a file directly from its original location without creating an additional copy? (Think GoToMyPC™, but for files.) If a file could be shared from its original location – whether file server or desktop – this would work for all types of files. This model is built for the corporate environment, where inter-desktop communication is far easier than it is for two consumers sharing files from their laptops over the Internet. It solves the problems described above: it creates no duplicates, it avoids the bandwidth required to copy the data around, and it eliminates the unproductive management of a central copy on a file server or cloud server. Because sharing happens from the source location, there is only one file, which resolves the file locking problem. And finally, all of the data remains behind the firewall without requiring an additional file server on site.

StorageSwiss Take

Enterprise IT departments needed file sharing functionality to stop users from adopting consumer-grade file sync and share services without the consent or control of IT. It appears, however, that in the rush to meet this need, the companies designing such products failed to account for the differences between an enterprise and consumers sharing data across the Internet. Taking those differences into account allows for a completely different design that avoids the same problems. Sharing files from their original location avoids creating duplicates and saves space, while also reducing cost and complexity. It also increases security by reducing the number of places a file can be accessed.

Sponsored by Qnext

Qnext Corp. is a global developer of disruptive apps and private cloud technologies committed to simplifying and protecting your digital life through innovation, imagination and state-of-the-art software.

Their solution, FileFlex, was created in response to users' need for accessible data, but it goes beyond the traditional enterprise file sync and share solution. It virtualizes file access to all of the company's disparate storage infrastructure and devices. This enables any server, notebook, desktop, SAN, NAS, or public, private, or virtual private cloud to be available anytime, anywhere through a secure, private network and a single dashboard. The file access virtualization technology behind FileFlex essentially takes the company-owned infrastructure and turns it, in its entirety, into a private cloud.

W. Curtis Preston (aka Mr. Backup) is an expert in backup and recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. His mission is to arm today's IT managers with truly unbiased information about the storage industry and its products.
