This week at Microsoft Ignite, the Azure team announced new features for its object storage platform that dramatically increase the value of storing data in the Azure cloud. These features are just now beginning to appear in products, the first of which focuses on archives.
Very few people seem to know what an archive actually is. Many vendors use the term archive when what they are really delivering is long-term retention of backups. One of the biggest differences between a backup and an archive is how searchable the content is. Backups are typically only searchable by file name, a single date, and a single server. “Show me filex in folderx on serverx from datex.”
A basic archive should allow you to search the metadata about each archived object. You should be able to easily retrieve all files that were generated by the same person, that contain a certain string in their name, or that were created within a date range. “Show me all files created by userX during the last n days that contain stringx in their name.” Every part of the metadata should be searchable.
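To make the example query above concrete, here is a minimal sketch of a metadata search in Python. It is not taken from any archive product: the `ArchivedObject` record, the `search_metadata` function, and the sample data are all hypothetical, and a real archive would run this kind of query against an index rather than scanning a list.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivedObject:
    """Hypothetical metadata record for one archived file."""
    name: str
    author: str
    created: date


def search_metadata(objects, author=None, name_contains=None,
                    created_within_days=None, today=None):
    """Return objects matching every filter that was supplied."""
    today = today or date.today()
    results = []
    for obj in objects:
        if author is not None and obj.author != author:
            continue
        if name_contains is not None and name_contains.lower() not in obj.name.lower():
            continue
        if created_within_days is not None:
            cutoff = today - timedelta(days=created_within_days)
            if obj.created < cutoff:
                continue
        results.append(obj)
    return results


# Made-up sample data for illustration only.
ARCHIVE = [
    ArchivedObject("q3-proposal.docx", "userX", date(2017, 9, 20)),
    ArchivedObject("q3-budget.xlsx", "userX", date(2017, 9, 22)),
    ArchivedObject("old-proposal.docx", "userX", date(2016, 1, 5)),
    ArchivedObject("partner-proposal.pdf", "userY", date(2017, 9, 21)),
]

# "Show me all files created by userX during the last 30 days
#  that contain 'proposal' in their name."
hits = search_metadata(ARCHIVE, author="userX", name_contains="proposal",
                       created_within_days=30, today=date(2017, 9, 29))
print([h.name for h in hits])
```

Each filter is optional, so the same function answers "everything by userX" or "everything from the last week" without change, which is the point of making every metadata field searchable.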
A good archive also allows you to search the content of the objects stored within it. There’s a huge difference between being able to find all files or emails that have “proposal” in their name (or subject) and finding files or emails that have “proposal” anywhere in the document. The latter is much more powerful than the former, especially when satisfying an electronic discovery request.
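The difference between the two kinds of search can be sketched in a few lines of Python. This is an illustration only, assuming a tiny in-memory set of made-up documents; the function names are hypothetical, and real archive products use far more capable full-text engines than this simple inverted index.

```python
import re
from collections import defaultdict

# Made-up documents: file name -> body text.
DOCUMENTS = {
    "proposal-draft.txt": "Pricing terms for the new contract.",
    "meeting-notes.txt": "We reviewed the proposal and agreed on next steps.",
    "lunch-menu.txt": "Soup and sandwiches on Friday.",
}


def search_names(documents, term):
    """Name-only search: matches the term against file names."""
    term = term.lower()
    return sorted(name for name in documents if term in name.lower())


def build_index(documents):
    """Build an inverted index mapping each word to the files containing it.

    Words from both the file name and the body are indexed, so a hit
    means the term appears anywhere in the document.
    """
    index = defaultdict(set)
    for name, body in documents.items():
        for word in re.findall(r"[a-z0-9]+", (name + " " + body).lower()):
            index[word].add(name)
    return index


def search_content(index, term):
    """Full-text search: looks the term up in the inverted index."""
    return sorted(index.get(term.lower(), set()))


index = build_index(DOCUMENTS)
print(search_names(DOCUMENTS, "proposal"))   # name matches only
print(search_content(index, "proposal"))     # name or body matches
```

The name-only search misses the meeting notes, which mention the proposal only in the body. In a discovery request, that missed document is exactly the kind of thing that matters.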
While many archive products are able to do full text search, they usually cannot search the contents of audio and video files. If they can, they usually understand only a single language. Microsoft is trying to change that with a new capability built into its platform that transcribes audio and video, including the ability to translate the text into or out of over 30 languages.
The functionality is quite impressive, but it comes with no user interface. Microsoft simply lets other vendors hook into its functionality via APIs, and that’s what Archive360 has done with its Archive2Azure product. Customers who use Archive2Azure to archive data to Azure will be able to leverage this functionality to search the full text of the transcribed content.
Storage Switzerland discussed Archive360’s FastCollect (previously called Archive2Anywhere) in a previous briefing note. The product is able to pull data out of various archives and store it in the cloud, placing it into a unified archive with federated search. But now customers storing their archives in Azure can perform very complex queries against the content of those archives.
One excellent example was when the Archive360 representative searched for a particular phrase in a video. Not only did the search find videos with that phrase, the interface took us right to the point in the video where that phrase was uttered. The product can produce a transcription of any audio or video segment archived to it. It can also translate transcriptions in real time in up to 53 languages.
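Jumping to the moment a phrase is spoken depends on the transcript carrying timestamps. The Python sketch below shows one way that lookup could work, assuming a transcript stored as a list of (start time in seconds, word) pairs. That format, the function names, and the sample transcript are assumptions made for illustration; this is not the Azure or Archive2Azure API.

```python
def _norm(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.lower().strip(".,!?;:\"'")


def find_phrase(transcript, phrase):
    """Return the start time of every place the phrase is spoken.

    transcript is a list of (start_seconds, word) pairs in spoken order.
    """
    target = [_norm(w) for w in phrase.split()]
    spoken = [_norm(w) for _, w in transcript]
    hits = []
    if not target:
        return hits
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            hits.append(transcript[i][0])
    return hits


def format_offset(seconds):
    """Render a time offset as MM:SS for a player's seek bar."""
    whole = int(seconds)
    return f"{whole // 60:02d}:{whole % 60:02d}"


# Made-up word-level transcript of a short video.
TRANSCRIPT = [
    (0.0, "Welcome"), (0.6, "everyone."), (1.4, "The"), (1.6, "quarterly"),
    (2.2, "forecast"), (2.9, "looks"), (3.3, "strong."), (4.1, "We"),
    (4.3, "will"), (4.6, "revisit"), (5.2, "the"), (5.4, "quarterly"),
    (6.0, "forecast"), (6.7, "soon."),
]

offsets = find_phrase(TRANSCRIPT, "quarterly forecast")
print([format_offset(t) for t in offsets])
```

A player would seek to each returned offset, which is how a search result can land on the exact point in the video rather than at its beginning.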
StorageSwiss Take
The possibilities this opens up are hard to fathom. Most companies already have a wealth of video and audio content. They also often have analytics products that can analyze trends in large sets of data. Imagine the possibilities of marrying these two technologies. In addition, companies required to keep video and audio for regulatory purposes can now search that content more easily in the event of an electronic discovery request.