Wednesday, April 9, 2014

Online File Sharing: A Technical Writer’s Perspective on Hash Checking and Encryption

By Anuradha Satish

As technical communicators, many of us use the Internet to share our work files. SharePoint, Dropbox and Google Docs are among the most common platforms we use to share work.

Recently I came upon a blog article* about Dropbox blocking the sharing of a document. Dropbox cited copyright infringement under the DMCA without even looking into the contents of the file. The article described how Dropbox judges the legitimacy of shared files using hash algorithms. When a document’s hash code matches that of one of Dropbox’s blacklisted documents, Dropbox can prevent the file from being shared without having to know what it actually contains.

The good news here is that Dropbox is not snooping into our shared files! However, an incident like this makes me wonder how accurate the hash checker is and how safe our documents are when shared online. Let’s take a closer look at what a hash code is and how it operates.

A hash code can be thought of as a “fingerprint” – it is a short alphanumeric code computed from a document’s contents and, for practical purposes, unique to those contents. Here’s a simplified example to show how it works:

  • Document A containing 1,2,3,4 could have a hash code assigned as ar59i3nd
  • Document B containing 2,1,4,3 could have a hash code assigned as b3nj98he

Each document’s hash code serves as a unique identifier. Storage services, such as Dropbox, use this hash code to identify the correct document. If the document is altered in any way, the hash code changes. Two different versions of the same document will, for all practical purposes, always have two different hash codes. But if a document matches another document exactly, character for character, it will have the same hash code.
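
To make the fingerprint idea concrete, here is a minimal Python sketch. It assumes SHA-256 as the hash algorithm, which is only an illustrative choice; the helper name fingerprint is made up for this example, and real hash codes are much longer than the eight-character codes shown above.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hash of the content as a hexadecimal string."""
    return hashlib.sha256(content).hexdigest()


# The two sample documents from the bullets above, plus an exact copy of A.
doc_a = b"1,2,3,4"
doc_b = b"2,1,4,3"
doc_a_copy = b"1,2,3,4"

print(fingerprint(doc_a))
print(fingerprint(doc_b))

# Identical contents give identical hash codes.
print(fingerprint(doc_a) == fingerprint(doc_a_copy))  # True

# Different contents give different hash codes.
print(fingerprint(doc_a) == fingerprint(doc_b))  # False
```

Even a one-character change to a document produces a completely different hash code, which is why the code works as a fingerprint of the exact contents.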



Storage services use hash checkers to validate data. In the case of the article that prompted this blog entry, a hash checker allowed Dropbox to prevent the sharing of a file because its hash code matched that of a blacklisted document.
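
A check of this kind might look roughly like the Python sketch below. This is not Dropbox’s actual implementation; the blocklist contents, the SHA-256 algorithm, and the names BLOCKED_HASHES and may_share are all assumptions made for illustration.

```python
import hashlib

# Hypothetical blocklist: hash codes of files that may not be shared.
# The service stores only the hash codes, not the files themselves.
BLOCKED_HASHES = {
    hashlib.sha256(b"contents of a flagged file").hexdigest(),
}


def may_share(content: bytes, blocked: set = BLOCKED_HASHES) -> bool:
    """Return False when the content's hash code is on the blocklist."""
    digest = hashlib.sha256(content).hexdigest()
    return digest not in blocked


# An exact copy of the flagged file is blocked.
print(may_share(b"contents of a flagged file"))  # False

# A copy with one extra character has a different hash code and passes.
print(may_share(b"contents of a flagged file."))  # True
```

The comparison only ever looks at hash codes, so the service never has to open or read the file to decide whether to block it.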

What are the direct implications of this to technical communication?

First, the technical communication industry itself is moving towards online and remote work. Many technical communicators, including writers, illustrators, trainers and editors, have worked or will work for remote clients and share files online. This leaves our work prone to unauthorized copying and plagiarism. It is important to be aware of this potential risk when sharing files.

Second, file encryption is becoming extremely important as more data is transmitted online. Cloud storage service providers can provide a basic level of encryption to ensure data security, but a provider that manages the encryption also has the ability to access that content at any time. Instead of leaving it up to the online storage providers, we can take control of the encryption process by encrypting documents ourselves before sharing them online. Many free encryption tools are available for this.

Third, be aware and alert once your document is shared online. If you think you have shared potentially confidential information, run a search on popular search engines. If you come across web pages that have copied content from your work, or that contain content similar enough to make you suspect plagiarism, you can file a DMCA** complaint with the search engine. Under the DMCA’s notice-and-takedown process, the search engine is expected to remove the infringing pages from its results. It is a cumbersome manual check, but it lets you catch slightly different versions of your document, which is something a hash checker will miss.
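
The short Python sketch below shows why a hash checker misses slightly different versions. The two sample sentences are invented for illustration, SHA-256 is an assumed hash algorithm, and the similarity score comes from Python’s standard difflib module, which is just one simple way to compare texts.

```python
import hashlib
from difflib import SequenceMatcher

# Two invented sentences that differ by a single word.
original = "Click Save to store your changes before closing the dialog."
altered = "Click Save to store your edits before closing the dialog."

# A hash checker only asks whether the two hash codes are identical.
same_hash = (
    hashlib.sha256(original.encode("utf-8")).hexdigest()
    == hashlib.sha256(altered.encode("utf-8")).hexdigest()
)
print(same_hash)  # False

# A similarity score between 0.0 (nothing shared) and 1.0 (identical).
similarity = SequenceMatcher(None, original, altered).ratio()
print(similarity > 0.8)  # True
```

The hash codes do not match even though the sentences are nearly identical, so catching a lightly reworded copy takes a text comparison or a manual read rather than a hash check.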

** Be aware of our laws:

  • The Digital Millennium Copyright Act (DMCA) is a United States law against copyright infringement that implements two 1996 treaties of the World Intellectual Property Organization (WIPO). It criminalizes production and dissemination of technology, devices, or services intended to circumvent measures (commonly known as digital rights management or DRM) that control access to copyrighted works. It also criminalizes the act of circumventing an access control, whether or not there is actual infringement of copyright itself. In addition, the DMCA heightens the penalties for copyright infringement on the Internet.
  • In Canada, currently it is legal to download any copyrighted file as long as it is for noncommercial use, but it is illegal to distribute the copyrighted files (e.g. by uploading them to a P2P network). Canadian lawmakers are proposing Bill C-61, an Act to amend the Copyright Act – a controversial Bill that is similar to the American DMCA.
* Original Blog Article
**Source: http://en.wikipedia.org