Checksums and Verification
Part 1: The 5 W's
This is a multi-part blog. Links to parts 2 and 3 can be found at the bottom of this post.
What is a checksum?
A checksum is a string of numbers and letters, computed from a file's contents, that identifies a particular file regardless of its type, e.g. .mov, .ari, .dng, .png, .txt. Think of it like a fingerprint; no two fingers have the same fingerprint (nope, not even identical twins!) and, for all practical purposes, no two different files have the same checksum; different files = different checksums. (Strictly speaking two different files can share a checksum, but with a good algorithm the odds of that happening by accident are vanishingly small.) This allows people (obviously with the help of computer applications) to compare checksum values and confirm that an exact, bit-for-bit copy of a file was made when duplicating or offloading files.
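To make the fingerprint idea concrete, here is a minimal sketch in Python using the standard hashlib module. The function names file_checksum and is_exact_copy are made up for this example, and SHA-256 is just one of several algorithms you could pick; this is not how any particular offload application does it.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the checksum of a file as a hex string.

    The file is read in chunks so that large video files
    do not have to fit in memory all at once.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_exact_copy(source, copy, algorithm="sha256"):
    """Return True when both files produce the same checksum."""
    return file_checksum(source, algorithm) == file_checksum(copy, algorithm)
```

Calling is_exact_copy("A001C001.mov", "/Volumes/Backup/A001C001.mov") (both paths are placeholders) would return True only if the two files are byte-for-byte identical, accidental collisions aside.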
Why are checksums important in the media and entertainment industry?
Video files are made up of bytes. When files, especially large video files, are moved around, those bytes can be corrupted, flipped or lost completely. This can cause lags or glitches in the video during playback. Simple applications like Finder or Explorer do not use checksums. Instead, they "look" at the files or folders being moved or copied and say, "ok, looks like 10 GB. Yeah, I've got room here for that amount" and then they dump the files. The problem occurs when only 9.8 GB is moved and 0.2 GB is lost, OR the full 10 GB is moved but the data at 3.2 and 3.3 is switched around so that 3.3 comes before 3.2... I think you can see the problem here! Finder and Explorer have no checksum safeguards in place to catch this.
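The "same size, wrong order" problem is easy to show in a few lines of Python. This is a toy illustration only: ten bytes stand in for a 10 GB file, and SHA-256 is an arbitrary choice of algorithm.

```python
import hashlib

# Ten bytes standing in for a much larger video file: 0, 1, 2, ... 9
original = bytes(range(10))

# Simulate a bad copy where two neighbouring bytes trade places
damaged = bytearray(original)
damaged[3], damaged[4] = damaged[4], damaged[3]
damaged = bytes(damaged)

# A size check cannot tell the two apart...
same_size = len(original) == len(damaged)

# ...but a checksum comparison can
same_checksum = (
    hashlib.sha256(original).hexdigest() == hashlib.sha256(damaged).hexdigest()
)

print("same size:", same_size)          # same size: True
print("same checksum:", same_checksum)  # same checksum: False
```

The copy passes a size check but fails the checksum comparison, which is exactly the kind of damage a size-only copy would miss.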
When and where are checksums appropriate to use?
Checksums can be used when transferring any file type. Many real estate, medical and law offices use checksums when moving files from one place to another. You can understand why losing files or folders would be disastrous for these industries. In the media and entertainment world, losing files equals losing shots. Sometimes you cannot replicate those shots, or they would be too costly to replicate. Either way, the short answer to the question is ALWAYS. Any time files or folders are being moved, checksums should be used. Sorry (not sorry) for the Harry Potter reference, I couldn't resist!
Who should use checksums?
Honestly, if you are serious about data integrity, you should be using checksums. Not only that, but make sure you have a report with all the checksum values to cover your own butt. Many studios and insurance companies now require offload applications like ShotPut Pro that use checksum algorithms. Without a report, you can't prove that checksums were used.
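For a rough idea of what a checksum report contains, here is a small Python sketch that walks a folder and writes one line per file. The function names and the report layout are invented for this example; this is not ShotPut Pro's report format or any studio's required format.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_report(folder, report_path):
    """Write '<checksum>  <relative path>' for every file under folder.

    Returns the number of files listed. Keep report_path outside
    the folder so the report does not end up listing itself later.
    """
    folder = Path(folder)
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    created = datetime.now(timezone.utc).isoformat()
    lines = [f"# SHA-256 report for {folder} created {created}"]
    for path in files:
        relative = path.relative_to(folder).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}")
    Path(report_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(files)
```

Running the same function on the source card and on the backup, then comparing the two reports line by line, is one simple way to show that every file arrived intact.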
That's all for this post! I hope it's cleared up what checksums are and why they are important to use. We will be delving more deeply into the world of checksums in the near future. If you have a specific question you would like answered please post a comment here, tweet or Facebook us!
Checksums and Verification Part 2: Define and Decide
Checksums and Verification Part 3: Speed vs Security