Computer Investigations

What is a Hash Value?

A hash value is the result of a calculation (a hash algorithm) performed on a string of text, an electronic file, or an entire hard drive’s contents. The result is also referred to as a checksum, a hash code, or simply a hash. Hash values are used to identify and filter duplicate files (e.g., emails, attachments, and loose files) from an ESI collection, or to verify that a forensic image or clone was captured successfully.
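Hash-based deduplication can be sketched in a few lines. The Python below groups files by their SHA-256 digest and treats every file after the first in each group as a duplicate. SHA-256, the chunk size, and the function names are illustrative assumptions. This is not the behavior of any particular EED product.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Group paths by content hash.

    Returns {digest: [first_seen, duplicate, ...]} only for digests seen more than once.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(Path(p))
    return {d: ps for d, ps in groups.items() if len(ps) > 1}
```

For example, `find_duplicates(p for p in Path("custodian_data").rglob("*") if p.is_file())` reports identical files regardless of their names or folders. The directory name `custodian_data` is hypothetical.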

Each hashing algorithm stores its “thumbprint” of the contents in a fixed number of bytes. The following is a list of hash values for the same text file. Regardless of the amount of data fed into a specific hash algorithm or checksum, it will return the same number of characters. For example, an MD5 hash is always 32 hexadecimal characters (128 bits), whether it is calculated for a single character in a text file or for an entire hard drive.

HASH

MD5: 464668D58274A7840E264E8739884247

SHA-1: 4698215F643BECFF6C6F3D2BF447ACE0C067149E

SHA-256: F2ADD4D612E23C9B18B0166BBDE1DB839BFB8A376ED01E32FADB03A0D1B720C7

SHA-384: 2707F06FE57800134129D8E10BBE08E2FEB622B76537A7C4295802FBB94755BBEE814B101ED18CC2D0126BD66E5D77B6

SHA-512: C526BC709E2C771F9EC039C25965C91EAA3451A8CB43651EA4CD813F338235F495D37891DD25FE456FE2A8CA89457629378BE63FB3A9A5AD54D9E11E4272D60C

RIPEMD-128: A868B98EAEC84891A7B7BA620EDDE621

TIGER: F31A22CEED5848E69316649D4BAFBE8F9274DED53E25C02D

PANAMA: 7E703B1798A26A0AF21ECD661CBADB9C72B419455814CA7B82E29EE0C03FA493

CHECKSUM

CRC16: 117C

CRC32: FA2D47D4

ADLER32: CF7D65FF

As you can see, hash lengths also vary within a family (SHA-256, SHA-384, SHA-512, etc.). The most common hash values are MD5, SHA-1, and SHA-256. Longer hash values are designed to reduce the probability of a collision (two different inputs producing the same hash value), and they can take more time to calculate.
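The fixed output length can be confirmed with Python’s standard `hashlib` and `zlib` modules. The sketch below hashes a 1-byte input and a 10 MB input and compares the length of each hexadecimal result. RIPEMD-128, Tiger, PANAMA, and CRC16 are left out because the standard library does not provide them.

```python
import hashlib
import zlib

def hex_lengths(data: bytes) -> dict:
    """Return the length, in hex characters, of several digests/checksums of `data`."""
    return {
        "MD5": len(hashlib.md5(data).hexdigest()),
        "SHA-1": len(hashlib.sha1(data).hexdigest()),
        "SHA-256": len(hashlib.sha256(data).hexdigest()),
        "SHA-384": len(hashlib.sha384(data).hexdigest()),
        "SHA-512": len(hashlib.sha512(data).hexdigest()),
        # zlib returns CRC-32 and Adler-32 as unsigned ints; format as 8 hex digits.
        "CRC32": len(format(zlib.crc32(data), "08X")),
        "ADLER32": len(format(zlib.adler32(data), "08X")),
    }

small = hex_lengths(b"x")
large = hex_lengths(b"x" * 10_000_000)
print(small == large)  # True
print(small)
# {'MD5': 32, 'SHA-1': 40, 'SHA-256': 64, 'SHA-384': 96, 'SHA-512': 128, 'CRC32': 8, 'ADLER32': 8}
```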


A few other ways that hash values are used:

-   Verify that a downloaded file matches the publisher’s original (as opposed to a virus-infected version)

-   Identify and filter known files on the NIST National Software Reference Library (NSRL) list (“deNISTing”)

-   Locate known contraband (illegal images and videos)
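For the first use, a publisher typically posts the expected hash next to the download. Below is a minimal sketch of the check. It assumes the publisher provides a hex digest for an algorithm that `hashlib` supports (SHA-256 by default).

```python
import hashlib

def matches_published_hash(path, published_hex, algorithm="sha256"):
    """Return True if the file's digest equals the publisher's hex string (case-insensitive)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        while chunk := f.read(1024 * 1024):
            h.update(chunk)
    return h.hexdigest() == published_hex.strip().lower()
```

The same membership idea, checking a computed digest against a list of known values, underlies deNISTing and contraband matching. Those simply compare against a large set of hashes instead of a single published one.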

Here are a few reasons why hash values are so widely used as a means to validate and compare content:

1)  Privileged Data – There would be obvious issues with storing and providing multiple copies of a company’s files, or of entire hard drives, in a database just to perform a byte-by-byte comparison. Not to mention that illegal images and videos (child pornography) would have to be stored and used in each system scan. These scenarios are unacceptable.

2)  Speed – Comparing indexed hash values is much quicker than comparing what could be billions or trillions of bytes of source data. Optimized hash engines (such as Pinpoint Harvester) can compare thousands of hash values in a second.

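3)  Security – Hashing data is a one-way trip. The original data can’t be recreated or reverse engineered from the hash value, which makes it difficult for a person to determine the source data from the hash. Very short or predictable inputs, such as a single phone number, are the exception: they can be found by hashing every likely value and comparing the results.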

The possibility that two different data sources could produce the same hash value (a collision) has raised a lot of concern. There are countless threads about this issue on litigation support and computer forensic forums. The bottom line is that the only way to do an exact comparison of the original data is to store it everywhere you need to deduplicate or verify the information; however, as mentioned above, this isn’t a practical alternative.

More complex hash functions (SHA-256, SHA-512, etc.) have been introduced that further reduce the likelihood of a collision. It is also worth noting that even in the cases where researchers have created collisions, they did so by exploiting weaknesses in a specific hash algorithm. The same alterations would not create a collision in a different hashing algorithm.
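“Remote” can be put in numbers for accidental (non-adversarial) collisions. The standard birthday-bound approximation gives the chance that any two of n distinct inputs share a b-bit hash as roughly p ≈ 1 − exp(−n(n−1) / 2^(b+1)). The approximation assumes the hash behaves like a uniformly random function. It says nothing about deliberately engineered collisions, such as those published for MD5. The figure of one trillion files below is only an illustration.

```python
import math

def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound approximation of the chance that any two of n_items distinct
    inputs share a hash value, assuming a uniformly random b-bit hash."""
    pairs = n_items * (n_items - 1) / 2
    # -expm1(-x) computes 1 - e^(-x) accurately even when x is tiny.
    return -math.expm1(-pairs / 2 ** bits)

# One trillion distinct files:
print(collision_probability(10**12, 128))  # about 1.5e-15 (MD5-sized digest)
print(collision_probability(10**12, 256))  # about 4.3e-54 (SHA-256)
```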

So, if you still aren’t satisfied with the incredibly remote possibility that a collision could happen with a single hash value, the easiest extra precaution is to have your processes calculate hash values with two separate algorithms (e.g., MD5 and SHA-256) for each item. Unfortunately, most EED applications and forensic imaging tools don’t support this option, especially in a single pass.
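Computing two hashes does not require reading the data twice. Each chunk can be fed to both algorithms during the same pass, as in this minimal Python sketch. The MD5/SHA-256 pairing and the function name are illustrative.

```python
import hashlib

def dual_hash(path, chunk_size=1024 * 1024):
    """Read a file once, feeding each chunk to both MD5 and SHA-256."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Usage would look like `md5_hex, sha256_hex = dual_hash("evidence.dd")`, where the file name is hypothetical. Since disk reads usually dominate the run time, the second algorithm adds relatively little cost.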

What to Remember

Hash values are a reliable, fast, and secure way to compare the contents of individual files and media. Whether it’s a single text file containing a phone number or five terabytes of data on a server, calculating hash values is an invaluable process for deduplication and evidence verification in electronic discovery and computer forensics.

ESI (Electronically Stored Information) Software Challenges

A couple of weeks ago, I outlined what computer forensics and electronic discovery have in common and how they differ. I’d like to expand on this topic by identifying some common obstacles encountered when using popular computer forensic software for typical electronic discovery projects.

A typical computer forensic case may involve:

  1. A small quantity of email and/or attachments
  2. Recovered files, internet history, and user activity
  3. Registry entries
  4. Pre-fetch files
  5. Portions of unallocated space

A typical electronic discovery project may involve:

  1. Processing dozens or hundreds of custodian mailstores, resulting in thousands of potentially relevant emails and/or attachments
  2. Indexing hundreds of gigabytes or multiple terabytes of data
  3. Hosting data online so multiple parties can easily review, identify, and produce files
  4. Converting relevant files to TIFF, endorsing them, and building load files compatible with common litigation support applications
  5. Deduping emails, attachments, and files across dozens of custodians

Generally speaking, the primary obstacles encountered when using off-the-shelf computer forensic software for electronic discovery are:

  1. Inability to create load files from tagged emails, attachments, and other relevant data
  2. No support for tiffing, endorsing, and assigning docIDs
  3. Missing/incomplete links between email and attachments
  4. No clear way to produce carved or partial files recovered from unallocated space
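Two of the gaps above are assigning docIDs and keeping emails linked to their attachments. A small sketch can illustrate both. The BegAttach/EndAttach field names follow a common load-file convention, but the `ABC` prefix, the padding width, and the `Family` structure are hypothetical. A real production also needs the exact delimiters and fields required by the target review platform.

```python
from dataclasses import dataclass

@dataclass
class Family:
    """A parent email and its attachments, in production order (hypothetical structure)."""
    email: str
    attachments: list

def assign_docids(families, prefix="ABC", start=1, width=7):
    """Assign sequential DocIDs and BegAttach/EndAttach family ranges.

    Returns one row per document: (DocID, BegAttach, EndAttach, source_name).
    """
    rows = []
    n = start
    for fam in families:
        members = [fam.email] + list(fam.attachments)
        ids = [f"{prefix}{n + i:0{width}d}" for i in range(len(members))]
        for docid, name in zip(ids, members):
            # Every member of a family carries the same range, which is what lets
            # a review tool re-link an email to its attachments.
            rows.append((docid, ids[0], ids[-1], name))
        n += len(members)
    return rows

rows = assign_docids([
    Family("msg1.eml", ["contract.docx", "price.xlsx"]),
    Family("msg2.eml", []),
])
for r in rows:
    print(r)
# ('ABC0000001', 'ABC0000001', 'ABC0000003', 'msg1.eml')
# ('ABC0000002', 'ABC0000001', 'ABC0000003', 'contract.docx')
# ('ABC0000003', 'ABC0000001', 'ABC0000003', 'price.xlsx')
# ('ABC0000004', 'ABC0000004', 'ABC0000004', 'msg2.eml')
```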

If you anticipate reviewing a large ESI collection using one of the common litigation support review tools, make sure that your service provider can process the data and produce compatible output files for production sets. Don’t assume that all computer forensic examiners are equipped to handle large-scale ESI projects. On the other hand, not all EED service providers have the appropriate tools to complete a thorough computer investigation.

What is a Forensic Image?

‘Imaging a hard drive’ is a phrase commonly used to describe preserving the contents of a custodian’s hard drive or server. It is also sometimes used to describe cloning a custodian’s hard drive. It is worth taking some time to understand the differences between the two processes and the advantages and disadvantages of each.

Forensic Imaging
A forensic image, or evidence file container (in formats such as EnCase, DD, Expert Witness, and SMART), is often created using software running on a computer forensic examiner’s laptop or lab computer. The examiner connects the source drive to a write blocker and uses software to create a forensic image of its entire contents on a separate target hard drive. Multiple forensic images can also be stored on a single target hard drive.

Hard Drive Cloning
Cloning a hard drive during collection uses a target drive to make an exact duplicate (bit stream copy) of the original hard drive. This process is normally completed using dedicated hard drive cloning equipment.
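Clones and images are both verified by comparing hash values. Below is a minimal sketch that hashes the source and the copy and compares the results. Raw (DD) images are often split into numbered segments, and those segments are hashed in order as one continuous stream. Container formats such as EnCase/Expert Witness store their own embedded hashes and need a tool that understands the format, so this sketch does not cover them. The file names are hypothetical, and reading a physical drive directly would require a device path and administrator rights.

```python
import hashlib

def hash_segments(paths, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a sequence of files as if they were one continuous stream.

    Split raw images (image.001, image.002, ...) must be read in order, because the
    acquisition hash covers the concatenated data, not each segment separately.
    """
    h = hashlib.new(algorithm)
    for p in paths:
        with open(p, "rb") as f:
            while chunk := f.read(chunk_size):
                h.update(chunk)
    return h.hexdigest()

def verify_copy(source_paths, copy_paths, algorithm="md5"):
    """Return True if the source and the copy (clone or image segments) hash identically."""
    return hash_segments(source_paths, algorithm) == hash_segments(copy_paths, algorithm)
```

With zero-padded segment extensions, `sorted(Path("case").glob("image.*"))` (hypothetical path) yields the segments in the right order. Unpadded numbering would need a numeric sort instead.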

A primary difference between imaging and cloning is that the files in a forensic image can’t be accessed directly by common electronic discovery software (such as LAW PreDiscovery, Discovery Cracker, and IPRO) or litigation support databases (such as Concordance, Summation, and Ringtail), whereas a clone holds the files in their native format.

Forensic images are designed to be accessed by computer forensic software (such as EnCase, FTK, WinHex, and ProDiscover). If you need to access the original custodian information in a forensic image without computer forensic software, you will need to have it restored to a hard drive in its original native format. You could also consider purchasing Mount Image Pro (http://www.mountimage.com/purchase-forensic-software.php), which allows you to view the contents of a forensic image without converting or restoring it to the native format.

Cost and Redundancy Considerations
If you want to compare the cost of different computer examiners, keep in mind that the lowest hourly rate doesn’t mean the lowest total price. An examiner using hardware-based cloning equipment can usually complete the process faster than one using software to create a forensic image.

If you rely on a single forensic image or hard drive clone and find out later that there was a problem, you probably won’t have a second chance to preserve and collect the information. It’s well worth the additional cost to create a second copy of the source hard drive. When comparing examiner rates, compare both the hourly and per-drive costs to determine the total price. Also consider what you will be charged to restore a forensic image to a new drive, because this may have to be done before the custodian files can be processed.
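The hourly-rate point is simple arithmetic. In this minimal sketch, all rates, hours, and fees are hypothetical and chosen only for illustration. The examiner with the lower hourly rate costs more in total once slower software imaging and a per-drive restore fee are included.

```python
def total_cost(hourly_rate, hours, per_drive_fee, drives, restore_fee=0):
    """Total price: labor, plus per-drive charges, plus any per-drive restore fee."""
    return hourly_rate * hours + (per_drive_fee + restore_fee) * drives

# Hypothetical quotes for a 10-drive collection:
a = total_cost(hourly_rate=200, hours=30, per_drive_fee=150, drives=10, restore_fee=100)
b = total_cost(hourly_rate=275, hours=12, per_drive_fee=150, drives=10)
print(a, b)  # 8500 4800
```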

Secretly Copying Files To An External USB Drive

Copying corporate data and using it at a competing company (intellectual property/corporate asset theft) is a common and serious concern for companies and their legal counsel. When employees leave companies, there are often questions about the security of the information they previously accessed. Will they use the contacts, forms, or product details as a competitive advantage in their new job?

I previously wrote about how to use the file activity records in the index.dat file to identify when files were accessed. This can help determine whether files were copied from a corporate file server. I want to expand on a couple of additional artifacts that can be used and then provide an illustration. There are three primary artifacts that can help determine whether someone accessed and copied specific files using an external drive, CD/DVD, flash device, or other storage media. A fourth technique checks whether those files turn up elsewhere.

1) USBStor Registry Entry – Microsoft Windows uses its registry to track information about the computer’s users, operating system, hardware, applications, security, and other relevant settings. When a USB storage device is plugged into a computer, several key artifacts are recorded, including the make, model, serial number (if available), and when the device was plugged in.

2) Index.dat Access Record – Microsoft Windows uses index.dat files to track website activity in Internet Explorer. They also record when files were accessed and from where. We often have to recover deleted or purged activity using programs like NetAnalysis to complete a thorough analysis. NetAnalysis can often recover hundreds of thousands of records that are no longer available in the index.dat files on the system.

3) Link File (.lnk shortcut) – Shortcuts can be created by a user and are commonly stored on the desktop. Microsoft Windows also automatically creates .lnk shortcut files for files that are accessed. These files store a wealth of information about the target document. This includes its path, its created, last written, and last accessed dates and times, its size, the volume serial number, and several other details. The information is stored in a binary format and requires special software to display it in a useful form.

4) “File Sniper” – Use a product like Harvester from Pinpoint Labs to create a hash list of the suspect files and scan all locations where the files could be in use. It isn’t uncommon for a computer forensic examiner to be asked whether there is a way to create a list of files from a corporate network or an employee’s system and check whether they are in use by a competitor.
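The hash-list approach in item 4 can be sketched generically with Python’s standard library. This is not how Harvester works internally; it only illustrates the technique. The hash covers file contents rather than names, so a renamed copy still matches. A copy that was edited, even slightly, produces a different hash and will not be found. The function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os

def sha256_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def build_hash_list(suspect_paths):
    """Map SHA-256 digest -> original path for the files believed to have been taken."""
    return {sha256_file(p): p for p in suspect_paths}

def scan(root, hash_list):
    """Walk `root` and yield (found_path, original_path) for every content match."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                digest = sha256_file(path)
            except OSError:
                continue  # unreadable or locked file; a real tool would log this
            if digest in hash_list:
                yield path, hash_list[digest]
```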

By using the above artifacts, it is possible to determine whether files located on a company server or client machine were copied or accessed after a specific date and time. Note that this doesn’t provide the contents of the file, and a thorough review would be necessary to make sure it is the same file. However, if the file name and other relevant metadata match, it does appear suspicious. It may be enough to construct a solid argument that the employee copied or burned files, accessed the contents, or used the information. This may lead to criminal charges or civil claims, particularly where the information could benefit a future employer or a new company that the employee decided to start.
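For the link file artifact (item 3), the target file’s created, accessed, and written times sit in a fixed 76-byte header defined in Microsoft’s [MS-SHLLINK] specification. They are stored as Windows FILETIME values (100-nanosecond intervals since January 1, 1601 UTC). The Python sketch below decodes only that header. The path and volume serial number live in a later structure (LinkInfo) that it does not parse, and it is not a substitute for a validated forensic tool.

```python
import struct
from datetime import datetime, timedelta, timezone

# ShellLinkHeader: size, CLSID, flags, attributes, 3 FILETIMEs, file size,
# icon index, show command, hotkey, 3 reserved fields (76 bytes, little-endian).
HEADER = struct.Struct("<I16sIIQQQIiIHHII")
LINK_CLSID = bytes.fromhex("0114020000000000c000000000000046")  # {00021401-...-000000000046}
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_utc(ft: int):
    """Convert a FILETIME (100-ns ticks since 1601-01-01 UTC) to a datetime; 0 means unset."""
    return None if ft == 0 else FILETIME_EPOCH + timedelta(microseconds=ft // 10)

def parse_lnk_header(data: bytes) -> dict:
    """Decode the timestamps, size, and flags from the start of a .lnk file."""
    (size, clsid, flags, attrs, created, accessed, written,
     file_size, _icon, _show, _hotkey, _r1, _r2, _r3) = HEADER.unpack_from(data, 0)
    if size != 0x4C or clsid != LINK_CLSID:
        raise ValueError("not a Windows shortcut (.lnk) header")
    return {
        "created": filetime_to_utc(created),
        "accessed": filetime_to_utc(accessed),
        "written": filetime_to_utc(written),
        "target_size": file_size,
        "attributes": attrs,
        "link_flags": flags,
    }
```

Usage would be `parse_lnk_header(open(path, "rb").read())` on a shortcut recovered from a user’s profile. The dates describe the target document, not the shortcut file itself.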

USB Artifacts Illustration (Download PDF here)
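For items 1 and 2, the sketch below shows two small Python building blocks. The first splits the two USBSTOR subkey names found under SYSTEM\CurrentControlSet\Enum\USBSTOR into fields. These are a device key such as Disk&Ven_Example&Prod_Flash_Drive&Rev_1.00 and an instance key that normally holds the serial number. The second carves file-access entries out of raw bytes, such as a damaged index.dat or unallocated space. Several assumptions apply:

- The key names have already been exported from the SYSTEM hive by a separate tool.
- Records follow the “Visited: user@file:///path” text form commonly seen in Internet Explorer history.
- The example values are hypothetical.

Neither function recovers connection times or record timestamps.

```python
import re
from urllib.parse import unquote

def parse_usbstor(device_key: str, instance_key: str) -> dict:
    """Split USBSTOR subkey names into type/vendor/product/revision and serial number."""
    parts = device_key.split("&")
    info = {"type": parts[0], "instance_id": instance_key}
    for part in parts[1:]:
        name, _, value = part.partition("_")  # "Ven_Example" -> ("Ven", "_", "Example")
        info[name.lower()] = value
    # A '&' as the second character is commonly read as a Windows-generated ID,
    # i.e. the device did not report a serial number of its own.
    generated = len(instance_key) > 1 and instance_key[1] == "&"
    info["serial"] = None if generated else instance_key.rsplit("&", 1)[0]
    return info

# Assumed text form of IE history entries for local files.
VISITED_FILE = re.compile(rb"Visited: ([^@\x00-\x1f]{1,64})@file:///([\x20-\x7e]+)")

def carve_file_access(raw: bytes):
    """Return (user, path) pairs for 'Visited:' file entries found anywhere in raw bytes."""
    return [
        (m.group(1).decode("latin-1"), unquote(m.group(2).decode("latin-1")))
        for m in VISITED_FILE.finditer(raw)
    ]
```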

Recovering Files From Unallocated Space

Recovering data from a hard drive is one of the most common tasks during a computer investigation. Here are a few of the artifacts which computer investigators may retrieve from unallocated (free) space to assist in a case:

* MS Office documents
* Acrobat files (.pdf)
* Email messages and attachments
* Images in various formats
* Internet history (pages visited, searches)
* Registry files (current and past)
* File access records (when and where files were opened)
* Pre-fetch files (when a specific program was run)
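Whole files such as images and documents are often recovered from unallocated space by “carving”: scanning raw bytes for known file signatures. The function below is a deliberately naive sketch that carves JPEG images. It searches for the start-of-image marker (FF D8 FF) and takes everything up to the next end-of-image marker (FF D9). It will truncate JPEGs that contain an embedded thumbnail, because the thumbnail has its own end marker. It also cannot reassemble fragmented files, which is part of why dedicated forensic tools are used in practice. The size cap is an arbitrary assumption.

```python
def carve_jpegs(raw: bytes, max_size=20 * 1024 * 1024):
    """Return byte strings that start at a JPEG SOI marker and end at the next EOI marker."""
    SOI, EOI = b"\xff\xd8\xff", b"\xff\xd9"
    found = []
    pos = raw.find(SOI)
    while pos != -1:
        # Only look for a footer within max_size bytes of the header.
        end = raw.find(EOI, pos + len(SOI), pos + max_size)
        if end == -1:
            pos = raw.find(SOI, pos + 1)  # no footer: try the next candidate header
            continue
        found.append(raw[pos:end + len(EOI)])
        pos = raw.find(SOI, end + len(EOI))
    return found
```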

Many cases revolve around correspondence, work product, whether or not files were stolen or manipulated, and to what lengths the suspect went to cover up his or her activities. A common misconception among attorneys and litigation support professionals is that all relevant data from a computer hard drive is recovered during electronic discovery processing. The truth is that off-the-shelf electronic discovery software doesn’t index or search deleted data that resides in unallocated space. A considerable amount of valuable information is available on computer hard drives, but it often resides in areas of the drive that were not collected or searched during a typical electronic discovery project.

I don’t believe that every project warrants a complete computer investigation. I just want to clarify that if the computers of certain individuals involved in a lawsuit require a more thorough analysis, then a forensic image or hard drive clone is required. In this case, a computer forensic investigator with the skills and appropriate software tools needs to be hired to search deleted items, which aren’t typically reviewed during the electronic discovery processing phase.

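What is unallocated space? It is the area of a hard drive that the file system has marked as available for new data; when a file is deleted, its contents typically remain there until they are overwritten. I have provided an illustration that shows the different states of a file’s physical area: before the file is deleted, after it is deleted, and the different stages of retrieval possible from unallocated space.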

UNALLOCATED SPACE ILLUSTRATION (Download PDF here)

Searching for Buried Treasure

Searching and identifying relevant content is a common process for both electronic discovery and computer forensic investigations. But some people don’t realize the challenges associated with indexing hundreds, or even thousands, of different file types and data structures. Mapping the data landscape may not immediately indicate where the textual “treasure” is located. Twenty years ago, full text searching was pretty simple. We usually had transcripts, and eventually optical character recognition (OCR), which was pretty straightforward to use (except for the less-than-perfect OCR results).

Today, electronic discovery requires our full text search engines to extract the desired text from a wide variety of file types, email formats, and even the contents of the unallocated space on a hard drive. A common process mistake is assuming that all files are searchable: you hope to locate the relevant data by simply indexing the contents of a hard drive, DVD, or CD and performing a quick search on relevant keywords. Although sometimes it is this simple, there are several common exceptions that will prevent a complete search:

  1. Encrypted and password protected files
  2. Embedded files, sometimes at multiple levels
  3. Archives (ZIP, RAR, and other compressed formats) and mail stores
  4. Deleted files (some fully recoverable, and some with only minimal artifacts)

Both computer forensics and electronic discovery applications rely on full text search engines to locate relevant evidence. However, the common exceptions need to be handled to ensure that the content is available to the full text search software. I’ve been a fan of dtSearch for many years because it handles large file collections of up to several terabytes, has extensive file type support, and great customer service. dtSearch is also integrated into several popular litigation support and computer forensic applications.
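The Python sketch below illustrates handling two of these exceptions. It searches inside ZIP archives, descends into nested archives, and reports encrypted members instead of silently skipping them. It does a naive byte match on raw member contents, so it will miss text that needs format-specific extraction (for example, compressed PDF streams or UTF-16 text). That extraction is what a full text engine like dtSearch provides. The container label and the depth limit are hypothetical.

```python
import io
import zipfile

def search_zip(data: bytes, keyword: str, container="archive.zip", depth=0, max_depth=5):
    """Search a ZIP (given as bytes) for a keyword, descending into nested ZIPs.

    Returns (hits, skipped): member paths containing the keyword, and members that
    could not be searched (encrypted, or nested deeper than max_depth).
    """
    hits, skipped = [], []
    needle = keyword.lower().encode("utf-8")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = f"{container}/{info.filename}"
            if info.flag_bits & 0x1:  # general-purpose flag bit 0: member is encrypted
                skipped.append(path)
                continue
            content = zf.read(info)
            if zipfile.is_zipfile(io.BytesIO(content)):
                if depth >= max_depth:
                    skipped.append(path)
                else:
                    h, s = search_zip(content, keyword, path, depth + 1, max_depth)
                    hits += h
                    skipped += s
            elif needle in content.lower():
                hits.append(path)
    return hits, skipped
```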

Metadata Analysis – “Fabricated” Documents

One common request we receive is to help a client determine when a document was created, whether it existed at a specific date and time, and when it was last modified. For example, an employment dispute may involve one of the following circumstances:

  1. A memo was handed to an employee during a meeting but the employee denies s/he received the document. The document is presented but it is believed to have been created after the fact. Could the document have existed at the time of the meeting?
  2. An employee produces a document that s/he claims was received from the manager, but management denies the allegations. Did the employee create the file? Can metadata provide any answers?
  3. Bob, the sales manager for Acme Widgets Inc., was working for a competitor during his employment. How long did this go on? What does the metadata of the recovered files tell us? Can it help us track down files he potentially stole from the company?

Here are a few facts that should help to clear up many similar questions:

  1. All metadata and timestamps can be altered. Don’t base your case on the ‘Date Created’ field of a Microsoft Word document alone. Free utilities can be downloaded that can alter this and other metadata fields.
  2. If metadata was altered, it may conflict with other metadata or timestamps within the file, and such discrepancies could raise a strong suspicion.
  3. Analysis of other areas of the computer that could support or refute a claim is often required. For example, in Microsoft Windows, the index.dat files contain records of when the user opened a document. Recovering and analyzing the file access activity in the index.dat files could help support or contradict claims or metadata (file access dates/times) suggesting that the file was created or revised at a specific date and time.
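The first two facts can be checked together for Word files saved in the .docx format. These store “created” and “modified” values in docProps/core.xml inside the ZIP package. The sketch below reads those two fields with the Python standard library and flags simple conflicts. Legacy .doc files store metadata differently and are not handled. The red-flag rules are illustrative assumptions, and none of them proves tampering on its own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"dcterms": "http://purl.org/dc/terms/"}

def docx_dates(data: bytes):
    """Return (created, modified) from a .docx's core properties, as aware datetimes or None."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))

    def get(tag):
        el = root.find(f"dcterms:{tag}", NS)
        if el is None or not el.text:
            return None
        # Values look like 2010-06-01T10:00:00Z; older fromisoformat() rejects "Z".
        return datetime.fromisoformat(el.text.strip().replace("Z", "+00:00"))

    return get("created"), get("modified")

def inconsistencies(created, modified, fs_modified=None):
    """List simple red flags. fs_modified, if given, must be a timezone-aware datetime."""
    flags = []
    if created and modified and modified < created:
        flags.append("internal modified date precedes internal created date")
    if fs_modified and created and fs_modified < created:
        flags.append("file system modified time precedes internal created date")
    return flags
```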

Feel free to download the Pinpoint Labs MetaViewer or MetaDiscover software and review the ‘No-Nonsense Metadata’ white paper. If you need assistance with an investigation, please email examiner@pinpointlabs.com.

When is a Computer Forensic Investigation Needed? (2 of 2)

In my previous post, I identified several primary differences between computer forensic investigations and electronic discovery processing. Next, I’d like to identify some general case categories and tasks that involve a computer forensic investigator.

Case Categories:

·    Employment disputes

·    Misuse of company computer involving pornography, gambling, blackmail and fraud

·    Embezzlement

·    Breach of contract

·    Software licensing

·    Intellectual property theft

·    Insurance fraud

·    Sexual harassment

Typical Tasks:

·    Recovering deleted files and emails

·    Internet activity analysis

·    Cell phone and smart phone analysis

·    Metadata analysis

·    Providing results, recommendations, and action plan

Even if a civil or criminal investigation doesn’t fall within these case categories, you may still need to involve a computer forensic investigator. Why? Because it is no secret that computers are used as a primary source for communication, work product, and research. The listed tasks could apply to investigating almost any suspect involved in a civil or criminal lawsuit.

 

When is a Computer Forensic Investigation Needed? (1 of 2)

Electronic discovery and computer forensic investigations often go hand in hand. The challenge for many in the legal community is how to identify what ESI (Electronically Stored Information) requires more than typical electronic discovery processing.

First, computer investigations are technically electronic discovery, and the line between the two disciplines will continue to blur. Several key differences are:

  1. The qualifications and skills required by the individual performing collections and computer investigations
  2. Computer investigations typically recover and analyze areas of the suspect media unavailable through popular electronic discovery software
  3. Electronic discovery processing often involves a significantly larger amount of data
  4. Most computer forensic applications do not create load files or produce TIFFs or electronic Bates numbers
  5. Computer forensic investigations often require detailed documentation before the work can begin, followed by extensive reports describing the processes and findings, as well as appropriate affidavits

In my next post, I will discuss the types of cases and suspect information that differentiate computer forensic investigations and typical electronic discovery processing.