Tips & Tricks

ESI Self Collection Drives and Kits

Electronically Stored Information (ESI) self collection drives and kits have become popular in the last few years because they offer an affordable means of collecting electronic data for a legal matter without the need to hire expensive forensic experts. This article covers what should be included in an ESI collection drive kit as well as some tips to ensure the collections are completed properly.

ESI Self Collection Tips and Resources

Here are a few tips to help ensure a successful ESI self collection:

1) IT Assistance – Have someone on hand who knows the products, how they work, and how to overcome any issues encountered. This could be an individual in the legal department, corporate IT, a forensic computer examiner, or a competent vendor.

2) Hard Drives – If the ESI self collection drive is being connected directly to a custodian PC or server, consider 2.5-inch enclosed external hard drives that are powered from a USB port. If collecting data across a network, consider a Network Attached Storage (NAS) device.

3) Software – Require these key features from active file collection software (like SafeCopy 2 or Harvester from Pinpoint Labs):

  1. Preserves file timestamps and metadata – Using Windows Explorer to “drag and drop” files does not preserve critical metadata or confirm that the contents were copied exactly.
  2. Creates electronic chain of custody – Reports detailing what happened, including source and destination hash values, MAC times, where files were copied from and to, and the results, form the audit trail required for defensibility.
  3. Hash verifies files – Matching file hashes of the source and destination are verifiable proof of a valid copy.
  4. No local installation – Ideally the software should run from an external device or from the network without installing anything on the host computer.
  5. Automated job tickets – Human involvement opens the risk of human error. Products like Harvester from Pinpoint Labs include features to automate the process with predefined work tickets.
  6. Filtering (Optional) – Filtering at the point of collection reduces the cost of processing the collected data. Filters that can be applied at the point of collection include file types/headers, date ranges, folder names, keywords, deduplication, and deNISTing.

4) Evidence Bags – Tamper-proof evidence bags provide additional security and defensibility. The following antistatic bags from Packaging Horizons (http://www.alertsecurityproducts.com/antistaticsecuritybag/index.shtml) are designed for hard drives.

5) Paper Chain of Custody – Most firms are familiar with transferring evidence and have forms already created. Include this form with the drives used in an ESI collection kit.
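The software features listed under item 3 (preserving timestamps, hash verifying the copy, and producing an audit record) can be illustrated with a minimal Python sketch. This is not how SafeCopy 2 or Harvester work internally; the function names and the fields in the audit record are assumptions made for illustration, and a production tool would also handle errors, long paths, and access-time preservation.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(source, destination):
    """Copy one file, carry over its timestamps, and return an audit record."""
    source, destination = Path(source), Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the data and then attempts to preserve file metadata
    # such as the modified time (unlike a plain drag and drop style copy).
    shutil.copy2(source, destination)
    source_hash = sha256_of(source)
    destination_hash = sha256_of(destination)
    # Hypothetical audit record: one entry per file for the chain of custody report.
    return {
        "source": str(source),
        "destination": str(destination),
        "source_sha256": source_hash,
        "destination_sha256": destination_hash,
        "verified": source_hash == destination_hash,
        "modified_time": source.stat().st_mtime,
    }
```

Calling `verified_copy("report.doc", "E:/collection/report.doc")` (hypothetical paths) returns a dictionary whose `verified` field is true only when the source and destination hashes match.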

Larger Collection Alternatives

Putting together ESI self collection kits can save money and eliminate delay and additional costs. Harvester from Pinpoint Labs is offered at a flat rate (you own it) or per collection.

Unease with ESI Self Collections

There has been some concern over custodian self collections. Relying on untrained employees to find and then properly collect the relevant data may present a defensibility problem. The automation features of data collection software address this problem: they minimize the number of human errors that can occur by minimizing the amount of employee interaction with the collection process.

What you should know

ESI self collections and kits are here to stay. They significantly reduce discovery costs, perform targeted collections, and are the modern equivalent of boxing up relevant files. However, it is critical to ensure that the process is defensible by preserving the original content with the correct process, products, and procedures. For further assistance designing an ESI self collection kit for specific project needs, contact one of the project leaders at Pinpoint Labs.

What is a Hash Value?

A hash value is the result of a calculation (a hash algorithm) that can be performed on a string of text, an electronic file, or an entire hard drive’s contents. The result is also referred to as a checksum, hash code, or hash. Hash values are used to identify and filter duplicate files (e.g., email, attachments, and loose files) from an ESI collection or to verify that a forensic image or clone was captured successfully.

Each hashing algorithm uses a specific number of bytes to store a “thumbprint” of the contents. The following is a list of hash values for the same text file. Regardless of the amount of data fed into a specific hash algorithm or checksum, it will return the same number of characters. For example, an MD5 hash uses 32 hexadecimal characters for the thumbprint whether it’s a single character in a text file or an entire hard drive.

HASH

MD5: 464668D58274A7840E264E8739884247

SHA-1: 4698215F643BECFF6C6F3D2BF447ACE0C067149E

SHA-256: F2ADD4D612E23C9B18B0166BBDE1DB839BFB8A376ED01E32FADB03A0D1B720C7

SHA-384: 2707F06FE57800134129D8E10BBE08E2FEB622B76537A7C4295802FBB94755BBEE814B101ED18CC2D0126BD66E5D77B6

SHA-512: C526BC709E2C771F9EC039C25965C91EAA3451A8CB43651EA4CD813F338235F495D37891DD25FE456FE2A8CA89457629378BE63FB3A9A5AD54D9E11E4272D60C

RIPEMD-128: A868B98EAEC84891A7B7BA620EDDE621

TIGER: F31A22CEED5848E69316649D4BAFBE8F9274DED53E25C02D

PANAMA: 7E703B1798A26A0AF21ECD661CBADB9C72B419455814CA7B82E29EE0C03FA493

CHECKSUM

CRC16: 117C

CRC32: FA2D47D4

ADLER32: CF7D65FF

As you can see, there are also hashes of various lengths within a family (SHA-1, SHA-256, etc.). The most common hash values are MD5, SHA-1, and SHA-256. The longer hash values require more time to calculate and are designed to reduce the probability of a collision.
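The fixed-length property is easy to check with Python's standard `hashlib` and `zlib` modules. The sketch below is only an illustration (the helper name `digest_lengths` is made up for this example); it compares the digest lengths for a one-byte input and a one-megabyte input.

```python
import hashlib
import zlib


def digest_lengths(data):
    """Map each algorithm name to the length of its hex digest for `data`."""
    lengths = {}
    for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
        lengths[name] = len(hashlib.new(name, data).hexdigest())
    # CRC32 is a checksum rather than a cryptographic hash; show it as 8 hex digits.
    lengths["crc32"] = len(format(zlib.crc32(data), "08X"))
    return lengths


small = digest_lengths(b"x")
large = digest_lengths(b"x" * 1_000_000)
print(small)
print("same lengths for both inputs:", small == large)
```

The digest values differ between the two inputs, but the number of characters each algorithm returns does not.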

A few other ways that hash values are used:

-  Verify a downloaded file was created by the publisher (as opposed to a virus-infected version)

-   Identify and filter files on the NSRL/NIST list (“deNISTing”)

-   Locate known contraband (illegal images and videos)

Here are a few reasons why hash values are so widely used as a means to validate and compare content:

1)  Privileged Data – There would be obvious issues storing and providing multiple copies of the contents of a company’s files or entire hard drives in a database to perform a byte comparison. Not to mention that illegal images and videos (child pornography) would have to be stored and used in each system scan. These scenarios are unacceptable.

2)  Speed – Comparing indexed hash values is much quicker than comparing what could be billions or trillions of bytes of source data. Optimized hash engines (such as Pinpoint Harvester) can compare thousands of hash values in a second.

3)  Security – Hashing data is a one-way trip. The original data can’t be recreated or reverse engineered from the hash value. This provides additional security that a person can’t determine the source data from the hash.
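As a rough illustration of how hash values stand in for the content itself, the following Python sketch groups files under a folder by MD5 value so that duplicates can be identified without a byte-by-byte comparison of every pair. The function names are assumptions for this example, and it does not reflect how any particular product implements deduplication.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(root):
    """Return {hash: [paths]} for every hash shared by two or more files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[md5_of(path)].append(path)
    # Only hashes seen more than once represent duplicate content.
    return {value: paths for value, paths in groups.items() if len(paths) > 1}
```

Only the hash values need to be stored and compared; the file contents never have to be kept in the lookup table.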

The argument that data sources could be different and have the same hash value has raised a lot of concern. There are countless threads related to this issue on the litigation support and computer forensic forums. The bottom line is that the only way to do an exact comparison of the original data is to store it everywhere you need to deduplicate or verify the information. However, as mentioned above, this isn’t a practical alternative.

More complex hashing functions have been introduced (SHA-256, SHA-512, etc.), which further reduce the likelihood of a collision. It is also worth noting that even in those cases where scientists have created collisions, it was a result of exploiting the weaknesses in a specific hash algorithm. The same alterations would not create a collision in a different hashing algorithm.

So, if you still aren’t satisfied with the incredibly remote possibility that a collision could happen using a single hash value, the easiest way to implement an extra precaution is to have your processes calculate hash values from two separate algorithms (e.g., MD5 and SHA-256) for each item. Unfortunately, most EED applications and forensic imaging tools don’t support this option, especially in a single pass.
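Computing two hash values in a single pass is straightforward when you control the reading loop. The Python sketch below, with a made-up helper name `dual_hash`, reads each chunk of the file once and feeds it to both algorithms; it is an illustration of the idea rather than a description of any existing tool.

```python
import hashlib


def dual_hash(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha256_hex) for a file while reading it only once."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Each chunk read from disk updates both digests.
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Because the disk read is usually the slow part, the second algorithm adds computation time but no additional pass over the data.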

What to Remember

Hash values are a reliable, fast, and secure way to compare the contents of individual files and media. Whether it’s a single text file containing a phone number or five terabytes of data on a server, calculating hash values is an invaluable process for deduplication and evidence verification in electronic discovery and computer forensics.

ESI (Electronically Stored Information) Software Challenges

A couple of weeks ago, I outlined what computer forensics and electronic discovery have in common and how they differ. I’d like to expand on this topic by identifying some common obstacles encountered when using popular computer forensic software for typical electronic discovery projects.

A typical computer forensic case may involve:

  1. A small quantity of email and/or attachments
  2. Recovered files, internet history, and user activity
  3. Registry entries
  4. Pre-fetch files
  5. Portions of unallocated space

A typical electronic discovery project may involve:

  1. Processing dozens or hundreds of custodian mailstores that results in thousands of potentially relevant emails and/or attachments
  2. Indexing hundreds of gigabytes or multiple terabytes of data
  3. Hosting data online so multiple parties can easily review, identify, and produce files
  4. Converting relevant files to TIFF, endorsing them, and building load files compatible with common litigation support applications
  5. Deduping emails, attachments, and files across dozens of custodians

Generally speaking, the primary obstacles encountered when using off-the-shelf computer forensic software for electronic discovery are:

  1. Inability to create load files from tagged emails, attachments, and other relevant data
  2. No support for tiffing, endorsing, and assigning docIDs
  3. Missing/incomplete links between email and attachments
  4. No clear way to produce carved or partial files recovered from unallocated space

If you anticipate reviewing a large ESI collection using one of the common litigation support review tools, make sure that your service provider can process and produce compatible output files for production sets. Don’t assume that all computer forensic examiners are equipped to handle large scale ESI projects.  On the other hand, not all EED service providers have the appropriate tools to complete a thorough computer investigation.

Recovering Deleted Email

It’s important to understand that deleted email is not recovered or indexed using common litigation support or electronic discovery software. These applications only process email that is still visible within the email software.

Some email recovery software can also fall short when restoring deleted email records. Why is that? Because it is designed to undelete email records that still have an entry in the mail store index. Unfortunately, many mail stores remove those entries once the database is compressed. As a result, many people believe that email cannot be recovered once the mail store’s database has been compressed. However, this isn’t always the case.

Deleted email content may still be intact and recoverable. By using software tools designed to ‘carve’ email data, it is still possible to recover the original content. Using the following steps, email can often be recovered even after typical recovery tools fail.

1) Use WinHex, EnCase, or other file recovery tools that can recover email fragments
2) Import recovered files (MBOX) through Aid4Mail into Paraben’s Email Examiner
3) Export email and attachments to msg, pst and other formats
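To give a feel for what “carving” means in step 1, here is a simplified Python sketch that scans raw bytes for mbox-style “From ” separator lines and slices out the candidate messages between them. It is an illustration only: the regular expression and function names are assumptions for this example, it assumes the message data is contiguous and unencoded, and the commercial tools named above handle far more (fragmentation, other mail store formats, and damaged headers).

```python
import re

# An mbox separator looks like: "From sender Mon Jan 01 00:00:00 2024".
# Thunderbird typically writes "From - " followed by the date.
MBOX_FROM = re.compile(
    rb"^From \S+ +\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\r?$",
    re.MULTILINE,
)


def find_message_offsets(raw):
    """Return the byte offset of every mbox separator line found in `raw`."""
    return [match.start() for match in MBOX_FROM.finditer(raw)]


def carve_messages(raw):
    """Slice `raw` into candidate messages, one per separator line."""
    offsets = find_message_offsets(raw)
    bounds = offsets + [len(raw)]
    return [raw[start:end] for start, end in zip(bounds, bounds[1:])]
```

The carved byte strings could then be written out as an MBOX file for import in step 2. Anything recovered this way still needs validation, since a separator pattern can also appear inside unrelated data.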

Using the same approach to recover email as deleted files can often provide better results than doing a recovery on the individual mail store. As mentioned above, when performing recovery on Mozilla Thunderbird mail stores and others, many programs only recover what is still listed in the index files. If these files are missing, corrupted, or no longer contain the email record, you can try Zmeil from Zero Assumption Recovery (http://www.z-a-recovery.com/zmeil-email-recovery.htm). Zmeil doesn’t rely on the mail store index; it parses the data files and is a great tool to use for additional verification of recovered email data. It also works well as an inexpensive standalone email recovery tool.

Email communication is often a critical piece of the electronic discovery puzzle. Deleted email doesn’t get fully processed with common electronic discovery software. If you believe you may miss critical evidence because a custodian deleted important emails, then a specialized recovery process should be performed by someone with the appropriate training and knowledge of the process.

Secretly Copying Files To An External USB Drive

Copying corporate data and using it at a competing company (intellectual property/corporate asset theft) is a common and serious concern for companies and their legal counsel. When employees leave companies, there are often questions about the security of the information they previously accessed. Will they use the contacts, forms, or product details as a competitive advantage in their new job?

I had previously written about how to use the file activity records located in the index.dat file to identify when files were accessed. This can help determine if files were copied from a corporate file server. I want to expand on a couple of additional artifacts that can be used and then provide an illustration. There are three primary artifacts, plus one supplementary scanning technique, that can help determine whether someone accessed and copied specific files using an external drive, CD/DVD, flash device, or other storage media.

1) USBStor Registry Entry – Microsoft Windows uses its registry to track information about the computer’s users, operating system, hardware, applications, security, and other relevant information. When USB devices are plugged into a computer, several key artifacts are captured including the make, model, serial number (if available), and when the device was plugged in.

2) Index.dat Access Record – Microsoft Windows uses the index.dat file to track website activity in Internet Explorer. It also contains when and from where files were accessed. We often have to recover deleted or purged activity using programs like NetAnalysis to do a thorough analysis. NetAnalysis can often recover hundreds of thousands of records that are no longer available in the index.dat files on the system.

3) Link File (.lnk shortcut) – Shortcuts can be created by a user and are commonly stored on the desktop. Microsoft Windows also automatically creates .lnk shortcut files for files that are accessed. These files store a wealth of information about the source document, including the path; the dates and times it was created, written, and last accessed; its size; the volume serial number; and several other details. This information is encoded and requires special software to display it in a useful format.

4) “File Sniper” - Use a product like Harvester from Pinpoint Labs to create a hash list of the suspect files and scan all locations where the files could be in use. It isn’t uncommon for a computer forensic examiner to be asked if there is a way to create a list of files from a corporate network or an employee’s system and check if they are in use by a competitor.

By using the above artifacts, it is possible to determine that files located on a company server or client machine were copied or accessed after a specific date and time. Note that this doesn’t provide the contents of the file, and a thorough review would be necessary to make sure it is the same file. However, if the file name and other relevant metadata match, it does appear suspicious and may be enough to construct a solid argument that the employee copied or burned files, accessed the contents, or used the information. This may lead to criminal or civil action over the possible benefit to a future employer or a new company that the employee decided to start.
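The hash list approach described in item 4 can be sketched in a few lines of Python. This is not Harvester's implementation; the function names are assumptions for illustration. The point it shows is that a hash match identifies the same content even when the file has been renamed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_hash_list(suspect_files):
    """Hash each suspect file and return the set of hash values."""
    return {sha256_of(path) for path in suspect_files}


def scan_for_matches(root, known_hashes):
    """Return every file under `root` whose hash is in `known_hashes`."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and sha256_of(path) in known_hashes:
            matches.append(path)
    return matches
```

Only the hash list needs to leave the company network, so the scan can be run elsewhere without handing over the confidential files themselves.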

USB Artifacts Illustration (Download PDF here)

What does ‘CCE’ Mean?

The CCE (Certified Computer Examiner) is a certification obtained through the International Society of Forensic Computer Examiners (ISFCE). I’ve noticed that many CCE training facilities are geared toward criminal investigations, so they don’t necessarily address civil litigation processes and ESI (Electronically Stored Information) requirements. This is because the CCE was originally designed for law enforcement, and criminal cases involve child pornography, narcotics, stolen property, counterfeiting, and homicide, just to name a few.

Many CCEs do work with law firms and understand their needs, but that is because they gained this knowledge from their own experience or have a litigation support background. The CCE is a well respected certification; however, don’t assume that all CCEs understand civil litigation, ESI procedures, electronic discovery, and load files.

Many litigation support professionals have attended training classes offered through the MD5 Group and Jason Park (President). Jason is a veteran litigation support professional and he does an excellent job covering how computer forensics relates to civil litigation. If you or someone on your staff is looking for a strong computer forensics certification and you want a balanced approach that covers civil and criminal investigations give Jason a call.


Encrypted Hard Drive Dangers

You have requested a hard drive clone or image and discover that the contents cannot be culled or reviewed. One reason may be hard drive encryption. Encryption involves “scrambling” the contents of a file or hard drive so that they cannot be viewed without the appropriate key or password.

To secure data, companies and individuals are increasingly encrypting the contents of their hard drives or USB flash drives. Manufacturers are also building hard drives that automatically encrypt the contents. BitLocker encryption, for example, is available in Windows Vista. Hard drive encryption often requires a “live” acquisition, which takes place when the system is running and the decrypted contents of the drive can be accessed and copied. Employing best practices for handling hard drive encryption is important and will become more so in the months and years to come as encrypted hard drives become more common. Here are a few pointers:

  1. Ask the IT contact if any known encryption method is in place.
  2. Computer forensic examiners have tools, such as X-Ways Capture, which will detect several encryption methods.
  3. If drive encryption is identified, create a live image of the system. Creating a live image can take several times longer than a normal acquisition.

Encrypted hard drives pose a challenge and potential delays for both computer investigations and electronic discovery processing. Work with a vendor who is capable of handling encrypted hard drive collections.

Imaging Hard Drives – Will you get what you expect?

If you or a partnering service bureau needs to be able to process or review your client’s files from an imaged hard drive, you may be in for a surprise. The results of an imaged hard drive are often stored in a forensic image format or what is referred to as an “evidence file” container. Common evidence file formats include EnCase, DD (RAW), SMART, AFF, and SafeBack, just to name a few.

These forensic image formats are designed to allow access to the files from computer forensic software. Most electronic discovery and litigation support applications are unable to access the file contents of an imaged drive that is stored as a forensic image. If you need to access the copied files, you have three options.

  1. Request a “clone” of the source hard drive rather than a forensic image. A clone is created by copying source media to another drive in the same format.
  2. Ask that the forensic image be restored to a clone.
  3. Purchase “Mount Image Pro” (http://www.mountimage.com/), which will allow you to view the contents of several popular forensic image formats.

It is important to talk to the company or individual performing the collection to ensure that the collected files can be accessed by those performing the electronic discovery processing and review.

Understanding File Timestamps

The terms ‘file timestamps’ and ‘file metadata’ are often used interchangeably; however, they can have two completely different meanings. I trust the following will help clarify the differences.

1) There are two separate sets of timestamps for office documents and several other file types. The first set is stored by the operating system (Windows, Linux, MacOS) and is different from the set stored in the file itself.

2) The metadata stored in a file (Date Created, Date Last Saved, etc.) may also be referred to as the file’s timestamps and confused with what’s stored by the operating system.

3) The two sets of dates are often very different because the operating system timestamps are easily altered through copying files and automated software processes (virus scanners, indexing). The timestamps in the file metadata are altered when files are saved or edited by the native application.

For example, if a custodian copies a file from their system to a network folder, the created and last accessed times displayed in Microsoft Windows would be changed to the date and time of the copy. However, if you view the internal metadata (Date Created, Date Last Saved) in the document properties, these values would remain unaltered. If you are looking for the most reliable created or last saved time for a document, make sure you use the internal file metadata timestamps.
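How easily operating system timestamps change during a copy can be demonstrated with Python's standard library. In the sketch below, `shutil.copy` stands in for a naive copy and `shutil.copy2` for a copy that attempts to preserve timestamps; the function name is made up for this example, and exactly which timestamp fields change in practice depends on the operating system, file system, and copy tool used.

```python
import os
import shutil


def copy_and_compare(source, plain_dest, preserved_dest):
    """Copy `source` twice and return the modified time of each file."""
    # shutil.copy copies the data and permission bits; the new file gets
    # a fresh modified time (the time of the copy).
    shutil.copy(source, plain_dest)
    # shutil.copy2 additionally attempts to carry over the timestamps.
    shutil.copy2(source, preserved_dest)
    return {
        "source": os.stat(source).st_mtime,
        "plain": os.stat(plain_dest).st_mtime,
        "preserved": os.stat(preserved_dest).st_mtime,
    }
```

The file contents, including any internal document metadata, are byte-for-byte identical in both copies; only the timestamps kept by the operating system differ.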

What are File Headers? (Signatures)

Many file types can be identified by using what’s known as a file header. A file header is a ‘signature’ placed at the beginning of a file, so the operating system and other software know what to do with the following contents.

Many electronic discovery applications use the file header as a means to verify file types. The common fear is that if a custodian changes a file’s extension, or the file wasn’t named using an application’s default naming convention, the file will be missed during electronic discovery processing. Each native application has specific file extensions associated with it. For example, if I create a Microsoft Word document and name it ‘myfile.001’ instead of ‘myfile.doc’ and then attempt to locate all Microsoft Word files at a later date, I would miss the file if I were looking only for files ending in ‘.doc’.
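A header check of this kind can be sketched in Python by reading the first few bytes of a file and comparing them to known signatures. The three signatures below are widely documented, but the table is deliberately tiny and the function name is an assumption for this example; real processing tools use much larger signature sets and additional checks to tell, say, a Word document from a spreadsheet inside the same container format.

```python
# A few well-known file signatures ("magic numbers") and what they indicate.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container (includes .docx, .xlsx, .pptx)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (legacy .doc, .xls, .ppt, .msg)",
}


def identify_by_header(path):
    """Return a description based on the file's first bytes, or None if unknown."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, label in SIGNATURES.items():
        if head.startswith(magic):
            return label
    return None
```

With this approach, a legacy Word document renamed to ‘myfile.001’ is still reported as an OLE2 compound file, because the extension is never consulted.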

During a computer forensic investigation, file headers are extremely valuable because they allow us to locate the contents of deleted files, user activity logs, registry entries, and other relevant artifacts. For example, if I’m investigating a custodian’s hard drive for evidence that they were working for a competing company, I would want to recover their file activity records. A large number of custodian activity records are often already purged or deleted. By scanning a computer’s hard drive for the signature related to user activity records, we often recover relevant artifacts (file access records) up to several years after they were deleted.