Electronically Stored Information (ESI) self collection drives and kits have become popular in the last few years because they offer an affordable means of collecting electronic data for a legal matter without the need to hire expensive forensic experts. This article covers what should be included in an ESI collection drive kit as well as some tips to ensure the collections are completed properly.
ESI Self Collection Tips and Resources
Here are a few tips to help ensure a successful ESI self collection:
1) IT Assistance – Have someone on hand with knowledge of the products, how they work, and how to overcome any issues encountered. This could be an individual within the legal department, corporate IT, a forensic computer examiner, or a competent vendor.
2) Hard Drives – If the ESI self collection drive is being connected directly to a custodian PC or server, take a look at the 2.5-inch enclosed external hard drives that are powered from a USB port. If collecting data across a network, a Network Attached Storage (NAS) device should be considered.
3) Software – Require these key features from active file collection software (like SafeCopy 2 or Harvester from Pinpoint Labs):
4) Evidence Bags – Tamper-proof evidence bags provide additional security and defensibility. The following antistatic bags from Packaging Horizons (http://www.alertsecurityproducts.com/antistaticsecuritybag/index.shtml) are designed for hard drives.
5) Paper Chain of Custody – Most firms are familiar with transferring evidence and have forms already created. Include this form with the drives used in an ESI collection kit.
Larger Collection Alternatives
Putting together ESI self collection kits can save money and eliminate delay and additional costs. Harvester from Pinpoint Labs is offered at a flat rate (you own it) or per collection.
Unease with ESI Self Collections
There has been some concern over custodian self collections. Relying on untrained employees to find and then properly collect the relevant data may present a defensibility problem. The automation features of data collection software help address this problem: they reduce the number of human errors by minimizing employee interaction with the collection process.
What you should know
ESI self collections and kits are here to stay. They significantly reduce discovery costs, perform targeted collections, and are the modern equivalent of boxing up relevant files. However, it is critical to ensure that the process is defensible by preserving the original content, with the correct process, products, and procedures. For further assistance designing an ESI self collection kit for specific project needs, contact one of the project leaders at Pinpoint Labs.
E-Discovery Collections, also known as Electronic Evidence Discovery (EED) or Electronic Data Discovery (EDD), can include a review of all the data stored on employee desktop or laptop computers, company servers, camera cards, cell phones, smart phones, GPS devices, digital video recorders, digital answering systems, thumb drives, RAID arrays and any other form of electronic media capable of storing data.
Types of Electronic Discovery Content
Employee Work Product – Computer Files are by far the most common arrangement for a forensic e-discovery collection. Files (also referred to as loose files or active files) are similar to their paper equivalent. They can be copied, moved, and even “shredded”. Work product could include sales reports, QA reports, product or service information, client lists, engineering designs and much more.
Employee Correspondence - Email has practically replaced letters and interoffice memos. A forensic e-discovery collection of correspondence is often a critical piece and can contain the “smoking gun”. What someone said, to whom, and when are some of the first questions asked in a legal matter. Since emails are a form of documented communication, they comprise highly sought-after data when it comes to legal matters. Emails themselves may be contained in databases, files, or unallocated space.
Customer Relations and Accounting Data – Customer lists, internal notes, and financial records are also a critical component in forensic e-discovery collection or computer forensic investigations. Properly collecting the live database files that store this information can be a challenge. Single entries in a database often require export to another format in order to be useful or even readable by humans. Most databases include this ability.
User Logs – Collecting user logs isn’t always as relevant in an e-discovery collection/review as it is in computer forensics analysis, but they can be relevant and are worth mentioning. User logs contain entries about the activities performed on a computer and by different user accounts. Attorneys may want to know when emails were sent or received between accounts in case the emails were deleted. Log entries may require conversion into human-readable form before they can be processed.
Raw or Unallocated Data – Unless a forensic image of the source data has been requested, a forensically sound e-discovery collection will focus on “active” files. However, it is helpful to understand the difference between “unallocated” and “active” data. Raw or unallocated data is data that resides in segments of the storage media (hard drive, camera card, etc.) that are not being used by files. This data can contain all or part of files that were once referenced in the file allocation table but were subsequently deleted. Much of this data can even survive a reformatting of the disk itself. Since this data can come from any number of sources that had once been active on the drive, it can make or break a case where it is suspected that deletions may have occurred.
Tools for Forensic E-Discovery Collection
With the exception of unallocated space, tools such as One Click Collect Harvester from Pinpoint Labs have the ability to collect loose files, emails and whole databases with the added benefits of being able to specify key words, date ranges, domains and email addresses among other very useful filters.
Tools for collecting the unallocated space on a drive usually require an experienced forensic examiner in order to get useful interpretations of the data collected. In cases where this is necessary, it is recommended that a certified computer examiner be hired for the collection and analysis of the data.
Email Collection refers to the identification and isolation of electronic mail (email) messages that pertain to a specific legal matter in civil litigation cases.
What gets collected
What is actually being collected during email collections can be one of two things:
1. Files representing the contents of the transmitted email messages themselves (usually in MSG, HTML, EML or RTF format).
2. Container (or store) files that hold the contents and data associated with multiple email messages, usually all of the emails for a specific custodian.
Whether files for individual emails or container files are collected depends mostly on the type of email system being used by the custodian. If the custodian is a user of Microsoft Outlook, for instance, then either container files or individual email files may be produced. If the custodian is a user of a webmail service, such as Gmail or Yahoo!, then it is likely only individual email files can be collected.
How it’s done
Software such as Harvester from Pinpoint Labs can search the PST store files produced by Microsoft Outlook and Exchange email systems for individual emails containing specific criteria, such as who sent the email, who received it, when these actions occurred and whether the subject, body, or attachments contain specified key words. It can also produce the result to either individual email files or whole, reconstructed container files, known as PST regeneration.
With other email systems, either the whole container file can be copied and sorted through manually, or the individual emails can be manually identified and exported as individual email files.
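The criteria filtering described above can be sketched in Python using the standard library's email package against the text of an individual EML file. This is only an illustration of the idea, not how Harvester works internally: the function name `matches_criteria` and its parameters are hypothetical, it checks only the headers and the plain-text body (not attachments), and `start` and `end` are assumed to be timezone-aware datetime values.

```python
from datetime import datetime, timezone
from email import message_from_string, policy
from email.utils import parsedate_to_datetime


def matches_criteria(raw_eml, sender=None, start=None, end=None, keywords=()):
    """Return True if an email (EML text) meets every supplied criterion."""
    msg = message_from_string(raw_eml, policy=policy.default)

    # Sender filter: case-insensitive substring match on the From header.
    if sender and sender.lower() not in str(msg.get("From", "")).lower():
        return False

    # Date range filter based on the Date header.
    if start or end:
        date_header = msg.get("Date")
        if date_header is None:
            return False
        sent = parsedate_to_datetime(str(date_header))
        if sent.tzinfo is None:
            # Assumption: treat dates without a usable offset as UTC.
            sent = sent.replace(tzinfo=timezone.utc)
        if start and sent < start:
            return False
        if end and sent > end:
            return False

    # Keyword filter: any keyword in the subject or the plain-text body.
    if keywords:
        body_part = msg.get_body(preferencelist=("plain",))
        body = body_part.get_content() if body_part is not None else ""
        haystack = (str(msg.get("Subject", "")) + "\n" + body).lower()
        if not any(keyword.lower() in haystack for keyword in keywords):
            return False

    return True
```

A collection script would call this once per exported message and keep only the messages for which it returns True.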
What to remember
As with any data being collected, the two concepts to remember are preservation and validation.
Preservation refers to keeping the metadata about the individual messages as well as the metadata contained within each of the messages intact so as to maintain their admissibility. PST regeneration is especially desirable in this case because it maintains both the email data and the data that linked it to contact data, task list data and other data integrated with these types of email messages.
Validation refers to the policy of ensuring, either by hash value comparison (analogous to fingerprints for data) or bit-wise comparison, that the contents of the copy are the same as the contents of the original.
Software such as Harvester and SafeCopy 2, both from Pinpoint Labs, has built-in preservation and validation systems to certify that both of these conditions are always met.
PST Regeneration is used during electronic discovery processing or even during an ESI collection. A Personal Folder File (PST) is a container file created by Microsoft Outlook which stores email messages and other data (e.g., contacts, calendar entries, tasks, and to-do lists).
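As a rough illustration of hash value comparison, the following Python sketch hashes an original file and its copy and checks that the two values agree. The function names are hypothetical and SHA-256 is an assumed choice of algorithm; this is not a description of how any particular collection product performs its validation.

```python
import hashlib


def file_hash(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_valid(original_path, copy_path, algorithm="sha256"):
    """Return True when both files produce the same hash value."""
    return file_hash(original_path, algorithm) == file_hash(copy_path, algorithm)
```

If even one byte of the copy differs from the original, the two hash values will not match and `copy_is_valid` returns False.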
How it’s done
Regenerating PSTs refers to the identification, isolation and often deduplication of electronic mail (email) messages that pertain to a specific legal matter in civil litigation cases. The filtered email messages are copied to a new “regenerated” PST file. The resulting PST can be considerably smaller than the original and results in the following benefits:
1) Quicker attorney review
2) Electronic Discovery processing and hosting cost reduction
3) Significantly smaller ESI collection
Practical application
PST regeneration is commonly used when there are dozens of archive (backup) PST files that contain many duplicate messages. It is a common practice for companies to set up Microsoft Outlook or Exchange servers to create daily, weekly or monthly PST backups of employee email messages.
The result is potentially dozens of employee backup PST files which contain duplicate messages. Why? Each backup will contain many of the same messages as the last. Only new emails sent or received (that have not been deleted) since the last backup will be considered “unique” to each PST. Regenerating PSTs with only one copy of each email (deduplication) significantly reduces the number of messages and the size of the PST data to be processed or produced.
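The deduplication idea can be sketched in a few lines of Python. The sketch assumes each message has already been read out of its PST into a dictionary with hypothetical keys (`sender`, `recipients`, `sent`, `subject`, `body`); reading the PST itself is not shown. It also assumes that two messages are duplicates when those fields are identical, which is only one possible rule.

```python
import hashlib


def message_fingerprint(message):
    """Hash the fields used to decide whether two emails are duplicates."""
    fields = (
        message.get("sender", ""),
        message.get("recipients", ""),
        message.get("sent", ""),
        message.get("subject", ""),
        message.get("body", ""),
    )
    # A separator character keeps adjacent field values from running together.
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(backups):
    """Keep the first copy of each message across several backups.

    Returns (unique_messages, duplicate_count).
    """
    seen = set()
    unique = []
    duplicates = 0
    for backup in backups:
        for message in backup:
            fingerprint = message_fingerprint(message)
            if fingerprint in seen:
                duplicates += 1
            else:
                seen.add(fingerprint)
                unique.append(message)
    return unique, duplicates
```

With a Monday backup of two messages and a Tuesday backup holding the same two plus one new message, the result is three unique messages and two duplicates. Recording the duplicate count alongside the unique set is one simple way to feed a verification log.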
Maintaining defensibility
Significant cost reductions related to electronic discovery processing and hosting are gained by deduping the emails in PST files and by filtering them by key word, date range, and email address or domain. However, it is critical to use an application that is designed to regenerate PSTs in a defensible manner and maintains the chain of custody.
Software such as Harvester from Pinpoint Labs (designed by Certified Computer Examiners (CCEs)) can regenerate PST files at the point of collection or during in-house processing. Harvester also creates an extensive verification log (chain of custody) for all copied and duplicate messages.
What to remember
Creating deduped, targeted PSTs is common practice in the electronic discovery lifecycle because it saves clients a considerable amount of money as well as reducing attorney review time. PST regeneration may be performed onsite (during an ESI collection) or in-house to cull down responsive data.
ESI (Electronically Stored Information) is the general term for all of the data stored on hard drives, camera cards, cell phones, GPS devices, digital video recorders, digital answering systems, thumb drives, RAID arrays and any other form of electronic media capable of storing data.
Types of Electronically Stored Information:
Files – Files are by far the most common arrangement for ESI data. Files (also referred to as loose files or active files) can be thought of as data containers similar to files in the real world. They can be copied, moved, and distributed freely on a variety of different media from DVDs to hard disk drives.
Emails - Emails are messages sent from one user to another. In their raw form, they are simply a stream of data that contains everything needed to get the message from one user to another user. Since emails are a form of documented communication, they comprise highly sought-after data when it comes to legal matters. Emails themselves may be contained in databases, files, or unallocated space.
Database Entries - Database entries are data stored in a database. This type of data is usually context-specific and may be information pertaining to financial records, personnel entries or other data that is interrelated. Single entries in a database require export to another format in order to be useful or even readable by humans. Most databases include this ability.
Log Entries – Log entries are lines in files or entries in databases that contain information about activity on a particular computer. The more commonly useful log entries pertain to users logging into and out of a computer, accessing specific internet sites, the sending or receiving of email or other messages and the moving, copying or accessing of files on the computer. Log entries may require conversion into human-readable form before they can be processed.
Raw or Unallocated Data - Raw or unallocated data is data that resides in segments of the storage media (hard drive, camera card, etc.) that are not being used by files. This data can contain all or part of files that were once referenced in the file allocation table but were subsequently deleted. It can also contain deleted internet history, old information from the computer’s RAM (Random Access Memory) or even old configuration data about the computer itself. Much of this data can even survive a reformatting of the disk itself. Since this data can come from any number of sources that had once been active on the drive, it can make or break a case where it is suspected that deletions may have occurred.
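To illustrate the point made under Database Entries about exporting entries to a readable format, here is a small Python sketch that writes one table of a SQLite database to a CSV file. SQLite is used only because it ships with Python; the function name is hypothetical, and in a real collection the export would normally be run against a preserved copy of the database rather than the live file.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Write every row of one table to a CSV file with a header row."""
    connection = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = connection.execute(f"SELECT * FROM {quoted}")
        headers = [column[0] for column in cursor.description]
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(headers)
            row_count = 0
            for row in cursor:
                writer.writerow(row)
                row_count += 1
    finally:
        connection.close()
    return row_count
```

The returned row count can be noted in the collection paperwork so that the export can be checked against the source table later.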
Tools for Collecting ESI
With the exception of unallocated space, tools such as One Click Collect Harvester from Pinpoint Labs have the ability to collect loose files, emails and whole databases with the added benefits of being able to specify key words, date ranges, domains and email addresses among other very useful filters.
Tools for collecting the unallocated space on a drive usually require an experienced forensic examiner in order to get useful interpretations of the data collected. In cases where this is necessary, it is recommended that a certified examiner be hired for the collection and analysis of the data.
Active File Collection refers to the collection of files that are active (not deleted) and pertain to a legal matter or legal hold. In most civil litigation cases, extensive forensic investigations that look at deleted files are unnecessary or too expensive. Thus, most ESI collections are active file collections and/or email collections.
How active file collections are performed
Active files are those that can be seen by normal users. They may include hidden or system files, but they do not include the computer’s Random Access Memory or any deleted files. Files in the Windows Recycle Bin are considered active files and are subject to collection using active file collection methods.
The first step is defining which files need to be collected. This definition can range from “everything” to files of a few specific types containing only certain key words. Since the cost of processing is usually related to the size of the data being processed, it is generally more economical to be as specific as possible without leaving out relevant files.
Once the files have been identified, it is mostly a matter of copying them in a manner that both avoids spoliation and provides a means of certifying the contents of the copies.
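The two steps above, defining which files to collect and then copying them, might look like the following Python sketch. The function names and parameters are hypothetical. Note the limits of this approach: `shutil.copy2` carries over modification times where the platform allows, but it does not preserve every timestamp on every file system, and reading a source file may update its last-accessed time. Purpose-built collection software exists to handle those cases. The destination folder is assumed to sit outside the source folder.

```python
import os
import shutil
from datetime import datetime


def select_files(root, extensions, modified_after=None, modified_before=None):
    """Yield paths under root that match the extension and date criteria."""
    wanted = {ext.lower() for ext in extensions}
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.splitext(name)[1].lower() not in wanted:
                continue
            modified = datetime.fromtimestamp(os.path.getmtime(path))
            if modified_after and modified < modified_after:
                continue
            if modified_before and modified > modified_before:
                continue
            yield path


def collect(root, destination, extensions, **date_filters):
    """Copy matching files, keeping the folder layout and modified times."""
    copied = []
    for source in select_files(root, extensions, **date_filters):
        target = os.path.join(destination, os.path.relpath(source, root))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(source, target)  # copy2 also copies the modification time
        copied.append(target)
    return copied
```

For example, `collect(r"C:\Users\jsmith", r"E:\collection", [".docx", ".xlsx"], modified_after=datetime(2009, 1, 1))` would copy only Word and Excel files changed since the start of 2009; both paths here are made up for illustration.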
What to remember
The one thing to remember about active file collections is that they can be a potential minefield of spoliation. To avoid this, use software that is designed to preserve the metadata, the timestamps, and the data within the copied files. Some products, such as SafeCopy 2 from Pinpoint Labs, are designed specifically for this purpose. Others, like Harvester, also from Pinpoint Labs, offer this feature as well as the ability to cull data by key word search, and also support deduplication, email, and deNISTing.
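DeNISTing, mentioned above, means dropping files whose hash values appear in a reference list of known system and application files, such as the NSRL. A minimal Python sketch of the idea follows. It assumes the reference hashes have already been loaded into a set of uppercase SHA-1 strings (loading the reference list is not shown), and the function names are hypothetical.

```python
import hashlib


def sha1_of(path, chunk_size=1024 * 1024):
    """Return the uppercase SHA-1 hex digest of a file."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def remove_known_files(paths, known_hashes):
    """Drop files whose SHA-1 appears in the known-file hash set."""
    return [path for path in paths if sha1_of(path) not in known_hashes]
```

Because the match is made on content rather than file name, a renamed system file is still removed, while a custodian document that happens to share a system file's name is kept.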
The most important aspects of active file collections are preservation and validation.
Preservation refers to the preservation of the file data, its timestamps (when the file was created, last modified, and last accessed), and any other metadata contained within the file. If any of this data is compromised, the usefulness and admissibility of the file comes into question.
Validation refers to the ability to certify that the contents of the copy are the same as the contents of the original. This is usually done using a hash (analogous to a fingerprint of the file’s data). It may also be done using a bitwise comparison of the data in both the file and the copy, but since this method requires the same amount of storage as the files themselves and offers no means of independent verification, it is not in common use.
A couple of weeks ago, I outlined what computer forensics and electronic discovery have in common and how they differ. I’d like to expand on this topic by identifying some common obstacles encountered when using popular computer forensic software for typical electronic discovery projects.
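One reason hashes allow independent verification is that they can be written to a small manifest that travels with the collection, so anyone can re-check the copies later without access to the originals. The Python sketch below shows one way this could work. The manifest layout (a CSV with `relative_path`, `size_bytes`, and `sha256` columns) and the function names are assumptions made for illustration, and each file is read in one piece for brevity.

```python
import csv
import hashlib
import os


def build_manifest(root, manifest_path):
    """Record the relative path, size, and SHA-256 of every file under root."""
    rows = []
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            rows.append((os.path.relpath(path, root), os.path.getsize(path), digest))
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "size_bytes", "sha256"])
        writer.writerows(rows)
    return rows


def verify_against_manifest(root, manifest_path):
    """Return the relative paths whose current hash differs from the manifest."""
    mismatches = []
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            path = os.path.join(root, record["relative_path"])
            try:
                with open(path, "rb") as data:
                    digest = hashlib.sha256(data.read()).hexdigest()
            except FileNotFoundError:
                digest = None  # a missing file counts as a mismatch
            if digest != record["sha256"]:
                mismatches.append(record["relative_path"])
    return mismatches
```

The manifest would be built from the source files at collection time and then checked against the copied set; an empty list from `verify_against_manifest` means every copy still matches its recorded hash.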
A typical computer forensic case may involve:
A typical electronic discovery project may involve:
Generally speaking, the primary obstacles encountered when using off-the-shelf computer forensic software for electronic discovery are:
If you anticipate reviewing a large ESI collection using one of the common litigation support review tools, make sure that your service provider can process and produce compatible output files for production sets. Don’t assume that all computer forensic examiners are equipped to handle large scale ESI projects. On the other hand, not all EED service providers have the appropriate tools to complete a thorough computer investigation.
Each day, corporate IT managers, computer forensic examiners, and litigation support professionals are tasked with performing ESI collections for relevant files which reside in file shares, on client systems, and other popular data sources. The content may include Microsoft Exchange mailboxes, departmental data, individual custodian files, internet logs, telephone logs, or other critical corporate content.
Over 4 years ago, Pinpoint Labs released SafeCopy version 2.0 (SafeCopy 2) which alleviated several common problems encountered when using alternative copy utilities to collect client files. Here are a few of those problems that the SafeCopy 2 upgrade addressed:
In September 2009, Pinpoint Labs released One Click Collect – Harvester (Portable/Server), which was a new product that included the proven SafeCopy 2 engine. The Pinpoint Harvester 2.0 ESI collection software includes:
- Great for Legal Holds
- Preserve Metadata and Time Stamps
- Filter by Extension and Date Range
- Select from multiple data sources
- Compatible with all electronic and litigation platforms
- 100% File copy verification
- Extensive chain of custody report
- Process file lists
- Resume easily
- Supports path lengths greater than 255 characters
- Transfer licenses quickly to another location
- Create and deploy remote collections
- Keyword Filter MS Outlook PSTs
- Keyword Filter Loose Files
- Keyword Filter Attachments
- Keyword Filter Archives
- Dedupe and Filter Multiple PSTs
- Regenerate New PSTs
- Export Emails to 8 Different Message Formats
- Remove System Files Listed in NSRL (deNISTing)
- Filter by Header Signature
- Create Portable and Automated Collection Jobs
- Preconfigured Work Orders
- Can Be Used for In-House, Production-Level Culling (deNIST/dedupe)
- Scriptable Profiles and Collection Jobs
- Easily Save and Reuse Job Settings
Pinpoint Labs has a proven record of developing defensible, affordable ESI collection software. Many Fortune 500 companies, government agencies, and computer forensic professionals rely on SafeCopy 2 and One Click Collect – Harvester every day.
‘Imaging a hard drive’ is a phrase that is commonly used for preserving the contents of a custodian hard drive or server. It can also be used to describe when a custodian hard drive is cloned. It is worth taking some time to understand the differences and the advantages and disadvantages of each process.
Forensic Imaging
A forensic image or evidence file container (such as EnCase, DD, Expert Witness, and SMART) is often created using software that is running on a computer forensic examiner’s laptop or lab computer. The examiner will connect the drive to a write blocker and use software to create a forensic image of the entire contents of the source drive on a separate target hard drive. The process may also capture multiple forensic images to a single hard drive.
Hard Drive Cloning
Cloning a hard drive during collection uses a target drive to make an exact duplicate (bit stream copy) of the original hard drive. This process is normally completed using hardware referred to as hard drive cloning equipment.
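The idea of a bit stream copy can be illustrated with a short Python sketch that copies a source byte for byte while computing a hash of everything it reads. This is a conceptual sketch only, with a hypothetical function name. It is demonstrated on ordinary files, whereas real cloning is normally performed with dedicated hardware or forensic software and with the source drive protected from writes.

```python
import hashlib


def stream_copy(source_path, target_path, chunk_size=4 * 1024 * 1024):
    """Copy source to target byte for byte and return (bytes_copied, sha256)."""
    digest = hashlib.sha256()
    copied = 0
    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            target.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

Hashing during the copy means the value reflects exactly what was read from the source, so the target can later be hashed on its own and compared against it.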
A primary difference between imaging and cloning is that the files in a forensic image can’t be accessed by common litigation support applications or electronic discovery software (such as LAW PreDiscovery, Discovery Cracker, and IPRO) or litigation support databases (such as Concordance, Summation, and Ringtail).
Forensic images are designed to be accessed by computer forensic software (such as EnCase, FTK, WinHex, and ProDiscover). If you need to access the original custodian information in a forensic image without using computer forensic software, then you will need to have it restored to a hard drive in the original native format. You could also look into purchasing the Mount Image Pro software (http://www.mountimage.com/purchase-forensic-software.php) that will allow you to view the contents of a forensic image without converting or restoring it to the native format.
Cost and Redundancy Considerations
If you want to compare the cost of different computer examiners, keep in mind that the lowest hourly rate doesn’t mean the lowest total price. An examiner using hardware-based cloning equipment can usually complete the process faster than using software to create a forensic image.
If you rely on a single forensic image or hard drive clone and find out later that there was a problem, you probably won’t have a second chance to preserve and collect the information. It’s well worth the additional cost to create a second backup of the source hard drive. When comparing examiner rates, you will need to compare the hourly and per drive costs to determine the total price. Also, consider what you will be charged to restore a forensic image to a new drive, because this may have to be completed before the custodian files can be processed.
Changes are underway in how electronically stored information (ESI) is processed and reviewed. These changes are due to the huge size of repositories – hundreds of gigabytes or multiple terabyte sizes – identified for collection and processing. Corporations and their legal counsel realize that it may not be feasible or affordable to collect and produce all the information identified in larger cases.
Several new software applications have been introduced that offer many of the same features included in popular electronic discovery software (indexing, file and email “de-duplication”, online review, searching and culling). The difference is they are designed to run as an “appliance” application on a corporate network.
What does this mean? It means collections that would have been sent out to a processing vendor are now being deduped, filtered, and produced internally. The culled native files may still be sent out for tiffing, endorsing, and building load files. But it is a significantly reduced subset.
This is not to say that outsourcing will cease, but in the years ahead there will probably be a reduction in the amount of EED/ESI processing that is outsourced. Additionally, once a corporation has invested in the “appliance” software and training to collect, filter, and produce its collections, it will probably also use it on smaller cases that were previously outsourced.
Systems that require a computer forensic investigation, or need to be collected by a third party, will still require individuals with the appropriate skills and credentials to image or clone media and then analyze the contents, as we do now. However, an increasing amount of electronic discovery processing will be performed at the client site with automated assistance to save time and money and to handle larger projects.