1.866 LUCIDATA (582.4328) | info@lucidatainc.com
Leave nothing uncovered. Leave nothing to chance.

Search millions of documents, find facts fast. Because felicitous access!


On motion by defendants to compel plaintiffs’ response to interrogatories…

The judge says that the electronic document database makes facts easy to find; this is not like the old days, when businesses communicated with smoke signals. Remember those days?


Judge Michael Baylson says that in this post-smoke-signal era counsel can “search a large collection of documents for specific facts, without significant burden… the actual searching is not expensive.” Judge Baylson goes on to get rather specific regarding the types of “facts” which are easily searched.

I agree that it is easy to search ESI databases. To search. Finding information is more difficult. Compiling the found information to develop a record of facts is even more difficult. The example provided by the judge, where we use our ESI database to figure out when trade shows took place and who attended those shows, is not (necessarily) “easy” and “inexpensive.”

Of note:

1. The parties are three months into an eight-month discovery phase. Plaintiffs offered to answer the interrogatories within 60 days of the close of this phase of discovery. Defendants rejected the offer.

2. The judge orders the plaintiffs to answer the interrogatories but he limits the scope by acknowledging that the answers will not be complete and only requires disclosure of facts “currently available.”

Lesson learned:

The judge’s decision is surprising to this blogger because of the rationale used. I just do not agree that “facts” are that easy to identify in a database of millions of documents. However, plaintiff loses this one, so what can we do better next time?

The memo in opposition is missing a detailed plan describing how document review is being conducted, what analysis techniques are going to be applied, what the schedule is, how much it will cost, etc. Now, I am not sure this would alter the result in this case because the result here is rather toothless. However, the judge makes the point that early answers to contention questions should advance the goals of the Federal Rules of Civil Procedure.

The opposition to the motion should specifically state why early answers do not make determination just, speedy, and inexpensive. In other words, state specifically why early answers will make determination less just, less speedy, and more expensive. The way you do this is by detailing your document review plan.

link to RECAP docket

Review too fast? Waive privilege. Plus, where is the priv log?

Attorney reviewed documents so fast that the judge didn’t believe he conducted a privilege review. But note that the judge, during the review-speed analysis, conflated pages with documents and says that 10 seconds is too fast to review a document. However, the calculus results in 10 seconds per PAGE which you might find more reasonable.

 


A party placing itself in a time vise does not get the same leeway as a pressured but blameless litigant.
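The pages-versus-documents distinction is easy to check with arithmetic. The sketch below is only an illustration: the hours, document count, and page count are hypothetical figures chosen to land on 10 seconds per page, not numbers taken from the opinion.

```python
def review_pace(total_seconds: float, documents: int, pages: int) -> dict:
    """Return the average seconds spent per document and per page."""
    if documents <= 0 or pages <= 0:
        raise ValueError("documents and pages must be positive")
    return {
        "seconds_per_document": total_seconds / documents,
        "seconds_per_page": total_seconds / pages,
    }


# Hypothetical review: 10 hours, 600 documents, 3,600 pages.
pace = review_pace(total_seconds=10 * 3600, documents=600, pages=3600)
print(pace["seconds_per_document"])  # 60.0
print(pace["seconds_per_page"])      # 10.0
```

With these made-up numbers the same review runs at a minute per document but only ten seconds per page, which is why it matters which unit the court is counting.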

Your Data and Heartbleed

With the news about Heartbleed making the rounds last week, we at LuciData would like to take this opportunity to assure our customers that no data was compromised due to Heartbleed.  While we have multiple client-facing services, they are not secured with any of the vulnerable versions of OpenSSL.

We would also like to take this time to note a few things that anyone can do to maintain their security online.  The first is to maintain strong passwords, though in this case even a strong password may have been revealed by the Heartbleed vulnerability.  Another important key to online security is preventing password re-use.  If you use that really strong password on every website and one of those sites has a security flaw, your accounts on the other sites could be compromised as well.

To mitigate this potential for all of your online accounts to be compromised, we recommend using a password manager.  Most password managers can generate strong passwords for you, and make it easy to keep track of them all – you only need to remember the one “master” password.  Password managers also make it easier to change passwords, which should be done on a regular basis.
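For the curious, here is roughly what "generate a strong password for each site" looks like in code. This is a minimal sketch using Python's standard `secrets` module; the function names, the 20-character default, and the example site names are assumptions for illustration, and it is not how any particular password manager works internally.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Build a random password from letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


def passwords_for(sites):
    """Give every site its own password so one breach does not spread."""
    return {site: generate_password() for site in sites}


# Hypothetical site names, for illustration only.
vault = passwords_for(["bank.example", "mail.example", "shop.example"])
for site in vault:
    print(site, len(vault[site]))
```

A real password manager also encrypts the stored passwords behind the master password; the sketch only covers the "different strong password per site" part.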

Google search for Lawyers (aka google hacks)

The two Google search hacks that I use most often are:

site: and filetype:

The site: hack limits the Google search to a specific site. For example, try

forms site:law.umn.edu

Note that you do not need a specific site; you can limit the search to a top-level domain:

forms site:.edu

The filetype: hack limits the results to… wait for it… a specified file type. Try

forms site:law.umn.edu filetype:pdf
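If you run the same kind of search often, the query string is easy to assemble in a script. The sketch below is an assumption-laden convenience, not anything Google provides: `build_query` and `search_url` are hypothetical helper names, and the only external fact relied on is the ordinary `https://www.google.com/search?q=` address.

```python
from urllib.parse import quote_plus


def build_query(terms: str, site: str = "", filetype: str = "") -> str:
    """Combine search terms with optional site: and filetype: operators."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query so it can be pasted into a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_query("forms", site="law.umn.edu", filetype="pdf")
print(query)              # forms site:law.umn.edu filetype:pdf
print(search_url(query))
```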

 

But there are more, check these out:

Quotes. This is obvious, but enclose phrases in quotes.

Plus and Minus. Use + and - to include or exclude specific words or phrases.

Tilde. Use ~ to search for a word and its synonyms.

Wildcard. Use * as a placeholder for any word.

Two periods. Use .. as a “dash” between two numerical values; dates, prices, whatever.

Or. Use OR or a pipe (|) between words to indicate or. The pipe is shift+backslash and should be below your backspace key.

There are tons more. Do a Google search for “google hacks” or check out sites like http://jwebnet.net/advancedgooglesearch.html.

Digital Evidence Preservation

When it comes to potential litigation, few things are as critical to a case as the proper forensic preservation of evidence.

Removing the potential for spoliation is the driving force behind digital evidence preservation practices.  The lifespan of data stored on computers and smartphones is not always immediately clear, so it’s important to preserve the data as soon as possible.  When your client becomes a party to litigation, they also may have an instinct to do ill-advised things with their data.

Despite a court-issued preservation order, some clients will immediately delete the data, often leaving behind telltale clues to a forensic examiner. Some clients will back up that data somewhere else before deleting, thinking it can be kept away from prying eyes, when in reality that process may create additional evidence.

Even continued regular use of a device can lead to evidence spoliation. For example, a client might clear their internet history on a regular basis, or run a virus scan that deletes certain files if they are encountered.  Some cell phone models may only contain a limited number of call records or text messages and will overwrite previous entries as the device is used.

It’s not easy to explain away this behavior when there was a well-established duty to preserve the data on those devices. Remember that the longer you wait, the more likely it is that data will be lost.

LuciData is capable of preserving data on computers, smartphones, tablets, and any other device which may contain important information.  Preservation is the most important step if the data in question may be of use.  Even if you never need us to perform any forensic analysis of the data, it’s better to get it properly collected right away rather than to leave it to chance.

Fallout

FALLOUT measures how quickly PRECISION drops as RECALL increases.

If I want to increase my recall, I need to get more black fish in the bowl. So I adjust my search. I get maximum recall by leaving no black fish behind. So maybe I set my search so that EVERY fish is caught. 100% RECALL. But what about PRECISION?

Remember that PRECISION is the percentage of retrieved documents that are responsive. So in this example PRECISION dropped to 50%.
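The trade-off can be reproduced with a toy bowl. The sketch below assumes a bowl of five black fish and five other fish (hypothetical counts, picked so that catching everything gives the 50% figure above) and compares a narrow net with a net that catches every fish.

```python
def precision_recall(retrieved: set, responsive: set):
    """Return (precision, recall) for a set of retrieved items."""
    hits = retrieved & responsive
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(responsive) if responsive else 0.0
    return precision, recall


# Hypothetical bowl: five black (responsive) fish and five others.
black = {f"black{i}" for i in range(5)}
others = {f"other{i}" for i in range(5)}
bowl = black | others

# A narrow net: three black fish and one other fish.
narrow_net = {"black0", "black1", "black2", "other0"}
print(precision_recall(narrow_net, black))  # (0.75, 0.6)

# Catch EVERY fish: recall is perfect, precision falls to the share of black fish.
print(precision_recall(bowl, black))        # (0.5, 1.0)
```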


Search Metrics – visual guide to precision, recall, elusion, fallout, negative predictive value, prevalence, specificity, false negative rate, accuracy, error, and false alarm rate

If you are not sure what these letters mean and how they are derived, please take a look at the intro post, which is more of a primer on contingency tables. If you understand contingency tables but want to get on the same page regarding the quadrants and the respective letter codes used here, open the intermediate contingency table post.

recall

RECALL is the percentage of responsive documents that the search found; the number of responsive documents in the search divided by the number of responsive documents in the population. The population contains five responsive documents, the search found three of them.

3/5 = .6

 

precision

PRECISION is the percentage of retrieved documents that are responsive; it is the number of responsive documents in the search results divided by the total number of documents in the search results. The search retrieved four documents, three of them are responsive.

3/4 = .75
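Recall and precision both fall out of the four quadrant counts. The sketch below uses the ten-document example from this post; the variable names `tp`, `fp`, `fn`, and `tn` are this sketch's own labels and are not necessarily the letter codes used in the diagrams.

```python
from fractions import Fraction

# Quadrant counts for the ten-document example (names are assumptions).
tp = 3  # responsive and retrieved
fp = 1  # nonresponsive but retrieved
fn = 2  # responsive but not retrieved
tn = 4  # nonresponsive and not retrieved


def recall(tp: int, fn: int) -> Fraction:
    """Responsive documents found, out of all responsive documents."""
    return Fraction(tp, tp + fn)


def precision(tp: int, fp: int) -> Fraction:
    """Responsive documents found, out of everything retrieved."""
    return Fraction(tp, tp + fp)


print(recall(tp, fn), float(recall(tp, fn)))        # 3/5 0.6
print(precision(tp, fp), float(precision(tp, fp)))  # 3/4 0.75
```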

 

elusion

ELUSION is the percentage of unretrieved documents which are responsive and should have been retrieved, or the proportion of predicted negatives that are incorrect. The search left six unretrieved documents, two of them are responsive.

2/6=.33

Instead of counting the responsive documents that we found, we count the ones left behind. H. L. Roitblat, Measurement in eDiscovery

 

fallout

FALLOUT is the percent of nonresponsive documents retrieved. The population has five nonresponsive documents, the search incorrectly retrieved one of them.

1/5=.2

FALLOUT measures how quickly PRECISION drops as RECALL increases.


NPV

NEGATIVE PREDICTIVE VALUE reflects the percentage of non-retrieved documents that are in fact not responsive. The search yielded six non-retrieved documents – of these, four were not responsive.

4/6=.67

NPV is also 100% – ELUSION
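Elusion and NPV both look only at the documents the search left behind, which is why they always add up to 100%. A minimal sketch with the counts from the example (the names `fn` and `tn` are this sketch's own labels):

```python
from fractions import Fraction

fn = 2  # responsive documents left behind
tn = 4  # nonresponsive documents left behind


def elusion(fn: int, tn: int) -> Fraction:
    """Share of the unretrieved documents that are responsive."""
    return Fraction(fn, fn + tn)


def npv(fn: int, tn: int) -> Fraction:
    """Share of the unretrieved documents that are nonresponsive."""
    return Fraction(tn, fn + tn)


print(elusion(fn, tn))  # 1/3
print(npv(fn, tn))      # 2/3
print(elusion(fn, tn) + npv(fn, tn) == 1)  # True
```

Exact fractions are used here so the "adds up to 100%" check is not disturbed by rounding .33 and .67.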

prevalence

PREVALENCE is the percentage of all documents which are truly responsive. The population has ten documents, five are responsive.

5/10=.5

NOTICE: This metric does not care about search results.

 

specificity

SPECIFICITY is the percentage of true nonresponsive documents that are correctly identified as nonresponsive. The population has five nonresponsive documents, four were correctly identified.

4/5=.8

 

fnr

FALSE NEGATIVE RATE is the percentage of Responsive documents that are missed.

2/5=.4

False Negative Rate plus Recall = 100% (remember that Recall is aka True Positive Rate)
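Several of these metrics come in complementary pairs: False Negative Rate and Recall split the responsive documents, while Fallout and Specificity split the nonresponsive ones. The sketch below checks both pairs against the example counts; the variable names are this sketch's own labels.

```python
from fractions import Fraction

tp, fp, fn, tn = 3, 1, 2, 4  # quadrant counts from the ten-document example

# Metrics over the five responsive documents.
recall = Fraction(tp, tp + fn)               # 3/5
false_negative_rate = Fraction(fn, tp + fn)  # 2/5

# Metrics over the five nonresponsive documents.
fallout = Fraction(fp, fp + tn)      # 1/5
specificity = Fraction(tn, fp + tn)  # 4/5

print(false_negative_rate + recall == 1)  # True
print(fallout + specificity == 1)         # True
```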

 

Accuracy

ACCURACY is the percentage of documents that the search classified correctly: responsive documents that were retrieved plus nonresponsive documents that were left behind.

(3+4)/10=.7

Accuracy is 100% – Error

In highly prevalent or rich data sets (or sets with extremely low prevalence or richness), Accuracy is a poor measure. Consider a set with 95 percent nonresponsive documents – 95 percent accuracy can be achieved by marking everything nonresponsive.

 

Error 

ERROR is the percentage of documents that are incorrectly coded.

(2+1)/10=30%

Error can also be calculated: 100% – Accuracy

The warning regarding extremes of prevalence or richness applies to Error as well. The utility of Error as a search metric goes down as richness gets extremely high or low.
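The warning about extreme richness is easy to demonstrate. The sketch below computes Accuracy and Error for the ten-document example, then for a hypothetical 100-document set that is 95 percent nonresponsive where the "search" simply marks everything nonresponsive. The function names and the 100-document counts are assumptions for illustration.

```python
from fractions import Fraction


def accuracy(tp: int, fp: int, fn: int, tn: int) -> Fraction:
    """Correctly coded documents divided by all documents."""
    return Fraction(tp + tn, tp + fp + fn + tn)


def error(tp: int, fp: int, fn: int, tn: int) -> Fraction:
    """Incorrectly coded documents divided by all documents."""
    return Fraction(fp + fn, tp + fp + fn + tn)


# Ten-document example from this post.
example = dict(tp=3, fp=1, fn=2, tn=4)
print(accuracy(**example), error(**example))  # 7/10 3/10

# Hypothetical low-richness set: 5 responsive, 95 nonresponsive,
# and a lazy "search" that retrieves nothing at all.
lazy = dict(tp=0, fp=0, fn=5, tn=95)
print(accuracy(**lazy))                         # 19/20, i.e. 95 percent
print(Fraction(lazy["tp"], lazy["tp"] + lazy["fn"]))  # recall is 0
```

The lazy search scores 95 percent on Accuracy while finding none of the responsive documents, which is the reason to read Accuracy and Error alongside Recall.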

 

FalseAlarm   

FALSE ALARM RATE is the percentage of retrieved documents that are nonresponsive.

1/4=.25

This metric does not care about the null set.

 

Search Metrics Bunny Slope

This series of posts explaining search metrics was inspired by a previous post about precision and recall in information seeking. This post, the bunny slope, goes to perhaps exhausting lengths to describe how the contingency table is built. If you are already familiar with true positives, true negatives, and the difference between responsiveness and a search result, stop reading now and get over to the page with a more advanced treatment of things like prevalence, specificity, and negative predictive value.


Now you are ready for some Search Metrics