1.866 LUCIDATA (582.4328) | info@lucidatainc.com
Leave nothing uncovered. Leave nothing to chance.

Your Data and Heartbleed

With the news about Heartbleed making the rounds last week, we at LuciData would like to take this opportunity to assure our customers that no data was compromised due to Heartbleed. While we have multiple client-facing services, none of them is secured with a vulnerable version of OpenSSL.

We would also like to take this time to note a few things that anyone can do to maintain their security online. The first is to use strong passwords, though in this case even a strong password may have been exposed by the Heartbleed vulnerability. Another important key to online security is avoiding password reuse. If you use the same strong password on every website and one of those sites has a security flaw, your accounts on the other sites are at risk too.

To reduce the risk of all of your online accounts being compromised at once, we recommend using a password manager. Most password managers can generate strong passwords for you and make it easy to keep track of them all; you only need to remember the one “master” password. Password managers also make it easier to change passwords, which should be done on a regular basis.
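Generated passwords of this kind are normally built from a cryptographically secure random source. As a rough illustration only (not the method of any particular password manager), Python's standard `secrets` module can produce a distinct random password for each site; the 20-character default and the example site names are arbitrary assumptions.

```python
import secrets
import string

# Pool of characters to draw from: upper- and lowercase letters, digits, punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password built with the cryptographically secure secrets module."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # One independent password per site, so a breach at one site reveals nothing about the others.
    for site in ["bank.example", "email.example", "shop.example"]:
        print(site, generate_password())
```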

Google Search for Lawyers (aka Google Hacks)

The two Google search hacks that I use most often are:

site: and filetype:

The site: hack limits the Google search to a specific site. For example, try

forms site:law.umn.edu

Note that you do not need a specific site; you can limit the search to a top-level domain:

forms site:.edu

The filetype: hack limits the results to… wait for it… a specified file type. Try

forms site:law.umn.edu filetype:pdf


But there are more; check these out:

Quotes. This is obvious, but enclose phrases in quotes.

Plus and Minus. Use + and - to include or exclude specific words or phrases.

Tilde. Use ~ to search for a word and its synonyms.

Wildcard. Use * as a placeholder for any word.

Two periods. Use .. between two numbers to search a range of values: dates, prices, whatever.

Or. Use OR (in capital letters) or a pipe (|) between words to find results containing either one. The pipe is Shift+Backslash, usually found just below the Backspace key.
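These operators can be combined in a single query. A minimal sketch, assuming a hypothetical helper `build_query` that simply concatenates operator strings, plus the `q` parameter of Google's standard search URL:

```python
from urllib.parse import urlencode


def build_query(terms, phrase=None, exclude=(), site=None, filetype=None, either=()):
    """Assemble a Google query string from the operators described above (hypothetical helper)."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')                # Quotes: exact phrase
    if either:
        parts.append(" OR ".join(either))          # OR between alternatives
    parts.extend(f"-{word}" for word in exclude)   # Minus: exclude a word
    if site:
        parts.append(f"site:{site}")               # site: limit to a site or domain
    if filetype:
        parts.append(f"filetype:{filetype}")       # filetype: limit to a file type
    return " ".join(parts)


def search_url(query):
    # The query goes in the q parameter; urlencode escapes spaces and colons.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query(["forms"], site="law.umn.edu", filetype="pdf")
    print(q)  # forms site:law.umn.edu filetype:pdf
    print(search_url(q))
```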

There are tons more. Do a Google search for “google hacks” or check out sites like http://jwebnet.net/advancedgooglesearch.html.

Digital Evidence Preservation

When it comes to potential litigation, few things are as critical to a case as the proper forensic preservation of evidence.

Removing the potential for spoliation is the driving force behind digital evidence preservation practices. The lifespan of data stored on computers and smartphones is not always immediately clear, so it’s important to preserve the data as soon as possible. When your client becomes a party to litigation, they may also have an instinct to do ill-advised things with their data.

Despite a court-issued preservation order, some clients will immediately delete the data, often leaving behind telltale clues for a forensic examiner. Others will back up the data somewhere else before deleting it, thinking it can be kept away from prying eyes, when in reality that process may create additional evidence.

Even continued regular use of a device can lead to evidence spoliation. For example, a client might clear their internet history on a regular basis, or run a virus scan that deletes any files it flags. Some cell phone models store only a limited number of call records or text messages and overwrite older entries as the device is used.

It’s not easy to explain away this behavior when there was a well-established duty to preserve the data on those devices. Remember that the longer you wait, the more likely it is that data will be lost.

LuciData is capable of preserving data on computers, smartphones, tablets, and any other device which may contain important information.  Preservation is the most important step if the data in question may be of use.  Even if you never need us to perform any forensic analysis of the data, it’s better to get it properly collected right away rather than to leave it to chance.


FALLOUT measures how quickly PRECISION drops as RECALL increases.

If I want to increase my recall, I need to get more black fish in the bowl. So I adjust my search. I get maximum recall by leaving no black fish behind. So maybe I set my search so that EVERY fish is caught. 100% RECALL. But what about PRECISION?

Remember that PRECISION is the percentage of retrieved documents that are responsive. So in this example PRECISION dropped to 50%.
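The fish bowl arithmetic can be checked with a short sketch. It assumes the same ten-document population used in the metric definitions (five black fish, five white); the tank in the original picture may hold a different number of fish.

```python
# Assumed tank: ten fish (documents), five black (responsive), five white (nonresponsive).
tank = ["black"] * 5 + ["white"] * 5


def recall_and_precision(tank, bowl):
    """Recall and precision for a bowl (search result) drawn from the tank."""
    responsive_total = tank.count("black")
    responsive_caught = bowl.count("black")
    recall = responsive_caught / responsive_total
    precision = responsive_caught / len(bowl)
    return recall, precision


# Widen the search until every fish is caught.
bowl = list(tank)
print(recall_and_precision(tank, bowl))  # (1.0, 0.5)
```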





Search Metrics – visual guide to precision, recall, elusion, fallout, negative predictive value, prevalence, specificity, false negative rate, accuracy, error, and false alarm rate

If you are not sure what these letters mean and how they are derived, please take a look at the intro post, which is a primer on contingency tables. If you understand contingency tables but want to get on the same page regarding the quadrants and the letter codes used here, open the intermediate contingency table post.


RECALL is the percentage of responsive documents that the search found: the number of responsive documents in the search results divided by the number of responsive documents in the population. The population contains five responsive documents; the search found three of them.

3/5 = .6



PRECISION is the percentage of retrieved documents that are responsive: the number of responsive documents in the search results divided by the total number of documents in the search results. The search retrieved four documents; three of them are responsive.

3/4 = .75



ELUSION is the percentage of unretrieved documents that are responsive and should have been retrieved, or the proportion of predicted negatives that are incorrect. The search left six unretrieved documents; two of them are responsive.

2/6 = .33


Instead of counting the responsive documents that we found, we count the ones left behind. (H. L. Roitblat, Measurement in eDiscovery)



FALLOUT is the percentage of nonresponsive documents that the search retrieved. The population has five nonresponsive documents; the search incorrectly retrieved one of them.

1/5 = .2


FALLOUT measures how quickly PRECISION drops as RECALL increases.
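Fallout can be watched rising as a search widens. A sketch, assuming a hypothetical relevance ranking of the same ten documents (the ordering is invented for illustration), prints recall, precision, and fallout at each retrieval cutoff; at a cutoff of four it reproduces the example's recall of .6, precision of .75, and one nonresponsive document out of five retrieved.

```python
# Hypothetical ranking of the ten example documents, best match first (1 = responsive).
ranked = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
TOTAL_RESPONSIVE = sum(ranked)                        # C = 5
TOTAL_NONRESPONSIVE = len(ranked) - TOTAL_RESPONSIVE  # F = 5


def metrics_at_cutoff(k):
    """Recall, precision, and fallout if the search retrieves the top k documents."""
    retrieved = ranked[:k]
    true_pos = sum(retrieved)   # A: responsive documents retrieved
    false_pos = k - true_pos    # D: nonresponsive documents retrieved
    return (true_pos / TOTAL_RESPONSIVE, true_pos / k, false_pos / TOTAL_NONRESPONSIVE)


for k in range(1, len(ranked) + 1):
    recall, precision, fallout = metrics_at_cutoff(k)
    print(f"top {k:2d}: recall={recall:.2f} precision={precision:.2f} fallout={fallout:.2f}")
```

As the cutoff widens, fallout never decreases, while precision trends down toward the 50% reached when every document is retrieved.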



NEGATIVE PREDICTIVE VALUE (NPV) is the percentage of unretrieved documents that are in fact nonresponsive. The search left six unretrieved documents; of these, four are nonresponsive.

4/6 = .67


NPV is also equal to 100% minus ELUSION.
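ELUSION and NPV are both computed over the null set (H, the unretrieved documents), so together they account for all of it. A small check using the example's counts, with exact fractions to avoid rounding:

```python
from fractions import Fraction

# Worked example: six unretrieved documents (H), two responsive (B, false negatives)
# and four nonresponsive (E, true negatives).
B, E = 2, 4
H = B + E

elusion = Fraction(B, H)
npv = Fraction(E, H)

print(elusion, npv)        # 1/3 2/3
assert npv == 1 - elusion  # NPV is 100% minus ELUSION
```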


PREVALENCE is the percentage of all documents that are truly responsive. The population has ten documents; five are responsive.

5/10 = .5


NOTICE: This metric does not care about search results.
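Because prevalence uses only the true responsive total (C) and the corpus (I), any search over the same corpus gives the same value. A sketch comparing the worked example with a hypothetical broader search that retrieves everything; precision changes, prevalence does not:

```python
def prevalence(a, b, d, e):
    """(A + B) / I: truly responsive documents over the whole corpus."""
    return (a + b) / (a + b + d + e)


def precision(a, d):
    """A / G: responsive documents among those retrieved."""
    return a / (a + d)


# The worked example's search: A=3, B=2, D=1, E=4.
# A hypothetical search that retrieves everything: A=5, B=0, D=5, E=0.
print(prevalence(3, 2, 1, 4), prevalence(5, 0, 5, 0))  # 0.5 0.5
print(precision(3, 1), precision(5, 5))                # 0.75 0.5
```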



SPECIFICITY is the percentage of truly nonresponsive documents that are correctly identified as nonresponsive. The population has five nonresponsive documents; four were correctly identified.

4/5 = .8




FALSE NEGATIVE RATE is the percentage of responsive documents that the search missed. The population has five responsive documents; the search missed two of them.

2/5 = .4


False Negative Rate plus Recall = 100% (remember that Recall is also known as the True Positive Rate).



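ACCURACY is the percentage of all documents that the search classified correctly: responsive documents it retrieved plus nonresponsive documents it left behind. The search correctly retrieved three responsive documents and correctly left behind four nonresponsive ones.

7/10 = .7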


Accuracy is 100% minus Error.

In data sets with very high prevalence (richness), or very low prevalence, Accuracy is a poor measure. Consider a set in which 95 percent of the documents are nonresponsive: marking everything nonresponsive achieves 95 percent accuracy while finding no responsive documents at all.
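The trap is easy to reproduce. A minimal sketch, assuming a hypothetical 100-document set in which 95 documents are nonresponsive and a "search" that retrieves nothing:

```python
# Hypothetical corpus: 5 responsive (1) and 95 nonresponsive (0) documents.
labels = [1] * 5 + [0] * 95

# Marking everything nonresponsive means nothing is retrieved.
retrieved = [False] * len(labels)

# A document is classified correctly when "retrieved" matches "responsive".
correct = sum(1 for is_resp, hit in zip(labels, retrieved) if bool(is_resp) == hit)
accuracy = correct / len(labels)

found = sum(1 for is_resp, hit in zip(labels, retrieved) if is_resp and hit)
recall = found / sum(labels)

print(f"accuracy={accuracy:.0%} recall={recall:.0%}")  # accuracy=95% recall=0%
```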



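ERROR is the percentage of documents that are incorrectly coded. The search left behind two responsive documents and retrieved one nonresponsive document.

3/10 = .3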


Error can also be calculated as 100% minus Accuracy.

The warning regarding extremes of prevalence or richness applies to Error as well. The utility of Error as a search metric goes down as richness gets extremely high or low.



FALSE ALARM RATE is the percentage of retrieved documents that are nonresponsive. The search retrieved four documents; one of them is nonresponsive.

1/4 = .25


This metric does not care about the null set.
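Under this definition the false alarm rate depends only on the retrieved cells, A and D; the null set cells B and E never enter the calculation. Since A and D together make up the whole search result, it is also 100% minus Precision. A sketch with the example's counts:

```python
def false_alarm_rate(a, d):
    """D / G: nonresponsive documents among those the search retrieved."""
    return d / (a + d)


def precision(a, d):
    """A / G: responsive documents among those the search retrieved."""
    return a / (a + d)


# Worked example: four documents retrieved, three responsive (A), one not (D).
a, d = 3, 1
print(false_alarm_rate(a, d))  # 0.25
assert false_alarm_rate(a, d) + precision(a, d) == 1.0
```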


Search Metrics Bunny Slope

This series of posts explaining search metrics was inspired by a previous post about precision and recall in information seeking. This post, the bunny slope, goes to perhaps exhausting lengths to describe how the contingency table is built. If you are already familiar with true positives, true negatives, and the difference between responsiveness and a search result, stop reading now and get over to the page with a more advanced treatment of things like prevalence, specificity, and negative predictive value.


Now you are ready for some Search Metrics 

Intermediate Search Metrics

This tank is the document population. Fish are documents. White fish are nonresponsive, black fish are responsive.

The bowl represents the search results.


After searching, you get some nonresponsive fish in the bowl and you leave some responsive fish behind.

The documents can be identified and grouped into nine categories, A through I.


The nine categories are plotted on a Contingency Table:


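A Correct Positive (responsive, retrieved)
B False Negative (responsive, not retrieved)
C True Responsive (A + B)
D False Positive (nonresponsive, retrieved)
E True Negative (nonresponsive, not retrieved)
F True Nonresponsive (D + E)
G Search Responsive (A + D)
H Search Nonresponsive (B + E)
I Corpus (all documents)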
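The nine cells can be tallied directly from per-document labels. A minimal sketch, assuming each document is recorded as a hypothetical (responsive, retrieved) pair of booleans; the four inner cells are counted and the totals derived from them:

```python
from collections import Counter


def contingency_table(docs):
    """Count cells A-I from (responsive, retrieved) pairs, one pair per document."""
    cells = Counter()
    for responsive, retrieved in docs:
        if responsive and retrieved:
            cells["A"] += 1  # correct positive
        elif responsive:
            cells["B"] += 1  # false negative
        elif retrieved:
            cells["D"] += 1  # false positive
        else:
            cells["E"] += 1  # true negative
    a, b, d, e = cells["A"], cells["B"], cells["D"], cells["E"]
    return {
        "A": a, "B": b, "C": a + b,                  # truly responsive row
        "D": d, "E": e, "F": d + e,                  # truly nonresponsive row
        "G": a + d, "H": b + e, "I": a + b + d + e,  # totals by search result
    }


# The ten-document worked example: 3 responsive retrieved, 2 responsive missed,
# 1 nonresponsive retrieved, 4 nonresponsive left behind.
docs = [(True, True)] * 3 + [(True, False)] * 2 + [(False, True)] + [(False, False)] * 4
print(contingency_table(docs))
# {'A': 3, 'B': 2, 'C': 5, 'D': 1, 'E': 4, 'F': 5, 'G': 4, 'H': 6, 'I': 10}
```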

From this Contingency Table we calculate Search Metrics.
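Each search metric is a ratio of two cells. A sketch, assuming the usual layout in which C = A + B, F = D + E, G = A + D, H = B + E, and I is the whole corpus, and using this series' own definition of false alarm rate (the nonresponsive share of the retrieved set):

```python
import math


def search_metrics(t):
    """Compute each search metric as a ratio of contingency-table cells A-I."""
    return {
        "recall": t["A"] / t["C"],
        "precision": t["A"] / t["G"],
        "elusion": t["B"] / t["H"],
        "fallout": t["D"] / t["F"],
        "negative_predictive_value": t["E"] / t["H"],
        "prevalence": t["C"] / t["I"],
        "specificity": t["E"] / t["F"],
        "false_negative_rate": t["B"] / t["C"],
        "accuracy": (t["A"] + t["E"]) / t["I"],
        "error": (t["B"] + t["D"]) / t["I"],
        "false_alarm_rate": t["D"] / t["G"],
    }


# Cells from the ten-document worked example.
table = {"A": 3, "B": 2, "C": 5, "D": 1, "E": 4, "F": 5, "G": 4, "H": 6, "I": 10}
metrics = search_metrics(table)

for name, value in metrics.items():
    print(f"{name:>26}: {value:.2f}")

# Complementary pairs share a denominator, so each pair sums to 100%.
assert math.isclose(metrics["negative_predictive_value"], 1 - metrics["elusion"])
assert math.isclose(metrics["false_negative_rate"], 1 - metrics["recall"])
assert math.isclose(metrics["specificity"], 1 - metrics["fallout"])
assert math.isclose(metrics["accuracy"], 1 - metrics["error"])
assert math.isclose(metrics["false_alarm_rate"], 1 - metrics["precision"])
```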

If you are not ready to go on, get introductory information regarding calculating the quadrants and populating a contingency table.

Ewald v. Royal Norwegian Embassy – read your 26(f) agreement!

Ellen S. Ewald v. Royal Norwegian Embassy. A United States District Judge largely affirmed a magistrate judge's order regarding the plaintiff's various discovery motions: “After reviewing the Magistrate Judge’s Order for clear error, the Court, in large part, affirms the Order. The Court respectfully reverses the Order as it relates to discovery of text messages and voice messages contained on certain mobile devices.” That is nowhere near what the plaintiff asked for.

Plaintiff brought a motion to compel:
inspection of phones and laptops
information about how discovery was conducted
designation of documents according to request
unredacted versions of redacted documents
various studies related to discrimination and bullying

Five requests. Five denials.

The agreements. The 26(f) report contains the following item: “How the parties propose handling any issues relating to the discovery of electronically stored information, including the form or forms in which it should be produced:” followed by: Produce TIF format and to follow Exhibit A; meet and confer to vet search terms before involving the Court. So the 26(f) report only mentions TIFs and a note about vetting search terms, yet it references the Form of Production Agreement (FOPA) as the source of guidance for ANY discovery issue. The FOPA purports to address handling of any issue related to discovery, but it contains information only about TIFs, Bates numbers, OCR, load files, metadata, delimiters, and deduplication. A Diplomatic Note provides a limited waiver of the Vienna Convention (inviolability of documents) for this litigation. The Court urged the parties to negotiate and address ESI issues in the FOPA under their obligations in drafting and formulating a Rule 26(f) report.

Negotiate: Rule 34 is not a gap filler. The judge finds that the negotiated production format overrides Rule 34; the defendant is not required to provide designations.

Think. The 26(f) report did not identify sources of ESI or how ESI would be collected. The plaintiff cannot now request laptops, memory cards, and tablets.

BS: Be Specific. Plaintiff's document requests did not specifically request text messages or voicemails and failed to include text messages in the definition of document. The judge denied the request for phones (reversed in part).

Proportionality, Fed. R. Civ. P. 26(b)(2)(C)(iii). The vague 26(f) report and inadequate FOPA left the plaintiff with no basis for additional discovery. Plus, this single-plaintiff employment case has already cost both parties $750,000 each. Denied.

If the judge urges you to address discovery issues in a 26(f) report, do it. Plaintiffs, I am talking to you.

“It is not a surprise to any of the parties in this case that there were tablets, text messages, cell phones, and laptops involved. All of these devices were known prior to the initiation of litigation, and it is common knowledge that ESI is contained on all of these devices. Nevertheless, they were not included in any initial discovery discussions. To go back and engage in discovery with respect to those devices at this stage in the litigation and in light of the expenses and costs the parties have already incurred is simply not feasible.”