detector makes detecting data containing Personally Identifiable Information (PII) quick, easy, and scalable. It provides high-level functions that can take vectors and data.frames and return important summary statistics in a convenient data.frame. Once complete, detector will be able to detect the following types of PII:

  • Full name
  • Home address
  • E-mail address
  • National identification number
  • Passport number
  • Social Security number
  • IP address
  • Vehicle registration plate number
  • Driver’s license number
  • Credit card number
  • Date of birth
  • Birthplace
  • Telephone number
  • Latitude and longitude

State of the Union

Complete!

  • E-mail address
  • Telephone number
  • National identification number

Needs more work…

  • Credit card number

Haven’t even started :(

  • Full name
  • Date of birth
  • Home address
  • IP address
  • Vehicle registration plate number
  • Driver’s license number
  • Birthplace
  • Latitude and longitude

Installation

You can install the latest released version from CRAN:

install.packages("detector")

Or the latest development version from GitHub with:

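# Install devtools if it is missing or older than 1.6
if (!requireNamespace("devtools", quietly = TRUE) ||
    packageVersion("devtools") < "1.6") {
  install.packages("devtools")
}
devtools::install_github("paulhendricks/detector")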

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

API

Generate data containing fake PII

library(dplyr, warn.conflicts = FALSE)
library(generator)
n <- 6
set.seed(1)
ashley_madison <- 
  data.frame(name = r_full_names(n), 
             snn = r_national_identification_numbers(n), 
             dob = r_date_of_births(n), 
             email = r_email_addresses(n), 
             ip = r_ipv4_addresses(n), 
             phone = r_phone_numbers(n), 
             credit_card = r_credit_card_numbers(n), 
             lat = r_latitudes(n), 
             lon = r_longitudes(n), 
             stringsAsFactors = FALSE)
knitr::kable(ashley_madison, format = "markdown")
|name                  |snn         |dob        |email                  |ip              |phone      |credit_card         |         lat|         lon|
|:---------------------|:-----------|:----------|:----------------------|:---------------|:----------|:-------------------|-----------:|-----------:|
|Eldridge Pfannerstill |442-34-5338 |1993-04-28 |ntakqojv@lgbcyk.rkv    |45.84.71.225    |6794976958 |4125-7204-9193-5140 |  -2.7018575|    8.634988|
|Augustine Homenick    |799-44-6396 |1912-09-08 |iqg@mtcuh.viy          |191.116.55.106  |3275827694 |2182-5994-2283-9486 | -70.4148630|  -65.827918|
|Jennie Runte          |941-11-5441 |1985-01-12 |wjszy@sjhreocvt.gbp    |27.128.73.17    |7419351735 |4370-4866-4735-7857 | -45.4091701|  -79.932229|
|Araceli Kunde         |290-44-2675 |1948-04-28 |uljsnvhfr@qfdkumtn.jkd |221.47.229.86   |3243246285 |6682-5074-2898-9396 |  -0.2673845|  103.514583|
|Josue Rau             |686-88-8446 |1996-06-14 |c@lqxzkdpi.nfy         |157.136.114.185 |9169736873 |4510-3757-4858-5236 | -22.8839925|   72.886505|
|Elnora Zemlak         |212-40-7016 |1976-01-09 |capvnl@nympzf.gsk      |143.20.199.87   |3295843196 |7206-6205-2194-6432 |  78.2444466| -120.590050|

Detect data containing PII

library(detector)
ashley_madison %>% 
  detect %>% 
  knitr::kable(format = "markdown")
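|column_name |has_email_addresses |has_phone_numbers |has_national_identification_numbers |
|:-----------|:-------------------|:-----------------|:-----------------------------------|
|name        |FALSE               |FALSE             |FALSE                               |
|snn         |FALSE               |FALSE             |TRUE                                |
|dob         |FALSE               |FALSE             |FALSE                               |
|email       |TRUE                |FALSE             |FALSE                               |
|ip          |FALSE               |FALSE             |FALSE                               |
|phone       |FALSE               |TRUE              |FALSE                               |
|credit_card |FALSE               |FALSE             |FALSE                               |
|lat         |FALSE               |TRUE              |FALSE                               |
|lon         |FALSE               |TRUE              |FALSE                               |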
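
To make the shape of that summary concrete, here is a minimal base R sketch of column-wise detection. It is not detector's implementation: the three patterns and the helper names has_match and sketch_detect are hypothetical, chosen only to illustrate how one row of logical flags per column could be produced.

```r
# Hypothetical, simplified patterns; NOT the rules detector itself uses.
email_pattern <- "^[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$"
phone_pattern <- "^[0-9]{10}$"
ssn_pattern   <- "^[0-9]{3}-[0-9]{2}-[0-9]{4}$"

# TRUE if at least one value in x matches the pattern (NA never matches).
has_match <- function(x, pattern) {
  any(grepl(pattern, as.character(x)))
}

# Summarise a data.frame: one row per column, one logical flag per PII type.
sketch_detect <- function(df) {
  flag <- function(pattern) {
    unname(vapply(df, has_match, logical(1), pattern = pattern))
  }
  data.frame(
    column_name = names(df),
    has_email = flag(email_pattern),
    has_phone = flag(phone_pattern),
    has_ssn = flag(ssn_pattern),
    stringsAsFactors = FALSE
  )
}

# Toy input reusing a few of the fake values generated above.
toy <- data.frame(
  email = c("wjszy@sjhreocvt.gbp", "c@lqxzkdpi.nfy"),
  phone = c("7419351735", "9169736873"),
  snn = c("941-11-5441", "686-88-8446"),
  lat = c(-45.4091701, -22.8839925),
  stringsAsFactors = FALSE
)
result <- sketch_detect(toy)
print(result)
```

Because every pattern in this sketch is anchored with ^ and $, a value is flagged only when the whole string matches, so the numeric lat column is not reported as a phone number here. Real-world phone, e-mail, and identification formats vary far more than these toy patterns allow.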

Citation

To cite package ‘detector’ in publications use:

Paul Hendricks (2015). detector: Detect Data Containing Personally Identifiable Information. R package version 0.1.0. https://CRAN.R-project.org/package=detector

A BibTeX entry for LaTeX users is

@Manual{,
  title = {detector: Detect Data Containing Personally Identifiable Information},
  author = {Paul Hendricks},
  year = {2015},
  note = {R package version 0.1.0},
  url = {https://CRAN.R-project.org/package=detector},
}