Created by: Sara McNamara

Stuck on where to start with data enrichment, cleansing, normalization, and validation? Here's a great place to get ideas from industry standards:

Process | Description | Purpose | Examples
Email Validation | Verifying the format and deliverability of email addresses. | Ensure accuracy and reduce bounce rates in email campaigns. | Removing invalid emails like example@@domain.com; identifying addresses from disposable email domains.
Phone Number Formatting | Standardizing phone numbers to a consistent international format. | Improve usability and reduce errors in communication. | Converting (123) 456-7890 to +1-123-456-7890; ensuring country codes are included.
Address Standardization | Normalizing address data to conform to standardized formats. | Facilitate shipping, geocoding, and deduplication. | Converting 123 Elm St, Apt 4 to 123 Elm Street, Apartment 4, New York, NY 10001, USA.
Duplicate Detection | Identifying and merging duplicate records in a dataset. | Improve data cleanliness and reduce redundancy. | Merging records like John Smith and J. Smith when they refer to the same person (e.g., they share an email address).
Demographic Enrichment | Augmenting records with demographic data such as age, income, or occupation. | Improve segmentation and personalization in marketing campaigns. | Adding Age: 35, Income: $75,000 to a customer profile.
Company Enrichment | Adding firmographic data (e.g., company size, revenue, industry) to business records. | Enhance B2B targeting and account-based marketing efforts. | Adding Industry: Technology, Employees: 500, Revenue: $50M to a company profile.
Data Type Validation | Ensuring data conforms to the expected type (e.g., numeric, date, boolean). | Prevent errors caused by incorrect data types. | Rejecting a value like "twenty" in an Age field that expects a numeric value.
Product Usage/Support Ticket Activity Data | If you sell a SaaS product, pull in user- or company-level activity data such as login activity and sub-product usage. If you use customer support ticketing software, you can also tag ticket-opening activity and account health. | Send targeted enablement emails or flag account health based on product activity (or inactivity); proactively triage customers who may be struggling. | If a company has purchased a suite of my products and is only using one, I might flag that to a CSM and Sales so they can make sure the customer is properly enabled and aware of the capabilities it is not using. If a larger issue is discovered, that could affect account health. If a customer doesn't open any support tickets for a long time, or opens more tickets than the average customer, they may need additional help or be struggling to use the product.
Normalization of Text | Converting text to a consistent format (e.g., trimming whitespace, standardizing case). | Standardize textual data for better matching and processing. | Normalizing NEW YORK to New York.
Date Standardization | Converting dates to a consistent format (e.g., ISO 8601: YYYY-MM-DD). | Facilitate time-based analyses and integrations. | Converting 1/22/25 or 22-Jan-2025 to 2025-01-22.
Geo-Enrichment | Adding geographic metadata (e.g., latitude/longitude, census data) to address records. | Enable location-based analytics and insights (where needed/applicable). | Adding 40.7128° N, 74.0060° W to 123 Elm Street, New York, NY 10001.
Software Install Base | Use an enrichment vendor to populate a field when a company uses a specific software vendor's product. | Easily identify whether a company is using a competitor or a potential cross-sell partner tool. | If a company is using Marketo and I sell a tool that pairs well with Marketo, I pull in that data so I can identify which accounts to focus on. If a company uses Pardot, I might identify it as not a great fit, or I may pitch the company differently.
Missing Data Handling | Imputing or flagging missing values in datasets. | Reduce bias and ensure completeness of analysis. | Filling missing values in an Income column with the average income for that demographic.
Outlier Detection | Identifying and handling data points that deviate significantly from the norm. | Prevent skewed analysis and ensure data accuracy. | Flagging a salary value of $1,000,000 in a dataset where the average is $50,000.
Categorical Mapping | Mapping inconsistent categorical values to a standardized list (e.g., "NYC" -> "New York City"). | Reduce inconsistencies and improve analytical insights. | Standardizing CA and Calif. to California.
Language Detection | Identifying the language of text fields and normalizing or translating them. | Enable multilingual processing and insights. | Detecting Bonjour as French and translating it to Hello (English).
Standardized IDs | Adding or normalizing unique identifiers like customer IDs, UUIDs, or primary keys. | Facilitate deduplication and relational database operations. | Assigning a unique identifier like UUID: 123e4567-e89b-12d3-a456-426614174000 to each customer record.
Custom Field Validation | Validating custom fields against defined business rules (e.g., age >= 18). | Ensure data aligns with business logic and compliance requirements. | Rejecting Age: 15 for a product restricted to customers aged 18 and above.
Data Profiling | Assessing the structure, content, and quality of datasets to uncover anomalies and patterns. | Provide insight into data quality and readiness for use. | Identifying that 20% of Phone Number records are incomplete or invalid.
Consent Validation | Ensuring data complies with regulatory requirements like GDPR or CCPA (e.g., consent flags). | Maintain legal compliance and customer trust. | Flagging records missing consent for email marketing or identifying opt-out requests.
Custom Field Standardization | Take a free-text field like "Job Title" and create an additional "Job Role" picklist field that buckets contacts into broader groups. | Keep accurate, specific user-entered data while still being able to segment by persona. | If Job Title: Marketing Operations Specialist, we can add Job Role: Revenue Operations so we can segment based on the larger Operations umbrella.
Spam Tagging/Exclusion List | Look in the database for obviously fake values, such as placeholder email addresses, or Job Title: Student (where applicable), then flag those records as spam and exclude them from processes. | Save enrichment credits, protect email deliverability, and save sales time by excluding records that are obvious spam. | If a record comes in with an obviously fake email address, I can flag it and exclude it through segmentation in processes.
Job Change Flagging | Many data enrichment vendors can notify you when a person switches jobs or companies. | If a person switches jobs or companies, you may want to create a new record for them or communicate with them differently based on this new context. | If Jim initially worked as VP of Sales at Staples but now works as CRO at Adobe, I might want to create a new contact record for him and tag his old record as "no longer active." A new rep/CSM may be assigned to him as well. This tagging avoids confusion.
Aligning Key Picklists | Ensuring that picklist values on fields like Industry are aligned across systems and data sources. | Ensure that system integrations work properly and avoid confusion and extra work when creating lists and segments. | If Salesforce has Industry: Software as a value, ZoomInfo has Industry: SaaS, and HubSpot has Industry: Tech, but they all mean the same thing to your business, you may want to standardize all of them to Industry: Software to avoid confusion and make automation and segmentation easier to set up.
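
For the Email Validation row in the table above, here is a minimal Python sketch of the format and disposable-domain checks. The regular expression is a deliberately simplified assumption (full RFC 5322 syntax is far more permissive), and `DISPOSABLE_DOMAINS` is a hypothetical blocklist; true deliverability checks such as MX lookups or mailbox verification are usually done by a validation service and are not shown.

```python
import re

# Simplified pattern (an assumption, not a full RFC 5322 validator):
# exactly one "@", no whitespace, and a dot followed by 2+ letters in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

# Hypothetical blocklist; in practice, use a maintained list or a vendor.
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com"}


def classify_email(email: str) -> str:
    """Return 'invalid', 'disposable', or 'ok' for a single address."""
    email = email.strip().lower()
    if not EMAIL_PATTERN.match(email):
        return "invalid"
    domain = email.rsplit("@", 1)[1]
    if domain in DISPOSABLE_DOMAINS:
        return "disposable"
    return "ok"


for address in ["example@@domain.com", "jane@mailinator.com", "jane.doe@acme.com"]:
    print(address, "->", classify_email(address))
```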
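
The Phone Number Formatting row converts (123) 456-7890 to +1-123-456-7890. A minimal sketch, assuming US/Canada numbers only and the hyphenated +1 style used in the table; `format_us_phone` is a hypothetical helper name. International data is usually better handled by a dedicated library such as `phonenumbers` (a Python port of Google's libphonenumber).

```python
import re
from typing import Optional


def format_us_phone(raw: str) -> Optional[str]:
    """Format a North American number as +1-XXX-XXX-XXXX, or return None."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a country code that is already present
    if len(digits) != 10:
        return None  # can't be formatted confidently; flag for review instead
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"


print(format_us_phone("(123) 456-7890"))  # +1-123-456-7890
print(format_us_phone("1.123.456.7890"))  # +1-123-456-7890
print(format_us_phone("456-7890"))        # None
```

Returning `None` rather than a best guess keeps incomplete numbers visible, so they can be counted during profiling instead of silently passing as "formatted."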
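
For the Address Standardization row, here is a minimal sketch of the abbreviation-expansion step only. `ABBREVIATIONS` is a small hypothetical table that follows the spelled-out style of the table's example; postal standards such as USPS Publication 28 actually prefer the short forms (ST, APT), and production address cleanup is normally done with a certified address-verification service. Note that "St" can also mean "Saint," so a real rule set needs context.

```python
import re

# Hypothetical abbreviation table (lowercase keys) for illustration.
ABBREVIATIONS = {
    "st": "Street",
    "ave": "Avenue",
    "rd": "Road",
    "apt": "Apartment",
    "ste": "Suite",
}


def expand_abbreviations(address: str) -> str:
    """Expand whole-word abbreviations; drop a trailing period only when expanded."""

    def replace(match: re.Match) -> str:
        word, period = match.group(1), match.group(2)
        expanded = ABBREVIATIONS.get(word.lower())
        return expanded if expanded else word + period

    return re.sub(r"\b([A-Za-z]+)\b(\.?)", replace, address)


print(expand_abbreviations("123 Elm St, Apt 4"))       # 123 Elm Street, Apartment 4
print(expand_abbreviations("456 Oak Ave., Ste. 200"))  # 456 Oak Avenue, Suite 200
```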
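
The Duplicate Detection row can be sketched as "match, then merge." This minimal version assumes records are dicts with hypothetical `name`, `email`, and `phone` keys, and that a shared normalized email is enough evidence that two records are the same person. The "first record wins, later records fill blanks" survivorship rule is also an assumption; real matching tools typically add fuzzy name comparison (John Smith vs. J. Smith) and a review step before merging.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group records that share a normalized email; return only groups of 2+."""
    groups = defaultdict(list)
    for record in records:
        key = (record.get("email") or "").strip().lower()
        if key:  # records without an email can't be matched by this rule
            groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


def merge_group(group):
    """Merge duplicates: the first record wins; later records only fill blanks."""
    merged = dict(group[0])
    for record in group[1:]:
        for field, value in record.items():
            if not merged.get(field) and value:
                merged[field] = value
    return merged


records = [
    {"name": "John Smith", "email": "jsmith@example.com", "phone": ""},
    {"name": "J. Smith", "email": "JSmith@example.com ", "phone": "+1-123-456-7890"},
    {"name": "Ana Lopez", "email": "ana@example.com", "phone": ""},
]
for group in find_duplicates(records):
    print(merge_group(group))
```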
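
The Data Type Validation and Custom Field Validation rows can run as two checks on the same field: first confirm the value has the expected type, then apply the business rule. A minimal sketch using the table's Age examples; `validate_age` is a hypothetical helper, and the 18+ threshold is passed as a parameter because the right minimum depends on your product.

```python
from typing import List


def validate_age(raw_value, minimum_age: int = 18) -> List[str]:
    """Return validation errors for an Age value (an empty list means valid)."""
    # Type check: the value must parse as a whole number.
    try:
        age = int(str(raw_value).strip())
    except ValueError:
        return [f"Age must be numeric, got {raw_value!r}"]

    # Business rule: the product is restricted to customers at or above the minimum.
    errors = []
    if age < minimum_age:
        errors.append(f"Age {age} is below the minimum of {minimum_age}")
    return errors


print(validate_age("twenty"))  # ["Age must be numeric, got 'twenty'"]
print(validate_age(15))        # ['Age 15 is below the minimum of 18']
print(validate_age("35"))      # []
```

Returning a list of messages rather than a single True/False leaves room to add more rules per field and tells whoever fixes the record what is wrong.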
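
The Normalization of Text and Categorical Mapping rows work well together: normalize the raw value first, then look it up in a mapping table. A minimal sketch; `STATE_MAP` is a hypothetical mapping whose keys are stored already normalized, so "CA", "Calif.", and "calif" all hit the same entry.

```python
# Hypothetical mapping table; keys are already normalized (lowercase,
# no periods, single spaces) so lookups tolerate messy input.
STATE_MAP = {
    "ca": "California",
    "calif": "California",
    "california": "California",
    "ny": "New York",
    "new york": "New York",
}


def normalize_text(value: str) -> str:
    """Drop periods, trim and collapse whitespace, and lowercase."""
    return " ".join(value.replace(".", "").split()).lower()


def map_category(value: str, mapping: dict) -> str:
    """Return the standardized label, or the title-cased input if it is unmapped."""
    key = normalize_text(value)
    return mapping.get(key, key.title())


print(map_category("CA", STATE_MAP))         # California
print(map_category(" Calif. ", STATE_MAP))   # California
print(map_category("NEW  YORK", STATE_MAP))  # New York
```

The title-case fallback is only a convenience; for a strict picklist you may prefer to route unmapped values to a review queue instead.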
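
For the Date Standardization row, one common approach is to try a known list of input formats and emit ISO 8601. A minimal sketch; `KNOWN_FORMATS` is an assumption you would build from your own sources. Slash dates are treated as US month/day here, because a value like 01/02/2025 is ambiguous (January 2 or February 1) and that choice should be made explicitly per source.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats, tried in order. Slash dates are read as US month/day;
# %b matches English month abbreviations under Python's default C locale.
KNOWN_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%d-%b-%Y", "%Y-%m-%d"]


def to_iso_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


print(to_iso_date("1/22/25"))      # 2025-01-22
print(to_iso_date("22-Jan-2025"))  # 2025-01-22
```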
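
The Missing Data Handling row fills a missing Income with the average for that demographic. A minimal standard-library sketch, assuming rows are dicts with hypothetical `segment` and `income` keys and that `None` marks a missing value. It also sets an `income_imputed` flag so filled values stay distinguishable from real ones, which covers the "flagging" half of the description.

```python
from collections import defaultdict
from statistics import mean


def impute_by_group(rows, value_field, group_field):
    """Fill missing values with the mean of known values in the same group.

    Adds a '<value_field>_imputed' flag to every row. Rows whose group has
    no known values are left as None and flagged False.
    """
    known = defaultdict(list)
    for row in rows:
        if row.get(value_field) is not None:
            known[row[group_field]].append(row[value_field])
    group_means = {group: mean(values) for group, values in known.items()}

    flag = f"{value_field}_imputed"
    for row in rows:
        missing = row.get(value_field) is None
        row[flag] = missing and row[group_field] in group_means
        if row[flag]:
            row[value_field] = group_means[row[group_field]]
    return rows


customers = [
    {"segment": "25-34", "income": 60000},
    {"segment": "25-34", "income": 70000},
    {"segment": "25-34", "income": None},
    {"segment": "35-44", "income": None},
]
for row in impute_by_group(customers, "income", "segment"):
    print(row)
```

For skewed fields like income, `statistics.median` is often a safer fill value than the mean, since a few very high incomes pull the mean upward.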
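
For the Outlier Detection row, here is a minimal sketch using Tukey's fences: values more than 1.5 interquartile ranges below the first quartile or above the third are flagged. This is one common technique, swapped in for illustration. A z-score against the mean is the other usual choice, but a single extreme value inflates the standard deviation it is measured against, so quartile-based fences tend to be more robust on small samples. The salary list is made-up sample data.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; needs at least 2 values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


salaries = [42000, 45000, 48000, 50000, 52000, 55000, 58000, 1000000]
print(iqr_outliers(salaries))  # [1000000]
```

Flagged values should usually be reviewed rather than deleted automatically, since a real but unusual value (say, an executive salary) is still valid data.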
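
The Data Profiling row reports findings like "20% of Phone Number records are incomplete or invalid." A minimal sketch of that calculation; the rule in `phone_status` (10 or 11 digits counts as usable) is an assumption for illustration, and `profile_field` accepts any per-value check so the same function can profile other fields.

```python
import re
from collections import Counter


def phone_status(value):
    """Classify one phone value; assumes 10 or 11 digits means usable."""
    if value is None or not str(value).strip():
        return "missing"
    digit_count = len(re.sub(r"\D", "", str(value)))
    return "valid" if digit_count in (10, 11) else "invalid"


def profile_field(values, check):
    """Return the percentage of values falling into each status from `check`."""
    if not values:
        return {}
    counts = Counter(check(v) for v in values)
    return {status: round(100 * n / len(values), 1) for status, n in counts.items()}


phones = ["(123) 456-7890", "+1-123-456-7890", "456-7890", "2125550199", "212-555-0100"]
print(profile_field(phones, phone_status))  # {'valid': 80.0, 'invalid': 20.0}
```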
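
The Custom Field Standardization row derives a Job Role picklist value from free-text Job Title while keeping the title itself untouched. A minimal keyword-based sketch; apart from the Marketing Operations example in the table, the keywords and role names in `ROLE_RULES` are hypothetical. Rule order matters because the first match wins: "operations" is checked before "marketing" so the table's example lands in Revenue Operations.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
ROLE_RULES = [
    ("operations", "Revenue Operations"),
    ("revops", "Revenue Operations"),
    ("sales", "Sales"),
    ("marketing", "Marketing"),
    ("engineer", "Engineering"),
]


def job_role(job_title):
    """Derive a Job Role picklist value from a free-text Job Title."""
    title = (job_title or "").lower()
    for keyword, role in ROLE_RULES:
        if keyword in title:
            return role
    return "Other"  # unmatched titles go to a catch-all bucket for review


for title in ["Marketing Operations Specialist", "VP of Sales", "Marketing Manager", None]:
    print(title, "->", job_role(title))
```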
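
For the Spam Tagging/Exclusion List row, here is a minimal rule-based sketch. All rule sets (`PLACEHOLDER_LOCAL_PARTS`, `PLACEHOLDER_DOMAINS`, `EXCLUDED_TITLES`) are hypothetical examples to tune against your own database, and the Student rule should only apply where students are not a target persona.

```python
import re

# Hypothetical exclusion rules for illustration; tune them to your own data.
PLACEHOLDER_LOCAL_PARTS = {"test", "asdf", "none", "noemail"}
PLACEHOLDER_DOMAINS = {"test.com"}
EXCLUDED_TITLES = {"student"}  # only where students are not a target persona


def spam_reasons(record):
    """Return why a record looks like spam; an empty list means keep it."""
    reasons = []
    email = (record.get("email") or "").strip().lower()
    local, _, domain = email.partition("@")
    if local in PLACEHOLDER_LOCAL_PARTS or domain in PLACEHOLDER_DOMAINS:
        reasons.append("placeholder email")
    if re.fullmatch(r"(.)\1{3,}", local):  # same character 4+ times, e.g. "aaaa"
        reasons.append("repeated-character email")
    if (record.get("job_title") or "").strip().lower() in EXCLUDED_TITLES:
        reasons.append("excluded job title")
    return reasons


leads = [
    {"email": "test@test.com", "job_title": "Manager"},
    {"email": "aaaa@gmail.com", "job_title": "Student"},
    {"email": "jane.doe@acme.com", "job_title": "CMO"},
]
for lead in leads:
    print(lead["email"], spam_reasons(lead))
```

Returning reasons instead of a bare flag makes the exclusion list auditable, which helps when a rule catches a real lead by mistake.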
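
The Aligning Key Picklists row can be handled with a crosswalk keyed by source system, so the same word can mean different things in different tools. A minimal sketch built from the table's Salesforce/ZoomInfo/HubSpot example; `INDUSTRY_CROSSWALK` and `standard_industry` are hypothetical names, and how you apply the result (integration mapping, workflow, or ETL step) depends on your stack.

```python
from typing import Optional

# Hypothetical crosswalk from the example above: each source system's
# (lowercased) Industry value maps to the single value your business uses.
INDUSTRY_CROSSWALK = {
    "salesforce": {"software": "Software"},
    "zoominfo": {"saas": "Software"},
    "hubspot": {"tech": "Software"},
}


def standard_industry(source: str, value: str) -> Optional[str]:
    """Return the shared Industry value, or None if this pair is not mapped yet.

    Returning None (rather than passing the raw value through) keeps unmapped
    values visible so someone can add them to the crosswalk deliberately.
    """
    source_map = INDUSTRY_CROSSWALK.get(source.strip().lower(), {})
    return source_map.get(value.strip().lower())


for source, value in [("Salesforce", "Software"), ("ZoomInfo", "SaaS"), ("HubSpot", "Tech")]:
    print(source, value, "->", standard_industry(source, value))
```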

Potential tools for each use case:

1. Data Validation

Tools for ensuring data accuracy and compliance with expected formats.


2. Data Enrichment

Tools to enhance datasets with additional demographic, firmographic, or geographic data.