Created by: Sara McNamara
Stuck on where to start with data enrichment, cleansing, normalization, and validation? Here's a great place to get ideas from industry standards:
Process | Description | Purpose | Examples |
---|---|---|---|
Email Validation | Verifying the format and deliverability of email addresses. | Ensure accuracy and reduce bounce rates in email campaigns. | Removing invalid emails like example@@domain.com, identifying disposable emails like [email protected]. |
Phone Number Formatting | Standardizing phone numbers to a consistent international format. | Improve usability and reduce errors in communication. | Converting (123) 456-7890 to +1-123-456-7890, ensuring country codes are included. |
Address Standardization | Normalizing address data to conform to standardized formats. | Facilitate shipping, geocoding, and deduplication. | Converting 123 Elm St, Apt 4 to 123 Elm Street, Apartment 4, New York, NY 10001, USA. |
Duplicate Detection | Identifying and merging duplicate records in a dataset. | Improve data cleanliness and reduce redundancies. | Merging two records like John Smith, [email protected] and J. Smith, [email protected]. |
Demographic Enrichment | Augmenting records with demographic data like age, income, or occupation. | Improve segmentation and personalization in marketing campaigns. | Adding Age: 35, Income: $75,000 to a customer profile. |
Company Enrichment | Adding firmographic data (e.g., company size, revenue, industry) to business records. | Enhance B2B targeting and account-based marketing efforts. | Adding Industry: Technology, Employees: 500, Revenue: $50M to a company profile. |
Data Type Validation | Ensuring data conforms to the expected type (e.g., numeric, date, boolean). | Prevent errors caused by incorrect data types. | Rejecting a value like twenty in an Age field expecting a numeric value. |
Product Usage/Support Ticket Activity Data | If you sell a SaaS product, pull in user or company activity data such as login activity and sub-product activity. If I use Customer Support Ticketing software, I can tag ticket-opening activity and health. | Send targeted enablement emails or flag account health based on product activity (or inactivity). Proactively triage customers who may be struggling. | If a company has purchased a suite of my products and is only using one, I might flag that to a CSM and Sales to ensure that they are enabled properly and aware of what they can do with the capabilities they are ignoring. If a larger issue is discovered, that could impact account health. If a customer doesn't open any support tickets for a long time or opens more tickets than an average customer, they may need additional help or be struggling to use the product. |
Normalization of Text | Converting text to a consistent format (e.g., trimming whitespace, converting to a consistent case). | Standardize textual data for better matching and processing. | Normalizing NEW YORK to New York. |
Date Standardization | Converting dates to a consistent format (e.g., ISO 8601: YYYY-MM-DD). | Facilitate time-based analyses and integrations. | Converting 1/22/25 or 22-Jan-2025 to 2025-01-22. |
Geo-Enrichment | Adding geographic metadata (e.g., latitude/longitude, census data) to address records. | Enable location-based analytics and insights (where needed/applicable). | Adding 40.7128° N, 74.0060° W to 123 Elm Street, New York, NY 10001. |
Software Install Base | Use an enrichment vendor to populate a field when a company uses a specific software vendor. | Easily identify if a company is using a competitor or a potential cross-sell partner tool. | If a company is using Marketo and I sell a tool that pairs well with Marketo, I pull in that data so I can identify which accounts to focus on. If a company uses Pardot, I might identify them as not a great fit or I may pitch the company differently. |
Missing Data Handling | Imputing or flagging missing values in datasets. | Reduce bias and ensure completeness of analysis. | Filling missing values in an Income column with the average income for that demographic. |
Outlier Detection | Identifying and handling data points that significantly deviate from the norm. | Prevent skewed analysis and ensure data accuracy. | Flagging a salary value of $1,000,000 in a dataset where the average is $50,000. |
Categorical Mapping | Mapping inconsistent categorical values to a standardized list (e.g., "NYC" -> "New York City"). | Reduce inconsistencies and improve analytical insights. | Standardizing CA and Calif. to California. |
Language Detection | Identifying the language of text fields and normalizing or translating them. | Enable multilingual processing and insights. | Detecting and translating Bonjour (French) to Hello (English). |
Standardized IDs | Adding or normalizing unique identifiers like customer IDs, UUIDs, or primary keys. | Facilitate deduplication and relational database operations. | Assigning a unique identifier like UUID: 123e4567-e89b-12d3-a456-426614174000 to each customer record. |
Custom Field Validation | Validating custom fields against defined business rules (e.g., age > 18). | Ensure data aligns with business logic and compliance requirements. | Rejecting Age: 15 for a product restricted to customers aged 18 and above. |
Data Profiling | Assessing the structure, content, and quality of datasets to uncover anomalies and patterns. | Provide insights into data quality and readiness for use. | Identifying that 20% of Phone Number records are incomplete or invalid. |
Consent Validation | Ensuring data complies with regulatory requirements like GDPR or CCPA (e.g., consent flags). | Maintain legal compliance and customer trust. | Flagging records missing consent for email marketing or identifying opt-out requests. |
Custom Field Standardization | Take fields like "Job Title" and create an additional field "Job Role" to enable a picklist of buckets to put contacts into. | Keep accurate and specific user-entered data, while also being able to segment personas. | If Job Title: Marketing Operations Specialist, we can add a Job Role field that says Job Role: Revenue Operations so we can segment based on the larger Operations umbrella. |
Spam Tagging/Exclusion List | Look in the database for things like email: [email protected] or Job Title: Student (where applicable) to flag those records as spam and exclude them from processes. | Save enrichment credits, protect email deliverability, and save sales time by excluding records that are obvious spam. | If a record comes in with email address: [email protected], I can flag and exclude it through segmentation in processes. |
Job Change Flagging | Many data enrichment vendors can notify you when a person switches jobs or companies. | If a person switches jobs or companies, you may want to create a new record for them or communicate with them differently, based on this new context. | If Jim initially worked as a VP of Sales at Staples but now works as a CRO at Adobe, I might want to create a new contact record for him and tag his old record as "no longer active." A new rep/CSM may be assigned to him as well. This tagging will avoid confusion. |
Aligning Key Picklists | Ensuring that picklist values on fields like Industry are aligned across systems and data sources. | This ensures that the system integrations are working properly and avoids confusion/extra work when creating lists and segments. | If Salesforce has Industry: Software as a value, ZoomInfo has Industry: SaaS as a value, and HubSpot has Industry: Tech as a value, but they all mean the same thing to your business, you may want to standardize on Industry: Software to avoid confusion and make automation/segmentation easier to set up. |
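
To make a few of the normalization rows above concrete, here is a minimal Python sketch of phone formatting, date standardization, text normalization, and categorical mapping. It is an illustration under stated assumptions, not a reference implementation: the function names, the `STATE_ALIASES` mapping, and the list of accepted date formats are all hypothetical, phone numbers are assumed to be 10-digit North American numbers, and ambiguous dates such as 1/22/25 are assumed to be month-first.

```python
import re
from datetime import datetime

# Hypothetical alias table; extend it with the variants found in your own data.
STATE_ALIASES = {"ca": "California", "calif.": "California"}

# Assumed input formats. "%b" (Jan, Feb, ...) depends on an English locale.
DATE_FORMATS = ("%m/%d/%y", "%d-%b-%Y", "%Y-%m-%d")


def normalize_phone(raw: str) -> str:
    """Format a North American number as +1-123-456-7890."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop an existing country code
    if len(digits) != 10:
        raise ValueError(f"cannot normalize phone number: {raw!r}")
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_date(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each assumed format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def normalize_text(raw: str) -> str:
    """Collapse runs of whitespace and convert to title case."""
    return " ".join(raw.split()).title()


def map_state(raw: str) -> str:
    """Map known aliases to a standard value; pass unknown values through."""
    cleaned = raw.strip()
    return STATE_ALIASES.get(cleaned.lower(), cleaned)
```

Raising on unparseable input (rather than guessing) keeps bad values visible so they can be routed to a review queue instead of being silently rewritten.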
Tools for ensuring data accuracy and compliance with expected formats.
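
As a rough illustration of the Email Validation, Data Type Validation, and Custom Field Validation rows in the table, the sketch below checks format only. It does not check deliverability, which typically needs a verification service. The regular expression is deliberately loose, and `DISPOSABLE_DOMAINS`, the function names, and the minimum age of 18 are assumptions chosen for the example.

```python
import re

# Deliberately loose check: exactly one "@", no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical blocklist; in practice this would come from a maintained source.
DISPOSABLE_DOMAINS = {"mailinator.com"}


def is_valid_email_format(email: str) -> bool:
    """Format check only; says nothing about whether the mailbox exists."""
    return EMAIL_PATTERN.fullmatch(email.strip()) is not None


def is_disposable(email: str) -> bool:
    """True when the domain is on the (assumed) disposable-domain list."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in DISPOSABLE_DOMAINS


def parse_age(value) -> int:
    """Data type validation: accept only whole numbers, reject text like 'twenty'."""
    try:
        return int(str(value).strip())
    except ValueError:
        raise ValueError(f"Age must be a whole number, got {value!r}") from None


def validate_age(value, minimum: int = 18) -> int:
    """Custom field validation: apply a business rule after the type check."""
    age = parse_age(value)
    if age < minimum:
        raise ValueError(f"Age {age} is below the required minimum of {minimum}")
    return age
```

Keeping the type check (`parse_age`) separate from the business rule (`validate_age`) makes it easier to report which kind of failure occurred for a given record.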
Tools to enhance datasets with additional demographic, firmographic, or geographic data.
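
Before sending records to an enrichment tool, the Spam Tagging/Exclusion List and Custom Field Standardization rows in the table suggest filtering obvious spam (to save credits) and deriving a Job Role bucket from the free-text Job Title. The sketch below shows one possible way to do that. The keyword buckets, the spam lists, the `test.com` domain, and the record field names (`email`, `job_title`, `job_role`) are all hypothetical and would need to reflect your own data.

```python
# Hypothetical keyword buckets, checked in order, so "Marketing Operations
# Specialist" lands in Revenue Operations before the Marketing bucket is tried.
ROLE_KEYWORDS = {
    "Revenue Operations": ("operations",),
    "Sales": ("sales", "account executive"),
    "Marketing": ("marketing",),
}

# Hypothetical exclusion lists.
SPAM_JOB_TITLES = {"student"}
SPAM_EMAIL_DOMAINS = {"test.com"}


def derive_job_role(job_title: str) -> str:
    """Bucket a free-text title by simple substring matching."""
    title = job_title.lower()
    for role, keywords in ROLE_KEYWORDS.items():
        if any(keyword in title for keyword in keywords):
            return role
    return "Other"


def is_spam(record: dict) -> bool:
    """Flag records whose title or email domain is on an exclusion list."""
    title = (record.get("job_title") or "").strip().lower()
    domain = (record.get("email") or "").rsplit("@", 1)[-1].strip().lower()
    return title in SPAM_JOB_TITLES or domain in SPAM_EMAIL_DOMAINS


def prepare_for_enrichment(records: list) -> list:
    """Drop spam records and add a job_role field, keeping job_title as entered."""
    prepared = []
    for record in records:
        if is_spam(record):
            continue
        role = derive_job_role(record.get("job_title") or "")
        prepared.append({**record, "job_role": role})
    return prepared
```

The original `job_title` value is left untouched, so the specific user-entered data survives alongside the coarser `job_role` used for segmentation.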