101 Data Validation and Cleaning Guide

Yousaf Ishaq
Tech Lead
Published: April 23, 2024
Updated: December 2, 2024

Our conversations with users show that one of the biggest struggles for product and engineering teams in data onboarding is implementing validators, and continually adding new ones, to cover even the most complex customer edge cases.

Validators, data types, and formats vary widely across industries and even between companies, which can easily tie up valuable engineering time and resources. Sooner or later, most companies realize that the number of edge cases to validate across data types is effectively endless. Every customer's data is different, which leaves customer onboarding and product teams with a considerable, ongoing workload of adding and maintaining data validation rules.

Managing customer data imports in a scalable yet flexible way requires pre-defined validators as well as automated data reformatting and cleaning functionality. The result: you avoid manual work every time a new edge case appears.

At nuvo, we distinguish between Column Types, which handle data validation, and Cleaning Functions, which handle automated data reformatting. Together they let you cope with even the most complex data formats, and because Cleaning Functions are callback functions, they can also validate data against external APIs or your own servers.

Introducing: nuvo Column Types, Data Validations & Regex

The nuvo Data Importer provides a seamless and scalable data import experience because it validates all imported data against the required target data schema. The target data schema defines all required columns, their data types, and any validation rules for the respective fields. Data types can be standard types such as string, integer, float, or boolean. However, because many other common data formats exist across industries, we added a large selection of Column Types as pre-written, one-click validations. These include formats such as dates, email addresses, IBANs, postal codes, and many more. In addition, category (drop-down) fields can easily be created using our pre-written Column Types.

Our documentation provides a comprehensive list of pre-built Column Types. Below is an example of a target data schema.

Figure: Pre-built Column Types
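To make column-type validation concrete, here is a minimal sketch in Python. It is a hypothetical illustration, not nuvo's implementation or API: the schema format, the `COLUMN_TYPES` registry, and the `check_row` helper are assumptions made for this example. The email pattern is deliberately loose, dates are assumed to be `YYYY-MM-DD`, and the IBAN check uses the standard IBAN mod-97 checksum.

```python
import re
from datetime import datetime

# Loose email shape check: something@something.tld (no whitespace, one "@").
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email(value: str) -> bool:
    return EMAIL_RE.fullmatch(value) is not None


def is_iso_date(value: str) -> bool:
    # Assumes YYYY-MM-DD dates; impossible dates such as Feb 30 are rejected.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_iban(value: str) -> bool:
    # IBAN checksum: move the first 4 chars to the end, map letters to
    # numbers (A=10 ... Z=35); the resulting integer mod 97 must equal 1.
    iban = value.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


# Hypothetical registry of column types and a hypothetical target schema.
COLUMN_TYPES = {"email": is_email, "date": is_iso_date, "iban": is_iban}

SCHEMA = {
    "customer_email": {"type": "email"},
    "signup_date": {"type": "date"},
    "iban": {"type": "iban"},
    "plan": {"type": "category", "options": ["free", "pro", "enterprise"]},
}


def check_row(row: dict, schema: dict) -> list:
    """Return one error message per column whose value fails its column type."""
    errors = []
    for key, column in schema.items():
        value = row.get(key)
        if value in (None, ""):
            continue  # emptiness is handled by a separate "required" rule
        if column["type"] == "category":
            ok = value in column["options"]
        else:
            ok = COLUMN_TYPES[column["type"]](value)
        if not ok:
            errors.append(f"{key}: {value!r} is not a valid {column['type']}")
    return errors
```

A row with a valid email, date, IBAN (for example `GB82 WEST 1234 5698 7654 32`), and plan yields an empty list; each failing column adds one message. Empty cells are skipped on purpose, since "required" is treated as its own rule.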

Having one-click data types is convenient. However, they do not solve every data validation problem, especially not your edge cases.

In addition to defining the data format (e.g., date or email address), nuvo lets you add more advanced validations such as required, unique, cross-column dependencies, and custom regular expressions (regex). All of this can be set up by simply ticking a few boxes in our no-code Target Data Model Generator in the user platform.
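The advanced validations named above can be sketched as row-level rules. This is a hypothetical Python illustration, not nuvo's configuration format: the rule keys (`required`, `unique`, `regex`, `required_if`) and the `validate_rows` function are invented for this example, and all cell values are assumed to be strings, as they would be after parsing a CSV file.

```python
import re

# Hypothetical rule set; keys mirror the article's terms, not nuvo's config format.
RULES = {
    "customer_id": {"required": True, "unique": True},
    "postal_code": {"regex": r"\d{5}"},             # e.g. five-digit postal codes
    "vat_id": {"required_if": ("country", "DE")},   # cross-column dependency
}


def validate_rows(rows: list, rules: dict) -> list:
    """Return (row_index, column, message) tuples for every failed rule."""
    errors = []
    seen = {}  # (column, value) -> index of the first row holding that value
    for i, row in enumerate(rows):
        for column, rule in rules.items():
            value = (row.get(column) or "").strip()
            required = rule.get("required", False)
            if "required_if" in rule:
                other_column, expected = rule["required_if"]
                required = required or row.get(other_column) == expected
            if not value:
                if required:
                    errors.append((i, column, "value is required"))
                continue  # no further checks on empty optional cells
            if "regex" in rule and not re.fullmatch(rule["regex"], value):
                errors.append((i, column, f"does not match {rule['regex']}"))
            if rule.get("unique"):
                key = (column, value)
                if key in seen:
                    errors.append((i, column, f"duplicate of row {seen[key]}"))
                else:
                    seen[key] = i
    return errors
```

Uniqueness needs state across rows, which is why the sketch keeps a `seen` map, while the cross-column rule only reads another field of the same row.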

With a growing user base across industries, we continuously expand the list of pre-built data types, saving your engineering team countless hours that would otherwise go into building and adding new validations and data types themselves.

Cleaning Functions: Automated Data Validation and Cleaning

With customer data imports, the real magic happens when data is not only checked against the required data type but also automatically validated and reformatted into the required format. You achieve this with nuvo’s powerful Cleaning Functions.

Cleaning Functions are callback functions that run after a particular event in the import workflow. They let you give feedback to the user and automatically transform the imported data into your required format, which speeds up data submission and improves data quality.

Cleaning Functions complement our validation rules. They cover nearly every use case by transforming data and showing the user info, warning, and error messages with custom text in the “Review Entries” step.
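One way to picture a Cleaning Function is as a callback that returns a (possibly transformed) value plus optional feedback for the review step. The minimal Python sketch below assumes a hypothetical `CellResult` type with `info`/`warning`/`error` levels and a small hand-written `COUNTRY_ALIASES` lookup; it does not reflect nuvo's actual callback signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellResult:
    value: str
    level: Optional[str] = None    # "info", "warning" or "error"
    message: Optional[str] = None  # text shown to the user during review


# Hypothetical lookup table; a real one would be much larger.
COUNTRY_ALIASES = {
    "de": "DE", "germany": "DE", "deutschland": "DE",
    "fr": "FR", "france": "FR",
}


def clean_country(raw: str) -> CellResult:
    """Normalize a country name to a two-letter country code."""
    code = COUNTRY_ALIASES.get(raw.strip().lower())
    if code is None:
        # Keep the original value and flag it instead of guessing.
        return CellResult(raw, "error", f"Unknown country {raw!r}")
    if code != raw:
        return CellResult(code, "info", f"Changed {raw!r} to {code!r}")
    return CellResult(code)


def clean_column(values: list) -> list:
    return [clean_country(v) for v in values]
```

Values that are already correct pass through without a message, corrected values carry an info note, and values that cannot be fixed keep their original text with an error for the user to resolve.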

Automatically reformatting phone numbers by removing the leading zeros and adding a plus sign instead, calling the Google Maps API to validate addresses and reformat them if necessary, or making server callbacks to your own database to avoid duplicate entries or validate against existing records are only a few examples of what is possible with our automated Cleaning Functions.
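The phone-number case can be sketched as a small cleaning function. This hypothetical `reformat_phone` assumes the leading zeros are the international `00` dialing prefix, which it replaces with `+`, and then checks the result against the E.164 shape (a `+`, a country code that does not start with 0, at most 15 digits). A single leading zero is usually a national trunk prefix rather than part of the country code, so instead of guessing, the sketch returns the original value with a warning.

```python
import re
from typing import Optional, Tuple

# E.164: "+", a country code that does not start with 0, at most 15 digits.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def reformat_phone(raw: str) -> Tuple[str, Optional[str]]:
    """Return (value, warning); warning is None when the value was cleaned."""
    compact = re.sub(r"[\s\-()/.]", "", raw)  # drop common separators
    if compact.startswith("00"):
        compact = "+" + compact[2:]  # international prefix 00 -> +
    if E164.fullmatch(compact):
        return compact, None
    return raw, "Could not convert to international format; please add the country code."
```

For example, `0049 (30) 123-4567` becomes `+49301234567`, while `030 1234567` is left unchanged and flagged for the user.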

At nuvo, we provide three kinds of Cleaning Functions, allowing you to:

  • Reformat all the imported data of particular columns
  • Iterate through every imported entry or row so that you can reformat specific values across rows and columns
  • Access all imported entries the user has modified, so you can validate them as well
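The three kinds of Cleaning Functions listed above differ mainly in what data the callback receives. The Python sketch below is a hypothetical illustration, not nuvo's API: `column_hook_lowercase_emails` gets all values of one column, `entry_hook_full_name` gets one whole row, and `change_hook_no_duplicates` gets only the rows the user edited. The `existing_emails` set is an assumed stand-in for a server callback to your own database.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, str]


# 1) Column hook: receives every value of one column at once.
def column_hook_lowercase_emails(values: List[str]) -> List[str]:
    return [v.strip().lower() for v in values]


# 2) Entry hook: receives one whole row, so it can combine several columns.
def entry_hook_full_name(row: Row) -> Row:
    if row.get("full_name"):
        return row
    full = f"{row.get('first_name', '')} {row.get('last_name', '')}".strip()
    return {**row, "full_name": full}


# 3) Change hook: re-validates only the rows the user edited during review.
#    `existing_emails` stands in for a lookup against your own database.
def change_hook_no_duplicates(modified: Dict[int, Row], existing_emails: Set[str]) -> Dict[int, str]:
    return {
        index: "This email already exists"
        for index, row in modified.items()
        if row.get("email") in existing_emails
    }


def run_import(rows: List[Row],
               column_hooks: Dict[str, Callable[[List[str]], List[str]]],
               entry_hook: Callable[[Row], Row]) -> List[Row]:
    """Apply column hooks first, then the entry hook to every row."""
    rows = [dict(r) for r in rows]  # do not mutate the caller's data
    for column, hook in column_hooks.items():
        cleaned = hook([r.get(column, "") for r in rows])
        for row, value in zip(rows, cleaned):
            row[column] = value
    return [entry_hook(r) for r in rows]
```

Running column hooks before the entry hook means row-level logic always sees already-normalized column values; the change hook is kept separate because it only needs to look at the few rows a user touched.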

The knowledge base in the user platform offers an extensive list of pre-built Cleaning Functions. You can find more information on Cleaning Functions in our documentation.

Figure: Pre-built Cleaning Functions

To sum up, setting up and maintaining data validations and automated data cleaning is one of the biggest efforts for engineering and product teams implementing data imports. Being able to handle complex customer data edge cases while also giving your users a delightful data import experience is critical.

With our extensive knowledge base, pre-defined data validations, and Cleaning Functions, it will no longer be a challenge for you to provide the best possible data onboarding experience for your customers.
