How to remove invalid characters from any file

When I find an inefficient process like someone who’s doing manual work in Excel, I cringe when I know there’s a better way to do it.

A friend of mine, Tim, recently uploaded thousands of new products with the trademark ™, copyright ©, and size, i.e. ” or ‘ symbols, to Bigcommerce without changing them to HTML notation and it created lots of junk data. Over 10k rows with invalid characters mixed into the product titles and product descriptions. Not only does this ruin the experience for shoppers, but ranks horribly for organic results.

They were slowly cleaning the product export file using the find and replace function, but it was taking hours. I knew a better way to automate this simple and mundane task and save them a lot of time.

I had recently read a post on stackoverflow.com that talked about cleaning up strings of unprintable characters and it gave us 80% of the solution. I improved the solution to work for any size of data and an unlimited amount of columns. I also added a few variables to the macro code to find the last row of the data and take input from the user to query for the specific column to be cleaned. The data is assumed to have headers in row one with the data following, so I created a constant for the first row. With a final test, I added a few more nuances, and we had a complete solution!

In the end, we cleaned over 90,000 lines of data in a matter of minutes.

I created an Excel String Cleaner add-on to download and try on your file.
Download

First time using an Excel Add-in?
Download the file and watch the video below on how to use it.

Are you an Excel professional?
Use the stringClean macro code below to clean out your non-printable characters.

Sub cleanString()
Dim new_val As String
Dim curr_val As String
Dim curr_cell As String
Dim fr As Double
Dim lr As Double
Dim column As String
Dim col As Double

'set the first row
   fr = 2
'find the lastrow
   lr = ActiveSheet.Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row
'get input from user for the column as the letter
   column = InputBox("enter the Column Header, ie E")
'change the column to the number (May be redundant)
   col = Cells(1, column).column
'loop through the first row to last row of the specific column
   For a = fr To lr
      'get the current cell string
      curr_cell = ActiveSheet.Cells(a, col).Value
      'if the length of the current string is greater than zero clean it
      If Len(curr_cell) > 0 Then
         new_val = curr_cell
         'loop through each character in the string
         For iter = 1 To Len(curr_cell)
            'get the current character
            curr_val = Mid(curr_cell, iter, 1)
            'if the current character is invalid, remove it
            If Asc(curr_val) > 127 And Asc(curr_val) <> 145 And Asc(curr_val) <> 146 And _
              Asc(curr_val) <> 147 And Asc(curr_val) <> 148 Then
                 new_val = Replace(new_val, curr_val, "")
            End If
         Next iter
         'overwrite the cell with the clean string
         ActiveSheet.Cells(a, col).Value = new_val
      End If
   Next a
'notify when complete.
   Beep
End Sub