The Imputer class was removed in scikit-learn 0.22. Here is how to impute missing values now

scikit-learn's Imputer class (sklearn.preprocessing.Imputer), widely used to fill in missing values when cleaning up and manipulating data sets, was deprecated in scikit-learn 0.20 and removed in version 0.22.

sklearn.impute.SimpleImputer is now the preferred class. It is similar to sklearn.preprocessing.Imputer but more succinct.
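As a quick comparison of the two classes, here is a minimal sketch on a small made-up array (the `toy` data is hypothetical and not part of the example that follows). The old call is shown only as a comment, since it no longer imports on 0.22 and later. One visible difference is that SimpleImputer has no axis argument and always works column by column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Old API (removed in scikit-learn 0.22), kept as a comment for comparison:
#   from sklearn.preprocessing import Imputer
#   imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)

# Hypothetical toy data with one missing value in each column.
toy = np.array([[1.0, np.nan],
                [3.0, 4.0],
                [np.nan, 8.0]])

# New API: no axis argument, statistics are computed per column.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
filled = imputer.fit_transform(toy)

print(imputer.statistics_)  # the per-column means learned by fit
print(filled)               # toy with each NaN replaced by its column mean
```

The learned column means are stored in `imputer.statistics_`, which is handy for checking what value was actually filled in.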

Here is how to use it.

We will use the following data set (stored as data.csv) as an example:

Country,Age,Salary,Purchased
USA,25,60000,Yes
USA,30,75000,No
Canada,42,40000,Yes
USA,,30000,Yes
Canada,42,,No
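If you want to follow along, the sketch below writes this data set to data.csv and checks which values pandas sees as missing. It assumes the file is comma-separated, that the two gaps are the Age in the fourth row and the Salary in the fifth, and that the last column is named Purchased (to match the code later in this post).

```python
from pathlib import Path

import pandas as pd

# Assumed file layout: comma-separated, empty fields for missing values.
csv_text = """Country,Age,Salary,Purchased
USA,25,60000,Yes
USA,30,75000,No
Canada,42,40000,Yes
USA,,30000,Yes
Canada,42,,No
"""

Path('data.csv').write_text(csv_text)

df = pd.read_csv('data.csv')

# pandas reads empty fields as NaN, which is what SimpleImputer looks for.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Age and Salary should each report one missing value, and the other columns none.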

First, we will import numpy (for NumPy array functions) and pandas:

import numpy as np
import pandas as pd

We will then load the entire data set (data.csv) into a data frame (df) using pandas.read_csv:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')

For convenience, let's store the name of the dependent column (title_y):

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')

title_y = 'Purchased'

Now let's create the matrix of features (X) and the dependent vector (y) from df:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')

title_y = 'Purchased'

X = df.drop(columns=[title_y]).values
y = df[title_y].values

Now, let's use the new sklearn.impute.SimpleImputer to fix the missing data. In this example, we will replace each missing value with the mean of its column.

Here are the steps:

  • import SimpleImputer
  • create a SimpleImputer instance (imputer), with arguments missing_values=np.nan (to identify missing fields) and strategy='mean'
  • fit the imputer to the columns with missing values (in our example X[:,1:3])
  • transform those columns to fill in the missing values

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')

title_y = 'Purchased'

X = df.drop(columns=[title_y]).values
y = df[title_y].values

from sklearn.impute import SimpleImputer 

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
replaced = imputer.transform(X[:,1:3])
X[:,1:3] = replaced
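If you would rather keep the column names, a variation on the same idea is to impute directly on the data frame. This is only a sketch: it assumes the data set above is comma-separated with empty fields for the missing Age and Salary, reads it from an in-memory string so it runs on its own, and uses fit_transform to do the fit and transform steps in one call.

```python
import io

import pandas as pd
from sklearn.impute import SimpleImputer

# Assumed contents of data.csv, inlined so the example is self-contained.
csv_text = """Country,Age,Salary,Purchased
USA,25,60000,Yes
USA,30,75000,No
Canada,42,40000,Yes
USA,,30000,Yes
Canada,42,,No
"""

df = pd.read_csv(io.StringIO(csv_text))

# Select the numeric columns by name instead of by position.
numeric_cols = ['Age', 'Salary']

# missing_values defaults to np.nan, so only the strategy is given here.
imputer = SimpleImputer(strategy='mean')
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])

print(df)
```

With this data, the missing Age becomes 34.75 (the mean of 25, 30, 42 and 42) and the missing Salary becomes 51250 (the mean of 60000, 75000, 40000 and 30000).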

And that’s all!

3 Comments:

    1. Obi Alexander

      That was a mistake. The error was "AttributeError: 'SimpleImputer' object has no attribute 'transform'"
