The scikit-learn Imputer class (sklearn.preprocessing.Imputer), widely used for imputing missing values when cleaning and preparing data sets, was deprecated in scikit-learn 0.20 and removed in version 0.22.
sklearn.impute.SimpleImputer is now the preferred class. It is similar to sklearn.preprocessing.Imputer but more succinct.
Here is how to use it.
We will use the following data set (stored as data.csv) as an example:
Country | Age | Salary | Purchased |
---|---|---|---|
USA | 25 | 60000 | Yes |
USA | 30 | 75000 | No |
Canada | 42 | 40000 | Yes |
USA | | 30000 | Yes |
Canada | 42 | | No |
… | … | … | … |
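To see which columns actually contain gaps before imputing, a quick check like the one below can help. This is only a sketch: it rebuilds the five visible rows in memory instead of reading data.csv, and it assumes the fourth row is missing Age, the fifth row is missing Salary, and the last column is named Purchased (to match the code later in this post).

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: only the five rows visible in the table.
# Assumption: row 4 has no Age and row 5 has no Salary.
csv_text = """Country,Age,Salary,Purchased
USA,25,60000,Yes
USA,30,75000,No
Canada,42,40000,Yes
USA,,30000,Yes
Canada,42,,No
"""

df = pd.read_csv(io.StringIO(csv_text))

# pandas reads empty fields as NaN; count the missing entries per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Columns with a non-zero count are the ones the imputer needs to handle.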
First, we import numpy (for numpy array functions) and pandas:
import numpy as np
import pandas as pd
We then load the entire data frame (df) from the data set (data.csv) using pandas.read_csv:
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv')
For convenience, let's store the name of the dependent column in a variable (title_y):
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv')
title_y = 'Purchased'
Now let's create the matrix of features (X) and the dependent vector (y) from df:
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv')
title_y = 'Purchased'
X = df.drop(columns=[title_y]).values
y = df[title_y].values
Now let's use the new sklearn.impute.SimpleImputer to fix the missing data. In this example, we replace each missing value with the mean of its column.
Here are the steps:
- import SimpleImputer
- create a SimpleImputer instance (imputer) with the arguments missing_values=np.nan (to identify missing fields) and strategy='mean'
- fit the imputer to the columns that contain missing values (in our example, X[:,1:3])
- transform those columns to fill in the missing values
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv')
title_y = 'Purchased'
X = df.drop(columns=[title_y]).values
y = df[title_y].values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
replaced = imputer.transform(X[:,1:3])
X[:,1:3] = replaced
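The fit and transform steps can also be combined into a single call with fit_transform. The sketch below is illustrative only: it uses a small in-memory array (the name features and its values are assumptions based on the Age and Salary columns of the sample rows) rather than data.csv.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative Age and Salary columns, each with one missing value.
features = np.array([
    [25.0, 60000.0],
    [30.0, 75000.0],
    [42.0, 40000.0],
    [np.nan, 30000.0],
    [42.0, np.nan],
])

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

# fit_transform learns each column's mean and fills the gaps in one call.
filled = imputer.fit_transform(features)

print(imputer.statistics_)  # per-column means learned during fit
print(filled)
```

After fitting, imputer.statistics_ holds the value used to fill each column, which is handy for checking what the imputer actually did.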
And that’s all!
Undefined name ‘replaced’
That was a mistake. The error was "AttributeError: 'SimpleImputer' object has no attribute 'transform'"
I think 'Purchased' is not defined