[100 Days of ML Code] OneHotEncoder
One-hot encoding is a process that converts categorical variables into a form that ML algorithms can use to make better predictions.

Suppose we have a dataset like this:

╔═════════════╦═══════╗
║ CompanyName ║ Price ║
╠═════════════╬═══════╣
║ VW          ║ 20000 ║
║ Acura       ║ 10011 ║
║ Honda       ║ 50000 ║
║ Honda       ║ 10000 ║
╚═════════════╩═══════╝

Many ML algorithms cannot work directly with labels stored as strings, so we first convert each label into a numeric value:

╔═════════════╦══════════════════╦═══════╗
║ CompanyName ║ CategoricalValue ║ Price ║
╠═════════════╬══════════════════╬═══════╣
║ VW          ║ 1                ║ 20000 ║
║ Acura       ║ 2                ║ 10011 ║
║ Honda       ║ 3                ║ 50000 ║
║ Honda       ║ 3                ║ 10000 ║
╚═════════════╩══════════════════╩═══════╝

Now, 'CategoricalValue' is a numeric value that represents the 'CompanyName'. However, the problem with the ab