<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Point 4 Point]]></title><description><![CDATA[Data Engineering & Data Science]]></description><link>https://blog.point-4-point.com</link><image><url>https://substackcdn.com/image/fetch/$s_!HMst!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d3cad7d-c598-4529-a4ee-78da0677e591_707x707.png</url><title>Point 4 Point</title><link>https://blog.point-4-point.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 07 Apr 2026 20:51:41 GMT</lastBuildDate><atom:link href="https://blog.point-4-point.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Point 4 Point]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[point4point@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[point4point@substack.com]]></itunes:email><itunes:name><![CDATA[Ben Perkins]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ben Perkins]]></itunes:author><googleplay:owner><![CDATA[point4point@substack.com]]></googleplay:owner><googleplay:email><![CDATA[point4point@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ben Perkins]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Use GridSearchCV to Tune ML Models]]></title><description><![CDATA[Series: Taiwan Credit Default Risk Dataset]]></description><link>https://blog.point-4-point.com/p/use-gridsearchcv-to-tune-ml-models-32e4d39c2b9a</link><guid isPermaLink="false">https://blog.point-4-point.com/p/use-gridsearchcv-to-tune-ml-models-32e4d39c2b9a</guid><dc:creator><![CDATA[Ben Perkins]]></dc:creator><pubDate>Sun, 02 
Oct 2022 21:23:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ef0d1966-9569-4d56-bbe0-63b37331faab_800x355.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h4>Series: Taiwan Credit Default Risk&nbsp;Dataset</h4><p>This article is part of a series where we explore, preprocess, and run several machine learning methods on the Taiwan Credit Default dataset. The dataset is available from the <a href="https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients">UCI Machine Learning Repository</a>.</p><p>In a previous article, we used <strong>XGBoost</strong> to identify the most important features in the dataset. We also used <strong>KFold</strong> cross-validation to help discover the relative merits of several machine learning algorithms for a binary classification task. Specifically, the task is to predict whether a given credit customer will default on their payments. <strong>KFold</strong> allowed us to quickly run several bare-bones models on the data and plot the results, which helped narrow the field to one or two promising candidates for further development.</p><p>First, let&#8217;s import some of the packages we will need.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g1Hd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g1Hd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 424w, 
https://substackcdn.com/image/fetch/$s_!g1Hd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 848w, https://substackcdn.com/image/fetch/$s_!g1Hd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 1272w, https://substackcdn.com/image/fetch/$s_!g1Hd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g1Hd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png" width="800" height="355" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:355,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g1Hd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 424w, 
https://substackcdn.com/image/fetch/$s_!g1Hd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 848w, https://substackcdn.com/image/fetch/$s_!g1Hd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 1272w, https://substackcdn.com/image/fetch/$s_!g1Hd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38c1ee13-0f58-405d-b48e-4a7048e33b58_800x355.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Next, we will import the dataset we prepared in the last article. It has <strong>one-hot encoding</strong> applied to the <strong>categorical features</strong> and <strong>standardized scaling</strong> applied to the <strong>numeric features</strong>. The code below will read the CSV file into a <strong>Pandas</strong> dataframe. We set the <code>index_col</code> parameter to <code>'ID'</code> to retain the index from the original file.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vXMn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vXMn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 424w, https://substackcdn.com/image/fetch/$s_!vXMn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 848w, https://substackcdn.com/image/fetch/$s_!vXMn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 1272w, https://substackcdn.com/image/fetch/$s_!vXMn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!vXMn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png" width="800" height="329" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:329,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vXMn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 424w, https://substackcdn.com/image/fetch/$s_!vXMn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 848w, https://substackcdn.com/image/fetch/$s_!vXMn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 1272w, https://substackcdn.com/image/fetch/$s_!vXMn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0095cbbe-5ed0-4f21-a3a9-a71bdfd3621a_800x329.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>The dataframe must be separated so that the <strong>features</strong> are grouped together and the <strong>labels</strong>, or <strong>targets</strong>, are kept apart. 
Below, we use Pandas to achieve this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a9lu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a9lu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 424w, https://substackcdn.com/image/fetch/$s_!a9lu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 848w, https://substackcdn.com/image/fetch/$s_!a9lu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 1272w, https://substackcdn.com/image/fetch/$s_!a9lu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a9lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png" width="800" height="268" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:268,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a9lu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 424w, https://substackcdn.com/image/fetch/$s_!a9lu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 848w, https://substackcdn.com/image/fetch/$s_!a9lu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 1272w, https://substackcdn.com/image/fetch/$s_!a9lu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdcf6ff5-37ab-40d9-b5d4-7e79c7b7ee90_800x268.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now, we will split our data into <strong>training</strong> and <strong>test</strong> datasets. Each pair has an <strong>X</strong> for the <strong>features</strong> and a <strong>y</strong> for the <strong>labels</strong>. 
The <code>random_state</code> is set to ensure <strong>reproducibility</strong>, so that others who follow our steps get the same split.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BRHP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BRHP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 424w, https://substackcdn.com/image/fetch/$s_!BRHP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 848w, https://substackcdn.com/image/fetch/$s_!BRHP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 1272w, https://substackcdn.com/image/fetch/$s_!BRHP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BRHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png" width="800" height="231" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:231,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BRHP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 424w, https://substackcdn.com/image/fetch/$s_!BRHP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 848w, https://substackcdn.com/image/fetch/$s_!BRHP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 1272w, https://substackcdn.com/image/fetch/$s_!BRHP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcd54c3a-ee0d-4ad6-aad9-4066c8c0ffa4_800x231.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The next step we are going to take is to produce a baseline model for this binary classification task. To do this, we choose the <strong>Logistic Regression</strong> algorithm, which performed near the top of our results last time. 
The parameters are what we saw as most successful in the last article.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9xcp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9xcp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 424w, https://substackcdn.com/image/fetch/$s_!9xcp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 848w, https://substackcdn.com/image/fetch/$s_!9xcp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 1272w, https://substackcdn.com/image/fetch/$s_!9xcp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9xcp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png" width="800" height="322" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:322,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9xcp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 424w, https://substackcdn.com/image/fetch/$s_!9xcp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 848w, https://substackcdn.com/image/fetch/$s_!9xcp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 1272w, https://substackcdn.com/image/fetch/$s_!9xcp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1194f0d5-7916-4fe8-b594-b4cefe311734_800x322.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We look at the <strong>confusion matrix</strong> for the Logistic Regression classifier and see that the <strong>majority class</strong>, 0, has the most accurate predictions by far. As it turns out, there are many more 0&#8217;s in the dataset than 1&#8217;s. This kind of <strong>class imbalance</strong> can lead to problems with modeling. 
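</p><p>To see why accuracy alone can look better than it is on imbalanced data, here is a minimal Python sketch. The counts are invented for illustration (they are not the results from this article), and the layout assumes rows are true classes and columns are predicted classes.</p>

```python
# Hypothetical confusion-matrix counts, for illustration only.
# Assumed layout: rows are true classes, columns are predicted classes,
# i.e. [[TN, FP], [FN, TP]] with class 1 meaning "default".
confusion = [[740, 40],
             [150, 70]]

(tn, fp), (fn, tp) = confusion
total = tn + fp + fn + tp

accuracy = (tn + tp) / total            # share of all predictions that are correct
recall_0 = tn / (tn + fp)               # share of true 0s that were identified
recall_1 = tp / (fn + tp)               # share of true 1s that were identified
majority_baseline = (tn + fp) / total   # accuracy of always predicting class 0

print(f"accuracy:          {accuracy:.3f}")
print(f"recall (class 0):  {recall_0:.3f}")
print(f"recall (class 1):  {recall_1:.3f}")
print(f"always-0 baseline: {majority_baseline:.3f}")
```

<p>With these made-up numbers, overall accuracy is 81% even though fewer than a third of the class 1 cases are caught, and always predicting 0 would already score 78%.</p><p>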
In a future article, we will look at some potential solutions for class imbalance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b9iF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b9iF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 424w, https://substackcdn.com/image/fetch/$s_!b9iF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 848w, https://substackcdn.com/image/fetch/$s_!b9iF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 1272w, https://substackcdn.com/image/fetch/$s_!b9iF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b9iF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png" width="354" height="247" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:247,&quot;width&quot;:354,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b9iF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 424w, https://substackcdn.com/image/fetch/$s_!b9iF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 848w, https://substackcdn.com/image/fetch/$s_!b9iF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 1272w, https://substackcdn.com/image/fetch/$s_!b9iF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8ad30bd-c1c1-4503-a239-1f95153910a9_354x247.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <code>classification_report</code> gives details about the <strong>precision</strong>, <strong>recall</strong>, and <strong>f1-score</strong> for the model. 
We can see the <strong>accuracy</strong> is about <strong>82</strong>%.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VNN5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VNN5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 424w, https://substackcdn.com/image/fetch/$s_!VNN5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 848w, https://substackcdn.com/image/fetch/$s_!VNN5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 1272w, https://substackcdn.com/image/fetch/$s_!VNN5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VNN5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png" width="800" height="329" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:329,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VNN5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 424w, https://substackcdn.com/image/fetch/$s_!VNN5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 848w, https://substackcdn.com/image/fetch/$s_!VNN5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 1272w, https://substackcdn.com/image/fetch/$s_!VNN5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9636c4d6-956a-4f04-9dde-50a0a19c76a5_800x329.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Hyperparameters</strong> can be notoriously difficult and time-consuming to tune. With <code>GridSearchCV</code> we can compare the performance of several versions of the same base classifier on the same task. We instantiate a classifier object, then pass it to <code>GridSearchCV</code> together with the <strong>parameter grid</strong>.</p><p>Each line of the <code>param_grid</code> lists the candidate values for one hyperparameter, and each combination of values defines one variant of the base model. <code>GridSearchCV</code> fits every variant on the dataset using cross-validation, and stores the results as attributes. We can access these attributes, like <code>best_params_</code>, to show which configuration works best.</p><p>Below, we experiment with different <strong>regularization</strong> settings and <strong>tolerances</strong> on the Logistic Regression model. 
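</p><p>To make the idea of a parameter grid concrete, here is a minimal sketch. The grid values, the synthetic data, and the variable names are assumptions chosen for illustration and may differ from the ones in the screenshot.</p><pre><code class="language-python">from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption): the article uses the credit default dataset.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

# Illustrative grid: 3 regularization strengths x 2 tolerances = 6 candidates.
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "tol": [1e-4, 1e-3],
}

# The estimator and the grid are both handed to GridSearchCV.
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=3)
search.fit(X, y)

# cv_results_ holds one cross-validated result per parameter combination.
n_candidates = len(search.cv_results_["params"])
print(n_candidates, search.best_params_)
</code></pre><p>With three values of <code>C</code> and two of <code>tol</code>, the search evaluates six candidate models, each one cross-validated on three folds.</p><p>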
Also, the upcoming <strong>XGBoost Classifier</strong> parameter grid is defined.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UmZQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UmZQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 424w, https://substackcdn.com/image/fetch/$s_!UmZQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 848w, https://substackcdn.com/image/fetch/$s_!UmZQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 1272w, https://substackcdn.com/image/fetch/$s_!UmZQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UmZQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png" width="800" height="386" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:386,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UmZQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 424w, https://substackcdn.com/image/fetch/$s_!UmZQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 848w, https://substackcdn.com/image/fetch/$s_!UmZQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 1272w, https://substackcdn.com/image/fetch/$s_!UmZQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd44ea3-f79e-4d59-8ddb-9046bbb50e54_800x386.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Here, we define the <strong>GridSearchCV</strong> instance for the <strong>Logistic Regression</strong> classifier and run the <code>fit</code> method with the training data.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iDd4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iDd4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 424w, https://substackcdn.com/image/fetch/$s_!iDd4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 848w, 
https://substackcdn.com/image/fetch/$s_!iDd4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 1272w, https://substackcdn.com/image/fetch/$s_!iDd4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iDd4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png" width="800" height="177" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:177,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iDd4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 424w, https://substackcdn.com/image/fetch/$s_!iDd4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 848w, 
https://substackcdn.com/image/fetch/$s_!iDd4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 1272w, https://substackcdn.com/image/fetch/$s_!iDd4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ebd4df6-7fd0-44a6-ac3f-1ed5f89177d4_800x177.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Hh_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Hh_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 424w, https://substackcdn.com/image/fetch/$s_!5Hh_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 848w, https://substackcdn.com/image/fetch/$s_!5Hh_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 1272w, https://substackcdn.com/image/fetch/$s_!5Hh_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5Hh_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png" width="800" height="175" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/263e47de-16a9-489e-a7b1-1e9871389897_800x175.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:175,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Hh_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 424w, https://substackcdn.com/image/fetch/$s_!5Hh_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 848w, https://substackcdn.com/image/fetch/$s_!5Hh_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 1272w, https://substackcdn.com/image/fetch/$s_!5Hh_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F263e47de-16a9-489e-a7b1-1e9871389897_800x175.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Instead of evaluating the models on the basis of simple accuracy, we have decided to use the <code>roc_auc </code>score. 
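</p><p>Switching the selection metric is a one-argument change. The sketch below, again on synthetic stand-in data with an assumed grid of <code>C</code> values, passes <code>scoring="roc_auc"</code> and then reads the winning configuration from <code>best_params_</code> and its score from <code>best_score_</code>.</p><pre><code class="language-python">from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data (assumption).
X, y = make_classification(
    n_samples=800, n_features=8, weights=[0.78, 0.22], random_state=1
)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0]},  # illustrative values only
    scoring="roc_auc",  # rank candidates by area under the ROC curve
    cv=5,
)
search.fit(X, y)

print(search.best_params_)  # winning hyperparameter combination
print(search.best_score_)   # its mean cross-validated ROC AUC
</code></pre><p>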
Looks like the best configuration for Logistic Regression is the following:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PE7B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PE7B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 424w, https://substackcdn.com/image/fetch/$s_!PE7B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 848w, https://substackcdn.com/image/fetch/$s_!PE7B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 1272w, https://substackcdn.com/image/fetch/$s_!PE7B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PE7B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png" width="800" height="177" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:177,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PE7B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 424w, https://substackcdn.com/image/fetch/$s_!PE7B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 848w, https://substackcdn.com/image/fetch/$s_!PE7B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 1272w, https://substackcdn.com/image/fetch/$s_!PE7B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a0d61f8-5adb-423f-b7ec-aaff6b52b954_800x177.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Now, we discover the best configuration for XGBoost below:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y403!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y403!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 424w, https://substackcdn.com/image/fetch/$s_!y403!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 848w, https://substackcdn.com/image/fetch/$s_!y403!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 1272w, https://substackcdn.com/image/fetch/$s_!y403!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y403!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png" width="800" height="167" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88342153-57ed-42ea-972b-22d7e5327535_800x167.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:167,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!y403!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 424w, https://substackcdn.com/image/fetch/$s_!y403!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 848w, https://substackcdn.com/image/fetch/$s_!y403!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 1272w, https://substackcdn.com/image/fetch/$s_!y403!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88342153-57ed-42ea-972b-22d7e5327535_800x167.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1SKZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1SKZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 424w, https://substackcdn.com/image/fetch/$s_!1SKZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 848w, 
https://substackcdn.com/image/fetch/$s_!1SKZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 1272w, https://substackcdn.com/image/fetch/$s_!1SKZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1SKZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png" width="800" height="369" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:369,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1SKZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 424w, https://substackcdn.com/image/fetch/$s_!1SKZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 848w, 
https://substackcdn.com/image/fetch/$s_!1SKZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 1272w, https://substackcdn.com/image/fetch/$s_!1SKZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4604cd1c-8c3f-49ee-b16b-3ceb8bc452a2_800x369.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We now have a clearer picture of the best version of each model for this dataset, given the parameters we chose to evaluate. 
In order to evaluate a classifier&#8217;s performance in this case, we could settle for the simple <code>accuracy</code> score. However, we have already observed a significant imbalance between the two classes in the dataset. In cases like this, <code>roc_auc </code>will allow us to assess performance in a more robust way.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c6Iy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c6Iy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 424w, https://substackcdn.com/image/fetch/$s_!c6Iy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 848w, https://substackcdn.com/image/fetch/$s_!c6Iy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 1272w, https://substackcdn.com/image/fetch/$s_!c6Iy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c6Iy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png" width="800" height="313" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:313,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c6Iy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 424w, https://substackcdn.com/image/fetch/$s_!c6Iy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 848w, https://substackcdn.com/image/fetch/$s_!c6Iy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 1272w, https://substackcdn.com/image/fetch/$s_!c6Iy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a201b9b-4c2d-416d-99be-8d7c61fb728f_800x313.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <strong>ROC curve</strong> plot below compares the <strong>Logistic Regression classifier</strong> to the <strong>XGBoost classifier</strong> in terms of the area under the <strong>ROC</strong>, or <strong>Receiver Operating Characteristic</strong>, curve. For imbalanced datasets in binary classification tasks, this is generally a better measure of a model&#8217;s performance than standard accuracy.</p><p>The red dashed line represents the XGBoost classifier and the orange line shows the Logistic Regression classifier. 
The <strong>diagonal blue line</strong> is the <em><strong>50%</strong></em> mark, below which the model performs worse than a random guess.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t6rb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t6rb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 424w, https://substackcdn.com/image/fetch/$s_!t6rb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 848w, https://substackcdn.com/image/fetch/$s_!t6rb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 1272w, https://substackcdn.com/image/fetch/$s_!t6rb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t6rb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png" width="800" height="600" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t6rb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 424w, https://substackcdn.com/image/fetch/$s_!t6rb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 848w, https://substackcdn.com/image/fetch/$s_!t6rb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 1272w, https://substackcdn.com/image/fetch/$s_!t6rb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8fd4758-73cb-465b-badd-d07e7f1dd1ba_800x600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We can see that the <strong>XGBoost</strong> classifier has the edge, as its curve encloses slightly more area. For a more detailed, numeric comparison, we can generate the actual <strong>label predictions</strong> for each classifier with its <code>predict</code> method. 
<strong>Scikit-learn</strong> provides <code>classification_report</code>, which displays a breakdown of <strong>precision, recall, f1-score, and accuracy</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8dJ7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8dJ7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 424w, https://substackcdn.com/image/fetch/$s_!8dJ7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 848w, https://substackcdn.com/image/fetch/$s_!8dJ7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 1272w, https://substackcdn.com/image/fetch/$s_!8dJ7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8dJ7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png" width="800" height="251" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:251,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8dJ7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 424w, https://substackcdn.com/image/fetch/$s_!8dJ7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 848w, https://substackcdn.com/image/fetch/$s_!8dJ7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 1272w, https://substackcdn.com/image/fetch/$s_!8dJ7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F053bd8c4-322c-4121-84c0-23d998d5238e_800x251.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><pre><code>precision    recall  f1-score   support

           0       0.84      0.95      0.89      4687
           1       0.66      0.35      0.46      1313

    accuracy                           0.82      6000
   macro avg       0.75      0.65      0.67      6000
weighted avg       0.80      0.82      0.80      6000</code></pre><p>For a visual summary of how well a classifier predicts each class, a <strong>confusion matrix</strong> is useful. We compute the counts with the <strong>Scikit-learn</strong> <code>confusion_matrix</code> function, then plot them with the <code>seaborn</code> <code>heatmap</code>, annotating each cell with the actual count. This makes it easy to see the overall pattern while still being able to read off the specific numbers for each class.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CsuT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CsuT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 424w, https://substackcdn.com/image/fetch/$s_!CsuT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 848w, https://substackcdn.com/image/fetch/$s_!CsuT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 1272w, https://substackcdn.com/image/fetch/$s_!CsuT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!CsuT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png" width="354" height="247" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:247,&quot;width&quot;:354,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CsuT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 424w, https://substackcdn.com/image/fetch/$s_!CsuT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 848w, https://substackcdn.com/image/fetch/$s_!CsuT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 1272w, https://substackcdn.com/image/fetch/$s_!CsuT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe120b98f-a0f3-4b18-9610-e08c3af03bf9_354x247.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PGJR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PGJR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 424w, 
https://substackcdn.com/image/fetch/$s_!PGJR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 848w, https://substackcdn.com/image/fetch/$s_!PGJR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 1272w, https://substackcdn.com/image/fetch/$s_!PGJR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PGJR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png" width="800" height="193" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:193,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PGJR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 424w, 
https://substackcdn.com/image/fetch/$s_!PGJR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 848w, https://substackcdn.com/image/fetch/$s_!PGJR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 1272w, https://substackcdn.com/image/fetch/$s_!PGJR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd66fdc76-ab28-49aa-9353-0f3909cd1dc0_800x193.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><pre><code>precision    recall  f1-score   support

           0       0.84      0.95      0.89      4687
           1       0.68      0.35      0.47      1313

    accuracy                           0.82      6000
   macro avg       0.76      0.65      0.68      6000
weighted avg       0.81      0.82      0.80      6000</code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_NPi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_NPi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 424w, https://substackcdn.com/image/fetch/$s_!_NPi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 848w, https://substackcdn.com/image/fetch/$s_!_NPi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 1272w, https://substackcdn.com/image/fetch/$s_!_NPi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_NPi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png" width="359" height="248" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:248,&quot;width&quot;:359,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_NPi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 424w, https://substackcdn.com/image/fetch/$s_!_NPi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 848w, https://substackcdn.com/image/fetch/$s_!_NPi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 1272w, https://substackcdn.com/image/fetch/$s_!_NPi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c8e1335-ca83-4e8d-b92d-4a913c527eac_359x248.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Evaluated from these standpoints, the two classifiers are almost identical. Both have similar scores for each class and accuracy is identical. However, the XGBoost classifier shows slightly more promise. First, it has more correct predictions of both the 0 and 1 classes. Also, the <strong>false positive</strong> and <strong>false negative</strong> rates are lower. In the case of assessing default risk, it is better to keep the false positive rate as low as possible.</p><p>These gains in accuracy and performance are admittedly modest. Earlier, we noted the sizable imbalance in the classes. 
There are several strategies for dealing with class imbalance and it seems worthwhile to explore some options to improve our predictive power.</p><p>In the next article, we will look into using two techniques called <strong>SMOTE</strong> and <strong>Neighborhood Cleaning Rule</strong> to apply some <strong>sampling</strong> methods to address the imbalances in the dataset and hopefully improve performance.</p>]]></content:encoded></item><item><title><![CDATA[XGBoost & KFold for ML Model Selection]]></title><description><![CDATA[How to use XGBoost to select top features, then KFold to select a model.]]></description><link>https://blog.point-4-point.com/p/xgboost-kfold-for-ml-model-selection-c23b540ca31b</link><guid isPermaLink="false">https://blog.point-4-point.com/p/xgboost-kfold-for-ml-model-selection-c23b540ca31b</guid><dc:creator><![CDATA[Ben Perkins]]></dc:creator><pubDate>Wed, 31 Aug 2022 03:12:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6f600663-9c5b-49d6-9cbd-ee6a9e4fa6c4_800x433.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h4>Series: Taiwan Credit Default&nbsp;Dataset</h4><p>This article is part of a series where we explore, preprocess, and run several machine learning methods on the <strong>Taiwan Credit Default</strong> dataset. 
The dataset can be found here: <a href="https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients">dataset</a>.</p><p>Two important questions usually come up when we are trying to figure out how best to model a dataset and make predictions from it:</p><blockquote><p>&#183; Are all the given features <strong>relevant</strong>, and if not, which subset will be best?</p></blockquote><blockquote><p>&#183; Given the best subset of features, which machine learning model will yield the best results?</p></blockquote><p>Because of the limitations of computing power, even now in 2022, it is important to reduce the number of dimensions as far as possible, <em>without sacrificing significant amounts of information</em>. In this article we will look at one strategy for selecting the best subset of features, using the <strong>XGBoost</strong> algorithm. Then, we will use the resulting subset of data to evaluate several basic classifiers and decide which deserves more time tuning for optimal performance. Let&#8217;s get started!</p><p>The first step is to import some initial packages and load the data from Excel. Below, we have read the data into <strong>Pandas</strong> with the help of a library, <strong>xlrd</strong>. This allows us to read&nbsp;<strong>.xls</strong> files, and even select a specific sheet in the workbook. The result is a dataframe with 24 columns and 30,000 rows. 
For a more detailed look at this dataset, see <em><strong><a href="https://blog.point-4-point.com/preparing-data-for-ml-deep-learning-863f51da7522">this article</a>&nbsp;.</strong></em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mc-d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mc-d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 424w, https://substackcdn.com/image/fetch/$s_!mc-d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 848w, https://substackcdn.com/image/fetch/$s_!mc-d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 1272w, https://substackcdn.com/image/fetch/$s_!mc-d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mc-d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mc-d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 424w, https://substackcdn.com/image/fetch/$s_!mc-d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 848w, https://substackcdn.com/image/fetch/$s_!mc-d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 1272w, https://substackcdn.com/image/fetch/$s_!mc-d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03915a9f-90e7-41b7-9195-8d9f8ab2b876_800x433.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Now, we will import the rest of the packages that we will use going forward, most of them from <strong>Scikit-learn</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!RYIl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RYIl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 424w, https://substackcdn.com/image/fetch/$s_!RYIl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 848w, https://substackcdn.com/image/fetch/$s_!RYIl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 1272w, https://substackcdn.com/image/fetch/$s_!RYIl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RYIl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac4303f2-b372-410f-9167-0d61575e40a5_800x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RYIl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 424w, https://substackcdn.com/image/fetch/$s_!RYIl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 848w, https://substackcdn.com/image/fetch/$s_!RYIl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 1272w, https://substackcdn.com/image/fetch/$s_!RYIl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac4303f2-b372-410f-9167-0d61575e40a5_800x608.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This short piece of code will separate the <strong>features</strong> from the <strong>target variables</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!pPun!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pPun!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 424w, https://substackcdn.com/image/fetch/$s_!pPun!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 848w, https://substackcdn.com/image/fetch/$s_!pPun!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 1272w, https://substackcdn.com/image/fetch/$s_!pPun!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pPun!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f36e9314-cdab-43d0-ba45-0622e9384250_800x214.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pPun!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 424w, https://substackcdn.com/image/fetch/$s_!pPun!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 848w, https://substackcdn.com/image/fetch/$s_!pPun!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 1272w, https://substackcdn.com/image/fetch/$s_!pPun!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff36e9314-cdab-43d0-ba45-0622e9384250_800x214.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Our next step is to use <strong>train-test-split</strong> to give us subsets of the data to train and test our models. 
We will use an <strong>80/20 </strong>split (20% is testing data) and set a <strong>random_state</strong> of 42, which seems to be somewhat of a tradition.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q1qX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q1qX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 424w, https://substackcdn.com/image/fetch/$s_!Q1qX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 848w, https://substackcdn.com/image/fetch/$s_!Q1qX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 1272w, https://substackcdn.com/image/fetch/$s_!Q1qX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q1qX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ede1398a-d344-40a0-af21-64ed945a9d70_800x174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q1qX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 424w, https://substackcdn.com/image/fetch/$s_!Q1qX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 848w, https://substackcdn.com/image/fetch/$s_!Q1qX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 1272w, https://substackcdn.com/image/fetch/$s_!Q1qX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1398a-d344-40a0-af21-64ed945a9d70_800x174.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>There are several ways to identify the best subset of features to select for training a model. The basic concept is that we want to retain those features with the most importance for predicting the target variable and leave off the rest of the features that do not benefit the prediction task enough to warrant the added complexity. 
Complexity can entail overfitting, difficulty of interpretation, and excessive resource needs. Therefore, we will want to minimize it.</p><p>For this example, we will use the popular <strong>XGBoost</strong> algorithm in <strong>classifier</strong> mode. Here, we <strong>fit</strong> the model to the training data and then use the <strong>feature_importances_</strong> attribute to rank the features by <strong>how much each one contributes to the predictions</strong>, so that we can keep the most informative ones while minimizing the <strong>number of features</strong>. At a certain point, a feature adds less predictive power than it costs in complexity.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p4hv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p4hv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 424w, https://substackcdn.com/image/fetch/$s_!p4hv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 848w, https://substackcdn.com/image/fetch/$s_!p4hv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 1272w, https://substackcdn.com/image/fetch/$s_!p4hv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!p4hv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45fde942-ef08-43d5-8884-df19a2431033_800x476.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p4hv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 424w, https://substackcdn.com/image/fetch/$s_!p4hv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 848w, https://substackcdn.com/image/fetch/$s_!p4hv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 1272w, https://substackcdn.com/image/fetch/$s_!p4hv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45fde942-ef08-43d5-8884-df19a2431033_800x476.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The plot below shows the index numbers of each feature in the dataframe and what <strong>relative importance</strong> each has in predicting the target 
variable.</p><p>According to this analysis, two features appear to hold a larger share of the predictive weight. Now, it should be noted that by adjusting the <strong>n_estimators, learning_rate</strong> and/or the <strong>max_depth</strong> in the <strong>XGBClassifier</strong>, we may see some features become slightly more prominent. Altogether, for this example, this configuration worked well.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9c2W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9c2W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 424w, https://substackcdn.com/image/fetch/$s_!9c2W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 848w, https://substackcdn.com/image/fetch/$s_!9c2W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 1272w, https://substackcdn.com/image/fetch/$s_!9c2W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9c2W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9c2W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 424w, https://substackcdn.com/image/fetch/$s_!9c2W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 848w, https://substackcdn.com/image/fetch/$s_!9c2W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 1272w, https://substackcdn.com/image/fetch/$s_!9c2W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eb35f88-a936-4232-ba19-8c109d6c176c_800x571.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Instead of just picking the top features by sight, we can use a formula of some kind to pick them for us. 
In this example, we chose to select all features that were <strong>greater than the median</strong> of all feature importances.</p><p>Now, we will fetch the feature names, to make it easier to refer to them in the next steps.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CQmO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CQmO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 424w, https://substackcdn.com/image/fetch/$s_!CQmO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 848w, https://substackcdn.com/image/fetch/$s_!CQmO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 1272w, https://substackcdn.com/image/fetch/$s_!CQmO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CQmO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CQmO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 424w, https://substackcdn.com/image/fetch/$s_!CQmO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 848w, https://substackcdn.com/image/fetch/$s_!CQmO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 1272w, https://substackcdn.com/image/fetch/$s_!CQmO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73fbc363-dc67-4bd8-99c5-f118010bd058_800x235.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cr86!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cr86!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 424w, https://substackcdn.com/image/fetch/$s_!cr86!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 848w, https://substackcdn.com/image/fetch/$s_!cr86!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 1272w, https://substackcdn.com/image/fetch/$s_!cr86!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cr86!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!cr86!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 424w, https://substackcdn.com/image/fetch/$s_!cr86!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 848w, https://substackcdn.com/image/fetch/$s_!cr86!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 1272w, https://substackcdn.com/image/fetch/$s_!cr86!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2fc31d3-0b5e-4ad0-946b-6f56c7d6a49d_800x174.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Of those features we decided to retain, some are <strong>categorical</strong> while others are <strong>continuous numeric</strong> features. Therefore, we will want to apply a different type of transformation to each so that the models can use the data well. To do this, we will use the <strong>scikit-learn</strong> <strong>ColumnTransformer</strong> to apply the transformations appropriately.</p><p>There are several ways to prepare categorical data for modeling. In this case, we will apply the <strong>OneHotEncoder</strong> to all categorical features. Because we were selective about which features to retain, one-hot encoding adds fewer new columns than it otherwise would. For each categorical feature, the <strong>OneHotEncoder</strong> will create a new binary feature for every category within the original feature. 
For instance, if a column has 4 possible categories, the <strong>OHE</strong> creates 4 columns and then puts either 0 or 1 in each row to declare whether that category is absent or present in that row. We end up with a <strong>sparse data representation</strong> that accounts for the categories but is more easily used by models.</p><p>For the numeric features, we will keep it simple and apply the <strong>StandardScaler</strong> to them. This <strong>standardizes</strong> each feature to <strong>remove the mean and scale to unit variance</strong>. With this transformation, features measured on very different scales become comparable, which helps models that are sensitive to scale. Note that the mean and variance are themselves affected by <strong>outliers</strong>, so standardization does not remove their influence. There are other types of scaling, such as the <strong>MinMaxScaler</strong>, but we will use the <strong>StandardScaler</strong>.</p><p>We <strong>fit</strong> the column transformations on the training data for <strong>X</strong> and then apply them to both the training and test data, so that the test set does not influence the preprocessing.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wX8B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wX8B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 424w, https://substackcdn.com/image/fetch/$s_!wX8B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 848w, 
https://substackcdn.com/image/fetch/$s_!wX8B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 1272w, https://substackcdn.com/image/fetch/$s_!wX8B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wX8B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2163023-2c19-440f-be7d-0c5afe594f76_800x568.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wX8B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 424w, https://substackcdn.com/image/fetch/$s_!wX8B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 848w, 
https://substackcdn.com/image/fetch/$s_!wX8B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 1272w, https://substackcdn.com/image/fetch/$s_!wX8B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2163023-2c19-440f-be7d-0c5afe594f76_800x568.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>We can see the new feature names of the transformed dataset below:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OWvq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OWvq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 424w, https://substackcdn.com/image/fetch/$s_!OWvq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 848w, https://substackcdn.com/image/fetch/$s_!OWvq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 1272w, https://substackcdn.com/image/fetch/$s_!OWvq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!OWvq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OWvq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 424w, https://substackcdn.com/image/fetch/$s_!OWvq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 848w, https://substackcdn.com/image/fetch/$s_!OWvq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 1272w, https://substackcdn.com/image/fetch/$s_!OWvq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065ac13c-34b7-41bc-97c6-834e3ac3897b_800x153.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!mAM9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mAM9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 424w, https://substackcdn.com/image/fetch/$s_!mAM9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 848w, https://substackcdn.com/image/fetch/$s_!mAM9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 1272w, https://substackcdn.com/image/fetch/$s_!mAM9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mAM9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mAM9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 424w, https://substackcdn.com/image/fetch/$s_!mAM9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 848w, https://substackcdn.com/image/fetch/$s_!mAM9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 1272w, https://substackcdn.com/image/fetch/$s_!mAM9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1acb10d-3da7-47d4-a55f-8a5006931f50_800x724.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>With the data pared down to more essential features and transformed for use in machine learning models, we move on to create the apparatus to test a <strong>series of classifiers </strong>side-by-side, to get a better idea as to which one(s) will be worth investigating further. This step is often called <strong>model selection</strong>. 
The models, for the most part, do not have any hyperparameter changes. We run a vanilla version of each to attempt to get a ballpark estimate of the relative merits of each.</p><blockquote><p>The steps below were borrowed from, and inspired by, the book, <em><strong>Machine Learning and Data Science Blueprints for Finance</strong></em> by Hariom Tatsat, Sahil Puri, and Brad Lookabaugh (O&#8217;Reilly, 2021), 978&#8211;1&#8211;492&#8211;07305&#8211;5.</p></blockquote><blockquote><p>I recommend reading it. It has succinct and useful descriptions of common techniques in ML, especially for the financial domain.</p></blockquote><p>The first step is to create a <strong>Python list</strong>, and then <strong>append</strong> each of our <strong>candidate classifiers</strong> to it.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KnBl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KnBl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 424w, https://substackcdn.com/image/fetch/$s_!KnBl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 848w, https://substackcdn.com/image/fetch/$s_!KnBl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KnBl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KnBl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KnBl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 424w, https://substackcdn.com/image/fetch/$s_!KnBl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 848w, https://substackcdn.com/image/fetch/$s_!KnBl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KnBl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0628eb5-92db-4895-8aaa-afe2bf849c67_800x328.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Next, we set parameter values, like <strong>n_folds</strong> and <strong>scoring</strong>, for the <strong>KFold</strong> <strong>cross-validator</strong>. Here, the choice of <strong>roc_auc</strong> as a scoring metric is important. Our task is <strong>supervised binary classification</strong>, which means we have access to the target variable, and it is a binary choice, 0 or 1.</p><p>The <strong>Area Under the Curve of the Receiver Operating Characteristic</strong> <strong>graph</strong> is well-suited to this classification task. The ROC curve plots the true positive rate against the false positive rate as the discrimination threshold varies. The score is the <em>area of the graph that is under the line produced.</em> A perfect score is <strong>1.0</strong>, while a score of <strong>0.5</strong> is what random guessing would produce, so a classifier that scores near 0.5 shows no real ability to predict the target. The lower the score, the worse the classifier performs; a score <strong>below 0.5</strong> means the classifier ranks the two classes worse than chance.</p><p>We will also import the Python <strong>time</strong> module to be able to compute the seconds that elapse for running each classifier. 
We print the results to the console.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nrA6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nrA6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 424w, https://substackcdn.com/image/fetch/$s_!nrA6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 848w, https://substackcdn.com/image/fetch/$s_!nrA6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 1272w, https://substackcdn.com/image/fetch/$s_!nrA6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nrA6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nrA6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 424w, https://substackcdn.com/image/fetch/$s_!nrA6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 848w, https://substackcdn.com/image/fetch/$s_!nrA6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 1272w, https://substackcdn.com/image/fetch/$s_!nrA6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffa5e52-9a5e-43ac-a29f-8d6107acb9de_800x568.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zS2i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zS2i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 424w, https://substackcdn.com/image/fetch/$s_!zS2i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 848w, https://substackcdn.com/image/fetch/$s_!zS2i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 1272w, https://substackcdn.com/image/fetch/$s_!zS2i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zS2i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d715c14b-b514-4766-8a42-e0ac53918f04_800x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!zS2i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 424w, https://substackcdn.com/image/fetch/$s_!zS2i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 848w, https://substackcdn.com/image/fetch/$s_!zS2i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 1272w, https://substackcdn.com/image/fetch/$s_!zS2i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd715c14b-b514-4766-8a42-e0ac53918f04_800x356.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Finally, we will make a series of <strong>boxplots</strong> to help us compare the different classifiers.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wwk2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wwk2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 424w, https://substackcdn.com/image/fetch/$s_!wwk2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 848w, 
https://substackcdn.com/image/fetch/$s_!wwk2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 1272w, https://substackcdn.com/image/fetch/$s_!wwk2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wwk2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2e20710-d8a9-4068-8333-843ec55bf239_800x328.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wwk2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 424w, https://substackcdn.com/image/fetch/$s_!wwk2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 848w, 
https://substackcdn.com/image/fetch/$s_!wwk2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 1272w, https://substackcdn.com/image/fetch/$s_!wwk2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2e20710-d8a9-4068-8333-843ec55bf239_800x328.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jKzP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jKzP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 424w, https://substackcdn.com/image/fetch/$s_!jKzP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 848w, https://substackcdn.com/image/fetch/$s_!jKzP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 1272w, https://substackcdn.com/image/fetch/$s_!jKzP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jKzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jKzP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 424w, https://substackcdn.com/image/fetch/$s_!jKzP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 848w, https://substackcdn.com/image/fetch/$s_!jKzP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 1272w, https://substackcdn.com/image/fetch/$s_!jKzP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0496d58c-432d-42e6-8ab1-b2db6ea90218_800x533.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Compared side-by-side, we can see that the <strong>XGBoost</strong> classifier performed better than the others. 
Sharing second place, <strong>Logistic Regression</strong> and <strong>Linear Discriminant Analysis</strong> show promise. With this information, we can go on to focus on one or two of these classifiers and tune their respective hyperparameters. Using this method up front saves time and energy, because we focus on developing only the models that show genuine potential for the task at hand.</p><p>In another article, we will do just that and see how well one of these non-neural network approaches can classify credit default instances.</p>]]></content:encoded></item><item><title><![CDATA[Preparing Data for ML & Deep Learning]]></title><description><![CDATA[Series: Taiwan Credit Default Dataset]]></description><link>https://blog.point-4-point.com/p/preparing-data-for-ml-deep-learning-863f51da7522</link><guid isPermaLink="false">https://blog.point-4-point.com/p/preparing-data-for-ml-deep-learning-863f51da7522</guid><dc:creator><![CDATA[Ben Perkins]]></dc:creator><pubDate>Tue, 23 Aug 2022 03:23:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/41820b43-5bd0-4cab-9ce8-5a2584664b7c_800x433.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Series: Taiwan Credit Default Dataset</p><p>This article is part of a series where we explore, preprocess, and run several machine learning methods on the Taiwan Credit Default dataset. The dataset can be found here: <a href="https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients">dataset</a>.</p><p>Any dataset we work with must be cleaned up and transformed before we can use it for building models or deriving insights from visualizations. This step, often called &#8220;data preprocessing&#8221; or &#8220;data wrangling&#8221;, goes together with the overall process of EDA, or exploratory data analysis. 
Ultimately, to gain optimal value from our data, we need to know what is in the dataset: details about the features, such as data type and encoding, and as much as possible about what the data can teach us.</p><p>The example we will use in this article is a dataset from 2005 (linked above) that shows details on credit card default in Taiwan over the span of a few months. Twenty-three features were collected, with a mixture of categorical and numeric variables. A single binary column gives the target variable, in which 0 stands for &#8216;no default&#8217; and 1 for &#8216;default&#8217;. I suggest reading the brief at the link above to learn more about the dataset, which comes from the well-known <strong>UCI Machine Learning</strong> data repository.</p><p>Let&#8217;s get started by importing the dataset into Pandas:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yubz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yubz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 424w, https://substackcdn.com/image/fetch/$s_!Yubz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 848w, https://substackcdn.com/image/fetch/$s_!Yubz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 1272w,
https://substackcdn.com/image/fetch/$s_!Yubz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yubz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab658b78-38af-49f7-9a0e-d604d3101799_800x433.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yubz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 424w, https://substackcdn.com/image/fetch/$s_!Yubz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 848w, https://substackcdn.com/image/fetch/$s_!Yubz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Yubz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab658b78-38af-49f7-9a0e-d604d3101799_800x433.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>Notice here we have imported the <a href="https://xlrd.readthedocs.io/en/latest/">xlrd</a> package. This package helps read an older Microsoft Excel file type, the <strong>xls</strong> file. It may need to be installed in your Python environment, in which case you should be able to install it with pip.</p><p>The dataframe column names are below:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H9uR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H9uR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 424w, https://substackcdn.com/image/fetch/$s_!H9uR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 848w, https://substackcdn.com/image/fetch/$s_!H9uR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 1272w, https://substackcdn.com/image/fetch/$s_!H9uR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 1456w"
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H9uR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H9uR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 424w, https://substackcdn.com/image/fetch/$s_!H9uR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 848w, https://substackcdn.com/image/fetch/$s_!H9uR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 1272w, https://substackcdn.com/image/fetch/$s_!H9uR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22048748-b98d-43b2-9c6d-5830dda87cad_800x171.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!cwS3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cwS3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 424w, https://substackcdn.com/image/fetch/$s_!cwS3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 848w, https://substackcdn.com/image/fetch/$s_!cwS3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 1272w, https://substackcdn.com/image/fetch/$s_!cwS3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cwS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cwS3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 424w, https://substackcdn.com/image/fetch/$s_!cwS3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 848w, https://substackcdn.com/image/fetch/$s_!cwS3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 1272w, https://substackcdn.com/image/fetch/$s_!cwS3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21a45826-cd8f-4a5c-96d4-252a716cfa22_800x330.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>We know from the dataset description that the last column, &#8216;default payment next month&#8217;, is our target variable. Before we transform the data, it may be simpler to separate the &#8216;X&#8217; data, that is the features, from the &#8216;Y&#8217;, target data. 
Below, we use the Pandas indexer, <strong>iloc, </strong>to make two dataframes, &#8216;X&#8217; and &#8216;Y&#8217;:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U563!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U563!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 424w, https://substackcdn.com/image/fetch/$s_!U563!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 848w, https://substackcdn.com/image/fetch/$s_!U563!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 1272w, https://substackcdn.com/image/fetch/$s_!U563!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U563!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/191f167e-537c-4b6c-b162-95403be467a5_800x207.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U563!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 424w, https://substackcdn.com/image/fetch/$s_!U563!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 848w, https://substackcdn.com/image/fetch/$s_!U563!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 1272w, https://substackcdn.com/image/fetch/$s_!U563!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F191f167e-537c-4b6c-b162-95403be467a5_800x207.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>We will operate on &#8216;X&#8217; and leave &#8216;Y&#8217; aside for now.</p><p>Next, we will prepare the data for general machine learning and deep learning model building. When we read the dataset description, we gathered which features are categorical, and which are numeric.</p><p>The categorical features in this dataset use integers to encode the categories. 
Even though these are technically numbers, their magnitudes carry no meaning: a model that treats them as ordinary integers would read an order and spacing into the categories that does not exist. Encoding them makes their categorical nature explicit. There are several different methods for encoding categorical features. In this instance, I have chosen <strong>one-hot encoding</strong>. This method takes a single categorical feature and creates a new set of features, each representing the presence or absence of one category.</p><p>For example, the &#8216;MARRIAGE&#8217; feature in this set has three possible values: (1, 2, 3). The OneHotEncoder from scikit-learn will create three new features, each of which can only take the value 0 or 1. These indicate the absence or presence of that specific category within the instance, or row, of the dataset. The overall result is a sparser representation of the same information, which will likely make the data easier to learn from for the machine learning models and neural networks we use later.</p><p>To handle the numeric features, we will apply the StandardScaler to them. It standardizes each feature so that all of them sit on a comparable scale: it finds the mean and standard deviation of the feature, subtracts the mean, and then divides by the standard deviation. Each feature ends up with a mean of 0 and a standard deviation of 1. This operation is simple and common when preparing data; it preserves the shape of each feature&#8217;s distribution while placing every feature on a common scale.</p><p>In the code below, we have created two lists, one of categorical feature names and one of numeric (continuous) feature names. Doing this beforehand makes running the transformations easier. Now, we can use the scikit-learn ColumnTransformer to construct a two-pronged transformer that applies the chosen transformation to each subset of columns. Setting the &#8220;remainder='drop'&#8221; parameter of the ColumnTransformer excludes any column that is not named in either list from the resulting dataframe.
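</p><p>As a rough sketch of the setup just described, the snippet below builds the two lists and the ColumnTransformer on a small made-up dataframe. It is not the code from the screenshots: the toy values, the &#8216;UNUSED&#8217; column, and the transformer names &#8216;cat&#8217; and &#8216;num&#8217; are assumptions chosen for illustration.</p>
<pre><code class="language-python">
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the real 'X' dataframe (all values are made up).
X = pd.DataFrame(
    {
        "MARRIAGE": [1, 2, 3, 2, 1, 3],
        "LIMIT_BAL": [20000, 120000, 90000, 50000, 50000, 500000],
        "AGE": [24, 26, 34, 37, 57, 29],
        "UNUSED": [0, 0, 0, 0, 0, 0],  # hypothetical column, in neither list
    }
)

categorical_features = ["MARRIAGE"]
numeric_features = ["LIMIT_BAL", "AGE"]

# Each entry is (name, transformer, columns).
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(), categorical_features),
        ("num", StandardScaler(), numeric_features),
    ],
    remainder="drop",  # columns in neither list are left out of the output
)

preprocessor.fit(X)
feature_names = list(preprocessor.get_feature_names_out())
print(feature_names)
</code></pre>
<p>In this sketch, the &#8216;UNUSED&#8217; column does not appear in the output because it is named in neither list.</p><p>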
Note also that the string in the first position of each transformer definition becomes the prefix of every new feature name that transformer creates.</p><p>After defining the ColumnTransformer, we run the <strong>fit</strong> method on the &#8216;X&#8217; dataframe:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cf1u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cf1u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 424w, https://substackcdn.com/image/fetch/$s_!cf1u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 848w, https://substackcdn.com/image/fetch/$s_!cf1u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 1272w, https://substackcdn.com/image/fetch/$s_!cf1u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cf1u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cf1u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 424w, https://substackcdn.com/image/fetch/$s_!cf1u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 848w, https://substackcdn.com/image/fetch/$s_!cf1u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 1272w, https://substackcdn.com/image/fetch/$s_!cf1u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F801e41cb-16dc-4be6-915b-4fca75f3f19f_800x606.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Below, the configuration of the ColumnTransformer object is shown:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GqJx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GqJx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 424w, https://substackcdn.com/image/fetch/$s_!GqJx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 848w, https://substackcdn.com/image/fetch/$s_!GqJx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 1272w, https://substackcdn.com/image/fetch/$s_!GqJx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GqJx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!GqJx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 424w, https://substackcdn.com/image/fetch/$s_!GqJx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 848w, https://substackcdn.com/image/fetch/$s_!GqJx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 1272w, https://substackcdn.com/image/fetch/$s_!GqJx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa71968db-eb32-4a72-b0d7-52e0fe2cc6e4_800x436.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>A call to the <strong>get_feature_names_out</strong> method will display all the new feature names. 
A truncated list is shown below for brevity&#8217;s sake.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zv4R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zv4R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 424w, https://substackcdn.com/image/fetch/$s_!zv4R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 848w, https://substackcdn.com/image/fetch/$s_!zv4R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 1272w, https://substackcdn.com/image/fetch/$s_!zv4R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zv4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zv4R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 424w, https://substackcdn.com/image/fetch/$s_!zv4R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 848w, https://substackcdn.com/image/fetch/$s_!zv4R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 1272w, https://substackcdn.com/image/fetch/$s_!zv4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d9f2dd1-7638-4f73-ad1f-a02976fd5053_800x186.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5EPf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5EPf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 424w, https://substackcdn.com/image/fetch/$s_!5EPf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 848w, https://substackcdn.com/image/fetch/$s_!5EPf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 1272w, https://substackcdn.com/image/fetch/$s_!5EPf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5EPf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5EPf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 424w, https://substackcdn.com/image/fetch/$s_!5EPf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 848w, https://substackcdn.com/image/fetch/$s_!5EPf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 1272w, https://substackcdn.com/image/fetch/$s_!5EPf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca24f714-c79b-4468-b8ff-4d5ba5800da0_800x335.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AV6f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AV6f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 424w, https://substackcdn.com/image/fetch/$s_!AV6f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 848w, 
https://substackcdn.com/image/fetch/$s_!AV6f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 1272w, https://substackcdn.com/image/fetch/$s_!AV6f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AV6f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AV6f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 424w, https://substackcdn.com/image/fetch/$s_!AV6f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 848w, 
https://substackcdn.com/image/fetch/$s_!AV6f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 1272w, https://substackcdn.com/image/fetch/$s_!AV6f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf6df1e3-a7b0-43e1-b84f-381482b88e75_800x284.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Now, we need to create the transformed dataframe. We use Pandas to build a new &#8216;X_prep&#8217; dataframe: calling <strong>transform(X).toarray()</strong> supplies the transformed data as a dense array, and <strong>get_feature_names_out()</strong> supplies the new column names. The prior index is retained by specifying <strong>index=X.index</strong>. Calling <strong>info()</strong> on the new dataframe reveals all of the columns and their data types.
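</p><p>A minimal sketch of this step on toy data follows; the values and the transformer names &#8216;cat&#8217; and &#8216;num&#8217; are assumptions, not taken from the screenshots. One detail worth hedging: a ColumnTransformer returns a SciPy sparse matrix only when the combined output is sparse enough (governed by its <strong>sparse_threshold</strong> parameter), and <strong>toarray()</strong> exists only on that sparse result, so the sketch checks before densifying.</p>
<pre><code class="language-python">
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for 'X', with a non-default index to show that it is kept.
X = pd.DataFrame(
    {
        "MARRIAGE": [1, 2, 3, 2],
        "AGE": [24, 26, 34, 37],
    },
    index=[10, 11, 12, 13],
)

preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(), ["MARRIAGE"]),
        ("num", StandardScaler(), ["AGE"]),
    ],
    remainder="drop",
)
preprocessor.fit(X)

transformed = preprocessor.transform(X)
# Densify only when a sparse matrix came back; a NumPy array has no toarray().
if sparse.issparse(transformed):
    transformed = transformed.toarray()

X_prep = pd.DataFrame(
    transformed,
    columns=preprocessor.get_feature_names_out(),
    index=X.index,  # keep the original row labels
)
X_prep.info()
</code></pre>
<p>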
When we do this, we will see that every column is a floating-point number.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u1pu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u1pu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 424w, https://substackcdn.com/image/fetch/$s_!u1pu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 848w, https://substackcdn.com/image/fetch/$s_!u1pu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 1272w, https://substackcdn.com/image/fetch/$s_!u1pu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u1pu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u1pu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 424w, https://substackcdn.com/image/fetch/$s_!u1pu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 848w, https://substackcdn.com/image/fetch/$s_!u1pu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 1272w, https://substackcdn.com/image/fetch/$s_!u1pu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d0b5eea-b5c6-487f-8c38-7b12613b0e95_800x258.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The next to last step is to concatenate the new X_prep features to the target column, Y. 
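</p><p>On toy data, the concatenation might look like the sketch below. The two small frames, the &#8216;default&#8217; target column name, and the <strong>df_prep</strong> variable are hypothetical stand-ins for the objects built in the earlier steps.</p>
<pre><code class="language-python">
import pandas as pd

# Toy stand-ins; in the article these come from the earlier steps.
X_prep = pd.DataFrame(
    {"num__AGE": [-1.2, 0.1, 1.1], "cat__MARRIAGE_1": [1.0, 0.0, 0.0]},
    index=[10, 11, 12],
)
Y = pd.DataFrame({"default": [1, 0, 0]}, index=[10, 11, 12])

# axis=1 places the frames side by side, aligning rows on the index labels.
df_prep = pd.concat([X_prep, Y], axis=1)
print(df_prep)
</code></pre>
<p>Because <strong>concat</strong> aligns rows by index label, any labels present in only one frame would produce rows padded with NaN. This is why keeping the original index on &#8216;X_prep&#8217; matters.</p><p>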
The code below will achieve this, and we will have a dataframe of features combined with the target variable.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AVlO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AVlO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 424w, https://substackcdn.com/image/fetch/$s_!AVlO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 848w, https://substackcdn.com/image/fetch/$s_!AVlO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 1272w, https://substackcdn.com/image/fetch/$s_!AVlO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AVlO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AVlO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 424w, https://substackcdn.com/image/fetch/$s_!AVlO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 848w, https://substackcdn.com/image/fetch/$s_!AVlO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 1272w, https://substackcdn.com/image/fetch/$s_!AVlO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69140f0b-6916-4a35-85a7-e9362977a17b_800x183.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Since most of the upcoming work on machine learning models starts from this data, it makes sense to write the standardized and one-hot encoded dataset to a CSV file.
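</p><p>A small round-trip sketch of this idea is shown below. The file name, the toy frame, and the &#8216;ID&#8217; index name are assumptions for illustration; the file is written to a temporary directory so that running the sketch does not overwrite anything.</p>
<pre><code class="language-python">
import os
import tempfile

import pandas as pd

# Toy stand-in for the combined features-plus-target dataframe.
df_prep = pd.DataFrame(
    {"num__AGE": [-1.2, 0.1, 1.1], "default": [1, 0, 0]},
    index=pd.Index([10, 11, 12], name="ID"),
)

# Hypothetical file name, placed in a temporary directory for this sketch.
csv_path = os.path.join(tempfile.mkdtemp(), "credit_default_prepped.csv")

# By default, to_csv writes the index as the first column of the file.
df_prep.to_csv(csv_path)

# index_col=0 turns that first column back into the index when loading.
df_loaded = pd.read_csv(csv_path, index_col=0)
print(df_loaded.head())
</code></pre>
<p>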
Then, we will have the same dataset to start with for each of our models with no need to repeat the initial processing.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y3bZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 424w, https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 848w, https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 1272w, https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 424w, https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 848w, https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 1272w, https://substackcdn.com/image/fetch/$s_!Y3bZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c555610-cd85-4b55-bf2e-4f0ebb17a72c_800x207.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In the next episode, we will load this dataset from the file and continue to get acquainted with the data and see how some basic machine learning models perform on it.</p>]]></content:encoded></item></channel></rss>