<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[masirjafri]]></title><description><![CDATA[masirjafri]]></description><link>https://blog.masirjafri.in</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1766650238510/5242db67-ef73-45a8-b3dd-139885d1d43c.jpeg</url><title>masirjafri</title><link>https://blog.masirjafri.in</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 14:13:22 GMT</lastBuildDate><atom:link href="https://blog.masirjafri.in/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Understanding Monorepos with Turborepo (Hands-On Guide)]]></title><description><![CDATA[As the name suggests, a monorepo is a single repository (for example, one GitHub repo) that contains all parts of your application. 
This includes frontend code, backend services, DevOps scripts, shared libraries, configs, and anything else related to...]]></description><link>https://blog.masirjafri.in/understanding-monorepos-with-turborepo</link><guid isPermaLink="true">https://blog.masirjafri.in/understanding-monorepos-with-turborepo</guid><category><![CDATA[turborepo]]></category><category><![CDATA[monorepo]]></category><category><![CDATA[Build Tools]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[devtools]]></category><category><![CDATA[full stack]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[React]]></category><dc:creator><![CDATA[Masir Jafri]]></dc:creator><pubDate>Thu, 25 Dec 2025 08:48:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766651648981/04d7ca77-749a-4fa1-af03-96408fd8be99.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As the name suggests, a <strong>monorepo</strong> is a single repository (for example, one GitHub repo) that contains all parts of your application. This includes frontend code, backend services, DevOps scripts, shared libraries, configs, and anything else related to the project. Instead of splitting things across multiple repositories, everything lives in one place.</p>
<p>Some popular projects that use monorepos are:</p>
<p><a target="_blank" href="https://github.com/resend/react-email">https://github.com/resend/react-email</a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766585072671/80d8861c-c5a8-4b25-b61a-3c814c4beeb4.png" alt class="image--center mx-auto" /></p>
<p><a target="_blank" href="https://github.com/calcom/cal.com/tree/main">https://github.com/calcom/cal.com/tree/main</a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766585142738/86a3971a-65e8-4389-b1dc-8c0efcea10f0.png" alt class="image--center mx-auto" /></p>
<p>These are real production-grade open-source examples where multiple apps and packages coexist inside one repository.</p>
<h2 id="heading-why-not-just-use-simple-folders">Why Not Just Use Simple Folders?</h2>
<p>Now a common question that might arise is:</p>
<blockquote>
<p><em>“Why do I even need a monorepo? Why can’t I just create folders like</em> <code>/frontend</code>, <code>/backend</code>, and <code>/devops</code> inside one repository?”</p>
</blockquote>
<p>The answer is <strong>you absolutely can</strong>, and in some cases, you actually should.</p>
<p>If your services are highly decoupled, don’t share code, and don’t depend on each other at all, simple folders (or even separate repositories) work perfectly fine. For example, if you have a Golang backend service and a completely independent JavaScript service, there’s no strong reason to force them into a monorepo with shared tooling.</p>
<p>Monorepos start making sense when multiple parts of your system evolve together and <strong>share code or configurations</strong>.</p>
<h2 id="heading-why-monorepos-are-useful">Why Monorepos Are Useful</h2>
<p>One of the biggest advantages of a monorepo is <strong>shared code reuse</strong>. You can create shared UI components, common utility functions, shared TypeScript types, or validation logic in one place and use them across multiple applications. This avoids duplication and prevents version mismatch issues.</p>
<p>Another big benefit is <strong>collaboration</strong>. When frontend and backend live in the same repository, changes that span multiple services become easier to manage. A single pull request can update both sides, and everyone has visibility into the full system instead of just one isolated part.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766586721541/7f44f9e2-c905-48f8-86c8-590918e8e97e.png" alt class="image--center mx-auto" /></p>
<p>Monorepos also shine when it comes to <strong>build optimization and CI/CD performance</strong>. Tools like <strong>Turborepo</strong> can cache build outputs and skip unnecessary work if nothing has changed. This drastically reduces build and test times, especially in large projects and CI pipelines.</p>
<p>Finally, <strong>centralized tooling</strong> makes things simpler. Instead of maintaining separate ESLint configs, TypeScript configs, and formatting rules for every project, you can define them once and reuse them everywhere. This leads to more consistency and less maintenance overhead.</p>
<h2 id="heading-common-monorepo-tools-in-the-nodejs-ecosystem">Common Monorepo Tools in the Node.js Ecosystem</h2>
<p>There are multiple tools available to manage monorepos in Node.js. <a target="_blank" href="https://lerna.js.org/"><strong>Lerna</strong></a> and <a target="_blank" href="https://github.com/nrwl/nx"><strong>Nx</strong></a> are full-fledged monorepo frameworks that handle dependency management and workspace configuration. <a target="_blank" href="https://classic.yarnpkg.com/lang/en/docs/workspaces/"><strong>Yarn</strong>/<strong>npm-workspaces</strong></a> provide a simpler approach focused on dependency linking.</p>
<p><a target="_blank" href="https://turborepo.com/"><strong>Turborepo</strong></a>, which we’ll focus on, is slightly different. It is <strong>not exactly a monorepo framework</strong>. Instead, it acts as a <strong>build system orchestrator</strong>, and that distinction is important.</p>
<h2 id="heading-a-bit-of-turborepo-history">A Bit of Turborepo History</h2>
<p>Turborepo was created by <strong>Jared Palmer</strong> and later acquired by <strong>Vercel</strong> in December 2021. Since then, it has become one of the most widely used tools for managing builds in large JavaScript and TypeScript monorepos. Vercel itself uses Turborepo heavily across its internal projects.</p>
<h2 id="heading-build-system-vs-build-system-orchestrator-vs-monorepo-framework">Build System vs Build System Orchestrator vs Monorepo Framework</h2>
<p>A <strong>build system</strong> is the tool that actually transforms your code. In JavaScript and TypeScript projects, this includes compiling TypeScript, bundling files, minifying output, running tests, and so on. Examples include <code>tsc</code>, <code>vite</code>, <code>webpack</code>, and <code>esbuild</code>.</p>
<p>A <strong>build system orchestrator</strong>, which is where Turborepo fits, does not perform these tasks itself. Instead, it coordinates them. Turborepo lets you define tasks like <code>build</code>, <code>dev</code>, or <code>lint</code>, and then runs the appropriate commands in each package using the actual build tools you choose.</p>
<p>A <strong>monorepo framework</strong> focuses more on workspace structure, dependency relationships, and package management. Nx and Lerna fall into this category.</p>
<h2 id="heading-what-makes-turborepo-powerful">What Makes Turborepo Powerful</h2>
<p>The biggest reason Turborepo is fast is <strong>caching</strong>. If you run a build and then run it again without changing any relevant files, Turborepo simply reuses the cached output instead of rebuilding everything from scratch. This is especially useful in CI environments.</p>
<p>Turborepo also supports <strong>parallel execution</strong>, meaning independent tasks can run at the same time instead of sequentially. This makes much better use of your machine’s resources and significantly reduces total build time.</p>
<p>On top of that, Turborepo understands the <strong>dependency graph</strong> of your monorepo. It knows which packages depend on which others and ensures tasks run in the correct order.</p>
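<p>Under the hood, this ordering comes from the task configuration in the root <code>turbo.json</code>. As a rough sketch (the exact file generated by <code>create-turbo</code> may differ), it looks like this:</p>
<pre><code class="lang-json">{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": [".next/**", "dist/**"]
    },
    "lint": {},
    "dev": {
      "cache": false,
      "persistent": true
    }
  }
}
</code></pre>
<p>The <code>^</code> in <code>"^build"</code> means “build this package’s dependencies first”, which is how the dependency graph translates into execution order.</p>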
<h2 id="heading-initializing-a-turborepo">Initializing a Turborepo</h2>
<p>To create a new Turborepo, you can simply run:</p>
<pre><code class="lang-bash">npx create-turbo@latest
</code></pre>
<p>During setup, select <strong>npm workspaces</strong> when prompted. Once initialization is complete, you’ll see a structured folder layout with apps and packages already wired together.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766587726499/c54994f0-354b-4314-8777-d7a178eba4a4.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-understanding-the-folder-structure">Understanding the Folder Structure</h2>
<p>By default, the generated project contains two main sections: <strong>apps</strong> and <strong>packages</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766588134118/2104299f-4b92-4da0-b60d-dd2ab9e941ef.png" alt class="image--center mx-auto" /></p>
<p>The <strong>apps</strong> folder contains end-user applications such as websites and backend services. For example, <code>apps/web</code> is a Next.js website, and <code>apps/docs</code> is a documentation site.</p>
<p>The <strong>packages</strong> folder contains helper and shared modules. This includes UI components, shared TypeScript configuration, and ESLint configuration. These packages are meant to be reused across multiple apps.</p>
<h2 id="heading-running-the-project">Running the Project</h2>
<p>From the root folder, running:</p>
<pre><code class="lang-bash">npm run dev
</code></pre>
<p>starts all development servers at once. You’ll notice multiple applications running on different ports, such as <a target="_blank" href="http://localhost:3000"><code>localhost:3000</code></a> and <a target="_blank" href="http://localhost:3001"><code>localhost:3001</code></a>.</p>
<p>This demonstrates how multiple apps can live in one repo while sharing common code.</p>
<h2 id="heading-exploring-root-packagejson"><strong>Exploring root package.json</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766588994552/bd34d5c1-7568-45ac-8ae3-44f4428b2986.png" alt class="image--center mx-auto" /></p>
<p>This defines what actually runs when you execute:</p>
<ol>
<li><p><code>npm run build</code></p>
</li>
<li><p><code>npm run dev</code></p>
</li>
<li><p><code>npm run lint</code></p>
</li>
</ol>
<p><code>turbo build</code> goes into every app and package and runs its <code>build</code> script (provided one is defined). The same applies to <code>dev</code> and <code>lint</code>.</p>
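<p>In text form, the root <code>package.json</code> looks roughly like this (exact versions and extra fields depend on the template you generated):</p>
<pre><code class="lang-json">{
  "name": "my-turborepo",
  "private": true,
  "scripts": {
    "build": "turbo build",
    "dev": "turbo dev",
    "lint": "turbo lint"
  },
  "devDependencies": {
    "turbo": "^2.0.0"
  },
  "workspaces": ["apps/*", "packages/*"]
}
</code></pre>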
<h2 id="heading-exploring-packagesui"><strong>Exploring packages/ui</strong></h2>
<ol>
<li><h3 id="heading-packagejson"><code>package.json</code></h3>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766642366278/14efa811-d536-4f49-a88c-e321b08c0c32.png" alt class="image--center mx-auto" /></p>
<ol start="2">
<li><h3 id="heading-srcbuttontsx"><code>src/button.tsx</code></h3>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766643349240/ecb42fb4-107d-4acf-a99d-10f21d1ec036.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-exploring-appsweb"><strong>Exploring apps/web</strong></h2>
<ol>
<li><h3 id="heading-dependencies">Dependencies</h3>
</li>
</ol>
<p>It is a simple Next.js app, but it uses UI components from the <code>packages/ui</code> module.</p>
<ol start="2">
<li><h3 id="heading-packagejson-1"><code>package.json</code></h3>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766643881430/6a111993-7267-427e-bd6b-4090ae95424e.png" alt class="image--center mx-auto" /></p>
<p>If you explore the <code>package.json</code> of <code>apps/web</code>, you will notice <code>@repo/ui</code> listed as a dependency.</p>
<ol start="3">
<li><h3 id="heading-pagetsx"><code>page.tsx</code></h3>
</li>
</ol>
<p>The <code>packages/ui</code> module exports reusable UI components. These components are imported into applications like <code>apps/web</code> and <code>apps/docs</code> using a workspace dependency such as <code>@repo/ui</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766644393124/404c6b75-f8a5-4fb2-a7ee-bdfa60d1082c.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766644416323/85a19a26-71ba-4c78-a315-ce2cf0d54658.png" alt class="image--center mx-auto" /></p>
<p>When you import a component like <code>Button</code> into a Next.js page, it behaves just like a normal local dependency, even though it lives in a different folder.</p>
<h2 id="heading-adding-a-new-page-using-shared-ui">Adding a New Page Using Shared UI</h2>
<p>To add a new <code>/admin</code> page in the web app, you can create an <code>Admin</code> component inside <code>packages/ui</code>, export it, and then use it inside <code>apps/web/app/admin/page.tsx</code>.</p>
<p>Create <code>Admin.tsx</code> in <code>packages/ui/src</code> with the following code:</p>
<pre><code class="lang-javascript"><span class="hljs-string">"use client"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> Admin = <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>
        hey this is admin component
    <span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span></span>
  );
};
</code></pre>
<p>After running:</p>
<pre><code class="lang-bash">npm run dev
</code></pre>
<p>the page becomes available at:</p>
<pre><code class="lang-bash">http://localhost:3000/admin
</code></pre>
<p>Turborepo also provides generators that can automate these steps, making it faster to scaffold new components.</p>
<h2 id="heading-adding-a-react-app-vite-to-the-monorepo">Adding a React App (Vite) to the Monorepo</h2>
<p>Go to the <code>apps</code> folder:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> apps
</code></pre>
<p>Create a fresh Vite app:</p>
<pre><code class="lang-bash">npm create vite@latest
</code></pre>
<p>Update the <code>package.json</code> of the new React app to include the shared UI package:</p>
<pre><code class="lang-json"><span class="hljs-string">"@repo/ui"</span>: <span class="hljs-string">"*"</span>
</code></pre>
<p>Go back to the root folder and install dependencies:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> ..
npm install
</code></pre>
<p>Start the dev servers:</p>
<pre><code class="lang-bash">npm run dev
</code></pre>
<p>Try importing a component from the <code>ui</code> package and rendering it in your React app.</p>
<p>Now, add a <code>turbo.json</code> file inside the React app folder to override the build outputs:</p>
<pre><code class="lang-json">{
  <span class="hljs-string">"extends"</span>: [<span class="hljs-string">"//"</span>],
  <span class="hljs-string">"tasks"</span>: {
    <span class="hljs-string">"build"</span>: {
      <span class="hljs-string">"outputs"</span>: [<span class="hljs-string">"dist/**"</span>]
    }
  }
}
</code></pre>
<p>This tells Turborepo that the build output for this app lives in the <code>dist</code> folder, allowing proper caching and build optimizations.</p>
<p>Ref: <a target="_blank" href="https://turborepo.com/docs/reference/package-configurations">https://turborepo.com/docs/reference/package-configurations</a></p>
<h2 id="heading-caching-in-turborepo">Caching in Turborepo</h2>
<p>One of the easiest ways to see Turborepo’s caching in action is to run:</p>
<pre><code class="lang-bash">npm run build
</code></pre>
<p>twice in a row. The second run will be significantly faster because Turborepo pulls results from cache. You can inspect cached files inside:</p>
<pre><code class="lang-bash">node_modules/.cache/turbo
</code></pre>
<p>Ref: <a target="_blank" href="https://turborepo.com/docs/crafting-your-repository/caching">https://turborepo.com/docs/crafting-your-repository/caching</a></p>
<h2 id="heading-adding-a-nodejs-app-to-the-monorepo">Adding a NodeJS App to the Monorepo</h2>
<p>Adding a Node.js backend app follows the same pattern as frontend apps. The main difference is that <code>tsc</code> alone doesn’t perform optimally in large monorepos, so bundlers like <strong>esbuild</strong> or <strong>tsup</strong> are preferred.</p>
<p>Create a new backend app inside <code>apps</code>:</p>
<pre><code class="lang-bash">mkdir apps/backend
<span class="hljs-built_in">cd</span> apps/backend
</code></pre>
<p>Initialize an empty TypeScript project:</p>
<pre><code class="lang-bash">npm init -y
npx tsc --init
</code></pre>
<p>Update <code>tsconfig.json</code> to extend the shared base config:</p>
<pre><code class="lang-json">{
  <span class="hljs-string">"extends"</span>: <span class="hljs-string">"@repo/typescript-config/base.json"</span>,
  <span class="hljs-string">"compilerOptions"</span>: {
    <span class="hljs-string">"lib"</span>: [<span class="hljs-string">"ES2015"</span>],
    <span class="hljs-string">"module"</span>: <span class="hljs-string">"CommonJS"</span>,
    <span class="hljs-string">"outDir"</span>: <span class="hljs-string">"./dist"</span>
  },
  <span class="hljs-string">"exclude"</span>: [<span class="hljs-string">"node_modules"</span>],
  <span class="hljs-string">"include"</span>: [<span class="hljs-string">"."</span>]
}
</code></pre>
<p>Install dependencies:</p>
<pre><code class="lang-bash">npm i express @types/express
</code></pre>
<p>Create <code>src/index.ts</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> express <span class="hljs-keyword">from</span> <span class="hljs-string">"express"</span>;

<span class="hljs-keyword">const</span> app = express();

app.get(<span class="hljs-string">"/"</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
  res.json({
    <span class="hljs-attr">message</span>: <span class="hljs-string">"hello world"</span>
  });
});

<span class="hljs-comment">// Without this the server never starts listening</span>
app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"listening on port 3000"</span>));
</code></pre>
<p>Add a <code>turbo.json</code> inside the backend app:</p>
<pre><code class="lang-json">{
  <span class="hljs-string">"extends"</span>: [<span class="hljs-string">"//"</span>],
  <span class="hljs-string">"tasks"</span>: {
    <span class="hljs-string">"build"</span>: {
      <span class="hljs-string">"outputs"</span>: [<span class="hljs-string">"dist/**"</span>]
    }
  }
}
</code></pre>
<p>Install <code>esbuild</code>:</p>
<pre><code class="lang-bash">npm install esbuild
</code></pre>
<p>Add a build script to <code>package.json</code>:</p>
<pre><code class="lang-json"><span class="hljs-string">"build"</span>: <span class="hljs-string">"esbuild src/index.ts --platform=node --bundle --outdir=dist"</span>
</code></pre>
<p>You can also create a shared <strong>common</strong> package that exports constants, types, or utility functions and use it across frontend and backend apps. This is one of the biggest advantages of a monorepo setup.</p>
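<p>As a sketch, a hypothetical <code>packages/common</code> could have a <code>package.json</code> like this (the <code>@repo/common</code> name and TypeScript entry point are illustrative, not part of the generated template):</p>
<pre><code class="lang-json">{
  "name": "@repo/common",
  "version": "0.0.0",
  "private": true,
  "exports": {
    ".": "./src/index.ts"
  }
}
</code></pre>
<p>Each app then adds <code>"@repo/common": "*"</code> to its dependencies, and after an <code>npm install</code> at the root it can import the shared constants and types like any other module.</p>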
<h2 id="heading-conclusion">Conclusion</h2>
<p>Monorepos are not mandatory, and they’re not always the right choice. But when your project has multiple applications that share code and evolve together, a monorepo can drastically improve developer experience, consistency, and build performance.</p>
<p>I’d love to hear different perspectives and connect with people who are exploring or already working with monorepos and developer tooling.</p>
<p>Thanks for taking the time to read!</p>
<p><strong>Signing off,<br />Masir Jafri</strong></p>
]]></content:encoded></item><item><title><![CDATA[Feature Engineering - The Art of Shaping Data]]></title><description><![CDATA[So yes, the topic is about Feature Engineering which is an essential part of Machine Learning Pipelines .
Since the topic is very broad and subjective so in this blog, we’ll see some typical definitions, examples, and a high-level intuition of featur...]]></description><link>https://blog.masirjafri.in/feature-engineering-the-art-of-shaping-data</link><guid isPermaLink="true">https://blog.masirjafri.in/feature-engineering-the-art-of-shaping-data</guid><category><![CDATA[MLTips]]></category><category><![CDATA[MLConcepts]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[feature engineering]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[#AIForBeginners ]]></category><category><![CDATA[Data Preprocessing]]></category><category><![CDATA[AI]]></category><category><![CDATA[data cleaning ]]></category><category><![CDATA[feature selection]]></category><category><![CDATA[feature extraction]]></category><category><![CDATA[Pca]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Learning Journey]]></category><dc:creator><![CDATA[Masir Jafri]]></dc:creator><pubDate>Sun, 09 Nov 2025 20:05:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762714824652/7e5eb995-b8d5-4418-83b4-3a0400bc63ee.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So yes, the topic is about <strong>Feature Engineering</strong> which is an essential part of Machine Learning Pipelines .</p>
<p>Since the topic is broad and subjective, in this blog we’ll cover some typical definitions, examples, and high-level intuition of feature engineering without going too deep. So let’s start from the very beginning.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715166991/419e1048-3d8d-4d0a-8717-494fb9065c27.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-what-are-features-and-feature-engineering">What Are Features and Feature Engineering?</h3>
<p>After hearing the word <em>feature</em>, the first question that comes to mind is <em>What exactly is a feature, and what is feature engineering?</em></p>
<p>In very simple terms, you can think of <strong>features as the columns of a dataset (except the label or target column)</strong>. <strong>Feature Engineering</strong>, in turn, is the process of transforming raw data into meaningful features that improve the performance of a machine learning model. Basically, it’s the <em>art of making data more understandable and useful</em> for ML algorithms.</p>
<p>There’s a popular saying that perfectly justifies its importance:</p>
<blockquote>
<p>Feature engineering isn’t just a process, it’s an art form. Because in the end, a bad algorithm with great features will always outperform a great algorithm with bad ones.</p>
</blockquote>
<h3 id="heading-the-four-categories-of-feature-engineering">The Four Categories of Feature Engineering</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715326071/2ee471f1-dc42-48f0-97b6-042db6fc13e1.png" alt class="image--center mx-auto" /></p>
<p>Feature engineering is broadly divided into four categories:</p>
<ol>
<li><p><strong>Feature Transformation</strong></p>
</li>
<li><p><strong>Feature Construction</strong></p>
</li>
<li><p><strong>Feature Selection</strong></p>
</li>
<li><p><strong>Feature Extraction</strong></p>
</li>
</ol>
<p>Let’s understand them one by one.</p>
<h2 id="heading-feature-transformation">Feature Transformation :</h2>
<p><strong>Feature Transformation</strong> means modifying existing features to make them more suitable for model training <em>without changing their actual meaning</em>. It focuses on <strong>cleaning and standardizing data</strong>, and is generally divided into these subparts:</p>
<h3 id="heading-a-missing-value-imputation">a) Missing Value Imputation</h3>
<p>Since machine learning models can’t handle missing values directly, we need to either <strong>fill (impute)</strong> or <strong>remove</strong> them. Common imputation strategies include:</p>
<ul>
<li><p><strong>Mean</strong>, <strong>Median</strong>, or <strong>Mode</strong> replacement</p>
</li>
<li><p><strong>Model-based or predictive imputation</strong></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715610519/91600bf2-6071-4c90-8a37-d9dee6b861c5.jpeg" alt class="image--center mx-auto" /></p>
<p>In the image, you can see missing values filled with the column’s mean, which is a basic example of <em>Missing Value Imputation</em>.</p>
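<p>The same idea can be sketched in a few lines of plain Python (in practice you would reach for pandas or scikit-learn’s <code>SimpleImputer</code>):</p>

```python
from statistics import mean

# A column with missing values represented as None
ages = [25, 30, None, 40, None, 35]

# Compute the mean over the observed values only
observed = [a for a in ages if a is not None]
fill = mean(observed)  # (25 + 30 + 40 + 35) / 4 = 32.5

# Replace every missing entry with the column mean
imputed = [a if a is not None else fill for a in ages]
print(imputed)  # [25, 30, 32.5, 40, 32.5, 35]
```

<p>Median or mode imputation works the same way; only the fill statistic changes.</p>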
<h3 id="heading-b-handling-categorical-values">b) Handling Categorical Values</h3>
<p>Categorical features like <em>Gender</em>, <em>Country</em>, or <em>Rating</em> cannot be directly used in most ML models because they are non-numeric. So, we convert them into numerical form using encoding techniques.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715789900/bb777d7b-803e-4cb9-a226-0256526bc9ea.png" alt class="image--center mx-auto" /></p>
<p>As shown in the image, “Horse”, “Cat”, “Dog”, “Sheep”, and “Lion” are turned into [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], and [0, 0, 0, 0, 1] using the one-hot encoding method. That’s a perfect example of handling categorical values.</p>
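<p>A minimal Python sketch of one-hot encoding (libraries like pandas’ <code>get_dummies</code> or scikit-learn’s <code>OneHotEncoder</code> do this for you in practice):</p>

```python
categories = ["Horse", "Cat", "Dog", "Sheep", "Lion"]

def one_hot(value, categories):
    # 1 at the matching category's position, 0 everywhere else
    return [1 if value == c else 0 for c in categories]

print(one_hot("Cat", categories))  # [0, 1, 0, 0, 0]
```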
<h3 id="heading-c-outlier-detection">c) Outlier Detection</h3>
<p><strong>Outliers</strong> are extreme values that differ greatly from the rest of the data. If not handled properly, they can bias the model and mislead its learning process. Let’s understand this with an example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715947249/7f1d99bb-e841-4fbf-89e4-b274645b9ae3.png" alt class="image--center mx-auto" /></p>
<p>Imagine a class of 10 students, 9 of whom got placements with an average salary of <strong>₹5 LPA</strong>. Now add <em>Sharma Ji’s son</em>, who cracked a package of <strong>₹1.5 Cr/year</strong> (₹150 LPA). The class average jumps to <strong>₹19.5 LPA</strong>, which is misleading: the spike comes from one student alone. He is the outlier here, and that’s why outlier detection and handling are essential.</p>
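<p>The arithmetic is easy to verify, and it also shows why the median is often a more robust summary than the mean when outliers are present:</p>

```python
from statistics import mean, median

# Salaries in LPA: nine students at 5 LPA, one at 1.5 Cr/year (150 LPA)
salaries = [5] * 9 + [150]

print(mean(salaries))    # 19.5 -- pulled up by the single outlier
print(median(salaries))  # 5.0  -- barely affected by it
```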
<h3 id="heading-d-feature-scaling">d) Feature Scaling</h3>
<p><strong>Feature Scaling</strong> means bringing all numerical features to a similar range so that no single feature dominates others. In the example, we have:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762716511791/5045e952-e8b0-494c-b5de-dc2c76d58294.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Age:</strong> ranges between 30–50</p>
</li>
<li><p><strong>Income:</strong> ranges between 40,000–90,000</p>
</li>
</ul>
<p>The large range of income can heavily influence algorithms that depend on <strong>distance calculations</strong> (like KNN or K-Means). To fix this, we use techniques like:</p>
<p><strong>Standardization:</strong> It is a scaling technique where the values are centered around the mean with a unit standard deviation.</p>
<p><strong>Normalization:</strong> It is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.</p>
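<p>Both techniques can be sketched in plain Python (scikit-learn’s <code>StandardScaler</code> and <code>MinMaxScaler</code> are the usual tools):</p>

```python
from statistics import mean, pstdev

incomes = [40_000, 55_000, 70_000, 90_000]

# Standardization: center at mean 0 with unit standard deviation
mu, sigma = mean(incomes), pstdev(incomes)
standardized = [(x - mu) / sigma for x in incomes]

# Normalization (min-max): rescale into the [0, 1] range
lo, hi = min(incomes), max(incomes)
normalized = [(x - lo) / (hi - lo) for x in incomes]

print(normalized)  # [0.0, 0.3, 0.6, 1.0]
```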
<h2 id="heading-feature-construction">Feature Construction :</h2>
<p><strong>Feature Construction</strong> means creating new features from existing ones to better capture underlying patterns. This step often depends on <strong>domain knowledge</strong>.</p>
<p>For example, if you have <em>weight</em> and <em>height</em> columns, you can create a new feature, a <strong>BMI</strong> column, that represents both factors. Here is one more example; below is the famous Titanic dataset:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762716851449/a8e26b89-de4c-4cf4-9ee4-d2e04a2d178d.png" alt class="image--center mx-auto" /></p>
<p>It includes columns such as <strong>SibSp</strong> (siblings/spouses aboard) and <strong>ParCh</strong> (parents/children aboard). In this case, we can combine them into a single column called <strong>family_members</strong>.</p>
<h2 id="heading-feature-selection">Feature Selection :</h2>
<p><strong>Feature Selection</strong> is the process of choosing only the most relevant features that contribute to model performance and removing the redundant or irrelevant ones.</p>
<p>We need to remember that :</p>
<blockquote>
<p>More features ≠ better accuracy</p>
<p>Think of it this way that <em>a diamond is formed from coal, but not every piece of coal becomes a diamond.</em></p>
</blockquote>
<p>Let’s build some intuition about feature selection with the help of the MNIST dataset:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762717131893/359f288a-7770-47da-808d-40ee0c62ef03.png" alt class="image--center mx-auto" /></p>
<p>In this dataset, each image is 28×28 pixels (784 features). However, many of these pixels are almost always zero (black background). By selecting only the relevant features (i.e., pixels that actually carry values), the model becomes faster and more efficient.</p>
<h2 id="heading-feature-extraction">Feature Extraction :</h2>
<p><strong>Feature Extraction</strong> transforms the original features into a new set of features that still represent the underlying information, but in a more compressed way. A great example of feature extraction in action is <strong>PCA (Principal Component Analysis)</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762717401206/84680720-66bc-444d-a947-0f0665dc7c06.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762717420187/21ec32d5-b8de-48d0-8b91-34be832b38f8.png" alt class="image--center mx-auto" /></p>
<p>Here we took 4 numerical features (sepal length, sepal width, petal length, and petal width) from the <em>Iris</em> dataset. After applying PCA, those 4 features are reduced to 2 principal components, preserving most of the variance while reducing dimensionality.</p>
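<p>The mechanics can be sketched with NumPy on a few toy rows (scikit-learn’s <code>PCA</code> is what you’d normally use; the sample values below are just stand-ins for the four Iris columns):</p>

```python
import numpy as np

# Toy data: 6 samples x 4 features
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.1, 3.0, 5.9, 2.1],
    [6.5, 3.0, 5.8, 2.2],
])

# 1) center the data, 2) eigendecompose its covariance matrix,
# 3) project onto the two largest eigenvectors (principal components)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
top2 = eigvecs[:, ::-1][:, :2]          # two largest components
X_reduced = Xc @ top2

print(X_reduced.shape)  # (6, 2): four features compressed to two
```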
<p><strong>Why use Feature Extraction?</strong></p>
<ul>
<li><p><strong>High dimensionality</strong> (too many features cause overfitting and slow training)</p>
</li>
<li><p><strong>Correlation/Redundancy</strong> (some features overlap in meaning)</p>
</li>
<li><p><strong>Visualization</strong> (reduces data to 2D/3D for easy pattern recognition)</p>
</li>
</ul>
<p>Now, you might be confused between Feature Selection and Feature Extraction. So let’s clear that up before moving to the conclusion.</p>
<p>The only difference between them is:</p>
<blockquote>
<p><strong>Feature Selection:</strong> I’ll pick the best.<br /><strong>Feature Extraction:</strong> I’ll create the best.</p>
</blockquote>
<p>In <strong>Feature Selection</strong>, we <em>keep</em> only the most useful features, while in <strong>Feature Extraction</strong>, we <em>combine or transform</em> existing features into new ones. Feature Selection aims to keep relevant features; Feature Extraction aims to reduce dimensions and compress information.</p>
<h2 id="heading-conclusion">Conclusion :</h2>
<p>Feature Engineering is more than just a technical process. It is a subjective craft that bridges data and algorithms in Machine Learning. Good features can turn even a simple model into a powerful one.</p>
<p>I would love to hear different perspectives and connect with people exploring this field.</p>
<p>Thank you for reading!</p>
<p><strong>Signing off,<br />Masir Jafri</strong></p>
]]></content:encoded></item><item><title><![CDATA[ML vs DL : My Basic Understanding as a Beginner]]></title><description><![CDATA[So let’s get started with some background. Artificial Intelligence (AI) is not something newly invented. Research in AI started way back in the 1950s–1960s. The major problem at that time was that:

Data wasn't available in large amounts

Collecting ...]]></description><link>https://blog.masirjafri.in/ml-vs-dl-my-basic-understanding-as-a-beginner</link><guid isPermaLink="true">https://blog.masirjafri.in/ml-vs-dl-my-basic-understanding-as-a-beginner</guid><category><![CDATA[#ArtificialIntelligence ]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[techblog]]></category><category><![CDATA[Artificial Intelligence]]></category><dc:creator><![CDATA[Masir Jafri]]></dc:creator><pubDate>Thu, 06 Nov 2025 14:36:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439725247/a4740ae9-4611-40bf-bcd5-16f14e698b6c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So let’s get started with some background. Artificial Intelligence (AI) is not something newly invented. Research in AI started way back in the 1950s–1960s. The major problem at that time was that:</p>
<ul>
<li><p>Data wasn't available in large amounts</p>
</li>
<li><p>Collecting data was slow and expensive</p>
</li>
<li><p>Hardware was weak and not powerful enough</p>
</li>
</ul>
<p>Then came the 2000s–2010s.<br />The internet became common, smartphones came into existence, data started growing very rapidly, and hardware evolved as well (GPUs, TPUs, optimized chips, cloud storage, etc.). Today even mobile phones have <strong>8–12 GB RAM and terabytes of storage</strong> which was unthinkable back then.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762437872716/05fdcac4-d319-4d5b-a161-17a9c5a80b28.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-what-is-machine-learning">What is Machine Learning?</h2>
<p><strong>So now let's understand ML first, starting with the typical definition:</strong></p>
<blockquote>
<p>It is the methodology of data analysis used to find the hidden insights and patterns from the data without explicitly programming it.</p>
</blockquote>
<p>I know it is the typical definition, but I didn't find any better way to explain it. The important phrase to notice here is <strong>“without explicitly programming.”</strong></p>
<p>Let’s understand this with an example. You want to classify emails as <strong>spam</strong> or <strong>ham (genuine)</strong>.<br />If you use traditional programming, you’d write a long list of <code>if-else</code> rules like:</p>
<ul>
<li><p>If mail has <em>too many links</em> → spam</p>
</li>
<li><p>If sender is unknown → spam</p>
</li>
<li><p>If mail contains “check it out” 4+ times → spam</p>
</li>
</ul>
<p>But this manual approach has two big problems:</p>
<ol>
<li><p><strong>You cannot imagine all possible spam patterns in advance</strong></p>
</li>
<li><p><strong>Spammers can easily bypass your rules</strong> (e.g., replace “check it out” with “have a look”)</p>
</li>
</ol>
<p>Machine Learning removes this headache. Instead of writing rules, you feed the model <strong>examples of spam and not-spam emails</strong>, and the model learns the pattern on its own. It covers almost every case and keeps working even when the format of spam mail changes.</p>
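<p>Here is a toy sketch of that idea with scikit-learn (the four emails below are a made-up mini-dataset, purely for illustration). Note that no if-else rules appear anywhere, the model learns which words signal spam from the labelled examples:</p>
<pre><code class="lang-python">from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set: 1 = spam, 0 = ham
emails = [
    "win a free prize now click here",
    "limited offer check it out click now",
    "meeting rescheduled to friday at noon",
    "please review the attached project report",
]
labels = [1, 1, 0, 0]

# Bag-of-words features + Naive Bayes classifier, no hand-written rules
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

preds = model.predict(["free prize click now", "see you at the meeting"])
print(preds)
</code></pre>
<p>Because the model learns from word statistics rather than fixed phrases, swapping “check it out” for “have a look” no longer breaks the filter the way it would break a rule-based one.</p>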
<h2 id="heading-what-is-deep-learning">What is Deep Learning?</h2>
<p>We can consider Deep Learning simply as a subset of Machine Learning, but it becomes useful when <strong>ML reaches its limits</strong>. And here comes the typical definition.</p>
<p><strong>Definition:</strong></p>
<blockquote>
<p>“Deep Learning is a type of Machine Learning that uses multiple layers of neural networks to automatically learn patterns from large amounts of data, without us manually defining the features.”</p>
</blockquote>
<p>The limitation of Machine Learning is that we must <em>manually decide</em> the features, while in DL the model <em>automatically learns</em> the features.</p>
<p>Example:<br />If we want to classify images of cats and dogs using ML, we need to define features like:</p>
<ul>
<li><p>Ears, Size, Color, etc. for cats</p>
</li>
<li><p>Tail Type, Fur Pattern, etc. for dogs</p>
</li>
</ul>
<p>But the problem is that there are too many variations, i.e., different breeds, sizes, colors, lighting, angles, etc. So manually defining features becomes practically impossible in this case.</p>
<p>Deep Learning solves this by using <strong>neural networks with many layers</strong>, where each layer learns patterns on its own like edges, shapes, textures, etc.<br />The more layers → the deeper the network → the better it understands complex data like images, audio, speech, video, etc.</p>
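<p>A minimal, runnable sketch of a layered network (using scikit-learn's <code>MLPClassifier</code>, a small fully-connected network, rather than a full deep-learning framework like PyTorch or TensorFlow) on tiny 8×8 digit images:</p>
<pre><code class="lang-python">from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 grayscale digit images, flattened into 64 raw pixel features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers: earlier layers pick up simple patterns,
# later layers combine them into more abstract ones
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)

accuracy = net.score(X_test, y_test)
print(round(accuracy, 2))
</code></pre>
<p>We never told the network what a “loop” or an “edge” looks like; the layers learned useful intermediate representations from raw pixels on their own.</p>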
<p>Now the question may arise: if Deep Learning models are that efficient, why do we still need Machine Learning models? Here is the answer:</p>
<blockquote>
<p>“You don’t use a sword where the work of a needle is required.”</p>
</blockquote>
<p>Deep Learning requires very large datasets, high computational power, and long training times. So yes, DL is powerful, but <strong>not always necessary</strong>. For small datasets and simple problems, ML is still the better, faster, cheaper choice.</p>
<p>Also, unlike ML, DL models usually <strong>get better as data increases</strong>, while ML models often hit a limit where more data doesn’t improve performance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439165390/2fbf42d2-ff0b-44e3-b874-7e31340a4915.webp" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Both ML and DL have their own benefits, challenges, and ideal use cases. ML is simple, faster, and works well with small data while DL is powerful, automatic, and great for high dimensional data like images, audio, video, text but it requires huge data and powerful hardware.</p>
<p>I would love to hear different perspectives and connect with people exploring this field.</p>
<p><strong>Signing off,<br />Masir Jafri</strong></p>
]]></content:encoded></item></channel></rss>